Snowball 2: Update supported languages by Simran-B · Pull Request #374 · arangodb/docs · GitHub
This repository was archived by the owner on Dec 13, 2023. It is now read-only.

Snowball 2: Update supported languages #374

Merged: 7 commits, Mar 19, 2020
16 changes: 16 additions & 0 deletions 3.5/aql/functions-arangosearch.md
@@ -260,6 +260,14 @@ Match documents where the attribute at **path** is greater than (or equal to)
*low* and *high* can be numbers or strings (technically also `null`, `true`
and `false`), but the data type must be the same for both.

{% hint 'warning' %}
The alphabetical order of characters is not taken into account by ArangoSearch,
i.e. range queries in SEARCH operations against Views will not follow the
language rules as per the defined Analyzer locale nor the server language
(startup option `--default-language`)!
Also see [Known Issues](../release-notes-known-issues35.html#arangosearch).
{% endhint %}
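
To make the warning concrete, here is a minimal sketch of what a range check looks like when language rules are ignored. Plain JavaScript string comparison (UTF-16 code unit order) stands in for such a comparison; that stand-in and the helper name `inRange` are assumptions for illustration, not ArangoSearch's actual implementation.

```js
// Hypothetical helper: inclusive range test using plain string comparison,
// i.e. no locale or collation rules are applied.
function inRange(value, low, high) {
  return value >= low && value <= high;
}

const words = ["apfel", "äpfel", "birne"];

// "ä" (U+00E4) has a higher code unit than every ASCII letter, so "äpfel"
// falls outside the range "a".."c", although German alphabetical order
// would place it next to "apfel".
const matches = words.filter((w) => inRange(w, "a", "c"));
console.log(matches); // contains "apfel" and "birne", but not "äpfel"
```

A reader expecting German alphabetical order would likely expect all three words in the result, which is the kind of surprise the hint describes.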

- **path** (attribute path expression):
the path of the attribute to test in the document
- **low** (number\|string): minimum value of the desired range
@@ -406,6 +414,14 @@ is processed by a tokenizing Analyzer (type `"text"` or `"delimiter"`) or if it
is an array, then a single token/element starting with the prefix is sufficient
to match the document.

{% hint 'warning' %}
The alphabetical order of characters is not taken into account by ArangoSearch,
i.e. range queries in SEARCH operations against Views will not follow the
language rules as per the defined Analyzer locale nor the server language
(startup option `--default-language`)!
Also see [Known Issues](../release-notes-known-issues35.html#arangosearch).
{% endhint %}
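
The prefix rule described in the context lines (a single token or array element starting with the prefix is enough) can be sketched as follows. The helper `anyTokenStartsWith` is hypothetical, and exact code unit matching is assumed as a stand-in for a comparison that ignores language rules.

```js
// Hypothetical helper: a document matches if at least one of its tokens
// starts with the given prefix. The comparison is exact, with no
// locale-aware folding of characters.
function anyTokenStartsWith(tokens, prefix) {
  return tokens.some((token) => token.startsWith(prefix));
}

const hit = anyTokenStartsWith(["quick", "brown", "fox"], "bro");
// "ä" is not treated as a variant of "a", so this prefix does not match.
const miss = anyTokenStartsWith(["äpfel"], "a");
console.log(hit, miss); // true false
```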

- **path** (attribute path expression): the path of the attribute to compare
against in the document
- **prefix** (string): a string to search at the start of the text
8 changes: 8 additions & 0 deletions 3.5/aql/operations-search.md
@@ -64,6 +64,14 @@ are supported:
- `!=`
- `IN` (array or range), also `NOT IN`

{% hint 'warning' %}
The alphabetical order of characters is not taken into account by ArangoSearch,
i.e. range queries in SEARCH operations against Views will not follow the
language rules as per the defined Analyzer locale nor the server language
(startup option `--default-language`)!
Also see [Known Issues](../release-notes-known-issues35.html#arangosearch).
{% endhint %}

```js
FOR doc IN viewName
SEARCH ANALYZER(doc.text == "quick" OR doc.text == "brown", "text_en")
```
43 changes: 37 additions & 6 deletions 3.5/arangosearch-analyzers.md
@@ -134,7 +134,7 @@ attributes:
- `locale` (string): a locale in the format
`language[_COUNTRY][.encoding][@variant]` (square brackets denote optional
parts), e.g. `"de.utf-8"` or `"en_US.utf-8"`. Only UTF-8 encoding is
meaningful in ArangoDB.
meaningful in ArangoDB. Also see [Supported Languages](#supported-languages).
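
The locale format `language[_COUNTRY][.encoding][@variant]` can be taken apart with a small regular expression. This is only a sketch: `parseLocale` is a hypothetical helper, and the assumption that language and country consist of ASCII letters is mine, not something the documentation states.

```js
// Hypothetical helper: splits a locale string of the form
// language[_COUNTRY][.encoding][@variant] into its parts.
// Optional parts that are absent are returned as null.
function parseLocale(locale) {
  const match = /^([A-Za-z]+)(?:_([A-Za-z]+))?(?:\.([^@]+))?(?:@(.+))?$/.exec(locale);
  if (match === null) {
    return null; // not in the expected format
  }
  return {
    language: match[1],
    country: match[2] || null,
    encoding: match[3] || null,
    variant: match[4] || null,
  };
}

console.log(parseLocale("de.utf-8"));
console.log(parseLocale("en_US.utf-8"));
```

For `"en_US.utf-8"` the parts are language `en`, country `US` and encoding `utf-8`; for `"de.utf-8"` the country is absent.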

### Norm

@@ -147,7 +147,7 @@ attributes:
- `locale` (string): a locale in the format
`language[_COUNTRY][.encoding][@variant]` (square brackets denote optional
parts), e.g. `"de.utf-8"` or `"en_US.utf-8"`. Only UTF-8 encoding is
meaningful in ArangoDB.
meaningful in ArangoDB. Also see [Supported Languages](#supported-languages).
- `accent` (boolean, _optional_):
- `true` to preserve accented characters (default)
- `false` to convert accented characters to their base characters
@@ -194,16 +194,13 @@ An Analyzer capable of breaking up strings into individual words while also
optionally filtering out stop-words, extracting word stems, applying
case conversion and accent removal.

Stemming support is provided by
[Snowball](https://snowballstem.org/){:target="_blank"}.

The *properties* allowed for this Analyzer are an object with the following
attributes:

- `locale` (string): a locale in the format
`language[_COUNTRY][.encoding][@variant]` (square brackets denote optional
parts), e.g. `"de.utf-8"` or `"en_US.utf-8"`. Only UTF-8 encoding is
meaningful in ArangoDB.
meaningful in ArangoDB. Also see [Supported Languages](#supported-languages).
- `accent` (boolean, _optional_):
- `true` to preserve accented characters
- `false` to convert accented characters to their base characters (default)
@@ -281,3 +278,37 @@ Name | Type | Language
`text_ru` | `text` | Russian
`text_sv` | `text` | Swedish
`text_zh` | `text` | Chinese

Supported Languages
-------------------

Analyzers rely on [ICU](http://site.icu-project.org/){:target="_blank"} for
language-dependent tokenization and normalization. The ICU data file
`icudtl.dat` that ArangoDB ships with contains information for a lot of
languages, which are technically all supported.

{% hint 'warning' %}
The alphabetical order of characters is not taken into account by ArangoSearch,
i.e. range queries in SEARCH operations against Views will not follow the
language rules as per the defined Analyzer locale nor the server language
(startup option `--default-language`)!
Also see [Known Issues](release-notes-known-issues35.html#arangosearch).
{% endhint %}

Stemming support is provided by [Snowball](https://snowballstem.org/){:target="_blank"},
which supports the following languages:

Code | Language
------|-----------
`de` | German
`en` | English
`es` | Spanish
`fi` | Finnish
`fr` | French
`it` | Italian
`nl` | Dutch
`no` | Norwegian
`pt` | Portuguese
`ru` | Russian
`sv` | Swedish
`zh` | Chinese
1 change: 1 addition & 0 deletions 3.5/release-notes-known-issues35.md
@@ -23,6 +23,7 @@ ArangoSearch
| **Date Added:** 2018-12-03 <br> **Component:** ArangoSearch <br> **Deployment Mode:** All <br> **Description:** Using a loop variable in expressions within a corresponding SEARCH condition is not supported <br> **Affected Versions:** 3.4.x, 3.5.x <br> **Fixed in Versions:** - <br> **Reference:** [arangodb/backlog#318](https://github.com/arangodb/backlog/issues/318){:target="_blank"} (internal) |
| **Date Added:** 2019-06-25 <br> **Component:** ArangoSearch <br> **Deployment Mode:** All <br> **Description:** The `primarySort` attribute in ArangoSearch View definitions can not be set via the web interface. The option is immutable, but the web interface does not allow to set any View properties upfront (it creates a View with default parameters before the user has a chance to configure it). <br> **Affected Versions:** 3.5.x <br> **Fixed in Versions:** - <br> **Reference:** N/A |
| **Date Added:** 2019-11-06 <br> **Component:** ArangoSearch <br> **Deployment Mode:** Cluster <br> **Description:** There is a possibility to get into deadlocks during Coordinator execution if a custom Analyzer was created (and is present in the `_analyzers` system collection). It is recommended not to use custom Analyzers in production environments in affected versions. <br> **Affected Versions:** 3.5.x <br> **Fixed in Versions:** 3.5.3 <br> **Reference:** [arangodb/backlog#651](https://github.com/arangodb/backlog/issues/651){:target="_blank"} (internal) |
| **Date Added:** 2020-03-19 <br> **Component:** ArangoSearch <br> **Deployment Mode:** All <br> **Description:** Operators and functions in `SEARCH` clauses of AQL queries which compare values such as `>`, `>=`, `<`, `<=`, `IN_RANGE()` and `STARTS_WITH()` neither take the server language (`--default-language`) nor the Analyzer locale into account. The alphabetical order of characters as defined by a language is thus not honored and can lead to unexpected results in range queries. <br> **Affected Versions:** 3.5.x <br> **Fixed in Versions:** - <br> **Reference:** [arangodb/backlog#679](https://github.com/arangodb/backlog/issues/679){:target="_blank"} (internal) |

AQL
---
16 changes: 16 additions & 0 deletions 3.6/aql/functions-arangosearch.md
@@ -260,6 +260,14 @@ Match documents where the attribute at **path** is greater than (or equal to)
*low* and *high* can be numbers or strings (technically also `null`, `true`
and `false`), but the data type must be the same for both.

{% hint 'warning' %}
The alphabetical order of characters is not taken into account by ArangoSearch,
i.e. range queries in SEARCH operations against Views will not follow the
language rules as per the defined Analyzer locale nor the server language
(startup option `--default-language`)!
Also see [Known Issues](../release-notes-known-issues35.html#arangosearch).
{% endhint %}

- **path** (attribute path expression):
the path of the attribute to test in the document
- **low** (number\|string): minimum value of the desired range
@@ -438,6 +446,14 @@ is processed by a tokenizing Analyzer (type `"text"` or `"delimiter"`) or if it
is an array, then a single token/element starting with the prefix is sufficient
to match the document.

{% hint 'warning' %}
The alphabetical order of characters is not taken into account by ArangoSearch,
i.e. range queries in SEARCH operations against Views will not follow the
language rules as per the defined Analyzer locale nor the server language
(startup option `--default-language`)!
Also see [Known Issues](../release-notes-known-issues35.html#arangosearch).
{% endhint %}

- **path** (attribute path expression): the path of the attribute to compare
against in the document
- **prefix** (string): a string to search at the start of the text
8 changes: 8 additions & 0 deletions 3.6/aql/operations-search.md
@@ -64,6 +64,14 @@ are supported:
- `!=`
- `IN` (array or range), also `NOT IN`

{% hint 'warning' %}
The alphabetical order of characters is not taken into account by ArangoSearch,
i.e. range queries in SEARCH operations against Views will not follow the
language rules as per the defined Analyzer locale nor the server language
(startup option `--default-language`)!
Also see [Known Issues](../release-notes-known-issues35.html#arangosearch).
{% endhint %}

```js
FOR doc IN viewName
SEARCH ANALYZER(doc.text == "quick" OR doc.text == "brown", "text_en")
```
42 changes: 36 additions & 6 deletions 3.6/arangosearch-analyzers.md
@@ -135,7 +135,7 @@ attributes:
- `locale` (string): a locale in the format
`language[_COUNTRY][.encoding][@variant]` (square brackets denote optional
parts), e.g. `"de.utf-8"` or `"en_US.utf-8"`. Only UTF-8 encoding is
meaningful in ArangoDB.
meaningful in ArangoDB. Also see [Supported Languages](#supported-languages).

### Norm

@@ -148,7 +148,7 @@ attributes:
- `locale` (string): a locale in the format
`language[_COUNTRY][.encoding][@variant]` (square brackets denote optional
parts), e.g. `"de.utf-8"` or `"en_US.utf-8"`. Only UTF-8 encoding is
meaningful in ArangoDB.
meaningful in ArangoDB. Also see [Supported Languages](#supported-languages).
- `accent` (boolean, _optional_):
- `true` to preserve accented characters (default)
- `false` to convert accented characters to their base characters
@@ -215,16 +215,13 @@ An Analyzer capable of breaking up strings into individual words while also
optionally filtering out stop-words, extracting word stems, applying
case conversion and accent removal.

Stemming support is provided by
[Snowball](https://snowballstem.org/){:target="_blank"}.

The *properties* allowed for this Analyzer are an object with the following
attributes:

- `locale` (string): a locale in the format
`language[_COUNTRY][.encoding][@variant]` (square brackets denote optional
parts), e.g. `"de.utf-8"` or `"en_US.utf-8"`. Only UTF-8 encoding is
meaningful in ArangoDB.
meaningful in ArangoDB. Also see [Supported Languages](#supported-languages).
- `accent` (boolean, _optional_):
- `true` to preserve accented characters
- `false` to convert accented characters to their base characters (default)
@@ -367,3 +364,36 @@ Name | Type | Language
`text_ru` | `text` | Russian
`text_sv` | `text` | Swedish
`text_zh` | `text` | Chinese

Supported Languages
-------------------

Analyzers rely on [ICU](http://site.icu-project.org/){:target="_blank"} for
language-dependent tokenization and normalization. The ICU data file
`icudtl.dat` that ArangoDB ships with contains information for a lot of
languages, which are technically all supported.

{% hint 'warning' %}
The alphabetical order of characters is not taken into account by ArangoSearch,
i.e. range queries in SEARCH operations against Views will not follow the
language rules as per the defined Analyzer locale nor the server language
(startup option `--default-language`)!
Also see [Known Issues](release-notes-known-issues36.html#arangosearch).
{% endhint %}

Stemming support is provided by [Snowball](https://snowballstem.org/){:target="_blank"},
which supports the following languages:

Code | Language
------|-----------
`de` | German
`en` | English
`es` | Spanish
`fi` | Finnish
`fr` | French
`it` | Italian
`nl` | Dutch
`no` | Norwegian
`pt` | Portuguese
`ru` | Russian
`sv` | Swedish
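
As a rough illustration of how the table above might be used, the following sketch checks whether a locale's language code appears in the 3.6 list of Snowball languages. `hasStemmer` and `SNOWBALL_LANGUAGES` are hypothetical names, and what ArangoDB actually does for a language without a stemmer is not covered here.

```js
// Language codes copied from the 3.6 table above.
const SNOWBALL_LANGUAGES = new Set([
  "de", "en", "es", "fi", "fr", "it", "nl", "no", "pt", "ru", "sv",
]);

// Hypothetical helper: takes a locale such as "en_US.utf-8" and reports
// whether its language part is listed in the table.
function hasStemmer(locale) {
  const language = locale.split(/[_.@]/)[0].toLowerCase();
  return SNOWBALL_LANGUAGES.has(language);
}

console.log(hasStemmer("en_US.utf-8")); // true
console.log(hasStemmer("zh.utf-8"));    // false, "zh" is not in the 3.6 table
```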
1 change: 1 addition & 0 deletions 3.6/release-notes-known-issues35.md
@@ -23,6 +23,7 @@ ArangoSearch
| **Date Added:** 2018-12-03 <br> **Component:** ArangoSearch <br> **Deployment Mode:** All <br> **Description:** Using a loop variable in expressions within a corresponding SEARCH condition is not supported <br> **Affected Versions:** 3.4.x, 3.5.x <br> **Fixed in Versions:** - <br> **Reference:** [arangodb/backlog#318](https://github.com/arangodb/backlog/issues/318){:target="_blank"} (internal) |
| **Date Added:** 2019-06-25 <br> **Component:** ArangoSearch <br> **Deployment Mode:** All <br> **Description:** The `primarySort` attribute in ArangoSearch View definitions can not be set via the web interface. The option is immutable, but the web interface does not allow to set any View properties upfront (it creates a View with default parameters before the user has a chance to configure it). <br> **Affected Versions:** 3.5.x <br> **Fixed in Versions:** - <br> **Reference:** N/A |
| **Date Added:** 2019-11-06 <br> **Component:** ArangoSearch <br> **Deployment Mode:** Cluster <br> **Description:** There is a possibility to get into deadlocks during Coordinator execution if a custom Analyzer was created (and is present in the `_analyzers` system collection). It is recommended not to use custom Analyzers in production environments in affected versions. <br> **Affected Versions:** 3.5.x <br> **Fixed in Versions:** 3.5.3 <br> **Reference:** [arangodb/backlog#651](https://github.com/arangodb/backlog/issues/651){:target="_blank"} (internal) |
| **Date Added:** 2020-03-19 <br> **Component:** ArangoSearch <br> **Deployment Mode:** All <br> **Description:** Operators and functions in `SEARCH` clauses of AQL queries which compare values such as `>`, `>=`, `<`, `<=`, `IN_RANGE()` and `STARTS_WITH()` neither take the server language (`--default-language`) nor the Analyzer locale into account. The alphabetical order of characters as defined by a language is thus not honored and can lead to unexpected results in range queries. <br> **Affected Versions:** 3.5.x <br> **Fixed in Versions:** - <br> **Reference:** [arangodb/backlog#679](https://github.com/arangodb/backlog/issues/679){:target="_blank"} (internal) |

AQL
---
1 change: 1 addition & 0 deletions 3.6/release-notes-known-issues36.md
@@ -21,6 +21,7 @@ ArangoSearch
| **Date Added:** 2018-12-03 <br> **Component:** ArangoSearch <br> **Deployment Mode:** Cluster <br> **Description:** Score values evaluated by corresponding score functions (BM25/TFIDF) may differ in single-server and cluster with a collection having more than 1 shard <br> **Affected Versions:** 3.4.x, 3.5.x, 3.6.x <br> **Fixed in Versions:** - <br> **Reference:** [arangodb/backlog#508](https://github.com/arangodb/backlog/issues/508){:target="_blank"} (internal) |
| **Date Added:** 2018-12-03 <br> **Component:** ArangoSearch <br> **Deployment Mode:** All <br> **Description:** Using a loop variable in expressions within a corresponding SEARCH condition is not supported <br> **Affected Versions:** 3.4.x, 3.5.x, 3.6.x <br> **Fixed in Versions:** - <br> **Reference:** [arangodb/backlog#318](https://github.com/arangodb/backlog/issues/318){:target="_blank"} (internal) |
| **Date Added:** 2019-06-25 <br> **Component:** ArangoSearch <br> **Deployment Mode:** All <br> **Description:** The `primarySort` attribute in ArangoSearch View definitions can not be set via the web interface. The option is immutable, but the web interface does not allow to set any View properties upfront (it creates a View with default parameters before the user has a chance to configure it). <br> **Affected Versions:** 3.5.x, 3.6.x <br> **Fixed in Versions:** - <br> **Reference:** N/A |
| **Date Added:** 2020-03-19 <br> **Component:** ArangoSearch <br> **Deployment Mode:** All <br> **Description:** Operators and functions in `SEARCH` clauses of AQL queries which compare values such as `>`, `>=`, `<`, `<=`, `IN_RANGE()` and `STARTS_WITH()` neither take the server language (`--default-language`) nor the Analyzer locale into account. The alphabetical order of characters as defined by a language is thus not honored and can lead to unexpected results in range queries. <br> **Affected Versions:** 3.5.x, 3.6.x <br> **Fixed in Versions:** - <br> **Reference:** [arangodb/backlog#679](https://github.com/arangodb/backlog/issues/679){:target="_blank"} (internal) |

AQL
---
16 changes: 16 additions & 0 deletions 3.7/aql/functions-arangosearch.md
@@ -260,6 +260,14 @@ Match documents where the attribute at **path** is greater than (or equal to)
*low* and *high* can be numbers or strings (technically also `null`, `true`
and `false`), but the data type must be the same for both.

{% hint 'warning' %}
The alphabetical order of characters is not taken into account by ArangoSearch,
i.e. range queries in SEARCH operations against Views will not follow the
language rules as per the defined Analyzer locale nor the server language
(startup option `--default-language`)!
Also see [Known Issues](../release-notes-known-issues35.html#arangosearch).
{% endhint %}

- **path** (attribute path expression):
the path of the attribute to test in the document
- **low** (number\|string): minimum value of the desired range
@@ -438,6 +446,14 @@ is processed by a tokenizing Analyzer (type `"text"` or `"delimiter"`) or if it
is an array, then a single token/element starting with the prefix is sufficient
to match the document.

{% hint 'warning' %}
The alphabetical order of characters is not taken into account by ArangoSearch,
i.e. range queries in SEARCH operations against Views will not follow the
language rules as per the defined Analyzer locale nor the server language
(startup option `--default-language`)!
Also see [Known Issues](../release-notes-known-issues35.html#arangosearch).
{% endhint %}

- **path** (attribute path expression): the path of the attribute to compare
against in the document
- **prefix** (string): a string to search at the start of the text
8 changes: 8 additions & 0 deletions 3.7/aql/operations-search.md
@@ -65,6 +65,14 @@ are supported:
- `IN` (array or range), also `NOT IN`
- `LIKE` (introduced in v3.7.0), also `NOT LIKE`

{% hint 'warning' %}
The alphabetical order of characters is not taken into account by ArangoSearch,
i.e. range queries in SEARCH operations against Views will not follow the
language rules as per the defined Analyzer locale nor the server language
(startup option `--default-language`)!
Also see [Known Issues](../release-notes-known-issues37.html#arangosearch).
{% endhint %}

```js
FOR doc IN viewName
SEARCH ANALYZER(doc.text == "quick" OR doc.text == "brown", "text_en")
```