ArangoSearch: Upgrade to Snowball 2 (additional stemmer languages) by Simran-B · Pull Request #10973 · arangodb/arangodb · GitHub

ArangoSearch: Upgrade to Snowball 2 (additional stemmer languages) #10973


Merged
gnusi merged 12 commits into devel from feature/snowball-2
Mar 18, 2020


Simran-B
Contributor
@Simran-B Simran-B commented Jan 24, 2020

Scope & Purpose

Upgrade the Snowball dependency to Snowball 2 and add/enable stemming support for the following additional languages:

  • Arabic (ar)
  • Basque (eu)
  • Catalan (ca)
  • Danish (da)
  • Greek (el)
  • Hindi (hi)
  • Hungarian (hu)
  • Indonesian (id)
  • Irish (ga)
  • Lithuanian (lt)
  • Nepali (ne)
  • Romanian (ro)
  • Serbian (sr)
  • Tamil (ta)
  • Turkish (tr)
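For illustration, the language list above can be captured as a lookup table keyed by the ISO 639-1 codes, with a helper that builds the properties of a stemming `text` Analyzer for one of them. This is a minimal sketch, not code from this PR: the names `NEW_STEMMER_LANGUAGES` and `textAnalyzerProperties` are hypothetical, and it assumes the `"<language>.utf-8"` locale format that ArangoDB `text` Analyzers use. The `case`, `accent`, and `stopwords` values are illustrative choices.

```javascript
// Hypothetical lookup table: the stemmer languages listed in this PR,
// keyed by their ISO 639-1 codes.
const NEW_STEMMER_LANGUAGES = {
  ar: "Arabic",
  eu: "Basque",
  ca: "Catalan",
  da: "Danish",
  el: "Greek",
  hi: "Hindi",
  hu: "Hungarian",
  id: "Indonesian",
  ga: "Irish",
  lt: "Lithuanian",
  ne: "Nepali",
  ro: "Romanian",
  sr: "Serbian",
  ta: "Tamil",
  tr: "Turkish",
};

// Hypothetical helper: build properties for a "text" Analyzer with stemming
// enabled. Assumes the "<language>.utf-8" locale format; the other values
// are illustrative, not defaults taken from this PR.
function textAnalyzerProperties(code) {
  if (!Object.prototype.hasOwnProperty.call(NEW_STEMMER_LANGUAGES, code)) {
    throw new RangeError(`"${code}" is not one of the languages listed in this PR`);
  }
  return {
    locale: `${code}.utf-8`, // e.g. "el.utf-8" for Greek
    case: "lower",           // lowercase tokens before stemming
    accent: false,           // strip accents
    stemming: true,          // apply the Snowball stemmer for this locale
    stopwords: [],           // no stopword filtering
  };
}
```

In arangosh, such an object could presumably be passed to `require("@arangodb/analyzers").save(name, "text", properties, features)` to create a custom Analyzer (the Analyzer name would be your own choice); the built-in Analyzers such as `text_el` would not need this.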

Example:

RETURN TOKENS("αυτοκινητουσ πρωταγωνιστούσαν", "text_el")
// [ [ "αυτοκινητ", "πρωταγωνιστ" ] ]

Related Information

Testing & Verification

Added a JS test to verify that the new stemmers are available and working

.gitignore not added
Note: This gets copied over to the build dir, replacing the otherwise Perl-generated file
This makes the new languages available as Analyzers, e.g. 'text_el' for Greek
Fix/add generators, add algorithms, remove modules_utf8 stuff (Snowball 2 has a single modules.txt only)

TODO: remove fallback in gen_stem macro?
@Simran-B Simran-B added 1 Feature 9 WIP 3 Search IResearch / Fulltext index / Analyzers 3 ArangoSearch Views labels Jan 24, 2020
@Simran-B Simran-B added this to the devel milestone Jan 24, 2020
@Simran-B Simran-B self-assigned this Jan 24, 2020
@Dronplane Dronplane requested a review from gnusi March 18, 2020 07:35
@Dronplane
Contributor

@Dronplane Dronplane marked this pull request as ready for review March 18, 2020 07:35
@Dronplane
Contributor

@gnusi gnusi merged commit b14a7c8 into devel Mar 18, 2020
@gnusi gnusi deleted the feature/snowball-2 branch March 18, 2020 11:27
@Simran-B
Contributor Author

Docs PR: arangodb/docs#374

Labels
1 Feature 3 ArangoSearch Views 3 Search IResearch / Fulltext index / Analyzers 9 WIP
3 participants