[Question] ArangoSearch "==" same as phrase() when "text_en" analyzer used? #7488
Comments
Hello @dvstans. Can you please provide the data for which the results are supposed to be equal? Generally,
@KVS85 Given a document where 'field' contains "some words: apple red bike letter.", running the two queries with "red" as the value matches this document in both cases; however, if I run the queries with "apple" as the value, only query #2 (using the phrase function) matches the document. This behavior is independent of which indexed field I use and of the word order in the field. I thought perhaps my index was corrupt, but the behavior persisted after rebuilding it. I have seen this selectivity with other words as well, not just "apple". :)
@dvstans Thank you for the clarification. Now I see that everything works as expected here. Actually, in [...] Therefore, since the "text_en" analyzer uses stemming, these queries are not identical for all words. For "apple" the stemmed value is "appl", while for "red" it is still "red". To make these queries behave the same (for a single word), you can use the following approach:
The [...] Please also note that searching indexed data with a specific analyzer is only possible if the data was indexed with that analyzer. By default, only the "identity" analyzer is applied.
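The single-word case described above can be sketched in AQL as follows. This is an illustration, not the snippet from the original comment: it assumes the TOKENS() function is available in your ArangoDB version, and it reuses the view name `myview` and attribute `field` from the question. The idea is to run the search value through the same analyzer before comparing with `==`, so that the stemmed token ("appl") is compared against the stemmed tokens in the index.

```aql
// Sketch, assuming TOKENS() is available: analyze the search value first.
// TOKENS() returns an array of tokens; [0] picks the first one, which is
// the only one when the input is a single word.
FOR i IN myview
  SEARCH ANALYZER(i.field == TOKENS("apple", "text_en")[0], "text_en")
  RETURN i
```

For multi-word input, TOKENS() returns several tokens, so picking `[0]` only covers the first word; in that case phrase() is probably the better fit.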
@KVS85 Ah OK! I didn't really understand the distinction between "==" and phrase() when wrapped in an analyzer, so this makes sense now. Thanks!
@KVS85 Do you think your explanation is needed/useful for the documentation?
I'm unsure whether the following two ArangoSearch queries should be equivalent (where "field" has been indexed using the "text_en" analyzer and "something" is the search value):
for i in myview search analyzer( i.field == "something", "text_en") return i
for i in myview search analyzer( phrase( i.field, "something"), "text_en" ) return i
In my testing, I've found cases where these two return the same result, but then I've also found cases where they do not. In the cases where they differ, the first form fails to return a match, but the second does. More specifically, the second form using "phrase" always returns what I expect, but the first form occasionally does not (depending on the value of "something"). If these are supposed to be equivalent, I can work up a test case.