Sorting in SEARCH by TFIDF or BM25 in combination with LIMIT no longer works correctly #14427
Comments
Does not work: Result: 22 elements
Works: Result: 40 elements
Hi @admtech, could you please check whether the query below still shows only 22 results when manually disabling the optimizer rule
Hmm, this is a production system. Can I turn this on and off without the end users noticing? Do I need to restart ArangoDB after the change? The complete https://administrator.de site runs on it. What is the exact command in the console? off: on:
I am still waiting for an answer?
Hi @admtech, Please note that there is no guaranteed SLA for ArangoDB Community Support. Optimizer rules can either be:
In order to run the Turning off an optimizer rule can be done by prefixing with a
Once that is done, you can run:
in order to explain the query with the given disabled optimizer rule. Similarly, you can execute the query with the disabled optimizer rule:
I am wondering whether the correct result set is returned once optimizer rule
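For reference, a minimal sketch of how optimizer rules can be switched off for a single query from arangosh. It assumes the documented `optimizer.rules` query option, where a leading `-` disables a rule. The option is passed with the individual query, so it affects only that query and should not require a server restart. The helper name `statementWithoutRules`, the rule name `some-rule-name`, the view `someView`, and the field `text` are placeholders for illustration, not values from this thread.

```javascript
// Build a statement description that disables the given optimizer rules
// for one query only. Rule names are passed without a prefix; the "-"
// prefix (meaning "disable this rule") is added here.
function statementWithoutRules(query, bindVars, ruleNames) {
  return {
    query: query,
    bindVars: bindVars,
    options: {
      optimizer: { rules: ruleNames.map((name) => "-" + name) },
    },
  };
}

// Placeholder query, bind variables, and rule name (assumptions).
const stmt = statementWithoutRules(
  "FOR doc IN someView " +
    "SEARCH ANALYZER(doc.text IN TOKENS(@q, 'text_en'), 'text_en') " +
    "SORT BM25(doc) DESC LIMIT 0, 40 RETURN doc",
  { q: "Windows 11" },
  ["some-rule-name"]
);

console.log(JSON.stringify(stmt.options));

// Inside arangosh, the description could then be used like this:
//   db._createStatement(stmt).explain();            // plan with the rule disabled
//   db._createStatement(stmt).execute().toArray();  // results with the rule disabled
```

Comparing the result count with and without the rule disabled would show whether that rule is involved in the missing rows.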
Hi @maxkernbach I think you are confusing something fundamental here. You are not helping me with my bug, I am helping you to find the bugs so you can fix them. Working out the issue was a lot of work and took my time, so it makes absolutely no sense to counter with SLA now (besides, one of your employees asked me to do that). I thought it is in your interest to fix such basic bugs in your database as soon as possible. Since this worked fine in the previous version, I don't think I made a mistake.
I will try it out tonight. Thanks for the explanation.
So I have now made a few queries via the shell: Test 1:
Result:
Test 2:
Result:
Interestingly, I now get a different result than before, but it is still wrong: 38 results without the additional sort on the non-existent field "doc.blabla DESC", versus 40 results with it. Now let's put the optimizer rule to sleep:
On again now:
Now we have one more result (39). When called again, there are 38 results again!? Here is the output from the explain:
Any other ideas? Were you able to identify the bug? Is there anything else I can do for the ArangoDB team?
To rule out the question of whether enough results are available, here is another test with LIMIT 100:
Result:
If I take out the filter "AND doc.art IN ['tutorial','report','tip','info','imho']", I get a different result again:
Here is the correct result with the workaround on sorting (doc.blabla DESC):
As mentioned above, the sorting is always slightly different for the same score. I hope I could help.
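To make the expectation concrete, here is a small plain-JavaScript sketch (not ArangoDB internals) of a top-N selection over many tied scores. With an explicit tie-breaker the order is reproducible, and the number of rows returned depends only on the limit and the total number of matches, never on how many scores are tied. The `score` and `key` fields and the generated data are made up for illustration.

```javascript
// Sort by score descending, then by key ascending as a tie-breaker,
// and apply offset/count the way "LIMIT offset, count" would.
function topN(rows, offset, count) {
  const sorted = rows.slice().sort((a, b) => {
    if (b.score !== a.score) return b.score - a.score;
    return a.key < b.key ? -1 : a.key > b.key ? 1 : 0;
  });
  return sorted.slice(offset, offset + count);
}

// Made-up data: 763 matches, most of them sharing the same score.
const rows = [];
for (let i = 0; i < 763; i++) {
  rows.push({
    key: "k" + String(i).padStart(4, "0"),
    score: i % 50 === 0 ? 2.5 : 1.0,
  });
}

const page = topN(rows, 0, 40);
console.log(page.length); // 40

// The same page comes back even if the input arrives in another order.
const again = topN(rows.slice().reverse(), 0, 40);
console.log(page.every((row, i) => row.key === again[i].key)); // true
```

Without a tie-breaker, the order among equal scores may legitimately vary between runs, but the row count should still match the limit.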
Hi @admtech, Thanks for your reply. We tried to reproduce your issue on a different dataset with an analogous query. However, independent of which sorting mechanism was used, the same result set was returned. One possible cause of seeing a different result set could be that the indexing of your view "con_create" is broken. Could you try to create a new view using the same properties as view "con_create" and re-run the queries (replacing "con_create" with the newly created view)? In case the issue still occurs with a new view, would you be able to share a data set which reproduces the problem with the queries you stated? You can send us a message at hackers@arangodb.com (this ML is not public) and attach the dump to that email. This way we can try to reproduce the issue and find the root cause. Please reference the number of this issue in your email.
I created a new view and re-ran the queries, but unfortunately the behavior remained the same. I will switch to ArangoDB 3.8 tonight and test it all again.
My Environment
Component, Query & Data
Affected feature: Server
Size of your Dataset on disk: 6.2 GB
Problem
When sorting a view only by TFIDF or BM25 in combination with the LIMIT command, some rows are simply not displayed. If I additionally sort by some fantasy field it works again.
AQL query (if applicable):
Simplified:
Does not work (only a few results):
Works (40 results and all correctly sorted):
If I leave out the additional sorting by "doc.anyfieldname", the FullCount tells me that 763 elements were found, but it is not interested in the limit and only 27 are really displayed. But "doc.anyfieldname" is a field that does not exist in the collection.
The search was performed for a term that occurred very often, so the result very often has the same score.
I first noticed the error with version 3.7.12.
Real example:
Search term: query = "Windows 11"
Result: 40 articles
If I leave out the "doc.blabla DESC," then I only ..
.. get 27 results, and many articles are missing.
If I omit the LIMIT completely:
All 763 elements are displayed correctly.
No idea where the error is. But the fact is, if I put a fantasy field in front of the sorting, it works again (workaround).
Greetings,
Frank