-
Notifications
You must be signed in to change notification settings - Fork 852
Using mixed sort directions ignores indexes #9451
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Comments
Hi @erayhanoglu, Thanks for reporting, I was able to reproduce this issue. |
However this behavior is expected and documented at: https://www.arangodb.com/docs/stable/indexing-index-basics.html#skiplist-index
Therefore using mixed SORT directions requires the creation of mixed-order indexes, which would be a new feature. |
Thank you for your reply. Documentation says |
The documentation states that it is not possible to run your query on a single skiplist index that is created on two attributes Sorting by two skiplist indexes, which are created on one attribute each, (exactly your use case) does not work. Since index 1 is not related to index 2, it is not possible to apply the SORT with a second index to a set, which was created using the first index. This applies to databases in general. |
@maxkernbach Out of curiosity, if you did have an asc/asc compound index (let's call the attributes a and b in index order) and wanted to sort a asc and b desc, would it be possible to add a feature where ADB finds the first match for a and then does a binary search to find the lowest value of b, and traverses the index in the opposite direction to do the sort on b? Then move on to the next value of a and so on. That would make the sort O(n log n) vs O(n) but wouldn't increase memory use, which to me is the main reason to always sort on indexes rather than in memory. E.g. if the index looks like
Then the algorithm, assuming there were no filters on a, would find Note I'm not necessarily making a request for this feature, I'm just wondering if it's feasible. It would mean you could do mixed direction sorts without having to have multiple indexes (one for each set of directions you wish to sort on). |
@MrCreosote I don't think that this approach is feasible. But I do think that we can utilize the index on The index's selectivity estimation is already a good indicator of the average subset size. E.g., with a selectivity estimation of 20%, the average subset size is 5. This means, that for each item the max average distance from its current position to its final position is only 4. This is the perfect use case for insertion sort. In case we have a combined index for @maxkernbach what do you think? Mixed-ordered indexes would certainly be nice, but I would assume that they would definitely require more effort than a simple rule for the query optimizer. |
@mpoeter Thanks for the response!
Out of curiosity, why is that? I don't really understand the index structure but I'm def. interested.
Aren't there edge cases that could blow up though? For instance, I have a prod Mongo DB with a compound index on |
@MrCreosote it is mainly a gut feeling. To be honest, I don't see how the second index would be of much help. Yes, it does give you the order of items for In the worst case, insertion sort is O(n²), so you are right, there are certainly edge cases where the insertion sort could blow up. But I think these cases could easily be recognized and handled accordingly. But this cannot happen for the combined index with reverse. A reverse operation can be done in linear time with constant memory overhead. |
Sure, but that could be an extremely large memory overhead if there are, say, 100s of millions of objects for one of the
Again, I don't understand the index structure, but naively I'd expect you could perform a binary search to find the last |
Yes, but traversing the index on
No, a reverse operation can be done with constant memory overhead (you only need a single temporary object) and in linear time, i.e., O(n). Therefore, the complete query (enumerating the items using the compound index + reversing the query) has runtime complexity O(n). |
That would mean that for each entry in the index, every other entry in the index is visited, which isn't the case in the algorithm I proposed. I've realized that it can be simplified to a linear time case once the first item in the returned range is found. Assuming a compound index
In this case other than finding
Isn't this true only if you have a mutable array in which the data is stored so you can perform the reversal in place? In the case we're talking about you start with a position in an index, so to perform the reversal the data in the index would have to be loaded into another mutable data structure which could require substantial memory depending on the result set. My assumption (and maybe I'm wrong) is that this doesn't happen for a standard query and the results are loaded from the index in batches and returned to the user, not requiring a memory allocation for the full result set. |
@MrCreosote I should have read your first posting more thoroughly. The initial question was about two separate indexes for the two attributes, and somehow your sketch with with the two columns led me to believe that you were also talking about two separate indexes.
This basically the same as my reverse approach. I see two different ways to do this - either in a single pass while traversing the index (which is basically the same algorithm you describe), or in a separate pass. Obviously, doing this in a single pass would be more efficient. However, the version with a separate pass might be easier to introduce with the existing code.
Unless you are using streaming cursors, Arango builds the complete query result in-memory before sending the response. |
@mpoeter Sounds like we were talking past each other for a bit there :P
So if I have a 100M 10kB / node collection and make a request to traverse and return all the nodes in the collection the server's OOMing? Sorry if this is a newb question, I just started using Arango coming from Mongo where AFAIK the result set is not loaded into memory (where the drawback is that the cursor can return the same document more than once if it is changed while iteration via the cursor is in progress). |
Uh oh!
There was an error while loading. Please reload this page.
My Environment
Component, Query & Data
Affected feature:
Optimizer
AQL query (if applicable):
FOR c IN Patients
SORT c.name_given, c.name_family desc
RETURN c
AQL explain (if applicable):
Steps to reproduce
execute a query which sorts this two properties with same direction.
etc:
You will see, index for first sort property will be used.
etc:
You will see, no index used.
Problem:
Optimizer ignores indexes if different sort directions used.
Expected result:
The text was updated successfully, but these errors were encountered: