[3.2] [RocksDB] Slow response when use LIMIT · Issue #2948 · arangodb/arangodb


Closed
amurchick opened this issue Aug 4, 2017 · 8 comments
Labels: 1 Feature, 1 Question, 2 Solved Resolution, 3 AQL Query language related, 3 RocksDB Storage engine related, performance

Comments

amurchick commented Aug 4, 2017

My environment running ArangoDB:

I'm using the latest ArangoDB of the respective release series:

  • 3.2.0

Mode:

  • Single-Server

Storage-Engine:

  • rocksdb

On this operating system:

  • Linux
    • Debian .deb

I'm issuing AQL via:

  • arangosh

I've run db._explain("<my aql query>") and it didn't shed more light on this.
The AQL query in question is: FOR d IN onem LIMIT 999970, 10 RETURN d

The issue can be reproduced using this dataset: FOR uid IN 1..1000000 INSERT {uid} IN onem

These are the steps to reproduce:

  1. Create a collection with 1 million documents: "FOR uid IN 1..1000000 INSERT {uid} IN onem"
  2. Make queries:
127.0.0.1:8529@test> console.time('query'); db._query("FOR d IN onem LIMIT 70, 10 RETURN d"); console.timeEnd('query') 
2017-08-04T05:24:00Z [3281] INFO query: 2ms                

127.0.0.1:8529@test> console.time('query'); db._query("FOR d IN onem LIMIT 999970, 10 RETURN d"); console.timeEnd('query')                                                                                                                    
2017-08-04T05:24:09Z [3281] INFO query: 208ms              

What's wrong: The larger the offset in LIMIT, the slower the response.

@amurchick amurchick changed the title Slow response when use LIMIT [3.2] [RocksDB] Slow response when use LIMIT Aug 4, 2017
jsteemann (Contributor) commented:

What's wrong: The larger the offset in LIMIT, the slower the response.

I honestly do not have any idea how to fix this.
The RocksDB storage engine in ArangoDB intentionally does not keep any indexes or documents in RAM except for caching purposes. That means a collection scan like FOR doc IN collection LIMIT offset, x will actually have to go through the documents one by one and skip them until the offset value specified in offset has been reached.
That means it has a complexity of O(offset).

I am not saying that the skipping code cannot be optimized, I am just unsure how to solve the problem in general given the algorithmic complexity of the operation.
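
Not from the thread itself, but a common workaround for the O(offset) cost is keyset pagination: instead of skipping offset documents, remember the last value of an indexed attribute and continue right after it. A minimal arangosh sketch, assuming a skiplist index on uid (the variable names are illustrative):

// Create a sorted (skiplist) index on the attribute we paginate by,
// so the FILTER + SORT below can be answered from the index.
db.onem.ensureIndex({ type: "skiplist", fields: ["uid"] });

// Instead of LIMIT 999970, 10 (which must skip 999970 documents),
// remember the uid of the last document on the previous page
// and continue right after it.
var lastUid = 999970; // illustrative value from the previous page
var page = db._query(
  "FOR d IN onem FILTER d.uid > @lastUid SORT d.uid LIMIT 10 RETURN d",
  { lastUid: lastUid }
).toArray();

This keeps each page at roughly the cost of one index seek plus ten documents, no matter how deep into the collection the page lies.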

@dothebart dothebart added 1 Question 3 AQL Query language related performance 3 RocksDB Storage engine related 1 Feature labels Aug 31, 2017
@sleto-it sleto-it self-assigned this Sep 22, 2017
@sleto-it sleto-it removed their assignment Oct 12, 2018
pocketwalker commented:

Any progress on this issue? "LIMIT 5000" is slow enough...

jsteemann (Contributor) commented:

@pocketwalker : can you elaborate on "LIMIT 5000 is slow enough"?
The original issue was about using a high offset, e.g. "LIMIT x, y" (where x is a high number). A high offset requires a lot of scanning of data in the collection, so the execution time will be proportional to the value of x. This is somewhat unavoidable, and I would be surprised if other databases behaved much differently here.
Or is your question more about a query that doesn't use an offset at all? I am asking because "LIMIT 5000" is effectively "LIMIT 0, 5000", i.e. a query with an offset of 0, so it may be a completely different thing.
Anyway, more information is needed to give you a sound answer.
Apart from that, this issue was opened for ArangoDB version 3.2, and the currently released version is now 3.4.5. There may have been other improvements since 3.2 that could have led to better execution times, but I cannot answer this more specifically as I don't know your exact use case, query, indexing etc.
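
For reference, this equivalence is easy to check with explain; a quick arangosh sketch (reusing the onem collection from the original report):

// Both statements should produce the same execution plan:
// LIMIT 5000 is just LIMIT with an offset of 0.
db._explain("FOR d IN onem LIMIT 5000 RETURN d");
db._explain("FOR d IN onem LIMIT 0, 5000 RETURN d");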

pocketwalker commented:

@jsteemann Thanks. In ArangoDB, I created a big graph with collections (nodes & edges) and run my queries on it, and I created primary and hash indexes on all collections.
My question here is: what's the best practice if I want to quickly get 5000 nodes/edges from billions, like blablabla LIMIT 5000?

jsteemann (Contributor) commented:

@pocketwalker : the key thing in a query that uses LIMIT 5000 will not be the performance of the skipping (as LIMIT 5000 performs absolutely no skipping), but whether the query can make use of any indexes (and which ones).
Without any knowledge about the actual query and the setup that is all I can say right now.
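
To illustrate what "can the query make use of any indexes" means here, a hedged sketch with made-up collection and attribute names (nodes, kind):

// Hypothetical: with a hash index on the filtered attribute, the
// FILTER becomes an index lookup, so only matching documents are
// touched before LIMIT 5000 cuts the result off.
db.nodes.ensureIndex({ type: "hash", fields: ["kind"] });
var docs = db._query(
  "FOR n IN nodes FILTER n.kind == @kind LIMIT 5000 RETURN n",
  { kind: "user" }
).toArray();
// Without a usable index, the same query would scan the whole
// collection and evaluate the FILTER on every document.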

pocketwalker commented:

@jsteemann could you take a look at this issue? I put the query and also the explain() output into it.

TimNZ commented May 6, 2020

This issue should be closed, @jsteemann explained it perfectly.

@pocketwalker:
If you don't use indexes smartly, your queries will get slower whenever a full scan is required, which always has to start at document zero in the collection - all LIMIT does is skip processing the first X documents, which still have to be scanned regardless.
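
One way to check whether a query does such a full scan is its explain output: a full scan shows up as an EnumerateCollectionNode in the plan, while an index lookup shows up as an IndexNode. A quick sketch with the onem collection from the original report (the skiplist index on uid from the keyset-pagination sketch above is assumed):

// The first plan contains an EnumerateCollectionNode (full scan);
// with a skiplist index on uid, the second contains an IndexNode.
db._explain("FOR d IN onem LIMIT 999970, 10 RETURN d");
db._explain("FOR d IN onem FILTER d.uid > 999970 SORT d.uid LIMIT 10 RETURN d");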

dothebart (Contributor) commented:

Since the referenced issue has been marked as solved, closing this one as solved too.

@dothebart dothebart added the 2 Solved Resolution label Jun 22, 2020