[3.2] [RocksDB] Slow response when use LIMIT · Issue #2948 · arangodb/arangodb


Closed
amurchick opened this issue Aug 4, 2017 · 8 comments
Labels: 1 Feature, 1 Question, 2 Solved Resolution, 3 AQL Query language related, 3 RocksDB Storage engine related, performance

Comments

amurchick commented Aug 4, 2017

My environment running ArangoDB:

I'm using the latest ArangoDB of the respective release series:

  • 3.2.0

Mode:

  • Single-Server

Storage-Engine:

  • rocksdb

On this operating system:

  • Linux
    • Debian .deb

I'm issuing AQL via:

  • arangosh

I've run db._explain("<my aql query>") and it didn't shed more light on this.
The AQL query in question is: FOR d IN onem LIMIT 999970, 10 RETURN d

The issue can be reproduced using this dataset: FOR uid IN 1..1000000 INSERT {uid} IN onem

These are the steps to reproduce:

  1. Create a collection with 1 million documents: "FOR uid IN 1..1000000 INSERT {uid} IN onem"
  2. Make queries:
127.0.0.1:8529@test> console.time('query'); db._query("FOR d IN onem LIMIT 70, 10 RETURN d"); console.timeEnd('query') 
2017-08-04T05:24:00Z [3281] INFO query: 2ms                

127.0.0.1:8529@test> console.time('query'); db._query("FOR d IN onem LIMIT 999970, 10 RETURN d"); console.timeEnd('query')                                                                                                                    
2017-08-04T05:24:09Z [3281] INFO query: 208ms              

What's wrong: The larger the offset in LIMIT, the slower the response.

@amurchick amurchick changed the title Slow response when use LIMIT [3.2] [RocksDB] Slow response when use LIMIT Aug 4, 2017
jsteemann (Contributor) commented:

What's wrong: The larger the offset in LIMIT, the slower the response.

I honestly do not have any idea how to fix this.
The RocksDB storage engine in ArangoDB intentionally does not keep any indexes or documents in RAM except for caching purposes. That means a collection scan like FOR doc IN collection LIMIT offset, x will actually have to go through the documents one by one and skip them until the offset value specified in offset has been reached.
That means it has a complexity of O(offset).

I am not saying that the skipping code cannot be optimized, I am just unsure how to solve the problem in general given the algorithmic complexity of the operation.
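
Not from the thread itself, but a common workaround for the O(offset) cost is keyset pagination: instead of skipping offset documents, remember the last value of an indexed attribute and continue right after it. A minimal arangosh sketch, assuming a skiplist index on uid (the variable names are illustrative):

// Create a sorted (skiplist) index on the attribute we paginate by,
// so the FILTER + SORT below can be answered from the index.
db.onem.ensureIndex({ type: "skiplist", fields: ["uid"] });

// Instead of LIMIT 999970, 10 (which must skip 999970 documents),
// remember the uid of the last document on the previous page
// and continue right after it.
var lastUid = 999970; // illustrative value from the previous page
var page = db._query(
  "FOR d IN onem FILTER d.uid > @lastUid SORT d.uid LIMIT 10 RETURN d",
  { lastUid: lastUid }
).toArray();

This keeps each page at roughly the cost of one index seek plus ten documents, no matter how deep into the collection the page lies.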

@dothebart dothebart added 1 Question 3 AQL Query language related performance 3 RocksDB Storage engine related 1 Feature labels Aug 31, 2017
@sleto-it sleto-it self-assigned this Sep 22, 2017
@sleto-it sleto-it removed their assignment Oct 12, 2018
pocketwalker commented:

Any progress on this issue? "LIMIT 5000" is slow enough...

jsteemann (Contributor) commented:

@pocketwalker : can you elaborate on "LIMIT 5000 is slow enough"?
The original issue was about using a high offset, e.g. "LIMIT x, y" (where x is a high number). A high offset requires a lot of scanning of data in the collection, so the execution time will be proportional to the value of x. This is somewhat unavoidable, and I would be surprised if other databases behaved much differently here.
Or is your question more about a query that doesn't use an offset at all? I am asking because "LIMIT 5000" is effectively "LIMIT 0, 5000", i.e. a query with an offset of 0, so it may be a completely different thing.
Anyway, more information is needed to give you a sound answer.
Apart from that, this issue was opened for ArangoDB version 3.2, and the currently released version is now 3.4.5. There may have been other improvements since 3.2 that could have led to better execution times, but I cannot answer this more specifically as I don't know your exact use case, query, indexing etc.
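
For reference, this equivalence is easy to check with explain; a quick arangosh sketch (reusing the onem collection from the original report):

// Both statements should produce the same execution plan:
// LIMIT 5000 is just LIMIT with an offset of 0.
db._explain("FOR d IN onem LIMIT 5000 RETURN d");
db._explain("FOR d IN onem LIMIT 0, 5000 RETURN d");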

pocketwalker commented:

@jsteemann Thanks. In ArangoDB, I created a big graph with collections (nodes & edges) and run my queries on it, and I created primary and hash indexes on all collections.
My question here is: what's the best practice if I want to quickly get 5000 nodes/edges from billions, like blablabla LIMIT 5000?

jsteemann (Contributor) commented:

@pocketwalker : the key thing in a query that uses LIMIT 5000 will not be the performance of the skipping (as LIMIT 5000 performs absolutely no skipping), but whether the query can make use of any indexes (and which ones).
Without any knowledge about the actual query and the setup that is all I can say right now.
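
To illustrate what "can the query make use of any indexes" means here, a hedged sketch with made-up collection and attribute names (nodes, kind):

// Hypothetical: with a hash index on the filtered attribute, the
// FILTER becomes an index lookup, so only matching documents are
// touched before LIMIT 5000 cuts the result off.
db.nodes.ensureIndex({ type: "hash", fields: ["kind"] });
var docs = db._query(
  "FOR n IN nodes FILTER n.kind == @kind LIMIT 5000 RETURN n",
  { kind: "user" }
).toArray();
// Without a usable index, the same query would scan the whole
// collection and evaluate the FILTER on every document.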

pocketwalker commented:

@jsteemann could you take a look at this issue? I put the query and also the explain() output into it.

TimNZ commented May 6, 2020

This issue should be closed, @jsteemann explained it perfectly.

@pocketwalker:
If you don't use indexes smartly, your queries will get slower whenever a full scan is required, which always has to start at document zero in the collection - all LIMIT does is skip processing the first X documents, which still have to be scanned regardless.
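
One way to check whether a query does such a full scan is its explain output: a full scan shows up as an EnumerateCollectionNode in the plan, while an index lookup shows up as an IndexNode. A quick sketch with the onem collection from the original report (the skiplist index on uid from the keyset-pagination sketch above is assumed):

// The first plan contains an EnumerateCollectionNode (full scan);
// with a skiplist index on uid, the second contains an IndexNode.
db._explain("FOR d IN onem LIMIT 999970, 10 RETURN d");
db._explain("FOR d IN onem FILTER d.uid > 999970 SORT d.uid LIMIT 10 RETURN d");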

dothebart (Contributor) commented:

Since the referenced issue has been marked as solved, closing this one as solved too.

@dothebart dothebart added the 2 Solved Resolution label Jun 22, 2020