Long query with 3 level transversal crashes ArangoDB · Issue #10412 · arangodb/arangodb · GitHub

Closed

gustavo80br opened this issue Nov 13, 2019 · 6 comments
Labels: 1 Analyzing, 3 AQL (Query language related), 3 Docker (docker container integration), workaround available

Comments

gustavo80br commented Nov 13, 2019

My Environment

  • ArangoDB Version: 3.5.0
  • Storage Engine: RocksDB
  • Deployment Mode: Single Server
  • Deployment Strategy: ArangoDB Starter in Docker
  • Configuration: Docker container
  • Infrastructure: Own
  • Operating System: RancherOS
  • Total RAM in your machine: 4 GB
  • Disks in use: HDD
  • Used Package: Docker

Component, Query & Data

Affected feature:
AQL query using web interface

AQL query (if applicable):

FOR s IN Studies
    // Filter by study date
    FILTER s.study_date < '2018-01-01 00:00'
    // Institution filter
    // Patient data
    LET patient_data = (FOR p IN 1..1 ANY s PatientStudies RETURN p._key)
    // Images
    FOR v, e, p IN 3..3 OUTBOUND s StudySeries, SeriesImages, ImageStores
        FILTER p.vertices[3] != null
        FILTER p.vertices[3]._key == '1'
        RETURN {
            'index_id': p.edges[2]._key,
            'file_path':  p.edges[2].file_path,
            'study_uid': p.vertices[0]._key,
            'series_uid': p.vertices[1]._key,
            'patient_id': patient_data[0]
        }

AQL explain (if applicable):

Execution plan:
 Id   NodeType            Est.   Comment
  1   SingletonNode          1   * ROOT
 18   IndexNode          56121     - FOR s IN Studies   /* persistent index scan */
  9   SubqueryNode       56121       - LET patient_data = ...   /* subquery */
  5   SingletonNode          1         * ROOT
  6   TraversalNode          1           - FOR p  /* vertex */ IN 1..1  /* min..maxPathDepth */ INBOUND s /* startnode */  PatientStudies
 17   LimitNode              1             - LIMIT 0, 1
  7   CalculationNode        1             - LET #12 = p.`_key`   /* attribute expression */
  8   ReturnNode             1             - RETURN #12
 10   TraversalNode     205379       - FOR v  /* vertex */, p  /* paths */ IN 3..3  /* min..maxPathDepth */ OUTBOUND s /* startnode */  StudySeries, OUTBOUND SeriesImages, OUTBOUND ImageStores
 11   CalculationNode   205379         - LET #16 = (p.`vertices`[3] != null)   /* simple expression */
 12   FilterNode        205379         - FILTER #16
 15   CalculationNode   205379         - LET #20 = { "index_id" : p.`edges`[2].`_key`, "file_path" : p.`edges`[2].`file_path`, "study_uid" : p.`vertices`[0].`_key`, "series_uid" : p.`vertices`[1].`_key`, "patient_id" : patient_data[0] }   /* simple expression */
 16   ReturnNode        205379         - RETURN #20

Indexes used:
 By   Name            Type         Collection       Unique   Sparse   Selectivity   Fields             Ranges
 18   idx_266263221   persistent   Studies          false    false        99.64 %   [ `study_date` ]   (s.`study_date` < "2018-01-01 00:00")
  6   edge            edge         PatientStudies   false    false            n/a   [ `_to` ]          base INBOUND
 10   edge            edge         ImageStores      false    false            n/a   [ `_from` ]        base OUTBOUND
 10   edge            edge         SeriesImages     false    false            n/a   [ `_from` ]        base OUTBOUND
 10   edge            edge         StudySeries      false    false            n/a   [ `_from` ]        base OUTBOUND

Traversals on graphs:
 Id  Depth  Vertex collections  Edge collections                        Options                                  Filter / Prune Conditions                                            
 6   1..1                       PatientStudies                          uniqueVertices: none, uniqueEdges: path                                                                       
 10  3..3                       StudySeries, SeriesImages, ImageStores  uniqueVertices: none, uniqueEdges: path  FILTER ((p.`vertices`[3] != null) && (p.`vertices`[3].`_key` == "1"))

Optimization rules applied:
 Id   RuleName
  1   move-calculations-up
  2   move-filters-up
  3   optimize-subqueries
  4   move-calculations-up-2
  5   move-filters-up-2
  6   use-indexes
  7   remove-filter-covered-by-index
  8   optimize-traversals
  9   remove-filter-covered-by-traversal
 10   remove-unnecessary-calculations-2
 11   move-calculations-down

Dataset:
Graph database. The Images collection has 22 million items. One Study has multiple Series, each of which has multiple Images.

Size of your Dataset on disk:

Steps to reproduce

  1. Run the query
  2. The query runs for a long time and does not complete
  3. ArangoDB crashes without any useful log

Problem:

The query takes forever to complete, then ArangoDB crashes.

Expected result:

The task may legitimately take a long time, but ArangoDB is not expected to crash. I'm evaluating the product and unfortunately I cannot make it work with my dataset, which would be about 10x bigger in production.

OmarAyo (Contributor) commented Nov 13, 2019

Hi @gustavo80br

It could be an OOM issue. To rule this out, would you kindly provide the following:

  • arangod logs
  • the output of dmesg | grep arangod

Best

OmarAyo added the 3 AQL (Query language related), 3 Docker (docker container integration), 1 Analyzing, and Waiting User Reply labels on Nov 13, 2019
gustavo80br (Author) commented

Hi Omar, thanks for taking the time to answer me. I managed to "solve" my problem by splitting the query: in the first query I get all the Studies, and then in my application I loop over the results and run another query for each item to get its images. My time is short, but as soon as possible I will collect the logs and return to this issue. Thank you once more!
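[Editor's note] The split-query workaround described above can be sketched as follows. This is a hedged illustration: the two stub functions and the sample data are hypothetical stand-ins for the real AQL queries, which a real implementation would run through an ArangoDB driver.

```python
# Sketch of the split-query workaround: fetch all study keys first,
# then fetch the images for each study in a separate, smaller query,
# so no single result set has to be materialized at once.
# The stubs below stand in for the real AQL queries and operate on
# in-memory sample data purely for illustration.

SAMPLE_STUDIES = ["study-1", "study-2"]
SAMPLE_IMAGES = {
    "study-1": ["img-1", "img-2"],
    "study-2": ["img-3"],
}

def fetch_study_keys():
    # Stand-in for: FOR s IN Studies FILTER s.study_date < ... RETURN s._key
    return list(SAMPLE_STUDIES)

def fetch_images_for_study(study_key):
    # Stand-in for the 3-level traversal, run once per study so each
    # individual result set stays small.
    return SAMPLE_IMAGES.get(study_key, [])

def collect_all_images():
    results = []
    for key in fetch_study_keys():               # query 1: studies only
        for img in fetch_images_for_study(key):  # query 2: per-study images
            results.append((key, img))
    return results

print(collect_all_images())
```

The trade-off is more round trips to the server in exchange for a bounded per-query memory footprint.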

OmarAyo (Contributor) commented Nov 14, 2019

Hi @gustavo80br

Nice that you were able to work around this somehow.

For the time being I am going to close this ticket; feel free to comment here at any time and we will reopen it.

Best,

gustavo80br (Author) commented

Hi Omar,

Here is the dmesg output:

[ 3207.446437] [ 5019]     0  5019  8995417  2846293    5889     131        0             0 arangod
[ 3207.446438] Out of memory: Kill process 5019 (arangod) score 940 or sacrifice child
[ 3207.447017] Killed process 5019 (arangod) total-vm:35981668kB, anon-rss:11385172kB, file-rss:0kB, shmem-rss:0kB
[ 3207.642163] oom_reaper: reaped process 5019 (arangod), now anon-rss:0kB, file-rss:0kB, shmem-rss:0kB

I realize that every time I need to iterate over a very large result set, I get this OOM problem. The arangod logs don't show any error; nothing appears there.

I reduced my query to the simplest possible case. The Images collection has 25 million documents; a simple query that just returns all of them triggers the OOM problem.

FOR i IN Images
    RETURN i

If I use LIMIT it's OK, but there are also problems when the offset is large, as below:

FOR i IN Images
    LIMIT 20000000,10
    RETURN i

The bigger the starting offset, the slower the query. For example, this query takes more than 10 seconds, but if the offset is 0 it takes less than 1 ms.
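[Editor's note] The linear slowdown with a growing offset is the expected behavior of offset-based pagination: the engine still has to step over every skipped row. A minimal pure-Python sketch of this cost model (it models offset pagination in general, not ArangoDB's internals specifically):

```python
# Model offset pagination over a lazily produced result set: serving
# LIMIT offset, count still requires visiting all `offset` skipped rows.

def scan(n_rows, stats):
    for i in range(n_rows):
        stats["rows_visited"] += 1  # work is done even for skipped rows
        yield i

def limit(rows, offset, count):
    out = []
    for i, row in enumerate(rows):
        if i >= offset + count:
            break
        if i >= offset:
            out.append(row)
    return out

stats = {"rows_visited": 0}
page = limit(scan(1_000_000, stats), offset=900_000, count=10)
# Only 10 rows come back, but ~900,000 rows had to be visited first,
# which is why a large offset is slow while offset 0 is nearly instant.
print(len(page), stats["rows_visited"])
```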

This kind of query is very simple in a system like Postgres: just do a SELECT and then iterate over the cursor to get the results. Retrieving the results takes some time, of course, but the query does not trigger any memory issue.

If you can help me understand how Arango works, I would highly appreciate it. I coded a lot of my application against Arango, and now that I have the real data the queries don't work as expected. I raised memory to 16 GB and it just takes longer before the OOM kill; it never solves it. I also played with arangod.conf without any improvement.

Thanks in advance!

graetzer (Contributor) commented
Use stream cursors; otherwise Arango builds the entire result set in memory.
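[Editor's note] The difference between a regular and a stream cursor is analogous to building a full list versus consuming a generator: a regular cursor materializes the whole result server-side before the client sees the first row, while a stream cursor produces rows lazily as the client consumes them. In python-arango, for instance, streaming is requested with `db.aql.execute(query, stream=True)`. The sketch below only models the memory behavior, no ArangoDB required:

```python
# Analogy for regular vs. stream cursors.

produced = {"count": 0}

def rows(n):
    for i in range(n):
        produced["count"] += 1
        yield i

# Regular cursor: the entire result set is built in memory before the
# client sees the first row (this is what exhausts RAM at 25M docs).
full = list(rows(1000))

# Stream cursor: rows are produced only as they are consumed, so peak
# memory stays bounded by how much the client holds at once.
produced["count"] = 0
lazy = rows(1000)
first_ten = [next(lazy) for _ in range(10)]
print(len(full), len(first_ten), produced["count"])
```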

gustavo80br (Author) commented

Thank you Graetzer. Will try that.

3 participants