upsert timed out #9174

Closed
daoxian opened this issue Jun 3, 2019 · 4 comments
daoxian commented Jun 3, 2019

RocksDB engine, ArangoDB 3.4.5.

When I ran:

FOR r IN site_xinhuanet_com limit 1
UPSERT {url: r.url} INSERT {domain: r.domain_name, url: r.url, path: r.path, taskid: "3669"  } UPDATE {path: r.path, taskid: "3669"} INTO suggested_urls OPTIONS { ignoreErrors: true, exclusive: true }
I got:
Query: AQL: timeout, cluster node: 'PRMR-88cdef99-ccf5-40f6-a911-0872b7830abf', endpoint: 'tcp://11.167.74.55:8530', error: 'timeout in cluster operation' (while optimizing plan)

The explain output is as follows:
Query String:
 FOR r IN site_xinhuanet_com limit 1
 UPSERT {url: r.url} INSERT {domain: r.domain_name, url: r.url, path: r.path, taskid: "3669"  } 
 UPDATE {path: r.path, taskid: "3669"} INTO suggested_urls OPTIONS { ignoreErrors: true, exclusive: 
 true }

Execution plan:
 Id   NodeType                  Site    Est.   Comment
  1   SingletonNode             DBS        1   * ROOT
  2   EnumerateCollectionNode   DBS   232736     - FOR r IN site_xinhuanet_com   /* full collection scan, 1 shard(s) */
 12   CalculationNode           DBS   232736       - LET #9 = { "domain" : r.`domain_name`, "url" : r.`url`, "path" : r.`path`, "taskid" : "3669" }   /* simple expression */   /* collections used: r : site_xinhuanet_com */
 13   CalculationNode           DBS   232736       - LET #11 = { "path" : r.`path`, "taskid" : "3669" }   /* simple expression */   /* collections used: r : site_xinhuanet_com */
 21   RemoteNode                COOR  232736       - REMOTE
 22   GatherNode                COOR  232736       - GATHER 
  3   LimitNode                 COOR       1       - LIMIT 0, 1
 10   SubqueryNode              COOR       1       - LET #4 = ...   /* subquery */
  4   SingletonNode             DBS        1         * ROOT
  5   EnumerateCollectionNode   DBS      296           - FOR #2 IN suggested_urls   /* full collection scan, 1 shard(s) */
  6   CalculationNode           DBS      296             - LET #7 = (#2.`url` == r.`url`)   /* simple expression */   /* collections used: r : site_xinhuanet_com */
  7   FilterNode                DBS      296             - FILTER #7
 25   RemoteNode                COOR     296             - REMOTE
 26   GatherNode                COOR     296             - GATHER 
  8   LimitNode                 COOR       1             - LIMIT 0, 1
  9   ReturnNode                COOR       1             - RETURN #2
 11   CalculationNode           COOR       1       - LET $OLD = #4[0]   /* simple expression */
 15   DistributeNode            COOR       1       - DISTRIBUTE  /* create keys: true, variable: $OLD */
 16   RemoteNode                DBS        1       - REMOTE
 14   UpsertNode                DBS        0       - UPSERT $OLD INSERT #9 UPDATE #11 IN suggested_urls
 17   RemoteNode                COOR       0       - REMOTE
 18   GatherNode                COOR       0       - GATHER 

Indexes used:
 By   Type      Collection       Unique   Sparse   Selectivity   Fields       Ranges
 14   primary   suggested_urls   true     false            n/a   [ `_key` ]   $OLD

Optimization rules applied:
 Id   RuleName
  1   move-calculations-up
  2   move-calculations-up-2
  3   remove-data-modification-out-variables
  4   distribute-in-cluster
  5   scatter-in-cluster
  6   distribute-filtercalc-to-cluster
  7   remove-unnecessary-remote-scatter

Write query options:
 Option                   Value
 ignoreErrors             true
 waitForSync              false
 nullMeansRemove          false
 mergeObjects             true
 ignoreDocumentNotFound   false
 readCompleteInput        false
 useIsRestore             false
 consultAqlWriteFilter    false
 exclusive                true
 overwrite                false
 ignoreRevs               true

So what's the problem?
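
Judging by the explain output above, one likely contributor is that the UPSERT lookup on url (plan nodes 5-7) is a full collection scan over suggested_urls, executed per input row, and exclusive: true additionally takes an exclusive lock on the shard, so concurrent writers can stall each other. A minimal arangosh sketch of one possible mitigation, assuming coordinator access (the index type and options are assumptions, not taken from this issue):

// Sketch (assumption): index the `url` attribute so the UPSERT lookup
// can use an index scan instead of the full collection scan shown at
// plan nodes 5-7. Run in arangosh, connected to a coordinator.
db._useDatabase("_system");          // adjust to the actual database
db.suggested_urls.ensureIndex({
  type: "hash",                      // a persistent index under RocksDB
  fields: [ "url" ],
  unique: true                       // viable here: the explain shows 1 shard
});

After creating the index, re-running the explain should show an index lookup in place of the full collection scan at nodes 5-7.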

daoxian commented Jun 3, 2019

Maybe some bug has broken an index of the suggested_urls collection.

arangod.log:
2019-06-03T14:37:37Z [23304] ERROR {aql} failed to instantiate the query: Could not find index '31010411' in collection 's5010286'.
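
As a quick sanity check, the coordinator's view of the collection's indexes can be listed and compared against the id from the log; a minimal arangosh sketch (the 's5010286' in the message is a cluster-internal shard name, so query via the coordinator):

// List index ids/types/fields of suggested_urls as the coordinator
// sees them, to compare against the '31010411' from arangod.log.
var indexes = db._collection("suggested_urls").getIndexes();
indexes.forEach(function (idx) {
  print(idx.id, idx.type, JSON.stringify(idx.fields));
});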

rocketraman commented Jan 28, 2020

@Simran-B I have exactly the same error on ArangoDB 3.5.3, when trying to run a query that seems to be stuck running forever. The query shows up in the Slow Queries log with a runtime of 169,650 seconds (so far). The only errors I see in my application logs are:

Caused by: com.arangodb.ArangoDBException: Response: 500, Error: 1590 - AQL: error, cluster node: 'PRMR-e6e30ee8-02f5-4707-8361-89cbc0f69c69', endpoint: 'tcp://arangodb-db-2.arangodb-dbs.default.svc.cluster.local:8529', error: 'Could not find index '214730889' in collection 's3010751'.' (while optimizing plan)
at com.arangodb.async.internal.velocystream.VstCommunicationAsync.lambda$execute$0(VstCommunicationAsync.java:66) ~[arangodb-java-driver-6.4.1.jar:?]
at java.util.concurrent.CompletableFuture.uniWhenComplete(CompletableFuture.java:859) ~[?:?]
at java.util.concurrent.CompletableFuture$UniWhenComplete.tryFire(CompletableFuture.java:837) ~[?:?]
at java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:506) ~[?:?]
at java.util.concurrent.CompletableFuture.complete(CompletableFuture.java:2073) ~[?:?]
at com.arangodb.async.internal.velocystream.VstConnectionAsync.lambda$write$0(VstConnectionAsync.java:48) ~[arangodb-java-driver-6.4.1.jar:?]
at java.util.concurrent.FutureTask.run(FutureTask.java:264) ~[?:?]
at com.arangodb.internal.velocystream.internal.MessageStore.consume(MessageStore.java:63) ~[arangodb-java-driver-6.4.1.jar:?]
at com.arangodb.internal.velocystream.internal.ChunkStore.checkCompleteness(ChunkStore.java:66) ~[arangodb-java-driver-6.4.1.jar:?]

Index '214730889' is indeed used by that query, but it definitely exists, so that message makes no sense whatsoever.

Note that in my case the query in question was a graph traversal relative to a specific object specified by _key, using ANY instead of INBOUND, so it seems to have been scanning a massive dataset. Changing it to INBOUND makes the query complete in less than half a second. However, this appears to be a problem in the graph traversal execution plan: if the query simply looked at the edges attached to that vertex, it could have discarded the non-existent OUTBOUND connections immediately, so handling ANY should have been trivial extra work for this query. The two traversal forms are sketched below.
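
For reference, a minimal arangosh sketch of the two traversal forms (the collection and vertex names here are placeholders, not the ones from the actual query):

// Placeholder names: 'edges' edge collection, 'vertices/start' vertex.
// ANY follows edges in both directions and can blow up the search
// space; INBOUND restricts the traversal to incoming edges only.
db._query(
  "FOR v IN 1..3 ANY @start edges RETURN v._key",
  { start: "vertices/start" }
);
db._query(
  "FOR v IN 1..3 INBOUND @start edges RETURN v._key",
  { start: "vertices/start" }
);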

So to summarize, there are two issues here:

  1. A spurious error in the logs about a missing index that is not missing at all, and

  2. A faulty execution plan for a graph traversal query, resulting in seemingly infinite runtime and the spurious error already mentioned.

rocketraman commented Jan 28, 2020

@Simran-B It's worse than that... somehow one of my data backends got into a state where it would just return this error on every call, even after the code was altered to avoid the faulty execution plan and slow query previously mentioned. I had to restart the db server before ArangoDB would start responding correctly to queries.

Should I create a separate issue for this, or can this one be re-opened?

@rocketraman

I created issue #11026.
