8000 Upsert insert errror · Issue #2182 · arangodb/arangodb · GitHub
[go: up one dir, main page]

Skip to content

Upsert insert errror #2182

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
3 of 21 tasks
roman-aprias-ipf opened this issue Nov 18, 2016 · 6 comments
Open
3 of 21 tasks

Upsert insert errror #2182

roman-aprias-ipf opened this issue Nov 18, 2016 · 6 comments
Assignees
Labels
1 Bug 3 AQL Query language related 3 Documentation

Comments

@roman-aprias-ipf
Copy link
roman-aprias-ipf commented Nov 18, 2016

my environment running ArangoDB

I'm using the latest ArangoDB of the respective release series:

  • 2.8
  • 3.1
  • self-compiled devel branch

On this operating system:

  • DCOS on
    • AWS
    • Azure
    • own infrastructure
  • Linux
    • Debian .deb
    • Ubuntu .deb
    • SUSE .rpm
    • RedHat .rpm
    • Fedora .rpm
    • Gentoo
    • docker - official docker library
    • other:
  • Windows, version:
  • MacOS, version:

this is an AQL-related issue:

[x] I'm using graph features

I'm issuing AQL via:

  • web interface with this browser: chrome running on this OS: win10
  • arangosh
  • this Driver:

I have following query which returns four objects:

let vertex=(for dev in vDevice return dev)
let edge=(
for dev in vertex
for srcIntL2 in outbound dev eDevIntL2
for dstIntL2 in outbound srcIntL2 eCdp
for dstDev in inbound dstIntL2 eDevIntL2
filter dev._key != dstDev._key
filter dev._id == "vDevice/232472397" && dstDev._id == "vDevice/232481776"
let e = (dev._key<dstDev._key)?({ srcId:dev._id, dstId:dstDev._id, srcIntName:srcIntL2.name, dstIntName: dstIntL2.name, srcArrow: 0, dstArrow: 1 })
                              :({ srcId:dstDev._id, dstId:dev._id, srcIntName:dstIntL2.name, dstIntName: srcIntL2.name, srcArrow: 1, dstArrow: 0 })
collect srcId = e.srcId, dstId = e.dstId, srcIntName = e.srcIntName, dstIntName = e.dstIntName into g
return {srcId:srcId, dstId:dstId, srcIntName:srcIntName, dstIntName: dstIntName, srcArrow: sum(g[*].e.srcArrow), dstArrow: sum(g[*].e.dstArrow)})
return edge 
[
  [
    {
      "srcId": "vDevice/232472397",
      "dstId": "vDevice/232481776",
      "srcIntName": "Et15/0",
      "dstIntName": "Et1/0",
      "srcArrow": 0,
      "dstArrow": 1
    },
    {
      "srcId": "vDevice/232472397",
      "dstId": "vDevice/232481776",
      "srcIntName": "Et15/1",
      "dstIntName": "Et1/1",
      "srcArrow": 0,
      "dstArrow": 1
    },
    {
      "srcId": "vDevice/232472397",
      "dstId": "vDevice/232481776",
      "srcIntName": "Et15/2",
      "dstIntName": "Et1/2",
      "srcArrow": 0,
      "dstArrow": 1
    },
    {
      "srcId": "vDevice/232472397",
      "dstId": "vDevice/232481776",
      "srcIntName": "Et15/3",
      "dstIntName": "Et1/3",
      "srcArrow": 0,
      "dstArrow": 1
    }
  ]
]

When I try to upsert these object with search on _from and _id (all four documents have it same), there shoudl be only one object inserted:

let vertex=(for dev in vDevice return dev)
let edge=(
for dev in vertex
for srcIntL2 in outbound dev eDevIntL2
for dstIntL2 in outbound srcIntL2 eCdp
for dstDev in inbound dstIntL2 eDevIntL2
filter dev._key != dstDev._key
filter dev._id == "vDevice/232472397" && dstDev._id == "vDevice/232481776"
let e = (dev._key<dstDev._key)?({ srcId:dev._id, dstId:dstDev._id, srcIntName:srcIntL2.name, dstIntName: dstIntL2.name, srcArrow: 0, dstArrow: 1 })
                              :({ srcId:dstDev._id, dstId:dev._id, srcIntName:dstIntL2.name, dstIntName: srcIntL2.name, srcArrow: 1, dstArrow: 0 })
collect srcId = e.srcId, dstId = e.dstId, srcIntName = e.srcIntName, dstIntName = e.dstIntName into g
return {srcId:srcId, dstId:dstId, srcIntName:srcIntName, dstIntName: dstIntName, srcArrow: sum(g[*].e.srcArrow), dstArrow: sum(g[*].e.dstArrow)})

for u in unique(edge)
UPSERT {_from: u.srcId, _to: u.dstId}
INSERT {_from: u.srcId, _to: u.dstId, type: 'cdp', srcIntName: u.srcIntName, dstIntName: u.dstIntName, srcArrow: u.srcArrow, dstArrow: u.dstArrow}
UPDATE {}
IN eLink

But it's not, there are four object inserted:

for e in eLink
return e
[
  {
    "_key": "232780061",
    "_id": "eLink/232780061",
    "_from": "vDevice/232472397",
    "_to": "vDevice/232481776",
    "_rev": "_UFdcl16--B",
    "dstArrow": 1,
    "dstIntName": "Et1/0",
    "srcArrow": 0,
    "srcIntName": "Et15/0",
    "type": "cdp"
  },
  {
    "_key": "232780057",
    "_id": "eLink/232780057",
    "_from": "vDevice/232472397",
    "_to": "vDevice/232481776",
    "_rev": "_UFdcl16--_",
    "dstArrow": 1,
    "dstIntName": "Et1/2",
    "srcArrow": 0,
    "srcIntName": "Et15/2",
    "type": "cdp"
  },
  {
    "_key": "232780053",
    "_id": "eLink/232780053",
    "_from": "vDevice/232472397",
    "_to": "vDevice/232481776",
    "_rev": "_UFdcl16---",
    "dstArrow": 1,
    "dstIntName": "Et1/3",
    "srcArrow": 0,
    "srcIntName": "Et15/3",
    "type": "cdp"
  },
  {
    "_key": "232780059",
    "_id": "eLink/232780059",
    "_from": "vDevice/232472397",
    "_to": "vDevice/232481776",
    "_rev": "_UFdcl16--A",
    "dstArrow": 1,
    "dstIntName": "Et1/1",
    "srcArrow": 0,
    "srcIntName": "Et15/1",
    "type": "cdp"
  }
]

When I try to supply four objects from previous query in variable, it works correctly:

let edge = [
    {
      "srcId": "vDevice/232472397",
      "dstId": "vDevice/232481776",
      "srcIntName": "Et15/0",
      "dstIntName": "Et1/0",
      "srcArrow": 0,
      "dstArrow": 1
    },
    {
      "srcId": "vDevice/232472397",
      "dstId": "vDevice/232481776",
      "srcIntName": "Et15/1",
      "dstIntName": "Et1/1",
      "srcArrow": 0,
      "dstArrow": 1
    },
    {
      "srcId": "vDevice/232472397",
      "dstId": "vDevice/232481776",
      "srcIntName": "Et15/2",
      "dstIntName": "Et1/2",
      "srcArrow": 0,
      "dstArrow": 1
    },
    {
      "srcId": "vDevice/232472397",
      "dstId": "vDevice/232481776",
      "srcIntName": "Et15/3",
      "dstIntName": "Et1/3",
      "srcArrow": 0,
      "dstArrow": 1
    }
  ]
for u in unique(edge)
UPSERT {_from: u.srcId, _to: u.dstId}
INSERT {_from: u.srcId, _to: u.dstId, type: 'cdp', srcIntName: u.srcIntName, dstIntName: u.dstIntName, srcArrow: u.srcArrow, dstArrow: u.dstArrow}
UPDATE {}
IN eLink

Only one object is inserted:

for e in eLink
return e
[
  {
    "_key": "232780365",
    "_id": "eLink/232780365",
    "_from": "vDevice/232472397",
    "_to": "vDevice/232481776",
    "_rev": "_UFdgnSe---",
    "dstArrow": 1,
    "dstIntName": "Et1/0",
    "srcArrow": 0,
    "srcIntName": "Et15/0",
    "type": "cdp"
  }
]

Here are explains for both queries:

Query string:

 let vertex=(for dev in vDevice return dev)
 let edge=(
 for dev in vertex
 for srcIntL2 in outbound dev eDevIntL2
 for dstIntL2 in outbound srcIntL2 eCdp
 for dstDev in inbound dstIntL2 eDevIntL2
 filter dev._key != dstDev._key
 filter dev._id == "vDevice/232472397" && dstDev._id == "vDevice/232481776"
 let e = (dev._key<dstDev._key)?({ srcId:dev._id, dstId:dstDev._id, srcIntName:srcIntL2.name, 
 dstIntName: dstIntL2.name, srcArrow: 0, dstArrow: 1 })
                               :({ srcId:dstDev._id, dstId:dev._id, srcIntName:dstIntL2.name, 
 dstIntName: srcIntL2.name, srcArrow: 1, dstArrow: 0 })
 collect srcId = e.srcId, dstId = e.dstId, srcIntName = e.srcIntName, dstIntName = e.dstIntName into 
 g
 return {srcId:srcId, dstId:dstId, srcIntName:srcIntName, dstIntName: dstIntName, srcArrow: 
 sum(g[*].e.srcArrow), dstArrow: sum(g[*].e.dstArrow)})
Execution plan:
 Id   NodeType                  Est.   Comment
  1   SingletonNode                1   * ROOT
 35   CalculationNode              1     - LET #54 = { }   /* json expression */   /* const assignment */
  5   SubqueryNode                 1     - LET vertex = ...   /* const subquery */
  2   SingletonNode                1       * ROOT
  3   EnumerateCollectionNode    569         - FOR dev IN vDevice   /* full collection scan */
  4   ReturnNode                 569           - RETURN dev
 23   SubqueryNode                 1     - LET edge = ...   /* subquery */
  6   SingletonNode                1       * ROOT
  7   EnumerateListNode          569         - FOR dev IN vertex   /* list iteration */
  8   TraversalNode             1120           - FOR srcIntL2  /* vertex */ IN 1..1  /* min..maxPathDepth */ OUTBOUND dev /* startnode */  eDevIntL2
  9   TraversalNode             1152           - FOR dstIntL2  /* vertex */ IN 1..1  /* min..maxPathDepth */ OUTBOUND srcIntL2 /* startnode */  eCdp
 10   TraversalNode             2269           - FOR dstDev  /* vertex */ IN 1..1  /* min..maxPathDepth */ INBOUND dstIntL2 /* startnode */  eDevIntL2
 11   CalculationNode           2269           - LET #34 = (dev.`_key` != dstDev.`_key`)   /* simple expression */
 12   FilterNode                2269           - FILTER #34
 13   CalculationNode           2269           - LET #36 = ((dev.`_id` == "vDevice/232472397") && (dstDev.`_id` == "vDevice/232481776"))   /* simple expression */
 14   FilterNode                2269           - FILTER #36
 15   CalculationNode           2269           - LET e = ((dev.`_key` < dstDev.`_key`) ? { "srcId" : dev.`_id`, "dstId" : dstDev.`_id`, "srcIntName" : srcIntL2.`name`, "dstIntName" : dstIntL2.`name`, "srcArrow" : 0, "dstArrow" : 1 } : { "srcId" : dstDev.`_id`, "dstId" : dev.`_id`, "srcIntName" : dstIntL2.`name`, "dstIntName" : srcIntL2.`name`, "srcArrow" : 1, "dstArrow" : 0 })   /* simple expression */
 16   CalculationNode           2269           - LET #38 = e.`srcId`   /* attribute expression */
 17   CalculationNode           2269           - LET #40 = e.`dstId`   /* attribute expression */
 18   CalculationNode           2269           - LET #42 = e.`srcIntName`   /* attribute expression */
 19   CalculationNode           2269           - LET #44 = e.`dstIntName`   /* attribute expression */
 37   SortNode                  2269           - SORT #38 ASC, #40 ASC, #42 ASC, #44 ASC
 20   CollectNode               1815           - COLLECT srcId = #38, dstId = #40, srcIntName = #42, dstIntName = #44 INTO g   /* sorted*/
 21   CalculationNode           1815           - LET #46 = { "srcId" : srcId, "dstId" : dstId, "srcIntName" : srcIntName, "dstIntName" : dstIntName, "srcArrow" : SUM(g[*].`e`.`srcArrow`), "dstArrow" : SUM(g[*].`e`.`dstArrow`) }   /* simple expression */
 22   ReturnNode                1815           - RETURN #46
 24   CalculationNode              1     - LET #48 = UNIQUE(edge)   /* simple expression */
 25   EnumerateListNode          100     - FOR u IN #48   /* list iteration */
 32   SubqueryNode               100       - LET #25 = ...   /* subquery */
 26   SingletonNode                1         * ROOT
 38   IndexNode                    1           - FOR #23 IN eLink   /* edge index scan */
 39   CalculationNode              1             - LET #50 = (#23.`_to` == u.`dstId`)   /* simple expression */
 29   FilterNode                   1             - FILTER #50
 30   LimitNode                    1             - LIMIT 0, 1
 31   ReturnNode                   1             - RETURN #23
 33   CalculationNode            100       - LET $OLD = #25[0]   /* simple expression */
 34   CalculationNode            100       - LET #52 = { "_from" : u.`srcId`, "_to" : u.`dstId`, "type" : "cdp", "srcIntName" : u.`srcIntName`, "dstIntName" : u.`dstIntName`, "srcArrow" : u.`srcArrow`, "dstArrow" : u.`dstArrow` }   /* simple expression */
 36   UpsertNode                   0       - UPSERT $OLD INSERT #52 UPDATE #54 IN eLink

Indexes used:
 By   Type   Collection   Unique   Sparse   Selectivity   Fields               Ranges
 38   edge   eLink        false    false       100.00 %   [ `_from`, `_to` ]   (#23.`_from` == u.`srcId`)

Traversals on graphs:
 Id   Depth   Vertex collections   Edge collections   Filter conditions
  8   1..1                         eDevIntL2          
  9   1..1                         eCdp               
 10   1..1                         eDevIntL2          

Optimization rules applied:
 Id   RuleName
  1   move-calculations-up
  2   move-filters-up
  3   move-calculations-up-2
  4   move-filters-up-2
  5   remove-data-modification-out-variables
  6   use-indexes
  7   remove-filter-covered-by-index
  8   move-calculations-down

Write query options:
 Option                   Value
 ignoreErrors             false
 waitForSync              false
 nullMeansRemove          false
 mergeObjects             true
 ignoreDocumentNotFound   false
 readCompleteInput        true
 useIsRestore             false
 consultAqlWriteFilter    false
Query string:
 let edge = [
     {
       "srcId": "vDevice/232472397",
       "dstId": "vDevice/232481776",
       "srcIntName": "Et15/0",
       "dstIntNam...

Execution plan:
 Id   NodeType            Est.   Comment
  1   SingletonNode          1   * ROOT
  3   CalculationNode        1     - LET #8 = [ { "dstArrow" : 1, "dstId" : "vDevice/232481776", "dstIntName" : "Et1/0", "srcArrow" : 0, "srcId" : "vDevice/232472397", "srcIntName" : "Et15/0" }, { "dstArrow" : 1, "dstId" : "vDevice/232481776", "dstIntName" : "Et1/1", "srcArrow" : 0, "srcId" : "vDevice/232472397", "srcIntName" : "Et15/1" }, { "dstArrow" : 1, "dstId" : "vDevice/232481776", "dstIntName" : "Et1/2", "srcArrow" : 0, "srcId" : "vDevice/232472397", "srcIntName" : "Et15/2" }, { "dstArrow" : 1, "dstId" : "vDevice/232481776", "dstIntName" : "Et1/3", "srcArrow" : 0, "srcId" : "vDevice/232472397", "srcIntName" : "Et15/3" } ]   /* json expression */   /* const assignment */
 14   CalculationNode        1     - LET #14 = { }   /* json expression */   /* const assignment */
  4   EnumerateListNode      4     - FOR u IN #8   /* list iteration */
 11   SubqueryNode           4       - LET #5 = ...   /* subquery */
  5   SingletonNode          1         * ROOT
 16   IndexNode              1           - FOR #3 IN eLink   /* edge index scan */
 17   CalculationNode        1             - LET #10 = (#3.`_to` == u.`dstId`)   /* simple expression */
  8   FilterNode             1             - FILTER #10
  9   LimitNode              1             - LIMIT 0, 1
 10   ReturnNode             1             - RETURN #3
 12   CalculationNode        4       - LET $OLD = #5[0]   /* simple expression */
 13   CalculationNode        4       - LET #12 = { "_from" : u.`srcId`, "_to" : u.`dstId`, "type" : "cdp", "srcIntName" : u.`srcIntName`, "dstIntName" : u.`dstIntName`, "srcArrow" : u.`srcArrow`, "dstArrow" : u.`dstArrow` }   /* simple expression */
 15   UpsertNode             0       - UPSERT $OLD INSERT #12 UPDATE #14 IN eLink

Indexes used:
 By   Type   Collection   Unique   Sparse   Selectivity   Fields               Ranges
 16   edge   eLink        false    false       100.00 %   [ `_from`, `_to` ]   (#3.`_from` == u.`srcId`)

Optimization rules applied:
 Id   RuleName
  1   move-calculations-up
  2   remove-unnecessary-calculations
  3   move-calculations-up-2
  4   remove-data-modification-out-variables
  5   use-indexes
  6   remove-filter-covered-by-index
  7   move-calculations-down

Write query options:
 Option                   Value
 ignoreErrors             false
 waitForSync              false
 nullMeansRemove          false
 mergeObjects             true
 ignoreDocumentNotFound   false
 readCompleteInput        false
 useIsRestore             false
 consultAqlWriteFilter    false
@baslr
Copy link
Contributor
baslr commented Nov 18, 2016
let edge = [
   ...
  ]
for u in unique(edge)
UPSERT {_from: u.srcId, _to: u.dstId}
INSERT {_from: u.srcId, _to: u.dstId, type: 'cdp', srcIntName: u.srcIntName, dstIntName: u.dstIntName, srcArrow: u.srcArrow, dstArrow: u.dstArrow}
UPDATE {}
IN eLink

produces 4 writes. Since return length(unique(edge)) is 4.

@Simran-B
Copy link
Contributor

@baslr: I think his point is that UPSERT should de-duplicate it to 1 write. I'm not fully sure, but the reason might be that there is no such edge before the query, so INSERTing is chosen, and it doesn't read from the collection it inserts into during the loop, thus not seeing any duplicates from its point of view and inserting all 4.

@Simran-B Simran-B added the 3 AQL Query language related label Nov 18, 2016
@baslr
Copy link
Contributor
baslr commented Nov 18, 2016
for x in [1,2,3,4]

upsert {_from:'a/foo', _to:'b/foo'}
insert {_from:'a/foo', _to:'b/foo'}
update {} in test

indeed this query inserts only one edge. Normal collections dropping _from and _to silently.

@roman-aprias-ipf
Copy link
Author

@baslr: I did similar query to reproduce issue, but in such a case it works as expected. As you can see from steps which I posted above:

  1. I have query which fills edge variable let edge = ....
  2. This query returns four similar object, they all have same srcId a dstId properties which I use as _from and _to and as search object in upsert
  3. Now when I follow with upsert, there are four documents inserted instead of one. This is the issue.
  4. But when I print four object from query and put them manually to let edge = .... and then run upsert, only one document is inserted. Which is correct.

So upsert is somehow affected by previous query which gets objects from database. If you can't debug the issue from explains I provided, I can give you database dump so you can try to replicate it.

@jsteemann
Copy link
Contributor

The difference is the value of readCompleteInput in the section Write query options of the explain.
In case an AQL query reads and writes from the same collection or if the optimizer cannot prove the query will not read from the collection that is being upserted, the UPSERT executes its document lookup "too early" and not take into account the modifications done by the query itself.

To simplify, the UPSERT part of query will execute as follows (pseudo code):

FOR doc IN whatever
  LET isThere = lookup(exampleDocument)
  if (!isThere)
     INSERT newDoc
  else 
     UPDATE doc

In AQL all operations are executed in batches, with batchSize = n (the default value for n is 1000).
With readCompleteInput = true, there will first be n lookups (which will not find the document) that will set isThere to false (or more precise to an array with n false values).
So there will follow n INSERT operations later on.

With readCompleteInput = false, the lookups are performed in lockstep with the INSERTs, so after each lookup isThere will be an array with a single boolean. Most important, that lookup will have taken into account previous insert operations performed by the query. So the later decision whether to insert or update is correct in this case.

The readCompleteInput variable is an internal flag that is controlled by the query optimizer. It will be set to false only if

  • there are no reads and writes to the same collection in a query and
  • if the query does not use AQL functions that could read documents and
  • if there is no dynamic collection access (which could be triggered by a traversal as in the original example query).

And that's the reason it works when UPSERTing values from a static array but does not when used in the complex query (which features traversals).

The obvious fix would be to set readCompleteInput to false always, but this would make queries fail that update/upsert collections from which they read. Setting it to false in all cases would make queries like the following break, because the FOR would then probably see documents that the INSERT just created.

FOR doc IN collection
  INSERT { } INTO collection 

We are currently planning some changes for the transactions and document lookup internals for a future version, likely 3.3. These changes would likely fix this issue, but they are not there yet.

At the moment there is no easy fix or workaround for this, apart from making the original query simpler or splitting it into multiple queries (one that's creating the input data for the UPSERT and one that executes it).

@roman-aprias-ipf
Copy link
Author

Thanks for your explanation. It's not big issue for us, I rewrote code logic to use only insert. I would be good idea to add some warning about this behavior to your documentation to UPSERT cmd until it's fixed.

@Simran-B Simran-B self-assigned this Dec 5, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
1 Bug 3 AQL Query language related 3 Documentation
Projects
None yet
Development

No branches or pull requests

4 participants
0