8000 Aggeragate function MAX() returns empty values · Issue #6728 · arangodb/arangodb · GitHub
[go: up one dir, main page]

Skip to content

Aggeragate function MAX() returns empty values #6728

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
michlschmid opened this issue Oct 4, 2018 · 6 comments
Closed

Aggeragate function MAX() returns empty values #6728

michlschmid opened this issue Oct 4, 2018 · 6 comments
Labels
1 Question 2 Solved Resolution 3 AQL Query language related

Comments

@michlschmid
Copy link
michlschmid commented Oct 4, 2018

My Environment

  • ArangoDB Version: 3.3.14
  • Storage Engine: MMFiles
  • Deployment Mode: Single Server
  • Deployment Strategy: Manual Start in Docker
  • Configuration:
  • Infrastructure: own
  • Operating System: Debian
  • Used Package: Docker official

Component, Query & Data

Affected feature: AQL

AQL query (if applicable):

FOR ke IN knoten_extern
    COLLECT AGGREGATE
    hwMin = MIN(ke.hw),
    hwMax = MAX(ke.hw),
    rwMin = MIN(ke.rw),
    rwMax = MAX(ke.rw)
RETURN { hwMin, hwMax, rwMin, rwMax }

Result:

[
  {
    "hwMin": 47.43814309,
    "hwMax": "",
    "rwMin": 10.2337726,
    "rwMax": ""
  }
]

AQL explain (if applicable):

Query string:
 FOR ke IN knoten_extern
     COLLECT AGGREGATE
     hwMin = MIN(ke.hw),
     hwMax = MAX(ke.hw),
     rwMin = MIN(ke.rw),
     rwMax = MAX(ke.rw)
 RETURN { hwMin, hwMax, rwMin, rwMax }

Execution plan:
 Id   NodeType                   Est.   Comment
  1   SingletonNode                 1   * ROOT
  2   EnumerateCollectionNode   27253     - FOR ke IN knoten_extern   /* full collection scan */
  3   CalculationNode           27253       - LET #5 = ke.`hw`   /* attribute expression */   /* collections used: ke : knoten_extern */
  5   CalculationNode           27253       - LET #9 = ke.`rw`   /* attribute expression */   /* collections used: ke : knoten_extern */
  7   CollectNode               21802       - COLLECT AGGREGATE hwMin = MIN(#5), hwMax = MAX(#5), rwMin = MIN(#9), rwMax = MAX(#9)   /* sorted */
  8   CalculationNode           21802       - LET #13 = { "hwMin" : hwMin, "hwMax" : hwMax, "rwMin" : rwMin, "rwMax" : rwMax }   /* simple expression */
  9   ReturnNode                21802       - RETURN #13

Indexes used:
 none

Optimization rules applied:
 Id   RuleName
  1   move-calculations-up
  2   remove-redundant-calculations
  3   remove-unnecessary-calculations
  4   move-calculations-up-2

Dataset:

KnotenExtern holds GeoCoords stored as floats in "hw" and "rw" fields.

Replication Factor & Number of Shards (Cluster only):

Steps to reproduce

Execute the the query against datasets.

Problem:
MAX() results are empty.

Expected result:
MAX() should be filled at least with the MIN() values (if they were the same by accident).

@jsteemann
Copy link
Contributor

I tried to reproduce the problem with 3.3.14 (and 3.3.16) with MMFiles and the following test setup:

db._create("knoten_extern");
for (i = 0; i < 27253; ++i) { 
  db.knoten_extern.insert({ hw: -100 + Math.random() * 200, rw: -80 + Math.random() * 160 }); 
}

This produces the same query execution plan:

 Id   NodeType                   Est.   Comment
  1   SingletonNode                 1   * ROOT
  2   EnumerateCollectionNode   27253     - FOR ke IN knoten_extern   /* full collection scan */
  3   CalculationNode           27253       - LET #5 = ke.`hw`   /* attribute expression */   /* collections used: ke : knoten_extern */
  5   CalculationNode           27253       - LET #9 = ke.`rw`   /* attribute expression */   /* collections used: ke : knoten_extern */
  7   CollectNode               21802       - COLLECT AGGREGATE hwMin = MIN(#5), hwMax = MAX(#5), rwMin = MIN(#9), rwMax = MAX(#9)   /* sorted */
  8   CalculationNode           21802       - LET #13 = { "hwMin" : hwMin, "hwMax" : hwMax, "rwMin" : rwMin, "rwMax" : rwMax }   /* simple expression */
  9   ReturnNode                21802       - RETURN #13

Indexes used:
 none

Optimization rules applied:
 Id   RuleName
  1   move-calculations-up
  2   remove-redundant-calculations
  3   remove-unnecessary-calculations
  4   move-calculations-up-2

The results are as expected for me however:

[ 
  { 
    "hwMin" : -99.98547059997951, 
    "hwMax" : 99.99388712042543, 
    "rwMin" : -79.99387021400236, 
    "rwMax" : 79.99551531481356 
  } 
]

Can you confirm that the problem is reproducible with your dataset? How does the wrong query result actually look like, i.e. what values are returned for the MAX values in your case?
Is there anything special about the dataset, e.g. do all documents in the underlying collection have numeric values in hw and rw? What are the upper and lower bounds for the numbers inside hw and rw? Is it possible that any of these numbers is an invalid or "special" floating point number (NaN, +inf, -inf)?

@michlschmid
8000
Copy link
Author

@jsteemann I updated issue description with the result.

The weird thing for me is that MIN() returns a valid result and MAX() is simply empty.

@michlschmid
Copy link
Author

@jsteemann Question: How can I check if there are invalid values in my collection with AQL? Could you provide a code snippet?

Thanks a lot.

@jsteemann
Copy link
Contributor

Ah ok, got it. It looks like the results for MAX functions are empty strings.
This may actually be "expected" in case the dataset contains string values in hw and rw. MAX will also return the maximum value for strings or any other data type in AQL.
I am getting the same results as you after inserting a document as follows into the collection:

db.knoten_extern.insert({ hw: "", rw: "" });

This is because the empty string compares higher than any floating point value in AQL (please refer to the manual for the exact type and value order).

If you add some filter conditions to the query, e.g.

FILTER TYPENAME(ke.hw) == 'number' && TYPENAME(ke.rw) == 'number'  

it will filter out the non-numeric values and will not use them for the min/max calculation.

@jsteemann jsteemann added 3 AQL Query language related 1 Question 2 Solved Resolution labels Oct 5, 2018
@michlschmid
Copy link
Author

OK - thanks for the explanation. I was totally focused on the description of the collect and aggregat usage here:

Probably a hint on that manual page regarding the mentioned type order behaviour would be helpful to avoid wrong expectations.

Maybe helpful for other users...
:-)

Thanks for the filter statement. I added the code to my query as follows now the result is as expected.

The complete query as reference for the solution of that issue:

FOR ke IN knoten_extern
    FILTER TYPENAME(ke.hw) == 'number' && TYPENAME(ke.rw) == 'number'
    COLLECT AGGREGATE
    hwMin = MIN(ke.hw),
    hwMax = MAX(ke.hw),
    rwMin = MIN(ke.rw),
    rwMax = MAX(ke.rw)
RETURN { hwMin, hwMax, rwMin, rwMax }

@michlschmid
Copy link
Author

Info: Inverted the provide filter to select all the none number values an found three datasets that had no hw and rw value set.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
1 Question 2 Solved Resolution 3 AQL Query language related
Projects
None yet
Development

No branches or pull requests

2 participants
0