8000 Unexpected behavior with undefined keys in AQL filters with numeric range expressions · Issue #4828 · arangodb/arangodb · GitHub
[go: up one dir, main page]

Skip to content

Unexpected behavior with undefined keys in AQL filters with numeric range expressions #4828

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
5 of 31 tasks
dvstans opened this issue Mar 12, 2018 · 5 comments
Closed
5 of 31 tasks
Labels
1 Question 2 Solved Resolution 3 AQL Query language related

Comments

@dvstans
Copy link
dvstans commented Mar 12, 2018

my environment running ArangoDB

I'm using the latest ArangoDB of the respective release series:

  • 2.8
  • 3.0
  • 3.1
  • 3.2
  • 3.3
  • self-compiled devel branch

Mode:

  • Cluster
  • Single-Server

Storage-Engine:

  • mmfiles
  • rocksdb

On this operating system:

  • DCOS on
    • AWS
    • Azure
    • own infrastructure
  • Linux
    • Debian .deb
    • Ubuntu .deb
    • SUSE .rpm
    • RedHat .rpm
    • Fedora .rpm
    • Gentoo
    • docker - official docker library
    • other:
  • Windows, version:
  • MacOS, version:

this is an AQL-related issue:

[ ] I'm using graph features

I'm issuing AQL via:

  • web interface with this browser: running on this OS:
  • arangosh
  • this Driver:

I've run db._explain("<my aql query>") and it didn't shed more light on this.
The AQL query in question is:

"for i in docs filter i.x < 10 return i"

The issue can be reproduced using this dataset:

db.docs.save({x:-10,y:-5})
db.docs.save({y:5})
db.docs.save({x:5})
db.docs.save({x:20,y:10})

Please provide a way to create the dataset to run the above query on; either by a gist with an arangodump, or `db.collection.save({my: "values"}) statements. If it can be reproduced with one of the ArangoDB example datasets, it's a plus.

Foxx

this is a web interface-related issue: NO

I'm using the web interface with this browser: running on this OS:

  • authentication is enabled?
  • using the cluster?
  • _system database?

These are the steps to reproduce:

  1. open the browser on http://127.0.0.1:8529
  2. log in as ...
  3. use database [ ] _system [ ] other:
  4. click to ...
    ...

The following problem occurs: Result includes expected documents with 'x' < 10 (#1, and #3), but also returns the document without 'x' defined (doc #2).

Instead I would be expecting: Only documents that have 'x' defined and that meet the filter criteria (i.e. not NULL).

If the provided query is changed to "for i in docs filter i.x > -10 return i", the expected results are returned (i.e. it ignores documents where 'x' is not defined).

this is an installation-related issue: N/A

Describe which steps you carried out, what you expected to happen and what actually happened.

@jsteemann
Copy link
Contributor

It is not necessarily unexpected result, at least when taking this comparison order into account
(from https://docs.arangodb.com/3.3/AQL/Fundamentals/TypeValueOrder.html):

null < bool < number < string < array/list < object/document

In the AQL type system, null compares less than any numeric value, and it is included in ranges.
This is in contrast to other type systems such as in SQL, where null is excluded everywhere and is infectious in the sense that any null value in an SQL expression turns the expression result into null as well.
In ArangoDB it is a bit more common to refer to non-existing attributes and such because there is no fixed schema for documents. Documents may be heterogenous, so referring to an attribute that is not present is not extremly uncommon. Referring to a non-existing attribute (e.g. foo in query FOR doc IN collection RETURN doc.foo) will produce a value of null in AQL

In order to check for documents that have a specific attribute defined and exclude all instances where the value is null, there are the following options:

  • FILTER doc.value < -10 && doc.value != null
  • FILTER doc.value < -10 && doc.value > null
  • FILTER HAS(doc, "value") && doc.value < -10

@jsteemann jsteemann added 3 AQL Query language related 1 Question labels Mar 12, 2018
@dvstans
Copy link
Author
dvstans commented Mar 13, 2018

I understand; however, this behavior makes it very difficult to compose complex numeric queries against highly variable document collections. We have a specific use case that requires running user-composed, complex scientific queries against schema-less documents. It seems I would need to pre-process these queries, and place "!null" terms where needed, but I don't see that as practical. Is it possible to add a feature to disable or alter the NULL evaluation behavior per query? If not, I think we will likely have to use a different metadata store/search engine.

@jsteemann
Copy link
Contributor

It will not be easy to adjust the behavior of null for all the operators and AQL functions.
That would require changing a lot of code and tests, and I do not consider that practical to do soon.
I am sorry it does not fit your use case, but how about wrapping the user-composed filter conditions in some application code that automatically adds the missing checks for non-null where required?

@dvstans
Copy link
Author
dvstans commented Mar 13, 2018

Yes, that's what I meant by "pre-processing" the query. This would work, but since we're allowing user's to compose their own queries, this would require me to write some rather complex parsing code. I feel an existing search engine may be a better choice for time to solution with consistent/expected behavior plus a significant performance gain (which isn't a requirement for us now, but will be in the future). Thanks for the feedback, though!

@dothebart dothebart added the 2 Solved Resolution label Apr 16, 2018
@dothebart
Copy link
Contributor

Closing as answered. Don't hesitate to ask if more aspects are unclear.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
1 Question 2 Solved Resolution 3 AQL Query language related
Projects
None yet
Development

No branches or pull requests

3 participants
0