8000 Indexes not used when boolean operation is not explicit · Issue #1413 · arangodb/arangodb · GitHub
[go: up one dir, main page]

Skip to content

Indexes not used when boolean operation is not explicit #1413

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
andreib1 opened this issue Jul 14, 2015 · 4 comments
Closed

Indexes not used when boolean operation is not explicit #1413

andreib1 opened this issue Jul 14, 2015 · 4 comments
Labels
3 AQL Query language related

Comments

@andreib1
Copy link

It seems that the query optimizer doesn't understand that a boolean variable is the equivalent of an equality operation, thereby ignoring valid indexes under the following circumstances.

I have a collection 'clients' with the following index:
indexes: [{type: 'hash', fields: ['online','userid'], minLength: 3, unique : false}]

(online is a boolean, and userid is a string)

when I call:

LET userId = "users/e8c7dd8f4d5cfecaf19620d21dd13d92d7c2a4b5"
FOR y IN clients
FILTER y.online == true && y.userid == userId
RETURN y._id`

the plan uses the correct index like this:

Id NodeType Est. Comment
1 SingletonNode 1 * ROOT
8 IndexRangeNode 1 - FOR y IN clients /* hash index scan /
4 CalculationNode 1 - LET #2 = y.online == true && y.userid == > "users/e8c7dd8f4d5cfecaf19620d21dd13d92d7c2a4b5" /
simple expression / / collections used: y : clients /
5 FilterNode 1 - FILTER #2
6 CalculationNode 1 - LET #4 = y._id /
attribute expression / / collections used: y : clients /
7 ReturnNode 1 - RETURN #4
/

However when I call:

LET userId = "users/e8c7dd8f4d5cfecaf19620d21dd13d92d7c2a4b5"
FOR y IN clients
FILTER y.online && y.userid == userId
RETURN y._id`

The execution plan resorts to a full collection scan:

Execution plan:
Id NodeType Est. Comment
1 SingletonNode 1 * ROOT
3 EnumerateCollectionNode 748 - FOR y IN clients /* full collection scan /
4 CalculationNode 748 - LET #2 = y.online && y.userid == "users/e8c7dd8f4d5cfecaf19620d21dd13d92d7c2a4b5" /
simple expression / / collections used: y : clients /
5 FilterNode 748 - FILTER #2
6 CalculationNode 748 - LET #4 = y._id /
attribute expression / / collections used: y : clients */
7 ReturnNode 748 - RETURN #4

@jsteemann
Copy link
Contributor

I can confirm that for a condition such as y.online no index will be used.
This is because y.online is not the same as y.online == true.

Consider this example:

// create a test collection and insert a few values into it
db._drop("test");
db._create("test");
var values = [ null, false, true, -1, 0, 1, 2, "", " ", "1", "foo", [ ], [ 0 ], [ 1 ] ];
values.forEach(function(value) { 
  db.test.insert({ value: value }); 
});

// now query the collection using a boolean condition
db._query("FOR doc IN idx FILTER doc.value SORT doc.value RETURN doc.value").toArray();

This will match the following values and not only the boolean true:

[ 
  true, 
  -1, 
  1, 
  2, 
  " ", 
  "1", 
  "foo", 
  [ ], 
  [ 
    0 
  ], 
  [ 
    1 
  ] 
]

The values that are excluded here are null, false, 0 and the empty string ("").

If you need to compare the value to just a boolean true, you should write this explicitly in the AQL query, and the index should be used then.

@jsteemann jsteemann added the 3 AQL Query language related label Jul 14, 2015
@andreib1
Copy link
Author

Thanks. I was seeing the problem only within the context of my current use case.
You explanation makes sense, as document property values can be null, or unconstrained by a model.

@CoDEmanX
Copy link
Contributor

If you do FILTER TO_BOOL(doc.value) == true, or more concisely, FILTER !!doc.value, no index is used either - although it could be, couldn't it?

@jsteemann
Copy link
Contributor

@CoDEmanX : I don't think so.
TO_BOOL(doc.value) will evaluate to true for lots of different values, e.g. null, true, -1, 1, 42, 42e-42, "foobar", [ ], [ 42 ], { } etc. I don't see a way to make the optimizer aware of this, especially if the index is a hash index and index values are stored unsorted. The same is true for !! doc.value.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
3 AQL Query language related
Projects
None yet
Development

No branches or pull requests

3 participants
0