8000 Optimization for index post-filtering (early pruning) (#1049) · arangodb/docs@28499ee · GitHub
[go: up one dir, main page]

Skip to content
This repository was archived by the owner on Dec 13, 2023. It is now read-only.

Commit 28499ee

Browse files
jsteemannSimran-B
andauthored
Optimization for index post-filtering (early pruning) (#1049)
* docs for arangodb/arangodb#16481 * Review Co-authored-by: Simran Spiller <simran@arangodb.com>
1 parent 92ec574 commit 28499ee

File tree

2 files changed

+131
-6
lines changed

2 files changed

+131
-6
lines changed

3.10/aql/execution-and-performance-optimizer.md

Lines changed: 100 additions & 6 deletions
Original file line numberDiff line numberDiff line change
@@ -785,17 +785,21 @@ in the optimizer pipeline.
785785

786786
### Additional optimizations applied
787787

788+
#### Scan-Only Optimization
789+
788790
If a query iterates over a collection (for filtering or counting) but does not need
789791
the actual document values later, the optimizer can apply a "scan-only" optimization
790-
for *EnumerateCollectionNode*s and *IndexNode*s. In this case, it will not build up
792+
for `EnumerateCollectionNode` and `IndexNode` node types. In this case, it will not build up
791793
a result with the document data at all, which may reduce work significantly.
792794
In case the document data is actually not needed later on, it may be sensible to remove
793795
it from query strings so the optimizer can apply the optimization.
794796

795-
If the optimization is applied, it will show up as "scan only" in an AQL
796-
query's execution plan for an *EnumerateCollectionNode* or an *IndexNode*.
797+
If the optimization is applied, it will show up as `scan only` in an AQL
798+
query's execution plan for an `EnumerateCollectionNode` or an `IndexNode`.
799+
800+
#### Index-Only Optimization
797801

798-
Additionally, the optimizer can apply an "index-only" optimization for AQL queries that
802+
The optimizer can apply an "index-only" optimization for AQL queries that
799803
can satisfy the retrieval of all required document attributes directly from an index.
800804

801805
This optimization will be triggered if an index is used
@@ -809,5 +813,95 @@ on in the query.
809813
The optimization is currently available for the following index types: primary,
810814
edge, and persistent.
811815

812-
If the optimization is applied, it will show up as "index only" in an AQL
813-
query's execution plan for an *IndexNode*.
816+
If the optimization is applied, it will show up as `index only` in an AQL
817+
query's execution plan for an `IndexNode`.
818+
819+
#### Filter Projections Optimizations
820+
821+
<small>Introduced: v3.10.0</small>
822+
823+
If an index is used that does not cover all required attributes for the query,
824+
but if it is followed by filter conditions that only access attributes that are
825+
part of the index, then an optimization is applied, to only fetch matching
826+
documents. "Part of the index" here means, that all attributes referred to in
827+
the post-filter conditions are contained in the `fields` or `storedValues`
828+
attributes of the index definition.
829+
830+
For example, the optimization is applied in the following case:
831+
- There is a persistent index on the attributes `[ "value1", "value2" ]`
832+
(in this order), or there is a persistent index on just `["value1"]` and
833+
with a `storedValues` definition of `["value2"]`.
834+
- There is a filter condition on `value1` that can use the index, and a filter
835+
condition on `value2` that cannot use the index (post-filter condition).
836+
837+
Example query:
838+
839+
```aql
840+
FOR doc IN collection
841+
FILTER doc.value1 == @value1 /* uses the index */
842+
FILTER ABS(doc.value2) != @value2 /* does not use the index */
843+
RETURN doc
844+
```
845+
846+
This query's execution plan looks as follows:
847+
848+
```aql
849+
Execution plan:
850+
Id NodeType Est. Comment
851+
1 SingletonNode 1 * ROOT
852+
8 IndexNode 0 - FOR doc IN collection /* persistent index scan (filter projections: `value2`) */ FILTER (ABS(doc.`value2`) != 2) /* early pruning */
853+
7 ReturnNode 0 - RETURN doc
854+
855+
Indexes used:
856+
By Name Type Collection Unique Sparse Cache Selectivity Fields Ranges
857+
8 idx_1737498319258648576 persistent collection false false false 99.96 % [ `value1`, `value2` ] (doc.`value1` == 1)
858+
```
859+
860+
The first filter condition is transformed to an index lookup, as you can tell
861+
from the `persistent index scan` comment and the `Indexes used` section that
862+
shows the range `` doc.`value` == 1 ``. The post-filter condition
863+
`FILTER ABS(doc.value2) != 2` can be recognized as such by the `early pruning`
864+
comment that follows it.
865+
866+
The `filter projections` mentioned in the above execution plan is an indicator
867+
of the optimization being triggered.
868+
869+
Instead of fetching the full documents from the storage engine for all index
870+
entries that matched the index lookup condition, only those that also satisfy
871+
the index lookup post-filter condition are fetched.
872+
If the post-filter condition filters out a lot of documents, this optimization
873+
can significantly speed up queries that produce large result sets from index
874+
lookups but filter many of the documents away with post-filter conditions.
875+
876+
Note that the optimization can also be combined with regular projections, e.g.
877+
for the following query that returns a specific attribute from the documents
878+
only:
879+
880+
```aql
881+
FOR doc IN collection
882+
FILTER doc.value1 == @value1 /* uses the index */
883+
FILTER ABS(doc.value2) != @value2 /* does not use the index */
884+
RETURN doc.value3
885+
```
886+
887+
That query's execution plan combines projections from the index for the
888+
post-filter condition (`filter projections`) as well as regular projections
889+
(`projections`) for the processing parts of the query that follow the
890+
post-filter condition:
891+
892+
```aql
893+
Execution plan:
894+
Id NodeType Est. Comment
895+
1 SingletonNode 1 * ROOT
896+
9 IndexNode 5000 - FOR doc IN collection /* persistent index scan (filter projections: `value2`) (projections: `value3`) */ FILTER (ABS(doc.`value2`) != 2) /* early pruning */
897+
7 CalculationNode 5000 - LET #5 = doc.`value3` /* attribute expression */ /* collections used: doc : collection */
898+
8 ReturnNode 5000 - RETURN #5
899+
900+
Indexes used:
901+
By Name Type Collection Unique Sparse Cache Selectivity Fields Ranges
902+
9 idx_1737498319258648576 persistent collection false false false 99.96 % [ `value1`, `value2` ] (doc.`value1` == 1)
903+
```
904+
905+
The optimization is most effective for queries in which many documents would
906+
be selected by the index lookup condition, but many are filtered out by the
907+
post-filter condition.

3.10/release-notes-new-features310.md

Lines changed: 31 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -138,6 +138,37 @@ Query Statistics:
138138
0 0 0 71000 689 / 11 10300 98304 0.15389
139139
```
140140

141+
### Index Lookup Optimization
142+
143+
ArangoDB 3.10 features a new optimization for index lookups that can help to
144+
speed up index accesses with post-filter conditions. The optimization is
145+
triggered if an index is used that does not cover all required attributes for
146+
the query, but the index lookup post-filter conditions only access attributes
147+
that are part of the index.
148+
149+
For example, if you have a collection with an index on `value1` and `value2`,
150+
a query like below can only partially utilize the index for the lookup:
151+
152+
```aql
153+
FOR doc IN collection
154+
FILTER doc.value1 == @value1 /* uses the index */
155+
FILTER ABS(doc.value2) != @value2 /* does not use the index */
156+
RETURN doc
157+
```
158+
159+
In this case, previous versions of ArangoDB always fetched the full documents
160+
from the storage engine for all index entries that matched the index lookup
161+
conditions. 3.10 will now only fetch the full documents from the storage engine
162+
for all index entries that matched the index lookup conditions, and that satisfy
163+
the index lookup post-filter conditions, too.
164+
165+
If the post-filter conditions filter out a lot of documents, this optimization
166+
can significantly speed up queries that produce large result sets from index
167+
lookups but filter many of the documents away with post-filter conditions.
168+
169+
See [Filter Projections Optimization](aql/execution-and-performance-optimizer.html#filter-projections-optimizations)
170+
for details.
171+
141172
### Lookahead for Multi-Dimensional Indexes
142173

143174
The multi-dimensional index type `zkd` (experimental) now supports an optional

0 commit comments

Comments
 (0)
0