-
Notifications
You must be signed in to change notification settings - Fork 869
Open
Description
Hello,
Description of the issue
Running query Q1 below to check the behavior of geo2
index with legacyPolygons: true
on a test document (hence the uncommon combination of filters), a few problems appeared:
geo2
index i 8000 s selected overprimary
index even though there is aFILTER
equality clause on._key
(Q1). In this specific test case this saved me from having to add anindexHint
forgeo2
index, but this is surprising and probably not wanted in the general case- the test document is not retrieved when
LIMIT
clause is applied (Q1) - the test document is retrieved as expected when removing the
LIMIT
clause (Q2) but the execution time is significantly higher due to the high number of documents matching the geo filter - forcing the
primary
index using anindexHint
, the test document is retrieved as expected (Q3)
Expected result
primary
index is selected by the optimizer- even with
geo2
index andLIMIT
clause, the document is retrieved
Thank you,
Best regards,
Mathias
Queries
Q1: original query − AQL
LET small_geometry = {
type: "Polygon",
coordinates: [ [ [ 2.2294454567920354, 48.842749907537495 ], [ 2.2259337340106526, 48.8433379513919 ], [ 2.225504292565205, 48.841482626767004 ], [ 2.2289813829725063, 48.8410860251179 ], [ 2.2294454567920354, 48.842749907537495 ] ] ]
}
FOR q IN queries
FILTER GEO_CONTAINS(small_geometry, [ q.geoloc.lon, q.geoloc.lat ])
FILTER q._key == "11342402739"
LIMIT 1
RETURN {
type: "Point",
coordinates: [ q.geoloc.lon, q.geoloc.lat ]
}
Q2: without limit − AQL
LET small_geometry = {
type: "Polygon",
coordinates: [ [ [ 2.2294454567920354, 48.842749907537495 ], [ 2.2259337340106526, 48.8433379513919 ], [ 2.225504292565205, 48.841482626767004 ], [ 2.2289813829725063, 48.8410860251179 ], [ 2.2294454567920354, 48.842749907537495 ] ] ]
}
FOR q IN queries
FILTER GEO_CONTAINS(small_geometry, [ q.geoloc.lon, q.geoloc.lat ])
FILTER q._key == "11342402739"
RETURN {
type: "Point",
coordinates: [ q.geoloc.lon, q.geoloc.lat ]
}
Q3: primary index hint − AQL
LET small_geometry = {
type: "Polygon",
coordinates: [ [ [ 2.2294454567920354, 48.842749907537495 ], [ 2.2259337340106526, 48.8433379513919 ], [ 2.225504292565205, 48.841482626767004 ], [ 2.2289813829725063, 48.8410860251179 ], [ 2.2294454567920354, 48.842749907537495 ] ] ]
}
FOR q IN queries OPTIONS { indexHint: 'primary', forceIndexHint: true }
FILTER GEO_CONTAINS(small_geometry, [ q.geoloc.lon, q.geoloc.lat ])
FILTER q._key == "11342402739"
LIMIT 1
RETURN {
type: "Point",
coordinates: [ q.geoloc.lon, q.geoloc.lat ]
}
Explain
Q1: original query − explain
Execution plan:
Id NodeType Par Est. Comment
1 SingletonNode 1 * ROOT
11 IndexNode 13188422 - FOR q IN queries /* geo index scan, index scan + document lookup (filter projections: `_key`) (projections: `geoloc`.`lat`, `geoloc`.`lon`) */ LET #5 = q.`geoloc`.`lat`, #6 = q.`geoloc`.`lon` FILTER (q.`_key` == "11342402739") /* early pruning */ LIMIT 1 /* early reducing results */
8 LimitNode 1 - LIMIT 0, 1
9 CalculationNode ✓ 1 - LET #4 = { "type" : "Point", "coordinates" : [ #6, #5 ] } /* simple expression */
10 ReturnNode 1 - RETURN #4
Indexes used:
By Name Type Collection Unique Sparse Cache Selectivity Fields Stored values Ranges
11 idx_10070812794 geo queries false true false n/a [ `geoloc.lat`, `geoloc.lon` ] [ ] GEO_CONTAINS({ "type" : "Polygon", "coordinates" : [ [ [ 2.2294454567920354, 48.842749907537495 ], [ 2.2259337340106526, 48.8433379513919 ], [ 2.225504292565205, 48.841482626767004 ], [ 2.2289813829725063, 48.8410860251179 ], [ 2.2294454567920354, 48.842749907537495 ] ] ] }, [ q.`geoloc`.`lon`, q.`geoloc`.`lat` ])
Optimization rules applied:
Id Rule Name Id Rule Name Id Rule Name
1 move-calculations-up 5 move-filters-up-2 9 reduce-extraction-to-projection
2 move-filters-up 6 geo-index-optimizer 10 optimize-projections
3 remove-unnecessary-calculations 7 remove-unnecessary-calculations-2 11 async-prefetch
4 move-calculations-up-2 8 move-filters-into-enumerate
60 rule(s) executed, 1 plan(s) created, peak mem [b]: 0, exec time [s]: 0.02115
Q2: without limit − explain
Execution plan:
Id NodeType Par Est. Comment
1 SingletonNode 1 * ROOT
10 IndexNode ✓ 13188433 - FOR q IN queries /* geo index scan, index scan + document lookup (filter projections: `_key`) (projections: `geoloc`.`lat`, `geoloc`.`lon`) */ LET #5 = q.`geoloc`.`lat`, #6 = q.`geoloc`.`lon` FILTER (q.`_key` == "11342402739") /* early pruning */
8 CalculationNode ✓ 13188433 - LET #4 = { "type" : "Point", "coordinates" : [ #6, #5 ] } /* simple expression */
9 ReturnNode 13188433 - RETURN #4
Indexes used:
By Name Type Collection Unique Sparse Cache Selectivity Fields Stored values Ranges
10 idx_10070812794 geo queries false true false n/a [ `geoloc.lat`, `geoloc.lon` ] [ ] GEO_CONTAINS({ "type" : "Polygon", "coordinates" : [ [ [ 2.2294454567920354, 48.842749907537495 ], [ 2.2259337340106526, 48.8433379513919 ], [ 2.225504292565205, 48.841482626767004 ], [ 2.2289813829725063, 48.8410860251179 ], [ 2.2294454567920354, 48.842749907537495 ] ] ] }, [ q.`geoloc`.`lon`, q.`geoloc`.`lat` ])
Optimization rules applied:
Id Rule Name Id Rule Name Id Rule Name
1 move-calculations-up 5 move-filters-up-2 9 reduce-extraction-to-projection
2 move-filters-up 6 geo-index-optimizer 10 optimize-projections
3 remove-unnecessary-calculations 7 remove-unnecessary-calculations-2 11 async-prefetch
4 move-calculations-up-2 8 move-filters-into-enumerate
60 rule(s) executed, 1 plan(s) created, peak mem [b]: 0, exec time [s]: 0.00116
Q3: primary index hint − explain
Execution plan:
Id NodeType Par Est. Comment
1 SingletonNode 1 * ROOT
11 IndexNode 1 - FOR q IN queries /* primary index scan, index scan + document lookup (filter projections: `geoloc`.`lat`, `geoloc`.`lon`) (projections: `geoloc`.`lat`, `geoloc`.`lon`) */ LET #5 = q.`geoloc`.`lat`, #6 = q.`geoloc`.`lon` FILTER GEO_CONTAINS({ "type" : "Polygon", "coordinates" : [ [ [ 2.2294454567920354, 48.842749907537495 ], [ 2.2259337340106526, 48.8433379513919 ], [ 2.225504292565205, 48.841482626767004 ], [ 2.2289813829725063, 48.8410860251179 ], [ 2.2294454567920354, 48.842749907537495 ] ] ] }, [ q.`geoloc`.`lon`, q.`geoloc`.`lat` ]) /* early pruning */
8 LimitNode 1 - LIMIT 0, 1
9 CalculationNode ✓ 1 - LET #4 = { "type" : "Point", "coordinates" : [ #6, #5 ] } /* simple expression */
10 ReturnNode 1 - RETURN #4
Indexes used:
By Name Type Collection Unique Sparse Cache Selectivity Fields Stored values Ranges
11 primary primary queries true false false 100.00 % [ `_key` ] [ ] (q.`_key` == "11342402739")
Optimization rules applied:
Id Rule Name Id Rule Name Id Rule Name
1 move-calculations-up 5 move-filters-up-2 9 move-filters-into-enumerate
2 move-filters-up 6 use-indexes 10 reduce-extraction-to-projection
3 remove-unnecessary-calculations 7 remove-filter-covered-by-index 11 optimize-projections
4 move-calculations-up-2 8 remove-unnecessary-calculations-2 12 async-prefetch
Optimization rules with highest execution times:
RuleName Duration [s]
use-indexes 0.00026
60 rule(s) executed, 1 plan(s) created, peak mem [b]: 0, exec time [s]: 0.00143
Setup
Version 3.12.5
Enterprise
Single physical server, Ubuntu 22.04, 256G RAM, SSD
Data
collection queries
with 1.3B docs.
geo2
index description:
{
"bestIndexedLevel": 17,
"fields": [
"geoloc.lat",
"geoloc.lon"
],
"figures": {
"memory": 22029898071,
"cacheInUse": false,
"cacheSize": 0,
"cacheUsage": 0
},
"geoJson": false,
"id": "queries/10070812794",
"legacyPolygons": true,
"maxNumCoverCells": 8,
"name": "idx_10070812794",
"sparse": true,
"type": "geo",
"unique": false,
"worstIndexedLevel": 4
}
test document 11342402739
:
{
"geoloc": {
"lon": 2.2274945983286045,
"lat": 48.84231988478629
},
"deleted": true
}
HugoGresseAntoineAA
Metadata
Metadata
Assignees
Labels
No labels