8000 geo2 index selected by optimizer over primary index, misses document when LIMIT is applied · Issue #21919 · arangodb/arangodb · GitHub
[go: up one dir, main page]

Skip to content

geo2 index selected by optimizer over primary index, misses document when LIMIT is applied #21919

@matcho

Description

@matcho< 8000 /div>

Hello,

Description of the issue

Running query Q1 below to check the behavior of geo2 index with legacyPolygons: true on a test document (hence the uncommon combination of filters), a few problems appeared:

  1. geo2 index is selected over primary index even though there is a FILTER equality clause on ._key (Q1). In this specific test case this saved me from having to add an indexHint for geo2 index, but this is surprising and probably not wanted in the general case
  2. the test document is not retrieved when LIMIT clause is applied (Q1)
  3. the test document is retrieved as expected when removing the LIMIT clause (Q2) but the execution time is significantly higher due to the high number of documents matching the geo filter
  4. forcing the primary index using an indexHint, the test document is retrieved as expected (Q3)

Expected result

  • primary index is selected by the optimizer
  • even with geo2 index and LIMIT clause, the document is retrieved

Thank you,
Best regards,

Mathias

Queries

Q1: original query − AQL

LET small_geometry = {
	type: "Polygon",
	coordinates: [ [ [ 2.2294454567920354, 48.842749907537495 ], [ 2.2259337340106526, 48.8433379513919 ], [ 2.225504292565205, 48.841482626767004 ], [ 2.2289813829725063, 48.8410860251179 ], [ 2.2294454567920354, 48.842749907537495 ] ] ]
}

FOR q IN queries
	FILTER GEO_CONTAINS(small_geometry, [ q.geoloc.lon, q.geoloc.lat ])
	FILTER q._key == "11342402739"
	LIMIT 1
  RETURN {
  	type: "Point",
  	coordinates: [ q.geoloc.lon, q.geoloc.lat ]
  }

Q2: without limit − AQL

LET small_geometry = {
	type: "Polygon",
	coordinates: [ [ [ 2.2294454567920354, 48.842749907537495 ], [ 2.2259337340106526, 48.8433379513919 ], [ 2.225504292565205, 48.841482626767004 ], [ 2.2289813829725063, 48.8410860251179 ], [ 2.2294454567920354, 48.842749907537495 ] ] ]
}

FOR q IN queries
	FILTER GEO_CONTAINS(small_geometry, [ q.geoloc.lon, q.geoloc.lat ])
	FILTER q._key == "11342402739"
  RETURN {
  	type: "Point",
  	coordinates: [ q.geoloc.lon, q.geoloc.lat ]
  }

Q3: primary index hint − AQL

LET small_geometry = {
	type: "Polygon",
	coordinates: [ [ [ 2.2294454567920354, 48.842749907537495 ], [ 2.2259337340106526, 48.8433379513919 ], [ 2.225504292565205, 48.841482626767004 ], [ 2.2289813829725063, 48.8410860251179 ], [ 2.2294454567920354, 48.842749907537495 ] ] ]
}

FOR q IN queries OPTIONS { indexHint: 'primary', forceIndexHint: true }
	FILTER GEO_CONTAINS(small_geometry, [ q.geoloc.lon, q.geoloc.lat ])
	FILTER q._key == "11342402739"
	LIMIT 1
  RETURN {
  	type: "Point",
  	coordinates: [ q.geoloc.lon, q.geoloc.lat ]
  }

Explain

Q1: original query − explain

Execution plan:
 Id   NodeType          Par       Est.   Comment
  1   SingletonNode                  1   * ROOT 
 11   IndexNode               13188422     - FOR q IN queries   /* geo index scan, index scan + document lookup (filter projections: `_key`) (projections: `geoloc`.`lat`, `geoloc`.`lon`) */    LET #5 = q.`geoloc`.`lat`, #6 = q.`geoloc`.`lon`   FILTER (q.`_key` == "11342402739")   /* early pruning */   LIMIT 1 /* early reducing results */   
  8   LimitNode                      1       - LIMIT 0, 1
  9   CalculationNode     ✓          1       - LET #4 = { "type" : "Point", "coordinates" : [ #6, #5 ] }   /* simple expression */
 10   ReturnNode                     1       - RETURN #4

Indexes used:
 By   Name              Type   Collection   Unique   Sparse   Cache   Selectivity   Fields                           Stored values   Ranges
 11   idx_10070812794   geo    queries      false    true     false           n/a   [ `geoloc.lat`, `geoloc.lon` ]   [  ]            GEO_CONTAINS({ "type" : "Polygon", "coordinates" : [ [ [ 2.2294454567920354, 48.842749907537495 ], [ 2.2259337340106526, 48.8433379513919 ], [ 2.225504292565205, 48.841482626767004 ], [ 2.2289813829725063, 48.8410860251179 ], [ 2.2294454567920354, 48.842749907537495 ] ] ] }, [ q.`geoloc`.`lon`, q.`geoloc`.`lat` ])

Optimization rules applied:
 Id   Rule Name                                 Id   Rule Name                                 Id   Rule Name                        
  1   move-calculations-up                       5   move-filters-up-2                          9   reduce-extraction-to-projection  
  2   move-filters-up                            6   geo-index-optimizer                       10   optimize-projections             
  3   remove-unnecessary-calculations            7   remove-unnecessary-calculations-2         11   async-prefetch                   
  4   move-calculations-up-2                     8   move-filters-into-enumerate      

60 rule(s) executed, 1 plan(s) created, peak mem [b]: 0, exec time [s]: 0.02115

Q2: without limit − explain

Execution plan:
 Id   NodeType          Par       Est.   Comment
  1   SingletonNode                  1   * ROOT 
 10   IndexNode           ✓   13188433     - FOR q IN queries   /* geo index scan, index scan + document lookup (filter projections: `_key`) (projections: `geoloc`.`lat`, `geoloc`.`lon`) */    LET #5 = q.`geoloc`.`lat`, #6 = q.`geoloc`.`lon`   FILTER (q.`_key` == "11342402739")   /* early pruning */   
  8   CalculationNode     ✓   13188433       - LET #4 = { "type" : "Point", "coordinates" : [ #6, #5 ] }   /* simple expression */
  9   ReturnNode              13188433       - RETURN #4

Indexes used:
 By   Name              Type   Collection   Unique   Sparse   Cache   Selectivity   Fields                           Stored values   Ranges
 10   idx_10070812794   geo    queries      false    true     false           n/a   [ `geoloc.lat`, `geoloc.lon` ]   [  ]            GEO_CONTAINS({ "type" : "Polygon", "coordinates" : [ [ [ 2.2294454567920354, 48.842749907537495 ], [ 2.2259337340106526, 48.8433379513919 ], [ 2.225504292565205, 48.841482626767004 ], [ 2.2289813829725063, 48.8410860251179 ], [ 2.2294454567920354, 48.842749907537495 ] ] ] }, [ q.`geoloc`.`lon`, q.`geoloc`.`lat` ])

Optimization rules applied:
 Id   Rule Name                                 Id   Rule Name                                 Id   Rule Name                        
  1   move-calculations-up                       5   move-filters-up-2                          9   reduce-extraction-to-projection  
  2   move-filters-up                            6   geo-index-optimizer                       10   optimize-projections             
  3   remove-unnecessary-calculations            7   remove-unnecessary-calculations-2         11   async-prefetch                   
  4   move-calculations-up-2                     8   move-filters-into-enumerate      

60 rule(s) executed, 1 plan(s) created, peak mem [b]: 0, exec time [s]: 0.00116

Q3: primary index hint − explain

Execution plan:
 Id   NodeType          Par   Est.   Comment
  1   SingletonNode              1   * ROOT 
 11   IndexNode                  1     - FOR q IN queries   /* primary index scan, index scan + document lookup (filter projections: `geoloc`.`lat`, `geoloc`.`lon`) (projections: `geoloc`.`lat`, `geoloc`.`lon`) */    LET #5 = q.`geoloc`.`lat`, #6 = q.`geoloc`.`lon`   FILTER GEO_CONTAINS({ "type" : "Polygon", "coordinates" : [ [ [ 2.2294454567920354, 48.842749907537495 ], [ 2.2259337340106526, 48.8433379513919 ], [ 2.225504292565205, 48.841482626767004 ], [ 2.2289813829725063, 48.8410860251179 ], [ 2.2294454567920354, 48.842749907537495 ] ] ] }, [ q.`geoloc`.`lon`, q.`geoloc`.`lat` ])   /* early pruning */   
  8   LimitNode                  1       - LIMIT 0, 1
  9   CalculationNode     ✓      1       - LET #4 = { "type" : "Point", "coordinates" : [ #6, #5 ] }   /* simple expression */
 10   ReturnNode                 1       - RETURN #4

Indexes used:
 By   Name      Type      Collection   Unique   Sparse   Cache   Selectivity   Fields       Stored values   Ranges
 11   primary   primary   queries      true     false    false      100.00 %   [ `_key` ]   [  ]            (q.`_key` == "11342402739")

Optimization rules applied:
 Id   Rule Name                                 Id   Rule Name                                 Id   Rule Name                        
  1   move-calculations-up                       5   move-filters-up-2                          9   move-filters-into-enumerate      
  2   move-filters-up                            6   use-indexes                               10   reduce-extraction-to-projection  
  3   remove-unnecessary-calculations            7   remove-filter-covered-by-index            11   optimize-projections             
  4   move-calculations-up-2                     8   remove-unnecessary-calculations-2         12   async-prefetch                   

Optimization rules with highest execution times:
 RuleName                                    Duration [s]
 use-indexes                                      0.00026

60 rule(s) executed, 1 plan(s) created, peak mem [b]: 0, exec time [s]: 0.00143

Setup

Version 3.12.5 Enterprise

Single physical server, Ubuntu 22.04, 256G RAM, SSD

Data

collection queries with 1.3B docs.

geo2 index description:

{
  "bestIndexedLevel": 17,
  "fields": [
    "geoloc.lat",
    "geoloc.lon"
  ],
  "figures": {
    "memory": 22029898071,
    "cacheInUse": false,
    "cacheSize": 0,
    "cacheUsage": 0
  },
  "geoJson": false,
  "id": "queries/10070812794",
  "legacyPolygons": true,
  "maxNumCoverCells": 8,
  "name": "idx_10070812794",
  "sparse": true,
  "type": "geo",
  "unique": false,
  "worstIndexedLevel": 4
}

test document 11342402739 :

{
  "geoloc": {
    "lon": 2.2274945983286045,
    "lat": 48.84231988478629
  },
  "deleted": true
}

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions

      0