Fixed geo point block loader slowness #136147

Kubik42 · 2025-10-07T23:55:36Z

This addresses #135891

TODO: write up explanation of what went wrong and his this PR fixes it

elasticsearchmachine · 2025-10-07T23:56:23Z

Hi @Kubik42, I've created a changelog YAML for you.

martijnvg · 2025-10-08T06:57:51Z

server/src/main/java/org/elasticsearch/index/mapper/GeoPointFieldMapper.java


        @Override
        public BlockLoader blockLoader(BlockLoaderContext blContext) {
-            if (blContext.fieldExtractPreference() == DOC_VALUES && hasDocValues()) {


Wouldn't a smaller change be to just change:

if ((blContext.fieldExtractPreference() == DOC_VALUES || isSyntheticSource)&& hasDocValues()) {

?

yes, but shouldn't we be taking the FieldExtractPreference = NONE into account as well? Also, we're falling back to loading values from _source when store = true, that seems like a bug as well.

Kubik42 · 2025-10-08T22:40:25Z

The reason for failing tests is here. Whenever there is no FieldExtractPreference (ie. FieldExtractPreference.NONE, just like in the SDH), geo_points are mapped to BytesRef. This means that when the block loader sanity checker runs, it expects the underlying element type to be BytesRef. However, since we've now started to rely on doc_values, the type is actually LONG.

Perhaps the bigger problem here is that NONE is being assigned when DOC_VALUES should be the one being assigned. This would normally not be an issue since field types generally use the same underlying types for storing values, regardless of whether they're using doc_values vs stored fields. However, geo_points are special and the way we store them changes. More specifically, if doc_values are available, we store them as Longs and when the stored flag is toggled, we store them as BytesRef; although technically its Strings, but more on that later.

Kubik42 · 2025-10-08T23:21:17Z

In my rabbit hole of understanding how BlockLoaders work, I found a somewhat of a bug thats in main. When a geo_point is specified with store = true and doc_values = false, we end up returning an imprecise result.

For example, the following command fails:

./gradlew ":x-pack:plugin:logsdb:yamlRestTest" --tests "org.elasticsearch.xpack.logsdb.LogsdbTestSuiteIT.test {yaml=/51_esql_synthetic_source/Simple from}" -Dtests.seed=76E814C77C112204 -Dtests.locale=xh -Dtests.timezone=Europe/Saratov -Druntime.java=25

with:

        Expected: "POINT (-74.00600004941225 40.712799984030426)"
             but: was "POINT (-74.006 40.7128)"
            at org.hamcrest.MatcherAssert.assertThat(MatcherAssert.java:20)
            at org.junit.Assert.assertThat(Assert.java:964)
            at org.junit.Assert.assertThat(Assert.java:930)
            at org.elasticsearch.test.rest.yaml.section.MatchAssertion.doAssert(MatchAssertion.java:100)
            at org.elasticsearch.test.rest.yaml.section.Assertion.execute(Assertion.java:68)
            at org.elasticsearch.test.rest.yaml.ESClientYamlSuiteTestCase.executeSection(ESClientYamlSuiteTestCase.java:522)
            ... 41 more

Based on what I've read in the code, this precision loss is expected. That being said, its not documented publicly.

Kubik42 · 2025-10-08T23:33:26Z

There is also a problem with using BlockStoredFieldsReader.BytesFromStringsBlockLoader(name()) even when the field is stored. This is despite the fact that during indexing, the StoredField is created with a String representation of the geo_point; code.

When block loading, I get the error: Unknown geometry type: 825699888, which comes from here.

My guess is that it has something to do with the geo_point being stored as a String, rather than being converted to WKT first. That said, even when I used WellKnownText.toWKT() I still received a similar error - Unknown geometry type: 1414416719.

For now, I've just followed what we do in GeoShapeWithDocValuesFieldMapper and changed the block loader to BlockStoredFieldsReader.BytesFromBytesRefsBlockLoader, which has worked.

This needs further discussion.

martijnvg · 2025-10-09T07:50:31Z

server/src/main/java/org/elasticsearch/index/mapper/GeoPointFieldMapper.java

        }
        if (store) {
-            context.doc().add(new StoredField(fieldType().name(), geometry.toString()));
+            context.doc()


I don't think we should be changing how geo points are stored as stored fields.

If compute engine is expecting a different type with NONE field extract preference, maybe we should just give that without falling back to source based block loader? By implementing a different block loader for this case that mimics what its synthetic source support does?

I think want to implement a block loader that return WKB binary format that is converted from numeric doc values. This is kind of what happens now with source based block loader for geoip field type (but then by synthesizing all fields).

Kubik42 · 2025-10-09T21:55:06Z

I did a little more investigating and found that the reason we end up with FieldExtractPreference.NONE is because we initialize FieldExtractExec here, where both docValuesAttributes and boundsAttributes are empty sets. This leads to the following check failing, which ultimately returns defaultPreference, which is FieldExtractPreference.NONE.

This got me thinking, why are we setting the preference to NONE for attributes with doc_values? Shouldn't it be DOC_VALUES? If we think about it, NONE implies we can load the values however we want, so we're not restricted to using doc_values for that. This made me notice that inside this function, the attribute itself contains a DataType, which in turns exposes hasDocValues(). So, ultimately, wouldn't it make more sense to just do:

if (docValuesAttributes.contains(attr) || attr.dataType().hasDocValues()) {
    return MappedFieldType.FieldExtractPreference.DOC_VALUES;
}

However, doing so causes some type inequality exceptions.

Nhat pointed out this code, which suggests that doc_values shouldn't be always be used. However, that conflicts with the definition of FieldExtractPreference.NONE, which states that we can use anything.

Kubik42 added >bug Team:StorageEngine :StorageEngine/Mapping The storage related side of mappings labels Oct 7, 2025

elasticsearchmachine added the v9.3.0 label Oct 7, 2025

Kubik42 force-pushed the kubik-geo-point-block-loader-fix branch from e7afa0b to 1f168f2 Compare October 8, 2025 03:10

martijnvg reviewed Oct 8, 2025

View reviewed changes

Fixed geo point block loader slowness

750ad28

Revert this commit

b586f4c

Kubik42 force-pushed the kubik-geo-point-block-loader-fix branch from 1f168f2 to b586f4c Compare October 9, 2025 01:08

[CI] Auto commit changes from spotless

e9ba832

martijnvg reviewed Oct 9, 2025

View reviewed changes

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Fixed geo point block loader slowness #136147

Fixed geo point block loader slowness #136147

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Fixed geo point block loader slowness #136147

Are you sure you want to change the base?

Fixed geo point block loader slowness #136147

Conversation

Uh oh!

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!