Disable abbreviated keys for string-sorting in non-C locales. · prmdeveloper/postgres@8aa6e97 · GitHub

Commit 8aa6e97

Disable abbreviated keys for string-sorting in non-C locales.
Unfortunately, every version of glibc thus far tested has bugs whereby strcoll() ordering does not match strxfrm() ordering as required by the standard. This can result in, for example, corrupted indexes. Disabling abbreviated keys in these cases slows down non-C-collation string sorting considerably, but there seems to be no practical alternative. Users who are confident that their libc implementations are solid in this regard can re-enable the optimization by compiling with TRUST_STRXFRM.

Users who have built indexes using PostgreSQL 9.5 or PostgreSQL 9.5.1 should REINDEX if there is a possibility that they may have been affected by this problem.

Report by Marc-Olaf Jaschke. Investigation mostly by Tom Lane, with help from Peter Geoghegan, Noah Misch, Stephen Frost, and me. Patch by me, reviewed by Peter Geoghegan and Tom Lane.
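The property the commit message relies on is that, for any two strings, strcoll() on the originals and strcmp() on their strxfrm() images give the same sign. A minimal sketch of checking that property for one pair of strings might look like the following. The helper names (`sign_of`, `strxfrm_cmp`, `collation_consistent`) are hypothetical and are not part of PostgreSQL; the sketch only uses standard C library calls.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

/* Reduce any comparison result to -1, 0, or +1. */
int sign_of(int v)
{
    return (v > 0) - (v < 0);
}

/*
 * Compare a and b by their strxfrm() images under the current LC_COLLATE.
 * Returns -1, 0, or +1. Hypothetical helper, not PostgreSQL code.
 */
int strxfrm_cmp(const char *a, const char *b)
{
    size_t alen, blen;
    char *abuf, *bbuf;
    int result;

    assert(a != NULL && b != NULL);

    /* With a size of 0, strxfrm() only reports the length it needs. */
    alen = strxfrm(NULL, a, 0) + 1;
    blen = strxfrm(NULL, b, 0) + 1;
    abuf = malloc(alen);
    bbuf = malloc(blen);
    if (abuf == NULL || bbuf == NULL)
        abort();

    strxfrm(abuf, a, alen);
    strxfrm(bbuf, b, blen);
    result = sign_of(strcmp(abuf, bbuf));

    free(abuf);
    free(bbuf);
    return result;
}

/* True if strcoll() and strxfrm() agree on the ordering of a and b. */
bool collation_consistent(const char *a, const char *b)
{
    return sign_of(strcoll(a, b)) == strxfrm_cmp(a, b);
}
```

A program starts in the "C" locale, where the two functions agree trivially, so a meaningful check would first call setlocale(LC_COLLATE, ...) with the locale under test and then run `collation_consistent` over many string pairs. A single passing pair proves nothing; a single failing pair is enough to show that the libc is affected.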
1 parent 1548c78 commit 8aa6e97


src/backend/utils/adt/varlena.c

Lines changed: 23 additions & 10 deletions
@@ -1806,17 +1806,30 @@ btsortsupport_worker(SortSupport ssup, Oid collid)
     }
 
     /*
-     * It's possible that there are platforms where the use of abbreviated
-     * keys should be disabled at compile time.  Having only 4 byte datums
-     * could make worst-case performance drastically more likely, for example.
-     * Moreover, Darwin's strxfrm() implementations is known to not
-     * effectively concentrate a significant amount of entropy from the
-     * original string in earlier transformed blobs.  It's possible that other
-     * supported platforms are similarly encumbered.  However, even in those
-     * cases, the abbreviated keys optimization may win, and if it doesn't,
-     * the "abort abbreviation" code may rescue us.  So, for now, we don't
-     * disable this anywhere on the basis of performance.
+     * Unfortunately, it seems that abbreviation for non-C collations is
+     * broken on many common platforms; testing of multiple versions of glibc
+     * reveals that, for many locales, strcoll() and strxfrm() do not return
+     * consistent results, which is fatal to this optimization.  While no
+     * other libc other than Cygwin has so far been shown to have a problem,
+     * we take the conservative course of action for right now and disable
+     * this categorically.  (Users who are certain this isn't a problem on
+     * their system can define TRUST_STRXFRM.)
+     *
+     * Even apart from the risk of broken locales, it's possible that there
+     * are platforms where the use of abbreviated keys should be disabled at
+     * compile time.  Having only 4 byte datums could make worst-case
+     * performance drastically more likely, for example.  Moreover, Darwin's
+     * strxfrm() implementations is known to not effectively concentrate a
+     * significant amount of entropy from the original string in earlier
+     * transformed blobs.  It's possible that other supported platforms are
+     * similarly encumbered.  So, if we ever get past disabling this
+     * categorically, we may still want or need to disable it for particular
+     * platforms.
      */
+#ifndef TRUST_STRXFRM
+    if (!collate_c)
+        abbreviate = false;
+#endif
 
     /*
      * If we're using abbreviated keys, or if we're using a locale-aware
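
To make the comment's reasoning concrete, here is a simplified sketch of the abbreviated-key idea and of the compile-time guard this patch adds. It is not the PostgreSQL implementation: `ABBREV_LEN`, `abbreviation_allowed`, `make_abbrev_key`, and `abbrev_compare` are hypothetical names, and the key is a fixed 8-byte prefix of the strxfrm() image rather than a Datum. The sketch assumes the usual scheme of comparing the short prefix first and falling back to strcoll() only on a tie.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Assumed key width for this sketch; PostgreSQL uses the Datum size. */
#define ABBREV_LEN sizeof(uint64_t)

/* Same shape as the patch's guard; collate_c is supplied by the caller. */
bool abbreviation_allowed(bool collate_c)
{
    bool abbreviate = true;

#ifndef TRUST_STRXFRM
    if (!collate_c)
        abbreviate = false;
#endif
    return abbreviate;
}

/*
 * Fill key with the first ABBREV_LEN bytes of the strxfrm() image of s,
 * zero-padded.  Returns false on allocation failure.
 */
bool make_abbrev_key(const char *s, unsigned char key[ABBREV_LEN])
{
    size_t need;
    char *blob;

    assert(s != NULL && key != NULL);

    need = strxfrm(NULL, s, 0) + 1;
    blob = malloc(need);
    if (blob == NULL)
        return false;

    strxfrm(blob, s, need);
    memset(key, 0, ABBREV_LEN);
    memcpy(key, blob, need - 1 < ABBREV_LEN ? need - 1 : ABBREV_LEN);
    free(blob);
    return true;
}

/* Compare by abbreviated key first; fall back to strcoll() on a tie. */
int abbrev_compare(const char *a, const char *b)
{
    unsigned char ka[ABBREV_LEN], kb[ABBREV_LEN];
    int c;

    if (!make_abbrev_key(a, ka) || !make_abbrev_key(b, kb))
        return strcoll(a, b);   /* no key available, compare in full */

    c = memcmp(ka, kb, ABBREV_LEN);
    return c != 0 ? c : strcoll(a, b);
}
```

The sketch shows why inconsistent results are fatal: whenever the key prefixes differ, the order is decided from strxfrm() output alone, while ties (and any code path that skips abbreviation) are decided by strcoll(). If the two functions disagree, the same pair of strings can sort differently depending on which path was taken. It also shows why a strxfrm() that puts little entropy in its leading bytes hurts performance: most comparisons tie on the key and pay for the full strcoll() anyway.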
