8000 Make row compares robust during nbtree array scans. · postgres/postgres@4cb889d · GitHub
[go: up one dir, main page]

Skip to content

Commit 4cb889d

Browse files
Make row compares robust during nbtree array scans.
Recent nbtree bugfix commit 5f4d98d added a special case to the code that sets up a page-level prefix of keys that are definitely satisfied by every tuple on the page: whenever _bt_set_startikey reached a row compare key, we'd refuse to apply the pstate.forcenonrequired behavior in scans where that usually happens (scans with a higher-order array key). That hack made the scan avoid essentially the same infinite cycling behavior that also affected nbtree scans with redundant keys (keys that preprocessing could not eliminate) prior to commit f09816a. There are now serious doubts about this row compare workaround. Testing has shown that a scan with a row compare key and an array key could still read the same leaf page twice (without the scan's direction changing), which isn't supposed to be possible following the SAOP enhancements added by Postgres 17 commit 5bf748b. Also, we still allowed a required row compare key to be used with forcenonrequired mode when its header key happened to be beyond the pstate.ikey set by _bt_set_startikey, which was complicated and brittle. The underlying problem was that row compares had inconsistent rules around how scans start (which keys can be used for initial positioning purposes) and how scans end (which keys can set continuescan=false). Quals with redundant keys that could not be eliminated by preprocessing also had that same quality to them prior to today's bugfix f09816a. It now seems prudent to bring row compare keys in line with the new charter for required keys, by making the start and end rules symmetric. This commit fixes two points of disagreement between _bt_first and _bt_check_rowcompare. Firstly, _bt_check_rowcompare was capable of ending the scan at the point where it needed to compare an ISNULL-marked row compare member that came immediately after a required row compare member. _bt_first now has symmetric handling for NULL row compares. Secondly, _bt_first had its own ideas about which keys were safe to use for initial positioning purposes. It could use fewer or more keys than _bt_check_rowcompare. _bt_first now uses the same requiredness markings as _bt_check_rowcompare for this. Now that _bt_first and _bt_check_rowcompare agree on how to start and end scans, we can get rid of the forcenonrequired special case, without any risk of infinite cycling. This approach also makes row compare keys behave more like regular scalar keys, particularly within _bt_first. Fixing these inconsistencies necessitates dealing with a related issue with the way that row compares were marked required by preprocessing: we didn't mark any lower-order row members required following 2016 bugfix commit a298a1e. That approach was over broad. The bug in question was actually an oversight in how _bt_check_rowcompare dealt with tuple NULL values that failed to satisfy a scan key marked required in the opposite scan direction (it was a bug in 2011 commits 6980f81 and 882368e, not a bug in 2006 commit 3a0a16c). Go back to marking row compare members as required using the original 2006 rules, and fix the 2016 bug in a more principled way: by limiting use of the "set continuescan=false with a key required in the opposite scan direction upon encountering a NULL tuple value" optimization to the first/most significant row member key. While it isn't safe to use an implied IS NOT NULL qualifier to end the scan when it comes from a required lower-order row compare member key, it _is_ generally safe for such a required member key to end the scan -- provided the key is marked required in the _current_ scan direction. This fixes what was arguably an oversight in either commit 5f4d98d or commit 8a51027. It is a direct follow-up to today's commit f09816a. Author: Peter Geoghegan <pg@bowt.ie> Reviewed-By: Heikki Linnakangas <heikki.linnakangas@iki.fi> Discussion: https://postgr.es/m/CAH2-Wz=pcijHL_mA0_TJ5LiTB28QpQ0cGtT-ccFV=KzuunNDDQ@mail.gmail.com Backpatch-through: 18
1 parent 7c365eb commit 4cb889d

File tree

5 files changed

+385
-202
lines changed

5 files changed

+385
-202
lines changed

src/backend/access/nbtree/nbtpreprocesskeys.c

Lines changed: 16 additions & 3 deletions
Original file line numberDiff line numberDiff line change
@@ -792,12 +792,25 @@ _bt_mark_scankey_required(ScanKey skey)
792792
if (skey->sk_flags & SK_ROW_HEADER)
793793
{
794794
ScanKey subkey = (ScanKey) DatumGetPointer(skey->sk_argument);
795+
AttrNumber attno = skey->sk_attno;
795796

796797
/* First subkey should be same column/operator as the header */
797-
Assert(subkey->sk_flags & SK_ROW_MEMBER);
798-
Assert(subkey->sk_attno == skey->sk_attno);
798+
Assert(subkey->sk_attno == attno);
799799
Assert(subkey->sk_strategy == skey->sk_strategy);
800-
subkey->sk_flags |= addflags;
800+
801+
for (;;)
802+
{
803+
Assert(subkey->sk_flags & SK_ROW_MEMBER);
804+
if (subkey->sk_attno != attno)
805+
break; /* non-adjacent key, so not required */
806+
if (subkey->sk_strategy != skey->sk_strategy)
807+
break; /* wrong direction, so not required */
808+
subkey->sk_flags |= addflags;
809+
if (subkey->sk_flags & SK_ROW_END)
810+
break;
811+
subkey++;
812+
attno++;
813+
}
801814
}
802815
}
803816

src/backend/access/nbtree/nbtsearch.c

Lines changed: 143 additions & 102 deletions
Original file line numberDiff line numberDiff line change
@@ -1016,8 +1016,8 @@ _bt_first(IndexScanDesc scan, ScanDirection dir)
10161016
* traversing a lot of null entries at the start of the scan.
10171017
*
10181018
* In this loop, row-comparison keys are treated the same as keys on their
1019-
* first (leftmost) columns. We'll add on lower-order columns of the row
1020-
* comparison below, if possible.
1019+
* first (leftmost) columns. We'll add all lower-order columns of the row
1020+
* comparison that were marked required during preprocessing below.
10211021
*
10221022
* _bt_advance_array_keys needs to know exactly how we'll reposition the
10231023
* scan (should it opt to schedule another primitive index scan). It is
@@ -1261,139 +1261,179 @@ _bt_first(IndexScanDesc scan, ScanDirection dir)
12611261
Assert(keysz <= INDEX_MAX_KEYS);
12621262
for (int i = 0; i < keysz; i++)
12631263
{
1264-
ScanKey cur = startKeys[i];
1264+
ScanKey bkey = startKeys[i];
12651265

1266-
Assert(cur->sk_attno == i + 1);
1266+
Assert(bkey->sk_attno == i + 1);
12671267

1268-
if (cur->sk_flags & SK_ROW_HEADER)
1268+
if (bkey->sk_flags & SK_ROW_HEADER)
12691269
{
12701270
/*
12711271
* Row comparison header: look to the first row member instead
12721272
*/
1273-
ScanKey subkey = (ScanKey) DatumGetPointer(cur->sk_argument);
1273+
ScanKey subkey = (ScanKey) DatumGetPointer(bkey->sk_argument);
1274+
bool loosen_strat = false,
1275+
tighten_strat = false;
12741276

12751277
/*
12761278
* Cannot be a NULL in the first row member: _bt_preprocess_keys
12771279
* would've marked the qual as unsatisfiable, preventing us from
12781280
* ever getting this far
12791281
*/
12801282
Assert(subkey->sk_flags & SK_ROW_MEMBER);
1281-
Assert(subkey->sk_attno == cur->sk_attno);
1283+
Assert(subkey->sk_attno == bkey->sk_attno);
12821284
Assert(!(subkey->sk_flags & SK_ISNULL));
12831285

1286+
/*
1287+
* This is either a > or >= key (during backwards scans it is
1288+
* either < or <=) that was marked required during preprocessing.
1289+
* Later so->keyData[] keys can't have been marked required, so
1290+
* our row compare header key must be the final startKeys[] entry.
1291+
*/
1292+
Assert(subkey->sk_flags & (SK_BT_REQFWD | SK_BT_REQBKWD));
1293+
Assert(i == keysz - 1);
1294+
12841295
/*
12851296
* The member scankeys are already in insertion format (ie, they
12861297
* have sk_func = 3-way-comparison function)
12871298
*/
12881299
memcpy(inskey.scankeys + i, subkey, sizeof(ScanKeyData));
12891300

12901301
/*
1291-
* If the row comparison is the last positioning key we accepted,
1292-
* try to add additional keys from the lower-order row members.
1293-
* (If we accepted independent conditions on additional index
1294-
* columns, we use those instead --- doesn't seem worth trying to
1295-
* determine which is more restrictive.) Note that this is OK
1296-
* even if the row comparison is of ">" or "<" type, because the
1297-
* condition applied to all but the last row member is effectively
1298-
* ">=" or "<=", and so the extra keys don't break the positioning
1299-
* scheme. But, by the same token, if we aren't able to use all
1300-
* the row members, then the part of the row comparison that we
1301-
* did use has to be treated as just a ">=" or "<=" condition, and
1302-
* so we'd better adjust strat_total accordingly.
1302+
* Now look to later row compare members.
1303+
*
1304+
* If there's an "index attribute gap" between two row compare
1305+
* members, the second member won't have been marked required, and
1306+
* so can't be used as a starting boundary key here. The part of
1307+
* the row comparison that we do still use has to be treated as a
1308+
* ">=" or "<=" condition. For example, a qual "(a, c) > (1, 42)"
1309+
* with an omitted intervening index attribute "b" will use an
1310+
* insertion scan key "a >= 1". Even the first "a = 1" tuple on
1311+
* the leaf level might satisfy the row compare qual.
1312+
*
1313+
* We're able to use a _more_ restrictive strategy when we reach a
1314+
* NULL row compare member, since they're always unsatisfiable.
1315+
* For example, a qual "(a, b, c) >= (1, NULL, 77)" will use an
1316+
* insertion scan key "a > 1". All tuples where "a = 1" cannot
1317+
* possibly satisfy the row compare qual, so this is safe.
13031318
*/
1304-
if (i == keysz - 1)
1319+
Assert(!(subkey->sk_flags & SK_ROW_END));
1320+
for (;;)
13051321
{
1306-
bool used_all_subkeys = false;
1322+
subkey++;
1323+
Assert(subkey->sk_flags & SK_ROW_MEMBER);
13071324

1308-
Assert(!(subkey->sk_flags & SK_ROW_END));
1309-
for (;;)
1325+
if (subkey->sk_flags & SK_ISNULL)
13101326
{
1311-
subkey++;
1312-
Assert(subkey->sk_flags & SK_ROW_MEMBER);
1313-
if (subkey->sk_attno != keysz + 1)
1314-
break; /* out-of-sequence, can't use it */
1315-
if (subkey->sk_strategy != cur->sk_strategy)
1316-
break; /* wrong direction, can't use it */
1317-
if (subkey->sk_flags & SK_ISNULL)
1318-
break; /* can't use null keys */
1319-
Assert(keysz < INDEX_MAX_KEYS);
1320-
memcpy(inskey.scankeys + keysz, subkey,
1321-
sizeof(ScanKeyData));
1322-
keysz++;
1323-
if (subkey->sk_flags & SK_ROW_END)
1324-
{
1325-
used_all_subkeys = true;
1326-
break;
1327-
}
1327+
/*
1328+
* NULL member key, can only use earlier keys.
1329+
*
1330+
* We deliberately avoid checking if this key is marked
1331+
* required. All earlier keys are required, and this key
1332+
* is unsatisfiable either way, so we can't miss anything.
1333+
*/
1334+
tighten_strat = true;
1335+
break;
13281336
}
1329-
if (!used_all_subkeys)
1337+
1338+
if (!(subkey->sk_flags & (SK_BT_REQFWD | SK_BT_REQBKWD)))
13301339
{
1331-
switch (strat_total)
1332-
{
1333-
case BTLessStrategyNumber:
1334-
strat_total = BTLessEqualStrategyNumber;
1335-
break;
1336-
case BTGreaterStrategyNumber:
1337-
strat_total = BTGreaterEqualStrategyNumber;
1338-
break;
1339-
}
1340+
/* nonrequired member key, can only use earlier keys */
1341+
loosen_strat = true;
1342+
break;
13401343
}
1341-
break; /* done with outer loop */
1344+
1345+
Assert(subkey->sk_attno == keysz + 1);
1346+
Assert(subkey->sk_strategy == bkey->sk_strategy);
1347+
Assert(keysz < INDEX_MAX_KEYS);
1348+
1349+
memcpy(inskey.scankeys + keysz, subkey,
1350+
sizeof(ScanKeyData));
1351+
keysz++;
1352+
if (subkey->sk_flags & SK_ROW_END)
1353+
break;
13421354
}
1343-
}
1344-
else
1345-
{
1346-
/*
1347-
* Ordinary comparison key. Transform the search-style scan key
1348-
* to an insertion scan key by replacing the sk_func with the
1349-
* appropriate btree comparison function.
1350-
*
1351-
* If scankey operator is not a cross-type comparison, we can use
1352-
* the cached comparison function; otherwise gotta look it up in
1353-
* the catalogs. (That can't lead to infinite recursion, since no
1354-
* indexscan initiated by syscache lookup will use cross-data-type
1355-
* operators.)
1356-
*
1357-
* We support the convention that sk_subtype == InvalidOid means
1358-
* the opclass input type; this is a hack to simplify life for
1359-
* ScanKeyInit().
1360-
*/
1361-
if (cur->sk_subtype == rel->rd_opcintype[i] ||
1362-
cur->sk_subtype == InvalidOid)
1355+
Assert(!(loosen_strat && tighten_strat));
1356+
if (loosen_strat)
13631357
{
1364-
FmgrInfo *procinfo;
1365-
1366-
procinfo = index_getprocinfo(rel, cur->sk_attno, BTORDER_PROC);
1367-
ScanKeyEntryInitializeWithInfo(inskey.scankeys + i,
1368-
cur->sk_flags,
1369-
cur->sk_attno,
1370-
InvalidStrategy,
1371-
cur->sk_subtype,
1372-
cur->sk_collation,
1373-
procinfo,
1374-
cur->sk_argument);
1358+
/* Use less restrictive strategy (and fewer member keys) */
1359+
switch (strat_total)
1360+
{
1361+
case BTLessStrategyNumber:
1362+
strat_total = BTLessEqualStrategyNumber;
1363+
break;
1364+
case BTGreaterStrategyNumber:
1365+
strat_total = BTGreaterEqualStrategyNumber;
1366+
break;
1367+
}
13751368
}
1376-
else
1369+
if (tighten_strat)
13771370
{
1378-
RegProcedure cmp_proc;
1379-
1380-
cmp_proc = get_opfamily_proc(rel->rd_opfamily[i],
1381-
rel->rd_opcintype[i],
1382-
cur->sk_subtype,
1383-
BTORDER_PROC);
1384-
if (!RegProcedureIsValid(cmp_proc))
1385-
elog(ERROR, "missing support function %d(%u,%u) for attribute %d of index \"%s\"",
1386-
BTORDER_PROC, rel->rd_opcintype[i], cur->sk_subtype,
1387-
cur->sk_attno, RelationGetRelationName(rel));
1388-
ScanKeyEntryInitialize(inskey.scankeys + i,
1389-
cur->sk_flags,
1390-
cur->sk_attno,
1391-
InvalidStrategy,
1392-
cur->sk_subtype,
1393-
cur->sk_collation,
1394-
cmp_proc,
1395-
cur->sk_argument);
1371+
/* Use more restrictive strategy (and fewer member keys) */
1372+
switch (strat_total)
1373+
{
1374+
case BTLessEqualStrategyNumber:
1375+
strat_total = BTLessStrategyNumber;
1376+
break;
1377+
case BTGreaterEqualStrategyNumber:
1378+
strat_total = BTGreaterStrategyNumber;
1379+
break;
1380+
}
13961381
}
1382+
1383+
/* done adding to inskey (row comparison keys always come last) */
1384+
break;
1385+
}
1386+
1387+
/*
1388+
* Ordinary comparison key/search-style key.
1389+
*
1390+
* Transform the search-style scan key to an insertion scan key by
1391+
* replacing the sk_func with the appropriate btree 3-way-comparison
1392+
* function.
1393+
*
1394+
* If scankey operator is not a cross-type comparison, we can use the
1395+
* cached comparison function; otherwise gotta look it up in the
1396+
* catalogs. (That can't lead to infinite recursion, since no
1397+
* indexscan initiated by syscache lookup will use cross-data-type
1398+
* operators.)
1399+
*
1400+
* We support the convention that sk_subtype == InvalidOid means the
1401+
* opclass input type; this hack simplifies life for ScanKeyInit().
1402+
*/
1403+
if (bkey->sk_subtype == rel->rd_opcintype[i] ||
1404+
bkey->sk_subtype == InvalidOid)
1405+
{
1406+
FmgrInfo *procinfo;
1407+
1408+
procinfo = index_getprocinfo(rel, bkey->sk_attno, BTORDER_PROC);
1409+
ScanKeyEntryInitializeWithInfo(inskey.scankeys + i,
1410+
bkey->sk_flags,
1411+
bkey->sk_attno,
1412+
InvalidStrategy,
1413+
bkey->sk_subtype,
1414+
bkey->sk_collation,
1415+
procinfo,
1416+
bkey->sk_argument);
1417+
}
1418+
else
1419+
{
1420+
RegProcedure cmp_proc;
1421+
1422+
cmp_proc = get_opfamily_proc(rel->rd_opfamily[i],
1423+
rel->rd_opcintype[i],
1424+
bkey->sk_subtype, BTORDER_PROC);
1425+
if (!RegProcedureIsValid(cmp_proc))
1426+
elog(ERROR, "missing support function %d(%u,%u) for attribute %d of index \"%s\"",
1427+
BTORDER_PROC, rel->rd_opcintype[i], bkey->sk_subtype,
1428+
bkey->sk_attno, RelationGetRelationName(rel));
1429+
ScanKeyEntryInitialize(inskey.scankeys + i,
1430+
bkey->sk_flags,
1431+
bkey->sk_attno,
1432+
InvalidStrategy,
1433+
bkey->sk_subtype,
1434+
bkey->sk_collation,
1435+
cmp_proc,
1436+
bkey->sk_argument);
13971437
}
13981438
}
13991439

@@ -1482,6 +1522,8 @@ _bt_first(IndexScanDesc scan, ScanDirection dir)
14821522

14831523
if (!BufferIsValid(so->currPos.buf))
14841524
{
1525+
Assert(!so->needPrimScan);
1526+
14851527
/*
14861528
* We only get here if the index is completely empty. Lock relation
14871529
* because nothing finer to lock exists. Without a buffer lock, it's
@@ -1500,7 +1542,6 @@ _bt_first(IndexScanDesc scan, ScanDirection dir)
15001542

15011543
if (!BufferIsValid(so->currPos.buf))
15021544
{
1503-
Assert(!so->needPrimScan);
15041545
_bt_parallel_done(scan);
15051546
return false;
15061547
}

0 commit comments

Comments
 (0)
0