8000 Fix planner failures with overlapping mergejoin clauses in an outer j… · postgres/postgres@e7c02a5 · GitHub
[go: up one dir, main page]

Skip to content

Commit e7c02a5

Browse files
committed
Fix planner failures with overlapping mergejoin clauses in an outer join.
Given overlapping or partially redundant join clauses, for example t1 JOIN t2 ON t1.a = t2.x AND t1.b = t2.x the planner's EquivalenceClass machinery will ordinarily refactor the clauses as "t1.a = t1.b AND t1.a = t2.x", so that join processing doesn't see multiple references to the same EquivalenceClass in a list of join equality clauses. However, if the join is outer, it's incorrect to derive a restriction clause on the outer side from the join conditions, so the clause refactoring does not happen and we end up with overlapping join conditions. The code that attempted to deal with such cases had several subtle bugs, which could result in "left and right pathkeys do not match in mergejoin" or "outer pathkeys do not match mergeclauses" planner errors, if the selected join plan type was a mergejoin. (It does not appear that any actually incorrect plan could have been emitted.) The core of the problem really was failure to recognize that the outer and inner relations' pathkeys have different relationships to the mergeclause list. A join's mergeclause list is constructed by reference to the outer pathkeys, so it will always be ordered the same as the outer pathkeys, but this cannot be presumed true for the inner pathkeys. If the inner sides of the mergeclauses contain multiple references to the same EquivalenceClass ({t2.x} in the above example) then a simplistic rendering of the required inner sort order is like "ORDER BY t2.x, t2.x", but the pathkey machinery recognizes that the second sort column is redundant and throws it away. The mergejoin planning code failed to account for that behavior properly. One error was to try to generate cut-down versions of the mergeclause list from cut-down versions of the inner pathkeys in the same way as the initial construction of the mergeclause list from the outer pathkeys was done; this could lead to choosing a mergeclause list that fails to match the outer pathkeys. The other problem was that the pathkey cross-checking code in create_mergejoin_plan treated the inner and outer pathkey lists identically, whereas actually the expectations for them must be different. That led to false "pathkeys do not match" failures in some cases, and in principle could have led to failure to detect bogus plans in other cases, though there is no indication that such bogus plans could be generated. Reported by Alexander Kuzmenkov, who also reviewed this patch. This has been broken for years (back to around 8.3 according to my testing), so back-patch to all supported branches. Discussion: https://postgr.es/m/5dad9160-4632-0e47-e120-8e2082000c01@postgrespro.ru
1 parent 83fce67 commit e7c02a5

File tree

6 files changed

+322
-124
lines changed

6 files changed

+322
-124
lines changed

src/backend/optimizer/path/joinpath.c

Lines changed: 14 additions & 16 deletions
Original file line numberDiff line numberDiff line change
@@ -739,10 +739,10 @@ sort_inner_and_outer(PlannerInfo *root,
739739
outerkeys = all_pathkeys; /* no work at first one... */
740740

741741
/* Sort the mergeclauses into the corresponding ordering */
742-
cur_mergeclauses = find_mergeclauses_for_pathkeys(root,
743-
outerkeys,
744-
true,
745-
extra->mergeclause_list);
742+
cur_mergeclauses =
743+
find_mergeclauses_for_outer_pathkeys(root,
744+
outerkeys,
745+
extra->mergeclause_list);
746746

747747
/* Should have used them all... */
748748
Assert(list_length(cur_mergeclauses) == list_length(extra->mergeclause_list));
@@ -987,10 +987,10 @@ match_unsorted_outer(PlannerInfo *root,
987987
continue;
988988

989989
/* Look for useful mergeclauses (if any) */
990-
mergeclauses = find_mergeclauses_for_pathkeys(root,
991-
outerpath->pathkeys,
992-
true,
993-
extra->mergeclause_list);
990+
mergeclauses =
991+
find_mergeclauses_for_outer_pathkeys(root,
992+
outerpath->pathkeys,
993+
extra->mergeclause_list);
994994

995995
/*
996996
* Done with this outer path if no chance for a mergejoin.
@@ -1110,10 +1110,9 @@ match_unsorted_outer(PlannerInfo *root,
11101110
if (sortkeycnt < num_sortkeys)
11111111
{
11121112
newclauses =
1113-
find_mergeclauses_for_pathkeys(root,
1114-
trialsortkeys,
1115-
false,
1116-
mergeclauses);
1113+
trim_mergeclauses_for_inner_pathkeys(root,
1114+
mergeclauses,
1115+
trialsortkeys);
11171116
Assert(newclauses != NIL);
1118< 8000 code>1117
}
11191118
else
@@ -1152,10 +1151,9 @@ match_unsorted_outer(PlannerInfo *root,
11521151
if (sortkeycnt < num_sortkeys)
11531152
{
11541153
newclauses =
1155-
find_mergeclauses_for_pathkeys(root,
1156-
trialsortkeys,
1157-
false,
1158-
mergeclauses);
1154+
trim_mergeclauses_for_inner_pathkeys(root,
1155+
mergeclauses,
1156+
trialsortkeys);
11591157
Assert(newclauses != NIL);
11601158
}
11611159
else

src/backend/optimizer/path/pathkeys.c

Lines changed: 121 additions & 29 deletions
Original file line numberDiff line numberDiff line change
@@ -941,29 +941,27 @@ update_mergeclause_eclasses(PlannerInfo *root, RestrictInfo *restrictinfo)
941941
}
942942

943943
/*
944-
* find_mergeclauses_for_pathkeys
945-
* This routine attempts to find a set of mergeclauses that can be
946-
* used with a specified ordering for one of the input relations.
944+
* find_mergeclauses_for_outer_pathkeys
945+
* This routine attempts to find a list of mergeclauses that can be
946+
* used with a specified ordering for the join's outer relation.
947947
* If successful, it returns a list of mergeclauses.
948948
*
949-
* 'pathkeys' is a pathkeys list showing the ordering of an input path.
950-
* 'outer_keys' is TRUE if these keys are for the outer input path,
951-
* FALSE if for inner.
949+
* 'pathkeys' is a pathkeys list showing the ordering of an outer-rel path.
952950
* 'restrictinfos' is a list of mergejoinable restriction clauses for the
953-
* join relation being formed.
951+
* join relation being formed, in no particular order.
954952
*
955953
* The restrictinfos must be marked (via outer_is_left) to show which side
956954
* of each clause is associated with the current outer path. (See
957955
* select_mergejoin_clauses())
958956
*
959957
* The result is NIL if no merge can be done, else a maximal list of
960958
* usable mergeclauses (represented as a list of their restrictinfo nodes).
959+
* The list is ordered to match the pathkeys, as required for execution.
961960
*/
962961
List *
963-
find_mergeclauses_for_pathkeys(PlannerInfo *root,
964-
List *pathkeys,
965-
bool outer_keys,
966-
List *restrictinfos)
962+
find_mergeclauses_for_outer_pathkeys(PlannerInfo *root,
963+
List *pathkeys,
964+
List *restrictinfos)
967965
{
968966
List *mergeclauses = NIL;
969967
ListCell *i;
@@ -1004,32 +1002,29 @@ find_mergeclauses_for_pathkeys(PlannerInfo *root,
10041002
*
10051003
* It's possible that multiple matching clauses might have different
10061004
* ECs on the other side, in which case the order we put them into our
1007-
* result makes a difference in the pathkeys required for the other
1008-
* input path. However this routine hasn't got any info about which
1005+
* result makes a difference in the pathkeys required for the inner
1006+
* input rel. However this routine hasn't got any info about which
10091007
* order would be best, so we don't worry about that.
10101008
*
10111009
* It's also possible that the selected mergejoin clauses produce
1012-
* a noncanonical ordering of pathkeys for the other side, ie, we
1010+
* a noncanonical ordering of pathkeys for the inner side, ie, we
10131011
* might select clauses that reference b.v1, b.v2, b.v1 in that
10141012
* order. This is not harmful in itself, though it suggests that
1015-
* the clauses are partially redundant. Since it happens only with
1016-
* redundant query conditions, we don't bother to eliminate it.
1017-
* make_inner_pathkeys_for_merge() has to delete duplicates when
1018-
* it constructs the canonical pathkeys list, and we also have to
1019-
* deal with the case in create_mergejoin_plan().
1013+
* the clauses are partially redundant. Since the alternative is
1014+
* to omit mergejoin clauses and thereby possibly fail to generate a
1015+
* plan altogether, we live with it. make_inner_pathkeys_for_merge()
1016+
* has to delete duplicates when it constructs the inner pathkeys
1017+
* list, and we also have to deal with such cases specially in
1018+
* create_mergejoin_plan().
10201019
*----------
10211020
*/
10221021
foreach(j, restrictinfos)
10231022
{
10241023
RestrictInfo *rinfo = (RestrictInfo *) lfirst(j);
10251024
EquivalenceClass *clause_ec;
10261025

1027-
if (outer_keys)
1028-
clause_ec = rinfo->outer_is_left ?
1029-
rinfo->left_ec : rinfo->right_ec;
1030-
else
1031-
clause_ec = rinfo->outer_is_left ?
1032-
rinfo->right_ec : rinfo->left_ec;
1026+
clause_ec = rinfo->outer_is_left ?
1027+
rinfo->left_ec : rinfo->right_ec;
10331028
if (clause_ec == pathkey_ec)
10341029
matched_restrictinfos = lappend(matched_restrictinfos, rinfo);
10351030
}
@@ -1233,8 +1228,8 @@ select_outer_pathkeys_for_merge(PlannerInfo *root,
12331228
* must be applied to an inner path to make it usable with the
12341229
* given mergeclauses.
12351230
*
1236-
* 'mergeclauses' is a list of RestrictInfos for mergejoin clauses
1237-
* that will be used in a merge join.
1231+
* 'mergeclauses' is a list of RestrictInfos for the mergejoin clauses
1232+
* that will be used in a merge join, in order.
12381233
* 'outer_pathkeys' are the already-known canonical pathkeys for the outer
12391234
* side of the join.
12401235
*
@@ -1311,8 +1306,13 @@ make_inner_pathkeys_for_merge(PlannerInfo *root,
13111306
opathkey->pk_nulls_first);
13121307

13131308
/*
1314-
* Don't generate redundant pathkeys (can happen if multiple
1315-
* mergeclauses refer to same EC).
1309+
* Don't generate redundant pathkeys (which can happen if multiple
1310+
* mergeclauses refer to the same EC). Because we do this, the output
1311+
* pathkey list isn't necessarily ordered like the mergeclauses, which
1312+
* complicates life for create_mergejoin_plan(). But if we didn't,
1313+
* we'd have a noncanonical sort key list, which would be bad; for one
1314+
* reason, it certainly wouldn't match any available sort order for
1315+
* the input relation.
13161316
*/
13171317
if (!pathkey_is_redundant(pathkey, pathkeys))
13181318
pathkeys = lappend(pathkeys, pathkey);
@@ -1321,6 +1321,98 @@ make_inner_pathkeys_for_merge(PlannerInfo *root,
13211321
return pathkeys;
13221322
}
13231323

1324+
/*
1325+
* trim_mergeclauses_for_inner_pathkeys
1326+
* This routine trims a list of mergeclauses to include just those that
1327+
* work with a specified ordering for the join's inner relation.
1328+
*
1329+
* 'mergeclauses' is a list of RestrictInfos for mergejoin clauses for the
1330+
* join relation being formed, in an order known to work for the
1331+
* currently-considered sort ordering of the join's outer rel.
1332+
* 'pathkeys' is a pathkeys list showing the ordering of an inner-rel path;
1333+
* it should be equal to, or a truncation of, the result of
1334+
* make_inner_pathkeys_for_merge for these mergeclauses.
1335+
*
1336+
* What we return will be a prefix of the given mergeclauses list.
1337+
*
1338+
* We need this logic because make_inner_pathkeys_for_merge's result isn't
1339+
* necessarily in the same order as the mergeclauses. That means that if we
1340+
* consider an inner-rel pathkey list that is a truncation of that result,
1341+
* we might need to drop mergeclauses even though they match a surviving inner
1342+
* pathkey. This happens when they are to the right of a mergeclause that
1343+
* matches a removed inner pathkey.
1344+
*
1345+
* The mergeclauses must be marked (via outer_is_left) to show which side
1346+
* of each clause is associated with the current outer path. (See
1347+
* select_mergejoin_clauses())
1348+
*/
1349+
List *
1350+
trim_mergeclauses_for_inner_pathkeys(PlannerInfo *root,
1351+
List *mergeclauses,
1352+
List *pathkeys)
1353+
{
1354+
List *new_mergeclauses = NIL;
1355+
PathKey *pathkey;
1356+
EquivalenceClass *pathkey_ec;
1357+
bool matched_pathkey;
1358+
ListCell *lip;
1359+
ListCell *i;
1360+
1361+
/* No pathkeys => no mergeclauses (though we don't expect this case) */
1362+
if (pathkeys == NIL)
1363+
return NIL;
1364+
/* Initialize to consider first pathkey */
1365+
lip = list_head(pathkeys);
1366+
pathkey = (PathKey *) lfirst(lip);
1367+
pathkey_ec = pathkey->pk_eclass;
1368+
lip = lnext(lip);
1369+
matched_pathkey = false;
1370+
1371+
/* Scan mergeclauses to see how many we can use */
1372+
foreach(i, mergeclauses)
1373+
{
1374+
RestrictInfo *rinfo = (RestrictInfo *) lfirst(i);
1375+
EquivalenceClass *clause_ec;
1376+
1377+
/* Assume we needn't do update_mergeclause_eclasses again here */
1378+
1379+
/* Check clause's inner-rel EC against current pathkey */
1380+
clause_ec = rinfo->outer_is_left ?
1381+
rinfo->right_ec : rinfo->left_ec;
1382+
1383+
/* If we don't have a match, attempt to advance to next pathkey */
1384+
if (clause_ec != pathkey_ec)
1385+
{
1386+
/* If we had no clauses matching this inner pathkey, must stop */
1387+
if (!matched_pathkey)
1388+
break;
1389+
1390+
/* Advance to next inner pathkey, if any */
1391+
if (lip == NULL)
1392+
break;
1393+
pathkey = (PathKey *) lfirst(lip);
1394+
pathkey_ec = pathkey->pk_eclass;
1395+
lip = lnext(lip);
1396+
matched_pathkey = false;
1397+
}
1398+
1399+
/* If mergeclause matches current inner pathkey, we can use it */
1400+
if (clause_ec == pathkey_ec)
1401+
{
1402+
new_mergeclauses = lappend(new_mergeclauses, rinfo);
1403+
matched_pathkey = true;
1404+
}
1405+
else
1406+
{
1407+
/* Else, no hope of adding any more mergeclauses */
1408+
break;
1409+
}
1410+
}
1411+
1412+
return new_mergeclauses;
1413+
}
1414+
1415+
13241416
/****************************************************************************
13251417
* PATHKEY USEFULNESS CHECKS
13261418
*

0 commit comments

Comments
 (0)
0