8000 Fix pg_dump for hash partitioning on enum columns. · postgres/postgres@012ffb3 · GitHub
[go: up one dir, main page]

Skip to content

Commit 012ffb3

Browse files
committed
Fix pg_dump for hash partitioning on enum columns.
Hash partitioning on an enum is problematic because the hash codes are derived from the OIDs assigned to the enum values, which will almost certainly be different after a dump-and-reload than they were before. This means that some rows probably end up in different partitions than before, causing restore to fail because of partition constraint violations. (pg_upgrade dodges this problem by using hacks to force the enum values to keep the same OIDs, but that's not possible nor desirable for pg_dump.) Users can work around that by specifying --load-via-partition-root, but since that's a dump-time not restore-time decision, one might find out the need for it far too late. Instead, teach pg_dump to apply that option automatically when dealing with a partitioned table that has hash-on-enum partitioning. Also deal with a pre-existing issue for --load-via-partition-root mode: in a parallel restore, we try to TRUNCATE target tables just before loading them, in order to enable some backend optimizations. This is bad when using --load-via-partition-root because (a) we're likely to suffer deadlocks from restore jobs trying to restore rows into other partitions than they came from, and (b) if we miss getting a deadlock we might still lose data due to a TRUNCATE removing rows from some already-completed restore job. The fix for this is conceptually simple: just don't TRUNCATE if we're dealing with a --load-via-partition-root case. The tricky bit is for pg_restore to identify those cases. In dumps using COPY commands we can inspect each COPY command to see if it targets the nominal target table or some ancestor. However, in dumps using INSERT commands it's pretty impractical to examine the INSERTs in advance. To provide a solution for that going forward, modify pg_dump to mark TABLE DATA items that are using --load-via-partition-root with a comment. (This change also responds to a complaint from Robert Haas that the dump output for --load-via-partition-root is pretty confusing.) pg_restore checks for the special comment as well as checking the COPY command if present. This will fail to identify the combination of --load-via-partition-root and --inserts in pre-existing dump files, but that should be a pretty rare case in the field. If it does happen you will probably get a deadlock failure that you can work around by not using parallel restore, which is the same as before this bug fix. Having done this, there seems no remaining reason for the alarmism in the pg_dump man page about combining --load-via-partition-root with parallel restore, so remove that warning. Patch by me; thanks to Julien Rouhaud for review. Back-patch to v11 where hash partitioning was introduced. Discussion: https://postgr.es/m/1376149.1675268279@sss.pgh.pa.us
1 parent b520129 commit 012ffb3

File tree

7 files changed

+289
-53
lines changed

7 files changed

+289
-53
lines changed

doc/src/sgml/ref/pg_dump.sgml

Lines changed: 0 additions & 10 deletions
Original file line numberDiff line numberDiff line change
@@ -804,16 +804,6 @@ PostgreSQL documentation
804804
and the two systems have different definitions of the collation used
805805
to sort the partitioning column.
806806
</para>
807-
808-
<para>
809-
It is best not to use parallelism when restoring from an archive made
810-
with this option, because <application>pg_restore</application> will
811-
not know exactly which partition(s) a given archive data item will
812-
load data into. This could result in inefficiency due to lock
< 8000 code>813-
conflicts between parallel jobs, or perhaps even restore failures due
814-
to foreign key constraints being set up before all the relevant data
815-
is loaded.
816-
</para>
817807
</listitem>
818808
</varlistentry>
819809

doc/src/sgml/ref/pg_dumpall.sgml

Lines changed: 0 additions & 4 deletions
Original file line numberDiff line numberDiff line change
@@ -342,10 +342,6 @@ PostgreSQL documentation
342342
and the two systems have different definitions of the collation used
343343
to sort the partitioning column.
344344
</para>
345-
346-
<!-- Currently, we don't need pg_dump's warning about parallelism here,
347-
since parallel restore from a pg_dumpall script is impossible.
348-
-->
349345
</listitem>
350346
</varlistentry>
351347

src/bin/pg_dump/common.c

Lines changed: 11 additions & 8 deletions
Original file line numberDiff line numberDiff line change
@@ -259,6 +259,10 @@ getSchemaData(Archive *fout, int *numTablesPtr)
259259
write_msg(NULL, "flagging inherited columns in subtables\n");
260260
flagInhAttrs(fout->dopt, tblinfo, numTables);
261261

262+
if (g_verbose)
263+
write_msg(NULL, "reading partitioning data");
264+
getPartitioningInfo(fout);
265+
262266
if (g_verbose)
263267
write_msg(NULL, "reading indexes\n");
264268
getIndexes(fout, tblinfo, numTables);
@@ -319,7 +323,6 @@ static void
319323
flagInhTables(Archive *fout, TableInfo *tblinfo, int numTables,
320324
InhInfo *inhinfo, int numInherits)
321325
{
322-
DumpOptions *dopt = fout->dopt;
323326
int i,
324327
j;
325328

@@ -335,18 +338,18 @@ flagInhTables(Archive *fout, TableInfo *tblinfo, int numTables,
335338
continue;
336339

337340
/*
338-
* Normally, we don't bother computing anything for non-target tables,
339-
* but if load-via-partition-root is specified, we gather information
340-
* on every partition in the system so that getRootTableInfo can trace
341-
* from any given to leaf partition all the way up to the root. (We
342-
* don't need to mark them as interesting for getTableAttrs, though.)
341+
* Normally, we don't bother computing anything for non-target tables.
342+
* However, we must find the parents of non-root partitioned tables in
343+
* any case, so that we can trace from leaf partitions up to the root
344+
* (in case a leaf is to be dumped but its parents are not). We need
345+
* not mark such parents interesting for getTableAttrs, though.
343346
*/
344347
if (!tblinfo[i].dobj.dump)
345348
{
346349
mark_parents = false;
347350

348-
if (!dopt->load_via_partition_root ||
349-
!tblinfo[i].ispartition)
351+
if (!(tblinfo[i].relkind == RELKIND_PARTITIONED_TABLE &&
352+
tblinfo[i].ispartition))
350353
find_parents = false;
351354
}
352355

src/bin/pg_dump/pg_backup_archiver.c

Lines changed: 64 additions & 9 deletions
Original file line numberDiff line numberDiff line change
@@ -76,6 +76,7 @@ static RestorePass _tocEntryRestorePass(TocEntry *te);
7676
static bool _tocEntryIsACL(TocEntry *te);
7777
static void _disableTriggersIfNecessary(ArchiveHandle *AH, TocEntry *te);
7878
static void _enableTriggersIfNecessary(ArchiveHandle *AH, TocEntry *te);
79+
static bool is_load_via_partition_root(TocEntry *te);
7980
static void buildTocEntryArrays(ArchiveHandle *AH);
8081
static void _moveBefore(ArchiveHandle *AH, TocEntry *pos, TocEntry *te);
8182
static int _discoverArchiveFormat(ArchiveHandle *AH);
@@ -864,6 +865,8 @@ restore_toc_entry(ArchiveHandle *AH, TocEntry *te, bool is_parallel)
864865
}
865866
else
866867
{
868+
bool use_truncate;
869+
867870
_disableTriggersIfNecessary(AH, te);
868871

869872
/* Select owner and schema as necessary */
@@ -875,13 +878,24 @@ restore_toc_entry(ArchiveHandle *AH, TocEntry *te, bool is_parallel)
875878

876879
/*
877880
* In parallel restore, if we created the table earlier in
878-
* the run then we wrap the COPY in a transaction and
879-
* precede it with a TRUNCATE. If archiving is not on
880-
* this prevents WAL-logging the COPY. This obtains a
881-
* speedup similar to that from using single_txn mode in
882-
* non-parallel restores.
881+
* this run (so that we know it is empty) and we are not
882+
* restoring a load-via-partition-root data item then we
883+
* wrap the COPY in a transaction and precede it with a
884+
* TRUNCATE. If wal_level is set to minimal this prevents
885+
* WAL-logging the COPY. This obtains a speedup similar
886+
* to that from using single_txn mode in non-parallel
887+
* restores.
888+
*
889+
* We mustn't do this for load-via-partition-root cases
890+
* because some data might get moved across partition
891+
* boundaries, risking deadlock and/or loss of previously
892+
* loaded data. (We assume that all partitions of a
893+
* partitioned table will be treated the same way.)
883894
*/
884-
if (is_parallel && te->created)
895+
use_truncate = is_parallel && te->created &&
896+
!is_load_via_partition_root(te);
897+
898+
if (use_truncate)
885899
{
886900
/*
887901
* Parallel restore is always talking directly to a
@@ -922,7 +936,7 @@ restore_toc_entry(ArchiveHandle *AH, TocEntry *te, bool is_parallel)
922936
AH->outputKind = OUTPUT_SQLCMDS;
923937

924938
/* close out the transaction started above */
925-
if (is_parallel && te->created)
939+
if (use_truncate)
926940
CommitTransaction(&AH->public);
927941

928942
_enableTriggersIfNecessary(AH, te);
@@ -1014,6 +1028,43 @@ _enableTriggersIfNecessary(ArchiveHandle *AH, TocEntry *te)
10141028
fmtQualifiedId(te->namespace, te->tag));
10151029
}
10161030

1031+
/*
1032+
* Detect whether a TABLE DATA TOC item is performing "load via partition
1033+
* root", that is the target table is an ancestor partition rather than the
1034+
* table the TOC item is nominally for.
1035+
*
1036+
* In newer archive files this can be detected by checking for a special
1037+
* comment placed in te->defn. In older files we have to fall back to seeing
1038+
* if the COPY statement targets the named table or some other one. This
1039+
* will not work for data dumped as INSERT commands, so we could give a false
1040+
* negative in that case; fortunately, that's a rarely-used option.
1041+
*/
1042+
static bool
1043+
is_load_via_partition_root(TocEntry *te)
1044+
{
1045+
if (te->defn &&
1046+
strncmp(te->defn, "-- load via partition root ", 27) == 0)
1047+
return true;
1048+
if (te->copyStmt && *te->copyStmt)
1049+
{
1050+
PQExpBuffer copyStmt = createPQExpBuffer();
1051+
bool result;
1052+
1053+
/*
1054+
* Build the initial part of the COPY as it would appear if the
1055+
* nominal target table is the actual target. If we see anything
1056+
* else, it must be a load-via-partition-root case.
1057+
*/
1058+
appendPQExpBuffer(copyStmt, "COPY %s ",
1059+
fmtQualifiedId(te->namespace, te->tag));
1060+
result = strncmp(te->copyStmt, copyStmt->data, copyStmt->len) != 0;
1061+
destroyPQExpBuffer(copyStmt);
1062+
return result;
1063+
}
1064+
/* Assume it's not load-via-partition-root */
1065+
return false;
1066+
}
1067+
10171068
/*
10181069
* This is a routine that is part of the dumper interface, hence the 'Archive*' parameter.
10191070
*/
@@ -2970,8 +3021,12 @@ _tocEntryRequired(TocEntry *te, teSection curSection, ArchiveHandle *AH)
29703021
res = res & ~REQ_DATA;
29713022
}
29723023

2973-
/* If there's no definition command, there's no schema component */
2974-
if (!te->defn || !te->defn[0])
3024+
/*
3025+
* If there's no definition command, there's no schema component. Treat
3026+
* "load via partition root" comments as not schema.
3027+
*/
3028+
if (!te->defn || !te->defn[0] ||
3029+
strncmp(te->defn, "-- load via partition root ", 27) == 0)
29753030
res = res & ~REQ_SCHEMA;
29763031

29773032
/*

src/bin/pg_dump/pg_dump.c

Lines changed: 131 additions & 22 deletions
Original file line numberDiff line numberDiff line change
@@ -286,6 +286,7 @@ static void appendReloptionsArrayAH(PQExpBuffer buffer, const char *reloptions,
286286
static char *get_synchronized_snapshot(Archive *fout);
287287
static void setupDumpWorker(Archive *AHX);
288288
static TableInfo *getRootTableInfo(TableInfo *tbinfo);
289+
static bool forcePartitionRootLoad(const TableInfo *tbinfo);
289290

290291

291292
int
@@ -1939,11 +1940,13 @@ dumpTableData_insert(Archive *fout, void *dcontext)
19391940
insertStmt = createPQExpBuffer();
19401941

19411942
/*
1942-
* When load-via-partition-root is set, get the root table
1943-
* name for the partition table, so that we can reload data
1944-
* through the root table.
1943+
* When load-via-partition-root is set or forced, get the root
1944+
* table name for the partition table, so that we can reload
1945+
* data through the root table.
19451946
*/
1946-
if (dopt->load_via_partition_root && tbinfo->ispartition)
1947+
if (tbinfo->ispartition &&
1948+
(dopt->load_via_partition_root ||
1949+
forcePartitionRootLoad(tbinfo)))
19471950
targettab = getRootTableInfo(tbinfo);
19481951
else
19491952
targettab = tbinfo;
@@ -2093,6 +2096,35 @@ getRootTableInfo(TableInfo *tbinfo)
20932096
return parentTbinfo;
20942097
}
20952098

2099+
/*
2100+
* forcePartitionRootLoad
2101+
* Check if we must force load_via_partition_root for this partition.
2102+
*
2103+
* This is required if any level of ancestral partitioned table has an
2104+
* unsafe partitioning scheme.
2105+
*/
2106+
static bool
2107+
forcePartitionRootLoad(const TableInfo *tbinfo)
2108+
{
2109+
TableInfo *parentTbinfo;
2110+
2111+
Assert(tbinfo->ispartition);
2112+
Assert(tbinfo->numParents == 1);
2113+
2114+
parentTbinfo = tbinfo->parents[0];
2115+
if (parentTbinfo->unsafe_partitions)
2116+
return true;
2117+
while (parentTbinfo->ispartition)
2118+
{
2119+
Assert(parentTbinfo->numParents == 1);
2120+
parentTbinfo = parentTbinfo->parents[0];
2121+
if (parentTbinfo->unsafe_partitions)
2122+
return true;
2123+
}
2124+
2125+
return false;
2126+
}
2127+
20962128
/*
20972129
* dumpTableData -
20982130
* dump the contents of a single table
@@ -2107,34 +2139,40 @@ dumpTableData(Archive *fout, TableDataInfo *tdinfo)
21072139
PQExpBuffer copyBuf = createPQExpBuffer();
21082140
PQExpBuffer clistBuf = createPQExpBuffer();
21092141
DataDumperPtr dumpFn;
2142+
char *tdDefn = "";
21102143
char *copyStmt;
21112144
const char *copyFrom;
21122145

21132146
/* We had better have loaded per-column details about this table */
21142147
Assert(tbinfo->interesting);
21152148

2149+
/*
2150+
* When load-via-partition-root is set or forced, get the root table name
2151+
* for the partition table, so that we can reload data through the root
2152+
* table. Then construct a comment to be inserted into the TOC entry's
2153+
* defn field, so that such cases can be identified reliably.
2154+
*/
2155+
if (tbinfo->ispartition &&
2156+
(dopt->load_via_partition_root ||
2157+
forcePartitionRootLoad(tbinfo)))
2158+
{
2159+
TableInfo *parentTbinfo;
2160+
2161+
parentTbinfo = getRootTableInfo(tbinfo);
2162+
copyFrom = fmtQualifiedDumpable(parentTbinfo);
2163+
printfPQExpBuffer(copyBuf, "-- load via partition root %s",
2164+
copyFrom);
2165+
tdDefn = pg_strdup(copyBuf->data);
2166+
}
2167+
else
2168+
copyFrom = fmtQualifiedDumpable(tbinfo);
2169+
21162170
if (!dopt->dump_inserts)
21172171
{
21182172
/* Dump/restore using COPY */
21192173
dumpFn = dumpTableData_copy;
2120-
2121-
/*
2122-
* When load-via-partition-root is set, get the root table name for
2123-
* the partition table, so that we can reload data through the root
2124-
* table.
2125-
*/
2126-
if (dopt->load_via_partition_root && tbinfo->ispartition)
2127-
{
2128-
TableInfo *parentTbinfo;
2129-
2130-
parentTbinfo = getRootTableInfo(tbinfo);
2131-
copyFrom = fmtQualifiedDumpable(parentTbinfo);
2132-
}
2133-
else
2134-
copyFrom = fmtQualifiedDumpable(tbinfo);
2135-
21362174
/* must use 2 steps here 'cause fmtId is nonreentrant */
2137-
appendPQExpBuffer(copyBuf, "COPY %s ",
2175+
printfPQExpBuffer(copyBuf, "COPY %s ",
21382176
copyFrom);
21392177
appendPQExpBuffer(copyBuf, "%s %sFROM stdin;\n",
21402178
fmtCopyColumnList(tbinfo, clistBuf),
@@ -2158,7 +2196,7 @@ dumpTableData(Archive *fout, TableDataInfo *tdinfo)
21582196
tbinfo->dobj.name, tbinfo->dobj.namespace->dobj.name,
21592197
NULL, tbinfo->rolname,
21602198
false, "TABLE DATA", SECTION_DATA,
2161-
"", "", copyStmt,
2199+
tdDefn, "", copyStmt,
21622200
&(tbinfo->dobj.dumpId), 1,
21632201
dumpFn, tdinfo);
21642202

@@ -6796,6 +6834,77 @@ getInherits(Archive *fout, int *numInherits)
67966834
return inhinfo;
67976835
}
67986836

6837+
/*
6838+
* getPartitioningInfo
6839+
* get information about partitioning
6840+
*
6841+
* For the most part, we only collect partitioning info about tables we
6842+
* intend to dump. However, this function has to consider all partitioned
6843+
* tables in the database, because we need to know about parents of partitions
6844+
* we are going to dump even if the parents themselves won't be dumped.
6845+
*
6846+
* Specifically, what we need to know is whether each partitioned table
6847+
* has an "unsafe" partitioning scheme that requires us to force
6848+
* load-via-partition-root mode for its children. Currently the only case
6849+
* for which we force that is hash partitioning on enum columns, since the
6850+
* hash codes depend on enum value OIDs which won't be replicated across
6851+
* dump-and-reload. There are other cases in which load-via-partition-root
6852+
* might be necessary, but we expect users to cope with them.
6853+
*/
6854+
void
6855+
getPartitioningInfo(Archive *fout)
6856+
{
6857+
PQExpBuffer query;
6858+
PGresult *res;
6859+
int ntups;
6860+
int i;
6861+
6862+
/* hash partitioning didn't exist before v11 */
6863+
if (fout->remoteVersion < 110000)
6864+
return;
6865+
/* needn't bother if schema-only dump */
6866+
if (fout->dopt->schemaOnly)
6867+
return;
6868+
6869+
query = createPQExpBuffer();
6870+
6871+
/*
6872+
* Unsafe partitioning schemes are exactly those for which hash enum_ops
6873+
* appears among the partition opclasses. We needn't check partstrat.
6874+
*
6875+
* Note that this query may well retrieve info about tables we aren't
6876+
* going to dump and hence have no lock on. That's okay since we need not
6877+
* invoke any unsafe server-side functions.
6878+
*/
6879+
appendPQExpBufferStr(query,
6880+
"SELECT partrelid FROM pg_partitioned_table WHERE\n"
6881+
"(SELECT c.oid FROM pg_opclass c JOIN pg_am a "
6882+
"ON c.opcmethod = a.oid\n"
6883+
"WHERE opcname = 'enum_ops' "
6884+
"AND opcnamespace = 'pg_catalog'::regnamespace "
6885+
"AND amname = 'hash') = ANY(partclass)");
6886+
6887+
res = ExecuteSqlQuery(fout, query->data, PGRES_TUPLES_OK);
6888+
6889+
ntups = PQntuples(res);
6890+
6891+
for (i = 0; i < ntups; i++)
6892+
{
6893+
Oid tabrelid = atooid(PQgetvalue(res, i, 0));
6894+
TableInfo *tbinfo;
6895+
6896+
tbinfo = findTableByOid(tabrelid);
6897+
if (tbinfo == NULL)
6898+
exit_horribly(NULL, "failed sanity check, table OID %u appearing in pg_partitioned_table not found\n",
6899+
tabrelid);
6900+
tbinfo->unsafe_partitions = true;
6901+
}
6902+
6903+
PQclear(res);
6904+
6905+
destroyPQExpBuffer(query);
6906+
}
6907+
67996908
/*
68006909
* getIndexes
68016910
* get information about every index on a dumpable table

0 commit comments

Comments
 (0)
0