8000 Support INSERT ... ON CONFLICT {UPDATE | IGNORE} · petergeoghegan/postgres@30203a0 · GitHub
[go: up one dir, main page]

Skip to content

Commit 30203a0

Browse files
author
Peter Geoghegan
committed
Support INSERT ... ON CONFLICT {UPDATE | IGNORE}
This non-standard INSERT clause allows DML statement authors to specify that in the event of each of any of the tuples being inserted duplicating an existing tuple in terms of a value or set of values constrained by a unique index, an alternative path may be taken. The statement may alternatively IGNORE the tuple being inserted without raising an error, or go to UPDATE the existing tuple whose value is duplicated by a value within one single tuple proposed for insertion. The implementation loops until either an insert or an UPDATE/IGNORE occurs. No existing tuple may be affected more than once per INSERT. This is implemented using a new infrastructure called "speculative insertion". (The approach to "Value locking" presenting here follows design postgres#2, as described on https://wiki.postgresql.org/wiki/Value_locking). Alternatively, we may go to UPDATE, using the EvalPlanQual() mechanism to execute a special auxiliary plan. READ COMMITTED isolation level is permitted to UPDATE a tuple even where no version is visible to the command's MVCC snapshot. Similarly, any query predicate associated with the UPDATE portion of the new statement need only satisfy an already locked, conclusively committed and visible conflict tuple. When the predicate isn't satisfied, the tuple is still locked, which implies that at READ COMMITTED, a tuple may be locked without any version being visible to the command's MVCC snapshot. Users specify a single unique index to take the alternative path on, which is inferred from a set of user-supplied column names (or expressions). This is mandatory for the ON CONFLICT UPDATE variant, which should address concerns about spuriously taking an incorrect alternative ON CONFLICT path (i.e. the wrong unique index is used for arbitration of whether or not to take the alternative path) due to there being more than one would-be unique violation. Previous revisions of the patch didn't mandate this. However, we may still IGNORE based on the first would-be unique violation detected, on the assumption that it doesn't particularly matter where it originated from for that variant (iff the user didn't make a point of indicated his or her intent). The auxiliary ModifyTable plan used by the UPDATE portion of the new statement is not formally a subplan of its parent INSERT ModifyTable plan. Rather, it's an independently planned subquery, whose execution is tightly driven by its parent. Special auxiliary state pertaining to the auxiliary UPDATE is tracked by its parent through all stages of query execution. The optimizer imposes some restrictions on child auxiliary UPDATE plans, which make the plans comport with their parent to the extent required during the executor stage. One user-visible consequences of this is that the special auxiliary UPDATE query cannot have subselects within its targetlist or WHERE clause. UPDATEs may not reference any other table, and UPDATE FROM is disallowed. INSERT's RETURNING clause continues to only project tuples actually inserted.
1 parent c19f3d4 commit 30203a0

Some content is hidden

Large Commits have some content hidden by default. Use the searchbox below for content that may be hidden.

49 files changed

+1754
-111
lines changed

contrib/pg_stat_statements/pg_stat_statements.c

Lines changed: 4 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -2198,6 +2198,10 @@ JumbleQuery(pgssJumbleState *jstate, Query *query)
21982198
JumbleRangeTable(jstate, query->rtable);
21992199
JumbleExpr(jstate, (Node *) query->jointree);
22002200
JumbleExpr(jstate, (Node *) query->targetList);
2201+
APP_JUMB(query->specClause);
2202+
APP_JUMB(query->arbiterIndex);
2203+
if (query->onConflict)
2204+
JumbleQuery(jstate, (Query *) query->onConflict);
22012205
JumbleExpr(jstate, (Node *) query->returningList);
22022206
JumbleExpr(jstate, (Node *) query->groupClause);
22032207
JumbleExpr(jstate, query->havingQual);

contrib/postgres_fdw/postgres_fdw.c

Lines changed: 5 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -1167,6 +1167,11 @@ postgresPlanForeignModify(PlannerInfo *root,
11671167
List *returningList = NIL;
11681168
List *retrieved_attrs = NIL;
11691169

1170+
if (plan->spec != SPEC_NONE)
1171+
ereport(ERROR,
1172+
(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
1173+
errmsg("INSERT...ON CONFLICT is not supported by postgres_fdw")));
1174+
11701175
initStringInfo(&sql);
11711176

11721177
/*

src/backend/access/heap/heapam.c

Lines changed: 75 additions & 19 deletions
Original file line numberDiff line numberDiff line change
@@ -2032,6 +2032,9 @@ FreeBulkInsertState(BulkInsertState bistate)
20322032
* This causes rows to be frozen, which is an MVCC violation and
20332033
* requires explicit options chosen by user.
20342034
*
2035+
* If HEAP_INSERT_SPECULATIVE is specified, the MyProc->specInsert fields
2036+
* are filled.
2037+
*
20352038
* Note that these options will be applied when inserting into the heap's
20362039
* TOAST table, too, if the tuple requires any out-of-line data.
20372040
*
@@ -2200,6 +2203,13 @@ heap_insert(Relation relation, HeapTuple tup, CommandId cid,
22002203

22012204
END_CRIT_SECTION();
22022205

2206+
/*
2207+
* Let others know that we speculatively inserted this tuple, before
2208+
* releasing the buffer lock.
2209+
*/
2210+
if (options & HEAP_INSERT_SPECULATIVE)
2211+
SetSpeculativeInsertion(relation->rd_node, &heaptup->t_self);
2212+
22032213
UnlockReleaseBuffer(buffer);
22042214
if (vmbuffer != InvalidBuffer)
22052215
ReleaseBuffer(vmbuffer);
@@ -2639,11 +2649,17 @@ xmax_infomask_changed(uint16 new_infomask, uint16 old_infomask)
26392649
* (the last only for HeapTupleSelfUpdated, since we
26402650
* cannot obtain cmax from a combocid generated by another transaction).
26412651
* See comments for struct HeapUpdateFailureData for additional info.
2652+
*
2653+
* If 'killspeculative' is true, we're "super-deleting" a tuple we just
2654+
* inserted in the same command. Instead of the normal visibility checks, we
2655+
* check that the tuple was inserted by the current transaction and given
2656+
* command id. Also, instead of setting its xmax, we set xmin to invalid,
2657+
* making it immediately appear as dead to everyone.
26422658
*/
26432659
HTSU_Result
26442660
heap_delete(Relation relation, ItemPointer tid,
26452661
CommandId cid, Snapshot crosscheck, bool wait,
2646-
HeapUpdateFailureData *hufd)
2662+
HeapUpdateFailureData *hufd, bool killspeculative)
26472663
{
26482664
HTSU_Result result;
26492665
TransactionId xid = GetCurrentTransactionId();
@@ -2701,7 +2717,16 @@ heap_delete(Relation relation, ItemPointer tid,
27012717
tp.t_self = *tid;
27022718

27032719
l1:
2704-
result = HeapTupleSatisfiesUpdate(&tp, cid, buffer);
2720+
if (!killspeculative)
2721+
result = HeapTupleSatisfiesUpdate(&tp, cid, buffer);
2722+
else
2723+
{
2724+
if (tp.t_data->t_choice.t_heap.t_xmin != xid ||
2725+
tp.t_data->t_choice.t_heap.t_field3.t_cid != cid)
2726+
elog(ERROR, "attempted to kill a tuple inserted by another transaction or command");
2727+
result = HeapTupleMayBeUpdated;
2728+
}
2729+
27052730

27062731
if (result == HeapTupleInvisible)
27072732
{
@@ -2849,12 +2874,15 @@ heap_delete(Relation relation, ItemPointer tid,
28492874
* using our own TransactionId below, since some other backend could
28502875
* incorporate our XID into a MultiXact immediately afterwards.)
28512876
*/
2852-
MultiXactIdSetOldestMember();
2877+
if (!killspeculative)
2878+
{
2879+
MultiXactIdSetOldestMember();
28532880

2854-
compute_new_xmax_infomask(HeapTupleHeaderGetRawXmax(tp.t_data),
2855-
tp.t_data->t_infomask, tp.t_data->t_infomask2,
2856-
xid, LockTupleExclusive, true,
2857-
&new_xmax, &new_infomask, &new_infomask2);
2881+
compute_new_xmax_infomask(HeapTupleHeaderGetRawXmax(tp.t_data),
2882+
tp.t_data->t_infomask, tp.t_data->t_infomask2,
2883+
xid, LockTupleExclusive, true,
2884+
&new_xmax, &new_infomask, &new_infomask2);
2885+
}
28582886

28592887
START_CRIT_SECTION();
28602888

@@ -2881,8 +2909,21 @@ heap_delete(Relation relation, ItemPointer tid,
28812909
tp.t_data->t_infomask |= new_infomask;
28822910
tp.t_data->t_infomask2 |= new_infomask2;
28832911
HeapTupleHeaderClearHotUpdated(tp.t_data);
2884-
HeapTupleHeaderSetXmax(tp.t_data, new_xmax);
2885-
HeapTupleHeaderSetCmax(tp.t_data, cid, iscombo);
2912+
/*
2913+
* When killing a speculatively-inserted tuple, we set xmin to invalid
2914+
* instead of setting xmax, to make the tuple clearly invisible to
2915+
* everyone. In particular, we want HeapTupleSatisfiesDirty() to regard
2916+
* the tuple as dead, so that another backend inserting a duplicate key
2917+
* value won't unnecessarily wait for our transaction to finish.
2918+
*/
2919+
if (killspeculative)
2920+
HeapTupleHeaderSetXmin(tp.t_data, InvalidTransactionId);
2921+
else
2922+
{
2923+
HeapTupleHeaderSetXmax(tp.t_data, new_xmax);
2924+
HeapTupleHeaderSetCmax(tp.t_data, cid, iscombo);
2925+
}
2926+
28862927
/* Make sure there is no forward chain link in t_ctid */
28872928
tp.t_data->t_ctid = tp.t_self;
28882929

@@ -2899,7 +2940,11 @@ heap_delete(Relation relation, ItemPointer tid,
28992940
if (RelationIsAccessibleInLogicalDecoding(relation))
29002941
log_heap_new_cid(relation, &tp);
29012942

2902-
xlrec.flags = all_visible_cleared ? XLOG_HEAP_ALL_VISIBLE_CLEARED : 0;
2943+
xlrec.flags = 0;
2944+
if (all_visible_cleared)
2945+
xlrec.flags |= XLOG_HEAP_ALL_VISIBLE_CLEARED;
2946+
if (killspeculative)
2947+
xlrec.flags |= XLOG_HEAP_KILLED_SPECULATIVE_TUPLE;
29032948
xlrec.infobits_set = compute_infobits(tp.t_data->t_infomask,
29042949
tp.t_data->t_infomask2);
29052950
xlrec.target.node = relation->rd_node;
@@ -3015,7 +3060,7 @@ simple_heap_delete(Relation relation, ItemPointer tid)
30153060
result = heap_delete(relation, tid,
30163061
GetCurrentCommandId(true), InvalidSnapshot,
30173062
true /* wait for commit */ ,
3018-
&hufd);
3063+
&hufd, false);
30193064
switch (result)
30203065
{
30213066
case HeapTupleSelfUpdated:
@@ -4101,15 +4146,16 @@ get_mxact_status_for_lock(LockTupleMode mode, bool is_update)
41014146
*
41024147
* Function result may be:
41034148
* HeapTupleMayBeUpdated: lock was successfully acquired
4149+
* HeapTupleInvisible: lock failed because tuple instantaneously invisible
41044150
* HeapTupleSelfUpdated: lock failed because tuple updated by self
41054151
* HeapTupleUpdated: lock failed because tuple updated by other xact
41064152
* HeapTupleWouldBlock: lock couldn't be acquired and wait_policy is skip
41074153
*
4108-
* In the failure cases, the routine fills *hufd with the tuple's t_ctid,
4109-
* t_xmax (resolving a possible MultiXact, if necessary), and t_cmax
4110-
* (the last only for HeapTupleSelfUpdated, since we
4111-
* cannot obtain cmax from a combocid generated by another transaction).
4112-
* See comments for struct HeapUpdateFailureData for additional info.
4154+
* In the failure cases other than HeapTupleInvisible, the routine fills *hufd
4155+
* with the tuple's t_ctid, t_xmax (resolving a possible MultiXact, if
4156+
* necessary), and t_cmax (the last only for HeapTupleSelfUpdated, since we
4157+
* cannot obtain cmax from a combocid generated by another transaction). See
4158+
* comments for struct HeapUpdateFailureData for additional info.
41134159
*
41144160
* See README.tuplock for a thorough explanation of this mechanism.
41154161
*/
@@ -4146,8 +4192,15 @@ heap_lock_tuple(Relation relation, HeapTuple tuple,
41464192

41474193
if (result == HeapTupleInvisible)
41484194
{
4149-
UnlockReleaseBuffer(*buffer);
4150-
elog(ERROR, "attempted to lock invisible tuple");
4195+
LockBuffer(*buffer, BUFFER_LOCK_UNLOCK);
4196+
4197+
/*
4198+
* This is possible, but only when locking a tuple for speculative
4199+
* insertion. We return this value here rather than throwing an error
4200+
* in order to give that case the opportunity to throw a more specific
4201+
* error.
4202+
*/
4203+
return HeapTupleInvisible;
41514204
}
41524205
else if (result == HeapTupleBeingUpdated)
41534206
{
@@ -7476,7 +7529,10 @@ heap_xlog_delete(XLogRecPtr lsn, XLogRecord *record)
74767529
HeapTupleHeaderClearHotUpdated(htup);
74777530
fix_infomask_from_infobits(xlrec->infobits_set,
74787531
&htup->t_infomask, &htup->t_infomask2);
7479-
HeapTupleHeaderSetXmax(htup, xlrec->xmax);
7532+
if (xlrec->flags & XLOG_HEAP_KILLED_SPECULATIVE_TUPLE)
7533+
HeapTupleHeaderSetXmin(htup, InvalidTransactionId);
7534+
else
7535+
HeapTupleHeaderSetXmax(htup, xlrec->xmax);
74807536
HeapTupleHeaderSetCmax(htup, FirstCommandId, false);
74817537

74827538
/* Mark the page as a candidate for pruning */

src/backend/access/nbtree/nbtinsert.c

Lines changed: 25 additions & 7 deletions
Original file line numberDiff line numberDiff line change
@@ -21,6 +21,8 @@
2121
#include "miscadmin.h"
2222
#include "storage/lmgr.h"
2323
#include "storage/predicate.h"
24+
#include "storage/procarray.h"
25+
#include "utils/inval.h"
2426
#include "utils/tqual.h"
2527

2628

@@ -50,7 +52,8 @@ static Buffer _bt_newroot(Relation rel, Buffer lbuf, Buffer rbuf);
5052
static TransactionId _bt_check_unique(Relation rel, IndexTuple itup,
5153
Relation heapRel, Buffer buf, OffsetNumber offset,
5254
ScanKey itup_scankey,
53-
IndexUniqueCheck checkUnique, bool *is_unique);
55+
IndexUniqueCheck checkUnique, bool *is_unique,
56+
uint32 *speculativeToken);
5457
static void _bt_findinsertloc(Relation rel,
5558
Buffer *bufptr,
5659
OffsetNumber *offsetptr,
@@ -159,16 +162,26 @@ _bt_doinsert(Relation rel, IndexTuple itup,
159162
if (checkUnique != UNIQUE_CHECK_NO)
160163
{
161164
TransactionId xwait;
165+
uint32 speculativeToken;
162166

163167
offset = _bt_binsrch(rel, buf, natts, itup_scankey, false);
164168
xwait = _bt_check_unique(rel, itup, heapRel, buf, offset, itup_scankey,
165-
checkUnique, &is_unique);
169+
checkUnique, &is_unique, &speculativeToken);
166170

167171
if (TransactionIdIsValid(xwait))
168172
{
169173
/* Have to wait for the other guy ... */
170174
_bt_relbuf(rel, buf);
171-
XactLockTableWait(xwait, rel, &itup->t_tid, XLTW_InsertIndex);
175+
/*
176+
* If it's a speculative insertion, wait for it to finish (ie.
177+
* to go ahead with the insertion, or kill the tuple). Otherwise
178+
* wait for the transaction to finish as usual.
179+
*/
180+
if (speculativeToken)
181+
SpeculativeInsertionWait(xwait, speculativeToken);
182+
else
183+
XactLockTableWait(xwait, rel, &itup->t_tid, XLTW_InsertIndex);
184+
172185
/* start over... */
173186
_bt_freestack(stack);
174187
goto top;
@@ -210,9 +223,12 @@ _bt_doinsert(Relation rel, IndexTuple itup,
210223
* also point to end-of-page, which means that the first tuple to check
211224
* is the first tuple on the next page.
212225
*
213-
* Returns InvalidTransactionId if there is no conflict, else an xact ID
214-
* we must wait for to see if it commits a conflicting tuple. If an actual
215-
* conflict is detected, no return --- just ereport().
226+
* Returns InvalidTransactionId if there is no conflict, else an xact ID we
227+
* must wait for to see if it commits a conflicting tuple. If an actual
228+
* conflict is detected, no return --- just ereport(). If an xact ID is
229+
* returned, and the conflicting tuple still has a speculative insertion in
230+
* progress, *speculativeToken is set to non-zero, and the caller can wait for
231+
* the verdict on the insertion using SpeculativeInsertionWait().
216232
*
217233
* However, if checkUnique == UNIQUE_CHECK_PARTIAL, we always return
218234
* InvalidTransactionId because we don't want to wait. In this case we
@@ -222,7 +238,8 @@ _bt_doinsert(Relation rel, IndexTuple itup,
222238
static TransactionId
223239
_bt_check_unique(Relation rel, IndexTuple itup, Relation heapRel,
224240
Buffer buf, OffsetNumber offset, ScanKey itup_scankey,
225-
IndexUniqueCheck checkUnique, bool *is_unique)
241+
IndexUniqueCheck checkUnique, bool *is_unique,
242+
uint32 *speculativeToken)
226243
{
227244
TupleDesc itupdesc = RelationGetDescr(rel);
228245
int natts = rel->rd_rel->relnatts;
@@ -339,6 +356,7 @@ _bt_check_unique(Relation rel, IndexTuple itup, Relation heapRel,
339356
if (nbuf != InvalidBuffer)
340357
_bt_relbuf(rel, nbuf);
341358
/* Tell _bt_doinsert to wait... */
359+
*speculativeToken = SnapshotDirty.speculativeToken;
342360
return xwait;
343361
}
344362

src/backend/catalog/index.c

Lines changed: 45 additions & 3 deletions
Original file line numberDiff line numberDiff line change
@@ -1644,8 +1644,50 @@ BuildIndexInfo(Relation index)
16441644
ii->ii_ExclusionStrats = NULL;
16451645
}
16461646

1647+
/*
1648+
* fetch info for checking unique constraints. (this is currently only used
1649+
* by ExecCheckIndexConstraints(), for INSERT ... ON CONFLICT UPDATE, which
1650+
* must support "speculative insertion". In regular insertions, the index
1651+
* AM handles the unique check itself. Might make sense to do this lazily,
1652+
* only when needed)
1653+
*/
1654+
if (indexStruct->indisunique)
1655+
{
1656+
int ncols = index->rd_rel->relnatts;
1657+
1658+
if (index->rd_rel->relam != BTREE_AM_OID)
1659+
elog(ERROR, "only b-tree indexes are supported for foreign keys");
1660+
1661+
ii->ii_UniqueOps = (Oid *) palloc(sizeof(Oid) * ncols);
1662+
ii->ii_UniqueProcs = (Oid *) palloc(sizeof(Oid) * ncols);
1663+
ii->ii_UniqueStrats = (uint16 *) palloc(sizeof(uint16) * ncols);
1664+
1665+
/*
1666+
* We have to look up the operator's strategy number. This
1667+
* provides a cross-check that the operator does match the index.
1668+
*/
1669+
/* We need the func OIDs and strategy numbers too */
1670+
for (i = 0; i < ncols; i++)
1671+
{
1672+
ii->ii_UniqueStrats[i] = BTEqualStrategyNumber;
1673+
ii->ii_UniqueOps[i] =
1674+
get_opfamily_member(index->rd_opfamily[i],
1675+
index->rd_opcintype[i],
1676+
index->rd_opcintype[i],
1677+
ii->ii_UniqueStrats[i]);
1678+
ii->ii_UniqueProcs[i] = get_opcode(ii->ii_UniqueOps[i]);
1679+
}
1680+
ii->ii_Unique = true;
1681+
}
1682+
else
1683+
{
1684+
ii->ii_UniqueOps = NULL;
1685+
ii->ii_UniqueProcs = NULL;
1686+
ii->ii_UniqueStrats = NULL;
1687+
ii->ii_Unique = false;
1688+
}
1689+
16471690
/* other info */
1648-
ii->ii_Unique = indexStruct->indisunique;
16491691
ii->ii_ReadyForInserts = IndexIsReady(indexStruct);
16501692

16511693
/* initialize index-build state to default */
@@ -2575,10 +2617,10 @@ IndexCheckExclusion(Relation heapRelation,
25752617
/*
25762618
* Check that this tuple has no conflicts.
25772619
*/
2578-
check_exclusion_constraint(heapRelation,
2620+
check_exclusion_or_unique_constraint(heapRelation,
25792621
indexRelation, indexInfo,
25802622
&(heapTuple->t_self), values, isnull,
2581-
estate, true, false);
2623+
estate, true, false, true, NULL);
25822624
}
25832625

25842626
heap_endscan(scan);

src/backend/commands/constraint.c

Lines changed: 2 additions & 2 deletions
Original file line numberDiff line numberDiff line change
@@ -170,9 +170,9 @@ unique_key_recheck(PG_FUNCTION_ARGS)
170170
* For exclusion constraints we just do the normal check, but now it's
171171
* okay to throw error.
172172
*/
173-
check_exclusion_constraint(trigdata->tg_relation, indexRel, indexInfo,
173+
check_exclusion_or_unique_constraint(trigdata->tg_relation, indexRel, indexInfo,
174174
&(new_row->t_self), values, isnull,
175-
estate, false, false);
175+
estate, false, false, true, NULL);
176176
}
177177

178178
/*

src/backend/commands/copy.c

Lines changed: 3 additions & 2 deletions
Original file line numberDiff line numberDiff line change
@@ -2397,7 +2397,8 @@ CopyFrom(CopyState cstate)
23972397

23982398
if (resultRelInfo->ri_NumIndices > 0)
23992399
recheckIndexes = ExecInsertIndexTuples(slot, &(tuple->t_self),
2400-
estate);
2400+
estate, false,
2401+
InvalidOid);
24012402

24022403
/* AFTER ROW INSERT Triggers */
24032404
ExecARInsertTriggers(estate, resultRelInfo, tuple,
@@ -2504,7 +2505,7 @@ CopyFromInsertBatch(CopyState cstate, EState *estate, CommandId mycid,
25042505
ExecStoreTuple(bufferedTuples[i], myslot, InvalidBuffer, false);
25052506
recheckIndexes =
25062507
ExecInsertIndexTuples(myslot, &(bufferedTuples[i]->t_self),
2507-
estate);
2508+
estate, false, InvalidOid);
25082509
ExecARInsertTriggers(estate, resultRelInfo,
25092510
bufferedTuples[i],
25102511
recheckIndexes);

0 commit comments

Comments
 (0)
0