Redesign initialization of partition routing structures · postgrespro/postgres@3f2393e · GitHub

  • Commit 3f2393e

    Redesign initialization of partition routing structures
    This speeds up write operations (INSERT, UPDATE, DELETE, COPY, as well as the future MERGE) on partitioned tables.

    This changes the setup for tuple routing so that it does far less work during the initial setup and pushes more work out to when partitions receive tuples. PartitionDispatchData structs for sub-partitioned tables are only created when a tuple gets routed through them. The possibly large arrays in the PartitionTupleRouting struct have largely been removed. The partitions[] array remains, but it now never contains any NULL gaps. Previously the NULLs had to be skipped during ExecCleanupTupleRouting(), which could add a large overhead to the cleanup when the number of partitions was large. The partitions[] array is allocated small to start with and is only enlarged when we route tuples to enough partitions that it runs out of space. This allows us to keep simple single-row partition INSERTs running quickly.

    The arrays in PartitionTupleRouting which stored the tuple translation maps have been removed. The maps have moved into a PartitionRoutingInfo struct, referenced by a new ResultRelInfo field, ri_PartitionInfo.

    The find_all_inheritors() call still remains by far the slowest part of ExecSetupPartitionTupleRouting(). This commit just removes the other slow parts.

    In passing, also rename the tuple translation maps from ParentToChild and ChildToParent to RootToPartition and PartitionToRoot. The old names misled you into thinking that a partition of some sub-partitioned table would be translated to the rowtype of the sub-partitioned table, rather than to that of the root partitioned table.

    Authors: David Rowley and Amit Langote, heavily revised by Álvaro Herrera
    Testing help from Jesper Pedersen and Kato Sho.
    Discussion: https://postgr.es/m/CAKJS1f_1RJyFquuCKRFHTdcXqoPX-PYqAd7nz=GVBwvGh4a6xA@mail.gmail.com
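A minimal standalone sketch of the compact, growable partitions[] array described above, assuming a simple doubling policy. It is not PostgreSQL's code: `RoutingSketch`, `PartInfo`, `routing_init()`, `routing_get_partition()`, `routing_cleanup()`, the `slot_of[]` lookup table and the starting capacity of 8 are all hypothetical. What it illustrates is that an entry is appended only when a partition first receives a tuple, so cleanup walks exactly the initialized entries and never has to skip NULLs.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-in for a per-partition ResultRelInfo. */
typedef struct PartInfo
{
    int         bound_index;    /* which partition this entry describes */
} PartInfo;

/* Hypothetical routing state: a compact array plus a lookup table. */
typedef struct RoutingSketch
{
    PartInfo  **partitions;     /* entries 0..num_partitions-1 are never NULL */
    int         num_partitions; /* entries in use */
    int         max_partitions; /* allocated capacity */
    int        *slot_of;        /* bound index -> position in partitions[], or -1 */
    int         nbounds;        /* total number of partitions in the table */
} RoutingSketch;

static void
routing_init(RoutingSketch *rs, int nbounds)
{
    rs->max_partitions = 8;     /* start small (assumed value) */
    rs->partitions = malloc(sizeof(PartInfo *) * rs->max_partitions);
    rs->num_partitions = 0;
    rs->nbounds = nbounds;
    rs->slot_of = malloc(sizeof(int) * nbounds);
    for (int i = 0; i < nbounds; i++)
        rs->slot_of[i] = -1;    /* no partition initialized yet */
}

/* Return the entry for a partition, creating it on first use. */
static PartInfo *
routing_get_partition(RoutingSketch *rs, int bound_index)
{
    PartInfo   *p;

    assert(bound_index >= 0 && bound_index < rs->nbounds);
    if (rs->slot_of[bound_index] >= 0)
        return rs->partitions[rs->slot_of[bound_index]];

    /* Enlarge the array only when it is full (allocation checks omitted). */
    if (rs->num_partitions == rs->max_partitions)
    {
        rs->max_partitions *= 2;
        rs->partitions = realloc(rs->partitions,
                                 sizeof(PartInfo *) * rs->max_partitions);
    }

    p = malloc(sizeof(PartInfo));
    p->bound_index = bound_index;
    rs->slot_of[bound_index] = rs->num_partitions;
    rs->partitions[rs->num_partitions++] = p;
    return p;
}

/* Cleanup visits only initialized entries; returns how many it freed. */
static int
routing_cleanup(RoutingSketch *rs)
{
    int         visited = 0;

    for (int i = 0; i < rs->num_partitions; i++)
    {
        free(rs->partitions[i]);
        visited++;
    }
    free(rs->partitions);
    free(rs->slot_of);
    return visited;
}
```

In this sketch the `slot_of[]` lookup is still sized by the total partition count, but it holds plain integers and is never walked during cleanup, so a single-row INSERT touches one array entry regardless of how many partitions exist.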
    1 parent a387a3d commit 3f2393e


    9 files changed: +637 -714 lines changed

    src/backend/commands/copy.c

    Lines changed: 24 additions & 62 deletions
    @@ -2316,6 +2316,7 @@ CopyFrom(CopyState cstate)
     bool *nulls;
     ResultRelInfo *resultRelInfo;
     ResultRelInfo *target_resultRelInfo;
    +ResultRelInfo *prevResultRelInfo = NULL;
     EState *estate = CreateExecutorState(); /* for ExecConstraints() */
     ModifyTableState *mtstate;
     ExprContext *econtext;
    @@ -2331,7 +2332,6 @@ CopyFrom(CopyState cstate)
     CopyInsertMethod insertMethod;
     uint64 processed = 0;
     int nBufferedTuples = 0;
    -int prev_leaf_part_index = -1;
     bool has_before_insert_row_trig;
     bool has_instead_insert_row_trig;
     bool leafpart_use_multi_insert = false;
    @@ -2515,8 +2515,12 @@ CopyFrom(CopyState cstate)
     /*
      * If there are any triggers with transition tables on the named relation,
      * we need to be prepared to capture transition tuples.
    + *
    + * Because partition tuple routing would like to know about whether
    + * transition capture is active, we also set it in mtstate, which is
    + * passed to ExecFindPartition() below.
      */
    -cstate->transition_capture =
    +cstate->transition_capture = mtstate->mt_transition_capture =
     MakeTransitionCaptureState(cstate->rel->trigdesc,
     RelationGetRelid(cstate->rel),
     CMD_INSERT);
    @@ -2526,19 +2530,8 @@ CopyFrom(CopyState cstate)
      * CopyFrom tuple routing.
      */
     if (cstate->rel->rd_rel->relkind == RELKIND_PARTITIONED_TABLE)
    -{
     proute = ExecSetupPartitionTupleRouting(NULL, cstate->rel);
     
    -/*
    - * If we are capturing transition tuples, they may need to be
    - * converted from partition format back to partitioned table format
    - * (this is only ever necessary if a BEFORE trigger modifies the
    - * tuple).
    - */
    -if (cstate->transition_capture != NULL)
    -ExecSetupChildParentMapForLeaf(proute);
    -}
    -
     /*
      * It's more efficient to prepare a bunch of tuples for insertion, and
      * insert them in one heap_multi_insert() call, than call heap_insert()
    @@ -2694,25 +2687,17 @@ CopyFrom(CopyState cstate)
     /* Determine the partition to heap_insert the tuple into */
     if (proute)
     {
    -int leaf_part_index;
     TupleConversionMap *map;
     
     /*
    - * Away we go ... If we end up not finding a partition after all,
    - * ExecFindPartition() does not return and errors out instead.
    - * Otherwise, the returned value is to be used as an index into
    - * arrays mt_partitions[] and mt_partition_tupconv_maps[] that
    - * will get us the ResultRelInfo and TupleConversionMap for the
    - * partition, respectively.
    + * Attempt to find a partition suitable for this tuple.
    + * ExecFindPartition() will raise an error if none can be found or
    + * if the found partition is not suitable for INSERTs.
      */
    -leaf_part_index = ExecFindPartition(target_resultRelInfo,
    -proute->partition_dispatch_info,
    -slot,
    -estate);
    -Assert(leaf_part_index >= 0 &&
    -leaf_part_index < proute->num_partitions);
    -
    -if (prev_leaf_part_index != leaf_part_index)
    +resultRelInfo = ExecFindPartition(mtstate, target_resultRelInfo,
    +proute, slot, estate);
    +
    +if (prevResultRelInfo != resultRelInfo)
     {
     /* Check if we can multi-insert into this partition */
     if (insertMethod == CIM_MULTI_CONDITIONAL)
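Because ExecFindPartition() now returns the partition's ResultRelInfo directly, CopyFrom() detects a partition switch by comparing pointers rather than array indexes. The sketch below models only that batching pattern; `Rel`, `Batch`, `batch_add()` and `batch_flush()` are hypothetical stand-ins, and the real code additionally checks whether multi-insert is allowed for the partition before buffering.

```c
#include <assert.h>
#include <stddef.h>

typedef struct Rel { int id; } Rel;    /* hypothetical stand-in for ResultRelInfo */

typedef struct Batch
{
    const Rel  *prev;       /* plays the role of prevResultRelInfo; NULL at start */
    int         nbuffered;  /* plays the role of nBufferedTuples */
    int         nflushes;   /* how many non-empty batches were written out */
} Batch;

/* Write out the tuples buffered for b->prev (here we only count the flush). */
static void
batch_flush(Batch *b)
{
    if (b->nbuffered > 0)
    {
        b->nflushes++;
        b->nbuffered = 0;
    }
}

/* Buffer one tuple routed to 'target', flushing first if the partition changed. */
static void
batch_add(Batch *b, const Rel *target)
{
    if (b->prev != target)
    {
        batch_flush(b);
        b->prev = target;
    }
    b->nbuffered++;
}
```

A final `batch_flush()` after the last tuple corresponds to the end-of-input CopyFromInsertBatch() call that now receives prevResultRelInfo instead of looking it up through prev_leaf_part_index.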
    @@ -2725,12 +2710,9 @@ CopyFrom(CopyState cstate)
     if (nBufferedTuples > 0)
     {
     ExprContext *swapcontext;
    -ResultRelInfo *presultRelInfo;
    -
    -presultRelInfo = proute->partitions[prev_leaf_part_index];
     
     CopyFromInsertBatch(cstate, estate, mycid, hi_options,
    -presultRelInfo, myslot, bistate,
    +prevResultRelInfo, myslot, bistate,
     nBufferedTuples, bufferedTuples,
     firstBufferedLineNo);
     nBufferedTuples = 0;
    @@ -2787,21 +2769,6 @@ CopyFrom(CopyState cstate)
     }
     }
     
    -/*
    - * Overwrite resultRelInfo with the corresponding partition's
    - * one.
    - */
    -resultRelInfo = proute->partitions[leaf_part_index];
    -if (unlikely(resultRelInfo == NULL))
    -{
    -resultRelInfo = ExecInitPartitionInfo(mtstate,
    -target_resultRelInfo,
    -proute, estate,
    -leaf_part_index);
    -proute->partitions[leaf_part_index] = resultRelInfo;
    -Assert(resultRelInfo != NULL);
    -}
    -
     /* Determine which triggers exist on this partition */
     has_before_insert_row_trig = (resultRelInfo->ri_TrigDesc &&
     resultRelInfo->ri_TrigDesc->trig_insert_before_row);
    @@ -2827,7 +2794,7 @@ CopyFrom(CopyState cstate)
      * buffer when the partition being inserted into changes.
      */
     ReleaseBulkInsertStatePin(bistate);
    -prev_leaf_part_index = leaf_part_index;
    +prevResultRelInfo = resultRelInfo;
     }
     
     /*
    @@ -2837,7 +2804,7 @@ CopyFrom(CopyState cstate)
     
     /*
      * If we're capturing transition tuples, we might need to convert
    - * from the partition rowtype to parent rowtype.
    + * from the partition rowtype to root rowtype.
      */
     if (cstate->transition_capture != NULL)
     {
    @@ -2850,8 +2817,7 @@ CopyFrom(CopyState cstate)
      */
     cstate->transition_capture->tcs_original_insert_tuple = NULL;
     cstate->transition_capture->tcs_map =
    -TupConvMapForLeaf(proute, target_resultRelInfo,
    -leaf_part_index);
    +resultRelInfo->ri_PartitionInfo->pi_PartitionToRootMap;
     }
     else
     {
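The rename to PartitionToRoot reflects that the map converts a leaf partition's tuple straight to the root table's rowtype, even when an intermediate sub-partitioned table orders its columns differently. The sketch below builds attribute maps by column name using the convention that map[i] is the 1-based input attribute feeding output attribute i, with 0 meaning "no counterpart, emit NULL". `build_map_by_name()` is a hypothetical helper working on plain name arrays, not PostgreSQL's tuple-descriptor code; the example in the test compares a leaf-to-root map with a leaf-to-intermediate map to show that the two differ.

```c
#include <assert.h>
#include <string.h>

/*
 * Fill map[0..nout-1] so that output column i takes input column
 * map[i] (1-based), or 0 when no input column has the same name.
 */
static void
build_map_by_name(const char *const *out_names, int nout,
                  const char *const *in_names, int nin,
                  int *map)
{
    assert(nout >= 0 && nin >= 0);
    for (int i = 0; i < nout; i++)
    {
        map[i] = 0;             /* assume missing until a match is found */
        for (int j = 0; j < nin; j++)
        {
            if (strcmp(out_names[i], in_names[j]) == 0)
            {
                map[i] = j + 1;
                break;
            }
        }
    }
}
```

With a root laid out as (a, b, c), a sub-partitioned table as (c, a, b) and a leaf as (b, c, a), the leaf-to-root map is {3, 1, 2} while the leaf-to-sub-partitioned-table map is {2, 3, 1}; storing the latter under a "ChildToParent" name is exactly the confusion the commit message describes.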
    @@ -2865,18 +2831,18 @@ CopyFrom(CopyState cstate)
     }
     
     /*
    - * We might need to convert from the parent rowtype to the
    - * partition rowtype.
    + * We might need to convert from the root rowtype to the partition
    + * rowtype.
      */
    -map = proute->parent_child_tupconv_maps[leaf_part_index];
    +map = resultRelInfo->ri_PartitionInfo->pi_RootToPartitionMap;
     if (map != NULL)
     {
     TupleTableSlot *new_slot;
     MemoryContext oldcontext;
     
    -Assert(proute->partition_tuple_slots != NULL &&
    -proute->partition_tuple_slots[leaf_part_index] != NULL);
    -new_slot = proute->partition_tuple_slots[leaf_part_index];
    +new_slot = resultRelInfo->ri_PartitionInfo->pi_PartitionTupleSlot;
    +Assert(new_slot != NULL);
    +
     slot = execute_attr_map_slot(map->attrMap, slot, new_slot);
     
     /*
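The `map != NULL` test and the execute_attr_map_slot() call above convert a tuple from the root's column layout into the partition's. The loop below mirrors that conversion on plain arrays, assuming the same attrMap convention (1-based source column, 0 for a column that becomes NULL). `Row`, `MAX_COLS`, `convert_row()` and `map_needed()` are hypothetical stand-ins for TupleTableSlot and the map-building code; `map_needed()` models our understanding that no map is kept when the two layouts already match, which is why the code can simply skip conversion when the map is NULL.

```c
#include <assert.h>
#include <stdbool.h>

#define MAX_COLS 8              /* hypothetical fixed width for the sketch */

/* Hypothetical stand-in for a slot's values/isnull arrays. */
typedef struct Row
{
    int         natts;
    long        values[MAX_COLS];
    bool        isnull[MAX_COLS];
} Row;

/* True unless the map is the identity, i.e. a conversion is really needed. */
static bool
map_needed(const int *attrMap, int outnatts, int innatts)
{
    if (outnatts != innatts)
        return true;
    for (int i = 0; i < outnatts; i++)
        if (attrMap[i] != i + 1)
            return true;
    return false;
}

/* Fill 'out' from 'in'; attrMap[i] == 0 makes output column i NULL. */
static void
convert_row(const int *attrMap, const Row *in, Row *out, int outnatts)
{
    assert(outnatts <= MAX_COLS);
    out->natts = outnatts;
    for (int i = 0; i < outnatts; i++)
    {
        int         j = attrMap[i] - 1;

        if (j < 0)
        {
            out->values[i] = 0;
            out->isnull[i] = true;
        }
        else
        {
            assert(j < in->natts);
            out->values[i] = in->values[j];
            out->isnull[i] = in->isnull[j];
        }
    }
}
```

For example, a root with columns (a, b) and a partition laid out as (b, a dropped column, a) would use the map {2, 0, 1}.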
    @@ -3021,12 +2987,8 @@ CopyFrom(CopyState cstate)
     {
     if (insertMethod == CIM_MULTI_CONDITIONAL)
     {
    -ResultRelInfo *presultRelInfo;
    -
    -presultRelInfo = proute->partitions[prev_leaf_part_index];
    -
     CopyFromInsertBatch(cstate, estate, mycid, hi_options,
    -presultRelInfo, myslot, bistate,
    +prevResultRelInfo, myslot, bistate,
     nBufferedTuples, bufferedTuples,
     firstBufferedLineNo);
     }

    src/backend/executor/execMain.c

    Lines changed: 1 addition & 1 deletion
    @@ -1345,7 +1345,7 @@ InitResultRelInfo(ResultRelInfo *resultRelInfo,
     
     resultRelInfo->ri_PartitionCheck = partition_check;
     resultRelInfo->ri_PartitionRoot = partition_root;
    -resultRelInfo->ri_PartitionReadyForRouting = false;
    +resultRelInfo->ri_PartitionInfo = NULL; /* may be set later */
     }
     
     /*
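InitResultRelInfo() now leaves ri_PartitionInfo NULL, and the per-partition maps and slot are attached only once tuples are routed to that partition. The sketch below models that lazy attachment with simplified mock types. The field names pi_RootToPartitionMap, pi_PartitionToRootMap and pi_PartitionTupleSlot come from the diff; `MockMap`, `MockSlot`, `MockResultRelInfo`, the `layouts_differ` flag and `ensure_routing_info()` are hypothetical. It also reflects our reading of the new CopyFrom() comment: routing consults the transition-capture flag in mtstate, so the partition-to-root map only needs to be built when transition tuples are captured.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

typedef struct MockMap { int dummy; } MockMap;      /* stands in for TupleConversionMap */
typedef struct MockSlot { int dummy; } MockSlot;    /* stands in for TupleTableSlot */

/* Field names follow the diff; the types are simplified mocks. */
typedef struct PartitionRoutingInfo
{
    MockMap    *pi_RootToPartitionMap;  /* NULL when no conversion is needed */
    MockMap    *pi_PartitionToRootMap;  /* NULL unless transition capture is on */
    MockSlot   *pi_PartitionTupleSlot;  /* only needed when converting */
} PartitionRoutingInfo;

typedef struct MockResultRelInfo
{
    bool        layouts_differ;         /* hypothetical: root vs. partition layout */
    PartitionRoutingInfo *ri_PartitionInfo; /* NULL until the first routed tuple */
} MockResultRelInfo;

/* Attach routing info on first use; later calls return the cached struct. */
static PartitionRoutingInfo *
ensure_routing_info(MockResultRelInfo *rri, bool transition_capture)
{
    PartitionRoutingInfo *info;

    /* Already set up by an earlier tuple: reuse it. */
    if (rri->ri_PartitionInfo != NULL)
        return rri->ri_PartitionInfo;

    info = malloc(sizeof(PartitionRoutingInfo));
    assert(info != NULL);
    info->pi_RootToPartitionMap = NULL;
    info->pi_PartitionToRootMap = NULL;
    info->pi_PartitionTupleSlot = NULL;

    /* A root-to-partition map, and a slot to convert into, only if needed. */
    if (rri->layouts_differ)
    {
        info->pi_RootToPartitionMap = malloc(sizeof(MockMap));
        info->pi_PartitionTupleSlot = malloc(sizeof(MockSlot));

        /* The reverse map matters only for transition capture. */
        if (transition_capture)
            info->pi_PartitionToRootMap = malloc(sizeof(MockMap));
    }

    rri->ri_PartitionInfo = info;
    return info;
}
```

Keeping these fields in a separately allocated struct means a ResultRelInfo that is never used for routing pays only for one NULL pointer, instead of the per-partition arrays PartitionTupleRouting used to carry.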
