Partitioned Data Manipulation Language (partitioned DML) is designed for the following types of bulk updates and deletes:
- Periodic cleanup and garbage collection. Examples are deleting old rows or
setting columns to
NULL
. - Backfilling new columns with default values. An example is using an
UPDATE
statement to set a new column's value toFalse
where it is currentlyNULL
.
Partitioned DML isn't suitable for small-scale transaction processing. If you want to run a statement on a few rows, use transactional DMLs with identifiable primary keys. For more information, see Using DML.
If you need to commit a large number of blind writes, but don't require an atomic transaction, you can bulk modify your Spanner tables using batch write. For more information, see Modify data using batch writes.
You can get insights on active partitioned DML queries and their progress from statistics tables in your Spanner database. For more information, see Active partitioned DMLs statistics.
DML and partitioned DML
Spanner supports two execution modes for DML statements:
DML, which is suitable for transaction processing. For more information, see Using DML.
Partitioned DML, which enables large-scale, database-wide operations with minimal impact on concurrent transaction processing by partitioning the key space and running the statement over partitions in separate, smaller-scoped transactions. For more information, see Using partitioned DML.
The following table highlights some of the differences between the two execution modes.
DML | Partitioned DML |
---|---|
Rows that don't match the WHERE clause might be locked. |
Only rows that match the WHERE clause are locked. |
Transaction size limits apply. | Spanner handles the transaction limits and per-transaction concurrency limits. |
Statements don't need to be idempotent. | A DML statement must be idempotent to ensure consistent results. |
A transaction can include multiple DML and SQL statements. | A partitioned transaction can include only one DML statement. |
There are no restrictions on complexity of statements. | Statements must be fully partitionable. |
You create read-write transactions in your client code. | Spanner creates the transactions. |
Partitionable and idempotent
When a partitioned DML statement runs, rows in one partition don't have access
to rows in other partitions, and you cannot choose how Spanner creates
the partitions. Partitioning ensures scalability, but it also means that
partitioned DML statements must be fully partitionable. That is, the
partitioned DML statement must be expressible as the union of a set of
statements, where each statement accesses a single row of the table and each
statement accesses no other tables. For example, a DML statement that accesses
multiple tables or performs a self-join is not partitionable. If the DML
statement is not partitionable, Spanner returns the error
BadUsage
.
These DML statements are fully partitionable, because each statement can be applied to a single row in the table:
UPDATE Singers SET LastName = NULL WHERE LastName = '';
DELETE FROM Albums WHERE MarketingBudget > 10000;
This DML statement is not fully partitionable, because it accesses multiple tables:
# Not fully partitionable
DELETE FROM Singers WHERE
SingerId NOT IN (SELECT SingerId FROM Concerts);
Spanner might execute a partitioned DML statement multiple times against some partitions due to network-level retries. As a result, a statement might be executed more than once against a row. The statement must therefore be idempotent to yield consistent results. A statement is idempotent if executing it multiple times against a single row leads to the same result.
This DML statement is idempotent:
UPDATE Singers SET MarketingBudget = 1000 WHERE true;
This DML statement is not idempotent:
UPDATE Singers SET MarketingBudget = 1.5 * MarketingBudget WHERE true;
Row locking
Spanner acquires a lock only if a row is a candidate for update or
deletion. This behavior is different from
DML execution, which might read-lock
rows that don't match the WHERE
clause.
Execution and transactions
Whether a DML statement is partitioned or not depends on the client library method that you choose for execution. Each client library provides separate methods for DML execution and Partitioned DML execution.
You can execute only one partitioned DML statement in a call to the client library method.
Spanner does not apply the partitioned DML statements atomically across the entire table. Spanner does, however, apply partitioned DML statements atomically across each partition.
Partitioned DML does not support commit or rollback. Spanner executes and applies the DML statement immediately.
- If you cancel the operation, Spanner cancels the executing partitions and doesn't start the remaining partitions. Spanner does not roll back any partitions that have already executed.
- If the execution of the statement causes an error, then execution stops
across all partitions and Spanner returns that error for the
entire operation. Some examples of errors are violations of data type
constraints, violations of
UNIQUE INDEX
, and violations ofON DELETE NO ACTION
. Depending on the point in time when the execution failed, the statement might have successfully run against some partitions, and might never have been run against other partitions.
If the partitioned DML statement succeeds, then Spanner ran the statement at least once against each partition of the key range.
Count of modified rows
A partitioned DML statement returns a lower bound on the number of modified rows. It might not be an exact count of the number of rows modified, because there is no guarantee that Spanner counts all the modified rows.
Transaction limits
Spanner creates the partitions and transactions that it needs to execute a partitioned DML statement. Transaction limits or per-transaction concurrency limits apply, but Spanner attempts to keep the transactions within the limits.
Spanner allows a maximum of 20,000 concurrent partitioned DML statements per database.
Unsupported features
Spanner does not support some features for partitioned DML:
INSERT
is not supported.- Google Cloud console: You can't execute partitioned DML statements in the Google Cloud console.
- Query plans and profiling: The Google Cloud CLI and the client libraries don't support query plans and profiling.
- Subqueries that read from another table, or a different row of the same table.
For complex scenarios, such as moving a table or transformations that require joins across tables, consider using the Dataflow connector.
Examples
The following code example updates the MarketingBudget
column of the Albums
table.
C++
You use the ExecutePartitionedDml()
function to execute a partitioned DML statement.
C#
You use the ExecutePartitionedUpdateAsync()
method to execute a partitioned DML statement.
Go
You use the PartitionedUpdate()
method to execute a partitioned DML statement.
Java
You use the executePartitionedUpdate()
method to execute a partitioned DML statement.
Node.js
You use the runPartitionedUpdate()
method to execute a partitioned DML statement.
PHP
You use the executePartitionedUpdate()
method to execute a partitioned DML statement.
Python
You use the execute_partitioned_dml()
method to execute a partitioned DML statement.
Ruby
You use the execute_partitioned_update()
method to execute a partitioned DML statement.
The following code example deletes rows from the Singers
table, based on the
SingerId
column.
C++
C#
Go
Java
Node.js
PHP
Python
Ruby
What's next?
Learn how to modify data Using DML.
Learn about Data Manipulation Language (DML) best practices.
To learn about the differences between DML and mutations, see Compare DML and Mutations
Consider using the Dataflow connector for other data transformation scenarios.