
Describe the bug
Adding distKey=true to a column where the table previously had no distribution key fails to deploy, because the custom resource tries to create the table from scratch.
The diff:
[~] Custom::RedshiftDatabaseQuery staff_join_staff_roles-id/Resource/Resource staffjoinstaffrolesidFA762FF7
└─ [~] tableColumns
└─ @@ -2,7 +2,7 @@
[ ] {
[ ] "dataType": "varchar(36)",
[ ] "name": "staff_id",
[-] "distKey": false,
[+] "distKey": true,
[ ] "sortKey": false,
[ ] "id": "staff_id"
[ ] },
The error:
3:45:11 PM | UPDATE_FAILED | Custom::RedshiftDatabaseQuery | staffjoinstaffrolesidFA762FF7
Received response status [FAILED] from custom resource. Message returned: Statement status was FAILED: ERROR: Relation "staff_join_staff_roles" already exists
Logs: /aws/lambda/dev-business-insights-QueryRedshiftDatabase3de5bea-fGJLJqA4DIkh
at waitForStatementComplete (/var/task/redshift-data.js:34:15)
at process.processTicksAndRejections (node:internal/process/task_queues:95:5)
at async executeStatement (/var/task/redshift-data.js:18:5)
at async createTable (/var/task/table.js:47:5)
at async handler (/var/task/table.js:22:27) (RequestId: fd93e151-06ce-487f-8412-955a039bb9c3)
Expected Behavior
CDK should add a distribution key to the table.
Current Behavior
CDK tries to create the table from scratch.
Reproduction Steps
It's unfortunately difficult for me to create a runnable snippet due to the amount of proprietary code used to generate it, but I've been able to reproduce the error with various tables in our system. The basic steps are:
- Create a table with any number of columns, ensuring that:
  - None are set as the distribution key.
  - The distribution style is not set.
- Deploy.
- Edit your code and set .distKey(true) for any single column.
- Deploy again -- it will fail.
As best as I can figure it, our code does something like this:
import java.util.List;
import software.amazon.awscdk.services.redshift.alpha.Column;
import software.amazon.awscdk.services.redshift.alpha.Table;

List<Column> columns = List.of(
Column.builder()
.name("foo")
.dataType("VARCHAR(255)")
.distKey(false) // Change this to `true` on your 2nd deploy
.sortKey(false)
.build(),
Column.builder()
.name("bar")
.dataType("VARCHAR(255)")
.distKey(false)
.sortKey(false)
.build()
);
Table table = Table.Builder.create(databaseScope, "mytable-id")
.tableName("mytable")
.cluster(myDbCluster)
.databaseName("mydatabase")
.tableColumns(columns)
.build();
Possible Solution
The current code issues a CREATE TABLE statement if the table either has no distkey and we're adding one, or had a distkey and we're removing it:
// aws-redshift-alpha/lib/private/database-query-provider/table.ts
if ((!oldDistKey && newDistKey) || (oldDistKey && !newDistKey)) {
return createTable(tableNamePrefix, tableNameSuffix, tableColumns, tableAndClusterProps);
} else if (oldDistKey !== newDistKey) {
alterationStatements.push(`ALTER TABLE ${tableName} ALTER DISTKEY ${newDistKey}`);
}
The problem here is that the resulting query tries to create a new table with the same name, which Redshift will reject because the table already exists.
Redshift supports adding and removing distribution keys on existing tables (see the ALTER TABLE documentation), so we should take advantage of that. I believe the following will work:
if (!oldDistKey && newDistKey) {
// Table has no existing distribution key, add a new one
alterationStatements.push(`ALTER TABLE ${tableName} ALTER DISTSTYLE KEY DISTKEY ${newDistKey}`);
} else if (oldDistKey && !newDistKey) {
// Table has a distribution key, remove and set to AUTO
alterationStatements.push(`ALTER TABLE ${tableName} ALTER DISTSTYLE AUTO`);
} else if (oldDistKey !== newDistKey) {
// Table has an existing distribution key, change it
alterationStatements.push(`ALTER TABLE ${tableName} ALTER DISTKEY ${newDistKey}`);
}
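For illustration, here is a self-contained sketch of that branching (the helper name and example values are hypothetical, not the actual provider code), along with the statement it would emit for the diff shown above:
// Minimal, hypothetical sketch of the proposed distkey handling -- not the real handler code.
function distKeyAlterations(tableName: string, oldDistKey?: string, newDistKey?: string): string[] {
  const alterationStatements: string[] = [];
  if (!oldDistKey && newDistKey) {
    // No existing distkey: switch the table to DISTSTYLE KEY and set the new column
    alterationStatements.push(`ALTER TABLE ${tableName} ALTER DISTSTYLE KEY DISTKEY ${newDistKey}`);
  } else if (oldDistKey && !newDistKey) {
    // Distkey removed: fall back to DISTSTYLE AUTO
    alterationStatements.push(`ALTER TABLE ${tableName} ALTER DISTSTYLE AUTO`);
  } else if (oldDistKey !== newDistKey) {
    // Distkey moved from one column to another
    alterationStatements.push(`ALTER TABLE ${tableName} ALTER DISTKEY ${newDistKey}`);
  }
  return alterationStatements;
}
// Adding a distkey to a table that previously had none, as in the diff above:
// distKeyAlterations('staff_join_staff_roles', undefined, 'staff_id')
// -> ['ALTER TABLE staff_join_staff_roles ALTER DISTSTYLE KEY DISTKEY staff_id']
With this change, setting distKey(true) on a column of an existing table should translate into an in-place ALTER TABLE rather than a CREATE TABLE, so the table and its data would be preserved.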
Additional Information/Context
No response
CDK CLI Version
2.88.0
Framework Version
2.17.184
Node.js Version
20.5.1
OS
MacOS 13.5
Language
Java
Language Version
JDK 17
Other information
No response