
Describe the bug
Adding distKey=true to a column where the table previously had no distribution key fails to deploy, because the custom resource tries to create the table from scratch.
The diff:
[~] Custom::RedshiftDatabaseQuery staff_join_staff_roles-id/Resource/Resource staffjoinstaffrolesidFA762FF7
└─ [~] tableColumns
└─ @@ -2,7 +2,7 @@
[ ] {
[ ] "dataType": "varchar(36)",
[ ] "name": "staff_id",
[-] "distKey": false,
[+] "distKey": true,
[ ] "sortKey": false,
[ ] "id": "staff_id"
[ ] },
The error:
3:45:11 PM | UPDATE_FAILED | Custom::RedshiftDatabaseQuery | staffjoinstaffrolesidFA762FF7
Received response status [FAILED] from custom resource. Message returned: Statement status was FAILED: ERROR: Relation "staff_join_staff_roles" already exists
Logs: /aws/lambda/dev-business-insights-QueryRedshiftDatabase3de5bea-fGJLJqA4DIkh
at waitForStatementComplete (/var/task/redshift-data.js:34:15)
at process.processTicksAndRejections (node:internal/process/task_queues:95:5)
at async executeStatement (/var/task/redshift-data.js:18:5)
at async createTable (/var/task/table.js:47:5)
at async handler (/var/task/table.js:22:27) (RequestId: fd93e151-06ce-487f-8412-955a039bb9c3)
Expected Behavior
CDK should add a distribution key to the table.
Current Behavior
CDK tries to create the table from scratch.
Reproduction Steps
It's unfortunately difficult for me to create a runnable snippet due to the amount of proprietary code used to generate it, but I've been able to reproduce the error with various tables in our system. The basic steps are:
- Create a table with any number of columns, ensuring that:
  - None are set as the distribution key.
  - The distribution style is not set.
- Deploy.
- Edit your code and set .distKey(true) for any single column.
- Deploy again -- it will fail.
As best as I can figure it, our code does something like this:
import java.util.List;
import software.amazon.awscdk.services.redshift.alpha.Column;
import software.amazon.awscdk.services.redshift.alpha.Table;

List<Column> columns = List.of(
Column.builder()
.name("foo")
.dataType("VARCHAR(255)")
.distKey(false) // Change this to `true` on your 2nd deploy
.sortKey(false)
.build(),
Column.builder()
.name("bar")
.dataType("VARCHAR(255)")
.distKey(false)
.sortKey(false)
.build()
);
Table table = Table.Builder.create(databaseScope, "mytable-id")
.tableName("mytable")
.cluster(myDbCluster)
.databaseName("mydatabase")
.tableColumns(columns)
.build();
Possible Solution
The current code issues a CREATE TABLE statement if the table either has no distkey and we're adding one, or had a distkey and we're removing it:
// aws-redshift-alpha/lib/private/database-query-provider/table.ts
if ((!oldDistKey && newDistKey) || (oldDistKey && !newDistKey)) {
return createTable(tableNamePrefix, tableNameSuffix, tableColumns, tableAndClusterProps);
} else if (oldDistKey !== newDistKey) {
alterationStatements.push(`ALTER TABLE ${tableName} ALTER DISTKEY ${newDistKey}`);
}
The problem here is that the resulting query tries to create a new table with the same name, which Redshift will reject because the table already exists.
Redshift supports adding and removing distribution keys on existing tables (see the ALTER TABLE documentation), so we should take advantage of that. I believe the following will work:
if (!oldDistKey && newDistKey) {
// Table has no existing distribution key, add a new one
alterationStatements.push(`ALTER TABLE ${tableName} ALTER DISTSTYLE KEY DISTKEY ${newDistKey}`);
} else if (oldDistKey && !newDistKey) {
// Table has a distribution key, remove and set to AUTO
alterationStatements.push(`ALTER TABLE ${tableName} ALTER DISTSTYLE AUTO`);
} else if (oldDistKey !== newDistKey) {
// Table has an existing distribution key, change it
alterationStatements.push(`ALTER TABLE ${tableName} ALTER DISTKEY ${newDistKey}`);
}
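For illustration, here is a self-contained sketch of that branching (the helper name and example values are hypothetical, not the actual provider code), along with the statement it would emit for the diff shown above:
// Minimal, hypothetical sketch of the proposed distkey handling -- not the real handler code.
function distKeyAlterations(tableName: string, oldDistKey?: string, newDistKey?: string): string[] {
  const alterationStatements: string[] = [];
  if (!oldDistKey && newDistKey) {
    // No existing distkey: switch the table to DISTSTYLE KEY and set the new column
    alterationStatements.push(`ALTER TABLE ${tableName} ALTER DISTSTYLE KEY DISTKEY ${newDistKey}`);
  } else if (oldDistKey && !newDistKey) {
    // Distkey removed: fall back to DISTSTYLE AUTO
    alterationStatements.push(`ALTER TABLE ${tableName} ALTER DISTSTYLE AUTO`);
  } else if (oldDistKey !== newDistKey) {
    // Distkey moved from one column to another
    alterationStatements.push(`ALTER TABLE ${tableName} ALTER DISTKEY ${newDistKey}`);
  }
  return alterationStatements;
}
// Adding a distkey to a table that previously had none, as in the diff above:
// distKeyAlterations('staff_join_staff_roles', undefined, 'staff_id')
// -> ['ALTER TABLE staff_join_staff_roles ALTER DISTSTYLE KEY DISTKEY staff_id']
With this change, setting distKey(true) on a column of an existing table should translate into an in-place ALTER TABLE rather than a CREATE TABLE, so the table and its data would be preserved.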
Additional Information/Context
No response
CDK CLI Version
2.88.0
Framework Version
2.17.184
Node.js Version
20.5.1
OS
MacOS 13.5
Language
Java
Language Version
JDK 17
Other information
No response