8000 (redshift): Cannot add distkey to existing table · Issue #26733 · aws/aws-cdk · GitHub
[go: up one dir, main page]

Skip to content
(redshift): Cannot add distkey to existing table #26733
@ghost

Description

Describe the bug

Adding distKey=true to a column where the table previously had no distribution key fails to deploy because it tries to create the table.

The diff:

[~] Custom::RedshiftDatabaseQuery staff_join_staff_roles-id/Resource/Resource staffjoinstaffrolesidFA762FF7 
 └─ [~] tableColumns
     └─ @@ -2,7 +2,7 @@
        [ ] {
        [ ]   "dataType": "varchar(36)",
        [ ]   "name": "staff_id",
        [-]   "distKey": false,
        [+]   "distKey": true,
        [ ]   "sortKey": false,
        [ ]   "id": "staff_id"
        [ ] },

The error:

3:45:11 PM | UPDATE_FAILED        | Custom::RedshiftDatabaseQuery               | staffjoinstaffrolesidFA762FF7
Received response status [FAILED] from custom resource. Message returned: Statement status was FAILED: ERROR: Relation "staff_join_staff_roles" already exist
s

Logs: /aws/lambda/dev-business-insights-QueryRedshiftDatabase3de5bea-fGJLJqA4DIkh

at waitForStatementComplete (/var/task/redshift-data.js:34:15)
at process.processTicksAndRejections (node:internal/process/task_queues:95:5)
at async executeStatement (/var/task/redshift-data.js:18:5)
at async createTable (/var/task/table.js:47:5)
at async handler (/var/task/table.js:22:27) (RequestId: fd93e151-06ce-487f-8412-955a039bb9c3)

Expected Behavior

CDK should add a distribution key to the table.

Current Behavior

CDK tries to create the table from scratch.

Reproduction Steps

It's unfortunately difficult for me to create a runnable snippet due to the amount of proprietary code used to generate it, but I've been able to reproduce the error with various tables in our system. The basic steps are:

  1. Create a table with any number of columns, ensuring that
  • None are set as the distribution key.
  • Do not set the distribution style.
  1. Deploy
  2. Edit your code and set .distKey(true) for any single column
  3. Deploy -- it will fail

As best as I can figure it, our code does something like this:

List<Column> columns = List<Columns>.of(
    Column.builder()
          .name("foo")
          .dataType("VARCHAR(255)")
          .distKey(false)  // Change this to `true` on your 2nd deploy
          .sortKey(false)
          .build(),
    Column.builder()
          .name("bar")
          .dataType("VARCHAR(255)")
          .distKey(false)
          .sortKey(false)
          .build()
);

Table table = Table.Builder.create(databaseScope, "mytable-id")
    .tableName("mytable")
    .cluster(myDbCluster)
    .databaseName("mydatabase")
    .tableColumns(columns)
    .build();

Possible Solution

The current code issues a CREATE TABLE statement if the table either has no distkey and we're adding one, or had a distkey and we're removing it:

// aws-redshift-alpha/lib/private/database-query-provider/table.ts
if ((!oldDistKey && newDistKey ) || (oldDistKey && !newDistKey)) {
  return createTable(tableNamePrefix, tableNameSuffix, tableColumns, tableAndClusterProps);
} else if (oldDistKey !== newDistKey) {
  alterationStatements.push(`ALTER TABLE ${tableName} ALTER DISTKEY ${newDistKey}`);
}

The problem here is that the resulting query tries to create a new table with the same name, which Redshift will reject because the table already exists.

Redshift supports adding and removing distribution keys on existing tables (see docs here) so we should take advantage of that. I believe the following will work:

if (!oldDistKey && newDistKey) {
  // Table has no existing distribution key, add a new one
  alterationStatements.push(`ALTER TABLE ${tableName} ALTER DISTSTYLE KEY DISTKEY ${newDistKey}`);
} else if (oldDistKey && !newDistKey) {
  // Table has a distribution key, remove and set to AUTO
  alterationStatements.push(`ALTER TABLE ${tableName} ALTER DISTSTYLE AUTO`);
} else if (oldDistKey !== newDistKey) {
  // Table has an existing distribution key, change it
  alterationStatements.push(`ALTER TABLE ${tableName} ALTER DISTKEY ${newDistKey}`);
}

Additional Information/Context

No response

CDK CLI Version

2.88.0

Framework Version

2.17.184

Node.js Version

20.5.1

OS

MacOS 13.5

Language

Java

Language Version

JDK 17

Other information

No response

Metadata

Metadata

Assignees

No one assigned

    Labels

    @aws-cdk/aws-redshiftRelated to Amazon RedshiftbugThis issue is a bug.effort/smallSmall work item – less than a day of effortp1

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions

      0