fix: Handle max concurrent queries errors #20907
Conversation
```go
	return nil
}

have := have.Get(want.Name)
if have == nil {
	tableName := want.Name
```
This is the part about re-using the tables' changes state.
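For context, a minimal sketch of the pattern the snippet above belongs to: fetch the current table definitions once, then resolve every wanted table against that in-memory snapshot instead of issuing one lookup query per table (which is what piles up against `max_concurrent_queries`). `migrateTables` and `getTableDefinitions` are hypothetical names; only `have.Get(want.Name)` and `createTable` appear in the diff, so treat this as an illustration rather than the PR's exact code.

```go
// Sketch only — assumes the plugin's existing *Client and createTable, a
// hypothetical getTableDefinitions helper, and plugin-sdk's schema.Tables.
package client

import (
	"context"

	"github.com/cloudquery/plugin-sdk/v4/schema"
)

func (c *Client) migrateTables(ctx context.Context, want schema.Tables) error {
	// One round-trip to ClickHouse for the whole batch of tables.
	have, err := c.getTableDefinitions(ctx, want)
	if err != nil {
		return err
	}
	for _, wantTable := range want {
		// In-memory lookup against the snapshot; no extra query per table.
		haveTable := have.Get(wantTable.Name)
		if haveTable == nil {
			// Table is missing, so create it (partition/order-by omitted here).
			if err := c.createTable(ctx, wantTable, nil, nil); err != nil {
				return err
			}
			continue
		}
		// Otherwise diff haveTable against wantTable and migrate in place,
		// re-using the already-computed changes state instead of re-querying.
	}
	return nil
}
```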
```
@@ -103,23 +142,21 @@ func unsafeChanges(changes []schema.TableColumnChange) []schema.TableColumnChange
}

func (c *Client) createTable(ctx context.Context, table *schema.Table, partition []spec.PartitionStrategy, orderBy []spec.OrderByStrategy) (err error) {
	c.logger.Debug().Str("table", table.Name).Msg("Table doesn't exist, creating")
```
This was confusing because we call `dropTable` and then `createTable` when doing a force migration, so this log was printed even if the table existed.
Congrats on finding and fixing this bug 👍

My only concern here is the retry logic. Obviously, without more context there's no smarter retry possible here, like what we do in InsertSplitter, so we agree it's not going to get much smarter than "try running the same query again later".

However, retrying up to 10 times, 1s +/- .5s later, in the case of "Too many simultaneous queries" sounds 💣 🤔, no? I think this will by default increase congestion. My proposed remediation isn't very smart... I'd 10x the delay or something like that.

The default strategy is backoff + random jitter, but I don't mind increasing the retry delay. Based on my tests with the current values and 2000 concurrent syncs (from the test), we hardly ever retry more than twice.
```go
retry.Attempts(5),
retry.Delay(3 * time.Second),
retry.MaxJitter(1 * time.Second),
```
Would it make sense to expose these variables in the spec? Not blocking, just thinking out loud.
Not sure, actually; I think we can expose them later based on user feedback, as it could be too much control. The default retry logic is backoff + random jitter, so I think that should cover quite a lot of cases (especially on newer versions).
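For readers without the full diff, here is a minimal, hedged sketch of what a retry wrapper along these lines could look like, assuming the github.com/avast/retry-go/v4 library (whose `Attempts`/`Delay`/`MaxJitter` options appear in the snippet above) and clickhouse-go v2. The helper names, and the choice to match the error by code 202, are assumptions, not the PR's exact code.

```go
package client

import (
	"context"
	"errors"
	"time"

	"github.com/ClickHouse/clickhouse-go/v2"
	"github.com/avast/retry-go/v4"
)

// isTooManyConcurrentQueries reports whether the server rejected the query
// because max_concurrent_queries was exceeded (ClickHouse error code 202,
// TOO_MANY_SIMULTANEOUS_QUERIES). The real PR may detect this differently.
func isTooManyConcurrentQueries(err error) bool {
	var ex *clickhouse.Exception
	return errors.As(err, &ex) && ex.Code == 202
}

// retryQuery re-runs fn with exponential backoff plus random jitter, but only
// for the "too many simultaneous queries" case; other errors fail fast.
func retryQuery(ctx context.Context, fn func(context.Context) error) error {
	return retry.Do(
		func() error { return fn(ctx) },
		retry.Context(ctx),
		retry.Attempts(5),
		retry.Delay(3*time.Second),
		retry.MaxJitter(1*time.Second),
		// retry-go's default delay type already combines backoff and random
		// jitter; it is spelled out here for clarity.
		retry.DelayType(retry.CombineDelay(retry.BackOffDelay, retry.RandomDelay)),
		retry.RetryIf(isTooManyConcurrentQueries),
		retry.LastErrorOnly(true),
	)
}
```

With these values a retry that keeps failing waits roughly 3s, 6s, 12s, 24s (plus up to 1s of jitter each time) before giving up, which is why in practice the 2000-concurrent-sync test rarely needed more than two attempts.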
LGTM, one minor non-blocking suggestion
🤖 I have created a release *beep* *boop*

## [7.1.1](plugins-destination-clickhouse-v7.1.0...plugins-destination-clickhouse-v7.1.1) (2025-07-02)

### Bug Fixes

* **deps:** Update golang.org/x/exp digest to b7579e2 ([#20935](#20935)) ([aac340d](aac340d))
* **deps:** Update module github.com/cloudquery/codegen to v0.3.29 ([#20947](#20947)) ([af179be](af179be))
* Handle max concurrent queries errors ([#20907](#20907)) ([92c2827](92c2827))

---

This PR was generated with [Release Please](https://github.com/googleapis/release-please). See [documentation](https://github.com/googleapis/release-please#release-please).
Summary
This PR fixes an issue that's more evident on older ClickHouse versions (e.g. 22, which we don't support anymore), where `max_concurrent_queries` defaulted to 100; on newer versions the default is 1500.

The result of hitting the limit is that some tables can be dropped unexpectedly, since `checkPartitionOrOrderByChanged` (called from `checkForced` in cloudquery/plugins/destination/clickhouse/client/migrate.go, line 31 in a69aa8c) can return an error when called further down the line, in cloudquery/plugins/destination/clickhouse/client/migrate.go, line 149 in a69aa8c.

Had to override the `max_concurrent_queries` configuration, so I'm using the docker compose file for that.

The solution implemented in this PR has 2 parts:

`connection.Exec`, `row.Scan` and so on, so I'd basically need to wrap all ClickHouse's driver interfaces.
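The summary notes that retrying around `connection.Exec`, `row.Scan`, and so on would effectively mean wrapping all of ClickHouse's driver interfaces. For illustration only, here is a hedged sketch of why that balloons: wrapping `Exec` via embedding is easy, but errors also surface later from `driver.Row.Scan`, `driver.Rows.Next`, `driver.Batch.Send`, etc., so each returned value would need its own wrapper too. Type names follow clickhouse-go v2's `driver` package; `retryingConn` and the reuse of the `retryQuery` helper sketched earlier are assumptions, not what the PR actually does.

```go
package client

import (
	"context"

	"github.com/ClickHouse/clickhouse-go/v2/lib/driver"
)

// retryingConn embeds driver.Conn, so every method stays available, and only
// Exec is overridden with a retry. Query, QueryRow, PrepareBatch, etc. would
// each need the same treatment, plus wrappers for the Rows/Row/Batch values
// they return — which is the "wrap all driver interfaces" problem mentioned
// in the summary.
type retryingConn struct {
	driver.Conn
}

func (c retryingConn) Exec(ctx context.Context, query string, args ...any) error {
	return retryQuery(ctx, func(ctx context.Context) error {
		return c.Conn.Exec(ctx, query, args...)
	})
}
```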