-
Notifications
You must be signed in to change notification settings - Fork 2.4k
EMQX-10891 route cleanup optimization #12196
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
EMQX-10891 route cleanup optimization #12196
Conversation
apps/emqx/src/emqx_router.erl
Outdated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Note: it's two separate transactions now (mnesia:match_delete/2 is a transaction that can't be nested).
apps/emqx/src/emqx_router.erl
Outdated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Note: both v1 and v2 cleanups are backward compatible at Mnesia level: older nodes that don't have Mnesia patch will commit the transaction correctly (thanks to the fact that Mnesia internally was ready for match_delete implementation).
Older Mria releases would fail on both core and replicant nodes (even though Mria ptotcol version is bumped and new replicant nodes will only connect to new cores), However, the failure will lead to shard restart and re-bootstrapping which will eventually bring all nodes to the consistent state.
28d0fb5 to
7650cbb
Compare
7650cbb to
ec1de6d
Compare
apps/emqx/src/emqx_router.erl
Outdated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Note: lists:foreach/2 is probably too silent.. 🤔 Though we eventually ignore return value of cleanup_routes/1 in emqx_router_helper, it may have sense to return ok/error instead of ok.
ec1de6d to
34cb426
Compare
34cb426 to
57f5e74
Compare
|
There are two more potential smaller optimizations:
|
57f5e74 to
c5964bf
Compare
It may lead to no nodes running the cleanup, if they happen to start the transaction concurrently. For the DS GC worker, we set retries to 1 to account for that race condition, but I'm not sure it'll always make one node win, in particular if the cluster is large enough. 🤔 We'd probably need leader election with consensus to address this more properly, I think. |
Thaks @thalesmg 🙌 Indeed, 0 retries is not an option. 🙈 1 retry definitely doesn't mean there is always only one winner, but it can probably have an affect. The first sleep retry time used by global is random value up to 250 ms and this cleanup transaction can take longer, at least in a highly loaded cluster. |
|
I'm going to merge and try to address the discussed concerns in a separate PR. |
Mria 0.8.1 release implements `mria:match_delete/2` function
This must be much more network efficient since both Mria and Mnesia
need only to replicate one op, e.g.: `{MatchPattern, clear_table}`
instead of replicating one per each key to be deleted.
c5964bf to
965ce5d
Compare
Fixes EMQX-10891 (partially for now)
Release version: v/e5.?
Summary
PR Checklist
Please convert it to a draft if any of the following conditions are not met. Reviewers may skip over until all the items are checked:
changes/(ce|ee)/(feat|perf|fix|breaking)-<PR-id>.en.mdfilesChecklist for CI (.github/workflows) changes
changes/dir for user-facing artifacts update