EMQX-10891 route cleanup optimization #12196

SergeTupchiy · 2023-12-18T19:32:51Z

Fixes EMQX-10891 (partially for now)

Release version: v/e5.?

Summary

PR Checklist

Please convert it to a draft if any of the following conditions are not met. Reviewers may skip over until all the items are checked:

Added tests for the changes
Added property-based tests for code which performs user input validation
Changed lines covered in coverage report
Change log has been added to changes/(ce|ee)/(feat|perf|fix|breaking)-<PR-id>.en.md files
For internal contributor: there is a jira ticket to track this change
Created PR to emqx-docs if documentation update is required, or link to a follow-up jira ticket
Schema changes are backward compatible

Checklist for CI (.github/workflows) changes

If changed package build workflow, pass this action (manual trigger)
Change log has been added to changes/ dir for user-facing artifacts update

SergeTupchiy · 2023-12-18T19:33:53Z

apps/emqx/src/emqx_router.erl

Note: it's two separate transactions now (mnesia:match_delete/2 is a transaction that can't be nested).

SergeTupchiy · 2023-12-18T19:38:06Z

apps/emqx/src/emqx_router.erl

Note: both v1 and v2 cleanups are backward compatible at Mnesia level: older nodes that don't have Mnesia patch will commit the transaction correctly (thanks to the fact that Mnesia internally was ready for match_delete implementation).
Older Mria releases would fail on both core and replicant nodes (even though Mria ptotcol version is bumped and new replicant nodes will only connect to new cores), However, the failure will lead to shard restart and re-bootstrapping which will eventually bring all nodes to the consistent state.

SergeTupchiy · 2023-12-29T18:11:30Z

apps/emqx/src/emqx_router.erl

Note: lists:foreach/2 is probably too silent.. 🤔 Though we eventually ignore return value of cleanup_routes/1 in emqx_router_helper, it may have sense to return ok/error instead of ok.

SergeTupchiy · 2024-01-09T19:16:24Z

There are two more potential smaller optimizations:

currently emqx_router_helper attempts to cleanup routes twice (because it receives two nodedown messages, at least when a replicant is down), more details about it in another PR thread: feat(routing): add route sync process pool #12237 (comment)
every core node attempts to cleanup routes (strictly one after another, thanks to the global lock, but each node tries to acquire the lock indefinitely, since retries defaults to infinity): https://github.com/emqx/emqx/blob/master/apps/emqx/src/emqx_router_helper.erl#L154
Each call to mnesia:match_delete/2 is a transaction with full table lock and it will cause full table scan on every node each time, so probably it makes sense to minimize its usage.
It's possible to set the number of global lock retries, e. .g. if we set it to 0, only one node that is the first to win the lock will do the cleanup.
This is probably less fault tolerant if the lock winner somehow fails to cleanup routes, but we can either retry cleanup on fail, or set retries slightly higher than 0, e.g. to number of core nodes / 2 (however, in this case each can still win the lock and repeat cleanup if faster than sleep time between retries).

thalesmg · 2024-01-09T20:58:06Z

It's possible to set the number of global lock retries, e. .g. if we set it to 0, only one node that is the first to win the lock will do the cleanup.

It may lead to no nodes running the cleanup, if they happen to start the transaction concurrently.

For the DS GC worker, we set retries to 1 to account for that race condition, but I'm not sure it'll always make one node win, in particular if the cluster is large enough. 🤔

We'd probably need leader election with consensus to address this more properly, I think.

SergeTupchiy · 2024-01-10T10:34:47Z

It's possible to set the number of global lock retries, e. .g. if we set it to 0, only one node that is the first to win the lock will do the cleanup.

It may lead to no nodes running the cleanup, if they happen to start the transaction concurrently.

For the DS GC worker, we set retries to 1 to account for that race condition, but I'm not sure it'll always make one node win, in particular if the cluster is large enough. 🤔

We'd probably need leader election with consensus to address this more properly, I think.

Thaks @thalesmg 🙌 Indeed, 0 retries is not an option. 🙈 1 retry definitely doesn't mean there is always only one winner, but it can probably have an affect. The first sleep retry time used by global is random value up to 250 ms and this cleanup transaction can take longer, at least in a highly loaded cluster.
Anyway, it's probably not absolutely worth changing it. 🤔

SergeTupchiy · 2024-01-10T14:16:21Z

I'm going to merge and try to address the discussed concerns in a separate PR.
The fact that nodes eagerly cleanup routes of the down node also increases the risk of some late cleanup of new routes occurring after the down node is back and adds these new routes faster than some of the other node does a subsequent cleanup..

Mria 0.8.1 release implements `mria:match_delete/2` function

This must be much more network efficient since both Mria and Mnesia need only to replicate one op, e.g.: `{MatchPattern, clear_table}` instead of replicating one per each key to be deleted.

SergeTupchiy commented Dec 18, 2023

View reviewed changes

SergeTupchiy force-pushed the EMQX-10891-route-cleanup-optimization branch 6 times, most recently from 28d0fb5 to 7650cbb Compare December 21, 2023 18:10

SergeTupchiy force-pushed the EMQX-10891-route-cleanup-optimization branch from 7650cbb to ec1de6d Compare December 29, 2023 18:06

SergeTupchiy commented Dec 29, 2023

View reviewed changes

SergeTupchiy force-pushed the EMQX-10891-route-cleanup-optimization branch from ec1de6d to 34cb426 Compare December 29, 2023 18:18

SergeTupchiy force-pushed the EMQX-10891-route-cleanup-optimization branch from 34cb426 to 57f5e74 Compare January 9, 2024 17:39

SergeTupchiy marked this pull request as ready for review January 9, 2024 19:16

SergeTupchiy requested review from a team, Rory-Z and lafirest as code owners January 9, 2024 19:16

ieQu1 previously approved these changes Jan 9, 2024

View reviewed changes

SergeTupchiy dismissed ieQu1’s stale review via c5964bf January 9, 2024 19:33

SergeTupchiy force-pushed the EMQX-10891-route-cleanup-optimization branch from 57f5e74 to c5964bf Compare January 9, 2024 19:33

ieQu1 previously approved these changes Jan 9, 2024

View reviewed changes

Rory-Z previously approved these changes Jan 10, 2024

View reviewed changes

SergeTupchiy added 3 commits January 10, 2024 16:23

ci: upgrade to OTP-26.1.2-3 (mnesia:match_delete/2 function added)

c77cd69

chore: upgrade ekka to 0.18.1 (mria 0.8.1)

febaaef

Mria 0.8.1 release implements `mria:match_delete/2` function

perf: use mria:match_delete/2 to cleanup routes

965ce5d

This must be much more network efficient since both Mria and Mnesia need only to replicate one op, e.g.: `{MatchPattern, clear_table}` instead of replicating one per each key to be deleted.

SergeTupchiy dismissed stale reviews from Rory-Z and ieQu1 via 965ce5d January 10, 2024 14:24

SergeTupchiy force-pushed the EMQX-10891-route-cleanup-optimization branch from c5964bf to 965ce5d Compare January 10, 2024 14:24

ieQu1 approved these changes Jan 10, 2024

View reviewed changes

SergeTupchiy merged commit 8bea071 into emqx:master Jan 10, 2024

BrewTestBot mentioned this pull request Feb 1, 2024

emqx 5.5.0 Homebrew/homebrew-core#161557

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

EMQX-10891 route cleanup optimization #12196

EMQX-10891 route cleanup optimization #12196

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

4 participants

EMQX-10891 route cleanup optimization #12196

EMQX-10891 route cleanup optimization #12196

Uh oh!

Conversation

Uh oh!

Summary

PR Checklist

Checklist for CI (.github/workflows) changes

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

4 participants