8000 EMQX-10891 route cleanup optimization by SergeTupchiy · Pull Request #12196 · emqx/emqx · GitHub
[go: up one dir, main page]

Skip to content

Conversation

@SergeTupchiy
Copy link
Contributor
@SergeTupchiy SergeTupchiy commented Dec 18, 2023

Fixes EMQX-10891 (partially for now)

Release version: v/e5.?

Summary

PR Checklist

Please convert it to a draft if any of the following conditions are not met. Reviewers may skip over until all the items are checked:

  • Added tests for the changes
  • Added property-based tests for code which performs user input validation
  • Changed lines covered in coverage report
  • Change log has been added to changes/(ce|ee)/(feat|perf|fix|breaking)-<PR-id>.en.md files
  • For internal contributor: there is a jira ticket to track this change
  • Created PR to emqx-docs if documentation update is required, or link to a follow-up jira ticket
  • Schema changes are backward compatible

Checklist for CI (.github/workflows) changes

  • If changed package build workflow, pass this action (manual trigger)
  • Change log has been added to changes/ dir for user-facing artifacts update

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note: it's two separate transactions now (mnesia:match_delete/2 is a transaction that can't be nested).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note: both v1 and v2 cleanups are backward compatible at Mnesia level: older nodes that don't have Mnesia patch will commit the transaction correctly (thanks to the fact that Mnesia internally was ready for match_delete implementation).
Older Mria releases would fail on both core and replicant nodes (even though Mria ptotcol version is bumped and new replicant nodes will only connect to new cores), However, the failure will lead to shard restart and re-bootstrapping which will eventually bring all nodes to the consistent state.

@SergeTupchiy SergeTupchiy force-pushed the EMQX-10891-route-cleanup-optimization branch 6 times, most recently from 28d0fb5 to 7650cbb Compare December 21, 2023 18:10
@SergeTupchiy SergeTupchiy force-pushed the EMQX-10891-route-cleanup-optimization branch from 7650cbb to ec1de6d Compare December 29, 2023 18:06
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note: lists:foreach/2 is probably too silent.. 🤔 Though we eventually ignore return value of cleanup_routes/1 in emqx_router_helper, it may have sense to return ok/error instead of ok.

@SergeTupchiy SergeTupchiy force-pushed the EMQX-10891-route-cleanup-optimization branch from ec1de6d to 34cb426 Compare December 29, 2023 18:18
@SergeTupchiy SergeTupchiy force-pushed the EMQX-10891-route-cleanup-optimization branch from 34cb426 to 57f5e74 Compare January 9, 2024 17:39
@SergeTupchiy
Copy link
Contributor Author

There are two more potential smaller optimizations:

  • currently emqx_router_helper attempts to cleanup routes twice (because it receives two nodedown messages, at least when a replicant is down), more details about it in another PR thread: feat(routing): add route sync process pool #12237 (comment)
  • every core node attempts to cleanup routes (strictly one after another, thanks to the global lock, but each node tries to acquire the lock indefinitely, since retries defaults to infinity): https://github.com/emqx/emqx/blob/master/apps/emqx/src/emqx_router_helper.erl#L154
    Each call to mnesia:match_delete/2 is a transaction with full table lock and it will cause full table scan on every node each time, so probably it makes sense to minimize its usage.
    It's possible to set the number of global lock retries, e. .g. if we set it to 0, only one node that is the first to win the lock will do the cleanup.
    This is probably less fault tolerant if the lock winner somehow fails to cleanup routes, but we can either retry cleanup on fail, or set retries slightly higher than 0, e.g. to number of core nodes / 2 (however, in this case each can still win the lock and repeat cleanup if faster than sleep time between retries).

@SergeTupchiy SergeTupchiy marked this pull request as ready for review January 9, 2024 19:16
@SergeTupchiy SergeTupchiy requested review from a team, Rory-Z and lafirest as code owners January 9, 2024 19:16
ieQu1
ieQu1 previously approved these changes Jan 9, 2024
@SergeTupchiy SergeTupchiy force-pushed the EMQX-10891-route-cleanup-optimization branch from 57f5e74 to c5964bf Compare January 9, 2024 19:33
ieQu1
ieQu1 previously approved these changes Jan 9, 2024
@thalesmg
Copy link
Contributor
thalesmg commented Jan 9, 2024

It's possible to set the number of global lock retries, e. .g. if we set it to 0, only one node that is the first to win the lock will do the cleanup.

It may lead to no nodes running the cleanup, if they happen to start the transaction concurrently.

For the DS GC worker, we set retries to 1 to account for that race condition, but I'm not sure it'll always make one node win, in particular if the cluster is large enough. 🤔

We'd probably need leader election with consensus to address this more properly, I think.

Rory-Z
Rory-Z previously approved these changes Jan 10, 2024
@SergeTupchiy
Copy link
Contributor Author

It's possible to set the number of global lock retries, e. .g. if we set it to 0, only one node that is the first to win the lock will do the cleanup.

It may lead to no nodes running the cleanup, if they happen to start the transaction concurrently.

For the DS GC worker, we set retries to 1 to account for that race condition, but I'm not sure it'll always make one node win, in particular if the cluster is large enough. 🤔

We'd probably need leader election with consensus to address this more properly, I think.

Thaks @thalesmg 🙌 Indeed, 0 retries is not an option. 🙈 1 retry definitely doesn't mean there is always only one winner, but it can probably have an affect. The first sleep retry time used by global is random value up to 250 ms and this cleanup transaction can take longer, at least in a highly loaded cluster.
Anyway, it's probably not absolutely worth changing it. 🤔

@SergeTupchiy
Copy link
Contributor Author

I'm going to merge and try to address the discussed concerns in a separate PR.
The fact that nodes eagerly cleanup routes of the down node also increases the risk of some late cleanup of new routes occurring after the down node is back and adds these new routes faster than some of the other node does a subsequent cleanup..

Mria 0.8.1 release implements `mria:match_delete/2` function
This must be much more network efficient since both Mria and Mnesia
need only to replicate one op, e.g.: `{MatchPattern, clear_table}`
instead of replicating one per each key to be deleted.
@SergeTupchiy SergeTupchiy dismissed stale reviews from Rory-Z and ieQu1 via 965ce5d January 10, 2024 14:24
@SergeTupchiy SergeTupchiy force-pushed the EMQX-10891-route-cleanup-optimization branch from c5964bf to 965ce5d Compare January 10, 2024 14:24
@SergeTupchiy SergeTupchiy merged commit 8bea071 into emqx:master Jan 10, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants

0