-
Notifications
You must be signed in to change notification settings - Fork 2.4k
0817 improve topic-index search #11495
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
0817 improve topic-index search #11495
Conversation
70bbed4 to
e7abc4d
Compare
also avoid using records (setelement) for recursive return values
50af2d7 to
a30d87e
Compare
3bfd47e to
62423b0
Compare
apps/emqx/src/emqx_trie_search.erl
Outdated
|
|
||
| %% @doc Entrypoint of the search for a given topic. | ||
| search(Topic, NextF, Opts) -> | ||
| Words = words(Topic), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Apparently, the search should only receive concrete topics (i.e.: not topic filters containing # or +). Yet, this function can receive filters and yield possibly strange results. Should we check for wildcard characters before starting the search?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The topic validation should have been done at higher level. e.g. when parsing MQTT packet.
Doing it again here is a waste.
Also, in case this code is executed by a single process or a pool of processes, it's less scalable to put the validations at this low level.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry, since we are already traversing the words after split, it's not that expensive to add an assertion. Will add it.
apps/emqx/src/emqx_trie_search.erl
Outdated
| end. | ||
|
|
||
| %% Try to use '+' as the next word in the prefix. | ||
| search_plus(C, [W, X | Words], [W, X | Filter], RPrefix, T, Acc) -> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Q: for this quick append optimization, why do we need the next word to also be equal?
i.e., couldn't we have:
| search_plus(C, [W, X | Words], [W, X | Filter], RPrefix, T, Acc) -> | |
| search_plus(C, [W, X | Words], [W, Y | Filter], RPrefix, T, Acc) -> |
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also: if we do encounter the situation [W, X | Words], [W, X | Filter], couldn't we fast-forward both W and X in one go?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
good point.
now this part is re-written though.
apps/emqx/src/emqx_trie_search.erl
Outdated
|
|
||
| %% Compare prefix word then the next words in suffix against the search-target | ||
| %% topic or topic-filter. | ||
| compare(_, NotFilter, _) when is_binary(NotFilter) -> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Trying to add more docs to solidify my own understanding. {match, full} and {match, prefix} are more intuitive, but the others are less so on their own.
| compare(_, NotFilter, _) when is_binary(NotFilter) -> | |
| %% Note: this function might also be fed a `+' as the first word of the "target topic", so | |
| %% the roles of target and filter are a bit fuzzy here. | |
| %% - `lower': the target topic is lexicographically smaller than the _current_ topic | |
| %% filter. Therefore it's no use to continue traversing the subscription table. | |
| %% - `higher': the target topic is lexicographically greater than the _current_ topic | |
| %% filter. Therefore we attempt to go to the next filter in the table, as there's a | |
| %% chance it'll match the target topic. | |
| %% - `shorter': the first word of the target topic exactly matches the first word of the | |
| %% _current_ topic filter, and there are both more target topic and filter words to | |
| %% compare. Since we try to fast-forward exact word matches, if we reach this condition | |
| %% it means we might be comparing wildcards with a concrete words, and need to traverse | |
| %% the table further to check what's actually subscribed to. | |
| compare(_, NotFilter, _) when is_binary(NotFilter) -> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Tried to document the new compare/3 function.
| %% found a topic match | ||
| match_topics(C, Topic, NextF(Key), add(C, Acc, Key)); | ||
| match_topics(#ctx{nextf = NextF} = C, Topic, {F, _}, Acc) when F < Topic -> | ||
| %% the last key is a filter, try jump to the topic |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Q: since we're searching for topic filters, shouldn't we also add F to the result set as a valid match?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Here, we have done searching for filters (wildcards), but start searching for non-wildcads.
If the last is a match, it should have already been added to Acc.
This improves matching performance and decreases GC pressure on synthetic workloads.
|
moved to: #11517 |
This PR is an attempt to further optimize topic-index search performance on #11396
Main changes are:
gb_treesbased index, which might be more suitable for rule-engine thanetsbased.Summary
🤖 Generated by Copilot at 70bbed4
This pull request refactors the topic index modules and tests to use a common trie-search logic and adds a new topic index implementation using gb_trees and persistent_term. The purpose is to improve code reuse, consistency, readability, and performance of topic matching. The affected files are
emqx_topic_index.erl,emqx_topic_index_SUITE.erl,emqx_topic_gbt.erl,emqx_trie_search.erl, andemqx_cluster_rpc.erl.PR Checklist
Please convert it to a draft if any of the following conditions are not met. Reviewers may skip over until all the items are checked:
changes/(ce|ee)/(feat|perf|fix)-<PR-id>.en.mdfilesChecklist for CI (.github/workflows) changes
changes/dir for user-facing artifacts update