8000 0817 improve topic-index search by zmstone · Pull Request #11495 · emqx/emqx · GitHub
[go: up one dir, main page]

Skip to content

Conversation

@zmstone
Copy link
Member
@zmstone zmstone commented Aug 22, 2023

This PR is an attempt to further optimize topic-index search performance on #11396
Main changes are:

  1. Optimized the trie-search algorithm to avoid repeated prefix matches
  2. Made an abstraction of the trie-search algorithm.
  3. Implemented a gb_trees based index, which might be more suitable for rule-engine than ets based.

Summary

🤖 Generated by Copilot at 70bbed4

This pull request refactors the topic index modules and tests to use a common trie-search logic and adds a new topic index implementation using gb_trees and persistent_term. The purpose is to improve code reuse, consistency, readability, and performance of topic matching. The affected files are emqx_topic_index.erl, emqx_topic_index_SUITE.erl, emqx_topic_gbt.erl, emqx_trie_search.erl, and emqx_cluster_rpc.erl.

PR Checklist

Please convert it to a draft if any of the following conditions are not met. Reviewers may skip over until all the items are checked:

  • Added tests for the changes
  • Added property-based tests for code which performs user input validation
  • Changed lines covered in coverage report
  • Change log has been added to changes/(ce|ee)/(feat|perf|fix)-<PR-id>.en.md files
  • For internal contributor: there is a jira ticket to track this change
  • Created PR to emqx-docs if documentation update is required, or link to a follow-up jira ticket
  • Schema changes are backward compatible

Checklist for CI (.github/workflows) changes

  • If changed package build workflow, pass this action (manual trigger)
  • Change log has been added to changes/ dir for user-facing artifacts update

@zmstone zmstone requested review from a team and lafirest as code owners August 22, 2023 15:17
@zmstone zmstone force-pushed the 0817-use-gb_tree-for-topic-index branch from 70bbed4 to e7abc4d Compare August 22, 2023 16:04
@zmstone zmstone force-pushed the 0817-use-gb_tree-for-topic-index branch from 50af2d7 to a30d87e Compare August 24, 2023 10:25
@zmstone zmstone force-pushed the 0817-use-gb_tree-for-topic-index branch from 3bfd47e to 62423b0 Compare August 24, 2023 11:30

%% @doc Entrypoint of the search for a given topic.
search(Topic, NextF, Opts) ->
Words = words(Topic),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Apparently, the search should only receive concrete topics (i.e.: not topic filters containing # or +). Yet, this function can receive filters and yield possibly strange results. Should we check for wildcard characters before starting the search?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The topic validation should have been done at higher level. e.g. when parsing MQTT packet.
Doing it again here is a waste.
Also, in case this code is executed by a single process or a pool of processes, it's less scalable to put the validations at this low level.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry, since we are already traversing the words after split, it's not that expensive to add an assertion. Will add it.

end.

%% Try to use '+' as the next word in the prefix.
search_plus(C, [W, X | Words], [W, X | Filter], RPrefix, T, Acc) ->
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Q: for this quick append optimization, why do we need the next word to also be equal?

i.e., couldn't we have:

Suggested change
search_plus(C, [W, X | Words], [W, X | Filter], RPrefix, T, Acc) ->
search_plus(C, [W, X | Words], [W, Y | Filter], RPrefix, T, Acc) ->

?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also: if we do encounter the situation [W, X | Words], [W, X | Filter], couldn't we fast-forward both W and X in one go?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

good point.
now this part is re-written though.


%% Compare prefix word then the next words in suffix against the search-target
%% topic or topic-filter.
compare(_, NotFilter, _) when is_binary(NotFilter) ->
Copy link
Contributor
@thalesmg thalesmg Aug 24, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Trying to add more docs to solidify my own understanding. {match, full} and {match, prefix} are more intuitive, but the others are less so on their own.

Suggested change
compare(_, NotFilter, _) when is_binary(NotFilter) ->
%% Note: this function might also be fed a `+' as the first word of the "target topic", so
%% the roles of target and filter are a bit fuzzy here.
%% - `lower': the target topic is lexicographically smaller than the _current_ topic
%% filter. Therefore it's no use to continue traversing the subscription table.
%% - `higher': the target topic is lexicographically greater than the _current_ topic
%% filter. Therefore we attempt to go to the next filter in the table, as there's a
%% chance it'll match the target topic.
%% - `shorter': the first word of the target topic exactly matches the first word of the
%% _current_ topic filter, and there are both more target topic and filter words to
%% compare. Since we try to fast-forward exact word matches, if we reach this condition
%% it means we might be comparing wildcards with a concrete words, and need to traverse
%% the table further to check what's actually subscribed to.
compare(_, NotFilter, _) when is_binary(NotFilter) ->

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Tried to document the new compare/3 function.

%% found a topic match
match_topics(C, Topic, NextF(Key), add(C, Acc, Key));
match_topics(#ctx{nextf = NextF} = C, Topic, {F, _}, Acc) when F < Topic ->
%% the last key is a filter, try jump to the topic
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Q: since we're searching for topic filters, shouldn't we also add F to the result set as a valid match?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here, we have done searching for filters (wildcards), but start searching for non-wildcads.
If the last is a match, it should have already been added to Acc.

This improves matching performance and decreases GC pressure on
synthetic workloads.
@zmstone
Copy link
Member Author
zmstone commented Aug 25, 2023

moved to: #11517

@zmstone zmstone closed this Aug 25, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants

0