0817 improve topic-index search by zmstone · Pull Request #11495 · emqx/emqx · GitHub

0817 improve topic-index search #11495

Closed

zmstone wants to merge 10 commits into emqx:release-52 from zmstone:0817-use-gb_tree-for-topic-index

Member

zmstone commented

This PR is an attempt to further optimize topic-index search performance on #11396
Main changes are:

Optimized the trie-search algorithm to avoid repeated prefix matches.
Made an abstraction of the trie-search algorithm.
Implemented a gb_trees-based index, which might be more suitable for the rule engine than the ets-based one.
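
To illustrate what such an abstraction could look like, here is a minimal sketch in which the search only needs a "next key" function, so the same algorithm can run over an ordered_set ets table or a gb_trees tree. The module and function names (`next_sketch`, `ets_nextf/1`, `gbt_nextf/1`) are hypothetical and are not taken from the PR; only `ets:next/2`, `gb_trees:iterator_from/2` and `gb_trees:next/1` are real OTP calls.

```erlang
-module(next_sketch).
-export([ets_nextf/1, gbt_nextf/1, demo/0]).

%% NextF for an ordered_set ETS table: ets:next/2 accepts keys that are
%% not present in an ordered_set and returns the next larger key.
ets_nextf(Tab) ->
    fun(Key) -> ets:next(Tab, Key) end.

%% NextF for a gb_trees tree: emulate ets:next/2 with an iterator that
%% starts at the first key >= Key, skipping Key itself if it is present.
gbt_nextf(Tree) ->
    fun(Key) ->
        case gb_trees:next(gb_trees:iterator_from(Key, Tree)) of
            {K, _V, Iter} when K =:= Key -> first_key(gb_trees:next(Iter));
            Other -> first_key(Other)
        end
    end.

first_key({K, _V, _Iter}) -> K;
first_key(none) -> '$end_of_table'.

%% Builds the same three keys in both structures and probes them.
demo() ->
    Keys = [<<"a">>, <<"c">>, <<"e">>],
    Tab = ets:new(sketch, [ordered_set]),
    true = ets:insert(Tab, [{K, true} || K <- Keys]),
    Tree = lists:foldl(
        fun(K, T) -> gb_trees:insert(K, true, T) end, gb_trees:empty(), Keys
    ),
    EtsNext = ets_nextf(Tab),
    GbtNext = gbt_nextf(Tree),
    Result = [{EtsNext(P), GbtNext(P)} || P <- [<<"a">>, <<"b">>, <<"e">>]],
    true = ets:delete(Tab),
    Result.
```

Both functions should agree on every probe, which is what lets a search routine stay unaware of the backing store.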

Summary

`🤖 Generated by Copilot at 70bbed4`

This pull request refactors the topic index modules and tests to use a common trie-search logic and adds a new topic index implementation using gb_trees and persistent_term. The purpose is to improve code reuse, consistency, readability, and performance of topic matching. The affected files are emqx_topic_index.erl, emqx_topic_index_SUITE.erl, emqx_topic_gbt.erl, emqx_trie_search.erl, and emqx_cluster_rpc.erl.
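
As a rough illustration of the gb_trees plus persistent_term combination mentioned in the summary, the sketch below keeps a whole tree under one persistent_term key. The module name `gbt_pt_sketch` and its API are assumptions made for this example and do not reproduce emqx_topic_gbt.erl.

```erlang
-module(gbt_pt_sketch).
-export([insert/3, delete/2, lookup/2]).

%% Name identifies one index; the tree lives under {?MODULE, Name}.
%% Each write is a read-modify-write, so this assumes a single writer.
insert(Name, Key, Value) ->
    Tree = gb_trees:enter(Key, Value, get_tree(Name)),
    persistent_term:put({?MODULE, Name}, Tree).

delete(Name, Key) ->
    Tree = gb_trees:delete_any(Key, get_tree(Name)),
    persistent_term:put({?MODULE, Name}, Tree).

%% Returns {value, V} or none, as gb_trees:lookup/2 does.
lookup(Name, Key) ->
    gb_trees:lookup(Key, get_tree(Name)).

%% Fall back to an empty tree when nothing has been stored yet.
get_tree(Name) ->
    persistent_term:get({?MODULE, Name}, gb_trees:empty()).
```

Reads from persistent_term do not copy the term, while updates are comparatively expensive, so a layout like this is only reasonable when the index changes rarely and is read often.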

PR Checklist

Please convert it to a draft if any of the following conditions are not met. Reviewers may skip over until all the items are checked:

Added tests for the changes
Added property-based tests for code which performs user input validation
Changed lines covered in coverage report
Change log has been added to changes/(ce|ee)/(feat|perf|fix)-<PR-id>.en.md files
For internal contributor: there is a jira ticket to track this change
Created PR to emqx-docs if documentation update is required, or link to a follow-up jira ticket
Schema changes are backward compatible

Checklist for CI (.github/workflows) changes

If changed package build workflow, pass this action (manual trigger)
Change log has been added to changes/ dir for user-facing artifacts update

zmstone requested review from a team and lafirest as code owners

August 22, 2023 15:17

zmstone force-pushed the 0817-use-gb_tree-for-topic-index branch from 70bbed4 to e7abc4d Compare

August 22, 2023 16:04

thalesmg reviewed

apps/emqx/src/emqx_trie_search.erl (three outdated review threads, resolved)

zmstone added 5 commits

August 24, 2023 12:25


          chore: fix a typo in log message

ae094e3


          test: add more debug output

6b152b3


          refactor(topic_index): optimize trie-search performance

f4c8c6b


          refactor(topic_index): no forced ceiling entry in index table

a1e6635


          refactor(topic_index): remove more unnecessary next calls

a30d87e

also avoid using records (setelement) for recursive return values

zmstone force-pushed the 0817-use-gb_tree-for-topic-index branch from 50af2d7 to a30d87e Compare

August 24, 2023 10:25


          refactor(topic_index): less special handling for leading $ words

62423b0

zmstone force-pushed the 0817-use-gb_tree-for-topic-index branch from 3bfd47e to 62423b0 Compare

August 24, 2023 11:30

thalesmg reviewed

apps/emqx/src/emqx_trie_search.erl (review thread, resolved)

apps/emqx/src/emqx_trie_search.erl Outdated

+              %% @doc Entrypoint of the search for a given topic.
+              search(Topic, NextF, Opts) ->
+                  Words = words(Topic),

Contributor

thalesmg

Apparently, the search should only receive concrete topics (i.e.: not topic filters containing # or +). Yet, this function can receive filters and yield possibly strange results. Should we check for wildcard characters before starting the search?

Member Author

zmstone

The topic validation should already have been done at a higher level, e.g. when parsing the MQTT packet.
Doing it again here is a waste.
Also, if this code is executed by a single process or a pool of processes, putting the validation at this low level makes it less scalable.

Member Author

zmstone

Sorry, since we are already traversing the words after split, it's not that expensive to add an assertion. Will add it.
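
A minimal sketch of such an assertion, assuming the topic is split on `/` into binary words and that a wildcard word should crash the caller with `badarg`. The module name and the error reason are assumptions for illustration, not the code that was eventually added.

```erlang
-module(topic_assert_sketch).
-export([words/1]).

%% Split a topic into words and fail fast if any word is a wildcard.
%% The split already walks every word, so the extra check is cheap.
words(Topic) when is_binary(Topic) ->
    Words = binary:split(Topic, <<"/">>, [global]),
    lists:foreach(fun assert_concrete/1, Words),
    Words.

assert_concrete(<<"+">>) -> error(badarg);
assert_concrete(<<"#">>) -> error(badarg);
assert_concrete(_Word) -> ok.
```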

apps/emqx/src/emqx_trie_search.erl Outdated

+                  end.
+              %% Try to use '+' as the next word in the prefix.
+              search_plus(C, [W, X | Words], [W, X | Filter], RPrefix, T, Acc) ->

Contributor

thalesmg

Q: for this quick append optimization, why do we need the next word to also be equal?

i.e., couldn't we have:

Suggested change

-             search_plus(C, [W, X | Words], [W, X | Filter], RPrefix, T, Acc) ->
+             search_plus(C, [W, X | Words], [W, Y | Filter], RPrefix, T, Acc) ->

?

Contributor

thalesmg

Also: if we do encounter the situation [W, X | Words], [W, X | Filter], couldn't we fast-forward both W and X in one go?

Member Author

zmstone

Good point, though this part has now been rewritten.

apps/emqx/src/emqx_trie_search.erl Outdated

+              %% Compare prefix word then the next words in suffix against the search-target
+              %% topic or topic-filter.
+              compare(_, NotFilter, _) when is_binary(NotFilter) ->

Contributor

thalesmg

Trying to add more docs to solidify my own understanding. {match, full} and {match, prefix} are more intuitive, but the others are less so on their own.

Suggested change

-             compare(_, NotFilter, _) when is_binary(NotFilter) ->
+             %% Note: this function might also be fed a `+' as the first word of the "target topic", so
+             %% the roles of target and filter are a bit fuzzy here.
+             %% - `lower': the target topic is lexicographically smaller than the _current_ topic
+             %%   filter.  Therefore it's no use to continue traversing the subscription table.
+             %% - `higher': the target topic is lexicographically greater than the _current_ topic
+             %%   filter.  Therefore we attempt to go to the next filter in the table, as there's a
+             %%   chance it'll match the target topic.
+             %% - `shorter': the first word of the target topic exactly matches the first word of the
+             %%   _current_ topic filter, and there are both more target topic and filter words to
+             %%   compare.  Since we try to fast-forward exact word matches, if we reach this condition
+             %%   it means we might be comparing wildcards with concrete words, and need to traverse
+             %%   the table further to check what's actually subscribed to.
+             compare(_, NotFilter, _) when is_binary(NotFilter) ->

Member Author

zmstone

Tried to document the new compare/3 function.
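
For orientation, the relation that an index search has to reproduce is plain MQTT wildcard matching. The sketch below is a naive word-by-word matcher, not the PR's compare/3; the module name is hypothetical. It also applies the MQTT rule that a filter starting with a wildcard does not match a topic whose first word begins with `$`.

```erlang
-module(match_sketch).
-export([match/2]).

%% match(Topic, Filter): both are binaries; Topic is assumed concrete.
%% Filters that begin with a wildcard never match `$'-prefixed topics.
match(<<$$, _/binary>>, <<$+, _/binary>>) -> false;
match(<<$$, _/binary>>, <<$#, _/binary>>) -> false;
match(Topic, Filter) ->
    match_words(split(Topic), split(Filter)).

split(Bin) -> binary:split(Bin, <<"/">>, [global]).

%% '#' as the last filter word matches all remaining levels, including none.
match_words(_, [<<"#">>]) -> true;
%% '+' matches exactly one level.
match_words([_ | Ws], [<<"+">> | Fs]) -> match_words(Ws, Fs);
%% Identical words match; continue with the rest.
match_words([W | Ws], [W | Fs]) -> match_words(Ws, Fs);
match_words([], []) -> true;
match_words(_, _) -> false.
```

A matcher like this checks one filter at a time; the point of an ordered index is to avoid running such a check against every stored filter.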

apps/emqx/src/emqx_trie_search.erl

+                  %% found a topic match
+                  match_topics(C, Topic, NextF(Key), add(C, Acc, Key));
+              match_topics(#ctx{nextf = NextF} = C, Topic, {F, _}, Acc) when F < Topic ->
+                  %% the last key is a filter, try jump to the topic

Contributor

thalesmg

Q: since we're searching for topic filters, shouldn't we also add F to the result set as a valid match?

Member Author

zmstone

Here, we have finished searching for filters (wildcards) and are starting to search for non-wildcards.
If the last key was a match, it should already have been added to Acc.
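
One way to read the `F < Topic` guard, under the assumption (not confirmed in this thread) that filter keys are stored as word lists and plain topic keys as binaries: in Erlang term order every list sorts before every binary, so all filters precede all plain topics, and the search can jump straight to the topic's first possible key. The key shape `{TopicOrWords, {Id}}` and the module name are hypothetical.

```erlang
-module(order_sketch).
-export([demo/0]).

demo() ->
    Keys = [
        {<<"t/1">>, {id3}},
        {[<<"t">>, '+'], {id1}},
        {<<"a/b">>, {id4}},
        {[<<"t">>, '#'], {id2}}
    ],
    Tree = lists:foldl(
        fun(K, T) -> gb_trees:insert(K, true, T) end, gb_trees:empty(), Keys
    ),
    %% The two smallest keys are the list-shaped (filter) ones.
    FiltersFirst = lists:all(fun is_filter/1, lists:sublist(gb_trees:keys(Tree), 2)),
    %% {} is smaller than any non-empty tuple, so {Topic, {}} is a lower
    %% bound for every key of Topic; jump there instead of stepping.
    Jumped =
        case gb_trees:next(gb_trees:iterator_from({<<"t/1">>, {}}, Tree)) of
            {Key, _V, _Iter} -> Key;
            none -> none
        end,
    {FiltersFirst, Jumped}.

is_filter({Words, _Id}) -> is_list(Words).
```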


          feat(topicidx): iterate on trie search implementation

cf45e80

This improves matching performance and decreases GC pressure on
synthetic workloads.

keynslug mentioned this pull request

feat(topicidx): iterate on trie search implementation #11517

Merged

7 tasks

zmstone added 3 commits

August 25, 2023 09:23


          refactor(topic_index): simplify compare function

ecac673


          docs: refine code comments

2332eb2


          chore(topic_index): add topic validation

Member Author

zmstone commented

moved to: #11517

zmstone closed this
