py: Faster qstr search by amirgon · Pull Request #6896 · micropython/micropython · GitHub

py: Faster qstr search #6896


Closed

amirgon wants to merge 7 commits into master from qstr_bsearch

Conversation

amirgon
Contributor
@amirgon amirgon commented Feb 13, 2021

Motivation: qstr search scales poorly - see https://forum.micropython.org/viewtopic.php?f=2&t=9748

In short, when there are many qstrs, performance is poor because of the way they are searched.
This affects, for example, load time of mpy modules since strings are replaced by qstrs during import.
On ESP32 it can take many seconds to load a few .mpy files in the presence of large native libraries (LVGL, for example).


Today the qstr implementation scans strings sequentially.
When there are many strings this can become very inefficient.
This change improves qstr search performance by using binary search in
sorted qstr pools, when possible.

This change introduces an option to create a sorted string pool, which
is then searched by a binary search instead of a sequential search.

A qstr pool can be either "sorted" or "unsorted", where an unsorted pool
is searched sequentially, as today.
Native modules (MP_ROM_QSTR) and frozen modules generate sorted pools.
Currently runtime strings are unsorted.

The constant string pool is split in two and a new pool is introduced,
"special_const_pool". This is required because the first sequence of
strings already requires special ordering and is therefore created
unsorted, while the rest of the constants are generated sorted.

qstr_find_strn searches strings in each pool. If the pool is sorted and
larger than a threshold, it is searched using binary search instead
of sequential search, significantly improving performance.
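
The scheme described above can be sketched in plain Python (illustrative only; `Pool`, `find_qstr`, and `SEARCH_THRESHOLD` are invented names here, and the real implementation is C code in py/qstr.c operating on qstr_pool_t structs):

```python
import bisect

SEARCH_THRESHOLD = 10  # below this size a linear scan tends to win

class Pool:
    def __init__(self, strings, first_qstr, is_sorted):
        self.strings = strings        # interned string data
        self.first_qstr = first_qstr  # index of this pool's first qstr
        self.is_sorted = is_sorted    # True for compile-time pools

def find_qstr(pools, s):
    """Return the qstr (a stable integer index) of s, or None."""
    for pool in pools:
        if pool.is_sorted and len(pool.strings) > SEARCH_THRESHOLD:
            # sorted pool above the threshold: binary search
            i = bisect.bisect_left(pool.strings, s)
            if i < len(pool.strings) and pool.strings[i] == s:
                return pool.first_qstr + i
        else:
            # unsorted (e.g. runtime) pool: sequential search, as today
            for i, q in enumerate(pool.strings):
                if q == s:
                    return pool.first_qstr + i
    return None
```

A qstr here is simply `pool.first_qstr + i`, which is why a pool's internal order must never change once qstrs from it are in use.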

@stinos
Contributor
stinos commented Feb 14, 2021

Do you have some measurements, preferably from different ports, showing how significant the improvement is in different situations? And if the search is faster what is the effect on creating and adding new qstr values at runtime?

@amirgon
Contributor Author
amirgon commented Feb 14, 2021

Do you have some measurements, preferably from different ports, showing how significant the improvement is in different situations?

Unfortunately I can only provide measurements for ESP32.
On ESP32 @ 240MHz I'm importing a 10KB view.mpy file shortly after reset:

>>> ticks_ms();import view;ticks_ms()
  • Without this change the import completes in 3590ms
  • With this change the import completes in 169ms

That's a 21× performance gain.
Measuring this multiple times I'm getting the same results with up to 1 ms difference.

The firmware is Micropython 1.14 with LVGL native module.
LVGL is a large library. Looking at qstrdefs.generated.h there are 4450 QDEFs. It's clear that scanning all these strings sequentially for each and every str-->qstr conversion is inefficient, particularly since the const qstr pool is stored on flash.

And if the search is faster what is the effect on creating and adding new qstr values at runtime?

There is no effect on creating and adding new qstr values at runtime, because runtime qstr pools remain unsorted and are scanned in the same manner as today.
Only qstr pools generated at compile time are sorted, so the benefit becomes significant when scaling native and frozen modules.

@stinos
Contributor
stinos commented Feb 15, 2021

That's a 21× performance gain.

Not bad :)

runtime qstr pools remain unsorted

Oops sorry missed that. Any particular reason for this?

@amirgon
Contributor Author
amirgon commented Feb 15, 2021

runtime qstr pools remain unsorted

Oops sorry missed that. Any particular reason for this?

Yes.

  • qstrs are indices into qstr pools. So once a pool is in use (the code refers to specific qstrs in the pool), we can no longer change its sort order otherwise existing qstrs used in the code would no longer be correct.
    To overcome this we would either need to add an indirection level when accessing qstrs (that would cost performance and RAM) or select a different solution such as hash-table instead of sorted array. Hash-table is a valid alternative, but has its disadvantages (more costly in RAM, harder to tune optimally, more complex to implement).
  • From a practical point of view, when scaling up I think we can expect the mass of qstrs to come from native or frozen modules, and the minority to be runtime qstrs. So most of the benefit would come from improving access to native/frozen qstrs.

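The first point, that a qstr is a bare index into the pool, can be seen with a toy model (illustrative Python only; the names are invented):

```python
# A qstr is a bare integer index into the pool; compiled code stores
# these integers directly, so the pool's order must never change.
pool = ["print", "len", "range"]  # runtime interning order (unsorted)
q_len = pool.index("len")         # q_len == 1, now baked into "bytecode"

# If the pool were re-sorted after q_len was handed out, the stored
# index would resolve to a different string:
pool_sorted = sorted(pool)        # ["len", "print", "range"]
assert pool_sorted[q_len] == "print"  # no longer "len"
```

Avoiding this would need either an indirection table mapping old indices to sorted positions, or a different structure such as a hash table, as the comment explains.
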
@stinos
Contributor
stinos commented Feb 15, 2021

Thanks for the explanation, makes sense. You might want to fix code formatting and some grammar issues in comments.

@amirgon
Contributor Author
amirgon commented Feb 15, 2021

You might want to fix code formatting and some grammar issues in comments.

Done.

amirgon added a commit to lvgl/lv_micropython that referenced this pull request Feb 27, 2021
Pending upstream PR: micropython#6896

This optimization is important because LVGL creates thousands
of qstrs and 'import' can take seconds instead of milliseconds without
better qstr search.
Currently this PR is pending. Once it is merged or some other
optimization that solves this problem is merged, this change
could be reverted and replaced by the upstream solution.


(cherry picked from commit 8532c21)
@dpgeorge dpgeorge added the py-core Relates to py/ directory in source label Nov 29, 2021
@Gadgetoid
Contributor

Just to weigh in here, I've been shipping a MicroPython build for PicoSystem that includes this patch since October 2021 and - so far - have had no problems with it. The speedup (alongside doing some heinous things with builtins) is essential for our wide/flat API surface.

@thyrrestrup

Regarding indirection when accessing qstrs, see #8191

@dpgeorge
Member
dpgeorge commented Feb 9, 2022

Running some benchmarks on this PR, it looks very good. On PYBv1.1, which has about 1600 static qstrs:

diff of scores (higher is better)
N=100 M=100                  pyb-p0 ->     pyb-p1         diff      diff% (error%)
bm_chaos.py                  369.56 ->     374.23 :      +4.67 =  +1.264% (+/-0.02%)
bm_fannkuch.py                77.51 ->      78.45 :      +0.94 =  +1.213% (+/-0.00%)
bm_fft.py                   2558.76 ->    2577.15 :     +18.39 =  +0.719% (+/-0.00%)
bm_float.py                 5922.05 ->    6143.30 :    +221.25 =  +3.736% (+/-0.02%)
bm_hexiom.py                  47.96 ->      49.77 :      +1.81 =  +3.774% (+/-0.00%)
bm_nqueens.py               4446.35 ->    4457.18 :     +10.83 =  +0.244% (+/-0.00%)
bm_pidigits.py               647.41 ->     650.75 :      +3.34 =  +0.516% (+/-0.15%)
core_qstr.py                5020.31 ->   16570.01 :  +11549.70 = +230.059% (+/-0.01%)
misc_aes.py                  421.88 ->     427.24 :      +5.36 =  +1.271% (+/-0.01%)
misc_mandel.py              3566.58 ->    3584.88 :     +18.30 =  +0.513% (+/-0.00%)
misc_pystone.py             2487.97 ->    2583.21 :     +95.24 =  +3.828% (+/-0.02%)
misc_raytrace.py             376.64 ->     383.84 :      +7.20 =  +1.912% (+/-0.02%)
viper_call0.py               576.54 ->     576.73 :      +0.19 =  +0.033% (+/-0.04%)
viper_call1a.py              550.05 ->     550.38 :      +0.33 =  +0.060% (+/-0.04%)
viper_call1b.py              435.39 ->     435.95 :      +0.56 =  +0.129% (+/-0.19%)
viper_call1c.py              441.49 ->     440.80 :      -0.69 =  -0.156% (+/-0.21%)
viper_call2a.py              535.74 ->     536.31 :      +0.57 =  +0.106% (+/-0.12%)
viper_call2b.py              381.40 ->     381.91 :      +0.51 =  +0.134% (+/-0.15%)

The new test there, core_qstr.py, effectively calls qstr_find_strn() many times in a loop. And it's significantly faster with this PR. Other benchmarks are not impacted, which is good.

On unix (which is always noisy) it's possible to also see an improvement, although unix only has about 600 static qstrs:

diff of scores (higher is better)
N=5000 M=100                unix-p0 ->    unix-p1         diff      diff% (error%)
bm_fannkuch.py                10.52 ->      11.00 :      +0.48 =  +4.563% (+/-4.87%)
bm_fft.py                 221486.06 ->  222913.26 :   +1427.20 =  +0.644% (+/-5.24%)
bm_float.py               446655.34 ->  452957.42 :   +6302.08 =  +1.411% (+/-4.16%)
bm_hexiom.py                3531.20 ->    3549.03 :     +17.83 =  +0.505% (+/-2.11%)
bm_nqueens.py             442323.78 ->  414218.36 :  -28105.42 =  -6.354% (+/-8.21%)
bm_pidigits.py             48953.40 ->   50719.93 :   +1766.53 =  +3.609% (+/-7.57%)
core_qstr.py              940349.56 -> 1432698.78 : +492349.22 = +52.358% (+/-7.42%)
misc_aes.py                33653.40 ->   32860.59 :    -792.81 =  -2.356% (+/-7.40%)
misc_mandel.py            242690.30 ->  238764.20 :   -3926.10 =  -1.618% (+/-6.17%)
misc_pystone.py           139726.47 ->  135769.40 :   -3957.07 =  -2.832% (+/-7.61%)
misc_raytrace.py           37689.06 ->   36494.55 :   -1194.51 =  -3.169% (+/-7.47%)
viper_call0.py             34051.60 ->   34323.85 :    +272.25 =  +0.800% (+/-6.95%)
viper_call1a.py            31946.89 ->   33872.77 :   +1925.88 =  +6.028% (+/-7.69%)
viper_call1b.py            27876.34 ->   26771.29 :   -1105.05 =  -3.964% (+/-6.14%)
viper_call1c.py            26162.97 ->   26561.40 :    +398.43 =  +1.523% (+/-10.81%)
viper_call2a.py            27179.72 ->   33407.50 :   +6227.78 = +22.913% (+/-12.60%)
viper_call2b.py            19137.07 ->   24199.32 :   +5062.25 = +26.453% (+/-8.79%)

py/qstr.c Outdated

// sequential search for the remaining strings
for (const byte **q = pool->qstrs + low; q != pool->qstrs + high + 1; q++) {
    if (*q &&
Member

@amirgon why does it check for *q being non-zero? The original code did not need to do this in the linear search part.

Contributor Author

I removed this check after rebase.

py/qstr.c Outdated
@@ -171,14 +194,48 @@ STATIC qstr qstr_add(const byte *q_ptr) {
return MP_STATE_VM(last_pool)->total_prev_len + MP_STATE_VM(last_pool)->len - 1;
}

#define MP_QSTR_SEARCH_THRESHOLD 10
Member

How was this number chosen?

Contributor Author

The rationale was that sequential search over small sequences is more efficient than binary search, because of caching, compiler optimizations, etc. It's probably a different number for different architectures, compiler flags, etc.

So this number was chosen simply because it was good enough in my tests.
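
That trade-off can be made concrete with a toy hybrid search (illustrative Python only; the constant 10 is the threshold from this PR, and both branches return the same result, so the threshold only affects speed):

```python
import bisect

MP_QSTR_SEARCH_THRESHOLD = 10  # empirically chosen in this PR

def hybrid_find(sorted_strings, s):
    """Return the index of s in sorted_strings, or -1."""
    if len(sorted_strings) <= MP_QSTR_SEARCH_THRESHOLD:
        # Small pool: a linear scan is cache-friendly and avoids the
        # branchy halving, so it is often faster in practice.
        for i, q in enumerate(sorted_strings):
            if q == s:
                return i
        return -1
    # Large pool: binary search, O(log n) comparisons.
    i = bisect.bisect_left(sorted_strings, s)
    return i if i < len(sorted_strings) and sorted_strings[i] == s else -1
```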

py/qstr.c Outdated
if (len > str_len) {
    len = str_len;
}
int cmp = memcmp(Q_GET_DATA(*q), str, str_len);
Member

It might work to sort on qstr hash (and length) instead of qstr data, and then this comparison here would just be a comparison (subtraction) of the 2 hashes, which would be a lot faster than a memcmp.

Once it reached a case of 2 identical hashes it would then need to use memcmp, or just fall through to a linear search.
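
The suggestion amounts to sorting on a (hash, len) key so the binary search compares small integers, only touching string data on a collision (illustrative Python only; `toy_hash` stands in for MicroPython's internal qstr hash):

```python
import bisect

def toy_hash(s):
    # stand-in for the real qstr hash; any deterministic hash works here
    h = 5381
    for b in s.encode():
        h = (h * 33 + b) & 0xFFFF
    return h

def make_sorted_pool(strings):
    # sort by (hash, len), with the string itself only as a tie-breaker
    return sorted(strings, key=lambda s: (toy_hash(s), len(s), s))

def find(pool, s):
    key = (toy_hash(s), len(s))
    keys = [(toy_hash(q), len(q)) for q in pool]  # precomputed in C
    i = bisect.bisect_left(keys, key)
    # on a (hash, len) collision, compare actual string data over the
    # run of equal keys (i.e. fall through to a short linear search)
    while i < len(pool) and keys[i] == key:
        if pool[i] == s:
            return i
        i += 1
    return -1
```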

Contributor Author
@amirgon amirgon Feb 24, 2022

This is a good idea so I gave it a shot.

It turned out to be more challenging than I expected:
As you can see, the extra_coverage test fails. That's because it tests auto-completion, which prints qstrs in order (find_completions, print_completions). So once qstrs are sorted by hash, the auto-completion order is no longer alphabetical...

We have a few options:

  • Sort the auto-completion results (allocate RAM and sort the qstr range by name?)
  • Keep autocompletions unsorted
  • Give up sorting by hash and keep alphabetical qstr sort

What do you think?
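
The first option might look roughly like this (illustrative Python sketch only; the real find_completions/print_completions are C code in MicroPython's REPL helpers):

```python
def complete(prefix, qstr_pool):
    # With hash-sorted pools, matching qstrs are no longer adjacent or
    # alphabetical, so collect all matches and sort them for display.
    # The sort costs a temporary allocation, which is the downside
    # discussed above.
    return sorted(q for q in qstr_pool if q.startswith(prefix))
```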

Member

Would be good to run some benchmarks on sorting by hash vs string content. Do you have the original version which used string sorting?

Contributor Author

Do you have the original version which used string sorting?

Yes. Only the last commit on this PR introduced hash sorting.
Simply check out the previous commit 5808635.

@codecov-commenter
codecov-commenter commented Feb 19, 2022

Codecov Report

Merging #6896 (b314530) into master (ff9c708) will increase coverage by 0.02%.
The diff coverage is 100.00%.


@@            Coverage Diff             @@
##           master    #6896      +/-   ##
==========================================
+ Coverage   98.45%   98.47%   +0.02%     
==========================================
  Files         153      153              
  Lines       20153    20168      +15     
==========================================
+ Hits        19842    19861      +19     
+ Misses        311      307       -4     
Impacted Files Coverage Δ
py/qstr.c 95.65% <100.00%> (+0.44%) ⬆️
py/bc.c 92.23% <0.00%> (+3.88%) ⬆️

Δ = absolute <relative> (impact), ø = not affected, ? = missing data

@amirgon amirgon force-pushed the qstr_bsearch branch 5 times, most recently from 60f27df to cc808fe Compare February 24, 2022 21:45
@dpgeorge
Member

I did performance tests for sorting on strings and sorting on hashes (on PYBv1.0).

Going from baseline (current master) to sorting them on string value gives:

diff of scores (higher is better)
N=100 M=100                pyb-baseline -> pyb-sort-str         diff      diff% (error%)
bm_chaos.py                    357.40 ->     352.80 :      -4.60 =  -1.287% (+/-0.02%)
bm_fannkuch.py                  77.49 ->      77.40 :      -0.09 =  -0.116% (+/-0.01%)
bm_fft.py                     2539.26 ->    2525.13 :     -14.13 =  -0.556% (+/-0.00%)
bm_float.py                   5908.33 ->    5872.71 :     -35.62 =  -0.603% (+/-0.01%)
bm_hexiom.py                    47.93 ->      48.77 :      +0.84 =  +1.753% (+/-0.00%)
bm_nqueens.py                 4459.94 ->    4433.80 :     -26.14 =  -0.586% (+/-0.00%)
bm_pidigits.py                 644.88 ->     640.95 :      -3.93 =  -0.609% (+/-0.27%)
core_import_mpy_multi.py       581.49 ->     759.37 :    +177.88 = +30.590% (+/-0.00%)
core_import_mpy_single.py       67.16 ->     105.41 :     +38.25 = +56.954% (+/-0.01%)
core_qstr.py                    64.12 ->     206.44 :    +142.32 = +221.959% (+/-0.01%)
core_yield_from.py             354.50 ->     353.96 :      -0.54 =  -0.152% (+/-0.00%)
misc_aes.py                    405.59 ->     410.03 :      +4.44 =  +1.095% (+/-0.00%)
misc_mandel.py                3416.48 ->    3387.86 :     -28.62 =  -0.838% (+/-0.00%)
misc_pystone.py               2405.57 ->    2419.06 :     +13.49 =  +0.561% (+/-0.00%)
misc_raytrace.py               374.01 ->     368.50 :      -5.51 =  -1.473% (+/-0.00%)

Going from baseline to sorting on hash:

diff of scores (higher is better)
N=100 M=100                pyb-baseline -> pyb-sort-hash         diff      diff% (error%)
bm_chaos.py                    357.40 ->     359.03 :      +1.63 =  +0.456% (+/-0.03%)
bm_fannkuch.py                  77.49 ->      76.28 :      -1.21 =  -1.561% (+/-0.01%)
bm_fft.py                     2539.26 ->    2523.73 :     -15.53 =  -0.612% (+/-0.00%)
bm_float.py                   5908.33 ->    5910.11 :      +1.78 =  +0.030% (+/-0.01%)
bm_hexiom.py                    47.93 ->      48.85 :      +0.92 =  +1.919% (+/-0.00%)
bm_nqueens.py                 4459.94 ->    4421.59 :     -38.35 =  -0.860% (+/-0.00%)
bm_pidigits.py                 644.88 ->     644.49 :      -0.39 =  -0.060% (+/-0.21%)
core_import_mpy_multi.py       581.49 ->     763.36 :    +181.87 = +31.277% (+/-0.00%)
core_import_mpy_single.py       67.16 ->     106.25 :     +39.09 = +58.204% (+/-0.01%)
core_qstr.py                    64.12 ->     212.31 :    +148.19 = +231.114% (+/-0.00%)
core_yield_from.py             354.50 ->     354.05 :      -0.45 =  -0.127% (+/-0.00%)
misc_aes.py                    405.59 ->     404.15 :      -1.44 =  -0.355% (+/-0.00%)
misc_mandel.py                3416.48 ->    3397.47 :     -19.01 =  -0.556% (+/-0.00%)
misc_pystone.py               2405.57 ->    2446.53 :     +40.96 =  +1.703% (+/-0.00%)
misc_raytrace.py               374.01 ->     369.21 :      -4.80 =  -1.283% (+/-0.00%)

Difference going from sorting on string to sorting on hash is small:

diff of scores (higher is better)
N=100 M=100                pyb-sort-str -> pyb-sort-hash         diff      diff% (error%)
bm_chaos.py                    352.80 ->     359.03 :      +6.23 =  +1.766% (+/-0.01%)
bm_fannkuch.py                  77.40 ->      76.28 :      -1.12 =  -1.447% (+/-0.00%)
bm_fft.py                     2525.13 ->    2523.73 :      -1.40 =  -0.055% (+/-0.00%)
bm_float.py                   5872.71 ->    5910.11 :     +37.40 =  +0.637% (+/-0.01%)
bm_hexiom.py                    48.77 ->      48.85 :      +0.08 =  +0.164% (+/-0.00%)
bm_nqueens.py                 4433.80 ->    4421.59 :     -12.21 =  -0.275% (+/-0.00%)
bm_pidigits.py                 640.95 ->     644.49 :      +3.54 =  +0.552% (+/-0.25%)
core_import_mpy_multi.py       759.37 ->     763.36 :      +3.99 =  +0.525% (+/-0.00%)
core_import_mpy_single.py      105.41 ->     106.25 :      +0.84 =  +0.797% (+/-0.01%)
core_qstr.py                   206.44 ->     212.31 :      +5.87 =  +2.843% (+/-0.00%)
core_yield_from.py             353.96 ->     354.05 :      +0.09 =  +0.025% (+/-0.00%)
misc_aes.py                    410.03 ->     404.15 :      -5.88 =  -1.434% (+/-0.00%)
misc_mandel.py                3387.86 ->    3397.47 :      +9.61 =  +0.284% (+/-0.00%)
misc_pystone.py               2419.06 ->    2446.53 :     +27.47 =  +1.136% (+/-0.00%)
misc_raytrace.py               368.50 ->     369.21 :      +0.71 =  +0.193% (+/-0.00%)

The core_qstr.py test is intended to really stress test the qstr_find_strn() function, so I'd say these results give an accurate picture.

Those tests are run with lots of frozen code on the PYBv1.0, and frozen qstrs are not sorted. So it might be better to try again without frozen code.

py/qstr.c Outdated
} else {
    // avoid rare (hash,len) collisions
    while (MP_UNLIKELY(
        pool->lengths[mid] != str_len ||
Member

I don't think you need this check on length. If the code gets here then cmp == 0, which can only happen if the lengths are equal (and also the hashes are equal).

Member

And wouldn't it be simpler (smaller code, probably no effect on performance) to just break here and drop down into the sequential search?

Contributor Author

You are right - this can be simplified.
Done in 3bc257c.

@amirgon
Contributor Author
amirgon commented Feb 25, 2022

I did performance tests for sorting on strings and sorting on hashes (on PYBv1.0).

@dpgeorge did you try performance tests on PYB with different values of MP_QSTR_SEARCH_THRESHOLD?

@amirgon amirgon force-pushed the qstr_bsearch branch 4 times, most recently from ebde816 to 04ba17b Compare August 9, 2022 20:50
@amirgon
Contributor Author
amirgon commented Aug 9, 2022

@dpgeorge

It's definitely worth sorting all the qstrs, including the first set (the pre-defined ones, and the ones that need to fit in 1 byte).

I agree.
Done on 311b33c.

Sorting the first set of qstrs via hash will move MP_QSTR_ away from the second position. I'm not 100% sure this is OK, will need to be checked if nothing relies on that qstr having value 1.

I'm not sure how to check this really.

Sorting the first set of qstrs via hash will need builds that use 1-byte qstr hashes to use the most-significant-byte of the hash, so that the order of the pre-defined qstrs is the same regardless of whether 1-byte or 2-byte qstr hashes are used.

Once the first set is sorted by hash, why do we care if the order is different between 1-byte and 2-byte qstr hashes?

The code size increase is quite large... but I think there might be a few opportunities to reduce that, eg shrink the size of the qstr_pool_t struct.

I don't understand why we see a large code increase.
The code change is rather small, the increase per pool is small and there are usually small number of pools. We could even keep qstr_pool_t the same size if we encoded sorted in a bit of one of the other fields.

But overall, where does the significant code increase come from?
Maybe the compiler is unrolling the loop? But when code size is important we would compile with -Os and prevent loop unrolling.
Did you measure the code increase when compiling with -Os?


One thing we can consider in order to decrease memory usage is to remove the qstr lengths array entirely.
Checking a qstr's length would be a little more expensive (count until the null termination), but the memory saving could be significant with a large number of strings.

Today the qstr implementation scans strings sequentially.
When there are many strings this can become very inefficient.
This change improves qstr search performance by using binary search in
sorted qstr pools, when possible.

This change introduces an option to create a sorted string pool, which
is then searched by a binary search instead of a sequential search.

A qstr pool can be either "sorted" or "unsorted", where an unsorted pool
is searched sequentially, as today.
Native modules (MP_ROM_QSTR) and frozen modules generate sorted pools.
Currently runtime strings are unsorted.

The constant string pool is split in two and a new pool is introduced,
"special_const_pool". This is required because the first sequence of
strings already requires special ordering and is therefore created
unsorted, while the rest of the constants are generated sorted.

qstr_find_strn searches strings in each pool. If the pool is sorted and
larger than a threshold, it is searched using binary search instead
of sequential search, significantly improving performance.

Signed-off-by: Amir Gonnen <amirgonnen@gmail.com>
Use Qstr class instead of tuple, where properties are calculated only when
accessed.
This is needed as preparation for using the hash as the sort key instead
of the qstr string. It also makes the code more readable when referring
to a qstr in py/makeqstrdata.py and tools/mpy-tool.py (for example,
q.order instead of q[0], or q.qstr instead of q[2]).

Signed-off-by: Amir Gonnen <amirgonnen@gmail.com>
Instead of sorting by string, sort qstrs by (hash, len).
This allows faster binary search in qstr_find_strn, since it's faster
to compare hashes than strings.
A few strings needed to be moved to the special string pool (QDEF0)
because their qstr is assumed to be small (8 bit) in py/scope.c.

Signed-off-by: Amir Gonnen <amirgonnen@gmail.com>
In case cmp == 0 there could be a hash collision. To save program size,
instead of checking for hash collisions, simply revert to linear search,
which also compares strings.

Signed-off-by: Amir Gonnen <amirgonnen@gmail.com>
This fixes a corner case where both the hash and the lengths of two
qstrs are equal. In such a case, if the binary search lower bound is the
second instance of the same hash, the search will miss the first
instance. To overcome this, when the lower bound hash equals the
requested hash, we move the lower bound backwards to cover all previous
hashes which are also equal to the requested hash.
Also removed the length comparison since it's no longer needed.

Related: lvgl/lv_binding_micropython#224

Signed-off-by: Amir Gonnen <amirgonnen@gmail.com>
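
The corner case and the backward adjustment described in that commit can be sketched as follows (illustrative Python only; pool entries here are (hash, string) pairs sorted by hash, unlike the real C layout):

```python
def find_in_hash_sorted(pool, s, s_hash):
    """pool: list of (hash, string) pairs sorted by hash."""
    lo, hi = 0, len(pool) - 1
    while lo < hi:
        mid = (lo + hi) // 2
        if pool[mid][0] < s_hash:
            lo = mid + 1
        elif pool[mid][0] > s_hash:
            hi = mid - 1
        else:
            lo = mid  # hit an entry with the right hash, maybe not the first
            break
    # lo may point at the second (or later) instance of this hash, so
    # move the lower bound backwards over all preceding equal hashes
    while lo > 0 and pool[lo - 1][0] == s_hash:
        lo -= 1
    # sequential scan over the (usually tiny) run of equal hashes,
    # comparing the actual strings
    while lo < len(pool) and pool[lo][0] == s_hash:
        if pool[lo][1] == s:
            return lo
        lo += 1
    return -1
```

Without the backward walk, searching for "ab" in [(5, "ab"), (5, "cd"), (6, "x")] could land the lower bound on index 1 and miss index 0.
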
MP_QSTR_SEARCH_THRESHOLD doesn't provide an improvement with sorted hashes.
Remove MP_QSTR_SEARCH_THRESHOLD and simplify the binary search.

Sequential search is still needed in case of hash collisions, or
unsorted pools.

Signed-off-by: Amir Gonnen <amirgonnen@gmail.com>
@amirgon amirgon force-pushed the qstr_bsearch branch 3 times, most recently from 3570f0f to 72ee3ae Compare August 13, 2022 22:07
Signed-off-by: Amir Gonnen <amirgonnen@gmail.com>
dpgeorge pushed a commit to jimmo/micropython that referenced this pull request Oct 29, 2023
This provides a significant performance boost for qstr_find_strn, which is
called a lot during parsing and loading of .mpy files, as well as interning
of string objects (which happens in most string methods that return new
strings).

Also adds comments to explain the "static" qstrs.  These are part of the
.mpy ABI and avoid needing to duplicate string data for QSTRs known to
already be in the firmware.  The static pool isn't currently sorted, but in
the future we could either split the static pool into the sorted regions,
or in the next .mpy version just sort them.

Based on initial work done by @amirgon in micropython#6896.

This work was funded through GitHub Sponsors.

Signed-off-by: Jim Mussared <jim.mussared@gmail.com>
dpgeorge pushed a commit to jimmo/micropython that referenced this pull request Oct 30, 2023
@jimmo
Member
jimmo commented Oct 30, 2023

Thanks @amirgon for your work on this and for bringing it to our attention in the first place -- this has been merged via #12678 which contains essentially the same change to qstr.c (but via a fairly circuitous route, see #10758, where we investigated a bunch of other things). I think it's still worth investigating splitting the static pool, and I'd love to eventually find a way to remove hashes without the performance hit.

@jimmo jimmo closed this Oct 30, 2023
@dpgeorge
Member

One thing we can consider in order to decrease memory usage is to remove qstr lengths array entirely.

I had a brief look at doing this and I think it would slow a lot of operations down, because there are many places in the code that query the qstr/str lengths, e.g. all uses of GET_STR_DATA_LEN(), which would need to use strlen() to compute the qstr length each time they were called.

@gitcnd
gitcnd commented Nov 11, 2023

What are all the qstrs for? How many are used in a normal application? Might eradicating as many as possible be a more sensible option? Wouldn't making the caller turn whatever they want into an enum make more sense?

graeme-winter pushed a commit to winter-special-projects/micropython that referenced this pull request Sep 21, 2024
romanz pushed a commit to trezor/micropython that referenced this pull request Apr 9, 2025
romanz pushed a commit to trezor/micropython that referenced this pull request Apr 9, 2025
romanz pushed a commit to trezor/micropython that referenced this pull request Apr 14, 2025
(cherry picked from commit 64c79a5)