py/qstr: Add support for sorted qstr pools. #12678

jimmo · 2023-10-13T04:09:42Z

This is just the sorted-pools part of #10758 (originally based on #6896).

$ ./run-perfbench.py -s ~/qstr-sorted-baseline ~/qstr-sorted-main-pool-sorted 
diff of scores (higher is better)
N=100 M=100                /home/jimmo/qstr-sorted-baseline -> /home/jimmo/qstr-sorted-main-pool-sorted         diff      diff% (error%)
bm_chaos.py                    357.95 ->     351.21 :      -6.74 =  -1.883% (+/-0.01%)
bm_fannkuch.py                  74.79 ->      74.11 :      -0.68 =  -0.909% (+/-0.00%)
bm_fft.py                     2336.12 ->    2334.83 :      -1.29 =  -0.055% (+/-0.00%)
bm_float.py                   5681.45 ->    5698.28 :     +16.83 =  +0.296% (+/-0.00%)
bm_hexiom.py                    45.28 ->      46.95 :      +1.67 =  +3.688% (+/-0.00%)
bm_nqueens.py                 4227.93 ->    4243.20 :     +15.27 =  +0.361% (+/-0.00%)
bm_pidigits.py                 645.60 ->     645.34 :      -0.26 =  -0.040% (+/-0.38%)
bm_wordcount.py                 48.47 ->      48.18 :      -0.29 =  
8000
-0.598% (+/-0.00%)
core_import_mpy_multi.py       421.16 ->     613.36 :    +192.20 = +45.636% (+/-0.01%)
core_import_mpy_single.py       63.15 ->      97.64 :     +34.49 = +54.616% (+/-0.11%)
core_locals.py                  41.92 ->      41.29 :      -0.63 =  -1.503% (+/-0.00%)
core_qstr.py                    64.10 ->     213.09 :    +148.99 = +232.434% (+/-0.01%)
core_str.py                      9.99 ->      29.85 :     +19.86 = +198.799% (+/-0.00%)
core_yield_from.py             353.06 ->     355.14 :      +2.08 =  +0.589% (+/-0.00%)
misc_aes.py                    406.01 ->     407.21 :      +1.20 =  +0.296% (+/-0.01%)
misc_mandel.py                3048.72 ->    3031.93 :     -16.79 =  -0.551% (+/-0.01%)
misc_pystone.py               2419.63 ->    2486.84 :     +67.21 =  +2.778% (+/-0.01%)
misc_raytrace.py               375.08 ->     365.74 :      -9.34 =  -2.490% (+/-0.01%)

This work was funded through GitHub Sponsors.

github-actions · 2023-10-13T04:18:12Z

Code size report:

   bare-arm:  +136 +0.240% 
minimal x86:  +312 +0.167% [incl +64(data)]
   unix x64:  +272 +0.034% standard[incl +32(data)]
      stm32:  +176 +0.045% PYBV10
     mimxrt:  +200 +0.055% TEENSY40
        rp2:  +144 +0.044% RPI_PICO
       samd:  +160 +0.061% ADAFRUIT_ITSYBITSY_M4_EXPRESS

jimmo · 2023-10-13T04:46:51Z

+136 bytes on bare-arm is 6*4 bytes for the extra pool, 84 bytes for the binary search, 28 bytes for total_prev_len & is_sorted now being a bitfield.

tannewt · 2023-10-13T05:34:40Z

Would it be faster to sort and lookup by hash instead of value?

dpgeorge · 2023-10-13T05:38:01Z

In the original PR for this I did compare sorting string value vs hash, see #6896 (comment)

dpgeorge · 2023-10-13T05:44:19Z

Would it be faster to sort and lookup by hash instead of value?

In summary, using a hash is not really any faster and a fair bit more complicated. There is a very detailed discussion and size/performance numbers in #6896.

codecov · 2023-10-13T06:30:24Z

Codecov Report

Merging #12678 (64c79a5) into master (2fda94c) will increase coverage by 0.00%.
The diff coverage is 100.00%.

@@           Coverage Diff           @@
##           master   #12678   +/-   ##
=======================================
  Coverage   98.39%   98.39%           
=======================================
  Files         158      158           
  Lines       20965    20973    +8     
=======================================
+ Hits        20629    20637    +8     
  Misses        336      336

Files	Coverage Δ
py/persistentcode.c	`96.57% <ø> (ø)`
py/qstr.c	`95.45% <100.00%> (+0.24%)`	⬆️

projectgus

I don't know the full history before this PR, but this change looks good and the performance numbers look good!

py/makeqstrdata.py

projectgus · 2023-10-17T06:43:07Z

py/makeqstrdata.py

@@ -52,7 +52,8 @@
 codepoint2name[ord("|")] = "pipe"
 codepoint2name[ord("~")] = "tilde"

-# static qstrs, should be sorted
+# static qstrs, these must maintain a specific order for .mpy compatibility


Maybe more extreme version of the previous idea, but what about adding a constant lookup table (equivalent of mp_binary_op_method_name) for these? That would add 157 byte code size, but also allow this whole qstr pool to be sorted. In the next .mpy version bump the lookup table could be removed to get the code size back.

(This would also potentially allow deleting the !sorted case from the search lookup code as all pools would be sorted, wouldn't it?)

I like this idea! It might be easier just to go with three sorted pools (because this list is almost sorted) as described above.

Unfortunately, any runtime qstrs are not sorted (because the pools are created incrementally), so the !sorted case needs to stay.

I'm guessing it's not viable to insertion sort the runtime pool? EDIT: Of course it's not, the addresses need to be stable. Oh well.

I'll stop proposing ever more complex changes now, I promise. 😆

@projectgus

The main problem is that we return the qstr id at insertion time. This id is the index across all the pools (i.e. the first pool contains qstr 0 -> N, the second contains N+1 -> M, etc).

So sorting would require some way to know the final index in the pool, even though the sort ordering might change with further insertions.

There are easy ways to solve this with extra indirection of course, but that would be an extra 2*sizeof(qstr) per entry in the runtime tables. Maybe this is worth considering?

Also other ideas around some sort of hash-table-like approach. The problem with hashtables though is that the main expensive operation we perform on the pool is set-existence (i.e. we have this new string from somewhere, and we're trying to decide whether it's an existing qstr or not). And worse, the expected outcome for any given pool is negative, which is pathological for high-load-factor hashtables.

These tests are designed to measure changes in performance relating to: - string interning / searching for existing strings - map lookup - string operations - string hashing This work was funded through GitHub Sponsors. Signed-off-by: Jim Mussared <jim.mussared@gmail.com>

This test depends on the order in which qstrs are stored in ROM, which affects the order in which `dir()` will probe the object to see what it supports. Because of the lazy-loading in asyncio/__init__.py, if it tries to do e.g. `wait_for_ms` before `funcs` then it will import funcs, making `funcs` later succeed. But in the other way around, `funcs` will initially not be found. This work was funded through GitHub Sponsors. Signed-off-by: Jim Mussared <jim.mussared@gmail.com>

Required by upcoming qstr sorting. Signed-off-by: Jim Mussared <jim.mussared@gmail.com>

@amirgon

This provides a significant performance boost for qstr_find_strn, which is called a lot during parsing and loading of .mpy files, as well as interning of string objects (which happens in most string methods that return new strings). Also adds comments to explain the "static" qstrs. These are part of the .mpy ABI and avoid needing to duplicate string data for QSTRs known to already be in the firmware. The static pool isn't currently sorted, but in the future we could either split the static pool into the sorted regions, or in the next .mpy version just sort them. Based on initial work done by @amirgon in micropython#6896. This work was funded through GitHub Sponsors. Signed-off-by: Jim Mussared <jim.mussared@gmail.com>

jimmo mentioned this pull request Oct 13, 2023

py/qstr: qstr performance improvements. #10758

Closed

jimmo force-pushed the qstr-sorted branch from a561bf6 to b744255 Compare October 13, 2023 04:46

dpgeorge added the py-core Relates to py/ directory in source label Oct 13, 2023

projectgus reviewed Oct 17, 2023

View reviewed changes

jimmo force-pushed the qstr-sorted branch from b744255 to b1ffb5a Compare October 24, 2023 02:28

jimmo force-pushed the qstr-sorted branch from b1ffb5a to 65fc281 Compare October 27, 2023 06:00

dpgeorge force-pushed the qstr-sorted branch from 828ab0b to 6d7b145 Compare October 29, 2023 23:40

jimmo added 3 commits October 30, 2023 11:10

bare-arm/lib: Add minimal strncmp implementation.

e910533

Required by upcoming qstr sorting. Signed-off-by: Jim Mussared <jim.mussared@gmail.com>

dpgeorge force-pushed the qstr-sorted branch from 6d7b145 to 64c79a5 Compare October 30, 2023 00:10

dpgeorge merged commit 64c79a5 into micropython:master Oct 30, 2023

This was referenced Oct 30, 2023

py: Faster qstr search #6896

Closed

py/qstr: Disable qstr hashing on low-flash/mem boards. #12835

Merged

romanz mentioned this pull request Apr 9, 2025

Cherry-pick sorted qstr pool support trezor/micropython#20

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Uh oh!

py/qstr: Add support for sorted qstr pools. #12678

py/qstr: Add support for sorted qstr pools. #12678

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

py/qstr: Add support for sorted qstr pools. #12678

py/qstr: Add support for sorted qstr pools. #12678

Uh oh!

Conversation

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Codecov Report

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Uh oh!