8000 py/qstr: Add support for sorted qstr pools. by jimmo · Pull Request #12678 · micropython/micropython · GitHub
[go: up one dir, main page]

Skip to content

py/qstr: Add support for sorted qstr pools. #12678

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 4 commits into from
Oct 30, 2023

Conversation

jimmo
Copy link
Member
@jimmo jimmo commented Oct 13, 2023

This is just the sorted-pools part of #10758 (originally based on #6896).

$ ./run-perfbench.py -s ~/qstr-sorted-baseline ~/qstr-sorted-main-pool-sorted 
diff of scores (higher is better)
N=100 M=100                /home/jimmo/qstr-sorted-baseline -> /home/jimmo/qstr-sorted-main-pool-sorted         diff      diff% (error%)
bm_chaos.py                    357.95 ->     351.21 :      -6.74 =  -1.883% (+/-0.01%)
bm_fannkuch.py                  74.79 ->      74.11 :      -0.68 =  -0.909% (+/-0.00%)
bm_fft.py                     2336.12 ->    2334.83 :      -1.29 =  -0.055% (+/-0.00%)
bm_float.py                   5681.45 ->    5698.28 :     +16.83 =  +0.296% (+/-0.00%)
bm_hexiom.py                    45.28 ->      46.95 :      +1.67 =  +3.688% (+/-0.00%)
bm_nqueens.py                 4227.93 ->    4243.20 :     +15.27 =  +0.361% (+/-0.00%)
bm_pidigits.py                 645.60 ->     645.34 :      -0.26 =  -0.040% (+/-0.38%)
bm_wordcount.py                 48.47 ->      48.18 :      -0.29 =  
8000
-0.598% (+/-0.00%)
core_import_mpy_multi.py       421.16 ->     613.36 :    +192.20 = +45.636% (+/-0.01%)
core_import_mpy_single.py       63.15 ->      97.64 :     +34.49 = +54.616% (+/-0.11%)
core_locals.py                  41.92 ->      41.29 :      -0.63 =  -1.503% (+/-0.00%)
core_qstr.py                    64.10 ->     213.09 :    +148.99 = +232.434% (+/-0.01%)
core_str.py                      9.99 ->      29.85 :     +19.86 = +198.799% (+/-0.00%)
core_yield_from.py             353.06 ->     355.14 :      +2.08 =  +0.589% (+/-0.00%)
misc_aes.py                    406.01 ->     407.21 :      +1.20 =  +0.296% (+/-0.01%)
misc_mandel.py                3048.72 ->    3031.93 :     -16.79 =  -0.551% (+/-0.01%)
misc_pystone.py               2419.63 ->    2486.84 :     +67.21 =  +2.778% (+/-0.01%)
misc_raytrace.py               375.08 ->     365.74 :      -9.34 =  -2.490% (+/-0.01%)

This work was funded through GitHub Sponsors.

@github-actions
Copy link
github-actions bot commented Oct 13, 2023

Code size report:

   bare-arm:  +136 +0.240% 
minimal x86:  +312 +0.167% [incl +64(data)]
   unix x64:  +272 +0.034% standard[incl +32(data)]
      stm32:  +176 +0.045% PYBV10
     mimxrt:  +200 +0.055% TEENSY40
        rp2:  +144 +0.044% RPI_PICO
       samd:  +160 +0.061% ADAFRUIT_ITSYBITSY_M4_EXPRESS

@jimmo
Copy link
Member Author
jimmo commented Oct 13, 2023

+136 bytes on bare-arm is 6*4 bytes for the extra pool, 84 bytes for the binary search, 28 bytes for total_prev_len & is_sorted now being a bitfield.

@tannewt
Copy link
tannewt commented Oct 13, 2023

Would it be faster to sort and lookup by hash instead of value?

@dpgeorge
Copy link
Member

In the original PR for this I did compare sorting string value vs hash, see #6896 (comment)

@dpgeorge
Copy link
Member

Would it be faster to sort and lookup by hash instead of value?

In summary, using a hash is not really any faster and a fair bit more complicated. There is a very detailed discussion and size/performance numbers in #6896.

@dpgeorge dpgeorge added the py-core Relates to py/ directory in source label Oct 13, 2023
@codecov
Copy link
codecov bot commented Oct 13, 2023

Codecov Report

Merging #12678 (64c79a5) into master (2fda94c) will increase coverage by 0.00%.
The diff coverage is 100.00%.

@@           Coverage Diff           @@
##           master   #12678   +/-   ##
=======================================
  Coverage   98.39%   98.39%           
=======================================
  Files         158      158           
  Lines       20965    20973    +8     
=======================================
+ Hits        20629    20637    +8     
  Misses        336      336           
Files Coverage Δ
py/persistentcode.c 96.57% <ø> (ø)
py/qstr.c 95.45% <100.00%> (+0.24%) ⬆️

Copy link
Contributor
@projectgus projectgus left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't know the full history before this PR, but this change looks good and the performance numbers look good!

@@ -52,7 +52,8 @@
codepoint2name[ord("|")] = "pipe"
codepoint2name[ord("~")] = "tilde"

# static qstrs, should be sorted
# static qstrs, these must maintain a specific order for .mpy compatibility
Copy link
Contributor
@projectgus projectgus Oct 17, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe more extreme version of the previous idea, but what about adding a constant lookup table (equivalent of mp_binary_op_method_name) for these? That would add 157 byte code size, but also allow this whole qstr pool to be sorted. In the next .mpy version bump the lookup table could be removed to get the code size back.

(This would also potentially allow deleting the !sorted case from the search lookup code as all pools would be sorted, wouldn't it?)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like this idea! It might be easier just to go with three sorted pools (because this list is almost sorted) as described above.

Unfortunately, any runtime qstrs are not sorted (because the pools are created incrementally), so the !sorted case needs to stay.

Copy link
Contributor
@projectgus projectgus Oct 24, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm guessing it's not viable to insertion sort the runtime pool? EDIT: Of course it's not, the addresses need to be stable. Oh well.

I'll stop proposing ever more complex changes now, I promise. 😆

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@projectgus

The main problem is that we return the qstr id at insertion time. This id is the index across all the pools (i.e. the first pool contains qstr 0 -> N, the second contains N+1 -> M, etc).

So sorting would require some way to know the final index in the pool, even though the sort ordering might change with further insertions.

There are easy ways to solve this with extra indirection of course, but that would be an extra 2*sizeof(qstr) per entry in the runtime tables. Maybe this is worth considering?

Also other ideas around some sort of hash-table-like approach. The problem with hashtables though is that the main expensive operation we perform on the pool is set-existence (i.e. we have this new string from somewhere, and we're trying to decide whether it's an existing qstr or not). And worse, the expected outcome for any given pool is negative, which is pathological for high-load-factor hashtables.

These tests are designed to measure changes in performance relating to:
 - string interning / searching for existing strings
 - map lookup
 - string operations
 - string hashing

This work was funded through GitHub Sponsors.

Signed-off-by: Jim Mussared <jim.mussared@gmail.com>
jimmo added 3 commits October 30, 2023 11:10
This test depends on the order in which qstrs are stored in ROM, which
affects the order in which `dir()` will probe the object to see what it
supports.  Because of the lazy-loading in asyncio/__init__.py, if it
tries to do e.g. `wait_for_ms` before `funcs` then it will import funcs,
making `funcs` later succeed. But in the other way around, `funcs` will
initially not be found.

This work was funded through GitHub Sponsors.

Signed-off-by: Jim Mussared <jim.mussared@gmail.com>
Required by upcoming qstr sorting.

Signed-off-by: Jim Mussared <jim.mussared@gmail.com>
This provides a significant performance boost for qstr_find_strn, which is
called a lot during parsing and loading of .mpy files, as well as interning
of string objects (which happens in most string methods that return new
strings).

Also adds comments to explain the "static" qstrs.  These are part of the
.mpy ABI and avoid needing to duplicate string data for QSTRs known to
already be in the firmware.  The static pool isn't currently sorted, but in
the future we could either split the static pool into the sorted regions,
or in the next .mpy version just sort them.

Based on initial work done by @amirgon in micropython#6896.

This work was funded through GitHub Sponsors.

Signed-off-by: Jim Mussared <jim.mussared@gmail.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
py-core Relates to py/ directory in source
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants
0