8000 Additional function arg unpacking (PEP 448) by dlech · Pull Request #5807 · micropython/micropython · GitHub
[go: up one dir, main page]

Skip to content

Additional function arg unpacking (PEP 448) #5807

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
wants to merge 6 commits into from

Conversation

dlech
Copy link
Contributor
@dlech dlech commented Mar 26, 2020

This is an attempt to implement PEP 448 for function arguments. It allows multiple *args and **args to be unpacked. (Popular request in #1329).

I'm curious to see how it affects code size and performance.

There are still some todo items:

  • There is still a null object emitted in the bytecodes that used to be the **args that needs to be removed
  • There are probably edge cases where memory allocation is wrong, especially in the case where one or more of the *args are iterators instead of list/tuple.

@dlech dlech force-pushed the pep-448 branch 3 times, most recently from 84a6ff9 to 129849f Compare March 27, 2020 01:59
@tve
Copy link
Contributor
tve commented Mar 27, 2020

might want to fix the typo in the title ;-)

@dlech dlech changed the title Additional function arg unacking (PEP 448) Additional function arg unpacking (PEP 448) Mar 27, 2020
@dlech
Copy link
Contributor Author
dlech commented Mar 27, 2020

If that is my worst mistake, we are in good shape 😆

@dpgeorge
Copy link
Member

Thanks for working on this. It's a tough problem, especially to make it efficient (in RAM usage, execution time, code size). Properly reviewing this would take some time.

For the record, the code size change is:

   bare-arm:   +48 +0.072% 
minimal x86:  +128 +0.086% 
   unix x64:  +144 +0.029% 
unix nanbox:  +272 +0.061% 
      stm32:   +52 +0.013% PYBV10
     cc3200:   +48 +0.026% 
    esp8266:   +72 +0.011% GENERIC
      esp32:   +36 +0.003% GENERIC[incl -32(data)]
        nrf:   +52 +0.036% pca10040
       samd:   +48 +0.047% ADAFRUIT_ITSYBITSY_M4_EXPRESS

@dpgeorge dpgeorge added the py-core Relates to py/ directory in source label Mar 28, 2020
@dpgeorge dpgeorge mentioned this pull request Oct 28, 2021
50 tasks
tannewt pushed a commit to tannewt/circuitpython that referenced this pull request Jan 7, 2022
…coded-samples

audiomp3: add decoded_samples property
@dpgeorge
Copy link
Member
dpgeorge commented Mar 1, 2022

Can you please rebase this on latest master?

@dlech dlech force-pushed the pep-448 branch 3 times, most recently from b50221d to d6aa8de Compare March 4, 2022 21:37
@dlech
Copy link
Contributor Author
dlech commented Mar 4, 2022

Rebased.

@dlech
Copy link
Contributor Author
dlech commented Mar 4, 2022
  • There is still a null object emitted in the bytecodes that used to be the **args that needs to be removed

Added a commit to address this todo item.

@dpgeorge
Copy link
Member
8000

There is still a null object emitted in the bytecodes that used to be the **args that needs to be removed

Added a commit to address this todo item.

IMO it makes sense to squash this commit into the first one, so that the cmd_showbc test does not change so much and then get reverted. That would make it a bit clearer as to the impact of this change, and how it changes generated bytecode.

There are probably edge cases where memory allocation is wrong, especially in the case where one or more of the *args are iterators instead of list/tuple.

Do you mean it might segfault in an edge case? Or that it simply makes a bad guess at how much memory to preallocate? As far as I can tell it all looks OK.


In fact this has exposed a bug in current master, the following will segfault:

def f(*a, **b):
    print(a)
    print(b)
f(*range(100), **{str(i):i for i in range(100)})

With this PR, that bug is fixed.

@dpgeorge
Copy link
Member

I think the general approach here is very clever. When *seq or **dict is used as an argument the Python stack looks like this:

fun(arg0, arg1, *seq, key0=value0, key1=value1, **dict)
old: fun arg0 arg1 key0 value0 key1 value1 seq dict
new: fun arg0 arg1 seq key0 value0 key1 value1 NULL dict 4

For simple cases:

fun(*seq)
old: fun seq NULL
new: fun seq 1
fun(**dict)
old: fun NULL dict
new: fun NULL dict 0

Also in the case of using **dict the bytecode for the opcode increased by 1 byte because in now needs 2 bytes to encode the unum = (npos | nkw <<8) value. As such fun(**dict) uses 2 extra bytes in total in the bytecode (one for the unum, one for the seq bitmap).

I think this can be optimised. Eg if npos==0 then don't store the bitmap. And if npos==0 and nkw==0 then implicitly add 1 to nkw.

@dlech
Copy link
Contributor Author
dlech commented Mar 10, 2022

IMO it makes sense to squash this commit into the first one,

Agreed.

Do you mean it might segfault in an edge case? Or that it simply makes a bad guess at how much memory to preallocate?

The latter.

I think this can be optimised. Eg if npos==0 then don't store the bitmap. And if npos==0 and nkw==0 then implicitly add 1 to nkw.

I will have a look.

@dlech
Copy link
Contributor Author
dlech commented Mar 24, 2022

I have updated this with the suggested changes. With regard to the suggested optimizations, I did it a bit differently. " if npos==0 then don't store the bitmap." was simple enough but "And if npos==0 and nkw==0 then implicitly add 1 to nkw." got messy since it requires checking for this in many places. So, I figured that if we were going to add all of checks everywhere anyway, we could do it slightly differently and optimize (at least in terms of fewer bytes emitted) more cases. See the commit message for the full explanation.

Given that arg unpacking is probably not frequently used, I'm not sure the trade off in compiler size/complexity is worth the savings in emitted byte code size.

@dpgeorge
Copy link
Member

Given that arg unpacking is probably not frequently used, I'm not sure the trade off in compiler size/complexity is worth the savings in emitted byte code size.

You might be right, it does look complicated. I will review it.

@peterhinch
Copy link
Contributor

FWIW I think this is a very useful feature - subject to code size, of course :)

For example in GUI code, widgets may have multiple constructor args to define colors, shapes, fonts, sizes, callbacks etc. A dict enables a set of common attributes to be assigned to multiple widgets. The ability to assign more than one dict would be useful. You might have one to define attributes common to all controls and another for additional attributes applicable only to (say) Meter instances.

return x


f4(*print_ret(["a", "b"]), kw_arg=print_ret(None))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this test should go in tests/baics/fun_kwvarargs.py

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

@dpgeorge
Copy link
Member

Code size change for supporting multiple * and ** args is:

   bare-arm:   +80 +0.139% 
minimal x86:  +133 +0.081% 
   unix x64:  +144 +0.028% 
unix nanbox:  +300 +0.065% 
      stm32:   +68 +0.017% PYBV10
     cc3200:   +80 +0.044% 
    esp8266:   +92 +0.013% GENERIC
      esp32:   +92 +0.006% GENERIC[incl -16(data)]
        nrf:   +16 +0.009% pca10040
        rp2:   +48 +0.010% PICO
       samd:   +64 +0.046% ADAFRUIT_ITSYBITSY_M4_EXPRESS

Then the additional code size change on top of that for the encoding optimisation is:

   bare-arm:  +120 +0.208% 
minimal x86:  +204 +0.124% 
   unix x64:  +374 +0.071% 
unix nanbox:  +592 +0.128% 
      stm32:  +160 +0.041% PYBV10
     cc3200:  +128 +0.070% 
    esp8266:  +156 +0.022% GENERIC
      esp32:  +188 +0.012% GENERIC
        nrf:  +148 +0.085% pca10040
        rp2:  +136 +0.028% PICO
       samd:   -56 -0.040% ADAFRUIT_ITSYBITSY_M4_EXPRESS

I'd say that optimisation is not worth it. It's relatively rare to use * / ** args and there are better ways than that to spend bytes on reducing/optimising the bytecode.

So please remove that encoding optimisation commit from this PR. That should also fix the CI issues with native emitter.

py/vm.c Outdated
@@ -927,8 +927,8 @@ unwind_jump:;
// unum & 0xff == n_positional
// (unum >> 8) & 0xff == n_keyword
// We have following stack layout here:
// fun arg0 arg1 ... kw0 val0 kw1 val1 ... seq dict <- TOS
sp -= (unum & 0xff) + ((unum >> 7) & 0x1fe) + 2;
// fun arg0 arg1 ... kw0 val0 kw1 val1 ... bitmap <- TOS
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In this commit the bitmap does not exist so it should be seq instead. I.e., this commit should just delete the dict entry. Then in the next commit, seq is renamed to bitmap.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

@dpgeorge
Copy link
Member

Performance change (without the encoding optimisation commit), on PYBv1.0:

diff of scores (higher is better)
N=100 M=100                        p0 ->         p1         diff      diff% (error%)
bm_chaos.py                    351.40 ->     355.38 :      +3.98 =  +1.133% (+/-0.01%)
bm_fannkuch.py                  77.31 ->      77.98 :      +0.67 =  +0.867% 
6D40
(+/-0.01%)
bm_fft.py                     2526.30 ->    2549.84 :     +23.54 =  +0.932% (+/-0.00%)
bm_float.py                   5905.82 ->    5939.79 :     +33.97 =  +0.575% (+/-0.00%)
bm_hexiom.py                    47.17 ->      47.27 :      +0.10 =  +0.212% (+/-0.01%)
bm_nqueens.py                 4419.24 ->    4451.92 :     +32.68 =  +0.739% (+/-0.00%)
bm_pidigits.py                 638.01 ->     640.70 :      +2.69 =  +0.422% (+/-0.23%)
core_import_mpy_multi.py       553.34 ->     556.64 :      +3.30 =  +0.596% (+/-0.01%)
core_import_mpy_single.py       68.36 ->      68.69 :      +0.33 =  +0.483% (+/-0.01%)
core_qstr.py                    63.73 ->      63.86 :      +0.13 =  +0.204% (+/-0.01%)
core_yield_from.py             356.12 ->     355.00 :      -1.12 =  -0.315% (+/-0.00%)
misc_aes.py                    411.90 ->     413.86 :      +1.96 =  +0.476% (+/-0.01%)
misc_mandel.py                3350.85 ->    3414.97 :     +64.12 =  +1.914% (+/-0.00%)
misc_pystone.py               2225.86 ->    2232.44 :      +6.58 =  +0.296% (+/-0.00%)
misc_raytrace.py               368.11 ->     372.18 :      +4.07 =  +1.106% (+/-0.00%)
viper_call0.py                 572.83 ->     572.73 :      -0.10 =  -0.017% (+/-0.00%)
viper_call1a.py                546.71 ->     546.71 :      +0.00 =  +0.000% (+/-0.00%)
viper_call1b.py                436.01 ->     435.91 :      -0.10 =  -0.023% (+/-0.01%)
viper_call1c.py                440.59 ->     440.51 :      -0.08 =  -0.018% (+/-0.00%)
viper_call2a.py                532.82 ->     532.85 :      +0.03 =  +0.006% (+/-0.01%)
viper_call2b.py                377.24 ->     377.17 :      -0.07 =  -0.019% (+/-0.00%)

@dlech
Copy link
Contributor Author
dlech commented Mar 29, 2022

I have made the requested changes, dropped the extra commits and rebased.

@codecov-commenter
Copy link
codecov-commenter commented Mar 29, 2022

Codecov Report

Merging #5807 (dcd0bdb) into master (5e685a9) will decrease coverage by 0.00%.
The diff coverage is 98.94%.

❗ Current head dcd0bdb differs from pull request most recent head 12755dc. Consider uploading reports for the commit 12755dc to get more accurate results

@@            Coverage Diff             @@
##           master    #5807      +/-   ##
==========================================
- Coverage   98.24%   98.24%   -0.01%     
==========================================
  Files         154      154              
  Lines       20273    20286      +13     
==========================================
+ Hits        19918    19929      +11     
- Misses        355      357       +2     
Impacted Files Coverage Δ
py/compile.c 99.81% <94.44%> (-0.13%) ⬇️
py/emitbc.c 100.00% <100.00%> (ø)
py/emitnative.c 99.31% <100.00%> (ø)
py/runtime.c 99.72% <100.00%> (+<0.01%) ⬆️
py/vm.c 98.98% <100.00%> (ø)

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 5e685a9...12755dc. Read the comment docs.

@dlech
Copy link
Contributor Author
dlech commented Mar 29, 2022

The changes in code coverage show some potential issues:

  • this branch is not hit because the cpydiff isn't run in the coverage tests.
  • this branch and this branch is no longer hit because we are now allocating one mp_obj_t for each kw arg, including the ** args plus the size of the ** arg if available. So we are allocating one extra mp_obj_t that isn't being used for each ** arg. Probably the same applies to *args as well.

@dlech
Copy link
Contributor Author
dlech commented Mar 29, 2022

I have added a commit to address the overallocation issue. But it doesn't seem to affect the code coverage, so there is probably something else I am missing.

@dlech
Copy link
Contributor Author
dlech commented Mar 29, 2022

I've added a few more commits that should take care of the dropped coverage issues. So the only question that remains is should we add a test that duplicates the cpydiff to get the last bit of coverage or should we just include all cpydiff in the coverage run or don't worry about coverage for that branch?

This is a partial implementation of PEP 448 to allow multiple **
unpackings when calling a function or method.

The compiler is modified to encode the argument as a None: obj key-value
pair (similar to how regular keyword arguments are encoded as str: obj
pairs). The extra object that was pushed on the stack to hold a single
** unpacking object is no longer used and is removed.

The runtime is modified to decode this new format.

Signed-off-by: David Lechner <david@pybricks.com>
@dlech dlech force-pushed the pep-448 branch 2 times, most recently from dcd0bdb to 12755dc Compare March 29, 2022 19:14
@dpgeorge
Copy link
Member

So the only question that remains is should we add a test that duplicates the cpydiff to get the last bit of coverage

Yes I think we should add another test for this in the main test suite (this is to check when there are too many positional args, right?). Because cpydiff tests are intended to be changed/removed if/when the feature becomes compatible with CPython. Thus cpydiff tests cannot be relied upon for coverage.

dlech and others added 4 commits March 30, 2022 10:49
This is a partial implementation of PEP 448 to allow unpacking multiple
star args in a function or method call.

This is implemented by changing the emitted bytecodes so that both
positional args and star args are stored as positional args. A bitmap
is added to indicate if an argument at a given position is a positional
argument or a star arg.

In the bytecodes, this new bitmap takes the place of the old star arg.
It is stored as a small int, so this means only the first N arguments
can be star args where N is the number of bits in a small int.

The runtime is modified to interpret this new bytecode format while
still trying to perform as few memory reallocations as possible.

Signed-off-by: David Lechner <david@pybricks.com>
This fixes overallocating an extra mp_obj_t when the length of *args and
**args is known. Previously we were allocating 1 mp_obj_t for each
n_args and n_kw plus the length of each *arg and **arg (if they are
known). Since n_args includes *args and n_kw includes **args, this was
allocating an extra mp_obj_t in addition to the length of these args
when unpacked.

To fix this, we just subtract 1 from the length to account for the 1
already implicitly allocated by n_args and n_kw.

Signed-off-by: David Lechner <david@pybricks.com>
To reach this check, n_kw has to be >= 1 and therefore args2_alloc has
to be >= 2. Therefore new_alloc will always be >= 4. So this check will
never be true and can be removed.

Signed-off-by: David Lechner <david@pybricks.com>
This fixes code coverage for the case where a *arg without __len__ is
unpacked and uses exactly the amount of memory that was allocated for
kw args. This triggers the code branch where the memory for the kw args
gets reallocated since it was used already by the *arg unpacking.

Signed-off-by: David Lechner <david@pybricks.com>
This replaces instances of uint with size_t and int with ssize_t in
the mp_call_prepare_args_n_kw_var() function since all of the variables
are used as array offsets.

Also sort headers while we are touching this.

Signed-off-by: David Lechner <david@pybricks.com>
@dpgeorge
Copy link
Member

Thanks for the updates. This is looking very good now. Merged in 1e99d29 through 2e3f204

@dpgeorge dpgeorge closed this Mar 31, 2022
@dpgeorge
Copy link
Member

I have a few extra coverage tests to add in a separate PR.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
py-core Relates to py/ directory in source
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants
0