-
-
Notifications
You must be signed in to change notification settings - Fork 32.2k
bpo-47176: Interrupt handling for wasm32-emscripten builds without pthreads #32209
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the PR! I added a couple of comments and code style suggestions.
Co-authored-by: Christian Heimes <christian@python.org>
Thanks for the review @tiran! (And for the encouragement to make this PR.) |
…more conservative with errors in CheckEmscriptenSignals
Co-authored-by: Christian Heimes <christian@python.org>
Python/ceval.c
Outdated
@@ -1292,11 +1292,33 @@ eval_frame_handle_pending(PyThreadState *tstate) | |||
} | |||
|
|||
#define CHECK_EVAL_BREAKER() \ | |||
CHECK_EMSCRIPTEN_SIGNALS(); \ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why can't emscripten signals set the eval_breaker
variable, like all other platforms?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Because of the "shared-nothing" threading model. There is no preemptive multitasking so the thread that Python is running in can't be responsible for writing to the memory when Python code is running. The thread handling the UI cannot write into the Python/wasm memory because it doesn't have access to it. The embedder has to manually create a single byte of shared memory between the UI and the Python thread to handle the signal. But this byte lives outside the wasm memory and cannot be accessed via a pointer (so it can't be accessed directly from C at all without calling out into Javascript).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's how browser's web workers are implemented. Browsers are weird platforms and have all sorts of limitations in place to protect users. Browsers have SharedArrayBuffer disabled to mitigate SPECTRE-like attacks. Without SharedArrayBuffer we need another way to modify a flag from JavaScript code and set it in WASM heap. The original patch https://github.com/pyodide/pyodide/blob/main/cpython/patches/0007-Patch-in-keyboard-interrupt-handling.patch explains the reason in more details.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This seems odd.
So, you can read a shared buffer from javascript, but not web assembly? But you can call javascript from web assembly, though? So this is a pointless limitation of web assembly, as just impairs performance, but offers no additional security?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So this is a pointless limitation of web assembly, as just impairs performance, but offers no additional security?
Yes. There are plans to eventually lift this limitation some time in the next few years.
https://github.com/WebAssembly/multi-memory/blob/main/proposals/multi-memory/Overview.md
But we also need to get llvm + emscripten support, so even once wasm adds the feature it will take a while before it is accessible via our compiler toolchain.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sure. I just wanted to make sure I understood this correctly.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
To describe the situation more completely:
- Memory buffers can either be marked shared or private
- shared memory has worse performance than unshared memory because of mitigations for cache timing sidechannel attacks like Spectre and Heartbleed
- wasm currently only allows one memory
For performance reasons, we want the one memory to be private. So we're stuck with what this PR does.
With multiple memories, we could have a private memory and a shared memory. We would need some attribute that asks the linker to allocate certain things in the shared memory, and then we could mark eval_breaker
and similar signal flags with this "shared" attribute. In any case this is blocked on the wasm multiple memories proposal.
Would it possible to move the web assembly code to its own file(s)? That way we would need fewer |
Python/ceval.c
Outdated
extern int Py_EMSCRIPTEN_SIGNAL_HANDLING; | ||
void _Py_CheckEmscriptenSignals(void); | ||
|
||
#define PY_EMSCRIPTEN_SIGNAL_INTERVAL 50 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would increase this a lot, as it is hot code. In CPython running natively, it gets hit several million times per second.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We benchmark this in Pyodide and surprisingly see no observable performance cost. These are benchmarks from our most recent commit in Pyodide. The columns marked (w/ ib)
do this extra check once every 50 instructions. As you can see there is no measured performance impact. I agree it's not what I expected, but I guess this check is very fast?
Interrupt buffer benchmarks from Pyodide main branch
- pystone
native: 0.198507 firefox: 0.732000 firefox(w/ ib): 0.752000 chrome: 0.646333 chrome(w/ ib): 0.656367
- allpairs_distances
native: 0.000153 firefox: 0.001333 firefox(w/ ib): 0.001111 chrome: 0.001322 chrome(w/ ib): 0.001344
- allpairs_distances_loops
native: 0.021808 firefox: 0.049889 firefox(w/ ib): 0.050667 chrome: 0.040656 chrome(w/ ib): 0.041411
- arc_distance
native: 0.001230 firefox: 0.003111 firefox(w/ ib): 0.003111 chrome: 0.003300 chrome(w/ ib): 0.003322
- check_mask
native: 0.003151 firefox: 0.011444 firefox(w/ ib): 0.011667 chrome: 0.008900 chrome(w/ ib): 0.009122
- create_grid
native: 0.011174 firefox: 0.013444 firefox(w/ ib): 0.013556 chrome: 0.013822 chrome(w/ ib): 0.013800
- cronbach
native: 0.005441 firefox: 0.009111 firefox(w/ ib): 0.009000 chrome: 0.009744 chrome(w/ ib): 0.009700
- diffusion
native: 0.021249 firefox: 0.051556 firefox(w/ ib): 0.051667 chrome: 0.052778 chrome(w/ ib): 0.052867
- evolve
native: 0.017652 firefox: 0.016000 firefox(w/ ib): 0.015667 chrome: 0.016867 chrome(w/ ib): 0.017067
- fdtd
native: 0.010733 firefox: 0.031333 firefox(w/ ib): 0.031333 chrome: 0.028111 chrome(w/ ib): 0.027500
- grayscott
native: 0.022387 firefox: 0.079667 firefox(w/ ib): 0.079889 chrome: 0.086411 chrome(w/ ib): 0.087311
- grouping
native: 0.006726 firefox: 0.013444 firefox(w/ ib): 0.013222 chrome: 0.012833 chrome(w/ ib): 0.012800
- growcut
native: 0.011023 firefox: 0.031889 firefox(w/ ib): 0.032222 chrome: 0.030489 chrome(w/ ib): 0.030322
- harris
native: 0.053286 firefox: 0.020444 firefox(w/ ib): 0.020222 chrome: 0.025444 chrome(w/ ib): 0.025522
- hasting
native: 0.000019 firefox: 0.000000 firefox(w/ ib): 0.000000 chrome: 0.000044 chrome(w/ ib): 0.000033
- julia
native: 0.001702 firefox: 0.004444 firefox(w/ ib): 0.004556 chrome: 0.004400 chrome(w/ ib): 0.004400
- l2norm
native: 0.002951 firefox: 0.006333 firefox(w/ ib): 0.006000 chrome: 0.006011 chrome(w/ ib): 0.006033
- large_decimal_list
native: 0.000359 firefox: 0.001222 firefox(w/ ib): 0.000889 chrome: 0.001100 chrome(w/ ib): 0.001144
- local_maxima
native: 0.014963 firefox: 0.054889 firefox(w/ ib): 0.056889 chrome: 0.044867 chrome(w/ ib): 0.045389
- log_likelihood
native: 0.008437 firefox: 0.014889 firefox(w/ ib): 0.015111 chrome: 0.015267 chrome(w/ ib): 0.015089
- lstsqr
native: 0.029089 firefox: 0.020667 firefox(w/ ib): 0.019333 chrome: 0.020833 chrome(w/ ib): 0.019722
- mandel
native: 0.026955 firefox: 0.090667 firefox(w/ ib): 0.091111 chrome: 0.097089 chrome(w/ ib): 0.095433
- multiple_sum
native: 0.007375 firefox: 0.023222 firefox(w/ ib): 0.023333 chrome: 0.023867 chrome(w/ ib): 0.023722
- pairwise_loop
native: 0.009979 firefox: 0.026889 firefox(w/ ib): 0.026667 chrome: 0.025222 chrome(w/ ib): 0.025567
- periodic_dist
native: 0.008906 firefox: 0.011444 firefox(w/ ib): 0.011333 chrome: 0.010033 chrome(w/ ib): 0.010133
- repeating
native: 0.009739 firefox: 0.009667 firefox(w/ ib): 0.009222 chrome: 0.007889 chrome(w/ ib): 0.007467
- reverse_cumsum
native: 0.028820 firefox: 0.014222 firefox(w/ ib): 0.013667 chrome: 0.014056 chrome(w/ ib): 0.013911
- rosen
native: 0.083447 firefox: 0.043000 firefox(w/ ib): 0.042556 chrome: 0.046611 chrome(w/ ib): 0.045756
- slowparts
native: 0.013151 firefox: 0.034222 firefox(w/ ib): 0.035000 chrome: 0.028389 chrome(w/ ib): 0.028656
- smoothing
native: 0.000003 firefox: 0.000000 firefox(w/ ib): 0.000000 chrome: 0.000011 chrome(w/ ib): 0.000000
- specialconvolve
native: 0.037426 firefox: 0.024556 firefox(w/ ib): 0.024556 chrome: 0.026622 chrome(w/ ib): 0.025900
- vibr_energy
native: 0.005246 firefox: 0.009889 firefox(w/ ib): 0.009778 chrome: 0.010333 chrome(w/ ib): 0.010356
- wave
native: 0.003782 firefox: 0.008889 firefox(w/ ib): 0.009111 chrome: 0.007578 chrome(w/ ib): 0.007756
- canvas_custom_font
native: nan firefox: 0.967778 firefox(w/ ib): 0.992778 chrome: 0.751767 chrome(w/ ib): 0.739433
- canvas_image
native: nan firefox: 0.644778 firefox(w/ ib): 0.656556 chrome: 0.557889 chrome(w/ ib): 0.555289
- canvas_image_affine
native: nan firefox: 1.599778 firefox(w/ ib): 1.626333 chrome: 1.264300 chrome(w/ ib): 1.272900
- canvas_rendering
native: nan firefox: 0.959444 firefox(w/ ib): 0.969889 chrome: 0.724700 chrome(w/ ib): 0.724033
- canvas_text_rotated
native: nan firefox: 1.342222 firefox(w/ ib): 1.406556 chrome: 1.055044 chrome(w/ ib): 1.050233
- wasm_custom_font
native: nan firefox: 0.613778 firefox(w/ ib): 0.614333 chrome: 0.501633 chrome(w/ ib): 0.497000
- wasm_image
native: nan firefox: 0.514444 firefox(w/ ib): 0.516222 chrome: 0.440578 chrome(w/ ib): 0.433600
- wasm_image_affine
native: nan firefox: 1.348667 firefox(w/ ib): 1.386889 chrome: 1.128511 chrome(w/ ib): 1.135756
- wasm_rendering
native: nan firefox: 0.586778 firefox(w/ ib): 0.586778 chrome: 0.485778 chrome(w/ ib): 0.470600
- wasm_text_rotated
native: nan firefox: 0.665778 firefox(w/ ib): 0.658444 chrome: 0.544767 chrome(w/ ib): 0.533867
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I will try to build Pyodide with Python 3.11.0a6 and this patch and see if I can benchmark the final patch, but I suspect it won't be dramatically different.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe the JS gets inlined. Does it differ much from browser to browser?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure that I understand that table.
w/ ib
is this branch? Could you normalize the numbers?
The pyiodide main branch is based on 3.11?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe the JS gets inlined.
Yeah something like this. Browser JITs can do amazing stuff sometimes, and making a read from a buffer like this is rather JIT friendly.
Does it differ much from browser to browser?
We are only benchmarking on Chrome and Firefox, but there isn't a big difference. On both Chrome and Firefox, it looks very similar with and without this change. I'm not sure why we don't benchmark Node...
The pyiodide main branch is based on 3.11?
Nope, main branch is 3.10. If the check is happening much more often in 3.11 then maybe the benchmark won't be accurate. I haven't successfully built Pyodide against Python 3.11 yet, working on it.
Could you normalize the numbers?
Yeah when I get a chance I will compute the geometric mean of the ratios with and without this check across benchmarks.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If the check is happening much more often in 3.11...
The check will happen more often, because 3.11 is faster 🙂 But no more often per work done.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(I'd love to know how pyiodide 3.11 compares to 3.10, but that's way off topic for this PR)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I took the ratio of (with signal polling) / (without signal polling) for each benchmark and took the geometric mean across all the benchmarks for three different trials (running the full benchmark suite) and got:
firefox: 0.996, 0.989, 1.008
chrome : 0.997, 1.012, 0.992
So either this patch has no performance impact or I am measuring wrong. This is benchmarked on Python 3.10.2.
I'd love to know how pyiodide 3.11 compares to 3.10
Right now I have the issue that I can't build numpy v1.21.4 against 3.11.0a6 because:
numpy/core/src/multiarray/arraytypes.c.src:65:26: error: expression is not assignable
Py_TYPE(&new_fields) = NULL;
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
With @tiran's assistance I did manage to get Python 3.11.0a6 working with Pyodide though, and the current patch passes all our signal handling tests.
Yeah that makes sense to me. Do you have a model for what this should look like? Or I can just guess. What I can do is define a separate |
Here you are https://github.com/python/cpython/compare/main...tiran:platform_objs?expand=1 |
… emscripten-signals
Co-authored-by: Brett Cannon <brett@python.org>
The PR needs a new blurb. You can either use the CLI tool "blurb" from PyPI or the blurb-it app (click on details next to |
Co-authored-by: Christian Heimes <christian@python.org>
Thanks for your contribution, @hoodmane ! Much appreciated. I'm merging the PR now so it gets into the next alpha release. |
https://bugs.python.org/issue47176
I put the custom behavior behind a runtime flag
Py_EMSCRIPTEN_SIGNAL_HANDLING
. Pyodide's public API allows toggling signal handling on and off at runtime, so this flag allows us to maintain that same behavior. This patch rebases cleanly onto Python 3.10.2 except forceval.c
whereCHECK_EVAL_BREAKER
is new in Python 3.11.See pyodide/pyodide#2332 where a similar patch is tested in Pyodide with Python 3.10.2. I tried to build Pyodide against Python 3.11.0a6 to test this exact version of the patch but I ran into a very large number of linker errors that I wasn't able to resolve so I gave up.
https://bugs.python.org/issue47176