bpo-47176: Interrupt handling for wasm32-emscripten builds without pthreads #32209

hoodmane · 2022-03-31T05:00:06Z

https://bugs.python.org/issue47176

I put the custom behavior behind a runtime flag Py_EMSCRIPTEN_SIGNAL_HANDLING. Pyodide's public API allows toggling signal handling on and off at runtime, so this flag allows us to maintain that same behavior. This patch rebases cleanly onto Python 3.10.2 except for ceval.c where CHECK_EVAL_BREAKER is new in Python 3.11.

See pyodide/pyodide#2332 where a similar patch is tested in Pyodide with Python 3.10.2. I tried to build Pyodide against Python 3.11.0a6 to test this exact version of the patch but I ran into a very large number of linker errors that I wasn't able to resolve so I gave up.

https://bugs.python.org/issue47176

…reads

tiran

Thanks for the PR! I added a couple of comments and code style suggestions.

Modules/signalmodule.c

Python/ceval.c

Modules/signalmodule.c

Co-authored-by: Christian Heimes <christian@python.org>

hoodmane · 2022-03-31T15:26:33Z

Thanks for the review @tiran! (And for the encouragement to make this PR.)

…more conservative with errors in CheckEmscriptenSignals

Modules/signalmodule.c

Co-authored-by: Christian Heimes <christian@python.org>

markshannon · 2022-03-31T16:16:17Z

Python/ceval.c

@@ -1292,11 +1292,33 @@ eval_frame_handle_pending(PyThreadState *tstate)
    }

 #define CHECK_EVAL_BREAKER() \
+    CHECK_EMSCRIPTEN_SIGNALS(); \


Why can't emscripten signals set the eval_breaker variable, like all other platforms?

Because of the "shared-nothing" threading model. There is no preemptive multitasking so the thread that Python is running in can't be responsible for writing to the memory when Python code is running. The thread handling the UI cannot write into the Python/wasm memory because it doesn't have access to it. The embedder has to manually create a single byte of shared memory between the UI and the Python thread to handle the signal. But this byte lives outside the wasm memory and cannot be accessed via a pointer (so it can't be accessed directly from C at all without calling out into Javascript).

It's how browser's web workers are implemented. Browsers are weird platforms and have all sorts of limitations in place to protect users. Browsers have SharedArrayBuffer disabled to mitigate SPECTRE-like attacks. Without SharedArrayBuffer we need another way to modify a flag from JavaScript code and set it in WASM heap. The original patch https://github.com/pyodide/pyodide/blob/main/cpython/patches/0007-Patch-in-keyboard-interrupt-handling.patch explains the reason in more details.

This seems odd.

So, you can read a shared buffer from javascript, but not web assembly? But you can call javascript from web assembly, though? So this is a pointless limitation of web assembly, as just impairs performance, but offers no additional security?

So this is a pointless limitation of web assembly, as just impairs performance, but offers no additional security?

Yes. There are plans to eventually lift this limitation some time in the next few years.
https://github.com/WebAssembly/multi-memory/blob/main/proposals/multi-memory/Overview.md
But we also need to get llvm + emscripten support, so even once wasm adds the feature it will take a while before it is accessible via our compiler toolchain.

Sure. I just wanted to make sure I understood this correctly.

To describe the situation more completely:

Memory buffers can either be marked shared or private

shared memory has worse performance than unshared memory because of mitigations for cache timing sidechannel attacks like Spectre and Heartbleed

wasm currently only allows one memory

For performance reasons, we want the one memory to be private. So we're stuck with what this PR does.

With multiple memories, we could have a private memory and a shared memory. We would need some attribute that asks the linker to allocate certain things in the shared memory, and then we could mark eval_breaker and similar signal flags with this "shared" attribute. In any case this is blocked on the wasm multiple memories proposal.

markshannon · 2022-03-31T16:55:43Z

Would it possible to move the web assembly code to its own file(s)? That way we would need fewer #ifs.
This isn't a problem for the web assembly code, specifically. Just that as we add more platforms, the code gets buried under #ifdefs.
We should probably do this for other platforms as well.

markshannon · 2022-03-31T17:02:11Z

Python/ceval.c

+extern int Py_EMSCRIPTEN_SIGNAL_HANDLING;
+void _Py_CheckEmscriptenSignals(void);
+
+#define PY_EMSCRIPTEN_SIGNAL_INTERVAL 50


I would increase this a lot, as it is hot code. In CPython running natively, it gets hit several million times per second.

We benchmark this in Pyodide and surprisingly see no observable performance cost. These are benchmarks from our most recent commit in Pyodide. The columns marked (w/ ib) do this extra check once every 50 instructions. As you can see there is no measured performance impact. I agree it's not what I expected, but I guess this check is very fast?

Interrupt buffer benchmarks from Pyodide main branch

From:
https://app.circleci.com/pipelines/github/pyodide/pyodide/3736/workflows/8ba48b60-2cb9-430b-946f-152cca4fc3f5/jobs/40720

- pystone native: 0.198507 firefox: 0.732000 firefox(w/ ib): 0.752000 chrome: 0.646333 chrome(w/ ib): 0.656367 - allpairs_distances native: 0.000153 firefox: 0.001333 firefox(w/ ib): 0.001111 chrome: 0.001322 chrome(w/ ib): 0.001344 - allpairs_distances_loops native: 0.021808 firefox: 0.049889 firefox(w/ ib): 0.050667 chrome: 0.040656 chrome(w/ ib): 0.041411 - arc_distance native: 0.001230 firefox: 0.003111 firefox(w/ ib): 0.003111 chrome: 0.003300 chrome(w/ ib): 0.003322 - check_mask native: 0.003151 firefox: 0.011444 firefox(w/ ib): 0.011667 chrome: 0.008900 chrome(w/ ib): 0.009122 - create_grid native: 0.011174 firefox: 0.013444 firefox(w/ ib): 0.013556 chrome: 0.013822 chrome(w/ ib): 0.013800 - cronbach native: 0.005441 firefox: 0.009111 firefox(w/ ib): 0.009000 chrome: 0.009744 chrome(w/ ib): 0.009700 - diffusion native: 0.021249 firefox: 0.051556 firefox(w/ ib): 0.051667 chrome: 0.052778 chrome(w/ ib): 0.052867 - evolve native: 0.017652 firefox: 0.016000 firefox(w/ ib): 0.015667 chrome: 0.016867 chrome(w/ ib): 0.017067 - fdtd native: 0.010733 firefox: 0.031333 firefox(w/ ib): 0.031333 chrome: 0.028111 chrome(w/ ib): 0.027500 - grayscott native: 0.022387 firefox: 0.079667 firefox(w/ ib): 0.079889 chrome: 0.086411 chrome(w/ ib): 0.087311 - grouping native: 0.006726 firefox: 0.013444 firefox(w/ ib): 0.013222 chrome: 0.012833 chrome(w/ ib): 0.012800 - growcut native: 0.011023 firefox: 0.031889 firefox(w/ ib): 0.032222 chrome: 0.030489 chrome(w/ ib): 0.030322 - harris native: 0.053286 firefox: 0.020444 firefox(w/ ib): 0.020222 chrome: 0.025444 chrome(w/ ib): 0.025522 - hasting native: 0.000019 firefox: 0.000000 firefox(w/ ib): 0.000000 chrome: 0.000044 chrome(w/ ib): 0.000033 - julia native: 0.001702 firefox: 0.004444 firefox(w/ ib): 0.004556 chrome: 0.004400 chrome(w/ ib): 0.004400 - l2norm native: 0.002951 firefox: 0.006333 firefox(w/ ib): 0.006000 chrome: 0.006011 chrome(w/ ib): 0.006033 - large_decimal_list native: 0.000359 firefox: 0.001222 firefox(w/ ib): 0.000889 chrome: 0.001100 chrome(w/ ib): 0.001144 - local_maxima native: 0.014963 firefox: 0.054889 firefox(w/ ib): 0.056889 chrome: 0.044867 chrome(w/ ib): 0.045389 - log_likelihood native: 0.008437 firefox: 0.014889 firefox(w/ ib): 0.015111 chrome: 0.015267 chrome(w/ ib): 0.015089 - lstsqr native: 0.029089 firefox: 0.020667 firefox(w/ ib): 0.019333 chrome: 0.020833 chrome(w/ ib): 0.019722 - mandel native: 0.026955 firefox: 0.090667 firefox(w/ ib): 0.091111 chrome: 0.097089 chrome(w/ ib): 0.095433 - multiple_sum native: 0.007375 firefox: 0.023222 firefox(w/ ib): 0.023333 chrome: 0.023867 chrome(w/ ib): 0.023722 - pairwise_loop native: 0.009979 firefox: 0.026889 firefox(w/ ib): 0.026667 chrome: 0.025222 chrome(w/ ib): 0.025567 - periodic_dist native: 0.008906 firefox: 0.011444 firefox(w/ ib): 0.011333 chrome: 0.010033 chrome(w/ ib): 0.010133 - repeating native: 0.009739 firefox: 0.009667 firefox(w/ ib): 0.009222 chrome: 0.007889 chrome(w/ ib): 0.007467 - reverse_cumsum native: 0.028820 firefox: 0.014222 firefox(w/ ib): 0.013667 chrome: 0.014056 chrome(w/ ib): 0.013911 - rosen native: 0.083447 firefox: 0.043000 firefox(w/ ib): 0.042556 chrome: 0.046611 chrome(w/ ib): 0.045756 - slowparts native: 0.013151 firefox: 0.034222 firefox(w/ ib): 0.035000 chrome: 0.028389 chrome(w/ ib): 0.028656 - smoothing native: 0.000003 firefox: 0.000000 firefox(w/ ib): 0.000000 chrome: 0.000011 chrome(w/ ib): 0.000000 - specialconvolve native: 0.037426 firefox: 0.024556 firefox(w/ ib): 0.024556 chrome: 0.026622 chrome(w/ ib): 0.025900 - vibr_energy native: 0.005246 firefox: 0.009889 firefox(w/ ib): 0.009778 chrome: 0.010333 chrome(w/ ib): 0.010356 - wave native: 0.003782 firefox: 0.008889 firefox(w/ ib): 0.009111 chrome: 0.007578 chrome(w/ ib): 0.007756 - canvas_custom_font native: nan firefox: 0.967778 firefox(w/ ib): 0.992778 chrome: 0.751767 chrome(w/ ib): 0.739433 - canvas_image native: nan firefox: 0.644778 firefox(w/ ib): 0.656556 chrome: 0.557889 chrome(w/ ib): 0.555289 - canvas_image_affine native: nan firefox: 1.599778 firefox(w/ ib): 1.626333 chrome: 1.264300 chrome(w/ ib): 1.272900 - canvas_rendering native: nan firefox: 0.959444 firefox(w/ ib): 0.969889 chrome: 0.724700 chrome(w/ ib): 0.724033 - canvas_text_rotated native: nan firefox: 1.342222 firefox(w/ ib): 1.406556 chrome: 1.055044 chrome(w/ ib): 1.050233 - wasm_custom_font native: nan firefox: 0.613778 firefox(w/ ib): 0.614333 chrome: 0.501633 chrome(w/ ib): 0.497000 - wasm_image native: nan firefox: 0.514444 firefox(w/ ib): 0.516222 chrome: 0.440578 chrome(w/ ib): 0.433600 - wasm_image_affine native: nan firefox: 1.348667 firefox(w/ ib): 1.386889 chrome: 1.128511 chrome(w/ ib): 1.135756 - wasm_rendering native: nan firefox: 0.586778 firefox(w/ ib): 0.586778 chrome: 0.485778 chrome(w/ ib): 0.470600 - wasm_text_rotated native: nan firefox: 0.665778 firefox(w/ ib): 0.658444 chrome: 0.544767 chrome(w/ ib): 0.533867

I will try to build Pyodide with Python 3.11.0a6 and this patch and see if I can benchmark the final patch, but I suspect it won't be dramatically different.

Maybe the JS gets inlined. Does it differ much from browser to browser?

I'm not sure that I understand that table.
w/ ib is this branch? Could you normalize the numbers?
The pyiodide main branch is based on 3.11?

Maybe the JS gets inlined.

Yeah something like this. Browser JITs can do amazing stuff sometimes, and making a read from a buffer like this is rather JIT friendly.

Does it differ much from browser to browser?

We are only benchmarking on Chrome and Firefox, but there isn't a big difference. On both Chrome and Firefox, it looks very similar with and without this change. I'm not sure why we don't benchmark Node...

The pyiodide main branch is based on 3.11?

Nope, main branch is 3.10. If the check is happening much more often in 3.11 then maybe the benchmark won't be accurate. I haven't successfully built Pyodide against Python 3.11 yet, working on it.

Could you normalize the numbers?

Yeah when I get a chance I will compute the geometric mean of the ratios with and without this check across benchmarks.

If the check is happening much more often in 3.11...

The check will happen more often, because 3.11 is faster 🙂 But no more often per work done.

(I'd love to know how pyiodide 3.11 compares to 3.10, but that's way off topic for this PR)

I took the ratio of (with signal polling) / (without signal polling) for each benchmark and took the geometric mean across all the benchmarks for three different trials (running the full benchmark suite) and got:

firefox: 0.996, 0.989, 1.008 chrome : 0.997, 1.012, 0.992

So either this patch has no performance impact or I am measuring wrong. This is benchmarked on Python 3.10.2.

I'd love to know how pyiodide 3.11 compares to 3.10

Right now I have the issue that I can't build numpy v1.21.4 against 3.11.0a6 because:

numpy/core/src/multiarray/arraytypes.c.src:65:26: error: expression is not assignable Py_TYPE(&new_fields) = NULL;

With @tiran's assistance I did manage to get Python 3.11.0a6 working with Pyodide though, and the current patch passes all our signal handling tests.

hoodmane · 2022-03-31T17:49:51Z

Would it possible to move the web assembly code to its own file(s)?

Yeah that makes sense to me. Do you have a model for what this should look like? Or I can just guess.

What I can do is define a separate emscripten_signals.h which depending on defined(__EMSCRIPTEN__) defines two macros to either be no-ops or to do our checks, and an emscripten_signals.c file with the function definitions all guarded behind defined(__EMSCRIPTEN__). Then I just need to figure out how to adjust the Make stuff to link emscripten_signals.o into the build.

tiran · 2022-03-31T19:55:23Z

What I can do is define a separate emscripten_signals.h which depending on defined(__EMSCRIPTEN__) defines two macros to either be no-ops or to do our checks, and an emscripten_signals.c file with the function definitions all guarded behind defined(__EMSCRIPTEN__). Then I just need to figure out how to adjust the Make stuff to link emscripten_signals.o into the build.

Here you are https://github.com/python/cpython/compare/main...tiran:platform_objs?expand=1

… emscripten-signals

configure.ac

Co-authored-by: Brett Cannon <brett@python.org>

tiran · 2022-04-02T21:17:15Z

The PR needs a new blurb. You can either use the CLI tool "blurb" from PyPI or the blurb-it app (click on details next to bedevere/news check).

configure.ac

Co-authored-by: Christian Heimes <christian@python.org>

tiran · 2022-04-03T20:58:35Z

Thanks for your contribution, @hoodmane ! Much appreciated.

I'm merging the PR now so it gets into the next alpha release.

bpo-47176 Interrupt handling for wasm32-emscripten builds without pth…

af0e2b1

…reads

hoodmane requested a review from markshannon as a code owner March 31, 2022 05:00

bedevere-bot added the awaiting review label Mar 31, 2022

the-knights-who-say-ni added the CLA signed label Mar 31, 2022

hoodmane mentioned this pull request Mar 31, 2022

Review CPython patches / tests and contribute upstream pyodide/pyodide#2000

Closed

tiran changed the title ~~bpo-47176 Interrupt handling for wasm32-emscripten builds without pthreads~~ bpo-47176: Interrupt handling for wasm32-emscripten builds without pthreads Mar 31, 2022

tiran reviewed Mar 31, 2022

View reviewed changes

Modules/signalmodule.c Outdated Show resolved Hide resolved

Modules/signalmodule.c Outdated Show resolved Hide resolved

Python/ceval.c Outdated Show resolved Hide resolved

Python/ceval.c Outdated Show resolved Hide resolved

Modules/signalmodule.c Outdated Show resolved Hide resolved

Apply suggestions from code review

6215fc6

Co-authored-by: Christian Heimes <christian@python.org>

hoodmane added 5 commits March 31, 2022 08:35

Use EMSCRIPTEN_KEEP_ALIVE in Py_EMSCRIPTEN_SIGNAL_HANDLING, be a bit …

baf7b17

…more conservative with errors in CheckEmscriptenSignals

Formatting fixes

d7282eb

More formatting fixes

bb150f3

More formatting

cee2677

Add comment to #endif

4e7d87f

tiran reviewed Mar 31, 2022

View reviewed changes

Modules/signalmodule.c Outdated Show resolved Hide resolved

Update Modules/signalmodule.c

4bf397d

Co-authored-by: Christian Heimes <christian@python.org>

markshannon reviewed Mar 31, 2022

View reviewed changes

tiran and others added 9 commits March 31, 2022 17:47

Platform specific objects and headers

9cdb1c7

Move custom emscripten logic into emscripten_signals files

76d9a35

Merge branch 'emscripten-signals' of github.com:hoodmane/cpython into…

a614d4d

… emscripten-signals

Fix whitespace

4023353

Fix includes

69621a4

Run autoconf

356d739

include pyerrors in emscripten_signal.c

2a65549

Include python.h

ac6c546

Include Python.h

bb1ffbd

Whitespace

871e545

brettcannon reviewed Apr 1, 2022

View reviewed changes

configure.ac Outdated Show resolved Hide resolved

Update configure.ac

51f7255

Co-authored-by: Brett Cannon <brett@python.org>

hoodmane added 2 commits April 2, 2022 14:37

Add blurb

6b4f9e9

Respect Py_EMSCRIPTEN_SIGNAL_HANDLING flag

83c0803

tiran reviewed Apr 2, 2022

View reviewed changes

configure.ac Outdated Show resolved Hide resolved

hoodmane and others added 2 commits April 2, 2022 14:52

Update configure.ac

0d18561

Co-authored-by: Christian Heimes <christian@python.org>

Run autoconf

0bd86ce

tiran approved these changes Apr 3, 2022

View reviewed changes

bedevere-bot added awaiting merge and removed awaiting review labels Apr 3, 2022

tiran merged commit 087d0fa into python:main Apr 3, 2022

bedevere-bot removed the awaiting merge label Apr 3, 2022

Uh oh!

bpo-47176: Interrupt handling for wasm32-emscripten builds without pthreads #32209

bpo-47176: Interrupt handling for wasm32-emscripten builds without pthreads #32209

Uh oh!

Conversation

Uh oh!

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!