8000 bpo-47176: Interrupt handling for wasm32-emscripten builds without pthreads by hoodmane · Pull Request #32209 · python/cpython · GitHub
[go: up one dir, main page]

Skip to content

bpo-47176: Interrupt handling for wasm32-emscripten builds without pthreads #32209

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 23 commits into from
Apr 3, 2022

Conversation

hoodmane
Copy link
Contributor
@hoodmane hoodmane commented Mar 31, 2022

https://bugs.python.org/issue47176

I put the custom behavior behind a runtime flag Py_EMSCRIPTEN_SIGNAL_HANDLING. Pyodide's public API allows toggling signal handling on and off at runtime, so this flag allows us to maintain that same behavior. This patch rebases cleanly onto Python 3.10.2 except for ceval.c where CHECK_EVAL_BREAKER is new in Python 3.11.

See pyodide/pyodide#2332 where a similar patch is tested in Pyodide with Python 3.10.2. I tried to build Pyodide against Python 3.11.0a6 to test this exact version of the patch but I ran into a very large number of linker errors that I wasn't able to resolve so I gave up.

https://bugs.python.org/issue47176

@hoodmane hoodmane requested a review from markshannon as a code owner March 31, 2022 05:00
@tiran tiran changed the title bpo-47176 Interrupt handling for wasm32-emscripten builds without pthreads bpo-47176: Interrupt handling for wasm32-emscripten builds without pthreads Mar 31, 2022
Copy link
Member
@tiran tiran left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the PR! I added a couple of comments and code style suggestions.

Co-authored-by: Christian Heimes <christian@python.org>
@hoodmane
Copy link
Contributor Author

Thanks for the review @tiran! (And for the encouragement to make this PR.)

Co-authored-by: Christian Heimes <christian@python.org>
Python/ceval.c Outdated
@@ -1292,11 +1292,33 @@ eval_frame_handle_pending(PyThreadState *tstate)
}

#define CHECK_EVAL_BREAKER() \
CHECK_EMSCRIPTEN_SIGNALS(); \
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why can't emscripten signals set the eval_breaker variable, like all other platforms?

Copy link
Contributor Author
@hoodmane hoodmane Mar 31, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Because of the "shared-nothing" threading model. There is no preemptive multitasking so the thread that Python is running in can't be responsible for writing to the memory when Python code is running. The thread handling the UI cannot write into the Python/wasm memory because it doesn't have access to it. The embedder has to manually create a single byte of shared memory between the UI and the Python thread to handle the signal. But this byte lives outside the wasm memory and cannot be accessed via a pointer (so it can't be accessed directly from C at all without calling out into Javascript).

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's how browser's web workers are implemented. Browsers are weird platforms and have all sorts of limitations in place to protect users. Browsers have SharedArrayBuffer disabled to mitigate SPECTRE-like attacks. Without SharedArrayBuffer we need another way to modify a flag from JavaScript code and set it in WASM heap. The original patch https://github.com/pyodide/pyodide/blob/main/cpython/patches/0007-Patch-in-keyboard-interrupt-handling.patch explains the reason in more details.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This seems odd.

So, you can read a shared buffer from javascript, but not web assembly? But you can call javascript from web assembly, though? So this is a pointless limitation of web assembly, as just impairs performance, but offers no additional security?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So this is a pointless limitation of web assembly, as just impairs performance, but offers no additional security?

Yes. There are plans to eventually lift this limitation some time in the next few years.
https://github.com/WebAssembly/multi-memory/blob/main/proposals/multi-memory/Overview.md
But we also need to get llvm + emscripten support, so even once wasm adds the feature it will take a while before it is accessible via our compiler toolchain.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure. I just wanted to make sure I understood this correctly.

Copy link
Contributor Author
@hoodmane hoodmane Mar 31, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To describe the situation more completely:

  1. Memory buffers can either be marked shared or private
  2. shared memory has worse performance than unshared memory because of mitigations for cache timing sidechannel attacks like Spectre and Heartbleed
  3. wasm currently only allows one memory

For performance reasons, we want the one memory to be private. So we're stuck with what this PR does.

With multiple memories, we could have a private memory and a shared memory. We would need some attribute that asks the linker to allocate certain things in the shared memory, and then we could mark eval_breaker and similar signal flags with this "shared" attribute. In any case this is blocked on the wasm multiple memories proposal.

@markshannon
Copy link
Member

Would it possible to move the web assembly code to its own file(s)? That way we would need fewer #ifs.
This isn't a problem for the web assembly code, specifically. Just that as we add more platforms, the code gets buried under #ifdefs.
We should probably do this for other platforms as well.

Python/ceval.c Outdated
extern int Py_EMSCRIPTEN_SIGNAL_HANDLING;
void _Py_CheckEmscriptenSignals(void);

#define PY_EMSCRIPTEN_SIGNAL_INTERVAL 50
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would increase this a lot, as it is hot code. In CPython running natively, it gets hit several million times per second.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We benchmark this in Pyodide and surprisingly see no observable performance cost. These are benchmarks from our most recent commit in Pyodide. The columns marked (w/ ib) do this extra check once every 50 instructions. As you can see there is no measured performance impact. I agree it's not what I expected, but I guess this check is very fast?

Interrupt buffer benchmarks from Pyodide main branch

From:
https://app.circleci.com/pipelines/github/pyodide/pyodide/3736/workflows/8ba48b60-2cb9-430b-946f-152cca4fc3f5/jobs/40720

 -  pystone
    native: 0.198507  firefox: 0.732000  firefox(w/ ib): 0.752000  chrome: 0.646333  chrome(w/ ib): 0.656367  
 -  allpairs_distances
    native: 0.000153  firefox: 0.001333  firefox(w/ ib): 0.001111  chrome: 0.001322  chrome(w/ ib): 0.001344  
 -  allpairs_distances_loops
    native: 0.021808  firefox: 0.049889  firefox(w/ ib): 0.050667  chrome: 0.040656  chrome(w/ ib): 0.041411  
 -  arc_distance
    native: 0.001230  firefox: 0.003111  firefox(w/ ib): 0.003111  chrome: 0.003300  chrome(w/ ib): 0.003322  
 -  check_mask
    native: 0.003151  firefox: 0.011444  firefox(w/ ib): 0.011667  chrome: 0.008900  chrome(w/ ib): 0.009122  
 -  create_grid
    native: 0.011174  firefox: 0.013444  firefox(w/ ib): 0.013556  chrome: 0.013822  chrome(w/ ib): 0.013800  
 -  cronbach
    native: 0.005441  firefox: 0.009111  firefox(w/ ib): 0.009000  chrome: 0.009744  chrome(w/ ib): 0.009700  
 -  diffusion
    native: 0.021249  firefox: 0.051556  firefox(w/ ib): 0.051667  chrome: 0.052778  chrome(w/ ib): 0.052867  
 -  evolve
    native: 0.017652  firefox: 0.016000  firefox(w/ ib): 0.015667  chrome: 0.016867  chrome(w/ ib): 0.017067  
 -  fdtd
    native: 0.010733  firefox: 0.031333  firefox(w/ ib): 0.031333  chrome: 0.028111  chrome(w/ ib): 0.027500  
 -  grayscott
    native: 0.022387  firefox: 0.079667  firefox(w/ ib): 0.079889  chrome: 0.086411  chrome(w/ ib): 0.087311  
 -  grouping
    native: 0.006726  firefox: 0.013444  firefox(w/ ib): 0.013222  chrome: 0.012833  chrome(w/ ib): 0.012800  
 -  growcut
    native: 0.011023  firefox: 0.031889  firefox(w/ ib): 0.032222  chrome: 0.030489  chrome(w/ ib): 0.030322  
 -  harris
    native: 0.053286  firefox: 0.020444  firefox(w/ ib): 0.020222  chrome: 0.025444  chrome(w/ ib): 0.025522  
 -  hasting
    native: 0.000019  firefox: 0.000000  firefox(w/ ib): 0.000000  chrome: 0.000044  chrome(w/ ib): 0.000033  
 -  julia
    native: 0.001702  firefox: 0.004444  firefox(w/ ib): 0.004556  chrome: 0.004400  chrome(w/ ib): 0.004400  
 -  l2norm
    native: 0.002951  firefox: 0.006333  firefox(w/ ib): 0.006000  chrome: 0.006011  chrome(w/ ib): 0.006033  
 -  large_decimal_list
    native: 0.000359  firefox: 0.001222  firefox(w/ ib): 0.000889  chrome: 0.001100  chrome(w/ ib): 0.001144  
 -  local_maxima
    native: 0.014963  firefox: 0.054889  firefox(w/ ib): 0.056889  chrome: 0.044867  chrome(w/ ib): 0.045389  
 -  log_likelihood
    native: 0.008437  firefox: 0.014889  firefox(w/ ib): 0.015111  chrome: 0.015267  chrome(w/ ib): 0.015089  
 -  lstsqr
    native: 0.029089  firefox: 0.020667  firefox(w/ ib): 0.019333  chrome: 0.020833  chrome(w/ ib): 0.019722  
 -  mandel
    native: 0.026955  firefox: 0.090667  firefox(w/ ib): 0.091111  chrome: 0.097089  chrome(w/ ib): 0.095433  
 -  multiple_sum
    native: 0.007375  firefox: 0.023222  firefox(w/ ib): 0.023333  chrome: 0.023867  chrome(w/ ib): 0.023722  
 -  pairwise_loop
    native: 0.009979  firefox: 0.026889  firefox(w/ ib): 0.026667  chrome: 0.025222  chrome(w/ ib): 0.025567  
 -  periodic_dist
    native: 0.008906  firefox: 0.011444  firefox(w/ ib): 0.011333  chrome: 0.010033  chrome(w/ ib): 0.010133  
 -  repeating
    native: 0.009739  firefox: 0.009667  firefox(w/ ib): 0.009222  chrome: 0.007889  chrome(w/ ib): 0.007467  
 -  reverse_cumsum
    native: 0.028820  firefox: 0.014222  firefox(w/ ib): 0.013667  chrome: 0.014056  chrome(w/ ib): 0.013911  
 -  rosen
    native: 0.083447  firefox: 0.043000  firefox(w/ ib): 0.042556  chrome: 0.046611  chrome(w/ ib): 0.045756  
 -  slowparts
    native: 0.013151  firefox: 0.034222  firefox(w/ ib): 0.035000  chrome: 0.028389  chrome(w/ ib): 0.028656  
 -  smoothing
    native: 0.000003  firefox: 0.000000  firefox(w/ ib): 0.000000  chrome: 0.000011  chrome(w/ ib): 0.000000  
 -  specialconvolve
    native: 0.037426  firefox: 0.024556  firefox(w/ ib): 0.024556  chrome: 0.026622  chrome(w/ ib): 0.025900  
 -  vibr_energy
    native: 0.005246  firefox: 0.009889  firefox(w/ ib): 0.009778  chrome: 0.010333  chrome(w/ ib): 0.010356  
 -  wave
    native: 0.003782  firefox: 0.008889  firefox(w/ ib): 0.009111  chrome: 0.007578  chrome(w/ ib): 0.007756  
 -  canvas_custom_font
    native: nan  firefox: 0.967778  firefox(w/ ib): 0.992778  chrome: 0.751767  chrome(w/ ib): 0.739433  
 -  canvas_image
    native: nan  firefox: 0.644778  firefox(w/ ib): 0.656556  chrome: 0.557889  chrome(w/ ib): 0.555289  
 -  canvas_image_affine
    native: nan  firefox: 1.599778  firefox(w/ ib): 1.626333  chrome: 1.264300  chrome(w/ ib): 1.272900  
 -  canvas_rendering
    native: nan  firefox: 0.959444  firefox(w/ ib): 0.969889  chrome: 0.724700  chrome(w/ ib): 0.724033  
 -  canvas_text_rotated
    native: nan  firefox: 1.342222  firefox(w/ ib): 1.406556  chrome: 1.055044  chrome(w/ ib): 1.050233  
 -  wasm_custom_font
    native: nan  firefox: 0.613778  firefox(w/ ib): 0.614333  chrome: 0.501633  chrome(w/ ib): 0.497000  
 -  wasm_image
    native: nan  firefox: 0.514444  firefox(w/ ib): 0.516222  chrome: 0.440578  chrome(w/ ib): 0.433600  
 -  wasm_image_affine
    native: nan  firefox: 1.348667  firefox(w/ ib): 1.386889  chrome: 1.128511  chrome(w/ ib): 1.135756  
 -  wasm_rendering
    native: nan  firefox: 0.586778  firefox(w/ ib): 0.586778  chrome: 0.485778  chrome(w/ ib): 0.470600  
 -  wasm_text_rotated
    native: nan  firefox: 0.665778  firefox(w/ ib): 0.658444  chrome: 0.544767  chrome(w/ ib): 0.533867  

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I will try to build Pyodide with Python 3.11.0a6 and this patch and see if I can benchmark the final patch, but I suspect it won't be dramatically different.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe the JS gets inlined. Does it differ much from browser to browser?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure that I understand that table.
w/ ib is this branch? Could you normalize the numbers?
The pyiodide main branch is based on 3.11?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe the JS gets inlined.

Yeah something like this. Browser JITs can do amazing stuff sometimes, and making a read from a buffer like this is rather JIT friendly.

Does it differ much from browser to browser?

We are only benchmarking on Chrome and Firefox, but there isn't a big difference. On both Chrome and Firefox, it looks very similar with and without this change. I'm not sure why we don't benchmark Node...

The pyiodide main branch is based on 3.11?

Nope, main branch is 3.10. If the check is happening much more often in 3.11 then maybe the benchmark won't be accurate. I haven't successfully built Pyodide against Python 3.11 yet, working on it.

Could you normalize the numbers?

Yeah when I get a chance I will compute the geometric mean of the ratios with and without this check across benchmarks.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If the check is happening much more often in 3.11...

The check will happen more often, because 3.11 is faster 🙂 But no more often per work done.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(I'd love to know how pyiodide 3.11 compares to 3.10, but that's way off topic for this PR)

Copy link
Contributor Author
@hoodmane hoodmane Apr 1, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I took the ratio of (with signal polling) / (without signal polling) for each benchmark and took the geometric mean across all the benchmarks for three different trials (running the full benchmark suite) and got:

firefox: 0.996, 0.989, 1.008
chrome : 0.997, 1.012, 0.992

So either this patch has no performance impact or I am measuring wrong. This is benchmarked on Python 3.10.2.

I'd love to know how pyiodide 3.11 compares to 3.10

Right now I have the issue that I can't build numpy v1.21.4 against 3.11.0a6 because:

numpy/core/src/multiarray/arraytypes.c.src:65:26: error: expression is not assignable
    Py_TYPE(&new_fields) = NULL;

Copy link
Contributor Author
@hoodmane hoodmane Apr 1, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

With @tiran's assistance I did manage to get Python 3.11.0a6 working with Pyodide though, and the current patch passes all our signal handling tests.

@hoodmane
Copy link
Contributor Author
hoodmane commented Mar 31, 2022

Would it possible to move the web assembly code to its own file(s)?

Yeah that makes sense to me. Do you have a model for what this should look like? Or I can just guess.

What I can do is define a separate emscripten_signals.h which depending on defined(__EMSCRIPTEN__) defines two macros to either be no-ops or to do our checks, and an emscripten_signals.c file with the function definitions all guarded behind defined(__EMSCRIPTEN__). Then I just need to figure out how to adjust the Make stuff to link emscripten_signals.o into the build.

@tiran
Copy link
Member
tiran commented Mar 31, 2022

What I can do is define a separate emscripten_signals.h which depending on defined(__EMSCRIPTEN__) defines two macros to either be no-ops or to do our checks, and an emscripten_signals.c file with the function definitions all guarded behind defined(__EMSCRIPTEN__). Then I just need to figure out how to adjust the Make stuff to link emscripten_signals.o into the build.

Here you are https://github.com/python/cpython/compare/main...tiran:platform_objs?expand=1

Co-authored-by: Brett Cannon <brett@python.org>
@tiran
Copy link
Member
tiran commented Apr 2, 2022

The PR needs a new blurb. You can either use the CLI tool "blurb" from PyPI or the blurb-it app (click on details next to bedevere/news check).

hoodmane and others added 2 commits April 2, 2022 14:52
Co-authored-by: Christian Heimes <christian@python.org>
@tiran
Copy link
Member
tiran commented Apr 3, 2022

Thanks for your contribution, @hoodmane ! Much appreciated.

I'm merging the PR now so it gets into the next alpha release.

@tiran tiran merged commit 087d0fa into python:main Apr 3, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants
0