BUG: Remove non-threadsafe sigint handling from fft calculation #16532

seberg · 2020-06-08T14:12:07Z

The fft calculation is the only point in our code where this function
is used. Allowing Ctrl+C, in FFT specifically used have more reasons,
since before pocketfft, some array-sizes could lead to very large
run-times.
Pocketfft fixed that issue, and now FFT is not really any slower,
faster, or memory hungry than any other NumPy operation so it feels
it does not need this handling. Rather, if we can find a better
solution, it should also be added to more functions.

The reason for removal is that it is not only unsafe while the
FFT is running (in theory). Multiple, threaded FFT run can easily
leave the signal handler in a bad state, causing crashes if Ctrl+C
(sigint) is given at any point after the call.

It would be possible to patch that over, by only resetting the
signal handler if we actually changed it (or even more complex
tricks), or possibly only using this technique when on the main
thread.
But, all of these solutions seem to complicate things, when the
main reason for why allowing sigint seems useful is gone with
pocketfft.

This is an extremely annoying bug, and we should have done this long ago (or before pocketfft, maybe just a band-aid fix).

IMO, if we really want signal handling for C-functions we can look into it, but in that case up-streaming a solution may be something to think about. Unless maybe things like C++ just solve the issue?

seberg · 2020-06-08T14:32:14Z

Forgot to xref gh-11798 and gh-7545, Noticed related we had once a request for sigint handling in linalg (gh-11081), which I assume is basiclaly related (and I do not want to apply this type of fix there, but I think now with pocketfft, linalg would probably be a much better target for such handling if we can figure it out).

seberg · 2020-06-18T19:57:47Z

Noticed this, I had not deleted all occurrences, now they are gone. I could keep the docstring around, I guess. But unless we figure out how to use these safely, I am not sure there is much point in documentation describing how something works and breaks which should not be used.

eric-wieser · 2020-06-19T11:31:17Z

This seems reasonable to me. It should be fairly straighforward to handle sigint in an outer ufunc loop somewhere, by calling the python C api to check the interrupted flag. It's not as nice as the immediate interruption that this approach gives, but I suppose its probably safer.

eric-wieser · 2020-06-19T11:36:51Z

numpy/core/src/multiarray/multiarraymodule.c

-static PyObject *
-test_interrupt(PyObject *NPY_UNUSED(self), PyObject *args)
-{
-    int kind = 0;
-    int a = 0;
-
-    if (!PyArg_ParseTuple(args, "|i:test_interrupt", &kind)) {
-        return NULL;
-    }
-    if (kind) {
-        Py_BEGIN_ALLOW_THREADS;
-        while (a >= 0) {
-            if ((a % 1000 == 0) && PyOS_InterruptOccurred()) {
-                break;
-            }
-            a += 1;
-        }
-        Py_END_ALLOW_THREADS;
-    }
-    else {
-        NPY_SIGINT_ON
-        while(a >= 0) {
-            a += 1;
-        }
-        NPY_SIGINT_OFF
-    }
-    return PyInt_FromLong(a);
-}


It might be worth extracting the if / else branches into the comment in npy_interrupt, showing how to replace uses of the SIGINT api.

Ah, I thought that might require the GIL to be held, which seemed a bit useless for us. But, it seems like that has been fixed reading through all of: https://bugs.python.org/issue40826

Didn't they end up deciding that it does require GIL?
python/cpython#20618
It looks like the above will crash the process with fatal error in the future...

Oh, damn, so my first read was the right one, I guess I read the fact that it was backported, as that they made it not require the GIL (since I did not see why to backport otherwise) :(.

I think I will open a BPO, it would be nice if it was possible to at least be able to query this without the GIL if you pass in the threadstate or so...

I opened https://bugs.python.org/issue41037 seems there already is a private function for this. Not sure if we should/could petition more to make this function public, but I suppose it may be the best bet if we want interrupt capabilities at some point.

Stumbled on it, so updated the comment now to mention PyErr_CheckSignals() and also link to the bpo.

The fft calculation is the only point in our code where this function is used. Allowing Ctrl+C, in FFT specifically used have more reasons, since before pocketfft, some array-sizes could lead to very large run-times. Pocketfft fixed that issue, and now FFT is not really any slower, faster, or memory hungry than any other NumPy operation so it feels it does not need this handling. Rather, if we can find a better solution, it should also be added to more functions. The reason for removal is that it is not only unsafe while the FFT is running (in theory). Multiple, threaded FFT run can easily leave the signal handler in a bad state, causing crashes if Ctrl+C (sigint) is given at any point after the call. It would be possible to patch that over, by only resetting the signal handler if we actually changed it (or even more complex tricks), or possibly only using this technique when on the main thread. But, all of these solutions seem to complicate things, when the main reason for why allowing sigint seems useful is gone with pocketfft.

seberg · 2020-07-16T19:42:28Z

I feel this is also a very clear improvement, even if it just deletes a bunch of code. The old behaviour is annoyingly buggy (I used to have a colleague who lost hour long ipython sessions regularly to this...) and the need for it is just not there anymore (at least much less) to be honest.

mattip · 2020-07-17T05:17:26Z

Thanks @seberg

jonashaag · 2020-08-18T18:58:52Z

Thanks for fixing this! I just hit the bug when using fft with multiprocessing with PyTorch. Pressing ^C in the main program causes very curious things like

ERROR: Unexpected segmentation fault encountered in worker.

and

*** longjmp causes uninitialized stack frame ***

I wonder why this PR hasn't been released in 1.19.1? (The merge predates the 1.19.1 release.)

seberg · 2020-08-18T19:18:05Z

Yeah, this one probably has been seriously annoying me for the better part of a decade... (we really should have done this at least as soon as we had bluestein's aglorith/pocketfft)

We did not think of backporting it, but I agree that there is nothing wrong with backporting it into 1.19. Tagging it @charris can make the call.

seberg added 00 - Bug component: numpy.fft labels Jun 8, 2020

seberg force-pushed the delete-sigint-handling branch from 7ca5d67 to 3d2f80f Compare June 9, 2020 03:56

seberg force-pushed the delete-sigint-handling branch from 3d2f80f to 6efda93 Compare June 18, 2020 19:56

eric-wieser reviewed Jun 19, 2020

View reviewed changes

seberg force-pushed the delete-sigint-handling branch 2 times, most recently from b956a6b to 4ff2d3b Compare June 19, 2020 13:47

seberg force-pushed the delete-sigint-handling branch from 4ff2d3b to ec62be2 Compare June 26, 2020 01:45

mattip merged commit d3eae8b into numpy:master Jul 17, 2020

seberg deleted the delete-sigint-handling branch July 17, 2020 13:14

seberg added the 09 - Backport-Candidate PRs tagged should be backported label Aug 18, 2020

charris mentioned this pull request Aug 21, 2020

BUG: Remove non-threadsafe sigint handling from fft calculation #17125

Merged

charris removed the 09 - Backport-Candidate PRs tagged should be backported label Aug 21, 2020

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Uh oh!

BUG: Remove non-threadsafe sigint handling from fft calculation #16532

BUG: Remove non-threadsafe sigint handling from fft calculation #16532

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

BUG: Remove non-threadsafe sigint handling from fft calculation #16532

BUG: Remove non-threadsafe sigint handling from fft calculation #16532

Uh oh!

Conversation

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!