-
-
Notifications
You must be signed in to change notification settings - Fork 10.9k
BUG: Remove non-threadsafe sigint handling from fft calculation #16532
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Forgot to xref gh-11798 and gh-7545, Noticed related we had once a request for sigint handling in linalg (gh-11081), which I assume is basiclaly related (and I do not want to apply this type of fix there, but I think now with pocketfft, linalg would probably be a much better target for such handling if we can figure it out). |
7ca5d67
to
3d2f80f
Compare
3d2f80f
to
6efda93
Compare
Noticed this, I had not deleted all occurrences, now they are gone. I could keep the docstring around, I guess. But unless we figure out how to use these safely, I am not sure there is much point in documentation describing how something works and breaks which should not be used. |
This seems reasonable to me. It should be fairly straighforward to handle sigint in an outer ufunc loop somewhere, by calling the python C api to check the interrupted flag. It's not as nice as the immediate interruption that this approach gives, but I suppose its probably safer. |
static PyObject * | ||
test_interrupt(PyObject *NPY_UNUSED(self), PyObject *args) | ||
{ | ||
int kind = 0; | ||
int a = 0; | ||
|
||
if (!PyArg_ParseTuple(args, "|i:test_interrupt", &kind)) { | ||
return NULL; | ||
} | ||
if (kind) { | ||
Py_BEGIN_ALLOW_THREADS; | ||
while (a >= 0) { | ||
if ((a % 1000 == 0) && PyOS_InterruptOccurred()) { | ||
break; | ||
} | ||
a += 1; | ||
} | ||
Py_END_ALLOW_THREADS; | ||
} | ||
else { | ||
NPY_SIGINT_ON | ||
while(a >= 0) { | ||
a += 1; | ||
} | ||
NPY_SIGINT_OFF | ||
} | ||
return PyInt_FromLong(a); | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It might be worth extracting the if
/ else
branches into the comment in npy_interrupt
, showing how to replace uses of the SIGINT
api.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, I thought that might require the GIL to be held, which seemed 8000 a bit useless for us. But, it seems like that has been fixed reading through all of: https://bugs.python.org/issue40826
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Didn't they end up deciding that it does require GIL?
python/cpython#20618
It looks like the above will crash the process with fatal error in the future...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh, damn, so my first read was the right one, I guess I read the fact that it was backported, as that they made it not require the GIL (since I did not see why to backport otherwise) :(.
I think I will open a BPO, it would be nice if it was possible to at least be able to query this without the GIL if you pass in the threadstate or so...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I opened https://bugs.python.org/issue41037 seems there already is a private function for this. Not sure if we should/could petition more to make this function public, but I suppose it may be the best bet if we want interrupt capabilities at some point.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Stumbled on it, so updated the comment now to mention PyErr_CheckSignals()
and also link to the bpo.
b956a6b
to
4ff2d3b
Compare
The fft calculation is the only point in our code where this function is used. Allowing Ctrl+C, in FFT specifically used have more reasons, since before pocketfft, some array-sizes could lead to very large run-times. Pocketfft fixed that issue, and now FFT is not really any slower, faster, or memory hungry than any other NumPy operation so it feels it does not need this handling. Rather, if we can find a better solution, it should also be added to more functions. The reason for removal is that it is not only unsafe while the FFT is running (in theory). Multiple, threaded FFT run can easily leave the signal handler in a bad state, causing crashes if Ctrl+C (sigint) is given at any point after the call. It would be possible to patch that over, by only resetting the signal handler if we actually changed it (or even more complex tricks), or possibly only using this technique when on the main thread. But, all of these solutions seem to complicate things, when the main reason for why allowing sigint seems useful is gone with pocketfft.
4ff2d3b
to
ec62be2
Compare
I feel this is also a very clear improvement, even if it just deletes a bunch of code. The old behaviour is annoyingly buggy (I used to have a colleague who lost hour long ipython sessions regularly to this...) and the need for it is just not there anymore (at least much less) to be honest. |
Thanks @seberg |
Thanks for fixing this! I just hit the bug when using fft with multiprocessing with PyTorch. Pressing ^C in the main program causes very curious things like
and
I wonder why this PR hasn't been released in 1.19.1? (The merge predates the 1.19.1 release.) |
Yeah, this one probably has been seriously annoying me for the better part of a decade... (we really should have done this at least as soon as we had bluestein's aglorith/pocketfft) We did not think of backporting it, but I agree that there is nothing wrong with backporting it into 1.19. Tagging it @charris can make the call. |
The fft calculation is the only point in our code where this function
is used. Allowing Ctrl+C, in FFT specifically used have more reasons,
since before pocketfft, some array-sizes could lead to very large
run-times.
Pocketfft fixed that issue, and now FFT is not really any slower,
faster, or memory hungry than any other NumPy operation so it feels
it does not need this handling. Rather, if we can find a better
solution, it should also be added to more functions.
The reason for removal is that it is not only unsafe while the
FFT is running (in theory). Multiple, threaded FFT run can easily
leave the signal handler in a bad state, causing crashes if Ctrl+C
(sigint) is given at any point after the call.
It would be possible to patch that over, by only resetting the
signal handler if we actually changed it (or even more complex
tricks), or possibly only using this technique when on the main
thread.
But, all of these solutions seem to complicate things, when the
main reason for why allowing sigint seems useful is gone with
pocketfft.
This is an extremely annoying bug, and we should have done this long ago (or before pocketfft, maybe just a band-aid fix).
IMO, if we really want signal handling for C-functions we can look into it, but in that case up-streaming a solution may be something to think about. Unless maybe things like C++ just solve the issue?