8000 BUG: Remove non-threadsafe sigint handling from fft calculation by seberg · Pull Request #16532 · numpy/numpy · GitHub
[go: up one dir, main page]

Skip to content

BUG: Remove non-threadsafe sigint handling from fft calculation #16532

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 1 commit into from
Jul 17, 2020

Conversation

seberg
Copy link
Member
@seberg seberg commented Jun 8, 2020

The fft calculation is the only point in our code where this function
is used. Allowing Ctrl+C, in FFT specifically used have more reasons,
since before pocketfft, some array-sizes could lead to very large
run-times.
Pocketfft fixed that issue, and now FFT is not really any slower,
faster, or memory hungry than any other NumPy operation so it feels
it does not need this handling. Rather, if we can find a better
solution, it should also be added to more functions.

The reason for removal is that it is not only unsafe while the
FFT is running (in theory). Multiple, threaded FFT run can easily
leave the signal handler in a bad state, causing crashes if Ctrl+C
(sigint) is given at any point after the call.

It would be possible to patch that over, by only resetting the
signal handler if we actually changed it (or even more complex
tricks), or possibly only using this technique when on the main
thread.
But, all of these solutions seem to complicate things, when the
main reason for why allowing sigint seems useful is gone with
pocketfft.


This is an extremely annoying bug, and we should have done this long ago (or before pocketfft, maybe just a band-aid fix).

IMO, if we really want signal handling for C-functions we can look into it, but in that case up-streaming a solution may be something to think about. Unless maybe things like C++ just solve the issue?

@seberg
Copy link
Member Author
seberg commented Jun 8, 2020

Forgot to xref gh-11798 and gh-7545, Noticed related we had once a request for sigint handling in linalg (gh-11081), which I assume is basiclaly related (and I do not want to apply this type of fix there, but I think now with pocketfft, linalg would probably be a much better target for such handling if we can figure it out).

@seberg seberg force-pushed the delete-sigint-handling branch from 7ca5d67 to 3d2f80f Compare June 9, 2020 03:56
@seberg seberg force-pushed the delete-sigint-handling branch from 3d2f80f to 6efda93 Compare June 18, 2020 19:56
@seberg
Copy link
Member Author
seberg commented Jun 18, 2020

Noticed this, I had not deleted all occurrences, now they are gone. I could keep the docstring around, I guess. But unless we figure out how to use these safely, I am not sure there is much point in documentation describing how something works and breaks which should not be used.

@eric-wieser
Copy link
Member

This seems reasonable to me. It should be fairly straighforward to handle sigint in an outer ufunc loop somewhere, by calling the python C api to check the interrupted flag. It's not as nice as the immediate interruption that this approach gives, but I suppose its probably safer.

Comment on lines -3798 to -3825
static PyObject *
test_interrupt(PyObject *NPY_UNUSED(self), PyObject *args)
{
int kind = 0;
int a = 0;

if (!PyArg_ParseTuple(args, "|i:test_interrupt", &kind)) {
return NULL;
}
if (kind) {
Py_BEGIN_ALLOW_THREADS;
while (a >= 0) {
if ((a % 1000 == 0) && PyOS_InterruptOccurred()) {
break;
}
a += 1;
}
Py_END_ALLOW_THREADS;
}
else {
NPY_SIGINT_ON
while(a >= 0) {
a += 1;
}
NPY_SIGINT_OFF
}
return PyInt_FromLong(a);
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It might be worth extracting the if / else branches into the comment in npy_interrupt, showing how to replace uses of the SIGINT api.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, I thought that might require the GIL to be held, which seemed 8000 a bit useless for us. But, it seems like that has been fixed reading through all of: https://bugs.python.org/issue40826

Copy link
Member
@pv pv Jun 19, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Didn't they end up deciding that it does require GIL?
python/cpython#20618
It looks like the above will crash the process with fatal error in the future...

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh, damn, so my first read was the right one, I guess I read the fact that it was backported, as that they made it not require the GIL (since I did not see why to backport otherwise) :(.

I think I will open a BPO, it would be nice if it was possible to at least be able to query this without the GIL if you pass in the threadstate or so...

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I opened https://bugs.python.org/issue41037 seems there already is a private function for this. Not sure if we should/could petition more to make this function public, but I suppose it may be the best bet if we want interrupt capabilities at some point.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Stumbled on it, so updated the comment now to mention PyErr_CheckSignals() and also link to the bpo.

@seberg seberg force-pushed the delete-sigint-handling branch 2 times, most recently from b956a6b to 4ff2d3b Compare June 19, 2020 13:47
The fft calculation is the only point in our code where this function
is used.  Allowing Ctrl+C, in FFT specifically used have more reasons,
since before pocketfft, some array-sizes could lead to very large
run-times.
Pocketfft fixed that issue, and now FFT is not really any slower,
faster, or memory hungry than any other NumPy operation so it feels
it does not need this handling.  Rather, if we can find a better
solution, it should also be added to more functions.

The reason for removal is that it is not only unsafe while the
FFT is running (in theory). Multiple, threaded FFT run can easily
leave the signal handler in a bad state, causing crashes if Ctrl+C
(sigint) is given at any point after the call.

It would be possible to patch that over, by only resetting the
signal handler if we actually changed it (or even more complex
tricks), or possibly only using this technique when on the main
thread.
But, all of these solutions seem to complicate things, when the
main reason for why allowing sigint seems useful is gone with
pocketfft.
@seberg seberg force-pushed the delete-sigint-handling branch from 4ff2d3b to ec62be2 Compare June 26, 2020 01:45
@seberg
Copy link
Member Author
8000 seberg commented Jul 16, 2020

I feel this is also a very clear improvement, even if it just deletes a bunch of code. The old behaviour is annoyingly buggy (I used to have a colleague who lost hour long ipython sessions regularly to this...) and the need for it is just not there anymore (at least much less) to be honest.

@mattip mattip merged commit d3eae8b into numpy:master Jul 17, 2020
@mattip
Copy link
Member
mattip commented Jul 17, 2020

Thanks @seberg

@seberg seberg deleted the delete-sigint-handling branch July 17, 2020 13:14
@jonashaag
Copy link
jonashaag commented Aug 18, 2020

Thanks for fixing this! I just hit the bug when using fft with multiprocessing with PyTorch. Pressing ^C in the main program causes very curious things like

ERROR: Unexpected segmentation fault encountered in worker.

and

*** longjmp causes uninitialized stack frame ***

I wonder why this PR hasn't been released in 1.19.1? (The merge predates the 1.19.1 release.)

@seberg
Copy link
8000
Member Author
seberg commented Aug 18, 2020

Yeah, this one probably has been seriously annoying me for the better part of a decade... (we really should have done this at least as soon as we had bluestein's aglorith/pocketfft)

We did not think of backporting it, but I agree that there is nothing wrong with backporting it into 1.19. Tagging it @charris can make the call.

@seberg seberg added the 09 - Backport-Candidate PRs tagged should be backported label Aug 18, 2020
@charris charris removed the 09 - Backport-Candidate PRs tagged should be backported label Aug 21, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants
0