BUG: fixes np.random.zipf() hanging on pathological input #10912
Conversation
Force-pushed from 5051603 to 8a97ed3
* Fixes Issue numpy#9829 by allowing a keyboard interruption to break out of zipf() computations that run for very long times because of pathological inputs
* Added a unit test that verifies handling of SIGINT by zipf()
Force-pushed from 8a97ed3 to 3e98c39
I adjusted it so that it works (at least in my hands) on Windows & Linux with a keyboard interrupt; we'll see if the CI agrees. Even if it does, I still have some residual concerns about the potential fragility.
```c
 * approach is used here;
 * Related to Issue #9829
 */
signal(SIGINT, kb_interrupt_handler);
```
The handler is installed on every call to rk_zipf? When do you remove it?
It's in the wrong place anyway. The problem is that the while loop never exits because the acceptance-rejection based implementation achieves near 100% rejection for parameters near one on account of an unfortunate scaling.
So if I understand right, you install a C-level signal handler that aborts the process? That seems like an unfortunate solution :-(. There is an existing approach to this problem elsewhere in the codebase. Given the simplicity of this function, it seems like checking for signals occasionally would be a better approach, e.g. calling PyErr_CheckSignals() every so often inside the loop. The trickier issue is that right now there's a strict separation between the Python wrapper and the C distribution code, but to make this work right we would need to add a C API call. Paging @rkern...
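To make the "check occasionally" idea concrete, here is a minimal Python sketch of the pattern (the names `trial`, `accept`, and `interrupted` are hypothetical stand-ins; in the real C loop the role of `interrupted()` would be played by `PyErr_CheckSignals()`):

```python
def rejection_sample(trial, accept, interrupted, check_every=1000):
    """Rejection sampler that polls for a pending interrupt every
    check_every rejected trials instead of installing a signal handler.

    trial, accept, and interrupted are hypothetical stand-ins for the
    C-level pieces of rk_zipf.
    """
    rejected = 0
    while True:
        x = trial()
        if accept(x):
            return x
        rejected += 1
        if rejected % check_every == 0 and interrupted():
            # In C this would be PyErr_CheckSignals() returning -1,
            # with the error indicator already set.
            raise KeyboardInterrupt
```

Polling only every few hundred trials keeps the per-sample overhead negligible while still bounding how long a pathological call can ignore Ctrl+C.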
We could also fix the function if we stop worrying about maintaining the current stream.
Note that the problem is also predictable, so we could also simply raise an error and bail for certain values of the distribution parameter.
Yeah, I did try to put that in. @njsmith, if we assume I've botched the GIL handling, the related C API docs seem to suggest something like this:

```c
PyGILState_STATE gstate;
gstate = PyGILState_Ensure();
PyErr_CheckSignals();
// Release the thread.
PyGILState_Release(gstate);
```

Does that look about right, or is there a somewhat different convention in NumPy?
That GIL code looks plausible to me at a first glance, anyway. I guess you also need to check the return value from PyErr_CheckSignals().
One possibility that occurred to me was to simply count the number of sequential rejections and bail at some point. Just to clarify why there is a problem: the distribution becomes singular as a -> 1 (harmonic series), and almost all trial samples will be floating-point infinities and get rejected as too big for a long. The cure is to cut off the distribution at the size of long before generating the trial values, i.e., scale the mapping from (0, 1] onto the majorizing distribution. That gives speedups of many thousands for values that aren't even that near to 1.
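To put rough numbers on that, here is a back-of-envelope Python sketch (my own illustration, not NumPy's code) assuming the trial samples come from the majorizing distribution as X = floor(U ** (-1/(a-1))), worked in log space to dodge float overflow; `bounded_trial` sketches the cutoff idea described above:

```python
import math
import random

LONG_MAX = 2**63 - 1
LOG_LMAX = math.log(LONG_MAX)

def overflow_reject_rate(a, n=10_000, seed=0):
    """Estimate the fraction of trial samples X = floor(U**(-1/(a-1)))
    that would exceed LONG_MAX (and so be rejected), via log X."""
    rng = random.Random(seed)
    am1 = a - 1.0
    rejects = 0
    for _ in range(n):
        u = 1.0 - rng.random()                 # U in (0, 1]
        if -math.log(u) / am1 > LOG_LMAX:      # log X > log LONG_MAX
            rejects += 1
    return rejects / n

def bounded_trial(a, rng):
    """Sketch of the cutoff fix: map the uniform draw onto (u_min, 1]
    so the trial value can never (meaningfully) exceed LONG_MAX."""
    am1 = a - 1.0
    u_min = math.exp(-am1 * LOG_LMAX)          # LONG_MAX ** -(a - 1)
    u = u_min + (1.0 - u_min) * rng.random()
    return math.floor(u ** (-1.0 / am1))
```

With a = 1.0001 nearly every trial overflows and is rejected, while for a well away from 1 overflow essentially never happens, which is consistent with the hang being confined to parameters very close to 1.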
I believe @rkern is working on a NEP for breaking distribution compatibility. If we don't want to wait for that, though, then erroring out on problematic values seems reasonable. If we do go the error route, then I think it's better to trigger that based on parameters exceeding some numeric threshold, rather than checking the rejection count. Using the rejection count means that the same call with the same parameters might sometimes work and sometimes error out depending on the seed. This seems particularly miserable if you have a script using parameters that are mostly safe, but then fail for like 1/100 seeds...
How about making …
It should not be difficult to compute the cutoff for the parameter, we just need to decide what ratio of acceptance to failure we are willing to live with. In general, I would regard an algorithm of this type as pretty poor if only half the trials were accepted ...
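For a concrete (hedged) version of that computation: if trial samples behave like X = floor(U ** (-1/(a-1))) as sketched earlier in the thread, then P(X > LONG_MAX) ≈ LONG_MAX ** -(a-1), and the parameter cutoff for a tolerated overflow-rejection ratio falls out directly. This is my own back-of-envelope, not an official threshold:

```python
import math

LONG_MAX = 2**63 - 1

def min_safe_a(max_overflow_ratio):
    """Smallest Zipf parameter a for which the estimated probability of
    a trial sample overflowing LONG_MAX stays below max_overflow_ratio,
    assuming P(overflow) ~= LONG_MAX ** -(a - 1)."""
    return 1.0 + math.log(1.0 / max_overflow_ratio) / math.log(LONG_MAX)

print(min_safe_a(0.5))   # tolerate at most half the trials overflowing, ~1.0159
```

Even tolerating 50% overflow rejections only excludes a below roughly 1.016, so an error threshold would cut out a very thin sliver of the parameter space.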
My approach wasn't flying here, and a lot of the older random stuff will probably end up preserved for back-compat anyway, based on the NEP. If this becomes a priority it can be reopened or restarted, etc.
A cleaner fix might be possible if I were to move the C algorithm into Cython; leaving the while loop in C seems to require a bit of low-level intervention to enable KeyboardInterrupt handling, and using the Python C API as suggested in the referenced issue results in a segfault after catching Ctrl+C.

I'm also not a big fan of the unit test structure with a Python tempfile script (!) (not even sure that will work on CI). Perhaps there are better suggestions; probing SIGINT handling on pathological input seems like a tricky thing to test.

* Fixes Issue #9829 (BUG: np.random.zipf hangs the interpreter on pathological input) by allowing a keyboard interruption to break out of zipf() computations that run for very long times because of pathological inputs
* Added a unit test that verifies handling of SIGINT by zipf()