-
-
Notifications
You must be signed in to change notification settings - Fork 11k
DOC, MAINT: Clarify error message of random.randint. #14333
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
LGTM. Any reason not to merge? |
numpy/random/generator.pyx
Outdated
@@ -434,6 +434,8 @@ cdef class Generator: | |||
|
|||
""" | |||
if high is None: | |||
if isinstance(low, (int, np.integer)) and low <= 0: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not really a fan of this type of detection - it doesn't work on things that are coercible to integers, and is generally fragile at actually preventing the original error from occuring
Could you instead use something like the following?
high_and_low_swapped = high is None:
if high_and_low_swapped:
high = low
low = 0
try:
# the chain of ifs below
except ValueError as e:
if high_and_low_swapped and 'low >= high' == str(e):
raise ValueError(...) from None
raise
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @eric-wieser . Just pushed the changes.
numpy/random/generator.pyx
Outdated
ret = _rand_bool(low, high, size, _masked, endpoint, &self._bitgen, self.lock) | ||
except ValueError as e: | ||
if high_and_low_swapped and 'low >= high' == str(e): | ||
e.args = ("low must be greater than 0 when high is not given.", ) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure if this works correctly in displayed stack traces. What's the trade-off between doing this and raising a brand new exception from None
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, the stack trace is good. As an example, the code
import numpy as np
rng = np.random.default_rng()
rng.integers(0)
produces the stacktrace
Traceback (most recent call last):
File "/Users/maxwellaladago/Documents/pub/numpy-max/perms.py", line 4, in <module>
rng.integers(0)
File "generator.pyx", line 458, in numpy.random.generator.Generator.integers
File "bounded_integers.pyx", line 1228, in numpy.random.bounded_integers._rand_int64
ValueError: low must be greater than 0 when hig
10000
h is not given.
The problem with the raise from None
is that it contains the phrase raise from None
which can be confusing. Additionally, it doesn't capture the trace of the original exception. For example,
import numpy as np
try:
rng = np.random.default_rng()
rng.integers(0)
except ValueError as e:
raise ValueError("just a demo") from None
will produce the stacktrace
Traceback (most recent call last):
File "/Users/maxwellaladago/Documents/pub/numpy-max/perms.py", line 6, in <module>
raise ValueError("just a demo") from None
ValueError: just a demo
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How about just preempting the whole thing and raising the error early on? That seems simpler to parse to me?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry :(, I completely missed that high can be a very large array, making the check a bit expensive maybe.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Rolled back to the previous implementations @seberg. If there was a quicker way to check for violations in a potentially large 'high', preemptive checks would have been the ideal thing to do.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I suppose we could time it, maybe it doesn't make a difference in either case since random number generation is much more expensive. I agree with Eric's point that editing the message seems a bit strange.
Your above example with the from None
is outside the actual function, if you lose one level of the original backtrace (which is fully inside numpy), that does not actually seem to matter. Could you just check how it looks if you change the actual code? My expectation is that it is OK.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You are correct. Raising the error from none
does work well. My guess is that the number of calls with high=None
and low
having one or more values less than 1 will be small relative to the number of calls overall.
It seems strange to have one error check in this file (which is then duplicated). Why not just expand the current error message to be |
That involves changing the error message in all the functions inside the try catch block which may affect other parts of the codebase. |
@maxwell-aladago There is only 1 function. They are templated.
|
The lines are here: https://github.com/numpy/numpy/blob/master/numpy/random/bounded_integers.pyx.in#L71 and https://github.com/numpy/numpy/blob/master/numpy/random/bounded_integers.pyx.in#L145 just append the second part and all is fixed. |
Two more places, https://github.com/numpy/numpy/blob/master/numpy/random/bounded_integers.pyx.in#L163 Could easily refactor to a shared error message (before any of the functions, about here: https://github.com/numpy/numpy/blob/master/numpy/random/bounded_integers.pyx.in#L22,
and then the four
Increasing complexity for an error message does not seem worth the cost to me. |
@maxwell-aladago thoughts? I think the general direction is to close this as "too complicated". Maybe there is a better way? |
@mattip Given randint is legacy only it might make sense to leave it. If there are changes needed, they should be clarified to apply to |
Improve the exception when low is 0 in case the single input form was used. closes numpy#14333
Improve the error raised by
random.randint
whenlow <= 0
andhigh
is not given.The previous behaviour to throw
ValueError: ValueError: low >= high
which is correct but confusing unless you read the source code. I don't think adding that extra check early on will degrade performance.