8000 DOC, MAINT: Clarify error message of random.randint. by maxwell-aladago · Pull Request #14333 · numpy/numpy · GitHub
[go: up one dir, main page]

Skip to content

DOC, MAINT: Clarify error message of random.randint. #14333

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
wants to merge 6 commits into from

Conversation

maxwell-aladago
Copy link
Contributor
@maxwell-aladago maxwell-aladago commented Aug 22, 2019

Improve the error raised by random.randint when low <= 0 and high is not given.

The previous behaviour to throw ValueError: ValueError: low >= high which is correct but confusing unless you read the source code. I don't think adding that extra check early on will degrade performance.

@charris charris changed the title EHN: clarified error message of random.randint when low <= 0 when hig… DOC, MAINT: Clarify error message of random.randint. Aug 23, 2019
@mattip
Copy link
Member
mattip commented Aug 24, 2019

LGTM. Any reason not to merge?

@eric-wieser eric-wieser self-requested a review August 24, 2019 17:58
10000
@@ -434,6 +434,8 @@ cdef class Generator:

"""
if high is None:
if isinstance(low, (int, np.integer)) and low <= 0:
Copy link
Member
@eric-wieser eric-wieser Aug 24, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not really a fan of this type of detection - it doesn't work on things that are coercible to integers, and is generally fragile at actually preventing the original error from occuring

Could you instead use something like the following?

high_and_low_swapped = high is None:
if high_and_low_swapped:
    high = low
    low = 0

try:
    # the chain of ifs below
except ValueError as e:
    if high_and_low_swapped and 'low >= high' == str(e):
        raise ValueError(...) from None
    raise

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @eric-wieser . Just pushed the changes.

ret = _rand_bool(low, high, size, _masked, endpoint, &self._bitgen, self.lock)
except ValueError as e:
if high_and_low_swapped and 'low >= high' == str(e):
e.args = ("low must be greater than 0 when high is not given.", )
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure if this works correctly in displayed stack traces. What's the trade-off between doing this and raising a brand new exception from None?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, the stack trace is good. As an example, the code

import numpy as np
rng = np.random.default_rng()
rng.integers(0)

produces the stacktrace

Traceback (most recent call last):
  File "/Users/maxwellaladago/Documents/pub/numpy-max/perms.py", line 4, in <module>
    rng.integers(0)
  File "generator.pyx", line 458, in numpy.random.generator.Generator.integers
  File "bounded_integers.pyx", line 1228, in numpy.random.bounded_integers._rand_int64
ValueError: low must be greater than 0 when hig
10000
h is not given.

The problem with the raise from None is that it contains the phrase raise from None which can be confusing. Additionally, it doesn't capture the trace of the original exception. For example,

import numpy as np
try:
    rng = np.random.default_rng()
    rng.integers(0)
except ValueError as e:
    raise ValueError("just a demo") from None

will produce the stacktrace

Traceback (most recent call last):
  File "/Users/maxwellaladago/Documents/pub/numpy-max/perms.py", line 6, in <module>
    raise ValueError("just a demo") from None
ValueError: just a demo

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How about just preempting the whole thing and raising the error early on? That seems simpler to parse to me?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry :(, I completely missed that high can be a very large array, making the check a bit expensive maybe.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Rolled back to the previous implementations @seberg. If there was a quicker way to check for violations in a potentially large 'high', preemptive checks would have been the ideal thing to do.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I suppose we could time it, maybe it doesn't make a difference in either case since random number generation is much more expensive. I agree with Eric's point that editing the message seems a bit strange.
Your above example with the from None is outside the actual function, if you lose one level of the original backtrace (which is fully inside numpy), that does not actually seem to matter. Could you just check how it looks if you change the actual code? My expectation is that it is OK.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You are correct. Raising the error from none does work well. My guess is that the number of calls with high=None and low having one or more values less than 1 will be small relative to the number of calls overall.

@bashtage
Copy link
Contributor
bashtage commented Sep 13, 2019

It seems strange to have one error check in this file (which is then duplicated).

Why not just expand the current error message to be "low >= high. If high was not provided, then low must be greater than 0.". This is really simple and avoids extra complexity. The end-user will get the correct message.

@maxwell-aladago
Copy link
Contributor Author
maxwell-aladago commented Sep 13, 2019

It seems strange to have one error check in this file (which is then duplicated).

Why not just expand the current error message to be "low >= high. If high was not provided, then low must be greater than 0.". This is really simple and avoids extra complexity. The end-user will get the correct message.

That involves changing the error message in all the functions inside the try catch block which may affect other parts of the codebase.

@bashtage
Copy link
Contributor
bashtage commented Sep 13, 2019

@maxwell-aladago There is only 1 function. They are templated.

  • Probably 2, but I wrote them, and changing an exception message doesn't affect the code base.

@bashtage
Copy link
Contributor

@bashtage
Copy link
Contributor

Two more places,

https://github.com/numpy/numpy/blob/master/numpy/random/bounded_integers.pyx.in#L163
https://github.com/numpy/numpy/blob/master/numpy/random/bounded_integers.pyx.in#L289

Could easily refactor to a shared error message (before any of the functions, about here: https://github.com/numpy/numpy/blob/master/numpy/random/bounded_integers.pyx.in#L22,

low_high_error = 'low {comp} high. If high was not provided, then low must be greater than 0.'

and then the four ValueErrors become

raise ValueError('low_high_error'.format(comp=comp)).

Increasing complexity for an error message does not seem worth the cost to me.

@mattip
Copy link
Member
mattip commented Dec 3, 2019

@maxwell-aladago thoughts? I think the general direction is to close this as "too complicated". Maybe there is a better way?

@bashtage
Copy link
Contributor

@mattip Given randint is legacy only it might make sense to leave it. If there are changes needed, they should be clarified to apply to Generator.intergers

@mattip mattip added the 57 - Close? Issues which may be closable unless discussion continued label Oct 12, 2020
Base automatically changed from master to main March 4, 2021 02:04
bashtage added a commit to bashtage/numpy that referenced this pull request Mar 17, 2021
Improve the exception when low is 0 in case the single input
form was used.

closes numpy#14333
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
03 - Maintenance 04 - Documentation 57 - Close? Issues which may be closable unless discussion continued component: numpy.random
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants
0