8000 DOC: Clarify behavior in np.random.uniform by gfyoung · Pull Request #7026 · numpy/numpy · GitHub
[go: up one dir, main page]

Skip to content

DOC: Clarify behavior in np.random.uniform #7026

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 1 commit into from
Jan 29, 2016
Merged

DOC: Clarify behavior in np.random.uniform #7026

merged 1 commit into from
Jan 29, 2016

Conversation

gfyoung
Copy link
Contributor
@gfyoung gfyoung commented Jan 16, 2016

Fixes issue in np.random.uniform in which it was generating numbers for invalid bounds such as
low = 0 and high = -1. It now raises a ValueError instead like np.random.randint when such arguments are passed in.

@gfyoung
Copy link
Contributor Author
gfyoung commented Jan 16, 2016

Note that this PR only checks for whether low > high. Technically, I believe it should be checking for whether low >= high. However, as I have been seeing with #6938, casting to floats causes all sorts of rounding issues, I think it would be best to leave that "=" part out. Thoughts?

@gfyoung gfyoung changed the title BUG: Add Bounds Checking for np.random.uniform MAINT: Add bounds checking for np.random.uniform Jan 16, 2016
@jaimefrio
Copy link
Member

I think you should add the =. If the rounding causes high to be less than or equal to low, rather than making up values for an undefined situation, it is better to let the user know that his input is unsafe.

@gfyoung
Copy link
Contributor Author
gfyoung commented Jan 16, 2016

Could I put a warning perhaps for equality? I'm afraid to throw errors because seemingly innocuous inputs can cause errors like these inputs low = np.int64(np.iinfo(np.int64).max), high = np.uint64(np.iinfo(np.int64).max + 1000)

@jaimefrio
Copy link
Member

In closer look, the case for low == high is well defined: the random distribution will have all values equal to low, so it is not such a terrible thing. Perhaps what is needed is a note in the docstring about low = high yielding a constant distribution?

@gfyoung
Copy link
Contributor Author
gfyoung commented Jan 16, 2016

Hmmm...I suppose so. I guess you could expand the doc string to say that it returns [low, high) for high > low and a constant distribution should they be equal. As the doc-string currently stands, it doesn't make much sense to be generating numbers uniformly over the distribution [0, 0) for example. How does that sound?

@argriffing
Copy link
Contributor

If a user wants to generate random numbers between a and b without regard to their relative order, they can currently do x = np.random.uniform(a, b) if I understand correctly. This change would break their code, right?

@jaimefrio
Copy link
Member

Yup, maybe all this needs is a docstring change and no error raising. The case where a > b generates numbers in (b, a]...

@gfyoung
Copy link
Contributor Author
gfyoung commented Jan 16, 2016

Hmmm...I guess it's a question of consistency in the codebase then. We check that low and high are properly passed in for randint but not for uniform?

@gfyoung
Copy link
Contributor Author
gfyoung commented Jan 16, 2016

Also, if we go with allowing such flexibility, I might also suggest changing the variable names since they would no longer be completely representative of what goes on behind the scenes. Though given that they are keyword args, that may not be plausible at the moment for backwards-compatibility reasons.

@jaimefrio
Copy link
Member

I'd say the main reason randint raises an error is because the low-level function it calls relies on a positive range being provided, not some high level user interface policy. Even if we were to decide today that the consistent user interface is a good thing, we can't simply change behavior when it risks breaking other people's code. So I am now leaning towards "raising an error requires a deprecation warning first, and some discussion on the list." I'm starting to think that raising is an unnecessarily rigid restriction, but would like to hear from others.

@gfyoung
Copy link
Contributor Author
gfyoung commented Jan 16, 2016

@jaimefrio : Ah, that is true. rk_uniform does not rely on having a positive difference for it to function. I'm not necessarily proposing a common interface, but rather, making the interface to np.random.uniform clear in terms of the arguments it can accept and any necessary restrictions imposed on the relationships between these arguments.

I'm perfectly alright with keeping np.random.uniform (i.e. no bounds checking). However, if that is the case, I certainly do think a documentation change is needed to make sure that is clear. Also, the function signature I think should then be changed to something like end_pt_one=0.0, end_pt_two=1.0, size=None.

Those would be the two things I would propose. The first one looks quite straightforward, but the second one, not as much. I'd be curious to hear what others think as well about these proposals.

@gfyoung gfyoung changed the title MAINT: Add bounds checking for np.random.uniform DOC: Clarify behavior in np.random.uniform Jan 18, 2016
@gfyoung
Copy link
Contributor Author
gfyoung commented Jan 18, 2016

@jaimefrio : I've decided to just update documentation to make sure that users are aware of this behavior, given that the C code is written to allow this flexibility.

As this is just a change in documentation, this PR should be good to merge unless someone has differing opinions about how to handle this functionality for np.random.uniform.

cases when ``low`` == ``high`` and ``low`` > ``high``. In
the former case, an ndarray containing of ints all equal to
``low`` will be returned. In the latter case, numbers will
be drawn from the uniform distribution [``high``, ``low``).
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think high is always the open side of the interval, and low always the closed one, so if low > high, then the interval is (high, low].

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, yes, that is right. I'll make the correction as soon as I can sit down in front of a proper computer screen.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jaimefrio : Fixed the documentation.

@jaimefrio
Copy link
Member

I have sent a message to the mailing list. I would like to hear what people with more hands on experience using this function in the real world have to say. If there is no substantial discussion there, and I haven't merged this by the end of the week, please ping me and I will.

@gfyoung
Copy link
Contributor Author
gfyoung commented Jan 19, 2016

👍 Sounds good. Thanks for sending the email out!

@gfyoung
Copy link
Contributor Author
gfyoung commented Jan 24, 2016

@jaimefrio : In light of the discussion, what do you think is best at this point?

8000

@rkern
Copy link
Member
rkern commented Jan 24, 2016

I would simplify it down to something like the following:

When ``high == low``, values of ``low`` will be returned. If ``high < low``, the
results are officially undefined and may eventually raise an error.

@gfyoung
Copy link
Contributor Author
gfyoung commented Jan 24, 2016

@rkern : When you say "officially undefined," do you mean that when you call something like np.random.uniform(10, 1), the output you get is not valid?

@rkern
Copy link
Member
rkern commented Jan 24, 2016

It just means that we don't guarantee what the results will be in the future or that it will continue to return a value without an Exception.

Although the arguments are specified as 'high'
and 'low', it is possible to pass in values for
'low' and 'high' where 'low' >= 'high' and still
obtain well-defined behavior. The documentation
has been expanded to reflect this fact, with a
note to discourage passing in arguments satisfying
'low' > 'high'.
@gfyoung
Copy link
Contributor Author
gfyoung commented Jan 29, 2016

@jaimefrio , @rkern : Comment has been updated. Should be good to merge since this change has no effect on any code.

@rkern
Copy link
Member
rkern commented Jan 29, 2016

LGTM.

rkern added a commit that referenced this pull request Jan 29, 2016
DOC: Clarify behavior in np.random.uniform
@rkern rkern merged commit 22176f9 into numpy:master Jan 29, 2016
@gfyoung gfyoung deleted the uniform_bounds_bug branch January 29, 2016 09:53
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants
0