BUG: Raise if histogram cannot create finite bin sizes #27148
When many bins are requested in a small value region, it may not be possible to create enough distinct bin edges due to limited numeric precision. Until now, `histogram` returned identical consecutive bin edges in that case, which means a bin width of 0. These bins could also have counts associated with them. Instead of returning such illogical bin distributions, this PR raises a `ValueError` if the calculated bins do not all have a finite size. Closes numpy#27142.
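The precision problem described above can be reproduced without NumPy. The following is a minimal sketch, not NumPy's implementation: `evenly_spaced_edges` and `check_finite_bin_widths` are hypothetical helper names, and the error text simply mirrors the message quoted later in this thread.

```python
def evenly_spaced_edges(first, last, n_bins):
    """Hypothetical helper: n_bins + 1 evenly spaced edges from first to last."""
    step = (last - first) / n_bins
    edges = [first + i * step for i in range(n_bins)]
    edges.append(last)  # pin the final edge to the exact upper bound
    return edges


def check_finite_bin_widths(edges):
    """Hypothetical helper: raise ValueError if any bin has zero (or negative) width."""
    widths = [hi - lo for lo, hi in zip(edges, edges[1:])]
    if any(w <= 0 for w in widths):
        raise ValueError(
            f"Too many bins for data range. "
            f"Cannot create {len(widths)} finite-sized bins."
        )
    return widths


# Only two representable doubles exist in [1.0, 1.0 + 2e-16],
# so 11 requested edges must contain repeats.
edges = evenly_spaced_edges(1.0, 1.0 + 2e-16, 10)
print(sorted(set(edges)))

try:
    check_finite_bin_widths(edges)
except ValueError as exc:
    print("rejected:", exc)
```

Checking the widths after the edges are computed, rather than guessing from the requested range, catches every case in which rounding collapses two edges into one.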
@@ -270,7 +270,7 @@ def test_object_array_of_0d(self):
                       histogram, [np.array(0.4) for i in range(10)] + [np.inf])
 
         # these should not crash
-        np.histogram([np.array(0.5) for i in range(10)] + [.500000000000001])
+        np.histogram([np.array(0.5) for i in range(10)] + [.500000000000002])
Note: This test was failing because one of the created bins had zero width. I assume this was not the intention of the test. It was added in #10268, which is about coercing values of object arrays. I've increased the value minimally to not run into the zero-width bin case.
Not sure if the pypy failure is real.
Can't tell. But even if it's real, it seems unrelated to the PR, because it's an assertion rewrite problem in
Makes sense to me, and tests (including the new one) are passing. This even caught an edge case in an existing test.
The pypy failure went away when I re-triggered it, so it must be a flake. Thanks @timhoffm!
Sorry for commenting on a merged PR, but I'm not sure this is issue-worthy. I'm having trouble consistently producing the error message added here. If I copy the test case, I get the error. If I increase the base number of the arrays, it works fine:

    In [31]: np.histogram_bin_edges(np.array([1.0, 1.0 + 2e-16] * 10))
    ---------------------------------------------------------------------------
    ValueError                                Traceback (most recent call last)
    Cell In[31], line 1
    ----> 1 np.histogram_bin_edges(np.array([1.0, 1.0 + 2e-16] * 10))
    ...
    ValueError: Too many bins for data range. Cannot create 10 finite-sized bins.

    In [32]: np.histogram_bin_edges(np.array([2.0, 2.0 + 2e-16] * 10))
    Out[32]: array([1.5, 1.6, 1.7, 1.8, 1.9, 2. , 2.1, 2.2, 2.3, 2.4, 2.5])

Why does starting the array with 1.0 give different bins than starting with 2.0? For context, I'm trying to reproduce one of my unit tests that has started failing with this new error, but copying and pasting the
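
One plausible explanation for the difference between the two calls is the spacing of double-precision floats, which doubles at 2.0. The sketch below uses only the standard library (`math.ulp` requires Python 3.9 or later); `survives_addition` is a hypothetical helper introduced for illustration.

```python
import math


def survives_addition(base, delta=2e-16):
    """Return True if base + delta rounds to a different float than base."""
    return base + delta != base


for base in (1.0, 2.0):
    # math.ulp(base) is the gap between base and the next larger double.
    print(
        f"base={base}: ulp={math.ulp(base):.3e}, "
        f"base + 2e-16 differs from base: {survives_addition(base)}"
    )
```

At 1.0 the gap to the next double is about 2.2e-16, so adding 2e-16 rounds up to a distinct neighbor and the data span a tiny nonzero range. At 2.0 the gap is about 4.4e-16, so `2.0 + 2e-16` rounds back to `2.0` and every element of the array is identical. The `Out[32]` edges running from 1.5 to 2.5 are consistent with NumPy widening a zero-width data range by 0.5 on each side, which leaves ample room for 10 bins.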