BUG: Raise if histogram cannot create finite bin sizes by timhoffm · Pull Request #27148 · numpy/numpy · GitHub

BUG: Raise if histogram cannot create finite bin sizes #27148

Merged
merged 1 commit into from
Aug 9, 2024

Conversation

timhoffm
Contributor
@timhoffm timhoffm commented Aug 8, 2024

When many bins are requested in a small value region, it may not be possible to create enough distinct bin edges due to limited numeric precision. Until now, histogram then returned identical consecutive bin edges, i.e. bins of width 0. These bins could even have counts associated with them.

Instead of returning such illogical bin distributions, this PR raises a ValueError if the calculated bins do not all have a finite, nonzero width.

Closes #27142.
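
The failure mode described above can be reproduced without NumPy. The following is a minimal sketch, not NumPy's actual implementation: `uniform_edges` is a hypothetical helper that spaces edges linearly in plain Python floats and applies the kind of check this PR describes, reusing the error message quoted later in this conversation.

```python
import math


def uniform_edges(first, last, n_bins):
    """Hypothetical helper: n_bins + 1 linearly spaced edges, or ValueError."""
    span = last - first
    edges = [first + span * i / n_bins for i in range(n_bins)]
    edges.append(last)  # pin the final edge exactly
    # Every edge must be strictly larger than the previous one;
    # otherwise at least one bin would have zero width.
    if any(lo >= hi for lo, hi in zip(edges, edges[1:])):
        raise ValueError(
            "Too many bins for data range. "
            f"Cannot create {n_bins} finite-sized bins."
        )
    return edges


# A wide range works as expected.
print(uniform_edges(0.0, 1.0, 10))

# Two adjacent doubles leave no room for ten distinct bins.
tiny_last = math.nextafter(1.0, math.inf)
try:
    uniform_edges(1.0, tiny_last, 10)
except ValueError as exc:
    print(exc)
```

Between two adjacent doubles every interpolated edge has to round to one of the two endpoints, so most of the computed edges coincide and the check fires.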

@@ -270,7 +270,7 @@ def test_object_array_of_0d(self):
 histogram, [np.array(0.4) for i in range(10)] + [np.inf])

 # these should not crash
-np.histogram([np.array(0.5) for i in range(10)] + [.500000000000001])
+np.histogram([np.array(0.5) for i in range(10)] + [.500000000000002])
Contributor Author

Note: This test was failing because one of the created bins had zero width. I assume this was not the intention of the test. It was added in #10268, which is about coercing values of object arrays. I've increased the value minimally to not run into the zero-width bin case.
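
To see why the smaller value cannot work, it may help to count how many representable doubles lie between 0.5 and each upper value. This is a rough sketch using only the standard library; `ulps_between` is a hypothetical helper, and having at least as many steps as bins is only a necessary condition for distinct edges, not a guarantee about what NumPy computes.

```python
import math


def ulps_between(lo, hi):
    """Number of representable double steps from lo to hi.

    Assumes 0 < lo <= hi and that both lie in the same power-of-two
    interval, so the spacing math.ulp(lo) is constant between them.
    """
    return round((hi - lo) / math.ulp(lo))


n_bins = 10  # default bin count of np.histogram
for last in (0.500000000000001, 0.500000000000002):
    steps = ulps_between(0.5, last)
    # n_bins bins need n_bins + 1 distinct edges, i.e. at least n_bins steps.
    print(last, steps, "enough room" if steps >= n_bins else "too few values")
```

The first value is only 9 steps above 0.5, so 11 distinct edges cannot exist; the second is 18 steps above, which leaves room for them.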

@ngoldbaum
Member

Not sure if the pypy failure is real.

@timhoffm
Contributor Author
timhoffm commented Aug 8, 2024

> Not sure if the pypy failure is real.

Can't tell. But even if it's real, it seems unrelated to the PR, because it's an assertion rewrite problem in test_simd.py.

Member
@mattip mattip left a comment

Makes sense to me, and tests (including the new one) are passing. This even caught an edge case in an existing test.

@ngoldbaum ngoldbaum merged commit 251f7e1 into numpy:main Aug 9, 2024
67 checks passed
@ngoldbaum
Member

The pypy failure went away when I re-triggered it, so it must be a flake.

Thanks @timhoffm!

@timhoffm timhoffm deleted the histogram-small-range branch August 9, 2024 13:13
@djhoese
Contributor
djhoese commented Aug 22, 2024

Sorry for commenting on a merged PR, but I'm not sure this is issue worthy. I'm having trouble consistently producing the error message added here. If I copy the test case then I get the error. If I increase the base number of the arrays then it works fine:

In [31]: np.histogram_bin_edges(np.array([1.0, 1.0 + 2e-16] * 10))
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
Cell In[31], line 1
----> 1 np.histogram_bin_edges(np.array([1.0, 1.0 + 2e-16] * 10))
...
ValueError: Too many bins for data range. Cannot create 10 finite-sized bins.

In [32]: np.histogram_bin_edges(np.array([2.0, 2.0 + 2e-16] * 10))
Out[32]: array([1.5, 1.6, 1.7, 1.8, 1.9, 2. , 2.1, 2.2, 2.3, 2.4, 2.5])

Why does starting the array with 1.0 give different bins than starting with 2.0?

For context, I'm trying to reproduce one of my unit tests that has started failing with this new error, but copying and pasting the repr of the array (np.array([999.9999989758, 999.9999989758, 999.9999989758, 999.9999989758])) involved does not trigger the error interactively like it does during the test.

@timhoffm
Contributor Author

np.ptp(np.array([2.0, 2.0 + 2e-16])) is 0: at this magnitude, precision limits collapse both values to the same number. There is special handling for a histogram whose values are all identical: because there is no intrinsic scale but it is a valid use case, some "suitable" bins around the value are chosen. This expansion is not triggered if there are (even tiny) differences in the numbers, which, again because of floating-point precision, is the case for the test with 1.0.
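
The difference between the two cases can be checked with plain Python floats. The sketch below is illustrative only: `outer_edges` is a hypothetical stand-in for the range selection described above, and the ±0.5 widening is an assumption chosen to match the 1.5 to 2.5 edges shown in the output earlier in this thread.

```python
import math

# Spacing between adjacent doubles doubles at each power of two.
print(math.ulp(1.0))  # about 2.2e-16
print(math.ulp(2.0))  # about 4.4e-16

print(1.0 + 2e-16 == 1.0)  # False: rounds up to the next double after 1.0
print(2.0 + 2e-16 == 2.0)  # True: less than half a spacing, rounds back to 2.0


def outer_edges(values):
    """Hypothetical sketch of the range selection described above."""
    first, last = min(values), max(values)
    if first == last:
        # No intrinsic scale: widen to a unit-wide range around the value.
        first, last = first - 0.5, last + 0.5
    return first, last


print(outer_edges([2.0, 2.0 + 2e-16] * 10))  # (1.5, 2.5)
print(outer_edges([1.0, 1.0 + 2e-16] * 10))
```

With 2.0 the data range is exactly zero, so the widened range is used and ten bins fit easily. With 1.0 the range spans a single floating-point step, the widening is skipped, and ten distinct bins cannot be created.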


Successfully merging this pull request may close these issues.

BUG: zero-width histogram bins if the data values are in a small range close to numeric precision