BUG: Raise if histogram cannot create finite bin sizes #27148

timhoffm · 2024-08-08T21:27:20Z

When many bins are requested in a small value region, it may not be possible to create enough distinct bin edges due to limited numeric precision. Up to now, histogram then returned identical subsequent bin edges, which would mean a bin width of 0. These bins could also have counts associated with them.

Instead of returning such illogical bin distributions, this PR raises a `ValueError` if the calculated bins do not all have a finite size.

Closes #27142.
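A minimal sketch of the precision problem described above, assuming equal-width edges are generated with `np.linspace`. The helper `zero_width_bins` is hypothetical and only illustrates the failure mode; it is not the code added by this PR.

```python
import numpy as np


def zero_width_bins(first_edge, last_edge, n_bins):
    """Hypothetical helper: count bins whose left and right edges coincide."""
    edges = np.linspace(first_edge, last_edge, n_bins + 1)
    # A bin has zero width when consecutive edges are not strictly increasing.
    n_bad = int(np.count_nonzero(np.diff(edges) <= 0))
    return n_bad, edges


# Only two float64 values exist between 1.0 and 1.0 + 2e-16 (inclusive),
# so 11 edges cannot all be distinct.
n_bad, edges = zero_width_bins(1.0, 1.0 + 2e-16, 10)
print(n_bad)
print(np.unique(edges))

# A wide range has no such problem.
n_ok, _ = zero_width_bins(0.0, 1.0, 10)
print(n_ok)
```

Checking that consecutive edges are strictly increasing is one way to detect the condition that now raises.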

timhoffm · 2024-08-08T21:35:00Z

numpy/lib/tests/test_histograms.py

@@ -270,7 +270,7 @@ def test_object_array_of_0d(self):
            histogram, [np.array(0.4) for i in range(10)] + [np.inf])

        # these should not crash
-        np.histogram([np.array(0.5) for i in range(10)] + [.500000000000001])
+        np.histogram([np.array(0.5) for i in range(10)] + [.500000000000002])


Note: This test was failing because one of the created bins had zero width. I assume this was not the intention of the test. It was added in #10268, which is about coercing values of object arrays. I've increased the value minimally to not run into the zero-width bin case.
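The effect of the changed literal can be checked by counting how many float64 steps (ULPs) separate it from 0.5. This is a sketch that assumes the default of 10 bins (11 edges) and uses `np.linspace` as a stand-in for the edge computation; the `results` dictionary is introduced here only for illustration.

```python
import numpy as np

lo = 0.5
results = {}
for hi in (0.500000000000001, 0.500000000000002):
    # Number of representable float64 steps between lo and hi.
    ulps = int(round((hi - lo) / np.spacing(lo)))
    # 10 bins need 11 strictly increasing edges.
    edges = np.linspace(lo, hi, 11)
    strictly_increasing = bool(np.all(np.diff(edges) > 0))
    results[hi] = (ulps, strictly_increasing)
    print(hi, ulps, strictly_increasing)
```

With the old value there are only 9 steps, hence 10 representable values for 11 edges, so at least two edges must coincide. The new value is 18 steps away, which leaves enough room for distinct edges.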

ngoldbaum · 2024-08-08T21:56:16Z

Not sure if the pypy failure is real.

timhoffm · 2024-08-08T22:38:44Z

> Not sure if the pypy failure is real.

Can't tell. But even if it's real, it seems unrelated to the PR, because it's an assertion rewrite problem in test_simd.py.

mattip

Makes sense to me, and tests (including the new one) are passing. This even caught an edge case in an existing test.

ngoldbaum · 2024-08-09T13:01:54Z

The pypy failure went away when I re-triggered it, so it must be a flake.

Thanks @timhoffm!

djhoese · 2024-08-22T19:42:26Z

Sorry for commenting on a merged PR, but I'm not sure this is issue worthy. I'm having trouble consistently producing the error message added here. If I copy the test case then I get the error. If I increase the base number of the arrays then it works fine:

In [31]: np.histogram_bin_edges(np.array([1.0, 1.0 + 2e-16] * 10))
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
Cell In[31], line 1
----> 1 np.histogram_bin_edges(np.array([1.0, 1.0 + 2e-16] * 10))
...
ValueError: Too many bins for data range. Cannot create 10 finite-sized bins.

In [32]: np.histogram_bin_edges(np.array([2.0, 2.0 + 2e-16] * 10))
Out[32]: array([1.5, 1.6, 1.7, 1.8, 1.9, 2. , 2.1, 2.2, 2.3, 2.4, 2.5])

Why does starting the array with 1.0 give different bins than starting with 2.0?

For context, I'm trying to reproduce one of my unit tests that has started failing with this new error, but copying and pasting the repr of the array (np.array([999.9999989758, 999.9999989758, 999.9999989758, 999.9999989758])) involved does not trigger the error interactively like it does during the test.

timhoffm · 2024-08-22T20:27:23Z

`np.ptp(np.array([2.0, 2.0 + 2e-16]))` is 0: limited floating-point precision rounds both values to the same number. There is special handling for a histogram in which all values are identical. Because such data has no intrinsic scale but is still a valid use case, some "suitable" bins around the value are chosen. This expansion is not triggered if there are (even tiny) differences between the numbers, which, again because of floating-point precision, is the case for the test.
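The difference between the two arrays can be made visible with `np.spacing` and `np.ptp`. In the sketch below, `sketch_outer_edges` is a hypothetical helper, not NumPy's code; the expansion by 0.5 on each side is an assumption inferred from the `[1.5, ..., 2.5]` output shown earlier in the thread.

```python
import numpy as np

a = np.array([1.0, 1.0 + 2e-16])
b = np.array([2.0, 2.0 + 2e-16])

# The float64 spacing doubles at 2.0, so 2e-16 is less than half a step
# there and is rounded away, while near 1.0 it rounds up to one full step.
print(np.spacing(1.0), np.ptp(a))
print(np.spacing(2.0), np.ptp(b))


def sketch_outer_edges(values):
    """Hypothetical helper: pick the outer bin edges for `values`."""
    first, last = float(values.min()), float(values.max())
    if first == last:
        # Assumed behaviour: identical data gets a unit-wide range around it.
        first, last = first - 0.5, last + 0.5
    return first, last


print(sketch_outer_edges(a))  # tiny but non-zero range, no expansion
print(sketch_outer_edges(b))  # degenerate range, expanded
```

Under these assumptions, only `b` takes the expansion path; `a` keeps a range of a single float64 step, which is too small for 10 finite-sized bins.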

github-actions bot added the 00 - Bug label Aug 8, 2024

timhoffm commented Aug 8, 2024

mattip approved these changes Aug 9, 2024

ngoldbaum merged commit 251f7e1 into numpy:main Aug 9, 2024
67 checks passed

timhoffm deleted the histogram-small-range branch August 9, 2024 13:13
