BUG: histogram small range robust · numpy/numpy@4d8a833 · GitHub

Commit 4d8a833

BUG: histogram small range robust
* Fixes #23110
* The histogram `norm` variable is used to determine the bin index of input values, and in some cases `norm` is calculated by dividing `n_equal_bins` by the range of the data; when the range of the data is extraordinarily small, `norm` can become floating point infinity.
* In this patch, we delay calculating `norm` to increase resistance to the generation of infinite values: a really small input value divided by a really small range is far less likely to overflow, so we effectively just change the order of operations a bit.
* However, I haven't considered whether this is broadly superior for resisting non-finite floating point values across other `histogram` input/extreme value permutations--one might speculate that this just patches one extreme case that happened to show up in the wild, while increasing the likelihood of some other extreme case that isn't in our test suite yet.
* The main justification for this patch is that it fixes an issue that occurred in the wild and adds a test for it--if another extreme value case eventually pops up, at least this case will have a regression guard to keep guiding us in the right direction.
1 parent c19ce9c commit 4d8a833
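For illustration only (not part of the commit), here is a minimal sketch of the floating-point behavior described above, using the edge values from gh-23110; `width`, `old_norm`, `value`, and `f_index` are names invented for this snippet, while `first_edge`, `last_edge`, and `n_equal_bins` mirror the patch:

first_edge, last_edge = -1e-308, -2e-313   # the range requested in gh-23110
n_equal_bins = 2
width = last_edge - first_edge             # ~1e-308, close to the float64 subnormal range

# Old order of operations: the scaling factor itself overflows.
old_norm = n_equal_bins / width            # ~2e308, beyond float64 max (~1.8e308)
print(old_norm)                            # inf

# New order of operations: divide the (equally tiny) offset by the width
# first, so every intermediate stays finite.
value = -0.9e-308
f_index = (value - first_edge) / width * n_equal_bins
print(f_index)                             # ~0.2, so the value lands in bin 0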

File tree

2 files changed: +12 -2 lines

numpy/lib/histograms.py

Lines changed: 4 additions & 2 deletions
@@ -811,7 +811,8 @@ def histogram(a, bins=10, range=None, density=None, weights=None):
         n = np.zeros(n_equal_bins, ntype)
 
         # Pre-compute histogram scaling factor
-        norm = n_equal_bins / _unsigned_subtract(last_edge, first_edge)
+        norm_numerator = n_equal_bins
+        norm_denom = _unsigned_subtract(last_edge, first_edge)
 
         # We iterate over blocks here for two reasons: the first is that for
         # large arrays, it is actually faster (for example for a 10^8 array it
@@ -839,7 +840,8 @@ def histogram(a, bins=10, range=None, density=None, weights=None):
 
             # Compute the bin indices, and for values that lie exactly on
             # last_edge we need to subtract one
-            f_indices = _unsigned_subtract(tmp_a, first_edge) * norm
+            f_indices = ((_unsigned_subtract(tmp_a, first_edge) / norm_denom)
+                         * norm_numerator)
             indices = f_indices.astype(np.intp)
             indices[indices == n_equal_bins] -= 1
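As a rough standalone illustration of the patched index computation (assumptions: plain float64 edges, no weights, no block iteration, and ordinary subtraction in place of `_unsigned_subtract`), it boils down to something like the sketch below; `bin_indices_sketch` is a hypothetical helper, not a numpy function:

import numpy as np

def bin_indices_sketch(a, first_edge, last_edge, n_equal_bins):
    # Keep the numerator and denominator of the old `norm` separate, as the
    # patch does, and divide by the range before scaling by the bin count.
    norm_numerator = n_equal_bins
    norm_denom = last_edge - first_edge
    f_indices = ((np.asarray(a, dtype=np.float64) - first_edge) / norm_denom
                 * norm_numerator)
    indices = f_indices.astype(np.intp)
    # Values lying exactly on last_edge belong in the final bin.
    indices[indices == n_equal_bins] -= 1
    return indices

print(bin_indices_sketch([-0.9e-308], -1e-308, -2e-313, 2))  # [0]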

numpy/lib/tests/test_histograms.py

Lines changed: 8 additions & 0 deletions
@@ -409,6 +409,14 @@ def test_big_arrays(self):
         assert_equal(type(hist), type((1, 2)))
 
 
+    def test_gh_23110(self):
+        hist, e = np.histogram(np.array([-0.9e-308], dtype='>f8'),
+                               bins=2,
+                               range=(-1e-308, -2e-313))
+        expected_hist = np.array([1, 0])
+        assert_array_equal(hist, expected_hist)
+
+
 class TestHistogramOptimBinNums:
     """
     Provide test coverage when using provided estimators for optimal number of
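To exercise just the new regression test in a source checkout with this commit applied, one would typically run something along the lines of:

python -m pytest numpy/lib/tests/test_histograms.py -k gh_23110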

0 commit comments