BUG: histogram small range robust by tylerjereddy · Pull Request #24161 · numpy/numpy · GitHub

BUG: histogram small range robust #24161


Merged (1 commit) on Jul 11, 2023


tylerjereddy
Contributor
  • Fixes BUG: IndexError when using np.histogram with small values #23110

  • the histogram `norm` variable is used to determine the bin index of input values, and `norm` is calculated in some cases by dividing `n_equal_bins` by the range of the data; when the range of the data is extraordinarily small, `norm` can become floating point infinity

  • in this patch, we delay calculating `norm` to increase resistance to the generation of infinite values: a really small input value divided by a really small range is less likely to produce infinity, so we effectively just change the order of operations a bit

  • however, I haven't considered whether this is broadly superior for resisting floating point non-finite values for other `histogram` input/extreme value permutations; one might speculate that this is just patching one extreme case that happened to show up in the wild, but may increase the likelihood of some other extreme case that isn't in our test suite yet

  • the main logic for this patch is that it fixes an issue that occurred in the wild and adds a test for it; if another extreme value case eventually pops up, at least this case will have a regression guard to keep guiding us in the right direction
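
The change in operation order described above can be illustrated with plain Python floats, which are IEEE 754 doubles. This is a minimal sketch, not the code from the patch: the function names `bin_position_eager` and `bin_position_delayed` are hypothetical, and the edge and sample values are chosen for illustration only.

```python
import math


def bin_position_eager(x, first_edge, last_edge, n_equal_bins):
    # Hypothetical helper: compute the scale factor up front.
    # For a tiny range, n_equal_bins / range can overflow to inf.
    norm = n_equal_bins / (last_edge - first_edge)
    return (x - first_edge) * norm


def bin_position_delayed(x, first_edge, last_edge, n_equal_bins):
    # Hypothetical helper: divide the (tiny) offset by the (tiny) range
    # first, so the intermediate stays near [0, 1], then scale by the
    # number of bins.
    return (x - first_edge) / (last_edge - first_edge) * n_equal_bins


# Illustrative values only: a range of roughly 1e-308 with 2 bins.
first_edge, last_edge = -1e-308, -2e-313
x = -0.9e-308

eager = bin_position_eager(x, first_edge, last_edge, 2)
delayed = bin_position_delayed(x, first_edge, last_edge, 2)

print(math.isinf(eager))   # the eager scale factor has overflowed
print(int(delayed))        # the delayed form yields a usable bin index
```

With these inputs the eager form produces a non-finite position that cannot be converted to a valid bin index, while the delayed form stays finite. As the description notes, this says nothing about whether the reordering behaves better for other extreme inputs.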

@charris added the "09 - Backport-Candidate" label (PRs tagged should be backported) on Jul 10, 2023
@seberg merged commit d0a8c72 into numpy:main on Jul 11, 2023
seberg (Member) commented Jul 11, 2023

Let's try things, thanks Tyler. Not sure I care for backporting, since Tyler says he isn't quite sure about it being generally better for robustness (and I guess also identical).
