-
-
Notifications
You must be signed in to change notification settings - Fork 10.9k
BUG: Fix crashes when using float32 values in uniform histograms #10324
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
numpy/lib/tests/test_histograms.py
Outdated
|
||
# gh-10322 means that the float64 decays to float32. | ||
assert_equal(x_loc.dtype, np.float32) | ||
assert_equal(count, [1]) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am a bit at odds with this code... I have modified np.result_type
to only special case Python float, integer, (and complex). Even adding a warning on it about the switch, there are only a handful of places in NumPy such a change. And all of them seem to explicitly test corner cases (except possibly the default value in np.select
which arguably should be modified to None
which then could use result.dtype.type()
as default).
So if you look at the code chunk with comments below, the result_type
returns the "imprecise" version currently. If that would opt into a more precise version, this changes to [0]
as commented. However, warnings about such a change seems pretty drastic/weird and potentially impossible to get right?: We have to pull the plug on gh-10322 eventually!
In longer words: I am in a conundrum, that I we need to fix a behaviour here (potentially with semi-public functions to opt-in to the"future" behaviour early).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe its not actually a big issue, we just a lot of warnings and users can only avoid them by casting their inputs. The problem may just be that I pulled the plug on np.result_type
for testing, but did not modify the ufunc code yet (and that is too big of a throw to do at once).
Then, in the future, this will switch to 0
when bins are typed at least.
Just to be clear, this does all the computations in |
numpy/lib/histograms.py
Outdated
# gh-10322 forces us to request this now to avoid inconsistency | ||
bin_type = np.result_type(first_edge, last_edge, a) | ||
if np.issubdtype(bin_type, np.integer): | ||
bin_type = np.result_type(bin_type, float) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What is the result type here with Python float
? Might add a note.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It seems to be float
no matter what.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So, we can just write bin_type = float
? Seems reasonable. (Or None
, since linspace
will take care.)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This seems more flexible in case we add an int128
that only safely casts into a quad double or something in the distant future.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would just let linspace
deal with that case, i.e., set bin_type = None
here (or at float32
to the one above). Obviously, not a big deal though.
Yes. This is fine, because we already have special handling of loss of precision in the uniform search |
What about |
I'll update the test to loop over all the floating point types |
numpy/lib/histograms.py
Outdated
@@ -318,9 +318,15 @@ def _get_bin_edges(a, bins, range, weights): | |||
raise ValueError('`bins` must be 1d, when an array') | |||
|
|||
if n_equal_bins is not None: | |||
# gh-10322 forces us to request this now to avoid inconsistency |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Here and in the other comment below, ideally explain in a sentence why we are doing this, and then add "see gh-10322" - future developers will thank you!
Fixes numpy#8123, closes numpy#9189, fixes numpy#10319 This is a workaround to numpy#10322, not a fix for it. Adds tests for cases where bounds are more precise than the data, which led to inconsistencies in the optimized path.
7b79175
to
1122303
Compare
Updated with better comments and more thorough tests. Dropped the ball a little on this one. |
Thanks Eric. |
This covers the changes made in numpygh-10324 and numpygh-11023
Fixes #8123, closes #9189, fixes #10319
This is a workaround to #10322, not a fix for it.