BUG: Fix crashes when using float32 values in uniform histograms by eric-wieser · Pull Request #10324 · numpy/numpy

BUG: Fix crashes when using float32 values in uniform histograms #10324


Merged: 1 commit merged into numpy:master from eric-wieser:histogram-range-comparison on Feb 9, 2018

Conversation

eric-wieser (Member):

Fixes #8123, closes #9189, fixes #10319

This is a workaround for #10322, not a fix for it.
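For context, the crash is a precision issue: a float64 bound can sit strictly below a value in float64 arithmetic yet coincide with it once everything is computed in float32. A minimal illustration of that collapse (not a repro taken from the linked issues):

import numpy as np

# 2**-30 is smaller than the float32 spacing near 1.0 (2**-24), so a
# float64 value just below 1.0 rounds exactly onto 1.0 in float32.
x = 1.0 - 2.0 ** -30
print(x < 1.0)                          # True in float64
print(np.float32(x) < np.float32(1.0))  # False: the gap vanishes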


# gh-10322 means that the float64 decays to float32.
assert_equal(x_loc.dtype, np.float32)
assert_equal(count, [1])
eric-wieser (Member Author):

In #9189, the results here were np.float64 and [0], which are self-consistent, but at odds with #10322.
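To illustrate the gh-10322 behaviour being referred to (value-based promotion as it worked at the time, later revisited by NEP 50): a float64 scalar next to a float32 array does not win the promotion, so it "decays" to the array's dtype.

import numpy as np

# Under value-based promotion, the float64 *scalar* decays to the
# dtype of the float32 *array* instead of upgrading it to float64.
np.result_type(np.float64(1.0), np.array([0.5], dtype=np.float32))
# -> dtype('float32')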

Reviewer (Member):

I am a bit at odds with this code... I have modified np.result_type to only special-case Python float, int (and complex). Even adding a warning about the switch, there are only a handful of places in NumPy affected by such a change, and all of them seem to explicitly test corner cases (except possibly the default value in np.select, which arguably should be changed to None, which could then use result.dtype.type() as the default).

So if you look at the code chunk with comments below, result_type currently returns the "imprecise" version. If it opted into the more precise version, this would change to [0] as commented. However, warning about such a change seems pretty drastic/weird and potentially impossible to get right: we have to pull the plug on gh-10322 eventually!

In longer words: I am in a conundrum; we need to fix the behaviour here (potentially with semi-public functions to opt in to the "future" behaviour early).

Reviewer (Member):

Maybe it's not actually a big issue; we would just get a lot of warnings, and users can only avoid them by casting their inputs. The problem may just be that I pulled the plug on np.result_type for testing, but did not modify the ufunc code yet (and that is too big a change to do at once).

Then, in the future, this will switch to [0], at least once bins are typed.

@charris (Member) commented Jan 4, 2018:

Just to be clear, this does all the computations in float32 for equally spaced bins and float32 data?

# gh-10322 forces us to request this now to avoid inconsistency
bin_type = np.result_type(first_edge, last_edge, a)
if np.issubdtype(bin_type, np.integer):
    bin_type = np.result_type(bin_type, float)
Reviewer (Member):

What is the result type here with Python float? Might add a note.

eric-wieser (Member Author):

It seems to be float no matter what.
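Indeed, np.result_type treats the Python float type as the float64 dtype, so this branch lands on float64 for every integer type; a quick check (a sketch, assuming NumPy's standard promotion rules):

import numpy as np

# The branch only runs for integer bin types; with `float` (i.e.
# np.float64) mixed in, the result is float64 regardless of width.
for dt in [np.int8, np.int32, np.int64, np.uint64]:
    assert np.result_type(dt, float) == np.float64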

Reviewer (Contributor):

So, we can just write bin_type = float? Seems reasonable. (Or None, since linspace will take care.)
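On the None option: when dtype=None, np.linspace infers its dtype from the endpoints via the same promotion machinery, so float32 edges stay float32. A small check of current behaviour (not code from this PR):

import numpy as np

assert np.linspace(0.0, 1.0, 5).dtype == np.float64               # Python floats
assert np.linspace(np.float32(0), np.float32(1), 5).dtype == np.float32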

eric-wieser (Member Author):

This seems more flexible in case we add an int128 that only safely casts into a quad double or something in the distant future.

Reviewer (Contributor):

I would just let linspace deal with that case, i.e., set bin_type = None here (or add float32 to the one above). Obviously, not a big deal though.

@eric-wieser (Member Author):

> this does all the computations in float32

Yes. This is fine, because we already have special handling of loss of precision in the uniform search.
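For reference, the special handling means the fast path does not trust the arithmetic bin index alone: it compares against the stored edges and nudges the index where rounding disagrees. A simplified sketch of that fixup (not the verbatim NumPy source; assumes all values already lie within [first_edge, last_edge]):

import numpy as np

def uniform_bin_indices(a, first_edge, last_edge, n_bins, edges):
    # Arithmetic index into equal-width bins.
    norm = n_bins / (last_edge - first_edge)
    indices = ((a - first_edge) * norm).astype(np.intp)
    indices[indices == n_bins] -= 1  # the right edge belongs to the last bin
    # The arithmetic can disagree with the stored edges by ~1 ULP,
    # so compare against the edges directly and nudge by one.
    indices[a < edges[indices]] -= 1
    indices[(a >= edges[indices + 1]) & (indices != n_bins - 1)] += 1
    return indices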

@charris (Member) commented Jan 4, 2018:

What about float16? Is it supported?

@eric-wieser (Member Author):

I'll update the test to loop over all the floating-point types.
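Something like the following shape of loop, with a value on an interior edge where rounding bites hardest (a hypothetical sketch, not the exact test that was added):

import numpy as np

for dt in [np.float16, np.float32, np.float64]:
    # 1.0 sits exactly on an interior bin edge for range (0, 2), bins=10.
    counts, edges = np.histogram(np.array([1.0], dtype=dt), bins=10, range=(0, 2))
    assert counts.sum() == 1  # counted exactly once: no crash, no drop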

@@ -318,9 +318,15 @@ def _get_bin_edges(a, bins, range, weights):
            raise ValueError('`bins` must be 1d, when an array')

    if n_equal_bins is not None:
        # gh-10322 forces us to request this now to avoid inconsistency
Reviewer (Contributor):

Here and in the other comment below, ideally explain in a sentence why we are doing this, and then add "see gh-10322" - future developers will thank you!

Fixes numpy#8123, closes numpy#9189, fixes numpy#10319

This is a workaround for numpy#10322, not a fix for it.

Adds tests for cases where bounds are more precise than the data, which led to inconsistencies in the optimized path.
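As an illustration of that kind of case (a hypothetical example, not one of the tests added here): a float64 lower bound that is not representable in float32 should still bucket float32 data consistently.

import numpy as np

# The float64 bound 1 - 2**-30 has no exact float32 representation;
# computed in float32 it rounds onto 1.0, and the sample sitting on
# that (rounded) lower edge must still land in the first bin.
data = np.array([1.0], dtype=np.float32)
counts, edges = np.histogram(data, bins=4, range=(1.0 - 2.0 ** -30, 2.0))
assert counts[0] == 1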
@eric-wieser force-pushed the histogram-range-comparison branch from 7b79175 to 1122303 on February 2, 2018 09:04
@eric-wieser (Member Author):

Updated with better comments and more thorough tests. Dropped the ball a little on this one.

@charris merged commit 7c4c213 into numpy:master on Feb 9, 2018
@charris (Member) commented Feb 9, 2018:

Thanks Eric.

eric-wieser added a commit to eric-wieser/numpy that referenced this pull request May 14, 2018
Successfully merging this pull request may close these issues.

Bug in histogram: incorrectly handles float32 inputs just below lower bin limit
histogram failure due to loss of precision