BUG: fixes #8123 Histogram precision issue due to higher precision in internal mn, mx #9189
Conversation
Force-pushed from c20310e to 39117c5.
This needs a test case for the thing you're trying to fix. The commit message should start with `BUG:`. Also, you'll need to get the other tests to pass too ;). Thanks for solving the rebasing issues.
OK, thanks. I'll make the changes. The tests are failing locally too; I just wanted to push to have the fix on GitHub.
I think I solved the problem. Finding a clean test case was a bit tricky, since the data set where I encountered this problem was about 2 GB and I had to narrow down the offending number. It turned out ...
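For context, a minimal reproduction of the kind of case discussed in this thread could look like the sketch below (the exact offending value was lost above; the offset here is illustrative):

```python
import numpy as np

# float32 data with a float64 range whose lower edge sits just above the
# data value; at float32 precision the offset disappears, so the value may
# be (incorrectly) counted even though it lies below the requested range.
data = np.array([1.0], dtype=np.float32)
tiny_shift = 1e-9  # illustrative: smaller than float32 resolution near 1.0
hist, edges = np.histogram(data, bins=1, range=(1.0 + tiny_shift, 2.0))
print(hist)  # expected array([0]); with the bug this could count the value or raise
```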
I don't like that with this solution the result will depend on the dtype of the input data. I think the real reason this goes wrong is that the array comparison with a numpy scalar is not properly done (which is actually a bigger bug...):

```python
# not good: float32 array with float64 array scalar (or regular scalar)
np.array([1.], dtype=np.float32) >= np.array(1.00000001, dtype=np.float64)
# array([ True], dtype=bool)

# not good: float32 array with python float
np.array([1.], dtype=np.float32) >= 1.00000001
# array([ True], dtype=bool)

# OK: float32 array with float64 array
np.array([1.], dtype=np.float32) >= np.array([1.00000001], dtype=np.float64)
# array([False], dtype=bool)

# also OK: (array) scalar with (array) scalar
np.array(1., dtype=np.float32) >= np.array(1.00000001, dtype=np.float64)
# False
```

One solution here is to cast ...
p.s. I fear solving the real problem will take some discussion about what the desired behaviour for all ufuncs is, so perhaps we should go ahead and address the issue here, which would mean avoiding array scalars. Or, maybe more cleanly, do the inclusion of values in the right range after calculating the indices (i.e., remove indices that are too large or small).
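A sketch of that last alternative, with illustrative names rather than the numpy implementation: compute the bin indices first, then drop the ones that fall outside the valid range.

```python
import numpy as np

a = np.array([1.0, 1.5, 2.5], dtype=np.float32)
first_edge, last_edge, nbins = 1.0, 2.0, 4

# Map values to bin indices, then discard indices outside [0, nbins - 1]
# instead of pre-filtering the data with a mixed-precision comparison.
indices = ((a - first_edge) / (last_edge - first_edge) * nbins).astype(np.intp)
keep = (indices >= 0) & (indices < nbins)
counts = np.bincount(indices[keep], minlength=nbins)
print(counts)  # [1 0 1 0] -- the out-of-range 2.5 is dropped
```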
Actually, I thought about that too: 1 < 1 + tiny, so the count should be zero. Maybe cast the input data and the range (mn, mx) to the highest precision of both? We should keep in mind that there is also numpy.float128.
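A sketch of that "highest precision of both" idea, with illustrative names rather than the actual histogram internals: promote the comparison to the common, highest-precision type so neither side is silently downcast.

```python
import numpy as np

a = np.array([1.0], dtype=np.float32)       # input data
first_edge = np.float64(1.0 + 1e-9)          # higher-precision lower edge
last_edge = np.float64(2.0)

# Promote to the highest precision of the data and the edges before comparing.
common = np.result_type(a.dtype, np.float64)  # float64 here
in_range = (a.astype(common) >= first_edge) & (a.astype(common) <= last_edge)
print(in_range)  # array([False]) -- 1.0 < 1.0 + 1e-9 at full precision
```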
Good point about float128.
scratch/test.py (outdated)

```python
    range=np.array([1.0 + tiny_shift, 2.0]).astype('f8')
)

print('done')
```
This file should be removed
Yes, this was just a temporary test (I probably committed it by mistake).
Force-pushed from a20c42e to 89fa4bb.
I've gone ahead and rebased this - some of the fixes were already applied in #10278
Force-pushed from 89fa4bb to c5e7b2d.
```
@@ -237,6 +237,10 @@ def _get_outer_edges(a, range):
        first_edge = first_edge - 0.5
        last_edge = last_edge + 0.5

    if a.dtype.kind not in ['i']:
```
Perhaps this should be `np.issubdtype(a.dtype, np.inexact)`?
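For reference, a small illustration of what that predicate selects (just the check itself, not the numpy source):

```python
import numpy as np

# np.issubdtype(dt, np.inexact) is True for floating (and complex) dtypes,
# the cases where a higher-precision edge could be silently lost.
for dt in (np.float16, np.float32, np.float64, np.int64, np.bool_):
    print(dt.__name__, np.issubdtype(dt, np.inexact))
# float16 True, float32 True, float64 True, int64 False, bool_ False
```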
Add a comment here explaining why we are doing this: I think we're still ensuring ufunc casting does not cause problems...
An alternative fix might just be to promote the bounds to 0d arrays
I think the rebased PR looks good, modulo the small request for adding a comment on why the dtype cast is done.
Fixes numpy#8123, closes numpy#9189, fixes numpy#10319. This is a workaround for numpy#10322, not a fix for it. Adds tests for cases where bounds are more precise than the data, which led to inconsistencies in the optimized path.
When the input data has a lower precision than the specified range, an error is raised in `bincount`. This has been reported in issue #8123. The problem is caused by the higher precision of the internal `mn` and `mx`. In this pull request, I cast `mn` and `mx` to the same type as the input data.

(I messed up the previous pull request and could not reopen it.)