BUG: fixes #8123 Histogram precision issue due to higher precision in internal mn, mx #9189
Conversation
Force-pushed from c20310e to 39117c5.
This needs a test case for the thing you're trying to fix. The commit message should start with `BUG:`. Also, you'll need to get the other tests to pass too ;). Thanks for solving the rebasing issues.
OK, thanks. I'll make the changes. The tests are failing locally too; I just wanted to push to have the fix on GitHub.
I think I solved the problem. Finding a clean test case was a bit tricky, since the data set where I encountered this problem was about 2 GB and I had to narrow down the offending number. It turned out ...
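For context, a minimal reproduction of the kind of case discussed in this thread could look like the sketch below (the exact offending value was lost above; the offset here is illustrative):

```python
import numpy as np

# float32 data with a float64 range whose lower edge sits just above the
# data value; at float32 precision the offset disappears, so the value may
# be (incorrectly) counted even though it lies below the requested range.
data = np.array([1.0], dtype=np.float32)
tiny_shift = 1e-9  # illustrative: smaller than float32 resolution near 1.0
hist, edges = np.histogram(data, bins=1, range=(1.0 + tiny_shift, 2.0))
print(hist)  # expected array([0]); with the bug this could count the value or raise
```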
I don't like that with this solution the result will depend on the dtype of the input data. I think the real reason this goes wrong is that the array comparison with a numpy scalar is not properly done (which is actually a bigger bug...):

```python
# not good: float32 array with float64 array scalar (or regular scalar)
np.array([1.], dtype=np.float32) >= np.array(1.00000001, dtype=np.float64)
# array([ True], dtype=bool)

# not good: float32 array with python float
np.array([1.], dtype=np.float32) >= 1.00000001
# array([ True], dtype=bool)

# OK: float32 array with float64 array
np.array([1.], dtype=np.float32) >= np.array([1.00000001], dtype=np.float64)
# array([False], dtype=bool)

# also OK: (array) scalar with (array) scalar
np.array(1., dtype=np.float32) >= np.array(1.00000001, dtype=np.float64)
# False
```

One solution here is to cast ...
p.s. I fear solving the real problem will take some discussion about what the desired behaviour for all ufuncs is, so perhaps we should go ahead and address the issue here, which would mean avoiding array scalars. Or, maybe more cleanly, do the inclusion of values in the right range after calculating the indices (i.e., remove indices that are too large or small).
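A sketch of that last alternative, with illustrative names rather than the numpy implementation: compute the bin indices first, then drop the ones that fall outside the valid range.

```python
import numpy as np

a = np.array([1.0, 1.5, 2.5], dtype=np.float32)
first_edge, last_edge, nbins = 1.0, 2.0, 4

# Map values to bin indices, then discard indices outside [0, nbins - 1]
# instead of pre-filtering the data with a mixed-precision comparison.
indices = ((a - first_edge) / (last_edge - first_edge) * nbins).astype(np.intp)
keep = (indices >= 0) & (indices < nbins)
counts = np.bincount(indices[keep], minlength=nbins)
print(counts)  # [1 0 1 0] -- the out-of-range 2.5 is dropped
```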
Actually, I thought about that too: 1 < 1 + tiny, so the count should be zero. Maybe cast the input data and the range (mn, mx) to the highest precision of both? We should keep in mind that there is also numpy.float128.
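A sketch of that "highest precision of both" idea, with illustrative names rather than the actual histogram internals: promote the comparison to the common, highest-precision type so neither side is silently downcast.

```python
import numpy as np

a = np.array([1.0], dtype=np.float32)       # input data
first_edge = np.float64(1.0 + 1e-9)          # higher-precision lower edge
last_edge = np.float64(2.0)

# Promote to the highest precision of the data and the edges before comparing.
common = np.result_type(a.dtype, np.float64)  # float64 here
in_range = (a.astype(common) >= first_edge) & (a.astype(common) <= last_edge)
print(in_range)  # array([False]) -- 1.0 < 1.0 + 1e-9 at full precision
```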
Good point about float128.
scratch/test.py (outdated)

```python
    range=np.array([1.0 + tiny_shift, 2.0]).astype('f8')
)

print('done')
```
This file should be removed
Yes, this was just a temporary test (I probably committed it by mistake).
Force-pushed from a20c42e to 89fa4bb.
I've gone ahead and rebased this - some of the fixes were already applied in #10278
Force-pushed from 89fa4bb to c5e7b2d.
```
@@ -237,6 +237,10 @@ def _get_outer_edges(a, range):
        first_edge = first_edge - 0.5
        last_edge = last_edge + 0.5

    if a.dtype.kind not in ['i']:
```
Perhaps this should be `np.issubdtype(a.dtype, np.inexact)`?
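For reference, a small illustration of what that predicate selects (just the check itself, not the numpy source):

```python
import numpy as np

# np.issubdtype(dt, np.inexact) is True for floating (and complex) dtypes,
# the cases where a higher-precision edge could be silently lost.
for dt in (np.float16, np.float32, np.float64, np.int64, np.bool_):
    print(dt.__name__, np.issubdtype(dt, np.inexact))
# float16 True, float32 True, float64 True, int64 False, bool_ False
```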
Add a comment here explaining why we are doing this: I think we're still ensuring ufunc casting does not cause problems...
An alternative fix might just be to promote the bounds to 0d arrays
I think the rebased PR looks good, modulo the small request for adding a comment on why the dtype cast is done.
Fixes numpy#8123, closes numpy#9189, fixes numpy#10319. This is a workaround for numpy#10322, not a fix for it. Adds tests for cases where bounds are more precise than the data, which led to inconsistencies in the optimized path.
When the input data has a lower precision than the specified range, an error is raised in `bincount`. This has been reported in issue #8123. The problem is caused by the higher precision of the internal `mn` and `mx`. In this pull request, I cast `mn` and `mx` to the same type as the input data.

(I messed up the previous pull request and could not reopen it.)