The new range handling method will raise ValueError for arrays including NaN · Issue #7503 · numpy/numpy · GitHub

Closed
hopehhchen opened this issue Apr 3, 2016 · 15 comments

Comments

@hopehhchen

This issue appeared after one of the latest changes to function_base.py (around line 550 in 127eb9e, March 16, 2016). The range now defaults to the array minimum and maximum via a.min() and a.max(), which raises ValueError: 'range parameter must be finite.' if the array includes NaN. Before the change, the code set the range only when it was passed in explicitly.
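
A minimal sketch of why the default range ends up non-finite: the plain min and max reductions propagate NaN, so a finiteness check of the form quoted in the traceback later in this thread fails. The variable names here are illustrative only.

```python
import numpy as np

a = np.array([1.0, 2.0, np.nan, 1e10, -1e19])

# min/max propagate NaN, so both default range endpoints are NaN.
mn, mx = a.min(), a.max()

# Same form of check as the one visible in the traceback in this thread.
range_is_finite = bool(np.all(np.isfinite([mn, mx])))

# The NaN-aware reductions skip NaN and return finite endpoints.
nan_mn, nan_mx = np.nanmin(a), np.nanmax(a)
```

Here `range_is_finite` comes out false, while `nan_mn` and `nan_mx` are ordinary finite numbers.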

@seberg
Member
seberg commented Apr 4, 2016

To note for everyone, this is np.histogram.

@hopehhchen can you explain what you expect? It seems to me that the old behaviour returned useless results, so I am inclined to consider the error an improvement.

EDIT: Of course there may always be a case for roll back + deprecation if it is annoying.

@seberg
Member
seberg commented Apr 4, 2016

For anyone else who might look at this, this seems to be the old behaviour (in 1.8.2):

In [8]: np.histogram([1, 2, np.nan, 1e10, -1e19])
Out[8]: 
(array([0, 0, 0, 0, 0, 0, 0, 0, 0, 1]),
 array([ nan,  nan,  nan,  nan,  nan,  nan,  nan,  nan,  nan,  nan,  nan]))

which seems like nonsense; the counts don't even add up.

@hopehhchen
Author

@seberg Thanks for looking at this as well. Sorry I wasn't super clear. I'm not suggesting a rollback; I'm just wondering whether using np.nanmin(a) and np.nanmax(a) in the new code would make more sense. That would save users the extra trouble of removing the NaNs or manually setting range=(np.nanmin(array), np.nanmax(array)).
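
For illustration, the manual workaround mentioned above could be wrapped roughly as follows. This is only a sketch: `histogram_nan_range` is a hypothetical helper, not a NumPy function, and it assumes that NaN entries simply fall outside the explicit range and go uncounted.

```python
import numpy as np

def histogram_nan_range(a, bins=10):
    """Hypothetical helper: histogram over the finite extent of the data."""
    a = np.asarray(a, dtype=float)
    # nanmin/nanmax ignore NaN when computing the range endpoints.
    data_range = (np.nanmin(a), np.nanmax(a))
    return np.histogram(a, bins=bins, range=data_range)

counts, edges = histogram_nan_range([1.0, 2.0, np.nan, 3.0, 4.0], bins=4)
```

The bin edges then span the smallest to the largest non-NaN value, and only the four non-NaN entries are counted.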

@leewalsh

I like @hopehhchen's suggestion of using the nan functions, though it's a little bit more of a change in behavior. And for @seberg's example an error is probably preferable.

However, this issue breaks cases when bins is a sequence. In that case, I believe that range should be determined using bins.min() and bins.max() (or maybe bins[0] and bins[-1]).

That is, these two calls should be equivalent:

In [9]: np.histogram([1, 2, np.nan, 1e10, -1e19], bins=[0, 1, 2])
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-9-37e33a638eb7> in <module>()
----> 1 np.histogram([1, 2, np.nan, 1e10, -1e19], bins=[0, 1, 2])

/usr/local/lib/python2.7/site-packages/numpy/lib/function_base.pyc in histogram(a, bins, range, normed, weights, density)
    503     if not np.all(np.isfinite([mn, mx])):
    504         raise ValueError(
--> 505             'range parameter must be finite.')
    506     if mn == mx:
    507         mn -= 0.5

ValueError: range parameter must be finite.

In [10]: np.histogram([1, 2, np.nan, 1e10, -1e19], bins=[0, 1, 2], range=(0, 2))
Out[10]: (array([0, 2]), array([0, 1, 2]))
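
A small sketch of the idea that the range can be taken from the outer bin edges when bins is a sequence. `histogram_on_edges` is a hypothetical helper name, and it assumes the edges are already sorted in increasing order.

```python
import numpy as np

def histogram_on_edges(values, edges):
    """Hypothetical helper: pass the outer bin edges as the range explicitly."""
    edges = np.asarray(edges)
    # bins[0] and bins[-1] bound the range when the edges are sorted.
    return np.histogram(values, bins=edges, range=(edges[0], edges[-1]))

counts, edges = histogram_on_edges([1, 2, np.nan, 1e10, -1e19], [0, 1, 2])
```

This reproduces the second call above without spelling out the range by hand.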

leewalsh added a commit to leewalsh/square-tracking that referenced this issue Apr 24, 2016
@seberg
Member
seberg commented Apr 24, 2016

I think if bins are used, range is probably not used in any case? Which seems like the right behaviour to me.
Filtering out NaNs sounds like it might be a good option; whether or not it should be the default, I don't know. Maybe the default should just be to error, and ask the user to specify "ignore_nan=True" or something.
Or frankly maybe it is best to just error with "histogram does not support NaN, use blabla to remove NaNs first."? @leewalsh if you are interested in this issue, we always welcome pull requests!
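
To make the two options concrete, here is a rough user-side sketch. Both the wrapper `histogram_strict` and its `ignore_nan` flag are hypothetical; np.histogram itself has no such argument, and the error text is only modelled on the suggestion above.

```python
import numpy as np

def histogram_strict(a, bins=10, ignore_nan=False):
    """Hypothetical wrapper; `ignore_nan` is not a real np.histogram argument."""
    a = np.asarray(a, dtype=float)
    nan_mask = np.isnan(a)
    if nan_mask.any():
        if not ignore_nan:
            raise ValueError(
                "histogram does not support NaN; "
                "remove NaNs first or pass ignore_nan=True")
        a = a[~nan_mask]  # drop NaN entries before binning
    return np.histogram(a, bins=bins)

data = [1.0, 2.0, np.nan, 3.0]
counts, edges = histogram_strict(data, bins=3, ignore_nan=True)
```

With the default `ignore_nan=False` the same call raises ValueError, so the caller has to opt in to dropping data.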

@seberg
Member
seberg commented Apr 24, 2016

Ah, sorry, missed that. So I guess right now range actually does matter and will filter NaNs (and probably Infs, etc.)? Seems not to matter in older versions. It might be good to double check how this behaviour has changed over the last few versions.

@johnarban

I am just seconding the issue @hopehhchen raised. I am using version 1.11.0.
A minimal example is:

import numpy as np

bins = np.asarray([0, 2, 4, 6, 8, 10])
values = np.asarray([1, 2, 4, 6, 4, 6, 7, 8, 5, 2, 2, 4, 6, np.nan, 7, 5, 4, 22, 3, 57, 8, 9, 6, 4, 3, 2, 1, 46, 8, 9])
# Broken version: raises in 1.11.0, worked in the previous stable release in conda.
h, bins = np.histogram(values, bins=bins)
# Now an explicit range is required.
h, bins = np.histogram(values, bins=bins, range=(bins.min(), bins.max()))

@tritemio
tritemio commented May 20, 2017

This error affects matplotlib plt.hist (see matplotlib/matplotlib#6483). If the input data contains NaNs, plt.hist will raise an error even if the full bins array is specified. Adding range=(bins.min(),bins.max()) works but is extremely verbose.

@eric-wieser
Member
eric-wieser commented Dec 10, 2017

> If the input data contains NaNs, plt.hist will raise an error even if the full bins array is specified.

I have a patch in the works for NumPy 1.15.0 that happens to fix this. I should have a PR to link to in a few weeks, once #10186 is dealt with.

@eric-wieser
Member

Note that we could still consider using nanmin and nanmax to compute the range when no bin edges are given.

@michaelklachko
michaelklachko commented Feb 27, 2018

I just got this error even though there are no NaN values in my array:

print np.argwhere(np.isnan(array))
[]

array_hist, _ = np.histogram(array, range=(0.0, 1.0))
print array_hist
[ 8461  4600  4306  4344  3827  3636  3734  3607  4399 12718] 

print array    # all values appear to be normal (too large to show here)

plt.hist(array, bins='sqrt')

/usr/local/lib/python2.7/dist-packages/numpy/core/_methods.py:29: RuntimeWarning: invalid value encountered in reduce
  return umr_minimum(a, axis, None, out, keepdims)
/usr/local/lib/python2.7/dist-packages/numpy/core/_methods.py:26: RuntimeWarning: invalid value encountered in reduce
  return umr_maximum(a, axis, None, out, keepdims)
Traceback (most recent call last):
  File "/home/michael/main.py", line 212, in <module>
    class.train()
  File "/home/michael/code/__init__.py", line 339, in train
    plt.hist(array, bins='sqrt')
  File "/usr/local/lib/python2.7/dist-packages/matplotlib/pyplot.py", line 3081, in hist
    stacked=stacked, data=data, **kwargs)
  File "/usr/local/lib/python2.7/dist-packages/matplotlib/__init__.py", line 1898, in inner
    return func(ax, *args, **kwargs)
  File "/usr/local/lib/python2.7/dist-packages/matplotlib/axes/_axes.py", line 6195, in hist
    m, bins = np.histogram(x[i], bins, weights=w[i], **hist_kwargs)
  File "/usr/local/lib/python2.7/dist-packages/numpy/lib/function_base.py", line 670, in histogram
    'range parameter must be finite.')
ValueError: range parameter must be finite.

Process finished with exit code 1

I'm using NumPy version 1.14 and matplotlib 2.0.2.

@eric-wieser
Member

Your backtrace doesn't match the code you're running:

plt.hist(array, bins='sqrt')

vs

plt.hist(train_acc_per_sample, bins='sqrt', range=(0.0, 1.0))

@michaelklachko

Oh yes, I added the range argument after the fact, to see if it helps. The original error happened without the range argument in the plt.hist call. Let me edit the post.

@eric-wieser
Member

Did it help? It should have.

@michaelklachko

Yes, it helped. This time there were NaN values in the array, and I got the following warnings:

/usr/local/lib/python2.7/dist-packages/numpy/lib/function_base.py:780: RuntimeWarning: invalid value encountered in greater_equal
  keep = (tmp_a >= first_edge)
/usr/local/lib/python2.7/dist-packages/numpy/lib/function_base.py:781: RuntimeWarning: invalid value encountered in less_equal
  keep &= (tmp_a <= last_edge)

but the code execution continued after that.
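
Since the error is about a non-finite range, a quick count of NaN and infinite entries before plotting may help narrow down cases like this one. A minimal sketch; `describe_nonfinite` is a hypothetical helper, not part of NumPy or matplotlib.

```python
import numpy as np

def describe_nonfinite(a):
    """Hypothetical diagnostic: count the NaN and infinite entries of an array."""
    a = np.asarray(a, dtype=float)
    return {"nan": int(np.isnan(a).sum()), "inf": int(np.isinf(a).sum())}

report = describe_nonfinite([0.1, np.nan, 0.5, np.inf, 0.9])
```

If either count is nonzero, filtering with `a[np.isfinite(a)]` or passing an explicit range, as done above, are the options discussed in this thread.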
