IsotonicRegression gives NANs on normal data · Issue #2507 · scikit-learn/scikit-learn · GitHub

Closed
Felix-neko opened this issue Oct 9, 2013 · 22 comments

Comments

@Felix-neko

I have a problem with IsotonicRegression: it produces NaNs when fitting data whose values are close to zero (but still greater than sys.float_info.min).

I pickled some data that triggers the bug and uploaded it to SendSpace:

http://www.sendspace.com/file/0i18ib

Below is an example that reproduces the problem.

# -*- coding: utf-8 -*-

import pickle

[xArray, yArray, weightArray, pPredicted] = pickle.load(open("bugreport.dmp", 'r'))
# xArray and yArray are the raw data I want to fit; weightArray holds the
# sample weights. There are no NaNs among them.

import sklearn
print sklearn.__version__  # scikit-learn 0.14.1

import sklearn.isotonic

regression = sklearn.isotonic.IsotonicRegression()
regression.fit(xArray, yArray, sample_weight=weightArray)
print regression.predict(xArray)  # Oh no! It gives some NaNs!

@Felix-neko
Author

P.S. Found on Python 2.7 (64-bit version) / Win7.

@ogrisel
Member
ogrisel commented Oct 11, 2013

Unfortunately I cannot unpickle the data on my box with numpy 1.7.1:

Traceback (most recent call last):
  File "<ipython-input-8-0d0d340ac566>", line 1, in <module>
    [xArray, yArray, weightArray, pPredicted] = pickle.load(open("bugreport.dmp", 'r'))
  File "/usr/local/Cellar/python/2.7.5/Frameworks/Python.framework/Versions/2.7/lib/python2.7/pickle.py", line 1378, in load
    return Unpickler(file).load()
  File "/usr/local/Cellar/python/2.7.5/Frameworks/Python.framework/Versions/2.7/lib/python2.7/pickle.py", line 858, in load
    dispatch[key](self)
  File "/usr/local/Cellar/python/2.7.5/Frameworks/Python.framework/Versions/2.7/lib/python2.7/pickle.py", line 1090, in load_global
    klass = self.find_class(module, name)
  File "/usr/local/Cellar/python/2.7.5/Frameworks/Python.framework/Versions/2.7/lib/python2.7/pickle.py", line 1124, in find_class
    __import__(module)
ImportError: No module named multiarray

Maybe you can save the data in a version-independent format, for instance with numpy.savez or numpy.savez_compressed: http://docs.scipy.org/doc/numpy/reference/generated/numpy.savez.html
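As a sketch, saving and reloading the four arrays with numpy.savez_compressed could look like this (the key names are illustrative, not from the original report):

import numpy as np

# Save the arrays in NumPy's portable .npz format.
np.savez_compressed("bugreport.npz",
                    x=xArray, y=yArray, w=weightArray, p=pPredicted)

# Reload them later, independently of the NumPy version used to save.
loaded = np.load("bugreport.npz")
xArray, yArray = loaded["x"], loaded["y"]
weightArray, pPredicted = loaded["w"], loaded["p"]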

@Felix-neko
Author

Here is an NPZ file: http://www.sendspace.com/file/hx06eh

The code, updated accordingly:

# -*- coding: utf-8 -*-

import numpy as np

loadedData = np.load("bugreport.npz")
[xArray, yArray, weightArray, pPredicted] = loadedData["a"]
# xArray and yArray are the raw data I want to fit; weightArray holds the
# sample weights. There are no NaNs among them.

import sklearn
print sklearn.__version__  # scikit-learn 0.14.1

import sklearn.isotonic

regression = sklearn.isotonic.IsotonicRegression()
regression.fit(xArray, yArray, sample_weight=weightArray)
print regression.predict(xArray)  # Oh no! It gives some NaNs!

@ogrisel
Member
ogrisel commented Oct 11, 2013

Indeed, I can reproduce the failures on a 64-bit OS X 10.8 box, so this is neither Windows- nor 32-bit-specific. Maybe @agramfort or @NelleV might want to have a look.

@Felix-neko
Author

I used the 64-bit version of Windows 7, if that matters.

@NelleV
Member
NelleV commented Oct 17, 2013

As mentioned by mail, this is due to the fact that there are zeros in the weights.
We could easily fix the problem in our code by removing the X and y entries with zero weight, but a problem would remain: since the zeros sit at the bottom and top of the X range, we would not be able to predict for those values either (and the code would return NaN instead of raising an error).
Another possible fix would be to add some epsilon to the weights, so that we never divide by zero.
A last fix would be to remove the zero-weighted Xs and then predict by the nearest value, but that seems like a hack to me.
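A minimal sketch of the first option, done on the caller's side with the arrays from the report above (not a change to the library itself):

import numpy as np
import sklearn.isotonic

# Keep only the samples with strictly positive weight before fitting.
mask = np.asarray(weightArray) > 0
regression = sklearn.isotonic.IsotonicRegression()
regression.fit(xArray[mask], yArray[mask], sample_weight=weightArray[mask])

# Caveat from above: if the dropped samples sit at the extremes of xArray,
# those x values now fall outside the fitted range and still predict as NaN.
print(regression.predict(xArray))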

@ogrisel
Member
ogrisel commented Oct 17, 2013

Wouldn't it make more sense to just drop samples with 0 weights prior to fitting?

@ogrisel
Member
ogrisel commented Oct 17, 2013

Otherwise, if zero-weight samples are meaningless, we can always throw a ValueError exception stating explicitly that zero is an invalid value for a weight.
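A sketch of what such a check might look like (hypothetical helper, not the actual scikit-learn code):

import numpy as np

def check_sample_weight(sample_weight):
    # Reject zero (or negative) weights explicitly instead of letting
    # them propagate NaNs through the fit.
    sample_weight = np.asarray(sample_weight, dtype=np.float64)
    if np.any(sample_weight <= 0):
        raise ValueError("sample_weight must be strictly positive; "
                         "got %d non-positive entries"
                         % np.sum(sample_weight <= 0))
    return sample_weight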

@NelleV
Member
NelleV commented Oct 17, 2013

It would totally make sense, but we are only able to predict the regression for new values between X.min() and X.max(). Hence, in this case, the function would still return NaN where it does right now (because the scipy interp1d function is set up to return NaN when a value is outside the range of the fit, rather than raise an exception).
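That interp1d behaviour can be seen in a standalone sketch:

import numpy as np
from scipy.interpolate import interp1d

x = np.array([0.0, 1.0, 2.0])
y = np.array([0.0, 1.0, 4.0])

# With bounds_error=False, interp1d returns fill_value (NaN here) for
# queries outside [x.min(), x.max()] instead of raising an exception.
f = interp1d(x, y, bounds_error=False, fill_value=np.nan)
print(f([-0.5, 1.5, 2.5]))  # [ nan  2.5  nan]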

@GaelVaroquaux
Member

Wouldn't it make more sense to just drop samples with 0 weights prior to fitting?

And warn in such a situation.

@Felix-neko
Author

I think it would be really useful to add extrapolation:
if the x values passed for prediction are below min(xTraining), predict(x) should not break, but should return the value at the left border of the fitted range (and likewise at the right border).
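Until something like that exists in the library, one way to get this behaviour is to clip the query points to the training range before predicting (a sketch; xTraining and xNew are placeholder names):

import numpy as np

# Clamp out-of-range queries to the fitted domain, so predict() returns the
# boundary value instead of NaN for x below min(xTraining) or above
# max(xTraining).
xClipped = np.clip(xNew, xTraining.min(), xTraining.max())
predictions = regression.predict(xClipped)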

@Felix-neko
Author

About dropping x values with zero weights: I think this should be done on the library side. We users often forget to check our weights for zeros, or have to write wrapper code around the IsotonicRegression class to do it automatically. It's much better when it works out of the box.

@amueller amueller added the Bug label Jan 23, 2015
@amueller
Member

Unfortunately the data is not available anymore.
I tried reproducing with something like this:

import numpy as np
import matplotlib.pyplot as plt
import sklearn.isotonic

regression = sklearn.isotonic.IsotonicRegression()
n_samples = 60

x = np.linspace(-3, 3, n_samples)
y = x + np.random.uniform(size=n_samples)
w = np.random.uniform(size=n_samples)
w[5:8] = 0  # a few zero weights in the middle of the range

regression.fit(x, y, sample_weight=w)
plt.plot(x, y)
plt.plot(x, regression.predict(x))
print regression.predict(x)

I'm not sure if this matches the original setup. The fit actually just runs and never finishes, as far as I can tell.

@amueller
Member

The out-of-bounds issue seems to have been tackled in #3147 and #3199.
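For reference, a sketch of how this looks from the user side in releases that include the out_of_bounds option (assuming that is the parameter those PRs introduced):

import numpy as np
from sklearn.isotonic import IsotonicRegression

x = np.linspace(-3, 3, 20)
y = x + np.random.uniform(size=20)

# out_of_bounds="clip" clamps predictions for queries outside the training
# range to the boundary values instead of returning NaN.
regression = IsotonicRegression(out_of_bounds="clip")
regression.fit(x, y)
print(regression.predict(np.array([-10.0, 0.0, 10.0])))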

@mjbommar
Contributor

Just today I had issues with a client on 0.16-git where fit followed by transform gives very different results from fit_transform. I think some regressions or issues may have been introduced by the recent weighting changes.

I confirmed the regression by testing between 0.15.2 and HEAD, but I won't have time to prepare a clean test case without client data until later this week. Maybe someone can look?


@amueller
Member

I am just looking at the code, and I don't see how this could happen. I have:

import numpy as np
import sklearn.isotonic
from sklearn.utils import shuffle


regression = sklearn.isotonic.IsotonicRegression()
n_samples = 60

x = np.linspace(-3, 3, n_samples)
y = x + np.random.uniform(size=n_samples)
w = np.random.uniform(size=n_samples)
#w[5:8] = 0
x, y, w = shuffle(x, y, w)

regression.fit(x, y, sample_weight=w)
fit_then_transform = regression.fit(x, y, sample_weight=w).transform(x)
fit_transform = regression.fit_transform(x, y, sample_weight=w)
print(np.abs(fit_transform - fit_then_transform).max())

4.4408920985e-16

@amueller
Member

Just masking out zero-weight points doesn't work with the current implementation of fit_transform. I think we need to handle them separately in the Cython code. Alternatively, we could remove them and add them back in, but that seems like more hassle.

@mjbommar
Contributor

Not sure how many of these are related, but here's an example with clean data showing that fit and fit_transform return different results for data with ties in X. This was generated with HEAD as of about two hours ago:
http://nbviewer.ipython.org/urls/gist.githubusercontent.com/mjbommar/74fcefdcd0f2b1a5f708/raw/4742691db799101091598922cd0808f1eb5f07f2/isotonic_test_case_20150129.json

I have another 0.16-git install whose site-packages/sklearn is dated Dec 4. I compared md5sums and confirmed it is at commit 81613d2:
https://github.com/scikit-learn/scikit-learn/tree/81613d2d2fa07bc310fb7d4abb4231ce78772fad/sklearn/isotonic.py

I patched this install with isotonic.py from commits d255866 and a9ea55f; the former gives a norm of 0 in the ties case, while the latter reproduces the non-zero norm and the error case. In other words, I think we can ping @ragv about a9ea55f as the culprit.
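Until the clean test case is ready, a self-contained sketch of the kind of check involved (synthetic data with ties in X, not the client data from the notebook):

import numpy as np
from sklearn.isotonic import IsotonicRegression

# Several samples share the same x value, i.e. X contains ties.
x = np.array([1.0, 1.0, 1.0, 2.0, 2.0, 3.0, 4.0, 4.0])
y = np.array([0.5, 1.5, 1.0, 2.0, 3.0, 2.5, 4.0, 5.0])

fit_then_transform = IsotonicRegression().fit(x, y).transform(x)
fit_transform = IsotonicRegression().fit_transform(x, y)

# On an unaffected build this norm is ~0; the regression described above
# makes it non-zero when X contains ties.
print(np.abs(fit_then_transform - fit_transform).max())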

@mjbommar
Contributor

BTW, I responded to this originally by email, not realizing how old the issue was... should we create a new issue for this regression, @amueller?

@amueller
Member

Yes, please create a new issue, as I think it is unrelated.

@amueller
Member

Damn, a9ea55f was my idea... let's see what it broke...

@amueller
Member

Fixed by #4297.
