IsotonicRegression gives NANs on normal data #2507
Comments
P.S. Found on Python 2.7 (64-bit version) / Win7. |
Unfortunately I cannot unpickle the data on my box with numpy 1.7.1:
Maybe you can save the data with a version independent format, for instance with |
Here is an NPZ file: http://www.sendspace.com/file/hx06eh The code changed to:
|
Indeed, I can reproduce the failures on a 64-bit OS X 10.8 box, so this is neither Windows-specific nor 32-bit-specific. Maybe @agramfort or @NelleV might want to have a look. |
I used 64-bit version of Windows 7, if it's important. |
As mentioned by mail, this is due to the fact that there are zeros in the weights. |
Wouldn't it make more sense to just drop samples with 0 weights prior to fitting? |
Otherwise, if zero weight samples are meaningless we can always throw a |
It would totally make sense, but we are only able to predict the regression for new values between X.min() and X.max(). Hence, in this case, the function would still return NaN where it does right now, because we configure scipy's interp1d to return NaN outside the range of the fit instead of raising an exception. |
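To illustrate the point about `interp1d`, here is a minimal sketch, not the scikit-learn code itself. The arrays `x_fit` and `y_fit` are made-up values, and passing `bounds_error=False` is an assumption about how the interpolator was configured at the time; the comment above only says that it returns NaN instead of raising.

```python
import numpy as np
from scipy.interpolate import interp1d

# Hypothetical fitted thresholds and values (illustration only).
x_fit = np.array([0.0, 1.0, 2.0])
y_fit = np.array([0.0, 0.5, 2.0])

# With bounds_error=False, points outside [x_fit.min(), x_fit.max()]
# get the default fill value, which is NaN, instead of raising ValueError.
f = interp1d(x_fit, y_fit, kind="linear", bounds_error=False)

inside = f(1.5)    # interpolated between the two nearest points
outside = f(5.0)   # outside the fitted range

# One possible user-side workaround: clip queries into the fitted range.
clipped = f(np.clip(5.0, x_fit.min(), x_fit.max()))

print(inside, outside, clipped)
```

If the smallest or largest X values are the ones dropped for having zero weight, queries at those X values fall outside the fitted range, which would match the NaN behaviour described above.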
And warn in such a situation. |
I think it would be really useful to add extrapolation: |
About dropping x values with zero weights: I think this should be done on the library side. As users, we often forget to check our weights for zeros, or we have to write wrapper code around the IsotonicRegression class to do it automatically. It is much better when this works out of the box. |
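The kind of user-side wrapper mentioned above could look roughly like this. It is only a sketch: the helper name `fit_without_zero_weights` is hypothetical and not part of scikit-learn, and behaviour may differ between library versions.

```python
import numpy as np
from sklearn.isotonic import IsotonicRegression


def fit_without_zero_weights(x, y, sample_weight):
    """Hypothetical helper: drop zero-weight samples, then fit."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    w = np.asarray(sample_weight, dtype=float)
    keep = w > 0  # boolean mask of samples that carry weight
    model = IsotonicRegression()
    model.fit(x[keep], y[keep], sample_weight=w[keep])
    return model


rng = np.random.RandomState(0)
n_samples = 60
x = np.linspace(-3, 3, n_samples)
y = x + rng.uniform(size=n_samples)
w = rng.uniform(size=n_samples)
w[5:8] = 0  # interior samples with zero weight

model = fit_without_zero_weights(x, y, w)
pred = model.predict(x)
print(np.isnan(pred).any())
```

Here the zero-weight samples are in the interior, so every x stays within the fitted range. If the dropped samples included the minimum or maximum x, predictions at those points would fall outside the range and would need separate handling.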
Unfortunately the data is not available any more.
import numpy as np
import matplotlib.pyplot as plt
import sklearn.isotonic
regression = sklearn.isotonic.IsotonicRegression()
n_samples = 60
x = np.linspace(-3, 3, n_samples)
y = x + np.random.uniform(size=n_samples)
w = np.random.uniform(size=n_samples)
w[5:8] = 0  # zero out a few weights to try to trigger the NaNs
regression.fit(x, y, sample_weight=w)
plt.plot(x, y)
plt.plot(x, regression.predict(x))
print(regression.predict(x))
I'm not sure if this was the idea. |
Just today had issues with a client on 0.16-git where fit, transform gives Confirmed regression testing between 0.15.2 and HEAD, but won't have time Thanks, On Thu, Jan 29, 2015 at 11:03 AM, Andreas Mueller notifications@github.com
|
I am just looking at the code. I don't see how this could happen.
import numpy as np
import sklearn.isotonic
from sklearn.utils import shuffle
regression = sklearn.isotonic.IsotonicRegression()
n_samples = 60
x = np.linspace(-3, 3, n_samples)
y = x + np.random.uniform(size=n_samples)
w = np.random.uniform(size=n_samples)
#w[5:8] = 0
x, y, w = shuffle(x, y, w)
regression.fit(x, y, sample_weight=w)
fit_then_transform = regression.fit(x, y, sample_weight=w).transform(x)
fit_transform = regression.fit_transform(x, y, sample_weight=w)
print(np.abs(fit_transform - fit_then_transform).max())
|
Just masking out zero-weight points doesn't work with the current implementation of |
Not sure how many of these are related, but here's an example with clean data that shows that fit and fit_transform return different results for data with ties in X. This was generated with HEAD as of about two hours ago. I have another 0.16-git install which shows a site-packages/sklearn date of Dec 4. I compared md5sums and confirmed it's at commit 81613d2. I patched this install with isotonic.py from commits d255866 and a9ea55f; the former gives a norm of 0 in the ties case, and the latter reproduces the non-zero norm and the error case. In other words, I think we can ping @ragv about a9ea55f as the culprit. |
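The example attached to this comment did not survive in the thread. A check of the same kind can be sketched as follows; the data are made up for illustration and are not the values from the original report.

```python
import numpy as np
from sklearn.isotonic import IsotonicRegression

# Made-up data with ties in x (1.0 and 3.0 each appear twice).
x = np.array([0, 1, 1, 2, 3, 3, 4], dtype=float)
y = np.array([1, 3, 2, 4, 6, 5, 7], dtype=float)

# Path 1: fit, then transform the same x.
fit_then_transform = IsotonicRegression().fit(x, y).transform(x)

# Path 2: fit_transform in a single call.
fit_transform = IsotonicRegression().fit_transform(x, y)

# The two paths should agree; a non-zero value here signals the regression.
max_diff = np.abs(fit_transform - fit_then_transform).max()
print(max_diff)
```

On an install at the affected commit the difference was reportedly non-zero. On a release that includes the fix the two paths are expected to agree.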
BTW, I responded to this originally by email, not realizing how old the issue was...should we create a new issue for this regression, @amueller ? |
Yes, please create a new issue, as I think it is unrelated. |
Damn, a9ea55f was my idea... let's see what it broke... |
Fixed by #4297. |
I have a problem with IsotonicRegression: it gives some NANs in case of fitting some data with values close to zero (but greater than sys.float_info.min).
I pickled some breaking data and uploaded them to SendSpace:
http://www.sendspace.com/file/0i18ib
And below is the crashing example.