Vague Error Message for Linear Regression when X is 1D · Issue #4466 · scikit-learn/scikit-learn · GitHub

Closed
ahwillia opened this issue Mar 30, 2015 · 8 comments

Comments

@ahwillia

I was recently trying to do just a very simple linear regression on x vs. y -- I got this error message:

IndexError: tuple index out of range
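For readers unfamiliar with this message, here is a minimal sketch of how Python produces it in general. It is an illustration only, and assumes (without tracing scikit-learn's actual code path) that some code asked a 1D array for a second dimension. The array `x` and the variable `message` are made up for the example.

```python
import numpy as np

# A 1D array has a shape tuple with a single entry, e.g. (3,)
x = np.asarray([1.0, 2.0, 3.0])

try:
    # Asking for the "number of features" of a 1D array indexes past
    # the end of the shape tuple.
    n_features = x.shape[1]
except IndexError as exc:
    message = str(exc)

print(message)
```

The message says nothing about array dimensions, which is why it is hard to connect to the real cause.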

I wasted 3 minutes trying to figure out if my x and y vectors didn't have the same lengths (they did), whether the problem was that they were lists, not numpy arrays (they were). Then I found the culprit, through stackoverflow:

http://stackoverflow.com/questions/27107057/sklearn-linear-regression-python

The trick is to replace:

regr.fit(x,y)

With:

regr.fit(x[:,np.newaxis],y)
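A minimal sketch of what the workaround does to the shape, using NumPy only. The sample values are invented; `np.asarray` is included because `x[:, np.newaxis]` only works on an array, not on a plain list.

```python
import numpy as np

# Convert first: slicing with np.newaxis is not supported on Python lists.
x = np.asarray([1.0, 2.0, 3.0, 4.0])   # shape (4,): one value per sample

# Add a trailing axis so each sample becomes a row with one feature.
X_col = x[:, np.newaxis]               # shape (4, 1)

# An equivalent spelling that some readers find clearer.
X_alt = x.reshape(-1, 1)               # shape (4, 1)

print(x.shape, X_col.shape, X_alt.shape)
```

Either form gives the (n_samples, n_features) layout that `fit` expects for X.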

I know this is only a small inconvenience for me, but I think this needs to be addressed for the sake of usability. This provides an unnecessarily difficult entry barrier to new users.

A related thought/comment: Do the inputs have to be numpy arrays? Why not try to salvage the input if the user passes lists?

@amueller
Member

We do try our best to salvage what the user gave us. I'd agree we failed here. We want consistent treatment of X of 1d shape, and we don't have that yet.

@ahwillia
Author

Someone pointed out to me that fit() should be calling numpy.atleast_2d(), specifically on this line:

array = np.atleast_2d(array)

So this is either a numpy problem (seems unlikely to me) or the check_X_y function is handling the inputs in an unexpected way.
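A small sketch of what `np.atleast_2d` does to a 1D array may help here. It shows NumPy's behavior only, not scikit-learn's validation code, and the sample data is invented.

```python
import numpy as np

x = np.asarray([1.0, 2.0, 3.0, 4.0])   # four samples of one feature
y = np.asarray([2.0, 4.0, 6.0, 8.0])   # four targets

# atleast_2d prepends the new axis, so the result is a single row.
X_row = np.atleast_2d(x)               # shape (1, 4)

# Read as (n_samples, n_features), that is 1 sample with 4 features,
# which no longer lines up with the 4 targets in y.
n_rows = X_row.shape[0]
rows_match_targets = (n_rows == len(y))

print(X_row.shape, rows_match_targets)
```

So NumPy is doing what it documents. The surprise comes from reading the resulting row as one sample instead of as four.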

@GaelVaroquaux
Member
GaelVaroquaux commented Mar 30, 2015 via email

@ahwillia
Author

Why not cast to 2D and print a warning? The x[:,np.newaxis] fix seems cumbersome.

Also, it seems like the intention of check_X_y was to cast the shape to 2D... Has the thinking changed?

@GaelVaroquaux
Member
GaelVaroquaux commented Mar 30, 2015 via email

@GrantRVD

From looking at the code for check_array(), it looks like the intention is, by default, to cast the input. If check_X_y() shouldn't do this, then it should probably call check_array() with the ensure_2d flag as False or perform additional validation after np.atleast_2d() returns, raising a useful error if shape[1] is still Null. I'm not sure why NumPy considers a matrix with a null dimension any more 2D than a 1D list or ndarray, mathematically speaking.
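As a rough sketch of the "raise a useful error" option, the check could look something like the following. `require_2d` is a hypothetical helper written for this illustration; it is not part of scikit-learn, and the wording of the message is an assumption.

```python
import numpy as np

def require_2d(X):
    """Hypothetical helper (not scikit-learn API): reject non-2D input clearly."""
    X = np.asarray(X)  # also accepts plain lists
    if X.ndim != 2:
        raise ValueError(
            "Expected a 2D array of shape (n_samples, n_features), got "
            f"{X.ndim}D input with shape {X.shape}. For a single feature, "
            "reshape with X.reshape(-1, 1)."
        )
    return X

# 1D input now fails with a message that names the fix.
try:
    require_2d([1.0, 2.0, 3.0])
except ValueError as exc:
    error_text = str(exc)

# Correctly shaped input passes through unchanged.
X_ok = require_2d(np.asarray([1.0, 2.0, 3.0]).reshape(-1, 1))

print(error_text)
```

This version refuses to guess the orientation and tells the user how to state it, instead of silently casting.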

@amueller
Member

Actually check_X_y does cast to 2d, as this was the default behavior before. It does so in the same odd way that np.atleast_2d did before, which is X[np.newaxis, :].
@GaelVaroquaux we have been accepting it inconsistently for a long time, and we can't really go back to not accepting it anywhere. We should accept it as number of samples everywhere, which would be the transpose of what is currently happening.
I didn't change that when introducing check_X_y because it was backward-incompatible and I had enough common tests to add / fix.
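The two readings of a 1D X discussed above can be put side by side in a short NumPy sketch. The variable names are illustrative and nothing here calls scikit-learn.

```python
import numpy as np

x = np.asarray([1.0, 2.0, 3.0, 4.0])

# Reading described as the behavior at the time: new axis first.
as_one_sample = x[np.newaxis, :]       # shape (1, 4): 1 sample, 4 features

# Reading proposed in the comment: the length is the number of samples.
as_many_samples = x[:, np.newaxis]     # shape (4, 1): 4 samples, 1 feature

# The two are transposes of each other.
are_transposes = np.array_equal(as_one_sample.T, as_many_samples)

n_samples, n_features = as_many_samples.shape
print(as_one_sample.shape, as_many_samples.shape, are_transposes)
```

Changing the default from the first reading to the second changes the result for any caller that relied on the first, which is the backward-compatibility concern raised in the comment.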

@amueller
Member
amueller commented Sep 9, 2015

Fixed by #5152.

@amueller closed this as completed Sep 9, 2015