Ensure that fitting on 1D input data is consistent across estimators · Issue #4252 · scikit-learn/scikit-learn

Ensure that fitting on 1D input data is consistent across estimators #4252


Closed
ogrisel opened this issue Feb 16, 2015 · 4 comments

@ogrisel
Member
ogrisel commented Feb 16, 2015

As noted by @amueller in issue #3440 "Input validation refactoring", fitting on 1D input is currently not consistent across estimators.

Depending on the model, the following call can be treated either as x being an array of 100 samples with 1 feature (as MinMaxScaler does, for instance) or as 1 sample with 100 features (as most models currently do), even though the latter is almost always counter-intuitive.

import numpy as np

x = np.linspace(0, 1, 100)  # shape (100,): ambiguous 1D input
estimator.fit(x, y)         # any scikit-learn estimator

Most models use check_X_y(X, y) or check_array(X) and leave the ensure_2d=True kwarg at its default value. The current behavior of ensure_2d is to cast 1D arrays to row vectors (1 sample with len(X) features). This was done for backward compatibility reasons.

I think this behavior is counter-intuitive, and we should break backward compatibility for this edge case so that 1D arrays are always treated as multi-sample collections of a single feature rather than the opposite.
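
For illustration (an editorial addition, not part of the original report), the two competing readings of the same 1D array can be expressed explicitly with NumPy reshaping; only the shapes in the comments are assumed:

import numpy as np

x = np.linspace(0, 1, 100)

X_col = x.reshape(-1, 1)   # shape (100, 1): 100 samples, 1 feature
X_row = x.reshape(1, -1)   # shape (1, 100): 1 sample, 100 features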

I am marking this issue as an API discussion. It should be tackled before version 1.0.

@ogrisel ogrisel added this to the 1.0 milestone Feb 16, 2015
@ogrisel ogrisel changed the title from "Ensure that fitting on 1D input data is consistent accross estimators" to "Ensure that fitting on 1D input data is consistent across estimators" on Feb 16, 2015
@amueller
Member

I would think it is consistent now, but indeed there is no test.
I agree that the current behavior is somewhat counter-intuitive, in particular since the first dimension of X is always n_samples.
We should add a test to see how inconsistent it is currently.
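
One rough sketch of such a probe (an editorial addition, not from the thread), assuming the sklearn.utils.all_estimators helper; its import location and the set of estimators it returns vary across scikit-learn versions:

import numpy as np
from sklearn.utils import all_estimators

x = np.linspace(0, 1, 100)        # 1D input whose interpretation is in question
y = (x > 0.5).astype(int)

for name, Est in all_estimators(type_filter="classifier"):
    try:
        est = Est()
    except TypeError:
        continue                   # skip estimators with required constructor args
    try:
        est.fit(x, y)
        print(f"{name}: accepted 1D X")
    except Exception as exc:       # ValueError on recent versions
        print(f"{name}: rejected 1D X ({type(exc).__name__})")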

@amueller
Member

I now claim the opposite of past-me; see #4511 and the discussion on the mailing list.

@amueller
Member
amueller commented Sep 9, 2015

Fixed via #5152.
For the record, all estimators now refuse 1D X.
We don't have common tests for all functions, so functions might still do weird things; see #4512.

Should we close this, as it is fixed for classes? Having common tests for functions is a bit non-trivial.
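
For readers landing here later, a minimal sketch (an editorial addition) of the resulting behavior on a recent scikit-learn; the exact wording of the error message varies by version:

import numpy as np
from sklearn.linear_model import LinearRegression

x = np.linspace(0, 1, 100)
y = 2 * x + 1

try:
    LinearRegression().fit(x, y)              # 1D X is refused
except ValueError as exc:
    print(exc)                                # message suggests reshape(-1, 1) or reshape(1, -1)

LinearRegression().fit(x.reshape(-1, 1), y)   # explicit (n_samples, 1) is accepted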

@ogrisel
Member Author
ogrisel commented Sep 10, 2015

It's fixed for estimators, which was the subject of this issue. Let's close.
