As noted by @amueller in issue #3440 "Input validation refactoring", fitting on 1D input is currently not consistent across estimators.
Depending on the model, the following case can either be treated as an array of 100 samples with 1 feature (as MinMaxScaler does, for instance) or as 1 sample with 100 features, as most models currently do, even though that is almost always counter-intuitive.
import numpy as np

x = np.linspace(0, 1, 100)
estimator.fit(x, y)
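To make the ambiguity concrete, a minimal sketch of the two possible interpretations (the X_col / X_row names are only for illustration):

X_col = x.reshape(-1, 1)    # 100 samples, 1 feature  -> shape (100, 1)
X_row = x.reshape(1, -1)    # 1 sample, 100 features  -> shape (1, 100)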
Most models use check_X_y(X, y) or check_array(X) and leave the ensure_2d=True kwarg at its default value. The current behavior of ensure_2d is to cast a 1d array as a row vector (1 sample with len(X) features). This was done for backward-compatibility reasons.
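For context, a rough sketch of how that validation is typically wired into fit (the estimator class below is hypothetical, not an actual scikit-learn model):

from sklearn.base import BaseEstimator
from sklearn.utils import check_X_y

class MyEstimator(BaseEstimator):
    def fit(self, X, y):
        # ensure_2d is left at its default (True) here, so under the old
        # behavior a 1d X would be promoted to a single row rather than rejected
        X, y = check_X_y(X, y)
        # ... actual fitting logic would go here ...
        return self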
I think this behavior is counter-intuitive and we should break backward compat for this edge case to always treat 1d arrays as multi-sample collections of a single feature rather than the opposite.
I am marking this issue as an API discussion. It should be tackled before version 1.0.
ogrisel changed the title from "Ensure that fitting on 1D input data is consistent accross estimators" to "Ensure that fitting on 1D input data is consistent across estimators" on Feb 16, 2015
I would think it is consistent now, but indeed there is no test.
I agree that the current behavior is somewhat counter-intuitive, in particular since the first axis of X is always n_samples.
We should add a test to see how inconsistent it is currently.
Fixed via #5152.
For the record, this now refuses 1d X in all estimators.
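Concretely (a sketch; the exact error text may differ between versions), fitting on a 1d x now raises a ValueError instead of silently treating it as a single sample, and the data has to be reshaped explicitly:

import numpy as np
from sklearn.linear_model import LinearRegression

x = np.linspace(0, 1, 100)
y = 2 * x + 1

est = LinearRegression()
# est.fit(x, y)               # ValueError: expected 2D array, got 1D array instead
est.fit(x.reshape(-1, 1), y)  # explicit column vector: 100 samples, 1 feature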
We don't have common tests for all functions, so functions might still do weird things, see #4512.
Should we close this, as it is fixed for classes? Having common tests for functions is a bit non-trivial.