Description
As noted by @amueller in issue #3440 "Input validation refactoring", fitting on 1D input is currently not consistent across estimators.
Depending on the model, the following case is treated either as an array of 100 samples with 1 feature (as MinMaxScaler does, for instance) or as 1 sample with 100 features (as most models currently do, even though this is almost always counter-intuitive).
x = np.linspace(0, 1, 100)
estimator.fit(x, y)
Most models use check_X_y(X, y) or check_array(X) and leave the ensure_2d=True kwarg at its default value. The current behavior of ensure_2d is to cast 1D arrays to row vectors (1 sample with len(X) features). This was done for backward-compatibility reasons.
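To make the two interpretations concrete, here is a minimal NumPy-only sketch. It does not call any scikit-learn validation code; np.atleast_2d is used only to illustrate the row-vector convention described above, and the variable names are illustrative.

```python
import numpy as np

x = np.linspace(0, 1, 100)  # 1D input, shape (100,)

# Interpretation A: 100 samples with 1 feature (column vector).
as_samples = x.reshape(-1, 1)

# Interpretation B: 1 sample with 100 features (row vector).
as_single_sample = x.reshape(1, -1)

# np.atleast_2d promotes a 1D array to a row vector, i.e. interpretation B.
promoted = np.atleast_2d(x)

print(as_samples.shape)        # (100, 1)
print(as_single_sample.shape)  # (1, 100)
print(promoted.shape)          # (1, 100)
```

Reshaping explicitly before calling fit removes the ambiguity, whichever default the validation helpers end up using.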
I think this behavior is counter-intuitive, and we should break backward compatibility for this edge case so that 1D arrays are always treated as multi-sample collections of a single feature rather than the opposite.
I mark this issue as an API discussion. It should be tackled before version 1.0.