Ensure that fitting on 1D input data is consistent across estimators

@amueller

As noted by @amueller in issue #3440 "Input validation refactoring", fitting on 1D input is currently not consistent across estimators.

Depending on the model, the following case is treated either as an array of 100 samples with 1 feature (as MinMaxScaler does, for instance) or as 1 sample with 100 features, which is what most models currently do even though it is almost always counter-intuitive.

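import numpy as np
x = np.linspace(0, 1, 100)  # 1D array of shape (100,)
estimator.fit(x, y)  # estimator and y stand for any model and matching target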

Most models use check_X_y(X, y) or check_array(X) and leave the ensure_2d=True kwarg at its default value. The current behavior of ensure_2d is to cast 1D arrays to row vectors (1 sample with len(X) features). This was done for backward compatibility reasons.
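To make the two readings concrete, here is a minimal sketch using only NumPy reshapes. It does not call scikit-learn's validation helpers. `np.atleast_2d` is used only because it produces the row-vector shape described above; the variable names are illustrative.

```python
import numpy as np

x = np.linspace(0, 1, 100)  # 1D array, shape (100,)

# Row-vector reading: 1 sample with 100 features.
as_row = np.atleast_2d(x)

# Column reading: 100 samples with 1 feature.
as_column = x.reshape(-1, 1)

print(as_row.shape)     # (1, 100)
print(as_column.shape)  # (100, 1)
```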

I think this behavior is counter-intuitive and we should break backward compatibility for this edge case, so that 1D arrays are always treated as multi-sample collections of a single feature rather than the opposite.
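A rough sketch of what the proposed convention could look like is given here. The helper name `as_single_feature` is hypothetical and is not part of scikit-learn; it only illustrates the reshape rule being suggested.

```python
import numpy as np


def as_single_feature(X):
    """Hypothetical helper: read 1D input as n_samples rows of one feature."""
    X = np.asarray(X)
    if X.ndim == 1:
        # Proposed convention: shape (n,) becomes (n, 1), never (1, n).
        return X.reshape(-1, 1)
    # Input that is not 1D is passed through unchanged.
    return X


X_1d = as_single_feature(np.linspace(0, 1, 100))
X_2d = as_single_feature(np.zeros((5, 3)))
print(X_1d.shape)  # (100, 1)
print(X_2d.shape)  # (5, 3)
```

With a rule like this applied consistently, every estimator would see the same (100, 1) input for the example above.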

I mark this issue as an API discussion. It should be tackled before version 1.0.
