"normalize" parameter in sklearn.linear_model should be "standardize" #16445
Agree, the current naming is misleading.
Agreed. A pull request deprecating the old name is welcome. See our developer docs for how to do this.
Another solution is to deprecate and remove this parameter in favor of using a StandardScaler in a Pipeline; see #3020 (comment), since the parameter may not be applied consistently at the moment either.
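A minimal sketch of that alternative, assuming scikit-learn is installed: standardization is done explicitly by a `StandardScaler` step in a `Pipeline` instead of by a parameter on the estimator (the data, target, and `alpha` value below are arbitrary illustrations, not from this thread).

```python
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Arbitrary toy data: 20 samples, 3 features, noise-free linear target.
rng = np.random.RandomState(0)
X = rng.rand(20, 3)
y = X @ np.array([1.0, 2.0, 3.0]) + 0.5

# Standardize inside the pipeline, then fit the regularized model.
model = make_pipeline(StandardScaler(), Ridge(alpha=1.0))
model.fit(X, y)
print(model.predict(X[:2]))
```

With this pattern the scaling is explicit and identical across all linear models, which is the consistency argument made in the comment above.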
@rth Not sure it is as straightforward. It might be less efficient (unless a user knows some internals) because there is some in-place operation in the current behaviour, and in the case of sparse input we don't remove the mean, so when making a pipeline one will have to set the scaler accordingly.

I also find the documentation misleading when introducing this parameter; would we consider that a documentation bug?

And a final question: do we have any linear model (regressor or classifier) which would not benefit from standardizing the data? If all models in the general use case would benefit from such preprocessing, and some models already have it, we could introduce it inside the base class to be shared across all of them. That said, there are some cases where we should not scale (as with MNIST and logistic regression), so we should keep the option not to.
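The sparse-input caveat mentioned above can be sketched as follows, assuming scipy and scikit-learn are installed: `StandardScaler` refuses to center a sparse matrix (centering would densify it), so a pipeline user must opt out of centering explicitly with `with_mean=False`.

```python
import scipy.sparse as sp
from sklearn.preprocessing import StandardScaler

# Arbitrary sparse matrix for illustration.
X_sparse = sp.random(10, 4, density=0.3, random_state=0, format="csr")

# Default with_mean=True: centering sparse data is refused.
try:
    StandardScaler().fit(X_sparse)
except (TypeError, ValueError) as exc:
    print("centering refused:", exc)

# Scaling to unit variance only preserves sparsity and is allowed.
X_scaled = StandardScaler(with_mean=False).fit_transform(X_sparse)
print(X_scaled.shape)
```

This is the extra configuration burden the comment refers to: the current parameter silently skips mean removal for sparse input, whereas a pipeline makes the user choose.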
Oh I forgot to ping @agramfort |
I think in most cases making one extra copy of the data will not matter much calculation-time-wise, as it's negligible compared to the optimizer run time. For the rare case where it is a problem (e.g. due to memory constraints), the user can specify the relevant copy option.

For the sparse input, yes, I also find the current defaults annoying as they are unusable with sparse data. Maybe we should change that behaviour.
For me preconditioning should be transparent to the user, as is the case in liblinear or as proposed in that PR if I remember correctly, i.e. it should not change the computed coefficients.

Generally I outlined the motivation for this deprecation in #3020 (comment), the main one being that linear models are currently inconsistent, and I'm not sure that adding this parameter to models that don't have it (linear or otherwise), or ensuring that it is consistently applied in combination with other parameters, is sustainable.
I suggested in the past to rename normalize to standardize.

I am also not worried about the memory copy that a pipeline with StandardScaler would do, as we have a copy in linear models in this case too.

The problem with sparse input, where StandardScaler does not center sparse data, is for me the issue that is not easy to fix.
Describe the issue linked to the documentation

In different sklearn.linear_model classes such as Ridge and RidgeCV, the `normalize` parameter actually means "standardize". This misnomer can cause a lot of unnecessary confusion. What "normalize" means in general is to scale a vector to unit norm, which is clearly not what ridge regression, lasso, or other regularized linear models do.

Suggest a potential alternative/fix

Rename the parameter to `standardize` instead. Please see the discussion here:
https://stackoverflow.com/questions/60216879/what-does-sklearn-linear-model-ridgecv-normalize-parameter-exactly-do/60233425#60233425
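The terminology difference the issue describes can be shown with plain numpy (the matrix below is an arbitrary illustration): normalization scales each *sample* to unit norm, while standardization centers and scales each *feature*, which is what the parameter actually does.

```python
import numpy as np

X = np.array([[1.0, 10.0],
              [2.0, 20.0],
              [3.0, 30.0]])

# Normalization: each row divided by its L2 norm -> every row has norm 1.
X_norm = X / np.linalg.norm(X, axis=1, keepdims=True)

# Standardization: each column centered and scaled -> mean 0, std 1 per feature.
X_std = (X - X.mean(axis=0)) / X.std(axis=0)

print(np.linalg.norm(X_norm, axis=1))          # each value is 1.0
print(X_std.mean(axis=0), X_std.std(axis=0))   # zeros and ones per column
```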