Normalize only applies if fit_intercept=True · Issue #3020 · scikit-learn/scikit-learn

Closed
jnothman opened this issue Mar 30, 2014 · 27 comments
Labels: API, Bug, Moderate (anything that requires some knowledge of conventions and best practices)

Comments

@jnothman (Member)

A number of linear models have a normalize option to scale all features to equal variance.

However, this only has an effect if fit_intercept=True. This interaction does not appear to be documented; the snippet below illustrates it.

So this issue requests that one of the following actions be taken:

  • the fit_intercept=True requirement should be noted in the comment for normalize; or
  • normalize should apply when fit_intercept=False, assuming the data is already centered
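
A minimal snippet illustrating the reported interaction, assuming a scikit-learn version that still accepts the normalize parameter (it has since been deprecated and removed):

import numpy as np
from sklearn.linear_model import Ridge

rng = np.random.RandomState(0)
X = rng.randn(50, 3) * np.array([1.0, 10.0, 100.0])  # features on very different scales
y = X @ np.array([1.0, 0.1, 0.01]) + rng.randn(50)

# With fit_intercept=True, normalize=True rescales the features and changes coef_.
a = Ridge(alpha=1.0, fit_intercept=True, normalize=False).fit(X, y)
b = Ridge(alpha=1.0, fit_intercept=True, normalize=True).fit(X, y)
print(np.allclose(a.coef_, b.coef_))  # expected: False

# With fit_intercept=False, normalize=True is silently ignored.
c = Ridge(alpha=1.0, fit_intercept=False, normalize=False).fit(X, y)
d = Ridge(alpha=1.0, fit_intercept=False, normalize=True).fit(X, y)
print(np.allclose(c.coef_, d.coef_))  # expected: True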
@agramfort (Member)

I agree. It's a bit awkward...

I think normalize=True should also work with fit_intercept=False

I am not sure how to fix this without breaking people's code...

@amueller (Member)

Maybe just document the current behavior?

@agramfort (Member)

One option would be to rename normalize to standardize or scale, deprecate normalize, and then unify all estimators.

@amueller (Member)

standardize actually sounds good :)

@agramfort (Member)

looks like it's going to be a fun deprecation cycle...

@jayflo commented Aug 6, 2015

I think I can handle this one, but has a final decision been made on this?

@amueller (Member) commented Aug 6, 2015

I think renaming normalize to standardize is the way to go (with a deprecation), and then making it apply no matter what the value of fit_intercept is.
If we do this renaming, we need to make sure that afterwards we have consistent behavior in all linear models.

@jayflo commented Aug 6, 2015

Got it. Also, when making a fix that requires a deprecation, do you:

  • submit a pull request containing just the deprecation warning (stating the change to be enacted in two releases)
  • get that PR merged (and afterwards...)
  • make the actual changes on the same branch, which will be merged in the appropriate release

Thanks again.

@amueller (Member) commented Aug 6, 2015

no, deprecation and fix should be in the same PR.

@jayflo commented Aug 11, 2015

Do methods which are public to a module but not to the containing package need to support the old name (normalize) for the next two releases? E.g. the method linear_model.base.sparse_center_data accepts normalize as an argument, but this can be changed to standardize while still supporting the old functionality by preprocessing these arguments in the caller.

Of course this could break any code (not in sklearn) which uses this method directly, but it is also not public to the package.

@jayflo commented Aug 11, 2015

The issue is that this is not just a renaming. Since the behavior of standardize is slightly different, we will need all public classes/methods which accept normalize to also accept standardize. Once normalize is officially deprecated, it can be removed from the code. Until then, we will only be able to remove normalize from internal code by preprocessing the normalize and standardize arguments, i.e. standardize = standardize or (fit_intercept and normalize), in those public classes/methods.
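
A hypothetical sketch of that argument preprocessing (the helper name and the "deprecated" sentinel are illustrative, not actual scikit-learn code):

import warnings

def _resolve_standardize(standardize, normalize, fit_intercept):
    """Map the deprecated normalize flag onto the new standardize flag."""
    if normalize != "deprecated":
        warnings.warn(
            "'normalize' is deprecated; use 'standardize' instead.",
            FutureWarning,
        )
        # Old behaviour: normalization only took effect when an intercept was fit.
        return standardize or (fit_intercept and normalize)
    return standardize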

@amueller (Member)

yeah, it is a bit tricky. So you say it is not worth trying to remove the normalize from internal functions?
Functions should be either public or explicitly made private by a leading underscore. That sparse_center_data is public is not great.

@jayflo commented Aug 13, 2015

So you say it is not worth trying to remove the normalize from internal functions?

It's looking like it is possible (so far) to remove normalize from internal functions, but it would need to remain in public classes/methods alongside standardize for compatibility. Once we deprecate normalize, we can then do the renaming and remove any other logic used for compatibility.

Functions should be either public or explicitly made private by a leading underscore. That sparse_center_data is public is not great.

My question here is whether sparse_center_data is even considered public. It is not public "to the public," i.e. not imported to linear_model/__init__.py, but it is "public" inside the (linear_model) package itself (and is in fact used in linear_model/coordinate_descent.py). My inclination is to remove normalize as much as possible, i.e. from every class/method not available "to the public" (which would include sparse_center_data). What do you think?

@amueller (Member)

I think we should rename sparse_center_data to _sparse_center_data, deprecate the old one, and remove normalize from the new private one.
Not sure it is worth the 💃 though.
It should be considered public if it doesn't start with an underscore.
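
A rough sketch of that renaming-plus-deprecation pattern (names and signature are illustrative only, not the actual scikit-learn code):

import warnings

def _sparse_center_data(X, y, fit_intercept, standardize=False):
    # the actual centering/scaling implementation would live here
    ...

def sparse_center_data(X, y, fit_intercept, normalize=False):
    """Deprecated public alias kept around for two releases."""
    warnings.warn(
        "sparse_center_data is deprecated; it is now a private helper.",
        DeprecationWarning,
    )
    return _sparse_center_data(X, y, fit_intercept, standardize=normalize)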

@amueller amueller modified the milestone: 0.19 Sep 29, 2016
@jnothman jnothman modified the milestones: 0.20, 0.19 Jun 14, 2017
@glemaitre glemaitre modified the milestones: 0.20, 0.21 Jun 13, 2018
@jnothman jnothman added the help wanted and Moderate labels Apr 11, 2019
@jnothman jnothman modified the milestones: 0.21, 0.22 Apr 11, 2019
@jnothman (Member, Author)

This has been hanging around a long time and may deserve a fresh look

@rth (Member) commented Jun 7, 2019

Another solution could be to deprecate normalize in linear models altogether and suggest using a StandardScaler or some other appropriate scaler in a Pipeline. The advantages are that:

  • it makes switching models more natural (a common pre-processing step is reused, so there is no need to think about normalization per estimator)
  • it makes users think about the right scaler for their data
  • it lowers the cognitive load when reading linear model parameters

The issue is that there is a constant-factor difference between scaling by the standard deviation and scaling by the L2 norm, so the two would not be strictly equivalent. Suggesting,

from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import Normalizer, StandardScaler

make_pipeline(
   StandardScaler(with_std=False),
   Normalizer()
)

to get strictly identical results could work, but it is not very elegant...

In terms of API consistency, it's not clear why linear models have a normalize parameter while most work somewhat without it. At the same time, SVMs, say, don't have it, even though they cannot be used without feature scaling.

Also, even within linear models it's not consistent. Out of 32 linear estimators, 22 have the normalize parameter, while these 10 don't:

  • HuberRegressor
  • LogisticRegression
  • LogisticRegressionCV
  • PassiveAggressiveClassifier
  • PassiveAggressiveRegressor
  • Perceptron
  • RANSACRegressor
  • SGDClassifier (ref Add scaling to SGDClassifier #5248)
  • SGDRegressor
  • TheilSenRegressor

and of those that do have it, most have a default of normalize=False, while the following default to normalize=True:

  • OrthogonalMatchingPursuit
  • Lars
  • LassoLars
  • LarsCV
  • LassoLarsCV

@agramfort (Member) commented Jun 10, 2019 via email

@adrinjalali (Member)

Removing from the milestone for now. This needs a fair amount of love to be solved.

@agramfort (Member)

ok so after a long discussion with @maikia and @glemaitre here is our summary.

Initial problem

we want dense and sparse cases to match for linear models (e.g. Lasso) in the case where we standardize the features.

Proposal from @rth

Use a Pipeline(StandardScaler, Lasso).
The problem I saw is that StandardScaler does not behave the same way for sparse and dense data (no centering for sparse), hence Pipeline(StandardScaler, Lasso) would break the contract of giving the same results for sparse and dense data. But there is hope if we can still fit an intercept with Lasso.

Explanation:

y = (x - x_mean).dot(coef) + intercept

is equivalent to:

y = x.dot(coef) + intercept - x_mean.dot(coef)

This means that for Pipeline(StandardScaler, Lasso), if X is sparse we could have
the same lasso.coef_ but the lasso.intercept_ becomes lasso.intercept_ - x_mean.dot(coef). In other words, the models Pipeline(StandardScaler, Lasso) behave the same for sparse and dense data, but lasso.intercept_ will differ. This should be made a unit test.
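
A minimal sketch of such a unit test, under the assumptions above (illustrative only, loose tolerances):

import numpy as np
from scipy import sparse
from sklearn.linear_model import Lasso
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.RandomState(0)
X = rng.randn(100, 5)
y = X @ rng.randn(5) + 2.0 + rng.randn(100)

dense = make_pipeline(StandardScaler(), Lasso(alpha=0.1, tol=1e-10)).fit(X, y)
sparse_pipe = make_pipeline(
    StandardScaler(with_mean=False),  # sparse input cannot be centered
    Lasso(alpha=0.1, tol=1e-10),
).fit(sparse.csr_matrix(X), y)

# Same coefficients and same predictions; only the intercepts should differ,
# by x_mean.dot(coef) as explained above.
print(np.allclose(dense[-1].coef_, sparse_pipe[-1].coef_, atol=1e-5))      # expected: True
print(np.allclose(dense.predict(X),
                  sparse_pipe.predict(sparse.csr_matrix(X)), atol=1e-5))   # expected: True
print(np.isclose(dense[-1].intercept_, sparse_pipe[-1].intercept_))        # expected: False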

Summary

let's remove the normalize parameter from the API, suggest using a pipeline instead, and let's just make sure we can fit intercepts properly without regularizing them (which liblinear still does :( )

sounds like a plan @maikia @glemaitre @rth @jnothman ?

@maikia (Contributor) commented Jun 23, 2020

thanks @agramfort for the explanation. I will work on this

@maikia (Contributor) commented Jun 26, 2020

As this is quite a large PR, after discussing with @glemaitre we decided it is better to subdivide it into smaller PRs.
Here is how I suggest subdividing it:

  A.
    1. ElasticNet
    2. Lasso
    3. LassoCV
    4. ElasticNetCV
    5. MultiTaskElasticNet
    6. MultiTaskLasso
    7. MultiTaskElasticNetCV
    8. MultiTaskLassoCV
  B.
    1. LinearRegression
  C.
    1. BayesianRidge
    2. ARDRegression
  D.
    1. Lars
    2. LassoLars
    3. LarsCV
    4. LassoLarsCV
    5. LassoLarsIC
  E.
    1. OrthogonalMatchingPursuit
    2. OrthogonalMatchingPursuitCV
  F.
    1. Ridge
    2. RidgeClassifier
    3. RidgeCV
    4. RidgeClassifierCV
  G. documentation

Rewrite an example based on #2601 (comment) to show the impact of standardization with StandardScaler on the optimal alpha for Ridge and Lasso, depending on normalization.
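
A rough sketch of that example idea (dataset and values are illustrative only): the alpha selected by cross-validation changes once the features are standardized.

import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import LassoCV
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = make_regression(n_samples=200, n_features=10, noise=10.0, random_state=0)
X = X * np.logspace(0, 3, 10)  # put the features on very different scales

raw = LassoCV(cv=5).fit(X, y)
scaled = make_pipeline(StandardScaler(), LassoCV(cv=5)).fit(X, y)
print(raw.alpha_, scaled[-1].alpha_)  # the selected alphas typically differ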

@rth (Member) commented Aug 17, 2020

The fact that Pipeline currently doesn't support a sample_weight fit param directly (#18159) is somewhat of an issue for suggesting make_pipeline(StandardScaler(), Lasso()) etc., IMO.
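
For reference, a minimal sketch of the current step-addressed routing: it works, but the weights have to be addressed to one named step and are not forwarded to the other steps.

import numpy as np
from sklearn.linear_model import Lasso
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.RandomState(0)
X, y, w = rng.randn(20, 3), rng.randn(20), rng.rand(20)

pipe = make_pipeline(StandardScaler(), Lasso(alpha=0.1))
# sample_weight must be routed via <step>__<param>;
# a plain sample_weight= keyword is not accepted by Pipeline.fit.
pipe.fit(X, y, lasso__sample_weight=w)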

@agramfort (Member) commented Aug 18, 2020 via email

@lorentzenchr (Member)

In #3020 (comment), the last point is G. documentation. What exactly is expected in order to close this issue (and #2601)?

@agramfort (Member) commented Jun 27, 2021 via email
