[MRG] separate penalty factors for GLM regressions by xiaowei1234 · Pull Request #22485 · scikit-learn/scikit-learn · GitHub

[MRG] separate penalty factors for GLM regressions #22485


Open
wants to merge 20 commits into main

Conversation

@xiaowei1234 commented Feb 15, 2022

Task list

  • Finished unit tests and added unit test coverage
  • Checked for PEP8 violations and ran black for code style
  • Double-check functionality using statsmodels

Reference Issues/PRs

#11566

What does this implement/fix? Explain your changes.

  1. Added a check for a full-rank design matrix when alpha = 0.0 (reverted due to a comment from lorentzenchr)
  2. Allows alpha to be an iterable of non-negative values, in addition to a scalar value (see the usage sketch below)
  3. If alpha is an iterable, checks are run on every value of alpha, and the length of alpha must equal the second dimension (number of features) of the design matrix
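
A minimal usage sketch of the proposed behaviour (illustrative only; the data and alpha values are made up, and the per-feature alpha relies on this PR):

# Sketch of the proposed API: one non-negative penalty factor per column of X.
import numpy as np
from sklearn.linear_model import PoissonRegressor

rng = np.random.RandomState(0)
X = rng.uniform(size=(100, 3))
y = rng.poisson(lam=np.exp(X @ np.array([0.5, -0.2, 1.0])))

PoissonRegressor(alpha=1.0).fit(X, y)               # scalar alpha (existing behaviour)
PoissonRegressor(alpha=[1.0, 0.0, 10.0]).fit(X, y)  # per-feature alpha (this PR);
                                                    # 0.0 leaves that coefficient unpenalized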

Proof of correctness in script here:
https://github.com/MPCS-51042/final-project-xiaowei1234/blob/3b52d7ee67f2360aeace81b9479161b05c1c0ecb/proof.py

Link to output of proof script:
https://github.com/MPCS-51042/final-project-xiaowei1234/blob/3b52d7ee67f2360aeace81b9479161b05c1c0ecb/files/proof_output.log

@xiaowei1234 xiaowei1234 changed the title [WIP] separate penalty factors for GLM regressions [MRG] separate penalty factors for GLM regressions Feb 21, 2022
@agramfort (Member) left a comment

As I don't see any actual update to the code logic, I am not sure what you are adding here...

min_val=0.0,
include_boundaries="left",
)
self.alpha = np.asarray(self.alpha, dtype=np.float_).ravel()
Member

Rather than np.float_ I would use X.dtype

Author

I will change it to np.float64. I don't like the idea of using X.dtype, on the off chance that X.dtype is an integer dtype. Even if X is all integers, it would be very wrong for the penalty terms to be integers.

@@ -268,7 +264,29 @@ def fit(self, X, y, sample_weight=None):
y_numeric=True,
multi_output=False,
)

if hasattr(self.alpha, "__iter__") and not isinstance(self.alpha, str):
Member

See how testing for array-likes is done elsewhere; looking at private attributes for this is not clean.

Author

All the checks I saw would first convert the variable to a numpy array and then check its size. The problem here is that if the regression only has one variable, the code would get clunky when checking

np.asarray(self.alpha).size == 1

so instead I used the Iterable class from collections. It does result in an additional import, but it is a core Python class.

Member

Internally we use:

def _is_arraylike_not_scalar(array):
    """Return True if array is array-like and not a scalar"""
    return _is_arraylike(array) and not np.isscalar(array)

(The "not np.isscalar" part handles the string case)

@xiaowei1234 (Author)

As I don't see any actual update to the code logic, I am not sure what you are adding here...

The logic for using an array instead of a scalar already works in the code! All that was needed was to allow the class to take an iterable in addition to a scalar.

@glemaitre (Member)

Could you merge main into your branch so that it only contains the changes related to the GLM? There are plenty of changes in CI files that are not linked to this PR.

@xiaowei1234 (Author)

Could you merge main into your branch so that it only contains the changes related to the GLM? There are plenty of changes in CI files that are not linked to this PR.

done!

@agramfort (Member) commented Mar 4, 2022 via email

@xiaowei1234 (Author)

Setting a different alpha per feature can be done by using a different scaling per column. With an L1 reg, if you multiply a feature by 2, it is like dividing its regularization by 2. I would use this idea to have a unit test.

In the proof-of-correctness script you can see that the penalty is actually an L2 penalty and not an L1 penalty.

The gist below, which you can run on your own machine, shows that Lasso regression produces exactly the results you expect, but the GLM does not:

https://gist.github.com/xiaowei1234/08dbd72695e6d263546c973a9a5a330b

Even if the penalty were an L1 penalty, the results could not be reproduced in the same way, because the GLM is fit by maximum likelihood whereas Lasso regression minimizes a penalized least-squares objective.
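
For reference, the L1 scaling relation mentioned above can be checked with a quick standalone sketch (not part of this PR): with Lasso, scaling all of X by 2 while keeping alpha fixed gives the same fit as halving alpha on the original X, with the coefficients halved.

# Sketch: for an L1 penalty, scaling X by c is equivalent to dividing alpha by c
# (for a squared-L2 penalty, alpha has to scale with c ** 2 instead).
from numpy.testing import assert_allclose
from sklearn.datasets import make_regression
from sklearn.linear_model import Lasso

X, y = make_regression(n_samples=200, n_features=5, noise=1.0, random_state=0)

lasso_scaled = Lasso(alpha=1.0, tol=1e-12, max_iter=100_000).fit(2 * X, y)
lasso_half = Lasso(alpha=0.5, tol=1e-12, max_iter=100_000).fit(X, y)

# The coefficients on the scaled X are exactly half of those on the original X.
assert_allclose(2 * lasso_scaled.coef_, lasso_half.coef_, atol=1e-6)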

@agramfort (Member) commented Mar 4, 2022 via email

@xiaowei1234 (Author)

PoissonRegressor is using L2 regularization (squared L2 norm), so if you scale X by 2 then you need to multiply alpha by 4, not 2.

Well, I did as you asked and the results aren't matching:
https://gist.github.com/xiaowei1234/b2ebaac57769e2e50df98d62234904ae

I managed to get Ridge OLS to match by multiplying alpha by 2 and X by 4, but that doesn't work for the GLM.

@agramfort (Member)
I don't have time to look too deeply, but to me it's surprising that it does not work with Poisson. Maybe @lorentzenchr sees the issue more immediately...

@xiaowei1234 (Author)

I think the best course forward is to take the toy dataset, run it through statsmodels locally, and record the coefficients, then write a unit test using the toy dataset and the coefficients produced by statsmodels.
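
A rough sketch of what that local statsmodels cross-check might look like (assuming statsmodels' penalized GLM via GLM.fit_regularized with an elastic-net penalty and L1_wt=0.0 for a pure L2 penalty; the two libraries scale their penalties differently, so the alpha values below are illustrative only and would need to be translated before a strict comparison):

# Rough sketch only: fit the same Poisson model in both libraries and record
# the coefficients. The statsmodels alpha here is NOT matched to scikit-learn's.
import numpy as np
import statsmodels.api as sm
from sklearn.linear_model import PoissonRegressor

rng = np.random.RandomState(0)
X = rng.uniform(size=(200, 3))
y = rng.poisson(lam=np.exp(X @ np.array([0.3, -0.5, 1.0])))

skl = PoissonRegressor(alpha=1.0).fit(X, y)

sm_fit = sm.GLM(y, sm.add_constant(X), family=sm.families.Poisson()).fit_regularized(
    method="elastic_net", alpha=1.0, L1_wt=0.0
)

print("scikit-learn:", skl.intercept_, skl.coef_)
print("statsmodels: ", sm_fit.params)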

@thomasjpfan (Member) commented Mar 6, 2022

Here is a quick test that shows the relationship between alpha and scaling X:

import numpy as np
from numpy.testing import assert_allclose
from sklearn.linear_model import Ridge
from sklearn.datasets import make_low_rank_matrix
from sklearn.linear_model import PoissonRegressor

n_features = 10
rng = np.random.RandomState(42)
X = make_low_rank_matrix(n_samples=500, n_features=n_features, random_state=rng)
coef = rng.uniform(low=-2, high=2, size=n_features) / np.max(X, axis=0)
y = rng.poisson(lam=np.exp(X @ coef))

# Comparing coefs in ridge
ridge1 = Ridge(alpha=1).fit(X, y)
ridge4 = Ridge(alpha=4).fit(2*X, y)
assert_allclose(2 * ridge4.coef_, ridge1.coef_, atol=1e-4)

# Comparing coefs in poisson
poiss1 = PoissonRegressor(alpha=1).fit(X, y)
poiss4 = PoissonRegressor(alpha=4).fit(2 * X, y)
assert_allclose(2 * poiss4.coef_, poiss1.coef_, atol=1e-4)

Note the factor of 2 when comparing the coefs. (One can derive this relationship by writing out the minimization problem and seeing the relationship between X, alpha, and the coefficients.)
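
For completeness, a sketch of that derivation for the ridge case (the Poisson case is analogous, with the squared error replaced by the negative log-likelihood):

min_w ||y - (2X) w||^2 + 4 * alpha * ||w||^2

Substituting w = v / 2, so that (2X) w = X v, turns this into

min_v ||y - X v||^2 + 4 * alpha * ||v / 2||^2 = min_v ||y - X v||^2 + alpha * ||v||^2

which is exactly the alpha-penalized problem on the unscaled X. The two minimizers therefore satisfy v = 2 * w, i.e. 2 * ridge4.coef_ == ridge1.coef_.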

@xiaowei1234 (Author)

Here is a quick test that shows the relationship between alpha and scaling X:

Thanks! I didn't think of actually deriving it by hand myself. I included a new unit test following this scaling in my latest push.
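
For reference, a sketch of what such a scaling test could look like, assuming the per-feature alpha API proposed in this PR (the test actually added in the push may differ):

# Sketch, assuming this PR's iterable alpha: scaling column 0 of X by 2 while
# multiplying its penalty factor by 4 should halve coef_[0] and leave the other
# coefficients unchanged.
import numpy as np
from numpy.testing import assert_allclose
from sklearn.datasets import make_low_rank_matrix
from sklearn.linear_model import PoissonRegressor

n_features = 10
rng = np.random.RandomState(42)
X = make_low_rank_matrix(n_samples=500, n_features=n_features, random_state=rng)
coef = rng.uniform(low=-2, high=2, size=n_features) / np.max(X, axis=0)
y = rng.poisson(lam=np.exp(X @ coef))

X_scaled = X.copy()
X_scaled[:, 0] *= 2                 # scale only the first feature

alpha = np.ones(n_features)
alpha_scaled = alpha.copy()
alpha_scaled[0] *= 4                # the square of the column scale, as derived above

ref = PoissonRegressor(alpha=alpha).fit(X, y)
scaled = PoissonRegressor(alpha=alpha_scaled).fit(X_scaled, y)

expected = ref.coef_.copy()
expected[0] /= 2
assert_allclose(scaled.coef_, expected, atol=1e-4)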

@lorentzenchr (Member) left a comment

@xiaowei1234 Thanks for working on this.

@agramfort (Member) left a comment

The diff looks good, but we need to see how to deal with inconsistencies with the Ridge classes.

@xiaowei1234 (Author)

OK, cool. None of the GLMs support multiple targets, so it would require a fundamental rewrite to make them consistent.

@lorentzenchr (Member)

but we need to see how to deal with inconsistencies with the Ridge classes

None of the GLMs support multiple targets, so it would require a fundamental rewrite to make them consistent.

This API discussion is better continued (and hopefully worked out) in issue #11566.
