[MRG] separate penalty factors for GLM regressions #22485
Conversation
As I don't see any actual update to the code logic, I am not sure what you add here...
sklearn/linear_model/_glm/glm.py (outdated)
    min_val=0.0,
    include_boundaries="left",
)
self.alpha = np.asarray(self.alpha, dtype=np.float_).ravel()
rather than np.float_ I would use X.dtype
I will change to np.float64. I don't like the idea of using X.dtype on the off chance that X.dtype is an integer dtype. Even if X is all integers, it would be very wrong for the penalty terms to be integers.
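A minimal sketch of that conversion (standalone, not the PR's exact code): casting through np.float64 keeps the penalties floating-point even when the user passes integers.

import numpy as np

alpha = [1, 2, 3]  # e.g. the user passes integer penalty factors
alpha = np.asarray(alpha, dtype=np.float64).ravel()  # -> array([1., 2., 3.])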
sklearn/linear_model/_glm/glm.py (outdated)
@@ -268,7 +264,29 @@ def fit(self, X, y, sample_weight=None):
    y_numeric=True,
    multi_output=False,
)

if hasattr(self.alpha, "__iter__") and not isinstance(self.alpha, str):
see how testing for array-like is done elsewhere. Looking at private attributes for this is not clean
All the checks I saw would first convert the variable to a numpy array and then check the size. The problem here is that if the regression only has one variable, the code would get clunky when checking
np.asarray(self.alpha).size == 1
so instead I used the Iterable class from collections. It does result in an additional import, but it is a core Python class.
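Roughly, the check described above as a hypothetical standalone helper (the modern import path is collections.abc):

from collections.abc import Iterable

def _alpha_is_iterable(alpha):
    # Strings are iterable but are not valid penalty arrays, so exclude them.
    return isinstance(alpha, Iterable) and not isinstance(alpha, str)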
Internally we use:
scikit-learn/sklearn/utils/validation.py, lines 248 to 250 in 742d39c:
def _is_arraylike_not_scalar(array):
    """Return True if array is array-like and not a scalar"""
    return _is_arraylike(array) and not np.isscalar(array)
(The "not np.isscalar" part handles the string case)
The logic for using an array instead of a scalar already works in the code! All that was needed was to allow the class to take an iterable in addition to a scalar.
Could you merge
Co-authored-by: Alexandre Gramfort <alexandre.gramfort@m4x.org>
done!
Setting a different alpha per feature can be done by using a different scaling per column.
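A sketch of that idea for the L2-penalized GLMs, using only the existing scalar alpha (the per_feature_alpha values below are made up for illustration): because the penalty is (alpha / 2) * ||coef||_2^2, dividing column j by sqrt(alpha_j) and fitting with a uniform alpha of 1 penalizes the original coefficient j by alpha_j.

import numpy as np
from sklearn.linear_model import PoissonRegressor

rng = np.random.RandomState(0)
X = rng.uniform(size=(300, 3))
y = rng.poisson(lam=np.exp(X @ np.array([0.5, -0.3, 0.8])))

per_feature_alpha = np.array([0.1, 1.0, 10.0])  # hypothetical per-feature penalties

# Fit on rescaled columns with a uniform penalty ...
model = PoissonRegressor(alpha=1.0).fit(X / np.sqrt(per_feature_alpha), y)
# ... then map the coefficients back to the original feature scale.
coef = model.coef_ / np.sqrt(per_feature_alpha)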
With an L1 reg, if you multiply a feature by 2 it's like dividing its regularization by 2. I would use this idea to have a unit test.
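A minimal sketch of that unit-test idea with Lasso (data and alpha values made up for illustration): doubling every column of X while keeping alpha fixed solves the same problem as halving alpha on the original X, with the coefficients shrunk by the scaling factor.

import numpy as np
from numpy.testing import assert_allclose
from sklearn.linear_model import Lasso

rng = np.random.RandomState(0)
X = rng.normal(size=(200, 5))
y = X @ rng.uniform(low=-2, high=2, size=5) + rng.normal(size=200)

# Fitting 2 * X with alpha = 1 is equivalent to fitting X with alpha = 0.5,
# up to the factor of 2 between the two coefficient vectors.
ref = Lasso(alpha=0.5, tol=1e-10).fit(X, y)
scaled = Lasso(alpha=1.0, tol=1e-10).fit(2 * X, y)
assert_allclose(2 * scaled.coef_, ref.coef_, atol=1e-6)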
In the proof-of-correctness script you can see that the penalty is actually an L2 penalty and not an L1 penalty. The gist below can be run on your own machine; you can see that Lasso regression produces exactly the results you expect, but the GLM does not: https://gist.github.com/xiaowei1234/08dbd72695e6d263546c973a9a5a330b Even if the penalty were an L1 penalty, the results could not be reproduced in the same way, because the GLM maximizes a likelihood whereas Lasso regression minimizes a penalized least-squares objective.
PoissonRegressor uses an L2 penalty (squared L2 norm), so if you scale X by 2 then you need to multiply alpha by 4, not 2.
Well, I did as you asked and the results aren't matching. I managed to get Ridge OLS to match by multiplying alpha by 4 and X by 2, but that doesn't work for the GLM.
I don't have time to look too deeply, but to me it's surprising that it does not work with Poisson. Maybe @lorentzenchr sees the issue more immediately...
I think the best course forward is to run the toy data set through statsmodels locally and record the coefficients. Then write a unit test using the toy dataset and the coefficients produced by statsmodels.
Here is a quick test that shows the relationship between alpha and scaling:

import numpy as np
from numpy.testing import assert_allclose
from sklearn.datasets import make_low_rank_matrix
from sklearn.linear_model import PoissonRegressor, Ridge

n_features = 10
rng = np.random.RandomState(42)
X = make_low_rank_matrix(n_samples=500, n_features=n_features, random_state=rng)
coef = rng.uniform(low=-2, high=2, size=n_features) / np.max(X, axis=0)
y = rng.poisson(lam=np.exp(X @ coef))

# Comparing coefs in ridge
ridge1 = Ridge(alpha=1).fit(X, y)
ridge4 = Ridge(alpha=4).fit(2 * X, y)
assert_allclose(2 * ridge4.coef_, ridge1.coef_, atol=1e-4)

# Comparing coefs in poisson
poiss1 = PoissonRegressor(alpha=1).fit(X, y)
poiss4 = PoissonRegressor(alpha=4).fit(2 * X, y)
assert_allclose(2 * poiss4.coef_, poiss1.coef_, atol=1e-4)

Note the factor of 2 when comparing the coefs. (One can derive this relationship by writing out the minimization problem and seeing the relationship between the coefficients of the scaled and unscaled problems.)
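Writing that out for the ridge case (a sketch; sklearn's Ridge minimizes the squared error plus alpha times the squared L2 norm of the coefficients):

$$
\min_w \;\|y - (2X)\,w\|_2^2 + 4\alpha\,\|w\|_2^2
\;=\; \min_v \;\|y - X v\|_2^2 + \alpha\,\|v\|_2^2
\quad\text{with } v = 2w,
$$

so the problem fitted on 2 * X with penalty 4 * alpha is the original problem with penalty alpha in the variable v = 2w, hence 2 * ridge4.coef_ == ridge1.coef_. The same substitution works for PoissonRegressor because the deviance depends on the coefficients only through the linear predictor X @ w, and the penalty is again a squared L2 norm.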
Thanks! I didn't think of actually deriving it by hand. I included a new unit test according to the scaling laid out in my latest push.
@xiaowei1234 Thanks for working on this.
The diff looks good, but we need to see how to deal with inconsistencies with the Ridge classes.
ok cool. None of the GLMs support multiple targets, so it would require a fundamental rewrite to make it consistent.
This API discussion is better continued (and hopefully worked out) in issue #11566.
Reference Issues/PRs
#11566
What does this implement/fix? Explain your changes.
Proof of correctness in script here:
https://github.com/MPCS-51042/final-project-xiaowei1234/blob/3b52d7ee67f2360aeace81b9479161b05c1c0ecb/proof.py
Link to output of proof script:
https://github.com/MPCS-51042/final-project-xiaowei1234/blob/3b52d7ee67f2360aeace81b9479161b05c1c0ecb/files/proof_output.log