don't overwrite precompute=True in lassocv #14591
Conversation
(could be tested with a mock)
Oh, you'd mock the model attribute and check that it's passed to it correctly? How would that work?
I'd mock the path function or something and check whether precompute gets passed.
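A minimal sketch of that idea, under assumptions about the internals at the time (LassoCV.path being a staticmethod wrapping lasso_path, called by the CV loop with precompute among its keyword arguments); this is not the test that ended up in the PR:

from unittest import mock

import numpy as np
from sklearn.linear_model import LassoCV, lasso_path

rng = np.random.RandomState(0)
X = rng.randn(60, 8)                       # n_samples > n_features
y = X[:, 0] + 0.01 * rng.randn(60)

# Spy on the path function LassoCV delegates to and record what it receives.
with mock.patch.object(LassoCV, "path", side_effect=lasso_path) as spy:
    LassoCV(precompute=True, cv=3).fit(X, y)

# By the time the path is called, precompute=True may already have been
# turned into a Gram matrix by the pre-processing step, but it should at
# least never have been silently replaced by False.
received = [c.kwargs["precompute"] for c in spy.call_args_list]
assert len(received) > 0
assert not any(p is False for p in received)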
cc @agramfort maybe?
-        model.precompute = False
+        precompute = getattr(self, "precompute", None)
+        if isinstance(precompute, str) and precompute == "auto":
+            model.precompute = False
Shouldn't there be an

    elif isinstance(precompute, bool):
        model.precompute = precompute

? Say, in Lasso the default for dense is precompute=False, which means that even if precompute=True is passed to LassoCV, it would still not be used. Or am I missing something?
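A hypothetical, self-contained illustration of the question (the class and helper names are made up; as the reply further down notes, model.precompute is already copied from common_params before the hunk above runs, which is why the extra elif turns out to be unnecessary):

class FinalFitModel:
    precompute = False                      # Lasso's dense default


def set_final_precompute(model, cv_precompute):
    # mimic the earlier common_params step that copies the CV estimator's
    # precompute onto the final refit model...
    model.precompute = cv_precompute
    # ...followed by the hunk above, which only special-cases "auto"
    if isinstance(cv_precompute, str) and cv_precompute == "auto":
        model.precompute = False


model = FinalFitModel()
set_final_precompute(model, True)
assert model.precompute is True             # explicit True survives

set_final_precompute(model, "auto")
assert model.precompute is False            # "auto" falls back to False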
I don't think it's the right fix. The _pre_fit function uses (sklearn/linear_model/base.py, line 546 at 605ae39):

    precompute = (n_samples > n_features)
I would introduce a private function

    def _get_precompute(X, precompute):
        if isinstance(precompute, str) and precompute == "auto":
            n_samples, n_features = X.shape
            return (n_samples > n_features)
        else:
            return precompute

that gets called in both _pre_fit and here, taking self.precompute as input.
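For illustration, a self-contained sketch of that suggestion (hypothetical; it is not what ended up being merged, since the "auto" branch alone turned out to be enough):

import numpy as np


def _get_precompute(X, precompute):
    # resolve the 'auto' heuristic in a single place
    if isinstance(precompute, str) and precompute == "auto":
        n_samples, n_features = X.shape
        return n_samples > n_features
    return precompute


# Both call sites would then share the heuristic, roughly:
#   in _pre_fit:           precompute = _get_precompute(X, precompute)
#   in LinearModelCV.fit:  model.precompute = _get_precompute(X, self.precompute)
X = np.zeros((100, 5))                      # n_samples > n_features
assert _get_precompute(X, "auto") is True   # heuristic applies
assert _get_precompute(X, False) is False   # explicit values pass through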
And yes, I agree we should use the same strategy for the path and the single fit at the end. Otherwise it's dangerous, as seen with the reported issue.

Now, this being said, it's quite uncommon to use a Lasso with such n_samples >> n_features @Meta95, which is why, I guess, the problem was never reported before.
@rth model.precompute was already set above to precompute as part of common_params.

@agramfort In Lasso, we removed the "auto" option because it was not helpful according to #3249. So I assume that for a single fit, setting it to False is a better choice than using the heuristic here. That would mean that if it's set to "auto", the path algorithm might use precompute=True while the last fit doesn't. But I think that makes sense, given that the path algorithm reuses results while the single fit doesn't. That's purely based on the discussion in #3249 though.
OK, you're right. It's the right fix, as "auto" was removed from Lasso. Now I realize that "auto" could/should have been kept with a rule like (n_samples > 3 * n_features), for example (3 or more, based on a benchmark). Anyhow, +1 for MRG as it is. Thanks @amueller for taking a stab at this.
TST Adds test to precompute
        super().fit(X, y)
        assert self.precompute == inner_precompute

    monkeypatch.setattr("sklearn.linear_model.coordinate_descent.Lasso",
You humoured me!
However, this test can pass if LassoCV does not use Lasso. So you need to assert (or believe coverage) that the mock is used.
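A hedged sketch along those lines (the dataset, names, and numbers here are illustrative, not the exact merged test): the mock counts its fit calls, so the test fails if LassoCV ever stops delegating to Lasso.

import numpy as np
import pytest

from sklearn.linear_model import Lasso, LassoCV


@pytest.mark.parametrize("precompute, inner_precompute",
                         [(True, True), ("auto", False), (False, False)])
def test_lassocv_forwards_precompute(monkeypatch, precompute,
                                     inner_precompute):
    rng = np.random.RandomState(0)
    X = rng.randn(100, 10)
    y = X[:, 0] + 0.01 * rng.randn(100)
    calls = 0

    class LassoMock(Lasso):
        def fit(self, X, y):
            super().fit(X, y)
            nonlocal calls
            calls += 1
            assert self.precompute == inner_precompute

    # module path as it was at the time of this PR (later renamed to
    # sklearn.linear_model._coordinate_descent)
    monkeypatch.setattr("sklearn.linear_model.coordinate_descent.Lasso",
                        LassoMock)
    LassoCV(precompute=precompute).fit(X, y)
    assert calls > 0                        # the mock was actually used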
@thomasjpfan did ;)
Surprisingly, monkeypatch doesn't seem to keep track? I could probably make it inherit from mock or something, but this ugly solution seems to work?
What's new?
@jnothman will add tomorrow. I would love to hear back from @agramfort about whether he agrees with the current behavior.
merge?
thx @amueller
Fixes #11014, alternative to #11021.

Unfortunately this can't be tested (in a way I can see) because LassoCV doesn't store the underlying model.