don't overwrite precompute=True in lassocv by amueller · Pull Request #14591 · scikit-learn/scikit-learn


Merged · 6 commits merged into scikit-learn:master on Aug 22, 2019

Conversation

amueller
Member
@amueller amueller commented Aug 7, 2019

Fixes #11014, alternative to #11021

Unfortunately this can't be tested (in a way I can see) because LassoCV doesn't store the underlying model

Member
@jnothman jnothman left a comment

(could be tested with a mock)

@amueller
Member Author
amueller commented Aug 8, 2019

Oh, you'd mock the model attribute and check that the setting is passed to it correctly? How would that work?
I don't think it's necessary here, but I'd love to learn from your mocking skills ;)

@jnothman
Member
jnothman commented Aug 8, 2019 via email

@amueller
Member Author
amueller commented Aug 9, 2019

cc @agramfort maybe?

model.precompute = False
precompute = getattr(self, "precompute", None)
if isinstance(precompute, str) and precompute == "auto":
    model.precompute = False
Member

Shouldn't there be an,

elif isinstance(precompute, bool):
    model.precompute = precompute

?
Say in Lasso the default for dense is precompute=False, which means that even if precompute=True is passed to LassoCV, it would still not be used. Or am I missing something?

Member

I don't think it's the right fix. The _pre_fit function uses:

precompute = (n_samples > n_features)

I would introduce a private function

def _get_precompute(X, precompute):
    if isinstance(precompute, str) and precompute == "auto":
        n_samples, n_features = X.shape
        return n_samples > n_features
    else:
        return precompute

that gets called in both _pre_fit and here, taking self.precompute as input.

Member

Yes, I agree we should use the same strategy for the path and the single fit at the end. Otherwise it's dangerous, as seen with the reported issue.

Now, this being said, it's quite uncommon to use a Lasso with such n_samples >> n_features, @Meta95; that's why I guess the problem was never reported.

Member Author

@rth model.precompute was already set above to precompute as part of common_params.
@agramfort In Lasso, we removed the "auto" option because it was not helpful according to #3249. So I assume that for a single fit, setting it to False is a better choice than using the heuristic here. That would mean that if it's set to "auto", the path algorithm might use precompute=True while the last fit doesn't. But I think that makes sense, given that the path algorithm reuses results while the single fit doesn't.
That's purely based on the discussion in #3249 though.
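In effect the behavior being discussed amounts to the following (standalone sketch; `FinalModel` and `set_final_precompute` are illustrative names, not the scikit-learn source): explicit True/False from the user is forwarded to the final fit, while "auto" is resolved to False rather than to the path heuristic.

```python
class FinalModel:
    """Toy stand-in for the Lasso instance used for the final fit."""
    def __init__(self):
        self.precompute = None

def set_final_precompute(model, user_precompute):
    # Forward the user's explicit True/False; only the "auto"
    # heuristic is downgraded to False for the single final fit.
    if isinstance(user_precompute, str) and user_precompute == "auto":
        model.precompute = False
    else:
        model.precompute = user_precompute
    return model

print(set_final_precompute(FinalModel(), "auto").precompute)  # False
print(set_final_precompute(FinalModel(), True).precompute)    # True
```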

Member

OK, you're right: it's the right fix, as 'auto' was removed from Lasso. Now I realize that 'auto' could/should have been kept with a rule like (n_samples > 3 * n_features), for example (3 or more based on a benchmark). Anyhow, +1 for MRG as it is. Thx @amueller for taking a stab at this.

TST Adds test to precompute
super().fit(X, y)
assert self.precompute == inner_precompute

monkeypatch.setattr("sklearn.linear_model.coordinate_descent.Lasso",
Member

You humoured me!

However this test can pass if LassoCV does not use Lasso. So you need to assert (or believe coverage) that the mock is used.
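The concern can be illustrated with a self-contained toy (names are illustrative, not the scikit-learn test verbatim): a recording subclass stands in for the inner estimator, and an assertion on the recorded calls proves the mock was actually used — a `LassoCV` that never instantiated `Lasso` would otherwise pass vacuously.

```python
# Toy stand-ins for the real estimators (illustrative only).
class Lasso:
    def __init__(self, precompute=False):
        self.precompute = precompute
    def fit(self, X, y):
        return self

class LassoCV:
    def __init__(self, precompute="auto"):
        self.precompute = precompute
    def fit(self, X, y):
        # The final fit forwards the user's precompute setting; the name
        # "Lasso" is looked up at call time, so it can be patched.
        model = Lasso(precompute=self.precompute)
        model.fit(X, y)
        return self

calls = []

class CheckingLasso(Lasso):
    """Mock that records what precompute value it was fitted with."""
    def fit(self, X, y):
        calls.append(self.precompute)
        return super().fit(X, y)

# pytest's monkeypatch.setattr patches the module attribute the same way.
Lasso = CheckingLasso

LassoCV(precompute=True).fit(None, None)
# A non-empty `calls` proves the mock was really used.
print(calls)  # [True]
```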

Member Author

@thomasjpfan did ;)

Member Author

Surprisingly, monkeypatch doesn't seem to keep track? I could probably make it inherit from Mock or something, but this ugly solution seems to work?

Member
@jnothman jnothman left a comment

What's new?

@amueller
Member Author

@jnothman will add tomorrow. I would love to hear back from @agramfort about whether he agrees with the current behavior.

@amueller
Member Author

merge?

@agramfort agramfort merged commit 17786ae into scikit-learn:master Aug 22, 2019
@agramfort
Member

thx @amueller

Successfully merging this pull request may close these issues.

LassoCV always sets precompute to False before fitting the chosen alpha value
5 participants