[MRG + 1] BUG: Uses self.scoring for score function #11192
Conversation
Thanks
sklearn/linear_model/logistic.py
Outdated
if scoring is None:
    return (super(LogisticRegressionCV, self)
            .score(X, y, sample_weight=sample_weight))
We should probably raise a ChangedBehaviorWarning here.
I think that scoring should default to "accuracy" in the init so you don't need this branching.
The other sklearn classes that accept a scoring parameter store it in self.scoring without applying a default. To avoid branching, I added the following to the score function:
scoring = self.scoring or 'accuracy'
This hardcodes accuracy as the default scorer.
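The `or`-fallback pattern described above can be sketched as follows (the class and method names here are illustrative stand-ins, not the actual sklearn code):

```python
class ScoringMixinSketch:
    """Illustrative stand-in for an estimator that stores `scoring` as given."""

    def __init__(self, scoring=None):
        # Stored as given, without a default, like the other sklearn
        # classes mentioned above.
        self.scoring = scoring

    def resolve_scoring(self):
        # `or` falls back to "accuracy" when self.scoring is None,
        # so score() needs no explicit branch for the default case.
        return self.scoring or "accuracy"
```

With no scoring set, `resolve_scoring()` returns "accuracy"; otherwise it returns the stored value unchanged.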
@@ -89,6 +91,30 @@ def test_error():
    assert_raise_message(ValueError, msg, LR(max_iter="test").fit, X, Y1)


def test_logistic_cv_neg_mean_squared_error():
It's a bit awkward that a test by this name does not check that the provided score is maximised. I assume there is an existing test that does. Either merge this test into that one, or make it explicit that this test is about the score method.
RFC: Score emits ChangedBehaviorWarning
I updated this PR with the following:
custom_score = assert_warns(ChangedBehaviorWarning, lr.score, X, pred)

assert_equal(custom_score, mock_scorer.scores[0])
assert_equal(mock_scorer.calls, 1)
Should we test that by default there is no warning? Or is that overkill? Otherwise looks good.
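The reviewer's suggestion can be illustrated with a minimal, self-contained sketch using a toy score function and the stdlib warnings module (the names below are hypothetical, not sklearn's):

```python
import warnings

def score(scoring=None):
    # Toy model of the behavior under review: warn only when a custom
    # scoring was supplied; stay silent on the default path.
    if scoring is not None:
        warnings.warn("the scoring parameter is now used", UserWarning)
    return 0.5

# The default call path should emit no warning at all.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    score()
assert caught == []
```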
I have added test_logistic_cv_score_does_not_warn_by_default to test that there is no warning by default.
I noticed that this library is transitioning to pytest features in its tests, so I updated the original warning check to use pytest:
with pytest.warns(ChangedBehaviorWarning):
    custom_score = lr.score(X, lr.predict(X))
Nice work!
lr.fit(X, Y1)

# Cs[2] has the highest score (0.8) from MockScorer
assert_equal(lr.C_[0], [Cs[2]])
Given the transition to pytest, assert_{equal,true,false} should be avoided in new tests; use bare assert statements instead.
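For illustration, the assertion above could be rewritten with bare asserts, as the reviewer suggests (a hypothetical fragment with stand-in values; pytest rewrites `assert` to produce informative failure messages, so the helper functions add nothing in new tests):

```python
def test_selected_C():
    # Stand-in values mirroring the snippet above.
    Cs = [0.1, 1.0, 10.0]
    C_ = [Cs[2]]  # Cs[2] has the highest score (0.8) from the mock scorer
    # Bare asserts instead of assert_equal(lr.C_[0], [Cs[2]]).
    assert C_ == [Cs[2]]
    assert C_[0] == 10.0

test_selected_C()
```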
sklearn/linear_model/logistic.py
Outdated
if self.scoring is not None:
    warnings.warn("The long-standing behavior to use the "
                  "accuracy score has changed. The scoring "
                  "parameter is now used",
Append ". This warning will disappear in version 0.22" so we remember to remove it!
RFC: Add message to warning
I updated this PR with the following:
|
# reset mock_scorer | ||
mock_scorer.calls = 0 | ||
with pytest.warns(ChangedBehaviorWarning): | ||
custom_score = lr.score(X, lr.predict(X)) |
you're not asserting that it warns, right?
The pytest.warns context manager asserts that the inner block does warn. In this context, a ChangedBehaviorWarning appears when scoring is not None. This assertion is used to test the following lines in the LogisticRegressionCV.score function:
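To make those semantics concrete, here is a stdlib-only sketch of what the context manager asserts (the warning class and score function below are stand-ins, not sklearn's):

```python
import warnings

class ChangedBehaviorWarning(UserWarning):
    """Stand-in for sklearn.exceptions.ChangedBehaviorWarning."""

def score_with_custom_scoring(scoring):
    # Mirrors the discussed branch: warn only when scoring is not None.
    if scoring is not None:
        warnings.warn("The scoring parameter is now used.",
                      ChangedBehaviorWarning)
    return 0.8

# pytest.warns(ChangedBehaviorWarning) would fail the test if the block
# it wraps emitted no such warning; recording warnings shows why it passes.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    score_with_custom_scoring(scoring="neg_mean_squared_error")
assert any(issubclass(w.category, ChangedBehaviorWarning) for w in caught)
```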
thanks, haven't used those much.
looks good, needs an entry in whats_new.
I added an entry to the whats_new docs.
Otherwise LGTM
sklearn/linear_model/logistic.py
Outdated
X : array-like, shape = (n_samples, n_features)
    Test samples.

y : array-like, shape = (n_samples) or (n_samples, n_outputs)
Do we support (n_samples, n_outputs)? It seems that in fit, we only support (n_samples,).
I agree. It should be (n_samples,).
e92ba53 to d690a1c
This PR was updated with the documentation fix.
LGTM, thanks @thomasjpfan
Reference Issues/PRs
Fixes #10998.
What does this implement/fix? Explain your changes.
Adds a score function to LogisticRegressionCV that uses self.scoring.
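The overall fix can be sketched with a toy model (simplified, hypothetical names; the real implementation resolves scorers via sklearn's scorer machinery rather than a plain dict):

```python
def accuracy(y_true, y_pred):
    # Fraction of exactly matching predictions.
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

def neg_mean_squared_error(y_true, y_pred):
    # Negated MSE, so that "greater is better" holds for all scorers.
    return -sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)

SCORERS = {"accuracy": accuracy,
           "neg_mean_squared_error": neg_mean_squared_error}

class ModelSketch:
    """Toy stand-in: before the fix, score() always used accuracy;
    after the fix, it honors self.scoring when set."""

    def __init__(self, scoring=None):
        self.scoring = scoring

    def predict(self, X):
        return [1] * len(X)  # dummy constant predictor

    def score(self, X, y):
        # The fixed behavior: use self.scoring, defaulting to accuracy.
        scorer = SCORERS[self.scoring or "accuracy"]
        return scorer(y, self.predict(X))
```

For example, `ModelSketch().score(X, y)` scores with accuracy, while `ModelSketch("neg_mean_squared_error").score(X, y)` uses the configured scorer, mirroring what the PR makes LogisticRegressionCV.score do.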