8000 [MRG + 1] BUG: Uses self.scoring for score function by thomasjpfan · Pull Request #11192 · scikit-learn/scikit-learn · GitHub
[go: up one dir, main page]

Skip to content

[MRG + 1] BUG: Uses self.scoring for score function #11192

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged

Conversation

thomasjpfan
Copy link
Member

Reference Issues/PRs

Fixes #10998.

What does this implement/fix? Explain your changes.

Adds a score function to LogisticRegressionCV that uses self.scoring.

Copy link
Member
@jnothman jnothman left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks

if scoring is None:
return (super(LogisticRegressionCV, self)
.score(X, y, sample_weight=sample_weight))

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we should probably raise a ChangedBehaviorWarning here.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think that scoring should default to "accuracy" in the init so you don't need this branching.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think that the scoring param in the init should default to "accuracy" so you don't need this branching.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The other sklearn classes that accepts a scoring parameter, stores it into self.scoring without considering a default.

To avoid branching, I added the following to the score function:

scoring = self.scoring or 'accuracy'

This hardcodes accuracy as the default scorer.

@@ -89,6 +91,30 @@ def test_error():
assert_raise_message(ValueError, msg, LR(max_iter="test").fit, X, Y1)


def test_logistic_cv_neg_mean_squared_error():
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's a bit awkward that a test by this name does not check that the provided score is maximised. I assume there is an existing test that does. Either merge this test into that one, or make it explicit that this test is about the score method.

RFC: Score emits ChangedBehaviorWarning
@thomasjpfan
Copy link
Member Author

I updated this PR with the following:

  1. Adds ChangedBehaviorWarning to the score function.
  2. There did not seem to be a test for custom scoring functions. test_logistic_cv_mock_scorer now uses a mock to test the score function and to check that the provided score is maximized.

custom_score = assert_warns(ChangedBehaviorWarning, lr.score, X, pred)

assert_equal(custom_score, mock_scorer.scores[0])
assert_equal(mock_scorer.calls, 1)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we test that by default there is no warning? Or is that overkill? Otherwise looks good.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have added test_logistic_cv_score_does_not_warn_by_default to test that there is no warning by default.

@thomasjpfan
Copy link
Member Author

I notice how this library is transitioning to using pytest features in its tests. I have updated the original warning check to use pytest:

with pytest.warns(ChangedBehaviorWarning):
    custom_score = lr.score(X, lr.predict(X))

Copy link
Member
@jnothman jnothman left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice work!

lr.fit(X, Y1)

# Cs[2] has the highest score (0.8) from MockScorer
assert_equal(lr.C_[0], [Cs[2]])
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

given the transition to pytest, assert_{equal,true,false} should be avoided in new tests. use bare assert

if self.scoring is not None:
warnings.warn("The long-standing behavior to use the "
"accuracy score has changed. The scoring "
"parameter is now used",
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Append ". This warning will disappear in version 0.22" so we remember to remove it!

RFC: Add message to warning
@thomasjpfan
Copy link
Member Author

I updated this PR with the following:

  1. Uses bare asserts.
  2. Adds "This warning will disappear in version 0.22" to warning.

# reset mock_scorer
mock_scorer.calls = 0
with pytest.warns(ChangedBehaviorWarning):
custom_score = lr.score(X, lr.predict(X))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

you're not asserting that it warns, right?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The pytest.warns context manager asserts that the inner block does warn. In this context, a ChangedBehaviorWarning warning appears when scoring is not None. This assertion is used to test the following lines in the LogisticRegressionCV.score function:

https://github.com/thomasjpfan/scikit-learn/blob/21af4b596023d3b3d66aaae3b3df8952a66a9e6a/sklearn/linear_model/logistic.py#L1816-L1821

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thanks, haven't used those much.

@amueller amueller changed the title BUG: Uses self.scoring for score function [MRG + 1] BUG: Uses self.scoring for score function Jun 15, 2018
@amueller
Copy link
Member

looks good, needs an entry in whats_new.

@thomasjpfan
Copy link
Member Author

I added an entry to the whats_new docs.

Copy link
Member
@qinhanmin2014 qinhanmin2014 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Otherwise LGTM

X : array-like, shape = (n_samples, n_features)
Test samples.

y : array-like, shape = (n_samples) or (n_samples, n_outputs)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we support (n_samples, n_outputs)? Seems that in fit, we only support (n_samples,)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree. It should be (n_samples,)

@thomasjpfan thomasjpfan force-pushed the LogisticRegressionCV-scoring branch from e92ba53 to d690a1c Compare June 17, 2018 13:41
@thomasjpfan
Copy link
Member Author

This PR was updated with the documentation fix.

Copy link
Member
@qinhanmin2014 qinhanmin2014 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, thanks @thomasjpfan

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

LogisticRegressionCV.score doesn't respect scoring, inconsistent with GridSearchCV
5 participants
0