[MRG+1] TST Move roc_auc_score from METRIC_UNDEFINED_BINARY to METRIC_UNDEFINED_MULTICLASS by qinhanmin2014 · Pull Request #9786 · scikit-learn/scikit-learn


Merged Sep 27, 2017 · 27 commits

Changes from all commits:
doc/whats_new/v0.20.rst (8 changes: 6 additions & 2 deletions)
@@ -17,6 +17,7 @@ random sampling procedures.

- :class:`decomposition.IncrementalPCA` in Python 2 (bug fix)
- :class:`isotonic.IsotonicRegression` (bug fix)
+- :class:`metrics.roc_auc_score` (bug fix)

Details are listed in the changelog below.

@@ -58,8 +59,6 @@ Classifiers and regressors
:class:`sklearn.naive_bayes.GaussianNB` to give a precise control over
variances calculation. :issue:`9681` by :user:`Dmitry Mottl <Mottl>`.

-
-
Model evaluation and meta-estimators

- A scorer based on :func:`metrics.brier_score_loss` is also available.
@@ -108,6 +107,11 @@ Decomposition, manifold learning and clustering
- Fixed a bug in :func:`datasets.fetch_kddcup99`, where data were not properly
shuffled. :issue:`9731` by `Nicolas Goix`_.

+Metrics
+
+- Fixed a bug due to floating point error in :func:`metrics.roc_auc_score` with
+  non-integer sample weights. :issue:`9786` by :user:`Hanmin Qin <qinhanmin2014>`.
+
API changes summary
-------------------

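A minimal sanity check of the fixed behaviour described in the changelog entry
above, written against the public API (the data is borrowed from the example in
the review discussion further down; this is an illustration, not part of the
diff):

    import numpy as np
    from sklearn.metrics import roc_auc_score

    y_true = [0, 0, 1, 1, 1]
    y_score = [0.1, 0.7, 0.3, 0.4, 0.5]

    # Uniform weights only rescale the cumulative positive/negative counts,
    # and roc_curve normalizes by the totals, so the AUC must be unchanged.
    unweighted = roc_auc_score(y_true, y_score)
    weighted = roc_auc_score(y_true, y_score,
                             sample_weight=np.repeat(0.2, len(y_true)))
    assert np.isclose(unweighted, weighted)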
sklearn/metrics/ranking.py (8 changes: 5 additions & 3 deletions)
@@ -258,7 +258,7 @@ def _binary_roc_auc_score(y_true, y_score, sample_weight=None):

        fpr, tpr, thresholds = roc_curve(y_true, y_score,
                                         sample_weight=sample_weight)
-        return auc(fpr, tpr, reorder=True)
+        return auc(fpr, tpr)

    return _average_binary_score(
        _binary_roc_auc_score, y_true, y_score, average,
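Dropping reorder=True is sound because roc_curve returns fpr in non-decreasing
order (the _binary_clf_curve change below guarantees this even with float
sample weights), so auc reduces to the plain trapezoidal rule. A small sketch
of that equivalence, with made-up scores:

    import numpy as np
    from sklearn.metrics import auc, roc_curve

    y_true = [0, 0, 1, 1]
    y_score = [0.1, 0.4, 0.35, 0.8]

    fpr, tpr, _ = roc_curve(y_true, y_score)
    # fpr is already sorted, so no reordering is needed and auc(fpr, tpr)
    # is simply the trapezoidal area under the curve.
    assert np.isclose(auc(fpr, tpr), np.trapz(tpr, fpr))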
@@ -299,7 +299,7 @@ def _binary_clf_curve(y_true, y_score, pos_label=None, sample_weight=None):
    thresholds : array, shape = [n_thresholds]
        Decreasing score values.
    """
-    check_consistent_length(y_true, y_score)
+    check_consistent_length(y_true, y_score, sample_weight)
    y_true = column_or_1d(y_true)
    y_score = column_or_1d(y_score)
    assert_all_finite(y_true)
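With sample_weight included in the length check, a weight vector of the wrong
length now fails fast instead of silently misaligning the cumulative sums
(check_consistent_length skips arguments that are None, so unweighted calls
are unaffected). A sketch of the resulting behaviour, with made-up inputs:

    from sklearn.metrics import roc_curve

    try:
        # Three samples but only two weights: rejected up front.
        roc_curve([0, 1, 1], [0.2, 0.6, 0.8], sample_weight=[1.0, 2.0])
    except ValueError as exc:
        print(exc)  # reports inconsistent numbers of samples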
@@ -341,7 +341,9 @@ def _binary_clf_curve(y_true, y_score, pos_label=None, sample_weight=None):
    # accumulate the true positives with decreasing threshold
    tps = stable_cumsum(y_true * weight)[threshold_idxs]
    if sample_weight is not None:
-        fps = stable_cumsum(weight)[threshold_idxs] - tps
+        # express fps as a cumsum to ensure fps is increasing even in
+        # the presence of floating point errors
+        fps = stable_cumsum((1 - y_true) * weight)[threshold_idxs]
    else:
        fps = 1 + threshold_idxs - tps
    return fps, tps, y_score[threshold_idxs]
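The heart of the fix, restated as a standalone sketch. Here np.cumsum stands
in for sklearn's stable_cumsum (which computes the same values with an extra
accuracy check), and the labels are the example from the review thread below,
pre-sorted by decreasing score. Subtracting tps from a running total can leave
tiny negative steps, while a cumulative sum of the non-negative terms
(1 - y_true) * weight can never decrease:

    import numpy as np

    # Labels ordered by decreasing y_score, as _binary_clf_curve sorts them.
    y_true = np.array([0, 1, 1, 1, 0])
    weight = np.repeat(0.2, 5)

    tps = np.cumsum(y_true * weight)
    fps_old = np.cumsum(weight) - tps            # old formulation
    fps_new = np.cumsum((1 - y_true) * weight)   # new formulation

    print(np.diff(fps_old))  # contains a step of about -1e-16 on this input
    print(np.diff(fps_new))  # every step is >= 0 by construction
    assert (np.diff(fps_new) >= 0).all()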
sklearn/metrics/tests/test_common.py (12 changes: 6 additions & 6 deletions)
@@ -198,12 +198,6 @@
"samples_recall_score",
"coverage_error",

"roc_auc_score",
"micro_roc_auc",
"weighted_roc_auc",
"macro_roc_auc",
"samples_roc_auc",

"average_precision_score",
"weighted_average_precision_score",
"micro_average_precision_score",
@@ -218,6 +212,12 @@
METRIC_UNDEFINED_MULTICLASS = [
    "brier_score_loss",

+    "roc_auc_score",
+    "micro_roc_auc",
+    "weighted_roc_auc",
+    "macro_roc_auc",
+    "samples_roc_auc",
+
    # with default average='binary', multiclass is prohibited
    "precision_score",
    "recall_score",
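The move matches where the metric is actually undefined: roc_auc_score is
defined for binary targets but, at the time of this PR, rejected multiclass
ones. A brief illustration (the exact error message may differ between
versions):

    from sklearn.metrics import roc_auc_score

    # Binary targets are supported.
    print(roc_auc_score([0, 1, 0, 1], [0.1, 0.8, 0.3, 0.7]))  # 1.0

    # Multiclass targets are rejected.
    try:
        roc_auc_score([0, 1, 2, 2], [0.1, 0.4, 0.8, 0.9])
    except ValueError as exc:
        print(exc)  # e.g. "multiclass format is not supported"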
sklearn/metrics/tests/test_ranking.py (12 changes: 12 additions & 0 deletions)
@@ -371,6 +371,18 @@ def test_roc_curve_drop_intermediate():
[1.0, 0.9, 0.7, 0.6, 0.])


+def test_roc_curve_fpr_tpr_increasing():

Review thread on this line:

Member: I feel like the fact that elements are sorted for one random sample
isn't a very strong assurance. There are edge cases that could be further
tested (such as having repeated thresholds), too, but I'm not sure what
reasonable edge cases for this test are.

Member: Basically the edge cases are when the two definitions of fps are not
equal because of floating point errors:

    tps = stable_cumsum(y_true * weight)[threshold_idxs]
    fps = stable_cumsum(weight)[threshold_idxs] - tps            # old
    fps = stable_cumsum((1 - y_true) * weight)[threshold_idxs]   # new

It is not obvious to me how to simply construct an example that does not work,
but maybe with a little bit of thought there is a way to put a simpler one
together. For full details, the best is to look at the definition of
_binary_clf_curve, especially how the other variables are defined.

@lesteve (Sep 27, 2017): OK, I found a simpler example:

    def test_roc_curve_fpr_tpr_increasing():
        # Ensure that fpr and tpr returned by roc_curve are increasing
        # Construct an edge case with float y_score and sample_weight
        # when some adjacent values of fpr and tpr are the same.
        y_true = [0, 0, 1, 1, 1]
        y_score = [0.1, 0.7, 0.3, 0.4, 0.5]
        sample_weight = np.repeat(0.2, 5)
        fpr, tpr, _ = roc_curve(y_true, y_score,
                                sample_weight=sample_weight)
        assert_equal((np.diff(fpr) < 0).sum(), 0)
        assert_equal((np.diff(tpr) < 0).sum(), 0)

Are you happier with this one @jnothman?

Member: For this example, the unfixed code yields a decreasing step in fpr:

    (Pdb) np.diff(fpr)
    array([  5.00000000e-01,   0.00000000e+00,   2.22044605e-16,
            -3.33066907e-16,   5.00000000e-01])

+    # Ensure that fpr and tpr returned by roc_curve are increasing.
+    # Construct an edge case with float y_score and sample_weight
+    # when some adjacent values of fpr and tpr are actually the same.
+    y_true = [0, 0, 1, 1, 1]
+    y_score = [0.1, 0.7, 0.3, 0.4, 0.5]
+    sample_weight = np.repeat(0.2, 5)
+    fpr, tpr, _ = roc_curve(y_true, y_score, sample_weight=sample_weight)
+    assert_equal((np.diff(fpr) < 0).sum(), 0)
+    assert_equal((np.diff(tpr) < 0).sum(), 0)


def test_auc():
    # Test Area Under Curve (AUC) computation
    x = [0, 1]