clarify doc-string of roc_auc_score, add references · scikit-learn/scikit-learn@a76ec97 · GitHub

Commit a76ec97

clarify doc-string of roc_auc_score, add references
1 parent 0bbe923 commit a76ec97

3 files changed: +55 -31 lines

doc/modules/model_evaluation.rst

+10-6
@@ -1348,8 +1348,8 @@ the one-vs-rest algorithm computes the average of the ROC AUC scores for each
 class against all other classes. In both cases, the predicted labels are
 provided in an array with values from 0 to ``n_classes``, and the scores
 correspond to the probability estimates that a sample belongs to a particular
-class. The OvO and OvR algorithms supports weighting uniformly
-(``average='macro'``) and weighting by the prevalence (``average='weighted'``).
+class. The OvO and OvR algorithms support weighting uniformly
+(``average='macro'``) and by prevalence (``average='weighted'``).
 
 **One-vs-one Algorithm**: Computes the average AUC of all possible pairwise
 combinations of classes. [HT2001]_ defines a multiclass AUC metric weighted
@@ -1380,10 +1380,10 @@ the keyword argument ``multiclass`` to ``'ovo'`` and ``average`` to
 ``'weighted'``. The ``'weighted'`` option returns a prevalence-weighted average
 as described in [FC2009]_.
 
-**One-vs-rest Algorithm**: Computes the AUC of each class against the rest.
-The algorithm is functionally the same as the multilabel case. To enable this
-algorithm set the keyword argument ``multiclass`` to ``'ovr'``. Similar to
-OvO, OvR supports two types of averaging: ``'macro'`` [F2006]_ and
+**One-vs-rest Algorithm**: Computes the AUC of each class against the rest
+[PD2000]_. The algorithm is functionally the same as the multilabel case. To
+enable this algorithm set the keyword argument ``multiclass`` to ``'ovr'``.
+Like OvO, OvR supports two types of averaging: ``'macro'`` [F2006]_ and
 ``'weighted'`` [F2001]_.
 
 In applications where a high false positive rate is not tolerable the parameter
@@ -1422,6 +1422,10 @@ to the given limit.
 <https://www.math.ucdavis.edu/~saito/data/roc/ferri-class-perf-metrics.pdf>`_
 Pattern Recognition Letters. 30. 27-38.
 
+.. [PD2000] Provost, F., Domingos, P. (2000). Well-trained PETs: Improving
+probability estimation trees (Section 6.2), CeDER Working Paper #IS-00-04,
+Stern School of Business, New York University.
+
 .. [F2006] Fawcett, T., 2006. `An introduction to ROC analysis.
 <http://www.sciencedirect.com/science/article/pii/S016786550500303X>`_
 Pattern Recognition Letters, 27(8), pp. 861-874.
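The one-vs-one description above can be checked numerically. The sketch below uses hypothetical toy data and assumes a recent scikit-learn release in which `roc_auc_score` takes the keyword `multi_class` (the spelling used in the `_ranking.py` docstring of this commit). For every pair of classes it keeps only the samples of those two classes, takes the binary AUC of each class's own score column, and averages the two values, following Hand & Till [HT2001]. The pair scores are then averaged uniformly (`'macro'`) or weighted by the fraction of all samples that belong to the pair (`'weighted'`). That pair weight is our reading of scikit-learn's behaviour, not something this commit states.

```python
from itertools import combinations

import numpy as np
from sklearn.metrics import roc_auc_score

# Hypothetical toy data: 3 classes; each row of y_score sums to 1
y_true = np.array([0, 0, 1, 1, 2, 2, 2, 2])
y_score = np.array([
    [0.7, 0.2, 0.1],
    [0.4, 0.3, 0.3],
    [0.2, 0.6, 0.2],
    [0.3, 0.4, 0.3],
    [0.1, 0.2, 0.7],
    [0.2, 0.2, 0.6],
    [0.3, 0.3, 0.4],
    [0.5, 0.1, 0.4],
])

classes = np.unique(y_true)  # column k of y_score belongs to classes[k]
pair_aucs, pair_weights = [], []
for a, b in combinations(range(len(classes)), 2):
    # Keep only the samples whose label is one of the two classes in the pair
    mask = np.isin(y_true, classes[[a, b]])
    # AUC of class a vs. class b using column a, and of b vs. a using column b
    auc_a = roc_auc_score((y_true[mask] == classes[a]).astype(int), y_score[mask, a])
    auc_b = roc_auc_score((y_true[mask] == classes[b]).astype(int), y_score[mask, b])
    pair_aucs.append((auc_a + auc_b) / 2)
    # Assumed pair weight: fraction of all samples that belong to class a or b
    pair_weights.append(mask.mean())

ovo_macro = np.mean(pair_aucs)
ovo_weighted = np.average(pair_aucs, weights=pair_weights)

print(ovo_macro, roc_auc_score(y_true, y_score, multi_class="ovo", average="macro"))
print(ovo_weighted, roc_auc_score(y_true, y_score, multi_class="ovo", average="weighted"))
```

With uniform weights every pair counts equally no matter how many samples it contains, which is why the docstring calls macro-averaged OvO insensitive to class imbalance.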

sklearn/metrics/_ranking.py

+44-24
@@ -248,25 +248,29 @@ def roc_auc_score(y_true, y_score, average="macro", sample_weight=None,
 """Compute Area Under the Receiver Operating Characteristic Curve (ROC AUC)
 from prediction scores.
 
-Note: this implementation is restricted to the binary classification task
-or multilabel classification task in label indicator format.
+Note: this implementation can be used with binary, multiclass and
+multilabel classification, but some restrictions apply (see Parameters).
 
 Read more in the :ref:`User Guide <roc_metrics>`.
 
 Parameters
 ----------
 y_true : array, shape = [n_samples] or [n_samples, n_classes]
-True binary labels or binary label indicators.
-The multiclass case expects shape = [n_samples] and labels
-with values in ``range(n_classes)``.
+True labels or binary label indicators. The binary and multiclass cases
+expect labels with shape = [n_samples], the multilabel case expects
+binary label indicators with shape = [n_samples, n_classes].
 
 y_score : array, shape = [n_samples] or [n_samples, n_classes]
-Target scores, can either be probability estimates of the positive
-class, confidence values, or non-thresholded measure of decisions
-(as returned by "decision_function" on some classifiers). For binary
-y_true, y_score is supposed to be the score of the class with greater
-label. The multiclass case expects shape = [n_samples, n_classes]
-where the scores correspond to probability estimates.
+Target scores. In the binary and multilabel cases, these can be either
+probability estimates or non-thresholded decision values (as returned
+by "decision_function" on some classifiers). In the multiclass case,
+these must be probability estimates which sum to 1. The binary
+case expects shape = [n_samples], and the scores must be the scores of
+the class with the greater label. The multiclass and multilabel
+cases expect shape = [n_samples, n_classes]. In the multiclass case,
+the order of the class scores must correspond to the order of
+``labels``, if provided, or else to the numerical or lexicographical
+order of the labels in ``y_true``.
 
 average : string, [None, 'micro', 'macro' (default), 'samples', 'weighted']
 If ``None``, the scores for each class are returned. Otherwise,
@@ -292,25 +296,31 @@ def roc_auc_score(y_true, y_score, average="macro", sample_weight=None,
 Sample weights.
 
 max_fpr : float > 0 and <= 1, optional
-If not ``None``, the standardized partial AUC [3]_ over the range
+If not ``None``, the standardized partial AUC [2]_ over the range
 [0, max_fpr] is returned. For the multiclass case, ``max_fpr``,
 should be either equal to ``None`` or ``1.0`` as AUC ROC partial
 computation currently is not supported for multiclass.
 
 multi_class : string, 'ovr' or 'ovo', optional(default='raise')
-Determines the type of multiclass configuration to use.
-``multi_class`` must be provided when ``y_true`` is multiclass.
+Multiclass only. Determines the type of configuration to use. The
+default value raises an error, so either ``'ovr'`` or ``'ovo'`` must be
+passed explicitly.
 
 ``'ovr'``:
-Calculate metrics for the multiclass case using the one-vs-rest
-approach.
+Computes the AUC of each class against the rest [3]_ [4]_. This
+treats the multiclass case in the same way as the multilabel case.
+Sensitive to class imbalance even when ``average == 'macro'``,
+because class imbalance affects the composition of each of the
+'rest' groupings.
 ``'ovo'``:
-Calculate metrics for the multiclass case using the one-vs-one
-approach.
+Computes the average AUC of all possible pairwise combinations of
+classes [5]_. Insensitive to class imbalance when
+``average == 'macro'``.
 
 labels : array, shape = [n_classes] or None, optional (default=None)
-List of labels to index ``y_score`` used for multiclass. If ``None``,
-the lexicon order of ``y_true`` is used to index ``y_score``.
+Multiclass only. List of labels that index the classes in ``y_score``.
+If ``None``, the numerical or lexicographical order of the labels in
+``y_true`` is used.
 
 Returns
 -------
@@ -321,12 +331,22 @@ def roc_auc_score(y_true, y_score, average="macro", sample_weight=None,
 .. [1] `Wikipedia entry for the Receiver operating characteristic
 <https://en.wikipedia.org/wiki/Receiver_operating_characteristic>`_
 
-.. [2] Fawcett T. An introduction to ROC analysis[J]. Pattern Recognition
-Letters, 2006, 27(8):861-874.
-
-.. [3] `Analyzing a portion of the ROC curve. McClish, 1989
+.. [2] `Analyzing a portion of the ROC curve. McClish, 1989
 <https://www.ncbi.nlm.nih.gov/pubmed/2668680>`_
 
+.. [3] Provost, F., Domingos, P. (2000). Well-trained PETs: Improving
+probability estimation trees (Section 6.2), CeDER Working Paper
+#IS-00-04, Stern School of Business, New York University.
+
+.. [4] `Fawcett, T. (2006). An introduction to ROC analysis. Pattern
+Recognition Letters, 27(8), 861-874.
+<https://www.sciencedirect.com/science/article/pii/S016786550500303X>`_
+
+.. [5] `Hand, D.J., Till, R.J. (2001). A Simple Generalisation of the Area
+Under the ROC Curve for Multiple Class Classification Problems.
+Machine Learning, 45(2), 171-186.
+<http://link.springer.com/article/10.1023/A:1010920819831>`_
+
 See also
 --------
 average_precision_score : Area under the precision-recall curve
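The revised `y_score` and `labels` descriptions say that, in the multiclass case, score columns follow the order of `labels` if it is given, or else the sorted (numerical or lexicographical) order of the labels in `y_true`. The minimal sketch below uses hypothetical string labels and assumes a recent scikit-learn release. The score columns are laid out in lexicographic order (`bird`, `cat`, `dog`), and the result is the same whether `y_true` holds the strings, their integer codes from `np.unique`, or the strings together with the sorted label list passed as `labels`.

```python
import numpy as np
from sklearn.metrics import roc_auc_score

# Hypothetical string labels; their lexicographic order is bird < cat < dog
y_true = np.array(["cat", "dog", "bird", "dog", "cat", "bird"])

# Hypothetical probability estimates, one column per class in the order
# bird, cat, dog; each row sums to 1 as required for the multiclass case
y_score = np.array([
    [0.1, 0.7, 0.2],
    [0.2, 0.2, 0.6],
    [0.6, 0.3, 0.1],
    [0.3, 0.3, 0.4],
    [0.2, 0.5, 0.3],
    [0.5, 0.1, 0.4],
])

# np.unique sorts the labels and returns each sample's integer code
classes, y_codes = np.unique(y_true, return_inverse=True)

auc_strings = roc_auc_score(y_true, y_score, multi_class="ovr")
auc_codes = roc_auc_score(y_codes, y_score, multi_class="ovr")
auc_labels = roc_auc_score(y_true, y_score, multi_class="ovr", labels=list(classes))

print(classes)  # ['bird' 'cat' 'dog']
print(auc_strings, auc_codes, auc_labels)
```

If the columns of `y_score` were arranged in any other order, none of these calls would notice. The function would silently pair each column with the wrong class.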

sklearn/metrics/tests/test_ranking.py

+1-1
@@ -554,7 +554,7 @@ def test_multiclass_ovr_roc_auc_toydata(y_true, labels):
 result_unweighted)
 
 # Tests the weighted, one-vs-rest multiclass ROC AUC algorithm
-# on the same input (Provost & Domingos, 2001)
+# on the same input (Provost & Domingos, 2000)
 result_weighted = out_0 * 0.25 + out_1 * 0.25 + out_2 * 0.5
 assert_almost_equal(
 roc_auc_score(
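The test above builds the weighted one-vs-rest score as `out_0 * 0.25 + out_1 * 0.25 + out_2 * 0.5`, i.e. each per-class AUC weighted by that class's prevalence. The minimal sketch below applies the same idea to hypothetical data with the same 25/25/50 class split and assumes a recent scikit-learn release. It binarizes the labels as in the multilabel case and computes each class's binary AUC against the rest. It then averages those AUCs uniformly (`'macro'`) or by prevalence (`'weighted'`) and compares both results with `roc_auc_score(..., multi_class="ovr")`.

```python
import numpy as np
from sklearn.metrics import roc_auc_score
from sklearn.preprocessing import label_binarize

# Hypothetical toy data with class prevalences 0.25, 0.25 and 0.5
y_true = np.array([0, 0, 1, 1, 2, 2, 2, 2])
y_score = np.array([
    [0.7, 0.2, 0.1],
    [0.4, 0.3, 0.3],
    [0.2, 0.6, 0.2],
    [0.3, 0.4, 0.3],
    [0.1, 0.2, 0.7],
    [0.2, 0.2, 0.6],
    [0.3, 0.3, 0.4],
    [0.5, 0.1, 0.4],
])

# One indicator column per class, as in the multilabel case
Y = label_binarize(y_true, classes=[0, 1, 2])

# One-vs-rest: binary AUC of class k's score column against all other samples
out_0, out_1, out_2 = (roc_auc_score(Y[:, k], y_score[:, k]) for k in range(3))

result_unweighted = (out_0 + out_1 + out_2) / 3
result_weighted = out_0 * 0.25 + out_1 * 0.25 + out_2 * 0.5  # prevalence weights

print(result_unweighted, roc_auc_score(y_true, y_score, multi_class="ovr", average="macro"))
print(result_weighted, roc_auc_score(y_true, y_score, multi_class="ovr", average="weighted"))
```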
