DOC clarify doc-string of roc_auc_score and add references (#15293) · scikit-learn/scikit-learn@2e90b89
Commit 2e90b89

oulenz authored and glemaitre committed
DOC clarify doc-string of roc_auc_score and add references (#15293)
1 parent cc8d2d2 commit 2e90b89

File tree

  doc/modules/model_evaluation.rst
  sklearn/metrics/_ranking.py
  sklearn/metrics/tests/test_ranking.py

3 files changed: +64 −40

doc/modules/model_evaluation.rst (+10 −6)
@@ -1348,8 +1348,8 @@ the one-vs-rest algorithm computes the average of the ROC AUC scores for each
 class against all other classes. In both cases, the predicted labels are
 provided in an array with values from 0 to ``n_classes``, and the scores
 correspond to the probability estimates that a sample belongs to a particular
-class. The OvO and OvR algorithms supports weighting uniformly
-(``average='macro'``) and weighting by the prevalence (``average='weighted'``).
+class. The OvO and OvR algorithms support weighting uniformly
+(``average='macro'``) and by prevalence (``average='weighted'``).
 
 **One-vs-one Algorithm**: Computes the average AUC of all possible pairwise
 combinations of classes. [HT2001]_ defines a multiclass AUC metric weighted
@@ -1380,10 +1380,10 @@ the keyword argument ``multiclass`` to ``'ovo'`` and ``average`` to
 ``'weighted'``. The ``'weighted'`` option returns a prevalence-weighted average
 as described in [FC2009]_.
 
-**One-vs-rest Algorithm**: Computes the AUC of each class against the rest.
-The algorithm is functionally the same as the multilabel case. To enable this
-algorithm set the keyword argument ``multiclass`` to ``'ovr'``. Similar to
-OvO, OvR supports two types of averaging: ``'macro'`` [F2006]_ and
+**One-vs-rest Algorithm**: Computes the AUC of each class against the rest
+[PD2000]_. The algorithm is functionally the same as the multilabel case. To
+enable this algorithm set the keyword argument ``multiclass`` to ``'ovr'``.
+Like OvO, OvR supports two types of averaging: ``'macro'`` [F2006]_ and
 ``'weighted'`` [F2001]_.
 
 In applications where a high false positive rate is not tolerable the parameter
@@ -1422,6 +1422,10 @@ to the given limit.
        <https://www.math.ucdavis.edu/~saito/data/roc/ferri-class-perf-metrics.pdf>`_
        Pattern Recognition Letters. 30. 27-38.
 
+    .. [PD2000] Provost, F., Domingos, P. (2000). Well-trained PETs: Improving
+       probability estimation trees (Section 6.2), CeDER Working Paper #IS-00-04,
+       Stern School of Business, New York University.
+
     .. [F2006] Fawcett, T., 2006. `An introduction to ROC analysis.
        <http://www.sciencedirect.com/science/article/pii/S016786550500303X>`_
       Pattern Recognition Letters, 27(8), pp. 861-874.
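The averaging options documented above can be exercised directly. Below is a minimal sketch, assuming scikit-learn 0.22+ (where multiclass ROC AUC landed), using the bundled iris dataset and a logistic regression of my own choosing; note that the actual function parameter is spelled ``multi_class``, while the guide text above refers to it as ``multiclass``:

import numpy as np
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score

X, y = load_iris(return_X_y=True)
# Probability estimates: each row sums to 1, as the multiclass case requires
y_proba = LogisticRegression(max_iter=1000).fit(X, y).predict_proba(X)

# One-vs-one with uniform ('macro') vs. prevalence ('weighted') averaging
print(roc_auc_score(y, y_proba, multi_class='ovo', average='macro'))
print(roc_auc_score(y, y_proba, multi_class='ovo', average='weighted'))

# One-vs-rest supports the same two averaging schemes
print(roc_auc_score(y, y_proba, multi_class='ovr', average='macro'))
print(roc_auc_score(y, y_proba, multi_class='ovr', average='weighted'))

Since iris has perfectly balanced classes, the macro and prevalence-weighted averages coincide here; the two options only differ under class imbalance.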

sklearn/metrics/_ranking.py (+53 −33)
@@ -248,27 +248,32 @@ def roc_auc_score(y_true, y_score, average="macro", sample_weight=None,
     """Compute Area Under the Receiver Operating Characteristic Curve (ROC AUC)
     from prediction scores.
 
-    Note: this implementation is restricted to the binary classification task
-    or multilabel classification task in label indicator format.
+    Note: this implementation can be used with binary, multiclass and
+    multilabel classification, but some restrictions apply (see Parameters).
 
     Read more in the :ref:`User Guide <roc_metrics>`.
 
     Parameters
     ----------
-    y_true : array, shape = [n_samples] or [n_samples, n_classes]
-        True binary labels or binary label indicators.
-        The multiclass case expects shape = [n_samples] and labels
-        with values in ``range(n_classes)``.
-
-    y_score : array, shape = [n_samples] or [n_samples, n_classes]
-        Target scores, can either be probability estimates of the positive
-        class, confidence values, or non-thresholded measure of decisions
-        (as returned by "decision_function" on some classifiers). For binary
-        y_true, y_score is supposed to be the score of the class with greater
-        label. The multiclass case expects shape = [n_samples, n_classes]
-        where the scores correspond to probability estimates.
-
-    average : string, [None, 'micro', 'macro' (default), 'samples', 'weighted']
+    y_true : array-like of shape (n_samples,) or (n_samples, n_classes)
+        True labels or binary label indicators. The binary and multiclass cases
+        expect labels with shape (n_samples,) while the multilabel case expects
+        binary label indicators with shape (n_samples, n_classes).
+
+    y_score : array-like of shape (n_samples,) or (n_samples, n_classes)
+        Target scores. In the binary and multilabel cases, these can be either
+        probability estimates or non-thresholded decision values (as returned
+        by `decision_function` on some classifiers). In the multiclass case,
+        these must be probability estimates which sum to 1. The binary
+        case expects a shape (n_samples,), and the scores must be the scores of
+        the class with the greater label. The multiclass and multilabel
+        cases expect a shape (n_samples, n_classes). In the multiclass case,
+        the order of the class scores must correspond to the order of
+        ``labels``, if provided, or else to the numerical or lexicographical
+        order of the labels in ``y_true``.
+
+    average : {'micro', 'macro', 'samples', 'weighted'} or None, \
+            default='macro'
         If ``None``, the scores for each class are returned. Otherwise,
         this determines the type of averaging performed on the data:
         Note: multiclass ROC AUC currently only handles the 'macro' and
@@ -291,26 +296,32 @@ def roc_auc_score(y_true, y_score, average="macro", sample_weight=None,
     sample_weight : array-like of shape (n_samples,), default=None
         Sample weights.
 
-    max_fpr : float > 0 and <= 1, optional
-        If not ``None``, the standardized partial AUC [3]_ over the range
+    max_fpr : float > 0 and <= 1, default=None
+        If not ``None``, the standardized partial AUC [2]_ over the range
         [0, max_fpr] is returned. For the multiclass case, ``max_fpr``,
         should be either equal to ``None`` or ``1.0`` as AUC ROC partial
         computation currently is not supported for multiclass.
 
-    multi_class : string, 'ovr' or 'ovo', optional(default='raise')
-        Determines the type of multiclass configuration to use.
-        ``multi_class`` must be provided when ``y_true`` is multiclass.
+    multi_class : {'raise', 'ovr', 'ovo'}, default='raise'
+        Multiclass only. Determines the type of configuration to use. The
+        default value raises an error, so either ``'ovr'`` or ``'ovo'`` must be
+        passed explicitly.
 
         ``'ovr'``:
-            Calculate metrics for the multiclass case using the one-vs-rest
-            approach.
+            Computes the AUC of each class against the rest [3]_ [4]_. This
+            treats the multiclass case in the same way as the multilabel case.
+            Sensitive to class imbalance even when ``average == 'macro'``,
+            because class imbalance affects the composition of each of the
+            'rest' groupings.
        ``'ovo'``:
-            Calculate metrics for the multiclass case using the one-vs-one
-            approach.
+            Computes the average AUC of all possible pairwise combinations of
+            classes [5]_. Insensitive to class imbalance when
+            ``average == 'macro'``.
 
-    labels : array, shape = [n_classes] or None, optional (default=None)
-        List of labels to index ``y_score`` used for multiclass. If ``None``,
-        the lexicon order of ``y_true`` is used to index ``y_score``.
+    labels : array-like of shape (n_classes,), default=None
+        Multiclass only. List of labels that index the classes in ``y_score``.
+        If ``None``, the numerical or lexicographical order of the labels in
+        ``y_true`` is used.
 
     Returns
     -------
@@ -321,12 +332,22 @@ def roc_auc_score(y_true, y_score, average="macro", sample_weight=None,
     .. [1] `Wikipedia entry for the Receiver operating characteristic
           <https://en.wikipedia.org/wiki/Receiver_operating_characteristic>`_
 
-    .. [2] Fawcett T. An introduction to ROC analysis[J]. Pattern Recognition
-       Letters, 2006, 27(8):861-874.
-
-    .. [3] `Analyzing a portion of the ROC curve. McClish, 1989
+    .. [2] `Analyzing a portion of the ROC curve. McClish, 1989
        <https://www.ncbi.nlm.nih.gov/pubmed/2668680>`_
 
+    .. [3] Provost, F., Domingos, P. (2000). Well-trained PETs: Improving
+       probability estimation trees (Section 6.2), CeDER Working Paper
+       #IS-00-04, Stern School of Business, New York University.
+
+    .. [4] `Fawcett, T. (2006). An introduction to ROC analysis. Pattern
+       Recognition Letters, 27(8), 861-874.
+       <https://www.sciencedirect.com/science/article/pii/S016786550500303X>`_
+
+    .. [5] `Hand, D.J., Till, R.J. (2001). A Simple Generalisation of the Area
+       Under the ROC Curve for Multiple Class Classification Problems.
+       Machine Learning, 45(2), 171-186.
+       <http://link.springer.com/article/10.1023/A:1010920819831>`_
+
     See also
     --------
     average_precision_score : Area under the precision-recall curve
@@ -341,7 +362,6 @@ def roc_auc_score(y_true, y_score, average="macro", sample_weight=None,
     >>> y_scores = np.array([0.1, 0.4, 0.35, 0.8])
     >>> roc_auc_score(y_true, y_scores)
     0.75
-
     """
 
     y_type = type_of_target(y_true)
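To make the reworded shape requirements concrete, here is a small sketch on toy data invented for illustration (not taken from the commit), again assuming scikit-learn 0.22+:

import numpy as np
from sklearn.metrics import roc_auc_score

# Binary case: y_score holds the scores of the class with the greater label
y_true = np.array([0, 0, 1, 1])
y_scores = np.array([0.1, 0.4, 0.35, 0.8])
print(roc_auc_score(y_true, y_scores))               # 0.75, the docstring example
print(roc_auc_score(y_true, y_scores, max_fpr=0.5))  # standardized partial AUC

# Multiclass case: scores must be probabilities summing to 1 per row, with
# columns ordered like ``labels`` (or the sorted labels of y_true if omitted)
y_true_mc = np.array(['cat', 'dog', 'cat', 'bird'])
y_proba = np.array([[0.1, 0.7, 0.2],   # columns: bird, cat, dog
                    [0.2, 0.3, 0.5],
                    [0.1, 0.6, 0.3],
                    [0.7, 0.2, 0.1]])
print(roc_auc_score(y_true_mc, y_proba, multi_class='ovr',
                    labels=['bird', 'cat', 'dog']))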

sklearn/metrics/tests/test_ranking.py (+1 −1)
@@ -554,7 +554,7 @@ def test_multiclass_ovr_roc_auc_toydata(y_true, labels):
                         result_unweighted)
 
     # Tests the weighted, one-vs-rest multiclass ROC AUC algorithm
-    # on the same input (Provost & Domingos, 2001)
+    # on the same input (Provost & Domingos, 2000)
     result_weighted = out_0 * 0.25 + out_1 * 0.25 + out_2 * 0.5
     assert_almost_equal(
         roc_auc_score(
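Beyond the corrected citation year, the expectation this test builds by hand (out_0 * 0.25 + out_1 * 0.25 + out_2 * 0.5) is prevalence weighting. A sketch of the same identity on invented toy data (mine, not the test's fixtures): per-class OvR AUCs weighted by class prevalence should reproduce ``average='weighted'``:

import numpy as np
from sklearn.metrics import roc_auc_score
from sklearn.preprocessing import label_binarize

y_true = np.array([0, 1, 0, 2, 1, 0])          # prevalences: 3/6, 2/6, 1/6
y_proba = np.array([[0.8, 0.1, 0.1],
                    [0.3, 0.6, 0.1],
                    [0.5, 0.2, 0.3],
                    [0.2, 0.2, 0.6],
                    [0.1, 0.8, 0.1],
                    [0.7, 0.2, 0.1]])

# Per-class OvR AUCs computed by hand on the binarized labels
y_bin = label_binarize(y_true, classes=[0, 1, 2])
per_class = [roc_auc_score(y_bin[:, c], y_proba[:, c]) for c in range(3)]
prevalence = np.bincount(y_true) / len(y_true)

# The prevalence-weighted mean matches average='weighted'
assert np.isclose(np.dot(per_class, prevalence),
                  roc_auc_score(y_true, y_proba, multi_class='ovr',
                                average='weighted'))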
