DOC Add average precision definitions and cross references (#9583) · scikit-learn/scikit-learn@59bb32f · GitHub

Commit 59bb32f

agitter authored and jnothman committed
DOC Add average precision definitions and cross references (#9583)
1 parent cceb9b2 commit 59bb32f

File tree

3 files changed: +78 -22 lines changed

doc/modules/model_evaluation.rst

Lines changed: 37 additions & 4 deletions
@@ -634,10 +634,25 @@ The :func:`precision_recall_curve` computes a precision-recall curve
 from the ground truth label and a score given by the classifier
 by varying a decision threshold.
 
-The :func:`average_precision_score` function computes the average precision
-(AP) from prediction scores. This score corresponds to the area under the
-precision-recall curve. The value is between 0 and 1 and higher is better.
-With random predictions, the AP is the fraction of positive samples.
+The :func:`average_precision_score` function computes the
+`average precision <http://en.wikipedia.org/w/index.php?title=Information_retrieval&oldid=793358396#Average_precision>`_
+(AP) from prediction scores. The value is between 0 and 1 and higher is better.
+AP is defined as
+
+.. math::
+    \text{AP} = \sum_n (R_n - R_{n-1}) P_n
+
+where :math:`P_n` and :math:`R_n` are the precision and recall at the
+nth threshold. With random predictions, the AP is the fraction of positive
+samples.
+
+References [Manning2008]_ and [Everingham2010]_ present alternative variants of
+AP that interpolate the precision-recall curve. Currently,
+:func:`average_precision_score` does not implement any interpolated variant.
+References [Davis2006]_ and [Flach2015]_ describe why a linear interpolation of
+points on the precision-recall curve provides an overly-optimistic measure of
+classifier performance. This linear interpolation is used when computing area
+under the curve with the trapezoidal rule in :func:`auc`.
 
 Several functions allow you to analyze the precision, recall and F-measures
 score:
@@ -672,6 +687,24 @@ binary classification and multilabel indicator format.
     for an example of :func:`precision_recall_curve` usage to evaluate
     classifier output quality.
 
+
+.. topic:: References:
+
+  .. [Manning2008] C.D. Manning, P. Raghavan, H. Schütze, `Introduction to Information Retrieval
+     <http://nlp.stanford.edu/IR-book/html/htmledition/evaluation-of-ranked-retrieval-results-1.html>`_,
+     2008.
+  .. [Everingham2010] M. Everingham, L. Van Gool, C.K.I. Williams, J. Winn, A. Zisserman,
+     `The Pascal Visual Object Classes (VOC) Challenge
+     <http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.157.5766&rep=rep1&type=pdf>`_,
+     IJCV 2010.
+  .. [Davis2006] J. Davis, M. Goadrich, `The Relationship Between Precision-Recall and ROC Curves
+     <http://www.machinelearning.org/proceedings/icml2006/030_The_Relationship_Bet.pdf>`_,
+     ICML 2006.
+  .. [Flach2015] P.A. Flach, M. Kull, `Precision-Recall-Gain Curves: PR Analysis Done Right
+     <http://papers.nips.cc/paper/5867-precision-recall-gain-curves-pr-analysis-done-right.pdf>`_,
+     NIPS 2015.
+
+
 Binary classification
 ^^^^^^^^^^^^^^^^^^^^^
 
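The non-interpolated AP added to the user guide above can be checked numerically against the trapezoidal area that :func:`auc` computes over the same operating points. A minimal sketch, assuming scikit-learn and NumPy are installed and using made-up labels and scores:

```python
import numpy as np
from sklearn.metrics import auc, average_precision_score, precision_recall_curve

# Illustrative labels and classifier scores (made up for this sketch).
y_true = np.array([0, 0, 1, 1])
y_scores = np.array([0.1, 0.4, 0.35, 0.8])

precision, recall, _ = precision_recall_curve(y_true, y_scores)

# Non-interpolated AP: sum_n (R_n - R_{n-1}) * P_n.
# precision_recall_curve returns recall in decreasing order, hence the sign flip.
ap_by_hand = -np.sum(np.diff(recall) * precision[:-1])

print("AP (average_precision_score):", average_precision_score(y_true, y_scores))
print("AP (step-wise sum)          :", ap_by_hand)

# Trapezoidal area under the same points uses linear interpolation and can be
# larger (overly optimistic) than the non-interpolated AP.
print("trapezoidal area (auc)      :", auc(recall, precision))
```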

examples/model_selection/plot_precision_recall.py

Lines changed: 10 additions & 5 deletions
@@ -61,16 +61,21 @@
 in the threshold considerably reduces precision, with only a minor gain in
 recall.
 
-**Average precision** summarizes such a plot as the weighted mean of precisions
-achieved at each threshold, with the increase in recall from the previous
-threshold used as the weight:
+**Average precision** (AP) summarizes such a plot as the weighted mean of
+precisions achieved at each threshold, with the increase in recall from the
+previous threshold used as the weight:
 
 :math:`\\text{AP} = \\sum_n (R_n - R_{n-1}) P_n`
 
 where :math:`P_n` and :math:`R_n` are the precision and recall at the
 nth threshold. A pair :math:`(R_k, P_k)` is referred to as an
 *operating point*.
 
+AP and the trapezoidal area under the operating points
+(:func:`sklearn.metrics.auc`) are common ways to summarize a precision-recall
+curve that lead to different results. Read more in the
+:ref:`User Guide <precision_recall_f_measure_metrics>`.
+
 Precision-recall curves are typically used in binary classification to study
 the output of a classifier. In order to extend the precision-recall curve and
 average precision to multi-class or multi-label classification, it is necessary
@@ -144,7 +149,7 @@
 plt.ylabel('Precision')
 plt.ylim([0.0, 1.05])
 plt.xlim([0.0, 1.0])
-plt.title('2-class Precision-Recall curve: AUC={0:0.2f}'.format(
+plt.title('2-class Precision-Recall curve: AP={0:0.2f}'.format(
     average_precision))
 
 ###############################################################################
@@ -215,7 +220,7 @@
 plt.ylim([0.0, 1.05])
 plt.xlim([0.0, 1.0])
 plt.title(
-    'Average precision score, micro-averaged over all classes: AUC={0:0.2f}'
+    'Average precision score, micro-averaged over all classes: AP={0:0.2f}'
     .format(average_precision["micro"]))
 
 ###############################################################################
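The micro-averaged figure reported in the retitled plot above comes from `average_precision_score` with `average="micro"` on a label indicator matrix. A minimal sketch with hypothetical multi-label data (the example itself builds `Y_test` and `y_score` from a real dataset; these values are placeholders only):

```python
import numpy as np
from sklearn.metrics import average_precision_score

# Hypothetical binary indicator matrix and per-class decision scores.
Y_test = np.array([[1, 0, 0],
                   [0, 1, 1],
                   [0, 0, 1],
                   [1, 1, 0]])
y_score = np.array([[0.8, 0.2, 0.1],
                    [0.2, 0.6, 0.7],
                    [0.3, 0.3, 0.9],
                    [0.6, 0.5, 0.2]])

ap_micro = average_precision_score(Y_test, y_score, average="micro")
print('Average precision score, micro-averaged over all classes: '
      'AP={0:0.2f}'.format(ap_micro))
```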

sklearn/metrics/ranking.py

Lines changed: 31 additions & 13 deletions
@@ -40,7 +40,9 @@ def auc(x, y, reorder=False):
     """Compute Area Under the Curve (AUC) using the trapezoidal rule
 
     This is a general function, given points on a curve. For computing the
-    area under the ROC-curve, see :func:`roc_auc_score`.
+    area under the ROC-curve, see :func:`roc_auc_score`. For an alternative
+    way to summarize a precision-recall curve, see
+    :func:`average_precision_score`.
 
     Parameters
     ----------
@@ -68,7 +70,8 @@ def auc(x, y, reorder=False):
 
     See also
     --------
-    roc_auc_score : Computes the area under the ROC curve
+    roc_auc_score : Compute the area under the ROC curve
+    average_precision_score : Compute average precision from prediction scores
     precision_recall_curve :
         Compute precision-recall pairs for different probability thresholds
     """
@@ -108,6 +111,19 @@ def average_precision_score(y_true, y_score, average="macro",
                              sample_weight=None):
     """Compute average precision (AP) from prediction scores
 
+    AP summarizes a precision-recall curve as the weighted mean of precisions
+    achieved at each threshold, with the increase in recall from the previous
+    threshold used as the weight:
+
+    .. math::
+        \\text{AP} = \\sum_n (R_n - R_{n-1}) P_n
+
+    where :math:`P_n` and :math:`R_n` are the precision and recall at the nth
+    threshold [1]_. This implementation is not interpolated and is different
+    from computing the area under the precision-recall curve with the
+    trapezoidal rule, which uses linear interpolation and can be too
+    optimistic.
+
     Note: this implementation is restricted to the binary classification task
     or multilabel classification task.
 
@@ -149,17 +165,12 @@ def average_precision_score(y_true, y_score, average="macro",
     References
     ----------
     .. [1] `Wikipedia entry for the Average precision
-           <http://en.wikipedia.org/wiki/Average_precision>`_
-    .. [2] `Stanford Information Retrieval book
-           <http://nlp.stanford.edu/IR-book/html/htmledition/
-           evaluation-of-ranked-retrieval-results-1.html>`_
-    .. [3] `The PASCAL Visual Object Classes (VOC) Challenge
-           <http://citeseerx.ist.psu.edu/viewdoc/
-           download?doi=10.1.1.157.5766&rep=rep1&type=pdf>`_
+           <http://en.wikipedia.org/w/index.php?title=Information_retrieval&
+           oldid=793358396#Average_precision>`_
 
     See also
     --------
-    roc_auc_score : Area under the ROC curve
+    roc_auc_score : Compute the area under the ROC curve
 
     precision_recall_curve :
         Compute precision-recall pairs for different probability thresholds
@@ -190,7 +201,8 @@ def _binary_uninterpolated_average_precision(
 
 
 def roc_auc_score(y_true, y_score, average="macro", sample_weight=None):
-    """Compute Area Under the Curve (AUC) from prediction scores
+    """Compute Area Under the Receiver Operating Characteristic Curve (ROC AUC)
+    from prediction scores.
 
     Note: this implementation is restricted to the binary classification task
     or multilabel classification task in label indicator format.
@@ -239,7 +251,7 @@ def roc_auc_score(y_true, y_score, average="macro", sample_weight=None):
     --------
     average_precision_score : Area under the precision-recall curve
 
-    roc_curve : Compute Receiver operating characteristic (ROC)
+    roc_curve : Compute Receiver operating characteristic (ROC) curve
 
     Examples
     --------
@@ -396,6 +408,12 @@ def precision_recall_curve(y_true, probas_pred, pos_label=None,
         Increasing thresholds on the decision function used to compute
         precision and recall.
 
+    See also
+    --------
+    average_precision_score : Compute average precision from prediction scores
+
+    roc_curve : Compute Receiver operating characteristic (ROC) curve
+
     Examples
     --------
     >>> import numpy as np
@@ -477,7 +495,7 @@ def roc_curve(y_true, y_score, pos_label=None, sample_weight=None,
 
     See also
     --------
-    roc_auc_score : Compute Area Under the Curve (AUC) from prediction scores
+    roc_auc_score : Compute the area under the ROC curve
 
     Notes
     -----
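The ``See also`` entries added above cross-link the ranking metrics to one another. A minimal sketch, with made-up data, showing the cross-referenced functions applied to the same scores:

```python
import numpy as np
from sklearn.metrics import (average_precision_score, precision_recall_curve,
                             roc_auc_score, roc_curve)

# Made-up labels and scores, only to exercise the cross-referenced functions.
y_true = np.array([0, 0, 1, 1])
y_scores = np.array([0.1, 0.4, 0.35, 0.8])

# Curves: operating points for the PR and ROC curves.
precision, recall, pr_thresholds = precision_recall_curve(y_true, y_scores)
fpr, tpr, roc_thresholds = roc_curve(y_true, y_scores)

# Scalar summaries: AP for the precision-recall curve, ROC AUC for the ROC curve.
print("AP     :", average_precision_score(y_true, y_scores))
print("ROC AUC:", roc_auc_score(y_true, y_scores))
```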
