Merge pull request #2804 from arjoly/lrap · scikit-learn/scikit-learn@bd1686b · GitHub

Commit bd1686b

Merge pull request #2804 from arjoly/lrap
[MRG+1] Label ranking average precision
2 parents 46613e2 + b0995f9 commit bd1686b

File tree

7 files changed, +430 -2 lines changed

doc/modules/classes.rst

Lines changed: 12 additions & 0 deletions
@@ -770,6 +770,18 @@ details.
    metrics.mean_squared_error
    metrics.r2_score
 
+Multilabel ranking metrics
+--------------------------
+See the :ref:`multilabel_ranking_metrics` section of the user guide for further
+details.
+
+.. autosummary::
+   :toctree: generated/
+   :template: function.rst
+
+   metrics.label_ranking_average_precision_score
+
+
 Clustering metrics
 ------------------

doc/modules/model_evaluation.rst

Lines changed: 52 additions & 0 deletions
@@ -900,6 +900,58 @@ In the multilabel case with binary label indicators: ::
     elimination with cross-validation.
 
 
+.. _multilabel_ranking_metrics:
+
+Multilabel ranking metrics
+--------------------------
+
+.. currentmodule:: sklearn.metrics
+
+In multilabel learning, each sample can have any number of ground truth labels
+associated with it. The goal is to give high scores and better rank to
+the ground truth labels.
+
+Label ranking average precision
+...............................
+The :func:`label_ranking_average_precision_score` function
+implements the label ranking average precision (LRAP). This metric is linked to
+the :func:`average_precision_score` function, but is based on the notion of
+label ranking instead of precision and recall.
+
+Label ranking average precision (LRAP) is the average over each ground truth
+label assigned to each sample, of the ratio of true vs. total labels with lower
+score. This metric will yield a better score if you are able to give a better
+rank to the labels associated with each sample. The obtained score is always
+strictly greater than 0, and the best value is 1. If there is exactly one
+relevant label per sample, label ranking average precision is equivalent to the
+`mean reciprocal rank <http://en.wikipedia.org/wiki/Mean_reciprocal_rank>`_.
+
+Formally, given a binary indicator matrix of the ground truth labels
+:math:`y \in \mathcal{R}^{n_\text{samples} \times n_\text{labels}}` and the
+score associated with each label
+:math:`\hat{f} \in \mathcal{R}^{n_\text{samples} \times n_\text{labels}}`,
+the average precision is defined as
+
+.. math::
+  LRAP(y, \hat{f}) = \frac{1}{n_{\text{samples}}}
+    \sum_{i=0}^{n_{\text{samples}} - 1} \frac{1}{|y_i|}
+    \sum_{j:y_{ij} = 1} \frac{|\mathcal{L}_{ij}|}{\text{rank}_{ij}}
+
+
+with :math:`\mathcal{L}_{ij} = \left\{k: y_{ik} = 1, \hat{f}_{ik} \geq \hat{f}_{ij} \right\}`,
+:math:`\text{rank}_{ij} = \left|\left\{k: \hat{f}_{ik} \geq \hat{f}_{ij} \right\}\right|`
+and :math:`|\cdot|` is the :math:`\ell_0` norm or the cardinality of the set.
+
+Here is a small example of usage of this function::
+
+  >>> import numpy as np
+  >>> from sklearn.metrics import label_ranking_average_precision_score
+  >>> y_true = np.array([[1, 0, 0], [0, 0, 1]])
+  >>> y_score = np.array([[0.75, 0.5, 1], [1, 0.2, 0.1]])
+  >>> label_ranking_average_precision_score(y_true, y_score) # doctest: +ELLIPSIS
+  0.416...
+
+
 .. _regression_metrics:
 
 Regression metrics
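
As a sanity check on the definition above, the formula can be transcribed
directly in NumPy. This is only an illustrative sketch (the helper name
lrap_by_hand is made up here and is not part of scikit-learn); it reproduces
the 0.416... doctest value by computing |L_ij| and rank_ij with explicit
comparisons instead of the rankdata-based implementation:

    import numpy as np

    def lrap_by_hand(y_true, y_score):
        """Direct transcription of the LRAP formula above (didactic only)."""
        n_samples, n_labels = y_true.shape
        total = 0.0
        for i in range(n_samples):
            relevant = np.flatnonzero(y_true[i])
            if relevant.size == 0 or relevant.size == n_labels:
                # Degenerate case: the ranking is meaningless, count the sample as 1.
                total += 1.0
                continue
            sample_score = 0.0
            for j in relevant:
                # rank_ij: number of labels scored at least as high as label j
                rank_ij = np.sum(y_score[i] >= y_score[i, j])
                # |L_ij|: the same count, restricted to the relevant labels
                L_ij = np.sum(y_score[i, relevant] >= y_score[i, j])
                sample_score += L_ij / rank_ij
            total += sample_score / relevant.size
        return total / n_samples

    y_true = np.array([[1, 0, 0], [0, 0, 1]])
    y_score = np.array([[0.75, 0.5, 1], [1, 0.2, 0.1]])
    print(lrap_by_hand(y_true, y_score))  # 0.4166..., matching the doctest above

For the first sample, the only relevant label (score 0.75) is outranked by one
irrelevant label (score 1), giving 1/2; for the second sample, the relevant
label is ranked last, giving 1/3; the mean of the two is 0.4166...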

sklearn/metrics/__init__.py

Lines changed: 2 additions & 0 deletions
@@ -15,6 +15,7 @@
     hamming_loss,
     hinge_loss,
     jaccard_similarity_score,
+    label_ranking_average_precision_score,
     log_loss,
     matthews_corrcoef,
     mean_squared_error,
@@ -72,6 +73,7 @@
     'homogeneity_completeness_v_measure',
     'homogeneity_score',
     'jaccard_similarity_score',
+    'label_ranking_average_precision_score',
     'log_loss',
     'matthews_corrcoef',
     'mean_squared_error',
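
These two hunks only re-export the new metric at the package level. A minimal
usage sketch of the resulting public import path, using a made-up perfectly
ranked example to illustrate the best attainable score of 1:

    import numpy as np
    from sklearn.metrics import label_ranking_average_precision_score

    # Every relevant label outscores every irrelevant one in each row,
    # so the metric reaches its best value of 1.0.
    y_true = np.array([[1, 0, 0], [0, 1, 1]])
    y_score = np.array([[0.9, 0.2, 0.1], [0.1, 0.8, 0.7]])
    print(label_ranking_average_precision_score(y_true, y_score))  # 1.0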

sklearn/metrics/metrics.py

Lines changed: 70 additions & 0 deletions
@@ -36,6 +36,7 @@
 from ..utils.multiclass import unique_labels
 from ..utils.multiclass import type_of_target
 from ..utils.fixes import isclose
+from ..utils.stats import rankdata
 
 
 ###############################################################################
@@ -2144,6 +2145,75 @@ def hamming_loss(y_true, y_pred, classes=None):
     raise ValueError("{0} is not supported".format(y_type))
 
 
+def label_ranking_average_precision_score(y_true, y_score):
+    """Compute ranking-based average precision
+
+    Label ranking average precision (LRAP) is the average over each ground
+    truth label assigned to each sample, of the ratio of true vs. total
+    labels with lower score.
+
+    This metric is used in multilabel ranking problems, where the goal
+    is to give a better rank to the labels associated with each sample.
+
+    The obtained score is always strictly greater than 0 and
+    the best value is 1.
+
+    Parameters
+    ----------
+    y_true : array, shape = [n_samples, n_labels]
+        True binary labels in binary indicator format.
+
+    y_score : array, shape = [n_samples, n_labels]
+        Target scores, can either be probability estimates of the positive
+        class, confidence values, or binary decisions.
+
+    Returns
+    -------
+    score : float
+
+    Examples
+    --------
+    >>> import numpy as np
+    >>> from sklearn.metrics import label_ranking_average_precision_score
+    >>> y_true = np.array([[1, 0, 0], [0, 0, 1]])
+    >>> y_score = np.array([[0.75, 0.5, 1], [1, 0.2, 0.1]])
+    >>> label_ranking_average_precision_score(y_true, y_score) \
+        # doctest: +ELLIPSIS
+    0.416...
+
+    """
+    y_true, y_score = check_arrays(y_true, y_score)
+
+    if y_true.shape != y_score.shape:
+        raise ValueError("y_true and y_score have different shape")
+
+    # Handle badly formatted arrays and the degenerate case with one label
+    y_type = type_of_target(y_true)
+    if (y_type != "multilabel-indicator"
+            and not (y_type == "binary" and y_true.ndim == 2)):
+        raise ValueError("{0} format is not supported".format(y_type))
+
+    n_samples, n_labels = y_true.shape
+
+    out = 0.
+    for i in range(n_samples):
+        relevant = y_true[i].nonzero()[0]
+
+        if (relevant.size == 0 or relevant.size == n_labels):
+            # If all labels are relevant or irrelevant, the score is
+            # equal to 1. The label ranking has no meaning.
+            out += 1.
+            continue
+
+        # Rank labels by decreasing score; 'max' ranks count ties as ">="
+        scores_i = -y_score[i]
+        true_mask = y_true[i].astype(bool)
+        # rank_ij: number of labels scored at least as high as each relevant label
+        rank = rankdata(scores_i, 'max')[true_mask]
+        # |L_ij|: the same count, restricted to the relevant labels only
+        L = rankdata(scores_i[true_mask], 'max')
+        out += np.divide(L, rank, dtype=float).mean()
+
+    return out / n_samples
+
+
 ###############################################################################
 # Regression metrics
 ###############################################################################
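
The implementation above leans on rankdata(..., 'max') over negated scores so
that ties count as "scored at least as high". The snippet below walks through
that per-sample step for the first doctest sample; it uses scipy.stats.rankdata,
assuming it behaves like the sklearn.utils.stats backport imported in the hunk:

    import numpy as np
    from scipy.stats import rankdata  # assumed equivalent to sklearn.utils.stats.rankdata

    y_true_i = np.array([1, 0, 0])
    y_score_i = np.array([0.75, 0.5, 1.0])

    scores_i = -y_score_i                 # negate so 'max' ranks mean "score >= score_j"
    true_mask = y_true_i.astype(bool)

    # array([2]): two labels score >= 0.75 (label 0 itself and the label scored 1)
    rank = rankdata(scores_i, method='max')[true_mask]
    # array([1]): among the relevant labels, only label 0 itself scores that high
    L = rankdata(scores_i[true_mask], method='max')

    print(np.divide(L, rank, dtype=float).mean())   # 0.5 for this sample
    # Averaged with the second sample's 1/3, this yields the 0.416... doctest value.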
