[MRG+1] Label ranking average precision by arjoly · Pull Request #2804 · scikit-learn/scikit-learn · GitHub

[MRG+1] Label ranking average precision #2804


Merged · 17 commits merged into scikit-learn:master on Jul 19, 2014

Conversation

@arjoly (Member) commented Jan 31, 2014

The goal of this pull request is to add a first metric for multilabel ranking problems, "label ranking average precision". The definition can be found on page 14 of Mining Multilabel Data or in the documentation.

For the moment, I have decided to add a new function rather than a new possible average value for the existing average_precision_score, but this could change. I am also open to suggestions for a shorter name.
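For reference, here is a minimal usage sketch of the proposed metric, assuming the public name sklearn.metrics.label_ranking_average_precision_score; the values are hand-checked against the definition above rather than taken from the PR.

import numpy as np
from sklearn.metrics import label_ranking_average_precision_score

# Two samples, three labels: y_true marks the relevant labels,
# y_score holds the per-label decision values.
y_true = np.array([[1, 0, 0],
                   [0, 0, 1]])
y_score = np.array([[0.75, 0.50, 1.00],
                    [1.00, 0.20, 0.10]])

# Sample 1: the relevant label is ranked 2nd among the scores -> 1/2
# Sample 2: the relevant label is ranked 3rd                  -> 1/3
# LRAP = (1/2 + 1/3) / 2 = 5/12 ~ 0.42
print(label_ranking_average_precision_score(y_true, y_score))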

@arjoly (Member, Author) commented Jan 31, 2014

The remaining multilabel ranking metrics (one_error, coverage and ranking_loss) will follow after the merge of this one.

@arjoly (Member, Author) commented Jan 31, 2014

Lastly, ties are handled correctly thanks to the definition.
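A small hand-worked sketch of what that means in practice (not taken from the PR's tests): with the >= convention in the definition, tied scores all count toward a relevant label's rank.

import numpy as np
from sklearn.metrics import label_ranking_average_precision_score

y_true = np.array([[1, 0, 0]])

# All three scores tied: three labels have a score >= 0.5, so the single
# relevant label has rank 3 and LRAP = 1/3.
print(label_ranking_average_precision_score(y_true, [[0.5, 0.5, 0.5]]))

# Relevant label tied with one other label for the top score: rank 2, LRAP = 1/2.
print(label_ranking_average_precision_score(y_true, [[1.0, 1.0, 0.5]]))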

@arjoly (Member, Author) commented Mar 31, 2014

Rebased on top of master

@jnothman (Member) commented Apr 3, 2014

You have forgotten to import bincount (from sklearn.utils.fixes, I presume).

@arjoly (Member, Author) commented Apr 3, 2014

We don't need to import from sklearn.utils.fixes; we support a sufficiently recent numpy.

I have fixed the missing import. Thanks @jnothman!

@coveralls commented

Coverage remained the same when pulling 4d9b730 on arjoly:lrap into fbe974b on scikit-learn:master.

@arjoly (Member, Author) commented Apr 23, 2014

Rebased on top of master :-)

The :func:`label_ranking_average_precision_score` function
implements the label ranking average precision (AP), which is also simply
called average precision. It averages over each
Member:

Given this terminology, you should note the relationship between this and average_precision_score

Member Author:

Good point!

@jnothman (Member) commented:

More terminology things:

  • in information retrieval this is called Mean Average Precision
  • the use of "relevant labels" seems a bit odd. "relevant" applies to information retrieval, but in multi-label classification aren't we talking about "true labels"?

@arjoly (Member, Author) commented Apr 25, 2014

Note to myself: use rankdata from scipy if possible, as in arjoly#4.

@jnothman (Member) commented:

Note to myself: use rankdata from scipy if possible, as in arjoly#4.

If you choose not to use rankdata, I would like to see your bincount approach refactored to an appropriately named function (rankdata_max?) anyway.
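For what it's worth, the two options appear to compute the same quantity; a small sketch (the helper name rank_max_bincount is mine, not from the PR):

import numpy as np
from scipy.stats import rankdata

def rank_max_bincount(scores):
    # For each entry, count how many entries score >= it, i.e. a "max"-style
    # rank over descending scores, built from bincount over the unique values.
    unique_scores, inverse = np.unique(scores, return_inverse=True)  # ascending
    counts = np.bincount(inverse)
    ge_counts = counts[::-1].cumsum()[::-1]  # entries >= each unique value
    return ge_counts[inverse]

scores = np.array([0.75, 0.5, 1.0, 0.5])

# rankdata with method='max' on the negated scores gives the same counts.
assert np.array_equal(rank_max_bincount(scores),
                      rankdata(-scores, method='max'))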

@arjoly (Member, Author) commented Apr 29, 2014

in information retrieval this is called Mean Average Precision

I can switch the name to mean average precision. However, this doesn't clearly differentiate this metric from the other average precision variants (micro, macro, samples).

the use of "relevant labels" seems a bit odd. "relevant" applies to information retrieval, but in multi-label classification aren't we talking about "true labels"?

Personally, I am fine with both.

@jnothman (Member) commented:

in information retrieval this is called Mean Average Precision

It looks like I'm wrong about this. The metrics differ. That's actually just the average of the per-sample area under the PR curve (see Wikipedia). It's no longer clear how this metric relates to that definition, which does not use a rank transformation.

@arjoly (Member, Author) commented May 1, 2014

This is also called mean average precision in multilabel ranking, but I am not sure that helps.

@jnothman (Member) commented May 1, 2014

Huh? I might have thought, then, that average_precision_score(y_true, y_score) == label_ranking_average_precision_score([y_true], [y_score]), but it isn't. For example, with a single nonzero in y_true, LRAP is equivalent to reciprocal rank; average precision returns half that score.
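To make the single-relevant-label case concrete (a hand-worked sketch using the current function name, not a claim about what average_precision_score returned at the time): with one true label, LRAP reduces to the reciprocal rank of that label.

import numpy as np
from sklearn.metrics import label_ranking_average_precision_score

y_true = np.array([[0, 0, 1]])

# The relevant label has the top score -> rank 1 -> LRAP = 1.0
print(label_ranking_average_precision_score(y_true, [[0.25, 0.50, 1.00]]))

# The relevant label has the second-best score -> rank 2 -> LRAP = 0.5
print(label_ranking_average_precision_score(y_true, [[0.25, 1.00, 0.50]]))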

def label_ranking_average_precision_score(y_true, y_score):
    """Compute ranking-based average precision

    For each sample, ranking-based average precision average over
Member:

I think the second "average" should be "averages"?

@vene (Member) commented Jul 16, 2014

I support @jnothman's comment about the terminology 'relevant labels'. 'True labels' also sounds kind of awkward, so maybe rephrase somehow.

The relationship with Mean Average Precision and the semantic ambiguity should be made explicit.

Once this is addressed I'm 👍; the tests are very convincing.

@arjoly (Member, Author) commented Jul 17, 2014

The relationship with Mean Average Precision and the semantic ambiguity should be made explicit.

I have removed the mention of it.

@arjoly (Member, Author) commented Jul 17, 2014

I support @jnothman's comment about the terminology 'relevant labels'. 'True labels' also sounds kind of awkward, so maybe rephrase somehow.

I am open to suggestions. :-)

@coveralls commented

Coverage increased (+0.02%) when pulling bb93ede on arjoly:lrap into 5b247f9 on scikit-learn:master.


equal to the label r divided by the number of labels with scores
higher or equal to the label r. The final score is obtained by averaging
over the samples. A label with a higher score is thus considered as having
a better rank.
Member:

Let me give this a shot:

Label ranking average precision (LRAP) is the average over each ground truth label assigned to each sample, of the ratio of true vs. total labels with lower score.

I would add a link to the paper in the docstring of this function too. Maybe the formula as well (but I'm -1, since the formula itself is very trivial and the denominator and numerator have to be explained in words).
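For reference, one way the formula being alluded to can be written (a sketch following the wording quoted above, not text taken from the PR):

\mathrm{LRAP}(y, \hat{f}) = \frac{1}{n_{\text{samples}}}
  \sum_{i=1}^{n_{\text{samples}}} \frac{1}{\|y_i\|_0}
  \sum_{j : y_{ij} = 1} \frac{|\mathcal{L}_{ij}|}{\mathrm{rank}_{ij}}

where \mathcal{L}_{ij} = \{k : y_{ik} = 1,\ \hat{f}_{ik} \ge \hat{f}_{ij}\} is the set of relevant labels scored at least as high as label j, and \mathrm{rank}_{ij} = |\{k : \hat{f}_{ik} \ge \hat{f}_{ij}\}|.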

@vene (Member) commented Jul 17, 2014

Thanks for prettifying the backport.
Your tests seem very strong and convincing to me, so I won't go through each test case to try it out by hand (I'm sure my mental arithmetic wouldn't pass even basic unit tests.)

I was thinking of adding this to an example, but after discussing it I gave up on that idea. It's a good to-do for the future to come up with an evocative example for multilabel metrics. Until then, as soon as you consider (and modify or discard) my rephrasing suggestion, I think this is ready 👍

@arjoly (Member, Author) commented Jul 17, 2014

Thanks for your help with the doc!

@arjoly (Member, Author) commented Jul 18, 2014

@vene, I have added your +1 to the title.

@arjoly changed the title from [MRG] Label ranking average precision to [MRG+1] Label ranking average precision on Jul 18, 2014
@arjoly (Member, Author) commented Jul 18, 2014

A last +1?

for i in range(n_samples):
    relevant = y_true[i].nonzero()[0]

    # No relevant label, so we will have to sum over zero element
Member:

I would either put the comment inside the if or start it with "if".


@amueller (Member) commented:

Looks good, but I didn't check the tests yet ;)

@amueller (Member) commented:

lgtm

@coveralls commented

Coverage increased (+0.09%) when pulling b0995f9 on arjoly:lrap into 5b247f9 on scikit-learn:master.

@arjoly (Member, Author) commented Jul 19, 2014

Travis is happy :-) I am going to merge.

arjoly added a commit that referenced this pull request on Jul 19, 2014: [MRG+1] Label ranking average precision
@arjoly merged commit bd1686b into scikit-learn:master on Jul 19, 2014
@arjoly (Member, Author) commented Jul 19, 2014

I have updated the what's new.

@vene (Member) commented Jul 19, 2014

Thanks!

@arjoly (Member, Author) commented Jul 19, 2014

Thanks for the review :-)

@arjoly deleted the lrap branch on October 21, 2014.