[MRG+1] Label ranking average precision by arjoly · Pull Request #2804 · scikit-learn/scikit-learn · GitHub

[MRG+1] Label ranking average precision #2804


Merged · 17 commits merged into scikit-learn:master on Jul 19, 2014

Conversation

@arjoly (Member) commented Jan 31, 2014

The goal of this pull request is to add a first metric for multilabel ranking problems, "label ranking average precision". The definition can be found on page 14 of Mining Multilabel Data or in the documentation.

For the moment, I have decided to add a new function rather than a new possible average value for the existing average_precision_score, but this could change. I am also open to suggestions for a shorter name.
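For reference, here is a minimal usage sketch of the proposed metric, assuming the public name sklearn.metrics.label_ranking_average_precision_score; the values are hand-checked against the definition above rather than taken from the PR.

import numpy as np
from sklearn.metrics import label_ranking_average_precision_score

# Two samples, three labels: y_true marks the relevant labels,
# y_score holds the per-label decision values.
y_true = np.array([[1, 0, 0],
                   [0, 0, 1]])
y_score = np.array([[0.75, 0.50, 1.00],
                    [1.00, 0.20, 0.10]])

# Sample 1: the relevant label is ranked 2nd among the scores -> 1/2
# Sample 2: the relevant label is ranked 3rd                  -> 1/3
# LRAP = (1/2 + 1/3) / 2 = 5/12 ~ 0.42
print(label_ranking_average_precision_score(y_true, y_score))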

@arjoly (Member, Author) commented Jan 31, 2014

The remaining multilabel ranking metrics (one_error, coverage and ranking_loss) will follow after the merge of this one.

@arjoly (Member, Author) commented Jan 31, 2014

Lastly, ties are handled correctly thanks to the definition.
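A small hand-worked sketch of what that means in practice (not taken from the PR's tests): with the >= convention in the definition, tied scores all count toward a relevant label's rank.

import numpy as np
from sklearn.metrics import label_ranking_average_precision_score

y_true = np.array([[1, 0, 0]])

# All three scores tied: three labels have a score >= 0.5, so the single
# relevant label has rank 3 and LRAP = 1/3.
print(label_ranking_average_precision_score(y_true, [[0.5, 0.5, 0.5]]))

# Relevant label tied with one other label for the top score: rank 2, LRAP = 1/2.
print(label_ranking_average_precision_score(y_true, [[1.0, 1.0, 0.5]]))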

@arjoly (Member, Author) commented Mar 31, 2014

Rebased on top of master

@jnothman (Member) commented Apr 3, 2014

You have forgotten to import bincount (from sklearn.utils.fixes, I presume).

@arjoly (Member, Author) commented Apr 3, 2014

We don't need to import from sklearn.utils.fixes; we support a sufficiently recent numpy.

I have fixed the missing import. Thanks @jnothman!

@coveralls commented

Coverage remained the same when pulling 4d9b730 on arjoly:lrap into fbe974b on scikit-learn:master.

@arjoly (Member, Author) commented Apr 23, 2014

Rebased on top of master :-)

The :func:`label_ranking_average_precision_score` function
implements the label ranking average precision (AP), which is also simply
called average precision. It averages over each
Member:

Given this terminology, you should note the relationship between this and average_precision_score

Member Author:

Good point!

@jnothman (Member) commented:

More terminology things:

  • in information retrieval this is called Mean Average Precision
  • the use of "relevant labels" seems a bit odd. "relevant" applies to information retrieval, but in multi-label classification aren't we talking about "true labels"?

@arjoly (Member, Author) commented Apr 25, 2014

Note to myself: use rankdata from scipy if possible, as in arjoly#4.

@jnothman (Member) commented:

Note to myself: use rankdata from scipy if possible, as in arjoly#4.

If you choose not to use rankdata, I would like to see your bincount approach refactored to an appropriately named function (rankdata_max?) anyway.
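For what it's worth, the two options appear to compute the same quantity; a small sketch (the helper name rank_max_bincount is mine, not from the PR):

import numpy as np
from scipy.stats import rankdata

def rank_max_bincount(scores):
    # For each entry, count how many entries score >= it, i.e. a "max"-style
    # rank over descending scores, built from bincount over the unique values.
    unique_scores, inverse = np.unique(scores, return_inverse=True)  # ascending
    counts = np.bincount(inverse)
    ge_counts = counts[::-1].cumsum()[::-1]  # entries >= each unique value
    return ge_counts[inverse]

scores = np.array([0.75, 0.5, 1.0, 0.5])

# rankdata with method='max' on the negated scores gives the same counts.
assert np.array_equal(rank_max_bincount(scores),
                      rankdata(-scores, method='max'))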

@arjoly (Member, Author) commented Apr 29, 2014

in information retrieval this is called Mean Average Precision

I can switch the name to mean average precision. However, this doesn't clearly differentiate this metric from the other average precision variants (micro, macro, samples).

the use of "relevant labels" seems a bit odd. "relevant" applies to information retrieval, but in multi-label classification aren't we talking about "true labels"?

Personally, I am fine with both.

@jnothman (Member) commented:

in information retrieval this is called Mean Average Precision

It looks like I'm wrong about this. The metrics differ. That's actually just the average of the per-sample area under the PR curve (see Wikipedia). It's no longer clear how this metric relates to that definition, which does not use a rank transformation.

@arjoly (Member, Author) commented May 1, 2014

This is also called mean average precision in multilabel ranking, but I am not sure that helps.

@jnothman (Member) commented May 1, 2014

Huh? I might have thought, then, that average_precision_score(y_true, y_score) == label_ranking_average_precision_score([y_true], [y_score]), but it isn't. For example, with a single nonzero in y_true, LRAP is equivalent to reciprocal rank; average precision returns half that score.
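To make the single-relevant-label case concrete (a hand-worked sketch using the current function name, not a claim about what average_precision_score returned at the time): with one true label, LRAP reduces to the reciprocal rank of that label.

import numpy as np
from sklearn.metrics import label_ranking_average_precision_score

y_true = np.array([[0, 0, 1]])

# The relevant label has the top score -> rank 1 -> LRAP = 1.0
print(label_ranking_average_precision_score(y_true, [[0.25, 0.50, 1.00]]))

# The relevant label has the second-best score -> rank 2 -> LRAP = 0.5
print(label_ranking_average_precision_score(y_true, [[0.25, 1.00, 0.50]]))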

def label_ranking_average_precision_score(y_true, y_score):
    """Compute ranking-based average precision

    For each sample, ranking-based average precision average over
Member:

I think the second "average" should be "averages"?

@vene (Member) commented Jul 16, 2014

I support @jnothman's comment about the terminology 'relevant labels'. 'True labels' also sounds kind of awkward, so maybe rephrase somehow.

The relationship with Mean Average Precision and the semantic ambiguity should be made explicit.

Once this is addressed I'm 👍; the tests are very convincing.

@arjoly (Member, Author) commented Jul 17, 2014

The relationship with Mean Average Precision and the semantic ambiguity should be made explicit.

I have removed the mention of it.

@arjoly (Member, Author) commented Jul 17, 2014

I support @jnothman's comment about the terminology 'relevant labels'. 'True labels' also sounds kind of awkward, so maybe rephrase somehow.

I am open to suggestions. :-)

@coveralls commented

Coverage increased (+0.02%) when pulling bb93ede on arjoly:lrap into 5b247f9 on scikit-learn:master.


equal to the label r divided by the number of labels with scores
higher or equal to the label r. The final score is obtained by averaging
over the samples. A label with a higher score is thus considered as having
a better rank.
Member:

Let me give this a shot:

Label ranking average precision (LRAP) is the average over each ground truth label assigned to each sample, of the ratio of true vs. total labels with lower score.

I would add a link to the paper in the docstring of this function too. Maybe the formula as well (but I'm -1, since the formula itself is very trivial and the denominator and numerator have to be explained in words).
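For reference, one way the formula being alluded to can be written (a sketch following the wording quoted above, not text taken from the PR):

\mathrm{LRAP}(y, \hat{f}) = \frac{1}{n_{\text{samples}}}
  \sum_{i=1}^{n_{\text{samples}}} \frac{1}{\|y_i\|_0}
  \sum_{j : y_{ij} = 1} \frac{|\mathcal{L}_{ij}|}{\mathrm{rank}_{ij}}

where \mathcal{L}_{ij} = \{k : y_{ik} = 1,\ \hat{f}_{ik} \ge \hat{f}_{ij}\} is the set of relevant labels scored at least as high as label j, and \mathrm{rank}_{ij} = |\{k : \hat{f}_{ik} \ge \hat{f}_{ij}\}|.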

@vene (Member) commented Jul 17, 2014

Thanks for prettifying the backport.
Your tests seem very strong and convincing to me, so I won't go through each test case to try it out by hand (I'm sure my mental arithmetic wouldn't pass even basic unit tests.)

I was thinking of adding this to an example, but after discussing it I gave up on that idea. It's a good to-do for the future to come up with an evocative example for multilabel metrics. Until then, as soon as you consider (and modify or discard) my rephrasing suggestion, I think this is ready 👍

@arjoly (Member, Author) commented Jul 17, 2014

Thanks for your help with the doc!

@arjoly (Member, Author) commented Jul 18, 2014

@vene, I have added your +1 to the title.

@arjoly changed the title from [MRG] Label ranking average precision to [MRG+1] Label ranking average precision on Jul 18, 2014
@arjoly (Member, Author) commented Jul 18, 2014

A last +1?

for i in range(n_samples):
    relevant = y_true[i].nonzero()[0]

    # No relevant label, so we will have to sum over zero element
Member:

I would either put the comment inside the if or start it with "if".


@amueller (Member) commented:

Looks good, but I didn't check the tests yet ;)

@amueller (Member) commented:

lgtm

@coveralls commented

Coverage increased (+0.09%) when pulling b0995f9 on arjoly:lrap into 5b247f9 on scikit-learn:master.

@arjoly (Member, Author) commented Jul 19, 2014

Travis is happy :-) I am going to merge.

arjoly added a commit that referenced this pull request on Jul 19, 2014: [MRG+1] Label ranking average precision
@arjoly merged commit bd1686b into scikit-learn:master on Jul 19, 2014
@arjoly (Member, Author) commented Jul 19, 2014

I have updated the what's new.

@vene (Member) commented Jul 19, 2014

Thanks!

@arjoly (Member, Author) commented Jul 19, 2014

Thanks for the review :-)

@arjoly deleted the lrap branch on October 21, 2014.