[MRG+1] Label ranking average precision #2804
Conversation
The remaining multilabel ranking metrics (
Lastly, ties are handled correctly thanks to the definition.
Rebased on top of master.
You have forgotten to import
We don't need to import from [...]. I fixed the missing import. Thanks @jnothman!
Rebase on top of master :-)
...............................
The :func:`label_ranking_average_precision_score` function
implements the label ranking average precision (AP), which is also simply
called average precision. It averages over each
Given this terminology, you should note the relationship between this and average_precision_score
Good point!
More terminology things:
Note to myself: use rankdata from scipy if possible, as in arjoly#4.
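A minimal sketch of what that could look like with scipy.stats.rankdata and method='max' (whether the final implementation actually uses it is a separate question):

```python
import numpy as np
from scipy.stats import rankdata

# Scores for a single sample; a higher score means a better rank.
scores = np.array([0.9, 0.4, 0.9, 0.1])

# Rank by decreasing score. With method='max', each label gets the number
# of labels whose score is greater than or equal to its own, which is the
# tie-aware "rank" used in the LRAP definition.
ranks = rankdata(-scores, method='max')
print(ranks)  # -> 2, 3, 2, 4 (the tied 0.9s share rank 2)
```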
If you choose not to use
I can switch the name to mean average precision. However, this doesn't clearly differentiate this metric from the other average precision variants (micro, macro, samples).
Personally, I am fine with both.
It looks like I'm wrong about this. The metrics differ. That's actually just the average of the per-sample area under the PR curve (see Wikipedia). It's no longer clear how this metric relates to that definition, which does not use a rank transformation.
This is also called mean average precision in multilabel ranking. But I'm not sure it helps.
Huh? I might have thought, then, that
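As a quick sanity check on the relationship being discussed: assuming a non-interpolated average_precision_score (as in current scikit-learn) and no tied scores, averaging it per sample does appear to coincide with LRAP on a toy case (the data below is made up for illustration):

```python
import numpy as np
from sklearn.metrics import (average_precision_score,
                             label_ranking_average_precision_score)

# Made-up multilabel data with no tied scores.
y_true = np.array([[1, 0, 0], [0, 0, 1]])
y_score = np.array([[0.75, 0.5, 1.0], [1.0, 0.2, 0.1]])

# Average of the per-sample (non-interpolated) average precision ...
per_sample_ap = [average_precision_score(y_true[i], y_score[i])
                 for i in range(y_true.shape[0])]
print(np.mean(per_sample_ap))  # 0.4166...

# ... matches LRAP on this example.
print(label_ranking_average_precision_score(y_true, y_score))  # 0.4166...
```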
def label_ranking_average_precision_score(y_true, y_score):
    """Compute ranking-based average precision

    For each sample, ranking-based average precision average over
I think the second average should be 'averages'?
I support @jnothman's comment about the terminology 'relevant labels'. True labels also sounds kind of awkward, so maybe rephrase somehow. The relationship with Mean Average Precision and the semantic ambiguity should be made explicit. Once this is addressed I'm 👍, the tests are very convincing.
I have removed the mention of it.
I am open to suggestions. :-)
    equal to the label r divided by the number of labels with scores
    higher or equal to the label r. The final score is obtained by averaging
    over the samples. A label with higher score is thus considered as having
    better rank.
Let me give this a shot:
Label ranking average precision (LRAP) is the average over each ground truth label assigned to each sample, of the ratio of true vs. total labels with lower score.
I would add a link to the paper in the docstring of this function too. Maybe the formula too (but I'm -1, since the formula itself is very trivial and the denominator and numerator have to be explained in words anyway).
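For reference, a sketch of the formula in question, written in my own notation from the docstring wording (so treat it as an illustration, not the canonical statement):

```latex
\mathrm{LRAP}(y, \hat{f})
  = \frac{1}{n_{\text{samples}}}
    \sum_{i=1}^{n_{\text{samples}}}
    \frac{1}{|Y_i|}
    \sum_{j \in Y_i}
    \frac{\bigl|\{k \in Y_i : \hat{f}_{ik} \ge \hat{f}_{ij}\}\bigr|}
         {\bigl|\{k : \hat{f}_{ik} \ge \hat{f}_{ij}\}\bigr|}
```

Here Y_i denotes the set of relevant labels of sample i and \hat{f}_{ij} the predicted score of label j for sample i.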
Thanks for prettifying the backport. I was thinking of adding this to an example, but after discussing I gave up on that idea. It's a good todo for the future to try to come up with an evocative example for multilabel metrics. Until then, as soon as you consider (and modify or discard) my rephrasing suggestion, I think this is ready 👍
Thanks for your help with the doc!
@vene, I have added your +1 in the title.
A last +1?
for i in range(n_samples):
    relevant = y_true[i].nonzero()[0]

    # No relevant label, so we will have to sum over zero elements
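For readers skimming the diff, a minimal self-contained sketch of how the per-sample loop could be completed. This is my reconstruction from the docstring, not necessarily the exact code in this PR; in particular, the handling of samples without any relevant label is a choice made for the sketch.

```python
import numpy as np

def lrap_sketch(y_true, y_score):
    """Rough, unoptimized LRAP following the docstring definition above."""
    y_true = np.asarray(y_true)
    y_score = np.asarray(y_score)
    n_samples = y_true.shape[0]
    total = 0.0
    for i in range(n_samples):
        relevant = y_true[i].nonzero()[0]
        if relevant.size == 0:
            # No relevant label: nothing to average for this sample.
            # (Counting it as 0 is an arbitrary choice for this sketch.)
            continue
        sample_score = 0.0
        for r in relevant:
            at_least_as_high = y_score[i] >= y_score[i, r]
            sample_score += np.mean(y_true[i][at_least_as_high])
        total += sample_score / relevant.size
    return total / n_samples

print(lrap_sketch([[1, 1, 0], [0, 1, 1]],
                  [[0.2, 0.9, 0.4], [0.8, 0.1, 0.3]]))  # 17/24 = 0.7083...
```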
I would either put the comment into the if or start it with "if".
looks good but I didn't check the tests yet ;)
lgtm
Travis is happy :-) I am going to merge.
I have updated the what's new.
Thanks!
Thanks for the review :-)
The goal of this pull request is to add a first metric for multilabel ranking problems, "label ranking average precision". The definition can be found in Mining Multilabel Data (page 14) or in the documentation.
For the moment, I decided to add a new function instead of a new possible average value to the current average_precision_score. But this could change. I am also open to suggestions for a shorter name.
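For context, calling the new function would look roughly like this (toy data, values rounded):

```python
import numpy as np
from sklearn.metrics import label_ranking_average_precision_score

y_true = np.array([[1, 0, 0], [0, 0, 1]])
y_score = np.array([[0.75, 0.5, 1.0], [1.0, 0.2, 0.1]])
print(label_ranking_average_precision_score(y_true, y_score))  # 0.4166...
```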