[WIP] Eleven point average precision by GaelVaroquaux · Pull Request #9091 · scikit-learn/scikit-learn

Closed

Conversation

GaelVaroquaux
Member

Continuation of #9017 (itself a continuation of #7356).

Adds an eleven-point interpolation method for average precision, as described in the IR book.
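
For context, a minimal sketch of the eleven-point definition from the IR book (the function name is illustrative; this is not the code in this PR): the interpolated precision p_interp(r) = max over r' >= r of p(r') is sampled at recalls 0.0, 0.1, ..., 1.0 and averaged.

import numpy as np
from sklearn.metrics import precision_recall_curve

def eleven_point_average_precision(y_true, y_scores):
    precision, recall, _ = precision_recall_curve(y_true, y_scores)
    # Exact grid of recall levels (avoids the np.arange float issue
    # discussed later in this thread).
    levels = np.arange(11) / 10.0  # 0.0, 0.1, ..., 1.0
    # Interpolated precision at level r: best precision at any recall >= r.
    p_interp = [precision[recall >= r].max() if (recall >= r).any() else 0.0
                for r in levels]
    return np.mean(p_interp)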

@GaelVaroquaux
Member Author

I will not finish this PR, as I have no need for the eleven-point method. I am just opening the PR so that the code doesn't get lost. @ndingwall, if you're interested, please do take over.

@vene
Member
vene commented Jun 9, 2017

I wrote a simpler reference implementation which does not always agree with the one here, see this gist.

@varunagrawal
Contributor

@amueller @jnothman @ndingwall can we please make some progress on this PR? This issue has gotten a bit out of hand at this point.

@jnothman
Member
jnothman commented Feb 25, 2018 via email

@varunagrawal
Contributor

@jnothman the gist seems to be a to-the-point implementation of the AP metric defined in the Pascal VOC paper. I'd like to hear comments from @ndingwall.

@ndingwall
Contributor
ndingwall commented Feb 26, 2018

I took a look at @vene's gist and the discrepancies seem to come from weird behavior of np.arange. I can reproduce @vene's results when I run the gist locally, but if I replace the list on line 13 with np.arange(0, 1.1, 0.1), then the differences are all zero.

To check what was going on, I printed out list(np.arange(0, 1.1, 0.1)) and got [0.0, 0.1, 0.2, 0.30000000000000004, 0.4, 0.5, 0.6000000000000001, 0.7000000000000001, 0.8, 0.9, 1.0]. To double-check that this was causing the problem, I verified that e.g. 0.3 appears as a recall value for one of the examples with nonzero differences, but not for an example with a difference of zero, and indeed this was the case.

Fixing that is probably out of scope for this PR!
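
The behavior @ndingwall describes is easy to reproduce (a quick illustration, not part of the original thread):

import numpy as np

# Building the grid from a float step of 0.1 accumulates rounding error:
print(np.arange(0, 1.1, 0.1).tolist())
# [0.0, 0.1, 0.2, 0.30000000000000004, 0.4, 0.5,
#  0.6000000000000001, 0.7000000000000001, 0.8, 0.9, 1.0]

# Dividing integers yields the correctly rounded values instead:
print((np.arange(11) / 10.0).tolist())
# [0.0, 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1.0]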

@ndingwall
Contributor

I don't like interpolated average precision because it seems like cheating to me (as discussed in #7356); I only included it in the previous incarnation of this PR since it was an easy addition to a function I was already rewriting and might have satisfied #4577.

But as I noted in #7356 (comment), the Pascal VOC evaluation code doesn't actually use 11-point AP, and it's not clear whether we should be implementing something that agrees with their code, or something that agrees with their definition (on page 11 of the challenge documentation), or both. There's discussion of that in #4577.

In any case, given my view that p_interp is cheating, I'm not enthusiastic about implementing it, so I'd be happy to close this PR. If there's strong consensus that sklearn should contain an implementation, I'd be in favour of a separate function that makes explicit that it's intended to compute the two versions of the Pascal VOC evaluation. Then nobody would risk comparing AP to interpolated AP.

@varunagrawal
Contributor
varunagrawal commented Feb 26, 2018

@ndingwall speaking on behalf of the Computer Vision community: the VOC AP is still a widely used metric for reporting average precision. Here's the kicker: I took a look at Ross Girshick's AP code, and the comments state that the interpolated version is the correct one, not the 11-point version. You can see it here.
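
For reference, the interpolated version in that code computes the area under the monotone precision envelope. A sketch in that style (modeled on the widely circulated voc_ap routine; the function name is mine, and this is not the linked code verbatim):

import numpy as np

def voc_style_interpolated_ap(recall, precision):
    # Assumes recall is sorted in increasing order.
    r = np.concatenate(([0.0], recall, [1.0]))
    p = np.concatenate(([0.0], precision, [0.0]))
    # Sweep right to left so precision becomes the non-increasing envelope
    # p_interp(r) = max over r' >= r of p(r').
    for i in range(len(p) - 2, -1, -1):
        p[i] = max(p[i], p[i + 1])
    # Integrate the envelope: precision times recall step where recall moves.
    idx = np.where(r[1:] != r[:-1])[0]
    return np.sum((r[idx + 1] - r[idx]) * p[idx + 1])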

That being said, I think it would be immensely useful for scikit-learn to implement a Pascal VOC Average Precision metric, since most modern workflows use Python. I agree with your point that having a separate function for VOC-specific Average Precision computation (with clarifying documentation) would be the right thing to do.

I am of course willing to help see this through, both in terms of code as well as management.

@varunagrawal
Contributor

There's also an old issue from @karpathy dealing with this. I guess this would help corroborate the need for the VOC AP metric in sklearn.

@ndingwall
Contributor
ndingwall commented Feb 26, 2018

@varunagrawal that sounds good. I have implementations of both 11-point interpolation and the interpolation you linked to, if you want them. 11-point is in the existing PR, and the other one is just something like np.mean([precision[i:].max() for i in range(len(precision))]) (operating on precision after we do precision = precision[-2::-1]). There are likely more efficient implementations, though, since this does a lot of redundant searching for the max over very similar sublists.
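
Assembled into a runnable form (a sketch built from the snippet above; the function name is mine):

import numpy as np
from sklearn.metrics import precision_recall_curve

def interpolated_ap_quadratic(y_true, y_scores):
    precision, _, _ = precision_recall_curve(y_true, y_scores)
    # Drop the final precision = 1 point (at recall = 0) and reverse, so
    # the array is ordered by increasing recall.
    precision = precision[-2::-1]
    # Interpolated precision at each point: best precision at this or any
    # higher recall; O(n^2) because of the repeated max over sublists.
    return np.mean([precision[i:].max() for i in range(len(precision))])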

EDIT: here's a 5x faster implementation to compute an interpolated AP:

import numpy as np
from sklearn.metrics import precision_recall_curve

def interpolated_average_precision(y_true, y_scores):
    precision, _, _ = precision_recall_curve(y_true, y_scores)
    # A single forward pass of running maxima builds the interpolated
    # precision envelope, replacing the repeated .max() over sublists.
    p_interp = [0]
    for i in range(len(precision) - 1):
        p_interp.append(max(p_interp[-1], precision[i]))
    return np.mean(p_interp[1:])
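
A quick check with toy data (illustrative values, not from the thread):

y_true = [0, 0, 1, 1, 1]
y_scores = [0.1, 0.4, 0.35, 0.8, 0.9]
print(interpolated_average_precision(y_true, y_scores))  # 0.875 here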

@ndingwall
Contributor

(Although you should hold off until @amueller / @jnothman / @GaelVaroquaux give their thoughts on making a new function).

@jnothman
Member
jnothman commented Feb 26, 2018 via email

@varunagrawal
Contributor

@amueller @GaelVaroquaux ping!

@varunagrawal
Contributor

Any updates on this? It's been a month since the last activity on this issue.

@jnothman
Member
jnothman commented Apr 4, 2018

I don't think you should anticipate @GaelVaroquaux finishing this, but you're welcome to open a PR completing the work and try to persuade reviewers that it's sufficiently useful and well documented, given the discussion here.

@GaelVaroquaux
Member Author
GaelVaroquaux commented Apr 5, 2018 via email

Base automatically changed from master to main January 22, 2021 10:49
@adrinjalali
Member

Seems like we're not really enthusiastic about including this algorithm in sklearn anymore.
