8000 [MRG] fixes #4577 adds interpolation to PR curve by chiragnagpal · Pull Request #4936 · scikit-learn/scikit-learn · GitHub
[go: up one dir, main page]

Skip to content

[MRG] fixes #4577 adds interpolation to PR curve #4936

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
wants to merge 6 commits into from
Closed

[MRG] fixes #4577 adds interpolation to PR curve #4936

wants to merge 6 commits into from

Conversation

chiragnagpal
Copy link

fixes #4577 Added boolean parameter 'interpolated' to sklearn.metrics.precision_recall_curve(). Returns an interpolated, de-noised precision score, if True

@agramfort
Copy link
Member

there is no test

@@ -410,7 +410,19 @@ def precision_recall_curve(y_true, probas_pred, pos_label=None,
# and reverse the outputs so recall is decreasing
last_ind = tps.searchsorted(tps[-1])
sl = slice(last_ind, None, -1)
return np.r_[precision[sl], 1], np.r_[recall[sl], 0], thresholds[sl]

if interpolate is False:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It would be better to do

if interpolate
    ....
else:
    ....

@chiragnagpal
Copy link
Author

@agramfort @arjoly I'll add a unit test and update the documentation and pull in another request, soon

@arjoly
Copy link
Member
arjoly commented Jul 8, 2015

You can also update this pull request by pushing your changes.

@chiragnagpal
Copy link
Author

pushed changes to the pull request.

prec[i] = p_temp
else:
p_temp = prec[i]
return prec, np.r_[recall[sl], 0], thresholds[sl]
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why don't you use the interpolate module from scipy?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This interpolation refers to the issue #4577. The interpolation logic has been taken from http://nlp.stanford.edu/IR-book/html/htmledition/evaluation-of-ranked-retrieval-results-1.html

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

because that does something differently, right?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

scipy.interpolate function is used to perform curve fitting on a set of data points. Here, we don't need to perform curve fitting. all we need is to ensure that the value of precision does not fall, for decreasing values of recall.

@arjoly
Copy link
Member
arjoly commented Jul 9, 2015

my question in #4577 applies for this pr.

@agramfort
Copy link
Member

also can you extend this page with this new feature?

http://scikit-learn.org/stable/auto_examples/model_selection/plot_precision_recall.html

8000

@amueller
Copy link
Member

should we make this the default in the future? That is set the default to None and raise a ChangedBehaviorWarning?

@amueller
Copy link
Member

can you add a reference to the IR book in the docstring maybe?

@chiragnagpal
Copy link
Author

added reference to the IR book in the docstring. Are we looking at making interpolation the default behaviour?

@agramfort
Copy link
Member

please update the example that demos PR curve.

@chiragnagpal
Copy link
Author

@agramfort updated example, with some text on interpolation.

@@ -64,6 +64,17 @@
a precision-recall curve by considering each element of the label indicator
matrix as a binary prediction (micro-averaging).

Precision recall curves tend to have large jitters because increasing the value
of threshold over a small range while reduces the recall too by a small amount,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

by a too small amount?

@agramfort
Copy link
Member

travis is not happy.

@chiragnagpal
Copy link
Author

@agramfort travis is returning an error because in the updated doc string the reference to the IR book includes the author "Schütze" and 'ü' is a non ascii character, I thought of changing it to 'u'. But was unsure if that would be correct.

Here is the error output from the travis build:

ERROR: Failure: SyntaxError (Non-ASCII character '\xc3' in file /home/travis/build/scikit-learn/scikit-learn/sklearn/metrics/ranking.py on line 392, but no encoding declared; see http://www.python.org/peps/pep-0263.html for details (ranking.py, line 391))

@agramfort
Copy link
Member

add

-- coding: utf-8 --

on top of your file.

@GaelVaroquaux
Copy link
Member
GaelVaroquaux commented Jul 24, 2015 via email

@@ -383,6 +386,12 @@ def precision_recall_curve(y_true, probas_pred, pos_label=None,
Increasing thresholds on the decision function used to compute
precision and recall.

References
----------
.. [1] Manning, C. D., Raghavan, P., & Schütze, H. (2008).
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't know where he's from but if he's German you can write Schuetze. I don't think anyone would kill you for Schutze, though.

@amueller
Copy link
Member

@GaelVaroquaux @agramfort how do we feel about changing the default behavior?

@GaelVaroquaux
Copy link
Member
GaelVaroquaux commented Jul 31, 2015 via email

@amueller
Copy link
Member

ok let's do that.

@chiragnagpal
Copy link
Author

@amueller @GaelVaroquaux @agramfort
sorry guys, I know this ones long overdue, I was running against a poster deadline.
Interpolation is still not the default behavior. Let me know if that is fine.

@amueller
Copy link
Member
amueller commented Aug 4, 2015

Do you know how deprecations work?
The default behavior should be None and if None is specified, it will use no interpolation, but warn that in the future (version 0.18), interpolation will be the default.

@chiragnagpal
Copy link
Author

@amueller I have added a deprecation warning. Request Review.

@amueller amueller changed the title referencing #4577 adds interpolation to PR curve [MRG] referencing #4577 adds interpolation to PR curve Aug 5, 2015
@amueller
Copy link
Member
amueller commented Aug 5, 2015

Thanks. Sorry, I'm a bit too busy to review at the moment :-/ hopefully someone else finds the time.

@chiragnagpal
Copy link
Author

👍

@chiragnagpal
Copy link
Author

Hi, this has been pending for a long time!

@amueller
Copy link
Member

Thanks for the ping. Sorry, I've been swamped and it seems no-one else had time.

@raghavrv
Copy link
Member

Could you reference the issue in your PR description? Refer this

@chiragnagpal chiragnagpal changed the title [MRG] referencing #4577 adds interpolation to PR curve [MRG] fixes #4577 adds interpolation to PR curve Dec 11, 2015
@chiragnagpal
Copy link
Author
chiragnagpal commented Dec 11, 2015

@raghavrv i guess the github doesnt allow me to update a PR description from the web interface. however stack overflow suggests that the github api allows it. Weird. 😕 I'll take care of that next time anyhow.

@raghavrv
Copy link
Member

You could just press the edit icon on the top right corner of your PR description box ;)

image

Also nevermind... I was just adding a future note kind-of :) Thanks for the PR!

@chiragnagpal
Copy link
Author
chiragnagpal commented Dec 11, 2015

@raghavrv this could probbly go in the most dedicated github replies in history. :D

@amueller
Copy link
Member
amueller commented Dec 11, 2015

@raghavrv is pretty dedicated indeed ;)

@GaelVaroquaux
Copy link
Member
GaelVaroquaux commented Dec 11, 2015 via email

@raghavrv
Copy link
Member

Haha! Thanks :D

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Precision Recall numbers computed by Scikits are not interpolated (non-standard)
8 participants
0