-
-
Notifications
You must be signed in to change notification settings - Fork 25.9k
[MRG] fixes #4577 adds interpolation to PR curve #4936
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
there is no test |
@@ -410,7 +410,19 @@ def precision_recall_curve(y_true, probas_pred, pos_label=None, | |||
# and reverse the outputs so recall is decreasing | |||
last_ind = tps.searchsorted(tps[-1]) | |||
sl = slice(last_ind, None, -1) | |||
return np.r_[precision[sl], 1], np.r_[recall[sl], 0], thresholds[sl] | |||
|
|||
if interpolate is False: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It would be better to do
if interpolate
....
else:
....
@agramfort @arjoly I'll add a unit test and update the documentation and pull in another request, soon |
You can also update this pull request by pushing your changes. |
pushed changes to the pull request. |
prec[i] = p_temp | ||
else: | ||
p_temp = prec[i] | ||
return prec, np.r_[recall[sl], 0], thresholds[sl] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why don't you use the interpolate module from scipy?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This interpolation refers to the issue #4577. The interpolation logic has been taken from http://nlp.stanford.edu/IR-book/html/htmledition/evaluation-of-ranked-retrieval-results-1.html
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
sure. This is ok. I am just asking why not using:
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
because that does something differently, right?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
scipy.interpolate function is used to perform curve fitting on a set of data points. Here, we don't need to perform curve fitting. all we need is to ensure that the value of precision does not fall, for decreasing values of recall.
my question in #4577 applies for this pr. |
also can you extend this page with this new feature? http://scikit-learn.org/stable/auto_examples/model_selection/plot_precision_recall.html |
should we make this the default in the future? That is set the default to None and raise a ChangedBehaviorWarning? |
can you add a reference to the IR book in the docstring maybe? |
added reference to the IR book in the docstring. Are we looking at making interpolation the default behaviour? |
please update the example that demos PR curve. |
@agramfort updated example, with some text on interpolation. |
@@ -64,6 +64,17 @@ | |||
a precision-recall curve by considering each element of the label indicator | |||
matrix as a binary prediction (micro-averaging). | |||
|
|||
Precision recall curves tend to have large jitters because increasing the value | |||
of threshold over a small range while reduces the recall too by a small amount, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
by a too small amount?
travis is not happy. |
@agramfort travis is returning an error because in the updated doc string the reference to the IR book includes the author "Schütze" and 'ü' is a non ascii character, I thought of changing it to 'u'. But was unsure if that would be correct. Here is the error output from the travis build: ERROR: Failure: SyntaxError (Non-ASCII character '\xc3' in file /home/travis/build/scikit-learn/scikit-learn/sklearn/metrics/ranking.py on line 392, but no encoding declared; see http://www.python.org/peps/pep-0263.html for details (ranking.py, line 391)) |
add -- coding: utf-8 --on top of your file. |
@agramfort travis is returning an error because in the updated doc string the
reference to the IR book includes the author "Schütze" and 'ü' is a non ascii
character, I thought of changing it to 'u'. But was unsure if that would be
correct.
I think that it's better to do this. I am worried about creeping UTF-8
characters in source code --and my name is written Gaël :)--
|
@@ -383,6 +386,12 @@ def precision_recall_curve(y_true, probas_pred, pos_label=None, | |||
Increasing thresholds on the decision function used to compute | |||
precision and recall. | |||
|
|||
References | |||
---------- | |||
.. [1] Manning, C. D., Raghavan, P., & Schütze, H. (2008). |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't know where he's from but if he's German you can write Schuetze. I don't think anyone would kill you for Schutze, though.
@GaelVaroquaux @agramfort how do we feel about changing the default behavior? |
@GaelVaroquaux @agramfort how do we feel about changing the default behavior?
I feel quite OK about it, but I think that we need to go through the
standard deprecation cycle ( :( ). This is important to not introduce
unexpected changes in behavior.
|
ok let's do that. |
@amueller @GaelVaroquaux @agramfort |
Do you know how deprecations work? |
@amueller I have added a deprecation warning. Request Review. |
Thanks. Sorry, I'm a bit too busy to review at the moment :-/ hopefully someone else finds the time. |
👍 |
Hi, this has been pending for a long time! |
Thanks for the ping. Sorry, I've been swamped and it seems no-one else had time. |
Could you reference the issue in your PR description? Refer this |
@raghavrv i guess the github doesnt allow me to update a PR description from the web interface. however stack overflow suggests that the github api allows it. Weird. 😕 I'll take care of that next time anyhow. |
@raghavrv this could probbly go in the most dedicated github replies in history. :D |
@raghavrv is pretty dedicated indeed ;) |
Haha! Thanks :D |
fixes #4577 Added boolean parameter 'interpolated' to sklearn.metrics.precision_recall_curve(). Returns an interpolated, de-noised precision score, if True