-
-
Notifications
You must be signed in to change notification settings - Fork 25.8k
[MRG+2] Deprecate reorder parameter in auc #9851
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
I would deprecate To be honest I am not sure it is worth adding |
@lesteve Thanks for the instant reply :) |
I would be in favour of removing |
@lesteve Thanks. I have removed parameter tol. auc([2, 1, 3, 4], [4, 5, 6, 7])
# x is neither increasing nor decreasing : [2 1 3 4].
# np.diff(x) contains 2 positive values and 1 negative values. Since x can be either increasing or decreasing, I'm wondering what's the best information to provide. |
Maybe the most negative value and the most positive value of np.diff? This way you can see that his how far your array is from being increasing or decreasing. |
@lesteve Thanks for the instant reply :) I have updated the error message. auc([2.5, 1.6, 3.7, 4.8], [5, 6, 7, 8])
# ValueError: x is neither increasing nor decreasing : [ 2.5 1.6 3.7 4.8].
# np.diff(x) contains 2 positive values and 1 negative values.
# The most positive value in np.diff(x) : x[2] - x[1] = 2.1.
# The most negative value in np.diff(x) : x[1] - x[0] = -0.9. |
@@ -426,7 +427,20 @@ def test_auc_errors(): | |||
assert_raises(ValueError, auc, [0.0], [0.1]) | |||
|
|||
# x is not in order | |||
assert_raises(ValueError, auc, [1.0, 0.0, 0.5], [0.0, 0.0, 0.0]) | |||
error_message = ("x is neither increasing nor decreasing : " | |||
"[ 2.5 1.6 3.7 4.8]. np.diff(x) contains 2 positive " |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
numpy 1.14 will change np array repr/str so don't hard-code it like this instead use '{}'.format(your_array)
.
sklearn/metrics/ranking.py
Outdated
"x[{}] = {}. The most negative value in " | ||
"np.diff(x) : x[{}] - x[{}] = {}." | ||
.format(x, (dx > 0).sum(), (dx < 0).sum(), | ||
dx.argmax() + 1, dx.argmax(), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Define the variables that you reuse outside of the .format
@lesteve Thanks :) Both comments are addressed. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Adjust the title to remove reference to tol?
sklearn/metrics/ranking.py
Outdated
y : array, shape = [n] | ||
y coordinates. | ||
reorder : boolean, optional (default=False) | ||
reorder : boolean, optional (default=None) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can we make the default something semantically meaningful, like 'warn' or 'deprecated', please?
sklearn/metrics/ranking.py
Outdated
@@ -47,13 +47,18 @@ def auc(x, y, reorder=False): | |||
Parameters | |||
---------- | |||
x : array, shape = [n] | |||
x coordinates. | |||
Increasing or decreasing x coordinates. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How about "x coordinates. These must be either monotonic increasing or monotonic decreasing."? Otherwise, this seems vague.
sklearn/metrics/ranking.py
Outdated
"the x array is not increasing: %s" % x) | ||
maxpos = dx.argmax() | ||
minpos = dx.argmin() | ||
raise ValueError("x is neither increasing nor decreasing " |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this message is a bit obscure to a reader of the code, more so than to someone who receives the error by surprise. It deserves a comment, perhaps a reference to the issue.
sklearn/metrics/ranking.py
Outdated
raise ValueError("Reordering is not turned on, and " | ||
"the x array is not increasing: %s" % x) | ||
# Output the most positive value and most negative value in | ||
# np.diff(x) for debugging. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is it meant to handle the case that adjacent values are out of order but only within a small tolerance?
sklearn/metrics/ranking.py
Outdated
minpos = dx.argmin() | ||
raise ValueError("x is neither increasing nor decreasing " | ||
": {}. The most positive value in np.diff(x)" | ||
" : x[{}] - x[{}] = {}. The most negative " |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I suspect that this error message will most often leave the user bewildered.
@jnothman Thanks :) The error message is suggested by @lesteve here. (In fact, I have slightly shortened it just now.) I don't have strong opinion on this. Seems that it might indeed introduce some confusion. But at least the original error message is not right. So do you think |
Why do we want to allow increasing or decreasing? Why don't we require it to be increasing, and state that there is a negative diff of ...? I think it would be much clearer to the user why this is relevant. |
It's not like we cause great harm to the user by asking them to reverse their inputs. |
@jnothman Thanks. From my perspective, I agree with you. Allow x to be decreasing seems awkward but the implementation is there. So what's your final decision or do you want me to further investigate the origin of the code? |
Yes, okay. So to be sure, there's nothing wrong with providing the user
with an option to reorder...
I'm not entirely convinced of the merits of this PR then. Deprecation and
removal without clear cause breaks someone's code. Why do we feel it is
appropriate/necessary to remove an existing feature (albeit of no need in
the library)?
|
@jnothman Personally, I would prefer to remain reorder parameter. Like the support for decreasing x, they have long been here and we will not need much effort to support them (though I don't think they are useful). The deprecation is proposed by lesteve, you might ping him for more reasons. Or if you are pretty sure, please give a clear instruction and I'll remove the deprecation and leave this PR as a doc improvement :) |
I can wait until @lesteve replies :)
|
Apologies for taking some time to come back to this one. In other words, a way to put this in plots: import matplotlib.pyplot as plt
plt.figure()
plt.plot([0, 1, 1], [0, 0, 1], 'o-')
plt.figure()
# this curve has the two last points swapped
plt.plot([0, 1, 1], [0, 1, 0], 'o-')
plt.show() Before #9786 was merged we would be switching from the first curve to the second curve because of floating point noise in In the case we fixed in #9786, fpr was not increasing only for floating point reasons. In fact we were guaranteed that fpr were in the right order because they were computed with decreasing thresholds. |
@lesteve Thanks a lot for the detailed reply :) |
Codecov Report
@@ Coverage Diff @@
## master #9851 +/- ##
==========================================
- Coverage 96.17% 96.16% -0.01%
==========================================
Files 336 336
Lines 62413 62504 +91
==========================================
+ Hits 60024 60108 +84
- Misses 2389 2396 +7
Continue to review full report at Codecov.
|
I suppose you think users are as foolish as we are, to think that setting
reorder=True might be an answer to their problems. maybe then tweaking the
error message to say "this may be a result of numerical instability in
which case reorder=True is not a recommended solution"
|
ping @lesteve |
"ascending in case of ties" really should be stated as "y has a monotonic
relation to x".
I really don't mind much here. Fine. Get rid of reorder.
…On 11 October 2017 at 13:43, Hanmin Qin ***@***.***> wrote:
ping @lesteve <https://github.com/lesteve>
I think I'll wait for the final decision from @jnothman
<https://github.com/jnothman> and @lesteve <https://github.com/lesteve>.
Personally, after @lesteve <https://github.com/lesteve>'s explanation, I
might prefer to deprecate reorder. It seems worthwhile to force users to
change their code since the potential error might be significant. Also,
another important reason is that, I don't think the reorder parameter is
designed for general case, because sometimes users might don't want to use
y to break ties (like the case below). So it might be better that users
sort x and arrange y themselves to get the result they expect.
[image: index]
<https://user-images.githubusercontent.com/12003569/31419773-40b1c770-ae70-11e7-841a-dd94172a7bb2.png>
Thanks for the the detailed instructions from all of you :)
—
You are receiving this because you were mentioned.
Reply to this email directly, view it on GitHub
<#9851 (comment)>,
or mute the thread
<https://github.com/notifications/unsubscribe-auth/AAEz6wwvhpy5bI7UF1r84rSc7bWAlJJmks5srCs_gaJpZM4PoKta>
.
|
@jnothman I have updated a more detailed reason for deprecation. Do you think it is enough for you to remove the |
Sure |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Some comments, otherwise LGTM.
sklearn/metrics/ranking.py
Outdated
general use) and is no longer used there. What's more, the result | ||
from auc will be significantly influenced if x is sorted | ||
unexpectedly due to slight floating point error (See issue #9786). | ||
Future (and default) behavior is equivalent to `reorder=False`. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is rst and not markdown so you need double-backticks around reorder=False. Single backtick is italic in rst.
sklearn/metrics/ranking.py
Outdated
@@ -83,8 +94,13 @@ def auc(x, y, reorder=False): | |||
raise ValueError('At least 2 points are needed to compute' | |||
' area under curve, but x.shape = %s' % x.shape) | |||
|
|||
if reorder != 'deprecated': | |||
warnings.warn("The `reorder` parameter has been deprecated " | |||
"in version 0.20 and will be removed in 0.22.", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Best practices: the warning ideally:
- explains why there is a warning (maybe here the doc + release notes is enough in this case)
- what to do to get rid of it. In this case say it is recommended to not set reorder (same behaviour as reorder=False) and make sure to have x and y ordered (maybe add as the points of a curve. I am not sure how to phrase this better ...)
@lesteve Thanks :) Comments addressed. |
@jnothman any comments on this one? If I don't hear anything I'll merge it in a few days. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
Good stuff @qinhanmin2014! |
Reference Issue
proposed in #9786 by @lesteve
What does this implement/fix? Explain your changes.
(1)add tolerance to roc_auc_score when checking whether x is increasing. (discarded)
(2)improve error message when x is neither increasing nor decreasing.
(3)deprecate reorder parameter. reorder=False is now the default behaviour.
reorder=True is introduced in 9f18586 for roc_auc_score. Now it is not used in roc_auc_score due to #9786.
Any other comments?
cc @jnothman @lesteve