DOC: sklearn.metrics.auc_score should mention that using probabilities will give better scores #1393
Comments
Great! Please reference the relevant pull request here, so I can give you my input.
I think that what is meant by binary decision is not the output of
I'd rather have a warning: it could be legitimate. That said a warning
Yes, I agree. Isn't
I agree with @mblondel that we should check for this case and with @GaelVaroquaux that it should only be a warning.
Yes, it can be interpreted as a score, but you also need class values to be able to compute the ROC curve and from that the AUC. That's why I said computing AUC for regression models doesn't make sense.
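As an illustration of this point (not taken from the thread; it uses roc_auc_score, the current name of auc_score), a minimal sketch showing that the metric needs discrete class labels in y_true alongside continuous scores, and is expected to reject a continuous regression target:

import numpy as np
from sklearn.metrics import roc_auc_score

y_true = np.array([0, 0, 1, 1])            # binary class labels are required
y_score = np.array([0.1, 0.4, 0.35, 0.8])  # any continuous score is fine
print(roc_auc_score(y_true, y_score))      # 0.75

y_reg = np.array([1.3, 2.7, 0.5, 3.1])     # a continuous regression target
try:
    roc_auc_score(y_reg, y_score)
except ValueError as exc:
    print(exc)                             # continuous targets are rejected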
>>> from sklearn.datasets import make_classification
>>> X, y = make_classification(n_classes=2)
>>> from sklearn.svm import SVC
>>> clf = SVC(kernel="rbf")
>>> clf.fit(X, y)
SVC(C=1.0, cache_size=200, class_weight=None, coef0=0.0, degree=3, gamma=0.0,
kernel='rbf', max_iter=-1, probability=False, shrinking=True, tol=0.001,
verbose=False)
>>> from sklearn.metrics import auc_score
>>> y_score = clf.decision_function(X).ravel()
>>> auc_score(y, y_score)
0.998
@mblondel, what did you try to demonstrate with your example? I think we actually agree on when and how to compute the AUC. Please, read my previous comment.
@tjanez
Well, you wrote that decision_function doesn't make sense for computing the
Ok, this is a misunderstanding then. I said that "computing AUC for regression models doesn't make sense". I agree with you that you can use the output of the decision function instead of probabilities for computing the AUC. To return to the original problem of this issue, do we agree that the
@amueller, for example, LinearRegression has a decision_function: http://scikit-learn.org/dev/modules/generated/sklearn.linear_model.LinearRegression.html#sklearn.linear_model.LinearRegression.decision_function
Yes +1 for me. We might want to implement a small Cython utility function
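For illustration only, a rough sketch of what such a check could look like, written in plain Python rather than Cython and not the actual scikit-learn implementation:

import warnings
import numpy as np

def warn_if_binary_scores(y_score):
    # Hypothetical helper: warn when y_score looks like hard 0/1 decisions
    # rather than probability estimates or decision_function values.
    values = np.unique(np.asarray(y_score))
    if values.size <= 2 and np.isin(values, [0, 1]).all():
        warnings.warn(
            "y_score contains only 0/1 values; the AUC will be computed as "
            "if the classifier output only those two scores, which usually "
            "underestimates the 'real' AUC.",
            UserWarning,
        )

With such a helper, warn_if_binary_scores(clf.predict(X)) would warn, while warn_if_binary_scores(clf.predict_proba(X)[:, 1]) would stay silent.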
+1 for raising a warning.
+1 also for warning.
fixed in master. |
The documentation at http://scikit-learn.org/dev/modules/generated/sklearn.metrics.auc_score.html#sklearn.metrics.auc_score
says that y_score can be either probability estimates of the positive class or binary decisions. It should warn the reader that, given binary decisions, it can only compute the AUC as if the classifier returned the probabilities 0 and 1, and thus will not give the "real" AUC.
Here is an example:
This is the output:
I admit this is not a very good example, as the difference between AUCs and AUCs_proba could be a lot bigger in practice, but I wanted to use a built-in data set. Note that the AUC computed from binary decisions is always lower than the AUC computed from probability estimates.
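As an illustration of the effect described above (not the author's original example), a minimal sketch using the current API, where roc_auc_score replaces auc_score:

from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score

# Noisy synthetic data so the classifier cannot be perfect.
X, y = make_classification(n_samples=500, n_classes=2, flip_y=0.2, random_state=0)
clf = LogisticRegression(max_iter=1000).fit(X, y)

auc_binary = roc_auc_score(y, clf.predict(X))             # hard 0/1 decisions
auc_proba = roc_auc_score(y, clf.predict_proba(X)[:, 1])  # probability estimates
print(auc_binary, auc_proba)  # the probability-based AUC is typically the larger one

Scoring hard decisions collapses the ROC curve to a single operating point, which is exactly the behavior this issue asks the documentation to warn about.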