DOC: sklearn.metrics.auc_score should mention that using probabilities will give better scores #1393

Closed
tjanez opened this issue Nov 22, 2012 · 16 comments

Comments

@tjanez (Contributor) commented Nov 22, 2012

The documentation at http://scikit-learn.org/dev/modules/generated/sklearn.metrics.auc_score.html#sklearn.metrics.auc_score
says that y_score can be either probability estimates of the positive class or binary decisions.

It should warn the reader that when binary decisions are passed, the AUC is computed as if the classifier only returned probabilities of 0 and 1, and thus does not reflect the "real" AUC.

Here is an example:

from sklearn.linear_model import LogisticRegression
from sklearn import metrics
from sklearn import cross_validation
from sklearn import datasets

data = datasets.load_digits()
X, y = data.data, data.target
# make the classification problem binary
X = X[(y == 8) | (y == 6)]
y = y[(y == 8) | (y == 6)]

clf = LogisticRegression(C=0.001)

k_fold = cross_validation.KFold(len(y), k=10, indices=True, shuffle=True, random_state=18)

AUCs = []
AUCs_proba = []
for train, test in k_fold:
    clf.fit(X[train], y[train])
    # AUC computed from hard 0/1 predictions
    AUCs.append(metrics.auc_score(y[test], clf.predict(X[test])))
    # AUC computed from predicted probabilities of the positive class
    AUCs_proba.append(metrics.auc_score(y[test], clf.predict_proba(X[test])[:, 1]))

print "AUCs: "
print AUCs
print "AUCs (with probabilities): "
print AUCs_proba

This is the output:

AUCs: 
[1.0, 0.97222222222222221, 1.0, 0.97058823529411764, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0]
AUCs (with probabilities): 
[1.0, 1.0, 1.0, 0.99673202614379086, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0]

I admit this is not the best example, since the difference between AUCs and AUCs_proba can be much bigger in practice, but I wanted to use a built-in data set.

Note that in every fold the AUC computed from binary decisions is never higher than the AUC computed from probability estimates.

@tjanez (Contributor, Author) commented Nov 22, 2012

Great!

Please reference the relevant pull request here, so I can give you my input.

@mblondel (Member)

I think that what is meant by binary decision is not the output of predict but the output of decision_function. And binary refers to binary classification, not binary values. This is indeed a bit unclear. AUC is the area under the ROC curve, and the ROC curve is built by computing the true positive and false positive rates at different decision thresholds. So y_score needs to contain real values. We could check whether np.unique(y_score) contains only 2 values and raise an exception in that case.
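To make the threshold point concrete, here is a minimal sketch (made-up labels and scores, using sklearn.metrics.roc_curve): hard 0/1 predictions yield only a couple of distinct thresholds, while real-valued scores yield one threshold per distinct value.

import numpy as np
from sklearn.metrics import roc_curve

y_true = np.array([0, 0, 1, 1, 1, 0])

# Hard 0/1 predictions: only two distinct score values, so the ROC
# "curve" collapses to a single point joined to (0, 0) and (1, 1).
fpr, tpr, thresholds = roc_curve(y_true, np.array([0, 1, 1, 1, 0, 0]))
print(len(thresholds))

# Real-valued scores (decision_function or predict_proba): one
# threshold per distinct score value, i.e. a full ROC curve.
fpr, tpr, thresholds = roc_curve(y_true, np.array([-0.5, 1.2, 0.8, 2.1, -0.1, 0.3]))
print(len(thresholds))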

@GaelVaroquaux (Member)

We could check if np.unique(y_score) contains only 2 values and raise an exception in that case.

I'd rather have a warning than an exception: a two-valued y_score could be legitimate. That said, a warning would be useful.

@tjanez (Contributor, Author) commented Nov 23, 2012

This is a bit unclear.

Yes, I agree. Isn't decision_function a method of regression models? Computing AUC for such models doesn't make sense.
Anyhow, the documentation should be clearer about this.

We could check if np.unique(y_score) contains only 2 values

I agree with @mblondel that we should check for this case and with @GaelVaroquaux that it should only be a warning.
For example, you could have a classifier that doesn't give you probabilities, only 0s and 1s. In this case, when computing the AUC, you would interpret 0s as probability 0.0 and 1s as probability 1.0.
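As a side note on that single-threshold case (an illustration, not from the thread): with only 0/1 scores, the ROC curve has one interior point, and the AUC reduces to (TPR + TNR) / 2, i.e. the balanced accuracy. A quick check with made-up labels:

import numpy as np
from sklearn.metrics import roc_auc_score  # named auc_score at the time of this issue

y_true = np.array([0, 0, 1, 1, 1, 0])
y_pred = np.array([0, 1, 1, 1, 0, 0])  # hard 0/1 predictions

tpr = y_pred[y_true == 1].mean()        # sensitivity
tnr = (1 - y_pred[y_true == 0]).mean()  # specificity
print((tpr + tnr) / 2)                  # 0.666...
print(roc_auc_score(y_true, y_pred))    # same value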

@mblondel (Member)

Yes, I agree. Isn't decision_function a method of regression models? Computing AUC for such models doesn't make sense.

decision_function gives you the dot product between the coefficient vectors and the data. It can be interpreted as a score, hence it can be used for AUC.

@tjanez (Contributor, Author) commented Nov 23, 2012

decision_function gives you the dot product between the coefficient vectors and the data. It can be interpreted as a score, hence it can be used for AUC.

Yes, it can be interpreted as a score, but you also need the true class labels to be able to compute the ROC curve and, from that, the AUC.

That's why I said computing AUC for regression models doesn't make sense.

@mblondel (Member)
>>> from sklearn.datasets import make_classification
>>> X, y = make_classification(n_classes=2)
>>> from sklearn.svm import SVC
>>> clf = SVC(kernel="rbf")
>>> clf.fit(X, y)
SVC(C=1.0, cache_size=200, class_weight=None, coef0=0.0, degree=3, gamma=0.0,
  kernel='rbf', max_iter=-1, probability=False, shrinking=True, tol=0.001,
  verbose=False)
>>> from sklearn.metrics import auc_score
>>> y_score = clf.decision_function(X).ravel()
>>> auc_score(y, y_score)
0.998

@tjanez (Contributor, Author) commented Nov 24, 2012

@mblondel, what did you try to demonstrate with your example?

I think we actually agree on when and how to compute the AUC. Please, read my previous comment.

@amueller (Member)

@tjanez decision_function is not a function for regression models; it is a function that some classifiers have. There are no regression models with a decision_function.

@mblondel (Member)

I think we actually agree on when and how to compute the AUC. Please, read my previous comment.

Well, you wrote that decision_function doesn't make sense for computing the AUC...

@tjanez (Contributor, Author) commented Nov 24, 2012

Well, you wrote that decision_function doesn't make sense for computing the AUC...

Ok, this is a misunderstanding then. I said that "computing AUC for regression models doesn't make sense". I agree with you that you can use the output of the decision function instead of probabilities for computing the AUC.

To return to the original problem of this issue, do we agree that the auc_score method should check if np.unique(y_score) contains only 2 values and raise a warning in that case?
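A minimal sketch of the proposed check (the helper name and warning message are hypothetical):

import warnings
import numpy as np

def _warn_if_binary_scores(y_score):
    # Hypothetical helper sketching the proposal: warn rather than
    # raise when y_score takes only two distinct values, since the
    # AUC is then computed from a single-threshold, degenerate ROC curve.
    if np.unique(y_score).size == 2:
        warnings.warn("y_score contains only two distinct values; "
                      "pass continuous scores (predict_proba or "
                      "decision_function) for a meaningful AUC.")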

@tjanez (Contributor, Author) commented Nov 24, 2012

There are no regression models with a decision_function.

@amueller, for example, LinearRegression has a decision_function: http://scikit-learn.org/dev/modules/generated/sklearn.linear_model.LinearRegression.html#sklearn.linear_model.LinearRegression.decision_function

@mblondel (Member)

Yes, +1 from me. We might want to implement a small Cython utility function check_binary (to be put in utils/arrayfuncs.pyx) instead of using np.unique, which could be expensive for big arrays. check_binary could be useful in BernoulliNB, for instance.
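Sketching the idea in plain Python (the real helper would be written in Cython, as proposed above):

import numpy as np

def check_binary(a):
    # Pure-Python sketch of the proposed utility: scan once and stop
    # as soon as a third distinct value appears, avoiding the full
    # sort/copy that np.unique performs on large arrays.
    seen = set()
    for x in np.asarray(a).ravel():
        seen.add(x)
        if len(seen) > 2:
            return False
    return True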

@amueller (Member)

+1 for raising a warning.
About LinearRegression: I consider this a bug. It is an artifact of the inheritance structure. I'll open an issue.

@BassT commented Apr 6, 2016

+1 also for warning.


@amueller (Member)

Fixed in master.
