Support roc_auc_score() for multi-class without probability estimates #18676
Can't you just one-hot encode the predictions to get your score?

```python
from sklearn import datasets
from sklearn.svm import LinearSVC
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import OneHotEncoder

# Get the data
iris = datasets.load_iris()
X, y = iris.data, iris.target

# Create the model
clf = LinearSVC()

# Split the data into train and test
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.5, random_state=0)

# Train the model
clf.fit(X_train, y_train)

# Predict the test data
predicted = clf.predict(X_test)
predicted_probas = OneHotEncoder().fit_transform(predicted.reshape(-1, 1)).toarray()
roc_auc = roc_auc_score(y_test, predicted_probas, multi_class='ovr')
print(roc_auc)
```
@tobyrmanders I made the modification as you suggested, but it gave a somewhat different value. With your implementation using
@luismiguells That's because the two models give different predictions. You would need to peek under the hood at the default parameter values of each model type to figure out why they're giving different classifications. I'll point out that ROC-AUC is not as useful a metric if you don't have probabilities, since this measurement essentially tells you how well your model sorts the samples by label. Without probabilities you cannot know how well the samples are sorted.
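To make the point above concrete, here is a small sketch (my own illustrative example on synthetic data, not code from this thread) comparing binary ROC-AUC computed from continuous decision scores against the same metric computed from hard 0/1 predictions. The hard labels collapse the ranking into just two groups, which is exactly the information loss described above.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.svm import LinearSVC
from sklearn.metrics import roc_auc_score

# Synthetic binary problem (an assumption for illustration).
X, y = make_classification(n_samples=500, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

clf = LinearSVC(max_iter=10000).fit(X_train, y_train)

# Continuous scores preserve the ranking of samples by confidence...
auc_scores = roc_auc_score(y_test, clf.decision_function(X_test))
# ...while hard 0/1 predictions discard it, flattening the ROC curve
# to a single operating point.
auc_labels = roc_auc_score(y_test, clf.predict(X_test))
print(auc_scores, auc_labels)
```

Typically the score-based AUC is noticeably different from the hard-label AUC on the same fitted model, even though both come from the same decision boundary.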
Is there any literature on this? Adding support might not be that easy.
You should use the It should be noted that in this case, you are transforming the problem into a multilabel classification (a set of binary classifications) which you will average afterwards. @jnothman knows better the implications of doing such a transformation.
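A sketch of the transformation described above (my own reading of the suggestion, not an officially endorsed recipe): binarize the multi-class labels into a multilabel indicator matrix, score each class with `decision_function`, and average the resulting one-vs-rest binary AUCs.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import label_binarize
from sklearn.svm import LinearSVC
from sklearn.metrics import roc_auc_score

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.5, random_state=0
)

clf = LinearSVC(max_iter=10000).fit(X_train, y_train)

# Per-class decision scores, shape (n_samples, n_classes).
scores = clf.decision_function(X_test)
# Binarize the true labels into a multilabel indicator matrix.
y_test_bin = label_binarize(y_test, classes=[0, 1, 2])

# Macro-average of the three one-vs-rest binary AUCs.
macro_auc = roc_auc_score(y_test_bin, scores, average="macro")
print(macro_auc)
```

Note that this treats the problem as multilabel, so the averaging semantics differ from the dedicated `multi_class='ovr'` code path, which requires probability estimates.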
I'm confused by this issue. The solution is to use
As a first look at the documentation:
It seems that the approach really relies only on probabilities. Digging into the history (#7663 (comment)), it seems there was an inconsistency between probabilities and non-thresholded estimates, and we limited the multiclass implementation to probabilities.
Piggybacking on this issue: I'm also having trouble getting the AUC metric to work with scikit-learn 1.4. Can anyone take a look here: pycaret/pycaret#3935
I have a multi-class problem. I tried to calculate the ROC-AUC score using the function `metrics.roc_auc_score()`. This function has support for multi-class, but it needs probability estimates, so the classifier needs to have the method `predict_proba()`. For example, `svm.LinearSVC()` does not have it, and I have to use `svm.SVC()`, but it takes so much time with big datasets. Here is an example of what I try to do:

If the classifier is changed to `svm.LinearSVC()` it will throw an error. It would be useful to add support for multi-class problems without the probability estimates, since `svm.LinearSVC()` is faster than `svm.SVC()`.
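The original snippet is not shown in this thread, but a minimal reconstruction of the failing case described in the issue might look like this (an assumption on my part): `LinearSVC` exposes `decision_function` but not `predict_proba`, and the multiclass path of `roc_auc_score` rejects scores that are not probabilities.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.svm import LinearSVC
from sklearn.metrics import roc_auc_score

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.5, random_state=0
)

clf = LinearSVC(max_iter=10000).fit(X_train, y_train)
# LinearSVC provides decision_function but no probability estimates.
assert not hasattr(clf, "predict_proba")

error = None
try:
    # The multiclass code path requires per-row scores that sum to 1
    # (probabilities), so raw decision scores raise a ValueError.
    roc_auc_score(y_test, clf.decision_function(X_test), multi_class="ovr")
except ValueError as exc:
    error = exc
print("Raised:", error)
```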