order of returned probabilities unclear for cross_val_predict with method=predict_proba #7863
Comments
Yes, I suppose this (and disagreements in the set of classes between splits) was overlooked. (For internal models, I'm fairly sure that |
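The disagreement between splits mentioned above can be illustrated with a small sketch. It assumes a fold whose training data lacked one class, so its probability matrix has fewer columns than the full label set. The helper name align_proba and the zero fill value are hypothetical choices for illustration, not part of scikit-learn's API.

```python
import numpy as np


def align_proba(proba, fold_classes, all_classes):
    """Hypothetical helper: place per-fold probability columns into the
    column layout of the full, sorted class set. Classes that the fold
    never saw get a probability of 0 (an assumption made for this sketch)."""
    all_classes = np.asarray(all_classes)    # must already be sorted
    fold_classes = np.asarray(fold_classes)  # subset of all_classes
    out = np.zeros((proba.shape[0], all_classes.shape[0]), dtype=float)
    # Column index of each fold class within the sorted full class set.
    cols = np.searchsorted(all_classes, fold_classes)
    out[:, cols] = proba
    return out


all_classes = np.array([0, 1, 2])
# This fold was trained without class 1, so it only reports two columns.
fold_proba = np.array([[0.7, 0.3],
                       [0.2, 0.8]])
aligned = align_proba(fold_proba, fold_classes=[0, 2], all_classes=all_classes)
```

Without an alignment step like this, stacking the outputs of different folds would mix up columns whenever the folds disagree on the set of classes.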
Can I start working on this? |
After running a few tests on the iris dataset using LogisticRegression as the estimator, it became clear that the order of the classes appearing did not matter in the final result. This gives a simple and clear explanation. |
Iris aside, it's possible to create a cross-validation strategy that will
Firstly the documentation should point out that the classes will be in
|
True, we could miss out on a few observations in cross-validation. As for confirming that the classes are sorted: these lines of the cross_val_predict function already ensure that 'predict_proba' is only passed as the method to estimators capable of calculating probabilities, and all such estimators return the classes in sorted order. Do we need any other mode of confirmation?

    if not callable(getattr(estimator, method)):
        raise AttributeError('{} not implemented in estimator'
                             .format(method))
|
Well, we don't promise that all return classes in sorted order; all store
|
Then how about, in the |
That sort of thing, yes.
|
That would be a good idea, as it would make sure third-party classifiers follow the same standard. I know it works today for LogisticRegression and RandomForestClassifier. However, I am unsure about XGBClassifier, as that is outside sklearn. |
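A check along these lines could look roughly like the sketch below. The function classes_are_sorted and the two stub classes are hypothetical and only stand in for fitted classifiers; the one assumption taken from scikit-learn's conventions is that a fitted classifier exposes its labels in a classes_ attribute.

```python
import numpy as np


def classes_are_sorted(estimator):
    """Hypothetical check: True if the fitted estimator's classes_ are
    unique and in sorted order, i.e. the order np.unique would produce."""
    classes = np.asarray(estimator.classes_)
    return bool(np.array_equal(classes, np.unique(classes)))


class SortedStub:
    """Stand-in for a fitted classifier with sorted classes_."""
    classes_ = np.array(["a", "b", "c"])


class UnsortedStub:
    """Stand-in for a third-party classifier that keeps first-seen order."""
    classes_ = np.array(["b", "a", "c"])


sorted_ok = classes_are_sorted(SortedStub())
unsorted_ok = classes_are_sorted(UnsortedStub())
```

A caller could run such a check once after fitting and raise or reorder columns if it fails, rather than silently trusting the column order.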
cross_val_predict has a new method parameter, which is typically set to "predict_proba" to return probabilities for each class.
However, the order of the classes returned is unclear. Either self.classes_ needs to be set, or the results need to be returned in a predictable order. Otherwise we have a list of probabilities for each class but no way to know which column relates to which class.
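To make the problem concrete, here is a minimal sketch of mapping probability columns back to labels. It assumes a recent scikit-learn release, where the columns returned for method="predict_proba" follow the sorted unique labels of y, and it uses stratified folds so that every class appears in every training split. The choice of dataset, estimator and max_iter is illustrative only.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_val_predict

X, y = load_iris(return_X_y=True)
clf = LogisticRegression(max_iter=1000)
# Stratified folds keep all three classes present in each training split.
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)

proba = cross_val_predict(clf, X, y, cv=cv, method="predict_proba")

# Assumption: column j corresponds to the j-th sorted unique label.
classes = np.unique(y)
predicted = classes[proba.argmax(axis=1)]
```

Because cross_val_predict returns only the array and not the fitted estimators, the caller has to reconstruct the label order separately, as np.unique(y) does here.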