ENH un-confuse pos_label use for label indicator matrices #1992
Comments
PS: the only non-
You cannot change neg_label to -1 by default, as that would break compatibility. The 0-1 encoding is quite natural for an indicator matrix. It can also be used to do some clever matrix operations (Naive Bayes uses that). In addition, all binary classifiers in the scikit support arbitrary encoding as long as the positive label is greater than the negative label. While I can see some value in removing pos_label in other PRs, here I am all for applying the motto "if it ain't broken, don't fix it". (Changing parameter names has a cost, since we need to take care of the warnings for two releases and we ask our users to change their code base. We should do it only when there's real value to it.)
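As a rough illustration of the "clever matrix operations" point, the sketch below uses plain NumPy (not scikit-learn internals) to sum feature counts per class with a single matrix product, which is the kind of statistic a multinomial Naive Bayes model needs. The data and variable names are made up for the example.

```python
import numpy as np

# Toy data: 4 samples, 3 count features (values are illustrative only).
X = np.array([[2, 0, 1],
              [0, 3, 0],
              [1, 1, 0],
              [0, 0, 4]])

# 0-1 indicator matrix: one row per sample, one column per class.
Y = np.array([[1, 0],
              [0, 1],
              [1, 0],
              [0, 1]])

# Because Y holds only 0s and 1s, Y.T @ X adds up the feature rows
# belonging to each class: the result has shape (n_classes, n_features).
feature_count = Y.T @ X

# Column sums of Y give the number of samples per class.
class_count = Y.sum(axis=0)
```

With a -1/1 encoding the same product would subtract the rows of every other class, so the result would no longer be a per-class count.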
Another advantage of the 0-1 encoding is that you can easily transform the indicator matrix to a sparse format.
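A minimal sketch of the sparsity argument, assuming SciPy is available; the matrices are invented for illustration. A 0-1 indicator matrix stores only its positive entries when converted, whereas a -1/1 matrix has no zeros to drop.

```python
import numpy as np
from scipy import sparse

# 0-1 indicator matrix for 4 samples and 3 classes (illustrative data).
Y01 = np.array([[1, 0, 0],
                [0, 0, 1],
                [0, 1, 0],
                [0, 0, 1]])

# Same assignments encoded with -1 as the negative indicator value.
Ypm = 2 * Y01 - 1

# Converting from a dense array keeps only the nonzero entries.
nnz_01 = sparse.csr_matrix(Y01).nnz  # one stored value per sample
nnz_pm = sparse.csr_matrix(Ypm).nnz  # every entry is nonzero
```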
I didn't suggest changing default
I think we've sorted this out in metrics where it was most confusing. I'm closing.
Some metrics take a pos_label argument and interpret it as indicating the positive class in a label indicator matrix (multilabel target representation). This meaning should be removed because it is unusual for pos_label to not be 1, and it can be confused with the positive class label (which corresponds to a column in a label indicator matrix, not a value).

Should pos_label also be removed from LabelBinarizer (where it means "positive indicator value") and fixed to 1? I assume neg_label should not be fixed to 0, as some classifiers work with -1 (however, perhaps that's not a desirable flexibility in label indicator matrices).

I don't particularly like pos_label being used there either, given that label is synonymous with class elsewhere. Perhaps the LabelBinarizer parameters, where not removed, should be renamed to pos_indicator and neg_indicator.

This was brought up at #1983 and #1985 (I accidentally posed it at one when it was meant for the other!).
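To make the distinction concrete, here is a small sketch using LabelBinarizer with its neg_label and pos_label parameters. The class names are invented for the example, and the behaviour shown is that of recent scikit-learn releases, which may differ from the version discussed in this thread.

```python
from sklearn.preprocessing import LabelBinarizer

# Three made-up classes; each one becomes a column of the indicator matrix.
y = ["cat", "dog", "bird", "dog"]

# Defaults: neg_label=0, pos_label=1.
lb = LabelBinarizer()
Y01 = lb.fit_transform(y)
classes = list(lb.classes_)  # columns follow the sorted class labels

# Here pos_label/neg_label are the *values* written into the matrix,
# not the name of any class.
lb_pm = LabelBinarizer(neg_label=-1, pos_label=1)
Ypm = lb_pm.fit_transform(y)
```

In both matrices the class "dog" is identified by its column index, while pos_label only controls which number marks membership. That is the sense in which a name like pos_indicator would describe the parameter more accurately.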