ENH un-confuse pos_label use for label indicator matrices · Issue #1992 · scikit-learn/scikit-learn · GitHub

ENH un-confuse pos_label use for label indicator matrices #1992


Closed
jnothman opened this issue May 23, 2013 · 5 comments

Comments

@jnothman
Member

Some metrics take a pos_label argument and interpret it as indicating the positive class in a label indicator matrix (multilabel target representation). This meaning should be removed because it is unusual for pos_label to not be 1, and it can be confused with the positive class label (which corresponds to a column in a label indicator matrix, not a value).
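To make the two readings concrete, here is a minimal NumPy-only sketch. The class names and the matrix are made up for illustration; nothing here is scikit-learn's own code.

```python
import numpy as np

# Hypothetical classes and a small label indicator matrix:
# rows are samples, columns are classes, 1 means "class present".
classes = np.array(["bird", "cat", "dog"])
Y = np.array([[0, 1, 0],
              [1, 0, 1],
              [0, 0, 1]])

# Reading 1: "positive class" selects a COLUMN of the matrix.
dog_column = Y[:, np.flatnonzero(classes == "dog")[0]]

# Reading 2: "positive indicator value" is the cell VALUE that means "present".
pos_value = 1
present = (Y == pos_value)
labels_per_sample = [classes[row].tolist() for row in present]
```

The first reading answers "which samples are dogs?", the second answers "which number marks membership?". Using one parameter name for both is the source of the confusion described above.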

Should pos_label also be removed from LabelBinarizer (where it means "positive indicator value") and fixed to 1? I assume neg_label should not be fixed to 0, as some classifiers work with -1 (however, perhaps that's not a desirable flexibility in label indicator matrices).

I don't particularly like pos_label being used there either, given that label is synonymous with class elsewhere. Perhaps the LabelBinarizer parameters, where not removed, should be renamed to pos_indicator and neg_indicator.
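For reference, a short sketch of what the two LabelBinarizer parameters control, assuming a scikit-learn release in which `neg_label` and `pos_label` are still constructor arguments. The sample labels are invented.

```python
from sklearn.preprocessing import LabelBinarizer

y = ["cat", "dog", "bird", "dog"]

# Default encoding: absent -> 0, present -> 1.
lb = LabelBinarizer()
Y = lb.fit_transform(y)
# lb.classes_ is sorted, so the columns stand for bird, cat, dog.

# Same columns, but absent entries are written as -1 instead of 0.
lb_pm = LabelBinarizer(neg_label=-1, pos_label=1)
Y_pm = lb_pm.fit_transform(y)
```

In both cases the parameters change only the values written into the matrix, never which column belongs to which class, which is why "indicator" may describe them better than "label".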

This was brought up at #1983 and #1985 (I accidentally posted it at one when it was meant for the other!).

@jnothman
Member Author

PS: the only non-1 value I can imagine setting LabelBinarizer.pos_label to is True, but currently the implementation explicitly sets dtype=np.int (even though bools would be more compact and suitable for many purposes).
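A rough illustration of the compactness point, using plain NumPy. `np.int64` is used here as an assumed stand-in for the integer dtype, since the `np.int` alias has since been removed from NumPy.

```python
import numpy as np

# A small indicator matrix stored as 64-bit integers (assumed dtype).
Y_int = np.array([[0, 1, 0],
                  [1, 0, 1]], dtype=np.int64)

# The same information stored as booleans: one byte per entry.
Y_bool = Y_int.astype(bool)

int_bytes = Y_int.nbytes
bool_bytes = Y_bool.nbytes

# Boolean matrices still support the usual reductions.
labels_per_class = Y_bool.sum(axis=0)
```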

@mblondel
Member

You cannot change neg_label to -1 by default as it would break compatibility. The 0-1 encoding is quite natural for an indicator matrix. It can be used to do some clever matrix operations as well (Naive Bayes uses that). Also, all binary classifiers in the scikit support arbitrary encoding as long as the positive label is greater than the negative label. While I can see some value in removing the pos_label in other PRs, here I am all for applying the motto "if it ain't broken, don't fix it". (Changing parameter names has a cost, since we need to take care of the warnings for two releases and we ask our users to change their code base. We should do it only when there's real value to it.)
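As an example of the kind of matrix operation a 0-1 indicator matrix allows, the sketch below sums feature values per class with a single product. The data is invented, and this is only an illustration of the idea, not a copy of the Naive Bayes implementation.

```python
import numpy as np

# Hypothetical count features: 4 samples, 3 features.
X = np.array([[2, 0, 1],
              [0, 3, 0],
              [1, 1, 0],
              [0, 2, 2]])

# 0-1 indicator matrix: 4 samples, 2 classes.
Y = np.array([[1, 0],
              [0, 1],
              [1, 0],
              [0, 1]])

# Each row of the product is the feature total over one class's samples.
feature_count = Y.T @ X

# Column sums give the number of samples per class.
class_count = Y.sum(axis=0)
```

With a -1/1 encoding the same product would subtract the features of non-member samples, so it would no longer be a per-class count.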

@mblondel
Member

Another advantage of the 0-1 encoding is that you can easily transform the indicator matrix to a sparse format.
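A small sketch of the sparse conversion, assuming SciPy is available. The matrices are made up; the point is that zeros are not stored, whereas -1 entries are.

```python
import numpy as np
from scipy import sparse

# 0-1 indicator matrix and its -1/1 counterpart.
Y01 = np.array([[0, 1, 0],
                [0, 0, 1],
                [1, 0, 0],
                [0, 0, 1]])
Ypm = 2 * Y01 - 1

# CSR keeps only the nonzero entries.
S01 = sparse.csr_matrix(Y01)
Spm = sparse.csr_matrix(Ypm)

stored_01 = S01.nnz  # one stored entry per positive label
stored_pm = Spm.nnz  # every cell is nonzero, so nothing is saved
```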

@jnothman
Member Author

I didn't suggest changing default neg_label; certainly, I agree neg_label needs to be flexible and the interface should be stable where possible. As long as it's clear that metrics will only recognise a label indicator matrix with pos_label == 1 (or perhaps we should use pos_label == max(y); thoughts, @arjoly?), and that this meaning of label is different to its use to mean "class" elsewhere, it's fine to leave LabelBinarizer.pos_label in there.

@jnothman
Member Author
jnothman commented Dec 8, 2013

I think we've sorted this out in metrics where it was most confusing. I'm closing.

@jnothman jnothman closed this as completed Dec 8, 2013