-
-
Notifications
You must be signed in to change notification settings - Fork 25.9k
[MRG+1] LabelBinarizer single label case now works for sparse and dense case #6221
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[MRG+1] LabelBinarizer single label case now works for sparse and dense case #6221
Conversation
8000 return Y | ||
if sparse_output: | ||
if isinstance(classes[0], integer_types) and classes[0] != 0: | ||
classes = np.sort(np.append(classes, 0)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Rather than appending 0 at the end and sorting, I would do "classes = np.r_[0, classes]" which avoid doing 2 array operations.
This looks overall right, but the tests are failing in Python 3. |
@GaelVaroquaux thanks a lot for reviewing! yeah I'm trying to figure out why that failure is happening.... |
@GaelVaroquaux tests pass now. Could you please have a look? Thanks! |
return Y | ||
else: | ||
Y += neg_label | ||
return Y |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why this if-else
branching?
It should return an array of pos_label
irrespective of
whatever.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
just a small doubt, in this case:
lb = LabelBinarizer(neg_label=0, pos_label=1, sparse_output=False)
y = [0, 0, 0, 0]
lb.fit_transform(y)
the output should be:
[[0]
[0]
[0]
[0]]
right?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actually even I think we should default to the positive label in this case but then I'll have to modify the original tests. Should I do that?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, and I think we should update whatsnew as a bug fix.
@hamsal could you verify that this is expected?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Alright thanks! I'll proceed in this direction then.
@MechCoder one test is failing.... I fixed the other two but I'm not sure if it's the correct way. Is there any other way to fix them? |
@@ -271,7 +271,10 @@ def predict(self, X): | |||
indices = (scores > 0).astype(np.int) | |||
else: | |||
indices = scores.argmax(axis=1) | |||
return self.classes_[indices] | |||
if len(self.classes_) == 1: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Which test caused you to change this?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I was getting an IndexError
in test_common.test_meta_estimators
as can be seen here. I fixed the IndexError
by making those changes but test_mlp
is still failing...
@MechCoder I'm not quite able to fix the failing test.... Could you please check once? Thanks for the help! |
ping @MechCoder :) |
Ok, it is a more complex issue than I thought, just make sure that the dense and the sparse case is compatible and I'll open another issue for the |
@MechCoder all tests in |
Oh, I meant to keep the old behavior as it is, i.e returning |
Oh alright I'll do that. |
@MechCoder I've restored the original behavior for |
if n_classes == 1: | ||
if sparse_output: | ||
n_classes += 1 | ||
classes = np.append(classes, neg_label) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we should just return sparse.csr_matrix(neg_label * np.ones_like(y))
or something similar for this corner case, so that it need not follow the complex code path below
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes that's better. Actually since sparse binarization is only supported for 0 neg_label
I'll just do something like:
return sp.csr_matrix((n_samples, 1), dtype=int)
Looks ok, pending comments |
@MechCoder I've made the changes. Is this alright? |
@MechCoder gentle ping :) |
LGTM |
Can anyone please give a second review on this one? |
LGTM |
got = lb.fit_transform(inp) | ||
assert_array_equal(lb.classes_, ["pos"]) | ||
assert_array_equal(expected, got) | ||
assert_array_equal(lb.inverse_transform(got), inp) | ||
|
||
# For sparse case: | ||
lb = LabelBinarizer(sparse_output=True) | ||
inp = ["pos", "pos", "pos", "pos"] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we need to define this again?
@rvraghav93 edited, squashed, rebased. Can merge? |
Yes LGTM. @MechCoder merge? |
Once this gets merged, we could work on the ones and zeros case as stated above. |
[MRG+1] LabelBinarizer single label case now works for sparse and dense case
Thanks !! |
Thanks for the reviews! |
Fixes #6202.