[MRG+1] LabelBinarizer single label case now works for sparse and dense case #6221

devashishd12 · 2016-01-24T16:31:29Z

GaelVaroquaux · 2016-01-24T18:34:06Z

sklearn/preprocessing/label.py

-            return Y
+            if sparse_output:
+                if isinstance(classes[0], integer_types) and classes[0] != 0:
+                    classes = np.sort(np.append(classes, 0))


Rather than appending 0 at the end and sorting, I would do "classes = np.r_[0, classes]", which avoids doing two array operations.
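To make the suggestion concrete, here is a small sketch comparing the two approaches. The label values are made up for illustration and are not taken from the PR. It assumes the labels are already sorted, and it shows that the two forms only agree when 0 is smaller than every existing label.

```python
import numpy as np

# Illustrative labels (an assumption, not from the PR): positive and sorted.
classes = np.array([3, 7])

appended = np.sort(np.append(classes, 0))  # append, then sort: two operations
prepended = np.r_[0, classes]              # a single concatenation

# The results match because 0 is smaller than every existing label.
same_for_positive = np.array_equal(appended, prepended)

# With a negative label, prepending 0 no longer yields a sorted array.
negative = np.array([-2])
sorted_version = np.sort(np.append(negative, 0))
prepended_version = np.r_[0, negative]
```

Since the diff only checks `classes[0] != 0`, a negative integer label would pass that check, so the one-step form is equivalent only if labels are known to be positive.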

GaelVaroquaux · 2016-01-24T18:35:01Z

This looks overall right, but the tests are failing in Python 3.

devashishd12 · 2016-01-24T19:45:33Z

@GaelVaroquaux thanks a lot for reviewing! yeah I'm trying to figure out why that failure is happening....

devashishd12 · 2016-01-24T21:14:43Z

@GaelVaroquaux tests pass now. Could you please have a look? Thanks!
cc: @MechCoder

MechCoder · 2016-01-27T23:16:40Z

sklearn/preprocessing/label.py

+                    return Y
+                else:
+                    Y += neg_label
+                    return Y


Why this if-else branching?

It should return an array of pos_label regardless of the input.

Just a small doubt. In this case:

    lb = LabelBinarizer(neg_label=0, pos_label=1, sparse_output=False)
    y = [0, 0, 0, 0]
    lb.fit_transform(y)

the output should be:

    [[0]
     [0]
     [0]
     [0]]

right?

Actually, I also think we should default to the positive label in this case, but then I'll have to modify the original tests. Should I do that?

Yes, and I think we should update whatsnew as a bug fix.

@hamsal could you verify that this is expected?

Alright thanks! I'll proceed in this direction then.

devashishd12 · 2016-01-30T19:44:54Z

@MechCoder one test is failing.... I fixed the other two but I'm not sure if it's the correct way. Is there any other way to fix them?

MechCoder · 2016-02-02T05:24:49Z

sklearn/linear_model/base.py

@@ -271,7 +271,10 @@ def predict(self, X):
            indices = (scores > 0).astype(np.int)
        else:
            indices = scores.argmax(axis=1)
-        return self.classes_[indices]
+        if len(self.classes_) == 1:


Which test caused you to change this?

I was getting an IndexError in test_common.test_meta_estimators as can be seen here. I fixed the IndexError by making those changes but test_mlp is still failing...

devashishd12 · 2016-02-09T19:05:53Z

@MechCoder I'm not quite able to fix the failing test.... Could you please check once? Thanks for the help!

devashishd12 · 2016-02-22T13:34:00Z

ping @MechCoder :)

MechCoder · 2016-02-22T23:19:59Z

OK, this is a more complex issue than I thought. Just make sure that the dense and sparse cases are compatible, and I'll open another issue for the ones-or-zeros case.

devashishd12 · 2016-02-23T03:33:50Z

@MechCoder all tests in test_label are passing, so AFAIK the sparse and dense cases are compatible. I have also squashed and rebased.

MechCoder · 2016-02-23T03:41:28Z

Oh, I meant to keep the old behavior as it is, i.e. returning neg_label, and just address the issue of the output being dense even when sparse_output is set to True.

devashishd12 · 2016-02-23T06:32:10Z

Oh alright I'll do that.

devashishd12 · 2016-02-24T10:45:35Z

@MechCoder I've restored the original behavior of LabelBinarizer defaulting to neg_label. It works for the sparse case too now, although AppVeyor is acting funny...

MechCoder · 2016-02-24T20:39:04Z

sklearn/preprocessing/label.py

+        if n_classes == 1:
+            if sparse_output:
+                n_classes += 1
+                classes = np.append(classes, neg_label)


I think we should just return sparse.csr_matrix(neg_label * np.ones_like(y)) or something similar for this corner case, so that it need not follow the complex code path below.

Yes that's better. Actually since sparse binarization is only supported for 0 neg_label I'll just do something like:

return sp.csr_matrix((n_samples, 1), dtype=int)
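The corner case discussed here can be sketched roughly as follows. This is not the merged code: `binarize_single_class` is a hypothetical helper, and the `ValueError` for a nonzero `neg_label` is an assumption based on the remark that sparse binarization is only supported when `neg_label` is 0.

```python
import numpy as np
import scipy.sparse as sp


def binarize_single_class(y, neg_label=0, sparse_output=False):
    """Hypothetical helper: with one class, every sample maps to neg_label."""
    n_samples = len(y)
    if sparse_output:
        if neg_label != 0:
            # Assumption: sparse output only makes sense for neg_label == 0.
            raise ValueError("sparse output requires neg_label == 0")
        # An all-zero CSR matrix of shape (n_samples, 1) stores no entries.
        return sp.csr_matrix((n_samples, 1), dtype=int)
    # Dense case: a single column filled with neg_label.
    return np.full((n_samples, 1), neg_label, dtype=int)


dense = binarize_single_class(["pos"] * 4)
sparse = binarize_single_class(["pos"] * 4, sparse_output=True)
```

Returning early like this keeps the single-class input away from the general code path, and the sparse result converts to the same values as the dense one.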

MechCoder · 2016-02-24T20:41:30Z

Looks ok, pending comments

devashishd12 · 2016-02-25T02:44:59Z

@MechCoder I've made the changes. Is this alright?

devashishd12 · 2016-03-04T09:18:47Z

@MechCoder gentle ping :)

MechCoder · 2016-03-07T14:38:18Z

LGTM

devashishd12 · 2016-03-11T15:11:27Z

Can anyone please give a second review on this one?

nelson-liu · 2016-03-20T05:59:41Z

LGTM

raghavrv · 2016-03-20T20:08:14Z

sklearn/preprocessing/tests/test_label.py

    got = lb.fit_transform(inp)
    assert_array_equal(lb.classes_, ["pos"])
    assert_array_equal(expected, got)
    assert_array_equal(lb.inverse_transform(got), inp)

+    # For sparse case:
+    lb = LabelBinarizer(sparse_output=True)
+    inp = ["pos", "pos", "pos", "pos"]


Do we need to define this again?

devashishd12 · 2016-03-21T05:17:28Z

@rvraghav93 edited, squashed, rebased. Can merge?

raghavrv · 2016-03-21T09:55:10Z

Yes LGTM. @MechCoder merge?

devashishd12 · 2016-03-21T10:12:50Z

Once this gets merged, we could work on the ones and zeros case as stated above.

[MRG+1] LabelBinarizer single label case now works for sparse and dense case

MechCoder · 2016-03-21T13:51:48Z

Thanks !!

devashishd12 · 2016-03-21T15:08:50Z

Thanks for the reviews!

GaelVaroquaux reviewed Jan 24, 2016

devashishd12 changed the title ~~LabelBinarizer single label case now works for sparse and dense case~~ [MRG] LabelBinarizer single label case now works for sparse and dense case Jan 24, 2016

MechCoder reviewed Jan 27, 2016

devashishd12 changed the title ~~[MRG] LabelBinarizer single label case now works for sparse and dense case~~ [WIP] LabelBinarizer single label case now works for sparse and dense case Jan 30, 2016

MechCoder reviewed Feb 2, 2016

devashishd12 changed the title ~~[WIP] LabelBinarizer single label case now works for sparse and dense case~~ [MRG] LabelBinarizer single label case now works for sparse and dense case Feb 24, 2016

MechCoder reviewed Feb 24, 2016

MechCoder changed the title ~~[MRG] LabelBinarizer single label case now works for sparse and dense case~~ [MRG+1] LabelBinarizer single label case now works for sparse and dense case Mar 7, 2016

raghavrv reviewed Mar 20, 2016

LabelBinarizer single label case now works for sparse and dense case

e9492b7

MechCoder added a commit that referenced this pull request Mar 21, 2016

Merge pull request #6221 from dsquareindia/LabelBinarizer_fix

945cb7e

[MRG+1] LabelBinarizer single label case now works for sparse and dense case

MechCoder merged commit 945cb7e into scikit-learn:master Mar 21, 2016

devashishd12 deleted the LabelBinarizer_fix branch March 21, 2016 15:03
