8000 [MRG+1] LabelBinarizer single label case now works for sparse and dense case by devashishd12 · Pull Request #6221 · scikit-learn/scikit-learn · GitHub
[go: up one dir, main page]

Skip to content

[MRG+1] LabelBinarizer single label case now works for sparse and dense case #6221

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 1 commit into from
Mar 21, 2016
Merged

[MRG+1] LabelBinarizer single label case now works for sparse and dense case #6221

merged 1 commit into from
Mar 21, 2016

Conversation

devashishd12
Copy link
Contributor

Fixes #6202.

8000 return Y
if sparse_output:
if isinstance(classes[0], integer_types) and classes[0] != 0:
classes = np.sort(np.append(classes, 0))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Rather than appending 0 at the end and sorting, I would do "classes = np.r_[0, classes]" which avoid doing 2 array operations.

@GaelVaroquaux
Copy link
Member

This looks overall right, but the tests are failing in Python 3.

@devashishd12
Copy link
Contributor Author

@GaelVaroquaux thanks a lot for reviewing! yeah I'm trying to figure out why that failure is happening....

@devashishd12
Copy link
Contributor Author

@GaelVaroquaux tests pass now. Could you please have a look? Thanks!
cc: @MechCoder

@devashishd12 devashishd12 changed the title LabelBinarizer single label case now works for sparse and dense case [MRG] LabelBinarizer single label case now works for sparse and dense case Jan 24, 2016
return Y
else:
Y += neg_label
return Y
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why this if-else branching?

It should return an array of pos_label irrespective of
whatever.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

just a small doubt, in this case:

lb = LabelBinarizer(neg_label=0, pos_label=1, sparse_output=False)
y = [0, 0, 0, 0]
lb.fit_transform(y)

the output should be:

[[0]
 [0]
 [0]
 [0]]

right?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually even I think we should default to the positive label in this case but then I'll have to modify the original tests. Should I do that?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, and I think we should update whatsnew as a bug fix.

@hamsal could you verify that this is expected?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Alright thanks! I'll proceed in this direction then.

@devashishd12 devashishd12 changed the title [MRG] LabelBinarizer single label case now works for sparse and dense case [WIP] LabelBinarizer single label case now works for sparse and dense case Jan 30, 2016
@devashishd12
Copy link
Contributor Author

@MechCoder one test is failing.... I fixed the other two but I'm not sure if it's the correct way. Is there any other way to fix them?

@@ -271,7 +271,10 @@ def predict(self, X):
indices = (scores > 0).astype(np.int)
else:
indices = scores.argmax(axis=1)
return self.classes_[indices]
if len(self.classes_) == 1:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Which test caused you to change this?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was getting an IndexError in test_common.test_meta_estimators as can be seen here. I fixed the IndexError by making those changes but test_mlp is still failing...

@devashishd12
Copy link
Contributor Author

@MechCoder I'm not quite able to fix the failing test.... Could you please check once? Thanks for the help!

@devashishd12
Copy link
Contributor Author

ping @MechCoder :)

@MechCoder
Copy link
Member

Ok, it is a more complex issue than I thought, just make sure that the dense and the sparse case is compatible and I'll open another issue for the ones or zeros case

@devashishd12
Copy link
Contributor Author

@MechCoder all tests in test_label are passing so AFAIK, sparse and dense case are compatible. I have also squashed and rebased.

@MechCoder
Copy link
Member

Oh, I meant to keep the old behavior as it is, i.e returning neg_label and just to address the issue of the output being dense even when sparse_output is set to True.

@devashishd12
Copy link
Contributor Author

Oh alright I'll do that.

@devashishd12
Copy link
Contributor Author

@MechCoder I've restored the original behavior for LabelBinarizer defaulting to neg_label. It works for sparse case too now. Although AppVeyor is acting funny...

@devashishd12 devashishd12 changed the title [WIP] LabelBinarizer single label case now works for sparse and dense case [MRG] LabelBinarizer single label case now works for sparse and dense case Feb 24, 2016
if n_classes == 1:
if sparse_output:
n_classes += 1
classes = np.append(classes, neg_label)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we should just return sparse.csr_matrix(neg_label * np.ones_like(y)) or something similar for this corner case, so that it need not follow the complex code path below

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes that's better. Actually since sparse binarization is only supported for 0 neg_label I'll just do something like:

return sp.csr_matrix((n_samples, 1), dtype=int)

@MechCoder
Copy link
Member

Looks ok, pending comments

@devashishd12
Copy link
Contributor Author

@MechCoder I've made the changes. Is this alright?

@devashishd12
Copy link
Contributor Author

@MechCoder gentle ping :)

@MechCoder
Copy link
Member

LGTM

@MechCoder MechCoder changed the title [MRG] LabelBinarizer single label case now works for sparse and dense case [MRG+1] LabelBinarizer single label case now works for sparse and dense case Mar 7, 2016
@devashishd12
Copy link
Contributor Author

Can anyone please give a second review on this one?

@nelson-liu
Copy link
Contributor

LGTM

got = lb.fit_transform(inp)
assert_array_equal(lb.classes_, ["pos"])
assert_array_equal(expected, got)
assert_array_equal(lb.inverse_transform(got), inp)

# For sparse case:
lb = LabelBinarizer(sparse_output=True)
inp = ["pos", "pos", "pos", "pos"]
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we need to define this again?

@devashishd12
Copy link
Contributor Author

@rvraghav93 edited, squashed, rebased. Can merge?

@raghavrv
Copy link
Member

Yes LGTM. @MechCoder merge?

@devashishd12
Copy link
Contributor Author

Once this gets merged, we could work on the ones and zeros case as stated above.

MechCoder added a commit that referenced this pull request Mar 21, 2016
[MRG+1] LabelBinarizer single label case now works for sparse and dense case
@MechCoder MechCoder merged commit 945cb7e into scikit-learn:master Mar 21, 2016
@MechCoder
Copy link
Member

Thanks !!

@devashishd12 devashishd12 deleted the LabelBinarizer_fix branch March 21, 2016 15:03
@devashishd12
Copy link
Contributor Author

Thanks for the reviews!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants
0