8000 MRG LibLinear class weights fix by amueller · Pull Request #1491 · scikit-learn/scikit-learn · GitHub
[go: up one dir, main page]

Skip to content

MRG LibLinear class weights fix #1491

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 13 commits into from
Jan 3, 2013

Conversation

amueller
Copy link
Member

Fixes #1411.

It seems I broke that in 5427f81 - or rather that I didn't fix there when I fixed it for libsvm.
The current tests for class weights seem suboptimal.
This branch contains tests for all estimators supporting class_weight.

I will also try to improve the tests in the svm module.
If everything works, I'll add another comment to linear.cpp.

I want to check if the corrected behavior is consistent with the libsvm binary (see comments in #1411), but I feel being consistent inside sklearn is more important any way.

This PR is on top of #1485 because they touch the same stuff.

@amueller
Copy link
Member Author

The behavior us now consistent within sklearn (I am a bit puzzled by RidgeClassifier, see #1489) and also the same as for the LibSVM binary. Renamed to MRG.

@agramfort
Copy link
Member

travis is not happy.

@amueller
Copy link
Member Author

@agramfort thanks, forgot that one :(

@amueller
Member Author

Ok, I skipped that test for the moment. It's not really related to the bug the PR fixes.

@amueller
Copy link
Member Author

Still not working :( I can't generate a dataset with 3 classes where balancing helps the f1 score

@amueller
Copy link
Member Author

ok, should be good now.

@mblondel
Copy link
Member

Can you summarize what was the problem?

@amueller
Copy link
Member Author

There used to be a problem in the class order differing in liblinear/libsvm and sklearn. I fixed that some time ago by switching the classes in the liblinear and libsvm code. This fixed the sign of the decision function and for some reason also the class weights in the SVC. The class weights in LinearSVC remained broken.

It looks to me as if the class-weights were never correct in the binary case - and the tests really didn't test anything.

@@ -107,6 +107,9 @@ Changelog
Gaussian and sparse random projection matrix
by `Olivier Grisel`_ and `Arnaud Joly`_.

- Fixed `class_weight` support in :class:`svm.LinearSVC` and
:class:`linear_model.LogisticRegression` by `Andreas Müller`_.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could give some more details here?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If I am not mistaken, SVC and NuSVC are also impacted. At least the dtype of the prediction outcome is now integer instead of float.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The changes to SVC and NuSVC come from #1485, so I should add the comment there.

@ogrisel
Copy link
Member
ogrisel commented Dec 29, 2012

Besides the minor comments above, +1 for merging.

@amueller
Copy link
Member Author

@ogrisel I added the changes to the SVC classes_ to the other PR and rebased this on top of it again.

@mblondel
Copy link
Member
mblondel commented Jan 2, 2013

+1 for merge

@amueller
Copy link
Member Author
amueller commented Jan 2, 2013

Thanks for the reviews @mblondel :)

@amueller amueller merged commit 30045e1 into scikit-learn:master Jan 3, 2013
@amueller amueller deleted the svm_class_weights branch January 3, 2013 12:59
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Weird behavior in LogisticRegression on parameter class_weight
4 participants
0