[MRG + 1] Add class_weight to PA Classifier, remove from PA Regressor #4767
Conversation
```diff
@@ -125,6 +126,77 @@ def test_classifier_undefined_methods():
         assert_raises(AttributeError, lambda x: getattr(clf, x), meth)
 
 
+def test_class_weights():
+    # Test class weights.
```
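The diff is truncated here; for context, here is a sketch of what the full test plausibly looks like, reconstructed from the snippets quoted later in this conversation (the exact dataset values are assumptions, and `n_iter` is the parameter name in scikit-learn of that era, later renamed `max_iter`):

```python
import numpy as np
from numpy.testing import assert_array_equal
from sklearn.linear_model import PassiveAggressiveClassifier


def test_class_weights():
    # A tiny 2-D dataset where the predicted class of a borderline
    # point flips once class 1 is strongly down-weighted.
    X2 = np.array([[-1.0, -1.0], [-1.0, 0.0], [-0.8, -1.0],
                   [1.0, 1.0], [1.0, 0.0]])
    y2 = [1, 1, 1, -1, -1]

    clf = PassiveAggressiveClassifier(C=0.1, n_iter=100, class_weight=None,
                                      random_state=100)
    clf.fit(X2, y2)
    assert_array_equal(clf.predict([[0.2, -1.0]]), np.array([1]))

    # Give a very small weight to class 1: the hyperplane should move
    # and the prediction on the borderline point should flip.
    clf = PassiveAggressiveClassifier(C=0.1, n_iter=100,
                                      class_weight={1: 0.001},
                                      random_state=100)
    clf.fit(X2, y2)
    assert_array_equal(clf.predict([[0.2, -1.0]]), np.array([-1]))
```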
is this taken from the SGDClassifier tests?
Most of it, with some mods here and there. Some others might have been adapted from d-tree's tests if I recall correctly.
It would be great to reuse the SGD tests for PA (since they share the implementation). That would be more work, though, so I don't think it should be a showstopper for this PR. @amueller, what do you think?
e.g., here: #4838 (comment)
LGTM. Removing it from the regressor seems fine to me, as silently ignoring it was a bug and breaking there seems OK.
Note that I need to rebase this on top of the newly merged #4347.
Force-pushed from 44c11f5 to 0166186.
Yeah, still LGTM.
@amueller ... While I'm at it, should I also add a ...
```python
raise ValueError("class_weight 'balanced' is not supported for "
                 "partial_fit. In order to use 'balanced' "
                 "weights, use "
                 "compute_class_weight('balanced', classes, y). "
```
Should this error message indicate that `compute_class_weight` can be found in `sklearn.utils`?
Yes, that would be clearer; I'll update it. I'd simply copied SGD's `partial_fit` message. I can also update the other instances where this message appears around the code base in a separate PR.
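For reference, the workaround the message recommends looks roughly like this (a minimal sketch; the classes, labels, and feature values below are made up for illustration):

```python
import numpy as np
from sklearn.utils import compute_class_weight
from sklearn.linear_model import PassiveAggressiveClassifier

# Compute the 'balanced' weights up front from the known classes and a
# sample of labels, then pass the resulting dict explicitly instead of
# class_weight='balanced', which partial_fit cannot resolve online.
classes = np.array([0, 1])
y_sample = np.array([0, 0, 0, 0, 1])  # made-up, imbalanced labels
weights = compute_class_weight('balanced', classes=classes, y=y_sample)
class_weight = dict(zip(classes, weights))

clf = PassiveAggressiveClassifier(class_weight=class_weight)
X_batch = np.array([[0.0, 1.0], [1.0, 0.0], [1.0, 1.0],
                    [0.0, 0.0], [2.0, 2.0]])
clf.partial_fit(X_batch, y_sample, classes=classes)
```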
I skimmed through it and couldn't find this; could you point it out? They do talk about initializing the linear model's weight vector (the coefficients) to 0. Is there something I missed in the cost-sensitive learning section? I thought sample weights (and class weights) are a generally applicable concept, which is why people don't really mention them. But I'd leave sample weights for a different PR.
+1 on both counts
Ah yes, I think I skipped through it too fast :/
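As an aside on the cost-sensitive point above: here is a toy sketch of one way a class weight can enter a PA-I update, by scaling the aggressiveness cap `C` per sample. This illustrates the general idea only; it is not scikit-learn's actual implementation.

```python
import numpy as np


def pa1_step(w, x, y, C, class_weight):
    # Hinge loss of the current model on (x, y), with y in {-1, +1}.
    loss = max(0.0, 1.0 - y * np.dot(w, x))
    if loss > 0.0:
        # PA-I step size, with the class weight shrinking (or growing)
        # the cap C for this sample's class.
        tau = min(class_weight * C, loss / np.dot(x, x))
        w = w + tau * y * x
    return w


w = np.zeros(2)
# A class-1 point with a tiny class weight barely moves the hyperplane.
w = pa1_step(w, np.array([0.2, -1.0]), 1, C=0.1, class_weight=0.001)
```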
Force-pushed from 0166186 to 718825d.
OK, error message updated and commits squashed.
@vene, does that message look better to you?
"partial_fit. In order to use 'balanced' " | ||
"weights, from the sklearn.utils module use " | ||
"compute_class_weight('balanced', classes, y). " | ||
"In place of y you can us a large enough sample " |
you can us -> you can use
I'd say "subset" instead of "sample", because it's less ambiguous (consider `n_samples`).
- Ha, yeah, OK. Cut-and-paste job; I'll fix it. And as I mentioned, this error message exists in several other places, so (once we settle the wording here) I'll propagate the fix around the code base.
- Fair call, I'll change that as well.
I really like the new suggestion of using a subset of the labels to estimate class priors.
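A quick sketch of that suggestion: when the full label vector never materializes (for example, streaming data fed to `partial_fit`), estimate the 'balanced' weights from a large enough subset of the labels (the stream and subset sizes below are made up):

```python
import numpy as np
from sklearn.utils import compute_class_weight

rng = np.random.RandomState(0)
# Hypothetical imbalanced label stream; in practice these labels would
# arrive in mini-batches rather than all at once.
y_stream = rng.choice([0, 1], size=100000, p=[0.9, 0.1])

y_subset = y_stream[:5000]  # a large enough, representative subset
classes = np.unique(y_subset)
weights = compute_class_weight('balanced', classes=classes, y=y_subset)
class_weight = dict(zip(classes, weights))  # roughly {0: 0.56, 1: 5.0}
```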
Commits: rebase on top of scikit-learn#4347; improve error message; update error msg.
Force-pushed from 718825d to ee78879.
@vene, I think I have incorporated all your comments.
ping @vene, does this look good to you now?
```python
    assert_array_equal(clf.predict([[0.2, -1.0]]), np.array([1]))

    # we give a small weight to class 1
    clf = PassiveAggressiveClassifier(C=0.1, n_iter=100,
```
I'm not entirely convinced it's better, but I can see some reasons for just doing `clf.set_params(class_weight={1: 0.001})` here. It makes it explicit that the rest of the parameters shouldn't be changed, in case somebody modifies the test in the future.
Agreed, it might be slightly clearer @vene, but I see this paradigm only very rarely in other tests (via git grep)... Do you think it's necessary for merge?
It's not important, it just seems slightly better to me from a maintenance point of view.
I don't have a strong opinion on this. Either way would be fine.
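For reference, a self-contained sketch of the suggested variant (dataset and hyperparameters reconstructed from the quoted test; `n_iter` is the era's parameter name, later renamed `max_iter`):

```python
import numpy as np
from numpy.testing import assert_array_equal
from sklearn.linear_model import PassiveAggressiveClassifier

X2 = np.array([[-1.0, -1.0], [-1.0, 0.0], [-0.8, -1.0],
               [1.0, 1.0], [1.0, 0.0]])
y2 = [1, 1, 1, -1, -1]

clf = PassiveAggressiveClassifier(C=0.1, n_iter=100, random_state=100)
clf.fit(X2, y2)

# Override only class_weight: every other hyperparameter stays as set
# above, which is the maintenance benefit being discussed.
clf.set_params(class_weight={1: 0.001})
clf.fit(X2, y2)
assert_array_equal(clf.predict([[0.2, -1.0]]), np.array([-1]))
```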
I reviewed the code. The changes are good. There are some pending comments, but they're minor details, and merging right now seems the best way to provide value to users. Given that there is already a 👍, I am merging.
Thanks for the reviews all!
Thanks for the work! And sorry it took so long to merge: the bandwidth of core devs is unfortunately very limited.
Was browsing Landscape.io and noticed this strange one: `class_weight` is a zombie param for the `PassiveAggressiveRegressor`, yet not present for the `PassiveAggressiveClassifier`. O_o

Removed it from the regressor (I didn't think a deprecation cycle was necessary, since the parameter was silently ignored and makes no sense for regression anyway) and implemented it for the classifier, with some tests.