MRG+2: Refactor - Farm out class_weight calcs to .utils #4190
glouppe merged 5 commits into scikit-learn:master
Conversation
sklearn/utils/class_weight.py
Maybe use `in1d` from `fixes` for old numpy.
Thanks, was not aware of the fix, will update.
Besides that, looks great. I like all these red lines. Is there any other classifier that could benefit from this?
:-)
Aside from those proposed in #4114, perhaps. This function is written mostly to help in multi-output and bootstrap settings where there are more moving parts, so the code reduction in linear models and/or svm would be minimal, if it could be used at all. Most linear model and svm classes seem to do different things with an intermediate variable.
I would say yes. The benefit is also that this function gets more testing.
sklearn/linear_model/ridge.py
I am not totally familiar with `RidgeClassifierCV`, but note here ^ that I am now computing the implied `cw` variable over `y` rather than `Y`, as is/was done in `RidgeClassifier`. I did some tests and it appears that `compute_class_weight` does not mind if it gets a one-hot-encoded version of `y` or the original long version. Thus, I do not think that this change has any effect, but just putting it out there.
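For illustration, the "balanced"-style heuristic that `compute_class_weight` implements (n_samples / (n_classes * bincount)) can be sketched in plain numpy. This is a simplified sketch of the idea, not sklearn's actual implementation, and the helper name below is made up:

```python
import numpy as np

def balanced_class_weights(y):
    # Hypothetical sketch of the "balanced" heuristic used by sklearn's
    # compute_class_weight: weight_c = n_samples / (n_classes * count_c).
    classes, counts = np.unique(y, return_counts=True)
    weights = len(y) / (len(classes) * counts.astype(float))
    return dict(zip(classes, weights))

y = np.array([0, 1, 1, 1])
# class 0: 4 / (2 * 1) = 2.0; class 1: 4 / (2 * 3) = 0.666...
w = balanced_class_weights(y)
```

The rarer class gets the larger weight, which is the behaviour the rest of this thread relies on.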
@agramfort, I guess there is also some benefit to having more explicit unit testing on what the expanded `class_weight` ends up being.
LGTM, +1 for merge
Thanks @agramfort. I just noticed that there was an inplace multiplication on …
Anyone else have time to review?
Can you add a test for when `class_weight=None`?
Good point. Done & thx for the review @glouppe !
Besides my small comment, +1 for merge. Thanks for the refactoring!
Given this function, it becomes quite easy to add class weights to estimators that already support sample weights. This includes Bagging or GradientBoosting, for example. (This can be done in the future, in a later PR, assuming you want to tackle this.)
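To make that suggestion concrete, here is a rough sketch of the pattern, assuming the current import path (`sklearn.utils.class_weight.compute_sample_weight`) and using `GradientBoostingClassifier` purely as an example of an estimator whose `fit` already accepts `sample_weight`:

```python
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.utils.class_weight import compute_sample_weight

X = np.array([[0.0], [1.0], [2.0], [3.0], [4.0], [5.0]])
y = np.array([0, 0, 0, 0, 0, 1])

# Expand a class_weight spec into per-sample weights, then pass them to
# any estimator whose fit() already supports sample_weight.
sw = compute_sample_weight("balanced", y)
clf = GradientBoostingClassifier(n_estimators=10, random_state=0)
clf.fit(X, y, sample_weight=sw)
```

This is exactly the kind of one-liner that would otherwise be reimplemented per estimator.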
Yep, I will refactor #4114 on top of this function which should make the review of those estimators easier too. Thanks for the +1 @glouppe |
Can this be merged @agramfort @glouppe ?
Merging, thanks for your work :) |
Thanks! |
With #4114 bringing a few more ensembles onboard the `class_weight` bandwagon, a fair bit of duplicated code is being proposed. I know that @amueller was originally concerned about code duplication in the original RF/Tree PR, and I think that this function may alleviate some of that.

This PR farms out the calculations and some of the error checks for the `expanded_class_weight` variable to a new `sklearn.utils` function, `compute_sample_weight(class_weight, y, indices=None)`, and refactors the code from #3961 to utilise it. It also adds a bit more rigor to the input checks and tests.

Benefits:

- Confidence that the `class_weight='subsample'` option is doing what we think it is doing.

If merged, I shall also make appropriate mods to the code in #4114 to also take advantage of this helper function.
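As a quick usage sketch of the new helper (assuming the current `sklearn.utils.class_weight` import path; the `indices` argument, used for subsample-style weighting, is omitted here):

```python
import numpy as np
from sklearn.utils.class_weight import compute_sample_weight

y = np.array([0, 0, 1, 1, 1])

# An explicit class_weight dict is expanded to one weight per sample.
sw = compute_sample_weight({0: 2.0, 1: 1.0}, y)
# -> array([2., 2., 1., 1., 1.])

# class_weight=None yields uniform unit weights.
sw_none = compute_sample_weight(None, y)
# -> array([1., 1., 1., 1., 1.])
```

The returned array can be passed straight to any `fit(..., sample_weight=...)`.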