[MRG] class_weight for Bagging, AdaBoost & GradientBoosting classifiers #4114
Conversation
CC: @pprett as well
@trevorstephens I'm not sure if I completely understand what you are planning to do...
@pprett yes, that is how it is working... Let me try to be more coherent with my question :) In ...
Added a (possibly questionable) hack to pass ...
@@ -758,6 +760,9 @@ def check_class_weight_classifiers(name, Classifier):
        classifier.set_params(n_iter=100)
    if hasattr(classifier, "min_weight_fraction_leaf"):
        classifier.set_params(min_weight_fraction_leaf=0.01)
    if isinstance(classifier, (BaggingClassifier, AdaBoostClassifier)):
I think the fix is good. If you have a better test let me know ;) Maybe add a comment saying "the dataset is noisy and these classifiers are not regularized" or something like that?
Shall do. Also got rid of the doubling down on class_weights, which just seemed cheeky to me :) and pared back the boosting rate for Ada instead to get a passing grade.
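Purely as a sketch of what paring back the boosting rate in the common check could look like (the helper name and the learning_rate value here are assumptions, not the committed change):

# Illustrative only: the common-test dataset is noisy and AdaBoost is not
# regularized by default, so slow its reweighting down for this check.
from sklearn.ensemble import AdaBoostClassifier

def tame_adaboost_for_check(classifier):
    if isinstance(classifier, AdaBoostClassifier):
        # A smaller learning_rate shrinks each boosting stage's contribution.
        classifier.set_params(learning_rate=0.5)
    return classifier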
I've been thinking about the ... Also, there are two ways to go in ... I have also moved around some pep8 things in GBM in prior commits and moved the ... Flipping to [MRG] but interested to hear what the tree people think.
I was thinking of implementing a new ... Something like ... It could check the dimension of ... This would make transitioning any of the meta estimators to multi-output down the road a little easier too, if that's on the map. Anyone have any strong opinions on this?
I'm closing this PR and will open two new ones: #4215, which refactors this code on top of #4190, as well as another (later) one that proposes the GBM-style mods that I implemented here. These seem like two items that should be addressed separately; keeping them together would probably muddy the waters for potential code reviewers, who are busy enough as it is :-)
Initial early crack at the rest of the ensembles carrying on from #3961
My biggest question is how (or if) to handle class_weight='subsample' in GradientBoostingClassifier. In RandomForestClassifier, I weighted based on the bootstrap sample, as I have done in BaggingClassifier in this PR. But the applicability to GBM is not so clear to me. Right now I simply use the 'auto' weighting as an initialization of the sample weights, as the fit sample_weight param does right now, and as is implemented in AdaBoostClassifier in this PR. My initial thought for 'subsample' was to fit the residuals to the re-weighted boosting sample_weight while not actually altering it and letting the iterations do their thing, but I feel that we may be venturing far away from the literature there... Any thoughts?
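To make the 'auto' initialization idea concrete, here is a minimal sketch of folding class weights into the starting sample_weight before boosting. It leans on compute_sample_weight from sklearn.utils.class_weight; the helper name and surrounding logic are illustrative, not the PR's actual implementation, and the 'auto' spelling discussed here was later renamed 'balanced'.

# Minimal illustration: translate a class_weight setting into per-sample
# weights used to seed boosting, the same way a user-supplied
# fit(..., sample_weight=...) is applied up front.
import numpy as np
from sklearn.utils.class_weight import compute_sample_weight

def init_boosting_sample_weight(y, sample_weight=None, class_weight=None):
    y = np.asarray(y)
    if sample_weight is None:
        sample_weight = np.ones(y.shape[0], dtype=np.float64)
    if class_weight is not None:
        # "balanced" (formerly "auto") weights inversely to class frequency.
        sample_weight = sample_weight * compute_sample_weight(class_weight, y)
    return sample_weight

# Example with a 3:1 imbalanced binary target:
y = np.array([0, 0, 0, 1])
print(init_boosting_sample_weight(y, class_weight="balanced"))
# roughly [0.667, 0.667, 0.667, 2.0]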
Additionally, I'm certain Travis will be upset with me on the infamous check_class_weight_classifiers in estimator_checks. GBM may get by with the hack here: #3961 (diff), but the two meta estimators do not call min_weight_fraction_leaf directly and are sure to fail. As one solution, I could add another bandaid to check if the estimator has base_estimator and set a decision tree with a similar bias against lightly weighted leaf nodes. Will add individual estimator checks on top of this soon, of course.
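A minimal sketch of that bandaid, assuming it mirrors the existing min_weight_fraction_leaf hack (the helper name and the 0.01 value are illustrative; base_estimator was the constructor argument at the time, since renamed to estimator in recent scikit-learn):

# Hypothetical bandaid for check_class_weight_classifiers: if the estimator
# wraps a base_estimator, swap in a decision tree biased against lightly
# weighted leaf nodes, mirroring the min_weight_fraction_leaf hack above.
from sklearn.tree import DecisionTreeClassifier

def regularize_base_estimator(classifier):
    if hasattr(classifier, "base_estimator"):
        classifier.set_params(
            base_estimator=DecisionTreeClassifier(min_weight_fraction_leaf=0.01)
        )
    return classifier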
Calling the previous reviewers for input: @glouppe @amueller @GaelVaroquaux, plus any of you other fine people who are willing to comment :-)