Consider reversing deprecation of min_samples_leaf #11976
Comments
I concur that this parameter is useful. I have found it helps with overfitting and also seems to speed up training, which I rely on during model development. I thought the speed-up was due to fewer nodes, but the Twitter thread makes me wonder whether I know exactly how that parameter is used in scikit-learn's RF implementation. |
It's great that we are getting this feedback during RC.
The problem here was that core developers as well as users expected these
parameters to do something they didn't.
I'm still not certain from your illustration whether the same, if not
better, can be achieved with min_samples_split. After all, if
min_samples_leaf is limiting the depth, this would imply there are
insufficient instances to create a larger leaf at those points. The problem
with min_samples_leaf is that it doesn't necessarily stop when it discovers
that the best split at a point would create an invalid leaf. If
min_samples_split is sufficient, then we would benefit from having fewer
parameters.
But if you can show that min_samples_split does not suffice, then reverting
deprecation and improving docs would be good.
|
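A minimal sketch of the question above (mine, not from the thread), using scikit-learn's public API: a 1-D regression set where the best split isolates one extreme outlier. `min_samples_split` only gates which nodes may be *considered* for splitting, so a permitted split can still produce a 1-sample child; `min_samples_leaf` constrains the children of every split directly.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

rng = np.random.RandomState(0)
X = np.sort(rng.uniform(0, 1, 40)).reshape(-1, 1)
y = X.ravel() + rng.normal(scale=0.01, size=40)
y[0] = 10.0  # one extreme outlier at the left edge

def leaf_sizes(tree):
    t = tree.tree_
    return t.n_node_samples[t.children_left == -1]

# Nodes with >= 10 samples may split, but a child may end up with 1 sample.
split_only = DecisionTreeRegressor(min_samples_split=10,
                                   random_state=0).fit(X, y)
# Every split must leave >= 3 samples on each side.
leaf_constrained = DecisionTreeRegressor(min_samples_leaf=3,
                                         random_state=0).fit(X, y)

print(leaf_sizes(split_only).min())       # typically 1 on this data
print(leaf_sizes(leaf_constrained).min())  # >= 3 by construction
```

If `min_samples_split` sufficed, the first tree would not grow a single-sample leaf here; it does, because isolating the outlier is the best split at the root.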
"Larger leaves have less variance" does seem to be a valuable notion for regression; less so for classification, but maybe it can indeed help there too. We should be thinking of it as:

    min_samples_leaf : int
        A split point at any depth will only be considered if it leaves at
        least `min_samples_leaf` training samples in each of the left and
        right branches. This may have the effect of smoothing the model,
        especially in regression.

We could even rename it, but that might be unnecessarily disruptive to users. |
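To illustrate the "smoothing" sentence in that docstring, a toy regression example (my own, using scikit-learn's public API): a fully grown tree reproduces an outlier target exactly, while a `min_samples_leaf` constraint averages it with its neighbours.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

rng = np.random.RandomState(0)
X = np.linspace(0, 1, 50).reshape(-1, 1)
y = np.sin(4 * X.ravel()) + rng.normal(scale=0.05, size=50)
y[25] += 5.0  # inject one outlier target

default = DecisionTreeRegressor(random_state=0).fit(X, y)
smoothed = DecisionTreeRegressor(min_samples_leaf=5, random_state=0).fit(X, y)

# The unconstrained tree memorizes the outlier; the constrained tree's
# leaf contains at least 5 samples, so the outlier is averaged away.
print(default.predict(X[25:26])[0])   # close to y[25] (~5.9)
print(smoothed.predict(X[25:26])[0])  # pulled back toward the sine curve
```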
@jnothman That really cleared things up for me, thanks! I thought that's what it already did.
I'd vote for keeping the same name, changing the doc, and making it do as you suggest. Can you summarize how it's actually used now? Thanks! Sorry, but I haven't looked up this bit in the code. Yes, for regression, preventing n=1 leaves most definitely helps avoid outlier leaves. I don't have enough experience to say for classification, but I'd say an outlier class lurking amongst a similar group of feature vectors shouldn't be isolated and taken as valid. That seems like overfitting to me. |
Come to think of it:

> Ok, imagine the current node has 10 samples and …

> Now imagine the same scenario but we have …

That sounds like the simplest thing and is exactly what @jnothman has proposed in his doc. heh, cool! |
Yes, with very many classes that may be the case.
Many users, and core developers, had expected that if the best split could
not adhere to this constraint, the node would be left unsplit (a stopping
criterion). Instead, the implementation keeps searching for the best split
that does satisfy the constraint.
|
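The difference between the two semantics can be sketched in plain Python (my own toy model of a single node, not scikit-learn's splitter): "stopping" refuses to split when the overall best split violates the constraint; "constraint" (the actual behavior) falls back to the best *valid* split.

```python
def sse(ys):
    """Sum of squared errors around the mean, the regression impurity."""
    m = sum(ys) / len(ys)
    return sum((v - m) ** 2 for v in ys)

def best_split(y, min_samples_leaf, semantics):
    """Split index i puts y[:i] left and y[i:] right (x assumed sorted)."""
    n = len(y)
    cands = sorted(range(1, n), key=lambda i: sse(y[:i]) + sse(y[i:]))
    best = cands[0]
    if semantics == "stopping":
        # what many expected: refuse to split if the best split is invalid
        return best if min(best, n - best) >= min_samples_leaf else None
    # what the code does: best split among those satisfying the constraint
    for i in cands:
        if min(i, n - i) >= min_samples_leaf:
            return i
    return None

# 10-sample node, one extreme outlier: the best split isolates it.
y = [10.0, 0.1, 0.2, 0.1, 0.0, 0.2, 0.1, 0.0, 0.1, 0.2]
print(best_split(y, 3, "stopping"))    # None: best split violates the leaf size
print(best_split(y, 3, "constraint"))  # 3: best split that keeps 3 per side
```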
Interesting. I wonder how it would handle the 10-sample node case where all samples predicted the same value except for one. |
> Ok, imagine the current node has 10 samples and min_samples_split=10
> so we are okay splitting. But, what if all but one predicted value
> were the same; one is an outlier. Splitting would create a node with
> a single outlier sample/value and another node with 9 samples.

> Now imagine the same scenario but we have min_samples_leaf=3. Either
> we prevent splitting because it would create an outlier child node, or
> we split the 10-sample node but borrow samples to create a node with
> three samples and a node with seven samples. I'm not sure of the proper
> way to borrow, so maybe the easiest thing to do is prevent splitting of
> the 10-sample node because it would create a leaf that is too small,
> using the normal process.

Yes, this is the kind of issue to consider. Note that it's extremely
common at the end of the tree (which is ~50% of the splits, of course!).
Avoiding a tiny leaf here is important, since tiny leaves don't
generalize (both for classification and regression).
Also, note that all proximity-matrix / RF-kernel methods rely on having
reasonably sized leaves. min_samples_leaf is exactly what you want to
create these; min_samples_split doesn't give quite the right behavior here.
|
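For context, a minimal sketch of the proximity matrix mentioned above (mine, not from the thread), built from `RandomForestClassifier.apply`: proximity between two samples is the fraction of trees in which they share a leaf, so leaf sizes set by `min_samples_leaf` directly bound how many samples can ever be "proximate".

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

X, y = make_classification(n_samples=100, random_state=0)
rf = RandomForestClassifier(n_estimators=50, min_samples_leaf=5,
                            random_state=0).fit(X, y)

leaves = rf.apply(X)  # shape (n_samples, n_trees): leaf index per tree
# proximity[i, j] = fraction of trees where samples i and j share a leaf
prox = (leaves[:, None, :] == leaves[None, :, :]).mean(axis=-1)

print(prox.shape)   # (100, 100)
print(prox[0, 0])   # 1.0: a sample always shares a leaf with itself
```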
I think in doing the deprecation, we had under-considered the regression
and very-many-class cases.
@amueller, @glemaitre, your thoughts?
|
I think we should undo the deprecation until we have hard evidence for all cases. |
Current RC proposes removing min_samples_leaf. In my experience across numerous projects and competitions, this param is very useful for generalization. Conceptually, I've often seen an extreme outlier like this (top-left point) get carved off into a single-point leaf within a group, which clearly isn't what you want. Larger leaves have less variance and therefore generalize better.

This param is also one way to reduce tree size:

@amueller requested opening this issue in this Twitter thread: https://twitter.com/jeremyphoward/status/1036110165236363266
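A rough check of the tree-size claim (my own sketch, standard scikit-learn API): on pure-noise targets, a fully grown tree memorizes every sample, while `min_samples_leaf` caps the number of leaves at `n_samples / min_samples_leaf`.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

rng = np.random.RandomState(0)
X = rng.uniform(size=(500, 3))
y = rng.normal(size=500)  # pure noise: a fully grown tree memorizes it

full = DecisionTreeRegressor(random_state=0).fit(X, y)
constrained = DecisionTreeRegressor(min_samples_leaf=10,
                                    random_state=0).fit(X, y)

# With leaves of >= 10 samples there can be at most 50 leaves (99 nodes),
# versus one leaf per sample for the unconstrained tree.
print(full.tree_.node_count, constrained.tree_.node_count)
```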