-
-
Notifications
You must be signed in to change notification settings - Fork 25.9k
docs state min_samples_leaf reduces size of the tree #10773
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Comments
if it doesn't regularize, then I agree it should be deprecated. There are
too many parameters to tune, especially when you don't realise one doesn't
work
…On 8 Mar 2018 7:50 am, "Andreas Mueller" ***@***.***> wrote:
The docs state that min_samples_leaf reduces the size of the tree. This
is wrong. The size of the tree is not influenced by min_samples_leaf. See
the discussion at #8399
<#8399>.
I would actually vote to deprecate the parameter, empirically 1 is always
best.
I would have like to have a parameter that actually implements what I
*meant* when I proposed min_samples_leaf but I'm not sure it's worth it.
It would make for nicer trees than min_samples_split imho but I don't
have the data to proof it ;)
—
You are receiving this because you are subscribed to this thread.
Reply to this email directly, view it on GitHub
<#10773>, or mute the
thread
<https://github.com/notifications/unsubscribe-auth/AAEz67eF8Vg72LgDwPdBKLPWPGUeSNtKks5tcEgbgaJpZM4ShIp9>
.
|
Hi ... Can I give this a shot if deprecation is intended? |
I think @amueller's right that the current option results in bad splits and
poor regularisation. Yes, let's deprecate it.
…On 9 March 2018 at 17:38, FarahSaeed ***@***.***> wrote:
Hi ... Can I give this a shot if deprecation is intended?
—
You are receiving this because you commented.
Reply to this email directly, view it on GitHub
<#10773 (comment)>,
or mute the thread
<https://github.com/notifications/unsubscribe-auth/AAEz6wu_mU1_7VFdOpB0kQcz6vE1Y8qgks5tciN9gaJpZM4ShIp9>
.
|
I think deprecation is a good step. I'd kind of like to introduce the "correct" parameter, which is what I intended and which is what's implemented in R's ranger (random forests) and C4.5. But I'm not sure how to best do that and if it's worth it. I think it would be strictly better than |
@jnothman it "regularizes" but it doesn't reduce the size of the tree. |
gotcha
|
@FarahSaeed Are you still working on this ? If not I'd like to take a look at this :) |
@gamazeps How would you mitigate this issue? |
deprecate parameter and remove it from the docs, I think. The other option would be to actually implement the thing that is implemented in other libraries. That would change behavior in a backward-incompatible way, but only if |
FFR the paper: https://arxiv.org/abs/1710.04725 according to the functional anova analysis this hyperparameter is the most important over 100 datasets (functional anova does not take into account defaults), but further analysis shows that this hyperparameter should often kept left to default. Probably also interesting for @mfeurer , as autosklearn tries to optimize this parameter and might waste resources with that. (updated, wrong link) |
@gamazeps you can try this any time... good luck with your attempt.. |
I will take a stab at this! |
Is it also appropriate to add a note to docstrings for BaseDecisionTree and BaseGradientBoosting? currently those docstrings are just 1 liner "this is an abstract class, please use derived classes". |
Fixed in #11870 |
The docs state that
min_samples_leaf
reduces the size of the tree. This is wrong. The size of the tree is not influenced bymin_samples_leaf
. See the discussion at #8399.I would actually vote to deprecate the parameter, empirically 1 is always best.
I would have like to have a parameter that actually implements what I meant when I proposed
min_samples_leaf
but I'm not sure it's worth it. It would make for nicer trees thanmin_samples_split
imho but I don't have the data to proof it ;)The text was updated successfully, but these errors were encountered: