-
-
Notifications
You must be signed in to change notification settings - Fork 25.9k
[MRG+1] correct condition in decision tree construction #7441
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[MRG+1] correct condition in decision tree construction #7441
Conversation
While I think these conditions are logically inappropriate, just in case I'm wrong, have you confirmed that they are never executed at least by the test suite? |
just verified; in the original code, never is |
I mean with something like printing a message if the condition is ever true On 16 September 2016 at 07:12, Nelson Liu notifications@github.com wrote:
|
yes, that's what i did (and nothing was printed). sorry for being unclear. |
Great, thanks. LGTM On 16 September 2016 at 08:46, Nelson Liu notifications@github.com wrote:
|
Hmmm can you explain why we should remove it? What if you make a test with |
Is it possible the test suite just isn't comprehensive enough to test cases where this might be important? Can you create by hand a dataset where you would expect that the weight would cause a difference in the splits and confirm? |
yes, that's very possible. Are said "cases where this might be important" just when you have a dataset with |
Yeah, but sample_weight for the samples in the current node, not the entire dataset. I haven't read through the test cases recently, but do we have a test where we should make a split on weighted data (not unweighted data), but don't because of min_weight_leaf? |
What does it mean logically to say that something should be a leaf if its weight is so small that it violates the |
This shouldn't be possible through the public interface, which sets (If this isn't a valid change to make, then surely changing |
IIRC Nothing would break even if you remove the whole set of conditions as the splitter will check it again. The reason why this check is given here is to check and avoid entering the splitter at all if we know it is a leaf for sure. And I think the condition should instead be
Where does "here" refer to? How did you verify? The flow, IIUC, is as follows Assume
|
I had originally thought that condition inappropriate (@nelson-liu had suggested it), but now I think you might be right. If that condition is true, there is no way to split the points at that node such that both splits satisfy Yes, I now think this is appropriate as a fast path. However, this will change the fitted trees, so we should note that in what's new. Thanks. |
No it won't change the fitted trees, as before the splitter used to mark it as leaf. Now after the corrected condition, it will not enter the splitter at all. And yes with the corrected condition in place there is no way to split it such that |
Yes, that will change the trees due to random_state. On 22 September 2016 at 10:07, Raghav RV notifications@github.com wrote:
|
Wow. Good catch! I totally missed that. Indeed the tree will change! |
following up with this; should we change the condition to |
I think so. (Sorry for changing my mind!) On 29 September 2016 at 00:14, Nelson Liu notifications@github.com wrote:
|
no need to be sorry, I'm glad we were able to come to a better solution. pushed the new change. |
@@ -218,8 +218,8 @@ cdef class DepthFirstTreeBuilder(TreeBuilder): | |||
|
|||
is_leaf = ((depth >= max_depth) or |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do you mind removing all the extraneous parentheses too? :/
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yeah, sorry about that.
please change PR title |
actually, I will, seeing as I'm adding a +1. LGTM |
What I'd like to see is a what's new entries that says it's more efficient but trees using the parameter will change. |
640d413
to
35ed851
Compare
@jnothman thanks for changing the title for me. I added a what's new entry and removed the extra parens. |
Thanks for the change. This looks good to me. I think it would be better if @glouppe gave his final +1 and merge :) |
LGTM too! Merging |
- Edited criterion for leaf nodes in decision tree criterion by declaring a | ||
node as a leaf if the weighted number of samples at the node is less than | ||
2 * the minimum weight specified to be at a node. This makes growth more | ||
efficient, but trees using parameters that modify the weight at each leaf |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this is a confusing way of stating it. Will fix it up in master.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
thanks @jnothman
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
…#7441) * remove unused condition in decision tree construction * edit is_leaf condition for min_weight_leaf * edit ordering of statements * remove extra parens and add whats new
…#7441) * remove unused condition in decision tree construction * edit is_leaf condition for min_weight_leaf * edit ordering of statements * remove extra parens and add whats new
…#7441) * remove unused condition in decision tree construction * edit is_leaf condition for min_weight_leaf * edit ordering of statements * remove extra parens and add whats new
Reference Issue
Continuation of #7340
What does this implement/fix? Explain your changes.
Removes an unused leaf condition in
_tree.pyx
.Any other comments?
ping @jnothman, who I previously discussed this with
This change is