ENH Improves memory usage and runtime for gradient boosting #26957
Conversation
NICE!
Could we first merge #26959, then update this PR and then merge this PR?
As for further name changes, I prefer to do those in another PR, so that this PR stays focused on the efficiency improvement.
I would have expected merge conflicts, but I'm mistaken. So I'll merge.
Reference Issues/PRs
I found this while reviewing #26278.
What does this implement/fix? Explain your changes.
On `main`, a CSR matrix is passed to `fit`, which the tree then converts to a CSC matrix here:

scikit-learn/sklearn/tree/_tree.pyx, lines 109 to 110 in 405a5a0
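To illustrate why that conversion is costly, here is a minimal sketch (not the scikit-learn internals) showing that converting a CSR matrix to CSC with SciPy allocates entirely new `data`, `indices`, and `indptr` arrays, i.e. a full copy of the sparse structure:

```python
import numpy as np
from scipy import sparse

# Build a moderately sized sparse matrix in CSR format.
X_csr = sparse.random(1000, 50, density=0.1, format="csr", random_state=0)

# Converting CSR -> CSC cannot share buffers: the nonzeros must be
# reordered column-by-column, so tocsc() allocates fresh arrays.
X_csc = X_csr.tocsc()

assert X_csc.data is not X_csr.data  # a genuine copy, not a view
print(X_csc.format)  # "csc"
```

If every tree in a boosting ensemble performs this conversion itself, the copy is repeated once per tree rather than once per fit.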
This PR makes use of the `X_csc` matrix when fitting, so the tree no longer needs to make the copy. Here is a quick memory profiler benchmark:

[benchmark plots: memory usage over time on `main` vs. this PR]

We can see that the PR runs faster and uses less memory overall.
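The idea can be sketched as follows. This is a hypothetical illustration (the function and parameter names are invented, not the actual scikit-learn code): the boosting loop converts the input to CSC once up front and hands the already-converted matrix to every stage, instead of letting each tree redo the conversion.

```python
from scipy import sparse

def fit_stages(X, n_stages, fit_one_tree):
    """Illustrative boosting loop: convert to CSC once, reuse for all stages."""
    # Do the CSR -> CSC conversion a single time, before the loop.
    X_csc = X.tocsc() if sparse.issparse(X) else X
    trees = []
    for _ in range(n_stages):
        # Each tree receives the CSC matrix directly and makes no copy.
        trees.append(fit_one_tree(X_csc))
    return trees
```

With `n_stages` trees, this replaces `n_stages` conversions (and their temporary allocations) with one, which matches the memory and runtime improvement reported above.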