ENH Improves memory usage and runtime for gradient boosting #26957

Merged 3 commits on Aug 1, 2023


thomasjpfan
Member

Reference Issues/PRs

Found this when reviewing #26278

What does this implement/fix? Explain your changes.

On main, a CSR matrix is passed to fit, which the tree then converts to a CSC matrix here:

X = X.tocsc()
X.sort_indices()
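
As a minimal sketch of why this conversion matters (not code from the PR; the array sizes and variable names are made up for illustration), the snippet below checks that calling `tocsc()` on a CSR matrix allocates new buffers, while calling it on a matrix that is already CSC reuses the existing data:

```python
import numpy as np
from scipy.sparse import csc_matrix, csr_matrix

# Hypothetical toy data: a mostly-zero dense array.
rng = np.random.default_rng(0)
dense = rng.random((1_000, 20))
dense[dense < 0.8] = 0.0

# CSR input: the conversion has to build new CSC arrays.
X_csr = csr_matrix(dense)
X_from_csr = X_csr.tocsc()
X_from_csr.sort_indices()
copied = not np.shares_memory(X_csr.data, X_from_csr.data)

# CSC input: the same call has nothing to convert, so the data is reused.
X_csc = csc_matrix(dense)
X_from_csc = X_csc.tocsc()
X_from_csc.sort_indices()
reused = np.shares_memory(X_csc.data, X_from_csc.data)

print(f"CSR -> CSC made a copy: {copied}")
print(f"CSC -> CSC reused the data: {reused}")
```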

This PR makes use of the X_csc matrix when fitting, so the tree no longer needs to make the copy. Here is a quick memory profiler benchmark:

from scipy.sparse import csc_matrix

from sklearn.datasets import make_regression
from sklearn.ensemble import GradientBoostingRegressor

X, y = make_regression(10_000, 100, random_state=0)
X_csc = csc_matrix(X)

gb = GradientBoostingRegressor(random_state=0)

gb.fit(X_csc, y)

main

[Figure_1: memory profile on main]

PR

[Figure_2: memory profile with this PR]

We can see that the PR runs faster and uses less memory overall.
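
For readers without the plots, the extra allocation from the conversion step alone can be approximated with the standard-library `tracemalloc` module. This is a rough sketch under stated assumptions: the helper `peak_bytes_of_tocsc` is hypothetical, it measures only the `tocsc()` and `sort_indices()` calls rather than a full `fit`, and it is not the memory profiler setup used for the figures above.

```python
import tracemalloc

import numpy as np
from scipy.sparse import csc_matrix, csr_matrix


def peak_bytes_of_tocsc(X):
    """Return the peak bytes traced while converting X to sorted CSC.

    Hypothetical helper for illustration only.
    """
    tracemalloc.start()
    try:
        X_converted = X.tocsc()
        X_converted.sort_indices()
        _, peak = tracemalloc.get_traced_memory()
    finally:
        tracemalloc.stop()
    return peak


# Same shape as the benchmark above, but with plain random data.
rng = np.random.default_rng(0)
dense = rng.standard_normal((10_000, 100))

peak_csr = peak_bytes_of_tocsc(csr_matrix(dense))
peak_csc = peak_bytes_of_tocsc(csc_matrix(dense))

print(f"peak bytes converting from CSR: {peak_csr}")
print(f"peak bytes converting from CSC: {peak_csc}")
```

The absolute numbers depend on the NumPy and SciPy versions, so only the relative difference between the two inputs is meaningful here.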

@github-actions
github-actions bot commented Jul 31, 2023

✔️ Linting Passed

All linting checks passed. Your pull request is in excellent shape! ☀️

Generated for commit: 4a35aaa. Link to the linter CI: here

Member
@adrinjalali adrinjalali left a comment


NICE!

Member
@lorentzenchr lorentzenchr left a comment


Could we first merge #26959, then update this PR and then merge this PR?

@thomasjpfan
Member Author

> Could we first merge #26959, then update this PR and then merge this PR?

For more name changes, I prefer to do it in another PR. This allows this PR to focus on the efficiency improvement.

@lorentzenchr
Member

> > Could we first merge #26959, then update this PR and then merge this PR?
>
> For more name changes, I prefer to do it in another PR. This allows this PR to focus on the efficiency improvement.

I would have expected merge conflicts, but I'm mistaken. So I'll merge.

@lorentzenchr lorentzenchr merged commit 5c4e9a0 into scikit-learn:main Aug 1, 2023
23 checks passed
9Y5 pushed a commit to 9Y5/scikit-learn that referenced this pull request Aug 2, 2023
REDVM pushed a commit to REDVM/scikit-learn that referenced this pull request Nov 16, 2023