8000 ENH reuse parent histogram in HGBT by lorentzenchr · Pull Request #26189 · scikit-learn/scikit-learn · GitHub
[go: up one dir, main page]

Skip to content

ENH reuse parent histogram in HGBT #26189

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 17 commits into
base: main
Choose a base branch
from

Conversation

lorentzenchr
Copy link
Member

Reference Issues/PRs

None

What does this implement/fix? Explain your changes.

This PR reuses the parent's histogram for the feature that was split on. This saves a little time.

Any other comments?

The implementation is no beauty. If we want to include this improvement, suggestions or alternative PRs are welcome.

@lorentzenchr
Copy link
Member Author
lorentzenchr commented Apr 15, 2023

Simple benchmark

On the higgs dataset, this PR gives ~4% speed-up, which is about the expected amount (1 / 28 features = 3.5%).

Running

python benchmarks/bench_hist_gradient_boosting_higgsboson.py --n-trees 100

on main gives

Fit 100 trees in 68.848 s, (3100 total leaves)
Time spent computing histograms: 39.301s
Time spent finding best splits:  0.259s
Time spent applying splits:      8.559s
Time spent predicting:           1.956s
fitted in 69.034s
predicted in 8.757s, ROC AUC: 0.8228, ACC: 0.7415

On this PR:

Fit 100 trees in 65.878 s, (3100 total leaves)
Time spent computing histograms: 38.698s
Time spent finding best splits:  0.236s
Time spent applying splits:      8.407s
Time spent predicting:           1.964s
fitted in 66.047s
predicted in 8.215s, ROC AUC: 0.8228, ACC: 0.7415

Copy link
Member
@thomasjpfan thomasjpfan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for the PR! I left a first pass review.

@lorentzenchr lorentzenchr force-pushed the hgbt_reuse_parent_hist branch from 9fb9006 to b1efb34 Compare September 6, 2023 20:33
@github-actions
Copy link
github-actions bot commented Sep 6, 2023

✔️ Linting Passed

All linting checks passed. Your pull request is in excellent shape! ☀️

Generated for commit: e0659e4. Link to the linter CI: here

Copy link
Member
@thomasjpfan thomasjpfan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Although I do not think this change is too complex, it still adds more complexity to the codebase. Overall, do you thi 8000 nk the complexity is worth the 1-1/n_features performance bump?

@jjerphan
Copy link
Member
jjerphan commented Sep 8, 2024

@lorentzenchr: should we continue and merge this PR, or not?

@lorentzenchr
Copy link
Member Author

@lorentzenchr: should we continue and merge this PR, or not?

Only if I get the commitment of 2 core devs willing to review and finally approve.

@jjerphan
Copy link
Member
jjerphan commented Sep 9, 2024

I would be glad reviewing this PR, but I would rather finish what I started first.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants
0