8000 MAINT Handle `Tree.sample_weight` using a memoryview by adam2392 · Pull Request #24994 · scikit-learn/scikit-learn · GitHub
[go: up one dir, main page]

Skip to content

MAINT Handle Tree.sample_weight using a memoryview #24994

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 9 commits into from
Nov 22, 2022

Conversation

adam2392
Copy link
Member

Reference Issues/PRs

Fixes: #24987

What does this implement/fix? Explain your changes.

Refactors all instances where sample_weight was a DOUBLE_t pointer to a Cython DOUBLE_t memoryview.

@jshinm will assist in running asv benchmarks to confirm that there is no performance regression once #24678 is merged and this is rebased on top of those changes.

Note that changing LOC if sample_weight != NULL: to if sample_weight != None: should not impact nogil performance. See this thread: https://stackoverflow.com/questions/63144139/how-to-check-whether-a-memmoryview-in-null-in-cython#comment131530483_63167895.

Any other comments?

This should be rebased and merged after #24678 is reviewed and merged.

@adam2392 adam2392 changed the title [MAINT] Convert 'sample_weight' parameter type to a Cython memoryview from its previous C pointer type [MAINT, Tree] Convert 'sample_weight' parameter type to a Cython memoryview from its previous C pointer type Nov 20, 2022
Co-authored-by: Guillaume Lemaitre <g.lemaitre58@gmail.com>
@adam2392
Copy link
Member Author

Kay this PR has been addressed. Waiting for review/merge of #24678 .

@jshinm feel free to take a look at this PR to see what the Cython changes were made. Wanna help run asv benchmark?

@jjerphan jjerphan changed the title [MAINT, Tree] Convert 'sample_weight' parameter type to a Cython memoryview from its previous C pointer type MAINT Handle Tree.sample_weight using a memoryview Nov 22, 2022
Copy link
Member
@jjerphan jjerphan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM modulo some nitpicks for using this PR as an opportunity to improve those implementations' readability.

Copy link
Member
@thomasjpfan thomasjpfan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Waiting for review/merge of #24678 .

This PR looks like a net improvement and is mergable without #24678.

Do you have benchmarks comparing this PR with main?

Copy link
Member
@glemaitre glemaitre left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We just need to have a bench to show that there is no regression. Otherwise LGTM.

@adam2392
Copy link
Member Author

Results show no performance regression.

> asv continuous --verbose --split --bench RandomForest upstream/main sample_weight
...
 [100.00%] · Running '/usr/bin/git name-rev --name-only --exclude=remotes/* --no-undefined aca0326e67f1e3b68baab17c55b62f9510383c91'
            OUTPUT -------->
            main
[100.00%] · Running '/usr/bin/git name-rev --name-only --exclude=remotes/* --no-undefined de299b7022c7c37508d63c7f0c6cf7d078993abe'
            OUTPUT -------->
            sample_weight

BENCHMARKS NOT SIGNIFICANTLY CHANGED.
(sklearn) adam2392@Adams-MacBook-Pro asv_benchmarks % asv compare main sample_weight

All benchmarks:

       before           after         ratio
     [aca0326e]       [de299b70]
     <main>           <sample_weight>
             264M             254M     0.96  ensemble.RandomForestClassifierBenchmark.peakmem_fit('dense', 1)
             565M             576M     1.02  ensemble.RandomForestClassifierBenchmark.peakmem_fit('sparse', 1)
             215M             217M     1.01  ensemble.RandomForestClassifierBenchmark.peakmem_predict('dense', 1)
             442M             443M     1.00  ensemble.RandomForestClassifierBenchmark.peakmem_predict('sparse', 1)
          4.79±0s       4.87±0.01s     1.02  ensemble.RandomForestClassifierBenchmark.time_fit('dense', 1)
       7.14±0.03s       7.14±0.01s     1.00  ensemble.RandomForestClassifierBenchmark.time_fit('sparse', 1)
         140±10ms          137±5ms     0.98  ensemble.RandomForestClassifierBenchmark.time_predict('dense', 1)
         944±10ms         974±20ms     1.03  ensemble.RandomForestClassifierBenchmark.time_predict('sparse', 1)
  0.7502017055240395  0.7502017055240395     1.00  ensemble.RandomForestClassifierBenchmark.track_test_score('dense', 1)
  0.8656423941766682  0.8656423941766682     1.00  ensemble.RandomForestClassifierBenchmark.track_test_score('sparse', 1)
  0.9971484595841501  0.9971484595841501     1.00  ensemble.RandomForestClassifierBenchmark.track_train_score('dense', 1)
  0.9996123288718864  0.9996123288718864     1.00  ensemble.RandomForestClassifierBenchmark.track_train_score('sparse', 1)

@jjerphan jjerphan merged commit 9268eea into scikit-learn:main Nov 22, 2022
@adam2392 adam2392 deleted the sample_weight branch November 22, 2022 19:24
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[MAINT, Cython, Tree] Refactor 'sample_weight' double pointers to modern Cython memory views
4 participants
0