MAINT Handle `Tree.sample_weight` using a memoryview #24994

adam2392 · 2022-11-20T19:55:39Z

Reference Issues/PRs

What does this implement/fix? Explain your changes.

Refactors all instances where `sample_weight` was a `DOUBLE_t` pointer to use a Cython `DOUBLE_t` memoryview instead.

@jshinm will assist in running asv benchmarks to confirm that there is no performance regression once #24678 is merged and this is rebased on top of those changes.

Note that changing the line `if sample_weight != NULL:` to `if sample_weight != None:` should not impact nogil performance. See this thread: https://stackoverflow.com/questions/63144139/how-to-check-whether-a-memmoryview-in-null-in-cython#comment131530483_63167895.
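
For readers less familiar with the pattern, here is a rough pure-Python analogue of the guard described above. It is only a sketch, not the code from this PR: `weighted_sum` is a hypothetical helper, and it uses Python's built-in `memoryview` over an `array` rather than a Cython typed memoryview. The point is that an absent weight buffer is detected with a `None` check instead of a `NULL` pointer comparison, and falls back to a weight of 1.0.

```python
from array import array
from typing import Optional


def weighted_sum(y: memoryview, sample_weight: Optional[memoryview] = None) -> float:
    """Sum y[i] * w[i], treating a missing weight buffer as all ones.

    Hypothetical helper for illustration; not part of scikit-learn.
    """
    total = 0.0
    for i in range(len(y)):
        # Default weight when no buffer was supplied. This plays the role
        # that the `sample_weight != NULL` pointer check played before.
        w = 1.0
        if sample_weight is not None:
            w = sample_weight[i]
        total += w * y[i]
    return total


# Buffers of C doubles, exposed through the buffer protocol.
y = memoryview(array("d", [1.0, 2.0, 3.0]))
weights = memoryview(array("d", [0.5, 0.0, 2.0]))

unweighted = weighted_sum(y)          # no weights: every sample counts once
weighted = weighted_sum(y, weights)   # per-sample weights applied
```

In the Cython code the same check sits inside typed, nogil loops. Whether it costs anything there is what the linked thread and the asv benchmarks are meant to settle.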

Any other comments?

This should be rebased and merged after #24678 is reviewed and merged.

sklearn/tree/_criterion.pyx

sklearn/tree/_splitter.pyx

sklearn/tree/_criterion.pyx

Co-authored-by: Guillaume Lemaitre <g.lemaitre58@gmail.com>

sklearn/tree/_criterion.pyx

adam2392 · 2022-11-22T04:45:11Z

Okay, the review comments on this PR have been addressed. Waiting for review/merge of #24678.

@jshinm feel free to take a look at this PR to see what Cython changes were made. Want to help run the asv benchmark?

jjerphan

LGTM modulo some nitpicks for using this PR as an opportunity to improve those implementations' readability.

sklearn/tree/_criterion.pxd

sklearn/tree/_criterion.pyx

sklearn/tree/_splitter.pxd

sklearn/tree/_splitter.pyx

sklearn/tree/_criterion.pyx

Co-authored-by: Julien Jerphanion <git@jjerphan.xyz>

thomasjpfan

glemaitre

We just need a benchmark to show that there is no regression. Otherwise LGTM.

adam2392 · 2022-11-22T18:55:09Z

Results show no performance regression.

> asv continuous --verbose --split --bench RandomForest upstream/main sample_weight
...
 [100.00%] · Running '/usr/bin/git name-rev --name-only --exclude=remotes/* --no-undefined aca0326e67f1e3b68baab17c55b62f9510383c91'
            OUTPUT -------->
            main
[100.00%] · Running '/usr/bin/git name-rev --name-only --exclude=remotes/* --no-undefined de299b7022c7c37508d63c7f0c6cf7d078993abe'
            OUTPUT -------->
            sample_weight

BENCHMARKS NOT SIGNIFICANTLY CHANGED.

(sklearn) adam2392@Adams-MacBook-Pro asv_benchmarks % asv compare main sample_weight

All benchmarks:

       before           after         ratio
     [aca0326e]       [de299b70]
     <main>           <sample_weight>
             264M             254M     0.96  ensemble.RandomForestClassifierBenchmark.peakmem_fit('dense', 1)
             565M             576M     1.02  ensemble.RandomForestClassifierBenchmark.peakmem_fit('sparse', 1)
             215M             217M     1.01  ensemble.RandomForestClassifierBenchmark.peakmem_predict('dense', 1)
             442M             443M     1.00  ensemble.RandomForestClassifierBenchmark.peakmem_predict('sparse', 1)
          4.79±0s       4.87±0.01s     1.02  ensemble.RandomForestClassifierBenchmark.time_fit('dense', 1)
       7.14±0.03s       7.14±0.01s     1.00  ensemble.RandomForestClassifierBenchmark.time_fit('sparse', 1)
         140±10ms          137±5ms     0.98  ensemble.RandomForestClassifierBenchmark.time_predict('dense', 1)
         944±10ms         974±20ms     1.03  ensemble.RandomForestClassifierBenchmark.time_predict('sparse', 1)
  0.7502017055240395  0.7502017055240395     1.00  ensemble.RandomForestClassifierBenchmark.track_test_score('dense', 1)
  0.8656423941766682  0.8656423941766682     1.00  ensemble.RandomForestClassifierBenchmark.track_test_score('sparse', 1)
  0.9971484595841501  0.9971484595841501     1.00  ensemble.RandomForestClassifierBenchmark.track_train_score('dense', 1)
  0.9996123288718864  0.9996123288718864     1.00  ensemble.RandomForestClassifierBenchmark.track_train_score('sparse', 1)

Made changes to all sample_weight instances

d86ea44

github-actions bot added module:tree cython labels Nov 20, 2022

Remove extraneous import

30ddc2c

adam2392 changed the title ~~[MAINT] Convert 'sample_weight' parameter type to a Cython memoryview from its previous C pointer type~~ [MAINT, Tree] Convert 'sample_weight' parameter type to a Cython memoryview from its previous C pointer type Nov 20, 2022

glemaitre reviewed Nov 21, 2022

adam2392 commented Nov 22, 2022

sklearn/tree/_criterion.pyx (outdated)

Apply suggestions from code review

1567ee7

Co-authored-by: Guillaume Lemaitre <g.lemaitre58@gmail.com>

adam2392 commented Nov 22, 2022

sklearn/tree/_criterion.pyx (outdated)

adam2392 added 2 commits November 21, 2022 23:36

Merge branch 'main' into sample_weight

fd3ddff

Adding comments about memoryview

5ee3c07

jjerphan changed the title ~~[MAINT, Tree] Convert 'sample_weight' parameter type to a Cython memoryview from its previous C pointer type~~ MAINT Handle Tree.sample_weight using a memoryview Nov 22, 2022

jjerphan approved these changes Nov 22, 2022

adam2392 and others added 4 commits November 22, 2022 11:55

Apply suggestions from code review

c251d98

Co-authored-by: Julien Jerphanion <git@jjerphan.xyz>

Fix indentation of functions

3e4cbc4

Adding reformatting to function signatures

bb457b9

Merge branch 'main' into sample_weight

de299b7

adam2392 mentioned this pull request Nov 22, 2022

MAINT Convert samples parameter in Criterion classes to memory view #25004

Closed

thomasjpfan reviewed Nov 22, 2022

glemaitre approved these changes Nov 22, 2022

jjerphan merged commit 9268eea into scikit-learn:main Nov 22, 2022

adam2392 deleted the sample_weight branch November 22, 2022 19:24
