Address Julien's comments #27

jshinm · 2022-11-19T08:48:27Z

Reference Issues/PRs

What does this implement/fix? Explain your changes.

Any other comments?

…rphan

…rphan

…jerphan

…jerphan

jshinm · 2022-11-19T08:50:56Z

(sklearn-dev) PS C:\Users\jongm\Desktop\workspace\JOVO\sklearn\scikit-learn> pytest .\sklearn\metrics\
============================================================================================= test session starts =============================================================================================
platform win32 -- Python 3.9.13, pytest-7.2.0, pluggy-1.0.0
rootdir: C:\Users\jongm\Desktop\workspace\JOVO\sklearn\scikit-learn, configfile: setup.cfg
collected 2826 items

sklearn\metrics\_classification.py ssssssssssssssssss                                                                                                                                                    [  0%]
sklearn\metrics\_ranking.py ssssssssss                                                                                                                                                                   [  0%]
sklearn\metrics\_regression.py sssssssssssssss                                                                                                                                                           [  1%]
sklearn\metrics\_scorer.py s                                                                                                                                                                             [  1%]
sklearn\metrics\pairwise.py sssssss                                                                                                                                                                      [  1%]
sklearn\metrics\_plot\confusion_matrix.py sss                                                                                                                                                            [  1%] 
sklearn\metrics\_plot\det_curve.py sss                                                                                                                                                                   [  2%]
sklearn\metrics\_plot\precision_recall_curve.py sss                                                                                                                                                      [  2%] 
sklearn\metrics\_plot\roc_curve.py sss                                                                                                                                                                   [  2%] 
sklearn\metrics\_plot\tests\test_base.py .....                                                                                                                                                           [  2%]
sklearn\metrics\_plot\tests\test_common_curve_display.py ssssssssssssssssssssssssssssss                                                                                                                  [  3%]
sklearn\metrics\_plot\tests\test_confusion_matrix_display.py ssssssssssssssssssssssssssssssssssssss                                                                                                      [  4%]
sklearn\metrics\_plot\tests\test_det_curve_display.py ssssssssssssssssss                                                                                                                                 [  5%]
sklearn\metrics\_plot\tests\test_precision_recall_display.py sssssssssssssssss                                                                                                                           [  6%]
sklearn\metrics\_plot\tests\test_roc_curve_display.py sssssssssssssssssssssssssssssssssssssssssssss                                                                                                      [  7%]
sklearn\metrics\cluster\_supervised.py sssssssss                                                                                                                                                         [  7%] 
sklearn\metrics\cluster\tests\test_bicluster.py ...                                                                                                                                                      [  8%]
sklearn\metrics\cluster\tests\test_common.py ..................................................................                                                                                          [ 10%]
sklearn\metrics\cluster\tests\test_supervised.py ....................................                                                                                                                    [ 11%]
sklearn\metrics\cluster\tests\test_unsupervised.py ...........                                                                                                                                           [ 12%]
sklearn\metrics\tests\test_classification.py ............................................................................................................................                                [ 16%]
........................................................................................................................................................................................................ [ 29%] 
........................................................................................................................................................................................................ [ 36%] 
........................................................................................................................................................................................................ [ 43%] 
........................................................................................................................................................................................................ [ 50%] 
........................................................................................................................................................................................................ [ 57%] 
........................................                                                                                                                                                                 [ 59%] 
sklearn\metrics\tests\test_dist_metrics.py ............................................................................................................................................................. [ 64%] 
.......................                                                                                                                                                                                  [ 65%] 
sklearn\metrics\tests\test_pairwise.py s...................ss.....ss.....ss..................s.s.s.s......ssssss......sss...sss...s.....ss...ssss....ss...........................x.x................... [ 71%] 
.............................................ssssssssssssssssssssssss........................                                                                                                            [ 74%] 
sklearn\metrics\tests\test_pairwise_distances_reduction.py ............................................................................................................................................. [ 79%] 
...............................xx..xx......................xx..xx....................................................................................................................................... [ 86%] 
.................................                                                                                                                                                                        [ 87%] 
sklearn\metrics\tests\test_ranking.py .................................................................................................................................................................. [ 93%] 
.......................                                                                                                                                                                                  [ 94%] 
sklearn\metrics\tests\test_regression.py ...........................                                                                                                                                     [ 95%] 
sklearn\metrics\tests\test_score_objects.py ....................................................................................
8000
......................................................                   [100%] 

========================================================================= 2540 passed, 276 skipped, 10 xfailed, 59 warnings in 49.54s ========================================================================= 
C:\Users\jongm\Desktop\workspace\JOVO\sklearn\scikit-learn\sklearn\utils\_testing.py:450: UserWarning: Could not delete temporary folder C:\Users\jongm\AppData\Local\Temp\sklearn_testing_uroyj_lf
  warnings.warn("Could not delete temporary folder %s" % folder_path)
C:\Users\jongm\Desktop\workspace\JOVO\sklearn\scikit-learn\sklearn\utils\_testing.py:450: UserWarning: Could not delete temporary folder C:\Users\jongm\AppData\Local\Temp\sklearn_testing___n98nqq
  warnings.warn("Could not delete temporary folder %s" % folder_path)

(sklearn-dev) PS C:\Users\jongm\Desktop\workspace\JOVO\sklearn\scikit-learn> pytest .\sklearn\tree\
============================================================================================= test session starts ============================================================================================= 
platform win32 -- Python 3.9.13, pytest-7.2.0, pluggy-1.0.0
rootdir: C:\Users\jongm\Desktop\workspace\JOVO\sklearn\scikit-learn, configfile: setup.cfg
collected 349 items

sklearn\tree\_classes.py ssss                                                                                                                                                                            [  1%] 
sklearn\tree\_export.py sss                                                                                                                                                                              [  2%] 
sklearn\tree\tests\test_export.py ......sss                                                                                                                                                              [  4%] 
sklearn\tree\tests\test_reingold_tilford.py ..                                                                                                                                                           [  5%] 
sklearn\tree\tests\test_tree.py ........................................................................................................................................................................ [ 53%] 
...................................................................................................................................................................                                      [100%] 

================================================================================ 339 passed, 10 skipped, 49 warnings in 12.62s ================================================================================

adam2392 · 2022-11-19T14:16:31Z

What are the asv results?

FYI You can't run anything else while running that benchmark.

sklearn/tree/_criterion.pxd

sklearn/tree/_criterion.pyx

adam2392

Some suggestions and minor clean up. Please run the asv benchmark for RF and post the results here when you are done w/ the cleanup. Remember to not run anything in the background while the asv is running to not confound the results.

Add our names and emails to the top few lines of .pyx file. E.g. where

#          Jacob Schreiber <jmschreiber91@gmail.com>
#          Nelson Liu <nelson@nelsonliu.me>
#

is in the same format to address credit for making changes.

adam2392 · 2022-11-19T17:33:29Z

You don't need to post the output of the tests: #27 (comment)

The unit-tests are ran via the CI jobs. What you do need to show is the asv benchmark, since that is not run until PRs are merged into main.

jshinm · 2022-11-19T17:36:27Z

What are the asv results?

FYI You can't run anything else while running that benchmark.

@adam2392 do they look okay? I didn't run main when I ran crit, but the env is the same. And I didn't run anything while running the benchmark

(sklearn-main) jshinm@jshinm-OMEN-by-HP-Laptop-16-b0xxx:~/Desktop/workstation/sklearn-jms/asv_benchmarks$ asv show main
Commit: main <main>

ensemble.RandomForestClassifierBenchmark.peakmem_fit [crit/conda-py3.9-cython-joblib-numpy-pandas-scipy-threadpoolctl]
  ok
  ================ ======
  --               n_jobs
  ---------------- ------
   representation    1   
  ================ ======
       dense        188M 
       sparse       422M 
  ================ ======
  started: 2022-11-19 10:53:56, duration: 10.6s

ensemble.RandomForestClassifierBenchmark.peakmem_predict [crit/conda-py3.9-cython-joblib-numpy-pandas-scipy-threadpoolctl]
  ok
  ================ ======
  --               n_jobs
  ---------------- ------
   representation    1   
  ================ ======
       dense        190M 
       sparse       406M 
  ================ ======
  started: 2022-11-19 10:54:06, duration: 1.21s

ensemble.RandomForestClassifierBenchmark.time_fit [crit/conda-py3.9-cython-joblib-numpy-pandas-scipy-threadpoolctl]
  ok
  ================ ============
  --                  n_jobs   
  ---------------- ------------
   representation       1      
  ================ ============
       dense         3.94±0s   
       sparse       6.41±0.02s 
  ================ ============
  started: 2022-11-19 10:54:08, duration: 1.07m

ensemble.RandomForestClassifierBenchmark.time_predict [crit/conda-py3.9-cython-joblib-numpy-pandas-scipy-threadpoolctl]
  ok
  ================ ===========
  --                  n_jobs  
  ---------------- -----------
   representation       1     
  ================ ===========
       dense        131±0.8ms 
       sparse        871±2ms  
  ================ ===========
  started: 2022-11-19 10:55:12, duration: 47.1s

ensemble.RandomForestClassifierBenchmark.track_test_score [crit/conda-py3.9-cython-joblib-numpy-pandas-scipy-threadpoolctl]
  ok
  ================ ====================
  --                      n_jobs       
  ---------------- --------------------
   representation           1          
  ================ ====================
       dense        0.7464271763500541 
       sparse       0.8656423941766682 
  ================ ====================
  started: 2022-11-19 10:55:59, duration: 357ms

ensemble.RandomForestClassifierBenchmark.track_train_score [crit/conda-py3.9-cython-joblib-numpy-pandas-scipy-threadpoolctl]
  ok
  ================ ====================
  --                      n_jobs       
  ---------------- --------------------
   representation           1          
  ================ ====================
       dense        0.9968171694224932 
       sparse       0.9996123288718864 
  ================ ====================
  started: 2022-11-19 10:55:59, duration: 1.21s

(crit) jshinm@jshinm-OMEN-by-HP-Laptop-16-b0xxx:~/Desktop/workstation/sklearn-jms/asv_benchmarks$ asv show crit
Commit: crit <crit>

ensemble.RandomForestClassifierBenchmark.peakmem_fit [crit/conda-py3.9-cython-joblib-numpy-pandas-scipy-threadpoolctl]
  ok
  ================ ======
  --               n_jobs
  ---------------- ------
   representation    1   
  ================ ======
       dense        188M 
       sparse       422M 
  ================ ======
  started: 2022-11-19 01:51:42, duration: 10.6s

ensemble.RandomForestClassifierBenchmark.peakmem_predict [crit/conda-py3.9-cython-joblib-numpy-pandas-scipy-threadpoolctl]
  ok
  ================ ======
  --               n_jobs
  ---------------- ------
   representation    1   
  ================ ======
       dense        189M 
       sparse       406M 
  ================ ======
  started: 2022-11-19 01:51:52, duration: 1.21s

ensemble.RandomForestClassifierBenchmark.time_fit [crit/conda-py3.9-cython-joblib-numpy-pandas-scipy-threadpoolctl]
  ok
  ================ ============
  --                  n_jobs   
  ---------------- ------------
   representation       1      
  ================ ============
       dense        4.02±0.02s 
       sparse       6.37±0.01s 
  ================ ============
  started: 2022-11-19 01:51:53, duration: 1.08m

ensemble.RandomForestClassifierBenchmark.time_predict [crit/conda-py3.9-cython-joblib-numpy-pandas-scipy-threadpoolctl]
  ok
  ================ =========
  --                 n_jobs 
  ---------------- ---------
   representation      1    
  ================ =========
       dense        131±1ms 
       sparse       863±3ms 
  ================ =========
  started: 2022-11-19 01:52:58, duration: 46.7s

ensemble.RandomForestClassifierBenchmark.track_test_score [crit/conda-py3.9-cython-joblib-numpy-pandas-scipy-threadpoolctl]
  ok
  ================ ====================
  --                      n_jobs       
  ---------------- --------------------
   representation           1          
  ================ ====================
       dense        0.7464271763500541 
       sparse       0.8656423941766682 
  ================ ====================
  started: 2022-11-19 01:53:45, duration: 358ms

ensemble.RandomForestClassifierBenchmark.track_train_score [crit/conda-py3.9-cython-joblib-numpy-pandas-scipy-threadpoolctl]
  ok
  ================ ====================
  --                      n_jobs       
  ---------------- --------------------
   representation           1          
  ================ ====================
       dense        0.9968171694224932 
       sparse       0.9996123288718864 
  ================ ====================
  started: 2022-11-19 01:53:45, duration: 1.21s

adam2392 · 2022-11-19T17:37:45Z

You can compare the two like done in scikit-learn#24678 (comment)

Maybe check https://asv.readthedocs.io/en/stable/reference.html to see how you can use their CLI tools to directly compare the two results you have? Or some plot.

Skimming the numbers they look okay tho. But for Julien and sklearn reviewers, best to have it in an easier format to read.

Co-authored-by: Adam Li <adam2392@gmail.com>

jshinm · 2022-11-19T18:19:55Z

Here is the comparison from asv CLI, though, I realized that the crit I ran was the one before reflecting on Julien's comments, so I'll run the benchmark again and maybe draw some figures

(crit) jshinm@jshinm-OMEN-by-HP-Laptop-16-b0xxx:~/Desktop/workstation/sklearn-jms/asv_benchmarks$ asv compare main crit

All benchmarks:

       before           after         ratio
     [9c9c8582]       [106b01cf]
     <main>           <crit>   
     
             188M             188M     1.00  ensemble.RandomForestClassifierBenchmark.peakmem_fit('dense', 1)
             422M             422M     1.00  ensemble.RandomForestClassifierBenchmark.peakmem_fit('sparse', 1)
             190M             189M     1.00  ensemble.RandomForestClassifierBenchmark.peakmem_predict('dense', 1)
             406M             406M     1.00  ensemble.RandomForestClassifierBenchmark.peakmem_predict('sparse', 1)
          3.94±0s       4.02±0.02s     1.02  ensemble.RandomForestClassifierBenchmark.time_fit('dense', 1)
       6.41±0.02s       6.37±0.01s     0.99  ensemble.RandomForestClassifierBenchmark.time_fit('sparse', 1)
        131±0.8ms          131±1ms     1.00  ensemble.RandomForestClassifierBenchmark.time_predict('dense', 1)
          871±2ms          863±3ms     0.99  ensemble.RandomForestClassifierBenchmark.time_predict('sparse', 1)
  0.7464271763500541  0.7464271763500541     1.00  ensemble.RandomForestClassifierBenchmark.track_test_score('dense', 1)
  0.8656423941766682  0.8656423941766682     1.00  ensemble.RandomForestClassifierBenchmark.track_test_score('sparse', 1)
  0.9968171694224932  0.9968171694224932     1.00  ensemble.RandomForestClassifierBenchmark.track_train_score('dense', 1)
  0.9996123288718864  0.9996123288718864     1.00  ensemble.RandomForestClassifierBenchmark.track_train_score('sparse', 1)

adam2392 · 2022-11-19T19:14:53Z

Sweet! Nah that looks good. Basically means no regressions in performance as expected.

adam2392 · 2022-11-19T19:15:51Z

Can you copy the asv results over to the main PR and information about your testing machine?

jshinm added 4 commits November 19, 2022 02:24

remove entry from update log

04a4da8

move pxd comments to pyx docstring and made changes suggested by @jje…

13fbc1d

…rphan

move pxd comments to class docstring and make changes suggested by @j…

0886c84

…jerphan

update docstring as suggested by @jjerphan

332fc38

jshinm requested a review from adam2392 November 19, 2022 08:48

jshinm self-assigned this Nov 19, 2022

adam2392 reviewed Nov 19, 2022

View reviewed changes

sklearn/tree/_criterion.pxd Show resolved Hide resolved

adam2392 reviewed Nov 19, 2022

View reviewed changes

sklearn/tree/_criterion.pyx Outdated Show resolved Hide resolved

adam2392 reviewed Nov 19, 2022

View reviewed changes

sklearn/tree/_criterion.pyx Outdated Show resolved Hide resolved

adam2392 requested changes Nov 19, 2022

View reviewed changes

jshinm and others added 3 commits November 19, 2022 12:38

Update sklearn/tree/_criterion.pyx

76316d6

Co-authored-by: Adam Li <adam2392@gmail.com>

fix indentation as @adam2392 suggested

aefa276

add contributors

b139954

adam2392 approved these changes Nov 19, 2022

View reviewed changes

adam2392 merged commit 8442333 into neurodata:crit Nov 19, 2022

jshinm deleted the jms-update-crit branch December 13, 2022 06:40

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Address Julien's comments #27

Address Julien's comments #27

Address Julien's comments #27

Address Julien's comments #27

Conversation

Reference Issues/PRs

What does this implement/fix? Explain your changes.

Any other comments?

Choose a reason for hiding this comment