8000 Address Julien's comments by jshinm · Pull Request #27 · neurodata/scikit-learn · GitHub
[go: up one dir, main page]

Skip to content

Address Julien's comments #27

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 7 commits into from
Nov 19, 2022
Merged

Conversation

jshinm
Copy link
@jshinm jshinm commented Nov 19, 2022

Reference Issues/PRs

What does this implement/fix? Explain your changes.

Any other comments?

@jshinm jshinm requested a review from adam2392 November 19, 2022 08:48
@jshinm jshinm self-assigned this Nov 19, 2022
@jshinm
Copy link
Author
jshinm commented Nov 19, 2022
(sklearn-dev) PS C:\Users\jongm\Desktop\workspace\JOVO\sklearn\scikit-learn> pytest .\sklearn\metrics\
============================================================================================= test session starts =============================================================================================
platform win32 -- Python 3.9.13, pytest-7.2.0, pluggy-1.0.0
rootdir: C:\Users\jongm\Desktop\workspace\JOVO\sklearn\scikit-learn, configfile: setup.cfg
collected 2826 items

sklearn\metrics\_classification.py ssssssssssssssssss                                                                                                                                                    [  0%]
sklearn\metrics\_ranking.py ssssssssss                                                                                                                                                                   [  0%]
sklearn\metrics\_regression.py sssssssssssssss                                                                                                                                                           [  1%]
sklearn\metrics\_scorer.py s                                                                                                                                                                             [  1%]
sklearn\metrics\pairwise.py sssssss                                                                                                                                                                      [  1%]
sklearn\metrics\_plot\confusion_matrix.py sss                                                                                                                                                            [  1%] 
sklearn\metrics\_plot\det_curve.py sss                                                                                                                                                                   [  2%]
sklearn\metrics\_plot\precision_recall_curve.py sss                                                                                                                                                      [  2%] 
sklearn\metrics\_plot\roc_curve.py sss                                                                                                                                                                   [  2%] 
sklearn\metrics\_plot\tests\test_base.py .....                                                                                                                                                           [  2%]
sklearn\metrics\_plot\tests\test_common_curve_display.py ssssssssssssssssssssssssssssss                                                                                                                  [  3%]
sklearn\metrics\_plot\tests\test_confusion_matrix_display.py ssssssssssssssssssssssssssssssssssssss                                                                                                      [  4%]
sklearn\metrics\_plot\tests\test_det_curve_display.py ssssssssssssssssss                                                                                                                                 [  5%]
sklearn\metrics\_plot\tests\test_precision_recall_display.py sssssssssssssssss                                                                                                                           [  6%]
sklearn\metrics\_plot\tests\test_roc_curve_display.py sssssssssssssssssssssssssssssssssssssssssssss                                                                                                      [  7%]
sklearn\metrics\cluster\_supervised.py sssssssss                                                                                                                                                         [  7%] 
sklearn\metrics\cluster\tests\test_bicluster.py ...                                                                                                                                                      [  8%]
sklearn\metrics\cluster\tests\test_common.py ..................................................................                                                                                          [ 10%]
sklearn\metrics\cluster\tests\test_supervised.py ....................................                                                                                                                    [ 11%]
sklearn\metrics\cluster\tests\test_unsupervised.py ...........                                                                                                                                           [ 12%]
sklearn\metrics\tests\test_classification.py ............................................................................................................................                                [ 16%]
........................................................................................................................................................................................................ [ 29%] 
........................................................................................................................................................................................................ [ 36%] 
........................................................................................................................................................................................................ [ 43%] 
........................................................................................................................................................................................................ [ 50%] 
........................................................................................................................................................................................................ [ 57%] 
........................................                                                                                                                                                                 [ 59%] 
sklearn\metrics\tests\test_dist_metrics.py ............................................................................................................................................................. [ 64%] 
.......................                                                                                                                                                                                  [ 65%] 
sklearn\metrics\tests\test_pairwise.py s...................ss.....ss.....ss..................s.s.s.s......ssssss......sss...sss...s.....ss...ssss....ss...........................x.x................... [ 71%] 
.............................................ssssssssssssssssssssssss........................                                                                                                            [ 74%] 
sklearn\metrics\tests\test_pairwise_distances_reduction.py ............................................................................................................................................. [ 79%] 
...............................xx..xx......................xx..xx....................................................................................................................................... [ 86%] 
.................................                                                                                                                                                                        [ 87%] 
sklearn\metrics\tests\test_ranking.py .................................................................................................................................................................. [ 93%] 
.......................                                                                                                                                                                                  [ 94%] 
sklearn\metrics\tests\test_regression.py ...........................                                                                                                                                     [ 95%] 
sklearn\metrics\tests\test_score_objects.py ....................................................................................
8000
......................................................                   [100%] 

========================================================================= 2540 passed, 276 skipped, 10 xfailed, 59 warnings in 49.54s ========================================================================= 
C:\Users\jongm\Desktop\workspace\JOVO\sklearn\scikit-learn\sklearn\utils\_testing.py:450: UserWarning: Could not delete temporary folder C:\Users\jongm\AppData\Local\Temp\sklearn_testing_uroyj_lf
  warnings.warn("Could not delete temporary folder %s" % folder_path)
C:\Users\jongm\Desktop\workspace\JOVO\sklearn\scikit-learn\sklearn\utils\_testing.py:450: UserWarning: Could not delete temporary folder C:\Users\jongm\AppData\Local\Temp\sklearn_testing___n98nqq
  warnings.warn("Could not delete temporary folder %s" % folder_path)
(sklearn-dev) PS C:\Users\jongm\Desktop\workspace\JOVO\sklearn\scikit-learn> pytest .\sklearn\tree\
============================================================================================= test session starts ============================================================================================= 
platform win32 -- Python 3.9.13, pytest-7.2.0, pluggy-1.0.0
rootdir: C:\Users\jongm\Desktop\workspace\JOVO\sklearn\scikit-learn, configfile: setup.cfg
collected 349 items

sklearn\tree\_classes.py ssss                                                                                                                                                                            [  1%] 
sklearn\tree\_export.py sss                                                                                                                                                                              [  2%] 
sklearn\tree\tests\test_export.py ......sss                                                                                                                                                              [  4%] 
sklearn\tree\tests\test_reingold_tilford.py ..                                                                                                                                                           [  5%] 
sklearn\tree\tests\test_tree.py ........................................................................................................................................................................ [ 53%] 
...................................................................................................................................................................                                      [100%] 

================================================================================ 339 passed, 10 skipped, 49 warnings in 12.62s ================================================================================ 

@adam2392
Copy link
Collaborator

What are the asv results?

FYI You can't run anything else while running that benchmark.

Copy link
Collaborator
@adam2392 adam2392 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some suggestions and minor clean up. Please run the asv benchmark for RF and post the results here when you are done w/ the cleanup. Remember to not run anything in the background while the asv is running to not confound the results.

Add our names and emails to the top few lines of .pyx file. E.g. where

#          Jacob Schreiber <jmschreiber91@gmail.com>
#          Nelson Liu <nelson@nelsonliu.me>
#

is in the same format to address credit for making changes.

@adam2392
Copy link
Collaborator

You don't need to post the output of the tests: #27 (comment)

The unit-tests are ran via the CI jobs. What you do need to show is the asv benchmark, since that is not run until PRs are merged into main.

@jshinm
Copy link
8000
Author
jshinm commented Nov 19, 2022

What are the asv results?

FYI You can't run anything else while running that benchmark.

@adam2392 do they look okay? I didn't run main when I ran crit, but the env is the same. And I didn't run anything while running the benchmark

(sklearn-main) jshinm@jshinm-OMEN-by-HP-Laptop-16-b0xxx:~/Desktop/workstation/sklearn-jms/asv_benchmarks$ asv show main
Commit: main <main>

ensemble.RandomForestClassifierBenchmark.peakmem_fit [crit/conda-py3.9-cython-joblib-numpy-pandas-scipy-threadpoolctl]
  ok
  ================ ======
  --               n_jobs
  ---------------- ------
   representation    1   
  ================ ======
       dense        188M 
       sparse       422M 
  ================ ======
  started: 2022-11-19 10:53:56, duration: 10.6s

ensemble.RandomForestClassifierBenchmark.peakmem_predict [crit/conda-py3.9-cython-joblib-numpy-pandas-scipy-threadpoolctl]
  ok
  ================ ======
  --               n_jobs
  ---------------- ------
   representation    1   
  ================ ======
       dense        190M 
       sparse       406M 
  ================ ======
  started: 2022-11-19 10:54:06, duration: 1.21s

ensemble.RandomForestClassifierBenchmark.time_fit [crit/conda-py3.9-cython-joblib-numpy-pandas-scipy-threadpoolctl]
  ok
  ================ ============
  --                  n_jobs   
  ---------------- ------------
   representation       1      
  ================ ============
       dense         3.94±0s   
       sparse       6.41±0.02s 
  ================ ============
  started: 2022-11-19 10:54:08, duration: 1.07m

ensemble.RandomForestClassifierBenchmark.time_predict [crit/conda-py3.9-cython-joblib-numpy-pandas-scipy-threadpoolctl]
  ok
  ================ ===========
  --                  n_jobs  
  ---------------- -----------
   representation       1     
  ================ ===========
       dense        131±0.8ms 
       sparse        871±2ms  
  ================ ===========
  started: 2022-11-19 10:55:12, duration: 47.1s

ensemble.RandomForestClassifierBenchmark.track_test_score [crit/conda-py3.9-cython-joblib-numpy-pandas-scipy-threadpoolctl]
  ok
  ================ ====================
  --                      n_jobs       
  ---------------- --------------------
   representation           1          
  ================ ====================
       dense        0.7464271763500541 
       sparse       0.8656423941766682 
  ================ ====================
  started: 2022-11-19 10:55:59, duration: 357ms

ensemble.RandomForestClassifierBenchmark.track_train_score [crit/conda-py3.9-cython-joblib-numpy-pandas-scipy-threadpoolctl]
  ok
  ================ ====================
  --                      n_jobs       
  ---------------- --------------------
   representation           1          
  ================ ====================
       dense        0.9968171694224932 
       sparse       0.9996123288718864 
  ================ ====================
  started: 2022-11-19 10:55:59, duration: 1.21s
(crit) jshinm@jshinm-OMEN-by-HP-Laptop-16-b0xxx:~/Desktop/workstation/sklearn-jms/asv_benchmarks$ asv show crit
Commit: crit <crit>

ensemble.RandomForestClassifierBenchmark.peakmem_fit [crit/conda-py3.9-cython-joblib-numpy-pandas-scipy-threadpoolctl]
  ok
  ================ ======
  --               n_jobs
  ---------------- ------
   representation    1   
  ================ ======
       dense        188M 
       sparse       422M 
  ================ ======
  started: 2022-11-19 01:51:42, duration: 10.6s

ensemble.RandomForestClassifierBenchmark.peakmem_predict [crit/conda-py3.9-cython-joblib-numpy-pandas-scipy-threadpoolctl]
  ok
  ================ ======
  --               n_jobs
  ---------------- ------
   representation    1   
  ================ ======
       dense        189M 
       sparse       406M 
  ================ ======
  started: 2022-11-19 01:51:52, duration: 1.21s

ensemble.RandomForestClassifierBenchmark.time_fit [crit/conda-py3.9-cython-joblib-numpy-pandas-scipy-threadpoolctl]
  ok
  ================ ============
  --                  n_jobs   
  ---------------- ------------
   representation       1      
  ================ ============
       dense        4.02±0.02s 
       sparse       6.37±0.01s 
  ================ ============
  started: 2022-11-19 01:51:53, duration: 1.08m

ensemble.RandomForestClassifierBenchmark.time_predict [crit/conda-py3.9-cython-joblib-numpy-pandas-scipy-threadpoolctl]
  ok
  ================ =========
  --                 n_jobs 
  ---------------- ---------
   representation      1    
  ================ =========
       dense        131±1ms 
       sparse       863±3ms 
  ================ =========
  started: 2022-11-19 01:52:58, duration: 46.7s

ensemble.RandomForestClassifierBenchmark.track_test_score [crit/conda-py3.9-cython-joblib-numpy-pandas-scipy-threadpoolctl]
  ok
  ================ ====================
  --                      n_jobs       
  ---------------- --------------------
   representation           1          
  ================ ====================
       dense        0.7464271763500541 
       sparse       0.8656423941766682 
  ================ ====================
  started: 2022-11-19 01:53:45, duration: 358ms

ensemble.RandomForestClassifierBenchmark.track_train_score [crit/conda-py3.9-cython-joblib-numpy-pandas-scipy-threadpoolctl]
  ok
  ================ ====================
  --                      n_jobs       
  ---------------- --------------------
   representation           1          
  ================ ====================
       dense        0.9968171694224932 
       sparse       0.9996123288718864 
  ================ ====================
  started: 2022-11-19 01:53:45, duration: 1.21s

@adam2392
Copy link
Collaborator
adam2392 commented Nov 19, 2022

You can compare the two like done in scikit-learn#24678 (comment)

Maybe check https://asv.readthedocs.io/en/stable/reference.html to see how you can use their CLI tools to directly compare the two results you have? Or some plot.

Skimming the numbers they look okay tho. But for Julien and sklearn reviewers, best to have it in an easier format to read.

@jshinm
Copy link
Author
jshinm commented Nov 19, 2022

Here is the comparison from asv CLI, though, I realized that the crit I ran was the one before reflecting on Julien's comments, so I'll run the benchmark again and maybe draw some figures

(crit) jshinm@jshinm-OMEN-by-HP-Laptop-16-b0xxx:~/Desktop/workstation/sklearn-jms/asv_benchmarks$ asv compare main crit

All benchmarks:

       before           after         ratio
     [9c9c8582]       [106b01cf]
     <main>           <crit>   
     
             188M             188M     1.00  ensemble.RandomForestClassifierBenchmark.peakmem_fit('dense', 1)
             422M             422M     1.00  ensemble.RandomForestClassifierBenchmark.peakmem_fit('sparse', 1)
             190M             189M     1.00  ensemble.RandomForestClassifierBenchmark.peakmem_predict('dense', 1)
             406M             406M     1.00  ensemble.RandomForestClassifierBenchmark.peakmem_predict('sparse', 1)
          3.94±0s       4.02±0.02s     1.02  ensemble.RandomForestClassifierBenchmark.time_fit('dense', 1)
       6.41±0.02s       6.37±0.01s     0.99  ensemble.RandomForestClassifierBenchmark.time_fit('sparse', 1)
        131±0.8ms          131±1ms     1.00  ensemble.RandomForestClassifierBenchmark.time_predict('dense', 1)
          871±2ms          863±3ms     0.99  ensemble.RandomForestClassifierBenchmark.time_predict('sparse', 1)
  0.7464271763500541  0.7464271763500541     1.00  ensemble.RandomForestClassifierBenchmark.track_test_score('dense', 1)
  0.8656423941766682  0.8656423941766682     1.00  ensemble.RandomForestClassifierBenchmark.track_test_score('sparse', 1)
  0.9968171694224932  0.9968171694224932     1.00  ensemble.RandomForestClassifierBenchmark.track_train_score('dense', 1)
  0.9996123288718864  0.9996123288718864     1.00  ensemble.RandomForestClassifierBenchmark.track_train_score('sparse', 1)

@adam2392
Copy link
Collaborator

Sweet! Nah that looks good. Basically means no regressions in performance as expected.

@adam2392
Copy link
Collaborator

Can you copy the asv results over to the main PR and information about your testing machine?

@adam2392 adam2392 merged commit 8442333 into neurodata:crit Nov 19, 2022
@jshinm jshinm deleted the jms-update-crit branch December 13, 2022 06:40
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants
0