MAINT Introduce `BaseCriterion` as a base abstraction for `Criterion`s by adam2392 · Pull Request #24678 · scikit-learn/scikit-learn · GitHub

MAINT Introduce BaseCriterion as a base abstraction for Criterions #24678


Conversation

adam2392
Member

Reference Issues/PRs

Fixes #24577

What does this implement/fix? Explain your changes.

Adds the BaseCriterion class as an abstract base class that carries only the API-relevant parts, while keeping Criterion as the assumed "supervised learning" criterion.

This change is backwards compatible.
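
A rough sketch of the intended split is below (Cython-style declarations; the placeholder typedefs and the attribute/method lists are simplified for illustration and are not the actual .pxd contents):

ctypedef double DOUBLE_t     # placeholder for the project's float64 typedef
ctypedef Py_ssize_t SIZE_t   # placeholder for the project's index typedef

cdef class BaseCriterion:
    # API-only abstraction: node bookkeeping plus the methods a splitter
    # calls, with no reference to a supervised target y.
    cdef SIZE_t start
    cdef SIZE_t pos
    cdef SIZE_t end
    cdef double weighted_n_samples

    cdef int reset(self) except -1
    cdef int update(self, SIZE_t new_pos) except -1
    cdef double node_impurity(self)
    cdef void children_impurity(self, double* impurity_left,
                                double* impurity_right)
    cdef void node_value(self, double* dest)

cdef class Criterion(BaseCriterion):
    # The assumed "supervised learning" criterion: it keeps the float64
    # target y and the y-aware init, so existing subclasses such as Gini,
    # Entropy and MSE are untouched.
    cdef const DOUBLE_t[:, ::1] y

    cdef int init(self, const DOUBLE_t[:, ::1] y, DOUBLE_t* sample_weight,
                  double weighted_n_samples, SIZE_t* samples,
                  SIZE_t start, SIZE_t end) except -1

Since Criterion still exists with the same behaviour, existing code that works with Criterion subclasses keeps working unchanged.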

Any other comments?

@ogrisel
Member
ogrisel commented Oct 17, 2022

I think this PR mistakenly includes unrelated Cython files from the sklearn/metrics folder.

Member
@ogrisel ogrisel left a comment


You might want to check for BaseCriterion instead of Criterion in:

to make it easier to subclass BaseDecisionTree with a custom criterion, although other parts of the fit method assume a float64-dtyped y array. The various steps of the fit method should probably be split into private submethods to make it easier to subclass. This can be addressed in subsequent PRs.
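
For illustration, something along these lines (a hypothetical free-standing helper; in scikit-learn the equivalent check sits inline in BaseDecisionTree.fit, and the criteria_map and constructor arguments differ between classification and regression):

import copy

def resolve_criterion(criterion, criteria_map, *args):
    """Return a concrete criterion instance from a string or a criterion object."""
    # Accept anything deriving from the new abstract base instead of only the
    # supervised Criterion, so third-party criteria can be plugged in.
    from sklearn.tree._criterion import BaseCriterion  # assumes this PR's class

    if isinstance(criterion, BaseCriterion):
        # Deep-copy so mutable Cython state is not shared between trees.
        return copy.deepcopy(criterion)
    # Otherwise treat it as a string key, e.g. "gini" or "squared_error".
    return criteria_map[criterion](*args)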

@adam2392 adam2392 requested review from ogrisel and glemaitre and removed request for ogrisel October 18, 2022 17:06
@adam2392
Member Author

Cross-linking: #20648

@adam2392
Member Author
adam2392 commented Nov 3, 2022

Hi @jjerphan, @ogrisel and @glemaitre, just wanted to do a light ping to see if there's any feedback on this and whether it's good to merge. This is probably the simplest change I proposed in the roadmap for improving trees.

Member
@jjerphan jjerphan left a comment


Thank you for this work and for the heads-up, @adam2392.

Is it possible to run the RandomForest ASV benchmarks to assess any potential performance regression with respect to main?

In the meantime, here are some comments and suggestions.


@adam2392
Member Author
adam2392 commented Nov 8, 2022

@jshinm has volunteered to try to address the comments you outlined in the code. Thanks!

@adam2392
Member Author
adam2392 commented Nov 9, 2022

I re-ran the asv benchmark after closing all open apps on my Mac M1:

(sklearn) adam2392@Adams-MacBook-Pro asv_benchmarks % asv compare --split upstream/main crit

Benchmarks that have improved:

       before           after         ratio
     [3e47fa91]       [106b01cf]
     <main>           <crit>    
-            555M             495M     0.89  ensemble.RandomForestClassifierBenchmark.peakmem_fit('sparse', 1)

Benchmarks that have stayed the same:

       before           after         ratio
     [3e47fa91]       [106b01cf]
     <main>           <crit>    
             267M             281M     1.05  ensemble.RandomForestClassifierBenchmark.peakmem_fit('dense', 1)
             218M             218M     1.00  ensemble.RandomForestClassifierBenchmark.peakmem_predict('dense', 1)
             439M             435M     0.99  ensemble.RandomForestClassifierBenchmark.peakmem_predict('sparse', 1)
          4.82±0s          4.75±0s     0.98  ensemble.RandomForestClassifierBenchmark.time_fit('dense', 1)
       7.25±0.01s          7.13±0s     0.98  ensemble.RandomForestClassifierBenchmark.time_fit('sparse', 1)
          137±8ms        132±0.4ms     0.97  ensemble.RandomForestClassifierBenchmark.time_predict('dense', 1)
         980±40ms         907±50ms     0.93  ensemble.RandomForestClassifierBenchmark.time_predict('sparse', 1)
  0.7502017055240395  0.7502017055240395     1.00  ensemble.RandomForestClassifierBenchmark.track_test_score('dense', 1)
  0.8656423941766682  0.8656423941766682     1.00  ensemble.RandomForestClassifierBenchmark.track_test_score('sparse', 1)
  0.9971484595841501  0.9971484595841501     1.00  ensemble.RandomForestClassifierBenchmark.track_train_score('dense', 1)
  0.9996123288718864  0.9996123288718864     1.00  ensemble.RandomForestClassifierBenchmark.track_train_score('sparse', 1)

There are no performance regressions from including this refactoring. However, I'm not sure why we saw an improvement in peak memory for sparse data. It might be chance, although I saw similar results when I ran it yesterday (#24678 (comment)). cc: @jjerphan

Our next steps are for @jshinm to re-run the asv benchmarks and also to address the requested code changes.

@adam2392
Member Author

Just wanted to follow up to get some idea of what might be blocking this.

We have several downstream PRs that fully flesh out the modularity of the criterion/splitter in the right direction without introducing performance degradation, so I don't want to keep working on those if something is going to change drastically here.

Thanks!

@jjerphan
Member

Just wanted to follow up to get some idea of what might be blocking this.

It has not received a response or attention, mainly because of the maintainers' lack of time, unfortunately.

Member
@jjerphan jjerphan left a comment


@Vincent-Maladiere and I are currently taking some time to get a better understanding of those implementations.

In the meantime, I think we had better use a feature branch for such a refactor (this will allow us to integrate this work more safely and confidently). In this regard, I have created refactor/cython-tree-class-hierarchy from main.

@adam2392: could you change the target branch of this PR to refactor/cython-tree-class-hierarchy?

@adam2392 adam2392 changed the base branch from main to refactor/cython-tree-class-hierarchy January 26, 2023 18:10
@adam2392 adam2392 dismissed jjerphan’s stale review January 26, 2023 18:10

The base branch was changed.

@adam2392 adam2392 requested review from jjerphan and removed request for ogrisel January 26, 2023 18:10
@adam2392
Member Author

@adam2392: could you change the target branch of this PR to refactor/cython-tree-class-hierarchy?

Done!

@adam2392
Member Author

The only CI issue is the CHANGELOG, but IIRC, that is fine.

@adam2392
Member Author

Here is an example of how such a refactoring is utilized: https://github.com/neurodata/scikit-tree/blob/main/sktree/tree/_unsup_criterion.pyx
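
Schematically, a downstream unsupervised criterion can subclass the base without ever seeing y. The snippet below is illustrative only; the class name, the init_feature method and the variance impurity are invented for the sketch and are not the actual scikit-tree code:

cdef class UnsupervisedCriterion(BaseCriterion):
    # Illustrative subclass: impurity is computed from the feature values of
    # the samples in the current node, with no supervised target involved.
    cdef const double[:] feature_values   # values of the current feature

    cdef int init_feature(self, const double[:] feature_values,
                          Py_ssize_t start, Py_ssize_t end) except -1:
        # Remember which slice of the feature values this node owns.
        self.feature_values = feature_values
        self.start = start
        self.end = end
        self.pos = start
        return 0

    cdef double node_impurity(self):
        # For example, the within-node variance of feature_values[start:end].
        cdef Py_ssize_t i
        cdef double n = self.end - self.start
        cdef double mean = 0.0
        cdef double var = 0.0
        if n <= 0:
            return 0.0
        for i in range(self.start, self.end):
            mean += self.feature_values[i]
        mean /= n
        for i in range(self.start, self.end):
            var += (self.feature_values[i] - mean) ** 2
        return var / n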

@haiatn
Contributor
haiatn commented Jul 29, 2023

Good job!

@jjerphan jjerphan deleted the branch scikit-learn:refactor/cython-tree-class-hierarchy January 25, 2024 14:19
@jjerphan jjerphan closed this Jan 25, 2024
@lorentzenchr
Member

What happened here? Why was it closed?

@jjerphan
Member

Sorry for the annoyance.

@adrinjalali mentioned the scikit-learn:refactor/cython-tree-class-hierarchy branch to me, which seemed unused. I did not remember that this PR targeted it, hence I deleted it, closing this PR. 🤦

I have just repushed the branch since I had a local copy on my clone.

@adam2392: I cannot reopen this PR, can you?

@adam2392
Member Author
adam2392 commented Jan 29, 2024

Sure, I can re-open this PR. May I ask what the motivation for doing so is? Do we think this might move forward? I don't want to get my hopes up :p

Do I target the now-deleted branch, or do I target main?

@jjerphan
Member

Do we think this might move forward?

I can't guarantee this will move forward. We need to have people involved on the Cython side of the project (I personally contribute to scikit-learn solely in my free time).

refactor/cython-tree-class-hierarchy is available again. Can you target it if you want to continue this PR?
