Accelerate slow examples #21598
Comments
@hhnnhh or @marenwestermann may be interested in this.
Hi @adrinjalali. Could you elaborate on what "speeding up" entails? Is this about choosing a more reasonable setup of the parameters, or a more substantial refactoring of the examples?
@cakiki ideally you'd be able to speed them up just by changing some parameters or reducing the size of the data while still presenting the same outcome, but changing the examples a bit is not necessarily out of scope either, if it's required.
For instance, you can switch from the digits dataset to the iris dataset in the first and slowest example, and speed it up by almost 100-fold. The question is then whether that still represents the benefit of
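The dataset swap suggested above can be sketched as follows. This is a minimal illustration, not the actual example under discussion; the estimator and its settings are placeholders chosen only to show that the same code runs on either dataset:

```python
from sklearn.datasets import load_digits, load_iris
from sklearn.linear_model import LogisticRegression

# digits: 1797 samples x 64 features; iris: 150 samples x 4 features.
X_big, y_big = load_digits(return_X_y=True)
X_small, y_small = load_iris(return_X_y=True)
print(X_big.shape)    # (1797, 64)
print(X_small.shape)  # (150, 4)

# The same estimator code runs on either dataset; the open question above
# is whether the smaller dataset still illustrates the example's message.
clf = LogisticRegression(max_iter=1000)
clf.fit(X_small, y_small)
```

The roughly 100-fold speed-up comes from the order-of-magnitude drop in both sample count and feature count.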
I will work on ../examples/ensemble/plot_gradient_boosting_early_stopping.py
Would work on the
Will look at ../examples/ensemble/plot_gradient_boosting_regularization.py next.
Will look at ../examples/model_selection/plot_successive_halving_iterations.py next.
I'll try
I'll work on
Will work on
Will work on
Now working on
Am working on
Am working on
scikit-learn#21678: Reduce num of samples in plot-digit-linkage example; remove unnecessary random_state; remove nudge_images; address PR comment, elaborate analysis
scikit-learn#21598 (scikit-learn#21612): Accelerate plot_successive_halving_iterations.py example; n_estimators back to 20
scikit-learn#21598 (scikit-learn#21611): Accelerate plot_gradient_boosting_regularization.py example; speed up with fewer samples and fewer trees; use train_test_split instead of slicing
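The train_test_split change mentioned in the commit above can be sketched like this, on toy data with hypothetical sizes:

```python
import numpy as np
from sklearn.model_selection import train_test_split

X = np.arange(20).reshape(10, 2)
y = np.arange(10)

# Manual slicing keeps the original order and does not shuffle:
X_train_slice, X_test_slice = X[:8], X[8:]

# train_test_split shuffles by default and splits X and y consistently:
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)
print(X_train.shape, X_test.shape)  # (8, 2) (2, 2)
```

Shuffling matters when the dataset is ordered by class: a plain slice could otherwise leave some classes out of the test set entirely.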
As suggested by @adrinjalali on Gitter, I am commenting on my attempt to accelerate This example involves a logistic regression using the Compared to other PRs in this issue, it seems that the acceleration obtained here is quite low. I am not even sure whether this is an example that can be further improved, considering the most expensive operation here is
Refer to this: accelerate plot_successive_halving_iterations.py example scikit-learn#21598
I am working on
As far as My placement of timing "checkpoints" can be seen here: I'm a first-time contributor, so sorry that I submitted that .py file as a .txt file; I'm still getting used to how GitHub works. Please let me know if there is a better way to share my work, since it's not a file that should be merged into the main project. Unless this example can be switched to a different dataset that is faster to fetch, I don't think performance can be improved, since fetching the data is ~85% of the total runtime.
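The kind of timing "checkpoints" described above can be sketched with the standard library alone. The stages here are stand-ins (a list comprehension and a sum), not the actual example's workload:

```python
import time

def checkpoint(label, start):
    """Print the time elapsed since `start` and return a fresh timestamp."""
    now = time.perf_counter()
    print(f"{label}: {now - start:.3f}s")
    return now

t = time.perf_counter()
data = [i ** 2 for i in range(100_000)]  # stand-in for data loading/fetching
t = checkpoint("load data", t)
total = sum(data)                        # stand-in for model fitting
t = checkpoint("fit model", t)
```

Comparing the printed durations per stage is how one arrives at a figure like "fetching is ~85% of the total runtime".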
We are aware of the slow So I would ignore for the moment the fact that the fetcher is taking too much time.
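One relevant detail behind this advice: scikit-learn's fetch_* helpers cache downloaded datasets on disk, so the slow download is only paid on the first run. A small sketch:

```python
from sklearn.datasets import get_data_home

# Downloaded datasets are cached under this directory (configurable via the
# SCIKIT_LEARN_DATA environment variable or the data_home argument of the
# fetchers), so repeated runs of a fetch_* example reuse the local copy.
print(get_data_home())
```

On CI the cache may be cold, which is why the fetch time dominates there even when local runs are fast.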
Hello everyone, I saw this issue open and I would like to work on it. Is there any help needed?
@Aditi840 for now I think no more help is needed here. Thanks for offering.
Closing after merging #21938, which accelerates the remaining examples.
These examples take quite a long time to run, and they make our documentation CI fail quite frequently due to timeouts. It'd be nice to speed them up a little bit.
To contributors: if you want to work on an example, first have a look at it, and if you think you're comfortable working on it and have found a potential way to speed up execution time while preserving the educational message of the example, please mention which one you're working on in the comments below.
Please open a dedicated PR for each individual example you have found a fix for (with a new git branch branched off of main for each example) to make the review faster. Please focus on the longest-running examples first (e.g. 30s or more). Examples that run in less than 15s are probably fine.
Please also keep in mind that we want to keep the example code as simple as possible for educational reasons, while keeping the main points expressed in the text of the example valid and well illustrated by the results of the execution (plots or text outputs).
Finally, we expect that some examples cannot really be accelerated while preserving their educational value (integrity of the message and simplicity of the code). In such cases, we may decide to keep them as they are if they run in less than 60s.
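As a sketch of the kind of change being requested, the usual levers are fewer samples and fewer estimators, tuned so the qualitative result survives. The data is synthetic and the before/after numbers are hypothetical, not taken from any specific example:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import train_test_split

# Fewer samples, chosen so the plot/output still makes the same point.
X, y = make_classification(n_samples=400, random_state=0)   # e.g. was 4000
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Fewer trees: runtime scales roughly linearly with n_estimators.
clf = GradientBoostingClassifier(n_estimators=20, random_state=0)  # e.g. was 200
clf.fit(X_train, y_train)
print(f"test accuracy: {clf.score(X_test, y_test):.2f}")
```

Each such reduction should be checked against the example's narrative before being proposed in a PR.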
To maintainers: I'm running a script which automatically updates the following list with connected PRs and "done" checkboxes, so there is no need to update them manually.
Examples to Update
plot_gradient_boosting_quantile.py (example #21666)
plot_semi_supervised_newsgroups.py (example #21673)