[MRG] Accelerate example plot_sparse_logistic_regression_mnist.py by steinfurt · Pull Request #21862 · scikit-learn/scikit-learn · GitHub

[MRG] Accelerate example plot_sparse_logistic_regression_mnist.py #21862


Conversation

steinfurt
Copy link
Contributor

Reference Issues/PRs

References #21598

What does this implement/fix? Explain your changes.

This replaces the MNIST dataset with the internal toy dataset for hand-written digits, roughly keeping the weight vector sparsity and test score, but reducing the test execution time from ~20 seconds to < 1 second. Visualization of the weight vector may be slightly worse (after all it is only 8x8 digits), but the main features of the digits 0-9 are still recognizable.
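The change described above can be sketched as follows. This is a minimal, hypothetical illustration (not the PR's actual diff): it swaps `fetch_openml("mnist_784")` for the built-in 8x8 digits dataset while keeping the L1-penalized SAGA logistic regression from the original example; the hyperparameters `C=0.1` and `tol=0.1` are assumptions for the sketch.

```python
# Hypothetical sketch of the proposed replacement: use the small built-in
# digits dataset instead of downloading MNIST with fetch_openml.
import numpy as np
from sklearn.datasets import load_digits
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# 1797 samples of 8x8 hand-written digits, bundled with scikit-learn
X, y = load_digits(return_X_y=True)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, train_size=0.8, random_state=42
)
scaler = StandardScaler()
X_train = scaler.fit_transform(X_train)
X_test = scaler.transform(X_test)

# L1-penalized multinomial logistic regression fit with SAGA,
# as in the original MNIST example (C and tol are assumed values)
clf = LogisticRegression(C=0.1, penalty="l1", solver="saga", tol=0.1)
clf.fit(X_train, y_train)

sparsity = np.mean(clf.coef_ == 0) * 100  # fraction of zeroed weights
score = clf.score(X_test, y_test)
print(f"Sparsity with L1 penalty: {sparsity:.2f}%")
print(f"Test score: {score:.4f}")
```

On this toy dataset the fit completes in well under a second, which is the point of the proposal; the trade-off is that the 8x8 weight images are coarser than the 28x28 MNIST ones.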

Any other comments?

The previous implementation was slow mainly because of the fetch_openml call.
I kept the MNIST version as a comment so it can be switched back in.

@steinfurt steinfurt force-pushed the speed_up_plot_sparse_logistic_regression_mnist_py branch from 2ce42ca to 993327c Compare December 3, 2021 00:07
@steinfurt steinfurt force-pushed the speed_up_plot_sparse_logistic_regression_mnist_py branch from 993327c to 2b8610e Compare December 3, 2021 00:13
@glemaitre
Copy link
Member

Using SAGA on a toy dataset defeats a bit the purpose of the example. I don't think that we should do that even if the example is taking 25 secs. We could downsample the original dataset a bit, depending on the number of samples.

@steinfurt
Copy link
Contributor Author

Thanks for having a quick look, @glemaitre!

The example already used only 5,000 of the original 70,000 MNIST samples for training. Since the execution time is dominated by fetch_openml (see #11821, #14855 and others), further downsampling does not substantially accelerate it. Therefore I believed it would be fine to switch to the toy dataset and leave the original in a comment. But I see your point that this is probably taking it too far. Feel free to reject the PR, unless you have a different suggestion.
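For reference, the subsampling pattern under discussion looks like the sketch below. `load_digits` stands in here for the arrays that `fetch_openml("mnist_784")` would return (the real download is what makes the example slow), and `train_samples = 500` is an assumed, arbitrary reduced size.

```python
# Sketch of downsampling a dataset before training: shuffle once with a
# seeded RNG, then keep only the first train_samples rows.
# load_digits is used as a fast stand-in for the MNIST arrays.
from sklearn.datasets import load_digits
from sklearn.utils import check_random_state

X, y = load_digits(return_X_y=True)

train_samples = 500  # hypothetical reduced training-set size
rng = check_random_state(0)
permutation = rng.permutation(X.shape[0])
X_sub = X[permutation][:train_samples]
y_sub = y[permutation][:train_samples]
print(X_sub.shape)  # (500, 64)
```

As noted above, this only shortens the fit; it does not touch the fetch_openml download that dominates the runtime of the MNIST version.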

Sorry, @webbdays, I overlooked you mentioned in #21598 that you are already looking into how to speed up this example.

@ogrisel
Copy link
Member
ogrisel commented Dec 4, 2021

I agree with @glemaitre that we want to keep using MNIST in this example. We will have to find a way to accelerate the MNIST dataset loading one way or another, but changing the example itself is probably not the solution. Let me close this PR. Thanks anyway for the contribution @steinfurt.

@ogrisel ogrisel closed this Dec 4, 2021