[MRG] Accelerate example plot_sparse_logistic_regression_mnist.py by steinfurt · Pull Request #21862 · scikit-learn/scikit-learn · GitHub

[MRG] Accelerate example plot_sparse_logistic_regression_mnist.py #21862


Conversation

steinfurt
Copy link
Contributor

Reference Issues/PRs

References #21598

What does this implement/fix? Explain your changes.

This replaces the MNIST dataset with the internal toy dataset for hand-written digits, roughly keeping the weight vector sparsity and test score, but reducing the test execution time from ~20 seconds to < 1 second. Visualization of the weight vector may be slightly worse (after all it is only 8x8 digits), but the main features of the digits 0-9 are still recognizable.
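The change described above can be sketched as follows. This is a minimal, hypothetical illustration (not the PR's actual diff): it swaps `fetch_openml("mnist_784")` for the built-in 8x8 digits dataset while keeping the L1-penalized SAGA logistic regression from the original example; the hyperparameters `C=0.1` and `tol=0.1` are assumptions for the sketch.

```python
# Hypothetical sketch of the proposed replacement: use the small built-in
# digits dataset instead of downloading MNIST with fetch_openml.
import numpy as np
from sklearn.datasets import load_digits
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# 1797 samples of 8x8 hand-written digits, bundled with scikit-learn
X, y = load_digits(return_X_y=True)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, train_size=0.8, random_state=42
)
scaler = StandardScaler()
X_train = scaler.fit_transform(X_train)
X_test = scaler.transform(X_test)

# L1-penalized multinomial logistic regression fit with SAGA,
# as in the original MNIST example (C and tol are assumed values)
clf = LogisticRegression(C=0.1, penalty="l1", solver="saga", tol=0.1)
clf.fit(X_train, y_train)

sparsity = np.mean(clf.coef_ == 0) * 100  # fraction of zeroed weights
score = clf.score(X_test, y_test)
print(f"Sparsity with L1 penalty: {sparsity:.2f}%")
print(f"Test score: {score:.4f}")
```

On this toy dataset the fit completes in well under a second, which is the point of the proposal; the trade-off is that the 8x8 weight images are coarser than the 28x28 MNIST ones.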

Any other comments?

The previous implementation was slow mainly because of the fetch_openml call.
I kept the MNIST version as a comment so it can be switched back in.

@steinfurt steinfurt force-pushed the speed_up_plot_sparse_logistic_regression_mnist_py branch from 2ce42ca to 993327c Compare December 3, 2021 00:07
@steinfurt steinfurt force-pushed the speed_up_plot_sparse_logistic_regression_mnist_py branch from 993327c to 2b8610e Compare December 3, 2021 00:13
@glemaitre
Copy link
Member

Using SAGA on a toy dataset defeats a bit the purpose of the example. I don't think that we should do that even if the example is taking 25 secs. We could downsample the original dataset a bit, depending on the number of samples.

@steinfurt
Copy link
Contributor Author

Thanks for having a quick look, @glemaitre!

The example already used only 5,000 of the original 70,000 MNIST samples for training. Since the execution time is dominated by fetch_openml (see #11821, #14855 and others), further downsampling does not substantially accelerate it. Therefore I believed it would be fine to switch to the toy dataset and leave the original in a comment. But I see your point that this is probably taking it too far. Feel free to reject the PR, unless you have a different suggestion.
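For reference, the subsampling pattern under discussion looks like the sketch below. `load_digits` stands in here for the arrays that `fetch_openml("mnist_784")` would return (the real download is what makes the example slow), and `train_samples = 500` is an assumed, arbitrary reduced size.

```python
# Sketch of downsampling a dataset before training: shuffle once with a
# seeded RNG, then keep only the first train_samples rows.
# load_digits is used as a fast stand-in for the MNIST arrays.
from sklearn.datasets import load_digits
from sklearn.utils import check_random_state

X, y = load_digits(return_X_y=True)

train_samples = 500  # hypothetical reduced training-set size
rng = check_random_state(0)
permutation = rng.permutation(X.shape[0])
X_sub = X[permutation][:train_samples]
y_sub = y[permutation][:train_samples]
print(X_sub.shape)  # (500, 64)
```

As noted above, this only shortens the fit; it does not touch the fetch_openml download that dominates the runtime of the MNIST version.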

Sorry, @webbdays, I overlooked you mentioned in #21598 that you are already looking into how to speed up this example.

@ogrisel
Copy link
Member
ogrisel commented Dec 4, 2021

I agree with @glemaitre that we want to keep using MNIST in this example. We will have to find a way to accelerate the MNIST dataset loading one way or another, but changing the example itself is probably not the solution. Let me close this PR. Thanks anyway for the contribution @steinfurt.

@ogrisel ogrisel closed this Dec 4, 2021