Increase execution speed of plot_sparse_logistic_regression_20newsgroup.py #21619


Closed · wants to merge 3 commits

Conversation

@ghost commented on Nov 10, 2021

#21598 (cc @sply88 @adrinjalali @cakiki)

Adapted the number of samples and added the warm_start option for the LogisticRegression model. Execution time was reduced from 45 s to 9 s:

(Screenshots: execution time before and after the change.)

@ghost changed the title from "Increase execution speed of plot" to "Increase execution speed of plot_sparse_logistic_regression_20newsgroup.py" on Nov 10, 2021
@adrinjalali (Member) left a comment

@@ -82,6 +82,7 @@
     penalty="l1",
     max_iter=this_max_iter,
     random_state=42,
+    warm_start=True,
Member: Are you sure this has an effect? I'm not sure it should in this case.

Author: I'm not deeply familiar with how these examples and their tests are run, but this could have an influence depending on how often and in what order they are executed (e.g. multiple runs).

Member: Since you are creating a new lr instance each time you call fit, it should not have any impact.

warm_start=True could have an impact if you moved the creation of the lr instance before entering the for this_max_iter in model_params["iters"] loop and then only called lr.set_params(max_iter=this_max_iter) inside the inner loop prior to calling fit. However, the wall-time measurements would then take on a different meaning and would need to be readjusted before plotting. I have the feeling that this would be too tricky to achieve or would render the example too complex.

Let's just keep the training set size reduction and revert the use of warm_start=True.
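For illustration, a minimal sketch of the restructuring described above, assuming the example's existing names (model_params, X_train, y_train); this is not the code the PR proposes:

from sklearn.linear_model import LogisticRegression

# Hypothetical restructuring: create the estimator once, outside the loop,
# so that warm_start=True lets each fit continue from the previous solution.
lr = LogisticRegression(
    solver="saga",
    penalty="l1",
    random_state=42,
    warm_start=True,
)
for this_max_iter in model_params["iters"]:
    lr.set_params(max_iter=this_max_iter)
    lr.fit(X_train, y_train)  # resumes from the coefficients of the last fit

Each fit call would then roughly perform only the incremental work on top of the previous solution, which is why the per-fit wall times would no longer be comparable to the cold-start timings shown in the current plot.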

Author: @ogrisel Will do.

@@ -40,7 +40,7 @@
 solver = "saga"

 # Turn down for faster run time
-n_samples = 10000
+n_samples = 7000
Member: I'd even go lower; maybe 4000 would still be enough?

Author: I'll look into it; 4000 is probably still enough :)

Author: Yes, with 4000 the time for the second example is now down to 5 s (from 9 s), but accuracy also dropped a bit (from 0.7 to 0.6). I will check 5000 and 6000.

Author: Okay, 5000 seems to be a good compromise: time is down to under 7 s with accuracy close to where it was before.
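As an aside, a trade-off like this can be measured with a small benchmark. The sketch below is only illustrative and loads the data directly rather than reusing the example's variables; the candidate sizes and split ratio are arbitrary choices:

import time
from sklearn.datasets import fetch_20newsgroups_vectorized
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Illustrative benchmark: compare fit time and test accuracy for a few
# candidate training set sizes.
X, y = fetch_20newsgroups_vectorized(subset="all", return_X_y=True)
for n_samples in (4000, 5000, 7000):
    X_train, X_test, y_train, y_test = train_test_split(
        X[:n_samples], y[:n_samples], test_size=0.1, random_state=42
    )
    # May emit a ConvergenceWarning at the default max_iter; fine for a sketch.
    lr = LogisticRegression(solver="saga", penalty="l1", random_state=42)
    tic = time.time()
    lr.fit(X_train, y_train)
    print(f"n_samples={n_samples}: {time.time() - tic:.1f}s, "
          f"accuracy={lr.score(X_test, y_test):.2f}")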

@adrinjalali mentioned this pull request on Nov 10, 2021.
@thomasjpfan (Member): Thank you for working on this PR! The proposed change looks to already be on main, merged in #21773. With that in mind, I am closing this PR.
