Increase execution speed of plot_sparse_logistic_regression_20newsgroup.py #21619
Conversation
Thanks @sveneschlbeck
@@ -82,6 +82,7 @@
     penalty="l1",
     max_iter=this_max_iter,
     random_state=42,
+    warm_start=True,
Are you sure this has an effect? I'm not sure it should in this case.
I am not that deep into how exactly they run these modules/their tests, but this could have an influence depending on how often and in what order they are executed (e.g. multiple runs).
Since you are creating a new `lr` instance each time you call fit, it should not have any impact. `warm_start=True` could have an impact if you moved the creation of the `lr` instance before entering the `for this_max_iter in model_params["iters"]` loop and then only called `lr.set_params(max_iter=this_max_iter)` inside the inner loop prior to calling fit. However, the wall time measurements would then get a different meaning and would need to be readjusted before plotting. I have the feeling that this would be too tricky to achieve or would render the example too complex.
Let's just use the training set size reduction and revert the use of `warm_start=True`.
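For reference, the pattern described above could look roughly like the following. This is a hedged sketch, not the example's actual code: the data and iteration budgets are illustrative, and only the create-once / `set_params` / refit structure reflects the suggestion.

```python
# Sketch of the warm_start pattern discussed above: create the estimator once
# before the loop, then raise max_iter between fits so each call to fit
# resumes from the previous coefficients instead of starting from scratch.
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

X, y = make_classification(n_samples=500, n_features=20, random_state=42)

lr = LogisticRegression(
    solver="saga",
    penalty="l1",
    random_state=42,
    warm_start=True,  # reuse coef_ from the previous call to fit
)

for this_max_iter in [1, 2, 4, 8]:  # illustrative iteration budgets
    lr.set_params(max_iter=this_max_iter)
    lr.fit(X, y)  # continues from the coefficients of the previous fit
    print(this_max_iter, lr.score(X, y))
```

As noted in the review, each `fit` call here only measures the *incremental* cost of the extra iterations, so any wall-time plot would need to accumulate the timings rather than report them per fit.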
@ogrisel Will do
@@ -40,7 +40,7 @@
 solver = "saga"

 # Turn down for faster run time
-n_samples = 10000
+n_samples = 7000
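To illustrate the trade-off behind this change, here is a minimal, hypothetical sketch of capping the training-set size: the dataset, cap value, and variable names are illustrative, not the 20newsgroups example's exact code.

```python
# Hypothetical sketch: capping the number of samples to speed up an example.
# A smaller cap runs faster but can cost accuracy, which is the trade-off
# being tuned in this PR.
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=2000, n_features=50, random_state=42)

n_samples = 500  # cap: fewer samples -> faster fit, possibly lower accuracy
X_small, y_small = X[:n_samples], y[:n_samples]

X_train, X_test, y_train, y_test = train_test_split(
    X_small, y_small, random_state=42
)
clf = LogisticRegression(solver="saga", max_iter=200, random_state=42)
clf.fit(X_train, y_train)
print(f"test accuracy with n_samples={n_samples}: {clf.score(X_test, y_test):.2f}")
```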
I'd even go lower, maybe 4000 would still be enough?
I'll look into it, probably 4000 is still enough :)
Yes, the time for the second example is now down to 5 sec (from 9), but accuracy also dropped a bit (from 0.7 to 0.6). I will check out 5000 and 6000.
Okay, 5000 seems to be a good compromise: time is down to under 7 sec with accuracy close to where it was before.
removed ``warm_start=True``
Thank you for working on this PR! The proposed change already appears in scikit-learn/examples/linear_model/plot_sparse_logistic_regression_20newsgroups.py (line 43, as of 3082d95), which was merged in #21773. With that in mind, I am closing this PR.
#21598 @sply88 @adrinjalali @cakiki
Adapted the number of samples and added the `warm_start` option for the logistic regression model. Execution time was reduced from 45 sec to 9 sec.