ENH scaling of LogisticRegression loss as 1/n * LinearModelLoss #26721
Conversation
Force-pushed from b6254c5 to fc0f0e7.
As of ead068f, with the same benchmark code as in #24752 (comment), I get the following results. Note that suboptimality is capped at 1e-12. To make newton-cg pass, I added e1c2128. Nevertheless, newton-cg seems broken for high accuracy (low tol).
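The full benchmark script and result plots live in #24752 and this thread; purely as an illustrative sketch of the measurement described above (suboptimality of the fitted coefficients against a tightly converged reference, capped at 1e-12), with dataset, solvers, and tolerances chosen arbitrarily here:

```python
# Sketch only: dataset, solvers and tolerances are illustrative assumptions,
# not the settings of the actual benchmark in #24752.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import log_loss

X, y = make_classification(n_samples=5_000, n_features=50, random_state=0)
C, n = 1.0, X.shape[0]

def objective(clf):
    # 1/n * sum_i loss_i + 1/(2*C*n) * ||coef||^2, i.e. the scaling of this PR.
    p = clf.predict_proba(X)
    return log_loss(y, p) + 0.5 / (C * n) * np.sum(clf.coef_ ** 2)

# Tightly converged reference solution.
ref = LogisticRegression(C=C, solver="newton-cg", tol=1e-14, max_iter=10_000).fit(X, y)
obj_ref = objective(ref)

for solver in ("lbfgs", "newton-cg"):
    for tol in (1e-1, 1e-4, 1e-8):
        clf = LogisticRegression(C=C, solver=solver, tol=tol, max_iter=10_000).fit(X, y)
        # Suboptimality, floored at 1e-12 as in the comment above.
        print(solver, tol, max(objective(clf) - obj_ref, 1e-12))
```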
As of fdc0fa3, with the fixed curvature criterion in the conjugate gradient computation (results summary in the collapsed details).
I could put fdc0fa3 into a separate small PR.
For a better comparison, here is main vs this PR. Numbers are in the collapsed details (MAIN and PR).
I am confused here @lorentzenchr: is it a bug fix for Newton CG? Two thoughts:
@agramfort This PR does not change any API. It just translates to and uses the objective $\frac{1}{n} \sum_i loss(i) + penalty$. Please observe the stuck lbfgs (~single point) in issue #24752 and the curve it has with this PR. The performance improvements of newton-cg are more of a nice side effect.
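For context (notation mine, writing sklearn's usual L2 term as $\lVert w\rVert_2^2 / (2C)$): multiplying the objective by the constant $1/n$ does not change the minimizer, only the scale of the objective and gradient values that the solvers' stopping criteria see:

$$
\arg\min_{w}\left(\sum_{i=1}^{n} loss_i(w) + \frac{1}{2C}\lVert w\rVert_2^2\right)
= \arg\min_{w}\left(\frac{1}{n}\sum_{i=1}^{n} loss_i(w) + \frac{1}{2Cn}\lVert w\rVert_2^2\right).
$$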
got it! thx @lorentzenchr
This is needed for more reliable convergence. Tests like test_logistic_regressioncv_class_weights then don't raise a convergence error.
@glemaitre Thank you so much.
This reverts commit ef26813.
sklearn/utils/optimize.py
```python
if ret[0] is None:
    # line search failed: try different one.
    ret = line_search_wolfe2(
```
Info: Codecov complains about no test coverage of those lines. It seems that the new convergence check just above makes this call to line_search_wolfe2 obsolete.
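To illustrate the fallback-on-None pattern being discussed (this is not the sklearn helper itself; it uses SciPy's public strong-Wolfe line search, whereas _line_search_wolfe12 uses the private wolfe1/wolfe2 routines):

```python
# Minimal sketch of "retry if the line search returns None as step size".
import numpy as np
from scipy.optimize import line_search

def f(x):
    return 0.5 * np.dot(x, x)

def fprime(x):
    return x

xk = np.array([3.0, -4.0])
pk = -fprime(xk)                      # a descent direction
ret = line_search(f, fprime, xk, pk)  # strong Wolfe line search
if ret[0] is None:
    # This is the case handled above: the first line search failed, and
    # sklearn's helper retries with a second, more permissive one.
    print("line search failed")
else:
    print("step size:", ret[0])
```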
```python
if 0 <= curv <= 16 * np.finfo(np.float64).eps * psupi_norm2:
    # See https://arxiv.org/abs/1803.02924, Algo 1 Capped Conjugate Gradient.
    break
elif curv < 0:
    if i > 0:
```
Info: Codecov complains about no test coverage of those lines.
We have only one very simple test for _newton_cg in test_optimize.py. Also the tests for LogisticRegression with newton-cg don't hit these branches.
I don't think blowing up the tests is beneficial for this PR. I could add a TODO note in the code, though.
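For readers unfamiliar with those branches, here is a rough standalone sketch of the capped conjugate gradient safeguard from the cited paper (my own illustration, not the sklearn internals): inside Newton-CG, each CG iteration computes the curvature curv = pᵀHp, and zero or negative curvature means the Hessian is not positive definite along p.

```python
import numpy as np

def newton_cg_direction(hess_vec_prod, grad, maxiter=100, tol=1e-8):
    """Approximately solve H d = -grad by conjugate gradient, guarding
    against zero or negative curvature (H not positive definite)."""
    d = np.zeros_like(grad)
    r = grad.copy()            # residual of H d + grad, starts at grad
    p = -r                     # first conjugate direction: steepest descent
    rs_old = r @ r
    for i in range(maxiter):
        Hp = hess_vec_prod(p)
        curv = p @ Hp
        psupi_norm2 = p @ p
        if 0 <= curv <= 16 * np.finfo(np.float64).eps * psupi_norm2:
            # Curvature numerically zero: keep the direction built so far.
            break
        elif curv < 0:
            # Negative curvature along p.
            if i > 0:
                break          # use the previously accumulated direction
            d = -grad          # first iteration: fall back to steepest descent
            break
        alpha = rs_old / curv
        d += alpha * p
        r = r + alpha * Hp
        rs_new = r @ r
        if np.sqrt(rs_new) <= tol:
            break
        p = -r + (rs_new / rs_old) * p
        rs_old = rs_new
    return d

# Tiny usage example with an explicit positive definite Hessian:
H = np.array([[3.0, 1.0], [1.0, 2.0]])
g = np.array([1.0, -1.0])
d = newton_cg_direction(lambda v: H @ v, g)
print(d, np.linalg.solve(H, -g))  # the two should roughly agree
```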
```diff
@@ -39,8 +40,37 @@ def _line_search_wolfe12(f, fprime, xk, pk, gfk, old_fval, old_old_fval, **kwarg
    """
    ret = line_search_wolfe1(f, fprime, xk, pk, gfk, old_fval, old_old_fval, **kwargs)

    if ret[0] is None:
```
I think that this change should be reflected in an entry in the changelog as well.
You mean more details than just "...have much better convergence for solvers 'lbfgs' and 'newton-cg'"? I can add a changelog later today.
I meant to have an entry in the sklearn.linear_model section specifically tagged as an enhancement.
Do you mean a second entry, or adding that detail to the existing entry (of this PR)? The latter?
I meant a second entry.
I am convinced by the benchmark and the reasoning here.
@glemaitre Ready for merge.
Thanks @lorentzenchr
thx @lorentzenchr!
…it-learn#26721) Co-authored-by: Guillaume Lemaitre <g.lemaitre58@gmail.com>
Reference Issues/PRs
Fixes #24752, #18074.
Alternative to #27191.
What does this implement/fix? Explain your changes.
This PR changes LinearModelLoss to always use the 1/n factor in front, such that the loss is $\frac{1}{n} \sum_i loss(i) + penalty$.
This is a pure internal change; no API is touched. But coefficients of log reg models may change due to different convergence / stopping of the "newton-cg" and "lbfgs" solvers.
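As a plain NumPy illustration of why the stopping behaviour can differ (my own example, not sklearn code): dividing the objective by n also divides its gradient by n, so any absolute gradient-norm threshold is reached at a different point.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 1_000
X = rng.normal(size=(n, 5))
y = rng.integers(0, 2, size=n)
w = rng.normal(size=5)

p = 1.0 / (1.0 + np.exp(-X @ w))   # predicted probabilities
grad_sum = X.T @ (p - y)           # gradient of sum_i loss_i (binary log loss)
grad_mean = grad_sum / n           # gradient of (1/n) * sum_i loss_i

print(np.linalg.norm(grad_sum), np.linalg.norm(grad_mean))  # differ by a factor n
```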
Any other comments?
This PR is incomplete and tests do not pass. If someone else wants to continue, please go on.
With this PR, the default setting of LogisticRegression might produce less accurate coefficients. So we might consider increasing tol.
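A minimal usage sketch (dataset and values are arbitrary assumptions): regardless of what default is chosen, users who want a specific convergence behaviour can set tol and max_iter explicitly on the estimator.

```python
# Illustrative only: tightening the stopping tolerance per estimator.
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

X, y = make_classification(n_samples=1_000, n_features=20, random_state=0)
clf = LogisticRegression(solver="lbfgs", tol=1e-6, max_iter=1_000).fit(X, y)
print(clf.n_iter_, clf.coef_[0, :3])
```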