ENH add verbosity to newton-cg solver #27526
Conversation
This reverts commit 52d63d5.
This is needed for more reliable convergence. Tests like test_logistic_regressioncv_class_weights then don't raise a convergence error.
I wonder if we should add these verbosity PRs or add callbacks instead (cc @jeremiedbb).
Verbose output through callbacks is a bit limited because it is triggered only at the places where callbacks are called, so if we really need advanced low-level verbosity we have to do it manually. I'm not convinced that we really need advanced low-level verbosity, though; no strong opinion here.
…cikit-learn into newton_cg_verbose
@adrinjalali @jeremiedbb @ogrisel I would like to merge this. Newton-CG might be our best solver for multiclass logistic regression, and the other solvers have verbose output, too. It doesn't change the API and is just a convenience for a small group of users. Review should be super easy.
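As context for reviewers, here is a minimal sketch of how a user would enable this verbosity through the public API. The estimator and its `solver`/`verbose` parameters are existing scikit-learn API; the dataset and parameter values are illustrative, and the exact log format printed for `verbose >= 1` is what this PR defines:

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

# Toy multiclass problem, just for illustration.
X, y = make_classification(
    n_samples=200, n_classes=3, n_informative=4, random_state=0
)

# With this PR, verbose=1 makes the newton-cg solver print per-iteration
# progress (line search details, convergence checks).
clf = LogisticRegression(solver="newton-cg", verbose=1, max_iter=100)
clf.fit(X, y)
print(clf.score(X, y))
```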
Let me update this branch with …
For other reviewers, here is what a typical verbose run looks like:
@lorentzenchr would it be possible to get more details on the line search iterations?
Not really. We use private line search functions from scipy. They do not have a verbose option. Those functions call, e.g., …
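For readers unfamiliar with these helpers: scipy's public `scipy.optimize.line_search` exposes essentially the same Wolfe-condition interface as the private `line_search_wolfe1`/`line_search_wolfe2` functions used here, and likewise offers no verbose flag, which is why the PR cannot print details from inside the line search. A small self-contained sketch (the quadratic objective is made up for illustration):

```python
import numpy as np
from scipy.optimize import line_search

def f(x):
    # Simple quadratic objective with minimum at the origin.
    return float(x @ x)

def grad(x):
    return 2.0 * x

xk = np.array([1.0, 1.0])
pk = -grad(xk)  # steepest-descent direction

# Returns (alpha, fc, gc, new_fval, old_fval, new_slope); alpha is the
# step size satisfying the strong Wolfe conditions, or None on failure.
alpha, fc, gc, new_fval, old_fval, new_slope = line_search(f, grad, xk, pk)
print(alpha)
```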
One more quick pass, but I have to go offline soon. I'll keep the Firefox tab open to finalize my review soon this time.
CI failing.
Another pass, LGTM.
```
@@ -71,6 +93,8 @@ def _line_search_wolfe12(f, fprime, xk, pk, gfk, old_fval, old_old_fval, **kwargs):
    # TODO: It seems that the new check for the sum of absolute gradients above
    # catches all cases that, earlier, ended up here. In fact, our tests never
    # trigger this "if branch" here and we can consider to remove it.
    if is_verbose:
        print("  last resort: try line search wolfe2")
    ret = line_search_wolfe2(
        f, fprime, xk, pk, gfk, old_fval, old_old_fval, **kwargs
    )
```
Isn't there anything interesting to print based on the result of wolfe2? Maybe we should at least print that the wolfe2 line search was successful when `ret[0] is not None`, to be consistent with what we print for the wolfe1 line search.
> Isn't there anything interesting to print based on the result of wolfe2?
🤷
```python
                print(
                    "  check sum(|gradient|) < sum(|gradient_old|): "
                    f"{sum_abs_grad} < {sum_abs_grad_old} {check}"
                )
            if check:
                ret = (
```
Shall we make it explicit that we perform an update with unit step size in this case?
The current message is 1-1 with `NewtonSolver`.
```python
    eps = 16 * np.finfo(np.asarray(old_fval).dtype).eps
    if is_verbose:
        print("  Line Search")
        print(f"    eps=16 * finfo.eps={eps}")
```
I would defer the print of `eps` to when it's actually used:

```python
print(f"    eps=16 * finfo.eps={eps}")
```

otherwise one gets the impression that it's used by wolfe1, especially when it's successful.
The current message is 1-1 with `NewtonSolver`.
```python
                print(
                    "  check loss |improvement| <= eps * |loss_old|:"
                    f" {np.abs(loss_improvement)} <= {tiny_loss} {check}"
                )
```
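To make this check concrete, here is a standalone illustration (not the PR's code, variable values are made up): the absolute loss improvement is compared against a tiny threshold, 16 machine epsilons relative to the previous loss value, so improvements below numerical noise count as convergence:

```python
import numpy as np

old_fval = np.float64(0.693147)  # e.g. log-loss at the previous iterate
new_fval = old_fval - 1e-17      # essentially no improvement

# Threshold: 16 ulps relative to the old loss value.
eps = 16 * np.finfo(np.asarray(old_fval).dtype).eps
tiny_loss = np.abs(old_fval * eps)

loss_improvement = old_fval - new_fval
check = np.abs(loss_improvement) <= tiny_loss
print(check)  # → True: the improvement is below numerical noise
```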
Suggested change:

```python
print(f"    eps=16 * finfo.eps={eps}")
print(
    "  check loss |improvement| <= eps * |loss_old|:"
    f" {np.abs(loss_improvement)} <= {tiny_loss} {check}"
)
```

i.e. move the `eps` print right before the check that actually uses it.
There are some untested lines which seem legit and should be tested; otherwise LGTM.
I think some of @ogrisel's comments are valid in terms of moving messages or being more explicit, but I don't mind them being a separate PR since they'd touch multiple solvers.
Except for about one occasion, the lines not tested according to codecov are just hard-to-trigger corner cases. There are currently no tests that trigger them, to my knowledge (so it's a more than 10-year-old liability). I would much prefer not to put that burden on this PR.
Reference Issues/PRs

This PR is meant to be merged after #26721.

What does this implement/fix? Explain your changes.

This PR adds verbosity to the `_newton_cg` solver in our private `sklearn.utils.optimize` module. It is used, e.g., in `LogisticRegression`.

Any other comments?
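For readers unfamiliar with the solver, here is a sketch of the kind of optimization Newton-CG performs, using scipy's public implementation (which offers per-iteration reporting only via a callback, much like the situation this PR improves for `_newton_cg`). The toy objective and its gradient are made up for illustration:

```python
import numpy as np
from scipy.optimize import minimize

def f(w):
    # Toy smooth, strictly convex objective.
    return 0.5 * float(w @ w) + float(np.sum(np.log1p(np.exp(-w))))

def grad(w):
    # d/dw of log1p(exp(-w)) is -1 / (1 + exp(w)).
    return w - 1.0 / (1.0 + np.exp(w))

# Newton-CG needs the gradient; the Hessian-vector products are
# approximated by finite differences of the gradient when not given.
res = minimize(
    f, x0=np.zeros(3), method="Newton-CG", jac=grad,
    callback=lambda wk: print("iterate:", wk),  # crude "verbosity"
)
print(res.success, res.x)
```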