MAINT: Fix the sign when using Newton's method #9466
This PR cleans up a small inconsistency in the code used when fitting a generic maximum-likelihood estimator with Newton's method.
statsmodels/statsmodels/base/model.py, line 542 in ddeaa63
The code flips the sign of the gradient and Hessian for Newton's method, but not for any other minimization method (BFGS, Nelder-Mead, etc.). All other methods minimize the negative log likelihood and receive derivatives consistent with that objective, but Newton's method gets the gradient and Hessian of the (positive) log likelihood, even though it is still handed the negative log likelihood as its objective function.
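To make the asymmetry concrete, here is a simplified sketch of the pattern (a hypothetical helper with illustrative names; the actual code in model.py is structured differently):

```python
def get_objective_funcs(model, method):
    # Sketch only: illustrates the sign convention described above,
    # not the actual statsmodels code.
    f = lambda params: -model.loglike(params)  # every solver minimizes -loglike
    if method == 'newton':
        # Existing behavior: derivatives of the *positive* log likelihood
        score = lambda params: model.score(params)
        hess = lambda params: model.hessian(params)
    else:
        # All other solvers: derivatives consistent with the objective
        score = lambda params: -model.score(params)
        hess = lambda params: -model.hessian(params)
    return f, score, hess
```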
Newton's Method
While the gradient and Hessian don't match the objective function, Newton's method works fine because it never evaluates the objective at all. Instead, it finds zeros of the gradient, which correspond to critical points: local minima, local maxima, and saddle points. In the existing implementation, Newton's method converges to a critical point of the log likelihood, which is the same point that minimizes the negative log likelihood (assuming every critical point is a global maximum, as when the log likelihood is concave). Multiplying the gradient and Hessian by -1, or by any nonzero scalar, leaves the Newton iteration unchanged.
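This invariance is easy to check numerically, since the -1 factors cancel inside the linear solve:

```python
import numpy as np

g = np.array([1.5, -0.3])               # gradient at the current point
H = np.array([[2.0, 0.1], [0.1, 1.0]])  # Hessian at the current point
x = np.zeros(2)

# One Newton step with (g, H) and one with (-g, -H) are identical,
# because (-H)^{-1}(-g) = H^{-1}g.
step = x - np.linalg.solve(H, g)
step_flipped = x - np.linalg.solve(-H, -g)
assert np.allclose(step, step_flipped)
```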
For clarity, this PR removes the sign flip for Newton's method, since it will work equally well in either case. Without the sign flip, I think the code is easier to read.
Ridge Regularization
The sign change has a slight impact on a ridge factor applied in _fit_newton:

statsmodels/statsmodels/base/optimizer.py, line 448 in 3be76b8
I changed the sign of the Hessian but not the ridge factor; without a matching change, the resulting matrix would differ by more than the intended sign flip. I think adding rather than subtracting the ridge factor is proper here. We'd want a squared penalty when maximizing the log likelihood, which corresponds to subtracting the ridge factor; since we're now working with the negative log likelihood, the sign flips and the ridge factor is added.
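A toy example of the bookkeeping (illustrative values, not the actual _fit_newton code): subtracting the ridge factor from the log-likelihood Hessian is the exact sign flip of adding it to the negative-log-likelihood Hessian.

```python
import numpy as np

k = 3
ridge = 1e-10
H_ll = -2.0 * np.eye(k)  # Hessian of +loglike, negative definite near the maximum

# Old convention: damp the +loglike Hessian by subtracting the ridge factor.
H_old = H_ll - ridge * np.eye(k)

# New convention: the Hessian of -loglike is -H_ll, so the same damping
# now means adding the ridge factor (keeping it positive definite).
H_new = -H_ll + ridge * np.eye(k)

assert np.allclose(H_new, -H_old)  # an exact sign flip, nothing more
```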
(Also, the code doesn't adjust the gradient, but I think it should. I haven't made that change in this PR.)
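For reference, if the ridge factor were treated as a true quadratic penalty 0.5 * ridge * x'x on the negative log likelihood, both derivatives would pick up a matching term. A sketch of what a fully consistent step might look like (illustrative names only; this PR does not make the gradient change):

```python
import numpy as np

def penalized_newton_step(x, g, H, ridge=1e-10):
    # g, H: gradient and Hessian of the negative log likelihood at x.
    g_pen = g + ridge * x               # the gradient adjustment not made in this PR
    H_pen = H + ridge * np.eye(len(x))  # the Hessian adjustment this PR makes
    return x - np.linalg.solve(H_pen, g_pen)
```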