MAINT: Fix the sign when using Newton's method #9466

Open

rob-sil wants to merge 1 commit into main

Conversation

@rob-sil commented Dec 26, 2024

This PR resolves a small question in the code that fits a generic maximum-likelihood estimator with Newton's method:

# TODO: why are score and hess positive?

The code flips the sign of the gradient and Hessian for Newton's method, but not for any other minimization method (BFGS, Nelder-Mead, etc.). All the other methods minimize the negative log likelihood, but Newton's method gets the gradient and Hessian of the (positive) log likelihood, even though it still uses the negative log likelihood as the objective function.

Newton's Method

While the gradient and Hessian don't match the objective function, Newton's method works fine because it doesn't use the objective function at all. Instead, Newton's method finds zeros of the derivative, which correspond to critical points: local minima, local maxima, and saddle points. In the existing implementation, Newton's method finds the maximum of the log likelihood, which is also the minimum of the negative log likelihood (assuming the critical point it finds is the global maximum). Multiplying the gradient and Hessian by -1, or by any nonzero scalar, leaves the Newton iterates unchanged.
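
A minimal sketch of that invariance on a toy one-parameter log likelihood (not the statsmodels implementation; the function and numbers are made up for illustration):

import numpy as np

def newton_step(x, grad, hess):
    # One Newton step: solve hess @ delta = grad, then step to x - delta.
    return x - np.linalg.solve(hess, grad)

# Toy log likelihood f(x) = -(x - 3)^2, maximized at x = 3.
grad = lambda x: np.array([-2.0 * (x[0] - 3.0)])  # gradient of f
hess = lambda x: np.array([[-2.0]])               # Hessian of f

x = np.array([0.0])
step_max = newton_step(x, grad(x), hess(x))    # log-likelihood convention
step_min = newton_step(x, -grad(x), -hess(x))  # negative-log-likelihood convention
assert np.allclose(step_max, step_min)         # identical iterates: [3.]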

For clarity, this PR removes the sign flip for Newton's method, since the method works equally well either way. Without the sign flip, I think the code is easier to read.

Ridge Regularization

The sign change has a slight impact on a ridge factor applied in _fit_newton:

H[np.diag_indices(H.shape[0])] += ridge_factor

I changed the sign of the Hessian but not of the ridge factor, so the resulting matrix differs by more than the intended sign flip. I think adding, rather than subtracting, the ridge factor is correct here. A square penalty when maximizing the log likelihood would correspond to subtracting the ridge factor from the Hessian; since we're working with the negative log likelihood, the sign flips and the ridge factor should be added.
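
A small sketch of why the sign matters for conditioning (the matrix values and ridge size here are hypothetical):

import numpy as np

ridge_factor = 1e-10  # illustrative small value

# New convention: the Hessian of the *negative* log likelihood is positive
# semidefinite near the minimum, so a positive ridge pushes near-zero
# eigenvalues safely away from zero.
H = np.array([[2.0, 0.0], [0.0, 1e-12]])
H[np.diag_indices(H.shape[0])] += ridge_factor
print(np.linalg.eigvalsh(H))  # both eigenvalues positive

# Old convention: the Hessian of the *positive* log likelihood is negative
# semidefinite, so the same += moves eigenvalues toward (and here past)
# zero, hurting conditioning instead of helping.
H = -np.array([[2.0, 0.0], [0.0, 1e-12]])
H[np.diag_indices(H.shape[0])] += ridge_factor
print(np.linalg.eigvalsh(H))  # one eigenvalue crosses zero: indefinite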

(Also, the code doesn't adjust the gradient, but I think it should. I haven't made that change in this PR.)

@josef-pkt (Member)

Thanks, looks fine.
I guess the only disadvantage is that taking the negative of the Hessian requires a copy, which might have been the reason to avoid the sign flip.

H[np.diag_indices(H.shape[0])] += ridge_factor

AFAIR, the diagonal of H should be positive now, so this is now correct.
I think this was wrong before when diag(H) was negative.

One test failure on one machine:
FAILED statsmodels/discrete/tests/test_predict.py::TestZINegativeBinomialPPredict::test_predict

Also, the code doesn't adjust the gradient, but I think it should.

Do you mean adjusting the score for the ridge factor?
I had added the ridge factor only to make the Hessian less likely to be ill-conditioned. It's not computing a ridge regression.

@josef-pkt (Member)

The sign change of the Hessian needs a decision.

The sign of the ridge factor needs more review to see if the current version is really a bug.

Possibly check whether adding the corresponding ridge factor to the score also improves optimization. (When I used the ridge factor to stabilize the Hessian during optimization in the past, I never adjusted the score or similar. The ridge factor here should just move numerical noise in the right direction.)
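
For reference, if the ridge term were treated as an explicit penalty (ridge_factor / 2) * ||params||^2 on the negative log likelihood, rather than as pure numerical stabilization, the consistent adjustment would touch both derivatives. A hypothetical sketch (penalized_derivatives is not a statsmodels function):

import numpy as np

def penalized_derivatives(params, score, hess, ridge_factor):
    # Score and Hessian of negloglike(params) + (ridge_factor / 2) * ||params||^2.
    # Hypothetical helper; statsmodels currently adjusts only the Hessian.
    g = score(params) + ridge_factor * params
    H = hess(params).copy()
    H[np.diag_indices(H.shape[0])] += ridge_factor
    return g, H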
