add a fitting interface for simultaneous log likelihood and score, for lbfgs, tested with MNLogit #1161

argriffing · 2013-10-29T18:08:51Z

No description provided.

…NLogit

argriffing · 2013-10-29T18:09:41Z

related to issue #1141

josef-pkt · 2013-10-29T18:20:27Z

statsmodels/discrete/discrete_model.py

@@ -1588,6 +1588,21 @@ def score(self, params):
        #NOTE: might need to switch terms if params is reshaped
        return np.dot(firstterm.T, self.exog).flatten()

+    def loglike_and_score(self, params):


Update: maybe not such a good idea, because your _fit_mle_lbfgs covers several cases of loglike and score

Since this is mainly for optimization and doesn't affect other statistical results, I would define it in terms of the negative logliglihood nloglike_and_score avoids the negation.

in statsmodels.base.model.GenericLikelihoodModel and subclasses, I used nloglike

Is GenericLikelihoodModel even used? I've run across swaths of statsmodels code that is either commented out or has comments like 'dead code below here' related to the generic likelihood model.

GenericLikelihoodModel is the rapid prototyping and development version. Most likelihood models grow up and then integrate into the LikelihoodModel hierarchy and don't subclass GenericLikelihoodModel.
GenericLikelihoodModel has more preconfigured or more general defaults than LikelihoodModel.

josef-pkt · 2013-10-29T18:33:25Z

I think, you could try to special case it in line L378, if loglike_and_score are available, to make it used automatically.
I haven't looked how this would look like after Skipper's refactoring/outsourcing.

looks good so far.

argriffing · 2013-10-29T18:55:18Z

I think, you could try to special case it in line L378,
if loglike_and_score are available, to make it used automatically.

I'm not sure exactly what you mean here, but maybe you are talking about statsmodels/base/model.py? If so, I agree that this could make sense. There are so many statsmodels layers that I'm not sure which changes in which places would be most appropriate.

Regarding the comments about loglike, score, nloglike_and_score, and negations, the statmodels situation is confusing to me because functions and variables with identical names but with opposite signs are used interchangeably throughout the current statsmodels code. I would prefer to have loglike never mean negative log likelihood, and to have score never mean the negative of the log likelihood gradient, but on the other hand it is probably not a good use of my time to go through and change the variable names everywhere this occurs in statsmodels.

…y passing it a score function

josef-pkt · 2013-10-29T19:17:06Z

functions and variables with identical names but with opposite signs are used interchangeably throughout the current statsmodels code

Can you just point them out or open an issue for them.
We shouldn't have different definitions with the same name, so it should be cleaned up, but that's not part of this PR.

argriffing · 2013-10-29T19:56:25Z

Can you just point them out or open an issue for them.

added as issue #1162

argriffing · 2013-10-29T19:59:26Z

Is the Travis failure a known issue? (SVD did not converge in pinv in kalman filter log likelihood) [EDIT: the travis failure is probably sporadic and is related to these and not to this PR in particular]

argriffing · 2013-10-29T22:45:40Z

I've tried closing and reopening to toggle Travis for the SVD convergence problem. On my system nosetests statsmodels passes all tests locally (except for that test_pca.py), with no complaint about SVD convergence. I'm pretty sure that my SVD uses openblas LAPACK. [EDIT: the travis failure is https://github.com/numpy/numpy/issues/1588 ?]

argriffing · 2013-11-01T16:07:48Z

@josef-pkt Is there anything preventing merging of this PR? I think that the SVD non-convergence is some numpy bug, and I don't understand your comment about line 378. If you would like me to address that comment before merging, I might need more detail. Is it referring to this line https://github.com/argriffing/statsmodels/blob/907122b797b4bf19ad3cf5e25686bc40c4464c51/statsmodels/base/model.py#L378?

josef-pkt · 2013-11-01T16:37:12Z

@argriffing I didn't know this is ready for merge. I can look into it in a few hours.
I need to look at the SVD convergence failure. We didn't have it a while ago and I don't know what triggered it.

In the unit test you give loglike_and_score has method argument. I thought this should be picked up automatically if the model class specifies this method. I didn't work my way fully through the different options of specifying scores when I looked at this PR before.

8000

argriffing · 2013-11-01T16:55:14Z

@josef-pkt Thanks! The interface design could become clearer if we add interfaces to more solvers that use loglike+score but not hessian (e.g. tnc), and when more models provide the simultaneous calculations (maybe ARIMA).

josef-pkt · 2013-11-01T19:48:04Z

The latest build on master was without failure, so the svd failure is most likely in this PR
https://travis-ci.org/statsmodels/statsmodels/builds

common problem I have with svd failures is if a nan or inf sneaked into my numbers.

josef-pkt · 2013-11-01T23:47:09Z

statsmodels/base/model.py

    extra_kwargs = dict((x, kwargs[x]) for x in names if x in kwargs)

-    if extra_kwargs.get('approx_grad', False):
-        score = None


Here we removed the score if approx_grad

argriffing · 2013-11-02T01:04:49Z

In my opinion the root of the approx_grad confusion is that the code has multiple points at which the gradient may be approximated. For example the ARIMA code currently uses a numdiff finite differences function to approximate the gradient (although @jseabold has suggested that this could be replaced by an analytically computed gradient). On the other hand, the scipy fmin_l_bfgs_b function will internally approximate the gradient if no gradient is provided and if approx_grad is True.

So there are multiple ways to specify the gradient.

let lbfgs use its internal gradient approximation
2a) use numdiff and pass lbfgs the resulting score function
2b) use numdiff and pass lbfgs a bundled loglike+numdiffscore function
3a) pass lbfgs a more clever (e.g. analytic, autodiff) score function
3b) pass lbfgs a bundled loglike+cleverscore function

Of these options, (3) is best, followed by (1), and (2) is worst. We are currently using (2) for ARIMA, but I hope that fixing this does not fall in the scope of this PR.

josef-pkt · 2013-11-02T12:15:19Z

(We had a power outage all last evening and part of the night).

I understand the principle, but I'm still not sure whether ARMA is or is not using the score method, and whether that default has been changed by this PR, leading to the SVD failure.

Changing/Improving the score calculation for ARIMA is of course not required by this pull request.
But I don't see how to check which score is actually used. Do we store the optimization options for this?

argriffing · 2013-11-02T14:16:17Z

and whether that default has been changed by this PR, leading to the SVD failure.

Yeah I think you're right. I'll change it back.

coveralls · 2013-11-02T14:46:41Z

Coverage remained the same when pulling 3b14099 on argriffing:mnlogit-loglike-score into 4f675c7 on statsmodels:master.

josef-pkt · 2013-11-02T15:49:24Z

It made TravisCI green.

What I find strange is that it looks like the ARIMA estimation doesn't use "our" score, but the buildin of l_bfgs_b. Do I read this correctly?

The svd failure with the different estimation settings might need investigation, but that's not part of this PR.

I can merge this (after another brief look), if you want to use this for your low memory estimation, and we open a new issue for integrating this with automatic loglike_and_score usage.

argriffing · 2013-11-02T16:28:28Z

What I find strange is that it looks like the ARIMA estimation doesn't use "our" score, but the buildin of l_bfgs_b.
Do I read this correctly?

Yes, I believe so. In the old statsmodels code (and in its current restored version), ARIMA uses the internal gradient approximation of fmin_l_bfgs_b even though it also defines its own score function which calls numdiff. So that is a little odd. But maybe ARIMA's numdiff score function is used somewhere else, instead of in the MLE search?

open a new issue for integrating this with automatic loglike_and_score usage.

You are suggesting to have the statsmodels machinery automatically detect a loglike_and_score member function of a Model subclass, and statsmodels will automatically use this function when appropriate for parameter estimation (fitting), similarly to how the loglike, score, and hessian member functions are currently treated?

josef-pkt · 2013-11-02T16:48:31Z

Yes, that was my first thought when I looked through this PR, automatically using loglike_and_score if available, instead of loglike and score separately.

Now, I think we better wait for Skippers outsourcing of the optimizer code. To make the optimizer code more generally available, we need to be able to specify the objective function and the score even if they are not lo 8000 glike and score.
Our preferred optimizer is Newton in discrete, and there it would be better to have loglike_and_score_and_hessian.
Furthermore, scipy.optimize has two new more robust newton optimizers, that would be very useful for models like discrete where in many or most cases we have analytical 1st and 2nd derivatives.

So, if you can do your low memory work with the changes in this PR, then I would postpone this integration until Skipper can also have a look at this, and can check how this works with his branch.

argriffing · 2013-11-02T17:03:26Z

@josef-pkt That sounds like a good plan. After this PR is merged I'll work on low memory modifications (or low memory subclasses, if a time/memory tradeoff is required) for MNLogit.

By the way, I saw your mailing list question

Does scipy have another optimizer besides fmin (Nelder-Mead) that is
robust to near-singular, high condition number Hessian?

fmin_bfgs goes into neverland, values become huge until I get some
nans in my calculations.

What would be nice is an optimizer that uses derivatives, but
regularizes, forces Hessian or equivalent to be positive definite.

and I'd suggest giving lbfgs a try, in unconstrained mode. I've often had it succeed when bfgs fails.

add a fitting interface for simultaneous log likelihood and score, for lbfgs, tested with MNLogit

josef-pkt · 2013-11-02T17:18:15Z

Good, merged

unit tests pass, and we will have another look later, see #1141

Thanks you

…core add a fitting interface for simultaneous log likelihood and score, for lbfgs, tested with MNLogit

alexbrc added 2 commits October 29, 2013 09:58

8000

ENH: add MNLogit loglike_and_score

aa12688

ENH: mle lbfgs now uses simultaneous loglike and score, tested with M…

0474015

…NLogit

josef-pkt reviewed Oct 29, 2013
View reviewed changes

alexbrc added 2 commits October 29, 2013 14:59

MAINT: change default maxiter to match docstring

81a6f52

MAINT: do not tell lbfgs to approximate the gradient if we are alread…

907122b

…y passing it a score function

argriffing closed this Oct 29, 2013

argriffing reopened this Oct 29, 2013

josef-pkt reviewed Nov 1, 2013
View reviewed changes

MAINT: restore approx_grad flag superpowers

3b14099

josef-pkt added a commit that referenced this pull request Nov 2, 2013

Merge pull request #1161 from argriffing/mnlogit-loglike-score

831e2ad

add a fitting interface for simultaneous log likelihood and score, for lbfgs, tested with MNLogit

josef-pkt merged commit 831e2ad into statsmodels:master Nov 2, 2013

josef-pkt mentioned this pull request Nov 2, 2013

ENH: model fitting interface for simultaneous calculation of loglike and score #1141

Open

PierreBdR pushed a commit to PierreBdR/statsmodels that referenced this pull request Sep 2, 2014

Merge pull request statsmodels#1161 from argriffing/mnlogit-loglike-s…

6bccfda

…core add a fitting interface for simultaneous log likelihood and score, for lbfgs, tested with MNLogit

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

add a fitting interface for simultaneous log likelihood and score, for lbfgs, tested with MNLogit #1161

add a fitting interface for simultaneous log likelihood and score, for lbfgs, tested with MNLogit #1161

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

add a fitting interface for simultaneous log likelihood and score, for lbfgs, tested with MNLogit #1161

add a fitting interface for simultaneous log likelihood and score, for lbfgs, tested with MNLogit #1161

Uh oh!

Conversation

Uh oh!

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!