
Institute of Mathematical Statistics

LECTURE NOTES — MONOGRAPH SERIES

ESTIMATING FUNCTIONS: A SYNTHESIS OF LEAST
SQUARES AND MAXIMUM LIKELIHOOD METHODS

V.P. Godambe
University of Waterloo
ABSTRACT
The development of the modern theory of estimating functions is traced
from its inception. It is shown that this development has brought about
a synthesis of the two historically important methodologies of estimation,
namely the 'least squares' and the 'maximum likelihood'.
Key Words: Estimating functions; likelihood; score function.

1 Introduction
In common with most of the historical investigations, it is difficult to trace
the origin of the subject of this conference: 'Estimating Functions'. How-
ever, in the last two centuries clearly there are three important precursors of
the modern theory of estimating functions (EF): In the year 1805, Legendre
introduced the least squares (LS) method. At the turn of the last century
Pearson proposed the method of moments and in 1925 Fisher put forward
the maximum likelihood (ML) equations. Of these three, the method of
moments faded out in time for lack of any sound theoretical justification.
However, the other two methods, namely the LS and the ML, even at present
play an important role in statistical methodology. These two methods will
also concern us in the following. The LS method was justified
by what today is called the Gauss-Markoff (GM) theorem: The estimates
obtained from LS equations are 'optimal' in the sense that they have min-
imum variance in the class of linear unbiased estimates. This was a finite
sample justification. At about the same time Laplace provided a different
'asymptotic justification' for the method. Fisher justified the ML estimation,
for it produced estimates which are asymptotically unbiased with smallest
variance. This left open the question, is there a finite sample justification
for the ML estimation corresponding to the GM theorem justification for the
LS estimation?
The modern EF theory provided such a justification. According to the
'optimality criterion' of the EF theory, the score function (SF) is 'optimal'.

2 SF Optimality
To state the just mentioned result formally we briefly introduce some notation.
Let X = {x} be the sample (observation) space and let a class of possible
distributions (densities) on X be given by {f(·|θ), θ ∈ Ω}, Ω being the
parameter space, which we assume here to be the real line. If the function f
is completely specified up to the (unknown) parameter θ, f(·|θ) is called a
parametric model. For this model the score function is SF = ∂ log f(·|θ)/∂θ.
Any real function of x and θ, say g(x, θ), is called an estimating function
(EF). It is said to be unbiased if its mean value is zero for every θ ∈ Ω:
E(g) = 0. Further, for reasons which will become clear later, corresponding
to every EF g we define its standardized version g/E(∂g/∂θ). Now, in a class
G = {g} of unbiased estimating functions, g* is said to be 'optimal' if the
variance of the standardized EF is minimized at g = g*:

    E(g*²)/{E(∂g*/∂θ)}² ≤ E(g²)/{E(∂g/∂θ)}²,   θ ∈ Ω, g ∈ G.   (2.1)
SF Theorem (Godambe, 1960). For a parametric model f(·|θ), granting
some regularity conditions, in the class of all unbiased EFs the optimal
estimating function is given by the SF, i.e.

    g* = ∂ log f(·|θ)/∂θ.

The optimality of the SF given by the above Theorem should be distinguished
from the optimality of the LS estimates based on the GM theorem. The SF
optimality (though, with some additional assumptions, it implies asymptotic
optimality of the ML estimate) is essentially optimality of the 'estimating
function', while the LS optimality is optimality of the 'estimate'. The
concept underlying the optimality criterion of the EF theory became more
vivid and compelling in relation to the problem of nuisance parameters.
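
As a small numerical illustration of criterion (2.1), the following Python sketch (my own example, assuming an exponential model f(x|θ) = θ exp(−θx), which is not discussed in the text) compares the standardized variance E(g²)/{E(∂g/∂θ)}² of the score function with that of a moment-based unbiased EF; the score attains the smaller value, as the SF Theorem predicts.

```python
# A minimal Monte Carlo sketch of criterion (2.1), assuming an exponential
# model f(x|theta) = theta*exp(-theta*x) (an illustrative choice, not taken
# from the paper).  The score EF sum(1/theta - x_i) is compared with the
# moment-based unbiased EF sum(x_i**2 - 2/theta**2).
import numpy as np

rng = np.random.default_rng(0)
theta, n, reps = 2.0, 50, 20000
x = rng.exponential(scale=1 / theta, size=(reps, n))

g_sf = np.sum(1 / theta - x, axis=1)          # score function
g_m = np.sum(x**2 - 2 / theta**2, axis=1)     # moment-based unbiased EF

# E(dg/dtheta) is known analytically for each EF under this model.
dg_sf = -n / theta**2
dg_m = 4 * n / theta**3

print("standardized variance, score EF :", np.mean(g_sf**2) / dg_sf**2)  # ~ theta^2/n
print("standardized variance, moment EF:", np.mean(g_m**2) / dg_m**2)    # larger
```
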

3 Conditional SF Optimality
Now let the parameter θ consist of two components θ₁ and θ₂, θ = (θ₁, θ₂),
and let the parametric model be f(·|θ₁, θ₂), where θ₁ is real and θ₂ is a
vector; θ ∈ Ω, θ₁ ∈ Ω₁, θ₂ ∈ Ω₂ and Ω = Ω₁ × Ω₂. Further suppose we want
to estimate only θ₁ (the parameter of interest), ignoring θ₂ (the nuisance
parameter). How should we proceed? To this question ML estimation provides
no satisfactory answer. If θ̂₁ and θ̂₂ are the joint ML estimates of θ₁ and θ₂
then, as is well known, the estimate θ̂₁ can be inconsistent (unacceptable)
when the dimensionality of the parameter θ₂ goes on increasing with the
number of observations (cf. Neyman and Scott, 1948). The EF theory, for the
present situation, implies restricting to that part of the likelihood function
which is governed by the parameter of interest θ₁ only. Formally, for the
parametric model f(·|θ₁, θ₂), let G₁ be the class of all unbiased EFs g(x, θ₁),
that is, functions of x and θ₁ only:

    G₁ = {g : g = g(x, θ₁), E(g) = 0, θ ∈ Ω}.

Further let t be a complete sufficient statistic for the parameter θ₂, for every
fixed θ₁. Assuming the statistic t is independent of the parameter θ₁, we have
Conditional SF Theorem (Godambe, 1976). Granting some regularity
conditions, in the class of EFs G₁ the 'optimal' EF g* is given by the
conditional SF, i.e. g* = ∂ log f(·|t; θ₁)/∂θ₁.
Note that in the above theorem the definition of optimality is obtained from
(2.1) just by replacing in it G by G₁ and consequently E(∂g/∂θ) by E(∂g/∂θ₁).
That is, the criterion of optimality is unconditional. In the case of the
Neyman-Scott example, unlike the ML estimate θ̂₁, the equation 'conditional
SF = 0' provides a consistent estimate of θ₁. Further, the EF optimality
criterion suggests a definition of 'conditional SF' in case the statistic t
depends on the parameter θ₁. If t(θ₁₀) is the value of t at θ₁ = θ₁₀, then we
define the conditional SF by g*, where

    g* = {∂ log f(·|t(θ₁₀); θ₁, θ₂)/∂θ₁}, evaluated at θ₁₀ = θ₁.   (3.1)

This definition is motivated as follows. The EF g* in (3.1) belongs to G₁,
though it depends on θ₂. It further is 'optimal' in G₁, though only locally
at θ₂ (Lindsay, 1982). Unlike the previous situation, when the sufficient
statistic t was independent of θ₁, now no universally optimal g* (i.e. for
all θ₂ ∈ Ω₂) exists in G₁. Further, though the EF g* in (3.1) depends on θ₂,
it is orthogonal to the marginal SF of the sufficient statistic t; hence the
substitution of an estimate θ̂₂, derived from the latter, into the former
would still leave the former nearly optimal for large samples (Lindsay, 1982;
Godambe, 1991; Small and McLeish, 1994; Liang and Zeger, 1995). The equation
g*(x, θ₁, θ̂₂) = 0 would then provide a (nearly optimal) consistent estimate
of θ₁.
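
To make the Neyman-Scott phenomenon concrete, the following Python sketch (my own illustration; the specific model and code are assumptions, not taken from the paper) simulates pairs x_{i1}, x_{i2} ~ N(μᵢ, σ²) with one nuisance mean μᵢ per pair: the joint ML estimate of σ² converges to σ²/2, whereas the root of the conditional SF, obtained by conditioning on the complete sufficient statistic (the pair mean), is consistent.

```python
# A small simulation of the Neyman-Scott problem (an illustrative sketch,
# not code from the paper): pairs x_{i1}, x_{i2} ~ N(mu_i, sigma2) with one
# nuisance mean mu_i per pair.  The joint ML estimate of sigma2 is
# inconsistent, while solving 'conditional SF = 0' (conditioning on the
# complete sufficient statistic t_i = pair mean) gives a consistent estimate.
import numpy as np

rng = np.random.default_rng(1)
n, sigma2 = 5000, 4.0
mu = rng.normal(0.0, 10.0, size=n)               # nuisance parameters, one per pair
x = rng.normal(mu[:, None], np.sqrt(sigma2), size=(n, 2))

d = x[:, 0] - x[:, 1]                            # d_i ~ N(0, 2*sigma2), free of mu_i
sigma2_ml = np.sum((x - x.mean(axis=1, keepdims=True))**2) / (2 * n)
sigma2_cond = np.sum(d**2) / (2 * n)             # root of the conditional score

print("true sigma2        :", sigma2)
print("joint ML estimate  :", sigma2_ml)         # ~ sigma2/2 (inconsistent)
print("conditional SF root:", sigma2_cond)       # ~ sigma2 (consistent)
```
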


Note that in the foregoing discussion conditioning is used just as a 'technique'
to obtain (unconditionally) 'optimum' EFs; it is not used as a principle
of inference. In fact, without invoking any conditioning at all, Godambe and
Thompson (1974) established, in the case of the normal distribution N(θ₁, θ₂),
the optimality of the EF (s² − θ₂) for the interest parameter θ₂, ignoring
the nuisance parameter θ₁. How this (unconditional) optimality leads to a
very 'flexible conditioning' will be discussed later.

For a general perspective on the topic of conditioning and optimality we
refer to Small and McLeish (1988), Lindsay and Waterman (1991) and Lindsay
and Li (1995).
Lloyd (1987) and Bhapkar (1991) have given results concerning optimal-
ity of 'marginal SF' under 'conditional completeness'.
From the above discussion it is clear that the EF theory has corrected a
major deficiency of the ML estimation in case of the nuisance parameters.
Some earlier references in respect of the nuisance parameters are Bartlett
(1936), Cox (1958), Barnard (1963), Kalbfleisch and Sprott (1970), Barndorff-
Nielsen (1973) and others. Some of these authors tried to obtain conditions
under which the marginal distribution of t does not contain any information
about θ₁, the parameter of interest. As we have seen, the optimality criterion
of the EF theory yields such a condition in terms of 'completeness of the
statistic t'. Though not universally applicable (as none can be, I suppose),
it by now has been commonly used for its mathematical manageability. It
also carries with it greater conviction for it is derived from an optimality
criterion which has proved to be fruitful very generally.
In the following we will show that the EF theory, just as it corrected
ML estimation, also corrects some major inadequacies of the LS estimation
and the GM theorem.

4 Quasi-Score Function
We now replace the abstract (observation) sample in the discussion by n real
variates xᵢ, i = 1, ..., n, which are assumed to be independently distributed
with means μᵢ(θ) and variances vᵢ(θ), μᵢ and vᵢ being some specified functions
of θ, i = 1, ..., n. For simplicity let θ be a scalar parameter. Initially
we consider the special case where the μᵢ are linear functions of θ and the
vᵢ are independent of θ. Here the LS equation is given by

    Σᵢ (xᵢ − μᵢ)(∂μᵢ/∂θ)/vᵢ = 0.

The solution of this equation, as said before, according to the GM theorem,
has smallest variance in the class of all linear unbiased estimates of θ;
hence it is 'optimal'. The estimating function Σᵢ (xᵢ − μᵢ)(∂μᵢ/∂θ)/vᵢ is
also 'optimal' according to criterion (2.1), in the class of all EFs of the
form

    g = Σᵢ₌₁ⁿ (xᵢ − μᵢ) aᵢ,   (4.1)

where the aᵢ can be arbitrary functions of θ. (Actually here we minimize
E(g²) subject to holding E(∂g/∂θ) = const.; this explains the standardization
of the EF mentioned earlier.) Note this EF optimality implies more than the
GM optimality, for the solutions corresponding to all the equations g = 0
include not only all linear unbiased estimates of θ but many more.

Now let the means μᵢ and variances vᵢ be arbitrarily specified functions of
θ. Here the LS equation is given by ḡ + B = 0, where

    ḡ = Σᵢ₌₁ⁿ (xᵢ − μᵢ)(∂μᵢ/∂θ)/vᵢ   (4.2)

and B is the additional term arising from the dependence of the vᵢ on θ.
Clearly in (4.2), E(ḡ) = 0 and E(B) = Σᵢ ∂ log vᵢ/∂θ. Note that for large n,
(ḡ/n) ≈ 0 while (B/n) could still be very large. Hence, because of the bias
term B, the LS equation ḡ + B = 0 would generally lead to an inconsistent
estimate. On the other hand, according to the optimality criterion (2.1) of
the EF theory, in the class of EFs given by (4.1) for different functions
aᵢ(θ), i = 1, ..., n, the EF ḡ given by (4.2) is 'optimal'. Generally the
equation ḡ = 0 would lead to a consistent solution. (Here the GM theorem
cannot be of any avail, for the solution of ḡ = 0 would generally be a biased
estimate.) For reasons to be explained soon, we shall call the estimating
function ḡ a quasi-score function (quasi-SF).
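
The following Python sketch (my own illustration, assuming Poisson-type counts with μᵢ = exp(θzᵢ) and vᵢ = μᵢ; neither the model nor the code comes from the paper) contrasts the root of the quasi-score equation ḡ = 0 with the minimizer of the naive weighted LS criterion whose weights depend on θ: the former is consistent, the latter is systematically biased.

```python
# A minimal sketch contrasting the quasi-score equation (4.2) with naive LS
# when both mu_i and v_i depend on theta.  The model is an illustrative
# assumption: counts x_i ~ Poisson(mu_i) with mu_i = exp(theta*z_i), v_i = mu_i.
import numpy as np
from scipy.optimize import brentq, minimize_scalar

rng = np.random.default_rng(2)
theta_true, n = 0.5, 2000
z = rng.uniform(0.5, 2.0, size=n)
x = rng.poisson(np.exp(theta_true * z))

def quasi_score(theta):
    mu = np.exp(theta * z)                      # means mu_i(theta)
    return np.sum((x - mu) * (z * mu) / mu)     # (x_i - mu_i)*(dmu_i/dtheta)/v_i

def ls_objective(theta):
    mu = np.exp(theta * z)
    return np.sum((x - mu) ** 2 / mu)           # weighted LS with theta-dependent weights

theta_qs = brentq(quasi_score, -2.0, 2.0)       # root of g-bar = 0
theta_ls = minimize_scalar(ls_objective, bounds=(-2.0, 2.0), method="bounded").x

print("true theta      :", theta_true)
print("quasi-score root:", theta_qs)            # close to theta_true
print("naive LS minimum:", theta_ls)            # systematically biased upward
```
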
Interestingly, the EF optimality of the quasi-SF ḡ was first established in
a wider setting of discrete stochastic processes with a martingale structure.
Quasi-SF Theorem (Godambe 1985). If μᵢ and vᵢ denote the means and
variances of xᵢ conditional on the past observations xᵢ₋₁, ..., x₀, i.e.
μᵢ = μᵢ(θ, x₀, ..., xᵢ₋₁) and vᵢ = vᵢ(θ, x₀, ..., xᵢ₋₁) for i = 1, ..., n, then

    ḡ = Σᵢ₌₁ⁿ (xᵢ − μᵢ)(∂μᵢ/∂θ)/vᵢ   (4.3)

is the optimal EF in the class of the EFs given by (4.1), where the aᵢ are
now functions of xᵢ₋₁, ..., x₀ in addition to θ.
Among the precursors to the EF ḡ in (4.3) above are the following:
Durbin (1960) gave a GM theorem analogue for linear time series models.
Klimko and Nelson (1978) obtained conditional LS equations. Kalbfleisch
and Lawless (1983) suggested a special case of ḡ for Markov models.
For further generalizations of the EF optimality results relating to ḡ in
(4.3), we refer to Godambe (1985), Godambe and Heyde (1987), and Godambe
and Thompson (1989).
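
As a concrete illustration of (4.3) in a stochastic-process setting, the following Python sketch assumes (my choice, not an example from the paper) an AR(1)-type model with conditional mean μᵢ = θxᵢ₋₁ and a past-dependent conditional variance; since ḡ is linear in θ here, the estimating equation ḡ = 0 has a closed-form root.

```python
# A minimal sketch of the quasi-SF (4.3) for a stochastic process, assuming
# (illustratively) an AR(1)-type model with conditional mean mu_i = theta*x_{i-1}
# and conditional variance v_i = 1 + 0.2*x_{i-1}**2.  The EF
# g-bar = sum (x_i - mu_i)*(dmu_i/dtheta)/v_i is linear in theta, so g-bar = 0
# can be solved in closed form.
import numpy as np

rng = np.random.default_rng(3)
theta_true, n = 0.6, 5000
x = np.zeros(n + 1)
for i in range(1, n + 1):
    v = 1.0 + 0.2 * x[i - 1] ** 2                # conditional variance given the past
    x[i] = theta_true * x[i - 1] + rng.normal(0.0, np.sqrt(v))

x_prev, x_curr = x[:-1], x[1:]
w = x_prev / (1.0 + 0.2 * x_prev ** 2)           # (dmu_i/dtheta)/v_i
theta_hat = np.sum(w * x_curr) / np.sum(w * x_prev)   # root of g-bar(theta) = 0

print("true theta:", theta_true, " quasi-SF estimate:", theta_hat)
```
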
Returning, for simplicity, to the case where the variates xᵢ, i = 1, ..., n,
are independently distributed, we summarize important properties of the EF
ḡ given by (4.2). The 'optimality' of ḡ is for the semi-parametric model
defined by the means μᵢ(θ) and variances vᵢ(θ). As a special case, when μᵢ
is linear in θ and vᵢ is independent of θ, the EF optimality of ḡ implies
the GM optimality of the LS estimates. In this special case, if the underlying
distribution is normal, the LS estimates coincide with the ML estimates.
Generally, for the exponential family distributions the SF coincides with
the optimal EF ḡ given by (4.2). Even outside the exponential family of
distributions, ḡ satisfies a very general property of the SF, E(SF²) =
−E(∂SF/∂θ); similarly we have E(ḡ²) = −E(∂ḡ/∂θ). Further, if SF denotes a
generic 'score function' for the class of distributions consistent with the
semi-parametric model mentioned above, then E(ḡ − SF)² ≤ E(g − SF)² (the
expectation being taken w.r.t. the distribution that corresponds to the SF)
for all the EFs g given by (4.1). These are the properties which justify the
previously introduced term 'quasi-SF' for ḡ. (Even before the EF optimality
of ḡ was discovered, the term quasi-likelihood was commonly used in the
literature on generalized linear models; McCullagh and Nelder 1983, 1989.)
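
The score-like identity E(ḡ²) = −E(∂ḡ/∂θ) can be checked numerically; the sketch below does so under an assumed non-exponential-family model (lognormal observations with mean μᵢ = exp(θzᵢ) and variance vᵢ = μᵢ²), an illustration of mine rather than an example from the text.

```python
# A Monte Carlo check of E(g-bar^2) = -E(d g-bar/d theta) for the quasi-SF
# (4.2), using an assumed non-exponential-family model: lognormal x_i with
# mean mu_i = exp(theta*z_i) and variance v_i = mu_i**2 (constant coefficient
# of variation).  Only the first two moments enter the quasi-score.
import numpy as np

rng = np.random.default_rng(4)
theta, n, reps = 0.3, 20, 50000
z = np.linspace(0.2, 1.5, n)
mu = np.exp(theta * z)
s2 = np.log(2.0)                                 # lognormal log-variance giving v_i = mu_i**2
x = rng.lognormal(mean=np.log(mu) - s2 / 2, sigma=np.sqrt(s2), size=(reps, n))

g = np.sum((x - mu) * z / mu, axis=1)            # quasi-SF: (dmu_i/dtheta)/v_i = z_i/mu_i
dg = np.sum(-z**2 * x / mu, axis=1)              # d g-bar / d theta

print("E(g-bar^2)          :", np.mean(g**2))    # both approximately sum(z_i^2)
print("-E(d g-bar/d theta) :", -np.mean(dg))
print("sum(z_i^2)          :", np.sum(z**2))
```
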
As we have seen previously, the EF theory corrected a major deficiency in
ML estimation relating to nuisance parameters. The above discussion points
to yet another accomplishment of the EF theory. It brought about, via the
quasi-score function ḡ, a kind of synthesis of two historically distinct
methods of estimation: LS for semi-parametric models and ML for parametric
models. The same criterion of EF optimality, namely (2.1), is satisfied in
the case of the latter by the SF and in the case of the former by the
quasi-SF ḡ in (4.2). Only the classes of competing EFs are different; they
are taken appropriate to the model (see Godambe and Thompson, 1989, Appendix).
Of course, the foregoing discussion also shows that the quasi-SF ḡ provides
not only a unification (of the two methods LS and ML) but much more: it
provides a generalization to deal with problems outside the scope of both
the LS and ML methods.
As a further contribution of the EF theory to statistics, below we briefly
outline a very 'flexible conditioning' that the theory permits and the conse-
quent incorporation of the Bayesian factor within its methodology.

5 A Generalization
It was mentioned earlier that, within the framework of martingales and the
corresponding filtering, the EF theory suggested the use of weighted conditional
least squares estimation, on grounds of its optimality property. But to deal
with general spatial processes one needs more flexible conditioning than used
before; this was provided by Godambe and Thompson (1989): Let, as before,
X = {x} be an abstract sample space and F = {F} be a class of distributions
on X. Further let θ be a real parameter defined on F: {θ(F), F ∈ F} = Ω.
Now suppose hⱼ is a real function on X × Ω and Xⱼ a specified partition (or
a σ-field generated by a partition) of X such that

    E(hⱼ | Xⱼ) = 0,   j = 1, ..., k.   (5.1)



The functions hⱼ, j = 1, ..., k, are called the elementary EFs; they are not
exhaustive. Their choice is determined by the problem at hand. Now suppose
the elementary EFs h₁, ..., hₖ are mutually orthogonal (Def. Godambe and
Thompson 1989) and the class of underlying distributions F satisfies certain
conditions. Then in the class of all EFs g of the form

    g = Σⱼ₌₁ᵏ qⱼ hⱼ,   (5.2)

where the qⱼ are some real functions on X × Ω which are measurable on Xⱼ,
j = 1, ..., k, the 'optimal' one is given by

    g* = Σⱼ₌₁ᵏ q*ⱼ hⱼ,   (5.3)

where q*ⱼ = {E(∂hⱼ/∂θ | Xⱼ)}/{E(hⱼ² | Xⱼ)}. Here the criterion of optimality,
as always, is unconditional, given by (2.1) for a real parameter θ (or its
appropriate version if θ is a vector); the expectation is taken with respect
to F ∈ F.
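
A minimal numerical sketch of (5.2) and (5.3), in the simplest situation where each conditioning partition Xⱼ is trivial (an illustrative setup of mine, not taken from the paper): two groups measure the same mean θ with different variances, hⱼ is the sum of residuals in group j, and the optimal weights q*ⱼ reduce to inverse-variance weights.

```python
# A minimal sketch of combining mutually orthogonal elementary EFs as in
# (5.2)-(5.3), in the simplest case where each partition X_j is trivial
# (an illustrative assumption).  Two groups measure the same mean theta with
# different variances; h_j = sum over group j of (x - theta) and
# q_j* = E(dh_j/dtheta)/E(h_j**2) = -1/sigma_j**2, i.e. inverse-variance
# weights (the common sign cancels in the root of the estimating equation).
import numpy as np

rng = np.random.default_rng(5)
theta_true, reps = 1.0, 20000
n1, n2, s1, s2 = 100, 100, 1.0, 3.0

x1 = rng.normal(theta_true, s1, size=(reps, n1))
x2 = rng.normal(theta_true, s2, size=(reps, n2))

# Roots of q1*h1 + q2*h2 = 0 for optimal weights and for equal weights.
w1, w2 = 1 / s1**2, 1 / s2**2
opt = (w1 * x1.sum(axis=1) + w2 * x2.sum(axis=1)) / (w1 * n1 + w2 * n2)
eq = (x1.sum(axis=1) + x2.sum(axis=1)) / (n1 + n2)

print("variance of optimally weighted estimate:", opt.var())
print("variance of equally weighted estimate  :", eq.var())
```
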
Up to the above results the EF theory was 'restricted' to the classical
setup, where distributions on the sample space X for some fixed values of the
parameters are considered. But the formalism of the EF optimality criterion
is flexible enough, and the just mentioned 'restriction' can be set aside if we
know something about the prior distribution of θ, for instance its mean (θ₀)
and variance (v₀). Under such a Bayesian setup the only changes required are
as follows: (i) In (5.1) Xⱼ is now not necessarily a partition of just the
sample space X, but it can be a partition of X × Ω, Ω as before being the
parameter space. (ii) Some elementary EFs hⱼ, j = 1, ..., k, can now be
functions exclusively of the parameter θ. (iii) All expectations in the
optimality criterion (2.1) are now with respect to the joint distribution of
(x, θ) (and not, as before, with respect to distributions of x given θ).
Following is an illustration.
Let the partitions of the sample space Xⱼ and the elementary estimating
functions hⱼ, j = 1, ..., k, be the same as in (5.1). Further, as suggested
before, let the mean value (θ₀) and the variance (v₀) of the prior distribution
of θ be known. Now, to the set of elementary estimating functions h₁, ..., hₖ
we add one more, namely hₖ₊₁ = θ − θ₀. In this case the optimal EF is
given by g* + (θ − θ₀)/v₀, where g* is the same as in (5.3). Similarly, the
quasi-SF ḡ in (4.2), which was obtained under the assumption 'θ is fixed',
will now have to be replaced by ḡ − (θ − θ₀)/v₀ (Godambe, 1994).
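
The following Python sketch illustrates this modification in the simplest assumed setting (my own example): for xᵢ ~ N(θ, σ²) the quasi-SF is ḡ(θ) = Σ(xᵢ − θ)/σ², and the root of ḡ − (θ − θ₀)/v₀ = 0 shrinks the sample mean toward the prior mean θ₀, improving the mean squared error when θ really is drawn from the assumed prior.

```python
# A minimal sketch of the Bayesian modification of the quasi-SF (an
# illustrative example): for x_i ~ N(theta, sigma2) the quasi-SF is
# g-bar = sum (x_i - theta)/sigma2, and solving g-bar - (theta - theta0)/v0 = 0
# shrinks the sample mean toward the prior mean theta0 with prior variance v0.
import numpy as np

rng = np.random.default_rng(6)
sigma2, theta0, v0, n, reps = 4.0, 0.0, 1.0, 10, 20000

theta = rng.normal(theta0, np.sqrt(v0), size=reps)        # theta drawn from the prior
x = rng.normal(theta[:, None], np.sqrt(sigma2), size=(reps, n))

xbar = x.mean(axis=1)
classical = xbar                                          # root of g-bar = 0
bayes = (n * xbar / sigma2 + theta0 / v0) / (n / sigma2 + 1 / v0)  # root of modified EF

print("MSE, root of g-bar = 0                      :", np.mean((classical - theta) ** 2))
print("MSE, root of g-bar - (theta - theta0)/v0 = 0:", np.mean((bayes - theta) ** 2))
```
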
The 'optimality' of the EF given by the derivative of the logarithm of the
posterior density was established in a 'parametric setup' by Ferreira (1982)
and Ghosh (1993). Naik-Nimbalkar and Rajarshi (1995) have established
some optimality results in 'semi-parametric Bayesian setup'.

6 Other Topics

Now, following are a few remarks (possibly only tangential) about the
likelihoods: empirical, partial, profile, quasi and the like. Basically, when
the likelihood function is precisely known, with no nuisance parameters, the
likelihood ratio test is 'optimal' in the conventional sense of the term. Also
the SF satisfies the EF criterion of 'optimality'. Now the various likelihoods
just mentioned, empirical, partial, quasi, try to 'approximate' the underlying
(true, precise) likelihood in situations of nuisance parameters and/or of
semi-parametric models. In similar situations the EF theory tries to
'approximate' the (true) underlying SF. However, unlike the former, the latter
'approximation' can be assessed with a plausible finite sample criterion.
Suppose g(x, θ) is a real function of the sample x and the parameter of
interest θ such that the expectation E(g) = 0 for all possible underlying
distributions F, i.e. for F ∈ F. Let further SF be a score function
corresponding to F in F. Then the finite sample criterion for assessing the
approximation g for SF is given by E(g − SF)², for all F ∈ F. This criterion,
as said before, leads to the 'optimality' criterion (2.1) of the EF theory.
As I have previously shown, optimal or approximately optimum EFs are found
in many practical problems and in fact by now they are in common use. Now,
while optimum EFs and approximations thereof can provide a handy instrument
for constructing confidence intervals and related tests, cf. Rao's test
(Rao 1947, Basawa 1991), for some other problems some kind of 'approximate
likelihood' would be more handy. I think, to be safer, construction of such
approximate likelihoods should be tied to the optimum EFs, whenever possible.
It is good to note already a strong trend in that direction (Qin and Lawless,
1994).
An often asked question (cf. Liang and Zeger 1995) is how the EF optimality
relates to the properties of the corresponding estimate. How good is the
estimate? Usually the answer is given in terms of the 'error' of the estimate.
Now this 'error' is somewhat of an involved concept. Certainly, error is not
just a square root of an arbitrary (unbiased or nearly so) estimate of
variance. However, for a parametric model the concept is clear. The error is
derived from the conditional (or the natural estimate of) variance of the SF.
Thus error is the inverse of the square root of the observed Fisher information
(Efron and Hinkley, 1978). This methodology is formalized and extended by the
EF theory. Consider the confidence intervals θ̂ ± const.(error), where the
estimate θ̂ is obtained from the unbiased estimating equation g(θ) = 0. Here a
more direct way of obtaining confidence intervals is by inverting the
distribution of the standardized version (cf. Godambe 1991, eq. 40) of the EF
g around θ. These intervals, compared to the former ones, are easier to
compute. Also, if g is the optimal EF, the corresponding intervals are the
shortest compared to those of any other unbiased EF (Godambe and Heyde 1987).

The standardizing factor of the EF g directly leads to the computation of
the 'error' for the estimate θ̂ (Godambe, 1995).
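
To illustrate interval estimation by inverting the standardized EF, the following Python sketch assumes the simplest case (my own example, not one discussed in the text) of xᵢ ~ Poisson(θ) with the unbiased EF g(θ) = Σxᵢ − nθ; inverting |g(θ)|/√(nθ) ≤ z gives a quadratic in θ, and the result is compared with the usual Wald interval θ̂ ± z√(θ̂/n).

```python
# A minimal sketch of interval estimation by inverting the standardized EF
# (an illustrative example): for x_i ~ Poisson(theta) the unbiased EF
# g(theta) = sum(x_i) - n*theta has variance n*theta, and the interval is the
# set of theta with |g(theta)|/sqrt(n*theta) <= z.
import numpy as np

rng = np.random.default_rng(7)
n, theta_true, z = 15, 0.8, 1.96
x = rng.poisson(theta_true, size=n)
theta_hat = x.mean()                             # root of g(theta) = 0

# |g|/sqrt(n*theta) <= z is the quadratic (theta - theta_hat)**2 <= z**2*theta/n.
b, c = -(2 * theta_hat + z**2 / n), theta_hat**2
lo = (-b - np.sqrt(b**2 - 4 * c)) / 2
hi = (-b + np.sqrt(b**2 - 4 * c)) / 2

wald = (theta_hat - z * np.sqrt(theta_hat / n), theta_hat + z * np.sqrt(theta_hat / n))
print("EF-inversion interval:", (lo, hi))
print("Wald interval        :", wald)
```
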
For important previous review articles on the subject we refer to Heyde
(1989) and Godambe and Kale (1991). The present review highlights some
more recent developments and presents older results with different emphasis
and interpretations. A further reference along this line is Desmond (1997).

References

Barnard, G.A. (1963). Some logical aspects of the fiducial argument. J.R.
Statist. Soc. B, 25, 111-114.
Barndorff-Nielsen, O.E. (1973). On M-ancillarity. Biometrika 60, 447-455.
Bartlett, M.S. (1936). The information available in small samples. Proc.
Camb. Phil. Soc., 34, 33-40.
Basawa, I.V. (1991). Generalized score tests for composite hypotheses. Es-
timating Functions (ed. V.P. Godambe), Oxford Univ. Press, Oxford,
121-131.
Bhapkar, V.P. (1991). Sufficiency, ancillarity and information in estimating
functions. Estimating Functions. (Ed. V.P. Godambe), Oxford Univ.
Press, Oxford. 240-254.
Cox, D.R. (1958). Some problems connected with statistical inference. Ann.
Math. Statist. 29, 357-372.
Desmond, A.F. (1997). Optimal estimating functions, quasi-likelihood and
statistical modelling (with discussion). J. Stat. Plan. Inf. 60, 77-121.
Durbin, J. (1960). Estimation of parameters in time series regression models.
J. Roy. Statist. Soc. B, 22, 139-153.
Ferreira, P.E. (1982). Multiparametric estimating equations. Ann. Stat.
Math. 34, 423-431.
Fisher, R.A. (1925). Theory of statistical estimation. Proc. Cambridge Phil.
Soc. 22, 700-706.
Ghosh, M. (1990). On a Bayesian analog of the theory of estimating func-
tions. C.G. Khatri Memorial Volume of Gujarat Statistical Review, 17A,
47-52.
Godambe, V.P. (1960). An optimum property of regular maximum likeli-
hood estimation. Ann. Math. Statist. 31, 1208-1212.
Godambe, V.P. (1976). Conditional likelihood and unconditional optimum
estimating equations. Biometrika, 63, 277-284.
Godambe, V.P. (1985). The foundations of finite sample estimation in
stochastic processes. Biometrika 72, 419-428.

Godambe, V.P. (1991). Orthogonality of estimating functions and nuisance


parameters. Biometrika 78, 143-151.
Godambe, V.P. (1994). Linear Bayes and optimal estimation. Tech. Report
STAT-94-11, University of Waterloo.
Godambe, V.P. (1995). Discussion of the paper, 'Inference Based on esti-
mating functions in the presence of nuisance parameters' by Liang, K.Y.
and Zeger, S.L. Statistical Science 10, 173-174.
Godambe, V.P. and Heyde, C.C. (1987). Quasi-likelihood and optimal esti-
mation. Int. Stat. Rev. 55, 231-244.
Godambe, V.P. and Kale, B.K. (1991). Estimating functions: an overview.
Estimating Functions. (Ed. V.P. Godambe), Oxford University Press,
Oxford. 1-20.
Godambe, V.P. and Thompson, M.E. (1974). Estimating equations in pres-
ence of nuisance parameters. Ann. Stat. 2, 568-571.
Godambe, V.P. and Thompson, M.E. (1989). An extension of quasi-likelihood
estimation (With Discussion). J. Stat. Plan. Inf. 22, 137-172.
Heyde, C.C. (1989). Quasi-likelihood and optimality of estimating functions:
some current unifying themes. Bull. Int. Stat. Inst. Book 1, 19-29.
Kalbfleisch, J.D. and Sprott, D.A. (1970). Applications of likelihood meth-
ods to models involving large number of parameters. (With Discussion).
J.R. Statist. Soc. B, 32, 175-208.
Kalbfleisch, J.D., Lawless, J.F. and Vollmer, W.M. (1983). Estimation in
Markov models from aggregate data. Biometrics 39, 907-919.
Klimko, L.A. and Nelson, P.I. (1978). On conditional least squares estima-
tion for stochastic processes. Ann. Statist. 6, 629-642.
Legendre, A.M. (1805). Nouvelles methodes pour la determination des or-
bites des cometes. Paris: Courcier.
Liang, K.Y. and Zeger, S.L. (1995). Inference based on estimating functions
in the presence of nuisance parameters. (With Discussion). Statistical
Science 10, 158-195.
Lindsay, B. (1982). Conditional score functions: some optimality results.
Biometrika 69, 503-512.
Lindsay, B. and Waterman, R.P. (1991). Extending Godambe's method in
nuisance parameter problems. Proceedings of a Symposium in honour of
Prof. V.P. Godambe. University of Waterloo, 1-43.
Lindsay, B.G. and Li, B. (1995). Discussion of the paper, 'Inference based
on estimating functions in the presence of nuisance parameters' by Liang,
K.Y. and Zeger, S.L. Statistical Science 10, 175-177.
Lloyd, C.J. (1987). Optimality of marginal likelihood estimating equations.
Comm. Stat. Theory and Meth. 16, 1733-1741.
McCullagh, P. and Nelder, J.A. (1983, 1989). Generalized linear models (1st
and 2nd editions). Chapman and Hall, London.
Naik-Nimbalkar, U.V. and Rajarshi, M.B. (1995). Filtering and smoothing
via estimating functions. J. Amer. Statist. Asso. 90, 301-306.

Neyman, J. and Scott, E.L. (1948). Consistent estimates based on partially


consistent observations. Econometrica, 16, 1-32.
Qin, J. and Lawless, J.F. (1994). Empirical likelihood and general estimating
equations. Annals of Statistics 22, 300-325.
Rao, C.R. (1947). Large sample tests for statistical hypotheses concerning
several parameters with applications to problems of estimation. Proc.
Camb. Phil. Soc. 44, 50-57.
Small, C. and McLeish, D.L. (1988). The theory and applications of sta-
tistical inference functions. Lecture Notes in Statistics No. 44, Springer-
Verlag, Heidelberg, New York, London.
Small, C. and McLeish, D.L. (1994). Hilbert space methods in probability
and statistical inference. John Wiley and Sons, Inc. New York.
