Barndorff-Nielsen_1987
O. E. Barndorff-Nielsen
1. Introduction 97
2. Review and Preliminaries 99
3. Transformation Models 118
4. Transformation Submodels 127
5. Maximum Estimation and Transformation Models 130
6. Observed Geometries 135
7. Expansion of c|ĵ|^{1/2}L̄ 147
8. Exponential Transformation Models 152
9. Appendix 1 154
10. Appendix 2 156
11. Appendix 3 157
12. References 159
1. INTRODUCTION
parameter of the model, is minimal sufficient. The observed geometries and the closely related expansion of c|ĵ|^{1/2}L̄ form a parallel to the "expected geometries" and the associated conditional Edgeworth expansions for curved exponential families studied primarily by Amari (cf., in particular, Amari 1985, 1986), but with some essential differences. In particular, the developments in sections 6 and 7 are, in a sense, closer to the actual data and they do not require integrations over the sample space; instead they employ "mixed derivatives of the log model function." Furthermore, whereas the studies of expected geometries have been largely concerned with curved exponential families, the approach taken here makes it equally natural to consider other parametric models, and in particular transformation models. The viewpoint of conditional inference has been instrumental for the constructions in question. However, the observed geometrical calculus, as discussed in section 6, does not require the employment of exact or approximate ancillaries.
The observed geometries provide examples of the concept of statistical manifolds discussed by Lauritzen (1986).
Throughout the paper examples are given to illustrate the general results.
2. REVIEW AND PRELIMINARIES
and hence
J_{f∘g}(y) = J_f(g(y)) J_g(y).    (2.2)
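As a quick numerical check of (2.2) (my own sketch; the maps f and g below are hypothetical examples, not from the text):

```python
import numpy as np

# Numerical check of the Jacobian chain rule J_{f∘g}(y) = J_f(g(y)) J_g(y)
# for two illustrative smooth maps on R^2.

def g(y):
    return np.array([y[0] + y[1] ** 2, np.sin(y[1])])

def f(x):
    return np.array([np.exp(x[0]), x[0] * x[1]])

def jacobian_det(func, y, eps=1e-6):
    """Determinant of the numerical Jacobian of func at y (central differences)."""
    n = len(y)
    J = np.empty((n, n))
    for j in range(n):
        e = np.zeros(n); e[j] = eps
        J[:, j] = (func(y + e) - func(y - e)) / (2 * eps)
    return np.linalg.det(J)

y = np.array([0.3, -0.7])
lhs = jacobian_det(lambda t: f(g(t)), y)
rhs = jacobian_det(f, g(y)) * jacobian_det(g, y)
print(abs(lhs - rhs) < 1e-5)
```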
The inverse matrices of j and i are referred to as observed and expected formation, respectively.
Suppose the minimal sufficient statistic t for M is of dimension k. We then speak of M as a (k,d)-model (d being the dimension of the parameter ω). Let (ω̂,a) be a one-to-one transformation of t, where ω̂ is the maximum likelihood estimator of ω and a, of dimension k−d, is an auxiliary statistic.
In most applications it will be essential to choose a so as to be distribution constant either exactly or to the relevant asymptotic order. Then a is ancillary and, according to the conditionality principle, the conditional model for ω̂ given a is considered the appropriate basis for inference on ω. However, unless explicitly stated, distribution constancy of a is not assumed in the following.
There will be no loss of generality in viewing the log likelihood l = l(ω), in its dependence on the observation x, as being a function of the minimal sufficient (ω̂,a) only. Henceforth we shall think of l in this manner and we will indicate this by writing
l = l(ω;ω̂,a).
Differential and Integral Geometry in Statistical Inference 101
j = j(ω;ω̂,a)
etc. It turns out to be of interest to consider the function
l̆(ω) = l̆(ω;a) = l(ω;ω,a),    (2.5)
obtained from l(ω;ω̂,a) by substituting ω for ω̂. Similarly we write
j̆(ω) = j̆(ω;a) = j(ω;ω,a).    (2.6)
For a general parametric model p(x;ω) and for a general auxiliary a, a conditional probability function p*(ω̂;ω|a) for ω̂ given a may be defined by
p*(ω̂;ω|a) = c|ĵ|^{1/2}L̄,    (2.7)
where c = c(ω,a) is a norming constant and
L̄ = L(ω)/L(ω̂) = p(x;ω)/p(x;ω̂).
In wide generality
p(ω̂;ω|a) = p*(ω̂;ω|a){1 + O(n^{−3/2})}    (2.8)
uniformly in ω for √n(ω̂−ω) bounded, where n denotes sample size. This holds, in particular, for (k,d) exponential models. For more details and further
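For the normal location model the formula (2.7) is not merely approximate; the following minimal Python sketch (my own illustration, taking the norming constant c = (2π)^{−1/2}) shows that c|ĵ|^{1/2}L̄ reproduces the exact N(ω, 1/n) density of ω̂:

```python
import math

# p* = c |jhat|^{1/2} Lbar for the N(omega, 1) location model (sketch).
# Here jhat = n (observed information at the MLE omegahat = xbar),
# Lbar = L(omega)/L(omegahat) = exp(-n (omegahat - omega)^2 / 2),
# and c = (2*pi)^{-1/2}.

def p_star(omegahat, omega, n):
    jhat = n
    lbar = math.exp(-0.5 * n * (omegahat - omega) ** 2)
    c = (2 * math.pi) ** -0.5
    return c * math.sqrt(jhat) * lbar

def exact_density(omegahat, omega, n):
    var = 1.0 / n
    return math.exp(-0.5 * (omegahat - omega) ** 2 / var) / math.sqrt(2 * math.pi * var)

n, omega = 10, 1.3
for omegahat in [0.9, 1.3, 1.8]:
    assert abs(p_star(omegahat, omega, n) - exact_density(omegahat, omega, n)) < 1e-12
print("p* exact for the normal location model")
```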
In the one-dimensional case the expansion of (2.7) takes the form
c|ĵ|^{1/2}L̄ = φ(ω̂−ω; ĵ){1 + C₁ + A₁ + A₂}{1 + O(n^{−3/2})}.    (2.9)
Here φ(w;γ) denotes the probability density function of the normal distribution with mean 0 and variance γ^{−1}. Furthermore, C₁, A₁ and A₂ are given by
C₁ = (1/24){−3U₄ + 12U_{3,1} − 5U₃² + 24U_{2,1}U₃ − 24U_{2,1}² − 12U_{2,2}}    (2.10)
and
A₁ = P₁(u)U_{2,1} + P₂(u)U₃,
A₂ = P₃(u)U₄ + P₄(u)U_{3,1} + P₅(u)U₃² + P₆(u)U_{2,1}U₃ + P₇(u)U_{2,1}² + P₈(u)U_{2,2},
where P_i(u), i = 1,...,8, are polynomials in u = ĵ^{1/2}(ω̂−ω), the explicit forms of which are given in Barndorff-Nielsen (1985), and where U_v = U_{v,0} and U_{v,s} are defined as
U_{v,s} = ∂^s{l^{(v)}(ω̂;ω̂,a)}/ĵ^{(v+s)/2},    v = 1,2,3,...,  s = 0,1,2,...,
l^{(v)} denoting the v-th order derivative of l = l(ω;ω̂,a) with respect to ω and ∂^s indicating differentiation s times with respect to ω̂. Note that, in the repeated sampling situation, U_{v,s} is of order O(n^{1−(v+s)/2}). Hence the quantities C₁, A₁ and A₂ are of order O(n^{−1}), O(n^{−1/2}) and O(n^{−1}), respectively.
Integration of (2.7) yields an approximation to the conditional distribution of the likelihood ratio statistic
w = 2{l(ω̂) − l(ω₀)}    (2.11)
as an approximation to p(w;ω₀|a).
For a one-dimensional parameter we define
r_{[1]}(ω) = l^{(1)}(ω)/i(ω)^{1/2},
r_{[v]} = {d r_{[v−1]}(ω)/dω}/i(ω)^{1/2},    v = 2,3,...,
etc. Furthermore, we write l(ψ) for the log likelihood under the parametrization by ψ, though formally this is in conflict with the notation l(ω), and correspondingly we let l_p = ∂_p l = ∂l(ψ)/∂ψ^p, etc.; similarly for other parameter dependent quantities. Finally, the symbol ^ over such a quantity indicates that the maximum likelihood estimate has been substituted for the parameter.
Using this notation and adopting the summation convention that if a suffix occurs repeatedly in a single expression then summation over that suffix is understood, we have
l_p = l_r ω^r_{/p},
l_{pσ} = l_{rs} ω^r_{/p} ω^s_{/σ} + l_r ω^r_{/pσ},    (2.13)
l_{pστ} = l_{rst} ω^r_{/p} ω^s_{/σ} ω^t_{/τ} + l_{rs} ω^r_{/pσ} ω^s_{/τ}[3] + l_r ω^r_{/pστ},
etc., where [3] signifies a sum of three similar terms determined by permutation of the indices p,σ,τ. On substituting ω̂ for ω in (2.13) we obtain the well-known relation
ĵ_{pσ} = ĵ_{rs} ω̂^r_{/p} ω̂^s_{/σ},    (2.14)
and substituting ω for ω̂ instead yields
j̆_{pσ}(ψ;a) = j̆_{rs}(ω;a) (∂ω^r/∂ψ^p)(∂ω^s/∂ψ^σ).    (2.15)
Equation (2.15) shows that j̆ is a metric tensor on M, for any given value of the auxiliary statistic a. Moreover, in wide generality j̆ will be positive definite on M, and we assume henceforth that this is the case. In fact, for any ω∈Ω we have j̆ = ĵ, i.e. observed information at the maximum likelihood point, which is generally positive definite (though counterexamples do exist).
Let A(ω) = [A^{r₁...r_p}_{s₁...s_q}(ω)] be an array, depending on ω and where each of the p + q indices runs from 1 to d. Then A is said to be a (p,q) tensor, or a tensor of contravariant rank p and covariant rank q, if under a reparametrization ω → ψ it transforms according to
A^{p₁...p_p}_{σ₁...σ_q}(ψ) = A^{r₁...r_p}_{s₁...s_q}(ω) ψ^{p₁}_{/r₁}···ψ^{p_p}_{/r_p} ω^{s₁}_{/σ₁}···ω^{s_q}_{/σ_q}.
Two tensors A and B may be multiplied together index-wise. This product is again a tensor, of rank (p′ + p″, q′ + q″) if (p′,q′) and (p″,q″) are the ranks of A and B.
Lower rank tensors may be derived from higher rank tensors by contraction, i.e. by pairwise identification of upper and lower indices (which implies a summation).
The parameter space as a manifold. The parameter space Ω may be viewed as a (pseudo-)Riemannian manifold with (pseudo-)metric determined by a metric tensor φ, i.e. φ is a rank 2 covariant, regular and symmetric tensor. The associated Riemannian connection ∇⁰ is determined by the Christoffel symbols
Γ^t_{rs} = φ^{tu} Γ_{rsu},
where
Γ_{rst} = ½(∂_r φ_{st} + ∂_s φ_{rt} − ∂_t φ_{rs})    (2.16)
and φ^{tu} denotes the inverse matrix of φ_{tu},
and the transformation law
Γ^p_{στ}(ψ) = {Γ^r_{st}(ω) ω^s_{/σ} ω^t_{/τ} + ω^r_{/στ}} ψ^p_{/r}.    (2.18)
On the other hand, any set of functions [Γ^t_{rs}] which satisfy the law (2.18) constitute the connection symbols of an affine connection on Ω. It follows that all affine connections on Ω are of the form
Γ̃^t_{rs} = Γ^t_{rs} + S^t_{rs},    (2.19)
where the S^t_{rs} are characterized by the transformation law
S^p_{στ}(ψ) = S^r_{st}(ω) ω^s_{/σ} ω^t_{/τ} ψ^p_{/r}.    (2.20)
Writing
Γ̃_{rst} = Γ̃^u_{rs} φ_{tu}  and  S_{rst} = S^u_{rs} φ_{tu},    (2.21)
we have
Γ̃_{rst} = Γ_{rst} + S_{rst}    (2.22)
and
S_{pστ} = S_{rst} ω^r_{/p} ω^s_{/σ} ω^t_{/τ}.    (2.23)
Thus, in particular, [S_{rst}] is a tensor.
Suppose ψ: B → Ω is a mapping of full rank from an open subset B of a Euclidean space of dimension d₀ < d into Ω. Then ψ is said to be an immersion of B in Ω. We denote coordinates of β by β^a, β^b, etc. If φ is a metric tensor on Ω then the metric tensor on B induced from Ω by ψ is defined by
φ_{ab}(β) = φ_{rs}(ω) ω^r_{/a} ω^s_{/b},    (2.24)
and the connection on B induced from a connection Γ on Ω by ψ is
Γ_{abc}(β) = Γ_{rst}(ω) ω^r_{/a} ω^s_{/b} ω^t_{/c} + φ_{tu} ω^t_{/ab} ω^u_{/c}.    (2.25)
(On the left hand side the tensor is expressed in β coordinates, on the right hand side in ω coordinates.) Similarly, a connection Γ is said to be invariant under the action of a group G on Ω if
Γ^t_{rs}(ψ) = Γ^t_{rs}(gω),  g∈G,    (2.28)
where on the left hand side ψ is interpreted as the reparametrization ψ = gω.
If φ and S are both G-invariant then
Γ̃^t_{rs} = Γ^t_{rs} + φ^{tu} S_{rsu}
is a G-invariant connection.
Now, let φ be the information tensor i on Ω. Then (2.16) takes the form
Γ⁰_{rst} = E{l_{rs} l_t} + ½E{l_r l_s l_t}.
Obviously,
T_{rst} = E{l_r l_s l_t}    (2.29)
satisfies (2.23) and hence, for any real α, an affine connection is defined by
Γ^α_{rst} = E{l_{rs} l_t} + {(1−α)/2} E{l_r l_s l_t}.    (2.30)
These are the α-connections introduced and studied by Chentsov (1972) and Amari (1985). For a submodel, parametrized by β and viewed as an immersed submanifold Ω₀ of Ω, the information tensor satisfies
i_{ab}(β) = i_{rs}(ω) ω^r_{/a} ω^s_{/b}.    (2.31)
Thus i(β) equals the Riemannian metric induced from the metric i(ω) on Ω to the imbedded submanifold Ω₀. Furthermore, the α-connection of the submodel equals the connection on Ω₀ induced from the α-connection on Ω by the general construction (2.25).
The measures on Ω defined by
|i(ω)|^{1/2} dω    (2.32)
and
|j̆(ω;a)|^{1/2} dω    (2.33)
are both geometric measures, relative to the expected and observed information metric, respectively. Note that (2.33) depends on the value of the auxiliary statistic a. We shall speak of (2.32) and (2.33) as expected and observed information measure, respectively. It is an important property of these measures that they are parametrization invariant. This property follows from the fact that i and j̆ are covariant tensors of rank 2. As a consequence we have that c|ĵ|^{1/2}L̄ (of (2.7)) is parametrization invariant.
Invariant measures. A measure μ on X is said to be invariant with respect to a group G acting on X if gμ = μ for all g∈G. For any g∈G we write
d(g^{−1}μ)(x) = χ(g,x) dμ(x)
for the Radon-Nikodym derivative of the transformed measure, when it exists. When X is an open subset of a Euclidean space, an invariant measure may be constructed as
dμ(x) = J_{γ(z)}(u)^{−1} dx.    (2.34)
Here J_{γ(g)} denotes the Jacobian determinant of the mapping γ(g) of X onto itself determined by g∈G, and (z,u) constitutes an orbital decomposition of x, i.e. (z,u) is a one-to-one transformation of x such that u∈X and u is maximal invariant while z∈G and x = zu. For a more detailed discussion see section 3 and appendix 1.
Transformation models. Let G be a group acting on the sample space X. If the class P of probability measures given by the statistical model is invariant under the induced action of G on the set of all probability measures on X then the model is called a composite transformation model, and if P
Here k is the order of the model (2.35) and is equal to the common dimension of the vectors θ(ω) and t(x), while d denotes the dimension of the parameter ω. The full exponential model generated by (2.35) has model function
p(x;θ) = exp{θ·t(x) − κ(θ) − h(x)}    (2.36)
and κ(θ) is the cumulant transform of the canonical statistic t = t(x). From the viewpoint of inference on ω there is no restriction in assuming x = t, since t is minimal sufficient, and we shall often do so. We set τ = τ(θ) = E_θ t,
i.e. τ is the mean value parameter of (2.36), and we write T for τ(int Θ), where Θ denotes the canonical parameter domain of the full model (2.36).
Let f be a real differentiable function defined on an open subset of R^k. The Legendre transform f^T of f is defined by
f^T(y) = x·y − f(x),
where
y = (Df)(x) = (∂f/∂x)(x).
The Legendre transform is a useful tool in studying various, dualistic aspects
of exponential models (cf. Barndorff-Nielsen (1978a), Barndorff-Nielsen and
Blaesild (1983a)).
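As a small worked example (my own sketch, not from the text): for the Poisson cumulant transform f(θ) = e^θ we have y = f′(θ) = e^θ, so θ = log y and f^T(y) = y log y − y. The closed form can be checked by direct numerical maximization of x·y − f(x):

```python
import math

# Legendre transform f^T(y) = x*y - f(x) at the x solving y = f'(x),
# illustrated for f(theta) = exp(theta); then f^T(y) = y*log(y) - y.

def legendre_numeric(f, y, lo=-10.0, hi=10.0, iters=200):
    """Maximize x*y - f(x) by ternary search (f convex => objective concave)."""
    for _ in range(iters):
        m1 = lo + (hi - lo) / 3
        m2 = hi - (hi - lo) / 3
        if m1 * y - f(m1) < m2 * y - f(m2):
            lo = m1
        else:
            hi = m2
    x = (lo + hi) / 2
    return x * y - f(x)

for y in [0.5, 1.0, 3.0]:
    assert abs(legendre_numeric(math.exp, y) - (y * math.log(y) - y)) < 1e-6
print("Legendre transform of exp(theta) is y log y - y")
```

The sup-log-likelihood l̂ of (2.36) arises in exactly this way, as the Legendre transform of κ.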
In particular, we may use the Legendre transform to define the dual likelihood function ⁻¹l of (2.35) by
⁻¹l(ω) = θ̂·τ(ω) − l̂(τ(ω)).    (2.37)
Here, and elsewhere, ^ as top index indicates maximum likelihood estimation under the full model. Further, in this connection we take l̂ as the sup-log-likelihood function of (2.36) and then l̂ is, in fact, the Legendre transform of κ. Note that for τ = τ(θ) ∈ T we have l̂(τ) = θ·τ − κ(θ). An inference methodology, parallel to that of likelihood inference for exponential families, may be developed from the dual likelihood (2.37). The estimates, tests and confidence regions discussed by Amari and others under the name of α = −1 (or mixture) procedures are, essentially, part of the dual likelihood methodology.
More generally, based on Amari's concepts of α-geometry and α-divergence, one may for each α ∈ [−1,1] introduce an "α-likelihood" ᵅL by
ᵅL(ω) = ᵅL(ω;t) = exp{−D_α(θ̂,θ(ω))},    (2.38)
where D_α denotes the α-divergence, i.e. the f-divergence determined by the convex function
f_α(x) = x log x,    α = 1,
f_α(x) = {4/(1−α²)}{1 − x^{(1+α)/2}},    −1 < α < 1,    (2.40)
f_α(x) = −log x,    α = −1.
Letting ᵅl = log ᵅL we have, in particular,
¹l(θ) = −I(θ̂,θ) = θ·t − κ(θ) − l̂(t)    (2.41)
and
⁻¹l(θ) = −I(θ,θ̂) = θ̂·τ − l̂(τ) − κ(θ̂),    (2.42)
while for −1 < α < 1
D_α(θ,θ̃) = {4/(1−α²)}[1 − exp(−{(1−α)/2}κ(θ) − {(1+α)/2}κ(θ̃) + κ({(1−α)/2}θ + {(1+α)/2}θ̃))].
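The three cases of (2.40) fit together at the level of divergences: for discrete distributions the α-divergence built from f_α tends to the Kullback-Leibler divergence as α → ±1. A minimal Python sketch (the distributions p and q are hypothetical examples):

```python
import math

# Amari's alpha-divergence between discrete distributions p, q (sketch):
# D_alpha = 4/(1-alpha^2) * (1 - sum_i p_i^{(1-alpha)/2} q_i^{(1+alpha)/2}).
# As alpha -> 1 it tends to KL(q || p); as alpha -> -1, to KL(p || q).

def alpha_divergence(p, q, alpha):
    s = sum(pi ** ((1 - alpha) / 2) * qi ** ((1 + alpha) / 2)
            for pi, qi in zip(p, q))
    return 4.0 / (1 - alpha ** 2) * (1 - s)

def kl(a, b):
    # KL(a || b) = sum a_i log(a_i / b_i)
    return sum(ai * math.log(ai / bi) for ai, bi in zip(a, b))

p = [0.2, 0.5, 0.3]
q = [0.4, 0.4, 0.2]
assert abs(alpha_divergence(p, q, 0.9999) - kl(q, p)) < 1e-3
assert abs(alpha_divergence(p, q, -0.9999) - kl(p, q)) < 1e-3
print("alpha-divergence interpolates between the two KL divergences")
```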
Affine subsets of Θ are simple from the likelihood viewpoint while,
correspondingly, affine subsets of T are simple in dual likelihood theory. Dual
affine foliations, of Θ and T respectively, are therefore of some particular
interest. Such foliations have been studied in Barndorff-Nielsen and Blaesild
(1983a), see also Barndorff-Nielsen and Blaesild (1983b).
Suppose that the auxiliary component a of (ω̂,a) is approximately or exactly distribution constant, i.e. a is ancillary. For instance, a may be the affine ancillary or the directed log likelihood ratio statistic, as defined in Barndorff-Nielsen (1980, 1986b). We may think of the partitions generated, respectively, by a and ω̂ as foliations of T, to be called the ancillary foliation and the maximum likelihood foliation. (Amari's ancillary subspaces are then, in the present terminology and for α = 1, leaves of the maximum likelihood foliation.)
Exponential transformation models. A model M which is both transformational and exponential is called an exponential transformation model. For such models we have the following structure theorem (Barndorff-Nielsen, Blaesild, Jensen and Jorgensen (1982), Eriksen (1984b)).
Theorem 2.1. Let M be an exponential transformation model with
where e∈G denotes the identity element. Furthermore, the full exponential model generated by M is invariant under G, and G* = {[A(g⁻¹)*, B(g)] : g∈G} is a group of affine transformations of R^k leaving Θ and int Θ invariant in such a way that
3. TRANSFORMATION MODELS
i.e.
u(gx) = u(x),  z(gx) = gz(x).
The action of G on X is free if and only if K consists of the identity element alone. The quantity h parametrizes P.
Suppose G = HK is a factorization of this kind. For most transformation models of interest, if the action of G on X is not free then there exists an orbital decomposition (z,u) of x with z∈H and such that for every u the isotropy group G_u equals K and, furthermore, if z and z′ are different elements of H then zu ≠ z′u.
Example 3.1. Hyperboloid model. This model (Barndorff-Nielsen (1978b), Jensen (1981)) is analogous to the von Mises-Fisher model but pertains to observations x on the unit hyperboloid H^{k−1} of R^k, i.e.
H^{k−1} = {x ∈ R^k : x*x = 1, x₀ > 0},
where * denotes the pseudo-inner product x*y = x₀y₀ − x₁y₁ − ... − x_{k−1}y_{k−1} determined by the matrix diag(1, −1, ..., −1), and the acting group is SO↑(1,k−1). A point x of H^{k−1} may be expressed in hyperbolic-spherical coordinates as
x₀ = cosh u,
x₁ = sinh u cos v, ...,
and the model function is proportional to a_k(λ) exp{−λ ξ*x}, where the parameters ξ and λ, called the mean direction and the precision, satisfy ξ ∈ H^{k−1} and λ > 0, and where
a_k(λ) = λ^{k/2−1}/{(2π)^{k/2−1} 2K_{k/2−1}(λ)},    (3.3)
K_{k/2−1} denoting a modified Bessel function of the third kind.
The element of SO↑(1,k−1) carrying e₀ = (1,0,...,0) into x may be taken to be the symmetric matrix
h = \begin{pmatrix}
x_0 & x_1 & \cdots & x_{k-1} \\
x_1 & 1+\frac{x_1^2}{1+x_0} & \cdots & \frac{x_1 x_{k-1}}{1+x_0} \\
\vdots & \vdots & \ddots & \vdots \\
x_{k-1} & \frac{x_{k-1} x_1}{1+x_0} & \cdots & 1+\frac{x_{k-1}^2}{1+x_0}
\end{pmatrix}.    (3.4)
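That (3.4) indeed defines an element of SO↑(1,k−1) carrying e₀ to x can be verified numerically; the following Python sketch (k = 3, with an arbitrarily chosen point on the hyperboloid) checks both properties:

```python
import numpy as np

# Check (sketch) that the matrix h of (3.4) is a Lorentz boost: it preserves
# the form diag(1,-1,...,-1) and maps e_0 = (1,0,...,0) to the point x on the
# unit hyperboloid.

def boost(x):
    k = len(x)
    h = np.empty((k, k))
    h[0, :] = x
    h[:, 0] = x
    for i in range(1, k):
        for j in range(1, k):
            h[i, j] = (i == j) + x[i] * x[j] / (1 + x[0])
    return h

# A point on H^2 in hyperbolic-spherical coordinates:
u, v = 0.8, 2.1
x = np.array([np.cosh(u), np.sinh(u) * np.cos(v), np.sinh(u) * np.sin(v)])

J = np.diag([1.0, -1.0, -1.0])
h = boost(x)
assert np.allclose(h @ J @ h.T, J)                 # h lies in O(1,2)
assert np.allclose(h @ np.array([1.0, 0, 0]), x)   # h maps e_0 to x
print("boost matrix verified")
```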
Given y ∈ R², write y₀ = √(1 + y₁² + y₂²) and let
z = \begin{pmatrix}
y_0 & y_1 & y_2 \\
y_1 & 1+\frac{y_1^2}{1+y_0} & \frac{y_1 y_2}{1+y_0} \\
y_2 & \frac{y_2 y_1}{1+y_0} & 1+\frac{y_2^2}{1+y_0}
\end{pmatrix};    (3.10)
then (u,z) constitutes an orbital decomposition of y ∈ R² of the type required for the use of formula (3.9). Letting γ denote the action of SO↑(1,2) on R² one finds that J_{γ(z)}(u) = √(1 + y₁² + y₂²) and hence the measure
dμ(y) = (1 + y₁² + y₂²)^{−1/2} dy₁ dy₂
is an invariant measure on R². Shifting to hyperbolic-spherical coordinates (u,v) for (y₁,y₂) this measure is transformed to (3.1) with k = 3.
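The invariance of this measure can be tested numerically by lifting y to the hyperboloid, acting by a boost and comparing density times Jacobian (my own sketch):

```python
import numpy as np

# Check (sketch) that dmu(y) = (1 + y1^2 + y2^2)^{-1/2} dy1 dy2 is invariant
# under the action of SO(1,2) on R^2 obtained by lifting y to the hyperboloid
# point x = (sqrt(1+|y|^2), y1, y2), applying g, and projecting back.

def act(g, y):
    x = np.array([np.sqrt(1 + y @ y), y[0], y[1]])
    gx = g @ x
    return gx[1:]

def density(y):
    return 1.0 / np.sqrt(1 + y @ y)

def jac_det(gmat, y, eps=1e-6):
    Jm = np.empty((2, 2))
    for j in range(2):
        e = np.zeros(2); e[j] = eps
        Jm[:, j] = (act(gmat, y + e) - act(gmat, y - e)) / (2 * eps)
    return np.linalg.det(Jm)

chi = 0.9
g = np.array([[np.cosh(chi), np.sinh(chi), 0],
              [np.sinh(chi), np.cosh(chi), 0],
              [0, 0, 1.0]])
y = np.array([0.4, -1.2])
# Invariance: density(g.y) * |det D(g.y)/Dy| == density(y)
lhs = density(act(g, y)) * abs(jac_det(g, y))
assert abs(lhs - density(y)) < 1e-6
print("invariant measure check passed")
```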
Below and in sections 4 and 5 we shall draw several important conclusions from lemma 3.1 and theorem 3.1. Various other applications may be found in Barndorff-Nielsen, Blaesild, Jensen and Jorgensen (1982).
Corollary 3.1. Let G = HK be a left factorization of G such that K is the isotropy group of p. Thus the likelihood function depends on g through h only. Suppose theorem 3.1 applies with S = H and let L(h) = L(h;x) be any version of the likelihood function. Then the conditional probability function of s given w may be expressed in terms of the likelihood function as
the invariant measure being denoted here by α, as a standard notation for left invariant measure on G. This formula, which generalizes a similar expression for the location-scale model due to Fisher (1934), shows how the "shape and position" of the conditional distribution of s is simply determined by the observed likelihood function and the observed s₀, respectively.
Formula (3.11), however, besides being slightly more general, seems more directly applicable in practice.
4. TRANSFORMATIONAL SUBMODELS
Π 1
p(x 1 5 ...,X n ;μ,σ) = σ" Π f(σ" (x.-μ)). (4.1)
1 n ]
i=l
Here G is the affine group with elements [μ,σ] which may be represented by the 2 × 2 matrices
\begin{pmatrix} 1 & 0 \\ \mu & \sigma \end{pmatrix},
the group operation being then ordinary matrix multiplication. The Lie algebra of G, or equivalently TG_e, is represented as the set of 2 × 2 matrices of the
form
A = \begin{pmatrix} 0 & 0 \\ b & a \end{pmatrix}.
We have
e^{tA} = I + tA + \frac{1}{2!} t^2 A^2 + \cdots = \begin{pmatrix} 1 & 0 \\ (b/a)(e^{ta}-1) & e^{ta} \end{pmatrix}.
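The closed form for e^{tA} can be confirmed against the exponential series; a short Python sketch (arbitrary illustrative values of a, b, t):

```python
import numpy as np

# Check the closed form e^{tA} = [[1, 0], [(b/a)(e^{ta}-1), e^{ta}]] for
# A = [[0, 0], [b, a]] by summing the exponential series directly.

def expm_series(M, terms=60):
    out = np.eye(2)
    term = np.eye(2)
    for k in range(1, terms):
        term = term @ M / k
        out = out + term
    return out

a, b, t = 0.7, -1.3, 0.5
A = np.array([[0.0, 0.0], [b, a]])
closed = np.array([[1.0, 0.0],
                   [(b / a) * (np.exp(t * a) - 1.0), np.exp(t * a)]])
assert np.allclose(expm_series(t * A), closed)
print("affine-group exponential verified")
```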
where u > 0, v ∈ [0,2π) and χ > 0, φ ∈ [0,2π). The generating group G = SO↑(1,2) may be represented as the subgroup of GL(3) whose elements are of the form
\begin{pmatrix} 1 & 0 & 0 \\ 0 & \cos\phi & \sin\phi \\ 0 & -\sin\phi & \cos\phi \end{pmatrix}
\begin{pmatrix} \cosh\chi & \sinh\chi & 0 \\ \sinh\chi & \cosh\chi & 0 \\ 0 & 0 & 1 \end{pmatrix}
\begin{pmatrix} 1+\zeta^2/2 & -\zeta^2/2 & \zeta \\ \zeta^2/2 & 1-\zeta^2/2 & \zeta \\ \zeta & -\zeta & 1 \end{pmatrix}    (4.4)
where −∞ < ζ < ∞. This determines the so-called Iwasawa decomposition (cf., for instance, Barut and Raczka (1980), chapter 3) of SO↑(1,2) into the product of three subgroups, the three factors in (4.4) being the generic elements of the
subgroups generated by the infinitesimal elements
E_1 = \begin{pmatrix} 0 & 0 & 0 \\ 0 & 0 & 1 \\ 0 & -1 & 0 \end{pmatrix},
E_2 = \begin{pmatrix} 0 & 1 & 0 \\ 1 & 0 & 0 \\ 0 & 0 & 0 \end{pmatrix},
E_3 = \begin{pmatrix} 0 & 0 & 1 \\ 0 & 0 & 1 \\ 1 & -1 & 0 \end{pmatrix}.
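Assuming the factor forms reconstructed in (4.4), each Iwasawa factor, and hence any product of them, should preserve the Lorentz form diag(1,−1,−1); a Python sketch checking this:

```python
import numpy as np

# Numerical check (sketch) that the three Iwasawa factors of (4.4) -- a
# rotation, a boost, and a null rotation -- all preserve the Lorentz form
# J = diag(1,-1,-1), hence lie in SO(1,2).

J = np.diag([1.0, -1.0, -1.0])

def rotation(phi):
    c, s = np.cos(phi), np.sin(phi)
    return np.array([[1, 0, 0], [0, c, s], [0, -s, c]])

def boost(chi):
    c, s = np.cosh(chi), np.sinh(chi)
    return np.array([[c, s, 0], [s, c, 0], [0, 0, 1]])

def null_rotation(zeta):
    z = zeta
    return np.array([[1 + z**2 / 2, -z**2 / 2, z],
                     [z**2 / 2, 1 - z**2 / 2, z],
                     [z, -z, 1]])

g = rotation(0.4) @ boost(1.1) @ null_rotation(-0.6)
for M in (rotation(0.4), boost(1.1), null_rotation(-0.6), g):
    assert np.allclose(M @ J @ M.T, J)
print("Iwasawa factors preserve the Lorentz form")
```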
Each of the three subgroups of the Iwasawa decomposition generates a transformation submodel of the hyperboloid model. For instance, the third factor in (4.4) yields, when applied to the distribution (4.3) with mean direction e₀, the model
p(u,v;ζ) = (2π)^{−1} λ e^{λ} exp{−λ((1 + ζ²/2) cosh u − (ζ²/2) sinh u cos v − ζ sinh u sin v)} sinh u.
More generally, the one-parameter subgroups of SO↑(1,2) are of the form
exp\left\{t \begin{pmatrix} 0 & a & b \\ a & 0 & c \\ b & -c & 0 \end{pmatrix}\right\}.
In most cases of interest the model has the following additional structure (possibly after deletion of a null set from X, cf. also section 3). There exists a left factorization G = HK of G, a K-invariant function f on X, and an orbital decomposition (ĥ,u) of x such that:
(i) G_u = K for all u and, furthermore, G_p = K. Hence, in particular, H may be viewed as the parameter space of the model.
(ii) For every x∈X the function m(h) = f(h⁻¹x) has a unique maximum on H and the maximum point is ĥ.
(iii) H may be viewed as an open subset of some Euclidean space R^d and for each fixed x∈X the function m is twice continuously differentiable on H and the matrix j̆ = j̆(h) given by
j̆(h) = −∂²m(h)/∂h∂hᵀ
is positive definite.
In these circumstances we have:
Proposition 5.1. The maximum estimator ĥ is an equivariant mapping
commutes. Let η be the mapping from G to H that sends a g∈G into the uniquely determined h∈H such that g = hk for some k∈K. For any ĥ = ĥ(x) in H we have that γ(g)ĥ = ĥ(gx) is determined by
f({ĥ(gx)}⁻¹gx) ≥ f(h⁻¹gx),  h∈H.    (5.3)
Now, by the K-invariance of f,
f(h⁻¹gx) = f((g⁻¹h)⁻¹x) = f(η(g⁻¹h)⁻¹x)
and here η(g⁻¹h) ranges over all of H when h ranges over H. Hence (5.3) may be rewritten as
f(η(g⁻¹ĥ(gx))⁻¹x) ≥ f(h⁻¹x),  h∈H,
i.e., by (ii),
ĥ(x) = η(g⁻¹ĥ(gx))
or, equivalently,
ĥ(gx)K = gĥ(x)K,
and this, precisely, expresses the commutativity of (5.2), since p⁻¹(h) = hK.
When the mapping x → (ĥ,u) is proper the subgroup K is compact because K = G_u. Hence there exists an invariant measure on H, cf. appendix 1. That |j̆(ĥ)|^{1/2}dh is such a measure follows from (3.9) and formula (5.10) below.
In particular, then, there is only one action of G on H at play, namely γ, and
γ(g)h = η(gh).    (5.4)
For g∈G we have m(h;gx) = f(h⁻¹gx) = m(η(g⁻¹h);x), whence
∂m(h;gx)/∂h = {∂m(η(g⁻¹h);x)/∂h} ∂η(g⁻¹h)/∂h    (5.7)
and
∂²m(h;gx)/∂h∂hᵀ = {∂η(g⁻¹h)/∂h}ᵀ {∂²m(η(g⁻¹h);x)/∂h∂hᵀ} {∂η(g⁻¹h)/∂h} + Σᵢ {∂m(η(g⁻¹h);x)/∂hᵢ} ∂²ηᵢ(g⁻¹h)/∂h∂hᵀ.    (5.8)
(5.9)
On inserting ĥ for h in (5.7), (5.8) and (5.9) (whereby (5.7) becomes 0) and combining with (2.1) we obtain (5.6).
From (5.6) we may draw two important conclusions.
First, taking determinants we have
|j̆(ĥ(gx);gx)|^{1/2} = J_{γ(g)}(ĥ(x))^{−1} |j̆(ĥ(x);x)|^{1/2}    (5.10)
and this, by (3.9) and the tensorial nature of j̆, implies that |j̆(ω)|^{1/2}dω is an invariant measure on Ω. In connection with formula (5.10) it may be noted that
J_{γ(h)}(e) = J_{δ(h)}(e),
where δ denotes left action of the group G on itself. A proof of this latter formula is given in appendix 2.
Secondly, the tensor j̆(ω) is found to be G-invariant, whatever the value of the ancillary. In fact, by (5.4) we have, for any h₀∈H and g∈G, γ(g)h₀ = η(gh₀). Consequently the derivative of γ(g) at h₀ may be computed from η, and this together with (5.6) and (2.26) establishes the invariance.
In particular, observed information determines a G-invariant Riemannian metric on the parameter space. The expected information metric i can also be shown to be G-invariant.
From proposition 5.1 and corollary 3.1 we find
Corollary 5.1. The model function p*(ω̂;ω|u) = c|j̆(ω̂)|^{1/2}M̄, where M̄ = exp{m(ω) − m(ω̂)}, is exactly equal to p(ω̂;ω|u).
By taking m of (ii) equal to the log likelihood function l this corollary specializes to theorem 4.1 of Barndorff-Nielsen (1983).
Suppose, in particular, that the model is an exponential transformation model. Then the above theory applies with m(ω) = ᵅl(ω). The essential property to check is that ᵅl(ω;t(x)) is of the form f(h⁻¹x). This follows simply from the definition of ᵅl and theorem 2.1.
6. OBSERVED GEOMETRIES
j̆ = j̆(ω;a) = j(ω;ω,a).
We shall employ the mixed derivatives of the log model function,
l_{r₁...r_v;s₁...s_w}(ω;ω̂,a) = ∂^{v+w} l(ω;ω̂,a)/∂ω^{r₁}···∂ω^{r_v} ∂ω̂^{s₁}···∂ω̂^{s_w},
writing l̆_{r₁...r_v;s₁...s_w} for their values at ω̂ = ω. The Riemannian connection determined by the observed metric j̆ has Christoffel symbols
Γ̆_{rst} = ½(∂_r j̆_{st} + ∂_s j̆_{rt} − ∂_t j̆_{rs}).
Employing the notation established above we have
∂_u j̆_{rs} = −l̆_{rsu} − l̆_{rs;u},
etc., so that, analogously to (2.13),
l̆_{pστ} = l̆_{rst} ω^r_{/p} ω^s_{/σ} ω^t_{/τ} + l̆_{rs} ω^r_{/pσ} ω^s_{/τ}[3] + l̆_r ω^r_{/pστ}.
Further, from (2.13) we obtain, on differentiating with respect to ω̂^τ and then substituting parameter for estimate,
l̆_{pσ;τ} = l̆_{rs;t} ω^r_{/p} ω^s_{/σ} ω^t_{/τ} + l̆_{r;t} ω^r_{/pσ} ω^t_{/τ}.    (6.7)
Combining these relations we find
j̆_{rs} = −l̆_{rs}
or
j̆_{rs} = l̆_{r;s},
with the two expressions connected by the identity l̆_{rs} + l̆_{r;s} = 0. In particular, we have
l̆_{r;s} = j̆_{rs},    (6.11)
where to obtain the latter expression we have used that the score vanishes at the maximum likelihood point, l̆_r = 0, identically in ω.
The connections Γ̆^α, which we shall refer to as the observed α-connections, are analogues of the expected α-connections Γ^α given by (2.30). The analogy between Γ^α and Γ̆^α becomes more apparent by rewriting the skewness tensor (2.29) as
T_{rst} = −E{l_{rst}} − E{l_{rs}l_t}[3],
the validity of which follows on differentiation of the formula
E{l_{rs} + l_r l_s} = 0,    (6.12)
which, in turn, may be compared to (6.8).
Under the specifications of a of primary statistical interest one
has that, in broad generality, the observed geometries converge to the corresponding expected geometries as the sample size tends to infinity.
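The identity (6.12), E{l_{rs} + l_r l_s} = 0, is the classical Bartlett identity; as a one-dimensional sanity check (my own sketch, for the Poisson model with mean θ = 2.5):

```python
import math

# Check the Bartlett identity E{l'' + (l')^2} = 0 for the Poisson model
# with mean theta: l' = x/theta - 1, l'' = -x/theta^2.

theta = 2.5
total, p = 0.0, math.exp(-theta)   # p = P(X = 0)
for x in range(150):               # truncation error is negligible here
    lp = x / theta - 1.0           # l'
    lpp = -x / theta ** 2          # l''
    total += p * (lpp + lp ** 2)
    p *= theta / (x + 1)           # advance to P(X = x + 1)

assert abs(total) < 1e-10
print("Bartlett identity verified for the Poisson model")
```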
For (k,k) exponential models
l_r = (t − τ)·θ_{/r}    (6.14)
and, letting θ̂ denote the maximum likelihood estimator of θ under the full model generated by (2.35), the relation l̆_{r;s} = j̆_{rs} takes the form
l̆_{r;s}(ω) = κ_{ij}(θ) θ^i_{/r} θ̂^j_{/s}.
Furthermore,
l̆_{rst} = −κ_{ijk}(θ) θ^i_{/r} θ^j_{/s} θ^k_{/t}    (6.17)
and
l̆_{rs;t} = κ_{ij}(θ) θ^i_{/rs} θ̂^j_{/t}.    (6.18)
For transformation models the quantities entering the observed geometries may be computed from the group structure: with m(h;x) = f(h⁻¹x), the required mixed derivatives of l are obtained from derivatives of the mapping η, cf. formulas (6.21)-(6.24).
N_{α,c} = {N(μ,σ²) : (c − μ)/σ = u_α},
where u_α denotes the α-fractile of the standard normal distribution, and let x₁,...,xₙ be a sample from a distribution in N_{α,c}. The model for x = (x₁,...,xₙ) thus defined is a (2,1) exponential model, except for u_α = 0 when it is a (1,1) model. Henceforth we suppose that u_α ≠ 0, i.e. α ≠ ½. The model is also a transformation model relative to the subgroup G of the group of one-dimensional affine transformations given by
G = {[c(1 − λ), λ] : λ > 0},
the group operation being
[c(1 − λ), λ][c(1 − λ′), λ′] = [c(1 − λλ′), λλ′]
and the action of G on the sample space being
[c(1 − λ), λ](x₁,...,xₙ) = (c(1 − λ) + λx₁, ..., c(1 − λ) + λxₙ).
(Note that G is isomorphic to the multiplicative group.)
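The stated group operation follows from the affine composition rule [μ,σ][μ′,σ′] = [μ + σμ′, σσ′]; a minimal Python check (the value of c is an arbitrary illustration):

```python
# Check the group law for G = {[c(1-lam), lam] : lam > 0} inside the
# one-dimensional affine group, where
# [mu, sigma][mu', sigma'] = [mu + sigma*mu', sigma*sigma'].

def compose(a, b):
    (mu1, s1), (mu2, s2) = a, b
    return (mu1 + s1 * mu2, s1 * s2)

c = 1.7
for lam, lam2 in [(0.5, 2.0), (3.0, 0.25), (1.2, 1.2)]:
    got = compose((c * (1 - lam), lam), (c * (1 - lam2), lam2))
    want = (c * (1 - lam * lam2), lam * lam2)
    assert all(abs(g - w) < 1e-12 for g, w in zip(got, want))
print("subgroup law verified")
```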
Letting
a = (x̄ − c)/s′,
where s′² = n⁻¹Σ(xᵢ − x̄)², one finds that the model for ζ̂ given a is a location model.
Indicating differentiation with respect to ζ and ζ̂ by subscripts ζ and ζ̂, respectively, we find
l_ζ = n{−1 + b⁻² e^{2(ζ̂−ζ)} + ab⁻¹(u_α + ab⁻¹ e^{ζ̂−ζ}) e^{ζ̂−ζ}}
and hence
j̆ = n{2b⁻² + ab⁻¹(u_α + 2ab⁻¹)}
and
l̆_{ζζζ} = n{4b⁻² + ab⁻¹(u_α + 4ab⁻¹)}.
For the general location-scale model (4.1), writing g = −log f and aᵢ = (xᵢ − μ̂)/σ̂ for the configuration statistic, the observed information is
j̆(μ,σ) = σ⁻² \begin{pmatrix} Σg″(aᵢ) & Σaᵢg″(aᵢ) \\ Σaᵢg″(aᵢ) & n + Σaᵢ²g″(aᵢ) \end{pmatrix},
and for the third-order derivatives one finds
l̆_{μμμ} = σ⁻³ Σg‴(aᵢ),
l̆_{μμσ} = σ⁻³{2Σg″(aᵢ) + Σaᵢg‴(aᵢ)},
l̆_{μσσ} = σ⁻³{4Σaᵢg″(aᵢ) + Σaᵢ²g‴(aᵢ)},
l̆_{σσσ} = σ⁻³{4n + 6Σaᵢ²g″(aᵢ) + Σaᵢ³g‴(aᵢ)}.
Furthermore, for the hyperboloid model in the parametrization by (χ,φ) one finds
Γ̆^α_{χχχ} = Γ̆^α_{χχφ} = Γ̆^α_{χφχ} = Γ̆^α_{φφφ} = 0,
Γ̆^α_{χφφ} = aλ cosh χ sinh χ,
Γ̆^α_{φφχ} = −aλ cosh χ sinh χ,
whatever the value of α. Thus, in this case, the α-geometries are identical.
We note again that whereas the auxiliary statistic a is taken so as to be ancillary in the various examples discussed here - exactly distribution constant in the three examples above and asymptotically distribution constant in the one to follow - ancillarity is no prerequisite for the general theory of observed geometries.
Furthermore, let a be any statistic which depends on the minimal sufficient statistic t, say, only and suppose that the mapping from t to (ω̂,a) is defined and one-to-one on some subset T₀ of the full range T of values of t though not, perhaps, on all of T. We can then endow the model M with observed geometries, in the manner described above, for values of t in T₀. The next example illustrates this point.
The above considerations allow us to deal with questions of nonuniqueness and nonexistence of maximum likelihood estimates and nonexistence of exact ancillaries, especially in asymptotic considerations.
Example 6.4. Inverse Gaussian - Gaussian model. Let x(·) and y(·) be independent Brownian motions with a common diffusion coefficient σ² = 1 and drift coefficients μ > 0 and ξ, respectively. We observe the process x(·) till it first hits a level x₀ > 0 and at the time u when this happens we record the value v = y(u) of the second process. The joint distribution of u and v is then given by
p(u,v;μ,ξ) = (2π)⁻¹ x₀ u⁻² exp{μx₀ + ξv − ½(x₀² + v²)u⁻¹ − ½(μ² + ξ²)u}.    (6.26)
Now, assume ξ equal to μ. The model (6.26) is then a (2,1) exponential model, still with t as minimal sufficient statistic. The maximum likelihood estimate of μ is undefined if t ∉ T₀ where
T₀ = {t = (ū,v̄) : x₀ + v̄ > 0};
for t ∈ T₀ it is given by
μ̂ = (x₀ + v̄)/(2ū).    (6.27)
The event t ∉ T₀ happens with a probability that decreases exponentially fast with the sample size n and may therefore be ignored for most statistical purposes.
Defining, formally, μ̂ to be given by (6.27) even for t ∉ T₀ and letting
a = Φ̃(ū; 2nx₀², 2nμ̂²),
where Φ̃(·;χ,ψ) denotes the distribution function of the inverse Gaussian distribution with density function
φ̃(x;χ,ψ) = {χ/(2π)}^{1/2} e^{(χψ)^{1/2}} x^{−3/2} exp{−½(χx⁻¹ + ψx)},    x > 0,
we have that the mapping t → (μ̂,a) is one-to-one from T = {t = (ū,v̄) : ū > 0} onto (−∞,+∞) × (0,1) and that a is asymptotically ancillary and has the property that p*(μ̂;μ|a) = c|ĵ|^{1/2}L̄ approximates the actual conditional density of μ̂ given a to order O(n^{−3/2}), cf. Barndorff-Nielsen (1984).
Letting Φ̃⁻¹(·;χ,ψ) denote the inverse function of Φ̃(·;χ,ψ) we may write the log likelihood function for μ as
l = n{(x₀ + v̄)μ − ūμ²},
where ū = Φ̃⁻¹(a; 2nx₀², 2nμ̂²), so that
l̆_{μμμ} = 0,
while the mixed derivatives with respect to μ̂ are obtained by differentiating ū = Φ̃⁻¹(a; 2nx₀², 2nμ̂²) in μ̂. Here Φ̃ is given explicitly by
Φ̃(x;χ,ψ) = Φ(ψ^{1/2}x^{1/2} − χ^{1/2}x^{−1/2}) + e^{2(χψ)^{1/2}} Φ(−ψ^{1/2}x^{1/2} − χ^{1/2}x^{−1/2}),
Φ denoting the standard normal distribution function.
7. EXPANSION OF c|ĵ|^{1/2}L̄
The function l = l(ω;ω̂,a) may be expanded around ω = ω̂ as
l − l̂ = Σ_{v=2}^{∞} (1/v!) (ω−ω̂)^{r₁}···(ω−ω̂)^{r_v} (∂_{r₁}···∂_{r_v} l)(ω̂;ω̂,a).    (7.1)
Consequently, writing δ for ω−ω̂ and δ^{r₁r₂···} for (ω−ω̂)^{r₁}(ω−ω̂)^{r₂}···, we have
It follows that
c|ĵ|^{1/2}L̄ = (2π)^{d/2} c φ_d(ω−ω̂; ĵ){1 + A₁ + A₂ + ...},    (7.4)
where φ_d(·;ĵ) denotes the density function of the d-dimensional normal distribution with mean 0 and precision (i.e. inverse variance-covariance matrix) ĵ, and where A₁ and A₂ are given by (7.5) and (7.6) in terms of δ and the mixed derivatives l̆_{rst}, l̆_{rs;t}, l̆_{rstu}, l̆_{rst;u} and l̆_{rs;tu}, A₁ and A₂ being of order O(n^{−1/2}) and O(n^{−1}), respectively, under ordinary repeated sampling.
By integration of (7.4) with respect to ω we obtain an expansion of the norming constant c, and hence
c|ĵ|^{1/2}L̄ = φ_d(ω−ω̂; ĵ){1 + A₁ + (A₂ + C₁) + ...},    (7.8)
where C₁ is obtained from A₂ by changing the sign of A₂ and making the substitutions
δ^{rs} → ĵ^{rs},  δ^{rstu} → ĵ^{rs}ĵ^{tu}[3],  δ^{rstuvw} → ĵ^{rs}ĵ^{tu}ĵ^{vw}[15],
the 3 and 15 terms in the two latter expressions being obtained by appropriate permutations of the indices (thus, for example, δ^{rstu} → ĵ^{rs}ĵ^{tu} + ĵ^{rt}ĵ^{su} + ĵ^{ru}ĵ^{st}). The error term of (7.8) is in wide generality of order O(n^{−3/2}) under repeated sampling. In comparison with an Edgeworth expansion it may be noted that the expansion (7.8) is in terms of mixed derivatives of the log model function, rather than in terms of cumulants, and that the error of (7.8) is relative, rather than absolute.
In particular, under repeated sampling and if the auxiliary statistic is (approximately or exactly) ancillary such that
p(ω̂;ω|a) = p*(ω̂;ω|a){1 + O(n^{−3/2})}
(cf. section 2) we generally have
C₁ = (1/24){3κ_{rstu}κ^{rs}κ^{tu} − κ_{rst}κ_{uvw}(2κ^{ru}κ^{sv}κ^{tw} + 3κ^{rs}κ^{tu}κ^{vw})}.
A location adjustment of the normal approximation, centring φ_d(ω − ω̂; ĵ) at a bias-corrected value of ω̂, may also be considered. Since
h^{rst}(δ′;ĵ) = δ′^r δ′^s δ′^t − δ′^r ĵ^{st}[3],    (7.14)
we find that the corresponding cubic Hermite contribution vanishes, which offers some simplification over the corresponding unadjusted expansion.
8. EXPONENTIAL TRANSFORMATION MODELS
|A(h(gx))| = |A(g)| |A(h(x))|.
(8.6)
APPENDIX 1
To see the validity of (A1.3) one needs only note that for fixed u the mapping k → J_{γ(k)}(u) is a multiplier on K and since K is compact this must be the trivial multiplier 1. Actually, (A1.3) is a necessary and sufficient condition for the existence of an invariant measure on Y. This may be concluded from Kurita (1959), cf. also Santaló (1979), section 10.3.
APPENDIX 2
sections 3 and 5), let γ denote the natural action of G on H and let δ denote left action of G on itself. Then
J_{γ(h)}(e) = J_{δ(h)}(e)  for all h∈H.
Proof. Let g = hk denote an arbitrary element of G. Writing g symbolically as (h,k) and employing the mappings η and ζ defined by η(hk) = h and ζ(hk) = k, we have, since η(h′hk) = η(h′h) does not depend on k,
Dδ(h′)(g) = \begin{pmatrix} ∂η(h′hk)/∂h & 0 \\ ∂ζ(h′hk)/∂h & ∂ζ(h′hk)/∂k \end{pmatrix}.
At g = e the lower right block reduces to the identity (since ζ(h′h) = e for h = e with h′∈H), and hence
J_{δ(h′)}(e) = J_{γ(h′)}(e).
APPENDIX 3
An inversion result
The validity of formula (6.24) is established by the following
Lemma. Let G = HK be a left factorization of the group G with the associated mapping η: g = hk → h (as discussed in sections 3 and 5). Furthermore, let h₁ denote an arbitrary element of H. Then the following diagram, in which i indicates the inversion g → g⁻¹, commutes. This diagram of mappings between differentiable manifolds induces a corresponding diagram for the associated differential mappings between the tangent spaces of the manifolds, namely
"Hi —> TG .
Di
Dη
TH
n(hI-1h)
Acknowledgements
I am much indebted to Poul Svante Eriksen, Peter Jupp, Steffen L.
Lauritzen, Hans Anton Salomonsen and Jorgen Tornehave for helpful discussions, and to Lars Smedegaard Andersen for a careful checking of the manuscript.
REFERENCES