Article

Mean Estimation on the Diagonal of Product Manifolds

Mathias Højgaard Jensen † and Stefan Sommer *,†

Keywords: diffusion mean; Fréchet mean; bridge simulation; geometric statistics; geometric deep learning

Citation: Højgaard Jensen, M.; Sommer, S. Mean Estimation on the Diagonal of Product Manifolds. Algorithms 2022, 15, 92. https://doi.org/10.3390/a15030092

Academic Editor: Stephanie Allassonniere

Received: 5 February 2022; Accepted: 3 March 2022; Published: 10 March 2022

Copyright: © 2022 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

1. Introduction

The Euclidean expected value can be generalized to geometric spaces in several ways. Fréchet [1] generalized the notion of mean values to arbitrary metric spaces as minimizers of the sum of squared distances. Fréchet's notion of mean values therefore naturally includes means on Riemannian manifolds. On manifolds without a metric, for example, affine connection spaces, a notion of the mean can be defined by exponential barycenters, see, e.g., [2,3]. Recently, Hansen et al. [4,5] introduced a probabilistic notion of a mean, the diffusion mean, defined as the most likely starting point of a Brownian motion given the observed data. The variance of the data is here modelled in the evaluation time T > 0 of the Brownian motion, and Varadhan's asymptotic formula relating the heat kernel to the Riemannian distance relates the diffusion mean and the Fréchet mean in the T → 0 limit.

Computing sample estimators of geometric means is often difficult in practice. For example, estimating the Fréchet mean often requires evaluating the distance to each sample point at each step of an iterative optimization. When closed-form solutions of geodesics are not available, the distances are themselves evaluated by minimizing over curves ending at the data points, leading to a nested optimization problem. This is generally a challenge in geometric statistics, the statistical analysis of geometric data. It can pose an even greater challenge in geometric deep learning, where a weighted version of the Fréchet mean is used to define a generalization of the Euclidean convolution taking values in a manifold [6]. As the mean appears in each layer of the network, closed-form geodesics are in practice required for its evaluation to be sufficiently efficient.
Figure 1. (left) The mean estimator viewed as a projection onto the diagonal of a product manifold. Given a set x1, . . . , xn ∈ M, the tuple (x1, . . . , xn) (blue dot) belongs to the product manifold M × · · · × M. The mean estimator µ̂ can be identified with the projection of (x1, . . . , xn) onto the diagonal N (red dot). (right) Diffusion mean estimator in R2 using Brownian bridges conditioned on the diagonal. Here a Brownian bridge Xt = (X1,t, . . . , X4,t) in R8 is conditioned on hitting the diagonal N ⊆ R8 at time T > 0. The components Xj, each a two-dimensional process, are shown in the plot.
Contribution
In this paper, we present a comprehensive investigation of the simulation-based mean
sampling approach. We provide the necessary theoretical background and results for the
construction, we present two separate simulation schemes, and we demonstrate how the
schemes can be used to compute means on high-dimensional manifolds.
2. Background
We here outline the concepts from Riemannian geometry, geometric statistics, stochastic analysis, and bridge sampling that are necessary for the sampling schemes presented later in the paper.
an open subset of Rd or, equivalently, Rd itself. The exponential normal chart is often a
convenient choice to parameterize a manifold for computational purposes. The exponential
chart is related to the exponential map exp p : Tp M → M that for each p ∈ M is given by
exp p (v) = γv (1), where γv is the unique geodesic satisfying γv (0) = p and γv0 (0) = v.
For each p ∈ M, the exponential map is a diffeomorphism from a star-shaped subset
V centered at the origin of Tp M to its image exp p (V ) ⊆ M, covering all M except for a
subset of (Riemannian) volume measure zero, Cut( p), the cut-locus of p. The inverse map
log p : M \ Cut( p) → Tp M provides a local parameterization of M due to the isomorphism
between Tp M and Rd . For submanifolds N ⊆ M, the cut-locus Cut( N ) is defined in a
fashion similar to Cut( p), see e.g., [8].
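As a concrete illustration, the exponential map and its inverse have closed forms on the unit sphere S2 embedded in R3. The following sketch (plain NumPy; the function names are ours, not part of any library) computes both and checks that log_p inverts exp_p away from the cut locus −p:

```python
import numpy as np

def exp_sphere(p, v):
    """Exponential map on S^2 embedded in R^3: follow the great circle
    from p in direction v for arc length |v|."""
    nv = np.linalg.norm(v)
    if nv < 1e-12:
        return p
    return np.cos(nv) * p + np.sin(nv) * v / nv

def log_sphere(p, q):
    """Inverse exponential map, defined for q away from the cut locus -p:
    the tangent vector at p pointing toward q with length d(p, q)."""
    w = q - np.dot(p, q) * p                              # project q onto T_p S^2
    nw = np.linalg.norm(w)
    theta = np.arccos(np.clip(np.dot(p, q), -1.0, 1.0))   # geodesic distance
    if nw < 1e-12:
        return np.zeros(3)
    return theta * w / nw

p = np.array([0.0, 0.0, 1.0])
q = np.array([1.0, 0.0, 0.0])
v = log_sphere(p, q)
assert np.allclose(exp_sphere(p, v), q, atol=1e-8)   # exp_p(log_p(q)) = q
assert np.isclose(np.linalg.norm(v), np.pi / 2)      # |log_p(q)| = d(p, q)
```
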
Stochastic differential equations on manifolds are often conveniently expressed using
the frame bundle FM, the fiber bundle which for each point p ∈ M assigns a frame or
basis for the tangent space Tp M, i.e., FM consists of a collection of pairs ( p, u), where
u : Rd → Tp M is a linear isomorphism. We let π denote the projection π : FM → M. There exists a subbundle of FM consisting of orthonormal frames, called the orthonormal frame bundle OM. In this case, the map u : Rd → Tp M is a linear isometry.
In the present context, the metric space is a Riemannian manifold M with Riemannian
distance function d. Given realizations x1 , . . . , xn ∈ M from a distribution on M, the
estimator of the weighted Fréchet mean is defined as
µ̂ = arg min_{p ∈ M} ∑_{i=1}^{n} w_i d(p, x_i)²,  (2)
where w1 , . . . , wn are the corresponding weights satisfying wi > 0 and ∑i wi = 1. When the
weights are identical, (2) is an estimator of the Fréchet mean. Throughout, we shall make
no distinction between the estimator and the Fréchet mean and will refer to both as the
Fréchet mean.
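When the exponential and log maps are available in closed form, the estimator (2) can be computed by Riemannian gradient descent, using that the negative gradient of the objective at p is 2 ∑_i w_i log_p(x_i). A minimal sketch on S2 (the step size, iteration count, and test configuration are illustrative choices of ours):

```python
import numpy as np

def exp_s2(p, v):
    nv = np.linalg.norm(v)
    return p if nv < 1e-12 else np.cos(nv) * p + np.sin(nv) * v / nv

def log_s2(p, q):
    w = q - np.dot(p, q) * p
    nw = np.linalg.norm(w)
    theta = np.arccos(np.clip(np.dot(p, q), -1.0, 1.0))
    return np.zeros(3) if nw < 1e-12 else theta * w / nw

def frechet_mean(xs, ws, mu0, iters=100, tau=1.0):
    """Gradient descent on p -> sum_i w_i d(p, x_i)^2: step along
    the tangent-space average of the logs, then map back with exp."""
    mu = mu0
    for _ in range(iters):
        grad = sum(w * log_s2(mu, x) for w, x in zip(ws, xs))
        mu = exp_s2(mu, tau * grad)
    return mu

# four points placed symmetrically around the north pole
theta = np.pi / 4
xs = [np.array([np.sin(theta) * np.cos(phi), np.sin(theta) * np.sin(phi), np.cos(theta)])
      for phi in (0.0, np.pi / 2, np.pi, 3 * np.pi / 2)]
ws = [0.25] * 4
mu0 = np.array([0.3, 0.1, 1.0]); mu0 /= np.linalg.norm(mu0)
mu = frechet_mean(xs, ws, mu0)
assert np.allclose(mu, [0.0, 0.0, 1.0], atol=1e-4)  # by symmetry the mean is the pole
```

This is exactly the nested structure the text describes: each iteration needs log_µ(x_i) for every data point, which on a general manifold is itself an optimization.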
In [6,10], the weighted Fréchet mean was used to define a generalization of the Eu-
clidean convolution to manifold-valued inputs. When closed-form solutions of geodesics
are available, the weighted Fréchet mean can be estimated efficiently with a recursive
algorithm, also denoted an inductive estimator [6].
Please note that the evaluation time is here scaled by the weights. This is equivalent to
scaling the variance of the steps of the Brownian motion [12].
As closed-form expressions for the heat kernel are only available on specific manifolds, evaluating the diffusion t-mean often relies on numerical methods. One example is using bridge sampling to numerically estimate the transition density [9,13]. If a global coordinate chart is available, the transition density can be written in the form (see [14,15])

p_T(z, v) = sqrt( det g(v) / (2πT)^d ) e^{−‖a(z)(z − v)‖² / (2T)} E[ϕ],  (5)
where g is the metric matrix, a a square root of g, and ϕ denotes the correction factor
between the law of the true diffusion bridge and the law of the simulation scheme. The
expectation over the correction factor can be numerically approximated using Monte
Carlo sampling. The correction factor will appear again when we discuss guided bridge
proposals below.
where p_t(·, ·) denotes the transition density of the diffusion X, a = σσ^T, and where W̃ is
a d-dimensional Brownian motion under a changed probability law. From a numerical
viewpoint, in most cases, the transition density is intractable and therefore direct simulation
of (7) is not possible.
dP^h/dP |_{F_t} = exp( ∫_0^t h(s, X_s)^T dW_s − (1/2) ∫_0^t ‖h(s, X_s)‖² ds ).  (9)
From (7), it is evident that h(t, x) = σ^T ∇_x log p_{T−t}(x, v) gives the diffusion bridge. However, different choices of the function h can yield processes which are absolutely continuous with respect to the actual bridges, but which can be simulated directly.
Delyon and Hu [17] suggested using h(t, x) = σ_t^{−1}(x) ∇_x log q_{T−t}(x, v), where q denotes the transition density of a standard Brownian motion with mean v, i.e., q_t(x, v) = (2πt)^{−d/2} exp(−‖x − v‖²/2t). They furthermore proposed a method that disregards the drift term b, i.e., h(t, x) = σ_t^{−1}(x) ∇_x log q_{T−t}(x, v) − σ_t^{−1}(x) b_t(x). Under certain regularity assumptions on b and σ, the resulting processes converge to the target in the sense that lim_{t→T} Y_t = v a.s. In addition, for bounded continuous functions f, the conditional expectation is given by

E[f(X) | X_T = v] = C E[f(Y) ϕ(Y)],  (10)

where ϕ is a functional of the whole path Y on [0, T] that can be computed directly. From the construction of the h-function, it can be seen that the missing drift term is accounted for in the correction factor ϕ.
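For a standard Brownian motion in R^d (b = 0, σ = I), the guiding term of Delyon and Hu reduces to the classical bridge drift −(Y_t − v)/(T − t), and an Euler–Maruyama discretization drives the path into v at time T. A minimal sketch (step count, target, and seed are arbitrary choices of ours):

```python
import numpy as np

rng = np.random.default_rng(0)
d, T, steps = 2, 1.0, 1000
dt = T / steps
v = np.array([1.0, -1.0])   # point to condition on
Y = np.zeros(d)             # start at the origin

for k in range(steps):
    t = k * dt
    # guiding drift -(Y - v)/(T - t): the h-function for standard BM
    Y = Y - (Y - v) / (T - t) * dt + np.sqrt(dt) * rng.normal(size=d)

# the guided process converges to the target: Y_T is close to v
assert np.linalg.norm(Y - v) < 0.2
```

The residual at T is of order sqrt(dt), the size of the final noise increment, so refining the discretization tightens the hit on v.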
The simulation approach of [17] can be improved by the scheme introduced by Schauer et al. [18]. Here, an h-function defined by h(t, x) = ∇_x log p̂_{T−t}(x, v) is suggested, where p̂ denotes the transition density of an auxiliary diffusion process with known transition densities. The auxiliary process can, for example, be linear, because closed-form transition densities are available for linear processes. Under the appropriate measure P^h, the guided proposal process is a solution to

dY_t = b_t(Y_t) dt + a_t(Y_t) ∇_x log p̂_{T−t}(x, v)|_{x=Y_t} dt + σ_t(Y_t) dW_t.  (11)

Note that the factor a(t, y) in the drift in (7) is also present in (11), but not in the scheme proposed by [17]. Moreover, the choice of a linear process grants modelling freedom. For other choices of h-functions, see, e.g., [19,20].
Marchand [19] extended the ideas of Delyon and Hu by conditioning a diffusion process on partial observations at a finite collection of deterministic times. Where Delyon and Hu considered the guided diffusion processes satisfying the SDE

dY_t = b_t(Y_t) dt − (Y_t − v)/(T − t) dt + σ_t(Y_t) dW_t,  (12)

for v ∈ R^d over the time interval [0, T], Marchand proposed the guided diffusion process conditioned on partial observations v_1, . . . , v_N solving the SDE

dY_t = b_t(Y_t) dt − ∑_{k=1}^{N} P_t^k(Y_t) (Y_t − v_k)/(T_k − t) 1_{(T_k−ε_k, T_k)}(t) dt + σ_t(Y_t) dW_t,  (13)
where σ denotes the matrix square root of the inverse of the Riemannian metric tensor (g_{ij}) and b^k(x) = −(1/2) g^{ij}(x) Γ^k_{ij}(x) is the contraction over the Christoffel symbols (see, e.g., [11], Chapter 3). Strictly speaking, the solution of Equation (15) is defined by x_t^i = φ_α(x_t)^i. We thus have two concrete SDEs for the Brownian motion in play: the FM SDE (14) and the coordinate SDE (15).
Throughout the paper, we assume that M is stochastically complete, i.e., the Brownian motion does not explode in finite time and, consequently, ∫_M p_t(x, y) dVol_M(y) = 1 for all t > 0 and all x ∈ M.
k_t(x, v) = (2πt)^{−n/2} e^{−d(x,v)²/(2t)} J(x)^{−1/2},

and J(x) = |det D_{exp_v^{−1}(x)} exp_v| denotes the Jacobian determinant of the exponential map at v. Elworthy and Truman used the semi-classical bridge to obtain heat-kernel estimates, and the semi-classical bridge has been studied by various others [24,25].
By Varadhan's result (see [11], Theorem 5.2.1), as t → T, we have the asymptotic relation (T − t) log p_{T−t}(x, y) ∼ −(1/2) d(x, y)². In particular, the following asymptotic relation was shown to hold by Malliavin, Stroock, and Turetsky [26,27]: (T − t) ∇ log p_{T−t}(x, y) ∼ −(1/2) ∇d(x, y)². From these results, the generators of the Brownian bridge and the semi-classical bridge differ in the limit by a factor of −(1/2) ∇ log J(x). However, under a certain boundedness condition, the two processes can be shown to be identical under a changed probability measure ([8], Theorem 4.3.1).
To generalize the heat-kernel estimates of Elworthy and Truman, Thompson [8,28] considered the Fermi bridge process conditioned to arrive in a submanifold N ⊆ M at time T > 0. The Fermi bridge is defined as the diffusion process with generator (1/2)∆ + ∇ log q_{T−t}(·, N), where

q_t(x, N) = (2πt)^{−n/2} e^{−d(x,N)²/(2t)}.

When M = R^d and N is a point, both the semi-classical bridge and the Fermi bridge agree with the Euclidean Brownian bridge.
Ref. [15] introduces a numerical simulation scheme for conditioned diffusions on Riemannian manifolds, which generalizes the method of Delyon and Hu [17]. The guiding term used is identical to the guiding term of the Fermi bridge when the submanifold is a single point v.
This can be seen from the fact that if X ∼ N(µ, Σ), then for any linear transformation, AX + b ∼ N(b + Aµ, AΣA^T). Defining the random variable Z = X − σ_12 σ_22^{−1} Y, the result applied to (Z, X) gives Z ∼ N(µ_1 − σ_12 σ_22^{−1} µ_2, σ_11 − σ_12 σ_22^{−1} σ_21). The conclusion then follows from X = Z + σ_12 σ_22^{−1} Y. Please note that X and Y are independent if and only if σ_12 = σ_21 = 0, and the conditioned random variable is in this case identical in law to X.
Y_1 | Y_1 = · · · = Y_n is normal N( (∑_i w_i x_i)/(∑_i w_i), T/(∑_i w_i) ). This can be seen inductively: The conditioned random variable Y_1 | Y_1 = Y_2 is identical to Y_1 | Y_1 − Y_2 = 0. Now let X = Y_1 and Y = Y_1 − Y_2 and refer to Example 1. To conclude, assume Z_n := Y_1 | Y_1 = · · · = Y_{n−1} follows the desired normal distribution. Then Z_n | Z_n = Y_n is normally distributed with the desired parameters, and Z_n | Z_n = Y_n is identical to Y_1 | Y_1 = · · · = Y_n.
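The normal distribution claimed in Example 2 can also be checked by completing the square: the sum of Gaussian log-kernels ∑_i −w_i (y − x_i)²/(2T) differs from the log-kernel of N( (∑_i w_i x_i)/(∑_i w_i), T/(∑_i w_i) ) by a constant in y. A quick numerical check (the weights, data, and evaluation points are arbitrary):

```python
import numpy as np

T = 0.5
w = np.array([0.2, 0.3, 0.5])
x = np.array([1.0, -2.0, 4.0])

def log_product(y):
    """Sum of Gaussian log-kernels N(y; x_i, T / w_i), up to constants."""
    return np.sum(-w * (y - x) ** 2 / (2 * T))

m = np.sum(w * x) / np.sum(w)   # claimed conditional mean: weighted average

def log_claimed(y):
    """Log-kernel of N(m, T / sum_i w_i), up to constants."""
    return -np.sum(w) * (y - m) ** 2 / (2 * T)

# the difference is constant in y, so the two kernels are proportional
diffs = [log_product(y) - log_claimed(y) for y in (-3.0, 0.0, 1.7, 5.0)]
assert np.allclose(diffs, diffs[0])
```
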
The following example establishes the weighted average as a projection onto the
diagonal.
Example 3. Let x be a point in (R^d)^n and let P be the orthogonal projection onto the diagonal of (R^d)^n, such that Px = ( (1/n) ∑_{i=1}^n x_i, . . . , (1/n) ∑_{i=1}^n x_i )^T. We see that the projection yields n copies of the arithmetic mean of the coordinates. This is illustrated in Figure 2.
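In coordinates, this projection is the Kronecker product of the n × n averaging matrix with the d × d identity; a small sketch verifying that it produces n copies of the arithmetic mean and is an orthogonal projection (sizes and data are arbitrary):

```python
import numpy as np

n, d = 4, 2
# Orthogonal projection of (R^d)^n onto its diagonal {(y, ..., y)}:
# average the n blocks, then repeat the average n times.
P = np.kron(np.ones((n, n)) / n, np.eye(d))

rng = np.random.default_rng(1)
x = rng.normal(size=n * d)          # a point (x_1, ..., x_n), stacked
mean = x.reshape(n, d).mean(axis=0)

Px = P @ x
assert np.allclose(Px.reshape(n, d), np.tile(mean, (n, 1)))  # n copies of the mean
assert np.allclose(P @ P, P) and np.allclose(P, P.T)         # idempotent, symmetric
```
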
Figure 2. The mean estimator viewed as a projection onto the diagonal of a product manifold. Conditioning on the closest point in the diagonal yields a density on the diagonal depending on the time to arrival T > 0. As T tends to zero, the density converges to the Dirac delta distribution (grey), whereas as T increases, the variance of the distribution increases (red).
Theorem 1. Let X_t = (X_{1,t}, . . . , X_{n,t}) consist of n independent Brownian motions on M, where X_{i,t} has variance w_i^{−1} and X_{i,0} = x_i. Let P∗ be the law of the conditioned process Y_t = X_t | X_T ∈ N, N = diag(M^n), and let v be the random variable Y_{1,T}. Then v has density p_v(y) ∝ ∏_{i=1}^n p_{T/w_i}(x_i; y) and v = Y_{i,T} for all i a.s. (almost surely).
Proof. p_T^X((x_1, . . . , x_n), (y, . . . , y)) = ∏_{i=1}^n p_T^{X_i}(x_i, y) because the processes X_{i,t} are independent. By symmetry of the Brownian motion and the time rescaling property, p_T^{X_i}(x_i, y) = p_{T/w_i}(y, x_i). For elements (y, . . . , y) ∈ diag(M^n) and x ∈ M^n, p_v(y) = p_T^Y(x, y) ∝ p_T^X(x, y). As a result of the conditioning, v = Y_{1,T} = · · · = Y_{n,T}. In combination, this establishes the result.
Consequently, the set of modes of p_v equals the set of maximizers of the likelihood L(y; x_1, . . . , x_n) = ∏_{i=1}^n p_{T/w_i}(x_i; y), and thus the set of weighted diffusion means. This result is the basis for the sampling scheme. Intuitively, if the distribution of v is relatively well behaved (e.g., close to normal), a sample from v will be close to a weighted diffusion mean with high probability.
In practice, however, we cannot sample Y_t directly. Instead, we will below use guided proposal schemes resulting in processes Ỹ_t with law P̃ that we can actually sample and that, under certain assumptions, will be absolutely continuous with respect to Y_t with explicitly computable likelihood ratio, so that dP∗ = (ϕ(Ỹ_T) / E_P̃[ϕ(Ỹ_T)]) dP̃.
Corollary 1. Let P̃ be the law of Ỹ_t and ϕ the corresponding correction factor of the guiding scheme. Let ṽ be the random variable Ỹ_{1,T} with law (ϕ(Ỹ_T) / E_P̃[ϕ(Ỹ_T)]) P̃. Then ṽ has density p_ṽ(y) ∝ ∏_{i=1}^n p_{T/w_i}(x_i; y).
where r̃_N denotes the lift of the radial distance to N defined by r̃_N(u) := r_N(π(u)) = d(π(u), N). The Fermi bridge Y^F is the projection of U to M, i.e., Y_t^F := π(U_t); let P^F denote its law.

d log ϕ(Y_s^F) = (r_N(Y_s^F) / (T − s)) (dη_s + dL_s) with dη_s = (∂/∂r_N) log Θ_N^{−1/2} ds,

and dL_s := dL_s(Y^F), with L being the geometric local time at Cut(N), and Θ_N the determinant of the derivative of the exponential map normal to N, with support on M^n \ Cut(N) [8].
Since N is a totally geodesic submanifold of dimension d, the results of [8] can be used to give sufficient conditions to extend the equivalence in (17) to the entire interval [0, T]. A set A is said to be polar for a process X_t if the first hitting time of A by X is infinite a.s. In particular, (ϕ(Y_T^F) / E_{P^F}[ϕ(Y_T^F)]) dP^F ∝ dP∗.
For numerical purposes, the equivalence (17) in Theorem 2 is sufficient as the interval
[0, T ] is finitely discretized. To obtain the result on the full interval, the conditions in
Corollary 2 may at first seem quite restrictive. A sufficient condition for a subset of a
manifold to be polar for a Brownian motion is its Hausdorff dimension being two less than
the dimension of the manifold. Thus, Cut( N ) is polar if dim(Cut( N )) ≤ nd − 2. Verifying
whether this is true requires specific investigation of the geometry of Mn .
The SDE (16) together with (17) and the correction ϕ gives a concrete simulation
scheme that can be implemented numerically. Implementation of the geometric constructs
is discussed in Section 4. The main complication of using Fermi bridges for simulation is that
it involves evaluation of the radial distance r N at each time-step of the integration. Since the
radial distance finds the closest point on N to x1 , . . . , xn , it is essentially a computation of the
Fréchet mean and thus hardly more computationally efficient than computing the Fréchet
mean itself. For this reason, we present a coordinate-based simulation scheme below.
d( X_{1,t}^1, . . . , X_{1,t}^d, . . . , X_{n,t}^1, . . . , X_{n,t}^d )^T
= ( b^1(X_{1,t}), . . . , b^d(X_{1,t}), . . . , b^1(X_{n,t}), . . . , b^d(X_{n,t}) )^T dt
+ diag( Σ(X_{1,t}, . . . , X_{n,t}), . . . , Σ(X_{1,t}, . . . , X_{n,t}) ) dW_t,  (18)

where diag(·) denotes the block-diagonal matrix with n copies of Σ(X_{1,t}, . . . , X_{n,t}).
In the product chart, Σ and b satisfy the same assumptions as the metric and cometric tensor and drift listed above.
The conditioning X_T ∈ N is equivalent to requiring X_T ∈ diag((R^d)^n) in coordinates. Since diag((R^d)^n) is a linear subspace of (R^d)^n, we let L ∈ M_{(n−1)d×nd} be a matrix with orthonormal rows and ker L = diag((R^d)^n), so that the desired conditioning reads L X_T = 0. Define the following oblique projection, similar to [19],

P_t(x) = a(x) L^T A(x) L,  (19)

where

a(x) = Σ(x) Σ(x)^T and A(x) = (L a(x) L^T)^{−1}.

Set β(x) = Σ(x)^T L^T A(x). The guiding scheme (13) then becomes

dY_t = b(Y_t) dt + Σ(Y_t) dW_t − Σ(Y_t) β(Y_t) (L Y_t)/(T − t) 1_{(T−ε,T)}(t) dt, Y_0 = u.  (20)
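In the Euclidean equal-weight case (b = 0, Σ = I, a = I, A = I), we have Σβ L = L^T L = I − P, the orthogonal projection onto the complement of the diagonal, so the guiding term simply pushes the off-diagonal part of Y_t to zero. A minimal Euler–Maruyama sketch (we guide on the whole interval, i.e., ε = T; sizes and seed are arbitrary):

```python
import numpy as np

n, d, T = 3, 2, 1.0
P = np.kron(np.ones((n, n)) / n, np.eye(d))   # projection onto the diagonal
Q = np.eye(n * d) - P                          # = L^T L for any L with orthonormal
                                               # rows and ker L = diagonal

rng = np.random.default_rng(2)
x = np.array([0.0, 0.0, 1.0, 0.0, 0.0, 1.0])   # starting points x_1, x_2, x_3 in R^2
dt, steps = 1e-3, 1000
Y = x.copy()
for k in range(steps):
    t = k * dt
    drift = -Q @ Y / (T - t)    # guiding term driving the off-diagonal part to 0
    Y = Y + drift * dt + np.sqrt(dt) * rng.normal(size=n * d)

blocks = Y.reshape(n, d)
# at time T the components (nearly) coincide: the process hits the diagonal
assert np.allclose(blocks, blocks.mean(axis=0), atol=0.2)
```

The diagonal part P Y_t is left to diffuse freely, which is exactly why the common end point Y_{1,T} = · · · = Y_{n,T} is a sample of the mean candidate v rather than a fixed target.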
With the same assumptions, we obtain the following result, similar to [19], Theorem 3.
Theorem 3. Let Y_t be a solution of (20), and assume the drift b is bounded. For any bounded function f,

E[f(X) | X_T ∈ N] = C E[f(Y) ϕ(Y)],  (21)

where C is a positive constant and

ϕ(Y) = sqrt(det A(Y_T)) exp( −‖β_{T−ε}(Y_{T−ε}) L Y_{T−ε}‖² / (2ε)
− ∫_{T−ε}^{T} [ 2 (L Y_s)^T L b(Y_s) ds − (L Y_s)^T d(A(Y_s)) L Y_s + d[A(Y_s)_{ij}, (L Y_s)_i (L Y_s)_j] ] / (2(T − s)) ).
The theorem can also be applied for unbounded drift by replacing b with a bounded approximation and performing a Girsanov change of measure.
i.e., the rescaling ϕ/E[ϕ] of the guided law in Theorem 1. A simple way to achieve this is to do sampling importance resampling (SIR) as described in Algorithm 1. This yields an approximation of the weighted diffusion mean. For each sample y^i of the guided bridge process, we compute the corresponding correction factor ϕ(y^i). The resampling step then consists of picking y_T^j with a probability determined by the correction terms, i.e., with J the number of samples, we pick sample j with probability P_j = ϕ(y_T^j) / ∑_{i=1}^J ϕ(y_T^i).
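The resampling step is a few lines in practice; a sketch (the correction factors below are made-up stand-ins, and we work in log space for numerical stability, an implementation choice of ours):

```python
import numpy as np

def sir_resample(samples, log_phi, size, rng):
    """Sampling importance resampling: draw index j with probability
    P_j = phi_j / sum_i phi_i, computed stably in log space."""
    w = np.exp(log_phi - np.max(log_phi))
    p = w / w.sum()
    idx = rng.choice(len(samples), size=size, p=p)
    return [samples[i] for i in idx]

rng = np.random.default_rng(3)
samples = [0.0, 1.0, 2.0]                      # stand-ins for sampled end points y_T
log_phi = np.log(np.array([0.1, 0.1, 9.8]))    # correction factors; third dominates
res = sir_resample(samples, log_phi, size=1000, rng=rng)
# resampling concentrates on samples with large correction factor
assert res.count(2.0) > 900
```
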
4. Experiments
We here exemplify the mean sampling scheme on the two-sphere S2 and on finite sets
of landmark configurations endowed with the LDDMM metric [29,30]. With the experiment
on S2 , we aim to give a visual intuition of the sampling scheme and the variation in the
diffusion mean estimates caused by the sampling approach. In the higher-dimensional
landmark example where closed-form solutions of geodesics are not available, we compare
to the Fréchet mean and include rough running times of the algorithms to give a sense of the
reduced time complexity. Note, however, that the actual running times are very dependent
on the details of the numerical implementation, stopping criteria for the optimization
algorithm for the Fréchet mean, etc.
The code used for the experiments is available in the software package Jax Geometry, http://bitbucket.org/stefansommer/jaxgeometry (accessed on 4 February 2022). The implementation makes extensive use of automatic differentiation libraries for the geometry computations, as further described in [31].
Figure 3. Three points on S2 together with a sample mean (red) and the diagonal process in (S2)^n, n = 3, with T = 0.2, conditioned on the diagonal.
Figure 4. (left) 256 sampled data points on S2 (north pole being population mean). (right) 32 samples
of the diffusion mean conditioned on the diagonal of (S2 )n , n = 256, T = 0.2. As can be seen, the
variation in the mean samples is limited.
In Figure 7, an estimated diffusion mean and Fréchet mean for the landmark configu-
rations are plotted together. On a standard laptop, generation of one sample diffusion mean
takes approximately 1 s. For comparison, estimation of the Fréchet mean with the stan-
dard nested optimization approach using the Riemannian logarithm map as implemented
in Jax Geometry takes approximately 4 min. The diffusion mean estimation performed
in [13] using direct optimization of the likelihood approximation with bridge sampling
from the mean candidate to each data point is comparable in complexity to the Fréchet
mean computation.
Figure 5. (left) One configuration of 17 landmarks overlaid on the MR image from which the configuration was annotated. (right) All 14 landmark configurations plotted together (one color for each configuration of 17 landmarks).
Figure 6. Samples from the diagonal process with T = 0.2 (left) and T = 1 (right). The effect of
varying the Brownian motion end time T is clearly visible.
Figure 7. One sampled diffusion mean with the sampling scheme (blue configuration) together
with estimated Fréchet mean (green configuration). The forward sampling scheme is significantly
faster than the iterative optimization needed for the Fréchet mean on the landmark manifold where
closed-form solutions of the geodesic equations are not available.
5. Conclusions
In [7], the idea of sampling means by conditioning on the diagonal of product mani-
folds was first described and the bridge sampling construction sketched. In the present
paper, we have provided a comprehensive account of the background for the idea, includ-
ing the relation between the (weighted) Fréchet and diffusion means, and the foundations
in both geometry and stochastic analysis. We have constructed two simulation schemes
and demonstrated the method on both a low- and a high-dimensional manifold, the sphere
S2 and the LDDMM landmark manifold, respectively. The experiments show the feasibility
of the method and indicate the potential high reduction in computation time compared to
computing means with iterative optimization.
Author Contributions: Conceptualization, S.S.; methodology, M.H.J. and S.S.; software, S.S.; valida-
tion, M.H.J. and S.S.; formal analysis, M.H.J. and S.S.; investigation, M.H.J. and S.S.; resources, S.S.;
data curation, S.S.; writing—original draft preparation, M.H.J. and S.S.; writing—review and editing,
M.H.J. and S.S.; visualization, S.S.; supervision, S.S.; project administration, M.H.J. and S.S.; funding
acquisition, S.S. All authors have read and agreed to the published version of the manuscript.
Funding: This research was funded by Villum Fonden grant number 22924 and grant number 40582,
and Novo Nordisk Fonden grant number NNF18OC0052000.
Acknowledgments: The work presented is supported by the CSGB Centre for Stochastic Geometry
and Advanced Bioimaging funded by a grant from the Villum foundation, the Villum Foundation
grants 22924 and 40582, and the Novo Nordisk Foundation grant NNF18OC0052000.
Conflicts of Interest: The authors declare no conflict of interest. The funders had no role in the design
of the study; in the collection, analyses, or interpretation of data; in the writing of the manuscript, or
in the decision to publish the results.
References
1. Fréchet, M. Les éléments aléatoires de nature quelconque dans un espace distancié. Ann. L’Institut Henri Poincaré 1948, 10,
215–310.
2. Arnaudon, M.; Li, X.M. Barycenters of measures transported by stochastic flows. Ann. Probab. 2005, 33, 1509–1543. [CrossRef]
3. Pennec, X. Barycentric Subspace Analysis on Manifolds. Ann. Stat. 2018, 46, 2711–2746. [CrossRef]
4. Hansen, P.; Eltzner, B.; Sommer, S. Diffusion Means and Heat Kernel on Manifolds. In Geometric Science of Information; Lecture
Notes in Computer Science; Springer Nature: Cham, Switzerland, 2021; pp. 111–118. [CrossRef]
5. Hansen, P.; Eltzner, B.; Huckemann, S.F.; Sommer, S. Diffusion Means in Geometric Spaces. arXiv 2021, arXiv:2105.12061.
6. Chakraborty, R.; Bouza, J.; Manton, J.H.; Vemuri, B.C. ManifoldNet: A Deep Neural Network for Manifold-Valued Data With
Applications. IEEE Trans. Pattern Anal. Mach. Intell. 2022, 44, 799–810. [CrossRef]
7. Sommer, S.; Bronstein, A. Horizontal Flows and Manifold Stochastics in Geometric Deep Learning. IEEE Trans. Pattern Anal.
Mach. Intell. 2022, 44, 811–822. [CrossRef]
8. Thompson, J. Submanifold Bridge Processes. Ph.D. Thesis, University of Warwick, Coventry, UK, 2015.
9. Pennec, X.; Sommer, S.; Fletcher, T. Riemannian Geometric Statistics in Medical Image Analysis; Elsevier: Amsterdam, The Netherlands,
2020.
10. Pennec, X.; Fillard, P.; Ayache, N. A Riemannian Framework for Tensor Computing. Int. J. Comput. Vis. 2006, 66, 41–66. [CrossRef]
11. Hsu, E.P. Stochastic Analysis on Manifolds; American Mathematical Society: Providence, RI, USA, 2002; Volume 38.
12. Grong, E.; Sommer, S. Most Probable Paths for Anisotropic Brownian Motions on Manifolds. arXiv 2021, arXiv:2110.15634.
13. Sommer, S.; Arnaudon, A.; Kuhnel, L.; Joshi, S. Bridge Simulation and Metric Estimation on Landmark Manifolds. In
Graphs in Biomedical Image Analysis, Computational Anatomy and Imaging Genetics; Lecture Notes in Computer Science; Springer:
Berlin/Heidelberg, Germany, 2017; pp. 79–91.
14. Papaspiliopoulos, O.; Roberts, G. Importance sampling techniques for estimation of diffusion models. Stat. Methods Stoch. Differ.
Equ. 2012, 124, 311–340.
15. Jensen, M.H.; Sommer, S. Simulation of Conditioned Semimartingales on Riemannian Manifolds. arXiv 2021, arXiv:2105.13190.
16. Lyons, T.J.; Zheng, W.A. On conditional diffusion processes. Proc. R. Soc. Edinb. Sect. Math. 1990, 115, 243–255. [CrossRef]
17. Delyon, B.; Hu, Y. Simulation of Conditioned Diffusion and Application to Parameter Estimation. Stoch. Process. Appl. 2006,
116, 1660–1675. [CrossRef]
18. Schauer, M.; Van Der Meulen, F.; Van Zanten, H. Guided proposals for simulating multi-dimensional diffusion bridges. Bernoulli
2017, 23, 2917–2950. [CrossRef]
19. Marchand, J.L. Conditioning diffusions with respect to partial observations. arXiv 2011, arXiv:1105.1608.
20. van der Meulen, F.; Schauer, M. Bayesian estimation of discretely observed multi-dimensional diffusion processes using guided
proposals. Electron. J. Stat. 2017, 11, 2358–2396. [CrossRef]
21. Elworthy, D. Geometric aspects of diffusions on manifolds. In École d’Été de Probabilités de Saint-Flour XV–XVII, 1985–87; Springer:
Berlin/Heidelberg, Germany, 1988; pp. 277–425.
22. Emery, M. Stochastic Calculus in Manifolds; Springer: Berlin/Heidelberg, Germany, 1989.
23. Elworthy, K.; Truman, A. The diffusion equation and classical mechanics: An elementary formula. In Stochastic Processes in
Quantum Theory and Statistical Physics; Springer: Berlin/Heidelberg, Germany, 1982; pp. 136–146.
24. Li, X.M. On the semi-classical Brownian bridge measure. Electron. Commun. Probab. 2017, 22, 1–15. [CrossRef]
25. Ndumu, M.N. Brownian Motion and the Heat Kernel on Riemannian Manifolds. Ph.D. Thesis, University of Warwick, Coventry,
UK, 1991.
26. Malliavin, P.; Stroock, D.W. Short time behavior of the heat kernel and its logarithmic derivatives. J. Differ. Geom. 1996, 44, 550–570.
[CrossRef]
27. Stroock, D.W.; Turetsky, J. Short time behavior of logarithmic derivatives of the heat kernel. Asian J. Math. 1997, 1, 17–33.
[CrossRef]
28. Thompson, J. Brownian bridges to submanifolds. Potential Anal. 2018, 49, 555–581. [CrossRef]
29. Joshi, S.; Miller, M. Landmark Matching via Large Deformation Diffeomorphisms. IEEE Trans. Image Process. 2000, 9, 1357–1370.
[CrossRef]
30. Younes, L. Shapes and Diffeomorphisms; Springer: Berlin/Heidelberg, Germany, 2010.
31. Kühnel, L.; Sommer, S.; Arnaudon, A. Differential Geometry and Stochastic Dynamics with Deep Learning Numerics. Appl.
Math. Comput. 2019, 356, 411–437. [CrossRef]
32. Stegmann, M.B.; Fisker, R.; Ersbøll, B.K. Extending and Applying Active Appearance Models for Automated, High Precision
Segmentation in Different Image Modalities. Available online: http://www2.imm.dtu.dk/pubdb/edoc/imm118.pdf (accessed
on 4 February 2022).