Keywords

1 Introduction

Noninvasive electrophysiological (EP) imaging involves the reconstruction of cardiac electrical activity from high-density body-surface electrocardiograms (ECGs) [6]. It solves an ill-posed inverse problem that deteriorates as the imaging depth increases from the epicardium to the endocardium [9]. One type of increasingly utilized regularization considers knowledge about the well-defined physiological process of cardiac electrical propagation. This is often realized in a model-constrained approach, where the optimization or statistical inference of cardiac electrical activity is constrained by a pre-defined model describing local activation/repolarization and its spatial propagation [4, 11, 12]. Earlier models include step jump functions [10], logistic functions [11], and 3D curve models [4] empirically parameterized to mimic the physiological process. Recently, more expressive cardiac EP simulation models have also been used [7, 12].

These model-constrained approaches are afflicted with a common challenge: they are controlled by high-dimensional parameters often associated with local tissue properties and the origin of electrical activation that are unknown a priori. The more expressive the model is, the more parameters it has. To fix these model parameters in optimization/inference, as is common in existing approaches [12], model errors may be introduced decreasing the accuracy of the estimated electrical activity [12]. To adapt these model parameters to the observed data, as is desired for accurate inference, is however difficult due to their high-dimensionality and nonlinear relationship with the observed ECG data [3].

In this paper, we introduce a novel model-constrained inference framework that replaces the conventional physiological models with a deep generative model that is trained to generate the spatiotemporal dynamics of transmembrane potential (TMP) from a low-dimensional set of generative factors. These generative factors can be viewed as a low-dimensional abstraction of the high-dimensional physical parameters, which allows us to efficiently adapt the prior physiological knowledge to the observed ECG data (through inference of the generative factors) for an improved reconstruction of TMP dynamics.

In specific, the presented method consists of two novel contributions. First, to obtain a generative model that is sufficiently expressive to reproduce the temporal sequence of 3D spatial TMP distributions, we adopt a novel sequence-to-sequence variational auto-encoder (VAE) [2] with cascaded long short-term memory (LSTM) networks. This VAE is trained on a large database of simulated TMP dynamics originating from various myocardial locations and with a wide range of local tissue properties. Second, once trained, the VAE decoder describes the likelihood of the TMP conditioned on a low-dimensional set of generative factors, while the encoder learns the posterior distributions of the generative factors conditioned on the training data. We utilize these two components within the Bayesian inference, and present a variation of the expectation-maximization (EM) algorithm to jointly estimate the generative factors and transmural TMP signals from observed ECG data. In a set of synthetic and real-data experiments, we demonstrate that the presented method is able to improve the accuracy of transmural EP imaging in comparison to statistical inference either constrained by a conventional physiological model [12] or without physiological constraints.

2 Generative Modeling of TMP via Sequential VAE

To learn to generate the spatiotemporal TMP sequences, we use a sequential variation of VAE [8] based on the use of LSTM networks [2].

VAE Architecture: The architecture of the sequential VAE is summarized in the red block in Fig. 1. Both the encoder and the decoder consists of two layers of LSTM, where the second layer includes separate mean and variance networks. The spatial dimension decreases from the original TMP signal \(\mathbf U \) to the latent representation \(\mathbf Z \), while the temporal relationship is modeled by the LSTMs. Note that while the random variables in a standard VAE are vectors, a sequential VAE deals with matrices. By defining the conditional distribution of a matrix as the product of distributions over its columns, we obtained the likelihood distribution \(p_{\theta }(\mathbf U |\mathbf Z )\) and the variational posterior distribution \(q_{\phi }(\mathbf Z |\mathbf U )\) as:

$$\begin{aligned} p_{\theta }(\mathbf U |\mathbf Z )=\prod _{k}{\mathcal {N}(\mathbf U _{:,k}|\mathbf M _{\theta }(\mathbf Z )_{:,k},diag(\mathbf S _{\theta }(\mathbf Z )_{:,k}))} \end{aligned}$$
(1)
$$\begin{aligned} q_{\phi }(\mathbf Z |\mathbf U )=\prod _{k}{\mathcal {N}(\mathbf Z _{:,k}|\mathbf M _{\phi }(\mathbf U )_{:,k},diag(\mathbf S _{\phi }(\mathbf U )_{:,k}))} \end{aligned}$$
(2)

where \(\mathbf M _{\phi }(\mathbf U )\) and \(\mathbf S _{\phi }(\mathbf U )\) are output from the mean and variance networks of the encoder parameterized by \(\phi \), and \(\mathbf M _{\theta }(\mathbf Z )\) and \(\mathbf S _{\theta }(\mathbf Z )\) are output from the mean and variance networks of the decoder parameterized by \(\theta \).

Fig. 1.
figure 1

Red block: VAE architecture. Green block: graphical model in inference.

VAE Training: Training of the VAE is performed by maximizing the variational lower bound on the likelihood of the training data given as:

$$\begin{aligned} \mathcal {L}_{ELB}(\theta ,\phi ; \mathbf U ^{(i)})=-KL(q_{\phi }(\mathbf Z |\mathbf U ^{(i)})||p_{\theta }(\mathbf Z ))+E_{q_{\phi }(\mathbf Z |\mathbf U ^{(i)})}(\log p_{\theta }(\mathbf U ^{(i)}|\mathbf Z )) \end{aligned}$$
(3)

where \(p_{\theta }(\mathbf Z )\) is an isotropic Gaussian prior. The calculation of the KL divergence and cross entropy loss for the presented sequential architecture is carried out in a manner similar to that described in [8]. The training data is generated by the Aliev-Panfilov (AP) model [1], simulating spatiotemporal TMP sequences originated from different ventricular locations with different tissue properties.

3 Transmural EP Imaging

The biophysical relationship between cardiac TMP, \(\mathbf {U}\) and body-surface ECG, \(\mathbf {Y}\) can be described by a linear measurement model: \( \mathbf Y =\mathbf H {} \mathbf U \), where \(\mathbf {H}\) is specific to the heart-torso model of an individual. To estimate \(\mathbf U \) from \(\mathbf Y \) is severely ill-posed and requires the regularization from additional knowledge about \(\mathbf U \).

Probabilistic Modeling of the Inverse Problem: We formulate the inverse problem in the form of statistical inference. We define the likelihood distribution of \(\mathbf Y \) given \(\mathbf U \) by assuming zero-mean measurement errors with variance \(\beta ^{-1}\):

$$\begin{aligned} p(\mathbf Y |\mathbf U , \beta )=\prod _{k}{\mathcal {N}(\mathbf Y _{:,k}|\mathbf{HU }_{:,k},\beta ^{-1}{} \mathbf I )} \end{aligned}$$
(4)

To incorporate physiological knowledge about \(\mathbf U \), we model its prior distribution conditioned on \(\mathbf Z \) using the VAE decoder with trained parameter \(\bar{\theta }\):

$$\begin{aligned} p_{\bar{\theta }}(\mathbf U |\mathbf Z )=\prod _{k}{\mathcal {N}(\mathbf U _{:,k}|\mathbf M _{\bar{\theta }}(\mathbf Z )_{:,k},diag(\mathbf S _{\bar{\theta }}(\mathbf Z )_{:,k}))} \end{aligned}$$
(5)

To further utilize the knowledge about the generative factor \(\mathbf Z \) learned by the VAE from a large training dataset, we also utilize the VAE-encoded marginal posterior distribution of \(\mathbf Z \) as its prior distribution in Bayesian inference. In specific, we approximate samples from this marginalized distribution to be Gaussian:

$$\begin{aligned} p(\mathbf Z )=\prod _{k}{\mathcal {N}(\mathbf Z _{:,k}|\bar{\varvec{Z}}_{:,k},diag(\mathbf C _{:,k}))} \end{aligned}$$
(6)

With this, we complete the statistical formulation of our problem. Our goal is to estimate the joint posterior distributions \( p(\mathbf U ,\mathbf Z |\mathbf Y ) \propto p(\mathbf Y |\mathbf U ) p(\mathbf U |\mathbf Z ) p(\mathbf Z ). \)

Inference: Due to the presence of a deep neural network, the posterior \(p(\mathbf U ,\mathbf Z |\mathbf Y )\) is analytically intractable. To address this issue, we note that conditioned on \(\mathbf Z \), the distribution of \(\mathbf U \) is Gaussian in each column; thus, \(p(\mathbf U |\mathbf Y ,\mathbf Z )\) is analytically available. We leverage this fact and employ a variant of the expectation maximization (EM) algorithm to obtain the maximum a posteriori (MAP) estimate of \(\mathbf Z \) along with the posterior distribution of \(\mathbf U \) given the MAP estimate of \(\mathbf Z \) .

E-step: Conditioned on an estimated value of \(\mathbf Z \) (say \(\hat{\mathbf{Z }}\)), we calculate \({\hat{p}(\mathbf U |\mathbf Y ,\hat{\mathbf{Z }})=}\) \({\prod _k \mathcal {N}(\mathbf U _{:,k}|\hat{\varvec{U}}_{:,k},\hat{\varvec{\varSigma }}_{:,:,k})}\), with the covariance and mean of the \(k^{th}\) column of \(\mathbf U \) as:

$$\begin{aligned} \hat{\varvec{\varSigma } }_{:,:,k}=(\beta \mathbf H ^T\mathbf H +\mathbf D _k^{-1})^{-1},\quad \quad \hat{\varvec{U}}_{:,k}=\hat{\varvec{\varSigma }} _{:,:,k}(\beta \mathbf H ^T\mathbf Y _{:,k}+\mathbf D _k^{-1}{} \mathbf m _k) \end{aligned}$$
(7)

where \(\mathbf{D }_k=diag(\mathbf S _{\theta }(\hat{\mathbf{Z }})_{:,k})\), and \(\mathbf m _k=\mathbf M _{\bar{\theta }}(\hat{\mathbf{Z }})_{:,k}\) and \(\mathbf S _{\bar{\theta }}(\hat{\mathbf{Z }})_{:,k}\) are the \(k^{th}\) column output of the VAE decoder network when \(\hat{\mathbf{Z }}\) is input to it.

M-step: Given \({\hat{p}(\mathbf U |\mathbf Y ,\hat{\mathbf{Z }})}\), we update \(\mathbf{Z }\) by maximizing \({E_{\hat{p}(\mathbf U |\mathbf Y ,\hat{\mathbf{Z }})}\log (p(\mathbf Y ,\mathbf U ,\mathbf Z ))}\)

$$\begin{aligned} \mathcal {L}=E_{\prod _k \mathcal {N}(\mathbf U _{:,k}|\hat{\varvec{U}}_{:,k},\hat{\varvec{\varSigma }}_{:,:,k})}[\log ( p_{\bar{\theta }}(\mathbf U |\mathbf Z ))]+\log ( p(\mathbf Z ))+constant \end{aligned}$$
(8)

Realizing that a complete optimization of \(\mathcal {L}\) with respect to \(\mathbf Z \) would be expensive, we instead take a few gradient descent steps towards the optimum. The gradient of the second term is analytically available. The gradient of the first term is calculated by backpropagation through the decoder network.

The EM steps iterate until convergence, at which we obtain both the MAP value of \(\mathbf Z \) and the posterior distribution of \(\mathbf U \) conditioned on \(\mathbf Z \) and \(\mathbf Y \).

Fig. 2.
figure 2

Examples of TMP signals generated by samples from two different distributions: Left- marginalized posterior density encoded by the VAE; Right- isotropic Gaussian.

4 Results

Synthetic Experiments: Synthetic experiments are carried out on two image-derived human heart-torso models. On each heart, the VAE is trained using around 850 simulated TMP signals considering approximately 50 different origins of ventricular activation in combination with 17 different tissue property configurations. As an initial study, here we focus on tissue properties representing local regions of myocardial scars with varying sizes and locations.

The presented method incorporating the trained VAE model is then tested on simulated 120-lead ECG data from three different settings, each with 20 experiments. The three settings include (1) presence of myocardial scar not included in training data, (2) origin of ventricular activation different from those used in training, and (3) both myocardial scar and activation origin not seen in training. In all experiments, the performance of the presented method is compared to 0-order Tikhonov regularization with temporal constraint (Greensite method) [5] and conventional EP model constrained inference with fixed parameters [12].

The reconstruction accuracy is measured with three metrics: (1) normalized RMSE given by the ratio of Frobenius norm of the error matrix to that of the truth TMP matrix, (2) Euclidean distance between the reconstructed and true origins of ventricular activation, and (3) Dice coefficient of the reconstructed \(S_1\) and true regions of scar \(S_2\) as = 2\(|S_1\cap S_2|\)/(\(|S_1| + |S_2|\)). In the two physiologically constrained methods, region of scar is defined based on absence or delay of activation and shortening of action potential duration; in Greensite method, since the reconstructed signal no longer preserves the temporal shape of TMP, the region of scar is defined based on the peak amplitude of the signal.

Computational Cost: Training of the VAE takes approximately 40 h on a 4 GB Nvidia Quadro P1000 GPU. Generation of training data for each heart takes about 7 h and inference around 30 min on Quadcore CPU.

TMP Generation: Fig. 2 shows examples of local TMP signals generated by the trained VAE decoder against TMP signals simulated by the AP model [1]. Note that, when generating from a isotropic Gaussian (Fig. 2 right), noisy rather than meaningful TMP signals may also be generated. In comparison, when sampling from the approximated posterior distribution of \(\mathbf Z \) as described in Eq. (6), the generated signals closely resemble the simulated TMP signals.

Fig. 3.
figure 3

Snapshots of early TMP pattern reconstructed by the three methods in comparison to the ground truth. The origin of activation is noted on the left in each row.

Fig. 4.
figure 4

Spatial distributions of scar tissues and temporal TMP signals obtained by the three methods in comparison to the ground truth.

Table 1. Quantitative accuracy of the three methods in three settings. Test data is simulated with (1) Top: scar not in VAE training, (2) Middle: activation origin not in training, (3) Bottom: both myocardial scar and activation origin not in training.

Imaging TMP from Various Origins: Fig. 3 shows a snapshot from the early stage of ventricular activation reconstructed by the three methods in comparison to the ground truth. Since the EP model constrained approach assumes general sinus-rhythm activation, it introduces model error that incorrectly dominates the results. The simple Greensite method, free from erroneous model assumption, actually does a better job in comparison. By adapting model generative factors to the data, the presented method demonstrates a significantly improved ability to reconstruct TMP sequence resulting from unknown origins.

Imaging TMP at the Presence of Myocardial Scar: Fig. 4 shows the spatial distribution of scar tissue obtained by the three different methods, along with temporal TMP signals reconstructed in healthy and scar regions, in comparison to the ground truth. Without prior physiological knowledge, the Greensite method is not able to preserve the temporal TMP shape, resulting in high RMSE error as shown in Table 1. By thresholding the maximum amplitude of the reconstructed signals, the identified region of scar has high false positives and resembles poorly with the ground truth. The EP model constrained approach does a better job in retaining the temporal TMP shape. However, without prior knowledge about the scar, the model error again affects the accuracy of TMP reconstruction, especially in the early stage of activation when a smaller amount of ECG data is available for correcting the model error. The presented method, in comparison, is able to recognize the presence of scar tissue, adapting the physiological constraint for improved TMP reconstructions and scar identifications.

Summary: Table 1 summarizes the quantitative comparison of the three methods tested in the three settings as described earlier. Although the test cases were not seen by the VAE during training, the proposed method shows a significant improvement in inverse reconstruction (paired t-test, p < 0.001) when compared with the other two methods in all settings and metrics except with Euclidean distance using Greensite method, where improvement is only marginal. It shows the importance of physiological knowledge and its adaptation to observed data during model-constrained inference.

Fig. 5.
figure 5

Real-data experiments: regions of scar tissues identified by the presented method and conventional EP model constrained method, in comparison to bipolar voltage data (red: scar core; green: scar border; purple: healthy tissue).

Real Data Experiments: Two case studies are performed on real-data from patients who underwent catheter ablation due to scar-related ventricular arrhythmia. Spatiotemporal TMP is reconstructed from 120-lead ECG data using the presented method and the EP model constrained method. In Fig. 5, scar regions (red regions with low voltage) identified from the reconstructed TMP are compared with scar regions (red regions) in the in-vivo bipolar voltage data. In both cases, while the scar tissue identified by two methods are generally in similar locations, the presented method shows less false positives and higher qualitative consistency with bipolar voltage maps.

5 Discussion and Conclusions:

To our knowledge, this is the first work that integrates a generative network learned from numerous examples into a statistical inference framework to allow the adaptation of prior physiological knowledge via a small number of generative factors. The results show the ability of this concept to improve model-constrained inference. Since the present formulation is in a personalized setting, we intend to extend this architecture to learn a geometry-invariant generative model that can be trained on multiple heart models and applied on a new subject.