
1 Introduction

Evidence of the benefits of point-of-care ultrasound (POCUS) continues to grow. For instance, ultrasound provides emergency physicians with access to real-time clinical information that can help reduce time to diagnosis [1]. Time is always a precious resource in the emergency department. Fast and accurate ultrasound examinations, particularly of the heart, can help avoid severe complications or accelerate transfer of the patient to specialized departments for more thorough cardiac evaluation. Typically, emergency physicians perform a focused cardiac ultrasound (FOCUS) to assess the presence of pericardial effusion and tamponade [2], left ventricular ejection, ventricular equality, exit (aortic root diameter), and entrance (inferior vena cava diameter and respirophasic variation) [3]. Clinicians typically use one or more of three imaging windows and five views: parasternal long-axis (PLAX), parasternal short-axis (PSSA), apical four-chamber (A4C), subcostal long-axis (SCLA), and subcostal four-chamber (SC4C). Additionally, an apical two-chamber (A2C) view may be used to evaluate all parts of the myocardium. Due to time constraints in the emergency department, a diagnosis is usually made from two of the five views, if patient mobility and habitus allow. Finding these target views is particularly challenging for untrained physicians; it typically requires significant training and experience.

Scanning Assistant:

To assist less experienced physicians in performing rapid echocardiographic assessment and to improve the use of ultrasound in emergency care, we propose an acquisition guidance system (Fig. 1a–b) that enables accurate placement of the ultrasound probe at the correct position and orientation with respect to the heart anatomy. An intuitive user interface (Fig. 1a) provides acquisition assistance at all three commonly used imaging windows (apical, parasternal, and subcostal) and for the majority of target views specified by the FOCUS protocol [1], including PLAX, PSSA, A4C, A2C, and SC4C. Importantly, this navigation system is solely image-based and does not rely on any external tracking devices.

Fig. 1. User interface of the scanning assistant (a) tested on a human subject (b) using a commercially available mobile ultrasound system (Lumify, Philips). A key feature of the system is that it provides feedback in addition to probe motion guidance (c). For instance, during scanning the physician might lose acoustic coupling or position the probe directly on the rib cage; the system then informs the user that image quality needs to be improved before further guidance can be provided. Additionally, the scanning assistant detects the current imaging window to guide the user through the exam workflow and adjusts imaging settings, such as penetration depth, accordingly.

Deep Learning for Ultrasound:

Image-based guidance in transthoracic echocardiography is non-trivial due to the likely presence of reverberation clutter, acoustic shadowing, cardiac and respiratory motion, as well as patients' anatomical and physiological variability. Deep convolutional neural networks (CNNs) can be trained to extract high-level features with large spatial context, making them applicable to such complex problems. Consequently, deep learning has significant advantages over standard machine learning methods: previous methods developed for ultrasound images that required manual selection of features [4] were recently outperformed by deep learning in tasks such as view classification [5, 6] and segmentation [7].

Here we propose a fully end-to-end solution based on a multi-task CNN model, which (a) assesses whether the image quality is sufficient for guidance, (b) identifies one of three typical imaging windows (apical, parasternal, or subcostal), and (c) predicts the motion of the transducer towards the desired imaging plane (see Fig. 1c).
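The following minimal sketch (not the authors' implementation) illustrates how these three outputs could drive a single guidance step; the `model` object, the output ordering, and the placement of the low-quality class are assumptions for illustration only.

```python
import numpy as np

QUALITY_THRESHOLD = 0.5            # assumed inclusion threshold (cf. Fig. 3)
WINDOWS = ["apical", "parasternal", "subcostal"]

def guidance_step(model, frame):
    """Run one guidance iteration on a single pre-processed ultrasound frame."""
    rotation, translation, class_probs = model.predict(frame[np.newaxis, ...])
    # The last class is assumed here to be the low-quality (LQ) label.
    if class_probs[0, -1] > QUALITY_THRESHOLD:
        return {"status": "improve_image_quality"}   # e.g. poor coupling, rib shadow
    window = WINDOWS[int(np.argmax(class_probs[0, :-1]))]
    return {
        "status": "guide",
        "window": window,                        # used to adjust depth / workflow step
        "rotation_to_target": rotation[0],       # 3-DOF rotation towards the target view
        "translation_to_target": translation[0], # 3-DOF translation towards the target view
    }
```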

Key Contributions:

To the best of the authors' knowledge, this paper is the first to propose a solely image-based user guidance system for point-of-care transthoracic echocardiography that is ready to be deployed on commercial mobile ultrasound scanners, such as the Philips Lumify. This fully end-to-end solution uses a multi-task deep convolutional neural network to predict the relative motion of the transducer towards diagnostically relevant views, to assess image quality, and to identify one of the commonly used imaging windows. The key contributions include:

  • a new technique dedicated to transthoracic echocardiography to achieve entirely image-based navigation with millimeter-level accuracy,

  • a method that quantitatively guides the positioning of the transducer at five target views (PLAX, PSSA, A4C, A2C, and SC4C) from three different imaging windows (apical, parasternal, and subcostal),

  • a new light-weight multi-task deep convolutional neural network architecture that regresses both 3-DOF rotation and 3-DOF translation, and classifies ultrasound images by quality and imaging window,

  • a solution with potential clinical deployment on a mobile device or similar hardware with limited memory and computing capabilities.

2 Methods

2.1 Data Collection and Labeling

All datasets were obtained from healthy human subjects (N = 30) by a well-trained sonographer using a commercial handheld, mobile, USB-based ultrasound system (Lumify, Philips). Each loop consisted of a large number of frames. These frames were acquired at all three imaging windows (apical, parasternal, and subcostal) and covered all five views defined in the FOCUS protocol [3] (i.e. PLAX, PSSA, A4C, A2C, and SC4C); see Fig. 2c. Each cardiac ultrasound frame within a dataset was automatically labelled using a custom-made data acquisition system based on optical tracking. A schematic representation of our acquisition system is shown in Fig. 2a. For ease of annotation, each acquisition was started at one of three reference target views (A4C, PLAX, SC4C), which implicitly defined three standardized coordinate systems, one per acoustic window. Positions of the remaining frames were determined relative to these coordinate systems. For simplicity, guidance accuracy was evaluated only for these reference views. The remaining target views (A2C, PSSA) were identified by an expert echocardiographer. Importantly, optical tracking was used only to collect the ground-truth data and never during the application of the system.

Fig. 2. (a) A schematic representation of the acquisition system. A rigid probe marker consisting of retroreflective spheres is mounted and calibrated to the ultrasound probe. An additional patient marker is attached to the patient's chest using an adjustable belt to account for unexpected patient motion. Images from the portable ultrasound device are acquired and synchronized with the optical tracking system, which estimates the 6-DOF poses of both the probe and the patient marker; (b) Coordinate system of the ultrasound probe; (c) Target views defined by the FOCUS protocol towards which our algorithm can guide the user. From left: four-chamber (A4C) and two-chamber (A2C) views from the apical imaging window, short-axis (PSSA) and long-axis (PLAX) views from the parasternal imaging window, and the subcostal four-chamber (SC4C) view. An example of a low-quality (LQ) frame is also provided.

Two rigid markers, each consisting of four retroreflective spheres (NDI Medical), were attached to the ultrasound probe via a custom-made adapter and to the patient's chest using an adjustable belt (see Fig. 2b). The transformation between probe and image (\( ^{\text{probe}}T_{\text{image}} \)) was obtained using a custom-made wire-based ultrasound phantom, similar to the one described in [8]. A patient marker, which establishes the heart coordinate system, was used to account for unexpected motion of the heart with respect to the tracking system. The poses \( T \in SE(3) \) of both the probe (\( ^{\text{tracker}}T_{\text{probe}} \)) and the patient marker (\( ^{\text{tracker}}T_{\text{patient}} \)) were estimated via a stereoscopic optical camera (Polaris Vega, NDI Medical) and synchronized with the ultrasound images acquired with the portable ultrasound device. All images were then labelled with 3D rigid transformations calculated relative to the reference image in the heart coordinate system:

$$ ^{\text{patient}}T_{\text{image}} = \left( ^{\text{tracker}}T_{\text{patient}} \right)^{-1} \cdot {}^{\text{tracker}}T_{\text{probe}} \cdot {}^{\text{probe}}T_{\text{image}} $$
(1)
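As an illustration, Eq. (1) amounts to a single composition of homogeneous transforms per frame; the sketch below (not the authors' code) assumes 4x4 NumPy matrices whose names mirror the notation above.

```python
import numpy as np

def label_frame(T_tracker_patient, T_tracker_probe, T_probe_image):
    """Return patient_T_image = (tracker_T_patient)^-1 . tracker_T_probe . probe_T_image (Eq. 1)."""
    return np.linalg.inv(T_tracker_patient) @ T_tracker_probe @ T_probe_image
```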

An expert echocardiographer identified all low-quality (LQ) frames in each dataset. We considered LQ images to be those with poor acoustic coupling or containing organs other than the heart. The remaining frames, covering the three different imaging windows, were considered high-quality (HQ), i.e. sufficient for our algorithm to extract features and make predictions. Stored datasets were divided into two separate sets: (a) a development dataset (N = 27 subjects; 590,000 frames), from which 80% of cases were randomly chosen to train the weights of the CNN and 20% for validation, and (b) a test dataset (N = 3 subjects; 21,000 frames, including 10,000, 7,000, 1,500, and 2,500 frames for the apical, parasternal, subcostal, and LQ classes respectively) consisting of data points the model was not trained on. The accuracy of the algorithm was evaluated only on unseen test cases in order to determine the generalizability of the model. Tracking accuracy was evaluated only on HQ frames by calculating the average absolute angular error along each axis (rotation) and the mean absolute distance (translation) to the target. Classification performance was assessed by the area under the receiver operating characteristic (ROC) curve (AUC).
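One plausible implementation of these evaluation metrics is sketched below; computing the per-axis angular error via an Euler-angle decomposition of the residual rotation is an assumption rather than the authors' exact procedure.

```python
import numpy as np
from scipy.spatial.transform import Rotation
from sklearn.metrics import roc_auc_score

def angular_errors_deg(R_pred, R_true):
    """Absolute per-axis angular error (degrees) of the residual rotation."""
    residual = Rotation.from_matrix(R_pred.T @ R_true)
    return np.abs(residual.as_euler("xyz", degrees=True))

def translation_error_mm(t_pred, t_true):
    """Absolute Euclidean distance between predicted and true translation (mm)."""
    return np.linalg.norm(t_pred - t_true)

def per_class_auc(y_true, y_prob):
    """One-vs-rest AUC per class (cf. Fig. 3a); y_prob holds per-frame softmax outputs."""
    return [roc_auc_score((y_true == c).astype(int), y_prob[:, c])
            for c in range(y_prob.shape[1])]
```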

2.2 Model Development

The primary feature extractor was a SqueezeNet-style CNN with eight so-called fire modules followed by one convolutional layer and global average pooling [9]. This architecture was designed for limited-memory systems and provides high energy efficiency on mobile devices [10]. The primary CNN simultaneously predicts rotation and translation for all five target views and classifies the three acoustic windows, thus sharing features among all these tasks. For the rotation and translation tasks, we added two separate regression layers with a π·tanh activation function after the primary feature extractor, as described in [11, 12]. For the classification task, a softmax classification layer was added after global average pooling. The total loss function was defined as:

$$ \text{loss}_{\text{total}} = \lambda \cdot \text{loss}_{\text{rotation}} + \alpha \cdot \text{loss}_{\text{translation}} + \gamma \cdot \text{loss}_{\text{classification}} $$
(2)
$$ \text{loss}_{\text{rotation}} = \cos^{-1}\left[ \frac{\operatorname{tr}\left( \hat{R}^{T} R \right) - 1}{2} \right] $$
(3)
$$ \text{loss}_{\text{translation}} = \frac{1}{N}\sum\nolimits_{i = 1}^{N} \left( \hat{t}_{i} - t_{i} \right)^{2} \left| \left\langle \frac{\hat{t}}{\left\| \hat{t} \right\|}, \frac{t}{\left\| t \right\|} \right\rangle - 1 \right| $$
(4)
$$ \text{loss}_{\text{classification}} = - \sum\nolimits_{c = 1}^{M} y_{c} \log\left( p_{c} \right) $$
(5)

where \( \lambda, \alpha, \gamma \) are hyperparameters used to balance the rotation, translation, and classification losses respectively; \( R \in SO(3) \) is a rotation matrix; \( t = \left[ t_{1} \ldots t_{N} \right]^{T} \), with \( N = 3 \), is the translation vector; \( \left\langle \cdot , \cdot \right\rangle \) denotes the inner product of two vectors; and \( \text{loss}_{\text{classification}} \) is the cross-entropy loss for the multi-class classification task with \( M = 4 \) classes.
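A TensorFlow sketch of Eqs. (2)–(5) is given below as one possible reading of the loss terms; the conversion of the 3-DOF rotation output to a rotation matrix, and the clipping added for numerical safety, are assumptions.

```python
import tensorflow as tf

def rotation_loss(R_pred, R_true):
    # Geodesic distance between rotation matrices, Eq. (3).
    trace = tf.linalg.trace(tf.matmul(R_pred, R_true, transpose_a=True))
    cos_angle = tf.clip_by_value((trace - 1.0) / 2.0, -1.0, 1.0)  # numerical safety
    return tf.reduce_mean(tf.acos(cos_angle))

def translation_loss(t_pred, t_true):
    # Squared error weighted by the directional-mismatch term of Eq. (4).
    mse = tf.reduce_mean(tf.square(t_pred - t_true), axis=-1)
    cos_sim = tf.reduce_sum(tf.math.l2_normalize(t_pred, axis=-1) *
                            tf.math.l2_normalize(t_true, axis=-1), axis=-1)
    return tf.reduce_mean(mse * tf.abs(cos_sim - 1.0))

def total_loss(R_pred, R_true, t_pred, t_true, y_true, y_prob,
               lam=1.0, alpha=1.0, gamma=1.0):
    # Weighted sum of Eq. (2); the classification term is cross entropy, Eq. (5).
    classification = tf.reduce_mean(
        tf.keras.losses.categorical_crossentropy(y_true, y_prob))
    return (lam * rotation_loss(R_pred, R_true) +
            alpha * translation_loss(t_pred, t_true) +
            gamma * classification)
```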

The model was trained in TensorFlow using the RMSprop optimizer with a batch size of 32 and an initial learning rate of 0.0001, decayed every 4.7M iterations with an exponential rate of 0.5. Batch normalization, a weight decay of 0.0005, and early stopping were used as regularization techniques. All loss-function hyperparameters \( (\lambda, \alpha, \gamma) \) were set to 1. Ultrasound images were converted from Cartesian to polar space and randomly augmented during training using various ultrasound-specific techniques, including injection of reverberation clutter, alteration of penetration depth, changes of gain and aspect ratio, as well as post-modification of the TGC curve.
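The sketch below assembles the pieces described in this section into a Keras/TensorFlow 2.x model and training configuration; the layer widths, pooling placement, rotation parameterization, and the use of built-in losses as stand-ins for Eqs. (3)–(5) are illustrative assumptions, not the exact configuration used in this work.

```python
import math
import tensorflow as tf
from tensorflow.keras import layers, regularizers

L2 = regularizers.l2(0.0005)  # "weight decay of 0.0005"

def fire_module(x, squeeze, expand):
    s = layers.Conv2D(squeeze, 1, activation="relu", kernel_regularizer=L2)(x)
    e1 = layers.Conv2D(expand, 1, activation="relu", kernel_regularizer=L2)(s)
    e3 = layers.Conv2D(expand, 3, padding="same", activation="relu",
                       kernel_regularizer=L2)(s)
    return layers.Concatenate()([e1, e3])

def build_model(input_shape=(224, 224, 1), n_classes=4):
    inputs = tf.keras.Input(shape=input_shape)
    x = layers.Conv2D(64, 3, strides=2, activation="relu")(inputs)
    x = layers.BatchNormalization()(x)
    x = layers.MaxPooling2D(3, strides=2)(x)
    for squeeze, expand in [(16, 64)] * 2 + [(32, 128)] * 2 + \
                           [(48, 192)] * 2 + [(64, 256)] * 2:   # eight fire modules
        x = fire_module(x, squeeze, expand)
    x = layers.Conv2D(512, 1, activation="relu")(x)
    features = layers.GlobalAveragePooling2D()(x)

    # pi*tanh bounds the 3-DOF rotation output to (-pi, pi), as described above.
    rotation = layers.Dense(3, activation=lambda t: math.pi * tf.tanh(t),
                            name="rotation")(features)
    translation = layers.Dense(3, name="translation")(features)
    window = layers.Dense(n_classes, activation="softmax", name="window")(features)
    return tf.keras.Model(inputs, [rotation, translation, window])

model = build_model()
lr_schedule = tf.keras.optimizers.schedules.ExponentialDecay(
    initial_learning_rate=1e-4, decay_steps=4_700_000, decay_rate=0.5)
model.compile(
    optimizer=tf.keras.optimizers.RMSprop(learning_rate=lr_schedule),
    # Built-in losses stand in here for the custom terms sketched above.
    loss={"rotation": "mse", "translation": "mse",
          "window": "categorical_crossentropy"},
    loss_weights={"rotation": 1.0, "translation": 1.0, "window": 1.0})
```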

3 Results

In the vicinity of the target views (see Fig. 2c), the average absolute angular accuracy was 2.5 ± 1.4°, 2.4 ± 1.8°, and 5.5 ± 5.0° around the x, y, z axes respectively (see Table 1).

Table 1. Accuracy of the system as a function of distance \( d \) and angle \( a_{i} \) around each axis (x, y, z), where \( i \in \left\{ {1 \ldots 3} \right\} \); \( \bar{a} \) represents the average absolute angular error, and \( \bar{d} \) represents the absolute translation error with respect to three target views: A4C, PLAX, and SC4C.

The average absolute translation accuracy was 2.0 ± 1.6 mm. The overall accuracy decreased as the distance to the target position increased. For instance, the predicted translational error measured at distances above 20 mm from the target view was significantly higher (p < 0.0001, unpaired two-tailed t-test) than below 5 mm (5.6 ± 4.7 mm vs. 2.0 ± 1.6 mm respectively). The average classification accuracy was 98% and 89% for imaging-window identification and low-quality frame detection respectively (see Fig. 3b–c). ROC curves with associated AUCs are shown in Fig. 3a.
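For reference, the reported comparison corresponds to a standard unpaired, two-tailed t-test; the sketch below uses randomly generated placeholder error arrays (parameterized by the reported means and standard deviations) purely for illustration.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
errors_near = rng.normal(2.0, 1.6, size=500).clip(min=0)  # within 5 mm (placeholder data)
errors_far = rng.normal(5.6, 4.7, size=500).clip(min=0)   # beyond 20 mm (placeholder data)

t_stat, p_value = stats.ttest_ind(errors_far, errors_near)  # unpaired, two-tailed by default
print(f"t = {t_stat:.2f}, p = {p_value:.2e}")
```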

Fig. 3. (a) Pairwise comparison of receiver operating characteristic (ROC) curves for the classification tasks, i.e. each class is shown with respect to the other classes. The ROC curves are similar, with AUCs ranging from 0.98 to 0.99 for the apical (A), subcostal (S), and parasternal (P) windows (mean 0.97); (b) A confusion matrix for identification of low-quality (LQ) frames with respect to high-quality (HQ) frames shows accuracies ranging from 83% to 95% for LQ and HQ respectively, where HQ comprises the apical, parasternal, and subcostal views; (c) A confusion matrix for imaging-window classification shows accuracies ranging from 97% to 99% for the subcostal (S), parasternal (P), and apical (A) views respectively. The confusion matrices were normalized by the number of cases in each class. The inclusion threshold was set to 0.5.

4 Conclusion and Future Work

A solely image-based scan guidance system for point-of-care transthoracic echocardiography was developed and evaluated on unseen, independent in vivo datasets. Our deep learning-based algorithm was trained using a multi-task learning paradigm. A single neural network was used to (a) detect and exclude ultrasound frames whose quality is not sufficient for guidance, (b) identify one of three typical imaging windows (apical, parasternal, or subcostal) to guide the user through the exam workflow, and (c) predict 6-DOF motion of the transducer towards clinically relevant views, such as the four-chamber or long-axis views. Finding an optimal acoustic window to image the heart can be challenging, especially for technically difficult patients; our system could accelerate this phase of the examination by providing an objective measure of image quality, and herein we demonstrated 95% accuracy for high-quality image classification. Moreover, we demonstrated that the ultrasound probe could be guided to three pre-defined reference target views with an average rotational accuracy of 3.3 ± 2.6° when the probe was close to the target (<5 mm). The lowest rotational accuracy was observed around the z-axis, mostly because angles about this axis have the largest span, ranging from 0 to π. This accuracy may be sufficient to perform all assessments relevant in the acute/emergency setting, including detecting a pericardial effusion, assessing left ventricular ejection and ventricular equality, and recognizing cardiac arrest. We noticed that the overall system accuracy decreased with distance to the target: for instance, accuracy decreased to 5.4 ± 4.2° and 5.6 ± 4.7 mm for rotation and translation respectively when the distance to the target exceeded 20 mm. This behavior could be attributed to the smaller coverage of these regions by the training instances. Because the probe position is adjusted in a step-wise, iterative manner, this behavior is not considered a limitation of our approach. In the future, incorporating a series of previous predictions via recurrent layers, such as Long Short-Term Memory (LSTM) units, could further enhance accuracy away from the target location [6].

In addition, our CNN architecture had only 1.2M parameters and required 5 MB of storage memory. Hence, our method could be readily deployed on commercial portable ultrasound systems. Initial tests on a premium mobile device, with TensorFlow Lite and hardware acceleration enabled, demonstrated an average frame rate of 25 Hz.
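As an illustration of such a deployment path, the sketch below converts a trained Keras model to TensorFlow Lite; `model` refers to the trained network (e.g. from the sketch in Sect. 2.2) and the output file name is a placeholder.

```python
import tensorflow as tf

# `model` is the trained Keras model; the file name below is a placeholder.
converter = tf.lite.TFLiteConverter.from_keras_model(model)
converter.optimizations = [tf.lite.Optimize.DEFAULT]   # e.g. post-training quantization
tflite_model = converter.convert()
with open("cardiac_guidance.tflite", "wb") as f:
    f.write(tflite_model)
```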

Despite promising results, the main limitations of this study are the small training dataset size and the inclusion of only healthy subjects, which may limit the performance of the algorithm for technically difficult patients or patients with abnormal physiological conditions. Further work will include adding data from a larger number of subjects, including patients with impaired cardiac function.