Abstract
We introduce a deep learning image segmentation framework that is extremely robust to missing imaging modalities. Instead of attempting to impute or synthesize missing data, the proposed approach learns, for each modality, an embedding of the input image into a single latent vector space for which arithmetic operations (such as taking the mean) are well defined. Points in that space, which are averaged over modalities available at inference time, can then be further processed to yield the desired segmentation. As such, any combinatorial subset of available modalities can be provided as input, without having to learn a combinatorial number of imputation models. Evaluated on two neurological MRI datasets (brain tumors and MS lesions), the approach yields state-of-the-art segmentation results when provided with all modalities; moreover, its performance degrades remarkably gracefully when modalities are removed, significantly more so than alternative mean-filling or other synthesis approaches.
1 Introduction
In medical image analysis, image segmentation is an important task, essential for visualizing and quantifying the severity of a pathology in clinical practice. Multi-modality imaging provides complementary information to discriminate specific tissues, anatomies and pathologies. Numerous automatic approaches have been developed to speed up medical image segmentation, such as multi-atlas-based approaches [4] and model-based approaches [12].
Both strategies are typically optimized for a specific set of multi-modal images and usually require these modalities to be available. In clinical settings, image acquisition and patient artifacts, among other hurdles, make it difficult to fully exploit all the modalities; as such, it is common for one or more modalities to be missing for a given instance. This problem is not new, and the subject of missing data analysis has spawned an immense literature in statistics (e.g. [13]). In medical imaging, a number of approaches have been proposed, some of which require re-training a specific model with the missing modalities or synthesizing them [6]. Synthesis can improve multi-modal classification by adding information about the missing modalities in the context of a simple classifier such as random forests [11]. Approaches that imitate, with fewer features, a classifier trained on a complete set of features have also been proposed [7]. Nevertheless, it stands to reason that a sufficiently expressive model should be capable of extracting relevant features from just the available modalities, without relying on artificial intermediate steps such as imputation or synthesis.
This paper proposes a deep learning framework (HeMIS) that can segment medical images from incomplete multi-modal datasets. Deep learning [3] has become increasingly popular in medical image processing, both for segmentation and for synthesizing missing modalities [11]. Here, the proposed approach learns, separately for each modality, an embedding of the input image into a latent space. In this space, arithmetic operations (such as computing first and second moments of a collection of vectors) are well defined and can be taken over the different modalities available at inference time. These computed moments can then be further processed to estimate the final segmentation. This approach has the advantage of being robust to any combinatorial subset of available modalities provided as input, without the need to learn a combinatorial number of imputation models.
2 Method
2.1 Hetero-Modal Image Segmentation
Typical convolutional neural network (CNN) architectures take a multiplane image as input and process it through a sequence of convolutional layers (followed by nonlinearities such as \(\text {ReLU}(\cdot ) \equiv \max (0,\cdot )\)), alternating with optional pooling layers, to yield a per-pixel or per-image output [3]. In such networks every input plane is assumed to be present within a given instance: since the very first convolutional layer mixes input values coming from all planes, any missing plane introduces a bias in the computation that the network is not equipped to deal with.
We propose an approach wherein each modality is initially processed by its own convolutional pipeline, independently of all others. After a few independent stages, feature maps from all available modalities are merged by computing mapwise statistics such as the mean and the variance, quantities whose expectation does not depend on the number of terms (i.e. modalities) that are provided. After merging, the mean and variance feature maps are concatenated and fed into a final set of convolutional stages to obtain the network output. This is illustrated in Fig. 1. In this procedure, each modality contributes a separate term to the mean and variance; in contrast to a vanilla CNN architecture, a missing modality does not throw this computation off: the mean and variance terms are simply estimated with larger uncertainty. In seeking to be robust to any subset of missing modalities, we call this approach hetero-modal rather than multi-modal, recognizing that in addition to taking advantage of several modalities, it can take advantage of a diverse, instance-varying, set of modalities. In particular, it does not require that a “least common denominator” modality be present for every instance, as sometimes needed by common imputation methods.
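As a concrete illustration of this invariance (not part of the original text), the following NumPy sketch computes the mapwise mean and variance over whichever modality feature maps happen to be available; the output has the same shape whether four modalities or a single one are present. Shapes and feature-map counts are illustrative assumptions.

```python
import numpy as np

# Illustrative shapes: each modality contributes 48 feature maps of size 64 x 64.
all_maps = [np.random.randn(48, 64, 64) for _ in range(4)]

def fuse(available_maps):
    """Mapwise first and second moments over the modalities that are present."""
    stack = np.stack(available_maps, axis=0)      # (K, n_maps, H, W)
    mean = stack.mean(axis=0)
    # Unbiased variance; defined as zero when a single modality is available.
    var = (stack.var(axis=0, ddof=1) if len(available_maps) > 1
           else np.zeros_like(mean))
    return np.concatenate([mean, var], axis=0)    # (2 * n_maps, H, W)

print(fuse(all_maps).shape)      # (96, 64, 64) with all four modalities
print(fuse(all_maps[:1]).shape)  # (96, 64, 64) with a single modality as well
```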
Let \(k \in \mathcal {K} \subseteq \{1, \ldots , N\}\) denote a modality within the set of available modalities for a given instance, and let \(M_k\) represent the image of the k-th modality. For simplicity, in this work we assume 2D data (e.g. a single slice of a tomographic image), but the approach extends in an obvious way to full 3D volumes. As shown in Fig. 1, HeMIS proceeds in three stages:
1. Back End: In our implementation, this consists of two convolutional layers with ReLU, the second followed by a (2, 2) max-pooling layer, denoted respectively \(C_k^{(1)}\) and \(C_k^{(2)}\). To ensure that the output layer has the same number of pixels as the input image, the convolutions are zero-padded and the stride for all operations (including max-pooling) is 1. In particular, pooling with a stride of 1 does not downsample, but simply “thickens” the feature maps; this is found to add some robustness to the results. The number of feature maps in each layer is given in Fig. 1. Let \(C_{k,\ell }^{(j)}\) be the \(\ell \)-th feature map of \(C_k^{(j)}\).
2. Abstraction Layer: Modality fusion is computed here, as first and second moments across the available modalities in \(C^{(2)}\), separately for each feature map \(\ell \):
\[ \widehat{\mathrm {E}}_{\ell }\left[ C^{(2)}\right] = \frac{1}{|\mathcal {K}|} \sum _{k \in \mathcal {K}} C^{(2)}_{k,\ell }, \qquad \widehat{\mathrm {Var}}_{\ell }\left[ C^{(2)}\right] = \frac{1}{|\mathcal {K}|-1} \sum _{k \in \mathcal {K}} \left( C^{(2)}_{k,\ell } - \widehat{\mathrm {E}}_{\ell }\left[ C^{(2)}\right] \right) ^2 , \]
with \(\widehat{\mathrm {Var}}_{\ell }[C^{(2)}]\) defined to be zero if \(|\mathcal {K}|=1\) (a single available modality).
3. Front End: Finally, the front end combines the merged modalities to produce the final model output. In our implementation, we concatenate all \(\widehat{\mathrm {E}}\left[ C^{(2)}\right] \) and \(\widehat{\mathrm {Var}}\left[ C^{(2)}\right] \) feature maps, pass them through a convolutional layer \(C^{(3)}\) with ReLU activation, and finish with a final layer \(C^{(4)}\) that has as many feature maps as there are target segmentation classes. The pixelwise posterior class probabilities are given by applying a softmax function across the \(C^{(4)}\) feature maps, and a full image segmentation is obtained by taking the pixelwise most likely posterior class. No further postprocessing on the resulting segment classes (such as smoothing) is done. A minimal code sketch of the full three-stage pipeline is given after this list.
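Below is a minimal PyTorch sketch of the three-stage pipeline just described. The overall structure (per-modality back ends, moment-based fusion, shared front end) follows the text above, but kernel sizes, feature-map counts, the padding scheme and the class count are illustrative assumptions rather than the exact settings of Fig. 1.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class BackEnd(nn.Module):
    """Per-modality back end: two zero-padded conv+ReLU layers, then (2,2) max-pooling with stride 1."""
    def __init__(self, n_feats=48):
        super().__init__()
        self.conv1 = nn.Conv2d(1, n_feats, kernel_size=5, padding=2)
        self.conv2 = nn.Conv2d(n_feats, n_feats, kernel_size=5, padding=2)

    def forward(self, x):                                # x: (B, 1, H, W)
        h = F.relu(self.conv1(x))
        h = F.relu(self.conv2(h))
        h = F.pad(h, (0, 1, 0, 1))                       # zero-pad so pooling preserves H x W
        return F.max_pool2d(h, kernel_size=2, stride=1)  # (B, n_feats, H, W)

def fuse(feature_maps):
    """Abstraction layer: mapwise mean and variance over the available modalities."""
    stack = torch.stack(feature_maps, dim=0)             # (K, B, n_feats, H, W)
    mean = stack.mean(dim=0)
    var = (stack.var(dim=0, unbiased=True) if stack.shape[0] > 1
           else torch.zeros_like(mean))                  # variance defined as 0 if |K| = 1
    return torch.cat([mean, var], dim=1)                 # (B, 2*n_feats, H, W)

class HeMIS(nn.Module):
    def __init__(self, n_modalities=4, n_feats=48, n_classes=2):
        super().__init__()
        self.backends = nn.ModuleList([BackEnd(n_feats) for _ in range(n_modalities)])
        self.conv3 = nn.Conv2d(2 * n_feats, n_feats, kernel_size=5, padding=2)  # front end
        self.conv4 = nn.Conv2d(n_feats, n_classes, kernel_size=1)

    def forward(self, images, available):
        # images: list of (B, 1, H, W) tensors indexed by modality;
        # available: indices of the modalities present for this instance.
        feats = [self.backends[k](images[k]) for k in available]
        h = F.relu(self.conv3(fuse(feats)))
        return F.softmax(self.conv4(h), dim=1)           # pixelwise class posteriors
```

At inference time the same network handles any subset of modalities: only the `available` index list changes, with no retraining or imputation.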
3 Data and Implementation Details
We studied the HeMIS framework on two neurological pathologies: Multiple Sclerosis (MS) with the MS Grand Challenge (MSGC) and a large Relapsing Remitting MS (RRMS) cohort, as well as glioma with the Brain Tumor Segmentation (BRATS) dataset [8].
MS MSGC: The MSGC dataset [10] provides 20 training MR cases with manual ground truth lesion segmentation and 23 testing cases from the Boston Children’s Hospital (CHB) and the University of North Carolina (UNC). We downloaded the co-registered T1W, T2W, FLAIR images for all 43 cases as well as the ground truth lesion mask images for the 20 training cases. While lesion masks for the 23 testing cases are not available for download, an automated system is available to evaluate the output of a given segmentation algorithm.
RRMS: This dataset is obtained from a multi-site clinical study with 300 RRMS patients (mean age 37.5 yrs, SD 10.0 yrs). Each patient underwent an MRI that included FLAIR, T1W, T2W and T1 post-contrast (T1C) images.
BRATS: The BRATS-2015 dataset contains 220 subjects with high-grade and 54 subjects with low-grade tumors. Each subject comes with four MR modalities (FLAIR, T1W, T1C and T2) and a voxel-level segmentation ground truth with 5 labels: healthy, necrosis, edema, non-enhancing tumor and enhancing tumor. As done in [8], we transform each segmentation map into 3 binary maps corresponding to 3 tumor categories, namely: Complete (which contains all tumor classes), Core (which contains all tumor subclasses except “edema”) and Enhancing (which includes only the “enhancing tumor” subclass). For each binary map, the Dice Similarity Coefficient (DSC) is calculated [8].
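As a worked illustration (not from the paper), the sketch below collapses a 5-label map into the three evaluated binary maps and computes a Dice score for each. The integer label codes used here (1 = necrosis, 2 = edema, 3 = non-enhancing, 4 = enhancing) follow the usual BRATS convention and are an assumption of this sketch.

```python
import numpy as np

def dice(pred, target, eps=1e-8):
    """Dice Similarity Coefficient, DSC = 2|A ∩ B| / (|A| + |B|), between binary masks."""
    intersection = np.logical_and(pred, target).sum()
    return 2.0 * intersection / (pred.sum() + target.sum() + eps)

def tumor_regions(label_map):
    """Collapse a 5-label segmentation into the three evaluated binary maps."""
    complete  = np.isin(label_map, [1, 2, 3, 4])  # all tumor classes
    core      = np.isin(label_map, [1, 3, 4])     # all subclasses except edema
    enhancing = (label_map == 4)                  # enhancing tumor only
    return complete, core, enhancing
```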
BRATS-2013 contains two test datasets: Challenge and Leaderboard. The Challenge dataset contains 10 subjects with high-grade tumors, while the Leaderboard dataset contains 15 subjects with high-grade tumors and 10 subjects with low-grade tumors. No ground truth is provided for these datasets, so quantitative evaluation is carried out via an online evaluation system [8]. In our experiments we used the Challenge and Leaderboard datasets to compare the HeMIS segmentation performance to the state of the art, when trained on all modalities. To deal with class imbalance, we adopt the patch-wise training procedure described in [5]. We make the HeMIS architecture robust to missing modalities by randomly dropping any number of modalities for a given training example. We refer to this training scheme as pseudo-curriculum training.
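The sketch below illustrates one plausible form of this modality-dropping scheme: for each training example, a non-empty random subset of the modalities is kept, and the network fuses only over that subset. The brief warm-up with all modalities present (suggested by the term “curriculum”) and the epoch threshold are assumptions of this sketch, not values stated above.

```python
import random

def sample_available_modalities(n_modalities, epoch, warmup_epochs=2):
    """Indices of the modalities kept for one training example."""
    if epoch < warmup_epochs:                  # assumed warm-up: all modalities present
        return list(range(n_modalities))
    n_keep = random.randint(1, n_modalities)   # drop any number, but keep at least one
    return sorted(random.sample(range(n_modalities), n_keep))
```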
4 Experiments and Results
We first validate HeMIS performance against state-of-the-art segmentation methods on the two challenge datasets: MSGC and BRATS. Since the test data and the ranking table for BRATS 2015 are not available, we submitted results to the BRATS 2013 challenge and leaderboard. These results are presented in Table 1; note that the competing results in Table 1 are from methods entered in the BRATS 2013 challenge, for which a static table is provided at https://www.virtualskeleton.ch/BraTS/StaticResults2013. As we observe, HeMIS outperforms Tustison et al. [12], the winner of the BRATS 2013 challenge, on most tumor region categories.
The MSGC dataset illustrates a direct application of HeMIS flexibility, as only three modalities (T1W, T2W and FLAIR) are provided, for a small training set. Given the small number of subjects, we therefore first trained HeMIS on the RRMS dataset with four modalities and fine-tuned it on MSGC. Our results were submitted to the MSGC website, with a results summary appearing in Table 2. The MSGC segmentation results include three other supervised approaches; compared to them, HeMIS obtains highly competitive results with a combined score of 83.2 %, where 90.0 % would represent human performance given inter-rater variability.
The main advantage of HeMIS lies in its ability to deal with missing modalities, specifically when different subjects are missing different modalities. To illustrate the model’s flexibility in such circumstances, we compare HeMIS to two common approaches for dealing with randomly missing modalities. The first, mean-filling, replaces a missing modality by the modality’s mean value; in our case, since all means are zero by construction, replacing a missing modality by zeros amounts to imputing with the mean. The second approach is to train a multi-layer perceptron (MLP) to predict the expected value of a specific missing modality given the available ones. Since such a network is trained for a unique task, we need to train 28 different MLPs (one for each \(\circ \) in Table 3 for a given dataset) to account for the different possibilities of missing modalities. We used the same MLP architecture for all these models, consisting of 2 hidden layers with 100 hidden units each, trained to minimize the mean squared error.
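For concreteness, here is a hedged sketch of the two baselines. Mean-filling reduces to zero-filling because the inputs are zero-mean by construction; the MLP predicts the missing modality's intensity from the available ones, pixel by pixel. The hidden-layer sizes follow the text above; the framework, function names and input representation are assumptions of this sketch.

```python
import torch
import torch.nn as nn

def mean_fill(images, missing):
    """Mean-filling baseline: replace each missing modality by its (zero) mean image."""
    return [torch.zeros_like(images[0]) if k in missing else images[k]
            for k in range(len(images))]

def make_imputation_mlp(n_available):
    """One MLP per missing-modality configuration: 2 hidden layers of 100 units,
    trained with nn.MSELoss() to predict the missing modality's pixel intensity."""
    return nn.Sequential(
        nn.Linear(n_available, 100), nn.ReLU(),
        nn.Linear(100, 100), nn.ReLU(),
        nn.Linear(100, 1),
    )
```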
Table 3 shows the DSC for this experiment on the test set. On the BRATS dataset, HeMIS achieves the best segmentation for the Core category in almost all cases (14 out of 15), and it leads in most cases for the Complete and Enhancing categories (10 and 9 cases out of 15, respectively). The mean-filling approach hardly ever outperforms HeMIS or MLP-imputation. These results are consistent with the MS lesion segmentation dataset, where HeMIS outperforms the other imputation approaches in 9 out of 15 cases. When only one or two modalities are missing, both HeMIS and MLP-imputation obtain good results, but HeMIS outperforms the latter in most cases on both datasets. On BRATS, when 3 out of 4 modalities are missing, HeMIS outperforms the MLP in a majority of cases. Moreover, whereas HeMIS performance only gradually drops as additional modalities become missing, the performance drop for MLP-imputation and mean-filling is much more severe. On the RRMS cohort, MLP-imputation appears to obtain slightly better segmentations when only one modality is available.
Although it is expected that tumor sub-label segmentations will be less accurate with fewer modalities, we should still hope for the model to report a sensible characterization of the tumor “footprint”. While MLP-imputation and mean-filling fail in this respect, HeMIS achieves this goal well, outperforming the alternatives in almost all cases for the Complete and Core tumor categories. This can also be seen in Fig. 2, where we show how adding modalities to HeMIS improves its ability to achieve a more accurate segmentation. From Table 3, we can also infer that the FLAIR modality is the most relevant for identifying the Complete tumor, while T1C is the most relevant for identifying the Core and Enhancing tumor categories. On the RRMS dataset, HeMIS results are also seen to degrade more slowly than the other imputation approaches, preserving good segmentations as modalities go missing. Indeed, as seen in Fig. 2, even though HeMIS already produces good segmentations with FLAIR alone, it further refines its results as modalities are added, removing false positives and improving the outlines of correctly identified lesions or tumors.
5 Conclusion
We have proposed a new fully automatic segmentation framework for heterogeneous multi-modal MRI using a specialized convolutional deep neural network. The embedding computed by the multi-modal CNN back end allows training on, and segmenting, datasets with missing modalities. We carried out an extensive validation on MS and glioma and achieved state-of-the-art segmentation results on two challenging neurological pathology image processing tasks. Importantly, we showed that the proposed approach degrades gracefully as modalities go missing, in contrast with other popular imputation approaches, and that it achieves this without requiring a specific model to be trained for every potential combination of missing modalities. Future work should concentrate on extending the approach to modalities beyond MRI, such as CT, PET and ultrasound.
References
Brosch, T., Yoo, Y., Tang, L.Y.W., Li, D.K.B., Traboulsee, A., Tam, R.: Deep convolutional encoder networks for multiple sclerosis lesion segmentation. In: Navab, N., Hornegger, J., Wells, W.M., Frangi, A.F. (eds.) MICCAI 2015. LNCS, vol. 9351, pp. 3–11. Springer, Heidelberg (2015). doi:10.1007/978-3-319-24574-4_1
Geremia, E., Menze, B.H., Ayache, N.: Spatially adaptive random forests. In: IEEE International Symposium on Biomedical Imaging (ISBI), pp. 1344–1347 (2013)
Goodfellow, I., Bengio, Y., Courville, A.: Deep learning. MIT Press, Cambridge (2016)
Guizard, N., Coupé, P., Fonov, V.S., Manjón, J.V., Arnold, D.L., Collins, D.L.: Rotation-invariant multi-contrast non-local means for MS lesion segmentation. NeuroImage Clin. 8, 376–389 (2015)
Havaei, M., Davy, A., Warde-Farley, D., Biard, A., Courville, A., Bengio, Y., Pal, C., Jodoin, P.M., Larochelle, H.: Brain tumor segmentation with deep neural networks. arXiv preprint (2015). arXiv:1505.03540
Hofmann, M., Steinke, F., Scheel, V., Charpiat, G., et al.: MRI-based attenuation correction for PET/MRI: a novel approach combining pattern recognition and atlas registration. J. Nucl. Med. 49(11), 1875–1883 (2008)
Hor, S., Moradi, M.: Scandent Tree: a random forest learning method for incomplete multimodal datasets. In: Navab, N., Hornegger, J., Wells, W.M., Frangi, A.F. (eds.) MICCAI 2015. LNCS, vol. 9349, pp. 694–701. Springer, Heidelberg (2015). doi:10.1007/978-3-319-24553-9_85
Menze, B., Jakab, A., Bauer, S., Kalpathy-Cramer, J., Farahani, K., Kirby, J., et al.: The multimodal brain tumor image segmentation benchmark (BRATS). IEEE TMI 34(10), 1993–2024 (2015)
Souplet, J., Lebrun, C., Ayache, N., Malandain, G.: An automatic segmentation of T2-FLAIR multiple sclerosis lesions. MIDAS J. (2008)
Styner, M., Lee, J., Chin, B., Chin, M., Commowick, O., Tran, H., Markovic-Plese, S., Jewells, V., Warfield, S.: 3D segmentation in the clinic: a grand challenge II: MS lesion segmentation. MIDAS J. 2008, 1–6 (2008)
van Tulder, G., de Bruijne, M.: Why does synthesized data improve multi-sequence classification? In: Navab, N., Hornegger, J., Wells, W.M., Frangi, A.F. (eds.) MICCAI 2015. LNCS, vol. 9349, pp. 531–538. Springer, Heidelberg (2015). doi:10.1007/978-3-319-24553-9_65
Tustison, N.J., Shrinidhi, K., Wintermark, M., Durst, C.R., Kandel, B.M., Gee, J.C., Grossman, M.C., Avants, B.B.: Optimal symmetric multimodal templates and concatenated random forests for supervised brain tumor segmentation (simplified) with ANTsR. Neuroinformatics 13(2), 209–225 (2015)
Van Buuren, S.: Flexible imputation of missing data. CRC Press, Boca Raton (2012)
Zhao, L., Wu, W., Corso, J.J.: Semi-automatic brain tumor segmentation by constrained MRFs using structural trajectories. In: Mori, K., Sakuma, I., Sato, Y., Barillot, C., Navab, N. (eds.) MICCAI 2013. LNCS, vol. 8151, pp. 567–575. Springer, Heidelberg (2013). doi:10.1007/978-3-642-40760-4_71