1 Introduction

The most commonly applied approach to computer aided surgery (CAS) relies on fusing pre-operative medical images with the current patient anatomy for augmented reality guidance. While this approach is well-suited for displaying subsurface structures detected in pre-operative images, such as tumors or vessels, its main bottleneck is that it cannot account for tissue dynamics; live monitoring of perfusion, for example, is not possible with an approach that relies on ‘offline images’. To address this shortcoming, recent research has focused on intra-operative functional imaging using biophotonics techniques. In this context, multispectral optical imaging is evolving as a key tool. Previous work has shown that machine learning algorithms can be used to convert pixel-wise reflectance measurements to tissue parameters, such as oxygenation [13, 14]. These methods learn to infer tissue parameters from training samples, each providing a spectrum and the corresponding correct tissue parameter(s) (supervised learning). However, the accuracy of these algorithms is heavily affected by aleatoric and epistemic uncertainties [6].

In this paper, we argue for a multi-stage process for uncertainty handling as illustrated in Fig. 1. (1) To investigate whether the input is sufficiently close to the training data, out of distribution (OoD) detection is performed. (2) If the input is regarded as valid, the corresponding functional tissue parameters are computed, and a full posterior probability distribution is provided as output for each tissue parameter. As the second part of this pipeline has already been addressed in recent publications [1, 2], we will focus on the first part. The following sections present and validate our proposed approach to OoD detection.

Fig. 1.

Proposed multi-stage process for uncertainty handling in multispectral image analysis. To investigate whether the input (here: a spectrum) is sufficiently close to the training data, out of distribution (OoD) detection is performed. If the input is regarded as valid, the corresponding functional tissue parameters are computed. To address the potential inherent ambiguity of the problem a full posterior probability distribution rather than a point estimate is provided for each tissue parameter (here: blood oxygenation).

2 Methods

While we are not aware of any previous work on OoD detection in the field of optical imaging, the topic has gained increasing interest in the machine learning community. To implement the proposed multi-stage process for uncertainty handling in multispectral image analysis (Fig. 1), we build our method upon the work by Choi et al. [3], who proposed the widely applicable information criterion (WAIC) as a means to measure the closeness of a new sample to the training distribution. The advantage of this method lies in the fact that it outperforms many other ensemble-based unsupervised learning methods while still being easily computable. An unsupervised approach is integral to the method, as it is not feasible to generate enough labeled negative samples to train a discriminator between in- and outliers [3, 10]. The challenge in applying WAIC is that it is an ensemble-based method, necessitating training a model multiple times. Depending on the data dimensions, this can become prohibitively expensive both in terms of time and hardware requirements. In this work, we use invertible neural networks (INN) [2] to estimate WAIC on multispectral endoscopic imaging data.

In this section, we briefly revisit WAIC [3] and give an intuition for this quantity (Sect. 2.1), present the INN architecture as an integral ingredient to apply WAIC in the surgical domain (Sect. 2.2) and describe our experimental validation (Sect. 2.3).

2.1 Principle of WAIC

In the original contribution [12], WAIC is defined as

$$\begin{aligned} {\text {WAIC}}(x) = {\text {Var}}_{\varTheta }[\log p(x\mid \varTheta )] - \mathbb {E}_{\varTheta }[\log p(x\mid \varTheta )], \end{aligned}$$
(1)

where \({\text {WAIC}}(x)\) quantifies the proximity of a sample x to the distribution of the training data \(X^{\text {tr}}\), and \(\varTheta \) is distributed according to \(p(\varTheta \mid X^{\text {tr}})\). In a very recent publication [3], it was suggested to use WAIC as a means for OoD detection in the setting of neural networks. The variance term in Eq. (1) measures ‘how certain’ the posterior distribution \(p(\cdot \mid \varTheta )\) is about a sample \(x\), the heuristic being that it should be more certain about samples that are close to what it has seen before. The expectation term in Eq. (1) is used for normalization. The idea is that if the expectation of \(\log p(x \mid \varTheta )\) is high, then the spread measured by the variance might also be larger without really measuring internal uncertainty of the model. Hence, it is subtracted to account for this effect.
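Given the log-likelihoods of each ensemble member, Eq. (1) can be estimated directly. The following minimal NumPy sketch (our own illustration; the function name and array layout are assumptions, not from the paper) makes the computation explicit:

```python
import numpy as np

def waic_estimate(log_likelihoods: np.ndarray) -> np.ndarray:
    """Estimate WAIC(x) from an ensemble, following Eq. (1).

    log_likelihoods: array of shape (n_models, n_samples), where entry
    [k, i] is log p(x_i | Theta_k) for ensemble member k.
    Returns one WAIC value per sample; higher values indicate samples
    farther from the training distribution.
    """
    # Var_Theta[log p(x | Theta)] - E_Theta[log p(x | Theta)]
    return log_likelihoods.var(axis=0) - log_likelihoods.mean(axis=0)
```

In practice, the ensemble members approximate samples of \(\varTheta \sim p(\varTheta \mid X^{\text {tr}})\), so both moments are taken over the ensemble axis.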

2.2 WAIC Computation with INNs

WAIC only works for parametrized models. To meet this precondition, we use a deep neural network to encode the spectra \(X\) in a latent space \(Z\) following an analytically tractable distribution, which we chose to be a multivariate standard Gaussian. Let \(f_\varTheta :X \subset \mathbb {R}^n \rightarrow Z \subset \mathbb {R}^n\) denote the neural network with parameters \(\varTheta \). Then we can use the change of variables formula to compute the log-likelihood \(\log p(x\mid \varTheta )\) for a spectrum \(x\) as

$$\begin{aligned} \log p(x \mid \varTheta ) = -\frac{1}{2} \Vert f_\varTheta (x)\Vert ^2_2 - \frac{n}{2}\log (2\pi ) + \log |\det Jf_\varTheta (x)|, \end{aligned}$$
(2)

where \(Jf_\varTheta \) denotes its Jacobian [11]. Equation (2) shows that it is mandatory for the log-Jacobi determinant of our network to be efficiently computable. One established architecture is that of normalizing flows, originally introduced in [4] and refined in [2] under the name of invertible neural networks (INN). For each of the experiments described in the next section, we trained an ensemble of INNs to estimate \(p(\varTheta \mid X^\text {tr})\). Each network consisted of 10 so-called coupling blocks (see [4]), each followed by a permutation layer. Each coupling block consisted of a three-layer fully connected network with ReLU activation functions. The networks were trained using maximum likelihood, i.e. by minimizing the loss \(L(x) = -\log p(x \mid \varTheta )\) as given in Eq. (2), using the Adam optimizer [7].
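To make Eq. (2) concrete, the sketch below evaluates the log-likelihood through a single affine coupling block, whose log-Jacobi determinant is available in closed form. The fixed random linear maps stand in for the trained three-layer fully connected subnetworks; this is a simplified illustration of the mechanism, not the authors' exact architecture:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy subnetworks producing scale s(.) and shift t(.) from the first
# half of the input (placeholders for the trained fully connected nets).
W_s = rng.normal(size=(4, 4)) * 0.1
W_t = rng.normal(size=(4, 4)) * 0.1

def coupling_forward(x):
    """One affine coupling block: x -> z with tractable log|det Jf(x)|."""
    x1, x2 = x[:4], x[4:]
    s, t = W_s @ x1, W_t @ x1
    z = np.concatenate([x1, x2 * np.exp(s) + t])
    log_det = s.sum()  # log|det Jf(x)| of the block
    return z, log_det

def log_likelihood(x):
    """Eq. (2): log p(x | Theta) under a standard Gaussian latent."""
    z, log_det = coupling_forward(x)
    n = x.size
    return -0.5 * (z ** 2).sum() - 0.5 * n * np.log(2 * np.pi) + log_det
```

Because the first half of the input passes through unchanged, the block is trivially invertible, which is what allows stacking many such blocks while keeping the determinant in Eq. (2) cheap to evaluate.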

Fig. 2.

In silico validation. (a) The WAIC method trained on the training set (turquoise points, here projected onto the first two principal components (PCA) of the complete training set \(X_\text {SC}^\text {tr}\)) was applied to the superset (here: blue points). The best and worst 2% of superset spectra according to their WAIC value are shown as kernel density estimates in green (representing in distribution samples) and red (representing out of distribution samples), respectively. (b) The WAIC distribution of the training set, superset and restricted superset is shown (superset boxplot truncated).

2.3 Experiments

The purpose of our experiments was to validate our approach to OoD detection in silico (Sect. 2.3), and to present in vivo use cases (Sect. 2.3).

In Silico Quantitative Validation. In our simulation framework, a multispectral imaging pixel is generated from an 8-dimensional vector \(\mathbf {t}_i\) of tissue properties, which are assumed to be relevant for the image formation process. Plausible tissue samples \(\mathbf {t}_i\) are drawn from a layered tissue model as proposed in [14]. The framework was used to generate a data set \(X_\text {raw}\) consisting of 550,000 high-resolution spectra and corresponding ground truth tissue properties. It was split into a training set \(X_\text {raw}^\text {tr}\) and a test set \(X_\text {raw}^\text {te}\), comprising 500,000 and 50,000 spectra respectively. For the in silico quantitative validation we converted the (high-resolution) spectra of the simulated data sets to plausible camera measurements using the filter response functions of the 8-band Pixelteq SpectroCam. We use a subscript (here: SC for SpectroCam) to refer to the data set \(X_\text {raw}\) after it was adapted to a certain camera. \(X_\text {SC}^\text {tr}\) was split into a small training set \(X_\text {SC}^\text {tr,s}\) and a superset \(X_\text {SC}^\text {sup}\), such that the support of \(X_\text {SC}^\text {tr,s}\) lay within the support of \(X_\text {SC}^\text {sup}\), while \(X_\text {SC}^\text {sup}\) additionally contained a cluster of data points outside the support of \(X_\text {SC}^\text {tr,s}\), as illustrated in Fig. 2 (a). This split assigned approximately 49% of \(X_\text {SC}^\text {tr}\) to \(X_\text {SC}^\text {tr,s}\) and approximately 51% to \(X_\text {SC}^\text {sup}\). An ensemble of five INNs was trained on \(X_\text {SC}^\text {tr,s}\) and the WAIC value was evaluated on \(X_\text {SC}^\text {sup}\). We defined \(X_\text {SC}^\text {sup,r}\) as the reduced data set of \(X_\text {SC}^\text {sup}\) lying in the support of \(X_\text {SC}^\text {tr,s}\).
We then investigated (1) whether the WAIC distribution of \(X_\text {SC}^\text {sup,r}\) matches that of \(X_\text {SC}^\text {tr,s}\) and (2) whether the part of \(X_\text {SC}^\text {sup}\) not in the support of \(X_\text {SC}^\text {tr,s}\) was correctly classified as outliers by our method.

In Vivo Application. While the goal of the previous experiment was to confirm the validity of our approach in an in silico setting, the purpose of the in vivo experiments was to showcase applications in which OoD detection could be useful.

Anomaly/Novelty Detection: Detecting (parts of) a multispectral image in which the spectra do not closely match the training data distribution can be useful for many reasons. Possible applications include the detection of abnormal tissue or of artificial objects (e.g. instruments). To investigate this aspect, we used the complete \(X_\text {SC}^\text {tr}\) to train an ensemble of five INNs. As in vivo test data, we acquired endoscopic images of porcine organs which we classified as organs lying in the simulation domain \(X^\text {iD}\) and organs not lying in the simulation domain \(X^\text {oD}\). These spectra were acquired using a Pixelteq SpectroCam on a 30\(^\circ \) Storz laparoscope with a Storz Xenon light source (Storz D-light P 201337 20). We classified liver, spleen, abdominal wall, diaphragm and bowel as in domain organs, as hemoglobin can be assumed to be the main absorber in these. In contrast, we classified gallbladder as an out of domain organ, since bile is a notable absorber but has not been considered in our simulation framework. With this, \(X^\text {iD}\) and \(X^\text {oD}\) consisted of 50,000 and 10,000 spectra respectively. Our hypothesis was that the WAIC values of \(X^\text {iD}\) should be much lower than those for \(X^\text {oD}\). For reference, we also compared the resulting WAIC distributions to that of the simulated test data \(X_\text {SC}^\text {te}\).

Detection of Scene Changes: Intra-operative image modalities often rely on a careful calibration of the device. When recovering blood oxygenation from multispectral measurements, for example, the regressor is typically trained with the light source that is used during test time. To investigate whether WAIC is applicable to detect illumination changes (which would substantially harm the method and render the estimation results invalid), we adapted \(X_\text {raw}\) to a XIMEA (Muenster, Germany) xiQ \(SNm4\times 4\) mosaic camera consisting of 16 bands, assuming a Wolf LED light source (Wolf Endolight LED 2.2). We trained an ensemble of five INNs on \(X_\text {Xim}^\text {tr}\). Furthermore, we recorded 200 \(512\times 272\)-pixel images of the lip of a healthy human volunteer using the XIMEA xiQ camera and a 30\(^\circ \) Storz laparoscope (cf. Fig. 4(b)). At around image 80, we switched the endoscope from a Storz Xenon light source (Storz D-light P 201337 20) to the Wolf LED light source (Wolf Endolight LED 2.2). Based on the hypothesis that the switch in light source would be detected by the WAIC analysis, we computed the WAIC time series for the region of interest depicted in Fig. 4(a).
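Conceptually, this experiment reduces to monitoring the mean WAIC over the region of interest for each frame and flagging frames that deviate from a running baseline. A simplified sketch (the array layout, function names, and threshold heuristic are our own assumptions, not from the paper):

```python
import numpy as np

def roi_waic_series(log_liks: np.ndarray) -> np.ndarray:
    """log_liks: shape (n_frames, n_models, n_pixels), holding
    log p(x | Theta_k) for each ROI pixel and ensemble member.
    Returns the mean WAIC over the ROI for each frame."""
    waic = log_liks.var(axis=1) - log_liks.mean(axis=1)  # per-pixel, Eq. (1)
    return waic.mean(axis=1)

def detect_change(series: np.ndarray, window: int = 5, factor: float = 3.0):
    """Return the index of the first frame whose mean WAIC deviates from
    a running baseline by more than `factor` baseline standard deviations,
    or None if no such frame exists."""
    for i in range(window, len(series)):
        base = series[i - window:i]
        if abs(series[i] - base.mean()) > factor * (base.std() + 1e-8):
            return i
    return None
```

Any such thresholding rule inherits the ‘arbitrary units’ problem of WAIC discussed later; a relative criterion over a sliding window, as above, is one pragmatic workaround.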

3 Results

In Silico Validation. The distribution of the reduced training data \(X_\text {SC}^\text {tr,s}\) and the superset \(X_\text {SC}^\text {sup}\) (projected to the first two principal components) can be found in Fig. 2(a). It can be seen that the samples with high (poor) WAIC values (red) concentrate in the superset part not contained in \(X_\text {SC}^\text {tr,s}\), whereas samples with low WAIC values (green) are contained in the interior of \(X_\text {SC}^\text {tr,s}\). Figure 2(b) shows a comparison between the WAIC distributions of \(X_\text {SC}^\text {tr,s}\), \(X_\text {SC}^\text {sup}\) and the restricted superset \(X_\text {SC}^\text {sup,r}\). The data sets \(X_\text {SC}^\text {tr,s}\) and \(X_\text {SC}^\text {sup,r}\) are in excellent agreement. The superset \(X_\text {SC}^\text {sup}\) only differs in that it contains far more outliers, which can be attributed to the data points outside of \(X_\text {SC}^\text {tr,s}\).

Fig. 3.

Histogram of the WAIC values for the simulated test set, and in vivo multispectral measurements of organs that do (green) or do not (red) match the model assumptions based on which the training data was generated (tails truncated). Please note the different scales for the three distributions. (Color figure online)

Application. The WAIC distributions for the test data \(X_\text {SC}^\text {te}\), the in domain organs \(X^\text {iD}\) and the out of domain organ \(X^\text {oD}\) can be found in Fig. 3. The distribution of the test data is by far the sharpest, with a maximum a posteriori (MAP) estimate at −4.9. The distribution closely matches that of the training data (MAP = −4.9). The distribution of \(X^\text {iD}\) also possesses a sharp maximum, however with a far heavier tail. The MAP estimate of 9.3 indicates a still existent domain gap between our simulation domain and the organ domain. The distribution of \(X^\text {oD}\) is very noisy and has an even heavier tail than \(X^\text {iD}\); its MAP lies at 34. This indicates that our WAIC estimate is suitable to distinguish between in domain tissue and out of domain tissue. Similarly, Fig. 4 illustrates that the change in illumination performed in the detection of scene changes experiment results in a drastic change of WAIC values.

4 Discussion

The accuracy of machine learning-based regression methods in multispectral imaging crucially depends on whether the spectra acquired during surgery match the ones seen during training. Although initial steps with respect to uncertainty estimation and compensation have been taken in the field of optical imaging [1, 2, 5, 8, 9, 15], we are, to our knowledge, the first to address the problem of OoD detection to prevent algorithms from presenting spurious results.

Fig. 4.

Automatic detection of scene changes. (a) When a change in light source occurs, the mean log WAIC values computed for the white region of interest in (b)/(c) drop, indicating a decreasing domain gap between training and test data. (b) RGB image estimated using the 8-band measurement of human lips. (c) Corresponding WAIC values computed for the multispectral image. (Color figure online)

The application to endoscopic organ data showed that our method is well-suited for anomaly detection. The in distribution organs are well separated from the out of distribution organ (gallbladder). Moreover, this experiment reveals a shortcoming of the simulation framework proposed by [14]: The large difference in the WAIC distribution between the test set and the real data indicates a domain gap that remains to be tackled.

Our experiments with human lips show that WAIC is able to distinguish between different lighting conditions. The elevated uncertainty shortly before the change of lighting can most likely be explained by the brief darkness caused by the light source switch. The jump at image 100 was due to involuntary movement of the volunteer, which left the image out of focus. One reason for the generally high WAIC values is that melanin (a chromophore in the skin) was not simulated in the training data.

In the present implementation we used five INNs in our ensembles. According to preliminary experiments, this number is sufficient. We computed the WAIC on the data sets used for the anomaly detection experiment (Sect. 2.3) for up to 20 ensemble members. For both the simulated test data and the in domain organs, the values stabilized for ensemble sizes below \(n = 10\). For the out of domain data (gallbladder), the WAIC values increased throughout. This merits further investigation. However, there should be no impact on the method performance, as \(X^\text {iD}\) and \(X^\text {oD}\) were well separated.

Our findings underline the power of WAIC in the setting of medical OoD detection. However, there are still some open questions. A general downside of WAIC is that it is expressed in ‘arbitrary units’, so it is not straightforward to define a threshold for outlier detection. One approach to tackle this shortcoming would be to find a suitable normalization. Another possibility might be to simply mask the worst \(n\) pixels in a given ROI. In addition to this conceptual question, there are also practical limitations. The estimation of WAIC requires an ensemble of neural networks, which was feasible in our case, but becomes prohibitively expensive for larger input dimensions. In the future, methods for network compression might be adapted to tackle this problem.
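The pixel-masking idea mentioned above can be sketched in a few lines (illustrative only; the function name and interface are our own, not part of the described method):

```python
import numpy as np

def mask_worst_pixels(waic_map: np.ndarray, n: int) -> np.ndarray:
    """Return a boolean mask that is False for the n pixels with the
    highest (worst) WAIC values in the ROI and True elsewhere."""
    flat = waic_map.ravel()
    worst = np.argpartition(flat, -n)[-n:]  # indices of the n largest values
    mask = np.ones(flat.shape, dtype=bool)
    mask[worst] = False
    return mask.reshape(waic_map.shape)
```

Such a rank-based rule sidesteps the need for an absolute WAIC threshold, at the cost of always discarding a fixed number of pixels regardless of image quality.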

In conclusion, this paper is the first to address the topic of OoD detection in intra-operative imaging. Due to the promising results obtained in this study, the approach proposed could not only become a valuable tool for increasing the reliability of machine learning-based regression methods but could also boost research in unsupervised intra-operative anomaly detection.