Improved Post-hoc Probability Calibration for Out-of-Domain MRI Segmentation

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 13563)

Included in the following conference series:

International Workshop on Uncertainty for Safe Utilization of Machine Learning in Medical Imaging


Abstract

Probability calibration for deep models is highly desirable in safety-critical applications such as medical imaging. It makes the output probabilities of deep networks interpretable by aligning predicted probability with the actual accuracy on test data. In image segmentation, well-calibrated probabilities allow radiologists to identify regions where model-predicted segmentations are unreliable. These unreliable predictions often occur on out-of-domain (OOD) images, which arise from imaging artifacts or unseen imaging protocols. Unfortunately, most previous calibration methods for image segmentation perform sub-optimally on OOD images. To reduce the calibration error on OOD images, we propose a novel post-hoc calibration model. Our model leverages pixel susceptibility to perturbations at the local level and shape prior information at the global level. The model is tested on cardiac MRI segmentation datasets that contain unseen imaging artifacts and images from an unseen imaging protocol. We demonstrate reduced calibration errors compared with the state-of-the-art calibration algorithm.
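As an illustration of what aligning predicted probability with actual accuracy means, the sketch below computes an expected calibration error (ECE) over equal-width confidence bins. It is a minimal example, not the evaluation code of the paper: the function name `expected_calibration_error`, the choice of 10 bins, and the flat per-pixel input lists are assumptions made for illustration.

```python
def expected_calibration_error(confidences, correct, n_bins=10):
    """Bin-size-weighted mean of |accuracy - confidence| over equal-width bins.

    confidences: predicted probability of the chosen class, one per pixel, in [0, 1]
    correct:     truthy where the chosen class matches the label, falsy otherwise
    """
    if len(confidences) != len(correct):
        raise ValueError("confidences and correct must have the same length")

    bins = [[] for _ in range(n_bins)]
    for conf, hit in zip(confidences, correct):
        # Map a confidence to its bin index; conf == 1.0 falls in the last bin.
        idx = min(int(conf * n_bins), n_bins - 1)
        bins[idx].append((conf, hit))

    total = len(confidences)
    ece = 0.0
    for members in bins:
        if not members:
            continue  # empty bins contribute nothing
        avg_conf = sum(c for c, _ in members) / len(members)
        accuracy = sum(1.0 for _, h in members if h) / len(members)
        ece += (len(members) / total) * abs(accuracy - avg_conf)
    return ece


# Hypothetical pixels: 80% confidence with 4 of 5 correct is well calibrated,
# whereas 90% confidence with every prediction wrong is badly over-confident.
well_calibrated = expected_calibration_error([0.8] * 5, [1, 1, 1, 1, 0])
over_confident = expected_calibration_error([0.9] * 4, [0, 0, 0, 0])
```

A lower value means that confidence tracks accuracy more closely. In this toy input the first case yields a gap near zero and the second a gap near 0.9.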

Notes

1. To ensure that calibration does not affect the accuracy of the task network, it is usually assumed that, for each spatial location $(m, n)$ in $\textbf{T}_i$, $\textbf{T}_i(c_j, m, n) = \textbf{T}_i(c_k, m, n), \ \forall (c_j, c_k) \in \{1,2,3,...,C\}$ (i.e., temperature values remain the same for different channels/classes) [3, 6].
2. We do not explicitly highlight it as aleatoric uncertainty, since we do not have the ground truth to evaluate the accuracy of this estimation of aleatoric uncertainty.
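The assumption in Note 1 can be checked on a single pixel. The sketch below assumes the usual temperature scaling form, in which logits are divided by a positive temperature before the softmax [3, 6]. Because one temperature is shared by all classes at a location, the ordering of the logits, and therefore the predicted class, is unchanged. The helper names `softmax` and `calibrate_pixel` and the numeric values are hypothetical and not taken from the paper.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of class logits."""
    peak = max(logits)
    exps = [math.exp(z - peak) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def calibrate_pixel(logits, temperature):
    """Scale all class logits at one pixel by a single shared temperature."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    return softmax([z / temperature for z in logits])


# Hypothetical logits for C = 3 classes at one spatial location (m, n).
logits = [2.0, 0.5, -1.0]
raw = softmax(logits)
cooled = calibrate_pixel(logits, temperature=3.0)

# The predicted class is the same before and after calibration;
# only the confidence attached to it changes.
same_class = raw.index(max(raw)) == cooled.index(max(cooled))
```

With a temperature above 1 the distribution becomes flatter, so the maximum probability drops while the predicted class stays fixed. A per-class temperature would not offer this guarantee, which is why the shared value is assumed.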

References

1. Nguyen, A., Yosinski, J., Clune, J.: Deep neural networks are easily fooled: high confidence predictions for unrecognizable images. In: Proceedings of the IEEE CVPR, pp. 427–436 (2015)
2. Gonzalez, C., Gotkowski, K., Bucher, A., Fischbach, R., Kaltenborn, I., Mukhopadhyay, A.: Detecting when pre-trained nnU-Net models fail silently for Covid-19 lung lesion segmentation. In: de Bruijne, M., et al. (eds.) MICCAI 2021. LNCS, vol. 12907, pp. 304–314. Springer, Cham (2021). https://doi.org/10.1007/978-3-030-87234-2_29
3. Ding, Z., Han, X., Liu, P., Niethammer, M.: Local temperature scaling for probability calibration. In: Proceedings of the IEEE/CVF ICCV, pp. 6889–6899 (2021)
4. Platt, J., et al.: Probabilistic outputs for support vector machines and comparisons to regularized likelihood methods. In: Advances in Large Margin Classifiers, vol. 10, no. 3, pp. 61–74 (1999)
5. Zadrozny, B., Elkan, C.: Obtaining calibrated probability estimates from decision trees and naive Bayesian classifiers. In: ICML, vol. 1, pp. 609–616. Citeseer (2001)
6. Guo, C., Pleiss, G., Sun, Y., Weinberger, K.Q.: On calibration of modern neural networks. In: ICML, pp. 1321–1330. PMLR (2017)
7. Tomani, C., Buettner, F.: Towards trustworthy predictions from deep neural networks with fast adversarial calibration. In: Proceedings of the AAAI Conference, vol. 35, pp. 9886–9896 (2021)
8. Ji, B., Jung, H., Yoon, J., Kim, K., et al.: Bin-wise temperature scaling (BTS): improvement in confidence calibration performance through simple scaling techniques. In: IEEE/CVF ICCV Workshop, pp. 4190–4196. IEEE (2019)
9. Ovadia, Y., et al.: Can you trust your model’s uncertainty? Evaluating predictive uncertainty under dataset shift. In: Advances in NeurIPS, vol. 32 (2019)
10. Mukhoti, J., Kulharia, V., Sanyal, A., Golodetz, S., Torr, P., Dokania, P.: Calibrating deep neural networks using focal loss. In: Advances in NeurIPS, vol. 33, pp. 15288–15299 (2020)
11. Karimi, D., Gholipour, A.: Improving calibration and out-of-distribution detection in deep models for medical image segmentation. IEEE Trans. Artif. Intell., 1 (2022, early access). https://ieeexplore.ieee.org/document/9735278
12. Kireev, K., Andriushchenko, M., Flammarion, N.: On the effectiveness of adversarial training against common corruptions. arXiv preprint arXiv:2103.02325 (2021)
13. Gal, Y., Ghahramani, Z.: Dropout as a Bayesian approximation: representing model uncertainty in deep learning. In: ICML, pp. 1050–1059. PMLR (2016)
14. Kendall, A., Gal, Y.: What uncertainties do we need in Bayesian deep learning for computer vision? In: Advances in NIPS, vol. 30 (2017)
15. Wang, G., Li, W., Aertsen, M., Deprest, J., Ourselin, S., Vercauteren, T.: Aleatoric uncertainty estimation with test-time augmentation for medical image segmentation with convolutional neural networks. Neurocomputing 338, 34–45 (2019)
16. Mehrtash, A., Wells, W.M., Tempany, C.M., Abolmaesumi, P., Kapur, T.: Confidence calibration and predictive uncertainty estimation for deep medical image segmentation. IEEE Trans. Med. Imaging 39(12), 3868–3878 (2020)
17. Baumgartner, C.F., et al.: PHiSeg: capturing uncertainty in medical image segmentation. In: Shen, D., et al. (eds.) MICCAI 2019. LNCS, vol. 11765, pp. 119–127. Springer, Cham (2019). https://doi.org/10.1007/978-3-030-32245-8_14
18. Zhang, L., et al.: Generalizing deep learning for medical image segmentation to unseen domains via deep stacked transformation. IEEE Trans. Med. Imaging 39(7), 2531–2540 (2020)
19. Chen, C., et al.: Realistic adversarial data augmentation for MR image segmentation. In: Martel, A.L., et al. (eds.) MICCAI 2020. LNCS, vol. 12261, pp. 667–677. Springer, Cham (2020). https://doi.org/10.1007/978-3-030-59710-8_65
20. Ouyang, C., et al.: Causality-inspired single-source domain generalization for medical image segmentation. arXiv preprint arXiv:2111.12525 (2021)
21. Larrazabal, A.J., Martínez, C., Glocker, B., Ferrante, E.: Post-DAE: anatomically plausible segmentation via post-processing with denoising autoencoders. IEEE Trans. Med. Imaging 39(12), 3813–3820 (2020)
22. Liu, Q., Chen, C., Dou, Q., Heng, P.A.: Single-domain generalization in medical image segmentation via test-time adaptation from shape dictionary (2022)
23. Chen, C., Hammernik, K., Ouyang, C., Qin, C., Bai, W., Rueckert, D.: Cooperative training and latent space data augmentation for robust medical image segmentation. In: de Bruijne, M., et al. (eds.) MICCAI 2021. LNCS, vol. 12903, pp. 149–159. Springer, Cham (2021). https://doi.org/10.1007/978-3-030-87199-4_14
24. Robinson, R., et al.: Automatic quality control of cardiac MRI segmentation in large-scale population imaging. In: Descoteaux, M., Maier-Hein, L., Franz, A., Jannin, P., Collins, D.L., Duchesne, S. (eds.) MICCAI 2017. LNCS, vol. 10433, pp. 720–727. Springer, Cham (2017). https://doi.org/10.1007/978-3-319-66182-7_82
25. Li, K., Yu, L., Heng, P.A.: Towards reliable cardiac image segmentation: assessing image-level and pixel-level segmentation quality via self-reflective references. Med. Image Anal. 78, 102426 (2022)
26. Wang, S., et al.: Deep generative model-based quality control for cardiac MRI segmentation. In: Martel, A.L., et al. (eds.) MICCAI 2020. LNCS, vol. 12264, pp. 88–97. Springer, Cham (2020). https://doi.org/10.1007/978-3-030-59719-1_9
27. Nixon, J., Dusenberry, M.W., Zhang, L., Jerfel, G., Tran, D.: Measuring calibration in deep learning. In: CVPR Workshops, vol. 2 (2019)
28. Raju, A., et al.: Deep implicit statistical shape models for 3D medical image delineation. arXiv (2021)
29. Bernard, O., et al.: Deep learning techniques for automatic MRI cardiac multi-structures segmentation and diagnosis: is the problem solved? IEEE Trans. Med. Imaging 37(11), 2514–2525 (2018)
30. Pérez-García, F., Sparks, R., Ourselin, S.: TorchIO: a python library for efficient loading, preprocessing, augmentation and patch-based sampling of medical images in deep learning. Comput. Methods Programs Biomed. 208, 106236 (2021)
31. Zhuang, X., et al.: Cardiac segmentation on late gadolinium enhancement MRI: a benchmark study from multi-sequence cardiac MR segmentation challenge. Med. Image Anal. 81, 102528 (2022)
32. Ronneberger, O., Fischer, P., Brox, T.: U-Net: convolutional networks for biomedical image segmentation. In: Navab, N., Hornegger, J., Wells, W.M., Frangi, A.F. (eds.) MICCAI 2015. LNCS, vol. 9351, pp. 234–241. Springer, Cham (2015). https://doi.org/10.1007/978-3-319-24574-4_28
33. Naeini, M.P., Cooper, G., Hauskrecht, M.: Obtaining well calibrated probabilities using Bayesian binning. In: Twenty-Ninth AAAI Conference (2015)

Acknowledgments

This work was in part supported by EPSRC Programme Grants (EP/P001009/1, EP/W01842X/1) and in part by the UKRI London Medical Imaging and Artificial Intelligence Centre for Value Based Healthcare (No. 104691). S.W. was also supported by the Shanghai Sailing Programs of Shanghai Municipal Science and Technology Committee (22YF1409300).

Author information

Authors and Affiliations

BioMedIA Group, Department of Computing, Imperial College London, London, UK
Cheng Ouyang, Chen Chen, Zeju Li, Wenjia Bai, Bernhard Kainz & Daniel Rueckert
School of Basic Medical Sciences, Fudan University, Shanghai, China
Shuo Wang
Department of Brain Sciences, Imperial College London, London, UK
Wenjia Bai
Data Science Institute, Imperial College London, London, UK
Wenjia Bai
Friedrich-Alexander-Universität Erlangen-Nürnberg, Erlangen, Germany
Bernhard Kainz
Klinikum rechts der Isar, Technical University of Munich, Munich, Germany
Daniel Rueckert

Corresponding author

Correspondence to Cheng Ouyang.

Editor information

Editors and Affiliations

University College London, London, UK
Carole H. Sudre
University of Tübingen, Tübingen, Germany
Christian F. Baumgartner
Massachusetts General Hospital, Charlestown, MA, USA
Adrian Dalca
Imperial College London, London, UK
Chen Qin
Google DeepMind, London, UK
Ryutaro Tanno
Technical University of Denmark, Kongens Lyngby, Denmark
Koen Van Leemput
Harvard Medical School, Boston, MA, USA
William M. Wells III

About this paper

Cite this paper

Ouyang, C. et al. (2022). Improved Post-hoc Probability Calibration for Out-of-Domain MRI Segmentation. In: Sudre, C.H., et al. Uncertainty for Safe Utilization of Machine Learning in Medical Imaging. UNSURE 2022. Lecture Notes in Computer Science, vol 13563. Springer, Cham. https://doi.org/10.1007/978-3-031-16749-2_6

DOI: https://doi.org/10.1007/978-3-031-16749-2_6
Published: 14 September 2022
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-16748-5
Online ISBN: 978-3-031-16749-2
eBook Packages: Computer Science, Computer Science (R0)
