Improved Post-hoc Probability Calibration for Out-of-Domain MRI Segmentation

  • Conference paper
Uncertainty for Safe Utilization of Machine Learning in Medical Imaging (UNSURE 2022)

Abstract

Probability calibration for deep models is highly desirable in safety-critical applications such as medical imaging. It makes the output probabilities of deep networks interpretable by aligning prediction probability with the actual accuracy on test data. In image segmentation, well-calibrated probabilities allow radiologists to identify regions where model-predicted segmentations are unreliable. These unreliable predictions often occur on out-of-domain (OOD) images, which arise from imaging artifacts or unseen imaging protocols. Unfortunately, most previous calibration methods for image segmentation perform sub-optimally on OOD images. To reduce the calibration error on OOD images, we propose a novel post-hoc calibration model. Our model leverages pixel susceptibility to perturbations at the local level and shape prior information at the global level. The model is tested on cardiac MRI segmentation datasets that contain unseen imaging artifacts and images from an unseen imaging protocol. We demonstrate reduced calibration errors compared with the state-of-the-art calibration algorithm.
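
To make "aligning prediction probability with the actual accuracy" concrete, the following is a minimal sketch of the standard binned expected calibration error (ECE). It is not taken from the paper: the function name `expected_calibration_error`, the equal-width binning, and the default of 10 bins are assumptions made for illustration, and the exact metric used by the authors is not shown in this excerpt.

```python
def expected_calibration_error(confidences, correct, n_bins=10):
    """Binned ECE: count-weighted mean of |accuracy - mean confidence| per bin.

    confidences: predicted probability of the chosen class, one per pixel.
    correct:     1/True if that prediction was right, else 0/False.
    Bins are equal-width over [0, 1]; a confidence of exactly 1.0 goes
    into the last bin. (Illustrative choice, not the paper's setup.)
    """
    if len(confidences) != len(correct):
        raise ValueError("confidences and correct must have the same length")
    n = len(confidences)
    if n == 0:
        raise ValueError("at least one prediction is required")

    # Per bin: [sum of confidences, number correct, number of samples]
    bins = [[0.0, 0, 0] for _ in range(n_bins)]
    for p, hit in zip(confidences, correct):
        idx = min(int(p * n_bins), n_bins - 1)
        bins[idx][0] += p
        bins[idx][1] += int(bool(hit))
        bins[idx][2] += 1

    ece = 0.0
    for sum_conf, n_correct, count in bins:
        if count == 0:
            continue  # empty bins contribute nothing
        accuracy = n_correct / count
        mean_conf = sum_conf / count
        ece += (count / n) * abs(accuracy - mean_conf)
    return ece


# Four pixels predicted with 0.75 confidence, three of them correct:
# confidence matches accuracy, so the error is zero.
well_calibrated = expected_calibration_error([0.75] * 4, [1, 1, 1, 0])

# Four pixels predicted with 0.95 confidence, only two correct:
# the model is overconfident and the error is large.
overconfident = expected_calibration_error([0.95] * 4, [1, 1, 0, 0])
```

A calibration method aims to push this quantity toward zero; on OOD images an uncalibrated network typically behaves like the second case.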

Notes

  1. To ensure that the calibration does not affect the accuracy of the task network, for each spatial location \((m, n)\) in \(\textbf{T}_i\), it is usually assumed that \(\textbf{T}_i(c_j, m, n) = \textbf{T}_i(c_k, m, n), \ \forall (c_j, c_k) \in \{1,2,3,...,C\}\), i.e., temperature values remain the same for different channels/classes [3, 6].

  2. We do not explicitly highlight it as aleatoric uncertainty, since we do not have the ground truth to evaluate the accuracy of this estimation of aleatoric uncertainty.
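
The assumption in Note 1 can be illustrated with a small sketch. It assumes the usual temperature-scaling form from [6], in which logits are divided by a positive temperature before the softmax; the helper names (`softmax`, `argmax`, `calibrate`) and the toy values are hypothetical and are not the authors' implementation. Because one temperature is shared by all classes at a pixel, the ordering of the logits is unchanged, so the predicted label stays the same and only the confidence changes.

```python
import math


def softmax(logits, temperature=1.0):
    """Softmax of logits / temperature, computed stably."""
    scaled = [z / temperature for z in logits]
    peak = max(scaled)
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def argmax(values):
    """Index of the largest value (first one on ties)."""
    return max(range(len(values)), key=lambda i: values[i])


def calibrate(logit_map, temperature_map):
    """Apply a per-pixel temperature that is shared across classes.

    logit_map[m][n]       -> list of C class logits at pixel (m, n)
    temperature_map[m][n] -> one positive scalar for pixel (m, n)
    """
    calibrated = []
    for row_logits, row_temps in zip(logit_map, temperature_map):
        out_row = []
        for logits, temp in zip(row_logits, row_temps):
            if temp <= 0:
                raise ValueError("temperature must be positive")
            out_row.append(softmax(logits, temp))
        calibrated.append(out_row)
    return calibrated


# Toy 1x2 image with C = 3 classes and identical logits at both pixels.
logits = [[[2.0, 0.5, -1.0], [2.0, 0.5, -1.0]]]
# Pixel (0, 0) keeps T = 1; pixel (0, 1) is softened with T = 4.
temps = [[1.0, 4.0]]
probs = calibrate(logits, temps)

# The predicted class is identical at both pixels, but the second
# pixel reports a lower confidence for it.
same_label = argmax(probs[0][0]) == argmax(probs[0][1])
less_confident = max(probs[0][1]) < max(probs[0][0])
```

If the temperature were allowed to differ between classes at the same pixel, the logit ordering could change and the segmentation itself could be altered, which is what the assumption rules out.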

References

  1. Nguyen, A., Yosinski, J., Clune, J.: Deep neural networks are easily fooled: high confidence predictions for unrecognizable images. In: Proceedings of the IEEE CVPR, pp. 427–436 (2015)

  2. Gonzalez, C., Gotkowski, K., Bucher, A., Fischbach, R., Kaltenborn, I., Mukhopadhyay, A.: Detecting when pre-trained nnU-Net models fail silently for Covid-19 lung lesion segmentation. In: de Bruijne, M., et al. (eds.) MICCAI 2021. LNCS, vol. 12907, pp. 304–314. Springer, Cham (2021). https://doi.org/10.1007/978-3-030-87234-2_29

  3. Ding, Z., Han, X., Liu, P., Niethammer, M.: Local temperature scaling for probability calibration. In: Proceedings of the IEEE/CVF ICCV, pp. 6889–6899 (2021)

  4. Platt, J., et al.: Probabilistic outputs for support vector machines and comparisons to regularized likelihood methods. In: Advances in Large Margin Classifiers, vol. 10, no. 3, pp. 61–74 (1999)

  5. Zadrozny, B., Elkan, C.: Obtaining calibrated probability estimates from decision trees and naive Bayesian classifiers. In: ICML, vol. 1, pp. 609–616. Citeseer (2001)

  6. Guo, C., Pleiss, G., Sun, Y., Weinberger, K.Q.: On calibration of modern neural networks. In: ICML, pp. 1321–1330. PMLR (2017)

  7. Tomani, C., Buettner, F.: Towards trustworthy predictions from deep neural networks with fast adversarial calibration. In: Proceedings of the AAAI Conference, vol. 35, pp. 9886–9896 (2021)

  8. Ji, B., Jung, H., Yoon, J., Kim, K., et al.: Bin-wise temperature scaling (BTS): improvement in confidence calibration performance through simple scaling techniques. In: IEEE/CVF ICCV Workshop, pp. 4190–4196. IEEE (2019)

  9. Ovadia, Y., et al.: Can you trust your model’s uncertainty? Evaluating predictive uncertainty under dataset shift. In: Advances in NeurIPS, vol. 32 (2019)

  10. Mukhoti, J., Kulharia, V., Sanyal, A., Golodetz, S., Torr, P., Dokania, P.: Calibrating deep neural networks using focal loss. In: Advances in NeurIPS, vol. 33, pp. 15288–15299 (2020)

  11. Karimi, D., Gholipour, A.: Improving calibration and out-of-distribution detection in deep models for medical image segmentation. IEEE Trans. Artif. Intell., 1 (2022, early access). https://ieeexplore.ieee.org/document/9735278

  12. Kireev, K., Andriushchenko, M., Flammarion, N.: On the effectiveness of adversarial training against common corruptions. arXiv preprint arXiv:2103.02325 (2021)

  13. Gal, Y., Ghahramani, Z.: Dropout as a Bayesian approximation: representing model uncertainty in deep learning. In: ICML, pp. 1050–1059. PMLR (2016)

  14. Kendall, A., Gal, Y.: What uncertainties do we need in Bayesian deep learning for computer vision? In: Advances in NIPS, vol. 30 (2017)

  15. Wang, G., Li, W., Aertsen, M., Deprest, J., Ourselin, S., Vercauteren, T.: Aleatoric uncertainty estimation with test-time augmentation for medical image segmentation with convolutional neural networks. Neurocomputing 338, 34–45 (2019)

  16. Mehrtash, A., Wells, W.M., Tempany, C.M., Abolmaesumi, P., Kapur, T.: Confidence calibration and predictive uncertainty estimation for deep medical image segmentation. IEEE Trans. Med. Imaging 39(12), 3868–3878 (2020)

  17. Baumgartner, C.F., et al.: PHiSeg: capturing uncertainty in medical image segmentation. In: Shen, D., et al. (eds.) MICCAI 2019. LNCS, vol. 11765, pp. 119–127. Springer, Cham (2019). https://doi.org/10.1007/978-3-030-32245-8_14

  18. Zhang, L., et al.: Generalizing deep learning for medical image segmentation to unseen domains via deep stacked transformation. IEEE Trans. Med. Imaging 39(7), 2531–2540 (2020)

  19. Chen, C., et al.: Realistic adversarial data augmentation for MR image segmentation. In: Martel, A.L., et al. (eds.) MICCAI 2020. LNCS, vol. 12261, pp. 667–677. Springer, Cham (2020). https://doi.org/10.1007/978-3-030-59710-8_65

  20. Ouyang, C., et al.: Causality-inspired single-source domain generalization for medical image segmentation. arXiv preprint arXiv:2111.12525 (2021)

  21. Larrazabal, A.J., Martínez, C., Glocker, B., Ferrante, E.: Post-DAE: anatomically plausible segmentation via post-processing with denoising autoencoders. IEEE Trans. Med. Imaging 39(12), 3813–3820 (2020)

  22. Liu, Q., Chen, C., Dou, Q., Heng, P.A.: Single-domain generalization in medical image segmentation via test-time adaptation from shape dictionary (2022)

  23. Chen, C., Hammernik, K., Ouyang, C., Qin, C., Bai, W., Rueckert, D.: Cooperative training and latent space data augmentation for robust medical image segmentation. In: de Bruijne, M., et al. (eds.) MICCAI 2021. LNCS, vol. 12903, pp. 149–159. Springer, Cham (2021). https://doi.org/10.1007/978-3-030-87199-4_14

  24. Robinson, R., et al.: Automatic quality control of cardiac MRI segmentation in large-scale population imaging. In: Descoteaux, M., Maier-Hein, L., Franz, A., Jannin, P., Collins, D.L., Duchesne, S. (eds.) MICCAI 2017. LNCS, vol. 10433, pp. 720–727. Springer, Cham (2017). https://doi.org/10.1007/978-3-319-66182-7_82

  25. Li, K., Yu, L., Heng, P.A.: Towards reliable cardiac image segmentation: assessing image-level and pixel-level segmentation quality via self-reflective references. Med. Image Anal. 78, 102426 (2022)

  26. Wang, S., et al.: Deep generative model-based quality control for cardiac MRI segmentation. In: Martel, A.L., et al. (eds.) MICCAI 2020. LNCS, vol. 12264, pp. 88–97. Springer, Cham (2020). https://doi.org/10.1007/978-3-030-59719-1_9

  27. Nixon, J., Dusenberry, M.W., Zhang, L., Jerfel, G., Tran, D.: Measuring calibration in deep learning. In: CVPR Workshops, vol. 2 (2019)

  28. Raju, A., et al.: Deep implicit statistical shape models for 3D medical image delineation. arXiv (2021)

  29. Bernard, O., et al.: Deep learning techniques for automatic MRI cardiac multi-structures segmentation and diagnosis: is the problem solved? IEEE Trans. Med. Imaging 37(11), 2514–2525 (2018)

  30. Pérez-García, F., Sparks, R., Ourselin, S.: TorchIO: a python library for efficient loading, preprocessing, augmentation and patch-based sampling of medical images in deep learning. Comput. Methods Programs Biomed. 208, 106236 (2021)

  31. Zhuang, X., et al.: Cardiac segmentation on late gadolinium enhancement MRI: a benchmark study from multi-sequence cardiac MR segmentation challenge. Med. Image Anal. 81, 102528 (2022)

  32. Ronneberger, O., Fischer, P., Brox, T.: U-Net: convolutional networks for biomedical image segmentation. In: Navab, N., Hornegger, J., Wells, W.M., Frangi, A.F. (eds.) MICCAI 2015. LNCS, vol. 9351, pp. 234–241. Springer, Cham (2015). https://doi.org/10.1007/978-3-319-24574-4_28

  33. Naeini, M.P., Cooper, G., Hauskrecht, M.: Obtaining well calibrated probabilities using Bayesian binning. In: Twenty-Ninth AAAI Conference (2015)

Acknowledgments

This work was in part supported by EPSRC Programme Grants (EP/P001009/1, EP/W01842X/1) and in part by the UKRI London Medical Imaging and Artificial Intelligence Centre for Value Based Healthcare (No. 104691). S.W. was also supported by the Shanghai Sailing Programs of Shanghai Municipal Science and Technology Committee (22YF1409300).

Author information

Corresponding author

Correspondence to Cheng Ouyang.

Copyright information

© 2022 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper

Cite this paper

Ouyang, C. et al. (2022). Improved Post-hoc Probability Calibration for Out-of-Domain MRI Segmentation. In: Sudre, C.H., et al. Uncertainty for Safe Utilization of Machine Learning in Medical Imaging. UNSURE 2022. Lecture Notes in Computer Science, vol 13563. Springer, Cham. https://doi.org/10.1007/978-3-031-16749-2_6

  • DOI: https://doi.org/10.1007/978-3-031-16749-2_6

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-16748-5

  • Online ISBN: 978-3-031-16749-2

  • eBook Packages: Computer Science, Computer Science (R0)
