Abstract
The rapid increase in telemedicine, coupled with recent advances in diagnostic artificial intelligence (AI), creates the imperative to consider the opportunities and risks of inserting AI-based support into new paradigms of care. Here we build on recent achievements in the accuracy of image-based AI for skin cancer diagnosis to address the effects of varied representations of AI-based support across different levels of clinical expertise and multiple clinical workflows. We find that good quality AI-based support of clinical decision-making improves diagnostic accuracy over that of either AI or physicians alone, and that the least experienced clinicians gain the most from AI-based support. We further find that AI-based multiclass probabilities outperform content-based image retrieval (CBIR) representations of AI in the mobile technology environment, and that AI-based support has utility in simulations of second opinions and of telemedicine triage. In addition to demonstrating the potential benefits associated with good quality AI in the hands of non-expert clinicians, we find that faulty AI can mislead the entire spectrum of clinicians, including experts. Lastly, we show that insights derived from AI class-activation maps can inform improvements in human diagnosis. Together, our approach and findings offer a framework for future studies across the spectrum of image-based diagnostics to improve human–computer collaboration in clinical practice.
Data availability
The origin of training-set images is reported in the dataset publication of HAM10000 (ref. 17; https://doi.org/10.1038/sdata.2018.161). Training-set and test-set images with reader data are available from the Harvard Dataverse (https://doi.org/10.7910/DVN/DBW86T), where lesion segmentation data will be available upon publication of this work. Test-set images with ground truth are available upon request from the ISIC archive (challenge@isic-archive.com).
Code availability
Code for the CNN is available upon request from the corresponding author for academic use.
References
Webster, P. Virtual health care in the era of COVID-19. Lancet 395, 1180–1181 (2020).
He, J. et al. The practical implementation of artificial intelligence technologies in medicine. Nat. Med. 25, 30–36 (2019).
McKinney, S. M. et al. International evaluation of an AI system for breast cancer screening. Nature 577, 89–94 (2020).
Gulshan, V. et al. Development and validation of a deep learning algorithm for detection of diabetic retinopathy in retinal fundus photographs. JAMA 316, 2402–2410 (2016).
Esteva, A. et al. Dermatologist-level classification of skin cancer with deep neural networks. Nature 542, 115–118 (2017).
Haenssle, H. A. et al. Man against machine: diagnostic performance of a deep learning convolutional neural network for dermoscopic melanoma recognition in comparison to 58 dermatologists. Ann. Oncol. 29, 1836–1842 (2018).
Han, S. S. et al. Classification of the clinical images for benign and malignant cutaneous tumors using a deep learning algorithm. J. Invest. Dermatol. 138, 1529–1538 (2018).
Marchetti, M. A. et al. Results of the 2016 International Skin Imaging Collaboration International Symposium on Biomedical Imaging challenge: comparison of the accuracy of computer algorithms to dermatologists for the diagnosis of melanoma from dermoscopic images. J. Am. Acad. Dermatol. 78, 270–277 (2018).
Tschandl, P. et al. Comparison of the accuracy of human readers versus machine-learning algorithms for pigmented skin lesion classification: an open, web-based, international, diagnostic study. Lancet Oncol. 20, 938–947 (2019).
Garg, A. X. et al. Effects of computerized clinical decision support systems on practitioner performance and patient outcomes: a systematic review. JAMA 293, 1223–1238 (2005).
Codella, N. C. F. et al. Collaborative human–AI (CHAI): evidence-based interpretable melanoma classification in dermoscopic images. In Understanding and Interpreting Machine Learning in Medical Image Computing Applications (eds. Suzuki, K. et al.) 97–105 (Springer International Publishing, 2018).
Bien, N. et al. Deep-learning-assisted diagnosis for knee magnetic resonance imaging: development and retrospective validation of MRNet. PLoS Med. 15, e1002699 (2018).
Mobiny, A., Singh, A. & Van Nguyen, H. Risk-aware machine learning classifier for skin lesion diagnosis. J. Clin. Med. 8, 1241 (2019).
Han, S. S. et al. Augment intelligence dermatology: deep neural networks empower medical professionals in diagnosing skin cancer and predicting treatment options for 134 skin disorders. J. Invest. Dermatol. https://doi.org/10.1016/j.jid.2020.01.019 (2020).
Hekler, A. et al. Superior skin cancer classification by the combination of human and artificial intelligence. Eur. J. Cancer 120, 114–121 (2019).
Lakhani, P. & Sundaram, B. Deep learning at chest radiography: automated classification of pulmonary tuberculosis by using convolutional neural networks. Radiology 284, 574–582 (2017).
Tschandl, P., Rosendahl, C. & Kittler, H. The HAM10000 dataset, a large collection of multi-source dermatoscopic images of common pigmented skin lesions. Sci. Data 5, 180161 (2018).
Codella, N. et al. Skin lesion analysis toward melanoma detection 2018: a challenge hosted by the International Skin Imaging Collaboration (ISIC). Preprint at https://arxiv.org/abs/1902.03368 (2019).
Sadeghi, M., Chilana, P. K. & Atkins, M. S. How users perceive content-based image retrieval for identifying skin images. In Understanding and Interpreting Machine Learning in Medical Image Computing Applications (eds. Suzuki, K. et al.) 141–148 (Springer International Publishing, 2018).
Tschandl, P., Argenziano, G., Razmara, M. & Yap, J. Diagnostic accuracy of content-based dermatoscopic image retrieval with deep classification features. Br. J. Dermatol. 181, 155–165 (2019).
Cai, C. J. et al. Human-centered tools for coping with imperfect algorithms during medical decision-making. In Proc. 2019 CHI Conference on Human Factors in Computing Systems 1–14 (Association for Computing Machinery, 2019).
Wang, M. & Deng, W. Deep visual domain adaptation: a survey. Neurocomputing 312, 135–153 (2018).
Finlayson, S.G. et al. Adversarial attacks on medical machine learning. Science 363, 1287–1289 (2019).
Navarrete-Dechent, C. et al. Automated dermatological diagnosis: hype or reality? J. Invest. Dermatol. 138, 2277–2279 (2018).
Winkler, J. K. et al. Association between surgical skin markings in dermoscopic images and diagnostic performance of a deep learning convolutional neural network for melanoma recognition. JAMA Dermatol. 155, 1135–1141 (2019).
Cai, C. J., Winter, S., Steiner, D., Wilcox, L. & Terry, M. ‘Hello AI’: uncovering the onboarding needs of medical practitioners for human–AI collaborative decision-making. In Proc. ACM on Human–Computer Interaction (Association for Computing Machinery, 2019).
Janda, M. et al. Accuracy of mobile digital teledermoscopy for skin self-examinations in adults at high risk of skin cancer: an open-label, randomised controlled trial. Lancet Digit. Health 2, e129–e137 (2020).
Gessert, N., Nielsen, M., Shaikh, M., Werner, R. & Schlaefer, A. Skin lesion classification using ensembles of multi-resolution EfficientNets with meta data. MethodsX 7, 100864 (2020).
Selvaraju, R. R. et al. Grad-CAM: visual explanations from deep networks via gradient-based localization. Int. J. Comput. Vis. 128, 336–359 (2020).
Li, X., Wu, J., Chen, E. Z. & Jiang, H. From deep learning towards finding skin lesion biomarkers. Conf. Proc. IEEE Eng. Med. Biol. Soc. 2019, 2797–2800 (2019).
Bissoto, A., Fornaciali, M., Valle, E. & Avila, S. (De)constructing bias on skin lesion datasets. Preprint at https://arxiv.org/abs/1904.08818 (2019).
Lapuschkin, S. et al. Unmasking Clever Hans predictors and assessing what machines really learn. Nat. Commun. 10, 1096 (2019).
Samek, W., Montavon, G., Vedaldi, A., Hansen, L. K. & Müller, K.-R. Explainable AI: Interpreting, Explaining and Visualizing Deep Learning (Springer Nature, 2019).
Paszke, A. et al. PyTorch: an imperative style, high-performance deep learning library. In Advances in Neural Information Processing Systems 32 (eds. Wallach, H. et al.) 8026–8037 (Curran Associates, 2019).
He, K., Zhang, X., Ren, S. & Sun, J. Deep residual learning for image recognition. In Proc. IEEE Conference on Computer Vision and Pattern Recognition 770–778 (2016).
Russakovsky, O. et al. ImageNet large scale visual recognition challenge. Int. J. Comput. Vis. 115, 211–252 (2015).
Kingma, D. P. & Ba, J. Adam: a method for stochastic optimization. In Proc. 3rd International Conference for Learning Representations (eds. Bengio, Y. & LeCun, Y.) (2015).
Barata, C., Celebi, M. E. & Marques, J. S. Improving dermoscopy image classification using color constancy. IEEE J. Biomed. Health Inform. 19, 1146–1152 (2015).
Rinner, C., Kittler, H., Rosendahl, C. & Tschandl, P. Analysis of collective human intelligence for diagnosis of pigmented skin lesions harnessed by gamification via a web-based training platform: simulation reader study. J. Med. Internet Res. 22, e15597 (2020).
Holm, S. A simple sequentially rejective multiple test procedure. Scand. J. Statist. 6, 65–70 (1979).
R Core Team. R: A Language and Environment for Statistical Computing (R Foundation for Statistical Computing, 2019).
Wickham, H. ggplot2: Elegant Graphics for Data Analysis (Springer-Verlag, 2016).
Acknowledgements
We thank all dermachallenge.com users for their participation. We gratefully acknowledge the support of NVIDIA Corporation with the donation of the Titan Xp GPU used for this research.
Author information
Authors and Affiliations
Contributions
P.T., C.R. and H.K. conceived and designed experiments. P.T. trained the CNN and produced image predictions. C.R. and P.T. developed the web-based reader platforms together with H.K., and M.J. and H.P.S. provided data for the telemedicine study. H.K. and P.T. conducted statistical analyses. H.K., P.T., C.R., Z.A., G.A., N.C., A.H., M.J., A.L., C.L., J.M., J.P., S.P., C.R., H.P.S. and I.Z. helped collect rater data and interpreted findings. H.K., P.T., C.R., N.C. and A.H. wrote the manuscript with input from all authors.
Corresponding author
Ethics declarations
Competing interests
The authors declare the following competing interests: P.T. received fees from Silverchair and an unrestricted 1-year postdoc grant from MetaOptima Technology. N.C. is an IBM employee and owns diverse investments across technology and health-care companies. A.H. is a consultant to Canfield Scientific and an advisory board member of Scibase. M.J. is funded by a National Health and Medical Research Council (NHMRC) TRIP Fellowship (APP1151021). H.P.S. is a shareholder of MoleMap and e-derm-consult and undertakes regular teledermatological reporting for both companies, is a medical consultant for Canfield Scientific and Revenio Research Oy and a medical advisor for First Derm and MetaOptima Technology, and has a Medical Advisory Board Appointment with MoleMap. H.P.S. holds an Australian NHMRC Practitioner Fellowship (APP1137127). All other authors report no conflict of interest in the topic of this manuscript. This project was conducted after ethical committee review at the Medical University of Vienna under protocol numbers 1804/2017, 1503/2018 and 2308/2019. All participants of the study platform agreed to academic research of usage data upon registration and have continuous ability to withdraw that consent.
Additional information
Peer review information Javier Carmona was the primary editor on this article and managed its editorial process and peer review in collaboration with the rest of the editorial team.
Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Extended data
Extended Data Fig. 1 The neural network pays more relative attention to the non-lesion background in actinic keratoses.
Gradient-weighted Class Activation Maps (Grad-CAM, right column) for the top-1 prediction class of the CNN were created for all HAM10000 images; a sample for every ground-truth class is shown in the left column. The mean activation value per pixel of the background and lesion areas was estimated using manual segmentation masks (middle column). The quotient of background over lesion activation was higher for predictions of the AKIEC class than for all other classes (mean 0.48 vs. 0.32, p = 4.6 × 10−12, two-sided unpaired t-test). Thick central lines denote the median, lower and upper box limits the first and third quartiles; whiskers extend from the box to the most extreme value not further than 1.5 times the IQR.
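The background-over-lesion activation quotient described above can be sketched in a few lines. This is a minimal illustration, not the study's code: the Grad-CAM array and segmentation mask here are hypothetical toy inputs standing in for a real per-image activation map and manual lesion mask.

```python
import numpy as np

def background_to_lesion_ratio(cam, lesion_mask):
    """Quotient of mean Grad-CAM activation outside vs. inside the lesion.

    cam: 2-D array of per-pixel Grad-CAM activations for one image.
    lesion_mask: boolean 2-D array, True inside the segmented lesion.
    """
    lesion_mean = cam[lesion_mask].mean()
    background_mean = cam[~lesion_mask].mean()
    return background_mean / lesion_mean

# Toy example: activation concentrated inside the lesion.
cam = np.zeros((8, 8))
mask = np.zeros((8, 8), dtype=bool)
mask[2:6, 2:6] = True
cam[mask] = 1.0   # strong activation on the lesion
cam[~mask] = 0.3  # weaker activation on the background
ratio = background_to_lesion_ratio(cam, mask)  # 0.3
```

A higher quotient means the network draws relatively more of its evidence from outside the lesion, which is the pattern the figure reports for AKIEC predictions.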
Extended Data Fig. 2 Raters choose an asymmetric decision cutoff for malignancy.
a, When changing answers from benign to malignant (dark blue) or from malignant to benign (light blue) diagnoses, the average cutoff for the AI-provided malignancy probability was not 50% but <25% (yellow dotted line). b, On the ROC curve of the underlying AI for detecting malignant cases (black line), this cutoff, which users chose implicitly without instructions or prior knowledge of the AI's accuracy (yellow dot), had a higher sensitivity and was closer to the ideal cutoff (blue dot), as measured by Youden's index, than the 'symmetric' 50% cutoff (black dot).
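The 'ideal' cutoff referenced in panel b is the threshold maximising Youden's index, J = sensitivity + specificity − 1. A minimal sketch of that search, using hypothetical labels and scores rather than the study's data:

```python
import numpy as np

def youden_optimal_cutoff(y_true, y_score):
    """Return the score threshold maximising Youden's J = sens + spec - 1.

    y_true: binary labels (1 = malignant).
    y_score: predicted malignancy probabilities.
    """
    pos = y_true == 1
    best_t, best_j = None, -1.0
    for t in np.unique(y_score):
        pred = y_score >= t
        sens = pred[pos].mean()      # true-positive rate at this cutoff
        spec = (~pred[~pos]).mean()  # true-negative rate at this cutoff
        j = sens + spec - 1.0
        if j > best_j:
            best_t, best_j = t, j
    return best_t, best_j

# Toy data: malignant cases tend to score higher.
y = np.array([0, 0, 0, 0, 1, 1, 1, 1])
s = np.array([0.05, 0.10, 0.20, 0.60, 0.30, 0.70, 0.80, 0.90])
t, j = youden_optimal_cutoff(y, s)  # t = 0.30, J = 0.75
```

On real data one would typically read the same quantity off an ROC curve; the point of the figure is that raters' implicit cutoff landed closer to this optimum than a naive 50% threshold.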
Supplementary information
Supplementary Information
Supplementary Tables 1–4.
Rights and permissions
About this article
Cite this article
Tschandl, P., Rinner, C., Apalla, Z. et al. Human–computer collaboration for skin cancer recognition. Nat Med 26, 1229–1234 (2020). https://doi.org/10.1038/s41591-020-0942-0
This article is cited by
- Investigating the Quality of DermaMNIST and Fitzpatrick17k Dermatological Image Datasets. Scientific Data (2025)
- CORE-MD clinical risk score for regulatory evaluation of artificial intelligence-based medical device software. npj Digital Medicine (2025)
- Value Creation for Healthcare Ecosystems through Artificial Intelligence Applied to Physician-to-Physician Communication: A Systematic Review. Neural Processing Letters (2025)
- The Epistemic Cost of Opacity: How the Use of Artificial Intelligence Undermines the Knowledge of Medical Doctors in High-Stakes Contexts. Philosophy & Technology (2025)
- Redefining intelligence: collaborative tinkering of healthcare professionals and algorithms as hybrid entity in public healthcare decision-making. AI & SOCIETY (2025)