The curious case of the test set AUROC

Roberts, Michael; Hazan, Alon; Dittmer, Sören; Rudd, James H. F.; Schönlieb, Carola-Bibiane

doi:10.1038/s42256-024-00817-7

Comment
Published: 04 April 2024

Nature Machine Intelligence volume 6, pages 373–376 (2024)

A Publisher Correction to this article was published on 12 April 2024

This article has been updated

The area under the receiver operating characteristic curve (AUROC) of the test set is used throughout machine learning (ML) for assessing a model’s performance. However, when concordance is not the only ambition, this gives only a partial insight into performance, masking distribution shifts of model outputs and model instability.
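One way to see how a test set AUROC can mask a shift in model outputs is that it depends only on how scores are ranked, not on their values. The following is a minimal sketch of that property and is not the authors' code: the synthetic labels, the Gaussian score model, and the `s ** 4` transform are all assumptions chosen purely for illustration.

```python
import random


def auroc(labels, scores):
    """AUROC as the probability that a random positive outscores a random
    negative, with ties counted as one half (pairwise Mann-Whitney form)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def fraction_above(scores, threshold=0.5):
    """Share of cases that would be flagged positive at a fixed threshold."""
    return sum(s > threshold for s in scores) / len(scores)


rng = random.Random(0)  # fixed seed so the sketch is repeatable

# Hypothetical test set: binary labels and model outputs clipped into (0, 1),
# with positives tending to score higher than negatives.
labels = [rng.randint(0, 1) for _ in range(200)]
scores = [
    min(max(rng.gauss(0.6 if y == 1 else 0.4, 0.15), 0.001), 0.999)
    for y in labels
]

# Assumed output shift: a strictly increasing transform that pushes every
# score toward 0 while leaving the ranking of cases unchanged.
shifted = [s ** 4 for s in scores]

print("AUROC original:", auroc(labels, scores))
print("AUROC shifted: ", auroc(labels, shifted))
print("flagged at 0.5, original:", fraction_above(scores))
print("flagged at 0.5, shifted: ", fraction_above(shifted))
```

Because the transform preserves the ordering of the scores, the two AUROC values agree, while the share of cases exceeding a fixed threshold of 0.5 drops. Any decision rule tied to the score values themselves would therefore behave differently even though the concordance is unchanged.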

Relevant articles

Open Access articles citing this article.

Artificial intelligence in the identification and prediction of adverse transfusion reactions (ATRs) and implications for clinical management: a systematic review of models and applications
Mahdie ShojaeiBaghini, Mohammad Mehdi Ghaemi & Alihasan Ahmadipour
BMC Medical Informatics and Decision Making, Open Access, 28 October 2025

Diagnosing pathologic myopia by identifying morphologic patterns using ultra widefield images with deep learning
Yang Liu, Keming Zhao … Jiansong Ji
npj Digital Medicine, Open Access, 13 July 2025

Developing multifactorial dementia prediction models using clinical variables from cohorts in the US and Australia
Caitlin A. Finney, David A. Brown & Artur Shvetcov
Translational Psychiatry, Open Access, 21 January 2025

**Fig. 1: Equivalence of ROC and AUROC for different distributions.**

**Fig. 2: Multiple data cohort discrepancies.**

**Fig. 3: Single-dataset discrepancy scores.**

Change history

12 April 2024
A Correction to this paper has been published: https://doi.org/10.1038/s42256-024-00834-6

Author information

These authors contributed equally: Michael Roberts, Alon Hazan, Sören Dittmer.
Unaffiliated

Authors and Affiliations

Department of Applied Mathematics and Theoretical Physics, University of Cambridge, Cambridge, UK
Michael Roberts, Sören Dittmer & Carola-Bibiane Schönlieb
Department of Medicine, University of Cambridge, Cambridge, UK
Michael Roberts & James H. F. Rudd
ZeTeM, University of Bremen, Bremen, Germany
Sören Dittmer

Corresponding author

Correspondence to Michael Roberts.

Ethics declarations

Competing interests

The authors declare no competing interests.

About this article

Cite this article

Roberts, M., Hazan, A., Dittmer, S. et al. The curious case of the test set AUROC. Nat Mach Intell 6, 373–376 (2024). https://doi.org/10.1038/s42256-024-00817-7

Version of record: 04 April 2024
Issue date: April 2024
DOI: https://doi.org/10.1038/s42256-024-00817-7
