Abstract
Automated detection of tumour lesions on positron emission tomography–computed tomography (PET/CT) image data is a clinically relevant but highly challenging task. Progress in this field has been hampered in the past owing to the lack of publicly available annotated data and limited availability of platforms for inter-institutional collaboration. Here we describe the results of the autoPET challenge, a biomedical image analysis challenge aimed at motivating research in the field of automated PET/CT image analysis. The challenge task was the automated segmentation of metabolically active tumour lesions on whole-body 18F-fluorodeoxyglucose PET/CT. Challenge participants had access to a large publicly available annotated PET/CT dataset for algorithm training. All algorithms submitted to the final challenge phase were based on deep learning methods, mostly using three-dimensional U-Net architectures. Submitted algorithms were evaluated on a private test set composed of 150 PET/CT studies from two institutions. An ensemble model of the highest-ranking algorithms achieved favourable performance compared with individual algorithms. Algorithm performance was dependent on the quality and quantity of data and on algorithm design choices, such as tailored post-processing of predicted segmentations. Future iterations of this challenge will focus on generalization and clinical translation.
Data availability
The training data used in this challenge, including manual annotations of tumour lesions, are publicly available on TCIA11 (https://www.cancerimagingarchive.net/collection/fdg-pet-ct-lesions/, https://doi.org/10.7937/gkr0-xv29). Test data cannot be shared publicly as they will be part of the private test dataset of future iterations of the autoPET challenge. Analyses using the private test data can be performed if they do not interfere with the challenge execution (after completion of the autoPET challenge series). Data can be requested by contacting the challenge organizing team as listed on https://autopet.grand-challenge.org/Organization/.
Code availability
All code used for data processing and performance analysis as part of this challenge, including a trained baseline model, is publicly available via GitHub at https://github.com/lab-midas/autoPET (ref. 37) under the MIT licence.
References
Antonelli, M. et al. The Medical Segmentation Decathlon. Nat. Commun. 13, 4128 (2022).
Menze, B. H. et al. The Multimodal Brain Tumor Image Segmentation Benchmark (BRATS). IEEE Trans. Med. Imaging 34, 1993–2024 (2015).
Halabi, S. S. et al. The RSNA Pediatric Bone Age Machine Learning Challenge. Radiology 290, 498–503 (2019).
Weisman, A. J. et al. Comparison of 11 automated PET segmentation methods in lymphoma. Phys. Med. Biol. 65, 235019 (2020).
Groendahl, A. R. et al. A comparison of fully automatic segmentation of tumors and involved nodes in PET/CT of head and neck cancers. Phys. Med. Biol. https://doi.org/10.1088/1361-6560/abe553 (2021).
Capobianco, N. et al. Deep-learning 18F-FDG uptake classification enables total metabolic tumor volume estimation in diffuse large B-cell lymphoma. J. Nucl. Med. 62, 30–36 (2021).
Oreiller, V. et al. Head and neck tumor segmentation in PET/CT: the HECKTOR challenge. Med. Image Anal. 77, 102336 (2022).
Chardin, D. et al. Baseline metabolic tumor volume as a strong predictive and prognostic biomarker in patients with non-small cell lung cancer treated with PD1 inhibitors: a prospective study. J. Immunother. Cancer 8, e000645 (2020).
Bradley, J. et al. Impact of FDG-PET on radiation therapy volume delineation in non-small-cell lung cancer. Int. J. Radiat. Oncol. Biol. Phys. 59, 78–86 (2004).
Unterrainer, M. et al. Recent advances of PET imaging in clinical radiation oncology. Radiat. Oncol. 15, 88 (2020).
Gatidis, S. & Kuestner, T. FDG-PET-CT-Lesions—A whole-body FDG-PET/CT dataset with manually annotated tumor lesions. TCIA https://www.cancerimagingarchive.net/collection/fdg-pet-ct-lesions/ (2022).
Gatidis, S. et al. A whole-body FDG-PET/CT dataset with manually annotated tumor lesions. Sci. Data 9, 601 (2022).
Maier-Hein, L. et al. BIAS: transparent reporting of biomedical image analysis challenges. Med. Image Anal. 66, 101796 (2020).
Ma, J. et al. Loss odyssey in medical image segmentation. Med. Image Anal. 71, 102035 (2021).
Lin, T.-Y., Goyal, P., Girshick, R., He, K. & Dollár, P. Focal loss for dense object detection. Preprint at https://ui.adsabs.harvard.edu/abs/2017arXiv170802002L (2017).
Wu, Z., Shen, C. & van den Hengel, A. Bridging category-level and instance-level semantic image segmentation. Preprint at https://ui.adsabs.harvard.edu/abs/2016arXiv160506885W (2016).
Berman, M., Rannen Triki, A. & Blaschko, M. B. The Lovász-Softmax loss: a tractable surrogate for the optimization of the intersection-over-union measure in neural networks. Preprint at https://ui.adsabs.harvard.edu/abs/2017arXiv170508790B (2017).
Sadegh Mohseni Salehi, S., Erdogmus, D. & Gholipour, A. Tversky loss function for image segmentation using 3D fully convolutional deep networks. Preprint at https://ui.adsabs.harvard.edu/abs/2017arXiv170605721S (2017).
Isensee, F., Jaeger, P. F., Kohl, S. A. A., Petersen, J. & Maier-Hein, K. H. nnU-Net: a self-configuring method for deep learning-based biomedical image segmentation. Nat. Methods 18, 203–211 (2021).
Cardoso, M. J. et al. MONAI: an open-source framework for deep learning in healthcare. Preprint at https://ui.adsabs.harvard.edu/abs/2022arXiv221102701C (2022).
Barrington, S. F. & Meignan, M. Time to prepare for risk adaptation in lymphoma by standardizing measurement of metabolic tumor burden. J. Nucl. Med. 60, 1096–1102 (2019).
Erickson, N. et al. AutoGluon-Tabular: robust and accurate AutoML for structured data. Preprint at https://arxiv.org/abs/2003.06505 (2020).
Dagogo-Jack, I. & Shaw, A. T. Tumour heterogeneity and resistance to cancer therapies. Nat. Rev. Clin. Oncol. 15, 81–94 (2018).
Gatidis, S., Küstner, T., Ingrisch, M., Fabritius, M. & Cyran, C. Automated lesion segmentation in whole-body FDG-PET/CT. Zenodo https://zenodo.org/records/7845727 (2022).
Rosenfeld, A. & Pfaltz, J. L. Sequential operations in digital picture processing. J. ACM 13, 471–494 (1966).
Maier-Hein, L. et al. Metrics reloaded: recommendations for image analysis validation. Nat. Methods 21, 195–212 (2024).
Ye, J. et al. Exploring Vanilla U-Net for lesion segmentation from whole-body FDG-PET/CT scans. Preprint at https://ui.adsabs.harvard.edu/abs/2022arXiv221007490Y (2022).
Peng, Y., Kim, J., Feng, D. & Bi, L. Automatic tumor segmentation via false positive reduction network for whole-body multi-modal PET/CT images. Preprint at https://ui.adsabs.harvard.edu/abs/2022arXiv220907705P (2022).
Ma, J. & Wang, B. nnU-Net for automated lesion segmentation in whole-body FDG-PET/CT. GitHub https://github.com/JunMa11/PETCTSeg/blob/main/technical_report.pdf (2022).
Zhang, J., Huang, Y., Zhang, Z. & Shi, Y. Whole-body lesion segmentation in 18F-FDG PET/CT. Preprint at https://ui.adsabs.harvard.edu/abs/2022arXiv220907851Z (2022).
Heiliger, L. et al. AutoPET challenge: combining nn-Unet with Swin UNETR augmented by maximum intensity projection classifier. Preprint at https://ui.adsabs.harvard.edu/abs/2022arXiv220901112H (2022).
Sibille, L., Zhan, X. & Xiang, L. Whole-body tumor segmentation of 18F-FDG PET/CT using a cascaded and ensembled convolutional neural networks. Preprint at https://ui.adsabs.harvard.edu/abs/2022arXiv221008068S (2022).
Bendazzoli, S. & Astaraki, M. PriorNet: lesion segmentation in PET-CT including prior tumor appearance information. Preprint at https://ui.adsabs.harvard.edu/abs/2022arXiv221002203B (2022).
Wiesenfarth, M. et al. Methods and open-source toolkit for analyzing and visualizing challenge results. Sci. Rep. 11, 2369 (2021).
Ross, T. et al. Beyond rankings: learning (more) from algorithm validation. Med. Image Anal. 86, 102765 (2023).
Sundar, L. K. S. et al. Fully automated, semantic segmentation of whole-body 18F-FDG PET/CT images based on data-centric artificial intelligence. J. Nucl. Med. 63, 1941–1948 (2022).
Gatidis, S. & Küstner, T. AutoPET Challenge 2022 code repository. Zenodo https://doi.org/10.5281/zenodo.13119561 (2020).
Acknowledgements
This project was partly supported by the Leuze Foundation, Owen/Teck, Germany (S.G.). This project was conducted under Germany’s Excellence Strategy–EXC-Number 2064/1–390727645 and EXC 2180/1-390900677 (S.G. and T.K.). This study is part of the doctoral thesis of Alexandra Kubičkova.
Author information
Authors and Affiliations
Contributions
S. Gatidis, M.F., S. Gu, M.P.F., M.I., C.C.C. and T.K.: organization of the challenge, preparation of training and test data, contribution of software, data analysis, and drafting of the paper. K.N. and C.L.F.: scientific and clinical consultation during challenge preparation, data analysis, and critical revision of the paper. J.Y., J.H., Y.P., L.B., J.M., B.W., J.Z., Y.H., L.H., Z.M., J.K., R.S., J.E., L.S., L.X., S.B. and M.A.: members of the best-performing participating teams, contribution of software, participation in drafting and critical revision of the paper.
Corresponding author
Ethics declarations
Competing interests
The authors declare no competing financial or non-financial interests.
Peer review
Peer review information
Nature Machine Intelligence thanks Masatoshi Hotta and the other, anonymous, reviewer(s) for their contribution to the peer review of this work.
Additional information
Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Extended data
Extended Data Fig. 1 Quantitative comparison of training data (top) and test data (bottom).
Training data were drawn from UKT (blue), test data partially from UKT (blue) and partially from LMU (green). As shown in the histogram plots, the distribution of age, mean Standardized Uptake Value (SUV) of lesions and Metabolic Tumor Volume (MTV) of lesions was similar between training and test data and between the two data sources (UKT and LMU).
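As an illustration of how these per-study quantities can be derived, the following minimal Python sketch computes the mean lesion SUV and MTV from a PET volume already converted to SUV units and a binary lesion mask; the array names and spacing handling are illustrative and do not reproduce the challenge evaluation code.

```python
import numpy as np

def lesion_statistics(suv_volume, lesion_mask, voxel_spacing_mm):
    """Per-study mean lesion SUV and metabolic tumor volume (MTV).

    suv_volume       : 3D array of PET intensities already converted to SUV
    lesion_mask      : 3D binary array, 1 inside annotated lesions
    voxel_spacing_mm : (dx, dy, dz) voxel edge lengths in millimetres
    """
    mask = lesion_mask.astype(bool)
    if not mask.any():
        return 0.0, 0.0                                   # study without lesions
    voxel_volume_ml = np.prod(voxel_spacing_mm) / 1000.0  # mm^3 -> ml
    mean_suv = float(suv_volume[mask].mean())
    mtv_ml = float(mask.sum() * voxel_volume_ml)
    return mean_suv, mtv_ml
```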
Extended Data Fig. 2 Example of a complete dataset.
Computed tomography (CT) volumes, CT-based organ segmentation masks and fluorodeoxyglucose positron emission tomography (FDG-PET) volumes (or subsets of these volumes) were used as input data. Metabolically active tumor lesions were segmented manually (colored in blue) and automatically (colored in red). This particular dataset is drawn from the UKT test set and shows excellent alignment between manual lesion segmentation and automated lesion segmentation using an nnU-Net baseline with CT and FDG-PET as input.
Extended Data Fig. 3 Results of ranking robustness analysis using a bootstrap approach.
The median rank order is largely concordant with the final challenge result, indicating overall robustness of the challenge outcome with respect to sampling from the challenge dataset.
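A minimal sketch of one way such a bootstrap ranking analysis can be set up is shown below; the column names, aggregation by per-case mean and tie handling are assumptions, and the published analysis may differ in these details.

```python
import numpy as np
import pandas as pd

def bootstrap_ranks(scores, n_boot=1000, seed=0):
    """Rank stability under resampling of test cases with replacement.

    scores : long-format DataFrame with columns ['case_id', 'team', 'metric'],
             one row per (test case, team) pair.
    Returns a DataFrame (n_boot rows, one column per team) holding the rank of
    each team in every bootstrap replicate (rank 1 = highest mean metric).
    """
    rng = np.random.default_rng(seed)
    wide = scores.pivot(index="case_id", columns="team", values="metric")
    cases = wide.index.to_numpy()
    ranks = []
    for _ in range(n_boot):
        sample = rng.choice(cases, size=len(cases), replace=True)
        mean_per_team = wide.loc[sample].mean(axis=0)
        ranks.append(mean_per_team.rank(ascending=False))
    return pd.DataFrame(ranks)
```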
Extended Data Fig. 4 Association of PET/CT image properties with evaluation metrics.
Left column: Mixed effects analysis revealed a positive association between the Dice score and the mean lesion volume (p = 0.0002) and between the Dice score and the lesion count (p = 0.008). Middle column: Mixed effects analysis revealed a positive association between the BMI and the false positive volume (p = 0.006) and a negative association between the lesion count and the false positive volume (p = 0.008). Right column: Mixed effects analysis revealed a negative association between mean lesion volume and false negative volume (p = 0.01). These findings indicate that the detection of small lesions was challenging for submitted algorithms and that patient-related factors such as BMI may have an impact on overall algorithm performance. Mixed effects analysis was performed in Python using the Statsmodels module (version 0.14.0) based on a two-tailed Wald test. P values are corrected for multiple testing (Bonferroni correction).
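A minimal sketch of such a mixed-effects fit with statsmodels, assuming a random intercept per algorithm, is shown below; the file name, column names and number of corrected tests are placeholders for illustration.

```python
import pandas as pd
import statsmodels.formula.api as smf

# One row per (algorithm, test case); file and column names are placeholders.
df = pd.read_csv("per_case_results.csv")

# Random intercept per algorithm, fixed effects for the image properties of interest.
model = smf.mixedlm("dice ~ mean_lesion_volume + lesion_count",
                    data=df, groups=df["algorithm"])
result = model.fit()
print(result.summary())          # two-tailed Wald z-tests per coefficient

# Bonferroni correction across the number of association tests performed (assumed here).
n_tests = 6
corrected_p = (result.pvalues * n_tests).clip(upper=1.0)
print(corrected_p)
```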
Extended Data Fig. 5 Typical PET image examples of patients with high BMI (left) and low BMI (right).
Arrows I indicate metabolically active tumor lesions (blue = manual segmentation, red = automated segmentation using the nnU-Net baseline). Arrow II indicates a typical false-positive automated segmentation of the myocardium.
Extended Data Fig. 6 Typical PET image examples of studies with low (left) and high (right) average lesion volume (ground truth segmentation in blue, automated segmentation in red).
Arrows (I) indicate small lesions that were missed by automated segmentation using the nnU-Net baseline. Overall, smaller lesions tended to be missed by submitted algorithms more often than larger lesions.
Extended Data Fig. 7 Algorithm performance of final leaderboard submissions with respect to the metrics (from top to bottom) overlapping Dice score, normalized surface Dice, panoptic quality and per-lesion sensitivity.
Overall, the team ranking with respect to these scores varied but was similar to the challenge leaderboard ranking. Importantly, the ensemble model again performed favorably for these metrics compared to individual algorithms.
Extended Data Fig. 8 Per-lesion sensitivity depending on lesion volume and lesion tracer uptake (mean SUV).
Lesions were grouped by size (left) or mean SUV (right) into inter-decile ranges and the respective sensitivities were computed across all final leaderboard algorithms. Smaller lesions and lesions with very low tracer uptake were more often missed compared to larger lesions and lesions with higher tracer uptake. A total of 680 distinct lesions were present (68 lesions per decile interval). Box plots represent the mean (horizontal line) and inter-quartile range (IQR) (boxes), as well as the range (whiskers) of datapoints; outliers (outside 1.5 times IQR of the upper or lower quartiles) are represented as diamonds.
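A minimal sketch of a per-lesion sensitivity computation is shown below, assuming that a ground-truth lesion counts as detected when it overlaps the predicted mask in at least one voxel; the connectivity and detection criterion actually used for this figure may differ.

```python
import numpy as np
from scipy import ndimage

def per_lesion_sensitivity(gt_mask, pred_mask):
    """Fraction of ground-truth lesions overlapped by the predicted mask.

    Lesions are defined as connected components of the ground-truth mask
    (face connectivity, the scipy.ndimage.label default).
    """
    labels, n_lesions = ndimage.label(gt_mask)
    if n_lesions == 0:
        return np.nan                        # study without annotated lesions
    detected = sum(
        1 for lesion_id in range(1, n_lesions + 1)
        if np.any(pred_mask[labels == lesion_id])
    )
    return detected / n_lesions
```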
Extended Data Fig. 9 Inter-reader Dice score versus best-performing algorithm (ensemble) Dice score on a subset of challenge samples.
Overall, quantitative metrics suggest similar performance of the second reader compared to the ensemble model. FP volume = false positive volume, FN volume = false negative volume, NSD = Normalized Surface Dice.
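For orientation, the following sketch computes the volume-based metrics referred to here (Dice score, false positive volume, false negative volume) under commonly used definitions; the official evaluation code is available in the repository listed under Code availability and may differ in detail.

```python
import numpy as np
from scipy import ndimage

def dice_fp_fn(gt_mask, pred_mask, voxel_volume_ml):
    """Dice score plus false positive / false negative volumes (in ml).

    False positive volume: predicted components with no ground-truth overlap.
    False negative volume: ground-truth lesions with no predicted overlap.
    """
    gt = np.asarray(gt_mask, dtype=bool)
    pred = np.asarray(pred_mask, dtype=bool)

    denom = gt.sum() + pred.sum()
    dice = 2.0 * np.logical_and(gt, pred).sum() / denom if denom else np.nan

    pred_labels, n_pred = ndimage.label(pred)
    fp_voxels = sum((pred_labels == i).sum() for i in range(1, n_pred + 1)
                    if not np.any(gt[pred_labels == i]))

    gt_labels, n_gt = ndimage.label(gt)
    fn_voxels = sum((gt_labels == i).sum() for i in range(1, n_gt + 1)
                    if not np.any(pred[gt_labels == i]))

    return dice, fp_voxels * voxel_volume_ml, fn_voxels * voxel_volume_ml
```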
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Gatidis, S., Früh, M., Fabritius, M.P. et al. Results from the autoPET challenge on fully automated lesion segmentation in oncologic PET/CT imaging. Nat Mach Intell 6, 1396–1405 (2024). https://doi.org/10.1038/s42256-024-00912-9