
  • Analysis
  • Published:

Results from the autoPET challenge on fully automated lesion segmentation in oncologic PET/CT imaging

A preprint version of the article is available at Research Square.

Abstract

Automated detection of tumour lesions on positron emission tomography–computed tomography (PET/CT) image data is a clinically relevant but highly challenging task. Progress in this field has been hampered in the past owing to the lack of publicly available annotated data and the limited availability of platforms for inter-institutional collaboration. Here we describe the results of the autoPET challenge, a biomedical image analysis challenge aimed at motivating research in the field of automated PET/CT image analysis. The challenge task was the automated segmentation of metabolically active tumour lesions on whole-body 18F-fluorodeoxyglucose PET/CT. Challenge participants had access to a large publicly available annotated PET/CT dataset for algorithm training. All algorithms submitted to the final challenge phase were based on deep learning methods, mostly using three-dimensional U-Net architectures. Submitted algorithms were evaluated on a private test set composed of 150 PET/CT studies from two institutions. An ensemble model of the highest-ranking algorithms achieved favourable performance compared with individual algorithms. Algorithm performance was dependent on the quality and quantity of data and on algorithm design choices, such as tailored post-processing of predicted segmentations. Future iterations of this challenge will focus on generalization and clinical translation.
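To make the ensembling step concrete, below is a minimal, self-contained Python sketch that fuses binary lesion masks from several models by per-voxel majority vote. The fusion rule, function name and toy data are illustrative assumptions; the exact ensembling strategy used for the challenge ensemble model is described in the paper itself, not here.

```python
# Hedged sketch: per-voxel majority-vote fusion of binary lesion masks.
# This is one plausible ensembling rule, not necessarily the one used
# for the autoPET ensemble model.
import numpy as np

def ensemble_majority_vote(masks: list[np.ndarray]) -> np.ndarray:
    """Fuse same-shape binary lesion masks by per-voxel majority vote."""
    votes = np.stack(masks).astype(np.uint8).sum(axis=0)  # votes per voxel
    return (2 * votes > len(masks)).astype(np.uint8)      # strict majority

# Toy usage with three random 3D "predictions".
rng = np.random.default_rng(0)
preds = [rng.integers(0, 2, size=(8, 8, 8)) for _ in range(3)]
fused = ensemble_majority_vote(preds)
print(fused.shape, int(fused.sum()))
```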


Fig. 1: Challenge organization and participation.
Fig. 2: Overview of technical details of the final leaderboard submissions.
Fig. 3: Overview of algorithm performance.
Fig. 4: Qualitative examples of automated lesion segmentation.
Fig. 5: Impact of data source (UKT versus LMU) and number of training samples on the performance of the baseline nnU-Net model in terms of Dice score, false-negative (FN) volume and false-positive (FP) volume.
Fig. 6: Challenge data and evaluation metrics.

Data availability

The training data used in this challenge, including manual annotations of tumour lesions, are publicly available on TCIA11 (https://www.cancerimagingarchive.net/collection/fdg-pet-ct-lesions/, https://doi.org/10.7937/gkr0-xv29). Test data cannot be shared publicly as they will be part of the private test dataset of future iterations of the autoPET challenge. Analyses using the private test data can be performed if they do not interfere with the challenge execution (after completion of the autoPET challenge series). Data can be requested by contacting the challenge organizing team as listed on https://autopet.grand-challenge.org/Organization/.

Code availability

All code used for data processing and performance analysis as part of this challenge, including a trained baseline model, is publicly available via GitHub at https://github.com/lab-midas/autoPET (ref. 37) under the MIT licence.

References

  1. Antonelli, M. et al. The Medical Segmentation Decathlon. Nat. Commun. 13, 4128 (2022).

  2. Menze, B. H. et al. The Multimodal Brain Tumor Image Segmentation Benchmark (BRATS). IEEE Trans. Med. Imaging 34, 1993–2024 (2015).

  3. Halabi, S. S. et al. The RSNA Pediatric Bone Age Machine Learning Challenge. Radiology 290, 498–503 (2019).

  4. Weisman, A. J. et al. Comparison of 11 automated PET segmentation methods in lymphoma. Phys. Med. Biol. 65, 235019 (2020).

  5. Groendahl, A. R. et al. A comparison of fully automatic segmentation of tumors and involved nodes in PET/CT of head and neck cancers. Phys. Med. Biol. https://doi.org/10.1088/1361-6560/abe553 (2021).

  6. Capobianco, N. et al. Deep-learning 18F-FDG uptake classification enables total metabolic tumor volume estimation in diffuse large B-cell lymphoma. J. Nucl. Med. 62, 30–36 (2021).

  7. Oreiller, V. et al. Head and neck tumor segmentation in PET/CT: the HECKTOR challenge. Med. Image Anal. 77, 102336 (2022).

  8. Chardin, D. et al. Baseline metabolic tumor volume as a strong predictive and prognostic biomarker in patients with non-small cell lung cancer treated with PD1 inhibitors: a prospective study. J. Immunother. Cancer 8, e000645 (2020).

  9. Bradley, J. et al. Impact of FDG-PET on radiation therapy volume delineation in non-small-cell lung cancer. Int. J. Radiat. Oncol. Biol. Phys. 59, 78–86 (2004).

  10. Unterrainer, M. et al. Recent advances of PET imaging in clinical radiation oncology. Radiat. Oncol. 15, 88 (2020).

  11. Gatidis, S. & Kuestner, T. FDG-PET-CT-Lesions—A whole-body FDG-PET/CT dataset with manually annotated tumor lesions. TCIA https://www.cancerimagingarchive.net/collection/fdg-pet-ct-lesions/ (2022).

  12. Gatidis, S. et al. A whole-body FDG-PET/CT dataset with manually annotated tumor lesions. Sci. Data 9, 601 (2022).

  13. Maier-Hein, L. et al. BIAS: transparent reporting of biomedical image analysis challenges. Med. Image Anal. 66, 101796 (2020).

  14. Ma, J. et al. Loss odyssey in medical image segmentation. Med. Image Anal. 71, 102035 (2021).

  15. Lin, T.-Y., Goyal, P., Girshick, R., He, K. & Dollár, P. Focal loss for dense object detection. Preprint at https://ui.adsabs.harvard.edu/abs/2017arXiv170802002L (2017).

  16. Wu, Z., Shen, C. & van den Hengel, A. Bridging category-level and instance-level semantic image segmentation. Preprint at https://ui.adsabs.harvard.edu/abs/2016arXiv160506885W (2016).

  17. Berman, M., Rannen Triki, A. & Blaschko, M. B. The Lovász-Softmax loss: a tractable surrogate for the optimization of the intersection-over-union measure in neural networks. Preprint at https://ui.adsabs.harvard.edu/abs/2017arXiv170508790B (2017).

  18. Sadegh Mohseni Salehi, S., Erdogmus, D. & Gholipour, A. Tversky loss function for image segmentation using 3D fully convolutional deep networks. Preprint at https://ui.adsabs.harvard.edu/abs/2017arXiv170605721S (2017).

  19. Isensee, F., Jaeger, P. F., Kohl, S. A. A., Petersen, J. & Maier-Hein, K. H. nnU-Net: a self-configuring method for deep learning-based biomedical image segmentation. Nat. Methods 18, 203–211 (2021).

  20. Cardoso, M. J. et al. MONAI: an open-source framework for deep learning in healthcare. Preprint at https://ui.adsabs.harvard.edu/abs/2022arXiv221102701C (2022).

  21. Barrington, S. F. & Meignan, M. Time to prepare for risk adaptation in lymphoma by standardizing measurement of metabolic tumor burden. J. Nucl. Med. 60, 1096–1102 (2019).

  22. Erickson, N. et al. AutoGluon-Tabular: robust and accurate AutoML for structured data. Preprint at https://arxiv.org/abs/2003.06505 (2020).

  23. Dagogo-Jack, I. & Shaw, A. T. Tumour heterogeneity and resistance to cancer therapies. Nat. Rev. Clin. Oncol. 15, 81–94 (2018).

  24. Gatidis, S., Küstner, T., Ingrisch, M., Fabritius, M. & Cyran, C. Automated lesion segmentation in whole-body FDG-PET/CT. Zenodo https://zenodo.org/records/7845727 (2022).

  25. Rosenfeld, A. & Pfaltz, J. L. Sequential operations in digital picture processing. J. ACM 13, 471–494 (1966).

  26. Maier-Hein, L. et al. Metrics reloaded: recommendations for image analysis validation. Nat. Methods 21, 195–212 (2024).

  27. Ye, J. et al. Exploring Vanilla U-Net for lesion segmentation from whole-body FDG-PET/CT scans. Preprint at https://ui.adsabs.harvard.edu/abs/2022arXiv221007490Y (2022).

  28. Peng, Y., Kim, J., Feng, D. & Bi, L. Automatic tumor segmentation via false positive reduction network for whole-body multi-modal PET/CT images. Preprint at https://ui.adsabs.harvard.edu/abs/2022arXiv220907705P (2022).

  29. Ma, J. & Wang, B. nnU-Net for automated lesion segmentation in whole-body FDG-PET/CT. GitHub https://github.com/JunMa11/PETCTSeg/blob/main/technical_report.pdf (2022).

  30. Zhang, J., Huang, Y., Zhang, Z. & Shi, Y. Whole-body lesion segmentation in 18F-FDG PET/CT. Preprint at https://ui.adsabs.harvard.edu/abs/2022arXiv220907851Z (2022).

  31. Heiliger, L. et al. AutoPET challenge: combining nn-Unet with Swin UNETR augmented by maximum intensity projection classifier. Preprint at https://ui.adsabs.harvard.edu/abs/2022arXiv220901112H (2022).

  32. Sibille, L., Zhan, X. & Xiang, L. Whole-body tumor segmentation of 18F-FDG PET/CT using a cascaded and ensembled convolutional neural networks. Preprint at https://ui.adsabs.harvard.edu/abs/2022arXiv221008068S (2022).

  33. Bendazzoli, S. & Astaraki, M. PriorNet: lesion segmentation in PET-CT including prior tumor appearance information. Preprint at https://ui.adsabs.harvard.edu/abs/2022arXiv221002203B (2022).

  34. Wiesenfarth, M. et al. Methods and open-source toolkit for analyzing and visualizing challenge results. Sci. Rep. 11, 2369 (2021).

  35. Ross, T. et al. Beyond rankings: learning (more) from algorithm validation. Med. Image Anal. 86, 102765 (2023).

  36. Sundar, L. K. S. et al. Fully automated, semantic segmentation of whole-body 18F-FDG PET/CT images based on data-centric artificial intelligence. J. Nucl. Med. 63, 1941–1948 (2022).

  37. Gatidis, S. & Küstner, T. AutoPET Challenge 2022 code repository. Zenodo https://doi.org/10.5281/zenodo.13119561 (2024).

Acknowledgements

This project was partly supported by the Leuze Foundation, Owen/Teck, Germany (S.G.). This project was conducted under Germany’s Excellence Strategy (EXC 2064/1, project number 390727645, and EXC 2180/1, project number 390900677; S.G. and T.K.). This study is part of the doctoral thesis of Alexandra Kubičkova.

Author information

Contributions

S. Gatidis, M.F., S. Gu, M.P.F., M.I., C.C.C. and T.K.: organization of the challenge, preparation of training and test data, contribution of software, data analysis, and drafting of the paper. K.N. and C.L.F.: scientific and clinical consultation during challenge preparation, data analysis, and critical revision of the paper. J.Y., J.H., Y.P., L.B., J.M., B.W., J.Z., Y.H., L.H., Z.M., J.K., R.S., J.E., L.S., L.X., S.B. and M.A.: members of the best-performing participating teams, contribution of software, participation in drafting and critical revision of the paper.

Corresponding author

Correspondence to Sergios Gatidis.

Ethics declarations

Competing interests

The authors declare no competing financial or non-financial interests.

Peer review

Peer review information

Nature Machine Intelligence thanks Masatoshi Hotta and the other, anonymous, reviewer(s) for their contribution to the peer review of this work.

Additional information

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Extended data

Extended Data Fig. 1 Quantitative comparison of training data (top) and test data (bottom).

Training data were drawn from UKT (blue); test data were drawn partially from UKT (blue) and partially from LMU (green). As shown in the histogram plots, the distributions of age, mean standardized uptake value (SUV) of lesions and metabolic tumor volume (MTV) of lesions were similar between training and test data and between the two data sources (UKT and LMU).

Extended Data Fig. 2 Example of a complete dataset.

Computed tomography (CT) volumes, CT-based organ segmentation masks and fluorodeoxyglucose positron emission tomography (FDG-PET) volumes (or subsets of these volumes) were used as input data. Metabolically active tumor lesions were segmented manually (blue) and automatically (red). This particular dataset is drawn from the UKT test set and shows excellent alignment between manual lesion segmentation and automated lesion segmentation using an nnU-Net baseline with CT and FDG-PET as input.

Extended Data Fig. 3 Results of ranking robustness analysis using a bootstrap approach.

The median rank order is largely concordant with the final challenge result, indicating overall robustness of the challenge outcome with respect to sampling from the challenge dataset.

Extended Data Fig. 4 Association of PET/CT image properties with evaluation metrics.

Left column: mixed effects analysis revealed a positive association between the Dice score and the mean lesion volume (p = 0.0002) and between the Dice score and the lesion count (p = 0.008). Middle column: mixed effects analysis revealed a positive association between BMI and the false-positive volume (p = 0.006) and a negative association between the lesion count and the false-positive volume (p = 0.008). Right column: mixed effects analysis revealed a negative association between the mean lesion volume and the false-negative volume (p = 0.01). These findings indicate that the detection of small lesions was challenging for submitted algorithms and that patient-related factors such as BMI may have an impact on overall algorithm performance. Mixed effects analysis was performed in Python using the statsmodels module (version 0.14.0) based on a two-tailed Wald test. P values are corrected for multiple testing (Bonferroni correction).
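As a hedged sketch of the analysis described above, the following self-contained Python example fits a linear mixed effects model with statsmodels (the library named in the caption), using a random intercept per algorithm, and reads off two-tailed Wald p-values with Bonferroni correction. The column names, grouping structure and toy data are assumptions, not the authors' exact model specification.

```python
# Sketch of a mixed effects association analysis with statsmodels.
# Toy data; column names and random-effects structure are assumptions.
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf
from statsmodels.stats.multitest import multipletests

rng = np.random.default_rng(42)
n = 200
df = pd.DataFrame({
    "dice": rng.uniform(0.2, 0.95, n),
    "mean_lesion_volume": rng.lognormal(2.0, 1.0, n),
    "lesion_count": rng.integers(1, 30, n),
    "algorithm": rng.choice([f"team_{i}" for i in range(7)], n),
})

# Random intercept per algorithm; image properties as fixed effects.
model = smf.mixedlm("dice ~ mean_lesion_volume + lesion_count",
                    data=df, groups=df["algorithm"])
result = model.fit()

# Two-tailed Wald p-values for the fixed effects, Bonferroni-corrected.
pvals = result.pvalues[["mean_lesion_volume", "lesion_count"]]
corrected = multipletests(pvals, method="bonferroni")[1]
for name, p in zip(pvals.index, corrected):
    print(f"{name}: corrected p = {p:.3g}")
```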

Extended Data Fig. 5 Typical PET image examples of patients with high BMI (left) and low BMI (right).

Arrows (I) indicate metabolically active tumor lesions (blue = manual segmentation, red = automated segmentation using the nnU-Net baseline). Arrow (II) indicates typical false-positive automated segmentation of the myocardium.

Extended Data Fig. 6 Typical PET image examples of studies with low (left) and high (right) average lesion volume (ground-truth segmentation in blue, automated segmentation in red).

Arrows (I) indicate small lesions that were missed by automated segmentation using the nnU-Net baseline. Overall, smaller lesions tended to be missed by submitted algorithms more often than larger lesions.

Extended Data Fig. 7 Algorithm performance of final leaderboard submissions with respect to the metrics (from top to bottom): overlapping Dice score, Normalized Surface Dice, panoptic quality and per-lesion sensitivity.

Overall, the team ranking with respect to these scores varied but was similar to the challenge leaderboard ranking. Importantly, the ensemble model again performed favorably on these metrics compared with individual algorithms.

Extended Data Fig. 8 Per-lesion sensitivity depending on lesion volume and lesion tracer uptake (mean SUV).

Lesions were grouped by size (left) or mean SUV (right) into inter-decile ranges, and the respective sensitivities were computed across all final leaderboard algorithms. Smaller lesions and lesions with very low tracer uptake were missed more often than larger lesions and lesions with higher tracer uptake. A total of 680 distinct lesions were present (68 lesions per decile interval). Box plots represent the mean (horizontal line) and interquartile range (IQR) (boxes), as well as the range (whiskers) of data points; outliers (outside 1.5 times the IQR of the upper or lower quartiles) are represented as diamonds.
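The decile binning described above can be sketched as follows: a toy Python example that groups 680 simulated lesion volumes into inter-decile ranges (68 lesions per bin) and computes per-bin sensitivity. The detection labels and all variable names are invented for illustration.

```python
# Sketch of decile-binned per-lesion sensitivity on simulated data.
import numpy as np

rng = np.random.default_rng(1)
lesion_volume = rng.lognormal(2.0, 1.0, 680)  # toy lesion volumes (ml)
# Toy detection labels: larger lesions are detected more often.
detected = rng.random(680) < np.clip(lesion_volume / 20.0, 0.05, 0.95)

# Inter-decile bin edges over lesion volume -> 10 bins of 68 lesions each.
edges = np.quantile(lesion_volume, np.linspace(0.0, 1.0, 11))
bins = np.clip(np.digitize(lesion_volume, edges[1:-1]), 0, 9)

for b in range(10):
    in_bin = bins == b
    print(f"decile {b + 1}: n={in_bin.sum():3d}, "
          f"sensitivity={detected[in_bin].mean():.2f}")
```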

Extended Data Fig. 9 Inter-reader Dice score versus the Dice score of the best-performing algorithm (ensemble) on a subset of challenge samples.

Overall, quantitative metrics suggest similar performance of the second reader compared with the ensemble model. FP volume = false-positive volume, FN volume = false-negative volume, NSD = Normalized Surface Dice.
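For reference, here is a minimal Python sketch of the three voxel-level quantities named in these captions: Dice score, false-positive (FP) volume and false-negative (FN) volume. This is the plain voxel-count version; the exact challenge definitions (for example, any restriction of FP/FN volumes to particular connected components) may differ.

```python
# Sketch: Dice score plus FP/FN volumes from binary masks and voxel size.
import numpy as np

def segmentation_metrics(pred: np.ndarray, gt: np.ndarray,
                         voxel_volume_ml: float) -> tuple[float, float, float]:
    """Return (Dice, FP volume in ml, FN volume in ml) for binary masks."""
    pred, gt = pred.astype(bool), gt.astype(bool)
    intersection = np.logical_and(pred, gt).sum()
    dice = 2.0 * intersection / max(int(pred.sum() + gt.sum()), 1)
    fp_volume = np.logical_and(pred, ~gt).sum() * voxel_volume_ml
    fn_volume = np.logical_and(~pred, gt).sum() * voxel_volume_ml
    return float(dice), float(fp_volume), float(fn_volume)

# Toy usage on random 3D masks with 2 x 2 x 3 mm voxels (0.012 ml each).
rng = np.random.default_rng(0)
pred = rng.integers(0, 2, size=(16, 16, 16))
gt = rng.integers(0, 2, size=(16, 16, 16))
print(segmentation_metrics(pred, gt, voxel_volume_ml=0.012))
```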

Extended Data Table 1 Technical details of final phase submissions

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Gatidis, S., Früh, M., Fabritius, M.P. et al. Results from the autoPET challenge on fully automated lesion segmentation in oncologic PET/CT imaging. Nat Mach Intell 6, 1396–1405 (2024). https://doi.org/10.1038/s42256-024-00912-9

Download citation

  • Received:

  • Accepted:

  • Published:

  • Issue date:

  • DOI: https://doi.org/10.1038/s42256-024-00912-9
