Background & Summary

Brain metastases (BM) are the most common type of neoplasia affecting the central nervous system, with 20–40% of cancer patients developing BM at some point during their disease progression1,2. BM are among the main complications associated with lung cancer, breast cancer and melanoma3,4,5. The advent of new therapeutic approaches is believed to have contributed to an increase in metastatic brain disease cases6,7. This is likely due to extended patient survival following diagnosis, as well as advancements in imaging technology enabling earlier and more precise detection8,9.

Management of BMs involves various therapeutic modalities: surgery, chemotherapy, immunotherapy, targeted therapies and radiotherapy10. A key challenge with radiotherapy (RT) is achieving an optimal balance between maximizing tumor control and minimizing adverse side effects. Presently, two methods of external beam radiation therapy for BMs are primarily used: whole-brain radiotherapy (WBRT) and stereotactic radiosurgery (SRS)11. WBRT is often recommended for patients with more than three BMs, as it targets the entire brain and can eliminate undetectable metastases12,13. The primary disadvantage of WBRT is that it can cause radiation-induced damage across the brain. On the other hand, SRS is typically used for patients with one to three BMs and a favorable prognosis14. In SRS, highly precise, localized radiation is applied to the BMs in a single session to optimize local tumor control while minimizing exposure to the surrounding healthy brain tissue.

Magnetic resonance imaging (MRI) is currently the modality of choice for detection, treatment planning, and outcome monitoring in BM therapy. Owing to its superior soft-tissue contrast and improved visualization of intracranial structures compared to other modalities, MRI is indispensable in neuro-oncologic imaging15. Brain MRI protocols in neuro-oncology typically include T1-weighted (T1W), fluid-attenuated inversion recovery (FLAIR), T2-weighted (T2W) and post-contrast T1-weighted (T1-Contrast) sequences16,17. Contrast-enhanced T1-weighted sequences are considered the gold standard for the detection and characterization of BMs, providing detailed information on lesion size, morphology and adjacent brain tissue17. Additionally, T2W and FLAIR images are commonly used to characterize BMs, as they effectively highlight the perilesional edema that typically surrounds BM lesions18.

Intracranial metastatic disease can exhibit diverse imaging patterns, with brain metastases potentially developing in the brain parenchyma, ventricles, leptomeninges, dura, bone, and extracranial soft tissues19. Precise identification of lesion number and location is crucial for surgery or radiotherapy planning. Traditionally, BMs are detected manually by a radiologist and contoured by a radiation oncologist using radiation therapy planning software20. Automated BM detection can serve as a valuable tool to assist clinicians in evaluating images, aiding in treatment planning for SRS, and assessing treatment outcomes during patient follow-up. Furthermore, automation can reduce human error and accomplish these tasks with minimal manual intervention. Recent advancements in deep learning have demonstrated the significant potential of AI-driven methods in medical image analysis, particularly in segmentation, cancer detection and classification21,22, and some studies have reported models for BM segmentation in MRI data22,23,24,25. In addition to the automatic segmentation of BMs, morphological MRI-based features have demonstrated effectiveness in the context of other brain tumors; however, their applicability to BM identification remains largely underexplored. Radiomic analyses have the potential to enhance the standard of care for patients with BM by extracting high-dimensional quantitative features that reflect imaging phenotypes from tumor segmentations, thereby supporting clinical decision-making and personalized therapy of brain cancer.

Although AI-based approaches have demonstrated encouraging outcomes in diagnosing and automatically segmenting BMs, a significant gap remains in their clinical integration and widespread adoption. A key factor contributing to this gap is their limited ability to generalize to real-world datasets. Many of these models have been developed using small, single-institution hospital datasets that lack diversity in patient demographics and imaging protocols, both of which are essential for broader clinical application26. Therefore, there is an urgent need to develop extensive, diverse and publicly accessible datasets to enhance the training of AI algorithms and to evaluate models accurately across a wide range of patient cases.

The availability of publicly accessible datasets for BMs is quite limited. The Cancer Imaging Archive27 remains the most prominent repository for cancer imaging data; it offers data for various human cancers, including brain cancers. Currently, dedicated publicly available BM datasets28,29,30, as well as the BraTS-METS 202323 dataset, are limited in that they only include pre-treatment segmentations and MRI data, capturing only a limited range of tumor subregions, typically the contrast-enhancing region of the tumor or the necrotic tumor area28,30. Moreover, existing studies provide minimal to no information about the therapeutic approaches that BM patients have undergone, including radiation, chemotherapy, or immunotherapy. Addressing these gaps is crucial for advancing AI-driven BM analysis and improving clinical decision-making.

In this contribution, we present a unique BM dataset of cancer patients in Cyprus, which includes standard MRI sequences (T1W, T2W, T1-Contrast, FLAIR; see Fig. 1), the radiotherapy plan (RTP) and computed tomography (CT) scans, and detailed annotations of tumor subregions, specifically labels for the enhancing tumor, the edema and the necrotic region. Additionally, we provide two supplementary Excel files: one contains essential demographic, clinical, and treatment-related information, offering valuable insights for medical analysis and AI-driven research, while the other provides quantitative imaging data for each patient case.

Fig. 1

Representative MRI scans of a patient displaying the brain in four imaging modalities: (a) T1-weighted, (b) T2-weighted, (c) post-contrast T1-weighted (T1-Contrast), and (d) fluid-attenuated inversion recovery (FLAIR). (e) Tumor subregions are delineated according to the BraTS annotation standards, with necrotic core (red), peritumoral edema (green), and enhancing tumor (yellow). (f) The radiotherapy plan overlaid on a computed tomography scan of a brain tumor, illustrating the dose distribution and the anatomical structures involved in treatment.

Each patient record includes pre-treatment tumor segmentations and MRI scans, as well as post-treatment follow-up imaging data. All patients had at least one follow-up, conducted at predetermined intervals of 6 weeks, 3 months, 6 months, 9 months and 12 months, enabling a comprehensive longitudinal assessment of treatment response and disease progression. Importantly, the inclusion of post-treatment imaging data addresses a core requirement of the BraTS-METS 2025 Lighthouse Challenge and enhances the dataset’s utility for developing and validating AI algorithms.

The present data resource is designed to support the training and validation of future machine learning and in silico models for BM diagnosis and therapeutic monitoring, with the ultimate goal of facilitating their translation and adoption in clinical practice. The multiple follow-up scans (at predefined time points) available in the present dataset further enable the investigation of longitudinal disease dynamics, treatment response, and temporal evolution of imaging biomarkers. This distinguishes our dataset from existing BM resources and provides a foundation for developing models that account for time-dependent changes in tumor biology.

Methods

Subject characteristics

The data collected in this study include the pretreatment and follow-up studies from 6 different diagnostic centers in Cyprus. All patients were identified through a reference clinical oncology center, the Bank of Cyprus Oncology Centre (BoCOC), between 2019 and 2024. All recruited patients signed informed consent. Inclusion required a clinical or pathological diagnosis of BM and the availability of all four standard MRI sequences (i.e., T1W, T2W, T1-Contrast, and FLAIR). All scans were qualitatively inspected for image quality by an experienced radiologist, and those exhibiting significant motion-related blurring or ghosting artifacts were excluded. Additionally, quantitative consistency of the imaging data was assessed at the protocol level by reviewing voxel-related parameters, including slice thickness and slice spacing, extracted from the DICOM headers; these parameters are summarized in Table 1. The dataset comprises a total of 40 patients. The primary tumor origins among these patients were as follows: non-small cell lung cancer (65%), breast cancer (32.5%), and small-cell lung cancer (2.5%). In total, the 40 patients in the cohort presented 65 BMs. The dataset includes 744 distinct MRI scans, along with 45 unique RTP and 45 CT scans. Notably, 5 patients required two separate RTP and CT scans due to the emergence of new brain metastases. All patients underwent stereotactic radiosurgery or stereotactic fractionated radiotherapy. Lesions measuring less than 3 millimeters in size were excluded from imaging analysis to ensure accurate and reliable assessments.
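The cohort percentages and plan counts quoted above can be cross-checked with a few lines of arithmetic. The following is a minimal sketch that uses only the figures stated in this section; the variable and function names are illustrative and not part of the dataset.

```python
# Cohort figures quoted in the text above.
N_PATIENTS = 40
PRIMARY_SHARE_PERCENT = {
    "non-small cell lung cancer": 65.0,
    "breast cancer": 32.5,
    "small-cell lung cancer": 2.5,
}
N_RETREATED = 5  # patients with two separate RTP and CT scans


def share_to_count(percent: float, total: int) -> int:
    """Convert a percentage share into a whole number of patients."""
    return round(percent * total / 100)


primary_counts = {
    name: share_to_count(pct, N_PATIENTS)
    for name, pct in PRIMARY_SHARE_PERCENT.items()
}

# One RTP/CT pair per patient, plus a second pair for each re-treated patient.
n_rtp = N_PATIENTS + N_RETREATED

for name, count in primary_counts.items():
    print(f"{name}: {count} patients")
print(f"RTP/CT pairs: {n_rtp}")
```

The three shares correspond to whole patient counts that add up to the cohort size, and the number of RTP/CT pairs follows from one plan per patient plus one additional plan for each of the 5 re-treated patients.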

Table 1 Summary of the MRI parameters for T1W, T1-Contrast, T2W and FLAIR sequences.

Ethical approval

The present study was conducted according to the guidelines of the Declaration of Helsinki and has been approved by the Cyprus National Bioethics Committee (EEBK/EΠ/2021/72; February 24, 2022). All participants provided their written consent for data anonymization and approved open publication of the data.

Clinical data and anonymization

Clinical data were retrieved from the BoCOC electronic medical record system and include the following information: ethnicity, gender, age at diagnosis, the origin of the primary tumor, the treatment details related to chemotherapy and radiosurgery, the Karnofsky performance status, the neurocognitive status, the presence and extent of extracranial disease, and the time to death or last documented entry in the electronic medical record as of December 2024. For each metastatic lesion, the recorded details included a unique identifier to differentiate it from other metastases within the same patient, its precise anatomical location within the brain (classified as hemispheric frontal, temporal, parietal or occipital; brainstem; or cerebellar, including laterality), the date of initial detection on MRI, and the corresponding treatment details. Treatment-related data included the type of therapy, dosages, number of SRT fractions, start dates and, where available, end dates of treatment.

All DICOM files were de-identified prior to deposition using the anonymization tools integrated in the Eclipse Radiotherapy Treatment Planning System (Varian Medical Systems). The de-identification process was executed on site at the BoCOC research data server immediately following receipt of the DICOM images from the PACS production system. This process automatically removed protected health information, private DICOM tags, and any fields containing sensitive or identifying data, ensuring participant confidentiality.

Image acquisition

Table 1 presents a summary of all imaging parameters for the MRI acquisitions. T1W contrast-enhanced images were acquired following intravenous administration of a paramagnetic contrast agent (Gadovist) at a dose of approximately 0.1 mmol/kg body weight. The 185 MRI studies were conducted under free-breathing conditions using either 3 T (n = 126; 68.5%) or 1.5 T (n = 58; 31.5%) MRI scanners. The MRI examinations were performed using scanners manufactured by Philips (n = 126; 68.5%), Siemens (n = 55; 29.9%) and Toshiba (n = 3; 1.6%).

Image preprocessing

The imaging data, initially available in the Digital Imaging and Communications in Medicine (DICOM) format, were converted to the Neuroimaging Informatics Technology Initiative (NIfTI) format using dcm2niix31. This conversion ensured compatibility with the computational tools used while maintaining image quality, spatial orientation and resolution. Subsequently, all MRI data were transformed to the Brain Tumor Segmentation (BraTS) spatial framework using the BraTS-Preprocessor32, to adhere to the requirements of the BraTS challenge. This transformation involved resampling and registering all images to the BraTS-defined space, thereby achieving uniformity in voxel dimensions and anatomical alignment. By adhering to the standards of the BraTS challenge, the dataset was optimized for interoperability, usability and reproducibility, enabling seamless application in prospective BraTS challenges and promoting the development of brain tumor segmentation algorithms. Lastly, the RTP and CT data were aligned to the processed T1W images using non-rigid registration33, ensuring precise spatial correspondence with the BraTS standards to facilitate further utilization of the data.
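For readers who wish to repeat the DICOM-to-NIfTI step on the raw scans, the conversion can be scripted around the dcm2niix command-line tool. The sketch below is illustrative only: the helper names and directory arguments are hypothetical, and the flags shown (`-z y` for gzip output, `-f` for the output file name, `-o` for the output folder) are one reasonable choice rather than the exact settings used for this dataset. It assumes dcm2niix is installed and available on the system path.

```python
import subprocess
from pathlib import Path


def build_dcm2niix_command(dicom_dir: Path, out_dir: Path, name: str) -> list[str]:
    """Assemble a dcm2niix call for one DICOM series (flags are an assumption)."""
    return [
        "dcm2niix",
        "-z", "y",           # write gzip-compressed .nii.gz files
        "-f", name,          # output file name, without extension
        "-o", str(out_dir),  # output folder
        str(dicom_dir),      # input folder holding the DICOM series
    ]


def convert_series(dicom_dir: Path, out_dir: Path, name: str) -> None:
    """Run the conversion; raises CalledProcessError if dcm2niix fails."""
    out_dir.mkdir(parents=True, exist_ok=True)
    subprocess.run(build_dcm2niix_command(dicom_dir, out_dir, name), check=True)
```

Separating command construction from execution makes it possible to inspect or log the exact call before any file is written.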

Segmentation procedure

The MRI scans were processed using multiple segmentation methodologies, including the deep learning model DeepBraTumIA34, the nnU-Net35 model trained on the BraTS 2020 dataset, and the proprietary software Imalytics Preclinical36. Both DeepBraTumIA and nnU-Net were used to automatically segment each MRI scan into three tumor subregions: enhancing tumor, edema (T2/FLAIR hyperintensity) and tumor necrosis (see Fig. 1). In cases where patients had multiple metastases, DeepBraTumIA and nnU-Net were unable to detect all lesions. To address this limitation, a semi-automated segmentation was carried out using Imalytics Preclinical36. In this approach, tumors were first manually delineated on several central slices of the 3D volume and then automatically detected across all slices using the built-in algorithms of the software. The edema and necrotic tissue segmentations were performed manually by experienced professionals to ensure accuracy. In addition, the white matter, grey matter, ventricle and cerebrospinal fluid components of normal brain tissue were automatically segmented using a geodesic information flows-based algorithm36 that is incorporated into the open-source toolkit NiftySeg. The segmentation outputs generated by DeepBraTumIA, nnU-Net and Imalytics Preclinical were subsequently transformed into the BraTS space (Fig. 1e) to ensure accurate and reliable alignment with the corresponding MRIs. To guarantee consistency and clinical reliability, all segmentations were thoroughly reviewed and manually refined by a certified neuroradiologist (L.S.) with over 10 years of clinical experience. The open-source software ITK-SNAP 4.0.2 was used to visualize and verify the segmentation results during the revision process.

Radiomic-based analysis

A comprehensive radiomic analysis was conducted using all available MRI modalities, with features extracted separately from the necrotic region, the tumor region and the edema. Prior to radiomic feature extraction, N4 bias field correction was applied using the SimpleITK Python package (version 2.4.1) to compensate for bias field degradation in the acquired MRIs and to increase the number of reproducible features37,38. Subsequently, image slices were subjected to intensity clipping, using the 0.1 and 99.9 percentiles of the 1-dimensional histogram as thresholds, to reduce outlier signal intensities (using the ClampImageFilter function of the SimpleITK Python package). Image rescaling (0 to 1024) was then performed (rescale_intensity from the Python package scikit-image, version 0.25), and the pre-processed images were imported into the Pyradiomics Python package (version 3.1.0)39, deriving a total of 110 features for each MRI modality, segmentation region, and acquisition time. These features consist of 19 first-order statistics, 16 3D shape-based descriptors, and 75 textural characteristics: 24 from the Gray Level Co-occurrence Matrix, 16 from the Gray Level Run Length Matrix, 5 from the Neighboring Gray Tone Difference Matrix, 16 from the Gray Level Size Zone Matrix, and 14 from the Gray Level Dependence Matrix. Since all images had been isotropically interpolated during the segmentation phase, the “interpolator” setting of Pyradiomics was disabled. The image gray levels were discretized with a bin width of 5.
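The intensity clipping and rescaling steps described above can be illustrated with a short NumPy sketch. This is not the pipeline used to produce the released features, which relied on SimpleITK and scikit-image as stated; it is an equivalent formulation under the assumption that the percentiles are computed over the whole array passed in. The function name and default arguments are illustrative.

```python
import numpy as np


def clip_and_rescale(image, low_pct=0.1, high_pct=99.9, out_max=1024.0):
    """Clip intensities to the given percentiles, then rescale to [0, out_max]."""
    image = np.asarray(image, dtype=np.float64)
    # Percentile thresholds of the 1-dimensional intensity histogram.
    low, high = np.percentile(image, [low_pct, high_pct])
    clipped = np.clip(image, low, high)
    if high == low:
        # Constant image: nothing to rescale.
        return np.zeros_like(clipped)
    # Linear mapping of [low, high] onto [0, out_max].
    return (clipped - low) / (high - low) * out_max
```

Applied to an image with a few extreme voxels, the clipping prevents those outliers from compressing the remaining intensities into a narrow part of the output range, which matters for the subsequent fixed-bin-width discretization.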

Data Record

The dataset has been deposited in Zenodo40. It contains medical imaging data (MRI scans at baseline and follow-up timepoints, RTP scans and CT scans) and two Excel spreadsheets that contain the clinical and demographic data (‘PROTEAS-Clinical_and_demographic_data.xlsx’) and the radiomics data (‘PROTEAS-MRI_radiomics_data.xlsx’) from all 40 patients included in this study. For each patient, raw pretreatment and follow-up scans are stored in DICOM file format, which preserves the original medical image coordinates. Tumor segmentations from baseline and follow-up scans are provided in NIfTI format after preprocessing, as described in the Image preprocessing section. All patients are assigned a unique generic identifier; for example, the data for patient 01 are stored in folder ‘P01’, for patient 11 in ‘P11’, and so on. An exception applies to patients 04, 07, 17, 20 and 23, for whom two directories are provided (e.g., for patient 23 the folders are ‘P23a’ and ‘P23b’). As reported in the clinical data spreadsheet, these patients underwent two rounds of radiotherapy due to the presence of distinct BMs located in separate regions of the brain. Each patient directory contains a folder entitled ‘BraTS’, which includes subfolders labeled ‘baseline’, ‘fu1’, ‘fu2’ and so forth, depending on the number of available follow-up scans. Each of these subfolders contains four NIfTI files, provided in BraTS space, corresponding to the MRI sequences collected in this study: ‘t1.nii.gz’ (T1W), ‘t2.nii.gz’ (T2W), ‘t1c.nii.gz’ (T1-Contrast), and ‘fla.nii.gz’ (FLAIR).
In addition, each patient directory contains: (i) a NIfTI file named ‘P??_brain_mask.nii.gz’ with the healthy brain tissue regions, (ii) a NIfTI file named ‘P??_CT.nii.gz’ with the CT data, (iii) a NIfTI file named ‘P??_RTP.nii.gz’ with the RTP data, and (iv) a folder entitled ‘tumor_segmentation’ that includes the tumor/edema/necrosis masks for the baseline scan (named ‘P??_tumor_mask_baseline.nii.gz’) and for each available follow-up (e.g., for the 2nd follow-up of patient 30 the filename is ‘P30_tumor_mask_fu2.nii.gz’). All brain and tumor segmentation files adhere to a consistent label convention: ‘Label 1’ represents the necrotic core, ‘Label 2’ represents the enhancing tumor region, ‘Label 3’ represents the edema, ‘Label 10’ represents the brain ventricles, ‘Label 30’ represents the white matter, ‘Label 40’ represents the grey matter, and ‘Label 50’ represents the cerebrospinal fluid region.
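The folder layout and label convention described above can be captured in a small helper, shown here as a minimal sketch. The function name, the dictionary keys and the dataset root passed in are hypothetical. The sketch also assumes that the ‘P??’ placeholder in the file names equals the patient folder name; this should be verified for the patients with two directories (e.g., ‘P23a’ and ‘P23b’).

```python
from pathlib import Path

# Label convention of the brain and tumor segmentation files.
LABELS = {
    1: "necrotic core",
    2: "enhancing tumor",
    3: "edema",
    10: "ventricles",
    30: "white matter",
    40: "grey matter",
    50: "cerebrospinal fluid",
}

# MRI sequence files found in each 'BraTS/<timepoint>' subfolder.
SEQUENCE_FILES = {
    "T1W": "t1.nii.gz",
    "T2W": "t2.nii.gz",
    "T1-Contrast": "t1c.nii.gz",
    "FLAIR": "fla.nii.gz",
}


def patient_files(root: Path, patient_id: str, timepoint: str = "baseline") -> dict[str, Path]:
    """Expected file paths for one patient and one timepoint ('baseline', 'fu1', ...)."""
    patient_dir = root / patient_id
    files = {
        seq: patient_dir / "BraTS" / timepoint / fname
        for seq, fname in SEQUENCE_FILES.items()
    }
    files["tumor_mask"] = (
        patient_dir / "tumor_segmentation" / f"{patient_id}_tumor_mask_{timepoint}.nii.gz"
    )
    # Assumption: the 'P??' file prefix matches the folder name.
    files["brain_mask"] = patient_dir / f"{patient_id}_brain_mask.nii.gz"
    files["CT"] = patient_dir / f"{patient_id}_CT.nii.gz"
    files["RTP"] = patient_dir / f"{patient_id}_RTP.nii.gz"
    return files
```

For example, `patient_files(Path("PROTEAS"), "P30", "fu2")["tumor_mask"]` points to ‘P30_tumor_mask_fu2.nii.gz’ inside the ‘tumor_segmentation’ folder of patient 30, where ‘PROTEAS’ stands for wherever the archive was extracted.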

In summary, the present dataset provides pretreatment MRI scans and multiple follow-up time points, along with corresponding tumor segmentations and radiomic features for each patient. The dataset enables the investigation of disease progression, treatment response, and temporal changes in tumor subregions (necrotic core, enhancing tumor, edema). Researchers can also study the evolution of imaging biomarkers and radiomic signatures over time, making the present dataset a valuable resource for longitudinal analyses and predictive modeling in brain metastasis research.
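As an example of the longitudinal analyses mentioned above, the volume of each tumor subregion can be derived from the label masks and compared between time points. The sketch below is illustrative: the function names are hypothetical, the default voxel volume of 1 mm³ is an assumption that should be replaced by the value read from the image header, and the optional loader relies on the third-party nibabel package, which is not part of this dataset's tooling.

```python
import numpy as np

# Tumor labels as defined in the label convention of this dataset.
TUMOR_LABELS = {1: "necrotic core", 2: "enhancing tumor", 3: "edema"}


def load_mask(path):
    """Load a NIfTI mask and its voxel volume (requires nibabel, assumed installed)."""
    import nibabel as nib  # imported lazily so the rest runs without it

    img = nib.load(str(path))
    voxel_volume = float(np.prod(img.header.get_zooms()[:3]))
    return np.asanyarray(img.dataobj), voxel_volume


def subregion_volumes_mm3(mask, voxel_volume_mm3=1.0):
    """Volume of each tumor subregion: voxel count times voxel volume."""
    mask = np.asarray(mask)
    return {
        name: float(np.count_nonzero(mask == label)) * voxel_volume_mm3
        for label, name in TUMOR_LABELS.items()
    }


def relative_change(baseline, follow_up):
    """Fractional volume change per subregion; None where the baseline is empty."""
    return {
        name: (follow_up[name] - baseline[name]) / baseline[name]
        if baseline[name] > 0
        else None
        for name in baseline
    }
```

A relative change of -0.5 for the enhancing tumor, for instance, indicates that its segmented volume halved between the two time points; such values are descriptive and do not by themselves constitute a formal response assessment.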

Technical Validation

The study included only patients with a confirmed BM, with primary tumors pathologically and clinically confirmed by a radiation oncologist (M.T.) prior to inclusion. Data curation and evaluation of inclusion criteria were conducted by an MRI analysis professional (D.F.). The final dataset comprised only patients with high-quality T1W, T2W, T1-Contrast and FLAIR images, free from significant motion artifacts. Lastly, all automatic and semi-automatic segmentations in this study were carried out by medical image analysis experts (D.F., C.P.P.) and then manually corrected by an expert neuroradiologist (L.S.).

Usage Notes

To process the medical images and segmentations provided in this work, it is strongly advised to use medical imaging software that supports the DICOM or NIfTI format and ensures consistent handling of the images’ physical space and orientation. We confirmed that all NIfTI files, including segmentations and images, were successfully loaded using FSLeyes (version 1.12.6), ITK-SNAP 4.2.2 and 3D Slicer (version 5.8.1), and that DICOM files were successfully loaded using Horos (version 4.01) and 3D Slicer (version 5.8.1).
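As a quick consistency check of the physical space before analysis, the image dimensions and voxel size can be read directly from the NIfTI header. The following minimal sketch uses only the Python standard library. The function name is hypothetical, it assumes single-file NIfTI-1 images (plain or gzip-compressed), and it ignores the orientation fields (qform/sform), so a dedicated library or one of the viewers listed above should be preferred for actual analysis.

```python
import gzip
import struct
from pathlib import Path


def read_nifti1_geometry(path):
    """Return (shape, voxel_size) read from a NIfTI-1 header.

    Voxel sizes are in the units declared in the header (commonly millimetres).
    """
    path = Path(path)
    opener = gzip.open if path.suffix == ".gz" else open
    with opener(path, "rb") as fh:
        header = fh.read(348)  # a NIfTI-1 header is 348 bytes long
    if len(header) < 348:
        raise ValueError("file too short for a NIfTI-1 header")
    # The first field (sizeof_hdr) must equal 348; use it to detect byte order.
    for endian in ("<", ">"):
        if struct.unpack(endian + "i", header[:4])[0] == 348:
            break
    else:
        raise ValueError("not a NIfTI-1 file")
    dim = struct.unpack(endian + "8h", header[40:56])      # dim[0] = number of axes
    pixdim = struct.unpack(endian + "8f", header[76:108])  # pixdim[1..3] = spacing
    ndim = dim[0]
    shape = tuple(dim[1:1 + ndim])
    voxel_size = tuple(pixdim[1:1 + min(ndim, 3)])
    return shape, voxel_size
```

Comparing the returned shape and voxel size between an MRI sequence and its segmentation mask is a simple way to confirm that both were loaded on the same grid.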