Abstract
Identifying predictive and resistance biomarkers remains one of the most relevant unmet needs in clinical cancer research. Artificial Intelligence (AI) represents a powerful tool to develop predictive algorithms tailored to individual patients. Thanks to its ability to process large quantities of heterogeneous, patient-level information, the AI-based approach is progressively fostering the growth of a data-driven paradigm to complement traditional, hypothesis-driven clinical research. However, the development of reliable AI models requires access to large, high-quality, and continuously updated datasets. Despite this necessity, no infrastructure currently exists to enable federated, multi-omic, standardized, prospective, and large-scale collection and analysis of real-world clinical and biological data in the context of lung cancer. We established the APOLLO11 consortium, a distributed, nationwide, updated Italian lung cancer network designed to build a decentralized, long-term, population-based, real-world data repository and a multilevel biobank, locally stored and centrally annotated. This strategy seeks to lay the foundation for the clinical implementation of data-driven research, ultimately advancing precision oncology.
Introduction
With the advent of several innovative therapies, such as immunotherapy (IO), targeted therapies, and other next-generation treatments, the identification of predictive biomarkers has become the main goal of clinical and translational research in advanced lung cancer1,2,3,4. Indeed, the approval of different IO-based therapeutics and targeted treatments radically changed the treatment landscape of advanced Non-Small Cell Lung Cancer (aNSCLC) and advanced Small Cell Lung Cancer (aSCLC) patients, significantly prolonging overall survival (OS) and also inducing long-term remission in a proportion of patients with metastatic disease5,6,7,8,9,10,11,12,13,14. However, around half of patients do not benefit from these novel therapies, either due to primary refractoriness or due to secondary resistance, which occurs after an initial benefit11,14,15,16. Current biomarkers are inadequate to guide treatment decisions in the context of novel therapeutics. As an example, the role of Programmed Death Ligand 1 (PD-L1), as evaluated by immunohistochemistry (IHC) on tumor specimens, in predicting IO benefit remains poorly defined, and no other biomarkers are currently used to tailor IO-based treatments (e.g., IO alone vs IO in combination with chemotherapy)15,17,18,19.
Conventional statistical methods are widely used in oncology to find associations between patient characteristics and outcomes, and to test whether data provide sufficient support for a specific hypothesis. However, conventional approaches have limited capacity to comprehensively evaluate the complexity of cancer biology, and to integrate the vast, heterogeneous, multi-modal data available from oncological patients20,21. In addition, technological advances have led to an unprecedented acceleration of drug discovery, so that the oncology field changes continuously and dramatically, even over short timeframes. For this reason, hypothesis-driven research, which is based on establishing ad hoc studies to test just a single or a few hypotheses at the same time, cannot keep pace with the recent advances in oncology22,23. Artificial Intelligence (AI) frameworks, which synthesize and correlate information from different data sources, are a potentially highly efficient instrument to construct algorithms that strengthen individual patient prediction. AI also allows the extraction of information from unstructured data, such as medical images, digitized slides, and mobile app monitoring, which can be associated with prognosis or benefit from treatments and can therefore be adopted as potential novel biomarkers. Through the ability of AI to handle large quantities of single-patient information at the same time, the data-driven paradigm is gaining ground alongside traditional hypothesis-driven clinical research24,25. This approach allows multiple clinical or scientific questions to be tested simultaneously using existing data, providing timely insights into open questions that constantly arise from clinical practice or translational research.
The increasing availability of real-world data (RWD) and the application of AI are enabling the generation of novel hypotheses and accelerating translational insights. However, to achieve this goal, a large amount of high-quality, updated, and multisource data is mandatory to appropriately train AI models and reach an accuracy sufficient to make them applicable in clinical practice17,18,20,26. To overcome the lack of adequate data for the analyses required to address unmet clinical needs, we established the APOLLO11 consortium, which is a distributed, nationwide, continuously updated Italian lung cancer network. It encompasses the development of a decentralized long-term national database collecting RWD, hosted locally in each center, and a “multilevel” biobank, locally stored and centrally annotated. The multilevel structure is designed to maximize the contribution to biological sample collection, including from centers with more limited facilities. APOLLO11 aims to create a large platform for the collection of data from multiple sources, to accelerate the conduct of academic research, and to generate knowledge to answer new questions arising from clinical practice. This strategy aims to become the foundation for the clinical implementation of data-driven research27.
APOLLO11 will address several clinically relevant, unsolved scientific questions27. The first scientific objective of this project is to find a predictive multi-omic algorithm of IO efficacy in aNSCLC patients. To achieve this scientific aim, multi-modal data were collected from aNSCLC patients treated with IO-based therapy across centers participating in the consortium. Machine Learning (ML) and Deep Learning (DL) models will be used to generate and synthesize biomarkers to achieve the highest predictive performance using multi-modal data. With the help of eXplainable AI (XAI) methodologies and fairness auditing, trustworthy, human-readable models will be generated, leading to the creation of a responsible AI-based tool. This tool aims to support individualized treatment decisions about the use of IO in aNSCLC, optimizing treatment outcomes while minimizing undue toxicity.
In this manuscript, we describe the rationale and structure of the APOLLO11 nationwide infrastructure and detail its early implementation, with the aim of presenting the framework underpinning the collection, harmonization, and integration of clinical, radiomic, and multi-omic data in lung cancer.
Results
To keep pace with the drug discovery process, the APOLLO11 consortium aims to collect updated large-scale data to identify and combine different types of biomarkers (clinical, radiologic, genetic, molecular, and immunological) across different lung cancer histological and biological entities and stages. We expect to identify markers predicting response to therapy at baseline, markers associated with primary and secondary resistance mechanisms, markers capable of predicting relapse, and markers associated with treatment toxicity.
APOLLO11 aims to become a network model for data collection and analysis in the data-driven research era, which can be applied to virtually all fields of oncology and to different geographical and territorial contexts24,25. Pursuing this objective requires several steps, from network building to the generation and application of results. The APOLLO11 workflow as a model for data-driven research is shown in Fig. 1.
A Patient enrollment and data collection; B Biological and sample collection; C Data and sample sharing; D Model generation. The figure shows the entire process from the enrollment of a lung cancer patient at the local center, including the annotation of RWD, medical images and biological samples on REDCap (A), the local collection of biological material (B), the distribution of samples and data among centers (C) and the final predictive model generation to accomplish scientific tasks (D). The figure underlines some of the crucial aspects of data-driven research implemented in the APOLLO11 study: broad inclusion criteria, adoption of one-shot informed consent, central data annotation, use of federated learning for data sharing, lab manual, and data dictionary to standardize data for the analyses. ADC antibody-drug conjugate. cfDNA cell-free deoxyribonucleic acid. CT computed tomography. eCRF electronic case report form. HE hematoxylin-eosin. IO immune-oncology treatment. miRNA micro ribonucleic acid. MRI magnetic resonance imaging. NSCLC non-small cell lung cancer. PD progressive disease. PD-L1 programmed death ligand 1. PBMC peripheral blood mononuclear cell. PET positron emission tomography. PMN polymorphonuclear leukocyte. RWD real-world data. SCLC small cell lung cancer. SNP single-nucleotide polymorphism. TT targeted therapy.
Network establishment
First, we established an Italian network of clinical centers with expertise in the treatment of lung cancer. The oncological network in Italy, as is often the case in the field, is characterized by a “hub and spokes” model of organization, in which local “spokes” centers generally take charge of patients, referring more complex clinical situations or treatment within clinical trials to the “hub” centers. The “hub” centers, on the other hand, are specialized in testing new drugs or new therapeutic approaches, or in the treatment of rarer clinical conditions. This type of healthcare approach has led to a fragmentation of the collection of clinical data and biological materials, which often remain unused when they are collected at “spokes” centers. The consequence of this model is that RWD and biological material collected in non-research institutions cannot be used for research purposes, thus excluding “spokes” from academic research28,29.
The APOLLO11 project is breaking down these logistical and administrative barriers, enabling even smaller centers to contribute to research activity and, at the same time, channeling a large amount of data to the “hub” centers, making it available for scientific purposes. For this project, centers are identified through a careful selection of public, academic, or private hospitals meeting specific criteria. In particular, aspects considered crucial for inclusion in the network are: the presence of clinical oncologists with the experience and motivation to conduct academic research; software able to support electronic health records (EHRs) on which patient data and images can be unambiguously traced; one or more experienced staff dedicated to data collection and data management and biologists dedicated to the handling of biological samples; experience in the management of clinical trials, including academic ones. Moreover, to guarantee a nationwide reproducible data collection reaching the population-based level, an effort is made to include centers evenly distributed across the national territory, including rural ones, such as those in the island territories. This will allow the inclusion of patients with diverse demographics, varying access to healthcare, and different social circumstances and health-related behaviors. In return, the “spokes” centers are guaranteed the opportunity to actively participate in data collection and analytic processes, and access to training and support resources provided by the APOLLO11 network. Additionally, participating centers benefit from increased visibility and recognition within the oncology community, as well as the opportunity to contribute to cutting-edge research and advancements in cancer care.
Infrastructure
Each participating center is collecting data using the secure web-based REDCap platform30. APOLLO11 data collection encompasses both retrospective and prospective observational phases. During the retrospective phase, all participating centers are contributing to the collection of RWD on lung cancer patients already treated with innovative systemic therapies. RWD collected includes demographic, epidemiological, blood-based (e.g., blood cell counts, biochemistry), and tumor-related (e.g., biological evaluations performed on tumor specimens as per clinical practice) data, as well as treatment details, survival outcomes, radiological response, and toxicity data. In the prospective phase, the study enrolls lung cancer patients who are candidates for an innovative systemic therapy, including newly diagnosed patients not previously exposed to such treatments. For the purposes of the study, innovative therapy is defined as any medical treatment that has been registered in Italy since the year 2010. Patient enrollment at each center started from the date of local ethics committee approval, whereas individual data collection began when patients provided informed consent for data handling. RWD will be pseudonymized and entered into the local REDCap database, which is continuously updated. Data sharing between participating centers and the coordinating center is governed by a dedicated Data Management Plan to ensure confidentiality and compliance with regulatory requirements31. Additional informed consent from patients will be required only for the collection, sharing, or analysis of data with study objectives unrelated to the use of innovative anticancer treatment in lung cancer32.
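The pseudonymization step mentioned above can be illustrated with a minimal Python sketch. The manuscript does not describe the method actually used, so the keyed-hash (HMAC-SHA256) approach, the function name, the identifier format, and the secret key below are all assumptions made for illustration only.

```python
import hashlib
import hmac


def pseudonymize(patient_id: str, secret_key: bytes, length: int = 16) -> str:
    """Derive a stable pseudonym from a local patient identifier.

    Illustrative assumption: a keyed hash (HMAC-SHA256) whose key never
    leaves the enrolling center, so the same patient always maps to the
    same code while the code cannot be reversed without the key.
    """
    digest = hmac.new(secret_key, patient_id.encode("utf-8"), hashlib.sha256)
    return digest.hexdigest()[:length]


# Hypothetical usage: the identifier and key are placeholders.
code = pseudonymize("CENTER01-000123", secret_key=b"local-secret-kept-on-site")
```

Because the mapping is deterministic, records entered at different times for the same patient can be linked under a single code without exposing the original identifier.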
To harmonize data collection across participating centers, a data dictionary for the biological and medical terms included in the database has been discussed, shared with centers, and incorporated in the REDCap electronic Case Report Form (eCRF)33,34. This data dictionary will be periodically updated, adding new therapeutics or knowledge advances in lung cancer whenever they reach clinical practice, based on the most recent available literature. The adoption of a data dictionary will allow the use of a common language among clinical centers, avoiding the use of ambiguous “free text” fields, which are also poorly handled by AI-based algorithms.
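As an illustration of how a shared data dictionary replaces free-text entry, the following minimal Python sketch validates a record against a controlled vocabulary. The field names and permitted values are hypothetical and are not taken from the APOLLO11 eCRF.

```python
# Hypothetical data dictionary: field name -> set of permitted coded values.
# These fields and codes are illustrative only, not the APOLLO11 eCRF.
DATA_DICTIONARY = {
    "histology": {"adenocarcinoma", "squamous", "small_cell", "other"},
    "stage": {"I", "II", "III", "IV"},
    "treatment_class": {"IO", "TT", "ADC", "chemo_IO"},
}


def validate_record(record, dictionary=DATA_DICTIONARY):
    """Return a list of problems; an empty list means the record conforms."""
    problems = []
    for field, allowed in dictionary.items():
        if field not in record:
            problems.append(f"missing field: {field}")
        elif record[field] not in allowed:
            problems.append(f"invalid value for {field}: {record[field]!r}")
    # Fields absent from the dictionary are flagged rather than silently accepted.
    for field in record:
        if field not in dictionary:
            problems.append(f"unknown field: {field}")
    return problems
```

A free-text entry such as "Adeno ca." would be rejected at data entry, whereas the coded value "adenocarcinoma" would pass, which is what keeps the pooled dataset machine-readable across centers.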
Besides RWD collection, medical images are being collected as an additional source of information to potentially include in the multi-omic predictive models, to further boost their performance18,35,36,37. Both retrospectively and prospectively enrolled patients who underwent computed tomography (CT), magnetic resonance imaging (MRI), and/or 2-fluorodeoxyglucose positron emission tomography (FdG-PET) scans according to standard-of-care clinical practice will be included in these analyses. Digitized slides of diagnostic biopsies are also collected to conduct AI-based analyses on whole slide images (WSIs). Radiological imaging will be collected at the baseline of each innovative treatment, at the first radiological evaluation, and at radiological progression. One coordinating center will arrange data collection and will assess data quality. Scans from individuals enrolled in participating centers will be de-identified per GDPR standards and encrypted before transmission to the server for radiomics analysis. The identification of the volume of interest (VOI) for radiomics analysis in the APOLLO11 study follows a two-step approach: a fully automated segmentation methodology (i.e., nnU-Net) is adopted in the first instance to standardize the acquisition of the VOI38; for a subset of images, dedicated radiologists with experience in radiomics also semi-automatically delineate the 3D target tumor volumes for each patient, to enhance the reproducibility of extracted features39. The generated radiomics signature will then be validated and integrated with RWD-based models. Similar procedures will be applied for the collection, digitization, and sharing of digitized pathology slides40.
To comprehensively characterize the tumor biology of enrolled patients and ultimately improve the performance of predictive tools trained on RWD and medical imaging data, the APOLLO11 project aims to establish a national multilevel biobank for lung cancer patients, standardizing sample collection procedures across centers while ensuring ethical and legal compliance. Biological samples include archival tumor tissues, collected for diagnostic purposes at the start of treatment or for re-characterization after treatment failure, as well as whole blood, plasma, Peripheral Blood Mononuclear Cells (PBMC), urine, saliva, and feces samples, which are being collected at specific treatment intervals for prospective patients starting a new innovative therapy. Detailed information on the immune profiling and spatial transcriptomic analytic processes is reported in the “Methods” section. The samples collected are stored on site, while their annotation is continuously tracked in a dedicated section of the eCRF. Biological sample collection is encouraged but not mandatory for center participation and patient enrollment in the APOLLO11 study. This approach promotes comprehensive multi-omic data collection, recognizing that not all institutions may have the logistical or technical capabilities to support biospecimen handling and storage.
These samples will be shared with the center performing the translational analyses when a scientific project is implemented, with residual material promptly shipped back to the center where the patient was enrolled, for clinical use or future investigation. Analyses will be performed on specific samples as foreseen by the scientific proposal.
Implementation
Patients who are candidates to participate in the APOLLO11 study will be enrolled at the individual center level. The master protocol details specific criteria for patient enrollment, such as the diagnosis of NSCLC or SCLC confirmed on a histological or cytological specimen, and past or present treatment with at least one innovative systemic treatment in their cancer history. However, the broad enrollment criteria foresee the inclusion of subgroups of patients, such as older, frail, and other underrepresented populations, across all lung cancer stages (I–IV), providing evidence on special categories not addressed by the available literature. Specifically, the term “innovative therapies” covers the vast majority of treatments, with the exception of standard cytotoxic agents, so that all clinical scenarios that have not yet been sufficiently investigated in oncological research can be studied.
Upon confirmation of eligibility, patients are required to provide informed consent. In detail, patients will be asked to provide one-shot consent for the collection of personal data and biological samples. A single informed consent avoids repeated and redundant re-consenting when a patient starts a new innovative therapy or when new analyses are conducted in the context of the APOLLO11 study.
Each time patients enrolled in the APOLLO11 study initiate a new innovative therapy, biological materials, including whole blood, plasma, stool, and urine, are collected and stored at the center, and the availability of biological material is annotated on REDCap alongside clinical data30. Available histological samples and medical images (i.e., radiological, digital pathology) are annotated as well.
Data sharing and federated learning
In the era of data-driven research, sharing large amounts of data is essential. However, this process raises several critical issues, including the need for high-capacity servers to centrally store data and ethical concerns about data privacy. These issues are particularly relevant when the collection includes unstructured data such as radiological scans and digitized slides.
Federated learning offers a transformative approach to perform multi-omic analysis in multicenter studies, enabling the integration of diverse datasets without the need to centralize raw data41,42,43,44. It allows each center to locally train AI models, sharing with the central server only the model updates, ensuring that sensitive data remains local. This novel approach to modeling in multicenter studies significantly reduces the risk of data breaches and of non-compliance with stringent regulatory requirements for sensitive data transfer, such as GDPR45. To take advantage of the above-mentioned benefits of federated and swarm learning, ad hoc software based on open source platforms has been tested in the APOLLO11 consortium for sharing RWD, radiological images, and digital pathology slides between centers. By contrast, the collected biological samples will be physically shared. To ensure proper functioning, the software will first be implemented in three Italian centers (the Sponsor institution, a cancer center, and a general hospital) and will then be deployed in all centers of the consortium. The software is designed to be easy to use: it does not require in-depth technical and informatics knowledge, and it allows data to be loaded directly into the platform from the center, without the need for manual encoding for local training.
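The principle of local training with central aggregation of model updates can be sketched as follows. This is a minimal Python illustration using weighted federated averaging on a toy linear model; the manuscript does not specify the aggregation algorithm or software used in APOLLO11, so the functions, hyperparameters, and data below are assumptions.

```python
def local_update(weights, data, lr=0.1, epochs=20):
    """One center's local training: gradient descent on least squares for
    y ~ w0 + w1 * x. Only the updated weights leave the center, never `data`."""
    w0, w1 = weights
    n = len(data)
    for _ in range(epochs):
        g0 = sum((w0 + w1 * x - y) for x, y in data) / n
        g1 = sum((w0 + w1 * x - y) * x for x, y in data) / n
        w0 -= lr * g0
        w1 -= lr * g1
    return (w0, w1)


def federated_average(updates, sizes):
    """Server step: average the centers' weights, weighted by local sample size."""
    total = sum(sizes)
    return tuple(
        sum(w[i] * n for w, n in zip(updates, sizes)) / total
        for i in range(len(updates[0]))
    )


def federated_training(center_datasets, rounds=50):
    """Alternate local training at every center with central aggregation."""
    weights = (0.0, 0.0)
    sizes = [len(d) for d in center_datasets]
    for _ in range(rounds):
        updates = [local_update(weights, d) for d in center_datasets]
        weights = federated_average(updates, sizes)
    return weights


# Toy data held at two hypothetical centers, both drawn from y = 1 + 2x.
center_a = [(0.0, 1.0), (1.0, 3.0), (2.0, 5.0)]
center_b = [(0.5, 2.0), (1.5, 4.0)]
global_weights = federated_training([center_a, center_b])
```

In each round the server sees only two numbers per center, which is the property that keeps patient-level records on site; real deployments typically add secure communication and, where required, further privacy safeguards on the updates themselves.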
Resource usability
Data collected within the consortium will be used to answer various scientific questions that will be proposed on the basis of unmet clinical needs directly arising from clinical practice, or from new preclinical or translational evidence. On the basis of the specific scientific question, the most up-to-date version of the APOLLO11 dataset, including the subpopulation of interest, could be shared after an appropriate query is proposed.
Proposals for scientific queries, which have to be submitted in an anonymized way, may be made either by centers already belonging to the APOLLO11 consortium or by external institutions.
Internal centers are encouraged to advance new scientific proposals, based on their centers’ professional experience and expertise. To support this, “General Assembly” meetings will be regularly held to provide updates on ongoing projects and discuss new proposals. The General Assembly is a consortium organ consisting of one representative from each center, usually the Principal Investigator (PI) or his/her delegate. Proposals for new scientific queries may also come from external centers, whose suitability will be assessed by the Steering Committee.
Another ad hoc decision-making organ, called the “Steering Committee”, receives, evaluates, and expresses an independent judgment on each scientific proposal. The members of the Steering Committee are elected by the General Assembly through a vote that takes place every three years. The Steering Committee will evaluate the projects by expressing an overall judgment on the proposal, based on 4 criteria: (1) relevance of the unmet clinical need in cancer care; (2) urgency of the unmet need in clinical practice; (3) originality of the project based on the available literature; (4) project feasibility, taking into account already available data and required additional resources.
Once the project is approved by the Steering Committee, the implementation phase begins with data extraction, including the annotation of biological samples locally stored in the biobanks at different centers.
At that point, the process will differ for proposals coming from internal and external centers. In the case of proposals from centers already taking part in the APOLLO11 consortium, data and materials are centralized at the centers or services designated to conduct the analyses. All data are shared in a pseudonymized way, adopting a unique code for all the sub-analyses. When proposals arise from centers outside the consortium, in order to minimize the privacy risk, synthetic data will be generated from the identified dataset containing the population of interest. Synthetic data have already been shown to provide sufficient data quality to conduct statistical and AI-based analyses, while maximizing compliance with the GDPR. In the context of the APOLLO11 data workflow, the use of synthetic data, which will be generated through AI-based methodologies (e.g., variational autoencoders or generative adversarial networks), will minimize data flow outside the consortium while still allowing queries from external centers to be addressed. The data usability flow for the APOLLO11 study is summarized in Fig. 2.
A Data collection and local storage; B Scientific queries proposal and discussion, for internal and external centers; C Queries implementation. Data within the APOLLO11 study are collected locally at each participating center through site-specific infrastructures and electronic case report forms, which may vary in terms of data availability, storage capacity, and processing facilities; this heterogeneity reflects real-world clinical practice and allows the inclusion of centers with different technological capabilities. The process of distribution of data collected within the study relies on two different scenarios based on the center proposing the scientific query: evaluated and approved proposals from centers already in the APOLLO11 study will be implemented, analyzing real data collected in study centers, while proposals from external centers will be accomplished with the use of synthetic data created from the original APOLLO11 dataset.
Data analysis methodology
Analyses from blood samples include the identification of Single-Nucleotide Polymorphisms (SNPs) through germline sequencing from whole blood samples; microRNA (miRNA), circulating free DNA (cfDNA), lipidomic profile, and cytokine analyses from plasma samples; immune profiles and single-cell transcriptomic analyses from viable PBMCs (Fig. 1). Analyses from available archival samples will be conducted on tumor DNA through comprehensive next-generation sequencing (NGS) using extended panels, such as the Oncomine Comprehensive Assay (~500 genes), and, for a subset of tumor samples, through whole-genome sequencing (WGS) depending on the specific clinical or research question. Analyses on RNA will be performed through bulk transcriptomic profiling with standard library preparation protocols. Proteomic and metabolomic analyses will be performed to characterize amino acid and metabolite profiles, respectively, providing an integrated multi-omic view of tumor biology. Finally, microbiota analyses will be conducted on available stool and saliva samples. The choice of the center where each specific type of analysis is conducted is crucial to ensure high-quality results. For this reason, the data and/or biological samples will be centralized in a designated facility with the highest expertise and deepest experience in analyzing that specific type of data. As discussed above, it is unlikely that a single omics layer can make comprehensive predictions for an individual patient. Therefore, in order to optimize prediction performance and thus enhance applicability, it is crucial to fuse the signatures generated from single data modalities into a comprehensive final predictive model. Multi-modal integration of different omics allows the comprehensive analysis and combination of various types of input to uncover complex interactions between different biological and molecular layers.
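One simple way to fuse modality-specific signatures is late fusion, sketched below in Python. The manuscript does not state which integration strategy will be adopted, so the weighted-average scheme, the modality names, and the weights are hypothetical and serve only to make the idea concrete.

```python
def fuse_signatures(modality_scores, weights):
    """Weighted late fusion of per-modality risk scores in [0, 1].

    Modalities whose score is None (not available for this patient) are
    skipped and the remaining weights are renormalized.
    """
    available = {m: s for m, s in modality_scores.items() if s is not None}
    if not available:
        raise ValueError("no modality available for this patient")
    total_weight = sum(weights[m] for m in available)
    return sum(weights[m] * s for m, s in available.items()) / total_weight


# Hypothetical weights and scores, for illustration only.
weights = {"clinical": 0.5, "radiomics": 0.25, "immune": 0.25}
patient = {"clinical": 0.8, "radiomics": 0.4, "immune": None}
fused_score = fuse_signatures(patient, weights)
```

Late fusion keeps each modality's model independent, which suits a setting where not every center collects every data type; joint (early or intermediate) fusion can capture cross-modality interactions but requires more complete data.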
Given the real-world, multicenter nature of APOLLO11, some heterogeneity and missingness across data sources are unavoidable. Depending on the extent and pattern of missing data, different strategies will be applied, including variable exclusion, restriction to complete cases, imputation, or multi-modal integration approaches tolerant to missing modalities. The final multi-omic model will be tested on an independent validation cohort of patients from a consortium center that did not contribute training data. This approach allows the generation of a validated model at the conclusion of each scientific objective, making it potentially ready for clinical implementation (Fig. 3).
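Two of the steps described above, restriction to complete cases and validation on a center that contributed no training data, can be sketched in Python as follows. The record structure, field names, and center labels are hypothetical.

```python
def complete_cases(records, required_fields):
    """Keep only records in which every required field is present and not None."""
    return [
        r for r in records
        if all(r.get(field) is not None for field in required_fields)
    ]


def hold_out_center(records, held_out_center):
    """Reserve one whole center for validation; all other centers form the training set."""
    train = [r for r in records if r["center"] != held_out_center]
    validation = [r for r in records if r["center"] == held_out_center]
    if not validation:
        raise ValueError(f"no records found for center {held_out_center!r}")
    return train, validation


# Hypothetical records: center label, one clinical variable, one outcome.
records = [
    {"center": "A", "pdl1": 60, "response": 1},
    {"center": "A", "pdl1": None, "response": 0},
    {"center": "B", "pdl1": 10, "response": 0},
    {"center": "C", "pdl1": 80, "response": 1},
]
usable = complete_cases(records, ["pdl1", "response"])
train, validation = hold_out_center(usable, "C")
```

Splitting by center rather than by patient prevents site-specific patterns (scanner type, laboratory ranges, local practice) from leaking into the validation estimate.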
A Status of data collection; B Status of centers activation; C RWD metadata; D Radiomics metadata; E FACS analyses. The figure shows the overall status of data collection, divided by type of data collected (A), flowchart of the clinical centers contributing to the data collection (B), distribution of patients among characteristics subgroups (C), type of radiological images collected (D), and examples of leukocyte population distribution obtained from FACS analyses (E). CT computed tomography. eCRF electronic case report form. IO immune-oncology treatment. NSCLC non-small cell lung cancer. FACS fluorescence-activated cell sorting. FdG-PET 2-fluorodeoxyglucose positron emission tomography. RWD real-world data. SCLC small cell lung cancer. scRNAseq single-cell ribonucleic acid sequencing. TKI tyrosine kinase inhibitor.
Explainability
Despite the potential of AI to revolutionize cancer care, the adoption of ML/DL-based technologies in clinical practice is often hindered by the “black box” nature of many AI models. In fact, the complexity of models generated, which involve DL and other advanced ML techniques, can make their decision-making processes opaque. This challenge is especially pertinent in cancer research, where the integration of diverse sources of data requires a high degree of transparency and interpretability to understand the reasoning behind a model’s predictions to make it trustworthy to clinicians, patients, and other stakeholders. In this field, the appropriate use of XAI becomes crucial46.
The systematic inclusion of XAI frameworks in the APOLLO11 scientific projects, including SHapley Additive exPlanations (SHAP)-based feature attribution and visual interpretability plots, will contribute to scientific rigor by enabling the validation of AI models against established biological and clinical knowledge. These approaches will be applied once predictive models become available, enabling clinicians and researchers to understand the relative contribution of clinical, imaging, and multi-omic variables to individual predictions. For example, if a model predicts a poor prognosis based on certain radiomic features, XAI techniques can help identify which specific features are driving this prediction and how they correlate with known risk factors or biomarkers. This not only helps in verifying the accuracy of the model but also in uncovering new insights into the disease, potentially leading to novel therapeutic targets or diagnostic markers46,47.
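The idea behind Shapley-based feature attribution can be shown with a small Python sketch that computes exact Shapley values for a toy model by enumerating all feature orderings. This is not the SHAP library, which relies on model-specific or sampling-based algorithms to scale to realistic models; the toy model, feature names, and coefficients below are hypothetical.

```python
from itertools import permutations
from math import factorial


def shapley_values(predict, instance, baseline):
    """Exact Shapley attribution by enumerating all feature orderings.

    Features not yet added to the coalition keep their baseline value.
    The cost grows factorially with the number of features, so this is
    practical only for a handful of features.
    """
    features = list(instance)
    contributions = {f: 0.0 for f in features}
    for order in permutations(features):
        current = dict(baseline)
        previous_output = predict(current)
        for f in order:
            current[f] = instance[f]
            output = predict(current)
            contributions[f] += output - previous_output
            previous_output = output
    n_orders = factorial(len(features))
    return {f: total / n_orders for f, total in contributions.items()}


def toy_model(x):
    """Hypothetical response score with one interaction term (illustrative only)."""
    return 0.2 + 0.3 * x["pdl1_high"] + 0.1 * x["nlr_low"] + 0.2 * x["pdl1_high"] * x["tcell_high"]


patient = {"pdl1_high": 1, "nlr_low": 1, "tcell_high": 1}
reference = {"pdl1_high": 0, "nlr_low": 0, "tcell_high": 0}
attributions = shapley_values(toy_model, patient, reference)
```

The attributions sum to the difference between the patient's prediction and the reference prediction, and the interaction term is split equally between the two features involved, which is the property that makes such values readable as per-feature contributions.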
First data collection
As of 1 February 2025, 52 Italian oncology centers had been screened for inclusion in the APOLLO11 network. Of these, 32 centers were selected and agreed to participate in the consortium. All of these centers submitted the protocol to the local ethics committee, and 20 have received approval. To date, 7 centers have started enrolling patients, collecting RWD and baseline CT and FdG-PET scans, and digitizing slides, for a total of 2020 patients. In addition, 4 centers have started biological specimen collection. The different pipelines are being implemented concurrently to ensure synchronization and interoperability across modalities.
One example of a research query within the APOLLO11 consortium aims to identify, through collaborative multi-omic data collection, factors potentially implicated in the response to immune checkpoint inhibitors (ICIs) in aNSCLC patients treated with immunotherapy in any line of treatment. The final aim of this integrated approach is the creation of a predictive AI model to improve response prediction and treatment customization. For the present data collection, part of the samples collected in the context of the APOLLO study, a single-institution observational clinical trial involving the collection of clinical data and blood and tissue samples at the “Fondazione IRCCS Istituto Nazionale dei Tumori” in NSCLC patients who had received IO, has been retrieved17,48. Through these initiatives, this task of APOLLO11 seeks to advance understanding and treatment efficacy in NSCLC immunotherapy.
This first scientific aim consists of the development of a predictive multi-omic algorithm of IO efficacy in aNSCLC patients. To pursue this aim, peripheral immune profiling was obtained on 264 fresh blood samples from patients, processed at baseline, while longitudinal immune profiling was available for 197 patients. This characterization encompasses the assessment of monocytes and neutrophils with specific markers, including CD11b, CD66b, HLA-DR, CD14, CD15, CD10, CD16, CD117, and CD71. In addition, a subgroup of patients (N = 42) underwent single-cell transcriptomic analysis (scRNAseq) to specifically profile various immune subpopulations, including T-lymphocytes, B-lymphocytes, plasma cells, NK cells, monocytes, and neutrophil granulocytes. The association of the composition of PBMCs, and in particular T cells, with clinical outcomes (tumor response, progression-free survival (PFS), and OS) during IO treatment is currently being evaluated. Based on the clinical outcome data available to date, the predictive role of the composition of PBMCs has been preliminarily assessed, confirming the positive prognostic role of T cells. As follow-up continues, it will be possible to identify gene expression patterns predictive of benefit from immune checkpoint inhibitor therapy.
Discussion
The APOLLO11 project has established a nationwide, multicenter, continuously updated collection of data and biological samples from lung cancer patients treated with innovative systemic therapies. This model provides a practical framework for data-driven research, which relies on large, updated, and comprehensive datasets to address clinical and translational questions arising from clinical practice. The study follows a strict regulatory framework to ensure compliance with data protection laws, patient confidentiality, and ethical guidelines, thereby maintaining the integrity and reliability of the collected data while facilitating collaborative research efforts. The selected centers are geographically distributed across Northern, Central, and Southern Italy, and include academic, community, and research institutions, thereby reflecting the diversity of clinical practice across the country.
The decentralized structure of data collection and the approach to scientific queries ensure research democracy, facilitating the availability of data to research groups both inside and outside the consortium and supporting the sharing of hypotheses among researchers. They also guarantee meritocracy, prioritizing research questions that are more likely to positively impact cancer care. Finally, the APOLLO11 structure promotes scientific fairness, supporting centers with fewer structural and financial resources. This comprehensive approach has the potential to enhance disease understanding and to support the development of more tailored treatments, with the aim of ultimately improving lung cancer patients' outcomes. Moreover, the integration of XAI methodologies is expected to increase the transparency and trustworthiness of the resulting AI models, helping clinicians to make informed, personalized treatment decisions that optimize outcomes and minimize toxicity. The widespread collection of tumor samples across centers will increase the diversity and richness of available data, encompassing circulating immune profiling, genomic data, and scRNAseq, and gives the project high versatility in terms of data acquisition and analysis. The material and pre-analytical data collection pipeline is designed to ensure the availability of readily usable information, facilitating the initiation of research activities. In essence, this pipeline serves as a “ball pit of information” for translational researchers, providing them with a comprehensive array of biological samples and associated data and facilitating in-depth investigations into the molecular mechanisms underlying lung cancer and potential therapeutic targets. In addition, the APOLLO11 project will enable collaboration with external centers or existing networks, following a scientific query-centered approach and encompassing the use of synthetic data, which allows secure data sharing while overcoming ethical barriers.
Other efforts are underway to assemble multisource data for building predictive models49,50,51. The I3LUNG project, similarly to APOLLO11, focuses on personalized medicine through the development of AI tools based on multi-modal patient data in lung cancer, integrating clinical information and multi-omics data from international cancer institutions into a data storage and processing platform. Through the use of AI methodology, I3LUNG aims to improve the clinical decision-making process specifically for aNSCLC patients receiving IO by tailoring treatments to individual needs. Another example of data-driven research is the MOSAIC project, a large European initiative aimed at developing a federated infrastructure for multi-omics and clinical data integration in oncology and at supporting clinicians with an AI-based framework for multi-modal analysis, classification, and personalized prognostic assessment in rare cancers. However, while MOSAIC primarily focuses on establishing technical standards, data interoperability frameworks, and ethical–legal guidelines to enable cross-border research collaboration, APOLLO11 is a national, multicenter, disease-specific program that applies an innovative infrastructural framework to a real-world, clinically integrated setting in NSCLC.
Similar approaches have been reported in the literature, including the federated distribution of digitized slides and the systematic collection of cancer radiological images44,50. However, APOLLO11 is unique in combining decentralized data collection, multi-modal integration, and a federated learning approach. This holistic strategy addresses previous limitations and provides a comprehensive framework for future translational oncology research. In particular, APOLLO11’s creation of a biobank and of an easily accessible data collection for scientific purposes is a significant innovation. Finally, thanks to the broad inclusion criteria and one-time consent, APOLLO11 will allow studies to be conducted based on novel questions arising directly from unmet clinical or translational needs.
Initial descriptive results from APOLLO11 demonstrate the potential of this collaborative effort. By identifying factors influencing the response to ICIs in advanced NSCLC patients, the consortium has started developing predictive models to guide treatment decisions. This success underscores the feasibility and effectiveness of the APOLLO11 approach, paving the way for further advancements in the field.
Despite its promising design and nationwide scope, the implementation of APOLLO11 is not without challenges. The multicentric nature of the initiative inevitably introduces heterogeneity in data quality and biospecimen handling across participating centers, even with standardized operating procedures in place. Ensuring long-term sustainability in terms of funding, infrastructure maintenance, and the continuous engagement of both hub and spoke centers will be critical to preserve the integrity and expansion of the network over time. Moreover, while the federated learning framework mitigates some privacy and data-sharing concerns, regulatory and legal complexities surrounding data governance may still pose barriers to broader interoperability and scalability. Proactively addressing these issues through continuous quality control, transparent governance, and alignment with European data frameworks will be essential to ensure the feasibility, scalability, and long-term integration of APOLLO11 within the evolving landscape of international precision oncology.
In summary, the APOLLO11 consortium aims to bring about a shift in lung cancer research, allowing data-driven approaches to be implemented alongside traditional hypothesis-driven ones by enabling new hypotheses to emerge directly from large-scale RWD. Given the ethical and legal constraints that characterize the current Italian scenario, the establishment of robust and transparent federated frameworks at the national level could facilitate the active participation of Italian centers in European data ecosystems. The ongoing commitment of participating centers and the continuous integration of new data and technologies are pivotal for sustaining progress. Insights gleaned from APOLLO11 are intended to direct future research, signaling a shift towards translational lung cancer research.
Methods
Ethics approval and one-shot consent
During the visit, patients will be thoroughly informed and subsequently given the opportunity to sign an informed consent form. This consent covers the retrospective and prospective collection of clinical data and the storage of biological samples. In cases where it is not feasible, for ethical or administrative reasons, to obtain consent for the use of data from patients included retrospectively, the study complies with the guidelines established by the Data Protection Authority, as specified in the General Authorization for the processing of personal data for scientific research purposes and the General Authorization for the processing of genetic data. The informed consent form will also allow patients to state whether they wish to be informed about any unexpected findings related to their health that could have therapeutic implications or influence their reproductive decisions. Eligibility will be verified using a checklist, ensuring that only patients diagnosed with SCLC or NSCLC who have received or are candidates to receive innovative therapy are included. Upon consenting, patients will begin innovative therapy, during which blood, stool, urine, and histological material, as well as data from CT and PET scans, will be collected and recorded in REDCap at baseline and/or specific time points. If consent is given during ongoing innovative therapy or if the patient has previously received such therapy, data collection will continue in REDCap. Patients who complete their current therapy and are candidates for a new innovative therapy will transition to Scenario 1, where samples will be collected at the start and at predetermined time points for each new line of therapy.
REDCap is a secure web application designed for data collection and management in research studies. The activation process involves the engineer from the Coordinating Center transferring data to the engineer at the Recruiting Center, who will structure the platform’s pages. To access REDCap, users will receive a link and credentials generated by the IT engineer at the Recruiting Center. The platform features a registration page that captures the patient’s basic information and multiple questions with either multiple-choice or single-choice responses, ensuring a comprehensive and organized data collection process.
Each center will have access only to its own data on the platform, with oversight provided by the Coordinating Center. Once the patient has signed the consent form for the study, these data will be stored anonymously and in compliance with privacy laws. Access to the data will be granted based on the queries proposed to and approved by the consortium, ensuring that all data usage adheres to the established guidelines and regulations.
This study was conducted in accordance with the principles of the Declaration of Helsinki. Ethical approval was granted by the Comitato Etico Territoriale Lombardia 4, which acted as the central ethics committee for the study. The single national opinion (Parere Unico Nazionale) was issued on October 10th, 2024, under the reference code INT 128/22, and is valid for all participating centers involved in the study.
FACS and single-cell transcriptomics methodology
Blood samples collected in EDTA and heparin tubes are processed to generate, for each patient and timepoint, a plasma biobank, stored at −80 °C in a dedicated freezer, and a PBMC biobank, stored in nitrogen. Each patient/timepoint yields 5 plasma samples in EDTA, 2 plasma samples in heparin, and 2 to 3 viable PBMC samples.
For each patient, immediate fresh sample analysis of monocytes and neutrophils is performed with specific markers including CD11b, CD66b, HLA-DR, CD14, CD15, CD10, CD16, CD117, and CD71; these results are reported in a dedicated, regularly updated database, along with the original FCS file generated by the Celesta cytometer.
For a subgroup of patients, a second PBMC sample is analyzed, from which CD66+ CD15+ neutrophils are sorted for scRNAseq. For each scRNAseq analysis, 2 samples are available: 1 sample of FACS-sorted neutrophils (10,000 cells) and 1 sample of total PBMCs (10,000 cells); these data are collected on a dedicated database with all original analyses and all sorting data generated by the FACS Melody cytometer.
The scRNAseq analyses are performed using samples consisting of 50% PBMCs and 50% FACS-sorted neutrophils, in order to produce transcriptomic data from both the mononuclear and the polymorphonuclear cell components, the latter being known to be more labile and difficult to handle in single-cell experiments. The viability of the cells obtained is assessed using Trypan Blue staining. Subsequently, single-cell suspensions are prepared with approximately 10,000 cells per sample. scRNAseq libraries are created with the Chromium Next GEM kit (10X Genomics). Cells are washed, resuspended in a PBS-BSA solution, and encapsulated in droplets to form gel bead emulsions (GEMs). The GEMs are subjected to reverse transcription in a thermal cycler, then disrupted, and the cDNA is purified and amplified by PCR. For the TCR-seq library, TCR transcripts are enriched from the amplified cDNA and then subjected to various preparation steps. The resulting libraries are purified, assessed for quality, and then sequenced on a NovaSeq 6000 sequencer. Data analysis begins with Cell Ranger, followed by further analysis in R using the Seurat package to identify cell types, reduce dimensionality, perform clustering, and visualize data. Cell identification uses supervised algorithms (SingleR, AUCell) and manual curation.
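The sample preparation arithmetic described above (Trypan Blue viability and a 50/50 mix totalling approximately 10,000 cells) can be sketched as follows. The function names, the units (viable cells per microliter), and the example counts are assumptions introduced here for illustration and are not part of the study protocol.

```python
def viability(live_cells, dead_cells):
    """Fraction of viable cells from a Trypan Blue count (dead cells stain blue)."""
    total = live_cells + dead_cells
    if total == 0:
        raise ValueError("no cells counted")
    return live_cells / total


def mixing_volumes(pbmc_conc, neutrophil_conc, target_cells=10_000, pbmc_fraction=0.5):
    """Volumes (in microliters) of each suspension needed for the mixed sample.

    Concentrations are assumed to be viable cells per microliter.
    Returns (pbmc_volume, neutrophil_volume).
    """
    if pbmc_conc <= 0 or neutrophil_conc <= 0:
        raise ValueError("concentrations must be positive")
    pbmc_cells = target_cells * pbmc_fraction
    neutrophil_cells = target_cells - pbmc_cells
    return pbmc_cells / pbmc_conc, neutrophil_cells / neutrophil_conc


# Hypothetical counts: 90 unstained and 10 stained cells in the counting chamber
print(viability(90, 10))
# Hypothetical suspensions at 1000 and 500 viable cells per microliter
print(mixing_volumes(1000, 500))
```

Expressing the mix as a fraction rather than hard-coding 50% keeps the calculation reusable if a different PBMC-to-neutrophil ratio is ever required.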
Radiomics methodology
A team of four radiologists with experience in CT scan segmentation will identify the Region Of Interest (ROI) through a semi-automated 3D segmentation process performed with Syngo.via software. The ROI will be selected to include the whole lesion of interest, and it will also encompass the peritumoral region. To ensure the reproducibility of the process, the segmentation will be performed by two radiologists and compared with an automated segmentation obtained with the U-Net architecture. After image pre-processing through different techniques (e.g., gray-level discretization, intensity normalization, and voxel resampling), radiomic features will be extracted from ROIs and peritumoral areas using the PyRadiomics library.
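One common way to quantify agreement between a manual and an automated segmentation is the Dice similarity coefficient. The text does not state which agreement metric will be used, so the sketch below is an assumption; masks are represented as sets of voxel coordinates, and the two toy masks are invented for illustration.

```python
def dice_coefficient(mask_a, mask_b):
    """Dice similarity between two binary masks given as sets of voxel coordinates.

    Returns 1.0 for identical masks and 0.0 for disjoint ones.
    """
    a, b = set(mask_a), set(mask_b)
    if not a and not b:
        return 1.0  # two empty masks agree trivially
    return 2 * len(a & b) / (len(a) + len(b))


# Toy example: a 4x4x2 block and the same block shifted by one voxel along x
manual_mask = {(x, y, z) for x in range(0, 4) for y in range(4) for z in range(2)}
automated_mask = {(x, y, z) for x in range(1, 5) for y in range(4) for z in range(2)}

dice = dice_coefficient(manual_mask, automated_mask)
print(f"Dice = {dice:.2f}")
```

In this toy case 24 of the 32 voxels in each mask overlap, so the coefficient is 0.75. On real CT volumes the same formula would be applied to the binary arrays exported from the segmentation software.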
Given the multicenter nature of the APOLLO11 project, inter-center variability in scanners and acquisition protocols will be systematically assessed, as it represents a major source of batch effects and a critical barrier to the clinical reproducibility of radiomics. A preliminary analysis of acquisition parameters and scanner characteristics will be performed to guide the selection of the most appropriate harmonization strategy. Among the approaches considered, ComBat harmonization will be applied, using the center of image acquisition as the primary batch variable, and, when appropriate, the scanner model or contrast agent use, depending on metadata availability and distribution. These harmonization procedures will be conducted prior to feature selection to minimize technical bias across centers.
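The principle behind batch harmonization can be illustrated with a simplified location and scale adjustment for a single feature. This is deliberately not the full ComBat method: it omits the empirical Bayes shrinkage of batch parameters and the preservation of biological covariates that ComBat provides. The function name and the example values are hypothetical.

```python
from collections import defaultdict
from math import sqrt
from statistics import mean


def harmonize_location_scale(values, batches):
    """Remove per-batch shifts in mean and spread for one radiomic feature.

    Each value is standardized within its batch (e.g., acquisition center) and
    mapped back to the overall mean and the pooled within-batch standard deviation.
    """
    by_batch = defaultdict(list)
    for value, batch in zip(values, batches):
        by_batch[batch].append(value)

    grand_mean = mean(values)
    batch_mean = {b: mean(vs) for b, vs in by_batch.items()}
    batch_sd = {
        b: sqrt(sum((v - batch_mean[b]) ** 2 for v in vs) / len(vs))
        for b, vs in by_batch.items()
    }
    pooled_sd = sqrt(
        sum((v - batch_mean[b]) ** 2 for v, b in zip(values, batches)) / len(values)
    )

    harmonized = []
    for value, batch in zip(values, batches):
        sd = batch_sd[batch]
        z = (value - batch_mean[batch]) / sd if sd > 0 else 0.0
        harmonized.append(grand_mean + pooled_sd * z)
    return harmonized


# Hypothetical feature values from two centers with a systematic offset
feature = [1.0, 2.0, 3.0, 11.0, 12.0, 13.0]
center = ["A", "A", "A", "B", "B", "B"]
adjusted = harmonize_location_scale(feature, center)
print([round(v, 3) for v in adjusted])
```

After adjustment the two centers share the same mean and spread, so the offset attributable to the center no longer dominates the feature. In practice an established ComBat implementation should be used, since naive per-batch standardization can also remove true biological differences between centers.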
To prevent signature overfitting, the dimensionality of the feature set will be reduced before signature construction, first retaining only features that show a high intraclass correlation coefficient and differ significantly between the two outcome groups, as assessed by one-way analysis of variance (ANOVA). Least absolute shrinkage and selection operator (LASSO) regression and/or Maximum Relevance Minimum Redundancy (MRMR) will then be used, with 5-fold cross-validation, to select the features to be included in the final model.
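The intraclass correlation coefficient used in the first filtering step can be computed as in the sketch below. The text specifies neither the ICC form nor a cut-off, so the one-way random-effects form ICC(1,1) and the threshold of 0.75 are assumptions, as are the function and feature names.

```python
def icc_one_way(ratings):
    """One-way random-effects ICC(1,1).

    `ratings` holds one row per lesion and one column per segmentation
    (e.g., the same feature extracted from each radiologist's ROI).
    """
    n = len(ratings)
    k = len(ratings[0])
    if n < 2 or k < 2 or any(len(row) != k for row in ratings):
        raise ValueError("need at least 2 lesions and 2 ratings per lesion")

    grand = sum(sum(row) for row in ratings) / (n * k)
    lesion_means = [sum(row) / k for row in ratings]
    ms_between = k * sum((m - grand) ** 2 for m in lesion_means) / (n - 1)
    ms_within = sum(
        (x - m) ** 2 for row, m in zip(ratings, lesion_means) for x in row
    ) / (n * (k - 1))

    denom = ms_between + (k - 1) * ms_within
    # ICC is undefined when there is no variance at all; treat it as uninformative
    return (ms_between - ms_within) / denom if denom > 0 else 0.0


def select_reproducible(features, threshold=0.75):
    """Names of features whose ICC reaches the (assumed) threshold."""
    return [name for name, ratings in features.items() if icc_one_way(ratings) >= threshold]


# Hypothetical features measured on three lesions by two readers
features = {
    "stable_feature": [[1.0, 1.0], [2.0, 2.0], [3.0, 3.0]],
    "unstable_feature": [[1.0, 2.0], [2.0, 1.0], [1.5, 1.5]],
}
print(select_reproducible(features))
```

Features that vary more between readers than between lesions receive a low or negative ICC and would not be carried forward to the ANOVA and LASSO/MRMR steps.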
Model generation and other AI analyses
The entire cohort will be split into a training cohort (80%) and a testing cohort (20%), leaving a sample of cases belonging to one of the centers of the consortium as an external validation cohort. Different standard ML classifiers, such as Random Forest, Multilayer Perceptron, Logistic Regression, Support Vector Machine, CatBoost, AdaBoost, and XGBoost, will be trained and evaluated for this task. Different metrics will be adopted to evaluate the performance of the model on the training, cross-validation, and testing sets and on the external validation cohort, such as AUC, sensitivity, or specificity for classification tasks and c-index for survival tasks.
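The splitting scheme and the classification metrics named above can be sketched with the standard library alone. The patient records, the center labels, the random seed, and the 0.5 decision threshold are hypothetical; the split shown is a simple random one and is not stratified by outcome, which a real analysis might require.

```python
import random


def split_cohort(patients, external_center, test_fraction=0.2, seed=0):
    """Hold out one center for external validation, then split the rest 80/20."""
    external = [p for p in patients if p["center"] == external_center]
    internal = [p for p in patients if p["center"] != external_center]
    random.Random(seed).shuffle(internal)  # reproducible shuffle of a copy
    n_test = round(len(internal) * test_fraction)
    return internal[n_test:], internal[:n_test], external


def auc(labels, scores):
    """AUC as the probability that a positive case outscores a negative one."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("both classes are required")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def sensitivity_specificity(labels, scores, threshold=0.5):
    """Sensitivity and specificity when scores >= threshold are called positive."""
    tp = sum(1 for y, s in zip(labels, scores) if y == 1 and s >= threshold)
    fn = sum(1 for y, s in zip(labels, scores) if y == 1 and s < threshold)
    tn = sum(1 for y, s in zip(labels, scores) if y == 0 and s < threshold)
    fp = sum(1 for y, s in zip(labels, scores) if y == 0 and s >= threshold)
    return tp / (tp + fn), tn / (tn + fp)


# Hypothetical cohort: 10 patients from each of three centers
cohort = [{"id": f"{c}{i}", "center": c} for c in "ABC" for i in range(10)]
train, test, external = split_cohort(cohort, external_center="C")
print(len(train), len(test), len(external))

# Hypothetical labels and predicted probabilities
y_true = [0, 0, 1, 1]
y_score = [0.1, 0.4, 0.35, 0.8]
print(auc(y_true, y_score), sensitivity_specificity(y_true, y_score))
```

Holding out a whole center, rather than a random sample, tests whether a model transfers to a site whose scanners and workflows were never seen during training.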
Data availability
The datasets generated during the current study are not publicly available, as this manuscript describes the study design of the APOLLO11 consortium, and the related analyses are still ongoing. However, the data are available from the corresponding author upon reasonable request.
Code availability
Not applicable, as no analyses are included in the present manuscript.
References
Bray, F. et al. Global cancer statistics 2018: GLOBOCAN estimates of incidence and mortality worldwide for 36 cancers in 185 countries. CA Cancer J. Clin. 68, 394–424 (2018).
Howlader, N. et al. The effect of advances in lung-cancer treatment on population mortality. N. Engl. J. Med. 383, 640–649 (2020).
Siegel, R. L., Miller, K. D., Wagle, N. S. & Jemal, A. Cancer statistics, 2023. CA Cancer J. Clin. 73, 17–48 (2023).
Siegel, R. L., Miller, K. D. & Jemal, A. Cancer statistics, 2017. CA Cancer J. Clin. 67, 7–30 (2017).
Lynch, T. J. et al. Activating mutations in the epidermal growth factor receptor underlying responsiveness of non-small-cell lung cancer to gefitinib. N. Engl. J. Med. 350, 2129–2139 (2004).
Rosell, R. et al. Erlotinib versus standard chemotherapy as first-line treatment for European patients with advanced EGFR mutation-positive non-small-cell lung cancer (EURTAC): a multicentre, open-label, randomised phase 3 trial. Lancet Oncol. 13, 239–285 (2012).
Maemondo, M. et al. Gefitinib or chemotherapy for non-small-cell lung cancer with mutated EGFR. N. Engl. J. Med. 362, 2380–2388 (2010).
Mitsudomi, T. et al. Gefitinib versus cisplatin plus docetaxel in patients with non-small-cell lung cancer harbouring mutations of the epidermal growth factor receptor (WJTOG3405): an open-label, randomised phase 3 trial. Lancet Oncol. 11, 121–128 (2010).
Sequist, L. V. et al. Phase III study of afatinib or cisplatin plus pemetrexed in patients with metastatic lung adenocarcinoma with EGFR mutations. J. Clin. Oncol. 31, 3327–3334 (2013).
Soria, J. C. et al. Osimertinib in untreated EGFR-mutated advanced non-small-cell lung cancer. N. Engl. J. Med. 378, 113–125 (2018).
Borghaei, H. et al. Nivolumab versus docetaxel in advanced nonsquamous non-small-cell lung cancer. N. Engl. J. Med. 373, 1627–1639 (2015).
Fehrenbacher, L. et al. Atezolizumab versus docetaxel for patients with previously treated non-small-cell lung cancer (POPLAR): a multicentre, open-label, phase 2 randomised controlled trial. Lancet 387, 1837–1846 (2016).
Reck, M. et al. Pembrolizumab versus chemotherapy for PD-L1-positive non-small-cell lung cancer. N. Engl. J. Med. 375, 1823–1833 (2016).
Antonia, S. J. et al. Durvalumab after chemoradiotherapy in stage III non-small-cell lung cancer. N. Engl. J. Med. 377, 1919–1929 (2017).
Gandhi, L. et al. Pembrolizumab plus chemotherapy in metastatic non-small-cell lung cancer. N. Engl. J. Med. 378, 2078–2092 (2018).
Ferrara, R. et al. Hyperprogressive disease in patients with advanced non-small-cell lung cancer treated with PD-1/PD-L1 inhibitors or with single-agent chemotherapy. JAMA Oncol. 4, 1543–1552 (2018).
Prelaj, A. et al. Machine learning using real-world and translational data to improve treatment selection for NSCLC patients treated with immunotherapy. Cancers 14, 435 (2022).
Prelaj, A. et al. Artificial intelligence for predictive biomarker discovery in immuno-oncology: a systematic review. Ann. Oncol. 35, 29–65 (2024).
Lo Russo, G. et al. PEOPLE (NTC03447678), a phase II trial to test pembrolizumab as first-line treatment in patients with advanced NSCLC with PD-L1 <50%: a multiomics analysis. J. Immunother. Cancer 11, e006833 (2023).
Hunter, D. J. & Holmes, C. Where medical statistics meets artificial intelligence. N. Engl. J. Med. 389, 1211–1219 (2023).
Bzdok, D., Altman, N. & Krzywinski, M. Points of significance: statistics versus machine learning. Nat. Methods 15, 233–234 (2018).
Haug, C. J. & Drazen, J. M. Artificial intelligence and machine learning in clinical medicine, 2023. N. Engl. J. Med. 388, 1201–1208 (2023).
Liang, W. et al. Advances, challenges and opportunities in creating data for trustworthy AI. Nat. Mach. Intell. 4, 669–677 (2022).
Evans, R. P., Bryant, L. D., Russell, G. & Absolom, K. Trust and acceptability of data-driven clinical recommendations in everyday practice: a scoping review. Int. J. Med. Inform. 183, 105342 (2024).
Cresswell, K. et al. Investigating the use of data-driven artificial intelligence in computerised decision support systems for health and social care: a systematic review. Health Inform. J. 26, 2138–2147 (2020).
Wang, H. et al. Scientific discovery in the age of artificial intelligence. Nature 620, 47–60 (2023).
Prelaj, A. et al. APOLLO 11 Project, Consortium in Advanced Lung Cancer Patients Treated With Innovative Therapies: Integration of Real-World Data and Translational Research. Clin. Lung Cancer 25, 190–195 (2024).
Fania, L. et al. Integrated care pathways and the hub-and-spoke model for the management of non-melanoma skin cancer: a proposal of the Italian Association of Hospital Dermatologists (ADOI). Dermatol. Rep. 13, 9278 (2021).
Munhoz, R., Sabesan, S., Thota, R., Merrill, J. & Hensold, J. O. Revolutionizing rural oncology: innovative models and global perspectives. Am. Soc. Clin. Oncol. Educ. Book 44, e432078 (2024).
Harris, P. A. et al. Research electronic data capture (REDCap)-a metadata-driven methodology and workflow process for providing translational research informatics support. J. Biomed. Inform. 42, 377–381 (2009).
Williams, M., Bagwell, J. & Nahm Zozus, M. Data management plans, the missing perspective. J. Biomed. Inform. 71, 130–142 (2017).
Kohlmayer, F., Lautenschläger, R. & Prasser, F. Pseudonymization for research data collection: is the juice worth the squeeze? BMC Med. Inform. Decis. Mak. 19, 178 (2019).
Penberthy, L. T. et al. An overview of real-world data sources for oncology and considerations for research. CA Cancer J. Clin. 72, 287–300 (2022).
Goel, A. K., Walter, C., Campbell, S. & Moldwin, R. Structured data capture for oncology. JCO Clin. Cancer Inform. 5, 194–201 (2021).
Elmahdy, M. & Sebro, R. Radiomics analysis in medical imaging research. J. Med. Radiat. Sci. 70, 3–7 (2023).
Van Griethuysen, J. J. M. et al. Computational radiomics system to decode the radiographic phenotype. Cancer Res. 77, e104–e107 (2017).
Zhao, J. et al. Radiomic and clinical data integration using machine learning predict the efficacy of anti-PD-1 antibodies-based combinational treatment in advanced breast cancer: a multicentered study. J. Immunother. Cancer 11, e006514 (2023).
Isensee, F., Jäger, P. F., Kohl, S. A. A., Petersen, J. & Maier-Hein, K. H. Automated design of deep learning methods for biomedical image segmentation. Nat. Methods 17, 1104–1114 (2020).
Fedorov, A. et al. 3D slicer as an image computing platform for the quantitative imaging network. Magn. Reson. Imaging 30, 1323–1341 (2012).
Dolezal, J. M. et al. Deep learning generates synthetic cancer histology for explainability and education. NPJ Precis. Oncol. 7, 49 (2023).
Scherer, J. et al. Joint imaging platform for federated clinical data analytics. JCO Clin. Cancer Inform. 4, 1027–1038 (2020).
Fu, R., Wu, Y., Xu, Q. & Zhang, M. FEAST: a communication-efficient federated feature selection framework for relational data. Proc. ACM Manag. Data 1, 1–28 (2023).
Teo, Z. L. et al. Federated machine learning in healthcare: a systematic review on clinical applications and technical architecture. Cell Rep. Med. 5, 101419 (2024).
Tayebi Arasteh, S. et al. Preserving fairness and diagnostic accuracy in private large-scale AI models for medical imaging. Commun. Med. 4, 462 (2024).
European Parliament. Regulation (EU) 2016/679 of the European Parliament and of the Council of 27 April 2016 on the protection of natural persons with regard to the processing of personal data. Off. J. Eur. Union L119, 1–88 (2016).
Linardatos, P., Papastefanopoulos, V. & Kotsiantis, S. Explainable AI: a review of machine learning interpretability methods. Entropy 23, 18 (2021).
Wells, L. & Bednarz, T. Explainable AI and reinforcement learning—a systematic review of current approaches and trends. Front. Artif. Intell. 4, 550030 (2021).
Prelaj, A. et al. Real-world data to build explainable trustworthy artificial intelligence models for prediction of immunotherapy efficacy in NSCLC patients. Front. Oncol. 12, 1078822 (2023).
Prelaj, A. et al. The EU-funded I3LUNG Project: Integrative Science, Intelligent Data Platform for Individualized LUNG Cancer Care With Immunotherapy. Clin. Lung Cancer 24, 381–387 (2023).
D'Amico, S. et al. MOSAIC: An Artificial Intelligence-Based Framework for Multimodal Analysis, Classification, and Personalized Prognostic Assessment in Rare Cancers. JCO Clin. Cancer Inform. 8, e2400008 (2024).
Martí-Bonmatí, L. et al. Empowering cancer research in Europe: the EUCAIM cancer imaging infrastructure. Insights Imaging 16, 47 (2025).
Acknowledgements
We thank all the patients who agreed to participate in the study and IPOP “Insieme per i Pazienti di Oncologia Polmonare” for supporting and sharing the project. This work was supported by 5 per 1000 Funds financial support for research 2019, Italian Ministry of University and Research (MUR), Institutional grant BRI2021. We would like to acknowledge donors for the Excalibur project in memory of Giorgiana Marchesi Bianchini. We especially thank all patients who took part in this clinical trial and their families.
Author information
Contributions
Conceptualization: A.P., V.M., L.P., G.LR., M.G., L.M., A.V. Writing: L.P., V.M., M.G., A.P. Data Collection: A.P., L.P., V.M., M.G., L.M., M.G., C.S., A.S., R.R., M.B., M.O., T.B., P.A., E.S., M.F., A.Z., A.F., G.C., M.MP., M.R., MB.M., AD.D., RM.DM., C.G., C.C., R.S., C.C., A.P., G.M., C.B., R.F., M.M., A.S., MS.C., N.LV., L.T., P.B., F.C., E.Z., S.C., R.B., G.S., A.I., S.G., S.P., F.B., M.B., F.B., A.T., G.P., A.B., L.A., A.G., L.I., N.S., AR.F., P.S., G.G., D.L., EG.P., F.DB., A.P., F.T., C.G., CM.DC., G.V., MC.G., A.C., E.M., M.R., D.S., C.P., A.V., S.S., G.LR. Review: A.P., L.P., V.M., M.G., L.M., M.G., C.S., A.S., R.R., M.B., M.O., T.B., P.A., E.S., M.F., A.Z., A.F., G.C., M.MP., M.R., MB.M., AD.D., RM.DM., C.G., C.C., R.S., C.C., A.P., G.M., C.B., R.F., M.M., A.S., MS.C., N.LV., L.T., P.B., F.C., E.Z., S.C., R.B., G.S., A.I., S.G., S.P., F.B., M.B., F.B., A.T., G.P., A.B., L.A., A.G., L.I., N.S., AR.F., P.S., G.G., D.L., EG.P., F.DB., A.P., F.T., C.G., CM.DC., G.V., MC.G., A.C., E.M., M.R., D.S., C.P., A.V., S.S., G.LR. All authors have read and approved the manuscript.
Ethics declarations
Competing interests
A.P.: consulting/advisory role for BMS, AstraZeneca, Novartis, MSD, Lilly, Amgen, Pfizer, Johnson & Johnson; travel, accommodations, or other expenses paid or reimbursed by Roche and Johnson & Johnson; principal investigator of Spectrum Pharmaceuticals, BMS, Bayer, MSD, Lilly outside the submitted work. Guest Editor for the NPJ Precision Oncology journal special collection: “Artificial Intelligence Biomarkers in Precision Oncology”. L.P.: invited speaker for Pfizer, Novartis, Merck. V.M.: invited speaker for Novartis. L.M.: honoraria from MSD, Novartis; travel grants from Daiichi Sankyo, LeoPharma. A.S.: invited speaker for Novartis, BMS, MSD. C.G.: advisory role/invited speaker for Amgen, AstraZeneca, BMS, Daiichi Sankyo, Eli Lilly, Johnson & Johnson, MSD, Novartis, Pierre Fabre, Regeneron, Roche, Takeda. T.B.: travel accommodation and conference grants from MSD, Sanofi, Pfizer, and Lilly; honoraria from MSD. A.R.F.: grants or contracts from AstraZeneca; investigator for Merck Sharp & Dohme, and F. Hoffmann-La Roche; consulting fees from AstraZeneca and Radiomics; and payment or honoraria for lectures, presentations, speaker bureaus, manuscript writing, or educational events from AstraZeneca, F. Hoffmann-La Roche, Takeda, Merck Sharp & Dohme. Travel expenses from AstraZeneca. L.T.: consulting/advisory boards/speaker bureau fees from Roche, AstraZeneca, Sanofi, Beigene, Daiichi Sankyo, Takeda, Pfizer, Regeneron, MSD, Bristol Myers Squibb, Amgen, Johnson & Johnson, Novartis. Principal investigator of trials sponsored by AstraZeneca, ArriVent, Lilly, Roche, Amgen, BMS, PharmaMar, iTeos, and Daiichi Sankyo. A.C.: consultancies/advisory boards: MSD, OncoC4, Roche, Regeneron, BMS, Amgen, Daiichi Sankyo, AstraZeneca, Access Infinity, Ardelis Health, Alpha Sight, Capvision, Techspert, Alira Health, and Lightning Health.
He also received speaker fees from AstraZeneca, Roche, Pierre Fabre, MSD, Sanofi/Regeneron; compensation for writing/editorial activity: BMS, MSD; travel support from Sanofi, MSD, Roche, and funding (to institution) from the International Association for the Study of Lung Cancer. A.S.: consultancies/advisory boards for Novartis, Amgen, MSD; speaker fees from AstraZeneca, Regeneron, Roche, Sanofi, Johnson & Johnson, BMS; funding from Italian Association for Cancer Research (AIRC). G.G.: advisory role for Italpharma; travel accommodation or other expenses paid or reimbursed by Roche, Eli Lilly, Amgen; honoraria by AstraZeneca, BMS, MSD. A.Pe.: cofounder and shareholder of two start-up companies, Agade srl and AllyArm srl; speaker for Novartis. C.B.: consultancies/advisory role for AstraZeneca, Novartis, Roche, Amgen, Pfizer, Johnson & Johnson, Daiichi Sankyo; travel, accommodations, or other expenses paid or reimbursed by Roche, Johnson & Johnson, BMS. M.G.: consultancies/advisory boards from BMS, Roche, Regeneron, Amgen, Johnson & Johnson, MSD; speaker fees from AstraZeneca, MSD, Pfizer; compensation for writing/editorial activity from MSD; travel support from Roche, MSD, BMS, Amgen, AstraZeneca. M.M.: advisory board for MSD; speaker fees from AstraZeneca, MSD, Pfizer; compensation for writing/editorial activity from MSD; travel support from Roche, MSD, BMS, AstraZeneca. E.G.P.: speaking fee from AZ, BMS, Regeneron; travel grant from Janssen, Roche. D.S.: honoraria from AstraZeneca, BMS, MSD, Roche, Johnson & Johnson, Sanofi, Novartis, Daiichi. Travel grants from MSD, Sanofi, BMS, Roche, AstraZeneca, and Pfizer. M.S.C.: consulting or advisory role for Pfizer, Daiichi Sankyo, Lilly, Gentili, Accord; speaker bureau for Gentili, Techdow; travel expenses from Pfizer, Sanofi, Bayer; research funding from Gilead.
G.V.: grants for advisory boards from Amgen, MSD, Novartis; speaker fees from Amgen, AstraZeneca, BMS, Merck, Pfizer, Regeneron, Roche, Takeda; travel support from AstraZeneca, MSD, Novartis, Sanofi. A.I.: advisory board/honoraria from Amgen, AstraZeneca, Merck Sharp & Dohme, Novartis, Roche. Medical writing grant from Merck Serono. Travel support from Amgen, AstraZeneca, Roche, and Sanofi. S.C.: honoraria from Roche, Lilly Oncology, Menarini Stemline, Novartis; AIOM Foundation President. C.C.: honoraria from AstraZeneca, Roche. R.F.: advisory board for MSD and BeiGene. N.L.V.: consulting or advisory role for Novartis, Pfizer, Roche, MSD, AstraZeneca, Eisai; speaker bureau for Pfizer, Roche, Gentili, Lilly, Gilead, Daiichi Sankyo, Techdow; travel expenses from Pfizer, Roche; research funding from GSK, Gilead. R.B.: personal fees from Amgen, MSD, Bristol Myers Squibb, Eisai, Roche, and AstraZeneca. G.P.: personal fees from Roche Foundation One, Bayer, Novartis, Lilly. F.d.B: patent for PCT/IB2020/055956 pending and a patent for IT201900009954 pending; honoraria from Roche, EMD Serono, NMS Nerviano Medical Science, Sanofi, MSD, Novartis, Incyte, BMS, Menarini Healthcare Research & Pharmacoepidemiology, Merck Group, Pfizer, Servier, Amgen, Incyte.
M.C.G.: honoraria from MSD Oncology, AstraZeneca/MedImmune, GlaxoSmithKline, Takeda, Roche, Bristol Myers Squibb; consulting or advisory role for Bristol Myers Squibb, MSD, AstraZeneca, Novartis, Takeda, Roche, Tiziana Life Sciences, Sanofi, Celgene, Daiichi Sankyo, Inivata, Incyte, Pfizer, Seattle Genetics, Lilly, GlaxoSmithKline, Bayer, Blueprint Medicines, Janssen; speakers’ bureau for AstraZeneca, Takeda, MSD Oncology, Celgene, Incyte, Roche, Bristol Myers Squibb, Otsuka, Lilly; research funding from Bristol Myers Squibb, MSD, Roche/Genentech, AstraZeneca/MedImmune, AstraZeneca, Pfizer, GlaxoSmithKline, Novartis, Merck, Incyte, Takeda, Spectrum Pharmaceuticals, Blueprint Medicines, Lilly, AstraZeneca, Ipsen, Turning Point Therapeutics, Janssen, Exelixis, MedImmune, Array BioPharma, Sanofi; travel and accommodation expenses from Pfizer, Roche, AstraZeneca. C.P.: personal fees from Italfarmaco, AstraZeneca, BMS, and Merck Sharp and Dohme. G.L.R.: consultation, advisory boards, honoraria, or education grants from Merck Sharp and Dohme, Takeda, Amgen, Eli Lilly, BMS, F. Hoffmann-La Roche, Italfarmaco, Novartis, Sanofi, Pfizer, and AstraZeneca. The other authors declare no financial competing interests.
Additional information
Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License, which permits any non-commercial use, sharing, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if you modified the licensed material. You do not have permission under this licence to share adapted material derived from this article or parts of it. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by-nc-nd/4.0/.
About this article
Cite this article
Prelaj, A., Provenzano, L., Miskovic, V. et al. APOLLO11: a bio-data-driven model for clinical and translational research in lung cancer. npj Precis. Onc. 10, 96 (2026). https://doi.org/10.1038/s41698-026-01295-3
DOI: https://doi.org/10.1038/s41698-026-01295-3





