Powering responsible artificial intelligence with high-quality real-world data: the S-RACE platform for scalable, multi-specialty clinical research

Traverso, Alberto; Tiano, Donato; Corvaglia, Andrea; Dimonte, Alessio; Draetta, Edoardo Luigi; Fabiani, Bruno; Scuri, Patrick; Barbieri, Simone; Agazzi, Marcello; Arslan, Muhammad; Celada, Daniele; Chiabrando, Filippo; Cibrario, Lorenzo; Cielo, Giulio; Colombo, Alberto; Contini, Stefano; Liberotti, Marta; Montagna, Marco; Ogliari, Francesca Rita; Palmisano, Anna; Pisu, Francesco; Serra, Davide; Varani, Diego; Vignale, Davide; Vitali, Andrea Luigi; Zambello, Alan; Chiapponi, Chiara; Denti, Marco; Esposito, Antonio; Tacchetti, Carlo

doi:10.1038/s41746-025-02132-w

Article
Open access
Published: 03 January 2026

Alberto Traverso¹,
Donato Tiano¹,
Andrea Corvaglia¹,
Alessio Dimonte¹,
Edoardo Luigi Draetta¹,
Bruno Fabiani¹,
Patrick Scuri¹,
Simone Barbieri¹,
Marcello Agazzi¹,
Muhammad Arslan¹,
Daniele Celada¹,
Filippo Chiabrando²,
Lorenzo Cibrario¹,
Giulio Cielo¹,
Alberto Colombo¹,
Stefano Contini¹,
Marta Liberotti¹,
Marco Montagna²,
Francesca Rita Ogliari^2,3,
Anna Palmisano^2,4,5,
Francesco Pisu¹,
Davide Serra¹,
Diego Varani¹,
Davide Vignale^2,4,5,
Andrea Luigi Vitali¹,
Alan Zambello¹,
Chiara Chiapponi⁶,
Marco Denti¹,
Antonio Esposito^1,2,4,5^na1 &
Carlo Tacchetti^1,2,4^na1

npj Digital Medicine volume 9, Article number: 6 (2026)

Abstract

The translation of Artificial Intelligence (AI) into clinical practice demands high-quality Real-World Data (RWD), yet unstructured healthcare information poses a significant barrier. To address this, we developed S-RACE, a secure, cloud-based platform designed to systematically transform raw hospital data into high-quality, research-grade evidence. S-RACE features an end-to-end pipeline, starting with on-premises anonymisation for data privacy, followed by Natural Language Processing (NLP) to extract and standardise clinical information into the FHIR format. This curated data foundation is essential for building robust AI models. The platform’s integrated “Data Science Lab” supports responsible AI development, incorporating explainability techniques and adhering to governance standards like ISO 42001 and the EU AI Act. Currently, S-RACE is populated with data from 31,276 patients, powering 19 research projects across fields including oncology, cardiology, and diabetes. We demonstrate its utility in kidney cancer and aortic stenosis, where models trained on S-RACE’s automatically processed RWE showed performance comparable to those trained on manually curated data. S-RACE provides a scalable, governed environment for RWD curation, offering a trustworthy foundation to accelerate the clinical adoption of responsible AI.

Introduction

The convergence of Real-World Evidence (RWE) and Artificial Intelligence (AI) is reshaping modern medicine, offering the potential to develop advanced Clinical Decision Support Systems (CDSS) that are essential for personalised patient care. AI technologies are uniquely capable of analysing vast and complex datasets to uncover clinical insights that would otherwise remain hidden, promising to improve patient outcomes by optimising treatments and accelerating drug discovery¹.

However, the transformative potential of AI in healthcare is fundamentally constrained by the quality of the Real-World Data (RWD) it relies on. RWD is typically collected from disparate hospital IT systems and is often unstructured, sparse, and lacking in standardisation. This inherent variability creates a significant barrier to transforming raw data into the ‘regulatory grade’ evidence needed for robust clinical applications². Here, ‘regulatory grade’ refers to Real-World Data with sufficiently high levels of accuracy, completeness, and traceability to be considered reliable for supporting regulatory submissions regarding a medical product’s effectiveness or safety.

Without first addressing these foundational data quality issues, AI models are prone to generating unreliable or erroneous outcomes, including ‘hallucinations’ that do not reflect the actual data³. Such outcomes render the models unusable in clinical settings and hinder their adoption.

While the traditional approach of tedious, manual data curation can produce high-quality datasets, it is not a scalable solution for processing the vast amount of data required for modern AI development. However, such manually curated datasets remain an invaluable ‘gold standard’ for benchmarking the performance and accuracy of automated data processing algorithms.

To unlock the full potential of RWD, the healthcare sector requires sophisticated, end-to-end clinical data science platforms. These platforms are specifically designed to systematically ingest, process, and harmonise complex RWD, streamlining the creation of high-quality, analysis-ready datasets that can support the development of reliable AI-driven systems.

Furthermore, the process of ingesting and managing RWD must be aligned with a complex and evolving regulatory landscape, as highlighted in our previous publications^4,5. The responsible integration of AI into healthcare demands robust governance frameworks to ensure safety, ethics, and trustworthiness. Key international standards and regulations, such as ISO/IEC 42001:2023 and the EU AI Act, mandate stringent requirements for data quality, risk management, and human oversight for high-risk AI systems^6,7,8. These frameworks underscore the global commitment to ensure the ethical and safe development and deployment of AI in healthcare. Therefore, any platform aiming to generate research-grade RWD must be built upon a foundation of strong governance and regulatory compliance from the outset.

In response to this critical need, we developed the S-RACE (San Raffaele Ai CEnter) platform together with Microsoft (Redmond, Washington, USA) and Porini (Milan, Italy). S-RACE is a novel, cloud-based solution engineered to directly address the challenges of data quality and governance in healthcare AI. Its core function is a comprehensive data science pipeline that begins with secure, on-premises data anonymisation, followed by NLP-driven data extraction and standardisation into the FHIR format, creating a structured and high-quality data foundation for research.

This paper presents a comprehensive overview of the S-RACE platform, detailing its architecture, functionalities, and its systematic approach to transforming raw clinical data into research-grade RWD. We demonstrate how S-RACE serves as a collaborative environment where clinicians and data scientists can jointly develop and validate responsible AI-driven decision support systems, built upon the high-quality data foundation the platform provides. Through practical examples from our ongoing clinical research, we illustrate the platform’s capabilities and show how a dedicated focus on data quality and responsible AI governance accelerates the clinical translation of AI, ultimately enhancing patient care. Finally, we report the types of RWD available to researchers on the platform.

Results

The S-RACE platform is underpinned by a robust governance model designed to ensure the generation of high-quality, research-grade data, in full alignment with Responsible AI principles and key regulatory frameworks like the EU AI Act and ISO 42001:2023. Central to this model is a meticulous data quality assessment process. The platform employs a hybrid data quality model that combines an expert-driven evaluation with an automated pre-processing workflow.

Before model development begins, project proposals presented by the clinical PIs are rigorously assessed using a Data Quality Checklist, which contains 39 questions across five categories: Summary, Collection, Pre-Processing, Metadata, and Data. The PI of the study and an S-RACE contact person jointly complete the questionnaire. For each question, they provide a textual response and a score from 0 (worst) to 3 (best). This four-level Likert scale (‘Useless’ to ‘Valuable’) evaluates five quality dimensions: Accessibility, Accuracy, Completeness, Consistency, and Relevancy. The team compiles the weighted and normalised evaluations into a Summary Report, which then undergoes a peer review by the S-RACE project management and IT team to catch omissions or inconsistencies. If needed, a follow-up meeting is held with the clinical team to finalise the questionnaire before the final report is generated.
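The weighting and normalisation step of the Data Quality Checklist can be illustrated with a minimal sketch. The per-question weights, the `Answer` class, and the `summarise` function are hypothetical; the actual weighting scheme used for the Summary Report is not described here, so this only shows one plausible way to turn 0 to 3 scores into normalised category scores.

```python
from dataclasses import dataclass

MAX_SCORE = 3  # scores run from 0 (worst) to 3 (best), as in the checklist


@dataclass
class Answer:
    category: str        # e.g. "Collection", "Metadata"
    score: int           # 0..3
    weight: float = 1.0  # hypothetical per-question weight (assumption)


def summarise(answers):
    """Return weighted scores normalised to [0, 1], per category and overall."""
    totals = {}
    for a in answers:
        if not 0 <= a.score <= MAX_SCORE:
            raise ValueError(f"score out of range: {a.score}")
        got, possible = totals.get(a.category, (0.0, 0.0))
        totals[a.category] = (got + a.weight * a.score,
                              possible + a.weight * MAX_SCORE)
    report = {cat: got / possible for cat, (got, possible) in totals.items()}
    report["overall"] = (sum(g for g, _ in totals.values())
                         / sum(p for _, p in totals.values()))
    return report


answers = [
    Answer("Collection", 3),
    Answer("Collection", 1, weight=2.0),
    Answer("Metadata", 2),
]
report = summarise(answers)
print(report)
```

Normalising by the maximum attainable weighted score keeps categories with different numbers of questions comparable.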

After the project is approved by the steering committee and the data has been transferred to the platform, the manual review is complemented by the Preliminary Exploratory Data Analysis (PExDA) framework—an automated pipeline. PExDA performs baseline quality checks, such as identifying and flagging patients with missing outcome data.
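A baseline check of the kind PExDA performs, flagging patients with missing outcome data, might look like the following sketch. The record layout, the field names, and the function `flag_missing_outcomes` are assumptions made for illustration and are not taken from the PExDA implementation.

```python
def flag_missing_outcomes(patients, outcome_field="outcome"):
    """Return the IDs of patients whose outcome is absent, None, or blank."""
    flagged = []
    for p in patients:
        value = p.get(outcome_field)
        if value is None or (isinstance(value, str) and not value.strip()):
            flagged.append(p["patient_id"])
    return flagged


# Hypothetical pseudonymised cohort records.
cohort = [
    {"patient_id": "p01", "outcome": "alive"},
    {"patient_id": "p02", "outcome": None},
    {"patient_id": "p03"},
    {"patient_id": "p04", "outcome": "  "},
]
flagged = flag_missing_outcomes(cohort)
missing_rate = len(flagged) / len(cohort)
print(flagged, missing_rate)
```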

This entire process ensures data quality and consistency, focusing on a key principle: assessing whether the data is “fit for purpose” for a specific research question. This approach is intentionally dynamic: a dataset unsuitable for one basic model may be perfectly acceptable for a more advanced algorithm capable of handling its specific limitations, allowing the platform to continuously adapt to evolving AI techniques.

The S-RACE platform is built on three architectural pillars that create an end-to-end pipeline for transforming raw clinical data into a foundation for trustworthy AI. More details on the building blocks highlighted below are provided in Supplementary Fig. 1. The platform’s architecture is fundamentally shaped by a ‘privacy by design’ approach, in full compliance with the EU’s General Data Protection Regulation (GDPR). The data pipeline begins with an on-premises engine that performs pseudonymisation before any data is transferred to the cloud. This process targets direct identifiers (e.g., name, medical record number, social security number) and replaces them with a unique, irreversible cryptographic hash. The mapping key linking the pseudonym to the original identifier is stored exclusively within the hospital’s secure on-premises infrastructure and is never exposed to the cloud environment. This segregation is a critical security control that minimises the risk of re-identification. All platform activities are governed by principles of data minimisation and purpose limitation, ensuring researchers can only access the specific data necessary for their approved study protocols. This de-identification process is tailored to each data modality. For medical images in DICOM format, the on-premises engine first scrubs personally identifiable information from all metadata tags. Similarly, for unstructured data like clinical notes, Natural Language Processing (NLP) models are applied to redact personal identifiers before the records are pseudonymised and transferred to the cloud for further processing.
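The separation between the pseudonym sent to the cloud and the mapping kept on-premises can be sketched as follows. This is an illustration only: the use of HMAC-SHA256 as the keyed hash, the class `OnPremPseudonymiser`, and the field names are assumptions, not the platform's actual implementation.

```python
import hashlib
import hmac
import secrets


class OnPremPseudonymiser:
    """Toy on-premises engine: the key and the mapping never leave this object."""

    def __init__(self, secret_key: bytes):
        self._key = secret_key
        self._mapping = {}  # pseudonym -> original identifier, stored on-premises

    def pseudonym(self, identifier: str) -> str:
        # Keyed hash: without the secret key the pseudonym cannot be recomputed.
        digest = hmac.new(self._key, identifier.encode("utf-8"),
                          hashlib.sha256).hexdigest()
        self._mapping[digest] = identifier
        return digest

    def export_record(self, record, direct_identifiers=("name", "mrn", "ssn")):
        """Drop direct identifiers and attach a pseudonym derived from the MRN."""
        out = {k: v for k, v in record.items() if k not in direct_identifiers}
        out["pseudo_id"] = self.pseudonym(record["mrn"])
        return out


engine = OnPremPseudonymiser(secrets.token_bytes(32))
record = {"name": "Jane Doe", "mrn": "MRN-0001", "ssn": "000-00-0000",
          "creatinine_mg_dl": 1.1}
cloud_record = engine.export_record(record)
print(sorted(cloud_record))
```

Because the same identifier always yields the same pseudonym under a given key, records from different source systems can still be linked after export.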

The process begins with the Universal Data Platform, which uses a hybrid-cloud approach to ensure security and privacy. Raw data is first processed by an on-premises engine for pseudonymisation before being securely transferred to the cloud. There, AI-powered services, including Natural Language Processing (NLP) and medical ontologies, parse unstructured text from clinical reports. This information is then transformed and structured according to the FHIR (Fast Healthcare Interoperability Resources) standard, creating a high-quality, harmonised, and analysis-ready dataset.
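To make the final structuring step concrete, the sketch below turns one value found in free text into a FHIR Observation resource. The regular expression is a toy stand-in for the platform's NLP services, and the function name, the example note, and the choice of LOINC code 2160-0 for serum creatinine are illustrative assumptions.

```python
import json
import re

# Toy pattern standing in for the NLP extraction step (assumption).
CREATININE = re.compile(r"creatinine[^0-9]*([0-9]+(?:\.[0-9]+)?)\s*mg/dL",
                        re.IGNORECASE)


def to_fhir_observation(note: str, pseudo_id: str):
    """Return a minimal FHIR Observation dict, or None if no value is found."""
    match = CREATININE.search(note)
    if match is None:
        return None
    return {
        "resourceType": "Observation",
        "status": "final",
        "code": {"coding": [{
            "system": "http://loinc.org",
            "code": "2160-0",
            "display": "Creatinine [Mass/volume] in Serum or Plasma",
        }]},
        "subject": {"reference": f"Patient/{pseudo_id}"},
        "valueQuantity": {
            "value": float(match.group(1)),
            "unit": "mg/dL",
            "system": "http://unitsofmeasure.org",
            "code": "mg/dL",
        },
    }


note = "Labs on admission: serum creatinine 1.4 mg/dL, otherwise unremarkable."
observation = to_fhir_observation(note, "a1b2c3")
print(json.dumps(observation, indent=2))
```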

The Clinician AI Hub provides an interactive environment for clinicians and researchers to explore the curated data. Using data visualisation tools, they can conduct preliminary analyses to assess the quality and suitability of the dataset for a given research question. As a crucial step, we use our Preliminary Exploratory Data Analysis (PExDA) framework to ensure that the data is fit for purpose before proceeding to complex modelling, reinforcing the platform’s commitment to data quality (Supplementary Table S2).

The Data Science Lab offers a comprehensive environment within Microsoft Azure ML Studio for building and validating machine learning models on the high-quality data. To ensure the development of responsible AI, the lab integrates tools that support rigorous traceability and reproducibility (MLflow) and model transparency. Explainable AI (XAI) techniques, such as SHAP (SHapley Additive exPlanations)⁹, are employed to make model predictions interpretable. The platform also incorporates the Microsoft Responsible AI Toolbox (https://github.com/microsoft/responsible-ai-toolbox) to assess fairness, evaluate model performance across different patient cohorts, and mitigate potential biases, ensuring that the resulting AI systems are not only accurate but also trustworthy and equitable.
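The idea behind SHAP can be illustrated by computing exact Shapley values by brute force for a tiny model. This sketch does not use the SHAP library; the `risk_score` function, its coefficients, and the patient and reference values are hypothetical, and features outside a coalition are simply replaced by reference values.

```python
from itertools import combinations
from math import factorial


def shapley_values(predict, x, baseline):
    """Exact Shapley values for one instance (feasible only for few features)."""
    n = len(x)

    def value(coalition):
        # Features in the coalition take the patient's values, others the baseline.
        mixed = [x[i] if i in coalition else baseline[i] for i in range(n)]
        return predict(mixed)

    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for size in range(n):
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = set(subset)
                phi[i] += weight * (value(s | {i}) - value(s))
    return phi


def risk_score(features):
    """Hypothetical linear risk model: age, creatinine, ejection fraction."""
    age, creatinine, ef = features
    return 0.03 * age + 0.5 * creatinine - 0.02 * ef


patient = [80.0, 2.0, 30.0]
reference = [65.0, 1.0, 55.0]
phi = shapley_values(risk_score, patient, reference)
print(phi)
```

The attributions sum to the difference between the patient's prediction and the reference prediction, which is the property that makes such explanations additive.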

S-RACE is designed to foster multi-institutional collaboration through a flexible but governed environment. For a given research project, multiple researchers can be granted secure access to develop and test models on high-quality datasets. This collaborative development is enhanced using Microsoft Azure pipelines, which allow for the reuse of code and expertise, promoting efficiency and standardisation across projects. However, development is decoupled from deployment. Models with robust validation, approved by a governance committee, are promoted to a central “Model Registry.” This registry facilitates their translation into the clinical world for trials measuring model benefit and monitoring their safety and effectiveness in a pragmatic real-world clinical setting¹⁰. Each registered model is further enriched with comprehensive metadata (Supplementary Table S3), whose structure is based on the AIMe registry for AI in biomedical research¹¹. AIMe provides a standardised framework for documenting AI models, much as clinical trials are registered, to ensure transparency and reproducibility. By adopting this structure, our Model Registry captures essential information such as intended use, development data, performance metrics, and validation strategies, which is crucial for traceability and responsible governance throughout the model’s lifecycle. This ensures that only robust and trustworthy AI is considered for translation. To facilitate prospective validation or use in clinical trials by collaborators, these approved models can also be deployed as user-friendly web applications, a practice detailed later in our kidney cancer research project.
Furthermore, the platform supports two distinct collaboration models: for centralised studies, external researchers can work directly within the secure S-RACE environment, while for scenarios where data cannot be shared, S-RACE supports privacy-preserving Federated Learning based on the NVFlare framework, enabling decentralised model training across institutions¹².
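The aggregation step at the heart of many federated learning schemes can be sketched with sample-weighted federated averaging (FedAvg). This is a plain-Python illustration and does not use the NVFlare API; the function name and the example updates are hypothetical, and the source does not state which aggregation rule the platform uses.

```python
def federated_average(site_updates):
    """Combine per-site model weights, weighting each site by its sample count.

    site_updates: list of (num_samples, weights), weights being a list of floats.
    Only these parameter vectors are exchanged; patient data stays at each site.
    """
    total = sum(n for n, _ in site_updates)
    if total <= 0:
        raise ValueError("no samples reported by any site")
    length = len(site_updates[0][1])
    averaged = [0.0] * length
    for n, weights in site_updates:
        if len(weights) != length:
            raise ValueError("sites returned weight vectors of different sizes")
        for k, w in enumerate(weights):
            averaged[k] += (n / total) * w
    return averaged


# Two hypothetical hospitals with 100 and 300 local training samples.
updates = [(100, [0.2, -1.0]), (300, [0.6, 1.0])]
global_weights = federated_average(updates)
print(global_weights)
```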

As of September 2025, there are 19 ongoing clinical research projects (13 with data loaded on the platform and 6 to be loaded after IRB approval). These projects allowed us to integrate into the AI platform our 5 major IT data sources (EHRs, pathology, lab tests, PACS, and eCRF, e.g., REDCap) together with disease-specific internal databases, for a total of 31,276 patients (Fig. 1a). The projects span the following domains: oncology (8), cardiovascular disease (6), neuroendocrine disorders (3), and neurosciences (2). Examples of the types of data imported for some research projects are shown in Fig. 1b, c. A synthesis of the investigated clinical research questions, the number of included patients, and the type of imported data for each project is shown in Table 1.

Fig. 1: Overview of the total number of unique patients loaded in the S-RACE platform (top) and examples of data types loaded in the S-RACE platform for a few clinical research projects (bottom).

Table 1 Overview of the on-going clinical research projects and the type of data used

To validate the platform’s core capability of generating high-quality data, we developed a pre-operative AI model to predict cancer-specific mortality in patients with non-metastatic clear cell renal cell carcinoma (ccRCC)¹³. In the initial ‘Business Understanding’ phase for this project, we conducted a systematic literature review to define the clinical problem and identify key prognostic variables. The cited review was instrumental for this purpose, providing a comprehensive list of established factors that we subsequently used to validate the successful feature extraction by our RWD processing pipeline. This project served as a direct test of S-RACE’s data processing pipeline. We utilised two distinct datasets from over 2000 patients: a manually curated clinical dataset (eCRF), representing the traditional ‘gold standard’ for research but with a limited number of variables, and a dataset of raw, unstructured RWD (more than 200 variables) automatically ingested and processed by the S-RACE platform (Fig. 2, top panel). The central experiment was to compare the performance of AI models developed on these two data sources. Following automated data processing, models were developed in the Data Science Lab using a hybrid strategy that balanced predictive power with clinical interpretability: a Random Survival Forest model was used for feature selection, and a Cox Proportional Hazards model was compared to end-to-end ML algorithms such as survival trees. The key finding was that models trained on the automatically processed RWD performed comparably to those trained on the manually curated dataset. Furthermore, by applying Explainable AI (XAI) techniques, we not only confirmed the importance of known clinical predictors but also identified novel prognostic variables present only in the raw RWD (Fig. 2, middle panel). This result provides strong evidence that the S-RACE platform can successfully transform complex, raw clinical data into reliable, research-grade evidence, thereby overcoming a primary bottleneck in the development of scalable and trustworthy AI for healthcare. The model was then implemented as a web-based app to make it easier to use within the clinic and by external collaborators (Fig. 2, bottom panel).

Fig. 2: Schematic summary of the ccRCC research project.

The second example demonstrates how the S-RACE platform’s responsible AI capabilities can enhance model development, particularly when working with smaller, more specialised cohorts. Following the CRISP-DM methodology, the ‘Business Understanding’ phase of the TAVI project involved a systematic review of the literature. The cited review was critical for defining the clinical challenge of predicting treatment futility and identifying existing risk stratification models. This step informed our project objectives and provided the necessary benchmarks for the subsequent ‘Evaluation’ phase. The project aimed to identify patients with severe aortic stenosis who were unlikely to benefit from Transcatheter Aortic Valve Implantation (TAVI)¹⁴, using a high-quality dataset of approximately 500 patients. Given the limited cohort size, ensuring model robustness was paramount. The S-RACE Data Science Lab enabled the implementation of a sophisticated stratified nested cross-validation strategy, which is critical for generating reliable and unbiased performance estimates from smaller datasets (Fig. 3, top panel). More importantly, this project leveraged the platform’s integrated tools for responsible AI to move beyond standard accuracy metrics. A decision tree-based error analysis was conducted to automatically identify specific patient subgroups where the model was most likely to make erroneous classifications. By pinpointing these areas of underperformance, this methodology allows for targeted model refinement to improve fairness and clinical utility (Fig. 3, bottom panel). To prevent any risk of data leakage, our error analysis workflow is strictly partitioned. During the iterative development cycle, the analysis is performed exclusively on the validation set to guide model debugging and refinement, while the held-out test set remains untouched. Once all development is complete, the analysis is then applied a single time to the test set in a purely descriptive capacity. 
In this final stage, it does not guide any model changes but instead serves to complement aggregate metrics, enhancing transparency by documenting the final model’s performance across key subgroups. This analysis also serves as a guide for targeted data acquisition; by understanding the characteristics of patient profiles where the model is weakest, we can leverage the S-RACE platform’s data ingestion capabilities to automatically retrieve additional, relevant data from the hospital’s IT systems. This creates a powerful feedback loop for continuous model improvement, demonstrating a significant step towards developing more precise and equitable AI models for clinical decision support.
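The structure of a stratified nested cross-validation, as used in the TAVI project, can be sketched in plain Python. The one-dimensional threshold classifier, the `shift` hyperparameter, and the synthetic data are stand-ins chosen to keep the example self-contained; they are not the models or data used in the study.

```python
import random
from collections import defaultdict


def stratified_folds(labels, k, seed):
    """Split indices into k folds that each keep roughly the class balance."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, y in enumerate(labels):
        by_class[y].append(idx)
    folds = [[] for _ in range(k)]
    for indices in by_class.values():
        rng.shuffle(indices)
        for pos, idx in enumerate(indices):
            folds[pos % k].append(idx)
    return folds


def fit_threshold(xs, ys, train_idx, shift):
    """'Train' a toy classifier: midpoint between class means, moved by shift."""
    pos = [xs[i] for i in train_idx if ys[i] == 1]
    neg = [xs[i] for i in train_idx if ys[i] == 0]
    return (sum(pos) / len(pos) + sum(neg) / len(neg)) / 2 + shift


def accuracy(threshold, xs, ys, idx):
    return sum((xs[i] >= threshold) == (ys[i] == 1) for i in idx) / len(idx)


def nested_cv(xs, ys, shifts, outer_k=5, inner_k=3, seed=0):
    outer = stratified_folds(ys, outer_k, seed)
    scores = []
    for f, test_idx in enumerate(outer):
        dev_idx = [i for g, fold in enumerate(outer) if g != f for i in fold]
        inner = stratified_folds([ys[i] for i in dev_idx], inner_k, seed + f + 1)
        inner = [[dev_idx[j] for j in fold] for fold in inner]  # dataset indices

        def inner_score(shift):
            # Inner loop: tune the hyperparameter on development data only.
            total = 0.0
            for v, val_idx in enumerate(inner):
                train_idx = [i for w, fold in enumerate(inner) if w != v
                             for i in fold]
                model = fit_threshold(xs, ys, train_idx, shift)
                total += accuracy(model, xs, ys, val_idx)
            return total / inner_k

        best_shift = max(shifts, key=inner_score)
        # Refit on all development data, then score once on the outer test fold.
        final_model = fit_threshold(xs, ys, dev_idx, best_shift)
        scores.append(accuracy(final_model, xs, ys, test_idx))
    return scores


rng = random.Random(42)
ys = [0] * 60 + [1] * 40                    # synthetic labels
xs = [rng.gauss(2.0 * y, 1.0) for y in ys]  # synthetic single feature
scores = nested_cv(xs, ys, shifts=[-0.5, 0.0, 0.5])
mean_score = sum(scores) / len(scores)
print(scores, mean_score)
```

Keeping hyperparameter selection inside the inner loop means each outer test fold is never seen during tuning, which is what makes the outer estimate less optimistic on small cohorts.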

Fig. 3: Schematic summary of the severe aortic stenosis research project.

A key strength of the S-RACE platform, demonstrated in this research project, is its ability to agnostically ingest all available data for a patient, including large-format data like medical images, and use them to create a “deep patient phenotype”. This enables the extraction of “opportunistic biomarkers” from imaging studies, such as the pre-TAVI planning total body CT scans, that were performed for routine clinical care. The platform automates the analysis of these images, moving beyond standard cardiac measures like annular dimensions, ejection fraction and chamber size. Using deep learning-based segmentation and radiomics, it quantifies features such as the volume and signal distribution of abdominal fat, muscle, and bone, as well as organs like the liver and kidney. This holistic characterisation allows for the identification of subclinical comorbidities and vulnerabilities not captured in standard clinical reports, providing a richer dataset to enhance prognostic models and improve their accuracy. As shown in Table 1, the deep learning image analysis solutions developed within one specific project will be applied to the images of the other cohorts.
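As a simplified illustration of quantifying the volume and signal distribution of a segmented tissue, the sketch below computes a few first-order features from a flattened CT volume and a binary mask. The function name, the voxel size, and the example values are hypothetical, and a real radiomics pipeline would compute many more features on full 3D arrays.

```python
from statistics import mean, pstdev


def first_order_features(intensities, mask, voxel_volume_mm3):
    """Volume and basic intensity statistics of the voxels inside a mask.

    intensities and mask are equally long flat sequences; mask entries are 0 or 1.
    """
    if len(intensities) != len(mask):
        raise ValueError("image and mask must have the same number of voxels")
    inside = [v for v, m in zip(intensities, mask) if m]
    if not inside:
        raise ValueError("empty segmentation")
    return {
        "volume_ml": len(inside) * voxel_volume_mm3 / 1000.0,  # 1 mL = 1000 mm^3
        "mean_hu": mean(inside),
        "std_hu": pstdev(inside),
        "min_hu": min(inside),
        "max_hu": max(inside),
    }


# Hypothetical Hounsfield units and a fat segmentation mask for six voxels.
ct_hu = [-100, -95, 40, 55, -110, 60]
fat_mask = [1, 1, 0, 0, 1, 0]
features = first_order_features(ct_hu, fat_mask, voxel_volume_mm3=2.0)
print(features)
```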

Discussion

The S-RACE platform has been developed to address the fundamental challenge in clinical AI: the need for a scalable and governed process to transform raw, heterogeneous hospital data into high-quality, research-grade RWD. The transformation of raw RWD into trustworthy evidence requires more than just technical data cleaning; it demands a rigorous and principled approach to the entire research lifecycle, as outlined in frameworks like the PRINCIPLED checklist for RWD re-use¹⁵. The S-RACE platform was designed to be a comprehensive ecosystem that provides researchers with the functionalities to operationalise such a principled approach. For instance, each study on S-RACE starts with a clear definition of a research question and a systematic literature review within the CRISP-DM framework to support a robust study design. The S-RACE platform provides clinicians with interactive tools in the Clinician AI Hub to define endpoints and cohorts. To address the critical challenge of confounding, S-RACE focuses on creating a holistic view of a patient by integrating multi-modal data, thereby providing a richer set of covariates for adjustment in statistical models. For bias remediation, our ‘Responsible AI Development’ pillar integrates tools like the Microsoft Responsible AI Toolbox to systematically assess fairness and identify subgroup underperformance. Finally, transparency and reproducibility are enforced through the mandatory use of MLflow for experiment tracking and a comprehensive AIMe-based Model Registry for transparent documentation from development through to deployment.

While other notable platforms and frameworks such as N3C¹⁶, i2b2 tranSMART¹⁷, MSK-CHORD¹⁸, and Ehrapy¹⁹ share the goal of advancing RWE, S-RACE is distinguished by several key architectural and philosophical choices that prioritise data quality, security, and regulatory readiness, as summarised in Table 2.

Table 2 Comparison of various platforms and frameworks designed for the ingestion and use of RWD

A primary differentiator is our strategic emphasis on data quality as the foundational output. The platform is engineered first and foremost as an engine for data curation. This is exemplified by our hybrid-cloud architecture, which features a mandatory on-premises anonymisation step. This ‘privacy by design’ approach ensures sensitive patient data never leaves the hospital’s secure environment before being pseudonymised. This is not merely a technical choice but a core governance principle that builds institutional, clinical, and patient trust, and it contrasts with models that may transfer raw identifiable data to the cloud, increasing the attack surface and complicating regulatory compliance.

The S-RACE platform’s design aligns with key frameworks for trustworthy research. It operationalises the FAIR principles by ensuring data are Findable and Accessible via a governed hub, Interoperable through the mandatory FHIR standard, and Reusable thanks to comprehensive documentation in the AIMe-based Model Registry. The platform’s emphasis on detailed metadata in the Data Quality Checklist and Model Registry also adheres to the MINERVA framework²⁰. Finally, S-RACE supports the PRINCIPLED¹⁵ checklist by providing an ecosystem for robust study design, bias remediation using integrated tools, and better handling of confounding through multi-modal data integration.

Furthermore, S-RACE is deeply integrated within the Microsoft Azure ecosystem. This deliberate choice provides a cohesive, enterprise-grade environment that leverages a suite of interoperable tools for every stage of the pipeline. Using a single, secure cloud environment for data processing (Cognitive Health Services), model development (Azure ML Studio), and collaboration simplifies security and identity management, streamlines workflows, and facilitates easier auditing compared to assembling a solution from multiple, disparate vendors. This tight integration ensures both robust performance and a clear chain of custody for data and models.

A crucial aspect of the S-RACE vision is its role as a catalyst for collaborative research. The platform is not merely a technical tool, but a managed ecosystem designed to bring together clinicians and data scientists from multiple institutions. Governance is embedded into the collaborative workflow: each project operates within a segregated workspace with role-based access controls, ensuring that researchers only see the data relevant to their approved study. This structure supports two powerful modes of collaboration. First, it allows for centralised analysis, where external partners can securely access and work with curated, high-quality datasets. Second, it is equipped for privacy-preserving federated learning, enabling the development of more generalisable models by training algorithms across decentralised datasets without ever moving sensitive patient data. This dual capability makes S-RACE a flexible and powerful hub for multi-centre studies, accelerating scientific discovery by creating larger, more diverse virtual cohorts while upholding the highest standards of data protection and project governance.

The versatility of the S-RACE platform is another key strength. Unlike more specialised platforms focused on a single disease area, it is currently populated with disease-specific data for a total of 31,276 patients, powering 19 distinct clinical research projects across diverse domains including oncology, cardiology, and diabetes. This demonstrates the platform’s technical scalability and, more importantly, the successful implementation of a standardised, repeatable data curation pipeline. This proves its value as a central institutional asset that can break down data silos, foster cross-disciplinary research, and maximise the return on investment in data infrastructure.

The clinical examples presented in this paper serve to illustrate these strengths in practice. The kidney cancer research project provides direct validation for our primary mission: by showing that models built on automatically ingested RWD can perform as well as those built on manually curated data, we demonstrate the platform’s success as a scalable data curation engine. The aortic stenosis research project highlights the next layer of the platform’s capabilities, showing how this high-quality data foundation enables more advanced and responsible AI development. It showcases how S-RACE facilitates the creation of deep patient phenotypes through the extraction of opportunistic biomarkers from imaging data, and how its integrated tools can be used to analyse model fairness and guide a continuous feedback loop of improvement.

Finally, the entire S-RACE framework was built with proactive alignment to the evolving regulatory landscape. Its core features directly address the requirements of standards like ISO 42001:2023 and the EU AI Act. For instance, the centralised model registry provides the versioning and detailed documentation essential for traceability, while the integrated responsible AI tools for error analysis and fairness assessment directly support the risk management and bias mitigation mandates of these regulations. By prioritising the generation of high-quality, reliable data within a responsible and collaborative framework, S-RACE provides a robust, future-proofed solution to accelerate the development and translation of trustworthy AI in medicine.

Beyond its technical capabilities, S-RACE is fundamentally a collaborative ecosystem. It is engineered to unite clinicians and data scientists from multiple institutions, supporting both centralised analysis within its secure environment and privacy-preserving federated learning for studies where data cannot be shared. This collaboration is achieved through two distinct, purpose-built environments that operate on the same governed data foundation. The Clinician AI Hub, a no-code, interactive environment, allows clinicians to use data visualisation tools to explore cohorts and assess data quality without requiring programming knowledge. The Data Science Lab is a parallel environment that provides data scientists with a comprehensive suite of tools in Microsoft Azure ML Studio for advanced model development and validation. This dual-environment structure bridges the expertise gap: clinicians can define clinically meaningful problems using accessible tools, while data scientists can apply rigorous computational methods to the same curated data. This dual capability establishes S-RACE as a hub for multi-centre research, accelerating the creation of more generalisable and robust AI models. The importance of this work lies in its demonstration of a robust, governed, and scalable environment for RWD curation, which provides a trustworthy foundation to accelerate the development and clinical adoption of responsible AI.

The deep integration with Microsoft Azure raises important considerations regarding data privacy and vendor interoperability, which we address through specific governance and technical choices. First, we state unequivocally that Microsoft, as the cloud provider, has no technical or legal access to any patient data hosted within the S-RACE platform; all data remain under the exclusive control of our institution. Second, to mitigate vendor lock-in and ensure interoperability, the platform relies on open standards. All curated data are structured in the Fast Healthcare Interoperability Resources (FHIR) format, ensuring that they can be exported and used in other systems. Furthermore, our support for the open-source NVFlare framework enables privacy-preserving federated learning, allowing direct collaboration with institutions regardless of their underlying infrastructure, thus promoting a vendor-neutral research ecosystem.
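To make the federated learning idea concrete, the following is a minimal sketch of sample-size-weighted parameter averaging (the generic "federated averaging" step), in which each site shares only model parameters and never patient records. It is an illustration under stated assumptions, not the NVFlare API and not necessarily the aggregation used in S-RACE; the function name, the site names, and all numbers are hypothetical.

```python
from typing import List, Sequence


def federated_average(site_weights: Sequence[Sequence[float]],
                      site_sizes: Sequence[int]) -> List[float]:
    """Average per-site model parameters, weighted by each site's sample size."""
    if not site_weights or len(site_weights) != len(site_sizes):
        raise ValueError("need one parameter vector and one sample size per site")
    n_params = len(site_weights[0])
    if any(len(w) != n_params for w in site_weights):
        raise ValueError("all sites must share the same parameter layout")
    total = sum(site_sizes)
    if total <= 0:
        raise ValueError("total sample size must be positive")
    # Each parameter is the sample-size-weighted mean across sites.
    return [
        sum(w[j] * n for w, n in zip(site_weights, site_sizes)) / total
        for j in range(n_params)
    ]


# Hypothetical locally trained coefficients from two hospitals.
hospital_a = [0.20, -1.00]   # trained on 100 patients
hospital_b = [0.40, -0.50]   # trained on 300 patients
global_weights = federated_average([hospital_a, hospital_b], [100, 300])
```

In this toy case the larger site dominates the aggregate, which is the intended behaviour of sample-size weighting; a real deployment would repeat this step over several training rounds.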

A key limitation of the current S-RACE platform is its primary focus on predictive and prognostic modelling rather than formal causal inference. While the integrated Microsoft Responsible AI libraries provide tools for related tasks (such as generating individualised counterfactual explanations with DiCE or estimating population-level treatment effects with EconML), the platform does not yet automate the rigorous design required for robust causal claims, such as systematic confounder selection or formal “prediction under intervention” analyses. Establishing a full causal inference framework remains a valuable direction for future work.

A related challenge for any deployed model, and one that requires further investigation, is the potential for performance degradation due to distributional changes over time. The S-RACE platform is designed to mitigate this risk through its governance structure. Every model promoted to the central Model Registry is registered with metadata establishing a baseline reference for its training data distribution and performance. Our post-deployment protocol includes continuous monitoring of outcome distributions and model calibration against this baseline. Any significant deviation is flagged to a governance committee, which can trigger model recalibration or retraining to ensure continued safety and efficacy in a real-world clinical setting.

S-RACE uses pseudonymisation to enable secure, longitudinal linkage of clinical research data, a capability precluded by full anonymisation. The platform ensures GDPR compliance via a ‘privacy by design’ architecture, storing the mapping key securely on-premises, separate from the cloud-processed data. Research on full anonymisation techniques (e.g., k-anonymity and synthetic data) is ongoing, but its results have not yet been released.
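The post-deployment monitoring protocol described in this section can be illustrated with a minimal sketch that compares the outcome rate and a simple calibration measure (mean predicted risk minus observed event rate, often called calibration-in-the-large) against a stored baseline. The `Baseline` fields, the `check_drift` function, and the 0.05 tolerance are assumptions made for illustration; the source does not specify which metrics or thresholds the platform uses.

```python
from dataclasses import dataclass
from statistics import fmean
from typing import Dict, Sequence, Union


@dataclass(frozen=True)
class Baseline:
    """Reference values recorded when a model is registered (hypothetical fields)."""
    event_rate: float        # outcome prevalence in the training data
    calibration_gap: float   # mean predicted risk minus observed event rate


def check_drift(baseline: Baseline,
                predicted_risk: Sequence[float],
                observed_outcome: Sequence[int],
                tolerance: float = 0.05) -> Dict[str, Union[float, bool]]:
    """Compare recent predictions and outcomes with the registered baseline."""
    if not predicted_risk or len(predicted_risk) != len(observed_outcome):
        raise ValueError("need equally sized, non-empty prediction and outcome lists")
    event_rate = fmean(observed_outcome)
    calibration_gap = fmean(predicted_risk) - event_rate
    outcome_shift = abs(event_rate - baseline.event_rate)
    calibration_shift = abs(calibration_gap - baseline.calibration_gap)
    return {
        "outcome_shift": outcome_shift,
        "calibration_shift": calibration_shift,
        # Assumed rule: flag when either shift exceeds the tolerance.
        "flag_for_review": outcome_shift > tolerance or calibration_shift > tolerance,
    }


# Hypothetical baseline and a recent batch in which events are more frequent.
baseline = Baseline(event_rate=0.20, calibration_gap=0.0)
report = check_drift(baseline,
                     predicted_risk=[0.2, 0.2, 0.2, 0.2],
                     observed_outcome=[1, 1, 0, 0])
```

In this toy batch the observed event rate is well above the baseline while the predicted risk is unchanged, so the report would be flagged for review by the governance committee.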
Another limitation is that curating Real-World Data (RWD) by excluding records with poor quality or missing information introduces a risk of selection bias, as the excluded patients may differ systematically from the final study cohort. To mitigate this, we use the platform’s Exploratory Data Analysis (EDA) tools to characterise and compare the excluded and included populations, thereby ensuring transparency regarding any potential bias. Furthermore, we minimise exclusions by using imputation techniques, typically limiting them to records missing a primary outcome. We also address the risk that researchers might unintentionally build models that merely confirm pre-existing hypotheses. This is mitigated through two primary safeguards: a structured research process guided by the CRISP-DM framework, and the mandatory application of Explainable AI (XAI) techniques, such as SHAP. XAI enhances transparency by revealing whether a model relies on spurious correlations or on the hypothesised clinical factors, which directly challenges or validates researchers’ underlying assumptions.
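One simple way to compare included and excluded populations is the standardised mean difference (SMD) of a baseline covariate. The sketch below is an illustration only: the source does not state which statistics the EDA tools compute, and the function name and age values are hypothetical.

```python
from math import sqrt
from statistics import fmean, variance
from typing import Sequence


def standardized_mean_difference(included: Sequence[float],
                                 excluded: Sequence[float]) -> float:
    """Difference in group means divided by the pooled standard deviation."""
    if len(included) < 2 or len(excluded) < 2:
        raise ValueError("each group needs at least two observations")
    pooled_sd = sqrt((variance(included) + variance(excluded)) / 2)
    if pooled_sd == 0:
        raise ValueError("SMD is undefined when both groups have zero variance")
    return (fmean(included) - fmean(excluded)) / pooled_sd


# Hypothetical ages (years) of patients kept in, and dropped from, a cohort.
included_age = [60, 62, 64, 66, 68]
excluded_age = [70, 72, 74, 76, 78]
smd = standardized_mean_difference(included_age, excluded_age)
```

An absolute SMD above roughly 0.1 is a commonly cited rule of thumb for a meaningful imbalance; in this toy example the excluded patients are clearly older, which is the kind of difference that should be reported alongside the model.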

Methods

The platform’s robust AI capabilities are powered by a suite of Microsoft technologies, forming a comprehensive ecosystem for RWD processing and AI development²¹:

Microsoft Cognitive Health Services: Utilised for advanced Natural Language Processing (NLP) and the application of medical ontologies, these services are crucial for extracting structured, clinically relevant information from unstructured clinical text, such as physician notes and reports. The NLP pipeline is based on the commercial Microsoft product Text Analytics for Health (TA4H, https://tinyurl.com/mxnysfav), which anonymises the clinical notes and then extracts medical concepts and the relations among them using standard ontologies such as the Unified Medical Language System (UMLS). The common data model is FHIR (Fast Healthcare Interoperability Resources). Additional data models such as OMOP can be obtained using conversion tools (e.g., FHIR to OMOP, https://build.fhir.org/ig/HL7/fhir-omop-ig/).
Microsoft Power BI (Business Intelligence): Integrated within the Clinician AI Hub, Power BI enables intuitive data visualisation and preliminary analysis, allowing clinicians to explore insights from RWE in an accessible format.
Microsoft Azure ML Studio: This comprehensive and scalable environment supports the entire ML model development and deployment lifecycle within the Data Science Lab, providing data scientists with the tools needed for robust model creation. Azure Machine Learning Studio makes it possible to model any type of clinical question, from regression to classification and survival analysis, using both “white box” approaches (e.g., logistic regression, Cox models) and “black box” approaches (e.g., random forests). In the latter case, explainability tools are used to support clinical understanding of the results.
Standardised Explainability Techniques: A core component of the Data Science Lab, these techniques are employed to enhance the transparency and interpretability of AI models, addressing the ‘black box’ challenge and building trust among clinical users.
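As an illustration of the NLP-to-FHIR step listed above, the sketch below maps one extracted concept to a simplified FHIR Condition resource. It is a sketch under stated assumptions: the input dictionary is a hypothetical, simplified stand-in rather than the actual TA4H output schema, the concept identifier is a placeholder, the UMLS system URI is assumed, and the resulting resource has not been validated against a FHIR profile.

```python
import json
from typing import Any, Dict

# Assumed coding-system URI for UMLS concept identifiers.
UMLS_SYSTEM = "http://www.nlm.nih.gov/research/umls"


def entity_to_fhir_condition(entity: Dict[str, Any],
                             patient_pseudonym: str) -> Dict[str, Any]:
    """Wrap one extracted concept in a minimal FHIR Condition resource."""
    return {
        "resourceType": "Condition",
        # The subject points at the pseudonymised patient, never a real identifier.
        "subject": {"reference": f"Patient/{patient_pseudonym}"},
        "code": {
            "coding": [{
                "system": UMLS_SYSTEM,
                "code": entity["cui"],
                "display": entity["text"],
            }],
            "text": entity["text"],
        },
    }


# Hypothetical extracted entity; "C0000000" is a placeholder, not a real concept ID.
entity = {"text": "aortic stenosis", "cui": "C0000000"}
condition = entity_to_fhir_condition(entity, patient_pseudonym="abc123")
condition_json = json.dumps(condition, indent=2)
```

Because the result is plain JSON, it can be stored, exported, or passed to a FHIR-to-OMOP conversion tool without depending on a particular cloud service.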

The S-RACE platform integrates data from five primary institutional IT systems to build a comprehensive, multimodal data foundation for research. The core data types include: (i) Electronic Health Records (EHRs), providing clinical and demographic information; (ii) Pathology and Laboratory systems, providing histopathology and lab test results; (iii) the Picture Archiving and Communication System (PACS), for medical imaging such as CT, PET, and MRI; (iv) Genomics data for multi-omics studies; and (v) research-specific sources such as electronic Case Report Forms (eCRFs) and internal databases. This integration enables the creation of a multimodal view of the data for every patient in the research cohort. The specific data types utilised for each of the 19 ongoing clinical research projects are detailed in Table 1.

The platform’s architecture is fundamentally shaped by a ‘privacy by design’ approach, in full compliance with the EU’s General Data Protection Regulation (GDPR). To correct an inconsistency in the original manuscript, we clarify that the process employed is strictly pseudonymisation, as legally defined under GDPR. The on-premises data ingestion engine automatically removes all direct patient identifiers (e.g., name, national health service number, medical record number) and replaces them with a unique, irreversible cryptographic hash. The mapping key that links this pseudonym back to the original patient identifier is stored exclusively within the hospital’s secure on-premises infrastructure and is never exposed to the cloud environment. This technical segregation is the primary control that substantiates our ‘privacy by design’ claim, serving as an auditable safeguard that ensures sensitive patient data is protected before leaving the hospital’s trusted domain. This approach is the specific technical safeguard that enables our legal basis for processing under GDPR: scientific research that allows for longitudinal patient follow-up, which requires re-linkability. While our primary compliance framework is GDPR, these technical and organisational measures also align with the core principles of other international standards, such as the security and privacy rules within HIPAA. Furthermore, our use of Microsoft Azure services for data processing is governed by a formal Data Protection Addendum (DPA). This contractual agreement legally obligates Microsoft to adhere to its responsibilities as a data processor under GDPR, ensuring that all data handling meets the required compliance standards.
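The pseudonymisation step described above can be sketched as follows. This is a minimal illustration, not the platform's ingestion engine: it assumes a keyed hash (HMAC-SHA256) as the cryptographic hash, uses an in-memory dictionary as a stand-in for the on-premises mapping key, and all names (`pseudonymise`, `OnPremMappingStore`, `prepare_for_cloud`) and field names are hypothetical.

```python
import hashlib
import hmac
from typing import Any, Dict, Set


def pseudonymise(patient_id: str, secret_key: bytes) -> str:
    """Return a keyed SHA-256 digest (HMAC) of a patient identifier."""
    return hmac.new(secret_key, patient_id.encode("utf-8"), hashlib.sha256).hexdigest()


class OnPremMappingStore:
    """In-memory stand-in for the mapping key kept inside the hospital."""

    def __init__(self) -> None:
        self._pseudonym_to_id: Dict[str, str] = {}

    def remember(self, pseudonym: str, patient_id: str) -> None:
        self._pseudonym_to_id[pseudonym] = patient_id

    def relink(self, pseudonym: str) -> str:
        """Re-identify a patient; only possible where the mapping is stored."""
        return self._pseudonym_to_id[pseudonym]


def prepare_for_cloud(record: Dict[str, Any], id_field: str,
                      direct_identifiers: Set[str], secret_key: bytes,
                      store: OnPremMappingStore) -> Dict[str, Any]:
    """Drop direct identifiers and attach a pseudonym before upload."""
    patient_id = str(record[id_field])
    pseudonym = pseudonymise(patient_id, secret_key)
    store.remember(pseudonym, patient_id)  # the mapping never leaves the premises
    cloud_record = {k: v for k, v in record.items()
                    if k != id_field and k not in direct_identifiers}
    cloud_record["pseudonym"] = pseudonym
    return cloud_record


# Hypothetical record and key; a real key would come from a managed secret store.
store = OnPremMappingStore()
key = b"example-key-kept-on-premises"
record = {"mrn": "MRN-001", "name": "Jane Doe", "diagnosis": "aortic stenosis"}
cloud_record = prepare_for_cloud(record, "mrn", {"name"}, key, store)
```

Because the same identifier and key always yield the same pseudonym, records from later visits can be linked longitudinally in the cloud, while re-identification requires the mapping held on-premises.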

Although the implementation presented in this paper relies on Microsoft Azure services, the underlying methods and workflows (e.g., data ingestion, storage, processing, and analysis pipelines) are cloud-agnostic. Equivalent services exist in other cloud providers (e.g., AWS S3+EMR, Google Cloud Storage + Dataproc) or can be deployed on-premises using containerized solutions (e.g., Kubernetes + Spark). Therefore, the approach described is portable and not limited to Microsoft Azure.

Data availability

Data can be made available upon request. We can grant access to the S-RACE platform so that the data can be accessed within the secure environment of our cloud.

Code availability

For each research project submitted as an original publication, the source code developed will be shared with the editors and reviewers during the review process through a private repository on the institutional GitHub. Upon request, we can grant collaborators access to the Azure ML Studio environments of specific research projects.

References

1. Rajpurkar, P., Chen, E., Banerjee, O. & Topol, E. J. AI in health and medicine. Nat Med. 28, 31–38 (2022).
2. Purpura, C. A., Garry, E. M., Honig, N., Case, A. & Rassen, J. A. The Role of Real-World Evidence in FDA-Approved New Drug and Biologics License Applications. Clin Pharmacol Ther 111, 135–144 (2022).
3. Sun, Y., Sheng, D., Zhou, Z. & Wu, Y. AI hallucination: towards a comprehensive classification of distorted information in artificial intelligence-generated content. Humanit Soc Sci Commun. 11, 1278 (2024).
4. Embí, P. J., Rhew, D. C., Peterson, E. D. & Pencina, M. J. Launching the Trustworthy and Responsible AI Network (TRAIN): A Consortium to Facilitate Safe and Effective AI Adoption. JAMA 333, 1481 (2025).
5. Van Genderen, M. E., Kant, I. M. J., Tacchetti, C. & Jovinge, S. Moving Toward Implementation of Responsible Artificial Intelligence in Health Care: The European TRAIN Initiative. JAMA 333, 1483 (2025).
6. Ethics and Governance of Artificial Intelligence for Health: Large Multi-Modal Models. WHO Guidance. (World Health Organization, Geneva, 2024).
7. Golpayegani, D., Pandit, H. J. & Lewis, D. Comparison and Analysis of 3 Key AI Documents: EU’s Proposed AI Act, Assessment List for Trustworthy AI (ALTAI), and ISO/IEC 42001 AI Management System. in Artificial Intelligence and Cognitive Science (eds. Longo, L. & O’Reilly, R.) vol. 1662 189–200 (Springer Nature Switzerland, Cham, 2023).
8. Ethics and Governance of Artificial Intelligence for Health: WHO Guidance. (World Health Organization, Geneva, 2021).
9. Loyola-Gonzalez, O. Black-Box vs. White-Box: Understanding Their Advantages and Weaknesses From a Practical Point of View. IEEE Access 7, 154096–154113 (2019).
10. You, J. G., Hernandez-Boussard, T., Pfeffer, M. A., Landman, A. & Mishuris, R. G. Clinical trials informed framework for real world clinical implementation and deployment of artificial intelligence applications. NPJ Digit Med. 8, 107 (2025).
11. Matschinske, J. et al. The AIMe registry for artificial intelligence in biomedical research. Nat Methods 18, 1128–1131 (2021).
12. Roth, H. R. et al. NVIDIA FLARE: Federated Learning from Simulation to Real-World. https://doi.org/10.48550/ARXIV.2210.13291 (2022).
13. Usher-Smith, J. A. et al. Risk models for recurrence and survival after kidney cancer: a systematic review. BJU International 130, 562–579 (2022).
14. Webb, J. G. et al. TAVI in 2022: Remaining issues and future direction. Archives of Cardiovascular Diseases 115, 235–242 (2022).
15. Desai, R. J. et al. Process guide for inferential studies using healthcare data from routine clinical practice to evaluate causal effects of drugs (PRINCIPLED): considerations from the FDA Sentinel Innovation Center. BMJ e076460 (2024). https://doi.org/10.1136/bmj-2023-076460
16. Haendel, M. A. et al. The National COVID Cohort Collaborative (N3C): Rationale, design, infrastructure, and deployment. Journal of the American Medical Informatics Association 28, 427–443 (2021).
17. Firnkorn, D., Merker, S., Ganzinger, M., Muley, T. & Knaup, P. Unlocking Data for Statistical Analyses and Data Mining: Generic Case Extraction of Clinical Items from i2b2 and tranSMART. Stud Health Technol Inform 228, 567–571 (2016).
18. Jee, J. et al. Automated real-world data integration improves cancer outcome prediction. Nature 636, 728–736 (2024).
19. Heumos, L. et al. An open-source framework for end-to-end analysis of electronic health record data. Nat Med. 30, 3369–3380 (2024).
20. Gini, R. et al. Metadata for Data dIscoverability aNd Study rEplicability in obseRVAtional Studies (MINERVA): Lessons Learnt From the MINERVA Project in Europe. Pharmacoepidemiol Drug Saf 33, e5884 (2024).
21. The Transformative Role of Microsoft Azure AI in Healthcare. IJETER 12, 108–113 (2024).

Acknowledgements

This work was supported by the Italian Ministry of Research, Complementary Actions to the NRRP “D34health - Digital Driven Diagnostics, prognostics and therapeutics for sustainable Health care” Grant (# PNC0000001).

Author information

These authors jointly supervised this work: Antonio Esposito, Carlo Tacchetti.

Authors and Affiliations

S-RACE (San-Raffaele Ai CEnter), University Vita-Salute San Raffaele, Milan, Italy
Alberto Traverso, Donato Tiano, Andrea Corvaglia, Alessio Dimonte, Edoardo Luigi Draetta, Bruno Fabiani, Patrick Scuri, Simone Barbieri, Marcello Agazzi, Muhammad Arslan, Daniele Celada, Lorenzo Cibrario, Giulio Cielo, Alberto Colombo, Stefano Contini, Marta Liberotti, Francesco Pisu, Davide Serra, Diego Varani, Andrea Luigi Vitali, Alan Zambello, Marco Denti, Antonio Esposito & Carlo Tacchetti
School of Medicine, University Vita-Salute San Raffaele, Milan, Italy
Filippo Chiabrando, Marco Montagna, Francesca Rita Ogliari, Anna Palmisano, Davide Vignale, Antonio Esposito & Carlo Tacchetti
Medical Oncology Unit, IRCCS San Raffaele Hospital, Milan, Italy
Francesca Rita Ogliari
Experimental Imaging Center, IRCCS San Raffaele Hospital, Milan, Italy
Anna Palmisano, Davide Vignale, Antonio Esposito & Carlo Tacchetti
Advanced Imaging for Personalised Medicine Unit, IRCCS San Raffaele Hospital, Milan, Italy
Anna Palmisano, Davide Vignale & Antonio Esposito
Microsoft Italia, Milan, Italy
Chiara Chiapponi

Contributions

AT, CT, AE: design, writing, supervision, review. DT, AC, AD, ELD, BF, PS, SB, MA, MUA, DC, FC, LC, GC, AC, SC, ML, MM, FO, AP, FP, DS, DV, DVI, ALV, AZ, CC, MD: writing, review. All the authors have read and approved the manuscript.

Corresponding authors

Correspondence to Alberto Traverso or Antonio Esposito.

Ethics declarations

Competing interests

The authors declare no competing interests.

Additional information

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary information

Supplementary Material

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License, which permits any non-commercial use, sharing, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if you modified the licensed material. You do not have permission under this licence to share adapted material derived from this article or parts of it. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by-nc-nd/4.0/.

About this article

Cite this article

Traverso, A., Tiano, D., Corvaglia, A. et al. Powering responsible artificial intelligence with high-quality real-world data: the S-RACE platform for scalable, multi-specialty clinical research. npj Digit. Med. 9, 6 (2026). https://doi.org/10.1038/s41746-025-02132-w

Received: 01 August 2025
Accepted: 30 October 2025
Published: 03 January 2026
Version of record: 05 January 2026
DOI: https://doi.org/10.1038/s41746-025-02132-w