Introduction

The convergence of Real-World Evidence (RWE) and Artificial Intelligence (AI) is reshaping modern medicine, offering the potential to develop advanced Clinical Decision Support Systems (CDSS) that are essential for personalised patient care. AI technologies are uniquely capable of analysing vast and complex datasets to uncover clinical insights that would otherwise remain hidden, promising to improve patient outcomes by optimising treatments and accelerating drug discovery1.

However, the transformative potential of AI in healthcare is fundamentally constrained by the quality of the Real-World Data (RWD) it relies on. RWD is typically collected from disparate hospital IT systems and is often unstructured, sparse, and lacking in standardisation. This inherent variability creates a significant barrier to transforming raw data into the ‘regulatory grade’ evidence needed for robust clinical applications2. Here, ‘regulatory grade’ refers to Real-World Data with sufficiently high levels of accuracy, completeness, and traceability to be considered reliable for supporting regulatory submissions regarding a medical product’s effectiveness or safety.

Without first addressing these foundational data quality issues, AI models are prone to generating unreliable or erroneous outputs, including ‘hallucinations’ that do not reflect the underlying data, which renders them unusable in clinical settings and hinders their adoption3.

While the traditional approach of tedious, manual data curation can produce high-quality datasets, it is not a scalable solution for processing the vast amount of data required for modern AI development. However, such manually curated datasets remain an invaluable ‘gold standard’ for benchmarking the performance and accuracy of automated data processing algorithms.

To unlock the full potential of RWD, the healthcare sector requires sophisticated, end-to-end clinical data science platforms. These platforms are specifically designed to systematically ingest, process, and harmonise complex RWD, streamlining the creation of high-quality, analysis-ready datasets that can support the development of reliable AI-driven systems.

Furthermore, the process of ingesting and managing RWD must be aligned with a complex and evolving regulatory landscape, as highlighted in our previous publications4,5. The responsible integration of AI into healthcare demands robust governance frameworks to ensure safety, ethics, and trustworthiness. Key international standards and regulations, such as ISO/IEC 42001:2023 and the EU AI Act, mandate stringent requirements for data quality, risk management, and human oversight for high-risk AI systems6,7,8. These frameworks underscore the global commitment to ensure the ethical and safe development and deployment of AI in healthcare. Therefore, any platform aiming to generate research-grade RWD must be built upon a foundation of strong governance and regulatory compliance from the outset.

In response to this critical need, we developed the S-RACE (San Raffaele Ai CEnter) platform together with Microsoft (Redmond, Washington, USA) and Porini (Milan, Italy). The S-RACE platform is a novel, cloud-based solution engineered to directly address the challenges of data quality and governance in healthcare AI. Its core function is a comprehensive data science pipeline that begins with secure, on-premises pseudonymisation, followed by NLP-driven data extraction and standardisation into the FHIR format, creating a structured and high-quality data foundation for research.

This paper presents a comprehensive overview of the S-RACE platform, detailing its architecture, functionalities, and its systematic approach to transforming raw clinical data into research-grade RWD. We demonstrate how S-RACE serves as a collaborative environment where clinicians and data scientists can jointly develop and validate responsible AI-driven decision support systems, built upon the high-quality data foundation the platform provides. Through practical examples from our ongoing clinical research, we illustrate the platform’s capabilities and show how a dedicated focus on data quality and responsible AI governance accelerates the clinical translation of AI, ultimately enhancing patient care. Finally, we report on the types of RWD available to researchers on the platform.

Results

The S-RACE platform is underpinned by a robust governance model designed to ensure the generation of high-quality, research-grade data, in full alignment with Responsible AI principles and key regulatory frameworks such as the EU AI Act and ISO/IEC 42001:2023. Central to this model is a meticulous data quality assessment process. The platform employs a hybrid data quality model that combines an expert-driven evaluation with an automated pre-processing workflow.

Before model development begins, project proposals presented by the clinical PIs are rigorously assessed using a Data Quality Checklist, which contains 39 questions across five categories: Summary, Collection, Pre-Processing, Metadata, and Data. The PI of the study and an S-RACE contact person jointly complete the questionnaire. For each question, they provide a textual response and a score from 0 (worst) to 3 (best). This four-level Likert scale (‘Useless’ to ‘Valuable’) evaluates five quality dimensions: Accessibility, Accuracy, Completeness, Consistency, and Relevancy. The team compiles the weighted and normalised evaluations into a Summary Report, which then undergoes a peer review by the S-RACE project management and IT team to catch omissions or inconsistencies. If needed, a follow-up meeting is held with the clinical team to finalise the questionnaire before the final report is generated.
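As an illustration of how the scored answers could be compiled into per-dimension summaries, the following is a minimal sketch. The weighting scheme is not specified here, so the per-question `weight` field, the field names, and the function `summarise_checklist` are all assumptions made for the example and not the platform's actual implementation.

```python
from collections import defaultdict

MAX_SCORE = 3  # each answer is scored from 0 (worst) to 3 (best)


def summarise_checklist(answers):
    """Return a weighted score per quality dimension, normalised to [0, 1].

    `answers` is a list of dicts with keys 'dimension', 'score' and 'weight'.
    The 'weight' key is a hypothetical per-question weight (assumption).
    """
    weighted_sum = defaultdict(float)
    weight_total = defaultdict(float)
    for answer in answers:
        score = answer["score"]
        if not 0 <= score <= MAX_SCORE:
            raise ValueError(f"score out of range: {score}")
        weighted_sum[answer["dimension"]] += answer["weight"] * score
        weight_total[answer["dimension"]] += answer["weight"]
    # Dividing by the maximum attainable weighted score gives a value in [0, 1].
    return {
        dim: weighted_sum[dim] / (MAX_SCORE * weight_total[dim])
        for dim in weighted_sum
    }


# Illustrative answers only; these are not taken from a real project.
example_answers = [
    {"dimension": "Completeness", "score": 3, "weight": 2.0},
    {"dimension": "Completeness", "score": 0, "weight": 1.0},
    {"dimension": "Accuracy", "score": 2, "weight": 1.0},
]
summary = summarise_checklist(example_answers)
```

Normalising by the maximum attainable weighted score keeps dimensions with different numbers of questions comparable in the Summary Report.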

After the project is approved by the steering committee and the data has been transferred to the platform, the manual review is complemented by the Preliminary Exploratory Data Analysis (PExDA) framework—an automated pipeline. PExDA performs baseline quality checks, such as identifying and flagging patients with missing outcome data.
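A baseline check of this kind can be sketched as follows. This is not the PExDA code: the record layout, the outcome field names (`event`, `follow_up_months`), and the function `flag_missing_outcomes` are hypothetical and chosen only to illustrate the idea of flagging patients with missing outcome data.

```python
import math


def is_missing(value):
    """Treat None, empty strings and NaN as missing (0 is a valid value)."""
    if value is None or value == "":
        return True
    return isinstance(value, float) and math.isnan(value)


def flag_missing_outcomes(records, outcome_fields=("event", "follow_up_months")):
    """Split records into usable ones and a report of missing outcome fields.

    Returns (usable_records, flagged) where `flagged` maps a patient
    pseudonym to the list of outcome fields that are missing.
    """
    usable, flagged = [], {}
    for record in records:
        missing = [f for f in outcome_fields if is_missing(record.get(f))]
        if missing:
            flagged[record["pseudonym"]] = missing
        else:
            usable.append(record)
    return usable, flagged


# Illustrative records only.
cohort = [
    {"pseudonym": "P1", "event": 1, "follow_up_months": 24},
    {"pseudonym": "P2", "event": None, "follow_up_months": 12},
    {"pseudonym": "P3", "event": 0},
]
usable, flagged = flag_missing_outcomes(cohort)
```

Flagged patients are reported rather than silently dropped, so that the clinical team can decide whether the missing outcome can be recovered from the source systems.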

This entire process ensures data quality and consistency, focusing on a key principle: assessing whether the data is “fit for purpose” for a specific research question. This approach is intentionally dynamic: a dataset unsuitable for one basic model may be perfectly acceptable for a more advanced algorithm capable of handling its specific limitations, allowing the platform to continuously adapt to evolving AI techniques.

The S-RACE platform is built on three architectural pillars that create an end-to-end pipeline for transforming raw clinical data into a foundation for trustworthy AI. More details on the building blocks highlighted below are provided in Supplementary Fig. 1. The platform’s architecture is fundamentally shaped by a ‘privacy by design’ approach, in full compliance with the EU’s General Data Protection Regulation (GDPR). The data pipeline begins with an on-premises engine that performs pseudonymisation before any data is transferred to the cloud. This process targets direct identifiers (e.g., name, medical record number, social security number) and replaces them with a unique, irreversible cryptographic hash. The mapping key linking the pseudonym to the original identifier is stored exclusively within the hospital’s secure on-premises infrastructure and is never exposed to the cloud environment. This segregation is a critical security control that minimises the risk of re-identification. All platform activities are governed by principles of data minimisation and purpose limitation, ensuring researchers can only access the specific data necessary for their approved study protocols. This de-identification process is tailored to each data modality. For medical images in DICOM format, the on-premises engine first scrubs personally identifiable information from all metadata tags. Similarly, for unstructured data like clinical notes, Natural Language Processing (NLP) models are applied to redact personal identifiers before the records are pseudonymised and transferred to the cloud for further processing.
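The separation between the cloud-bound record and the on-premises mapping key can be illustrated with a short sketch. The specific hash construction is not stated above; a keyed HMAC-SHA256 is assumed here, and the field names, the secret key, and the function names are hypothetical.

```python
import hashlib
import hmac

# Direct identifiers named in the text; the exact field names are assumptions.
DIRECT_IDENTIFIERS = ("name", "mrn", "ssn")


def pseudonymise(identifier: str, secret_key: bytes) -> str:
    """Derive a stable pseudonym with a keyed hash (HMAC-SHA256, assumed)."""
    return hmac.new(secret_key, identifier.encode("utf-8"), hashlib.sha256).hexdigest()


def pseudonymise_record(record: dict, secret_key: bytes, id_field: str = "mrn"):
    """Return (cloud_record, mapping_entry).

    cloud_record has all direct identifiers removed and a pseudonym added;
    mapping_entry links pseudonym to original identifier and is meant to be
    stored on-premises only.
    """
    pseudonym = pseudonymise(record[id_field], secret_key)
    cloud_record = {k: v for k, v in record.items() if k not in DIRECT_IDENTIFIERS}
    cloud_record["pseudonym"] = pseudonym
    mapping_entry = (pseudonym, record[id_field])
    return cloud_record, mapping_entry


# Illustrative values only; a real key would come from a secure key store.
key = b"hypothetical-on-premises-secret"
raw = {"name": "Jane Doe", "mrn": "000123", "ssn": "XXX", "creatinine_mg_dl": 1.1}
cloud_record, mapping_entry = pseudonymise_record(raw, key)
```

Because the same identifier always maps to the same pseudonym under a given key, records from different source systems can still be linked for one patient after transfer, while re-identification requires material that stays on-premises.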

The process begins with the Universal Data Platform, which uses a hybrid-cloud approach to ensure security and privacy. Raw data is first processed by an on-premises engine for pseudonymisation before being securely transferred to the cloud. There, AI-powered services, including Natural Language Processing (NLP) and medical ontologies, parse unstructured text from clinical reports. This information is then transformed and structured according to the FHIR (Fast Healthcare Interoperability Resources) standard, creating a high-quality, harmonised, and analysis-ready dataset.
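To make the target of this transformation concrete, the sketch below builds a minimal FHIR Observation for a single laboratory value extracted from a report. It is an illustrative example and not the platform's mapping code: the helper `to_fhir_observation` and the input values are hypothetical, and only a small subset of Observation elements is populated.

```python
import json


def to_fhir_observation(pseudonym, loinc_code, display, value, unit, effective_date):
    """Build a minimal FHIR Observation resource as a plain dictionary."""
    return {
        "resourceType": "Observation",
        "status": "final",
        "code": {
            "coding": [
                {"system": "http://loinc.org", "code": loinc_code, "display": display}
            ]
        },
        # The subject points at the pseudonymised patient, never a real identifier.
        "subject": {"reference": f"Patient/{pseudonym}"},
        "effectiveDateTime": effective_date,
        "valueQuantity": {
            "value": value,
            "unit": unit,
            "system": "http://unitsofmeasure.org",
            "code": unit,
        },
    }


# Illustrative input: a serum creatinine value for a hypothetical pseudonym.
observation = to_fhir_observation(
    pseudonym="a1b2c3",
    loinc_code="2160-0",
    display="Creatinine [Mass/volume] in Serum or Plasma",
    value=1.1,
    unit="mg/dL",
    effective_date="2025-01-15",
)
observation_json = json.dumps(observation, indent=2)
```

Coding the measurement with a standard terminology and unit system is what makes values from different source systems directly comparable once they are stored as FHIR resources.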

The Clinician AI Hub provides an interactive environment for clinicians and researchers to explore the curated data. Using data visualisation tools, they can conduct preliminary analyses to assess the quality and suitability of the dataset for a given research question. As a crucial step before complex modelling, we verify that the data are fit for purpose using our Preliminary Exploratory Data Analysis (PExDA) framework, reinforcing the platform’s commitment to data quality (Supplementary Table S2).

The Data Science Lab offers a comprehensive environment within Microsoft Azure ML Studio for building and validating machine learning models on the high-quality data. To ensure the development of responsible AI, the lab integrates tools that support rigorous traceability and reproducibility (MLflow) and model transparency. Explainable AI (XAI) techniques, such as SHAP (SHapley Additive exPlanations)9, are employed to make model predictions interpretable. The platform also incorporates the Microsoft Responsible AI Toolbox (https://github.com/microsoft/responsible-ai-toolbox) to assess fairness, evaluate model performance across different patient cohorts, and mitigate potential biases, ensuring that the resulting AI systems are not only accurate but also trustworthy and equitable.
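The idea behind Shapley-based explanations can be shown with a brute-force computation on a toy model. This sketch does not use the SHAP library and does not reproduce its estimators: it computes exact Shapley values against a single baseline vector, which is feasible only for a handful of features. The toy risk model and all values are hypothetical.

```python
from itertools import combinations
from math import factorial


def shapley_values(predict, x, baseline):
    """Exact Shapley values of `predict` at `x` relative to `baseline`.

    Features outside a coalition are set to their baseline value. The cost
    grows exponentially with the number of features.
    """
    n = len(x)

    def value(coalition):
        mixed = [x[j] if j in coalition else baseline[j] for j in range(n)]
        return predict(mixed)

    phi = []
    for i in range(n):
        others = [j for j in range(n) if j != i]
        total = 0.0
        for size in range(n):
            # Shapley weight for coalitions of this size.
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                members = set(subset)
                total += weight * (value(members | {i}) - value(members))
        phi.append(total)
    return phi


def toy_risk(v):
    """Hypothetical linear risk score used only for illustration."""
    return 0.5 * v[0] + 2.0 * v[1] - 1.0 * v[2]


patient = [4.0, 1.0, 3.0]
reference = [2.0, 0.0, 3.0]
contributions = shapley_values(toy_risk, patient, reference)
```

The contributions sum to the difference between the prediction for the patient and the prediction for the reference, which is the property that makes such values readable as a per-feature breakdown of an individual prediction.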

S-RACE is designed to foster multi-institutional collaboration through a flexible but governed environment. For a given research project, multiple researchers can be granted secure access to develop and test models on high-quality datasets. This collaborative development is enhanced by Microsoft Azure pipelines, which allow for the reuse of code and expertise, promoting efficiency and standardisation across projects. However, development is decoupled from deployment. Models with robust validation, approved by a governance committee, are promoted to a central “Model Registry.” This registry facilitates their translation into the clinical world for trials that measure model benefit and monitor safety and effectiveness in a pragmatic real-world clinical setting10. Each registered model is further enriched with comprehensive metadata (Supplementary Table S3), based on the AIMe registry for AI in biomedical research11. AIMe provides a standardised framework for documenting AI models, much as clinical trials are registered, to ensure transparency and reproducibility. By adopting this structure, our Model Registry captures essential information such as intended use, development data, performance metrics, and validation strategies, which is crucial for traceability and responsible governance throughout the model’s lifecycle. This ensures that only robust and trustworthy AI is considered for translation. To facilitate prospective validation or use in clinical trials by collaborators, these approved models can also be deployed as user-friendly web applications, a practice detailed later in our kidney cancer research project.
Furthermore, the platform supports two distinct collaboration models: for centralised studies, external researchers can work directly within the secure S-RACE environment, while for scenarios where data cannot be shared, S-RACE supports privacy-preserving Federated Learning based on the NVFlare framework, enabling decentralised model training across institutions12.
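The aggregation step at the heart of federated training can be sketched with a sample-size-weighted average of model parameters (federated averaging). This is a conceptual illustration only: it does not use or mirror the NVFlare API, the aggregation rule used in practice is not stated above, and the function name and example values are hypothetical.

```python
def federated_average(site_updates):
    """Combine per-site model weights into one global weight vector.

    `site_updates` is a list of (weights, n_samples) tuples, one per
    institution. Only these weights are exchanged; patient-level data
    never leave the contributing site.
    """
    if not site_updates:
        raise ValueError("at least one site update is required")
    total_samples = sum(n for _, n in site_updates)
    dim = len(site_updates[0][0])
    global_weights = [0.0] * dim
    for weights, n_samples in site_updates:
        if len(weights) != dim:
            raise ValueError("all sites must share the same model shape")
        for k, w in enumerate(weights):
            # Larger cohorts contribute proportionally more to the average.
            global_weights[k] += w * n_samples / total_samples
    return global_weights


# Illustrative round with two hypothetical institutions.
updates = [([1.0, 0.0], 100), ([3.0, 2.0], 300)]
global_model = federated_average(updates)
```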

As of September 2025, 19 clinical research projects are ongoing (13 with data loaded on the platform and 6 to be loaded after IRB approval). These projects allowed us to integrate within the AI platform our 5 major IT data sources: EHRs, pathology, lab tests, PACS, eCRF (e.g., REDCap), and disease-specific internal databases, for a total of 31,276 patients (Fig. 1A). The projects span the following domains: oncology (8), cardiovascular disease (6), neuroendocrine disorders (3), and neurosciences (2). Examples of the types of data imported for some research projects are shown in Fig. 1B, C. A synthesis of the investigated clinical research questions, the number of included patients, and the type of imported data for each project is shown in Table 1.

Fig. 1: Overview of the total number of unique patients loaded in the S-RACE platform (top) and examples of data types loaded in the S-RACE platform showcased for few clinical research projects (bottom).

A Overview of the total number of unique patients automatically imported in the S-RACE platform for each clinical research project (research projects with similar inclusion criteria can share the same patients while tackling different research questions). Each project contributed to improving the automated ingestion of related RWD from our hospital’s data warehouse. There are 31,276 unique patients loaded in the platform, corresponding to 13 research projects. Follow-up data, when available, are automatically updated by a nightly batch routine. B Examples of PowerBI dashboards of four oncology research projects (metastatic lung cancer [mNSCLC], kidney cancer [ccRCC], metastatic colorectal cancer [mCRC], liver and bile duct cancer), multiple sclerosis, and type 1 diabetes mellitus [T1DM], together with the detailed type of data available for a specific research project. The number of patients indicated represents the final size of the original cohort used to develop the ML models, obtained after updating the inclusion criteria based on the refined clinical research questions and after the automated pre-processing pipeline for quality checks of the original data (e.g., removal of patients with missing outcome data). C Examples of PowerBI dashboards for the research projects spanning the medical imaging domain. These projects were fundamental in supporting the automated integration of the hospital’s PACS in the S-RACE platform. AKI Acute Kidney Injury, TAVI Transcatheter Aortic Valve Implantation, CAD Cardiovascular Disease, ccRCC Clear Cell Renal Cell Carcinoma, mCRC Metastatic Colorectal Cancer, mNSCLC Metastatic Non-Small Cell Lung Cancer, T1DM Type-1 Diabetes Mellitus, T2DM Type-2 Diabetes Mellitus.

Table 1 Overview of the on-going clinical research projects and the type of data used

To validate the platform’s core capability of generating high-quality data, we developed a pre-operative AI model to predict cancer-specific mortality in patients with non-metastatic clear cell renal cell carcinoma (ccRCC)13. In the initial ‘Business Understanding’ phase for this project, we conducted a systematic literature review to define the clinical problem and identify key prognostic variables. The cited review was instrumental for this purpose, providing a comprehensive list of established factors that we subsequently used to validate the successful feature extraction by our RWD processing pipeline. This project served as a direct test of S-RACE’s data processing pipeline. We utilised two distinct datasets from over 2000 patients: a manually curated clinical dataset (eCRF), representing the traditional ‘gold standard’ for research but with a limited number of variables, and a dataset of raw, unstructured RWD (more than 200 variables) automatically ingested and processed by the S-RACE platform (Fig. 2, top panel). The central experiment was to compare the performance of AI models developed on these two data sources. Following automated data processing, models were developed in the Data Science Lab using a hybrid strategy that balanced predictive power with clinical interpretability: a Random Survival Forest model was used for feature selection, and a Cox proportional hazards model was compared to end-to-end ML algorithms such as survival trees. The key finding was that models trained on the automatically processed RWD performed comparably to those trained on the manually curated dataset. Furthermore, by applying Explainable AI (XAI) techniques, we not only confirmed the importance of known clinical predictors but also identified novel prognostic variables present only in the raw RWD (Fig. 2, middle panel).
This result provides strong evidence that the S-RACE platform can successfully transform complex, raw clinical data into reliable, research-grade evidence, thereby overcoming a primary bottleneck in the development of scalable and trustworthy AI for healthcare. The model was then implemented as a web-based app to simplify its use within the clinic and by external collaborators (Fig. 2, bottom panel).
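A comparison of this kind requires a common discrimination metric for survival models. The metric used is not named in this passage; the sketch below assumes Harrell's concordance index, a common choice for right-censored data, and applies it to two sets of hypothetical risk scores. The data and variable names are illustrative and are not study results.

```python
def concordance_index(times, events, risk_scores):
    """Harrell's concordance index for right-censored survival data.

    A pair (i, j) is comparable when patient i had the event and did so
    strictly before patient j's last observed time. The pair is concordant
    when patient i was assigned the higher risk; ties count as one half.
    """
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        if events[i] != 1:
            continue
        for j in range(n):
            if times[i] < times[j]:
                comparable += 1
                if risk_scores[i] > risk_scores[j]:
                    concordant += 1.0
                elif risk_scores[i] == risk_scores[j]:
                    concordant += 0.5
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


# Hypothetical follow-up times (months), event flags and model outputs.
times = [5, 10, 12, 20]
events = [1, 0, 1, 0]
risk_model_a = [0.9, 0.4, 0.6, 0.1]
risk_model_b = [0.8, 0.5, 0.3, 0.4]
c_index_a = concordance_index(times, events, risk_model_a)
c_index_b = concordance_index(times, events, risk_model_b)
```

Evaluating both models on the same patients with the same metric is what allows the two data sources, rather than the evaluation procedure, to be the only factor that differs.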

Fig. 2: Schematic summary of the ccRCC research project.

Top panel: overview of the ML design process. Different ML models were employed, using “black box” models (e.g., Random Survival Forest) as preliminary feature selectors. Selected features are then combined using “white box” models such as Cox proportional hazards. Developed models were then externally validated (TRIPOD type III) and benchmarked against state-of-the-art clinical models (e.g., GRANT) endorsed by clinical societies. Middle panel: all the models, during training and validation, are always challenged with explainability protocols. Here we show the use of feature importance and SHAP values to rank the eight prognostic pre-operative features. Bottom panel: the final model was released as a web-based app to further promote prospective or additional external validation. Data can be inserted manually using the insertion form (panel A). For each individual patient, the following results are reported: risk assignment and individual KM curve compared to the KM of the cohort used to train the model (panel A), SHAP individual explanations (panel B), and distribution of the single patient’s values with respect to the training cohort (panel C).

The second example demonstrates how the S-RACE platform’s responsible AI capabilities can enhance model development, particularly when working with smaller, more specialised cohorts. Following the CRISP-DM methodology, the ‘Business Understanding’ phase of the TAVI project involved a systematic review of the literature. The cited review was critical for defining the clinical challenge of predicting treatment futility and identifying existing risk stratification models. This step informed our project objectives and provided the necessary benchmarks for the subsequent ‘Evaluation’ phase. The project aimed to identify patients with severe aortic stenosis who were unlikely to benefit from Transcatheter Aortic Valve Implantation (TAVI)14, using a high-quality dataset of approximately 500 patients. Given the limited cohort size, ensuring model robustness was paramount. The S-RACE Data Science Lab enabled the implementation of a sophisticated stratified nested cross-validation strategy, which is critical for generating reliable and unbiased performance estimates from smaller datasets (Fig. 3, top panel). More importantly, this project leveraged the platform’s integrated tools for responsible AI to move beyond standard accuracy metrics. A decision tree-based error analysis was conducted to automatically identify specific patient subgroups where the model was most likely to make erroneous classifications. By pinpointing these areas of underperformance, this methodology allows for targeted model refinement to improve fairness and clinical utility (Fig. 3, bottom panel). To prevent any risk of data leakage, our error analysis workflow is strictly partitioned. During the iterative development cycle, the analysis is performed exclusively on the validation set to guide model debugging and refinement, while the held-out test set remains untouched. Once all development is complete, the analysis is then applied a single time to the test set in a purely descriptive capacity. 
In this final stage, it does not guide any model changes but instead serves to complement aggregate metrics, enhancing transparency by documenting the final model’s performance across key subgroups. This analysis also serves as a guide for targeted data acquisition; by understanding the characteristics of patient profiles where the model is weakest, we can leverage the S-RACE platform’s data ingestion capabilities to automatically retrieve additional, relevant data from the hospital’s IT systems. This creates a powerful feedback loop for continuous model improvement, demonstrating a significant step towards developing more precise and equitable AI models for clinical decision support.
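The partitioning logic of stratified nested cross-validation can be sketched in a few lines. This is a simplified illustration and not the project code: the fold counts, the round-robin stratification, and the function names are assumptions, and model fitting and hyperparameter search are deliberately omitted.

```python
import random
from collections import defaultdict


def stratified_folds(indices, labels, k, seed=0):
    """Split `indices` into k folds that preserve the class proportions."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx in indices:
        by_class[labels[idx]].append(idx)
    folds = [[] for _ in range(k)]
    for members in by_class.values():
        rng.shuffle(members)
        for position, idx in enumerate(members):
            folds[position % k].append(idx)
    return folds


def nested_cv_splits(labels, outer_k=5, inner_k=3, seed=0):
    """Yield (development_idx, test_idx, inner_splits) for each outer fold.

    inner_splits is a list of (train_idx, validation_idx) pairs drawn only
    from the development indices, so the outer test fold is never seen
    during hyperparameter tuning or error analysis.
    """
    all_idx = list(range(len(labels)))
    outer = stratified_folds(all_idx, labels, outer_k, seed)
    for fold_number, test_idx in enumerate(outer):
        held_out = set(test_idx)
        development_idx = [i for i in all_idx if i not in held_out]
        inner_splits = []
        inner = stratified_folds(development_idx, labels, inner_k, seed + fold_number + 1)
        for validation_idx in inner:
            in_validation = set(validation_idx)
            train_idx = [i for i in development_idx if i not in in_validation]
            inner_splits.append((train_idx, validation_idx))
        yield development_idx, test_idx, inner_splits


# Hypothetical imbalanced cohort: 40 negative and 10 positive outcomes.
example_labels = [0] * 40 + [1] * 10
splits = list(nested_cv_splits(example_labels))
```

In this arrangement the inner validation folds play the role of the validation set used for iterative error analysis, while each outer test fold is touched only once for the final, descriptive assessment.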

Fig. 3: Schematic summary of the severe aortic stenosis research project.

Top left panel: overview of the ML analysis pipeline. The ML models are developed on the internal cohort using nested cross-validation. Nested cross-validation was chosen because it provides a less biased and more reliable estimate of a model’s true performance by separating hyperparameter tuning from final evaluation, preventing data leakage and promoting robust generalisation compared to standard cross-validation. Following an approach similar to that of the kidney cancer research project, the best performing model is then re-trained on the whole internal cohort and externally validated. Top right panel: application of the responsible AI error decision tree. Based on the features included in the final model, a decision tree is trained to optimise the split among subpopulations with the highest / lowest number of erroneous classifications.

A key strength of the S-RACE platform, demonstrated in this research project, is its ability to agnostically ingest all available data for a patient, including large-format data like medical images, and use them to create a “deep patient phenotype”. This enables the extraction of “opportunistic biomarkers” from imaging studies, such as the pre-TAVI planning total body CT scans, that were performed for routine clinical care. The platform automates the analysis of these images, moving beyond standard cardiac measures like annular dimensions, ejection fraction and chamber size. Using deep learning-based segmentation and radiomics, it quantifies features such as the volume and signal distribution of abdominal fat, muscle, and bone, as well as organs like the liver and kidney. This holistic characterisation allows for the identification of subclinical comorbidities and vulnerabilities not captured in standard clinical reports, providing a richer dataset to enhance prognostic models and improve their accuracy. As shown in Table 1, the deep learning image analysis solutions developed within one specific project will be applied to the images of the other cohorts.

Discussion

The S-RACE platform has been developed to address the fundamental challenge in clinical AI: the need for a scalable and governed process to transform raw, heterogeneous hospital data into high-quality, research-grade RWD. The transformation of raw RWD into trustworthy evidence requires more than just technical data cleaning; it demands a rigorous and principled approach to the entire research lifecycle, as outlined in frameworks like the PRINCIPLED checklist for RWD re-use15. The S-RACE platform was designed to be a comprehensive ecosystem that provides researchers with the functionalities to operationalise such a principled approach. For instance, each study on S-RACE starts with a clear definition of a research question and a systematic literature review within the CRISP-DM framework to support a robust study design. The S-RACE platform provides clinicians with interactive tools in the Clinician AI Hub to define endpoints and cohorts. To address the critical challenge of confounding, S-RACE focuses on creating a holistic view of a patient by integrating multi-modal data, thereby providing a richer set of covariates for adjustment in statistical models. For bias remediation, our ‘Responsible AI Development’ pillar integrates tools like the Microsoft Responsible AI Toolbox to systematically assess fairness and identify subgroup underperformance. Finally, transparency and reproducibility are enforced through the mandatory use of MLflow for experiment tracking and a comprehensive AIMe-based Model Registry for transparent documentation from development through to deployment.

While other notable platforms and frameworks such as N3C16, i2b2 tranSMART17, MSK-CHORD18, and Ehrapy19 share the goal of advancing RWE, S-RACE is distinguished by several key architectural and philosophical choices that prioritise data quality, security, and regulatory readiness, as summarised in Table 2.

Table 2 Comparison of various platforms and frameworks designed for the ingestion and use of RWD

A primary differentiator is our strategic emphasis on data quality as the foundational output. The platform is engineered first and foremost as an engine for data curation. This is exemplified by our hybrid-cloud architecture, which features a mandatory on-premises pseudonymisation step. This ‘privacy by design’ approach ensures that sensitive patient data never leave the hospital’s secure environment before being pseudonymised. This is not merely a technical choice but a core governance principle that builds institutional, clinical, and patient trust, and it contrasts with models that may transfer raw identifiable data to the cloud, increasing the attack surface and complicating regulatory compliance.

The S-RACE platform’s design aligns with key frameworks for trustworthy research. It operationalises the FAIR principles by ensuring data are Findable and Accessible via a governed hub, Interoperable through the mandatory FHIR standard, and Reusable thanks to comprehensive documentation in the AIMe-based Model Registry. The platform’s emphasis on detailed metadata in the Data Quality Checklist and Model Registry also adheres to the MINERVA framework20. Finally, S-RACE supports the PRINCIPLED15 checklist by providing an ecosystem for robust study design, bias remediation using integrated tools, and better handling of confounding through multi-modal data integration.

Furthermore, S-RACE is deeply integrated within the Microsoft Azure ecosystem. This deliberate choice provides a cohesive, enterprise-grade environment that leverages a suite of interoperable tools for every stage of the pipeline. Using a single, secure cloud environment for data processing (Cognitive Health Services), model development (Azure ML Studio), and collaboration simplifies security and identity management, streamlines workflows, and facilitates easier auditing compared to assembling a solution from multiple, disparate vendors. This tight integration ensures both robust performance and a clear chain of custody for data and models.

A crucial aspect of the S-RACE vision is its role as a catalyst for collaborative research. The platform is not merely a technical tool, but a managed ecosystem designed to bring together clinicians and data scientists from multiple institutions. Governance is embedded into the collaborative workflow: each project operates within a segregated workspace with role-based access controls, ensuring that researchers only see the data relevant to their approved study. This structure supports two powerful modes of collaboration. First, it allows for centralised analysis, where external partners can securely access and work with curated, high-quality datasets. Second, it is equipped for privacy-preserving federated learning, enabling the development of more generalisable models by training algorithms across decentralised datasets without ever moving sensitive patient data. This dual capability makes S-RACE a flexible and powerful hub for multi-centre studies, accelerating scientific discovery by creating larger, more diverse virtual cohorts while upholding the highest standards of data protection and project governance.

The versatility of the S-RACE platform is another key strength. Unlike more specialised platforms focused on a single disease area, it is currently populated with disease-specific data for a total of 31,276 patients, powering 19 distinct clinical research projects across diverse domains including oncology, cardiology, and diabetes. This demonstrates the platform’s technical scalability and, more importantly, the successful implementation of a standardised, repeatable data curation pipeline. This proves its value as a central institutional asset that can break down data silos, foster cross-disciplinary research, and maximise the return on investment in data infrastructure.

The clinical examples presented in this paper serve to illustrate these strengths in practice. The kidney cancer research project provides direct validation for our primary mission: by showing that models built on automatically ingested RWD can perform as well as those built on manually curated data, we demonstrate the platform’s success as a scalable data curation engine. The aortic stenosis research project highlights the next layer of the platform’s capabilities, showing how this high-quality data foundation enables more advanced and responsible AI development. It showcases how S-RACE facilitates the creation of deep patient phenotypes through the extraction of opportunistic biomarkers from imaging data, and how its integrated tools can be used to analyse model fairness and guide a continuous feedback loop of improvement.

Finally, the entire S-RACE framework was built with proactive alignment to the evolving regulatory landscape. Its core features directly address the requirements of standards like ISO 42001:2023 and the EU AI Act. For instance, the centralised model registry provides the versioning and detailed documentation essential for traceability, while the integrated responsible AI tools for error analysis and fairness assessment directly support the risk management and bias mitigation mandates of these regulations. By prioritising the generation of high-quality, reliable data within a responsible and collaborative framework, S-RACE provides a robust, future-proofed solution to accelerate the development and translation of trustworthy AI in medicine.

Beyond its technical capabilities, S-RACE is fundamentally a collaborative ecosystem. It is engineered to unite clinicians and data scientists from multiple institutions, supporting both centralised analysis within its secure environment and privacy-preserving federated learning for studies where data cannot be shared. This collaboration is achieved through two distinct, purpose-built environments that operate on the same governed data foundation. The Clinician AI Hub, a no-code, interactive environment, allows clinicians to use data visualisation tools to explore cohorts and assess data quality without requiring programming knowledge. The Data Science Lab is a parallel environment that provides data scientists with a comprehensive suite of tools in Microsoft Azure ML Studio for advanced model development and validation. This dual-environment structure bridges the expertise gap: clinicians can define clinically meaningful problems using accessible tools, while data scientists can apply rigorous computational methods to the same curated data. This dual capability establishes S-RACE as a hub for multi-centre research, accelerating the creation of more generalisable and robust AI models. The importance of this work lies in its demonstration of a robust, governed, and scalable environment for RWD curation, which provides a trustworthy foundation to accelerate the development and clinical adoption of responsible AI.

The deep integration with Microsoft Azure raises important considerations regarding data privacy and vendor interoperability, which we address through specific governance and technical choices. First, we state unequivocally that Microsoft, as the cloud provider, has no technical or legal access to any patient data hosted within the S-RACE platform; all data remain under the exclusive control of our institution. Second, to mitigate vendor lock-in and ensure interoperability, the platform relies on open standards. All curated data are structured in the Fast Healthcare Interoperability Resources (FHIR) format, ensuring that they can be exported and used in other systems. Furthermore, our support for the open-source NVFlare framework enables privacy-preserving federated learning, allowing direct collaboration with institutions regardless of their underlying infrastructure and thus promoting a vendor-neutral research ecosystem.
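To illustrate why federated learning allows collaboration without sharing patient records, the following minimal sketch shows sample-weighted federated averaging, in which each site sends only model weights and a sample count to the aggregator. It is an illustration under stated assumptions and does not use the NVFlare API; the function name `federated_average`, the `SiteUpdate` type, and the example weights and cohort sizes are hypothetical.

```python
from typing import List, Sequence, Tuple

# Hypothetical update format: (local model weights, local sample count).
SiteUpdate = Tuple[Sequence[float], int]


def federated_average(site_updates: List[SiteUpdate]) -> List[float]:
    """Sample-weighted average of per-site weights.

    Only weights and counts cross institutional boundaries;
    patient-level records never leave the contributing site.
    """
    if not site_updates:
        raise ValueError("at least one site update is required")
    n_params = len(site_updates[0][0])
    if any(len(weights) != n_params for weights, _ in site_updates):
        raise ValueError("all sites must report the same number of parameters")
    total_samples = sum(n for _, n in site_updates)
    return [
        sum(weights[i] * n for weights, n in site_updates) / total_samples
        for i in range(n_params)
    ]


# Illustrative round with two institutions of different cohort size.
global_weights = federated_average([([1.0, 2.0], 100), ([3.0, 4.0], 300)])
```

In this sketch the larger site contributes proportionally more to the aggregated model, which is the usual design choice when cohort sizes differ across centres.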

A key limitation of the current S-RACE platform is its primary focus on predictive and prognostic modelling rather than formal causal inference. The integrated Microsoft Responsible AI libraries provide tools for related tasks, such as generating individualised counterfactual explanations with DiCE or estimating population-level treatment effects with EconML. However, the platform does not yet automate the rigorous design required for robust causal claims, such as systematic confounder selection or formal “prediction under intervention” analyses. Establishing a full causal inference framework remains a valuable direction for future work.

A related challenge for any deployed model, and one that requires further investigation, is the potential for performance degradation due to distributional changes over time. The S-RACE platform is designed to mitigate this risk through its governance structure. Every model promoted to the central Model Registry is registered with metadata establishing a baseline reference for its training data distribution and performance. Our post-deployment protocol includes continuous monitoring of outcome distributions and model calibration against this baseline. Any significant deviation is flagged to a governance committee, which can trigger model recalibration or retraining to ensure continued safety and efficacy in a real-world clinical setting.

S-RACE uses pseudonymisation to enable secure, longitudinal linkage of clinical research data, a capability precluded by full anonymisation. The platform ensures GDPR compliance via a ‘privacy by design’ architecture, storing the mapping key securely on-premises, separate from the cloud-processed data. Research on full anonymisation techniques (e.g., k-anonymity and synthetic data) is ongoing but not yet released.
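The post-deployment comparison against a registered baseline described above can be sketched as follows. This is a minimal illustration, not the platform's monitoring code: the population stability index (PSI) is used here as one common drift statistic, calibration is reduced to a calibration-in-the-large gap, and the function names and thresholds (0.2 for PSI, 0.05 for the gap) are assumptions chosen for the example.

```python
import math
from statistics import mean


def population_stability_index(baseline, current, n_bins=10, eps=1e-6):
    """PSI between a baseline sample and a current sample of a numeric variable.

    Bins are equal-width over the baseline range; out-of-range values
    are clipped into the first or last bin.
    """
    lo, hi = min(baseline), max(baseline)
    width = (hi - lo) / n_bins or 1.0  # guard against a constant baseline

    def bin_proportions(values):
        counts = [0] * n_bins
        for v in values:
            idx = int((v - lo) / width)
            counts[min(max(idx, 0), n_bins - 1)] += 1
        # eps avoids log(0) for empty bins
        return [max(c / len(values), eps) for c in counts]

    p = bin_proportions(baseline)
    q = bin_proportions(current)
    return sum((qi - pi) * math.log(qi / pi) for pi, qi in zip(p, q))


def calibration_gap(predicted_risks, observed_outcomes):
    """Calibration-in-the-large: mean predicted risk minus observed event rate."""
    return mean(predicted_risks) - mean(observed_outcomes)


def needs_governance_review(psi, gap, psi_threshold=0.2, gap_threshold=0.05):
    """Flag a model for committee review; both thresholds are assumed values."""
    return psi > psi_threshold or abs(gap) > gap_threshold
```

A flagged result would correspond to the escalation step in the protocol, where the governance committee decides between recalibration and retraining.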
Another limitation is that curating RWD by excluding records with poor quality or missing information introduces a risk of selection bias, as the excluded patients may differ systematically from the final study cohort. To mitigate this, we use the platform’s Exploratory Data Analysis (EDA) tools to characterise and compare the excluded and included populations, making any potential bias transparent. Furthermore, we minimise exclusions, typically limiting them to records missing a primary outcome, by using imputation techniques. We also address the risk that researchers might unintentionally build models that merely confirm pre-existing hypotheses. This is mitigated through two primary safeguards: a structured research process guided by the CRISP-DM framework, and the mandatory application of Explainable AI (XAI) techniques, such as SHAP. XAI enhances transparency by revealing whether a model relies on spurious correlations or on the hypothesised clinical factors, which allows researchers’ underlying assumptions to be tested rather than simply confirmed.
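One simple way to compare included and excluded populations, as described above, is the standardised mean difference (SMD) for each numeric covariate. The sketch below is an illustration under stated assumptions rather than the platform's EDA tooling: the function names, the covariate names in the usage note, and the 0.1 flagging threshold (a widely used convention) are not taken from the platform.

```python
import math
from statistics import mean, variance


def standardised_mean_difference(included, excluded):
    """SMD of one numeric covariate between included and excluded patients."""
    diff = mean(included) - mean(excluded)
    pooled_sd = math.sqrt((variance(included) + variance(excluded)) / 2)
    if pooled_sd == 0:
        # Both groups are constant: identical means give 0, otherwise unbounded.
        return 0.0 if diff == 0 else math.copysign(math.inf, diff)
    return diff / pooled_sd


def flag_imbalanced(included_rows, excluded_rows, covariates, threshold=0.1):
    """Return {covariate: SMD} for covariates whose |SMD| exceeds the threshold."""
    flagged = {}
    for name in covariates:
        smd = standardised_mean_difference(
            [row[name] for row in included_rows],
            [row[name] for row in excluded_rows],
        )
        if abs(smd) > threshold:
            flagged[name] = smd
    return flagged
```

For example, calling `flag_imbalanced(included_rows, excluded_rows, ["age", "bmi"])` on two lists of patient dictionaries would return only the covariates on which the excluded patients differ materially from the study cohort, which can then be reported alongside the exclusion criteria.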

Methods

The platform’s robust AI capabilities are powered by a suite of Microsoft technologies, forming a comprehensive ecosystem for RWD processing and AI development21:

  • Microsoft Cognitive Health Services: Utilised for advanced Natural Language Processing (NLP) and the application of medical ontologies, these services extract structured, clinically relevant information from unstructured clinical text, such as physician notes and reports. The NLP pipeline is based on Microsoft’s commercial product Text Analytics for Health (TA4H, https://tinyurl.com/mxnysfav), which performs anonymisation of the clinical notes and then extracts medical concepts and the relations among them using standard ontologies such as the Unified Medical Language System (UMLS). The common data model is FHIR (Fast Healthcare Interoperability Resources). Additional data models such as OMOP can be obtained using conversion tools (e.g., FHIR to OMOP, https://build.fhir.org/ig/HL7/fhir-omop-ig/).

  • Microsoft Power BI (Business Intelligence): Integrated within the Clinician AI Hub, Power BI enables intuitive data visualisation and preliminary analysis, allowing clinicians to explore insights from RWE in an accessible format.

  • Microsoft Azure ML Studio: This comprehensive and scalable environment supports the entire ML model development and deployment lifecycle within the Data Science Lab, providing data scientists with the tools needed for robust model creation. Azure Machine Learning Studio can be used to model a wide range of clinical questions, from regression to classification and survival analysis, using both “white box” approaches (e.g., logistic regression, Cox models) and “black box” approaches (e.g., random forests). In the latter case, explainability tools are used to support clinical understanding of the results.

  • Standardised Explainability Techniques: A core component of the Data Science Lab, these techniques are employed to enhance the transparency and interpretability of AI models, addressing the ‘black box’ challenge and building trust among clinical users.
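
As an illustration of how a model-agnostic explainability technique can open up a “black box” model, the sketch below implements permutation importance: each feature is shuffled in turn and the resulting drop in accuracy is recorded. This is a simpler stand-in chosen for brevity, not the SHAP-based tooling used in the Data Science Lab; the function name, the toy `predict` callable in the usage note, and the default parameters are hypothetical.

```python
import random


def permutation_importance(predict, rows, labels, n_features, n_repeats=20, seed=0):
    """Mean drop in accuracy when each feature column is shuffled.

    predict: callable mapping one feature row (a list) to a predicted label.
    A larger drop means the model relies more heavily on that feature.
    """
    rng = random.Random(seed)

    def accuracy(data):
        return sum(predict(r) == y for r, y in zip(data, labels)) / len(labels)

    baseline = accuracy(rows)
    importances = []
    for j in range(n_features):
        drops = []
        for _ in range(n_repeats):
            column = [r[j] for r in rows]
            rng.shuffle(column)
            # Rebuild each row with only feature j replaced by a shuffled value.
            shuffled = [r[:j] + [v] + r[j + 1:] for r, v in zip(rows, column)]
            drops.append(baseline - accuracy(shuffled))
        importances.append(sum(drops) / n_repeats)
    return importances
```

With a toy model such as `predict = lambda r: int(r[0] > 0.5)`, the second feature receives an importance of exactly zero because the model never reads it, which is the kind of signal a clinician can use to check whether a model depends on the expected clinical factors.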

The S-RACE platform integrates data from five primary institutional IT systems to build a comprehensive, multi-modal data foundation for research. The core data types include: (i) Electronic Health Records (EHRs), providing clinical and demographic information; (ii) Pathology and Laboratory systems, providing histopathology and lab test results; (iii) the Picture Archiving and Communication System (PACS), for medical imaging such as CT, PET, and MRI; (iv) Genomics data for multi-omics studies; and (v) research-specific sources such as electronic Case Report Forms (eCRFs) and internal databases. This integration enables the creation of a multi-modal view of the data for every patient in the research cohort. The specific data types utilised for each of the 19 ongoing clinical research projects are detailed in Table 1.

The platform’s architecture is fundamentally shaped by a ‘privacy by design’ approach, in full compliance with the EU’s General Data Protection Regulation (GDPR). The process employed is strictly pseudonymisation, as legally defined under GDPR. The on-premises data ingestion engine automatically removes all direct patient identifiers (e.g., name, national health service number, medical record number) and replaces them with a unique, irreversible cryptographic hash. The mapping key that links this pseudonym back to the original patient identifier is stored exclusively within the hospital’s secure on-premises infrastructure and is never exposed to the cloud environment. This technical segregation is the primary control that substantiates our ‘privacy by design’ claim, serving as an auditable safeguard that ensures sensitive patient data are protected before leaving the hospital’s trusted domain. It is also the specific technical safeguard that enables our legal basis for processing under GDPR: scientific research with longitudinal patient follow-up, which requires re-linkability. While our primary compliance framework is GDPR, these technical and organisational measures also align with the core principles of other international standards, such as the security and privacy rules within HIPAA. Furthermore, our use of Microsoft Azure services for data processing is governed by a formal Data Protection Addendum (DPA). This contractual agreement legally obligates Microsoft to adhere to its responsibilities as a data processor under GDPR, ensuring that all data handling meets the required compliance standards.
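
The separation between the cloud-bound pseudonym and the on-premises mapping key can be sketched as follows. This is a minimal illustration under stated assumptions, not the ingestion engine itself: HMAC-SHA256 is assumed as the keyed hash, and the class name `OnPremPseudonymiser`, the field names, and the in-memory mapping table are hypothetical.

```python
import hashlib
import hmac
import secrets


class OnPremPseudonymiser:
    """Hypothetical on-premises component that strips direct identifiers."""

    # Assumed field names for direct identifiers.
    DIRECT_IDENTIFIERS = ("name", "national_health_number", "medical_record_number")

    def __init__(self, secret_key: bytes):
        self._key = secret_key  # kept inside the hospital's trusted domain
        self._mapping = {}      # pseudonym -> original identifier, on-premises only

    def pseudonym_for(self, medical_record_number: str) -> str:
        """Keyed one-way hash: stable per patient, not invertible without the mapping."""
        digest = hmac.new(
            self._key, medical_record_number.encode("utf-8"), hashlib.sha256
        ).hexdigest()
        self._mapping[digest] = medical_record_number
        return digest

    def to_cloud_record(self, record: dict) -> dict:
        """Return a copy with direct identifiers removed and a pseudonym added."""
        pseudonym = self.pseudonym_for(record["medical_record_number"])
        cleaned = {k: v for k, v in record.items() if k not in self.DIRECT_IDENTIFIERS}
        cleaned["patient_pseudonym"] = pseudonym
        return cleaned

    def relink(self, pseudonym: str) -> str:
        """Re-identification is only possible on-premises, via the mapping table."""
        return self._mapping[pseudonym]


# Illustrative key generation; a real deployment would use managed key storage.
pseudonymiser = OnPremPseudonymiser(secrets.token_bytes(32))
```

Because the same identifier always yields the same pseudonym under a given key, records from different visits or source systems can be linked longitudinally in the cloud, while re-identification requires the mapping held on-premises.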

Although the implementation presented in this paper relies on Microsoft Azure services, the underlying methods and workflows (e.g., data ingestion, storage, processing, and analysis pipelines) are cloud-agnostic. Equivalent services are offered by other cloud providers (e.g., AWS S3 + EMR, Google Cloud Storage + Dataproc) or can be deployed on-premises using containerised solutions (e.g., Kubernetes + Spark). Therefore, the approach described is portable and not limited to Microsoft Azure.