Introduction

Clinical Artificial Intelligence (AI) technologies built on electronic health record (EHR) data have demonstrated considerable promise in supporting tasks such as disease diagnosis and phenotyping, outcome prediction, and clinical decision support1. Numerous clinician-facing models that leverage large-scale clinical datasets have shown efficacy in improving healthcare delivery, operational efficiency, and evidence-based clinical decision-making2. Most models, however, focus exclusively on EHR-derived patient data (hereafter referred to as patient data), assuming that AI-powered clinical tasks can be sufficiently supported by patient data alone. This assumption overlooks the pivotal role of clinician behaviors, which encompass the judgment, attitudes, and actions of clinicians throughout their decision-making processes in patient care. These behaviors not only reflect the collective insights of care teams into patients’ conditions and influence care outcomes, but also provide a critical lens for evaluating clinical AI technologies, thereby helping to ensure these technologies effectively improve patient care without introducing unintended consequences, such as workflow disruption3 and clinician deskilling4,5. Therefore, taking a holistic perspective that combines both patient data and clinician behaviors can offer a more comprehensive view of clinical care and open new opportunities for building and assessing clinical AI applications.

EHR use metadata refer to a collection of various types of event logs—including audit logs and other log types that document information such as clinical decision support alerts and secure messages—which capture user interactions with the EHR, as well as the creation and use of clinical data6. They provide a granular, longitudinal, and objective record of how individuals (clinicians, administrators, and patients) engage with patient records and utilize various EHR functionalities for patient care activities7,8. They systematically document actions performed within the EHR and system events triggered by these actions, specifying who, in what role, performed each action, what was done, on which patient’s record, when, and where. Recognizing that clinicians devote a substantial portion of their time to engaging with the EHR, researchers have increasingly leveraged EHR use metadata over the past decade to characterize clinician EHR usage patterns and assess health system efficiency9,10. While EHR use metadata may not serve as a precise proxy for clinician behaviors, key dimensions of these behaviors can be inferred, explicitly or implicitly, using well-designed analytical strategies and careful validation. Prior studies have introduced a broad range of metrics to evaluate various dimensions of clinician behaviors in the EHR, such as clinician daily workload11,12, cognitive burden13, and patterns of collaboration among care team members14,15. These metrics have been instrumental in revealing links between clinician behaviors and key patient outcomes9,10,16,17,18 (such as length of stay, mortality, and readmission risk) as well as clinician well-being19,20 (such as burnout, job dissatisfaction, and intentions to leave the profession).

Patient data and EHR use metadata represent two interdependent yet fundamentally distinct information sources. Patient data (e.g., diagnoses, procedures, measurement results, and medication orders) characterize a patient’s health journey and the outcome of care. In contrast, EHR use metadata capture the process by which those data are generated and used for care, encoding clinicians’ reasoning, prioritization, and workflow decisions, which might not directly determine clinical outcomes. Notably, these behavior traces can serve as proxies for unrecorded clinical observations and decisions. They also illuminate the context of how and why certain clinical observations are captured in the first place. For example, while a CT scan result in patient data may show no acute findings, EHR use metadata capturing how urgently the scan was ordered, how soon it was reviewed, and whether it led to additional workups (e.g., ordering additional laboratory tests or specialist referrals) can reveal escalating clinical concern. This action sequence reflects the clinician’s evolving level of suspicion regarding potential differential diagnoses, thus providing a more informative context than the scan result alone. Moreover, EHR use metadata can surface aspects of patient status that are absent from patient data. Consider a scenario in which a patient’s oxygen saturation level remains within normal limits, suggesting stability; however, EHR use metadata indicating frequent reviews of respiratory data, rapid chart navigation, and multiple clinicians documenting notes in quick succession may reflect heightened clinical vigilance and can precede abrupt clinical deterioration, offering early insights unavailable in patient data. Utilizing this complementary relationship is important because it conveys not only what happened to a patient but also the underlying clinical rationale, the care team’s interpretation of the patient’s condition, and the sequence of decision-making activities over time.

Recent attempts in clinical AI highlight the transformative potential of integrating information derived from EHR use metadata throughout the AI lifecycle, ranging from model development21,22,23,24,25,26,27 to post-deployment evaluation28,29. This integration enables the creation of more adaptive, context-aware, and robust AI systems that can better withstand data shifts and facilitate continuous performance monitoring and improvement in real-world clinical settings. Here, we propose a paradigm shift in clinical AI toward a more integrated approach—one that harnesses both patient data and EHR use metadata to embed the collective insights and behaviors of care teams into model development and evaluation (Fig. 1a). By combining these complementary data sources that contribute unique aspects of clinical context, we can achieve a more holistic understanding of clinical care to inform the development and evaluation of clinical AI. This, in turn, paves the way for more effective and trustworthy patient care that is centered on the needs of both patients and clinicians.

Fig. 1: Dual-Lens clinical AI framework combining patient data with clinician behaviors.

a An overview of the Dual-lens clinical AI lifecycle. b An illustration of patient data generation within the EHR, with clinicians’ role highlighted in blue arrows. c A comparison of the relationships captured by traditional clinical AI (upper) and Dual-Lens clinical AI (lower), with novel signals highlighted in purple.

This Perspective explores the current research landscape surrounding EHR use metadata, highlights the potential and opportunities of the new paradigm in clinical AI, and outlines key challenges and considerations for the future. We will use the terms EHR use metadata, metadata, and event logs interchangeably throughout the paper to refer to the data that characterize clinicians’ actions in the EHR.

Current use of EHR metadata in research

Beyond supporting privacy auditing mandated by HIPAA, the current use of EHR use metadata is predominantly shaped by its clinician-oriented (or EHR user-oriented more broadly) nature. Most existing research focuses on converting event logs (even as granular as mouse and keyboard clicks) into higher-level, clinically meaningful information to characterize clinician activities and understand care delivery that relies heavily on the EHR30,31,32,33,34,35. Specifically, researchers have explored (1) deriving quantitative metrics, (2) mapping event logs to task-level workflows, and (3) modeling team structures and dynamics. These efforts seek to reveal patterns across various contexts (such as specialties12,30,31, clinician demographics34,36,37, national health systems38, the COVID-19 pandemic36, and reimbursement policy changes39) to guide improvements in efficiency, coordination, and overall quality of care delivery.

Measuring EHR usage via diverse metrics

Both EHR vendors and researchers have increasingly designed and applied EHR use metadata-based metrics for different purposes9,40. Various general metrics have been introduced12,41,42,43,44,45,46,47, such as login frequency, session duration, frequency of actions, time spent on actions (e.g., total EHR time, note documentation time, activity outside normal working hours, time after patient check-out, message volume, and their normalized values). More sophisticated metrics often involve the navigational patterns across the EHR functionalities. For example, analyzing event sequences that reflect the context-specific intensity and variability of clinician EHR engagement enables the development of key metrics such as time spent managing the In-Basket48,49,50, workload of follow-up actions triggered by alerts51,52, and the frequencies of note template usage53,54. These metrics help to dissect the temporal distribution of various EHR activities and reveal patterns in a specific context, thereby providing meaningful insights into EHR interaction efficiency, system usability, and associated burnout issues.
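As an illustration, several of the general metrics above—active EHR time estimated from inter-event gaps, action frequencies, and after-hours activity—can be derived from a raw event log in a few lines of code. The session-gap threshold and working-hours window below are illustrative assumptions for this sketch, not vendor defaults:

```python
from collections import Counter
from datetime import datetime, timedelta

def ehr_use_metrics(events, session_gap_min=5, workday=(7, 18)):
    """Derive illustrative EHR-use metrics from a time-sorted event log of
    (timestamp, action_type) tuples. Active time sums only inter-event gaps
    shorter than `session_gap_min` minutes, a common audit-log heuristic;
    the thresholds here are assumptions, not standards."""
    gap = timedelta(minutes=session_gap_min)
    active = timedelta(0)
    # Sum short gaps between consecutive events as "active" EHR time.
    for (t_prev, _), (t_cur, _) in zip(events, events[1:]):
        if t_cur - t_prev <= gap:
            active += t_cur - t_prev
    return {
        "active_minutes": active.total_seconds() / 60,
        "action_counts": dict(Counter(a for _, a in events)),
        # Count events falling outside the assumed working-hours window.
        "after_hours_actions": sum(
            1 for t, _ in events if not (workday[0] <= t.hour < workday[1])),
    }
```

In practice, such metrics are typically normalized (e.g., per scheduled hour or per patient encounter) before comparison across clinicians or sites.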

Mapping event logs to clinical tasks, workflows, and cognitive load

As individual events or actions recorded in EHR use metadata are often highly granular and fragmented, they lack meaningful clinical context when considered in isolation. Therefore, it is essential to aggregate event sequences into clinical tasks performed within the EHR10,55,56,57,58,59,60,61. Once these tasks are delineated, they can be systematically mapped to broader workflows or pathways. This mapping serves not only to understand how clinicians navigate the system in parallel with providing real-world patient care but also to help characterize EHR-based cognitive burden. For example, Lou et al. develop metrics of attention switching from event logs as a proxy for cognitive burden17. The attention-switching metric shows face and discriminant validity, as it is associated with increased total time in the EHR and wrong-patient errors. Similarly, large language model-based approaches have been used to develop an action-as-language framework to characterize cognitive burden26. This framework has been used to identify cognitively burdensome tasks at scale, such as switching to and from the inbox. More broadly, identifying frequent task interruptions, deviations from standardized best practice, or prolonged searches for patient information and system functionalities can indicate increased cognitive load of clinicians, with implications for both clinician burnout and patient safety. This line of research, though facing challenges due to the complexity of raw event logs and the diversity of clinical workflows, enables the identification of friction points within the system, such as suboptimal interface design, excessive data fragmentation, and inefficient task flows. By addressing these issues, there is an opportunity to streamline workflows, alleviate cognitive burdens, and ultimately enhance overall clinical efficiency and care quality.
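The aggregation step described above can be sketched minimally as follows, assuming a toy event-to-task category map; real mappings must be validated against observed clinical workflows:

```python
from datetime import datetime, timedelta

# Toy event-to-task map; a production mapping requires clinical validation.
TASK_OF = {"open_note": "documentation", "type_text": "documentation",
           "sign_note": "documentation", "view_labs": "chart_review",
           "view_meds": "chart_review"}

def events_to_tasks(events, gap_minutes=2):
    """Aggregate atomic (timestamp, action) events into task episodes.
    A new episode starts when the task category changes or the inter-event
    gap exceeds `gap_minutes`; both rules are illustrative heuristics."""
    gap = timedelta(minutes=gap_minutes)
    tasks = []
    for time, action in sorted(events):
        cat = TASK_OF.get(action, "other")
        # Extend the current episode only if category and timing both match.
        if tasks and tasks[-1]["task"] == cat and time - tasks[-1]["end"] <= gap:
            tasks[-1]["end"] = time
            tasks[-1]["n_events"] += 1
        else:
            tasks.append({"task": cat, "start": time, "end": time, "n_events": 1})
    return tasks
```

Counting transitions between episodes in the resulting sequence is one simple way to operationalize attention switching.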

Characterizing team structures and dynamics

Growing evidence suggests that EHR use metadata offers valuable insights into how clinicians are structured as multidisciplinary teams and how they collaborate with each other in care delivery9,10,40. By analyzing patterns such as concurrent logins and co-access behaviors, the structure of care teams and the roles of team members can be represented as a network16,55,62,63,64,65,66, i.e., patient-sharing network (PSN). The topological features of a PSN, including measures such as centrality and betweenness, have been leveraged to characterize the exchange intensity of patient information and medical expertise across team members. Several studies have revealed that specific network configurations, such as densely interconnected care teams and those with extensive collaborative experience, tend to be associated with better patient outcomes, whereas other network configurations, such as fragmented or sparse networks, are often linked to less favorable results16,18,67,68. Importantly, this network-based approach also enables the tracking of how collaboration patterns evolve over time, including significant shifts observed in response to major disruptions such as the COVID-19 pandemic63,66. While PSNs may not capture every aspect of real-world collaboration, this approach has been accepted as a useful proxy for revealing the underlying patterns of information exchange and professional interaction, shedding light on how staffing structures should be improved for better effectiveness.
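A patient-sharing network of the kind described above can be sketched from co-access records alone. The example below (with hypothetical identifiers) builds weighted edges between clinicians who accessed the same patient’s chart and computes normalized degree centrality; real PSN studies typically use richer edge definitions and additional topological measures:

```python
from collections import defaultdict
from itertools import combinations

def build_patient_sharing_network(access_log):
    """Build a PSN from (clinician_id, patient_id) co-access pairs.
    Edge weight = number of shared patients between two clinicians."""
    teams = defaultdict(set)
    for clinician, patient in access_log:
        teams[patient].add(clinician)
    weights = defaultdict(int)
    for team in teams.values():
        # Every pair of clinicians sharing this patient gains one edge count.
        for a, b in combinations(sorted(team), 2):
            weights[(a, b)] += 1
    return dict(weights)

def degree_centrality(weights):
    """Normalized degree centrality: distinct collaborators / (n - 1)."""
    nodes = {n for edge in weights for n in edge}
    deg = {n: 0 for n in nodes}
    for a, b in weights:
        deg[a] += 1
        deg[b] += 1
    denom = max(len(nodes) - 1, 1)
    return {n: d / denom for n, d in deg.items()}
```

Betweenness and other measures follow the same pattern once the weighted edge list is available (e.g., via a graph library).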

An integrated clinical AI paradigm

EHR use metadata provide benefits that extend far beyond the basic assessment of EHR utilization. Notably, patient data are not merely passive reflections of disease development; rather, they are actively shaped by the sequential reasoning and decision-making of clinicians as they interpret evolving clinical evidence and respond to a patient’s health status (Fig. 1b). Each clinical data point (such as a laboratory test result, a disease diagnosis, and an initiation of an intervention) reflects not only a patient’s physiological state but also the cognitive judgments and decision pathways of the care team. This dynamic interdependency between patient physiology and clinician decision-making implies that most modeling objectives of clinical AI are inherently determined by both patient-specific factors and clinician-driven choices. Therefore, it is imperative for clinical AI models to embrace an integrated approach that combines both sources of information to ensure a more contextually grounded representation of clinical practice.

The conceptual framework depicted in Fig. 1a, which we named “Dual-Lens clinical AI” (DL-ClinAI), illustrates the new paradigm of clinical AI lifecycle advocated in this Perspective. DL-ClinAI comprises three key components: (1) aligning patient data and EHR use metadata along a common temporal axis, (2) selecting appropriate AI models to learn the complex relationships either between dual-lens features and the target objective (i.e., training task-specific clinical AI models) or within dual-lens features themselves (i.e., developing clinical AI foundation models), and (3) leveraging the dual-lens features to assess the impact of clinical AI tools throughout their development and deployment.

Data alignment

The objective of this component is to achieve coherent integration of patient data and EHR use metadata along a shared temporal framework. This goes beyond simple chronological matching and requires careful consideration in addressing the complexities involved in aligning distinct data streams. First, timeframe unification is necessary to map patient-specific clinical data points (e.g., a laboratory test result or an administered medication) and clinician actions captured by EHR use metadata (e.g., a nurse viewing the patient’s flowsheet or a specialist appending clinical notes) onto a single temporal framework. Since all these events are timestamped in the EHR, initial temporal alignment is relatively straightforward. However, this alignment can be further refined by anchoring events to critical clinical milestones (e.g., symptom onset, intervention initiation, hospital admission, or unit transfer), thereby establishing a richer clinical context valuable for downstream analysis.
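Timeframe unification can be sketched minimally as follows, assuming each event is a dictionary with a `time` field (field names are illustrative): the two streams are tagged, merged, sorted, and anchored to a clinical milestone such as admission:

```python
from datetime import datetime

def unify_timeline(patient_events, metadata_events, anchor):
    """Merge patient-data events and EHR-use-metadata events onto one
    timeline and anchor each to a clinical milestone (e.g., admission).
    Events are dicts carrying at least a `time` timestamp."""
    merged = ([dict(e, stream="patient") for e in patient_events] +
              [dict(e, stream="metadata") for e in metadata_events])
    merged.sort(key=lambda e: e["time"])
    for e in merged:
        # Express every event as hours relative to the clinical anchor.
        e["hours_from_anchor"] = (e["time"] - anchor).total_seconds() / 3600
    return merged
```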

Second, feature scope determination defines the range and resolution of the features selected for model development to ensure they are relevant and reflective of the clinical context. Deciding whether to incorporate the complete sequences of raw event data, filter for a specific subset, or apply an appropriate level of aggregation needs careful assessment. Not all raw clinical data points or clinician actions are equally meaningful for a specific downstream analysis. Domain experts may identify those that are essential to a specific modeling objective, such that key signals can be isolated from noise. For example, in a real-time patient deterioration surveillance system in acute or intensive care settings, Rossetti et al. use expert-determined features, including the frequencies of note writing, vital sign measurements and comments, medication administration, and a specific set of pertinent symptom terms from nursing notes, all of which are deemed to have strong predictive power for deterioration events22,27. On the other hand, by mapping specific clinician action sequences to known clinical tasks or care processes, it is possible to convert fine-grained event data into aggregated features with updated timestamps that encapsulate broader clinical workflows56. The analytical value of these features can be further enhanced by incorporating additional contextual metadata, such as information about care team composition and roles69, staffing changes70, shift handoffs71, or even patient-provider communications through the portal72,73. These contextual elements enrich the feature set by providing insights into the organizational and interpersonal dynamics that might influence patient outcomes and clinician decision-making. For instance, changes in staffing levels or shift transitions might correlate with certain care processes, while portal logs might reveal early indicators of patient concerns or adherence issues.

Third, harmonizing data granularity is critical for creating a unified feature space from data recorded at varying levels of detail. Harmonization reconciles differences across data sources to ensure that atomic events and aggregated data are consistently aligned. This can typically be achieved by time windowing, which aggregates data into fixed intervals (e.g., minutes, hours, or days) such that high-frequency events can be represented as summary statistics. For example, in early sepsis detection, high temporal resolution, e.g., every 10 minutes, is required to capture transient fluctuations in vital signs (e.g., mean, maximum, and minimum values) and to track dynamics in clinician actions in the EHR (e.g., adjustments to medication administration and vital sign monitoring patterns). This level of granularity ensures that subtle but critical changes are preserved for analysis. In contrast, for chronic disease progression applications such as managing chronic kidney disease, a coarser granularity, such as weekly or monthly, may be preferable to track metrics like glomerular filtration rate, creatinine levels, and the frequency of medication regimen adjustments, chart reviews, and note documentation by specialty. This level of harmonization helps smooth out short-term noise and emphasizes longer-term trends, ultimately supporting the recognition of overarching patterns that inform more reliable outcome forecasting.
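The time-windowing strategy can be sketched as follows, assuming numeric events (e.g., vitals) carry a `value` field while clinician actions do not; the window size is a tunable parameter chosen per application, as discussed above:

```python
from collections import defaultdict
from datetime import datetime
from statistics import mean

def windowed_features(events, anchor, window_hours=1):
    """Aggregate a mixed event stream into fixed time windows: numeric
    events (carrying `value`) become mean/min/max summaries, and clinician
    actions become counts. Field names are illustrative assumptions."""
    bins = defaultdict(lambda: {"values": [], "actions": 0})
    for e in events:
        # Window index relative to the anchor (integer division on seconds).
        idx = int((e["time"] - anchor).total_seconds() // (3600 * window_hours))
        if "value" in e:
            bins[idx]["values"].append(e["value"])
        else:
            bins[idx]["actions"] += 1
    return {idx: {"action_count": b["actions"],
                  "value_mean": mean(b["values"]) if b["values"] else None,
                  "value_min": min(b["values"]) if b["values"] else None,
                  "value_max": max(b["values"]) if b["values"] else None}
            for idx, b in bins.items()}
```

For sepsis-style use cases, `window_hours` would shrink to fractions of an hour; for chronic disease monitoring, it would stretch to a week or month.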

Fourth, validation of data alignment needs to be performed to ensure that the unified temporal framework accurately reflects the sequential clinical processes in real-world settings. In addition to confirming the correct sequencing of events—particularly those aggregated from atomic clinician EHR actions—using their timestamps, this process can involve cross-referencing key clinical milestones, such as admission or discharge times, diagnostic tests, and therapeutic interventions. Integrating expert clinical review alongside automated, rule-based sanity checks can help refine the alignment by embedding human insights into typical workflow patterns and the causal structures between patient physiological states and clinician actions.

Training task-specific clinical AI models

Upon the completion of data alignment, a wide range of AI-assisted clinical tasks can benefit from a dual-perspective approach that integrates conventional patient data with enriched insights from EHR use metadata. This approach has the potential to transform diverse applications (Table 1) in a way that enables the modeling of the complex relationships (Fig. 1c): (1) between patient data and EHR use metadata, (2) between EHR use metadata and the target outcome, (3) between the interaction in (1) and the target outcome, and (4) within EHR use metadata, while preserving the traditional associations captured between patient data and the target outcome. Several recent clinical AI studies, aligned with the design principles of DL-ClinAI, have demonstrated substantial benefits in both performance and reliability. In a critical patient-level prediction task (i.e., daily hospital discharge), Zhang et al. enhance a tree-based machine learning model by integrating counts of distinct action types in the EHR performed by care team members during the past 24 hours, alongside conventional features such as daily updated patient data and day of week23. This integration significantly improves the area under the receiver operating characteristic curve from 0.86 to 0.92. The key takeaway is that EHR use metadata encode granular semantics about clinical workflows and implicitly convey clinicians’ evolving assessments of patient discharge status. Interestingly, the most predictive feature for next-day non-discharge is a high frequency of medical device barcode scanning—a proxy for ongoing treatment activity, which intuitively indicates that the patient is not yet ready for discharge. In the context of clinical outcome forecasting, Bhaskhar et al. demonstrate that integrating clinician actions from EHR use metadata with structured EHR information significantly improves the prediction performance of major adverse kidney events within 120 days of ICU admission in patients with acute kidney injury, as well as 30-day readmission in acute stroke patients24. Notably, this approach proves substantially more robust to temporal data distribution shifts, a common challenge in healthcare data that often undermines the reliability of clinical AI applications. These findings suggest that clinician actions in the EHR can enrich the contextual understanding of care and serve as a stabilizing factor that anchors model predictions in real-time clinical judgment and workflow dynamics. Another notable example is a 1-year, cluster-randomized clinical trial of 60,000 hospital encounters across two institutions, where an early warning surveillance system for patient deterioration, powered by EHR use metadata, significantly reduced in-hospital mortality (−35.6%), length of stay (−11.2%), and sepsis risk (−7.5%) compared to the control arm21. Such solid evidence underscores the value of integrating EHR use metadata into clinical AI to deliver real-time, context-aware insights that guide clinician actions and support timely interventions.

Table 1 Examples of EHR-based clinical tasks DL-ClinAI can support

The selection of models for training must be aligned with the specific clinical task. Because real-world clinical AI applications are executed on a recurring or continuous basis as new patient data and clinician actions accumulate in the EHR, the selected models must be able to use the most recent information that could update the model’s beliefs about its target training objective. In practice, this means prioritizing models that support rapid updating or incremental learning, offer mechanisms for data drift detection, and maintain calibration as the underlying data distribution evolves74. Longitudinal models that can handle multivariate sequences, such as transformer-based encoders, temporal convolutional networks, and recurrent neural networks, are particularly well-suited, because they can ingest the timeline-aligned data and continuously refine their internal representations to conduct predictions or classifications75. When events far in the past add little value to a prediction or classification, and the exact ordering of recent events is not critical, simpler models, such as shallow feed-forward networks and tree-based methods, may be preferable76. These models often achieve strong performance without the computational overhead and infrastructure demands of more complex sequence-handling architectures.

Training clinical AI foundation models

Developing clinical foundation models from large-scale patient health records has recently gained considerable interest77,78,79,80,81. The core rationale is straightforward: instead of collecting a bespoke dataset and training a separate model for every individual clinical task, one can pretrain a single, high-capacity foundation model using a unified, longitudinal clinical dataset and then adapt it to a broad range of downstream tasks. Pretraining enables the model to learn the semantics and latent structures of patient health trajectories by solving self-supervised objectives, such as predicting the next clinical event (e.g., diagnosis codes or laboratory test values) or reconstructing masked events along the timeline82. These objectives require no manual labels. Once pretrained, the foundation model can be quickly specialized through lightweight adaptation to support diverse clinical tasks across multiple domains80. This mechanism not only dramatically reduces the burden of data curation and model development but also promotes knowledge transfer to low-resource settings, improving both scalability and generalizability of clinical AI.

The DL-ClinAI framework naturally extends to the development of dual-lens clinical foundation models, which can be pretrained on timeline-aligned patient data and EHR use metadata. This integrated pretraining enables the model to learn not only the progression of patient health states over time but also how clinicians act upon those evolving states within real-world clinical workflows. In this setting, self-supervised objectives can be applied to alternate between predicting clinician actions and patient-specific clinical events. Self-supervised contrastive learning can also be performed to align the patient-data stream with the corresponding clinician-action stream for the same patient and clinical context. By modeling this interdependency, the foundation model is expected to establish a rich and contextual representation of the interplay between patient health states and clinical decision-making. These patient representations are particularly valuable for downstream tasks that require sensitivity to workflow dynamics, practice variability, or team-based decision patterns, where patient data-based foundation models fall short. Importantly, the dual-lens foundation model offers a unique advantage: it can easily simulate clinician behavior sequences between clinical events, effectively filling contextual gaps where traditional models lack information. For example, it can infer whether a change in treatment was preceded by increased monitoring, a specialist consultation, or documentation activity, all invisible in patient data but critical for understanding the full care context. Additionally, this dual-lens foundation model can support developing a stronger clinical digital twin83—a virtual, individualized representation of a patient’s physiological state over time that allows dynamic simulation of potential treatment strategy, monitoring and prediction of health trajectory, and early intervention and prevention, based on modeling of multi-modal patient data. 
The digital twin, powered by the dual-lens foundation model, can be leveraged to simulate the downstream effects of alternative care strategies, clinician responses, and EHR workflow configurations, enabling direct analysis of how different clinician behaviors and operational patterns influence patient trajectories and outcomes. Specifically, such a digital twin can generate counterfactual scenarios to predict how a patient’s outcome might change under earlier monitoring, delayed documentation, or different triage strategies, such that best practices, inefficiencies, and even medical errors can be identified in a data-driven, low-risk environment.

Impact assessment

Beyond evaluating the accuracy of the core algorithm of a clinical AI tool, the real-world impact of such a tool hinges on numerous factors that researchers often underappreciate, including, but not limited to, interface design, the level of integration into existing workflows, alert timing and volume, clinician training and trust, ongoing performance monitoring and recalibration, data quality safeguards, interoperability with other systems, and governance structures for oversight and accountability84,85. Neglecting any of these factors can undermine even the most accurate model, leading to clinician frustration, workflow disruption, or patient safety risks that erode the tool’s intended benefits. DL-ClinAI underscores that the assessment of clinical AI tools must consider two synchronized streams of evidence: objective patient data and the granular behavioral traces clinicians leave in the EHR. Patients’ short-term and long-term outcomes reveal whether a clinical AI tool actually benefits care, while EHR use metadata show how the tool changes workload, decision pathways, and potential automation bias, along with other factors linked to changes in outcomes for both patients and clinicians. Only by examining these two strands of evidence in tandem can health systems determine whether innovations such as AI scribes, early warning systems, or draft-reply assistants truly help or simply shift workload, introduce new errors, or erode clinicians’ skills.

EHR use metadata have clear strengths for assessing the post-deployment impact of clinical AI across several complementary dimensions28,29,86,87. First, timestamped clicks, scrolls, keystrokes, and order events make it possible to trace what clinicians do immediately after an AI-generated alert or suggestion appears (e.g., how long until an order is placed, whether an imaging study is ordered, or which EHR function or section they navigate to). These action sequences reveal whether a tool is actually used by clinicians and whether it truly streamlines care pathways or unintentionally inserts detours and delays. Second, traditional aggregate metrics like total time in the chart, after-hours click counts, or the number of simultaneously open patient charts can be used to quantify how the tool shifts effort and attention. Increases or decreases in these metrics may serve as proxies for cognitive load and burnout risk following clinical AI deployment. Third, simple but powerful indicators, such as the proportion of AI-generated text that clinicians sign without modification, the keystrokes or edits required to revise drafted replies before sending, the frequency of same-day note revisions, or the note-to-order latency, may surface automation bias, latent errors, and potential patient safety threats that would otherwise remain hidden. Additionally, capturing and analyzing what is edited (or canceled) and why can directly help refine the algorithm that leads to the modifications. Fourth, longitudinal trends in manual order entry, free-text documentation length, template diversity, or resident-versus-attending contribution rates flag whether clinicians are maintaining core judgment and reasoning skills or drifting toward overreliance on AI assistance. Monitoring these patterns supports proactive retraining and safeguards clinical competence.
Fifth, by systematically quantifying and comparing how workload is redistributed across the entire care team in aspects such as time-in-system by role, branching complexity, length of hand-off chains, frequency of simultaneous chart access, and any new coordination bottlenecks after tool deployment, health systems can reveal how team collaboration is impacted.
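For instance, indicators such as the proportion of AI-generated drafts signed without modification can be computed directly from draft-reply logs. In the sketch below, the record fields are hypothetical assumptions, not an actual vendor schema:

```python
def automation_bias_indicators(drafts):
    """Compute illustrative post-deployment indicators from draft-reply
    logs. Each record is assumed to hold the AI draft, the text actually
    sent, and the keystrokes spent editing; field names are hypothetical."""
    if not drafts:
        return {"signed_unmodified_rate": 0.0, "mean_edit_keystrokes": 0.0}
    # Drafts sent verbatim may indicate automation bias worth auditing.
    unmodified = sum(1 for d in drafts if d["ai_text"] == d["sent_text"])
    return {
        "signed_unmodified_rate": unmodified / len(drafts),
        "mean_edit_keystrokes":
            sum(d["edit_keystrokes"] for d in drafts) / len(drafts),
    }
```

Trending these values over time, stratified by clinician role or specialty, is what turns them into actionable monitoring signals.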

Challenges and opportunities

Heterogeneity in vocabularies and resolution

As EHR vendors differ as fundamentally as operating systems, innovations that rely on EHR use metadata must deal with cross-vendor architectural differences. Indeed, EHR use metadata differ widely in both action vocabulary and resolution9,88,89. Each EHR system offers its own interface, feature set, and recommended workflows, resulting in a distinct catalog of atomic action types. A task that appears as a single, high-level event in one system might be recorded as a sequence of fine-grained clicks in another, while certain activities captured in one system may have no direct counterpart elsewhere. Further heterogeneity arises when medical institutions running the same vendor’s EHR system customize their local deployment and create site-specific action types. These variabilities complicate the creation of universal, meaningful metrics of EHR use, hamper cross-institutional benchmarking, and undermine the portability of clinical AI development. To fundamentally resolve these challenges, we advocate for (1) a vendor- and organization-agnostic action vocabulary, which creates a common semantic building-block layer, (2) shared logging specifications that mandate a minimum set of timestamped interaction elements, such that the essential information of each event is always captured and portable, and (3) robust mapping frameworks that link raw events to higher-level clinical workflow concepts. By combining a universal lexicon, an enforced logging standard, and robust mapping technology, the community can produce a durable foundation such that clinical AI models can be trained once and deployed broadly with appropriate adaptation. Standardization at these layers is also important for comparing clinician-EHR interaction patterns, validating AI tools across sites, and ultimately realizing scalable, trustworthy clinical AI deployments. 
Alternatively, this heterogeneity can be mitigated technologically by using or fine-tuning a domain-specific language embedding model to encode textual descriptions of action types into a shared latent space, such that semantically similar actions from different vendors or local customizations cluster together. This enables clinical AI models to reason over semantically equivalent actions without relying on vendor- or institution-specific vocabularies. Nevertheless, to account for potential dataset shift, any model trained at one institution should undergo formal external validation and, where needed, recalibration or fine-tuning before clinical use elsewhere.
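The intuition can be illustrated with a deliberately simple stand-in: a bag-of-words "embedding" and cosine similarity in place of a domain-specific language model. The action-type descriptions are invented; a real system would use a fine-tuned neural encoder, but the clustering principle is the same.

```python
import math
from collections import Counter

def embed(text):
    """Toy bag-of-words embedding; a stand-in for a domain-specific
    language model that would encode action-type descriptions."""
    return Counter(text.lower().split())

def cosine(a, b):
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

# Invented action-type descriptions from two hypothetical vendors
desc_a = "open patient progress note for review"
desc_b = "view progress note in patient chart"
desc_c = "sign and submit medication order"

# Semantically similar descriptions (desc_a, desc_b) score higher
# than unrelated ones (desc_a, desc_c), so they cluster together.
```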

Noise in data

Raw EHR use metadata are inherently noisy because they record every interaction, whether it is part of a clinical workflow, a training exercise, or an illicit access that breaches patient privacy90. Duplicated clicks and keystrokes, automatic pop-up notifications, system-initiated refreshes, asynchronous logging that produces partially recorded action sequences, and interleaved event streams from multitasking or shared workstations all add to the complexity of turning these traces into informative features and models. Machine-generated entries and workflow artifacts produced to satisfy process requirements or institutional policies complicate matters further. If left unfiltered, these artifacts can lead models to overfit to interface peculiarities rather than clinically relevant behaviors and can mask safety-critical actions. Mitigating this risk demands rigorous data preprocessing when building reliable clinical AI systems, such as removing duplicated events, filtering out sequences from irrelevant roles, expert-guided feature selection, aggregating low-level actions into higher-level tasks, and re-ordering out-of-order events or tasks, as well as post-hoc model explainability techniques that identify influential features or subsequences. Notably, identifying and separating machine-generated entries is feasible based on their characteristic patterns of occurrence.
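A minimal preprocessing sketch shows three of these steps—filtering machine-generated roles, collapsing consecutive duplicate actions, and aggregating low-level actions into task labels. The roles, action codes, and task vocabulary are all invented for illustration; real pipelines would use validated, site-specific rules.

```python
# Hypothetical raw events: (timestamp, role, action); all codes invented.
raw = [
    (1, "physician", "open_chart"),
    (2, "physician", "open_chart"),           # duplicated click
    (3, "interface_engine", "auto_refresh"),  # machine-generated entry
    (4, "physician", "view_labs"),
    (5, "physician", "view_note"),
]

MACHINE_ROLES = {"interface_engine"}
# Toy aggregation of low-level actions into higher-level task labels
TASK_OF = {"open_chart": "chart_review", "view_labs": "chart_review",
           "view_note": "chart_review", "sign_order": "order_entry"}

def preprocess(events):
    """Drop machine-generated roles, collapse consecutive duplicates,
    and aggregate low-level actions into task labels."""
    cleaned, prev = [], None
    for ts, role, action in sorted(events):  # re-order by timestamp
        if role in MACHINE_ROLES:
            continue
        if (role, action) == prev:
            continue  # consecutive duplicate of the same action
        prev = (role, action)
        cleaned.append((ts, role, TASK_OF.get(action, "other")))
    return cleaned
```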

Off-screen clinician behaviors

Despite the advantages of EHR use metadata, they record only the interactions that occur within the EHR, leaving out substantial dimensions of clinician behavior that unfold off-screen7,91. Many critical activities, such as team huddles, bedside hand-offs, informal hallway consultations, and direct patient-clinician communications, are invisible in event logs, yet they often drive decision pathways and care coordination. Equally elusive are the cognitive processes underpinning care (e.g., mental model updates and multidisciplinary deliberations), which may not be reliably inferred from click-stream data alone. To bridge this gap, the development and assessment of clinical AI tools must be supplemented with richer sources of clinician behavioral data, such as audio or video transcripts of interactions in the physical world, sensor outputs, and ambient AI data, so that these tools can draw on the full spectrum of clinician activity and decision making.

Potential bias

Integrating EHR use metadata into clinical AI development and evaluation at scale must safeguard against the propagation or amplification of pre-existing biases, ensuring that clinical decision-making remains fair and that research conclusions are not flawed. EHR use metadata mirror the realities of resource constraints, documentation norms, staffing hierarchies, vendor- and institution-specific biases, and other dimensions along which intentional or unintentional bias can arise during care. Previous research has shown, for example, that healthcare professionals engage with the EHR differently for patients from different demographic groups41. When such variations affect the thoroughness and precision of care, they can produce quality disparities, thereby biasing all downstream model development and analysis if ingested without critical evaluation. These issues, however, are not unique to EHR use metadata and have analogs in almost all clinical data, suggesting that existing mitigation strategies can be adapted as potential solutions. Guarding against these risks requires periodic, stratified checks of input data, carefully selected model training strategies, and continuous performance auditing across dimensions such as patient demographics, clinician roles, and care settings to detect and correct harmful biases. Based on emerging guidance for race-aware AI, practical safeguards can be applied across the lifecycle of DL-ClinAI92. At the data preprocessing stage, a “nutrition label”-style dataset summary can document essential information such as institutional and workflow context, clinician expertise composition, and data missingness. When imbalances are identified, approaches such as stratified sampling, targeted oversampling, and synthetic data augmentation can help restore representativeness. For model development, bias-aware loss functions, constraint-based optimization, and adversarial learning can be leveraged to promote model fairness.
Trained models must be rigorously evaluated on temporally and institutionally held-out datasets, with stakeholder feedback informing model retraining decisions when necessary. Finally, deployment safeguards should include silent-mode validation prior to launch, followed by model performance and fairness dashboards that monitor potential drift and trigger recalibration or retraining to ensure an ongoing feedback loop.
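The stratified auditing described above can be sketched as a per-subgroup performance check with a simple gap statistic that could trigger recalibration. The records, subgroup labels, and threshold semantics are hypothetical; real audits would cover multiple metrics (e.g., calibration and error rates) and multiple stratification dimensions.

```python
from collections import defaultdict

# Hypothetical model predictions with subgroup labels (all values invented).
records = [
    {"group": "A", "y_true": 1, "y_pred": 1},
    {"group": "A", "y_true": 0, "y_pred": 0},
    {"group": "A", "y_true": 1, "y_pred": 1},
    {"group": "B", "y_true": 1, "y_pred": 0},
    {"group": "B", "y_true": 0, "y_pred": 0},
]

def stratified_accuracy(records):
    """Accuracy per subgroup: one of the stratified checks described above."""
    hits, totals = defaultdict(int), defaultdict(int)
    for r in records:
        totals[r["group"]] += 1
        hits[r["group"]] += int(r["y_true"] == r["y_pred"])
    return {g: hits[g] / totals[g] for g in totals}

def fairness_gap(per_group):
    """Largest pairwise accuracy difference; a candidate recalibration trigger
    when it exceeds a pre-specified tolerance on a monitoring dashboard."""
    vals = per_group.values()
    return max(vals) - min(vals)
```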

Shaping the future through shared innovation

Pairing the behavioral signals embedded in EHR use metadata with traditional patient data opens a transformative path toward clinical AI that is not only more accurate but also more context-aware, trustworthy, and tightly aligned with real-world patient care. By revealing how clinicians act on patient data, this dual-lens approach enables AI systems to fit seamlessly into workflows and reveal opportunities for improvement. However, this vision requires addressing multifaceted challenges through concerted effort. We encourage researchers, clinicians, policymakers, and industry partners to join forces in establishing vendor-agnostic data standards, building scalable preprocessing pipelines, and developing bias-sensitive evaluation frameworks that support the effective and fair utilization of EHR use metadata throughout the lifecycle of clinical AI. Through such collaboration, the healthcare community can shape an AI-enabled future where technology meaningfully augments clinical expertise and drives safer, more efficient, and more equitable healthcare delivery.