Abstract
The Mendelian Phenotype Search Engine (MPSE), a clinical decision support tool using Natural Language Processing and Machine Learning, helped neonatologists expedite decisions to order whole genome sequencing (WGS) for the diagnosis of patients in the neonatal intensive care unit. After the MPSE was introduced, utilization of WGS increased, time to ordering WGS decreased, and WGS diagnostic yield increased.
Genetic disorders are a leading cause of death and disability for infants admitted to the neonatal intensive care unit (NICU)1. Rapidly diagnosing the underlying cause of critical illness and initiating targeted treatment are of paramount importance given the considerable morbidity and mortality associated with NICU admission1,2,3,4,5,6. Rapid Precision Medicine using Whole Genome Sequencing (WGS) can help identify patients with genetic disease and thus facilitate care tailored to the individual5,6,7,8,9,10,11,12,13. However, because of economic considerations and variable clinician familiarity with WGS, deciding which NICU patients should receive WGS can be challenging6,7,12,13,14. We hypothesized that an automated clinical decision support tool that uses machine learning to continually reassess the appropriateness of rapid WGS (rWGS) could help neonatologists prioritize patients for rWGS. The primary objective of this study was to determine whether including the MPSE in the patient evaluation would decrease the time to nomination.
We designed a single-group study to compare findings before and after the implementation of a clinical decision support tool. The tool, the Mendelian Phenotype Search Engine (MPSE), uses Machine Learning (ML) on Human Phenotype Ontology (HPO) terms to calculate scores that prioritize patients for WGS15. The HPO provides a hierarchical representation of the clinical abnormalities observed in human disease and thereby facilitates computational analysis of patient phenotypes16,17. Natural Language Processing (NLP) tools can identify HPO terms in Electronic Medical Record (EMR) notes that describe patient phenotypes related to Mendelian disease, enabling analysis with ML18,19,20,21.
We developed a software pipeline that automatically extracts HPO terms from all unstructured clinical notes in the EMR of patients recently admitted to the NICU. The MPSE uses these HPO terms to compute a prioritization score that reflects how closely the phenotypes of newly admitted NICU patients resemble those of NICU patients who previously received WGS15.
We performed this study in two phases. The objective of Phase 1 (the pre-implementation phase) was to collect baseline data on the number of babies nominated for WGS, the time to nomination (Fig. 1), and the diagnostic yield of WGS (the proportion of sequenced patients with a pathogenic or likely pathogenic variant explaining their phenotype). During this phase, MPSE scores for each patient were computed but not shared with the clinical team. During Phase 2 (the implementation phase), the attending neonatologists received a daily report containing MPSE scores for each NICU patient on the census (Fig. 2). To avoid contributing to alarm fatigue22,23, the MPSE report was presented to the neonatologists as part of their daily rounds, as one additional piece of information to consider when deciding which patients should receive WGS.
Fig. 2 shows the MPSE score percentiles for every patient admitted to the NICU during the study. For nominated patients, the MPSE score at the time of nomination is shown; for patients who were not nominated, the maximum MPSE score within the first seven days of their NICU admission is shown. In both phases, score percentiles of nominated patients were significantly higher than those of patients who were not nominated.
Three primary outcomes were measured: (1) the number and proportion of babies nominated for WGS, (2) the time from admission to nomination for WGS, and (3) the diagnostic yield of WGS.
In total, 118 patients were nominated for rWGS: 27 in Phase 1 (14 weeks, 1.9 nominations/week) and 91 in Phase 2 (38 weeks, 2.4 nominations/week) (Mann–Whitney–Wilcoxon two-sided test, p = 0.35). In both phases, the attending physicians judged 13% of eligible NICU patients (27/204 and 91/691) likely to benefit from rWGS. Of the nominated patients, 99 (84%) were enrolled and underwent WGS (reasons for declining are listed in Supplementary Table 1): 25 in Phase 1 (1.8 enrollments/week) and 74 in Phase 2 (1.9 enrollments/week). Enrollment rates did not differ significantly between Phase 1 and Phase 2 (Mann–Whitney–Wilcoxon two-sided test, p = 0.63).
Of the 99 sequenced patients, 29 received a molecular diagnosis (Supplementary Table 1): 6 in Phase 1 (6/25, 24% diagnostic yield) and 23 in Phase 2 (23/74, 31% diagnostic yield) (Fisher’s exact test, p = 0.61). Each diagnosed patient had at least one variant consistent with their phenotype that was classified as pathogenic or likely pathogenic according to the American College of Medical Genetics and Genomics (ACMG) guidelines24.
The median time from admission to nomination decreased from 48.0 h in Phase 1 to 39.1 h in Phase 2 (an 18.5% reduction; Mann–Whitney–Wilcoxon two-sided test, p = 0.10). The difference is most visible at 72 h after admission: by then, 82% of all Phase 2 nominations had taken place, compared with only 52% in Phase 1, as shown in Fig. 1 (Cox proportional hazards regression, p = 0.10).
In both phases, the MPSE scores of nominated patients were significantly higher than those of patients who were not nominated (Mann–Whitney–Wilcoxon two-sided test (Fig. 2); Phase 1: p = 1.5 × 10⁻⁴; Phase 2: p = 4.6 × 10⁻¹⁷), which is consistent with our previous finding that MPSE scores of patients nominated for WGS were higher than those of patients not nominated15. The HPO terms found in patients with high vs low MPSE scores are listed in Supplementary Table 2.
Although none of the differences in the three primary outcomes between pre-implementation (Phase 1) and implementation (Phase 2) of the MPSE reached statistical significance, all three trended toward improvement after the MPSE was implemented: the average number of babies nominated for WGS increased from 1.9 to 2.4 nominations/week, the median time from admission to nomination for WGS decreased from 48.0 h to 39.1 h (Supplementary Figs. 2 and 3), and the diagnostic yield of WGS increased from 24% to 31%. Importantly, these results suggest that the increased frequency and speed of nomination did not degrade the yield of rWGS and may have improved it.
Limitations of this study include a small sample size, especially during pre-implementation, and a lack of long-term outcome data; future studies should address both. In addition, the NICU physicians at the study site were already familiar with rWGS, which suggests that the MPSE might have greater influence in settings where Rapid Precision Medicine has not been established. The version of the MPSE used in this study was trained on a database of patients admitted to our level IV NICU (roughly 28% of whom were nominated for WGS)15. Although this approach aims to replicate the complex criteria that physicians use to nominate patients for WGS, it also means that our model may carry operator bias. Further research is needed to confirm these preliminary findings and to assess generalizability across NICUs and clinical teams, including evaluating model performance on less acute patients, such as those admitted to lower-acuity NICUs, and building a larger training dataset to improve the MPSE model. Future versions of the MPSE could also incorporate features that can be automatically extracted from the patient’s EMR beyond HPO terms, such as lab test results, demographic information, social determinants of health, and EEG/EKG data.
Although statistically significant differences were not observed, possibly because of the limited sample size, these findings hold promise for future research. This study contributes to the ongoing effort to inform the design and implementation of ML tools in healthcare environments. It demonstrates that the MPSE can be integrated into existing clinical workflows and suggests that it could be similarly employed in other healthcare systems.
These findings underscore the potential impact that carefully applied clinical decision support tools harnessing NLP and machine learning can have on how intensive care clinicians efficiently and appropriately select patients for genomic sequencing.
While the clinical utility of rWGS in the NICU is well established5,6,7,8,9,10,11,12,13, the longstanding question of which infants are most likely to benefit from sequencing remains. Currently, the decision to request rWGS relies on subjective factors, with wide inter-clinician variability in patient selection, and may be influenced by the clinician’s familiarity with the technology and comfort with genomic information. In particular, patients with previously unrecognized presentations of genetic disorders and/or those who lack obvious physical stigmata of genetic disease may not be offered early rWGS. Clinical decision support tools such as the MPSE add a much-needed objective data component to the rWGS decision without detracting from the clinician’s autonomy in making the final determination. As clinical decision support tools become more commonplace in medical practice, the MPSE in particular fills an important gap, given the obscurity of most rare diseases. We envision the MPSE as a tool that prompts clinicians to consider rWGS, with the potential to facilitate earlier genomic diagnoses and thereby expedite the initiation of appropriate therapy for critically ill affected patients.
Methods
Patient enrollment
This prospective clinical study was conducted in the Level IV NICU of Rady Children’s Hospital in San Diego (RCHSD) in two phases. In each phase, attending neonatologists nominated patients for WGS according to the inclusion/exclusion criteria outlined below:
Inclusion criteria
- NICU admit, age 0–12 months, and
- 0–7 days from admission or within one week of development of an abnormal response to standard therapy
Exclusion criteria
- Clinical course entirely explained by:
  - Isolated prematurity
  - Isolated unconjugated hyperbilirubinemia
  - Infection/sepsis with an expected pathogen and a normal response to therapy
  - Previously confirmed genetic diagnosis
  - Isolated transient tachypnea of the newborn
  - Meconium aspiration
  - Trauma
Families of enrolled patients provided written informed consent, and parental samples were also collected whenever possible.
The sole difference between Phase 1 and Phase 2 was that in Phase 2, right before their daily rounds, the NICU attending physicians were given a printed report containing the most recent MPSE scores and percentiles (Supplementary Fig. 1) of the infants in the NICU who met the inclusion/exclusion criteria.
Phase 1 lasted 14 weeks (July–October 2022) and included 204 infants who met the inclusion/exclusion criteria; 27 of them were nominated by the attending physicians, and 25 enrolled in the study. Phase 2 lasted 38 weeks (October 2022–July 2023) and included 691 infants who met the inclusion/exclusion criteria; 91 of them were nominated by the attending physicians, and 74 enrolled in the study.
MPSE score computation
The Mendelian Phenotype Search Engine (MPSE) uses Human Phenotype Ontology (HPO) terms to estimate the likelihood that a Mendelian condition underlies a patient’s phenotype. The MPSE employs a simple, well-established approach: a Naïve Bayes (NB) classifier, previously published in detail by our group and shown to perform well (AUC 0.86 at RCHSD and 0.85 at the University of Utah)15. Briefly, the MPSE uses the differences in HPO term frequencies between a collection of cases and controls to score each patient by calculating the NB log-odds ratio. This value, the MPSE Raw Score, ranges from negative infinity to positive infinity; its sign indicates which class the patient’s HPO terms most resemble (positive for cases, negative for controls). Because MPSE Raw Scores can be unintuitive to interpret, we converted each score to a percentile. To ensure that all percentiles are comparable, each was computed relative to the same training data.
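The scoring idea can be illustrated with a Bernoulli Naïve Bayes over HPO term presence. This is a minimal sketch under stated assumptions, not the code in the MPSE GitHub repository: the Laplace pseudo-count `alpha`, the class-prior term, ignoring terms never seen in training, and the percentile definition (share of training scores at or below a score) are all choices made for illustration, and the function names are hypothetical.

```python
import math
from bisect import bisect_right

def train_term_log_odds(cases, controls, alpha=1.0):
    """Fit per-term log-likelihood ratios for a Bernoulli Naive Bayes model.

    cases, controls: lists of sets of HPO term IDs, one set per patient.
    alpha: Laplace pseudo-count (an assumed value for this sketch).
    """
    vocab = set().union(*cases, *controls)
    n_case, n_ctrl = len(cases), len(controls)
    weights = {}
    for term in vocab:
        # Smoothed probability that a case / control patient carries this term.
        p_case = (sum(term in p for p in cases) + alpha) / (n_case + 2 * alpha)
        p_ctrl = (sum(term in p for p in controls) + alpha) / (n_ctrl + 2 * alpha)
        # Log-likelihood ratio contributed when the term is present / absent.
        weights[term] = (math.log(p_case / p_ctrl),
                         math.log((1 - p_case) / (1 - p_ctrl)))
    prior = math.log(n_case / n_ctrl)  # class-prior log-odds (assumption)
    return prior, weights

def raw_score(terms, prior, weights):
    """NB log-odds: > 0 resembles cases, < 0 resembles controls."""
    score = prior
    for term, (if_present, if_absent) in weights.items():
        score += if_present if term in terms else if_absent
    return score  # terms never seen in training are ignored

def percentile(score, reference_scores):
    """Percent of reference (training) scores at or below `score`."""
    ref = sorted(reference_scores)
    return 100.0 * bisect_right(ref, score) / len(ref)

# Toy example with illustrative HPO IDs (not study data).
cases = [{"HP:0001250", "HP:0001508"}, {"HP:0001250"}]
controls = [{"HP:0002104"}, {"HP:0002104", "HP:0001508"}, set()]
prior, weights = train_term_log_odds(cases, controls)
reference = [raw_score(p, prior, weights) for p in cases + controls]
s = raw_score({"HP:0001250"}, prior, weights)
print(round(s, 2), percentile(s, reference))
```

Because every vocabulary term contributes whether it is present or absent, this is the Bernoulli form of Naïve Bayes; a multinomial variant that counts only the terms present would be an equally plausible reading of the description above.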
MPSE scores and percentiles for each patient in the NICU were computed automatically every three hours during the study period.
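Because scores are recomputed every three hours, each patient accumulates a time series of scores, and a single value per patient has to be chosen for comparisons such as Fig. 2 (the score at nomination for nominated patients, otherwise the maximum within the first seven days of admission). The sketch below shows one way to make that choice; `summary_score` is a hypothetical helper, and reading "at the time of nomination" as the latest score computed at or before nomination is an assumption.

```python
from datetime import datetime, timedelta

def summary_score(admit_time, scores, nominated_at=None, window_days=7):
    """Pick one MPSE score per patient from its 3-hourly score history.

    scores: iterable of (timestamp, score) pairs, in any order.
    Returns None if no score falls in the relevant interval.
    """
    if nominated_at is not None:
        # Latest score computed at or before nomination (assumed reading).
        latest = max(((t, s) for t, s in scores if t <= nominated_at), default=None)
        return None if latest is None else latest[1]
    # Not nominated: maximum score within the first `window_days` of admission.
    cutoff = admit_time + timedelta(days=window_days)
    window = [s for t, s in scores if admit_time <= t <= cutoff]
    return max(window) if window else None

admit = datetime(2023, 1, 1, 8, 0)
history = [(admit + timedelta(hours=3 * i), v) for i, v in enumerate([0.4, 2.1, 1.3])]
print(summary_score(admit, history, nominated_at=admit + timedelta(hours=7)))  # 1.3
print(summary_score(admit, history))  # 2.1
```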
HPO-based phenotype descriptions were generated for all patients in Phases 1 and 2 by NLP analysis of clinical notes using a local instance of CLiX ENRICH (Clinithink, Alpharetta, GA). A pre-trained MPSE model was then used to calculate MPSE scores for each patient15. This MPSE model was trained on a dataset of 1049 NICU patients. These patients were separated into two groups (the target variable) based on whether they were nominated for sequencing by a physician or not (positive N = 293, negative N = 756)15.
Statistics
Statistics were computed in Python version 3.10.2 with SciPy version 1.8.0, statannotations version 0.5.0, and lifelines version 0.27.8.
Because we did not expect our distributions to be Gaussian, we chose the following non-parametric tests:
Group comparisons used the two-sided Mann–Whitney–Wilcoxon test, a non-parametric test for significant differences between two distributions.
Proportions were compared with Fisher’s exact test, a non-parametric test of whether two proportions differ significantly (e.g., the percentage of enrolled patients with a positive diagnosis in Phase 1 vs Phase 2).
Time to nomination in Phase 1 vs Phase 2 was compared with the Cox proportional hazards model (as implemented in lifelines), which tests the association between the time to an event and one or more predictor variables.
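Assuming study phase is the only covariate (the text does not name others), the Cox model reduces the Phase 1 vs Phase 2 comparison to a single hazard ratio. Here h(t | x) is the instantaneous rate of nomination t hours after admission and h_0(t) is an unspecified baseline rate:

```latex
h(t \mid x) = h_0(t)\, e^{\beta x},
\qquad
x = \begin{cases} 0, & \text{Phase 1} \\ 1, & \text{Phase 2} \end{cases}
\qquad\Longrightarrow\qquad
\mathrm{HR} = \frac{h(t \mid x = 1)}{h(t \mid x = 0)} = e^{\beta}
```

A hazard ratio above 1 would mean that Phase 2 patients were nominated at a higher rate, i.e., sooner after admission; the reported p-value (p = 0.10) tests the null hypothesis β = 0.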
For all testing, p < 0.05 was considered statistically significant.
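The study computed Fisher’s exact test with SciPy. To make the test concrete, the sketch below reimplements the two-sided version in standard-library Python: it enumerates every 2×2 table with the observed row and column totals and sums the hypergeometric probabilities of those no more likely than the observed table. The function name is hypothetical, the small relative tolerance is an implementation choice for this sketch, and the example applies it to the diagnostic-yield counts reported above (6 of 25 vs 23 of 74).

```python
from math import comb

def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]]."""
    row1, row2, col1 = a + b, c + d, a + c
    n = row1 + row2

    def table_prob(x):
        # Hypergeometric probability that the top-left cell equals x
        # when all row and column totals are held fixed.
        return comb(row1, x) * comb(row2, col1 - x) / comb(n, col1)

    p_observed = table_prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    probs = [table_prob(x) for x in range(lo, hi + 1)]
    # Two-sided p: total probability of tables no more likely than the
    # observed one; the 1e-7 relative tolerance absorbs rounding error.
    p = sum(q for q in probs if q <= p_observed * (1 + 1e-7))
    return min(1.0, p)

# Diagnostic yield by phase: (diagnosed, not diagnosed) among sequenced patients.
phase1 = (6, 25 - 6)
phase2 = (23, 74 - 23)
print(f"two-sided p = {fisher_exact_two_sided(*phase1, *phase2):.2f}")
```

In SciPy, the equivalent call is `scipy.stats.fisher_exact([[6, 19], [23, 51]])`, which returns the odds ratio together with the two-sided p-value.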
Data availability
The de-identified data used in this paper are provided in Supplementary Table 1, including time from admission to nomination, MPSE score, WGS results for enrolled patients, and reasons for declining for patients who did not enroll.
Code availability
The source code for the MPSE is available on GitHub: https://github.com/Yandell-Lab/MPSE/tree/main.
References
Michel, M. C., Colaizy, T. T., Klein, J. M., Segar, J. L. & Bell, E. F. Causes and circumstances of death in a neonatal unit over 20 years. Pediatr. Res. 83, 829–833 (2018).
Trowbridge, A., Walter, J. K., McConathey, E., Morrison, W. & Feudtner, C. Modes of death within a children’s hospital. Pediatrics 142, e20174182 (2018).
Chow, S. et al. A selected review of the mortality rates of neonatal intensive care units. Front. Public Health 3, 225 (2015).
Burns, J. P., Sellers, D. E., Meyer, E. C., Lewis-Newby, M. & Truog, R. D. Epidemiology of death in the pediatric intensive care unit at five U.S. teaching hospitals. Crit. Care Med. 42, 2101–2108 (2014).
NICUSeq Study Group et al. Effect of whole-genome sequencing on the clinical management of acutely ill infants with suspected genetic disease: a randomized clinical trial. JAMA Pediatr. 175, 1218–1226 (2021).
Dimmock, D. P. et al. An RCT of rapid genomic sequencing among seriously ill infants results in high clinical utility, changes in management, and low perceived harm. Am. J. Hum. Genet. 107, 942–952 (2020).
Clark, M. M. et al. Diagnosis of genetic diseases in seriously ill children by rapid whole-genome sequencing and automated phenotyping and interpretation. Sci. Transl. Med. 11, eaat6177 (2019).
Kingsmore, S. F. et al. A randomized, controlled trial of the analytic and diagnostic performance of singleton and trio, rapid genome and exome sequencing in ill infants. Am. J. Hum. Genet. 105, 719–733 (2019).
Franck, L. S., Dimmock, D., Hobbs, C. & Kingsmore, S. F. Rapid whole-genome sequencing in critically ill children: shifting from unease to evidence, education, and equitable implementation. J. Pediatr. 238, 343 (2021).
Franck, L. S. et al. Implementing rapid whole-genome sequencing in critical care: a qualitative study of facilitators and barriers to new technology adoption. J. Pediatr. 237, 237–243.e2 (2021).
Kingsmore, S. F. et al. Mortality in a neonate with molybdenum cofactor deficiency illustrates the need for a comprehensive rapid precision medicine system. Mol. Case Stud. 6, a004705 (2020).
James, K. N. et al. Partially automated whole-genome sequencing reanalysis of previously undiagnosed pediatric patients can efficiently yield new diagnoses. npj Genom. Med. 5, 1–8 (2020).
Dimmock, D. et al. Project Baby Bear: rapid precision care incorporating rWGS in 5 California children’s hospitals demonstrates improved clinical outcomes and reduced costs of care. Am. J. Hum. Genet. 108, 1231–1238 (2021).
Franck, L. S. et al. Healthcare professionals’ attitudes toward rapid whole genome sequencing in pediatric acute care. Children 9, 357 (2022).
Peterson, B. et al. Automated prioritization of sick newborns for whole genome sequencing using clinical natural language processing and machine learning. Genome Med. 15, 18 (2023).
Gargano, M. A. et al. The human phenotype ontology in 2024: phenotypes around the world. Nucleic Acids Res. 52, D1333–D1346 (2024).
Daniali, M. et al. Enriching representation learning using 53 million patient notes through human phenotype ontology embedding. Artif. Intell. Med. 139, 102523 (2023).
Havrilla, J. M. et al. PheNominal: an EHR-integrated web application for structured deep phenotyping at the point of care. BMC Med. Inform. Decis. Mak. 22, 198 (2022).
Bastarache, L. et al. Improving the phenotype risk score as a scalable approach to identifying patients with Mendelian disease. J. Am. Med. Inform. Assoc. 26, 1437–1447 (2019).
Morley, T. J. et al. Phenotypic signatures in clinical data enable systematic identification of patients for genetic testing. Nat. Med. 27, 1097–1104 (2021).
Keles, E. & Bagci, U. The past, current, and future of neonatal intensive care units with artificial intelligence: a systematic review. npj Digit. Med. 6, 1–36 (2023).
Stiglich, Y. F., Dik, P. H. B., Segura, M. S. & Mariani, G. L. The alarm fatigue challenge in the neonatal intensive care unit: a ‘before’ and ‘after’ study. Am. J. Perinatol. 41, e2348–e2355 (2024).
Wong, A. et al. External validation of a widely implemented proprietary sepsis prediction model in hospitalized patients. JAMA Intern. Med. 181, 1065–1070 (2021).
Richards, S. et al. Standards and guidelines for the interpretation of sequence variants: a joint consensus recommendation of the American College of Medical Genetics and Genomics and the Association for Molecular Pathology. Genet. Med. 17, 405–424 (2015).
Acknowledgements
This study was funded by the Prebys Foundation; R.R. is supported by NCATS grant number K12TR004410. We are grateful to the families who participated in this study. We also thank the RCHSD NICU team for their collaboration and contributions to this study. We acknowledge Ricky (Hung) Nguyen for delivering the daily reports and Shauna Briscoe for serving as project manager.
Author information
Contributions
C.H., M.Y., C.T., and J.R. conceptualized the study design and supervised the implementation; E.J. coordinated the implementation of the software and report delivery, performed the data analyses, and wrote the first draft of the manuscript; M.B., M.Y., C.H., L.T., and C.T. revised the figures and data analyses; M.B., E.S.K., B.P., C.H., and L.T. revised earlier versions of the manuscript; E.S.K., R.R., K.W., J.C., S.B.T., and C.H. provided clinical expertise throughout the study and during manuscript revisions; M.B., C.T., and M.Y. supervised data analysis and interpretation; E.J., S.G., C.B., D.D., C.T., M.Y., and C.H. designed and implemented the score report; J.L., B.S., E.K., L.O., E.S.K., R.R., K.W., and C.H. contributed to study enrollment; and J.L. and B.S. returned results to patients’ families. All authors reviewed and approved the final version of this manuscript.
Ethics declarations
Competing interests
C.T. is a director, employee, and shareholder of Clinithink. The other authors declare no competing interests.
Additional information
Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License, which permits any non-commercial use, sharing, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if you modified the licensed material. You do not have permission under this licence to share adapted material derived from this article or parts of it. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by-nc-nd/4.0/.
About this article
Cite this article
Juarez, E.F., Peterson, B., Sanford Kobayashi, E. et al. A machine learning decision support tool optimizes WGS utilization in a neonatal intensive care unit. npj Digit. Med. 8, 72 (2025). https://doi.org/10.1038/s41746-025-01458-9