Screening for idiopathic pulmonary fibrosis using comorbidity signatures in electronic health records

Onishchenko, Dmytro; Marlowe, Robert J.; Ngufor, Che G.; Faust, Louis J.; Limper, Andrew H.; Hunninghake, Gary M.; Martinez, Fernando J.; Chattopadhyay, Ishanu

doi:10.1038/s41591-022-02010-y

Article
Published: 29 September 2022

Nature Medicine volume 28, pages 2107–2116 (2022)

Abstract

Idiopathic pulmonary fibrosis (IPF) is a lethal fibrosing interstitial lung disease with a mean survival time of less than 5 years. Nonspecific presentation, a lack of effective early screening tools, unclear pathobiology of early-stage IPF and the need for invasive and expensive procedures for diagnostic confirmation hinder early diagnosis. In this study, we introduce a new screening tool for IPF in primary care settings that requires no new laboratory tests and does not require recognition of early symptoms. Using subtle comorbidity signatures identified from the history of medical encounters of individuals, we developed an algorithm, called the zero-burden comorbidity risk score for IPF (ZCoR-IPF), to predict the future risk of an IPF diagnosis. ZCoR-IPF was trained on a national insurance claims database and validated on three independent databases, comprising a total of 2,983,215 participants, with 54,247 positive cases. The algorithm achieved positive likelihood ratios greater than 30 at a specificity of 0.99 across different cohorts, for both sexes, and for participants with different risk states and history of confounding diseases. The area under the receiver-operating characteristic curve for ZCoR-IPF in predicting IPF exceeded 0.88 and was approximately 0.84 at 1 and 4 years before a conventional diagnosis, respectively. Thus, if adopted, ZCoR-IPF can potentially enable earlier diagnosis of IPF and improve outcomes of disease-modifying therapies and other interventions.
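The likelihood-ratio figures quoted above follow from the standard definitions LR+ = sensitivity / (1 − specificity) and LR− = (1 − sensitivity) / specificity. The sketch below is only an illustration of that arithmetic; the function names are hypothetical and are not taken from the ZCoR-IPF software. It shows that an LR+ of 30 at a specificity of 0.99 corresponds to a sensitivity of about 0.30.

```python
def positive_likelihood_ratio(sensitivity: float, specificity: float) -> float:
    """LR+ = sensitivity / (1 - specificity)."""
    if not (0.0 <= sensitivity <= 1.0 and 0.0 <= specificity < 1.0):
        raise ValueError("need 0 <= sensitivity <= 1 and 0 <= specificity < 1")
    return sensitivity / (1.0 - specificity)


def negative_likelihood_ratio(sensitivity: float, specificity: float) -> float:
    """LR- = (1 - sensitivity) / specificity."""
    if not (0.0 <= sensitivity <= 1.0 and 0.0 < specificity <= 1.0):
        raise ValueError("need 0 <= sensitivity <= 1 and 0 < specificity <= 1")
    return (1.0 - sensitivity) / specificity


def min_sensitivity_for_lr(lr_plus: float, specificity: float) -> float:
    """Smallest sensitivity that reaches a given LR+ at a fixed specificity."""
    return lr_plus * (1.0 - specificity)


if __name__ == "__main__":
    spec = 0.99  # operating point quoted in the abstract
    sens = min_sensitivity_for_lr(30.0, spec)
    print(f"sensitivity needed for LR+ = 30 at specificity {spec}: {sens:.2f}")
    print(f"LR- at that operating point: {negative_likelihood_ratio(sens, spec):.3f}")
```

At a fixed specificity, LR+ scales linearly with sensitivity, which is why the threshold is reported at a single high-specificity operating point.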

**Fig. 1: Participant characteristics.**

**Fig. 2: ZCoR-IPF performance for predictions 1 year into the future.**

Data availability

The Truven, UCM and MAYO datasets cannot be made available due to their commercial nature. D.O. and I.C. had access to the Truven and UCM databases, and I.C. was responsible for maintaining the integrity of these datasets. C.G.N., L.J.F. and A.H.L. had access to the MAYO dataset, and A.H.L. was responsible for maintaining the integrity of that dataset.

Code availability

Methodological details needed to evaluate our conclusions are included in the Methods and Supplementary Information. A working software implementation of the pipeline (free for noncommercial evaluations) is available at https://doi.org/10.5281/zenodo.6040418, which includes installation instructions in standard Python environments. To enable fast execution, some more compute-intensive features are disabled in this version. Results from this software are for demonstration purposes only, and must not be interpreted as medical advice, or serve as replacement for such.

Acknowledgements

This work is funded in part by the Defense Advanced Research Projects Agency under project no. HR00111890043. The claims made in this study do not reflect the position or the policy of the US Government. The UCM dataset is provided by the Clinical Research Data Warehouse (CRDW) maintained by the Center for Research Informatics at the University of Chicago. The Center for Research Informatics is funded by the Biological Sciences Division and the Institute for Translational Medicine/CTSA (National Institutes of Health award no. UL1 TR000430) at the University of Chicago.

Author information

Authors and Affiliations

Department of Medicine, University of Chicago, Chicago, IL, USA
Dmytro Onishchenko & Ishanu Chattopadhyay
Spencer-Fontayne Corporation, Jersey City, NJ, USA
Robert J. Marlowe
Mayo Clinic College of Medicine and Science, Rochester, MN, USA
Che G. Ngufor, Louis J. Faust & Andrew H. Limper
Director, Thoracic Research Unit, Mayo Clinic College of Medicine and Science, Rochester, MN, USA
Andrew H. Limper
Director, Interstitial Lung Disease Program, Brigham and Women’s Hospital, Harvard Medical School, Boston, MA, USA
Gary M. Hunninghake
Bruce Webster Professor of Internal Medicine, Medicine, Weill Cornell Medical College, New York, NY, USA
Fernando J. Martinez
Chief of Division of Pulmonary and Critical Care Medicine at Weill Cornell Medicine and NewYork-Presbyterian Weill Cornell Medical Center, New York, NY, USA
Fernando J. Martinez
Committee on Genetics, Genomics & Systems Biology, University of Chicago, Chicago, IL, USA
Ishanu Chattopadhyay
Committee on Quantitative Methods in Social, Behavioral, and Health Sciences, University of Chicago, Chicago, IL, USA
Ishanu Chattopadhyay

Contributions

D.O. implemented the algorithm and ran validation tests. D.O. and I.C. carried out mathematical modeling and algorithm design. R.J.M., F.J.M. and I.C. wrote the paper. F.J.M., G.M.H. and I.C. interpreted results and guided research. C.G.N., L.J.F. and A.H.L. evaluated the tool on the dataset available at the Mayo Clinic. I.C. procured funding for the research.

Corresponding author

Correspondence to Ishanu Chattopadhyay.

Ethics declarations

Competing interests

The authors declare no competing interests.

Peer review

Peer review information

Nature Medicine thanks Athol Wells, Harold Collard and the other, anonymous, reviewer(s) for their contribution to the peer review of this work. Primary Handling Editor: Michael Basson, in collaboration with the Nature Medicine team.

Additional information

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Extended data

Extended Data Fig. 1 Performance under delayed updates to participant records.

a,b, Out-of-sample ROC curves when the patient data are delayed by 4 weeks versus the no-delay condition, for the UCM and Truven datasets, respectively. The 95% confidence bounds about the mean are shown, computed with n = 2,053,277 for Truven and n = 68,658 for UCM. There is no significant loss of performance with such delayed data. c,d, ZCoR-IPF performance versus an 87-feature baseline model fitted by logistic regression, where the features denote the presence or absence of manually curated risk factors (Supplementary Table 4) and age (over or under 65 years), for the Truven and UCM datasets, respectively.

Extended Data Fig. 2 Comparison with neural network architectures.

a,b, Out-of-sample AUC achieved in the Truven and UCM datasets, respectively, by neural network architectures ranging from simple feed-forward networks, LSTMs and CNNs to large state-of-the-art models such as AlexNet, DenseNet and ResNet, along with 95% confidence intervals about the mean (n = 2,053,277 for Truven and n = 68,658 for UCM).
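For readers comparing AUC values across models, the AUC equals the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative case, which is the normalized Mann-Whitney U statistic. The following is a minimal, dependency-free sketch of that identity; the function name and toy scores are illustrative assumptions, not the authors' evaluation code.

```python
from typing import Sequence


def auc_mann_whitney(scores_pos: Sequence[float], scores_neg: Sequence[float]) -> float:
    """AUC as U / (n_pos * n_neg), with tied scores given the average rank."""
    if not scores_pos or not scores_neg:
        raise ValueError("both classes need at least one score")

    # Pool the scores, remembering which ones belong to positive cases.
    pooled = sorted([(s, 1) for s in scores_pos] + [(s, 0) for s in scores_neg])

    rank_sum_pos = 0.0
    i, n = 0, len(pooled)
    while i < n:
        # Find the block of tied scores pooled[i:j].
        j = i
        while j < n and pooled[j][0] == pooled[i][0]:
            j += 1
        midrank = (i + 1 + j) / 2.0  # average of the 1-based ranks i+1 .. j
        rank_sum_pos += midrank * sum(label for _, label in pooled[i:j])
        i = j

    n_pos, n_neg = len(scores_pos), len(scores_neg)
    u_stat = rank_sum_pos - n_pos * (n_pos + 1) / 2.0
    return u_stat / (n_pos * n_neg)


if __name__ == "__main__":
    # Toy risk scores (made up for illustration only).
    positives = [0.9, 0.8, 0.4]
    negatives = [0.7, 0.3, 0.2, 0.4]
    print(f"AUC = {auc_mann_whitney(positives, negatives):.3f}")
```

The rank formulation needs only a sort rather than all pairwise comparisons, which matters for cohorts with millions of participants.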

Extended Data Fig. 3 Performance with broader target definition.

a,b, Out-of-sample ROC curves for the Truven and UCM datasets, respectively, comparing the results of the primary analysis with those of the secondary analysis (the analysis with the broader target definition specified in Extended Data Table 1). The 95% confidence bounds about the mean are shown, computed with n = 2,053,277 for Truven and n = 68,658 for UCM. c, Negative versus positive likelihood ratios (LR- versus LR+). d, Positive versus negative predictive values. With the broad target definition, an operating point with LR+ > 30 can also be selected, as for the target in the primary analysis.
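Predictive values, unlike likelihood ratios, depend on how common the condition is in the screened population. The sketch below applies Bayes' rule to convert sensitivity, specificity and prevalence into PPV and NPV. The inputs are assumptions chosen for illustration: the case fraction of the pooled study cohorts (54,247 of 2,983,215) stands in for prevalence, and a sensitivity of 0.30 is the value implied by LR+ = 30 at a specificity of 0.99. Neither is a reported operating point for any single dataset, and the function name is hypothetical.

```python
from typing import Tuple


def predictive_values(sensitivity: float, specificity: float, prevalence: float) -> Tuple[float, float]:
    """Return (PPV, NPV) from test characteristics and prevalence via Bayes' rule."""
    for name, value in (("sensitivity", sensitivity),
                        ("specificity", specificity),
                        ("prevalence", prevalence)):
        if not 0.0 < value < 1.0:
            raise ValueError(f"{name} must lie strictly between 0 and 1")

    # Expected fraction of the population in each cell of the confusion matrix.
    true_pos = sensitivity * prevalence
    false_neg = (1.0 - sensitivity) * prevalence
    false_pos = (1.0 - specificity) * (1.0 - prevalence)
    true_neg = specificity * (1.0 - prevalence)

    ppv = true_pos / (true_pos + false_pos)
    npv = true_neg / (true_neg + false_neg)
    return ppv, npv


if __name__ == "__main__":
    # Assumption: pooled case fraction from the abstract used as a prevalence proxy.
    prevalence = 54_247 / 2_983_215
    ppv, npv = predictive_values(sensitivity=0.30, specificity=0.99, prevalence=prevalence)
    print(f"prevalence proxy = {prevalence:.4f}, PPV = {ppv:.3f}, NPV = {npv:.3f}")
```

In a general primary care population, where IPF is much rarer than in these cohorts, the same sensitivity and specificity would yield a lower PPV.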

Extended Data Fig. 4 Co-morbidity Spectra.

a,b, Diseases (recorded ICD codes) that increase the odds of the patient being a ‘true positive’ versus a ‘true negative’ for males and females, respectively. These odds are broadly similar across the sexes, with over-representation of respiratory disorders.

Extended Data Fig. 5 Expected increase in survival times.

a, Survival function lower bounds at two specificity levels (90% and 95%). b, Cumulative hazard function upper bounds. The 95% confidence bounds around the mean are shown for both, generated using the Truven dataset (n = 2,053,277). c, Variation of the mean survival time as a function of the specificity at which ZCoR-IPF is operated. d, Variation of estimated raw risk as a function of age for screening 4 years before the actual recorded diagnosis of IPF, showing that risk increases almost linearly with age for patients eventually diagnosed with IPF. e, Degradation of out-of-sample AUC as screening is attempted earlier, stepping back from the time of current diagnosis (in the absence of ZCoR-IPF screening).
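The survival function S(t) and the cumulative hazard H(t) in panels a and b are linked by H(t) = −log S(t). The caption does not state which estimator produced the curves, so the sketch below assumes the standard Kaplan-Meier product-limit estimator purely for illustration. The function names and the toy follow-up data are hypothetical.

```python
import math
from typing import List, Sequence, Tuple


def kaplan_meier(durations: Sequence[float], events: Sequence[bool]) -> List[Tuple[float, float]]:
    """Product-limit estimate: list of (time, S(time)) at each distinct event time.

    events[i] is True if the event was observed at durations[i],
    False if the participant was censored at that time.
    """
    if len(durations) != len(events):
        raise ValueError("durations and events must have the same length")

    data = sorted(zip(durations, events))
    at_risk = len(data)
    survival = 1.0
    curve: List[Tuple[float, float]] = []

    i = 0
    while i < len(data):
        t = data[i][0]
        observed = 0  # events at time t
        leaving = 0   # everyone leaving the risk set at time t (events + censored)
        while i < len(data) and data[i][0] == t:
            observed += 1 if data[i][1] else 0
            leaving += 1
            i += 1
        if observed:
            survival *= 1.0 - observed / at_risk
            curve.append((t, survival))
        at_risk -= leaving
    return curve


def cumulative_hazard(survival: float) -> float:
    """H = -log(S); infinite once the survival estimate reaches zero."""
    return math.inf if survival <= 0.0 else -math.log(survival)


if __name__ == "__main__":
    # Toy follow-up times in years (made up for illustration only).
    times = [1, 2, 2, 3, 4]
    observed = [True, True, False, True, False]
    for t, s in kaplan_meier(times, observed):
        print(f"t = {t}: S = {s:.3f}, H = {cumulative_hazard(s):.3f}")
```

Because H is a decreasing function of S, a lower bound on the survival function translates directly into an upper bound on the cumulative hazard, matching the pairing of bounds in panels a and b.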

Extended Data Table 1 Target codes: description of ICD code(s) used to identify IPF diagnoses

Extended Data Table 2 High-risk comorbidities which define our high-risk cohort

Extended Data Table 3 Feature definitions (total number of features used: 667)

Extended Data Table 4 ZCoR-IPF performance for different subpopulations at 99% specificity in males

Extended Data Table 5 ZCoR-IPF performance for different subpopulations at 99% specificity in females

Supplementary information

Supplementary Information

Supplementary Note, Tables 1–11 and Figs. 1–3

Reporting Summary

Rights and permissions

Springer Nature or its licensor holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Onishchenko, D., Marlowe, R.J., Ngufor, C.G. et al. Screening for idiopathic pulmonary fibrosis using comorbidity signatures in electronic health records. Nat Med 28, 2107–2116 (2022). https://doi.org/10.1038/s41591-022-02010-y

Received: 18 March 2021
Accepted: 12 August 2022
Published: 29 September 2022
Issue date: October 2022
DOI: https://doi.org/10.1038/s41591-022-02010-y
