Abstract
Background
Privacy-protecting analytic approaches that avoid centralized pooling of individual-level data, such as distributed regression, are particularly important for vulnerable populations, including children, but these methods have not yet been tested in multi-center pediatric studies.
Methods
Using electronic health data from 34 healthcare institutions in the National Patient-Centered Clinical Research Network (PCORnet), we fit 12 multivariable-adjusted linear regression models to assess the associations of antibiotic use <24 months of age with body mass index z-score at 48 to <72 months of age. We ran each model two ways: with conventional multivariable-adjusted regression on pooled individual-level data (the reference method), and with the more privacy-protecting distributed regression technique, which uses only pooled summary-level intermediate statistics. We then compared the results from the two methods.
Results
Pooled individual-level and distributed linear regression analyses produced virtually identical parameter estimates and standard errors. Across all 12 models, the maximum difference in any parameter estimate or standard error was 4.4833 × 10⁻¹⁰.
Conclusions
We empirically demonstrated the feasibility and validity of distributed linear regression analysis using only summary-level information within a large multi-center study of children. This approach could expand opportunities for multi-center pediatric research, especially when sharing granular individual-level data is challenging.
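The equivalence reported above follows from the structure of ordinary least squares: the coefficient estimates and standard errors depend on the data only through sums that can be computed site by site. The sketch below illustrates this idea under stated assumptions and is not the study's implementation. It assumes horizontally partitioned data (each site holds different children but the same covariates), and the function names, the three synthetic "sites", and the simulated exposure and age covariates are all hypothetical.

```python
import numpy as np

def site_summaries(X, y):
    """Run locally at a site; only these aggregates are shared, never rows."""
    return {"n": X.shape[0], "XtX": X.T @ X, "Xty": X.T @ y, "yty": float(y @ y)}

def distributed_fit(summaries):
    """Coordinating center: add up site aggregates and solve the normal equations."""
    n = sum(s["n"] for s in summaries)
    XtX = sum(s["XtX"] for s in summaries)
    Xty = sum(s["Xty"] for s in summaries)
    yty = sum(s["yty"] for s in summaries)
    beta = np.linalg.solve(XtX, Xty)
    p = XtX.shape[0]
    rss = yty - beta @ Xty  # residual sum of squares, from aggregates only
    sigma2 = rss / (n - p)
    se = np.sqrt(sigma2 * np.diag(np.linalg.inv(XtX)))
    return beta, se

def pooled_fit(X, y):
    """Reference method: least squares on the pooled individual-level rows."""
    beta, _, _, _ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    sigma2 = resid @ resid / (X.shape[0] - X.shape[1])
    se = np.sqrt(sigma2 * np.diag(np.linalg.inv(X.T @ X)))
    return beta, se

# Synthetic data for three hypothetical sites (not study data).
rng = np.random.default_rng(0)
sites = []
for n_site in (200, 350, 120):
    exposure = rng.integers(0, 2, n_site).astype(float)  # any antibiotic use (0/1)
    age_c = rng.uniform(48, 72, n_site) - 60.0           # age in months, centered
    X = np.column_stack([np.ones(n_site), exposure, age_c])
    y = 0.3 + 0.05 * exposure + 0.002 * age_c + rng.normal(0.0, 1.0, n_site)
    sites.append((X, y))

beta_d, se_d = distributed_fit([site_summaries(X, y) for X, y in sites])

X_all = np.vstack([X for X, _ in sites])
y_all = np.concatenate([y for _, y in sites])
beta_p, se_p = pooled_fit(X_all, y_all)

max_diff = max(np.max(np.abs(beta_d - beta_p)), np.max(np.abs(se_d - se_p)))
print("max absolute difference:", max_diff)
```

In this sketch the two fits differ only by floating-point rounding, which is the kind of agreement the Results describe. A real deployment would also need to decide which aggregates are safe to release for small sites; that question is outside the scope of this example.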
Acknowledgements
This work was supported through the Patient-Centered Outcomes Research Institute (PCORI) Program Award (OBS-1505-30699). All statements in this manuscript are solely those of the authors and do not necessarily represent the views of PCORI, its Board of Governors, or its Methodology Committee. The PCORnet Antibiotics and Childhood Growth Study Team includes a diverse group of investigators, research staff, clinicians, community members, and parent caregivers. All members of the team including the study’s Executive Antibiotic Stakeholder Advisory Group (EASAG) contributed to the study design, data acquisition, and interpretation of results. The Study Team would like to thank the leaders of the participating PCORnet Clinical Data Research Networks (CDRNs) and PCORnet Coordinating Center as well as members of the PCORI team for their support and commitment to this project. The funding organization was not involved in the design of the study; the collection, analysis, and interpretation of the data; or the decision to approve publication of the finished manuscript.
PCORnet Antibiotics and Childhood Growth Study Group:
Brad Appelhans, David Arterburn, Janne Boone-Heinenon, Andrew L. Brickman, H. Timothy Bunnell, F. Sessions Cole III, Matthew F. Daley, Amanda Dempsey, Jonathan Finkelstein, Stephanie L. Fitzpatrick, William Heerman, Michael Horberg, Carmen R. Isasi, Melanie Jay, Elyse Kharbanda, Ritu Khare, Dominick Lemas, Simon M. Lin, Mary Jo Messito, Allison O’Neill, Holly Landrum Peay, Micah Prochaska, Daksha Ranade, Goutham Rao, Maria Rayas, Juliane S. Reynolds, Marc Rosenman, Bradley Taylor, Zachary Willis
Author information
Contributions
S.T., S.L.R.S., L.C.B., C.B.F., C.E.H., D.L., E.M., J.L.S., J.G.Y., J.P.B., and the PCORnet Antibiotics and Childhood Growth Study Group contributed substantially to conception and design, acquisition of data, or analysis and interpretation of data; S.T., J.P.B., and P.I.L. drafted the article or revised it critically for important intellectual content; and S.T., J.P.B., L.C.B., and C.B.F. granted final approval of the version to be published.
Ethics declarations
Competing interests
The authors declare no competing interests.
Additional information
Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Members of the “PCORnet Antibiotics and Childhood Growth Study Group” are listed under the Acknowledgements.
About this article
Cite this article
Toh, S., Rifas-Shiman, S.L., Lin, PI.D. et al. Privacy-protecting multivariable-adjusted distributed regression analysis for multi-center pediatric study. Pediatr Res 87, 1086–1092 (2020). https://doi.org/10.1038/s41390-019-0596-0
DOI: https://doi.org/10.1038/s41390-019-0596-0