Table 1 Datasets curated from the 11 sites of the RECOVER Initiative

From: Extracting post-acute sequelae of SARS-CoV-2 infection symptoms from clinical notes via hybrid natural language processing

| Dataset | Patients | Notes | Sentences | Mentions | Sites |
|---|---|---|---|---|---|
| Model Development Dataset | | | | | |
| WCM Training | 30 | 30 | 642 | 1,324 | Weill Cornell Medicine (WCM) |
| 2010 i2b2 assertion dataset [27] | - | - | 3,055 | 4,243 | Partners Healthcare, Beth Israel Deaconess Medical Center, University of Pittsburgh Medical Center |
| Internal Validation | 30 | 30 | 350 | 953 | WCM |
| Multi-site External Validation | 100 | 100 | 1,113 | 1,886 | Medical College of Wisconsin, Cincinnati Children’s Hospital Medical Center, The Children’s Hospital of Philadelphia, University of Missouri, Nationwide Children’s Hospital, Nemours Children’s Health System, Oregon Community Health Information Network, Seattle Children’s, UT Southwestern Medical Center, and Montefiore Medical Center |
| Population-level Prevalence Study | 47,654 | 47,654 | - | - | Weill Cornell Medicine, Medical College of Wisconsin, Cincinnati Children’s Hospital Medical Center, The Children’s Hospital of Philadelphia, University of Missouri, Nationwide Children’s Hospital, Nemours Children’s Health System, Oregon Community Health Information Network, Seattle Children’s, UT Southwestern Medical Center, and Montefiore Medical Center |
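The counts in Table 1 can be cross-checked with a small Python sketch. It transcribes the table into a hypothetical `DatasetRow` record. The class, its field names, and the helper functions are illustrative assumptions, not code from the study. From those rows it derives the average number of annotated mentions per sentence and the combined sentence and mention totals for the annotated datasets drawn from RECOVER sites, leaving out the external i2b2 corpus. Cells shown as "-" in the table are encoded as `None`.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class DatasetRow:
    """One row of Table 1 (hypothetical encoding); None marks a cell shown as '-'."""
    name: str
    patients: Optional[int]
    notes: Optional[int]
    sentences: Optional[int]
    mentions: Optional[int]
    n_sites: int  # number of institutions listed in the Sites column


# Counts transcribed from Table 1.
TABLE_1 = [
    DatasetRow("WCM Training", 30, 30, 642, 1324, 1),
    DatasetRow("2010 i2b2 assertion dataset", None, None, 3055, 4243, 3),
    DatasetRow("Internal Validation", 30, 30, 350, 953, 1),
    DatasetRow("Multi-site External Validation", 100, 100, 1113, 1886, 10),
    DatasetRow("Population-level Prevalence Study", 47654, 47654, None, None, 11),
]


def mentions_per_sentence(row: DatasetRow) -> Optional[float]:
    """Average annotated mentions per sentence, or None if either count is missing."""
    if row.sentences is None or row.mentions is None:
        return None
    return row.mentions / row.sentences


def recover_annotated_totals(rows: list[DatasetRow]) -> tuple[int, int]:
    """Sum sentences and mentions over annotated rows, excluding the i2b2 corpus."""
    annotated = [
        r for r in rows
        if "i2b2" not in r.name and r.sentences is not None and r.mentions is not None
    ]
    return sum(r.sentences for r in annotated), sum(r.mentions for r in annotated)


if __name__ == "__main__":
    for row in TABLE_1:
        ratio = mentions_per_sentence(row)
        shown = "n/a" if ratio is None else f"{ratio:.2f}"
        print(f"{row.name}: {shown} mentions/sentence")
    print(recover_annotated_totals(TABLE_1))
```

Run as a script, it prints one ratio per row (2.06 for WCM Training, 1.69 for the external validation set) and "n/a" for the prevalence study, which reports no sentence or mention counts. It then prints the totals (2105, 4163). The `n_sites` field shows that the 11 sites in the caption are WCM plus the 10 external-validation sites.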