Fig. 4: Agreement between tabular and natural language processing-extracted vital signs.
From: Cohort design and natural language processing to reduce bias in electronic health records research

Shown is the agreement between vital signs obtained from tabular data and those obtained from our NLP model among individuals with values from both sources on the same day. Panel a depicts height values, b depicts weight values, c depicts systolic blood pressures, and d depicts diastolic blood pressures. For individuals with multiple eligible values, only the pair most closely preceding the start of follow-up was used. Left panels show the distribution of values obtained from tabular versus NLP sources. Middle panels show the correlation between tabular values (x-axis) and NLP values (y-axis). Right panels are Bland–Altman plots showing agreement between paired tabular and NLP values. The x-axis depicts the mean of each pair of values, and the y-axis depicts the difference between the paired values (tabular minus NLP), where positive values denote tabular values greater than the corresponding NLP values and negative values denote tabular values lower than the corresponding NLP values. The colored horizontal lines depict the mean difference between sources, and the hashed horizontal lines depict 1.96 standard deviations above and below the mean difference. The values corresponding to these bounds and the percentage of values contained within them are printed on each plot.
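
The quantities in the right-hand (Bland–Altman) panels can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' code: the function name `bland_altman`, the returned keys, and the use of the sample standard deviation are all choices made here, and the example values are invented.

```python
from statistics import mean, stdev


def bland_altman(tabular, nlp):
    """Summarise agreement between paired tabular and NLP measurements.

    Hypothetical helper: the name, return keys, and the choice of the
    sample standard deviation are assumptions, not taken from the paper.
    """
    if len(tabular) != len(nlp) or len(tabular) < 2:
        raise ValueError("need two equal-length sequences with at least 2 pairs")

    # x-axis of the plot: mean of each tabular/NLP pair
    pair_means = [(t + n) / 2 for t, n in zip(tabular, nlp)]
    # y-axis of the plot: tabular minus NLP (positive = tabular is larger)
    diffs = [t - n for t, n in zip(tabular, nlp)]

    bias = mean(diffs)   # mean difference (the colored horizontal line)
    sd = stdev(diffs)    # sample standard deviation of the differences
    lower = bias - 1.96 * sd
    upper = bias + 1.96 * sd

    # percentage of paired differences that fall inside the bounds
    inside = sum(1 for d in diffs if lower <= d <= upper)
    pct_within = 100.0 * inside / len(diffs)

    return {
        "pair_means": pair_means,
        "diffs": diffs,
        "bias": bias,
        "lower": lower,
        "upper": upper,
        "pct_within": pct_within,
    }


# Invented height values in cm, for illustration only.
example = bland_altman([170.0, 165.0, 180.0, 175.0],
                       [170.0, 164.0, 181.0, 175.0])
```

Plotting `pair_means` against `diffs`, with horizontal lines at `bias`, `lower`, and `upper`, gives the layout the caption describes.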