Table 2 Summary of the NLP/ML component outcomes.

From: Evaluation of the portability of computable phenotypes with natural language processing in the eMERGE network

| Phenotype | People Involved | Charts reviewed | Precision | Recall | Comments |
|---|---|---|---|---|---|
| Chronic rhinosinusitis | 2 | 126 | 76% → 78–83% | 97% → 100% | Also significant improvement on specificity |
| ECG traits | 1–3 | 1050 | Cases: 80–100%; Controls: 94–99% | N/A | Unable to extract 1 sub-phenotype; precision varied between sub-phenotypes |
| Systemic lupus erythematosus | 2–3 | 1022 | 99% → 96% | 79% → 91% | 2/3 sub-phenotypes performed better at validation site |
| Asthma/chronic obstructive pulmonary disease overlap | 1–2 | 300 | 90% → 91% | 38% → 54% | Although overall improved, performed worse at validation site, possibly due to how the ML model used counts of features |
| Familial hypercholesterolemia | 1–4 | 150 | 96–98% → 74–96% | N/A | Negative predictive value decreased |
| Atopic dermatitis | 1–3 | 150 | 73–79% → 72–84% | 51–54% → 63–75% | Mixed results across sub-phenotypes and sites |

  1. The “People Involved” column lists the estimated number, or range, of full-time equivalent persons involved in all aspects of the implementation, including programmers, clinicians, and computational linguists. “Charts reviewed” is the total number of patients’ charts reviewed for each phenotype: the sum of the charts reviewed for cases (and controls, if applicable) at both the lead and validating sites. The Precision and Recall columns list those statistics for the original rule-based computable phenotype algorithm vs. the new rule-based computable phenotype algorithm with NLP components added; arrows indicate the change in these statistics from the original to the new algorithm. Some algorithms have a range for precision or recall, either because multiple (secondary) validation sites reviewed patients’ charts from which accuracy statistics were calculated, or because separate precision/recall measures were reported for sub-phenotypes. N/A: not applicable. Recall was not targeted for improvement in all phenotypes and thus was not calculated for all of them.
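
For readers less familiar with the statistics in the Precision and Recall columns, the following is a minimal sketch of how such values are conventionally derived from chart-review counts. It uses the standard definitions (precision = TP / (TP + FP), recall = TP / (TP + FN)); the function names and the counts are hypothetical and are not taken from the study, which may have computed its statistics differently.

```python
def precision(tp: int, fp: int) -> float:
    """Fraction of algorithm-flagged cases that chart review confirmed."""
    if tp + fp == 0:
        raise ValueError("no algorithm-positive charts were reviewed")
    return tp / (tp + fp)


def recall(tp: int, fn: int) -> float:
    """Fraction of chart-confirmed cases that the algorithm flagged."""
    if tp + fn == 0:
        raise ValueError("no chart-confirmed cases were reviewed")
    return tp / (tp + fn)


# Hypothetical chart-review counts, for illustration only (not study data):
# tp = flagged and confirmed, fp = flagged but not confirmed,
# fn = confirmed but missed by the algorithm.
tp, fp, fn = 45, 5, 15

print(f"precision = {precision(tp, fp):.0%}")  # precision = 90%
print(f"recall    = {recall(tp, fn):.0%}")     # recall    = 75%
```

Estimating recall requires reviewing charts the algorithm did not flag in order to find missed cases, which is consistent with recall being reported as N/A for phenotypes where it was not targeted.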