Table 2 Summary of the NLP/ML component outcomes.

Phenotype	People Involved	Charts reviewed	Precision	Recall	Comments
Chronic rhinosinusitis	2	126	76% → 78–83%	97% → 100%	Also significant improvement on specificity
ECG traits	1–3	1050	Cases: 80–100% Controls: 94–99%	N/A	Unable to extract 1 sub-phenotype; precision varied between sub-phenotypes
Systemic lupus erythematosus	2–3	1022	99% → 96%	79% → 91%	2/3 sub-phenotypes performed better at validation site
Asthma/chronic obstructive pulmonary disease overlap	1–2	300	90% → 91%	38% → 54%	Although overall improved, performed worse at validation site possibly due to how the ML model used counts of features
Familial hypercholesterolemia	1–4	150	96–98% → 74–96%	N/A	Negative predictive value decreased
Atopic dermatitis	1–3	150	73–79% → 72–84%	51–54% → 63–75%	Mixed results across sub-phenotypes & sites

The “People Involved” column lists the estimated number, or range, of full-time equivalent persons involved in all aspects of the implementation, including programmers, clinicians, and computational linguists. “Charts reviewed” is the total number of patients’ charts reviewed for each phenotype: the sum of the charts reviewed for cases, and for controls if applicable, at both the lead and validating sites. The Precision and Recall columns list those statistics for the original rule-based computable phenotype algorithm vs. the new rule-based computable phenotype algorithm with NLP components added; arrows indicate the change in these statistics from the original to the new algorithm. Some algorithms have a range for precision or recall, either because multiple (secondary) validation sites reviewed patients’ charts from which accuracy statistics were calculated, or because separate precision/recall measures were reported for sub-phenotypes. N/A: not applicable. Recall was not targeted for improvement in all phenotypes; thus, it was not calculated for all phenotypes.
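For readers less familiar with the two statistics, the following is a minimal sketch of how precision and recall are conventionally computed from chart-review counts. The function names and the counts in the usage lines are hypothetical, chosen only for illustration; they are not taken from any phenotype in the table, and the sketch is not the authors' validation code.

```python
from typing import Optional


def precision(true_pos: int, false_pos: int) -> Optional[float]:
    """Fraction of algorithm-flagged patients confirmed as cases by chart review."""
    flagged = true_pos + false_pos
    # Undefined when the algorithm flagged nobody.
    return true_pos / flagged if flagged else None


def recall(true_pos: int, false_neg: int) -> Optional[float]:
    """Fraction of chart-confirmed cases that the algorithm flagged."""
    confirmed = true_pos + false_neg
    # Undefined when chart review confirmed no cases.
    return true_pos / confirmed if confirmed else None


# Hypothetical counts, for illustration only (not from the study).
tp, fp, fn = 45, 5, 15
print(f"precision = {precision(tp, fp):.0%}")  # 45 / (45 + 5)
print(f"recall    = {recall(tp, fn):.0%}")     # 45 / (45 + 15)
```

Recall requires knowing the false negatives, that is, reviewing charts of patients the algorithm did not flag. This is consistent with recall being reported as N/A for phenotypes where it was not targeted.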
