Table 3 Clinical utility of EHR-based model compared to traditional risk factors and genetic variants

From: Enhancing EHR-based pancreatic cancer prediction with LLM-derived embeddings

	CUMC
Time window (case/control)	6-12 m (338/157,067)
	Max F1			Max PPV			Number of screenings (% total pop) needed for \(x\)% case detection
	TP	FP	PPV	TP	FP	PPV	\(x=20 \%\) (TP = 68)	\(x=50 \%\) (TP = 169)	\(x=80 \%\) (TP = 270)
EHR base (AUROC 0.742)	30	279	0.097	10	37	0.213	4851 (3%)	26757 (17%)	76666 (49%)
EHR gpt (AUROC 0.755)	22	134	0.141	13	43	0.232	4690 (3%)	26349 (17%)	68453 (43%)
One of risk factors	96	24950	0.004	96	24950	0.004
All risk factors	0	54	0.048	0	54	0.048
TP overlap b/w model & one of risk factors	base: 18 (60%)			base: 5 (50%)			base: 43 (63%)	base: 79 (47%)	base: 94 (35%)
TP overlap b/w model & one of risk factors	gpt: 13 (59%)			gpt: 7 (54%)			gpt: 48 (71%)	gpt: 72 (43%)	gpt: 90 (33%)
	CSMC
Time window (case/control)	6-12 m (216/92,084)
	Max F1			Max PPV			Number of screenings (% total pop) needed for \(x\)% case detection
	TP	FP	PPV	TP	FP	PPV	\(x=20 \%\) (TP = 43)	\(x=50 \%\) (TP = 108)	\(x=80 \%\) (TP = 172)
EHR base (AUROC 0.810)	15	164	0.084	2	1	0.667	2234 (2%)	12500 (14%)	32320 (35%)
EHR gpt (AUROC 0.830)	34	2852	0.012	1	61	0.016	4061 (4%)	11413 (12%)	25205 (27%)
One of risk factors	95	14311	0.007	95	14311	0.007
All risk factors	1	60	0.016	1	60	0.016
Genetic variants	5	0	1	5	0	1
TP overlap b/w model & one of risk factors	base: 2 (13%)			base: 0 (0%)			base: 6 (14%)	base: 16 (15%)	base: 19 (11%)
TP overlap b/w model & one of risk factors	gpt: 1 (3%)			gpt: 0 (0%)			gpt: 2 (5%)	gpt: 14 (13%)	gpt: 18 (10%)
TP overlap b/w model & genetic variants	base: 0 (0%)			base: 0 (0%)			base: 0 (0%)	base: 0 (0%)	base: 0 (0%)
TP overlap b/w model & genetic variants	gpt: 0 (0%)			gpt: 0 (0%)			gpt: 0 (0%)	gpt: 0 (0%)	gpt: 0 (0%)

Positive predictive values (PPVs) and the number of screenings required to achieve sensitivities of 20%, 50%, and 80% are presented. For the EHR-based model, both the maximum PPV and the PPV at the threshold that yields the maximum F1 score are shown. The 6–12 m model presented in Fig. 3 was used in this analysis.
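
The arithmetic behind the table entries can be sketched as follows. This is a minimal illustration, not the authors' code: the function names are hypothetical, and the denominator for "% total pop" is assumed to be cases plus controls in the 6-12 m window, which is consistent with the reported percentages. The overlap percentage is assumed to be relative to the model's own true positives.

```python
def ppv(tp: int, fp: int) -> float:
    """Positive predictive value: TP / (TP + FP)."""
    flagged = tp + fp
    if flagged == 0:
        raise ValueError("PPV is undefined when nothing is flagged")
    return tp / flagged


def screened_percent(n_screened: int, n_cases: int, n_controls: int) -> float:
    """Share of the cohort screened, assuming total pop = cases + controls."""
    return 100.0 * n_screened / (n_cases + n_controls)


def overlap_percent(n_overlap: int, model_tp: int) -> float:
    """Share of the model's true positives also flagged by a comparator."""
    return 100.0 * n_overlap / model_tp


# CUMC, EHR base, Max F1 threshold: TP = 30, FP = 279
cumc_base_ppv = ppv(30, 279)

# CUMC, EHR base, 20% case detection: 4851 screened of 338 + 157,067 people
cumc_base_pct = screened_percent(4851, 338, 157_067)

# CUMC, EHR base, Max F1: 18 of the 30 true positives overlap with
# the "one of risk factors" group
cumc_base_overlap = overlap_percent(18, 30)

print(f"PPV = {cumc_base_ppv:.3f}")
print(f"Screened = {cumc_base_pct:.0f}% of cohort")
print(f"Overlap = {cumc_base_overlap:.0f}% of model TPs")
```

Under these assumptions the three values round to the table's 0.097, 3%, and 60% for the CUMC EHR base row.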
