Table 2 Performances of Fu-LLM with different strategies for adjudications of clinical events

From: A large language model for clinical outcome adjudication from telephone follow-up interviews: a secondary analysis of a multicenter randomized clinical trial

	Raw agreement, % (95% CI)	Sensitivity, % (95% CI)	Specificity, % (95% CI)	Positive predictive value, % (95% CI)	Negative predictive value, % (95% CI)
Performances of finetune_qwen2_7b
Whether the information came from the participant himself/herself	97.2 (96.1–98.2)	97.9 (96.7–98.9)	96.2 (94.5–97.9)	97.4 (96.1–98.7)	96.9 (95.2–98.6)
Whether the participant died	99.8 (99.5–100.0)	100.0 (100.0–100.0)	99.8 (99.5–100.0)	91.7 (79.3–100.0)	100.0 (100.0–100.0)
Whether the participant was hospitalized^a	82.7 (80.4–85.0)	88.9 (84.7–93.3)	91.3 (88.9–93.6)	79.1 (73.7–83.9)	95.7 (94.0–97.3)
Whether the participant underwent surgery^a	92.1 (90.3–93.8)	95.3 (91.8–100.0)	89.8 (87.3–92.5)	75.1 (69.5–80.5)	98.4 (97.0–99.4)
Whether the participant took medication^a	96.4 (95.2–97.5)	99.8 (99.4–100.0)	91.0 (86.1–95.5)	98.5 (97.6–99.3)	98.5 (96.2–100.0)
Total	93.7 (93.1–94.3)	97.5 (96.7–98.2)	95.0 (94.2–95.8)	93.1 (91.9–94.2)	98.2 (97.8–98.7)
Performances of finetune_qwen2_7b_wo_aug
Whether the information came from the participant himself/herself	92.7 (91.2–94.3)	95.5 (93.8–97.1)	88.7 (85.6–91.5)	92.5 (90.5–94.6)	93.1 (90.2–95.3)
Whether the participant died	99.6 (99.2–99.9)	100.0 (100.0–100.0)	99.6 (99.2–99.9)	84.6 (70.0–96.8)	100.0 (100.0–100.0)
Whether the participant was hospitalized^a	69.5 (66.5–72.3)	77.9 (72.6–83.1)	89.8 (87.0–92.2)	73.6 (67.9–79.8)	91.7 (89.4–94.0)
Whether the participant underwent surgery^a	87.5 (85.5–89.5)	87.7 (82.8–92.4)	85.7 (82.8–88.7)	66.4 (59.7–72.8)	95.6 (93.7–97.4)
Whether the participant took medication^a	94.8 (93.3–96.2)	98.9 (98.2–99.5)	81.4 (75.2–87.6)	96.8 (95.4–98.0)	92.9 (87.9–96.9)
Total	88.9 (88.1–89.7)	94.4 (93.3–95.4)	92.1 (91.1–93.1)	89.1 (87.7–90.5)	96.0 (95.2–96.7)
Performances of zero_shot_qwen2_7b
Whether the information came from the participant himself/herself	87.2 (85.0–89.0)	91.2 (88.8–93.2)	81.4 (77.7–85.0)	87.8 (85.3–90.1)	86.3 (82.6–89.3)
Whether the participant died	99.4 (98.9–99.8)	100.0 (100.0–100.0)	99.4 (98.9–99.8)	78.6 (63.0–91.7)	100.0 (100.0–100.0)
Whether the participant was hospitalized^a	25.0 (22.4–27.5)	51.4 (45.0–57.9)	10.6 (8.0–13.1)	17.5 (14.5–20.5)	37.3 (29.9–44.9)
Whether the participant underwent surgery^a	67.3 (64.3–70.2)	76.6 (69.9–82.6)	64.4 (60.6–68.3)	40.9 (35.7–46.3)	89.5 (86.3–92.3)
Whether the participant took medication^a	82.9 (80.5–85.2)	83.6 (81.0–85.9)	86.9 (81.3–92.1)	97.3 (96.1–98.4)	48.1 (42.3–54.2)
Total	72.5 (71.3–73.7)	82.1 (80.2–83.6)	70.3 (68.5–71.9)	65.5 (63.5–67.4)	85.1 (83.5–86.5)

CI confidence interval; GPT generative pretrained transformer.
^aFor humanitarian reasons, follow-up staff did not ask about hospitalization, surgery or medication when a participant was reported as dead during follow-up. These three events were therefore not evaluated for death cases (22 recording vignettes reported death events).
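
For readers less familiar with the column headings, the five measures can be illustrated with their standard textbook definitions for a single yes/no event. This is a minimal sketch, not the authors' evaluation code: the `Confusion` class, the `binary_metrics` helper and the counts are hypothetical, and the table does not state how its values or confidence intervals were derived, so this sketch is not expected to reproduce them.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Confusion:
    """Counts of model-vs-reference outcomes for one yes/no event (hypothetical helper)."""
    tp: int  # model says yes, reference says yes
    fp: int  # model says yes, reference says no
    fn: int  # model says no, reference says yes
    tn: int  # model says no, reference says no


def percent(numerator: int, denominator: int) -> float:
    """Return a percentage, or NaN when the denominator is zero."""
    if denominator == 0:
        return float("nan")
    return 100.0 * numerator / denominator


def binary_metrics(c: Confusion) -> dict:
    """Standard definitions of the five measures named in the column headings."""
    total = c.tp + c.fp + c.fn + c.tn
    return {
        "raw_agreement": percent(c.tp + c.tn, total),
        "sensitivity": percent(c.tp, c.tp + c.fn),
        "specificity": percent(c.tn, c.tn + c.fp),
        "ppv": percent(c.tp, c.tp + c.fp),
        "npv": percent(c.tn, c.tn + c.fn),
    }


# Illustrative counts only; these are NOT taken from the study.
example = Confusion(tp=90, fp=10, fn=5, tn=95)
metrics = binary_metrics(example)
for name, value in metrics.items():
    print(f"{name}: {value:.1f}%")
```

With few positive cases, one false positive moves the positive predictive value a long way. That is consistent with the wide intervals in the death rows, where only 22 vignettes reported a death.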
