Table 2 Overall performance on MH-1813 test data, performance without 1-1-2 training data, and performance on data from 2021 without diagnostic categories, as well as performance on MH-1813 based on demographic subgroups (age/sex) [mean (95% CI)].

From: A retrospective study on machine learning-assisted stroke recognition for medical helpline calls

 

| Subset | Predictor | F1-score [%] ↑ | Sensitivity [%] ↑ | PPV [%] ↑ | FOR [%] ↓ (1 - NPV) | FPR [%] ↓ (1 - specificity) |
|---|---|---|---|---|---|---|
| Overall | Call-takers | 25.8 (23.7–27.9) | 52.7 (49.2–56.4) | 17.1 (15.5–18.6) | 0.105 (0.094–0.116) | 0.565 (0.539–0.590) |
| Overall | Model | 35.7 (35.0–36.4) | 63.0 (62.0–64.1) | 24.9 (24.3–25.5) | 0.082 (0.079–0.085) | 0.419 (0.413–0.426) |
| w/o 1-1-2 training data | Model | 32.4 (31.8–33.1) | 60.4 (59.3–61.4) | 22.2 (21.6–22.7) | 0.088 (0.085–0.091) | 0.467 (0.460–0.474) |
| 2021 test data w/o category | Model | 32.6 (31.9–33.4) | 48.3 (47.2–49.4) | 24.7 (23.9–25.3) | 0.153 (0.148–0.158) | 0.435 (0.427–0.443) |
| 8–64 years | Call-takers | 15.9 (13.1–18.5) | 50.5 (43.6–57.2) | 9.40 (7.61–11.18) | 0.036 (0.028–0.043) | 0.353 (0.331–0.375) |
| 8–64 years | Model | 22.9 (21.8–24.0) | 54.1 (52.1–56.3) | 14.5 (13.8–15.3) | 0.033 (0.031–0.035) | 0.231 (0.226–0.236) |
| 65+ years | Call-takers | 32.9 (30.1–35.7) | 53.5 (49.4–57.6) | 23.7 (21.4–26.0) | 0.401 (0.352–0.449) | 1.467 (1.373–1.560) |
| 65+ years | Model | 42.8 (41.9–43.7) | 66.3 (65.1–67.5) | 31.6 (30.8–32.4) | 0.290 (0.278–0.303) | 1.224 (1.198–1.249) |
| Male | Call-takers | 30.2 (27.2–33.3) | 53.9 (49.1–58.9) | 21.0 (18.5–23.5) | 0.124 (0.105–0.141) | 0.542 (0.506–0.580) |
| Male | Model | 39.0 (38.0–40.1) | 63.7 (62.3–65.2) | 28.1 (27.3–29.0) | 0.097 (0.093–0.102) | 0.435 (0.425–0.445) |
| Female | Call-takers | 21.9 (19.1–24.6) | 51.3 (46.0–56.6) | 13.9 (12.0–15.8) | 0.090 (0.076–0.103) | 0.582 (0.547–0.616) |
| Female | Model | 32.4 (31.4–33.4) | 62.3 (60.7–63.8) | 21.9 (21.1–22.7) | 0.069 (0.066–0.073) | 0.407 (0.399–0.416) |

NPV, negative predictive value; PPV, positive predictive value; FOR, false omission rate; FPR, false positive rate; CI, confidence interval.
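
The column definitions above (FOR = 1 - NPV, FPR = 1 - specificity, F1 as the harmonic mean of PPV and sensitivity) can be illustrated with a short Python sketch. This is not the study's evaluation code: the function names are hypothetical and chosen for illustration. The final lines recompute the F1-score for the two "Overall" rows from their reported PPV and sensitivity. Because the table reports means with confidence intervals and the inputs are rounded, such a recomputation should be expected to agree only approximately.

```python
def f1_from_ppv_sensitivity(ppv: float, sensitivity: float) -> float:
    """Harmonic mean of PPV (precision) and sensitivity (recall).

    Inputs and output share the same unit (e.g. percent).
    """
    if ppv + sensitivity == 0:
        return 0.0
    return 2 * ppv * sensitivity / (ppv + sensitivity)


def metrics_from_counts(tp: int, fp: int, fn: int, tn: int) -> dict:
    """Return the five table metrics, in percent, from confusion-matrix counts.

    Hypothetical helper; assumes every denominator is non-zero.
    """
    return {
        "f1": 100 * 2 * tp / (2 * tp + fp + fn),
        "sensitivity": 100 * tp / (tp + fn),
        "ppv": 100 * tp / (tp + fp),
        "for": 100 * fn / (fn + tn),  # false omission rate = 1 - NPV
        "fpr": 100 * fp / (fp + tn),  # false positive rate = 1 - specificity
    }


# Recompute F1 for the "Overall" rows from the reported PPV and sensitivity.
call_takers_f1 = f1_from_ppv_sensitivity(ppv=17.1, sensitivity=52.7)
model_f1 = f1_from_ppv_sensitivity(ppv=24.9, sensitivity=63.0)
print(round(call_takers_f1, 1), round(model_f1, 1))  # prints: 25.8 35.7
```

Both recomputed values match the F1-scores reported in the table to one decimal place.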