Table 2 Model performance—one-shot prompting
From: Privacy-preserving large language models for structured medical information retrieval
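Feature | Sensitivity (7b) | Sensitivity (13b) | Sensitivity (70b) | Specificity (7b) | Specificity (13b) | Specificity (70b) | Positive predictive value (7b) | Positive predictive value (13b) | Positive predictive value (70b) | Negative predictive value (7b) | Negative predictive value (13b) | Negative predictive value (70b) | Accuracy (7b) | Accuracy (13b) | Accuracy (70b) |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|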
Ascites | 0.95 | 1.00 | 0.95 | 0.94 | 0.76 | 0.99 | 0.38 | 0.13 | 0.79 | 1.00 | 1.00 | 1.00 | 0.94 | 0.76 | 0.99 |
Abdominal pain | 0.99 | 0.95 | 0.92 | 0.18 | 0.68 | 0.92 | 0.22 | 0.40 | 0.72 | 0.99 | 0.98 | 0.98 | 0.33 | 0.73 | 0.92 |
Shortness of breath | 0.64 | 0.59 | 0.83 | 0.95 | 0.98 | 0.96 | 0.72 | 0.87 | 0.82 | 0.92 | 0.91 | 0.96 | 0.89 | 0.91 | 0.94 |
Confusion | 0.71 | 0.85 | 0.88 | 0.85 | 0.78 | 0.94 | 0.30 | 0.25 | 0.56 | 0.97 | 0.98 | 0.99 | 0.84 | 0.79 | 0.93 |
Liver cirrhosis | 1.00 | 1.00 | 1.00 | 0.65 | 0.76 | 0.97 | 0.14 | 0.18 | 0.69 | 1.00 | 1.00 | 1.00 | 0.67 | 0.77 | 0.98 |
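Table 2 reports five standard binary-classification metrics for each feature and model size. The following Python sketch shows how these metrics are conventionally computed from a confusion matrix. It assumes that each feature is scored as present or absent against a reference label. The names `ConfusionCounts`, `metrics` and `ppv_from_rates`, and all counts used with them, are hypothetical illustrations and are not taken from the paper's code or data. The sketch also applies Bayes' rule to PPV, which shows that a classifier with high specificity can still have a modest PPV when a feature is rare.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConfusionCounts:
    """Confusion-matrix counts for one binary feature (hypothetical values)."""
    tp: int  # feature present in reference, model says present
    fp: int  # feature absent in reference, model says present
    tn: int  # feature absent in reference, model says absent
    fn: int  # feature present in reference, model says absent


def _ratio(num: int, den: int) -> float:
    # Return NaN instead of raising when a denominator is empty
    # (e.g. no reference positives for a feature).
    return num / den if den else float("nan")


def metrics(c: ConfusionCounts) -> dict[str, float]:
    """Standard definitions of the five metrics shown in Table 2."""
    total = c.tp + c.fp + c.tn + c.fn
    return {
        "sensitivity": _ratio(c.tp, c.tp + c.fn),  # recall on positives
        "specificity": _ratio(c.tn, c.tn + c.fp),  # recall on negatives
        "ppv": _ratio(c.tp, c.tp + c.fp),          # precision
        "npv": _ratio(c.tn, c.tn + c.fn),
        "accuracy": _ratio(c.tp + c.tn, total),
    }


def ppv_from_rates(sensitivity: float, specificity: float, prevalence: float) -> float:
    """Bayes' rule: PPV depends on prevalence as well as sensitivity/specificity."""
    true_pos = sensitivity * prevalence
    false_pos = (1 - specificity) * (1 - prevalence)
    return true_pos / (true_pos + false_pos)


if __name__ == "__main__":
    # Hypothetical counts, chosen only to illustrate the formulas.
    example = ConfusionCounts(tp=18, fp=10, tn=70, fn=2)
    for name, value in metrics(example).items():
        print(f"{name}: {value:.3f}")
    # Same sensitivity and specificity, different prevalence.
    print(f"PPV at 5% prevalence:  {ppv_from_rates(0.95, 0.95, 0.05):.2f}")
    print(f"PPV at 50% prevalence: {ppv_from_rates(0.95, 0.95, 0.50):.2f}")
```

Consider a feature where sensitivity and specificity are both 0.95. Its PPV is about 0.50 at 5% prevalence but 0.95 at 50% prevalence. NPV stays high in the low-prevalence case. For the same reason, accuracy behaves as a prevalence-weighted average of sensitivity and specificity.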