Table 2 Zero-shot, few-shot (5) and chain-of-thought prediction performance. Data are precision, recall, F1-score and accuracy.

From: Automated RECIST tumor response classification through prompt-guided large language models

| Class | Precision | Recall | F1-Score | n |
|---|---|---|---|---|
| BL | 1 | 0.87 | 0.93 | 23 |
| CR | 0.053 | 1 | 0.1 | 1 |
| PD | 1 | 0.8 | 0.889 | 10 |
| PR | 0.643 | 0.551 | 0.593 | 49 |
| SD | 0.714 | 0.588 | 0.645 | 34 |
| Accuracy | | | 0.65 [0.564 – 0.735] | 117 |
| Macro avg | 0.682 | 0.762 | 0.632 [0.555 – 0.700] | 117 |
| Micro avg | 0.759 | 0.65 | 0.696 [0.564 – 0.735] | 117 |

  1. The F1 score was calculated as the harmonic mean of precision (also known as positive predictive value) and recall (also known as sensitivity). The micro scores were computed by aggregating the true-positive, false-negative, and false-positive findings across all classes. The macro scores were computed by calculating the scores for each class individually and then averaging them, giving equal weight to each class regardless of its size.
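The averaging rules in the footnote can be illustrated with a minimal Python sketch. It is not the authors' code. The per-class precision and recall values are copied from the table, while the function names (`f1`, `micro_scores`) and the `toy_counts` data are hypothetical and serve only to show the micro-averaging step, because the table does not report raw true-positive, false-positive and false-negative counts.

```python
from statistics import mean


def f1(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (0 if both are 0)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Per-class (precision, recall) as reported in the table above.
per_class = {
    "BL": (1.0, 0.87),
    "CR": (0.053, 1.0),
    "PD": (1.0, 0.8),
    "PR": (0.643, 0.551),
    "SD": (0.714, 0.588),
}

# Macro average: unweighted mean of the per-class scores,
# so the single CR case counts as much as the 49 PR cases.
macro_precision = mean(p for p, _ in per_class.values())
macro_recall = mean(r for _, r in per_class.values())
macro_f1 = mean(f1(p, r) for p, r in per_class.values())


def micro_scores(counts):
    """Micro average: pool TP, FP and FN over all classes, then score once."""
    tp = sum(c["tp"] for c in counts.values())
    fp = sum(c["fp"] for c in counts.values())
    fn = sum(c["fn"] for c in counts.values())
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return precision, recall, f1(precision, recall)


# Hypothetical counts for illustration only; NOT taken from the study.
toy_counts = {
    "A": {"tp": 8, "fp": 2, "fn": 1},
    "B": {"tp": 1, "fp": 3, "fn": 4},
}
micro_precision, micro_recall, micro_f1 = micro_scores(toy_counts)

print(f"macro: P={macro_precision:.3f} R={macro_recall:.3f} F1={macro_f1:.3f}")
print(f"micro (toy data): P={micro_precision:.3f} R={micro_recall:.3f} F1={micro_f1:.3f}")
```

With the rounded per-class values from the table, the macro averages computed this way agree with the reported macro row to within rounding. Macro averaging lets the single CR case pull the average precision down sharply, whereas micro averaging is dominated by the larger PR and SD classes.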