Table 1 Zero-shot, few-shot (5) and Chain of thought prediction performance with Data are precision, recall, F1-score and accuracy.

From: Automated RECIST tumor response classification through prompt-guided large language models

| Class | Precision | Recall | F1-Score | n |
|---|---|---|---|---|
| BL | 1 | 0,844 | 0,915 | 32 |
| CR | 0,357 | 0,833 | 0,5 | 6 |
| PD | 0,588 | 0,909 | 0,714 | 11 |
| PR | 0,708 | 0,793 | 0,748 | 58 |
| SD | 0,737 | 0,4 | 0,519 | 35 |
| Accuracy | | | 0,718 [0,641 – 0,789] | 142 |
| Macro avg | 0,678 | 0,756 | 0,679 [0,581 – 0,760] | 142 |
| Micro avg | 0,757 | 0,718 | 0,716 [0,641 – 0,789] | 142 |

  1. The F1 score was calculated as the harmonic mean of precision (also known as positive predictive value) and recall (also known as sensitivity). The micro scores were computed by aggregating the true-positive, false-negative, and false-positive findings across all classes. The macro scores were computed by calculating the scores for each class individually and then averaging them, giving equal weight to each class regardless of its size.
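
The F1 and macro-average definitions in the footnote can be sketched in a few lines of Python. This is an illustrative sketch and not the authors' evaluation code: the helper names `f1_score` and `macro_average` are hypothetical, and the inputs are the rounded per-class precision and recall values from Table 1, so results may differ from the published figures in the last digit. Micro averaging is not shown because it needs the raw true-positive, false-positive, and false-negative counts, which the table does not report.

```python
# Per-class (precision, recall) pairs copied from Table 1 (rounded values).
per_class = {
    "BL": (1.000, 0.844),
    "CR": (0.357, 0.833),
    "PD": (0.588, 0.909),
    "PR": (0.708, 0.793),
    "SD": (0.737, 0.400),
}


def f1_score(precision, recall):
    """Harmonic mean of precision and recall; 0.0 when both are zero."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def macro_average(values):
    """Unweighted mean, so every class counts equally regardless of size."""
    values = list(values)
    return sum(values) / len(values)


# Per-class F1, then macro averages over the five classes.
f1_by_class = {label: f1_score(p, r) for label, (p, r) in per_class.items()}
macro_precision = macro_average(p for p, _ in per_class.values())
macro_recall = macro_average(r for _, r in per_class.values())
macro_f1 = macro_average(f1_by_class.values())

for label, f1 in f1_by_class.items():
    print(f"{label}: F1 = {f1:.3f}")
print(f"Macro avg: P = {macro_precision:.3f}, "
      f"R = {macro_recall:.3f}, F1 = {macro_f1:.3f}")
```

Because the macro average ignores class size, the six CR cases influence it as much as the 58 PR cases, which is why it can sit well below the accuracy when small classes score poorly.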