Table 1. Zero-shot, few-shot (5), and chain-of-thought prediction performance. Data are precision, recall, F1-score, and accuracy.

Class	Precision	Recall	F1-Score	n
BL	1.000	0.844	0.915	32
CR	0.357	0.833	0.500	6
PD	0.588	0.909	0.714	11
PR	0.708	0.793	0.748	58
SD	0.737	0.400	0.519	35
Accuracy			0.718 [0.641 – 0.789]	142
Macro avg	0.678	0.756	0.679 [0.581 – 0.760]	142
Micro avg	0.757	0.718	0.716 [0.641 – 0.789]	142

The F1 score was calculated as the harmonic mean of precision (also known as positive predictive value) and recall (also known as sensitivity). The micro scores were computed by aggregating the true-positive, false-negative, and false-positive findings across all classes. The macro scores were computed by calculating the scores for each class individually and then averaging them, giving equal weight to each class regardless of its size.
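
As a rough check of these definitions, the sketch below recomputes the per-class F1 scores and the macro averages from the precision and recall values in Table 1. The function and variable names (`f1_score`, `per_class`, `f1_by_class`) are illustrative assumptions, not taken from the authors' code, and the inputs are the rounded table values, so results should agree with the table only to within rounding.

```python
from statistics import mean

# Per-class (precision, recall) pairs copied from Table 1 (rounded values).
per_class = {
    "BL": (1.000, 0.844),
    "CR": (0.357, 0.833),
    "PD": (0.588, 0.909),
    "PR": (0.708, 0.793),
    "SD": (0.737, 0.400),
}


def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall; 0.0 if both are zero."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Per-class F1, computed independently for each class.
f1_by_class = {label: f1_score(p, r) for label, (p, r) in per_class.items()}

# Macro averages: unweighted mean over classes, ignoring class size.
macro_precision = mean(p for p, _ in per_class.values())
macro_recall = mean(r for _, r in per_class.values())
macro_f1 = mean(f1_by_class.values())

for label, f1 in f1_by_class.items():
    print(f"{label}: F1 = {f1:.3f}")
print(f"Macro avg: P = {macro_precision:.3f}, "
      f"R = {macro_recall:.3f}, F1 = {macro_f1:.3f}")
```

Micro averaging is not reproduced here because it requires the raw true-positive, false-positive, and false-negative counts per class, which the table does not report.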
