Table 4. Statistical comparison of classification performance between prompting strategies using McNemar’s test.

From: Automated RECIST tumor response classification through prompt-guided large language models

Comparison                    p-value     Corrected p-value
Zero-Shot vs. Few-Shot (5)    0.4725      1.000
Zero-Shot vs. CoT + SC        < 0.0001    < 0.0001
Few-Shot (5) vs. CoT + SC     < 0.0001    < 0.0001
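
For readers who want to see how pairwise p-values of this kind can be obtained, the following is a minimal sketch of an exact two-sided McNemar test followed by a multiple-comparison correction. It is an illustration under stated assumptions, not the authors' implementation: the table does not say which variant of McNemar's test or which correction method was used, so the exact binomial form and the Bonferroni correction are assumptions here, and the discordant counts in the usage example are hypothetical. The function names `mcnemar_exact` and `bonferroni` are likewise introduced only for this sketch.

```python
from math import comb


def mcnemar_exact(b: int, c: int) -> float:
    """Two-sided exact McNemar p-value.

    b: cases strategy A classified correctly and strategy B incorrectly
    c: cases strategy A classified incorrectly and strategy B correctly
    Only these discordant pairs enter the test; under the null hypothesis
    each discordant pair is equally likely to fall in b or c.
    """
    n = b + c
    if n == 0:
        return 1.0  # no discordant pairs, so no evidence of a difference
    k = min(b, c)
    # Binomial(n, 0.5) probability of observing k or fewer in the smaller cell
    tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return min(1.0, 2.0 * tail)


def bonferroni(p_values):
    """Bonferroni correction (assumed here): multiply by the number of tests, cap at 1."""
    m = len(p_values)
    return [min(1.0, p * m) for p in p_values]


# Hypothetical discordant counts for the three pairwise comparisons
# (NOT taken from the study).
hypothetical_counts = {
    "Zero-Shot vs. Few-Shot (5)": (12, 9),
    "Zero-Shot vs. CoT + SC": (3, 40),
    "Few-Shot (5) vs. CoT + SC": (2, 38),
}

raw = [mcnemar_exact(b, c) for b, c in hypothetical_counts.values()]
corrected = bonferroni(raw)
for name, p, p_corr in zip(hypothetical_counts, raw, corrected):
    print(f"{name}: p = {p:.4f}, corrected p = {p_corr:.4f}")
```

Under a Bonferroni correction with three comparisons, a raw p-value of 0.4725 would be multiplied by 3 and capped at 1, which agrees with the corrected value in the first row, although this alone does not confirm which correction the authors applied.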