Table 4 Statistical comparison of classification performance between prompting strategies using McNemar’s test.
From: Automated RECIST tumor response classification through prompt-guided large language models
| Comparison | p-value | Corrected p-value |
|---|---|---|
| Zero-Shot vs. Few-Shot (5) | 0.4725 | 1.000 |
| Zero-Shot vs. CoT + SC | 0.0000 | 0.0000 |
| Few-Shot (5) vs. CoT + SC | 0.0000 | 0.0000 |
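
As a rough illustration of how values like these can be produced, the sketch below computes an exact two-sided McNemar p-value from the two discordant counts of a paired comparison and then adjusts a family of p-values. The table does not state which variant of McNemar's test or which correction method was used. The exact binomial form and the Bonferroni adjustment here are assumptions, chosen because capping 3 × 0.4725 at 1 reproduces the corrected value of 1.000 in the first row. The function names `mcnemar_exact` and `bonferroni` and the discordant counts in the usage lines are hypothetical.

```python
from math import comb


def mcnemar_exact(b: int, c: int) -> float:
    """Exact two-sided McNemar p-value from the discordant counts.

    b: cases strategy A classified correctly and strategy B did not.
    c: cases strategy B classified correctly and strategy A did not.
    """
    n = b + c
    if n == 0:
        return 1.0  # no discordant pairs, so no evidence of a difference
    k = min(b, c)
    # Under the null hypothesis, each discordant pair falls either way
    # with probability 0.5, so the smaller count follows Binomial(n, 0.5).
    tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return min(1.0, 2 * tail)


def bonferroni(p_values: list[float]) -> list[float]:
    """Multiply each p-value by the number of comparisons, capped at 1."""
    m = len(p_values)
    return [min(1.0, p * m) for p in p_values]


# Raw p-values as printed in the table (rounded to four decimals there).
raw = [0.4725, 0.0000, 0.0000]
corrected = bonferroni(raw)

# Hypothetical discordant counts, for illustration only.
p_example = mcnemar_exact(0, 10)
```

Applied to the three raw p-values in the table, `bonferroni` returns 1.0 for the first comparison and 0.0 for the other two, which agrees with the corrected column at the printed precision. That agreement is consistent with a Bonferroni adjustment but does not confirm it.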