Table 2: Comparison of the accuracy (%) of different prompting strategies on the BulkModulus dataset
Method | Qwen2.5-7B-Instruct-Turbo | Mistral-7B-Instruct | gemma-2-9b-it | Llama-3.2-11B-Vision | Llama-3.1-8B-Instruct-Turbo | gpt-4o |
---|---|---|---|---|---|---|
**BulkModulus** | | | | | | |
Zero-shot CoT | 74.42 | 61.01 | 59.28 | 74.05 | 84.76 | 98.32 |
Few-shot CoT | 59.22 | 60.34 | 79.89 | 75.14 | 80.90 | 99.44 |
ToT | 76.97 | 52.35 | 66.48 | 47.40 | 71.60 | 98.88 |
RAP | 71.34 | 39.13 | 78.09 | 69.57 | 71.26 | 96.65 |
MoA | 87.90 (±1.71) | | | | | |
SLM-MATRIX | 92.85 (±2.05) | | | | | |