Table 2 Comparison of the accuracy (%) of different prompting strategies on the BulkModulus dataset

From: SLM-MATRIX: a multi-agent trajectory reasoning and verification framework for enhancing language models in materials data extraction

| Method | Qwen2.5-7B-Instruct-Turbo | Mistral-7B-Instruct | gemma-2-9b-it | Llama-3.2-11B-Vision | Llama-3.1-8B-Instruct-Turbo | gpt-4o |
| --- | --- | --- | --- | --- | --- | --- |
| **BulkModulus** |  |  |  |  |  |  |
| Zero-shot CoT | 74.42 | 61.01 | 59.28 | 74.05 | 84.76 | 98.32 |
| Few-shot CoT | 59.22 | 60.34 | 79.89 | 75.14 | 80.90 | 99.44 |
| ToT | 76.97 | 52.35 | 66.48 | 47.40 | 71.60 | 98.88 |
| RAP | 71.34 | 39.13 | 78.09 | 69.57 | 71.26 | 96.65 |
| MoA | 87.90 (±1.71) |  |  |  |  |  |
| SLM-MATRIX | 92.85 (±2.05) |  |  |  |  |  |

  1. Results for MoA and SLM-MATRIX are reported as the mean ± standard deviation across three independent runs. Results for other methods are from single-run evaluations. All values are reported as percentages.
  2. The bold values highlight the best-performing result within a given comparison group.
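
As a worked illustration of footnote 1, here is a minimal sketch of the mean ± standard deviation aggregation over three independent runs, assuming the sample (n − 1) estimator. The per-run accuracies below are hypothetical placeholders chosen to reproduce the reported SLM-MATRIX aggregate; the paper reports only the aggregated numbers.

```python
import statistics

# Hypothetical per-run accuracies (%) for SLM-MATRIX on BulkModulus.
# The paper reports only the aggregate 92.85 (±2.05); these three
# values are illustrative placeholders that reproduce it exactly.
runs = [92.85, 90.80, 94.90]

mean = statistics.mean(runs)   # arithmetic mean of the three runs
std = statistics.stdev(runs)   # sample standard deviation (n - 1 denominator)

print(f"SLM-MATRIX: {mean:.2f} (±{std:.2f})")  # -> SLM-MATRIX: 92.85 (±2.05)
```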