Table 2 Comparison of the accuracy (%) of different prompting strategies on the BulkModulus dataset

From: SLM-MATRIX: a multi-agent trajectory reasoning and verification framework for enhancing language models in materials data extraction

| Method | Qwen2.5-7B-Instruct-Turbo | Mistral-7B-Instruct | gemma-2-9b-it | Llama-3.2-11B-Vision | Llama-3.1-8B-Instruct-Turbo | gpt-4o |
| --- | --- | --- | --- | --- | --- | --- |
| **BulkModulus** |  |  |  |  |  |  |
| Zero-shot CoT | 74.42 | 61.01 | 59.28 | 74.05 | 84.76 | 98.32 |
| Few-shot CoT | 59.22 | 60.34 | 79.89 | 75.14 | 80.90 | 99.44 |
| ToT | 76.97 | 52.35 | 66.48 | 47.40 | 71.60 | 98.88 |
| RAP | 71.34 | 39.13 | 78.09 | 69.57 | 71.26 | 96.65 |
| MoA | 87.90 (±1.71) |  |  |  |  |  |
| SLM-MATRIX | 92.85 (±2.05) |  |  |  |  |  |

  1. Results for MoA and SLM-MATRIX are reported as the mean ± standard deviation across three independent runs. Results for other methods are from single-run evaluations. All values are reported as percentages.
  2. The bold values highlight the best-performing result within a given comparison group.
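
As a worked illustration of footnote 1, here is a minimal sketch of the mean ± standard deviation aggregation over three independent runs, assuming the sample (n − 1) estimator. The per-run accuracies below are hypothetical placeholders chosen to reproduce the reported SLM-MATRIX aggregate; the paper reports only the aggregated numbers.

```python
import statistics

# Hypothetical per-run accuracies (%) for SLM-MATRIX on BulkModulus.
# The paper reports only the aggregate 92.85 (±2.05); these three
# values are illustrative placeholders that reproduce it exactly.
runs = [92.85, 90.80, 94.90]

mean = statistics.mean(runs)   # arithmetic mean of the three runs
std = statistics.stdev(runs)   # sample standard deviation (n - 1 denominator)

print(f"SLM-MATRIX: {mean:.2f} (±{std:.2f})")  # -> SLM-MATRIX: 92.85 (±2.05)
```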