Table 6 Effectiveness analysis of the SLM-MATRIX generator in answer verification across datasets

From: SLM-MATRIX: a multi-agent trajectory reasoning and verification framework for enhancing language models in materials data extraction

| Generator | Llama-3.1-8B-Instruct-Turbo | | Mistral-7B-Instruct | |
|---|---|---|---|---|
| | GSM8K | BulkModulus | GSM8K | BulkModulus |
| Answer verification | Maj | SLM-MATRIX | Maj | SLM-MATRIX |
| RAP | 74.07 | 76.35 | 49.81 | 52.54 |
| SC(@64) | 82.18 | 86.43 | 54.36 | 58.61 |
| SLM-MATRIX | 87.57 | 90.83 | 58.98 | 63.46 |

  1. The bold values highlight the best-performing result within a given comparison group.