Table 3 Performance comparison on the MatSynTriplet dataset

From: SLM-MATRIX: a multi-agent trajectory reasoning and verification framework for enhancing language models in materials data extraction

| Method | Average Accuracy (%) | Performance Gap to GPT-4o |
| --- | --- | --- |
| MatSynTriplet | | |
| MoA | 69.86 ± 2.66 | −15.49 pp |
| SLM-MATRIX | 77.68 ± 1.81 | −7.67 pp |
| GPT-4o | 85.35 | 0 pp (baseline) |

  1. The average accuracies (%) for MoA and SLM-MATRIX are reported over three independent runs. The result for GPT-4o (Few-shot CoT) is based on a single evaluation.
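
The gap column appears to be each method's average accuracy minus the GPT-4o accuracy, expressed in percentage points (pp); the reported values are consistent with that reading. A minimal sketch of the arithmetic, assuming that definition, is given here. The dictionary and function names are illustrative and do not come from the paper.

```python
# Average accuracies (%) copied from Table 3.
# The names `accuracy` and `gap_to_baseline` are hypothetical, for illustration only.
accuracy = {"MoA": 69.86, "SLM-MATRIX": 77.68, "GPT-4o": 85.35}


def gap_to_baseline(acc, baseline="GPT-4o"):
    """Return each method's accuracy minus the baseline's, in percentage points."""
    ref = acc[baseline]
    # Round to two decimals to match the precision reported in the table.
    return {name: round(value - ref, 2) for name, value in acc.items()}


gaps = gap_to_baseline(accuracy)
for name, gap in gaps.items():
    print(f"{name}: {gap:+.2f} pp")
```

The gaps are absolute differences in percentage points, not relative percentages, so they can be read directly against the accuracy column.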