Fig. 2: Performance of the SLM-MATRIX framework on the BulkModulus dataset.

The bar chart compares the accuracy of various language models and reasoning strategies. Our proposed framework, SLM-MATRIX (highlighted in red), is benchmarked against the state-of-the-art model GPT-4o (orange bars) and other open-source models (blue bars). Each y-axis label gives a combination of base model and reasoning strategy, and the x-axis gives task accuracy in percent. The vertical dashed line marks the accuracy achieved by SLM-MATRIX, for easy comparison against all other methods.