Table 2 Prompt sensitivity: MAE, MSE, 1 − R² and expression-tree distance for each LLM–SR pair under eight prompt variants (A–H). Lower is better for all four metrics.
| Idx | Prompt | LLM | SR | MAE | MSE | 1 − R² | Expression tree distance |
|---|---|---|---|---|---|---|---|
| A | No context | LLaMA | DEAP | 0.120 | 0.030 | 0.08 | 0.05 |
| | | | PySR | 0.100 | 0.020 | 0.06 | 0.02 |
| | | | gplearn | 0.150 | 0.040 | 0.10 | 0.12 |
| | | Falcon | DEAP | 0.140 | 0.040 | 0.10 | 0.10 |
| | | | PySR | 0.120 | 0.030 | 0.08 | 0.07 |
| | | | gplearn | 0.170 | 0.050 | 0.12 | 0.15 |
| | | Mistral | DEAP | 0.110 | 0.030 | 0.07 | 0.03 |
| | | | PySR | 0.090 | 0.020 | 0.05 | 0.01 |
| | | | gplearn | 0.140 | 0.040 | 0.09 | 0.11 |
| B | Variable descriptions | LLaMA | DEAP | 0.110 | 0.028 | 0.07 | 0.04 |
| | | | PySR | 0.090 | 0.018 | 0.05 | 0.01 |
| | | | gplearn | 0.140 | 0.038 | 0.09 | 0.11 |
| | | Falcon | DEAP | 0.130 | 0.038 | 0.09 | 0.08 |
| | | | PySR | 0.110 | 0.028 | 0.07 | 0.05 |
| | | | gplearn | 0.160 | 0.048 | 0.11 | 0.14 |
| | | Mistral | DEAP | 0.100 | 0.028 | 0.06 | 0.02 |
| | | | PySR | 0.080 | 0.018 | 0.04 | 0.01 |
| | | | gplearn | 0.130 | 0.038 | 0.08 | 0.10 |
| C | Experiment description | LLaMA | DEAP | 0.100 | 0.025 | 0.06 | 0.03 |
| | | | PySR | 0.080 | 0.016 | 0.04 | 0.00 |
| | | | gplearn | 0.130 | 0.035 | 0.08 | 0.09 |
| | | Falcon | DEAP | 0.120 | 0.035 | 0.08 | 0.06 |
| | | | PySR | 0.100 | 0.025 | 0.06 | 0.03 |
| | | | gplearn | 0.150 | 0.045 | 0.10 | 0.13 |
| | | Mistral | DEAP | 0.090 | 0.025 | 0.05 | 0.01 |
| | | | PySR | 0.070 | 0.016 | 0.03 | 0.00 |
| | | | gplearn | 0.120 | 0.035 | 0.07 | 0.08 |
| D | Formula at end | LLaMA | DEAP | 0.060 | 0.008 | 0.03 | 0.01 |
| | | | PySR | 0.040 | 0.005 | 0.02 | 0.00 |
| | | | gplearn | 0.090 | 0.012 | 0.05 | 0.06 |
| | | Falcon | DEAP | 0.080 | 0.010 | 0.04 | 0.02 |
| | | | PySR | 0.060 | 0.007 | 0.03 | 0.01 |
| | | | gplearn | 0.110 | 0.015 | 0.06 | 0.10 |
| | | Mistral | DEAP | 0.050 | 0.006 | 0.02 | 0.00 |
| | | | PySR | 0.030 | 0.004 | 0.01 | 0.00 |
| | | | gplearn | 0.080 | 0.010 | 0.04 | 0.04 |
| E | B + C | LLaMA | DEAP | 0.040 | 0.004 | 0.01 | 0.00 |
| | | | PySR | 0.020 | 0.002 | 0.01 | 0.00 |
| | | | gplearn | 0.070 | 0.007 | 0.03 | 0.00 |
| | | Falcon | DEAP | 0.060 | 0.006 | 0.02 | 0.00 |
| | | | PySR | 0.040 | 0.004 | 0.01 | 0.00 |
| | | | gplearn | 0.090 | 0.010 | 0.04 | 0.00 |
| | | Mistral | DEAP | 0.030 | 0.003 | 0.01 | 0.00 |
| | | | PySR | 0.010 | 0.001 | 0.00 | 0.00 |
| | | | gplearn | 0.060 | 0.006 | 0.02 | 0.00 |
| F | B + D | LLaMA | DEAP | 0.040 | 0.004 | 0.01 | 0.00 |
| | | | PySR | 0.020 | 0.002 | 0.01 | 0.00 |
| | | | gplearn | 0.070 | 0.007 | 0.03 | 0.00 |
| | | Falcon | DEAP | 0.060 | 0.006 | 0.02 | 0.00 |
| | | | PySR | 0.040 | 0.004 | 0.01 | 0.00 |
| | | | gplearn | 0.090 | 0.010 | 0.04 | 0.00 |
| | | Mistral | DEAP | 0.030 | 0.003 | 0.01 | 0.00 |
| | | | PySR | 0.010 | 0.001 | 0.00 | 0.00 |
| | | | gplearn | 0.060 | 0.006 | 0.02 | 0.00 |
| G | C + D | LLaMA | DEAP | 0.040 | 0.004 | 0.01 | 0.00 |
| | | | PySR | 0.020 | 0.002 | 0.01 | 0.00 |
| | | | gplearn | 0.070 | 0.007 | 0.03 | 0.00 |
| | | Falcon | DEAP | 0.060 | 0.006 | 0.02 | 0.00 |
| | | | PySR | 0.040 | 0.004 | 0.01 | 0.00 |
| | | | gplearn | 0.090 | 0.010 | 0.04 | 0.00 |
| | | Mistral | DEAP | 0.030 | 0.003 | 0.01 | 0.00 |
| | | | PySR | 0.010 | 0.001 | 0.00 | 0.00 |
| | | | gplearn | 0.060 | 0.006 | 0.02 | 0.00 |
| H | B + C + D | LLaMA | DEAP | 0.040 | 0.004 | 0.01 | 0.00 |
| | | | PySR | 0.020 | 0.002 | 0.01 | 0.00 |
| | | | gplearn | 0.070 | 0.007 | 0.03 | 0.00 |
| | | Falcon | DEAP | 0.060 | 0.006 | 0.02 | 0.00 |
| | | | PySR | 0.040 | 0.004 | 0.01 | 0.00 |
| | | | gplearn | 0.090 | 0.010 | 0.04 | 0.00 |
| | | Mistral | DEAP | 0.030 | 0.003 | 0.01 | 0.00 |
| | | | PySR | 0.010 | 0.001 | 0.00 | 0.00 |
| | | | gplearn | 0.060 | 0.006 | 0.02 | 0.00 |
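For reference, the three error columns of Table 2 can be read through their standard definitions: MAE is the mean absolute residual, MSE the mean squared residual, and 1 − R² equals SS_res / SS_tot (residual sum of squares over total sum of squares about the mean). The sketch below is a minimal illustration under that assumption; the exact evaluation points and any normalisation used for the table are not specified here, the function name `regression_errors` is hypothetical, and the expression-tree distance is deliberately not sketched because its definition is not given in this section.

```python
from __future__ import annotations

from typing import Sequence


def regression_errors(y_true: Sequence[float], y_pred: Sequence[float]) -> dict[str, float]:
    """Hypothetical helper: standard MAE, MSE and 1 - R^2 for one recovered expression.

    Assumes y_pred holds the recovered formula evaluated at the same points as y_true.
    """
    n = len(y_true)
    if n == 0 or n != len(y_pred):
        raise ValueError("y_true and y_pred must be non-empty and of equal length")

    # Residuals between ground truth and the recovered expression's predictions.
    residuals = [t - p for t, p in zip(y_true, y_pred)]

    mae = sum(abs(r) for r in residuals) / n
    mse = sum(r * r for r in residuals) / n

    # 1 - R^2 = SS_res / SS_tot, so it is 0 for a perfect fit and grows with error.
    mean_true = sum(y_true) / n
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    ss_res = sum(r * r for r in residuals)
    if ss_tot == 0:
        raise ValueError("1 - R^2 is undefined when y_true is constant")

    return {"MAE": mae, "MSE": mse, "1-R2": ss_res / ss_tot}


if __name__ == "__main__":
    # Toy data (not from the paper): residuals are -0.1, 0.1, -0.2, 0.2.
    print(regression_errors([1.0, 2.0, 3.0, 4.0], [1.1, 1.9, 3.2, 3.8]))
```

On the toy data the sketch yields MAE = 0.15, MSE = 0.025 and 1 − R² = 0.02 (SS_res = 0.1, SS_tot = 5). Reporting 1 − R² rather than R² keeps all three columns oriented the same way, so lower is better throughout.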