Table 6 Manual performance evaluation of LLMs before fine-tuning.
| | GPT-4 | Yi | InternLM2 | Mixtral |
---|---|---|---|---|
Aim | 1.0 | 0.71 | 0.89 | 0.55 |
Motivation | 1.0 | 0.65 | 0.90 | 0.61 |
Methods | 0.97 | 0.68 | 0.88 | 0.59 |
Question addressed | 0.98 | 0.73 | 0.81 | 0.68 |
Evaluation metrics | 1.0 | 0.55 | 0.65 | 0.42 |
Result | 0.97 | 0.70 | 0.91 | 0.65 |
Limitations | 0.90 | 0.51 | 0.61 | 0.62 |
Contribution | 1.0 | 0.72 | 0.85 | 0.75 |
Future work | 0.92 | 0.68 | 0.77 | 0.56 |
Average | 0.97 | 0.65 | 0.80 | 0.60 |