Table 6 Manual performance evaluation for LLMs before fine-tuning.

From: A scientific-article key-insight extraction system based on multi-actor of fine-tuned open-source large language models


| Category | GPT-4 | Yi | InternLM2 | Mixtral |
|---|---|---|---|---|
| Aim | 1.0 | 0.71 | 0.89 | 0.55 |
| Motivation | 1.0 | 0.65 | 0.90 | 0.61 |
| Methods | 0.97 | 0.68 | 0.88 | 0.59 |
| Question addressed | 0.98 | 0.73 | 0.81 | 0.68 |
| Evaluation metrics | 1.0 | 0.55 | 0.65 | 0.42 |
| Result | 0.97 | 0.70 | 0.91 | 0.65 |
| Limitations | 0.90 | 0.51 | 0.61 | 0.62 |
| Contribution | 1.0 | 0.72 | 0.85 | 0.75 |
| Future work | 0.92 | 0.68 | 0.77 | 0.56 |
| Average | 0.97 | 0.65 | 0.80 | 0.60 |
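The bottom row appears to be the unweighted mean of the nine category scores for each model. A minimal Python sketch of that arithmetic is shown below; the `scores` dictionary and the script are illustrative only (not the authors' evaluation code), and small deviations from the reported averages (e.g. for Yi and InternLM2) would be due to rounding of the per-category scores.

```python
# Illustrative recomputation of the "Average" row in Table 6.
# The per-category scores are copied from the table; everything else is an assumption.
scores = {
    "GPT-4":     [1.0, 1.0, 0.97, 0.98, 1.0, 0.97, 0.90, 1.0, 0.92],
    "Yi":        [0.71, 0.65, 0.68, 0.73, 0.55, 0.70, 0.51, 0.72, 0.68],
    "InternLM2": [0.89, 0.90, 0.88, 0.81, 0.65, 0.91, 0.61, 0.85, 0.77],
    "Mixtral":   [0.55, 0.61, 0.59, 0.68, 0.42, 0.65, 0.62, 0.75, 0.56],
}

for model, values in scores.items():
    # Unweighted mean over the nine key-insight categories (Aim ... Future work).
    print(f"{model}: {sum(values) / len(values):.2f}")
```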