Table 2 Performance metrics of the fine-tuned MADRS-BERT and BERT-base models under strict and flexible criteria for accuracy

From: Using a fine-tuned large language model for symptom-based depression evaluation

 

| MADRS item | MADRS-BERT: Accuracy ↑ [%], flexible | MADRS-BERT: Accuracy ↑ [%], strict | BERT-base: Accuracy ↑ [%], flexible | BERT-base: Accuracy ↑ [%], strict |
| --- | --- | --- | --- | --- |
| Reported sadness | 80 (± 0.03) | 40 (± 0.07) | 29 (± 0.04) | 14 (± 0.03) |
| Inner tension | 88 (± 0.06) | 49 (± 0.10) | 25 (± 0.04) | 12 (± 0.07) |
| Sleep disturbances | 82 (± 0.08) | 44 (± 0.09) | 30 (± 0.07) | 17 (± 0.07) |
| Loss of appetite | 79 (± 0.04) | 43 (± 0.12) | 33 (± 0.06) | 20 (± 0.07) |
| Difficulties concentrating | 83 (± 0.08) | 40 (± 0.14) | 31 (± 0.06) | 15 (± 0.06) |
| Lassitude | 86 (± 0.07) | 46 (± 0.16) | 31 (± 0.09) | 19 (± 0.08) |
| Emotional numbness | 80 (± 0.12) | 35 (± 0.11) | 33 (± 0.11) | 20 (± 0.08) |
| Pessimistic thoughts | 85 (± 0.07) | 41 (± 0.10) | 26 (± 0.04) | 14 (± 0.05) |
| Suicidal ideations | 83 (± 0.10) | 44 (± 0.10) | 32 (± 0.05) | 17 (± 0.04) |

Values are the mean (± standard deviation) of accuracy across five folds. Strict evaluation counts only exact score predictions as correct, while flexible evaluation allows a deviation of ±1 from the actual score. Bold numbers highlight the best results.
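The strict and flexible criteria described in the note can be illustrated with a minimal Python sketch. It is not the authors' evaluation code: the function names `accuracy` and `summarize`, the example scores, and the use of the sample standard deviation are assumptions made for illustration only.

```python
from statistics import mean, stdev


def accuracy(true_scores, predicted_scores, tolerance=0):
    """Fraction of predictions within `tolerance` points of the true score.

    tolerance=0 corresponds to the strict criterion (exact match),
    tolerance=1 to the flexible criterion (deviation of at most 1).
    """
    if len(true_scores) != len(predicted_scores):
        raise ValueError("true and predicted scores must have the same length")
    if not true_scores:
        raise ValueError("at least one score is required")
    hits = [abs(t - p) <= tolerance for t, p in zip(true_scores, predicted_scores)]
    return sum(hits) / len(hits)


def summarize(fold_accuracies):
    """Mean and sample standard deviation of per-fold accuracies.

    Assumption: the source does not state whether the sample or the
    population standard deviation was reported; the sample version is used here.
    """
    return mean(fold_accuracies), stdev(fold_accuracies)


# Hypothetical item scores (MADRS items are rated 0-6); not data from the study.
true_scores = [0, 2, 4, 6, 3, 1]
predicted_scores = [0, 3, 4, 4, 2, 1]

strict = accuracy(true_scores, predicted_scores, tolerance=0)    # 3 of 6 exact matches
flexible = accuracy(true_scores, predicted_scores, tolerance=1)  # 5 of 6 within ±1

print(f"strict:   {strict:.2f}")
print(f"flexible: {flexible:.2f}")
```

Because every exact match also satisfies the ±1 criterion, flexible accuracy can never be lower than strict accuracy for the same predictions, which is consistent with every row of the table.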