Table 5 Performance metrics of different models

From: Exploiting large language models for diagnosing autism associated language disorders and identifying distinct features

| Model | Accuracy | PPV | Sensitivity | F1 score |
|---|---|---|---|---|
| XLNet [30] | 58.76% | 54.38% | 58.76% | 56.07% |
| ALBERT [31] | 69.07% | 47.71% | 69.07% | 56.44% |
| DistilBERT [32] | 58.76% | 51.85% | 58.76% | 54.44% |
| RoBERTa [33] | 57.73% | 58.12% | 57.73% | 57.92% |
| BERT [34] | 63.92% | 60.86% | 63.92% | 61.87% |
| Mistral - 7b | 70.29% | 70.29% | 100.00% | 82.55% |
| Qwen2 - 7b | 70.29% | 70.29% | 100.00% | 82.55% |
| Gemini 2.0 Flash-based | 63.47% | 72.34% | 77.97% | 75.05% |
| GPT 4o - based | 67.05% | 70.76% | 90.51% | 79.43% |
| GPT 3.5 - based | 69.14% | 71.52% | 93.22% | 80.94% |
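As a reading aid, the sketch below recomputes the F1 score from the PPV and sensitivity columns for the five large-language-model rows. It assumes F1 is the harmonic mean of PPV (precision) and sensitivity (recall); the table itself does not state how F1 or any averaging was computed, so that is an assumption. The function name `f1_from_ppv_sensitivity` and the `rows` dictionary are illustrative and not part of the original work.

```python
def f1_from_ppv_sensitivity(ppv, sensitivity):
    """Harmonic mean of PPV (precision) and sensitivity (recall), in percent."""
    if ppv + sensitivity == 0:
        return 0.0
    return 2 * ppv * sensitivity / (ppv + sensitivity)


# (PPV, sensitivity, reported F1) copied from Table 5, all in percent.
rows = {
    "Mistral - 7b": (70.29, 100.00, 82.55),
    "Qwen2 - 7b": (70.29, 100.00, 82.55),
    "Gemini 2.0 Flash-based": (72.34, 77.97, 75.05),
    "GPT 4o - based": (70.76, 90.51, 79.43),
    "GPT 3.5 - based": (71.52, 93.22, 80.94),
}

for name, (ppv, sens, reported) in rows.items():
    computed = f1_from_ppv_sensitivity(ppv, sens)
    print(f"{name}: computed {computed:.2f}%, reported {reported:.2f}%")
```

Under that assumption, the computed values for these five rows agree with the reported F1 scores to two decimals. The same check on some of the encoder rows (for example XLNet or BERT) gives values that differ from the reported F1 by less than one percentage point, which may indicate a different averaging scheme for those models; the table does not say.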