Table 4 Benchmarking Ci-SSGAN against state-of-the-art large language models for automated glaucoma classification
| Model | Class | Samples | Accuracy | F1 score | AUROC | AUCPR |
|---|---|---|---|---|---|---|
| GPT-4o | Non-GL | 38 | 0.474 | 0.451 | 0.689 | 0.273 |
| GPT-4o | OAG/S | 60 | 0.667 | 0.537 | 0.741 | 0.362 |
| GPT-4o | ACG/S | 49 | 0.601 | 0.625 | 0.766 | 0.461 |
| GPT-4o | XFG/S | 55 | 0.752 | 0.793 | 0.858 | 0.679 |
| GPT-4o | PDG/S | 45 | 0.801 | 0.816 | 0.883 | 0.701 |
| GPT-4o | SGL | 47 | 0.552 | 0.709 | 0.775 | 0.613 |
| GPT-4o | Overall | 294 | 0.641 | 0.655 | 0.785 | 0.515 |
| Med-Gemma | Non-GL | 38 | 0.184 | 0.187 | 0.534 | 0.141 |
| Med-Gemma | OAG/S | 60 | 0.451 | 0.274 | 0.491 | 0.201 |
| Med-Gemma | ACG/S | 49 | 0.122 | 0.135 | 0.492 | 0.165 |
| Med-Gemma | XFG/S | 55 | 0.073 | 0.098 | 0.488 | 0.184 |
| Med-Gemma | PDG/S | 45 | 0.045 | 0.062 | 0.486 | 0.151 |
| Med-Gemma | SGL | 47 | 0.085 | 0.101 | 0.484 | 0.157 |
| Med-Gemma | Overall | 294 | 0.160 | 0.143 | 0.496 | 0.167 |
| LLaMA-3.2 | Non-GL | 38 | 0.421 | 0.552 | 0.703 | 0.412 |
| LLaMA-3.2 | OAG/S | 60 | 0.834 | 0.565 | 0.774 | 0.391 |
| LLaMA-3.2 | ACG/S | 49 | 0.184 | 0.305 | 0.590 | 0.302 |
| LLaMA-3.2 | XFG/S | 55 | 0.364 | 0.421 | 0.640 | 0.301 |
| LLaMA-3.2 | PDG/S | 45 | 0.712 | 0.736 | 0.836 | 0.586 |
| LLaMA-3.2 | SGL | 47 | 0.660 | 0.554 | 0.761 | 0.369 |
| LLaMA-3.2 | Overall | 294 | 0.529 | 0.522 | 0.717 | 0.394 |
| Ci-SSGAN | Non-GL | 38 | 0.818 | 0.750 | 0.915 | 0.722 |
| Ci-SSGAN | OAG/S | 60 | 0.955 | 0.857 | 0.955 | 0.863 |
| Ci-SSGAN | ACG/S | 49 | 1.000 | 0.970 | 0.994 | 0.962 |
| Ci-SSGAN | XFG/S | 55 | 0.750 | 0.828 | 0.942 | 0.880 |
| Ci-SSGAN | PDG/S | 45 | 0.667 | 0.800 | 0.939 | 0.835 |
| Ci-SSGAN | SGL | 47 | 0.853 | 0.855 | 0.949 | 0.847 |
| Ci-SSGAN | Overall | 294 | 0.840 | 0.843 | 0.949 | 0.852 |