Fig. 5: Learning curves for nine MADRS topics under the flexible accuracy criterion.
From: Using a fine-tuned large language model for symptom-based depression evaluation

Each line corresponds to one MADRS item. Within each outer cross-validation fold, models were trained on increasing fractions (5–80%) of the full dataset and always evaluated on the same fixed validation set comprising 20% of the full dataset. The x-axis shows the proportion of the full dataset used for training; the y-axis shows the mean flexible accuracy (±1) across outer folds. Error bars indicate the standard error of the mean (SEM) across folds. Under the flexible accuracy criterion, a prediction counts as correct if it lies within ±1 point of the true item score, reflecting clinically acceptable variation.
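The metric and error bars described above can be sketched in a few lines of Python. This is a minimal illustration, assuming integer item scores on the 0–6 MADRS item scale and an SEM computed from the sample standard deviation across folds (the caption does not state which variant was used). The function names `flexible_accuracy` and `mean_and_sem` and all example values are hypothetical and are not taken from the authors' code.

```python
import math
from statistics import mean, stdev


def flexible_accuracy(y_true, y_pred, tol=1):
    """Fraction of predictions within +/- tol points of the true item score.

    Hypothetical helper: tol=1 matches the +/-1 criterion in the caption.
    """
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("y_true and y_pred must be non-empty and of equal length")
    hits = sum(1 for t, p in zip(y_true, y_pred) if abs(t - p) <= tol)
    return hits / len(y_true)


def mean_and_sem(fold_scores):
    """Mean and standard error of the mean across outer folds.

    Assumption: SEM = sample standard deviation / sqrt(number of folds).
    """
    k = len(fold_scores)
    m = mean(fold_scores)
    sem = stdev(fold_scores) / math.sqrt(k) if k > 1 else 0.0
    return m, sem


# Made-up scores for one item in one fold (MADRS items are rated 0-6).
y_true = [0, 2, 4, 6, 3]
y_pred = [1, 2, 2, 5, 3]
print(flexible_accuracy(y_true, y_pred))  # 0.8 (only the 4 -> 2 prediction misses)

# Made-up per-fold accuracies at one training fraction -> one point and error bar.
point, err = mean_and_sem([0.80, 0.85, 0.75])
print(point, err)
```

In a learning curve of this kind, each plotted point would come from calling `mean_and_sem` on the per-fold flexible accuracies obtained at one training fraction, repeated for each fraction between 5% and 80%.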