Fig. 5: Uncertainty analysis across demographic groups for models trained with 25% and 100% of the labeled data. Violin plots show the entropy distributions of four models across (a) race, (b) gender, and (c) age groups. Light colors indicate 25% training data; dark colors indicate 100% training data. Ci-SSGAN (gray) maintains the lowest uncertainty (entropy < 0.2) regardless of training-data size, whereas the BERT models (red/green) show high uncertainty with 25% data (entropy 0.6–1.4) that decreases substantially with 100% data (entropy 0.2–0.4). SSGAN (blue) shows intermediate uncertainty. In each violin, the horizontal line marks the median, and the error bars span the minimum and maximum values observed for each group.