We re-evaluated GPT-5 using our published pipelines: 500 emergency vignettes across 32 sociodemographic labels for bias, and adversarial prompts with fabricated details. GPT-5 showed no measurable improvement over GPT-4o in sociodemographic-linked decision variation, with several LGBTQIA+ groups flagged for mental-health screening in 100% of cases. Adversarial hallucination rates were higher (65% vs 53% for GPT-4o); a mitigation prompt reduced this to 7.67%.
- Mahmud Omar
- Reem Agbareia
- Eyal Klang