In a benchmark analysis of 5,609 clinical questions developed by 101 community health workers from Rwanda, a panel of five general-purpose large language models outperformed humans on every metric.
- Samuel Rutunda
- Gwydion Williams
- Bilal A. Mateen