Fig. 1: Underreporting can skew observed relative prevalences and conceal health disparities.

PURPLE is designed to estimate the relative prevalence while correcting for underreporting. (A) Underreporting leads to inaccurate observed relative prevalences. Understanding the relative prevalence of a health condition between groups g (for example, men and women) is important for effective medical care. However, these estimates are often based on diagnoses s (positive diagnosis vs. no diagnosis) rather than the true patient state y (sick vs. not sick). Underreporting, which is known to vary across demographic groups, leads to inaccurate relative prevalence estimates that can hide the groups most affected by a condition. (B) PURPLE uses data on patient diagnoses s, symptoms x, and group membership g to accurately estimate the relative prevalence of a condition. PURPLE first estimates the group-specific diagnosis probability, p(s = 1∣y = 1, g), and the disease likelihood, p(y = 1∣x), each up to a constant multiplicative factor, and then combines these estimates to compute the relative prevalence. We show this is possible under three widely made assumptions: no false positives, random diagnosis within groups, and constant p(y = 1∣x) between groups.
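
The effect described in panel A can be illustrated with a small arithmetic sketch. It is not PURPLE's estimation procedure; it only uses the identity p(s = 1∣g) = p(s = 1∣y = 1, g) · p(y = 1∣g), which follows from the no-false-positives and random-diagnosis-within-groups assumptions. The group names and all numbers below are hypothetical, chosen only for illustration.

```python
# Illustrative sketch only: hypothetical groups "a" and "b" with made-up rates.

def observed_prevalence(true_prevalence, diagnosis_rate):
    """p(s=1|g) = p(s=1|y=1,g) * p(y=1|g), assuming no false positives."""
    return true_prevalence * diagnosis_rate

# Hypothetical true prevalences p(y=1|g) and diagnosis rates p(s=1|y=1,g).
true_prev = {"a": 0.10, "b": 0.05}
diag_rate = {"a": 0.30, "b": 0.90}

# True relative prevalence of group a versus group b.
true_rp = true_prev["a"] / true_prev["b"]

# Relative prevalence computed naively from diagnoses alone.
obs = {g: observed_prevalence(true_prev[g], diag_rate[g]) for g in true_prev}
observed_rp = obs["a"] / obs["b"]

# Dividing out the ratio of diagnosis rates recovers the true value.
corrected_rp = observed_rp * diag_rate["b"] / diag_rate["a"]

# Only the ratio of diagnosis rates matters: rescaling both rates by the
# same unknown constant c leaves the correction unchanged.
c = 0.5
rescaled_rp = observed_rp * (c * diag_rate["b"]) / (c * diag_rate["a"])

print(true_rp, observed_rp, corrected_rp, rescaled_rp)
```

With these numbers the condition is twice as prevalent in group a, yet the diagnosis-based ratio falls below 1, so the more affected group appears less affected. The last step shows why knowing the diagnosis probabilities only up to a shared multiplicative constant is enough to correct the ratio.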