Fig. 1: Word clouds illustrating top-weighted conditions for selected topics.

Conditions are sized according to their probability within each topic and colored according to relevance, with positive relevance indicating conditions that are more probable in the topic than overall. Each condition is labeled with the numeric OMOP concept ID encoding the medical code used for clustering, followed by the first few words of the condition name. Per-topic statistics in panel headers show usage of each topic across sites (\(\mathrm{U}\), rounded to the nearest 0.1%), topic uniformity across sites (\(\mathrm{H}\), 0–1, higher values being more uniform), and relative topic quality as a normalized coherence score (\(\mathrm{C}\), z-score, higher values being more coherent).
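For readers who want a concrete picture of the caption's statistics, the following is a minimal sketch of how such quantities could be computed. The caption does not give the exact formulas, so everything here is an assumption: relevance is taken as a log ratio of in-topic to overall probability, uniformity \(\mathrm{H}\) as Shannon entropy normalized by the log of the number of sites, and \(\mathrm{C}\) as a standard z-score over per-topic coherence values. All function names are hypothetical.

```python
import math

def relevance(p_topic, p_overall):
    # Assumed definition: log ratio of the condition's probability in the
    # topic to its overall probability. Positive means the condition is
    # more probable in the topic than overall.
    return math.log(p_topic / p_overall)

def normalized_entropy(usage_by_site):
    # Assumed definition of uniformity: Shannon entropy of a topic's usage
    # distribution across sites, divided by log(number of sites) so the
    # result lies in [0, 1], with 1 meaning perfectly uniform usage.
    total = sum(usage_by_site)
    probs = [u / total for u in usage_by_site if u > 0]
    entropy = -sum(p * math.log(p) for p in probs)
    return entropy / math.log(len(usage_by_site))

def z_scores(values):
    # Standardize raw coherence scores across topics (population std. dev.),
    # so higher values indicate more coherent topics relative to the rest.
    mean = sum(values) / len(values)
    sd = math.sqrt(sum((v - mean) ** 2 for v in values) / len(values))
    return [(v - mean) / sd for v in values]

if __name__ == "__main__":
    print(relevance(0.2, 0.1))               # positive: enriched in topic
    print(normalized_entropy([5, 5, 5, 5]))  # evenly used across four sites
    print(z_scores([0.1, 0.2, 0.3]))         # illustrative coherence values
```

Under these assumptions, a topic used at only one site would have uniformity 0, while a topic used equally at all sites would have uniformity 1.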