Fig. 5: Reference-based multiple disease classification of unseen TCR repertoire samples.

a Violin plot showing the distribution of class fractions for cancer, COVID-19, and multiple sclerosis (MS) patients and healthy controls (HC). The cancer fraction was calculated as the proportion of query TCRs that clustered with reference TCRs from cancer patients; the other class fractions were defined in the same way. Sample sizes: HC, n = 153; COVID-19, n = 193; cancer, n = 62; MS, n = 6. b ROC curves using each class fraction as a single predictor for pairwise separation of the four classes. Each fraction is the proportion of query TCRs co-clustered with a given class of samples in the reference dataset. AUC values are shown at the bottom right of each panel. 95% confidence intervals were calculated using 2,000 stratified bootstrap replicates.
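The two quantities described in this caption, the class fraction and the bootstrap confidence interval of the AUC, can be illustrated with a minimal sketch. It is not the authors' code. Several details are assumptions made only for illustration: each query TCR carries at most one reference class label (or `None` if it did not co-cluster), the denominator is the total number of query TCRs, the AUC is computed with the rank-based (Mann-Whitney) estimator, and the interval is a simple percentile interval. The function names `class_fractions`, `auc`, and `stratified_bootstrap_ci` are hypothetical.

```python
import random
from collections import Counter


def class_fractions(cluster_labels, classes):
    """Fraction of query TCRs co-clustered with each reference class.

    cluster_labels: one entry per query TCR, giving the reference class it
    clustered with, or None (assumed encoding, one label per TCR).
    """
    total = len(cluster_labels)
    counts = Counter(label for label in cluster_labels if label is not None)
    return {c: (counts[c] / total if total else 0.0) for c in classes}


def auc(pos, neg):
    """Mann-Whitney AUC: P(pos > neg) + 0.5 * P(pos == neg)."""
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def stratified_bootstrap_ci(pos, neg, n_boot=2000, alpha=0.05, seed=0):
    """Percentile CI of the AUC.

    "Stratified" here means that positives and negatives are resampled
    separately, so every replicate keeps the original group sizes.
    """
    rng = random.Random(seed)
    stats = sorted(
        auc(rng.choices(pos, k=len(pos)), rng.choices(neg, k=len(neg)))
        for _ in range(n_boot)
    )
    lo_idx = max(0, int((alpha / 2) * n_boot))
    hi_idx = min(n_boot - 1, int((1 - alpha / 2) * n_boot))
    return stats[lo_idx], stats[hi_idx]


if __name__ == "__main__":
    # Toy values invented for illustration; they are not data from the study.
    labels = ["cancer", "cancer", None, "HC", None, "COVID-19", None, None]
    print(class_fractions(labels, ["cancer", "COVID-19", "MS", "HC"]))

    cancer_fraction_in_cancer = [0.30, 0.25, 0.22, 0.28, 0.19]
    cancer_fraction_in_hc = [0.10, 0.21, 0.12, 0.08, 0.15]
    print(auc(cancer_fraction_in_cancer, cancer_fraction_in_hc))
    print(stratified_bootstrap_ci(cancer_fraction_in_cancer, cancer_fraction_in_hc))
```

With group sizes as unbalanced as those in panel a (for example MS, n = 6), resampling within each class guarantees that no replicate loses the smaller class entirely, which is one practical reason to stratify.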