Fig. 4: Performance of the BELIVE algorithm on the TCGA DNA methylation microarray dataset (n = 4702).

a BELIVE performance in tissue-of-origin detection for patients diagnosed with primary cancers in the TCGA project. Sample size and recall are plotted at the top of the confusion matrix, while precision is plotted on the right. The columns of the matrix show the primary cancer sites predicted by BELIVE, and the rows show the actual sites. Colored squares along the diagonal represent the percentage of primary sites correctly identified by BELIVE. b The bar graph (top) shows the proportion of samples whose primary sites were correctly identified at different confidence levels; the area plots (bottom) show the proportion (y-axis) of samples whose primary sites were correctly classified at a confidence greater than or equal to a given level (x-axis). c ROC curves for prediction of primary sites in patients with primary cancer. d Top-k (k = 1, 2, 3, 4, 5) accuracies for predicting primary sites in patients with primary cancer. e Sensitivities of the BELIVE algorithm based on the top-k-ranked predictions. The red line shows the median sensitivity of BELIVE in predicting the primary sites of ten cancers, while the blue and green lines show the sensitivities for the best- and worst-performing cancers, respectively. Source data are available in a supplementary file.
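
For readers unfamiliar with the top-k metrics in panels d and e, the following is a minimal sketch of how top-k accuracy and per-class top-k sensitivity are commonly computed from a matrix of per-class scores. It is not the BELIVE implementation: the function names, the three-class toy scores, and the labels are all hypothetical and chosen only for illustration.

```python
from statistics import median


def top_k_hits(scores, labels, k):
    """Return one boolean per sample: is the true class among the k top-scoring classes?"""
    hits = []
    for row, true_class in zip(scores, labels):
        # Rank class indices from highest to lowest score for this sample.
        ranked = sorted(range(len(row)), key=lambda c: row[c], reverse=True)
        hits.append(true_class in ranked[:k])
    return hits


def top_k_accuracy(scores, labels, k):
    """Fraction of all samples whose true class is within the top k predictions."""
    hits = top_k_hits(scores, labels, k)
    return sum(hits) / len(hits)


def top_k_sensitivity_by_class(scores, labels, k):
    """Per-class sensitivity: top-k hit rate restricted to samples of each true class."""
    hits = top_k_hits(scores, labels, k)
    result = {}
    for c in sorted(set(labels)):
        class_hits = [h for h, y in zip(hits, labels) if y == c]
        result[c] = sum(class_hits) / len(class_hits)
    return result


# Hypothetical toy data: 4 samples, 3 classes (illustrative only, not TCGA data).
scores = [
    [0.7, 0.2, 0.1],  # true class 0, ranked first
    [0.3, 0.5, 0.2],  # true class 0, ranked second
    [0.1, 0.6, 0.3],  # true class 1, ranked first
    [0.5, 0.3, 0.2],  # true class 2, ranked third
]
labels = [0, 0, 1, 2]

accuracy_by_k = {k: top_k_accuracy(scores, labels, k) for k in (1, 2, 3)}
sensitivity_top1 = top_k_sensitivity_by_class(scores, labels, 1)
median_sensitivity_top1 = median(sensitivity_top1.values())
```

Under these assumptions, the median, best, and worst values of the per-class dictionary at each k would correspond to the three kinds of lines described for panel e.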