Extended Data Fig. 1: Optimal relative tolerance for the selection of best NMF runs. | Nature Cancer

From: A practical framework and online tool for mutational signature analyses show intertissue variation and driver dependencies

(a-f) Repeated application of NMF to the Breast 560 dataset of SNVs organized in 96 channels (560 patient samples, 1000 NMF runs). a, Distribution of the optimal Kullback-Leibler divergence (KLD) obtained from 1000 NMF runs (n = 1000) for different numbers k of extracted mutational signatures (k from 9 to 13). Red vertical lines indicate the best (lowest) KLD, the 0.1% relative tolerance (RTOL) from the best and the 1% RTOL from the best. b, Convergence of the global minimum for different k values. The 1000 optimal KLD values in (a) are randomly ordered 50 times, and the minimum KLD reached after each run is computed for each ordered sequence. The average (solid lines) and standard deviation (dotted lines) are then plotted. Red horizontal lines indicate the best KLD and the 0.1% RTOL from the best. c, The same KLD values from the five plots in (a), combined in a single plot. (d-f) PCA plots of mutational signatures obtained from the Breast 560 catalogue, with number of signatures k = 10. In each row, three plots show principal components (PC) 1 with 2, 1 with 3 and 2 with 3, using the same projection as the first row. Colors indicate clusters computed with the clustering with matching algorithm; triangles are the medoids of the clusters, and above each triangle the most similar COSMIC signature (or sum of signatures), according to cosine similarity, is indicated. A black line connects the two closest medoids according to cosine similarity. The cosine similarity of the two closest medoids (max cos sim of medoids) and the average silhouette width (ASW) are indicated for each row. d, PCA plot obtained using all 1000 NMF runs (n = 1000). e, PCA plot obtained using only the NMF runs within 1% RTOL from the best run. f, PCA plot obtained using only the NMF runs within 0.1% RTOL from the best run. The 1000 NMF runs used in panels (d-f) are the same as in panel (a) (k = 10). (g-h) Repeated NMF application to the Breast 560 dataset and additional PCAWG datasets.
ASW of the clustering of mutational signatures from the best NMF runs for different values of relative tolerance (g) or different total numbers of NMF runs (h). g, For each of the six datasets and for different numbers of mutational signatures (n sig), multiple NMF runs are performed (1000 for Breast 560 and 500 for the others). A relative tolerance (RTOL) with respect to the best (lowest) optimization function value obtained is used to select a subset of best NMF runs, that is, all runs with an optimization function value less than or equal to best*(1 + RTOL). For each selected set of best runs, the obtained signatures are clustered using clustering with matching, and the ASW is computed. The six plots show the value of the ASW for different values of RTOL and numbers of signatures extracted (n sig). h, For each of the six datasets and for different numbers of mutational signatures (n sig), a varying number of NMF runs is performed (plot x axis). A relative tolerance (RTOL = 0.1%) with respect to the best (lowest) optimization function value obtained is used to select a subset of best NMF runs, that is, all runs with an optimization function value less than or equal to best*(1.001). For each selected set of best runs, the obtained signatures are clustered using clustering with matching, and the ASW is computed. The six plots show the average of n = 10 replicates of the ASW for different total numbers of NMF runs performed and numbers of signatures extracted (n sig). Detailed data for the analyses shown in this figure can be found in Supplementary Tables 1 and 112.
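
The two numerical procedures described in this legend, the RTOL-based selection of best runs (panels e-h) and the running-minimum convergence curve (panel b), can be illustrated with a minimal Python sketch. This is not the authors' implementation: the function names `select_best_runs` and `running_minimum_convergence`, the random seed and the toy KLD values are all assumptions made for illustration, and the sketch assumes positive objective values, as is the case for a KLD, so that best*(1 + RTOL) lies above the best value.

```python
import numpy as np


def select_best_runs(objective_values, rtol=0.001):
    """Indices of runs with objective <= best * (1 + rtol).

    Hypothetical helper: `best` is the lowest objective value observed.
    Assumes positive objective values (e.g. a KLD).
    """
    values = np.asarray(objective_values, dtype=float)
    best = values.min()
    return np.flatnonzero(values <= best * (1.0 + rtol))


def running_minimum_convergence(objective_values, n_orderings=50, seed=0):
    """Mean and standard deviation of the running minimum over random orderings.

    Each of the `n_orderings` rows is the minimum reached after 1, 2, ...
    runs for one random permutation of the objective values.
    """
    values = np.asarray(objective_values, dtype=float)
    rng = np.random.default_rng(seed)  # seed chosen only for reproducibility
    curves = np.array([
        np.minimum.accumulate(rng.permutation(values))
        for _ in range(n_orderings)
    ])
    return curves.mean(axis=0), curves.std(axis=0)


if __name__ == "__main__":
    # Toy objective values, not taken from the paper.
    kld = [100.0, 100.05, 100.2, 101.5]
    print("within 0.1% RTOL:", select_best_runs(kld, rtol=0.001))
    print("within 1% RTOL:  ", select_best_runs(kld, rtol=0.01))
    mean_curve, sd_curve = running_minimum_convergence(kld)
    print("mean running minimum:", mean_curve)
```

In this sketch, the indices returned by `select_best_runs` would identify the runs whose signatures are passed on to clustering, and the mean and standard deviation returned by `running_minimum_convergence` correspond to the solid and dotted lines of panel b.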