Extended Data Fig. 1: Identification of melanoma mutational signatures using NMF.

a, Determination of the optimal NMF decomposition rank (k) based on the mean squared error (MSE) between observed trinucleotide mutation counts and the NMF-imputed values of masked entries (y-axis). The MSE, averaged across three repetitions of 5-fold cross-validation, is plotted against the decomposition rank (x-axis). Error bars represent the standard error of the mean (SEM). b, Sample-wise Spearman’s correlation between the observed and NMF-fitted trinucleotide mutation counts (n = 96 trinucleotide mutations, n = 1,014 tumors). The color gradient represents the number of mutated trinucleotides in each tumor sample and highlights that lower correlations result simply from the low sparsity of NMF’s fit. c, Percentage contribution of trinucleotide mutations to each mutational signature. d, Percentage contribution of each mutational signature to the total number of mutations per tumor. e, Proportion of UVR-signature mutations per tumor. Colors distinguish melanoma subtypes. f, Comparison of our trinucleotide mutational signatures with the Catalogue of Somatic Mutations in Cancer (COSMIC) set of signatures. Left panels show the Pearson’s correlation (y-axis) between the percentage contribution of trinucleotide mutations to our signatures (the values in c) and to each of 65 signatures in COSMIC (x-axis) (n = 96 trinucleotide mutations). Right panels show the mean squared difference (y-axis) between the percentage contributions (n = 96 trinucleotide mutations). g, Heatmap showing the column-sum normalized weights of COSMIC mutational signatures (rows) in our set of 1,014 tumor samples (columns), estimated using non-negative linear regression (via the nnlm() function implemented in the NNLM R package). Our unsupervised estimates of mutational signature contributions are shown at the top. There is strong agreement between our estimates and those based on the COSMIC signatures.
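
The weight estimation described in panel g can be illustrated with a minimal sketch. It is not the authors' code: it substitutes SciPy's `scipy.optimize.nnls` for the `nnlm()` function of the NNLM R package, and the function name `fit_signature_weights`, the matrix shapes and the synthetic data are assumptions made purely for illustration.

```python
import numpy as np
from scipy.optimize import nnls


def fit_signature_weights(signatures, counts):
    """Estimate non-negative signature weights per tumor (hypothetical helper).

    signatures: (96, S) array, one reference signature per column.
    counts:     (96, N) array, one tumor per column.
    Returns an (S, N) array whose columns are normalized to sum to 1.
    """
    signatures = np.asarray(signatures, dtype=float)
    counts = np.asarray(counts, dtype=float)
    n_sigs, n_tumors = signatures.shape[1], counts.shape[1]
    weights = np.zeros((n_sigs, n_tumors))
    for j in range(n_tumors):
        # Non-negative least squares fit for one tumor at a time.
        weights[:, j], _ = nnls(signatures, counts[:, j])
    # Column-sum normalization; all-zero columns are left as zeros.
    col_sums = weights.sum(axis=0, keepdims=True)
    col_sums[col_sums == 0] = 1.0
    return weights / col_sums


# Synthetic, noise-free example (not real data): 5 signatures, 8 tumors.
rng = np.random.default_rng(0)
signatures = rng.dirichlet(np.ones(96), size=5).T    # shape (96, 5)
true_weights = rng.dirichlet(np.ones(5), size=8).T   # shape (5, 8)
counts = signatures @ (true_weights * 1000.0)        # shape (96, 8)
estimated = fit_signature_weights(signatures, counts)
```

In this noise-free setting the normalized estimates should match the weights used to generate the counts. With real mutation counts the fit is only approximate, and similar reference signatures can trade weight with one another.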