Extended Data Fig. 2: Signature derivations.

a) Cosine similarity between input 48-dimensional copy number vectors and signature-reconstructed 48-dimensional vectors (y-axis) against the number of segments in each copy number profile (x-axis). Dashed line indicates the cosine similarity threshold for non-random similarity (P < 0.05). b) Cosine similarity between input 48-dimensional copy number vectors and signature-reconstructed 48-dimensional vectors (y-axis) against the number of signatures assigned in each sample (x-axis). Dashed line indicates the cosine similarity threshold for non-random similarity (P < 0.05). Solid lines indicate median cosine similarity. The number of signatures is plotted offset by each sample's quantile. c) Cosine similarity between input 48-dimensional copy number vectors and signature-reconstructed 48-dimensional vectors (y-axis) against the Shannon's diversity of copy number states in the input 48-dimensional vector (x-axis). Dashed line indicates the cosine similarity threshold for non-random similarity (P < 0.05). d) Relationship between tumour purity (x-axis) and CN1 attribution (y-axis). If purity were a confounding factor for copy number calling, purity would be positively associated with CN1 attribution owing to reduced power to call copy number alterations; however, the opposite relationship is seen here. e) Relationship between tumour purity (x-axis) and Shannon's diversity of attributed copy number signatures (y-axis). If purity were a confounding factor for copy number calling, purity might be expected to associate negatively with diversity owing to reduced power to call copy number alterations; however, no such association is seen here. f) Three artefactual signatures identified in the TCGA pan-cancer analysis. Artefactual signatures are typified by a large number of homozygous deletions (top two) or by small segment sizes of equal copy number in LOH and heterozygous segments (bottom). g) Maximum cosine similarity between each WGS signature and any SNP6-identified signature (i.e. the cosine similarity of the closest matching signature; y-axis) from 512 samples, with varying numbers of signatures decomposed (x-axis). h) Cosine similarities between WGS-identified (x-axis) and SNP6-identified (y-axis) signatures from 512 samples, with a segmentation penalty of 70. i) Maximum cosine similarity between each exome signature and any SNP6-identified signature (i.e. the cosine similarity of the closest matching signature; y-axis) from 282 samples, with varying numbers of signatures decomposed (x-axis). j) Cosine similarities between exome-identified and SNP6-identified signatures from 282 samples, with a segmentation penalty of 70 and the suggested number of signatures extracted. k) Maximum cosine similarity between each ABSOLUTE-derived signature and any ASCAT-derived signature (i.e. the cosine similarity of the closest matching signature; y-axis) from 3,175 samples, with varying numbers of signatures decomposed (x-axis). l) Cosine similarities between ABSOLUTE-derived and ASCAT-derived signatures from 3,175 samples, with four signatures extracted in each dataset.
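The per-sample quantities in panels a-c can be illustrated with a minimal Python sketch. It assumes that a reconstructed vector is the exposure-weighted sum of signature vectors and that Shannon's diversity is the natural-log entropy of the category proportions in the input vector. The permutation null used for the P < 0.05 threshold is a hypothetical stand-in; the caption does not say how that threshold was derived. All function names here are illustrative, not from the original analysis.

```python
import math
import random


def cosine_similarity(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def reconstruct(exposures, signatures):
    """Assumed reconstruction: sum over k of exposure_k * signature_k, per category."""
    n = len(signatures[0])
    return [sum(e * sig[i] for e, sig in zip(exposures, signatures)) for i in range(n)]


def shannon_diversity(counts):
    """Assumed definition: H = -sum(p_i * ln p_i) over non-zero categories."""
    total = sum(counts)
    return -sum((c / total) * math.log(c / total) for c in counts if c > 0)


def null_threshold(profile, n_perm=1000, alpha=0.05, seed=0):
    """Hypothetical null: the (1 - alpha) quantile of cosine similarities between
    the profile and random permutations of its own categories."""
    rng = random.Random(seed)
    sims = []
    for _ in range(n_perm):
        shuffled = profile[:]
        rng.shuffle(shuffled)
        sims.append(cosine_similarity(profile, shuffled))
    sims.sort()
    idx = max(0, math.ceil((1 - alpha) * n_perm) - 1)
    return sims[idx]


if __name__ == "__main__":
    # Toy data only: one random 48-category profile and three random signatures.
    rng = random.Random(1)
    profile = [rng.randint(0, 10) for _ in range(48)]
    signatures = [[rng.random() for _ in range(48)] for _ in range(3)]
    recon = reconstruct([0.5, 0.3, 0.2], signatures)
    print(cosine_similarity(profile, recon), null_threshold(profile), shannon_diversity(profile))
```

Under this sketch, a sample whose reconstruction similarity falls below `null_threshold` would sit under the dashed line in panels a-c. The permutation null was chosen only because it preserves the profile's composition; other nulls would give different thresholds.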
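The cross-platform comparisons in panels g-l reduce to a matrix of pairwise cosine similarities between two signature sets (panels h, j and l) and, for each signature in one set, its best match in the other (panels g, i and k). A minimal Python sketch follows, assuming each signature is a plain list of 48 category weights; the function names are hypothetical.

```python
import math


def cosine_similarity(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def cosine_matrix(sigs_a, sigs_b):
    """Rows index signatures in sigs_a (e.g. WGS), columns index sigs_b (e.g. SNP6)."""
    return [[cosine_similarity(a, b) for b in sigs_b] for a in sigs_a]


def best_matches(sigs_a, sigs_b):
    """For each signature in sigs_a, return (index of closest sigs_b signature,
    maximum cosine similarity), i.e. the closest-match similarity."""
    result = []
    for row in cosine_matrix(sigs_a, sigs_b):
        j = max(range(len(row)), key=row.__getitem__)
        result.append((j, row[j]))
    return result
```

Repeating `best_matches` for decompositions with different numbers of signatures gives one curve of maximum similarities per x-axis value, which is the kind of summary plotted in panels g, i and k.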