Extended Data Fig. 2: Positional G > T variants in miRNA sequencing associate with LGG and LIHC patients.

a, 4G > T miR-124 rate (%) and miR-124 expression (RPM; miRNA-Seq data) in LGG patients (n = 512); patients with high (upper quartile; red box) and low (lower quartile; blue circles) 4G > T ratios and with miR-124 expression of RPM > 100 were selected for survival analysis (Fig. 2b, right panel). b, In let-7 family members (left panel; seed region, red line), the frequencies of SNVs at G of positions 2 and 4 in LGG patients are analyzed separately (RPM in box plots, right panel); the lines represent the median and the first and third quartiles, the whiskers denote the minima and maxima, and ‘x’ marks the mean value. c, Hierarchical clustering of patients with LGG according to SNV rates (SNV reads/corresponding miRNA reads, %; heatmap) in the let-7a sequence (positions 1–18); positions 2 to 4 are enlarged (right panel). The 4G > T and 2G > T clusters are highlighted (dashed line boxes). d, Kaplan-Meier survival analysis for the LGG patients in the 2G > T let-7a cluster, comparing high (upper quartile) versus low (lower quartile) 4G > T ratio patients with let-7a expression of RPM > 1000; P = 0.0014, two-sided log-rank test. e, Hierarchical clustering analysis as in c, except for miR-122 in LIHC patients (n = 372); 2G > T miR-122 cluster, blue; 3G > T cluster, red. f, 3G > T miR-122 rate (%) and miR-122 expression (RPM; miRNA-Seq data) in LIHC patients, used to select high (upper quartile; red box) and low (lower quartile; blue circles) 3G > T patients with miR-122 expression of RPM > 10000. g, Entire clustering results of Fig. 2h (yellow boxed region), analyzed for the let-7 seed regions (positions 2–8) in the matched normal-tumor pairs of LIHC patients (n = 47). h, Same analysis as in g, except derived from all LIHC patients (n = 372). i, Distribution of 2G > T let-7 frequency (%) depending on patient survival (years, y), analyzed by box plots (upper graph, as displayed in b) and heatmap (lower panel). All survival analyses were derived from repeated data with biologically independent samples (TCGA).
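
The patient selection described in a, d and f (an expression cutoff followed by upper- and lower-quartile groups of the G > T rate) can be sketched as follows. This is a minimal illustration, not the authors' pipeline: the function name, the dictionary keys (`id`, `snv_reads`, `mirna_reads`, `rpm`), the order of filtering before computing quartiles, and the choice of quartile method are all assumptions.

```python
from statistics import quantiles


def select_quartile_groups(patients, min_rpm):
    """Split patients into high and low G > T rate groups.

    `patients` is a list of dicts with hypothetical keys:
    'id', 'snv_reads', 'mirna_reads' and 'rpm'.
    """
    # Assumption: the expression cutoff is applied before quartiles are computed.
    expressed = [
        p for p in patients if p["rpm"] > min_rpm and p["mirna_reads"] > 0
    ]

    # SNV rate (%) = SNV reads / corresponding miRNA reads * 100, as in panel c.
    rates = {
        p["id"]: 100.0 * p["snv_reads"] / p["mirna_reads"] for p in expressed
    }

    # Assumption: inclusive quartiles; the legend does not state the method.
    q1, _, q3 = quantiles(list(rates.values()), n=4, method="inclusive")

    high = sorted(pid for pid, rate in rates.items() if rate >= q3)
    low = sorted(pid for pid, rate in rates.items() if rate <= q1)
    return high, low
```

For panel a, such a function would be called with `min_rpm=100`; for d and f, with 1000 and 10000, respectively. The two returned groups would then be passed to a survival comparison such as a log-rank test.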