Supplementary Fig. 2: Characterizing the conversion rate, C-reads and m5C site distribution in different studies.
From: Genome-wide identification of mRNA 5-methylcytosine in mammals

a, Comparison of conversion rates among different studies. The raw sequencing data from all studies were mapped using our pipeline, and all annotated transcripts were used to estimate the overall conversion rate. For our samples, conversion rates estimated using ERCC mixes are also shown. Annot., all genes from the Ensembl annotation. b, Cumulative distributions of C-reads among different studies. The number of Cs (C-content) in each C-read is shown on the x axis. A lowly converted sample and a highly converted sample from this study, along with three samples from other studies, are shown as examples. The rankings of the samples by overall conversion rate in the studies are given in parentheses. Forward (blue) and reverse (orange) reads are plotted separately. The dashed line indicates a C-content of 3. c, Distributions of gene-specific conversion rates in BS-seq samples prepared with different library construction protocols. The same five samples as in b are shown as examples. Genes passing two C-position coverage cutoffs are shown: >1,000 (grey) and >10,000 (blue). The overall conversion rate (orange) and the conversion rate estimated from ERCC mixes (red) are indicated by dashed lines. d, Stacked bar plot showing the number of genes with different numbers of putative m5C sites. Sites reported in the original studies were used for analysis. The mRNA m5C site list of Legrand et al. is not available. e, Stacked bar plot showing the number of m5C sites in different cluster statuses (Methods). f, Bar plot showing the Gini coefficient of each sample.
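
The legend does not give formulas for the two summary statistics it reports, so the following is only a minimal sketch of how they might be computed. It assumes that a conversion rate is the fraction of covered cytosine positions read as converted (T) after bisulfite treatment, and it uses the standard rank-based Gini coefficient. Which quantity the Gini coefficient in f is computed over is not stated here; the function names and input values below are hypothetical.

```python
from typing import Sequence


def conversion_rate(converted: int, unconverted: int) -> float:
    """Fraction of covered C positions read as converted (assumed definition)."""
    total = converted + unconverted
    if total == 0:
        raise ValueError("no cytosine coverage")
    return converted / total


def gini(values: Sequence[float]) -> float:
    """Gini coefficient of non-negative values.

    0 means the values are perfectly even; values near 1 mean that a few
    entries account for most of the total.
    """
    xs = sorted(values)
    n = len(xs)
    total = sum(xs)
    if n == 0 or total == 0:
        raise ValueError("need at least one non-zero value")
    # Rank-weighted form on ascending values:
    # G = 2 * sum(i * x_i) / (n * sum(x)) - (n + 1) / n, with i = 1..n
    weighted = sum(i * x for i, x in enumerate(xs, start=1))
    return 2 * weighted / (n * total) - (n + 1) / n


# Hypothetical counts, for illustration only.
rate = conversion_rate(converted=998, unconverted=2)
even = gini([1, 1, 1, 1])
skewed = gini([0, 0, 0, 10])
```

With these made-up inputs, an even distribution yields a Gini coefficient of 0, while a distribution in which one of four entries carries the whole total yields 0.75.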