Extended Data Fig. 4: Genotyping and imputation of variants from RNA-Seq data.

(a) Distribution of the number of SNPs called directly from RNA-Seq data across all 8,536 samples. (b) Concordance rates with genotypes called from whole-genome sequencing (WGS) data, across four Holstein (HOL) animals, for genotypes called directly from RNA-Seq data (mean = 78,587 variants, range = 47,407–113,868) and for imputed genotypes (mean = 2.50 million variants, range = 1.20–2.73 million) in three tissues. (c) Proportion of variants in each functional category under different imputation accuracy cutoffs. These results are derived from 109 Holstein animals with both RNA-Seq data and 50K SNP array genotypes. ‘All.SNPs’ are the 31,377,923 imputed variants shared by the two imputation processes (that is, genotype imputation based on RNA-Seq SNPs and imputation based on the SNP array). ‘imp.acc>=0.80.Aus’ are variants imputed from 50K SNP array genotypes (Australian HOL animals) with imputation accuracy DR2 > 0.80 (n = 16,501,943). ‘imp.acc>=0.80.GTEx’ are variants in the CattleGTEx data imputed from RNA-Seq SNPs with DR2 > 0.80 (n = 5,292,828). (d) Comparison of DR2 for SNPs imputed from the 50K SNP array and SNPs imputed from RNA-Seq SNPs, across the gene body and 1 Mb upstream and downstream of it. Each flanking region is divided into 100-kb windows, and each gene body is evenly divided into 10 windows; the DR2 values of SNPs within each window are then averaged for plotting. (e) Pearson correlations of genotype counts between variants imputed from RNA-Seq SNPs and those imputed from 50K SNP arrays, across different imputation quality cutoffs and chromosomes. The horizontal dashed line in each graph indicates the mean correlation across chromosomes. (f) Distribution of identity-by-state (IBS) distance between all sample pairs. IBS distance is calculated using PLINK v1.90 as the average proportion of alleles shared between two samples. Sample pairs with IBS distance > 0.85 are considered duplicates.
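The concordance rates in panel (b) compare two genotype sets at the sites they share. A minimal sketch, assuming genotypes are encoded as alternate-allele counts (0, 1, 2) keyed by a variant identifier, with None for a missing call; the function name and the encoding are hypothetical and not the pipeline used for the figure.

```python
from typing import Mapping, Optional


def genotype_concordance(
    test: Mapping[str, Optional[int]],
    truth: Mapping[str, Optional[int]],
) -> float:
    """Fraction of shared, non-missing sites whose genotypes agree.

    Hypothetical encoding: alt-allele counts (0, 1, 2) keyed by a variant ID
    such as "chr:pos:ref:alt"; None marks a missing call.
    """
    # Only sites present and called in both sets are compared.
    shared = [
        v for v in test
        if v in truth and test[v] is not None and truth[v] is not None
    ]
    if not shared:
        raise ValueError("no overlapping non-missing sites")
    matches = sum(test[v] == truth[v] for v in shared)
    return matches / len(shared)


rna = {"1:100:A:G": 1, "1:200:C:T": 2, "1:300:G:A": 0, "1:400:T:C": None}
wgs = {"1:100:A:G": 1, "1:200:C:T": 1, "1:300:G:A": 0, "1:400:T:C": 2}
rate = genotype_concordance(rna, wgs)
```

Here the site with a missing RNA-Seq call is skipped, so the rate is computed over three sites, two of which agree.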
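Panel (c) tabulates how variants distribute across functional categories, with and without a DR2 filter. A minimal sketch, assuming each variant is given as a hypothetical (category, DR2) pair and that proportions are taken within each filtered set; the category labels below are illustrative only.

```python
from collections import Counter
from typing import Dict, Iterable, Optional, Tuple


def category_proportions(
    variants: Iterable[Tuple[str, float]],
    min_dr2: Optional[float] = None,
) -> Dict[str, float]:
    """Share of variants per functional category, optionally keeping DR2 > min_dr2."""
    counts = Counter(
        cat for cat, dr2 in variants if min_dr2 is None or dr2 > min_dr2
    )
    total = sum(counts.values())
    return {cat: n / total for cat, n in counts.items()}


variants = [("intron", 0.9), ("intron", 0.5), ("missense", 0.95), ("intergenic", 0.85)]
all_snps = category_proportions(variants)            # analogous to 'All.SNPs'
filtered = category_proportions(variants, min_dr2=0.80)
```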
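The windowing in panel (d) uses fixed 100-kb windows in the 1-Mb flanks but a fixed number of windows (10) in the gene body, so genes of different lengths line up on a common axis. A minimal sketch, assuming forward-strand, 1-based inclusive coordinates and that SNP DR2 values are pooled across all genes within each window; strand handling and the record layout are assumptions, and the names are hypothetical.

```python
from collections import defaultdict
from statistics import mean
from typing import Iterable, List, Optional, Tuple

FLANK = 1_000_000          # 1 Mb flank on each side of the gene body
FLANK_WIN = 100_000        # 100-kb flanking windows
N_FLANK = FLANK // FLANK_WIN  # 10 windows per flank
BODY_WINS = 10             # gene body split evenly into 10 windows
N_WINS = 2 * N_FLANK + BODY_WINS  # 30 windows in total


def window_index(pos: int, start: int, end: int) -> Optional[int]:
    """Return 0-9 (upstream), 10-19 (gene body), 20-29 (downstream), or None.

    Assumes forward-strand, 1-based inclusive coordinates with start <= end.
    """
    if start - FLANK <= pos < start:
        return (pos - (start - FLANK)) // FLANK_WIN
    if start <= pos <= end:
        return N_FLANK + (pos - start) * BODY_WINS // (end - start + 1)
    if end < pos <= end + FLANK:
        return N_FLANK + BODY_WINS + (pos - end - 1) // FLANK_WIN
    return None  # outside the gene +/- 1 Mb


def dr2_profile(records: Iterable[Tuple[int, float, int, int]]) -> List[Optional[float]]:
    """Mean DR2 per window; each record is (snp_pos, dr2, gene_start, gene_end)."""
    bins = defaultdict(list)
    for pos, dr2, start, end in records:
        idx = window_index(pos, start, end)
        if idx is not None:
            bins[idx].append(dr2)
    # None marks windows that received no SNPs.
    return [mean(bins[i]) if bins[i] else None for i in range(N_WINS)]


gene = (2_000_000, 2_000_999)
profile = dr2_profile([
    (2_000_500, 0.9, *gene),
    (2_000_550, 0.7, *gene),
    (1_000_000, 0.5, *gene),
])
```

Running the profile once on array-imputed DR2 values and once on RNA-Seq-imputed DR2 values would give the two curves being compared.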
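Panel (f) treats the IBS value as the average proportion of alleles shared between two samples: at each site typed in both, two genotypes encoded as alternate-allele counts share 2 − |g1 − g2| of their 2 alleles. The sketch below reimplements that definition directly rather than calling PLINK; the encoding (0, 1, 2, with None for missing) and the function names are assumptions, and the 0.85 threshold comes from the legend.

```python
from itertools import combinations
from typing import Dict, List, Optional, Tuple

Genotypes = List[Optional[int]]


def ibs_similarity(a: Genotypes, b: Genotypes) -> float:
    """Average proportion of alleles shared IBS over sites typed in both samples."""
    shared_alleles = 0
    sites = 0
    for ga, gb in zip(a, b):
        if ga is None or gb is None:
            continue  # skip sites missing in either sample
        shared_alleles += 2 - abs(ga - gb)
        sites += 1
    if sites == 0:
        raise ValueError("no sites typed in both samples")
    return shared_alleles / (2 * sites)


def flag_duplicates(
    samples: Dict[str, Genotypes], threshold: float = 0.85
) -> List[Tuple[str, str, float]]:
    """Return every sample pair whose IBS value exceeds the threshold."""
    hits = []
    for (id1, g1), (id2, g2) in combinations(samples.items(), 2):
        s = ibs_similarity(g1, g2)
        if s > threshold:
            hits.append((id1, id2, s))
    return hits


samples = {
    "s1": [0, 1, 2, 2, 0, 1, None],
    "s2": [0, 1, 2, 2, 0, 1, 1],
    "s3": [2, 1, 0, 0, 2, 1, 0],
}
dups = flag_duplicates(samples)
```

Comparing all pairs is quadratic in the number of samples, which is why a dedicated tool such as PLINK is the practical choice at the scale of thousands of samples.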