Fig. 2: Performance comparison on sequencing data from different platforms. | Nature Communications

From: Comprehensive identification of transposable element insertions using multiple sequencing technologies

a The benchmark data (HG002) contain 1642 haplotype-resolved TE insertions, distributed as shown. Among the subfamilies, AluYa5 and AluYb8 comprise >60% of the Alus, L1Hs comprises >76% of the L1s, and SVA_E and SVA_F comprise >67% of the SVAs. Notably, SVA_F1 and CH10_SVA_F make up 12% and 7% of the SVA insertions, respectively. b HiFi long reads show higher sensitivity (91%, 93%, and 90% for Alu, L1, and SVA, respectively). Benefiting from its repeat-type-specific filters, xTea achieves high specificity on short Illumina reads (88%, 93%, and 86%, respectively). 10X Linked Reads show lower specificity, probably because of their smaller fraction of distinct molecules. c Detailed comparison of the number of shared TE insertions among platforms. Of the 1223 insertions common to all platforms, 1015 (83%) fall in repetitive regions; of these, 261 (25.71%) fall in repeats of the same TE family and 754 (74.29%) in repeats of different TE families. Of the 127 insertions called only from long reads, 116 (91.33%) fall in repetitive regions, and 82 (64.57%) are located in repetitive regions of the same TE type. The zoomed-in pie chart for these long-read-only insertions shows that, of the 82 (65%) insertions that fall in a repeat of the same TE family, 27 (32.9%) are L1 insertions and 55 (67.1%) are Alu insertions. d Most of the insertions unique to the long-read datasets fall in repetitive regions with low divergence rates or high GC content, both of which make short-read alignment difficult.
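The nested percentages in panel c are each taken relative to the preceding subset, not to the full call set. The Python sketch below recomputes them from the counts stated in the caption; the helper `share` and all variable names are illustrative only and are not taken from the study's code.

```python
def share(part: int, whole: int, digits: int = 2) -> float:
    """Return part/whole as a percentage, rounded to `digits` decimals."""
    return round(100 * part / whole, digits)

# Insertions called on all platforms (panel c).
common_all = 1223
common_repeat = 1015        # subset falling in repetitive regions
common_same_family = 261    # ...of which in repeats of the same TE family
common_diff_family = 754    # ...of which in repeats of a different TE family

# Insertions called only from long reads (panel c, zoomed-in pie chart).
long_only = 127
long_only_same_type = 82    # in repetitive regions of the same TE type
long_only_l1 = 27
long_only_alu = 55

# The sub-categories partition their parent sets.
assert common_same_family + common_diff_family == common_repeat
assert long_only_l1 + long_only_alu == long_only_same_type

print(share(common_repeat, common_all))                 # 82.99 (reported as 83%)
print(share(common_same_family, common_repeat))         # 25.71
print(share(common_diff_family, common_repeat))         # 74.29
print(share(long_only_same_type, long_only))            # 64.57
print(share(long_only_l1, long_only_same_type, 1))      # 32.9
print(share(long_only_alu, long_only_same_type, 1))     # 67.1
```

Each denominator is the enclosing subset: the same-family and different-family shares are fractions of the 1015 repeat-region insertions, and the L1/Alu split is a fraction of the 82 long-read-only insertions.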