Extended Data Fig. 4: Genome-wide sequence variation of the 22 tea genomes.
From: Gene mining and genomics-assisted breeding empowered by the pangenome of tea plant Camellia sinensis

a) Assessment of the synteny relationship using the Synteny Relationship Index (SRI) among 7 plant pan-genomes. These species include Camellia sinensis (Cs, TE ratio, 78.2%, n = 22), Solanum tuberosum L. (St, 63.0%, n = 44), Solanum lycopersicum (Sly, 60.7%, n = 31), Arabidopsis thaliana (At, 17.6%, n = 8), Sorghum bicolor L. (Sb, 61.0%, n = 13), Zea mays (Zm, 83.2%, n = 10) and Oryza sativa L. (Os, 52.9%, n = 33). Each box plot shows the distribution of the data, with the median represented by the bold line at the center of the box. The box bounds represent the first (25%) and third (75%) quartiles. The lower and upper whiskers indicate the minimum and maximum values, respectively. b) The proportion of genetic variation between any two samples out of the 22 genomes, calculated from pairwise genomic alignments. The white dot in the center of each violin plot represents the median, and the bounds of each black box indicate the first (25%) and third (75%) quartiles. The lower and upper bounds of the whiskers are the minima and maxima, respectively. c) Distribution of SNPs/indels identified from the 736 re-sequencing samples and SVs identified from the 22 genomes along the 15 pseudo-chromosomes. The density of SNPs/indels is represented by the colored bands on the pseudo-chromosomes, while the red lines alongside the pseudo-chromosomes indicate the distribution of SVs. d, e) Pearson correlation coefficients comparing LTR or TIR count per window (10 Mbp window size, 5 Mbp step) with SV count per window. For d and e, a total of 572 and 598 windows were plotted, respectively. The P value was calculated using a two-sided t-test.
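
The window-based comparison in panels d and e can be illustrated with a minimal Python sketch. It is not the authors' pipeline. The function names, the half-open window convention, the use of feature start positions, and the handling of partial windows at chromosome ends are all assumptions made for illustration. The sketch counts features in sliding windows (10 Mbp window, 5 Mbp step), computes the Pearson correlation coefficient r between two count vectors, and converts r into the t statistic t = r * sqrt((n - 2) / (1 - r^2)) with n - 2 degrees of freedom, from which a two-sided P value is read off the Student t distribution.

```python
import math

def window_counts(positions, chrom_length, window=10_000_000, step=5_000_000):
    """Count features per sliding window along one chromosome.

    Assumption: a feature is assigned by its start position to every
    half-open window [start, start + window) that contains it, and
    trailing windows that overrun the chromosome end are kept.
    """
    counts = []
    start = 0
    while start < chrom_length:
        end = start + window
        counts.append(sum(1 for p in positions if start <= p < end))
        start += step
    return counts

def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def t_statistic(r, n):
    """t statistic for testing r = 0; has n - 2 degrees of freedom.

    Undefined when |r| == 1 (division by zero).
    """
    return r * math.sqrt((n - 2) / (1 - r * r))

# Hypothetical toy positions (bp) on a 20 Mbp chromosome, not real data.
te_positions = [1_000_000, 6_000_000, 7_000_000, 12_000_000]
te_counts = window_counts(te_positions, chrom_length=20_000_000)
```

In practice the per-window TE and SV counts from all pseudo-chromosomes would be concatenated before calling `pearson_r`. If SciPy is available, `scipy.stats.pearsonr` returns both r and a two-sided P value directly.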