Fig. 1: rDNA sequence variants across 918 S. cerevisiae isolates.
From: Varying strength of selection contributes to the intragenomic diversity of rRNA genes

a Variant annotation. Variants are defined as nucleotide differences from the S288c prototype (light blue). Intragenomic variant frequency (iVF) is the percentage of rDNA reads within an isolate's genome that carry a variant sequence and is proportional to the number of variant rDNA copies in that genome. Intragenomic variant frequency polymorphisms (iVFPs) are shown in dark blue. One isolate can have multiple sequences at a position, so the sum of the iVFPs can exceed the total number of isolates. b rDNA copy number distribution across isolates grouped by clade. Numbers in brackets indicate the number of isolates in each clade. The center line is the median, box limits are the 25th and 75th percentiles, and the whiskers extend to ±1.5 × IQR (interquartile range). Dots show isolates with values outside the whiskers. c iVFPs (dots) compared to the reference S288c rDNA prototype are plotted against one rDNA copy. Colors reflect different isolates (some isolates share a color because of palette constraints). An rDNA copy is shown. The histogram above shows the number of iVFPs per position across all isolates. d Distribution of the number of non-fixed rDNA variants (present in a maximum of all but one rDNA copy in a genome; see “Methods”). Isolate “BAM” is indicated. e Summary statistics. Element length in rDNA: the relative length of each element in the rDNA; Polymorphic sites: the fraction of all variable sites; Variants: the number of observed variants and the fraction of single-nucleotide variants (SNVs) and indels; iVFP: the number of iVFPs across all isolates and the fraction of SNVs and indels. The numbers indicate absolute values. The adjusted element lengths are due to filtering (see “Methods”). f Distribution of iVFPs by iVF in NTS, ETS+ITS, and rRNA. g Same analysis as (f) but separated by ecological niche. Brackets: the number of isolates in each niche; the number of iVFPs is shown on the right of each distribution.
The missing distributions in the Probiotic niche are due to the low number of iVFPs. Source data are provided as a Source data file.
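
The iVF definition in (a) and the non-fixed criterion in (d) can be illustrated with a minimal Python sketch. It is not the authors' pipeline: the function names, the read counts, and the step of estimating variant copies as iVF times copy number are assumptions made here for illustration, based only on the wording of the caption (see “Methods” for the actual procedure).

```python
def ivf(variant_reads: int, total_reads: int) -> float:
    """Intragenomic variant frequency: percentage of rDNA reads at a
    position that carry the variant sequence (hypothetical helper)."""
    if total_reads <= 0:
        raise ValueError("total_reads must be positive")
    return 100.0 * variant_reads / total_reads


def estimated_variant_copies(ivf_percent: float, copy_number: float) -> float:
    """Assumption: since iVF is proportional to variant rDNA copies,
    approximate the variant copy count as iVF (as a fraction) x copy number."""
    return ivf_percent / 100.0 * copy_number


def is_non_fixed(ivf_percent: float, copy_number: float) -> bool:
    """Assumed reading of panel (d): a variant is non-fixed when it is
    present in at most all but one rDNA copy of the genome."""
    return estimated_variant_copies(ivf_percent, copy_number) <= copy_number - 1


# Illustrative numbers only (not taken from the study).
freq = ivf(variant_reads=30, total_reads=120)
copies = estimated_variant_copies(freq, copy_number=150)
print(freq, copies, is_non_fixed(freq, copy_number=150))
```

Under this reading, a variant seen in every read (iVF of 100%) is treated as fixed, while any variant absent from at least one estimated copy counts as non-fixed.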