Fig. 1: Characterization of simulated data generated using models that allow multiple beneficial mutations. | Nature Communications

From: Reply to: Population genetic considerations regarding the interpretation of within-patient SARS-CoV-2 polymorphism data

The SLiM14 simulations of Soni et al.12 were modified to generate 100 whole-genome (30 kbp) samples for each of three distributions of mutational fitness effects (DFEs) based on Flynn et al.18 and Bloom & Neher19. Flynn et al.18 refers to a DFE background estimated for Mpro (nsp5), with either 1.0% (blue text and arrow) or 9.7% (green text and arrow) of mutations beneficial (selection coefficients [s] = 0.05–0.13). Bloom & Neher19 (grey arrow) refers to a DFE estimated from publicly available viral consensus sequence data, where the fractions of each mutation effect type were set to the whole-genome values given in Table 1 (bottom row). For the latter, s values were approximated by dividing fitness effects (range −7.14 to 6.17) by 7.14 (the maximum absolute value), yielding a range of −1.0 to 0.86. These values were simulated as lethal = −1.0; deleterious = gamma (mean −0.32, shape 1.70); neutral = 0.0; and beneficial = exponential (mean 0.087). For the gamma shape parameter, a maximum likelihood estimate was obtained from the absolute values of all negative s using the MASS::fitdistr() function in R. All other parameters were retained from the scripts of Soni et al.: mutation rate = 2.135 × 10⁻⁶ per site per cycle; recombination rate = 5.5 × 10⁻⁵ per site per cycle; infection bottleneck size = 1; carrying capacity = 100,000; runtime = 168 cycles (https://github.com/vivaksoni/Gu_etal_2023_response, accessed 2023/09/26). Simulated data were analyzed using the method of our original study11, i.e., eliminating iSNVs with frequency <2.5% and estimating πN − πS with codon-based bootstrapping. a DFEs for nonsynonymous mutations. Violin plots show the emergent s distributions of the three DFE models, each determined by simulating 10,000 mutations. b Nucleotide diversity under each DFE.
Error bars show standard errors of mean πN (red) and πS (blue), each determined from 1,000 bootstrap replicates with the codon as the resampling unit (codon values calculated as means across all 100 samples). P values refer to two-sided Z-tests of the null hypothesis πN = πS (three tests, one per DFE; no adjustment for multiple testing). πN/πS ratios are displayed in grey text; for comparison, the mean empirical πN/πS value observed across all biological samples in our original study11 was 0.62. Scripts, analysis code, input data, and intermediate files are available at https://doi.org/10.5281/zenodo.10552831. Source data are provided as a Source Data file.
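The Bloom & Neher19 DFE described in this caption combines a rescaling step with a four-class mixture of lethal, gamma-distributed deleterious, neutral, and exponentially distributed beneficial effects. The Python sketch below illustrates only that logic and is not taken from the SLiM scripts. Assumptions: the helper names `rescale` and `draw_s` are hypothetical, the class fractions are made-up placeholders standing in for the Table 1 whole-genome values, and the gamma is parameterized by mean and shape (scale = |mean| / shape).

```python
import random

# Hypothetical helper: divide raw fitness effects by their maximum absolute
# value, as described for the Bloom & Neher DFE (range -7.14..6.17 -> -1.0..0.86).
def rescale(effects):
    max_abs = max(abs(e) for e in effects)
    return [e / max_abs for e in effects]

# Class fractions are HYPOTHETICAL placeholders, not the Table 1 values.
FRACTIONS = {"lethal": 0.20, "deleterious": 0.50, "neutral": 0.25, "beneficial": 0.05}
GAMMA_MEAN, GAMMA_SHAPE = -0.32, 1.70   # deleterious class (from the caption)
EXP_MEAN = 0.087                        # beneficial class (from the caption)

def draw_s(rng):
    """Draw one selection coefficient from the four-class mixture."""
    kind = rng.choices(list(FRACTIONS), weights=list(FRACTIONS.values()))[0]
    if kind == "lethal":
        return -1.0
    if kind == "deleterious":
        # Mean/shape parameterization assumed: scale = |mean| / shape.
        return -rng.gammavariate(GAMMA_SHAPE, abs(GAMMA_MEAN) / GAMMA_SHAPE)
    if kind == "neutral":
        return 0.0
    return rng.expovariate(1.0 / EXP_MEAN)

print(rescale([-7.14, 0.0, 6.17]))
rng = random.Random(1)
sample = [draw_s(rng) for _ in range(10_000)]  # 10,000 draws, as for panel a
```

Substituting the actual Table 1 fractions for the placeholders would be required before comparing such draws with the distributions in panel a.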
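The πN versus πS comparison in panel b can be sketched as a codon-unit bootstrap followed by a normal-approximation Z-test. This is a minimal Python illustration, not the published analysis code. It assumes per-codon πN and πS values that have already been averaged across samples, and it takes the Z statistic as the observed difference in means divided by its bootstrap standard error. The function `bootstrap_z_test` and the synthetic input values are hypothetical.

```python
import random
import statistics
from statistics import NormalDist


def bootstrap_z_test(pi_n, pi_s, reps=1000, seed=0):
    """Hypothetical helper: codon-unit bootstrap Z-test of piN = piS.

    pi_n, pi_s: per-codon diversity values, paired by codon index and
    assumed to be already averaged across samples.
    """
    if len(pi_n) != len(pi_s):
        raise ValueError("pi_n and pi_s must be paired by codon")
    rng = random.Random(seed)
    n = len(pi_n)
    observed = statistics.fmean(pi_n) - statistics.fmean(pi_s)
    diffs = []
    for _ in range(reps):
        # Resample codons (not individual sites) with replacement.
        idx = [rng.randrange(n) for _ in range(n)]
        diffs.append(statistics.fmean(pi_n[i] for i in idx)
                     - statistics.fmean(pi_s[i] for i in idx))
    se = statistics.stdev(diffs)  # bootstrap standard error of the difference
    z = observed / se
    p = 2 * (1 - NormalDist().cdf(abs(z)))  # two-sided normal P value
    ratio = statistics.fmean(pi_n) / statistics.fmean(pi_s)
    return {"diff": observed, "se": se, "z": z, "p": p, "ratio": ratio}


# Synthetic, made-up per-codon values in which nonsynonymous diversity is lower.
data_rng = random.Random(42)
pi_s = [data_rng.expovariate(1 / 0.002) for _ in range(500)]
pi_n = [data_rng.expovariate(1 / 0.001) for _ in range(500)]
result = bootstrap_z_test(pi_n, pi_s)
print(result)
```

Resampling whole codons keeps πN and πS for the same codon together in each replicate, so the standard error reflects their covariance.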

Back to article page