Extended Data Fig. 3: Read mappings across genera in simulated ancient microbial data.
From: The spatiotemporal distribution of human pathogens in ancient Eurasia

a, Observed breadth of genomic coverage as a function of average read depth for distinct species hits (i.e., mappings with the highest number of unique k-mers at species level for a genus; n ≥ 20 reads mapped). Each panel shows results for reads simulated from the species indicated. Mappings against the simulated species are shown as diamonds, whereas mappings against species from other genera are shown as circles. Symbol fill colour indicates average nucleotide identity (ANI) for mapped reads (grey symbols, ANI < 0.97). The solid black line shows the theoretical expected breadth of coverage for a given average read depth. The vertical dashed line indicates 1X average read depth. b, Relative entropy statistic (1,000 bp window size) as a function of ANI. Blue diamonds indicate results for the mapping against the reference genome from the same species as the simulated read data, whereas grey circles indicate reference genomes for species hits in other genera. Dashed lines indicate the cutoffs used in analyses of real data (ANI ≥ 0.97, entropy ≥ 0.9). False positive hits (reads mapped to a reference genome from a different genus) that pass the cutoffs are labelled with their final number of mapped reads (out of 5 million total simulated reads). c, Illustration showing potential sources of false positive hits and the expected results for the authentication summary statistics. d, Matrix plot showing all microbial hits with n ≥ 20 reads mapped and their authentication statistics, for all simulated species and read numbers. Symbol colour and size indicate the number of replicates passing the cutoff for each of the three summary statistics shown (ANI ≥ 0.97, ratio of observed/expected coverage breadth ≥ 0.8, entropy ≥ 0.9). Hits passing the cutoffs for all three statistics are indicated with a coloured outline and background lines (black, true positives; grey, cross-genus false positive mappings).
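
As an illustration of how the three cutoffs in this legend combine, the following minimal Python sketch flags a hit as passing authentication. The legend does not state the formula behind the theoretical expected breadth of coverage; the sketch assumes the common Poisson approximation, expected breadth = 1 − exp(−depth), which may differ from the authors' calculation. The function and parameter names are hypothetical and are not taken from the paper.

```python
import math

# Cutoffs quoted in the legend.
ANI_MIN = 0.97            # average nucleotide identity of mapped reads
BREADTH_RATIO_MIN = 0.8   # observed / expected breadth of coverage
ENTROPY_MIN = 0.9         # relative entropy statistic


def expected_breadth(depth: float) -> float:
    """Expected fraction of the genome covered at a given average depth.

    ASSUMPTION: reads fall uniformly at random, so per-site coverage is
    Poisson-distributed and the covered fraction is 1 - exp(-depth).
    """
    return 1.0 - math.exp(-depth)


def passes_cutoffs(ani: float, observed_breadth: float,
                   depth: float, entropy: float) -> bool:
    """Return True if a hit passes all three summary-statistic cutoffs."""
    expected = expected_breadth(depth)
    if expected <= 0.0:
        # No coverage expected at zero depth, so the ratio is undefined.
        return False
    ratio = observed_breadth / expected
    return (ani >= ANI_MIN
            and ratio >= BREADTH_RATIO_MIN
            and entropy >= ENTROPY_MIN)
```

Under this assumption, a hit at 0.1X average depth is expected to cover roughly 9.5% of the reference. Reads piling up in a few conserved regions would produce a much lower observed breadth and fail the ratio cutoff, even when ANI is high.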