Fig. 3: Genomic diversity and differential coverage of repeat classes used as diagnostic targets.

a, b, c, Repeat diversity was determined by searching the canonical repeat units with nucmer (90% nucleotide similarity and 90% sequence coverage) against the reference genome assemblies of (a) Ascaris lumbricoides, (b) Necator americanus and (c) Trichuris trichiura. In a, b and c, the heatmaps show the pairwise distance, calculated as the sum of squares of a nucleotide similarity matrix derived from ClustalOmega-aligned repeat sequences for each species; lighter colour (white) on the colour scale reflects a higher degree of similarity between two sequences. In d, e and f, genome coverage per repeat per country was determined with bedtools multicov (minimum overlap 0.51) on merged-by-country BAM files (filtered for > 10 raw reads) against the genome assemblies of (d) A. lumbricoides, (e) N. americanus and (f) T. trichiura. Coverage is expressed as ‘repeat copies’, calculated by dividing the unnormalized repeat coverage by the mean per-country single-copy exon coverage. The central box spans the interquartile range (IQR), with its lower and upper edges marking the first (Q1) and third (Q3) quartiles of the data; the median is shown as a line through the box. The whiskers extend from the edges of the box to the smallest and largest values within 1.5 × IQR of Q1 and Q3, respectively. Only repeats containing binding sites for both the forward and reverse primers and the probe were included. Source data are provided as a Source Data file.
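The ‘repeat copies’ normalization and the box-plot whisker rule described in the caption can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' pipeline: the function names `repeat_copies` and `tukey_whiskers`, the dictionary input format, and the use of linear-interpolation quartiles (`statistics.quantiles` with `method="inclusive"`) are all hypothetical choices. The software used to draw the figure may compute quartiles slightly differently.

```python
from statistics import mean, quantiles


def repeat_copies(repeat_coverage, exon_coverages):
    """Convert per-repeat read coverage into estimated 'repeat copies'.

    Assumption: repeat_coverage maps a (hypothetical) repeat name to its
    coverage in one country-merged BAM file, and exon_coverages lists the
    coverage of single-copy exons from the same file. A locus covered at
    the single-copy baseline scores about 1.0.
    """
    baseline = mean(exon_coverages)
    if baseline <= 0:
        raise ValueError("mean single-copy exon coverage must be positive")
    return {name: cov / baseline for name, cov in repeat_coverage.items()}


def tukey_whiskers(values):
    """Return (Q1, median, Q3, lower whisker, upper whisker) for a box plot.

    Whiskers reach the most extreme data points lying within
    1.5 x IQR of Q1 (below) and Q3 (above); anything beyond is an outlier.
    Quartiles use linear interpolation between data points (an assumption).
    """
    q1, med, q3 = quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    low_fence, high_fence = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    inside = [v for v in values if low_fence <= v <= high_fence]
    return q1, med, q3, min(inside), max(inside)
```

For example, a repeat with 300× coverage in a country whose single-copy exons average 20× would be reported as 15 repeat copies, and in the series `[1, 2, 3, 4, 100]` the value 100 lies beyond Q3 + 1.5 × IQR, so the upper whisker stops at 4.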