Abstract
Targeted amplicon sequencing is a powerful and efficient tool for interrogating the Plasmodium falciparum genome, generating actionable data from infections to complement traditional malaria epidemiology. For maximum impact, genomic tools should be multi-purpose, robust, sensitive, and reproducible. We developed, characterized, and implemented MAD4HatTeR, an amplicon sequencing panel based on Multiplex Amplicons for Drug, Diagnostic, Diversity, and Differentiation Haplotypes using Targeted Resequencing, along with a bioinformatic pipeline for data analysis. Additionally, we introduce an analytical approach to detect gene duplications and deletions from amplicon sequencing data. Laboratory control and field samples were used to demonstrate the panel’s high sensitivity and robustness. MAD4HatTeR targets 165 highly diverse loci, focusing on multiallelic microhaplotypes, key markers for drug and diagnostic resistance (including duplications and deletions), and CSP and potential vaccine targets. The panel can also detect non-falciparum Plasmodium species. MAD4HatTeR successfully generated data from low-parasite-density dried blood spot and mosquito midgut samples and detected minor alleles at within-sample allele frequencies as low as 1% with high specificity in high-parasite-density dried blood spot samples. Gene deletions and duplications were reliably detected in mono- and polyclonal controls. Data generated by MAD4HatTeR were highly reproducible across multiple laboratories. The successful implementation of MAD4HatTeR in five laboratories, including three in malaria-endemic African countries, showcases its feasibility and reproducibility in diverse settings. MAD4HatTeR is thus a powerful tool for research and a robust resource for malaria public health surveillance and control.
Introduction
Effective control and eventual elimination of Plasmodium falciparum malaria hinge on the availability and integration of data to inform research and public health strategies. Genomics can augment traditional epidemiological surveillance by providing detailed genetic information about infections1. Molecular markers of drug and diagnostic resistance can guide the selection of antimalarials and diagnostics, respectively2,3,4,5. Vaccine target sequences may shed light on vaccine efficacy and identify evidence of selective pressure6. Measures of genetic variation can provide insights into transmission intensity, rate and origin(s) of importation, and granular details of local transmission7,8,9,10,11,12,13,14. Differentiation of infections as either recrudescent or reinfections is critical for measuring outcomes of therapeutic efficacy studies that are used to guide antimalarial use worldwide15,16,17,18. Furthermore, the contribution of non-falciparum species to malaria burden is poorly characterized and could complicate control and elimination efforts19.
To maximize public health and research utility, genomic methods should be robust and provide rich information from field samples, which may be low-density and are often polyclonal in malaria-endemic areas of sub-Saharan Africa13,20,21,22. While traditional genotyping methods of length polymorphisms and microsatellites can characterize malarial infections, they suffer from low sensitivity and specificity, and difficulties in protocol standardization23,24,25. Single nucleotide polymorphism (SNP) barcoding approaches have improved throughput, sensitivity and standardization26,27. However, the biallelic nature of most targeted SNPs limits their discriminatory power to compare polyclonal infections. Sequencing of short, highly variable regions within the genome containing multiple SNPs (microhaplotypes) provides multiallelic information that overcomes many of those limitations28. Microhaplotypes can be reconstructed from whole-genome sequencing (WGS) data or amplified by PCR and sequenced. Low abundance variants, especially in low-density samples, may be missed by WGS due to low depth of coverage. Amplicon sequencing offers much higher sensitivity and can target the most informative regions of the genome, increasing throughput and decreasing cost. Several Illumina-based multiplexed amplicon sequencing panels have been developed to genotype P. falciparum infections. SpotMalaria is a panel that genotypes 100 SNPs, most of which are biallelic, for drug resistance and diversity26. Pf AmpliSeq genotypes SNPs, currently focused on Peruvian genetic diversity, and also targets drug and diagnostic resistance markers27. Panels that target multiallelic microhaplotypes, including AMPLseq, provide greater resolution for evaluating polyclonal infections and also include drug resistance markers29,30. Nanopore-based amplicon panels enable the utilization of mobile sequencing platforms31,32,33. 
Thus, targeted amplicon sequencing is a flexible approach that has the potential to address multiple use cases. To fully realize this potential, a panel for research and public health would ideally include all necessary targets to answer a wide range of questions, while remaining modular to allow flexible allocation of sequencing resources.
Here, we developed MAD4HatTeR, an Illumina-compatible, multipurpose, modular tool based on Multiplex Amplicons for Drug, Diagnostic, Diversity, and Differentiation Haplotypes using Targeted Resequencing. MAD4HatTeR has 276 targets divided into two modules: a diversity module with 165 targets to assess genetic diversity and relatedness; and a resistance module consisting of 118 targets that cover 15 drug resistance-associated genes and assess hrp2/3 deletions, along with current and potential vaccine targets. The modules also include targets for non-falciparum Plasmodium species identification. We developed a bioinformatic pipeline to report allelic data, and implemented laboratory and bioinformatic methods in several sites, including countries in malaria-endemic sub-Saharan Africa. We then evaluated the panel’s performance on various sample types, including mosquito midguts, and showed that high quality data can be consistently reproduced across laboratories, including from polyclonal samples with low parasite density.
Methods
Participating laboratories
We generated data in five sites: the EPPIcenter at the University of California San Francisco (UCSF), in collaboration with the Chan Zuckerberg Biohub San Francisco, California; Infectious Diseases Research Collaboration (IDRC) at Central Public Health Laboratories (CPHL), Kampala, Uganda; Centro de Investigação em Saúde de Manhiça (CISM), Manhiça, Mozambique; National Institutes for Communicable Diseases (NICD), Johannesburg, South Africa; and Barcelona Institute for Global Health (ISGlobal), Barcelona, Spain. The procedures are described according to the workflows in San Francisco. Minor variations, depending on equipment availability, were implemented at other institutions.
Amplicon panel design
We used available WGS data as of June 2021 (refs. 3,30,34,35,36,37,38,39,40,41,42) to identify regions with multiple SNPs within windows of 150–300 bp that lay between tandem repeats, using a local haplotype reconstruction tool (Pathweaver43). We compiled a list of drug resistance-associated and immunity-related SNPs (Tables 1 and 2) and identified regions of 150–300 bp between tandem repeats in and around hrp2 and hrp3 to assess diagnostic resistance-related deletions, as well as a region in chromosome 11 that is often duplicated in hrp3-deleted samples44. Paragon Genomics, Inc. designed amplification primers for multiplexed PCR using the Pf3D7 genome as a reference, and used related Plasmodium species and human genomes to ensure that primers were specific for P. falciparum. Genome versions and their GenBank or RefSeq accessions for each species are: P. falciparum Pf3D7 (version = 2020-09-01, GCA_000002765.3), P. vivax PvP01 (version = 2018-02-28, GCA_900093555.2), P. malariae PmUG01 (version = 2016-09-19, GCA_900090045.1), P. ovale PocGH01 (version = 2017-03-06, GCA_900090035.2), P. knowlesi PKNH (version = 2015-06-18, GCA_000006355.2) and Homo sapiens GRCh37 (GCA_000001405.14). In addition to the P. falciparum targets, we selected a target in the ldh gene (PF3D7_1325200) and its homologs in the other 4 Plasmodium species listed above for identification of concurrent infections with these species. To minimize PCR bias against longer amplicons, we restricted the design to amplicons of 225–275 bp, which can be covered with a significant overlap in paired-end sequencing in Illumina platforms with 300-cycle kits, except for targets around hrp3 that needed to be 295–300 bp long to design primers successfully. We excluded or redesigned primers that contained more than 1 SNP (including non-biallelic SNPs) or indels in available WGS data or aligned to tandem repeats.
To increase coverage of SNPs close to each other, we allowed for overlap in amplicons that targeted drug resistance and immunity-related markers. Primers were grouped in modules, as outlined in the results section (Fig. 1 and Supplementary Table 1).
MAD4HatTeR is a multi-purpose malaria amplicon sequencing panel. A. Primer pools to amplify targets in 5 categories are grouped into two modules (Diversity and Resistance). R1 refers to two primer pools: R1.1, the original pool, and R1.2, a reduced version of primer pool R1.1 designed to increase sensitivity. The recommended configuration to maximize information retrieval and sensitivity for low parasitemia samples is two mPCR reactions, one with D1 and R1.2 primers, and one with R2 primers (solid lines). Supplementary Tables 1–5 contain complete details on primer pools and targets. B. Chromosomal locations of all targets in the P. falciparum genome (not including non-falciparum targets). Note that the Diagnostic Resistance category includes targets in and around hrp2 and hrp3, targets in chromosome 11 that are often duplicated when hrp3 is deleted44, and length controls in other chromosomes. C. Simplified workflow for library preparation and sequencing, highlighting the need for two multiplexed PCR reactions when using primer pools R1 and R2, which are incompatible due to tiling over some genes of interest. A more detailed scheme can be found in Supplementary Figure 1, and a full protocol, including didactic materials, can be found online78.
In silico panel performance calculations
Alleles were extracted from available WGS data as of July 2024 (refs. 3,30,34,35,36,37,38,39,40,41,45). SNPs and microhaplotypes were reconstructed using Pathweaver43 for targets in MAD4HatTeR, SpotMalaria26, AMPLseq30, and AmpliSeq27. In silico heterozygosity was calculated using all allele calls in available WGS data. Principal coordinate analysis was performed on the binary distance matrix from presence/absence of alleles, using alleles within loci present in both samples for each pair.
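The two summary quantities named above can be illustrated with a minimal Python sketch. It assumes that heterozygosity is the expected heterozygosity 1 − Σp² without a finite-sample correction, and that "binary distance" is the Jaccard-style distance on allele presence/absence (as in R's dist with method = "binary"); both are assumptions of this sketch, and the function names and input layout are hypothetical rather than taken from the analysis code.

```python
from collections import Counter


def expected_heterozygosity(allele_calls):
    """Expected heterozygosity 1 - sum(p_i^2) at one locus.

    allele_calls: list of allele labels, one entry per allele call.
    No finite-sample correction is applied (an assumption of this sketch).
    """
    counts = Counter(allele_calls)
    total = sum(counts.values())
    return 1.0 - sum((n / total) ** 2 for n in counts.values())


def binary_distance(sample_a, sample_b):
    """Jaccard-style distance on allele presence/absence.

    Each sample is a dict: locus -> set of alleles observed.
    Only loci genotyped in both samples are compared.
    """
    shared_loci = sample_a.keys() & sample_b.keys()
    a = {(locus, allele) for locus in shared_loci for allele in sample_a[locus]}
    b = {(locus, allele) for locus in shared_loci for allele in sample_b[locus]}
    union = a | b
    if not union:
        return float("nan")  # no shared loci, distance undefined
    return 1.0 - len(a & b) / len(union)
```

A pairwise matrix of such distances would then be the input to a principal coordinate analysis.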
To assess statistical power of testing if two (potentially polyclonal) infections are related, we obtained within-sample allele frequencies (WSAF) for the most variable SNP in each diversity target (165, 111 and 100 total SNPs for MAD4HatTeR, AMPLseq and SpotMalaria, respectively) or microhaplotypes (161, 128 and 135, respectively) from WGS data for each of the three panels, and simulated genotypes for mono- and polyclonal samples. In the simulations, complexity of infection (COI) was fixed and ranged from 1 to 5, and we included genotyping errors with a miss-and-split model46; missing and splitting parameters were 0.05 and 0.01, respectively. Between two samples, only a single pair of parasite strains was related, with expected identity-by-descent (IBD) proportion ranging from 1/16 through 1/2 (sibling level) to 1 (clones). We then analyzed these simulated datasets to obtain performance measures for combinations of a panel, COI, and a relatedness level: first, we estimated COI and allele frequencies using MOIRE47; we then used these to estimate pairwise interhost relatedness and test the hypothesis that two infections are unrelated at a significance level of 0.05 with Dcifer46, and calculated power as the proportion of 1000 simulated pairs where the null hypothesis was correctly rejected.
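The simulation design can be sketched in Python as follows. This is a simplified illustration, not the simulation code used for the analysis: it omits the miss-and-split genotyping error model, does not call MOIRE or Dcifer (both R packages), and assumes that the null hypothesis is rejected when p < 0.05. The function names and input layout are hypothetical.

```python
import random


def draw_allele(freqs, rng):
    """Draw one allele from a dict allele -> population frequency."""
    alleles = list(freqs)
    return rng.choices(alleles, weights=[freqs[a] for a in alleles], k=1)[0]


def simulate_pair(locus_freqs, coi_1, coi_2, r, rng):
    """Simulate error-free allele sets for two infections.

    locus_freqs: dict locus -> {allele: frequency}.
    Strain 0 of infection 2 shares each locus identical by descent with
    strain 0 of infection 1 with probability r; all other strains are
    drawn independently from the population frequencies.
    """
    infection_1, infection_2 = {}, {}
    for locus, freqs in locus_freqs.items():
        strains_1 = [draw_allele(freqs, rng) for _ in range(coi_1)]
        strains_2 = [draw_allele(freqs, rng) for _ in range(coi_2)]
        if rng.random() < r:
            strains_2[0] = strains_1[0]  # locus is IBD in the related pair
        infection_1[locus] = set(strains_1)
        infection_2[locus] = set(strains_2)
    return infection_1, infection_2


def power(p_values, alpha=0.05):
    """Proportion of simulated pairs in which the null is rejected."""
    return sum(p < alpha for p in p_values) / len(p_values)
```

In the full analysis, each simulated pair would be passed through COI and allele frequency estimation and a relatedness test, and the resulting p-values summarized with a function such as power.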
Samples
We prepared control dried blood spots (DBS) using P. falciparum laboratory strains. We synchronized monocultures in the ring stage. We made polyclonal controls by mixing cultured strains (3D7, Dd2 MRA-156 and MRA-1255, D6, W2, D10, U659, FCR3, V1/S, and HB3), all synchronized at the ring stage, in various proportions. We mixed all monocultures and mixtures with uninfected human blood and serially diluted them in blood to obtain a range of parasite densities (0.1–100,000 parasites/µL). We spotted 20 µL of the mixture on filter papers and stored them at −20 °C until processing.
Finger-prick DBS samples were collected in Northern Ethiopia between 2022 and 2023 as part of a mixed-methods study, which included a case-control study in two highland districts and cross-sectional surveys in one lowland district48. In the case-control study, samples were obtained from symptomatic patients presenting at health facilities. These included malaria cases, defined as individuals who tested positive for P. falciparum and/or P. vivax using a rapid diagnostic test (Bioline Malaria Ag P.f/Pan by Abbott, STANDARD Q Malaria P.f/P.v Ag by SD Biosensor, or First Response Malaria Ag. P.f./P.v. Card Test), as well as test-negative controls who were later confirmed positive for malaria via qPCR. Cross-sectional surveys in lowland areas were conducted at agricultural worksites and in households in nearby villages. The DBS were air-dried, stored individually with desiccant, and kept at −20 °C until laboratory processing. No patient data collected during sampling was used in this analysis. We analyzed DNA extracts from samples from previous studies, including 26 field samples from Ethiopia known to carry deletions in the hrp2 and/or hrp3 genes3, as well as 11 P. falciparum co-infections from Uganda containing P. malariae and P. ovale49. Finally, we analyzed publicly available data from 436 field samples from Mozambique22. The original works detail the sampling schemes and additional sample processing procedures. We used genomic DNA from P. knowlesi Strain H, obtained through BEI Resources, NIAID, NIH, contributed by Alan W. Thomas.
To assess performance of the assay for oocysts, we infected 9 Anopheles gambiae s.s. mosquitos via direct membrane feeding with blood taken from participants who were diagnosed with symptomatic malaria in transmission studies in Uganda. Briefly, blood was taken from participants in a cohort study based in Nanongera and Busia districts (3 patients, 5 infected midguts) and from patients diagnosed with P. falciparum malaria at Masafu General Hospital in Busia district (4 patients, 4 infected midguts)50,51. P. falciparum presence in blood samples was confirmed by varATS qPCR. Oocysts were detected and quantified using mercurochrome staining and microscopy as previously described51.
Library preparation
We extracted DNA from control DBS and P. falciparum/P. vivax co-infections using the Chelex-Tween 20 method52. Mosquito midgut DNA was extracted from dissected midguts using the QIAGEN DNeasy blood and tissue DNA extraction kit as previously described53. P. falciparum parasite density was quantified in all samples, including midgut extracts, by varATS54 or 18S55 qPCR using standards made from DBS spotted with serial dilutions of cultured P. falciparum in uninfected blood (Supplementary Text). P. vivax was quantified by 18S qPCR as previously described56.
Libraries were made with a minor adaptation of Paragon Genomics’ CleanPlex Custom NGS Panel Protocol57 (Supplementary Text). A version of the protocol containing any updates can be found at https://eppicenter.ucsf.edu/resources. Library pools were sequenced in Illumina MiSeq, MiniSeq, NextSeq 550, or NextSeq 2000 instruments with paired-end 150 bp reads. We tested different amplification cycles and primer pool configurations. Based on sensitivity and reproducibility, we use the following experimental conditions as a default: primer pools D1 + R1.2 + R2; 15 multiplexed PCR cycles for moderate to high parasite density samples (equivalent to ≥ 100 parasites/µL in DBS) and 20 cycles for samples with lower parasite density; 0.25X and 0.125X primer pool concentration, respectively.
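For laboratories that script their plate setup, the default conditions can be encoded as a small lookup. The following Python sketch is only an illustration of the rule stated above; the function name and the returned dictionary layout are hypothetical, and the split into two reactions follows the recommended configuration in Fig. 1.

```python
def default_mpcr_conditions(parasites_per_ul):
    """Default multiplexed PCR settings for a DBS-equivalent parasite density.

    Samples with >= 100 parasites/uL use 15 cycles at 0.25X primer pool
    concentration; lower-density samples use 20 cycles at 0.125X.
    """
    if parasites_per_ul >= 100:
        cycles, concentration = 15, "0.25X"
    else:
        cycles, concentration = 20, "0.125X"
    return {
        "reactions": ["D1 + R1.2", "R2"],  # two separate mPCR reactions
        "cycles": cycles,
        "primer_pool_concentration": concentration,
    }
```

For example, default_mpcr_conditions(5000) selects 15 cycles at 0.25X, and default_mpcr_conditions(8) selects 20 cycles at 0.125X.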
Bioinformatic pipeline development and benchmarking
We developed a Nextflow-based58 bioinformatic pipeline to filter, demultiplex, and infer alleles from fastq files (Supplementary Text). Briefly, the pipeline uses cutadapt59 and DADA260 to demultiplex reads on a per-amplicon basis and infer alleles, respectively. The pipeline further processes DADA2 outputs to mask low-complexity regions, generate allele read count tables, and extract alleles in SNPs of interest. We developed custom code in Python and R to filter out low-abundance alleles and calculate summary statistics from the data. The current pipeline version, with more information on implementation and usage, can be found at www.github.com/EPPIcenter/mad4hatter.
We processed the data presented in this paper with release 0.1.8 of the pipeline.
We evaluated pipeline performance by estimating sensitivity (ability to identify expected alleles) and precision (ability to identify only expected alleles) from monoclonal and mixed laboratory controls with different proportions of strains (Supplementary Text). We tested the impact of multiple parameters and features on allele calling accuracy, including DADA2’s stringency threshold OMEGA_A and sample pooling treatment for allele recovery, masking homopolymers and tandem repeats, and post-processing filtering of low abundance alleles. Masking removed false positives with the trade-off of masking real biological variation. We obtained the highest precision and sensitivity using sample pseudo-pooling, highly stringent OMEGA_A (10^−120), and a moderate post-processing filtering threshold (minor alleles of > 0.75%). These results indicate that bioinformatic processing of MAD4HatTeR data can be optimized to retrieve accurate sample composition with a detection limit of approximately 0.75% WSAF.
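The benchmarking metrics can be made concrete with a minimal Python sketch, assuming per-locus allele read counts as input. It illustrates the definitions of WSAF, the 0.75% post-processing filter, sensitivity, and precision; it is not the pipeline's own code, and the function names and data layout are hypothetical.

```python
def wsaf(read_counts):
    """Within-sample allele frequencies from allele -> read count at one locus."""
    total = sum(read_counts.values())
    if total == 0:
        return {}
    return {allele: n / total for allele, n in read_counts.items()}


def call_alleles(sample_counts, min_wsaf=0.0075):
    """Keep (locus, allele) pairs whose WSAF exceeds min_wsaf."""
    calls = set()
    for locus, counts in sample_counts.items():
        for allele, freq in wsaf(counts).items():
            if freq > min_wsaf:
                calls.add((locus, allele))
    return calls


def sensitivity_precision(called, expected):
    """Sensitivity = TP / expected alleles; precision = TP / called alleles."""
    true_positives = len(called & expected)
    return true_positives / len(expected), true_positives / len(called)
```

Raising min_wsaf removes more spurious low-abundance alleles (higher precision) at the cost of missing true minor alleles below the threshold (lower sensitivity).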
For analyses of allelic data from mixed controls, only samples with ≥ 90% of targets with > 50 reads (183 for diversity, and 165 for drug resistance markers) were included in the analysis. For drug resistance markers, only SNPs with variation between controls were included (20/91 codons from 12/22 targets). Within a sample, targets with fewer than 100 reads were excluded because alleles with a minor WSAF of 1% are very likely to be missed. The large majority of controls (122/183 and 162/165 for diversity and drug resistance markers, respectively) had very good coverage (at most 2 missing loci).
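The inclusion criteria above amount to two coverage filters, sketched here in Python with hypothetical function names. The last helper gives a rough rationale for the 100-read cutoff under the simplifying assumption of binomial read sampling: at a depth of exactly 100 reads, an allele at 1% WSAF would yield zero reads about 37% of the time (0.99^100 ≈ 0.366).

```python
def sample_passes(target_reads, min_reads=50, min_fraction=0.90):
    """True if at least min_fraction of targets have more than min_reads reads."""
    covered = sum(n > min_reads for n in target_reads.values())
    return covered / len(target_reads) >= min_fraction


def usable_targets(target_reads, min_reads=100):
    """Targets kept for WSAF analysis (at least min_reads reads)."""
    return {target for target, n in target_reads.items() if n >= min_reads}


def prob_allele_unsampled(depth, allele_wsaf):
    """Probability that an allele contributes zero reads (binomial assumption)."""
    return (1.0 - allele_wsaf) ** depth
```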
Species-specific ldh targets in the panel were used to identify non-falciparum species. Targets with fewer than 5 reads were filtered out. P. ovale ldh target sequences were extracted from the P. ovale curtisi (PocGH01, GCA_900090035.2) or P. ovale wallikeri (PowCR01, GCA_900090025.2) genomes using target primer sequences. Observed sequences were then aligned to these reference sequences using BLAST. Heterozygosity was estimated using MOIRE47 version 3.2.0.
Deletions and duplications
We used the following laboratory strains to benchmark deletion and duplication detection using MAD4HatTeR data: hrp2 deletions in Dd2 and D10, mdr1 duplications in Dd2 and FCR3, hrp3 deletion in HB3, and hrp3 duplication in FCR344. We also used a set of field samples from Ethiopia previously shown to have deletions in and around hrp2 and hrp3 at multiple genomic breakpoints3. For sensitivity analysis using field samples, we estimated COI using MOIRE47 and excluded polyclonal samples due to the uncertainty in their true genotypes. Two field samples were excluded from the analysis due to discordance in breakpoint classification, possibly due to sample mislabeling and sequencing depth, respectively.
We applied a generalized additive model (Supplementary Text) to account for target length amplification bias and differences in coverage across primer pools, likely due to pipetting error. We fit the model on controls known not to have deletions or duplications to obtain correction factors for targets of interest within sample batches. We then estimated read depth fold changes from data for each gene of interest (hrp2, hrp3 and mdr1). We did not have sufficient data to validate duplications in plasmepsin 2 and 3.
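The idea of estimating read depth fold changes against known-normal controls can be illustrated with a deliberately simplified Python sketch. It is not the generalized additive model described in the Supplementary Text: it does not model amplicon length or primer pool effects, and instead normalizes each gene target by the summed depth of all other targets and compares with the median of controls from the same batch. All names and the input layout are hypothetical.

```python
from statistics import median


def normalized_depth(target_reads, gene_targets):
    """Depth of each gene target relative to the summed depth of all other targets."""
    reference_total = sum(
        n for target, n in target_reads.items() if target not in gene_targets
    )
    return {target: target_reads[target] / reference_total for target in gene_targets}


def gene_fold_change(sample_reads, control_reads_list, gene_targets):
    """Median fold change across a gene's targets relative to controls.

    control_reads_list: read count dicts from controls known not to carry
    deletions or duplications, processed in the same batch as the sample.
    """
    sample_depth = normalized_depth(sample_reads, gene_targets)
    control_depths = [normalized_depth(c, gene_targets) for c in control_reads_list]
    ratios = []
    for target in gene_targets:
        baseline = median(depths[target] for depths in control_depths)
        ratios.append(sample_depth[target] / baseline)
    return median(ratios)
```

Under this simplification, a monoclonal sample with a single-copy duplication would give a fold change near 2 and a deletion a fold change near 0; mixtures fall in between according to strain proportions.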
For a subset of laboratory controls, copy numbers were determined by qPCR using previously described methods for mdr161, hrp2, and hrp362.
Results
MAD4HatTeR is a multi-purpose tool that exploits P. falciparum genetic diversity
We designed primers to amplify 276 targets (Fig. 1, Supplementary Tables 1–4) and separated them into two modules: (1) Diversity module, a primer pool (D1) targeting 165 high diversity targets and the ldh gene in P. falciparum and in 4 non-falciparum Plasmodium species (P. vivax, P. malariae, P. ovale, and P. knowlesi); and (2) Resistance module, comprised of two complementary and incompatible primer pools (R1 and R2) targeting 118 loci that genotype 15 drug resistance-associated genes (Table 1) along with CSP and potential vaccine targets (Table 2), assess for hrp2/3 deletion, and identify non-falciparum species. The protocol involves two initial multiplex PCR reactions, one with D1 and R1 primers, and another with R2 primers (Fig. 1C, Supplementary Fig. 1). After multiplexed PCR, subsequent reactions continue in a single tube.
Based on publicly available WGS data, P. falciparum targets in the diversity module, excluding ldh, had a median of 3 SNPs or indels (interquartile range [IQR] 2–5, N = 165, Supplementary Table 5). Most (140/165) targets were microhaplotypes (containing > 1 SNP or indel). Global heterozygosity was high, with 35 targets with heterozygosity > 0.75 and 135 with heterozygosity > 0.5. Within African samples, heterozygosity was > 0.75 in 40 targets, > 0.5 in 132 targets, and we observed 2 to 20 unique alleles (median of 5, across a minimum of 3617 samples) in each target. MAD4HatTeR included more high-heterozygosity targets than other published panels (Fig. 2A, Supplementary Figs. 2–3). Additionally, MAD4HatTeR targets better resolved geographical structure globally, within Africa, and even within a country63 (Fig. 2B).
In silico analysis demonstrates that MAD4HatTeR’s microhaplotypes capture high genetic diversity within African samples. We reconstructed alleles (microhaplotypes) from publicly available WGS data to estimate genetic diversity. For SpotMalaria, SNP barcodes are used instead of microhaplotypes based on intended design and current usage. We note that additional information may be present within the amplified targets if microhaplotype sequences are accurately identifiable with appropriate bioinformatic processing. As such, alternate results for microhaplotypes reconstructed for the targets that contain the SNPs in SpotMalaria are shown in Supplementary Fig. 2. A. Diversity module pool D1 includes more highly heterozygous targets than other published highly multiplexed panels. Only targets for diversity in each panel are included and heterozygosity is calculated for samples across Africa. B. We performed principal coordinate analysis on alleles on global, African or Mozambican data. The percentage of variance explained by each principal component is indicated in parentheses.
We next evaluated the power of the diversity module to detect interhost relatedness between parasites in pairs of simulated infections with COI ranging from 1 to 5. We selected one country from each of three continents with the most publicly available WGS data and used reconstructed genotypes for the analysis (Fig. 3). MAD4HatTeR identified partially related parasites between polyclonal infections across a range of COI and geographic regions, and generally performed as well as or better than the other panels evaluated. For example, in simulated Ghanaian infections, sibling parasites (IBD proportion, r = 1/2) were reliably detected with COI of 5 (82% power), half siblings (r = 1/4) in infections with COI of 3 (73% power), and less related parasites (r = 1/8) were still identifiable with COI of 2 (53% power). When using independent SNPs instead of microhaplotypes, the power to identify related parasites between infections was much lower, irrespective of the panel. Constraining the panel to the 50 targets with the highest heterozygosity (mean heterozygosity of 0.8 ± 0.05) reduced the power to infer relatedness by as much as 50%, highlighting the value of highly multiplexed microhaplotype panels for statistical power.
Power to identify relatedness of strains between infections is enhanced by highly multiplexed microhaplotypes. Simulated infections using population allele frequencies from available WGS data were used to estimate the power of testing if a pair of strains between infections is related. Countries in each of three continents with the most available WGS data were selected. Infections were simulated for a range of COI. Only one pair of strains between the infections was related with a given expected IBD proportion (r). The results were compared for reconstructed microhaplotypes and their most highly variable SNP for 3 panels (MAD4HatTeR, SpotMalaria and AMPLseq). Note that SpotMalaria bioinformatics pipeline outputs a 100 SNP barcode, and thus its actual power (dark orange) is not reflective of the potential power afforded by microhaplotypes (light orange). Additionally, the 50 most diverse microhaplotypes and their corresponding SNPs were used to evaluate the effect of down-sizing MAD4HatTeR (MAD4HatTeR50).
MAD4HatTeR allows for genotyping of a variety of sample types and parasite densities
We evaluated MAD4HatTeR’s performance using dried blood spots (DBS) containing up to 7 different cultured laboratory strains each. Sequencing depth was lower for samples amplified with the original resistance R1 primer pool R1.1 than D1 (Supplementary Fig. 4A), and primer dimers comprised 58–98% of the reads for R1.1 compared to only 0.1–4% for D1. We thus designed pool R1.2, a subset of targets from R1.1, by selecting the targets with priority public health applications and discarding the primers that accounted for a significant portion of primer dimers in generated data (Fig. 1, Supplementary Table 2). Libraries prepared with pools containing R1.2 instead of R1.1 showed higher depth across the range of parasitemia evaluated (Supplementary Fig. 4B). With the recommended set of primer pools (D1, R1.2, and R2), sequencing provided > 100 reads for most amplicons from DBS with > 10 parasites/µL, with depth of coverage increasing with higher parasite densities (Fig. 4A). Samples with < 10 parasites/µL still yielded data albeit less reliably. Approximately 100,000 total unfiltered reads (the output of sample demultiplexing from a sequencing run) were sufficient to get good coverage across targets; on average, 95% of targets had at least 100 reads, and 98% had at least 10 reads (Supplementary Fig. 4C, D). While results indicate that the protocol provides consistently robust results, different experimental parameters may be optimal for different combinations of primer pools and sample concentration.
MAD4HatTeR produces reproducible and sensitive genetic data from a variety of samples. A. Mean read counts for each target in DBS controls (N in parentheses in x-axis labels for each parasitemia). B. Proportion of targets with > 10 reads in DBS controls with 1 and 10 parasites/µL and 9 midgut samples (median parasite density equivalent to 0.9 parasites/µL in a DBS). 10 targets that generally do not amplify well (> 275 bp) were excluded. C–D. Allele recovery by within-sample allele frequency (WSAF) in the diversity module for 161 loci across 183 samples (C), and for biallelic SNPs in drug resistance markers across 20 codons in 165 samples (D). E. Observed WSAF in laboratory mixed controls of known expected WSAF. F. WSAF observed in libraries prepared and sequenced in different laboratories from the same DBS mixed control. Participating laboratories are the EPPIcenter at the University of California San Francisco (UCSF); Infectious Diseases Research Collaboration (IDRC), Uganda; Centro de Investigação em Saúde de Manhiça (CISM), Mozambique; National Institutes for Communicable Diseases (NICD), South Africa; and Barcelona Institute for Global Health (ISGlobal), Spain. G. Observed heterozygosity in field samples from Mozambique22 and the respective expected heterozygosity for each target obtained from available WGS data (which does not include the MAD4HatTeR-sequenced field samples). False positives are excluded from C–G, as are targets with < 100 reads, except in E.
Depth of coverage per amplicon was highly correlated within technical replicates (Supplementary Fig. 5A), with most deviations observed between primer pools. Importantly, coverage was also reproducible when the same samples were tested across five laboratories on 3 continents, with minor quantitative but negligible qualitative differences in coverage (Supplementary Fig. 5B). Amplicon coverage was well balanced within a given sample, with differences in depth negatively associated with amplicon length (Supplementary Fig. 6). Nine of the 15 worst-performing amplicons were particularly long (> 297 bp, Supplementary Table 6). The other worst-performing amplicons covered drug resistance markers in mdr1 and crt (neither covering mdr1 N86Y or crt K76T), 2 high heterozygosity targets, and a target within hrp2. These results indicate that robust coverage of the vast majority of targets can be consistently obtained from different laboratories.
Given the high sensitivity of the method, we evaluated the ability of MAD4HatTeR to generate data from sample types where it is traditionally challenging to obtain high quality parasite sequence data. We amplified DNA extracted from nine infected mosquito midguts with a median P. falciparum DNA concentration equivalent to 0.9 parasites/µL from a DBS. On average, 58% of amplicons had ≥ 100 reads, 84% had ≥ 10 reads, and only one sample did not amplify (Fig. 4B). These results are comparable to libraries from DBS controls with 1–10 parasites/µL from the same sequencing run, where 45–77% of amplicons had ≥ 100 reads. WSAF indicated that some of the mosquito midguts contained several genetically distinct P. falciparum clones. These data show the potential for applying MAD4HatTeR to study a variety of sample types containing P. falciparum.
MAD4HatTeR reproducibly detects genetic diversity, including for minority alleles in low density, polyclonal samples
We used DBS controls containing 2 to 7 laboratory P. falciparum strains with minor WSAF ranging from 1 to 50% to evaluate sensitivity of detection and accuracy of WSAF (Supplementary Tables 7–8). We optimized and benchmarked the bioinformatic pipeline to maximize sensitivity and precision using the diversity pool D1, which included masking regions of low complexity (tandem repeats and homopolymers) to avoid capturing PCR and sequencing errors in allele calls (Supplementary Text, Supplementary Fig. 7). Sensitivity to detect minority alleles, given that the locus amplified, was very high, with alleles present at ≥ 2% reliably detected in samples with > 1,000 parasites/µL and at ≥ 5% in samples with > 10 parasites/µL (Fig. 4C). For very low parasitemia samples (< 10 parasites/µL), sensitivity was still 82% for alleles expected at 10% or higher. Similar results were obtained for drug resistance markers targeted by pools R1.2 and R2 (Fig. 4D). Overall precision (reflecting the absence of spurious alleles) was also high and could be increased by using a filtering threshold for minimum WSAF. Each sample had a median of 3 false positive alleles (mean = 4.4, N = 161 targets) above 0.75% WSAF, a median of 1 (mean = 2.5) false positives over 2%, and a median of 0 (mean = 0.7) over 5% (Supplementary Fig. 8). Expected and observed WSAF were strongly correlated in the diversity module targets at all parasite densities, and the correlation was stronger at higher parasite densities (R² = 0.99 for > 1,000 parasites/µL; Fig. 4E).
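As a worked illustration of how expected WSAF follows from the composition of a mixed control, the Python sketch below sums the proportions of the strains that carry each allele at a locus and compares expected with observed values. The allele labels and proportions used in the usage note are hypothetical, and treating R² as the squared Pearson correlation is an assumption of this sketch.

```python
def expected_wsaf(strain_proportions, strain_alleles):
    """Expected WSAF per allele at one locus in a mixed control.

    strain_proportions: strain -> fraction of parasites (fractions sum to 1).
    strain_alleles: strain -> allele carried at this locus.
    """
    expected = {}
    for strain, fraction in strain_proportions.items():
        allele = strain_alleles[strain]
        expected[allele] = expected.get(allele, 0.0) + fraction
    return expected


def r_squared(expected, observed):
    """Squared Pearson correlation between paired expected and observed values."""
    n = len(expected)
    mean_x, mean_y = sum(expected) / n, sum(observed) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(expected, observed))
    sxx = sum((x - mean_x) ** 2 for x in expected)
    syy = sum((y - mean_y) ** 2 for y in observed)
    return sxy * sxy / (sxx * syy)
```

For instance, in a hypothetical mixture of 90% 3D7, 8% Dd2 and 2% HB3 where Dd2 and HB3 share an allele that 3D7 lacks, that allele has an expected WSAF of 10%.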
Reproducibility is an important feature in generating useful data, particularly given the differences in equipment and technique that often exist between laboratories. To evaluate this potential source of variation, we generated data for the same mixed-strain controls in five different laboratories on three continents. Reassuringly, the alleles obtained, along with their WSAF, were highly correlated (Fig. 4F). Missed alleles in one or more laboratories were mostly present at < 2% within a sample. Finally, we tested MAD4HatTeR’s ability to recover expected diversity in field samples. Observed genetic heterozygosity in samples from Mozambique22 was correlated with expected heterozygosity based on available WGS data (Fig. 4G, Supplementary Fig. 9). These results highlight the reliability of MAD4HatTeR as a method to generate high-quality genetic diversity data across laboratories.
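For readers unfamiliar with the metric, per-locus heterozygosity can be computed from allele frequencies as one minus the sum of squared frequencies, optionally with the standard n/(n − 1) small-sample correction. The sketch below shows that textbook estimator; whether the study applied the correction or weighted polyclonal samples differently is not stated here, so treat the details as assumptions.

```python
from collections import Counter
from typing import Hashable, Sequence


def expected_heterozygosity(alleles: Sequence[Hashable],
                            unbiased: bool = True) -> float:
    """Heterozygosity at one locus from a list of sampled alleles.

    He = 1 - sum(p_i^2), multiplied by n / (n - 1) when unbiased is True.
    """
    n = len(alleles)
    if n < 2:
        raise ValueError("need at least two sampled alleles")
    counts = Counter(alleles)
    he = 1.0 - sum((c / n) ** 2 for c in counts.values())
    return he * n / (n - 1) if unbiased else he


# Hypothetical locus with three alleles observed in four samples.
print(expected_heterozygosity(["A", "A", "B", "C"], unbiased=False))
print(expected_heterozygosity(["A", "A", "B", "C"]))
```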
MAD4HatTeR provides data on copy number variations and detection of non-P. falciparum species
In addition to detecting sequence variation in P. falciparum, amplicon sequencing data can be used to detect gene deletions and duplications, as well as the presence of other Plasmodium species. We tested the ability of MAD4HatTeR to detect hrp2 and hrp3 deletions, and mdr1 and hrp3 duplications (laboratory strain FCR3 has a duplication in hrp344) in DBS controls consisting of one or two laboratory strains, and field samples with previously known genotypes. We applied a generalized additive model to normalize read depth and estimate fold change across several targets per gene, accounting for amplicon length bias and pool imbalances, after using laboratory controls to account for batch effects, e.g. running the assay in different laboratories (Fig. 5A, Supplementary Fig. 10). The resulting depth fold changes for all loci assayed correlated with the expected sample composition (Fig. 5B). At 95% specificity, sensitivity was 100% for all controls composed of > 95% strains with duplications or deletions (Fig. 5C). Sensitivity was lower for samples with lower relative abundance of strains carrying duplications or deletions, although this could be increased with a tradeoff in specificity (e.g. if used as a screening test). Fold change data correlated well with quantification by qPCR, indicating that the data obtained from MAD4HatTeR are at a minimum semi-quantitative (Fig. 5D). We could also correctly detect deletions in field samples from Ethiopia previously shown to be hrp2- or hrp3-deleted3, and correctly classify the genomic breakpoint profiles within the resolution offered by the targets included (Supplementary Fig. 11). Finally, we detected P. malariae and P. ovale in 11 samples from Uganda known to contain the corresponding species, as previously determined by microscopy or nested PCR. We could distinguish P. ovale wallikeri from P. ovale curtisi based on the alleles in the target sequence. 
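The screening logic described above can be illustrated with a deliberately simplified sketch. It is not the generalized additive model used in the study: here read depth is scaled by the reads on a set of reference targets and divided by the same quantity in a control without deletions or duplications, and the calling threshold is taken from known-negative samples so that a chosen specificity is met. All function names, target names and numbers are hypothetical.

```python
import math
from statistics import median
from typing import Dict, List, Sequence


def gene_fold_change(sample: Dict[str, int], control: Dict[str, int],
                     gene_targets: Sequence[str],
                     reference_targets: Sequence[str]) -> float:
    """Median over gene targets of normalized sample depth / control depth."""
    s_ref = sum(sample[t] for t in reference_targets)
    c_ref = sum(control[t] for t in reference_targets)
    if s_ref == 0 or c_ref == 0:
        raise ValueError("no reads on reference targets")
    ratios = []
    for t in gene_targets:
        if control[t] == 0:
            continue  # target uninformative in the control; skip it
        ratios.append((sample[t] / s_ref) / (control[t] / c_ref))
    if not ratios:
        raise ValueError("no informative gene targets")
    return median(ratios)


def deletion_threshold(negative_fcs: List[float],
                       specificity: float = 0.95) -> float:
    """Cutoff such that at least `specificity` of negatives are not called.

    A sample is called deleted when its fold change is below the cutoff.
    """
    ordered = sorted(negative_fcs)
    allowed_fp = math.floor(round((1 - specificity) * len(ordered), 9))
    return ordered[allowed_fp]


def sensitivity(positive_fcs: List[float], cutoff: float) -> float:
    """Fraction of known-deleted samples falling below the cutoff."""
    return sum(fc < cutoff for fc in positive_fcs) / len(positive_fcs)


# Hypothetical duplication example: the sample has twice the control's
# relative depth on both targets of the gene.
control = {"ref1": 1000, "ref2": 1000, "gene_a": 500, "gene_b": 400}
sample = {"ref1": 2000, "ref2": 2000, "gene_a": 2000, "gene_b": 1600}
print(gene_fold_change(sample, control, ["gene_a", "gene_b"], ["ref1", "ref2"]))
```

Using the median over several targets per gene reflects the point made in the text that a single target can drop out for reasons unrelated to copy number. Lowering the specificity argument moves the cutoff upward, which raises sensitivity at the cost of more false calls, as would suit a screening test.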
The assay’s sensitivity for detecting non-falciparum species was evaluated using a set of field samples from Ethiopia containing P. falciparum and P. vivax, with known parasite density for both species. Sensitivity depended on the P. falciparum to P. vivax ratio within the sample. Among samples with a P. falciparum to P. vivax ratio below 100, sensitivity was estimated at 96% for samples with more than 100 P. vivax 18S copies/µL (N = 148) and 90% for those with more than 10 18S copies/µL (N = 170) (Supplementary Fig. 12). Furthermore, the P. falciparum to P. vivax ratio estimates obtained by qPCR and MAD4HatTeR were highly correlated. Specificity was 100% for all non-falciparum species, based on P. falciparum controls (N = 368). These data highlight the potential of MAD4HatTeR to capture non-SNP genetic variation and to characterize mixed species infections.
Fig. 5. MAD4HatTeR can be used to screen for deletions and duplications. A. Technical replicates of Dd2 (a strain with hrp2 deletion and mdr1 duplication) with similar total reads were used to estimate fold changes in targets in and around hrp2, hrp3, mdr1 and plasmepsin2/3 (pm). A generalized additive model (black line) was applied to raw reads (Supplementary Figure 10) after correction by a control known not to have deletions or duplications in the genes of interest (3D7) to estimate fold changes in each of the genes. Note that there are two groups of hrp2 targets, those that are deleted in field samples (hrp2) and those also deleted in Dd2 (hrp2Dd2). Mean reads and fold changes are shown (N = 3); error bars denote standard deviation. B. Estimated fold change for hrp2, hrp3, and mdr1 loci in laboratory controls containing 1 or more strains at known proportions, or in field samples from Ethiopia3 with known hrp2 and hrp3 deletions. Sample composition is estimated as the effective number of copies present in the sample based on the relative proportion of the strain carrying a deletion or duplication. Fold changes are obtained using the targets highlighted in A. Fold changes for Dd2-specific targets are shown in Supplementary Figure 11. Linear regression and R² values were calculated with data from samples with parasitemia > 10 parasites/μL. The thresholds used to flag a sample as containing a duplication or deletion are shown as dashed black lines. C. Sensitivity in detecting hrp2 and hrp3 deletions and mdr1 duplications in controls, and field samples from Ethiopia with known hrp2 and hrp3 deletions. Effective sample composition (copies in sample) is estimated as in B. Sensitivity was calculated using a threshold to classify samples with 95% specificity. Note that the small number of samples in the 0.05–0.5 copies range may be responsible for the paradoxical lower sensitivity for higher parasitemia samples. D. Estimated fold change for each gene correlates with qPCR quantification for the same samples.
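The effective sample composition used in panels B and C can be read as a proportion-weighted sum of per-strain copy numbers. A minimal sketch, assuming strain proportions that sum to 1; the strain labels and copy numbers below are hypothetical and the function is not taken from the study's code.

```python
from typing import Dict, Tuple


def effective_copies(composition: Dict[str, Tuple[float, int]]) -> float:
    """Expected gene copies per parasite genome in a mixed sample.

    composition maps a strain label to (proportion in sample, gene copies).
    """
    total = sum(p for p, _ in composition.values())
    if abs(total - 1.0) > 1e-6:
        raise ValueError("strain proportions must sum to 1")
    return sum(p * copies for p, copies in composition.values())


# Hypothetical mixture: 10% of parasites carry a deletion (0 copies).
print(effective_copies({"intact": (0.9, 1), "deleted": (0.1, 0)}))
# Hypothetical mixture: half of parasites carry a duplication (2 copies).
print(effective_copies({"single": (0.5, 1), "duplicated": (0.5, 2)}))
```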
Discussion
In this study, we developed, characterized and deployed a robust and versatile method to generate sequence data for P. falciparum malaria genomic epidemiology, prioritizing information for public health decision-making. The modular MAD4HatTeR amplicon sequencing panel produces high-resolution data on genetic diversity, key markers for drug and diagnostic resistance, the C-terminal domain of the CSP vaccine target, and presence of other Plasmodium species. MAD4HatTeR is highly sensitive, providing data for low parasite density DBS samples and detecting minor alleles at WSAF as low as 1% with good specificity in high parasite density samples; challenging sample types such as infected mosquitos were also successfully amplified. MAD4HatTeR has successfully generated data from field samples from Mozambique and Ethiopia, with particularly good recovery rates for samples with > 10 parasites/µL (~ 90%)22,64. Deletions and duplications were reliably detected in mono- and polyclonal controls. The data generated by MAD4HatTeR are highly reproducible and have been reliably produced in multiple laboratories, including several in malaria-endemic countries. Thus, MAD4HatTeR is a valuable tool for malaria surveillance and research, offering policymakers and researchers an efficient means of generating useful data.
The 165 diversity and differentiation targets in MAD4HatTeR, of which the majority are microhaplotypes, can be used to accurately estimate within-host and population genetic diversity, and relatedness between infections. These data have promising applications: evaluating transmission patterns, e.g. to investigate outbreaks3; characterizing transmission intensity, e.g. to evaluate interventions10,13,65 or surveillance strategies22; classifying infections in low transmission areas as imported or local11,66; or classifying recurrent infections in antimalarial therapeutic efficacy studies as recrudescence or reinfections18. The high diversity captured by the current microhaplotypes could be further improved with updated WGS data to replace targets with relatively low diversity and amplification efficiency. Fully leveraging the information content of these diverse loci, which are particularly useful for evaluating polyclonal infections, requires bioinformatic pipelines able to accurately call microhaplotype alleles and downstream analysis methods able to incorporate these multi-allelic data. While some targeted sequencing methods and pipelines similarly produce microhaplotype data30,32,67,68,69, others only report individual SNPs, resulting in the loss of potentially informative data26,27 encoded in phased amplicon sequences. Many downstream analysis tools are similarly limited to evaluating data from binary SNPs70,71,72. Fortunately, methods to utilize these data are beginning to be developed, providing statistically grounded estimates of fundamental quantities such as population allele frequencies, COI47, and IBD46, and highlighting gains in accuracy and power provided by analysis of numerous highly diverse loci.
Multiple targeted sequencing tools designed with different use cases and geographies in mind are being used, raising questions about data compatibility. Comparing diversity metrics from data generated using different target sets is feasible, provided that the panels have equivalent performance characteristics and that the analysis methods appropriately account for differences such as allelic diversity47. Comparing genetic relatedness between infections evaluated with different panels, however, is limited to common loci. Over 25% of SNPs targeted by AMPLseq or SpotMalaria diversity targets were intentionally included in MAD4HatTeR. Other panels have less or no overlap27,67,69 (Supplementary Tables 9–10). Efforts to increase overlap between future versions of amplicon panels would facilitate more direct comparison of relatedness between infections genotyped by different panels.
MAD4HatTeR genotypes several key drug resistance markers as well as vaccine targets. The primer pool configuration recommended for optimized sensitivity covers markers of resistance to artemisinin, artemisinin-based combination therapy partner drugs, and other drugs used in treatment, chemoprevention, and other interventions. Additionally, it targets the C-terminal domain of CSP, present in the RTS,S and R21 malaria vaccines currently recommended for use in children living in areas with moderate to high malaria transmission73,74,75. Other drug resistance markers and vaccine targets can be genotyped in high parasite density samples using the full primer pool configuration. Nevertheless, primer design and target prioritization have necessitated some exclusions. For example, the central repeat region of CSP, also targeted by the RTS,S and R21 vaccines, is not covered. Future iterations of MAD4HatTeR should aim to include additional targets, such as evolving drug resistance markers and candidate vaccine targets.
Depth of coverage and amplification biases were reproducible across samples, with most deviations likely due to pipetting volume differences and systematic differences in laboratory equipment and reagent batches. Detection of hrp2/3 deletions and mdr1 duplications was achieved by applying a model that accounts for these factors. MAD4HatTeR detected deletions and duplications in mono- and polyclonal samples, even at low parasitemia. Additional data and analytical developments could improve MAD4HatTeR’s performance in deletion and duplication analysis. The current approach does not make use of COI estimates for inference and relies on controls known not to have duplications or deletions in the target genes within each library preparation batch. While target retrieval was generally uniform, some samples showed target drop-off, indicating the need for multiple targets to avoid falsely calling a deletion. Nonetheless, in its current form, MAD4HatTeR serves as an efficient screening tool for identifying putative duplications and deletions, which can then be validated with gold-standard methodologies.
Continuous improvement of the allele-calling bioinformatic pipeline is planned to increase accuracy and usability. Masking of error-prone regions (e.g. homopolymers and tandem repeats) is useful in reducing common PCR and sequencing errors, but it also removes biological variation. This can be optimized by tailored masking of error hotspots, rather than uniformly masking all low-complexity sequences. To improve the detection of low-abundance alleles, we currently conduct a second inference round using alleles observed within a run as priors, but this approach may also increase the risk of incorporating low-level contaminant reads. Improvements in experimental strategies to detect and prevent cross-contamination76, along with post-processing filtering, could mitigate this. Additionally, curating an evolving allele database from ongoing empiric data generation could replace the run-dependent priors, thereby improving the accuracy and consistency of allele inference.
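To make the idea of masking concrete, the sketch below replaces homopolymer runs of a minimum length with a mask character so that those positions can be ignored when alleles are compared. It is an illustration under stated assumptions, not the pipeline's implementation: the run-length cutoff, the mask character and the function name are hypothetical, input is assumed to be uppercase A/C/G/T, and tandem repeats or length variation within a run would need separate handling (for example through alignment to a reference).

```python
import re


def mask_homopolymers(seq: str, min_run: int = 5, mask_char: str = "N") -> str:
    """Replace each homopolymer run of at least min_run bases with mask_char."""
    if min_run < 2:
        raise ValueError("min_run must be at least 2")
    # One base followed by at least (min_run - 1) copies of the same base.
    pattern = re.compile(r"([ACGT])\1{%d,}" % (min_run - 1))
    return pattern.sub(lambda m: mask_char * len(m.group(0)), seq)


# A run of six T bases is masked; a run of four is left untouched.
print(mask_homopolymers("ACGTTTTTTAGC"))
print(mask_homopolymers("ACGTTTTAGC"))
```

Uniform masking of this kind also hides any true variation inside the masked runs, which is the cost the paragraph above proposes to reduce by masking only observed error hotspots.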
Integrating genomics into routine surveillance and developing genomic capacity in research and public health institutions in malaria-endemic countries are facilitated by efficient, cost-effective, reliable and accessible tools. MAD4HatTeR is based on a commercially available method for multiplexed amplicon sequencing77. As such, while primer sequences are publicly available (Supplementary Table 2), reagents are proprietary. However, procuring bundled, quality-controlled reagents to generate libraries is straightforward, including for laboratories in malaria-endemic settings. Procurement costs for laboratory supplies often vary significantly, making direct comparisons with other methods challenging, but we have found the method to be cost-effective compared with other methods. At the time of writing, the list price for all library preparation reagents, excluding plastics, consumables used for other steps (e.g. DNA extraction), sequencing costs, taxes, or handling, was $12–25 per reaction, depending on order volume. Sequencing costs can vary considerably based on the scale of sequencer used. For optimal throughput, we recommend multiplexing up to 96 samples using a MiSeq v2 kit to achieve results comparable to those shown here; much greater efficiency can be obtained with higher throughput sequencers.
This study includes data from five laboratories, three of which are located in sub-Saharan Africa. Beyond this study, MAD4HatTeR is also being used by four other African laboratories for applications ranging from estimating the prevalence of resistance-mediating mutations to characterizing transmission networks. Expertise and computational infrastructure for advanced bioinformatics and data analysis remain a challenge, with fewer users demonstrating autonomy in these areas compared to wet lab procedures. The robustness of the method, along with detailed training activities and materials (available online78), has facilitated easier implementation. Future developments could also expand accessibility, including adaptations for other sequencing platforms and panels targeting a smaller set of key loci for public health decision-making.
In summary, MAD4HatTeR is a powerful and fit-for-purpose addition to the malaria genomic epidemiology toolbox, well-suited for a wide range of surveillance and research applications.
Data availability
All data are available in the Sequence Read Archive, accession code PRJNA1180199. Code is available on GitHub (https://github.com/EPPIcenter/mad4hatter and https://github.com/andres-ad/madh_utilities).
Abbreviations
- MAD4HatTeR: Multiplex Amplicons for Drug, Diagnostic, Diversity, and Differentiation Haplotypes using Targeted Resequencing
- SNP: Single nucleotide polymorphism
- WGS: Whole-genome sequencing
- COI: Complexity of infection
- IBD: Identity-by-descent
- DBS: Dried blood spot
- WSAF: Within-sample allele frequency
References
Dalmat, R., Naughton, B., Kwan-Gett, T. S., Slyker, J. & Stuckey, E. M. Use cases for genetic epidemiology in malaria elimination. Malar. J. 18, 163 (2019).
Hamilton, W. L. et al. Evolution and expansion of multidrug-resistant malaria in Southeast Asia: a genomic epidemiology study. Lancet Infect. Dis. 19, 943–951 (2019).
Feleke, S. M. et al. Plasmodium falciparum is evolving to escape malaria rapid diagnostic tests in Ethiopia. Nat. Microbiol. 6, 1289–1299 (2021).
Ndwiga, L. et al. A review of the frequencies of Plasmodium falciparum Kelch 13 Artemisinin resistance mutations in Africa. Int. J. Parasitol. Drugs Drug Resist. 16, 155–161 (2021).
Rosenthal, P. J. et al. Cooperation in countering Artemisinin resistance in Africa: learning from COVID-19. Am. J. Trop. Med. Hyg. 106, 1568–1570 (2022).
Neafsey, D. E. et al. Genetic diversity and protective efficacy of the RTS,S/AS01 malaria vaccine. N Engl. J. Med. 373, 2025–2037 (2015).
Wesolowski, A. et al. Mapping malaria by combining parasite genomic and epidemiologic data. BMC Med. 16, 190 (2018).
Tessema, S. et al. Using parasite genetic and human mobility data to infer local and cross-border malaria connectivity in Southern Africa. eLife 8, e43510 (2019).
Tessema, S. K. et al. Applying next-generation sequencing to track falciparum malaria in sub-Saharan Africa. Malar. J. 18, 268 (2019).
Watson, O. J. et al. Evaluating the performance of malaria genetics for inferring changes in transmission intensity using transmission modeling. Mol. Biol. Evol. 38, 274–289 (2021).
Daniels, R. F. et al. Genetic evidence for imported malaria and local transmission in Richard Toll, Senegal. Malar. J. 19, 276 (2020).
Mensah, B. A., Akyea-Bobi, N. E. & Ghansah, A. Genomic approaches for monitoring transmission dynamics of malaria: A case for malaria molecular surveillance in Sub–Saharan Africa. Front. Epidemiol. 2, 939291 (2022).
Schaffner, S. F. et al. Malaria surveillance reveals parasite relatedness, signatures of selection, and correlates of transmission across Senegal. Nat. Commun. 14, 7268 (2023).
Fola, A. A. et al. Temporal and Spatial analysis of Plasmodium falciparum genomics reveals patterns of parasite connectivity in a low-transmission district in Southern Province, Zambia. Malar. J. 22, 208 (2023).
Yeka, A. et al. Comparative efficacy of Artemether-Lumefantrine and Dihydroartemisinin-Piperaquine for the treatment of uncomplicated malaria in Ugandan children. J. Infect. Dis. 219, 1112–1120 (2019).
Snounou, G. & Beck, H. P. The use of PCR genotyping in the assessment of recrudescence or reinfection after antimalarial drug treatment. Parasitol. Today. 14, 462–467 (1998).
Uwimana, A. et al. Association of Plasmodium falciparum kelch13 R561H genotypes with delayed parasite clearance in Rwanda: an open-label, single-arm, multicentre, therapeutic efficacy study. Lancet Infect. Dis. 21, 1120–1128 (2021).
Schnoz, A. et al. Genotyping methods to distinguish Plasmodium falciparum recrudescence from new infection for the assessment of antimalarial drug efficacy: an observational, single-centre, comparison study. Lancet Microbe 5, 100914 (2024).
Lover, A. A., Baird, J. K., Gosling, R. & Price, R. N. Malaria elimination: time to target all species. Am. J. Trop. Med. Hyg. 99, 17–23 (2018).
Mwesigwa, A. et al. Plasmodium falciparum genetic diversity and multiplicity of infection based on msp-1, msp-2, glurp and microsatellite genetic markers in sub-Saharan Africa: a systematic review and meta-analysis. Malar. J. 23, 97 (2024).
Briggs, J. et al. Within-household clustering of genetically related Plasmodium falciparum infections in a moderate transmission area of Uganda. Malar. J. 20, 68 (2021).
Brokhattingen, N. et al. Genomic malaria surveillance of antenatal care users detects reduced transmission following elimination interventions in Mozambique. Nat. Commun. 15, 2402 (2024).
Viriyakosol, S. et al. Genotyping of Plasmodium falciparum isolates by the polymerase chain reaction and potential uses in epidemiological studies. Bull. World Health Organ. 73, 85–95 (1995).
Anderson, T. J. C., Su, X. Z., Bockarie, M., Lagog, M. & Day, K. P. Twelve microsatellite markers for characterization of Plasmodium falciparum from finger-prick blood samples. Parasitology 119, 113–125 (1999).
Anderson, T. J. C. et al. Microsatellite markers reveal a spectrum of population structures in the malaria parasite Plasmodium falciparum. Mol. Biol. Evol. 17, 1467–1482 (2000).
Jacob, C. G. et al. Genetic surveillance in the Greater Mekong subregion and South Asia to support malaria control and elimination. eLife 10, e62997 (2021).
Kattenberg, J. H. et al. Molecular surveillance of malaria using the PF ampliseq custom assay for Plasmodium falciparum parasites from dried blood spot DNA isolates from Peru. Bio-Protoc 13, e4621 (2023).
Taylor, A. R., Jacob, P. E., Neafsey, D. E. & Buckee, C. O. Estimating relatedness between malaria parasites. Genetics 212, 1337–1351 (2019).
Tessema, S. K. et al. Sensitive, highly multiplexed sequencing of microhaplotypes from the Plasmodium falciparum heterozygome. J. Infect. Dis. 225, 1227–1237 (2022).
LaVerriere, E. et al. Design and implementation of multiplexed amplicon sequencing panels to serve genomic epidemiology of infectious disease: A malaria case study. Mol. Ecol. Resour. 22, 2285–2303 (2022).
de Cesare, M. et al. Flexible and cost-effective genomic surveillance of P. falciparum malaria with targeted nanopore sequencing. Nat. Commun. 15, 1413 (2024).
Holzschuh, A. et al. Using a mobile nanopore sequencing lab for end-to-end genomic surveillance of Plasmodium falciparum: A feasibility study. PLOS Glob Public. Health. 4, e0002743 (2024).
Girgis, S. T. et al. Drug resistance and vaccine target surveillance of Plasmodium falciparum using nanopore sequencing in Ghana. Nat. Microbiol. 8, 2365–2377 (2023).
Melnikov, A. et al. Hybrid selection for sequencing pathogen genomes from clinical samples. Genome Biol. 12, R73 (2011).
Villena, F. E., Lizewski, S. E., Joya, C. A. & Valdivia, H. O. Population genomics and evidence of clonal replacement of Plasmodium falciparum in the Peruvian Amazon. Sci. Rep. 11, 21212 (2021).
Mathieu, L. C. et al. Local emergence in Amazonia of Plasmodium falciparum k13 C580Y mutants associated with in vitro Artemisinin resistance. eLife 9, e51015 (2020).
Cerqueira, G. C. et al. Longitudinal genomic surveillance of Plasmodium falciparum malaria parasites reveals complex genomic architecture of emerging Artemisinin resistance. Genome Biol. 18, 78 (2017).
Parobek, C. M. et al. Partner-Drug resistance and population substructuring of Artemisinin-Resistant Plasmodium falciparum in Cambodia. Genome Biol. Evol. 9, 1673–1686 (2017).
Pelleau, S. et al. Adaptive evolution of malaria parasites in French Guiana: reversal of chloroquine resistance by acquisition of a mutation in Pfcrt. Proc. Natl. Acad. Sci. 112, 11672–11677 (2015).
Dara, A. et al. New var reconstruction algorithm exposes high var sequence diversity in a single geographic location in Mali. Genome Med. 9, 30 (2017).
Tvedte, E. S. et al. Evaluation of a high-throughput, cost-effective illumina library Preparation kit. Sci. Rep. 11, 15925 (2021).
Ahouidi, A. & Ali, M. An open dataset of Plasmodium falciparum genome variation in 7,000 worldwide samples. Wellcome Open. Res. 6, 42 (2021).
Hathaway, N. A suite of computational tools to interrogate sequence data with local haplotype analysis within complex Plasmodium infections and other microbial mixtures. (2018). https://doi.org/10.13028/M2039K
Hathaway, N. J. et al. Interchromosomal segmental duplication drives translocation and loss of P. falciparum histidine-rich protein 3. eLife 13, (2024).
MalariaGEN et al. Pf7: an open dataset of Plasmodium falciparum genome variation in 20,000 worldwide samples. Wellcome Open. Res. 8, 22 (2023).
Gerlovina, I., Gerlovin, B., Rodríguez-Barraquer, I. & Greenhouse, B. Dcifer: an IBD-based method to calculate genetic distance between polyclonal infections. Genetics 222, iyac126 (2022).
Murphy, M. & Greenhouse, B. MOIRE: a software package for the Estimation of allele frequencies and effective multiplicity of infection from polyallelic data. Bioinformatics 40, btae619 (2024).
Esayas, E. et al. Impact of nighttime human behavior on exposure to malaria vectors and effectiveness of using long-lasting insecticidal Nets in the Ethiopian lowlands and highlands. Parasit. Vectors. 17, 520 (2024).
Asua, V. et al. Plasmodium Species Infecting Children Presenting with Malaria in Uganda. (2017). https://doi.org/10.4269/ajtmh.17-0345
Rek, J. et al. Asymptomatic School-Aged children are important drivers of malaria transmission in a high endemicity setting in Uganda. J. Infect. Dis. 226, 708–713 (2022).
Andolina, C. et al. Sources of persistent malaria transmission in a setting with effective malaria control in Eastern Uganda: a longitudinal, observational cohort study. Lancet Infect. Dis. 21, 1568–1578 (2021).
Teyssier, N. B. et al. Optimization of whole-genome sequencing of Plasmodium falciparum from low-density dried blood spot samples. Malar. J. 20, 116 (2021).
Hugo, L. E. et al. Rapid low-resource detection of Plasmodium falciparum in infected Anopheles mosquitoes. Front. Trop. Dis. 5, 1287025 (2024).
Hofmann, N. et al. Ultra-Sensitive detection of Plasmodium falciparum by amplification of Multi-Copy subtelomeric targets. PLOS Med. 12, e1001788 (2015).
Mayor, A. et al. Sub-microscopic infections and long-term recrudescence of Plasmodium falciparum in Mozambican pregnant women. Malar. J. 8, 9 (2009).
Molla, E. et al. Seasonal dynamics of symptomatic and asymptomatic Plasmodium falciparum and Plasmodium vivax infections in coendemic Low-Transmission settings, South Ethiopia. (2024). https://doi.org/10.4269/ajtmh.24-0021
Paragon Genomics Product Documents. Paragon Genomics https://www.paragongenomics.com/customer-support/product_documents/
Di Tommaso, P. et al. Nextflow enables reproducible computational workflows. Nat. Biotechnol. 35, 316–319 (2017).
Martin, M. Cutadapt removes adapter sequences from high-throughput sequencing reads. EMBnet J. 17, 10–12 (2011).
Callahan, B. J. et al. DADA2: High-resolution sample inference from illumina amplicon data. Nat. Methods. 13, 581–583 (2016).
Gupta, H. et al. Drug-Resistant polymorphisms and copy numbers in Plasmodium falciparum, Mozambique, 2015. Emerg. Infect. Dis. 24, 40–48 (2018).
Grignard, L. et al. A novel multiplex qPCR assay for detection of Plasmodium falciparum with histidine-rich protein 2 and 3 (pfhrp2 and pfhrp3) deletions in polyclonal infections. EBioMedicine 55, 102757 (2020).
da Silva, C. et al. Targeted and whole-genome sequencing reveal a north-south divide in P. falciparum drug resistance markers and genetic structure in Mozambique. Commun. Biol. 6, 1–11 (2023).
Emiru, T. et al. Evidence for a role of Anopheles stephensi in the spread of drug- and diagnosis-resistant malaria in Africa. Nat. Med. 29, 3203–3211 (2023).
Daniels, R. F. et al. Modeling malaria genomics reveals transmission decline and rebound in Senegal. Proc. Natl. Acad. Sci. 112, 7067–7072 (2015).
Chang, H. H. et al. Mapping imported malaria in Bangladesh using parasite genetic and human mobility data. eLife 8, e43481 (2019).
Holzschuh, A. et al. Multiplexed ddPCR-amplicon sequencing reveals isolated Plasmodium falciparum populations amenable to local elimination in Zanzibar, Tanzania. Nat. Commun. 14, 3699 (2023).
Hathaway, N. J., Parobek, C. M., Juliano, J. J. & Bailey, J. A. SeekDeep: single-base resolution de Novo clustering for amplicon deep sequencing. Nucleic Acids Res. 46, e21 (2018).
Lerch, A. et al. Development of amplicon deep sequencing markers and data analysis pipeline for genotyping multi-clonal malaria infections. BMC Genom. 18, 864 (2017).
Schaffner, S. F., Taylor, A. R., Wong, W., Wirth, D. F. & Neafsey, D. E. HmmIBD: software to infer pairwise identity by descent between haploid genotypes. Malar. J. 17, 196 (2018).
Henden, L., Lee, S., Mueller, I., Barry, A. & Bahlo, M. Identity-by-descent analyses for measuring population dynamics and selection in recombining pathogens. PLOS Genet. 14, e1007279 (2018).
Chang, H. H. et al. THE REAL McCOIL: A method for the concurrent Estimation of the complexity of infection and SNP allele frequency for malaria parasites. PLOS Comput. Biol. 13, e1005348 (2017).
Collins, K. A., Snaith, R., Cottingham, M. G., Gilbert, S. C. & Hill, A. V. S. Enhancing protective immunity to malaria with a highly Immunogenic virus-like particle vaccine. Sci. Rep. 7, 46621 (2017).
Laurens, M. B. RTS,S/AS01 vaccine (Mosquirix™): an overview. Hum. Vaccines Immunother. 16, 480–489 (2020).
World Health Organization. Malaria vaccine: WHO position paper – May 2024. Wkly. Epidemiol. Rec. 19, 225–248 (2024).
Lagerborg, K. A. et al. Synthetic DNA spike-ins (SDSIs) enable sample tracking and detection of inter-sample contamination in SARS-CoV-2 sequencing workflows. Nat. Microbiol. 7, 108–119 (2022).
CleanPlex amplicon sequencing for targeted DNA and RNA-Seq. Paragon Genomics https://www.paragongenomics.com/targeted-sequencing/amplicon-sequencing/cleanplex-ngs-amplicon-sequencing/
Resources | EPPIcenter. https://eppicenter.ucsf.edu/resources
Report on antimalarial drug efficacy, resistance and response: 10 years of surveillance (2010–2019). (2020) https://www.who.int/publications/i/item/9789240012813
Miotto, O. et al. Genetic architecture of artemisinin-resistant Plasmodium falciparum. Nat. Genet. 47, 226–234 (2015).
Acknowledgements
We thank Phil Rosenthal and Amy Bei for their input in panel design. We also thank members of the EPPIcenter at UCSF, as well as the Rapid Response Team and the Genomics Platform at the Chan Zuckerberg Biohub for valuable discussions.
Funding
This work was supported by several grants from the Bill & Melinda Gates Foundation (INV-019032, OPP1132226, INV-037316, INV-024346, INV-031512, INV-003212). This research is also part of ISGlobal’s Program on the Molecular Mechanisms of Malaria, which is partially supported by the Fundación Ramón Areces. We acknowledge support from the grant CEX2023-0001290-S funded by MCIN/AEI/https://doi.org/10.13039/501100011033, from the Generalitat de Catalunya through the CERCA Program, from the Departament d’Universitats i Recerca de la Generalitat de Catalunya (AGAUR; grant 2017 SGR 664) and from the Ministerio de Ciencia e Innovación (PID2020-118328RB-I00/AEI/https://doi.org/10.13039/501100011033). CISM is supported by the Government of Mozambique and the Spanish Agency for International Development (AECID). The parent study from which the 2017–2018 Ethiopia samples were derived was funded by the Global Fund to Fight AIDS, Tuberculosis, and Malaria through the Ministry of Health - Ethiopia (EPHI5405) and by the Bill & Melinda Gates Foundation through the World Health Organization (OPP1209843). A.A.-D. was supported by the Chan Zuckerberg Biohub Collaborative Postdoctoral fellowship. B.G. was supported by NIH-NIAID K24AI144048. J.B. was supported by NIH-NIAID K23AI166009. J.L.S. was supported by NIH-NIAID 5K01AI153555. J.B.P. was supported by NIH-NIAID R01 AI77791. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.
Author information
Contributions
Designed the study: A.A.-D., N.H., A.B., J.L.S., E.G., B.G. Developed and benchmarked bioinformatic pipeline: A.A.-D., K.M., B.P., M.G.U., D.D. Managed samples and data: A.A.-D., E.N.V., B.P., N.H., S.B., M.G.U., H.G., S.K., I.W., S.M.F., J.B.P., W.L., E.E. Generated data: A.A.-D., E.N.V., S.B., P.C., T.K., F.D.S., B.N., H.G., C.G.F., C.D.S., S.M.F., W.L., E.E. Analyzed data: A.A.-D., E.N.V., K.M., B.P., N.H., I.G., W.L. Interpreted data: A.A.-D., E.N.V., K.M., B.P., N.H., I.G., M.G.U., M.C., J.R., S.T., I.S., E.R.-V., C.T., J.B., A.M., B.G., W.L. Drafted the manuscript: A.A.-D., E.N.V., K.M., B.P., B.G. All authors reviewed the manuscript.
Ethics declarations
Competing interests
J.B.P. reports research support from Gilead Sciences, non-financial support from Abbott Laboratories, and consulting for Zymeron Corporation, all outside the scope of the current work. All other authors report no potential conflicts of interest.
Ethical approval and consent to participate
Ethical approval for the study that generated the 26 field samples from Ethiopia3 was granted by the Ethiopian Public Health Institute (EPHI) Institutional Review Board (IRB; protocol EPHI-IRB-033-2017) and the WHO Research Ethics Review Committee (protocol ERC.0003174 001). Processing of de-identified samples and data at the University of North Carolina at Chapel Hill (UNC) was determined to constitute non-human subjects research by the UNC IRB (study 17–0155). The study was determined to be non-research by the Centers for Disease Control and Prevention (CDC) Human Subjects office (0900f3eb81bb60b9). Protocols for the study that generated the data for the 436 field samples from Mozambique22 were approved by the ethical committees of CISM and Hospital Clínic of Barcelona, and by the Mozambican Ministry of Health National Bioethics Committee. Ethical approval for the studies that included the collection of blood samples used in mosquito feeding assays was received from the Uganda Council of Science and Technology, Makerere University School of Medicine, the University of California, and the London School of Hygiene & Tropical Medicine. Ethical approval for the study that collected P. falciparum and P. vivax samples in Northern Ethiopia in 2022–2023 was obtained from the National Research Ethical Review Committee, Addis Ababa, Ethiopia (reference number: 02/256/630/14), the AHRI/ALERT Ethics Review Committee (protocol number: P0-08-22), the Aklilu Lemma Institute of Pathobiology Institutional Research Ethics Review Committee (reference number: ALIPB IRERC/111/2015/23), and the WCG IRB (protocol number: 1769134-1; IRB tracking number: 20214694). Participants in all these studies, or parents/guardians in the case of minors, provided written informed consent. All research was performed in accordance with relevant guidelines and regulations.
Additional information
Publisher’s note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.
About this article
Cite this article
Aranda-Díaz, A., Neubauer Vickers, E., Murie, K. et al. Sensitive and modular amplicon sequencing of Plasmodium falciparum diversity and resistance for research and public health. Sci Rep 15, 10737 (2025). https://doi.org/10.1038/s41598-025-94716-5
DOI: https://doi.org/10.1038/s41598-025-94716-5