Introduction

Mosquitoes of the genus Aedes, particularly Aedes (Ae.) aegypti, are responsible for the transmission of many arboviral diseases, including dengue, Zika, yellow fever, West Nile and Chikungunya, resulting in millions of infections globally per year with limited treatment and vaccination options1. The geographical distribution of Ae. aegypti has expanded considerably in recent years, predominantly due to adaptation of this vector to urban environments, climate change and the globalization of human activities, thereby increasing the risk of resurgence and spread of arbovirus infections2,3,4. Compounding the problem is the global emergence of insecticide resistance among Ae. aegypti and other mosquito species, which is threatening to jeopardise the operational effectiveness of vector control campaigns.

Resistance to the four most common classes of insecticides used against adult mosquitoes (carbamates, organochlorines, organophosphates, and pyrethroids) has now been documented worldwide. Resistance in many mosquito species has been associated with target site mutations, metabolic detoxification, cuticular alterations and behavioural avoidance5,6 with a suite of alternative resistance mechanisms being revealed7,8,9,10. Target site resistance is related to mutations in genes that code for insecticide target molecules, such as the voltage-gated sodium channel (vgsc also known as knockdown resistance; kdr), acetylcholinesterase-1 (ace-1 also known as AChE1) and γ-aminobutyric acid (GABA) receptor (resistance to dieldrin; rdl). Mutations in glutathione-s-transferase epsilon two (GSTe2), which encodes an insecticide metabolising enzyme, have also been associated with resistance5,11,12,13. The vgsc is a large protein that is an integral part of the insect nervous system. DDT (dichloro-diphenyl-trichloroethane) and pyrethroid insecticides interfere with the vgsc by prolonging the pore open state leading to insect paralysis and death14. In the reference insect for this gene, Musca domestica, the most frequent kdr resistance mutations are S989 and L101415. In Ae. aegypti, the 1014 codon requires at least two mutations to change to a M. domestica amino acid known to cause resistance; thus, the substitution L1014F, seen pervasively in Anopheles mosquitoes, has not been observed in this species11. Instead, F1534C/L, V1016I/G, I1011V/M and V410L mutations have been associated with pyrethroid resistance in Ae. aegypti and confirmed experimentally6. Other amino acid substitutions reported previously in Ae. aegypti include G923V, L982W, S989P, T1520I and D1763Y11,16,17,18. Many of these mutations are often found in combination and appear only on specific continents. For example, V1016G and S989P appear limited to Asia, while V1016I has only been identified in the Americas and Africa and 723T only in the Americas19.

The ace-1 gene encodes acetylcholinesterase (AchE1), which is responsible for hydrolysis of acetylcholine terminating the transmission of neural signals. Organophosphates and carbamates bind to the acetylcholinesterase active site which inhibits hydrolysis and consequently neural signal termination, leading to insect death. Unlike mammals and some insects (including Drosophila melanogaster), mosquitoes usually have two copies of the ace-1 gene. In Anopheles mosquitoes, the G119S amino acid substitution in ace-1 is generally associated with resistance (all coordinates are based on Torpedo californica)20,21. As with the vgsc, in Ae. aegypti such an amino acid change requires two mutations and has only been observed in one study in India22. Despite the lack of described mutations in ace-1, resistance to organophosphates in Aedes is widespread in the Americas and Asia, while data from Africa is limited6.

The rdl mutation is found in the γ-aminobutyric acid (GABA) receptor gene that controls neural signal inhibition through opening and closing of the transmembrane chloride channel on the cells of the mosquito nervous system. Cyclodienes (e.g., dieldrin) prevent interaction of GABA with its receptor, leading to neuron hyperexcitation and eventual insect death23,24,25,26. The most common resistance mutation in this gene is A301S/G (D. melanogaster numbering) and is observed in multiple insects including mosquitoes of the Anopheles and Aedes genera21,27. Despite a ban on the use of cyclodienes in 200128 due to their slow degradation and environmental persistence, rdl mutations have persisted for decades later in vector populations, suggesting that they impart limited fitness costs29,30.

Unlike rdl, ace-1 and vgsc, which are targets of insecticides, the homodimer glutathione S-transferase (GST) is a detoxifying enzyme. Most organisms, including Ae. aegypti, have multiple GST enzymes of which epsilon two (GSTe2) has been associated with resistance to both DDT and pyrethroids6,12,31,32. The GSTe2 gene contributes to insecticide resistance through both enzyme overexpression and point mutations. Increased expression of this gene was linked to DDT resistance in An. gambiae5,25,26,33. The L119F substitution in GSTe2 was observed to enhance resistance to both DDT and pyrethroids in An. funestus, and I114T exacerbated resistance to DDT in An. gambiae5,33,34,35. In Ae. aegypti, L111S and I150V mutations have been linked to temephos resistance in silico36.

Despite observed phenotypic resistance of Ae. aegypti to all main insecticide classes across many countries in Africa, Americas, and Asia6, the distribution of genetic variants in underlying candidate genes is less studied across Aedes populations compared to Anopheles species. Here, we examined a large (n = 729), globally diverse dataset of publicly available Ae. aegypti whole genome sequencing (WGS) data to uncover the genetic diversity present in vgsc, ace-1, rdl and GSTe2. The diversity in insecticide resistance loci was interpreted alongside current global trends in phenotypic insecticide resistance in Ae. aegypti. This data provides a catalogue of genetic variants that could be involved in insecticide resistance and supports further studies on the molecular surveillance of emerging and spreading insecticide resistance mechanisms amongst Ae. aegypti populations.

Material and methods

Aedes aegypti genomic data

We searched the NCBI SRA database for “Aedes aegypti” sample data and restricted results to WGS libraries where the number of bases contained implied at least fivefold coverage when mapped to the reference genome AaegL5 (GCF_002204515.2)32. We obtained a total of 703 WGS Ae. aegypti (non-AaegL5) libraries from 15 countries, across Africa (n = 476, 8 countries), the Americas (n = 191, 3 countries), Oceania (n = 16, 1 country) and Asia (n = 20, 1 country), and 26 colony samples of which 20 had known country of collection. Additionally, we included 7 Ae. mascarensis samples from Madagascar (n = 4) and Mauritius (n = 3) as outgroup37,38,39,40,41 (Table S1).

Insecticide resistance phenotypic data

Insecticide response data was only available for the Bora-Bora susceptible reference strain, which has been maintained in the insectary for 134 generations without any exposure to insecticides42 and the Nakon Sawan reference strain, which is resistant to deltamethrin and temephos41,43, (Table S2). Global insecticide resistance phenotype data was retrieved from the IR Mapper tool44 (sourced on 19/04/2023), which covered 73 countries of which 8 overlap with samples in this study. No data was available for 5 countries (Kenya, Madagascar, Mauritius, South Africa, and Uganda); an additional literature search in PubMed failed to retrieve additional publicly available phenotypic data for Ae. aegypti in these countries. We included the data where the phenotype was tested with World Health Organization (WHO) tube or bottle bioassay or Centers for Disease Control and Prevention (CDC) bottle bioassay. Phenotypic data based solely on PCR or RT-PCR methods were excluded. Overall, we analysed 3172 data points for 19 different insecticides across four insecticide classes (Pyrethroids, Organophosphates, Organochlorines and Carbamates) (Table S3). Data points from IR mapper were reported as susceptible, possible resistance or resistant based on mortality as per WHO and CDC guidelines.

Bioinformatic analysis

We aligned the WGS libraries using bowtie2 (v2.4.1) software (with a setting --fast-local)45. We processed the alignment files using samtools (v1.7) software and SNPs were called using the GATK HaplotypeCaller tool (v4.1.9) with default settings46,47. A minimum coverage of 5-fold was used to accept SNPs. We merged the individual VCF files into a multi-sample file using BCFtools (v1.9)48. The impact of SNPs in the multi-sample VCF was predicted using snpEff software (v5.0) with AaegL5 genome annotation (GCF_002204515.2)49. The alignment process was performed against the mRNA sequences of twenty Ae. aegypti genes (Table 1). Four were loci linked to insecticide resistance [vgsc (XM_021852340.1), rdl (XM_021840622.1), ace-1 (XM_021851332.1) and GSTe2 (XM_021846286.1)] and the remaining sixteen genes were used to establish population structure. One of these was mitochondrial cox1 (YP_009389261.1) and the remaining fifteen genes were evenly spread across all three Ae. aegypti chromosomes (Table 1). These 15 genes were determined to have unique genome-wide exon sequences (using NCBI BLASTn v2.9.0 with—word-size 28 and—evalue 0.01) which minimised potential mis-mapping of WGS reads to the Ae. aegypti genome known to contain many duplications50. Read coverage per nucleotide per gene was calculated using the samtools “depth” function and was used to identify possible gene duplications in samples48. We merged the coverage data into a single data matrix and removed all regions except gene exons, because intronic regions contained high numbers of repeats. For each sample, we divided each per base coverage value by that sample’s overall median coverage across all genes, except vgsc and GSTe2, which may have copy number variants. We applied UMAP (v0.5.1) software (with a Euclidean distance metric) on this scaled matrix to identify gene clusters based purely on the coverage51.

Table 1 The genes analysed.

Population genetics analysis

To determine population structure, we used UMAP software (with Russell-Rao distance metric) on the multi-sample VCF, followed by application of HDBSCAN (v0.8.28)51,52 to determine sample clustering (see53,54,55 for recent applications). This work was performed in python (v3.7.6), with scripts available from https://github.com/AntonS-bio/resistance-AedesAegypti. Linkage disequilibrium was calculated using vcftools on phased vcf files created with beagle (v 22Jul22.46e) software to provide a R2 value for each combination of non-synonymous mutations by sample country. Plots of these values were visualised using the gaston (v1.5.9) package in R.

Protein structure modelling

Protein structure modelling was performed using AlphaFold Multimer software with full protein databases56,57. When referring to substitutions and their effects on proteins, we have followed the established nomenclature based on reference resistance linked proteins and structures in the protein databank: ACE1 (2C4H; Tetronacre californica), GABA receptor (NP_729462.2; Drosophila melanogaster), GSTe2 (XP_319968.3; An. gambiae) and VGSC (NP_001273814.1; Musca domestica)58,59. Unless otherwise specified, all substitution coordinates are with respect to these reference sequences.

Results

Genetic variation and population structure

Across the 729 Aedes samples from 15 countries, a total of 1829 SNPs (474 non-synonymous (NS)) were detected across the CDS of four insecticide resistance associated genes (vgsc, rdl, ace-1 and GSTe2), and 9673 SNPs were identified across the CDS of 15 non-resistance associated genome-wide gene (Tables 1, 2, Table S4).

Table 2 Missense mutations identified in samples and occurring in more than 10 non-lab sample.

Using the SNPs from the CDS of 15 genes not associated with insecticide resistance, a UMAP clustering analysis revealed five distinct clusters (Fig. 1A), broadly linked to: (i) eastern Kenya and South Africa (n = 112); (ii) west, central Africa and west Kenya (n = 350); (iii) the Americas, Thailand, and other (n = 258); (d) the Bora-Bora mosquito line from French Polynesia (n = 9); (e) Ae. mascarensis from Madagascar and Mauritius (n = 7). Similar results were obtained when analysing only the 1829 SNPs in genes that are associated with resistance (Fig. 1B). These results are broadly consistent with previous reported population structure of Ae. aegypti using SNPs and microsatellite data, where African samples formed one cluster and samples from Asia, America and the Caribbean comprised another cluster60; however, more focused studies provide better understanding of population structure38,60,61,62,63. As we observed a separation of most eastern Kenyan samples (n = 121) from west Kenya (n = 37), we investigated the genotype data in these groups independently. Some eastern Kenyan samples (n = 14/121) from a human- biting colony of domestic Ae. aegypti, originally collected indoors in Rabai60,64, clustered with non-African samples (Americas and Thailand and other cluster), as previously observed. When including only non-African samples, the UMAP clustering analysis revealed modest separation of the samples from Brazil, Mexico, French Polynesia, American Samoa and Thailand. For the samples from Africa, clustering separated east Kenyan samples from the rest (Fig. S1). The same patterns were detected across both resistance and non-resistance genes (Fig. S1). Clustering using mitochondrial cox1 gene was different from the results based on chromosomal loci (Fig. 1C–F). In multiple samples, SNPs had heterozygous cox1 genotypes possibly multiploidy due to the presence of previously described copies of nuclear mitochondrial (NUMT) DNA which could confound clustering65,66.

Figure 1
figure 1

Population structure using UMAP embedding of SNPs from non-resistance linked genes (A,C,D,E), and resistance linked genes (B) and cox1 (F).

Genetic variation across insecticide resistance associated genes

Vgsc

In the vgsc gene, a total of 1075 SNPs (202 non-synonymous; NS) were identified, of which 36 NS SNPs were present in > 1 sample, including eight mutations previously linked to insecticide resistance (V410L, G923V, S989P, I1011M, V1016I/G, T1520I and F1534C) (Table 2, Table S4). We did not observe any other pyrethroid resistance associated substitutions such as L982W, detected previously in Vietnam and Cambodia, and D1763Y reported in Taiwan. However, the D1763G mutation was present in a single USA sample11,16,17,18. The most frequent mutations were F1534C (39%), S723T (23%), V410L (22%) and V1016I (22%) (Fig. 2). The most prevalent F1534C mutations occurred in nearly all samples from the Americas (186/191) and Thailand (20/20). The frequency of F1534C was lower in African samples, appearing only in Burkina Faso (n = 20/34), Ghana (n = 33/58), Nigeria (n = 1/19) and East Kenya (n = 8/107). The F1534C mutation was accompanied by V1016I, S723T and V410L substitutions in most samples from USA, Burkina Faso, and Mexico, as well as in a single Nigerian sample. In Thailand, F1534C co-occurred in many samples with V1016G, T1520I and S989P (Table 2).

Figure 2
figure 2

Allele frequency of each missense SNP across the insecticide resistance associated genes; vgsc, ace-1, rdl, and GSTe2, by country. Only SNPs with at least 10 samples with a non-reference allele are shown mutation.

Several mutations were found to be regionally specific. The V1016G mutation was found only in Asia (Thailand) while V1016I was detected in USA, Mexico, and a few countries in Africa19. The M944V substitution was unique to East Kenya (n = 42/107), L946G was almost exclusive to Brazil (n = 15/16) except for one Nigerian sample. The V1016G, T1520I (n = 10/20), S989P (n = 7/20), and S66F (n = 11/20) were also almost exclusive to Thailand, apart from a single Nigerian and a Brazilian sample (Table 2). Two conservative in-frame insertions occurred in ~ 20% of west and central African samples, which included an addition of amino acid Glycine (Gly) into a sequence of four consecutive Gly (positions 2047–2050), and an addition of Serine-Glycine (positions 2016 and 2017).

Rdl (GABA receptor)

In the rdl gene, we identified a total of 244 SNPs (64 NS), of which only 17 NS SNPs occurred in > 1 sample and the most frequent were G84A, S115T and A301S. The S115T substitution was present in almost all samples (n = 733/736) including all Ae. mascarensis (Fig. 2, Table 2). The T115 is the dominant allele in An. gambiae suggesting that the common ancestor of both An. gambiae and Ae. aegypti had the 115T allele, and a mutation in the Ae. aegypti reference strain changed T to S67.

The previously described A301S substitution, associated with resistance to organochlorines, was frequent in the USA (n = 97/160) and Thailand (n = 11/20), and infrequent in a few countries in Africa (Table 2)21,27. This substitution is located on the a-helix forming the protein pore (Fig. S2). The only other notable mutation was E84D present in 18 samples (Africa n = 13, Thailand n = 5), and located on the outward facing section of the protein but could not be robustly modelled by the AlphaFold software.

Ace-1

A total of 243 SNPs were identified in the ace-1 gene, of which 99 led to amino-acid substitutions, with 30 present in > 1 sample (Table 2). Only 6 amino-acid substitutions (G12S, H35L, D131Q, L687F, S693A, C699S) occurred in > 10 samples (Fig. 2). The most frequent mutation was C699S (n = 42/736), which was present in samples from west and central Africa (n = 29) and the Americas (n = 13). The second most frequent substitution was H35L (5.0%) observed only in west and central African samples. The third most frequent substitution was G12S (4.8%) found mostly in the Americas (n = 26/37) and Thailand (n = 7/37) (Table 2). All three substitutions are defined in Ae. aegypti coordinates because these amino acids are outside the range of the T. californica reference ACE1 (PDB: 2C4H). In fact, only 20 substitutions had a corresponding coordinate in the T. californica protein (Table 2). The only substitution in Ae. mascarensis was T55P (T. californica coordinates) present in all samples of this species. We modelled the ACE1 protein structure in AlphaFold, and in line with results of crystallographic experiments, the residues 1–131 and 660–702 were disordered, likely reflecting their role in anchoring the protein to the cellular membrane and receptor proteins68. The G119S resistance substitution commonly reported in ACE1 in other insect species was not detected in this dataset. This absence is likely because G119S would require two nucleotide substitutions in Ae. aegypti. Further, instead of two ace genes commonly found in insects, the Ae. aegypti reference genome has four ace genes including one analysed here (LOC5578456) and three others (LOC5574466, LOC5575867, LOC5570776). The mRNA encoding the cognate proteins had < 5% pair-wise coverage which rules out recent duplication as the origin of these genes. One of these loci (LOC5570776) had the 119S amino acid. We found that despite the very high prevalence of transposable elements in Ae. aegypti, this gene remains uninterrupted by them suggesting this locus might be functional32.

GSTe2

The GSTe2 gene has a variable copy number in Ae. aegypti, and the reference genome contains four copies of this gene32. The variable copy number was also evident in our analysis. Because we used short read data, we could not robustly assign each mutation to individual GSTe2 loci. A total of 267 SNPs were detected in GSTe2 genes, with 109 leading to amino-acid substitutions, of which 42 were present in >1 sample (Table 2). Seven substitutions were highly frequent: I150V (n = 670), A198E (n = 670), C115F (n = 542), L111S (n = 288), I169S (n = 172), L9I (n = 151) and C115S (n = 108) (Fig. 2). The samples from Thailand had neither synonymous nor missense mutations in GSTe2, which we confirmed by visual examination of the read alignments. The C115F substitution was present in almost all countries (except Thailand and Mauritius). The C115S substitution was most common in Africa (n = 101/353). In addition to C115F/S, we observed two other common substitutions (L111S, L9I) at the DDT binding site69. The L111S substitution (n = 288/736) appears globally distributed, and L9I was found mainly in Africa and USA, but not observed in Ae. mascarensis. The I169S mutation was common in the presence of L9I. Based on a high confidence AlphaFold protein structure model for GSTe2, the I169S mutation is not part of either glutathione or DDT binding site; however, it interacts with both F115 and M111, which are part of the glutathione binding pocket (Fig. S3).

Gene duplications

Gene variable copy numbers were identified based on excess median-scaled read coverage. For the vgsc gene, a group of 26 samples had potential duplications, with a median-scaled coverage of 1.4-fold compared to 1.0-fold for the rest of the samples. The samples in this set came from a disparate group of countries: Senegal (n = 13), American Samoa (n = 4), and USA (n = 3), Mexico (n = 2), Mauritius (n = 2), Kenya (n = 1) and Thailand (n = 1) (Table S1).

For GSTe2, two groups of samples had likely copy number events. First, a group of samples with median 4.2-fold median-scaled coverage consisting of samples from Thailand (n = 27/28) including samples from the Nakh lab strain, USA (n = 38/160), Mexico (n = 5/15), Brazil (n = 1/16) and two from the Vienna F4 colony70. A second group consisted of samples from USA (n = 15/160) and Mexico (n = 9/16) with median-scaled coverage of 9.3-fold compared to 0.9-fold for the rest of the samples (Table S1, Fig. S4). In our search of the literature, we did not identity previous reports of such high duplication rate; this finding requires further validation. However, this result also shows that majority of Ae. aegypti reference sequence have single copy of GSTe2, in contrast to the reference strain which has four32.

Linkage disequilibrium between missense mutations

We examined the geographical distribution of the non-synonymous SNPs across the four resistance genes and observed that many mutations co-occur together in certain populations (Fig. 2). For each locus, per population, we assessed the pairwise linkage disequilibrium (LD) of non-synonimous SNPs. We found twenty-seven pairwise SNPs that had, without adjusting for multiple testing, an R2 value above 0.5 (GSTe2 n = 15, vgsc n = 9, ace-1 n = 2, and rdl n = 1) (Table S5). The GSTe2 mutations L9I/I169S (Burkina Faso, Kenya, Gabon, Ghana, Uganda) and I150V/A198E (Kenya, French Polynesia, Mauritius) were detected with a R2 > 0.5 in several countries. In the vgsc gene, several SNPs that have been associated with insecticide resistance also had R2 > 0.5, particularly V410L, V1016I, V1016G and F1534C.

Geographical distribution of insecticide resistance mutations and phenotypes

The IR mapper was used to obtain phenotypic data for 8 of the 15 countries examined in this study. These phenotypes show disparity between the availability of phenotypic and genomic data, for example, Brazil and Thailand have the highest number of bioassay records while only having 16 and 20 genomic sequences available, respectively. However, in some countries there was genomic data available with limited phenotypic data, such as Uganda and Kenya. Phenotypic data available for each country from IR Mapper was mapped to the co-occurrence of nine mutations previously associated with insecticide resistance (A301S (RDL) associated with organochlorine resistance, and F1534C, T1520I, V1016I/G, I1011V/M, S989P, G923V, V410L (VGSC) all associated with pyrethroid resistance). Thailand, Burkina Faso, and the USA had the highest proportion of samples with several known insecticide resistance mutations (Fig. 3). This is supported by the Thailand phenotypic data from IR Mapper, which shows reports of resistance to all four main insecticide classes in this country (Fig. 4), particularly to organochlorines, carbamates and pyrethroids. Elevated levels of resistance have also been reported in southeast Asian regions, such as Indonesia, Malaysia, and Thailand; however, there are gaps in the genomic data from these countries71,72,73,74. For the USA there is no information on phenotype data on IR Mapper, but resistance to pyrethroids has been reported in several states75,76,77.

Figure 3
figure 3

Proportion of samples with 1 or more mutations associated with insecticide resistance in each geographical population. Insecticide resistance SNPs included are: A301S (rdl), F1534L/C, T1520I, V1016I/G, I1011V/M, S989P, G923V, V410L (vgsc). Only populations with more than 10 samples were included.

Figure 4
figure 4

Publicly available phenotype data for Ae. aegypti showing the proportion of records that report resistance, possible resistance and susceptibility. Numbers denote total number of records for the insecticide class for that country region44. Only data collected on Aedes aegypti after 2000 were included for countries that were present in the WGS data set.

In Africa, 53% of samples from Burkina Faso had more than two insecticide resistance mutations, all in the vgsc gene. Burkina Faso also had the highest reported resistance to pyrethroids when compared to the other African samples in this data set (Nigeria, Senegal, Ghana, and Gabon). Levels of resistance to pyrethroids varied between the 8 countries analysed here. The highest levels of resistance were also observed in Brazil, Mexico, and Thailand, coinciding with samples with the most mutations in the vgsc gene (excluding the USA, where limited phenotypic data is available) (Figs. 3, 4).

The data from IR mapper showed that the largest number of reports of resistance involved insecticides of the organochlorine class. Mutations associated with this resistance include SNPs in the vgsc and rdl genes. However, countries with high resistance to organochlorines, such as Senegal and Nigeria have no or very low frequency of mutations in these loci. As the genomic data presented here do not have matching phenotypic information, it is possible that these samples were from a susceptible background or that there are other mechanism of resistance causing the observed phenotype. The least resistance was reported against organophosphates, although resistance is still high in Mexico, followed by Brazil and Thailand (Table 2). These countries only have 1 mutation, G12S, in the ace gene common across all of them.

Discussion

We explored the genetic diversity present in four genes (vgsc, ace-1, rdl and GSTe2) involved in insecticide response across 729 Ae. aegypti and 7 Ae. mascarensis samples from 15 countries. We identified many known and unreported amino-acid substitutions which may be involved in insecticide resistance. This catalogue of genetic variants is a valuable resource that can be explored to investigate molecular mechanism associated with insecticide resistance together with phenotypic information and used to design diagnostics genetic markers for molecular surveillance.

The populations with greater numbers of amino acid substitutions linked to insecticide resistance were Thailand (RDL: A301S; VGSC: V410L, S989P, V1016G and F1534C) and the USA (RDL A301S; VGSC: V410L, Gly923V, I1011M and F1534C). In Africa, the substitutions most frequently observed were RDL A301S and VGSC V410L and F1534C, but many countries had none of the reported mutations. We have also observed that VGSC V410L and S723T co-occur in all but one sample. None of the Thai samples had any mutations in the GSTe2 gene, despite having adequate read coverage. In other countries, we detected two common mutations in GSTe2 (C115F/S and L111S/F) in the DDT binding site. The C115F and C115S mutations were most frequent in Kenya (n = 142, n = 20), the USA (n = 114, n = 20) and Senegal (n = 82, n = 35). Previous work involving DDT docking with An. gambiae GSTe2 has suggested that one of the DDT’s planar p-chlorophenyl rings can fit into a sub-pocket, but the other ring faces spatial hindrance from M111 and F115 in the side chains69. In An. gambiae, the M111S substitution would require two nucleotide changes in contrast to one required for L111S/F in Ae. aegypti. To our knowledge, there are no reports of An. gambiae M111S or F115C/S; although the latter substitution requires a single amino acid change. These two substitutions were detected in almost all countries in this Aedes dataset.

We found only two mutations on the surface of the ACE1 pocket directly involved in hydrolysis (A81S, n = 5; D85H, n = 2)13. Since we did not have phenotype data, we cannot determine if these mutations are associated with resistance, but their low prevalence would appear at odds with much higher rate and multiple instances of emergence of G119S in An. gambiae20. Nevertheless, further functional work can contribute to elucidating the involvement of these mutations in resistance phenotypes.

We have also explored the possibility of gene duplications, and detected such variants in GSTe2 in USA, Mexico, Brazil, and Thailand, which are of interest due to the high rates of permethrin resistance reported in the Americas and Asia78,79. We found no duplications in west and central Africa or Eastern Kenya and South Africa regions6, but bioassay data in these regions is lacking. The possible duplication of the gene encoding VGSC is more puzzling. Previous research in D. melanogaster found that individuals lacking VGSC are not viable, but in contrast those with a single functioning gene copy are healthy apart from increased temperature sensitivity80. However, DDT and pyrethroids both prolong the open state of VGSC, so the extra gene copy is unlikely to induce resistance through increased number of pores14. Experimental work is required to explain the functional role of the extra copy and determine if it is associated with increased insecticide resistance. Long-read sequencing can help to validate the duplications detected and the differences between the vgsc sequences.

The inferred population structure was broadly consistent with previous research based on chromosomal loci. We even identified the two previously described distinct subpopulations of Ae. aegypti in Rabai District of Kenya60, but we also observed inconsistency between the structure we inferred from 15 non-resistance genes and 4 resistance genes (Fig. 1A,B). This inconsistency is very clear in case of VGSC where the same four mutations were present in 18/34 samples from Burkina Faso, 133/160 from USA and 8/15 from Mexico. While these could have arisen independently, single emergence and introduction elsewhere appears more parsimonious especially since these samples also share synonymous mutations. Such separate introductions of Ae. aegypti have been examined in the past38,62. However, the result may also be artefact of our methodology. The clustering methods we used have two shortcomings. First, they don’t have a measure of confidence; second, the relative distances between clusters and spread of points in cluster are usually not meaningful53. As a consequence, it’s impossible to infer diversity of population within a cluster, nor to determine relatedness between clusters.

An important observation for future research is that the cox1 gene and other mitochondrial loci may be problematic for population studies in Ae. aegypti because of the unknown number of cox1 copies per genome65,66. This is the result of unknown numbers of mitochondria per cell, unknown number of mitochondrial DNA copies on chromosomes, and unknown allelic diversity of all these cox1 sequences.

While we focused on exploring the genetic diversity in four genes associated with target site insecticide resistance, there are many loci that could have an important role, particularly in metabolic resistance. Multiple P450 genes, particularly members of the CYP6 and CYP9 subfamilies, have been associated with resistance by overexpression when comparing insecticide-resistant to susceptible strains81,82,83.

Having both phenotypic and genotypic data is fundamental for the full understanding of the link between phenotypic resistance and genetic mutations, as well as cross resistance mechanisms. Unfortunately, we did not have phenotypic data for all the countries with genotypic data in this study. We strongly advocate that where possible, phenotypic data be generated for samples with genomic sequences.

Further work on exploring genetic diversity in these gene families, particularly using long-read sequencing to support assembly and correct assignment of copy numbers to each individual gene, may reveal important molecular markers that can be involved in insecticide resistance. Genomic studies, like ours, can provide guidance to functional studies and inform the design of genotyping assays for large scale surveillance of insecticide resistance.