Abstract
Aedes aegypti and Aedes albopictus mosquitoes spread major vector-borne viral diseases in tropical and sub-tropical regions of the globe. In this study, we sequenced the genomes of Indian Ae. aegypti and Ae. albopictus and mapped them to their respective reference genomes. Comparative genomic analysis was performed between our strains and the reference strains. A total of 14,416,484 single nucleotide polymorphisms (SNPs) and 156,487 insertions and deletions (InDels) were found in Ae. aegypti, and 28,940,433 SNPs and 188,987 InDels in Ae. albopictus. Particular emphasis was given to gene families involved in mosquito digestion, development, and innate immunity, which could be putative candidates for vector control. Serine protease cascades and their inhibitors, called serpins, play a central role in these processes. We extracted high-impact variants in genes associated with serine proteases and serpins. This study reports for the first time high-coverage genome sequence data of an Indian Ae. albopictus mosquito. The results will provide insights into Indian Aedes-specific polymorphisms and the evolution of immune-related genes in mosquitoes, and can serve as a resource for future comparative genomics and for those pursuing the development of targeted biopesticides for effective mosquito control.
Introduction
The arbovirus-transmitting mosquitoes Aedes aegypti and Aedes albopictus spread some of the most dangerous diseases, such as dengue, chikungunya, and Zika1,2. Dengue is prevalent in over 129 countries, with 70% of cases concentrated in Asia and an annual incidence of 100–400 million infections3,4,5,6. India alone accounts for an estimated 34 of the 96 million global cases4. Dengue-endemic regions in India include 12 states, of which Tamil Nadu and Kerala are the most afflicted, with dengue incidence concentrated in the cities of south India5. Chennai, the capital city of Tamil Nadu, saw a surge in dengue cases and deaths in December 20237. In the Madurai district of Tamil Nadu, an association has been reported between severe dengue epidemics and an increase in the Ae. albopictus population during the wet season8,9,10,11,12. Other studies in India have also highlighted the significance of Aedes mosquitoes as dengue vectors, emphasizing vectorial capacity13,14, insecticide susceptibility15, and genetic diversity16.
The mosquito vector Ae. aegypti originated in Africa17,18,19,20, thrives in urban settings, and is anthropophilic20,23,24. In contrast, Ae. albopictus originated in Southeast Asian forests1,12,21, is commonly found in peri-urban and rural areas, and is an opportunistic feeder22,24. In India, anthropogenic disturbances and climate changes have driven mosquito abundance and increased dengue occurrences25,26, necessitating different control strategies for each species. Current vector control strategies in India include mosquito surveillance, altering its habitat, and biological control, including larvivorous fish and endotoxin-containing bacteria27. Chemical control methods include the use of insecticides such as temephos, pyrethrum spray, and malathion fogging27,28. However, insecticide resistance is a significant challenge28,29, highlighting the need for novel strategies such as the use of Aedes-specific enzymes30.
Genome-wide single nucleotide polymorphism (SNP) analysis is a powerful tool in genetic studies, particularly for investigating genetic diversity, structure, and evolutionary processes31,32,33. In Aedes mosquitoes, specific immune genes exhibit significant polymorphism and could serve as emerging molecular markers34. For instance, genes encoding CLIPs (clip domain serine proteases), serpins (serine protease inhibitors), and heme-containing peroxidases (HPXs) show frequent polymorphisms, particularly in their protein-coding sequences34. This suggests that these genes are under strong selective pressure due to their role in defending against pathogens like the dengue virus34. The high polymorphism in these immune-related genes makes them excellent markers for tracking disease resistance, monitoring genetic diversity, and developing targeted control measures31,32,34.
The life cycle of Aedes mosquitoes is driven by an array of metabolic enzymes called proteases35. Based on their activity, they can be classified into endopeptidases and exopeptidases36. These proteases fall into seven prominent families based on active site composition: aspartic proteases, cysteine proteases, glutamic proteases, metalloproteases, serine proteases, threonine peptidases, and asparagine lyases36. Serine proteases (S1 family proteases) constitute one of the most prominent gene families and play a crucial role in various physiological processes, including digestion, development, and innate immunity35. For example, trypsin and chymotrypsin-like mid-gut serine proteases play a crucial role in protein digestion, while C-terminal like importin binding protein (Clip) domain serine proteases, or CLIPs, play an essential role in the innate immune response35,36,37,38. This protein family is diverse and has expanded in Aedes mosquito genomes39,40 compared to other insect genomes. Although the roles of serine proteases are well known, only a few studies have reported their experimental characterization and genomic information, including polymorphism33,34. A complete understanding of mosquito genetics and genetic diversity, through either whole-genome or population studies, could help develop a successful vector control method31. The available genomic information from Ae. aegypti and Ae. albopictus provides a foundation for comparative genomics, offering insights into prominent gene families related to their adaptability as invasive species39,40. Apart from a study by Bernard et al.41 on Ae. aegypti, there is a paucity of information regarding Ae. albopictus from India.
Based on this information, our study employs high-coverage genomes to identify DNA polymorphisms in Indian field isolates of Ae. aegypti and Ae. albopictus. We compared the genomes of the Indian Aedes mosquitoes with their respective reference genomes39,40 to understand the genomic variations specific to India. This study revealed a substantial number of single nucleotide polymorphisms (SNPs) and insertions and deletions (InDels), with a specific focus on protein-coding genes. Notably, this is the first report of an Indian Ae. albopictus genome. The information obtained from this study fills a significant knowledge gap regarding the genome structures of Indian Aedes mosquitoes in terms of polymorphism.
Results and discussion
Genome re-sequencing and read-mapping
Our study pooled inbred lines for each mosquito sample to obtain enough high-quality DNA for sequencing. Inbred lines were used and pooled to obtain highly consistent sequencing data, reflecting the uniform genetic makeup of the samples42. Although this approach might miss some rare variants present in non-inbred lines, our primary objective was to identify the genomic differences between the Indian Aedes mosquitoes and the reference genomes39,40. We generated 94.06 Gb and 137.22 Gb of data, of which 88,482,206,354 and 833,695,702 base pair high-quality reads were obtained for Ae. aegypti and Ae. albopictus, respectively. Approximately 98.5% of Ae. aegypti and 97.53% of Ae. albopictus reads were successfully mapped to their respective reference genomes39,40. The GC content of Ae. aegypti and Ae. albopictus was 37.5% and 40.7%, respectively, similar to that of their respective reference genomes. The genome coverage was 65.4% for Ae. aegypti and 41.4% for Ae. albopictus, providing high-quality bases for further analysis. A summary of the sequence reads and mapping data for both mosquito genomes is presented in Table 1.
Detection of DNA polymorphism (SNPs and InDels)
Two variant callers, Genome Analysis Toolkit (GATK)43 and SAMtools-mpileup44, were employed to detect high-quality genome variants, including SNPs and InDels. The variants identified by both methods totaled 14,416,484 SNPs for Ae. aegypti and 28,940,433 SNPs for Ae. albopictus after applying a 10× filtration threshold (Figure S1). The total numbers of InDels for Ae. aegypti and Ae. albopictus were 156,487 and 188,987, respectively (Figure S1). Based on these counts, Ae. albopictus exhibited about twice as many SNPs as Ae. aegypti and about 1.2 times as many InDels. A summary of SNP and InDel frequencies in both genomes relative to their reference genomes is presented in Table 2.
Characterization of predicted DNA polymorphism
The DNA polymorphisms in the Ae. aegypti and Ae. albopictus genomes were analyzed with SnpEff based on their genomic location and their impact on protein function. The number of detected SNPs and InDels varied considerably across chromosomes in Ae. aegypti and across scaffolds in Ae. albopictus (Supplementary Tables S1, S2, S3, and S4). We compared the number of transitions (Ts) and transversions (Tv) in both mosquito species relative to their reference genomes. In Ae. aegypti, we identified 9,496,179 transitions and 7,532,184 transversions, resulting in a Ts/Tv ratio of 1.26. Similarly, in the Ae. albopictus genome, we found 19,861,171 transitions and 14,032,475 transversions, yielding a Ts/Tv ratio of 1.415, suggesting a very small influence of sequencing error on our variant calling33. Notably, transitions (A/G and C/T; Ts) outnumbered transversions (A/T, G/C, A/C, and G/T; Tv), possibly due to the frequent occurrence of tautomeric shifts and deamination45,46. Among transitions, the frequencies of A/G and C/T were 1,934,862 and 2,085,766, respectively, whereas the frequencies of A/T transversions (1,359,849) and G/C transversions (37,625) were much lower (Fig. 1). This pattern has been observed previously in organisms such as An. funestus and Drosophila47,48,49, where the higher prevalence of transitions (C ↔ T and A ↔ G) compared to transversions (A ↔ T, G ↔ C, A ↔ C, and G ↔ T) is partially attributed to frequent 5-methylcytosine deamination, particularly at CpG dinucleotides47. This preference for specific transition types is linked to factors influencing codon degeneracy and to the selective pressures that govern gene preservation49. Deletion lengths ranged from 1 to 64 bp in both species, while insertion lengths ranged from 2 to 46 bp in Ae. aegypti and from 2 to 43 bp in Ae. albopictus. 
Dinucleotide insertions were prominent, accounting for 45.6% in Ae. aegypti and 43.8% in Ae. albopictus, while single-nucleotide deletions made up 42.3% and 41.6% in the respective species. Deletions slightly outnumbered insertions, with 82,060 deletions in Ae. aegypti and 100,044 in Ae. albopictus, compared to 74,427 and 88,943 insertions, respectively.
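The Ts/Tv ratios reported above can be reproduced from the stated counts. The following minimal Python sketch shows how a substitution is classified as a transition or a transversion and recomputes the two ratios; the function names are illustrative and are not taken from the SnpEff pipeline used in the study.

```python
# Purines and pyrimidines: a transition stays within one class,
# a transversion crosses between the two classes.
PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def substitution_class(ref: str, alt: str) -> str:
    """Return 'Ts' for a transition and 'Tv' for a transversion."""
    ref, alt = ref.upper(), alt.upper()
    bases = PURINES | PYRIMIDINES
    if ref == alt or ref not in bases or alt not in bases:
        raise ValueError(f"not a single-base substitution: {ref}>{alt}")
    same_class = {ref, alt} <= PURINES or {ref, alt} <= PYRIMIDINES
    return "Ts" if same_class else "Tv"


def ts_tv_ratio(transitions: int, transversions: int) -> float:
    """Ratio of transition count to transversion count."""
    return transitions / transversions


# Counts as reported in the text for each species.
aegypti_ratio = ts_tv_ratio(9_496_179, 7_532_184)
albopictus_ratio = ts_tv_ratio(19_861_171, 14_032_475)

print(substitution_class("A", "G"))  # A<->G is a transition
print(substitution_class("A", "T"))  # A<->T is a transversion
print(round(aegypti_ratio, 2), round(albopictus_ratio, 3))
```

With the reported counts, the rounded ratios agree with the values of 1.26 and 1.415 given in the text.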
Genomic distribution and annotation of DNA polymorphism
We conducted a comprehensive genome-wide annotation of the predicted DNA polymorphisms in Ae. aegypti and Ae. albopictus, focusing on their genomic localization and effects on protein function. DNA polymorphisms occurred most frequently in noncoding and regulatory regions: introns, followed by intergenic, 5′UTR, and 3′UTR regions. In contrast, coding (exonic) regions had the lowest percentages, with 1% and 1.8% of SNPs and 0.11% and 0.45% of InDels in Ae. aegypti and Ae. albopictus, respectively (Supplementary Table S5). The substantial genomic variation found in introns and intergenic regions may affect protein function via cis- or trans-activation pathways50. The larger number of polymorphisms in non-coding regions could result from reduced natural selection pressure and/or domestication effects in these areas34,51. We selected DNA polymorphisms from the exonic regions for further analysis. The functional impacts of the exonic polymorphisms were classified into four categories: high, moderate, low, and modifier (Supplementary Table S6). Among all variants, modifier variants accounted for nearly 99%, followed by low-impact variants at 0.9% and 1.5%, moderate-impact variants at 0.17% and 0.27%, and high-impact variants, the least common, at around 0.002% and 0.003% in the Ae. aegypti and Ae. albopictus genomes, respectively. High-impact variants were associated with disruptions in functional proteins, including changes at splice sites and start/stop codons. Moderate-impact variants included non-disruptive changes in protein function (missense mutations), low-impact variants were primarily harmless (e.g., synonymous coding changes), and modifier variants were observed in non-coding regions51,52,53. Frame-shift mutations were prevalent in the high-impact group, while missense mutations dominated the moderate-impact category. 
Synonymous mutations were most common in the low-impact group, while variation in intron regions was most common in the modifier group (Supplementary Table S7). Since high-impact variants perturb the function of protein-coding genes or their transcripts50, we investigated these variants among Ae. aegypti and Ae. albopictus. The high-impact variants include disruption of splicing sites, loss of translation start codon, introduction of a premature stop codon, and frameshift mutation54.
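To illustrate how the four impact categories can be tallied, the sketch below parses the `ANN` field that SnpEff writes into the INFO column of a VCF file, where the third pipe-separated sub-field of each annotation is its impact (HIGH, MODERATE, LOW, or MODIFIER). The INFO strings are hypothetical toy records, and counting each variant once under its most severe annotation is an assumption made here for illustration, not necessarily how the percentages above were derived.

```python
from collections import Counter
from typing import Iterable, List

# Impact classes ordered from most to least severe.
SEVERITY = ["HIGH", "MODERATE", "LOW", "MODIFIER"]


def impacts_from_info(info: str) -> List[str]:
    """Extract the impact of every ANN entry in a VCF INFO string."""
    for field in info.split(";"):
        if field.startswith("ANN="):
            entries = field[len("ANN="):].split(",")
            # ANN sub-fields: Allele|Annotation|Annotation_Impact|Gene_Name|...
            return [entry.split("|")[2] for entry in entries]
    return []


def tally_impacts(info_fields: Iterable[str]) -> Counter:
    """Count variants by their most severe annotated impact."""
    counts: Counter = Counter()
    for info in info_fields:
        impacts = impacts_from_info(info)
        if impacts:
            counts[min(impacts, key=SEVERITY.index)] += 1
    return counts


# Hypothetical INFO strings (gene names are placeholders).
toy_records = [
    "DP=35;ANN=T|stop_gained|HIGH|geneA|geneA|transcript|tx1|protein_coding",
    "DP=22;ANN=G|missense_variant|MODERATE|geneB|geneB,G|intron_variant|MODIFIER|geneC|geneC",
    "DP=18;ANN=A|synonymous_variant|LOW|geneD|geneD",
    "DP=40;ANN=C|intron_variant|MODIFIER|geneE|geneE",
    "DP=12",  # no annotation present, so the record is skipped
]

impact_counts = tally_impacts(toy_records)
print(dict(impact_counts))
```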
Genes carrying high-impact polymorphism and their functional relevance
Comparative analysis revealed a higher number of genes carrying high-impact SNPs/InDels in Ae. albopictus than in Ae. aegypti. We separated the complete polymorphism data into SNPs and InDels. In Ae. aegypti, we identified 1742 SNPs and 168 InDels, while in Ae. albopictus, 3175 SNPs and 933 InDels were found to impact protein-coding genes. These SNPs included disruptions in splice sites, loss of translation initiation codons, the introduction of premature stop codons, and loss of stop codons (Supplementary Table S7). Notably, both genomes exhibited a higher occurrence of premature stop codons (stop-gained), followed by splice site donor and acceptor sub-type variants. The functional classification of variants using eggNOG-mapper (COG database)55 revealed a large group of variants affecting genes involved in signal transduction, post-translational modification, protein turnover, chaperones, transcription, cytoskeleton, intracellular trafficking, secretion, vesicular transport, lipid transport, and metabolism (Fig. 2A, B, C).
To assess the impact of these variants on protein function, we filtered and determined the number of SNPs detected in each protein family (Pfam) using the results from the COG analysis. Protein families with at least ten high-impact variants (HI-SNPs) were selected for further analysis (Fig. 3A, B). We identified five protein families with HI-SNPs: Zinc Finger Domain (ZFD), Serine Protease (SP), Serine protease inhibitors (Serpins), Protein Kinase Domain (PKD), and Leucine-rich Repeat Domain (LRD). Around 154 and 68 HI-SNPs were identified in ZFD of Ae. albopictus and Ae. aegypti, respectively. In the protease family (SP and Serpins), there were 143 and 43 HI-SNPs in Ae. albopictus and Ae. aegypti, respectively. Additionally, in PKD, we found 40 and 34 HI-SNPs, and in LRD, we found 37 and 33 HI-SNPs in Ae. albopictus and Ae. aegypti, respectively.
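The selection step described above, keeping only protein families with at least ten high-impact variants, amounts to a count followed by a threshold. A minimal sketch is given below; the family labels and their counts are hypothetical toy input, and `families_with_min_variants` is an illustrative name rather than part of the study's workflow.

```python
from collections import Counter
from typing import Dict, Iterable


def families_with_min_variants(variant_families: Iterable[str],
                               min_count: int = 10) -> Dict[str, int]:
    """Count high-impact variants per family and keep families at or above the threshold."""
    counts = Counter(variant_families)
    return {family: n for family, n in counts.items() if n >= min_count}


# Hypothetical input: one family label per high-impact variant.
toy_labels = ["zf-C2H2"] * 12 + ["Trypsin"] * 10 + ["Pkinase"] * 3

selected = families_with_min_variants(toy_labels)
print(selected)
```

In this toy input, the family with only three variants falls below the threshold and is dropped.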
Some of the HI-SNPs may have significant implications for immunity, digestion of host blood, and developmental mechanisms in mosquitoes. For example, compared to other eukaryotes, mosquitoes in particular show a rapid expansion of ZFDs. This may be because ZFDs are the most common transcription factors, and variations in them may affect their DNA-binding specificity56,57,58. HI-SNPs in LRDs are also associated with innate immunity in Anopheles mosquitoes, mainly linked to Plasmodium59,60. Moreover, transcriptome analysis of Ae. aegypti has shown that LRD-related transcripts are involved in immunity and the response to arbovirus infections60,61.
Sequence variation in genes encoding serine proteases and their inhibitors
Serine protease polymorphisms are known to significantly impact mosquito immunity and digestion of host blood34,35,36,37. This observation was based on lab strains of Ae. aegypti, and no information is available on field strains of Ae. aegypti and Ae. albopictus34. We focused on polymorphisms in SP and Serpins to investigate variants that are specific to Indian strains. Among the HI-SNPs, we further characterized stop-gained variants (SGVs) and frame-shift variants (FSVs) in SP and Serpins. An SGV leads to premature termination of protein translation, resulting in protein truncation. An FSV can disrupt the reading frame of the genetic code, leading to altered amino acid sequences and degradation of the transcript54,62.
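The two variant classes characterized here can be illustrated with a short Python sketch. It assumes the standard genetic code (stop codons TAA, TAG, and TGA) and the usual length rule for frame-shifts, namely that an InDel whose length is not a multiple of three shifts the reading frame. The function names and example codons are illustrative and are not drawn from the study's data.

```python
# Stop codons of the standard genetic code (DNA alphabet).
STOP_CODONS = {"TAA", "TAG", "TGA"}


def apply_snp(codon: str, offset: int, alt_base: str) -> str:
    """Return the codon after substituting alt_base at position offset (0-2)."""
    if len(codon) != 3 or not 0 <= offset <= 2 or len(alt_base) != 1:
        raise ValueError("expected a 3-base codon, an offset of 0-2, and one base")
    return codon[:offset] + alt_base + codon[offset + 1:]


def is_stop_gained(ref_codon: str, alt_codon: str) -> bool:
    """True if a sense codon is converted into a stop codon."""
    ref_codon, alt_codon = ref_codon.upper(), alt_codon.upper()
    return ref_codon not in STOP_CODONS and alt_codon in STOP_CODONS


def is_frameshift(ref_allele: str, alt_allele: str) -> bool:
    """True if the InDel length is not a multiple of three."""
    return (len(alt_allele) - len(ref_allele)) % 3 != 0


# TGG (tryptophan) becomes TGA (stop) through a G>A change at the third base.
print(is_stop_gained("TGG", apply_snp("TGG", 2, "A")))
# A one-base insertion shifts the frame; a three-base insertion does not.
print(is_frameshift("A", "AT"), is_frameshift("A", "ATTT"))
```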
We identified 13 SGVs and two FSVs in 28 serine protease genes of Ae. aegypti, and 58 SGVs and seven FSVs in 74 serine protease genes of Ae. albopictus (Supplementary Tables S8, S9). These SGVs highlight the evolutionary significance and environmental adaptations of serine proteases34,63. According to SnpEff and COG classification, these genes belong to the S1 protease family under classes “O” and “E,” associated with amino-acid transport, metabolism, cellular processes, signaling, post-translational modification, and protein turnover55,64. Validation with NCBI and VectorBase confirmed sequence variations, domain regions, active sites, and mutation positions. Based on the presence or absence of the catalytic triad, serine protease-like genes are classified as serine proteases or serine protease homologs, respectively65,66. These proteases feature various domain modules, such as clip, complement control protein (CCP), low-density lipoprotein receptor class A (LDLA), myb/SANT-like (MADF), complement CUB, frizzled (FRI), and scavenger receptor Cys-rich (SR) domains66,67.
Among the SNP-bearing serine proteases in both Aedes genomes, a shared lineage includes trypsin-3-like, trypsin-4-like, chymotrypsin-like, snake-like serine protease, and transmembrane serine protease-9. These common proteases exhibit stop-gain variants in both Aedes genomes, likely terminating peptide synthesis. Furthermore, Ae. aegypti features polymorphisms in some additional proteases, such as stubble-like and trypsin-5G1-like, while Ae. albopictus shows polymorphisms in melanization protease, polymerase type, easter-like, grass-like, prostate-like, masquerade-like, phenoloxidase activator-like, Persephone-like, acrosin-like, and other clip-domain serine proteases. These proteins play various roles, such as immune Toll pathway modulation68,69,70, involvement in the melanization pathway67,68,71,72, dorsal–ventral patterning68,73, wing development74, somatic muscle attachment68,75, and mosquito blood-feeding behavior68,76. The polymorphisms present in these genes could impact their functionality. Hence, these findings call for detailed genome-wide characterization and biochemical and expression studies of these serine proteases, given their crucial role in mosquito physiological processes and their association with insecticide resistance76,77,78.
Polymorphisms in CLIP B and D serine proteases have been found in Aedes infected with Dengue and Rift Valley fever34,38, and were also present in our data. Notably, CLIP-B is the major sub-family; members like CLIPB15, CLIPB9, and CLIPA14, alongside CLIPB8, participate in prophenoloxidase activation in the melanization process and play roles in the Toll and IMD pathways of the innate immune response in Aedes mosquitoes79,80. In comparison to other insects, the CLIP domain serine protease genes have undergone expansion in both the Ae. aegypti and Ae. albopictus genomes39,40. Therefore, the serine protease polymorphisms in Indian field isolates of Aedes warrant further molecular and biochemical exploration.
We also identified high-impact polymorphisms in serpin proteins, a critical superfamily involved in various physiological functions across eukaryotes81,82. Despite their low number in insects, serpins play significant roles in development, immunity, reproduction, and blood feeding83. Our COG analysis classified these genes under the "V" class, associated with defense mechanisms (Fig. 2)55. We found several stop-gain and splice site polymorphisms in serpins of both the Ae. aegypti and Ae. albopictus genomes (Supplementary Tables S10, S11). Notably, SRPN27A, in which we also identified polymorphisms, is involved in innate immunity by inhibiting phenoloxidase activation and melanization reactions81. SRPN27A also blocks the Toll and IMD pathways84. Additionally, we observed splice-site variants in Serpin-10 of Ae. albopictus, which may be involved in immunity82.
Both clip-domain serine proteases and serpins are large gene families showing recent diversification34. Despite the small sample size, we observed several HI-SNPs in the serine protease family, especially in clip-domain serine proteases, consistent with previous findings34. These genes maintain their regulatory regions while diversifying specificity by exhibiting several SNPs in the open-reading frame region34. The HI-SNPs in clip-domain serine proteases could be validated and characterized in wild Aedes mosquito populations to establish their potential as molecular markers for the detection of viruses. Finally, the anticoagulatory and immunosuppressive roles of serpins during mosquito blood feeding on hosts also highlight their potential in vaccine development83.
Although we report here high-coverage genomes of Indian Aedes mosquitoes, which are major vectors of many zoonotic diseases, the approach taken has some limitations: (1) The limited sample number and the sequencing of pooled inbred lines might have missed some rare variants. However, the resulting data reflect the consensus sequence of the pooled lines, which does capture Indian Aedes-specific variants. (2) The reference genome of Ae. aegypti is assembled at the chromosome level, while that of Ae. albopictus is assembled at the scaffold level. A scaffold-level assembly can be fragmented and is less precise than a chromosome-level assembly, which can introduce errors during mapping. (3) To find Indian species-specific variants against the reference genomes and to filter out false positives, we focused on the common variants identified by both variant callers (GATK and SAMtools-mpileup) rather than on rare ones. (4) Although we identified numerous SNPs, with a limited number of samples and predominantly single-locus mutations we cannot evaluate the significance of these mutations for gene function from these data alone. Hence, further research is needed for comprehensive functional validation and for understanding the impact of these polymorphisms on gene structure and function.
Conclusion
In this study, we generated high-quality genomes of Indian Ae. aegypti and Ae. albopictus field isolates, advancing our understanding of these vector species. We used re-sequencing to determine large-scale genomic variants, such as SNPs and InDels, relative to their reference genomes. Among the genes exhibiting high-impact SNPs, we focused on serine protease genes and their inhibitors, serpins, which are involved in blood digestion and regulation of mosquito innate immune responses, respectively. Serpins chiefly regulate the activities of specific serine proteases in mosquito innate immunity. Despite the ongoing functional characterization of the expanding clip-domain proteases, much of the serine protease family remains to be characterized and explored, which could yield novel insights into population inhibition strategies against Aedes mosquitoes. Our genomic data revealed polymorphisms in Indian Aedes species relative to the reference genomes. This understanding could pave the way for identifying novel target genes for developing vector control measures.
Materials and methods
Sample collection and inbreeding
Madurai, situated in Tamil Nadu state, is one of India's significant dengue-endemic regions10,11. Aedes aegypti was collected from Goripalayam and Ae. albopictus from Alagarkovil Hills in Madurai district. These wild-caught mosquitoes were stabilized through five generations for subsequent inbreeding experiments at the ICMR-Vector Control Research Centre (VCRC), field unit, Madurai. From the colony, 100 males and 100 females were collected to mate. After mating, the males were removed, and the females were starved for 3 h and then fed on chicken blood using a blood-feeding apparatus for 2 h85,86. Only ten fully fed females were collected and kept separately in 1 × 1 foot cages. Routine glucose feeding (5% glucose water-soaked cotton placed in a petri dish with five soaked raisins) was provided to the caged females. On the third day, a bowl of tap water with a paper strip was provided for egg laying. The paper strips were collected from the ten cages, and the eggs were counted. The eggs were kept dry for three days at room temperature. After three days of incubation, the eggs were floated in a bowl containing clean tap water and observed under a stereo microscope for the emergence of larvae. The pupae were collected for adult emergence. This procedure was repeated to complete ten generations of inbreeding86.
DNA isolation and sequencing
Due to the small size of Aedes mosquitoes, sequencing the genome from a single specimen poses various challenges, including limited DNA quality and the possibility of insufficient DNA for sequencing library preparation. To address these issues, we adopted a pragmatic strategy of pooling isogenic lines, as these lines possess identical genetic backgrounds. Pooled isogenic lines represent identical clones, which increases the total amount of DNA obtained, provides high coverage, and enhances data quality, while still reflecting a single specimen42. For DNA extraction, 15 specimens from the isogenic lines were combined to create a single sample each of Ae. albopictus and Ae. aegypti. The Qiagen DNeasy Blood & Tissue Kit (Qiagen Inc., Valencia, CA, USA) was used for DNA extraction. The quality of the extracted DNA was analyzed with a Qubit 2.0 Fluorometer (Invitrogen Life Technologies, Eugene, Oregon). Two tubes, one containing pooled DNA of Ae. aegypti and the other of Ae. albopictus, were shipped on ice to Genotypic Technology Pvt. Ltd., India. The genome sequencing libraries were prepared using the Illumina-compatible NEXTflex Rapid DNA sequencing kit (BIOO Scientific, Inc., USA) at Genotypic Technology Pvt. Ltd. About 200 ng of DNA sample was used for library preparation and sheared using a Covaris S2 sonicator to produce fragments ranging from 250 to 600 bp. The fragment sizes were verified using an Agilent 2100 Bioanalyzer, and the fragments were purified using HighPrep magnetic beads from MagBio. The purified fragments underwent end-repair, adenylation, and ligation to Illumina universal multiplex barcode adaptors as per the NEXTflex Rapid DNA sequencing kit protocol. After adaptor ligation and DNA purification, the fragments were subjected to 8 cycles of PCR amplification using the Illumina-compatible primers provided in the sequencing kit. The conditions for library preparation and PCR are presented in Supplementary Table S12. 
The final PCR products were purified, and their quality was assessed. The effective user-defined insert size ranged from 80 to 680 bp, making the libraries suitable for sequencing. The libraries were sequenced on the Illumina HiSeq X Ten platform with paired-end reads of 150 base pairs.
Raw-reads processing
Raw read quality for Ae. aegypti and Ae. albopictus was analyzed using FastQC (v0.11.5)87. Further processing was carried out using fastp (v0.23.2)88 to obtain high-quality, adapter-free reads with a minimum length of 50 bp and a minimum Phred quality score of 20. The filtered reads were processed again using FastQC to confirm that no further filtration was needed. High-quality filtered reads from Ae. aegypti and Ae. albopictus were mapped to the reference genomes of Ae. aegypti (NCBI Accession: GCF_002204515.2) and Ae. albopictus (NCBI Accession: GCF_006496715.1), respectively39,40, using Burrows-Wheeler Aligner (BWA) mem (v0.7.17)89 with default parameters. The Picard tool SortSam was used to sort the alignments and convert the SAM files to BAM format, and MarkDuplicates (Picard module) was then used to mark duplicate alignments in the BAM files90. Mapping statistics were obtained using SAMtools91. Picard's BuildBamIndex module generated BAM index files (.bai), which allow quick access to specific regions, and its CreateSequenceDictionary module generated dictionary files (.dict), which provide metadata about the sequences for downstream analysis.
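The order of the processing steps described above can be sketched as a small Python helper that assembles the command lines without running them. This is an illustrative reconstruction and not the set of commands given in the supplementary information: all file names are hypothetical, the Picard calls use the legacy `KEY=value` argument style, the reference is assumed to be already indexed with `bwa index`, and mapping the stated Phred threshold of 20 and minimum length of 50 bp onto fastp's `-q` and `-l` options is an assumption.

```python
import subprocess
from pathlib import Path
from typing import List, Optional, Tuple

# One step = (argument vector, file that receives stdout or None).
Step = Tuple[List[str], Optional[str]]


def build_steps(sample: str, r1: str, r2: str, reference: str,
                picard_jar: str = "picard.jar") -> List[Step]:
    """Assemble read filtering, mapping, sorting, and duplicate-marking commands."""
    clean1 = f"{sample}_R1.clean.fq.gz"
    clean2 = f"{sample}_R2.clean.fq.gz"
    sam = f"{sample}.sam"
    bam = f"{sample}.sorted.bam"
    dedup = f"{sample}.dedup.bam"
    metrics = f"{sample}.dup_metrics.txt"
    ref_dict = str(Path(reference).with_suffix(".dict"))
    picard = ["java", "-jar", picard_jar]
    return [
        # Quality and length filtering of the paired reads.
        (["fastp", "-i", r1, "-I", r2, "-o", clean1, "-O", clean2,
          "-q", "20", "-l", "50"], None),
        # bwa mem writes SAM to stdout, so stdout is captured in a file.
        (["bwa", "mem", reference, clean1, clean2], sam),
        # Sort alignments and write BAM.
        (picard + ["SortSam", f"I={sam}", f"O={bam}", "SORT_ORDER=coordinate"], None),
        # Mark duplicate alignments.
        (picard + ["MarkDuplicates", f"I={bam}", f"O={dedup}", f"M={metrics}"], None),
        # BAM index (.bai) and sequence dictionary (.dict).
        (picard + ["BuildBamIndex", f"I={dedup}"], None),
        (picard + ["CreateSequenceDictionary", f"R={reference}", f"O={ref_dict}"], None),
        # Mapping statistics.
        (["samtools", "flagstat", dedup], f"{sample}.flagstat.txt"),
    ]


def run_steps(steps: List[Step]) -> None:
    """Run each step in order, stopping at the first failure."""
    for argv, stdout_path in steps:
        if stdout_path is None:
            subprocess.run(argv, check=True)
        else:
            with open(stdout_path, "w") as handle:
                subprocess.run(argv, stdout=handle, check=True)
```

Separating command construction from execution lets the planned steps be inspected or logged before any tool is invoked, for example with `for argv, _ in build_steps("sample1", "r1.fq.gz", "r2.fq.gz", "ref.fa"): print(" ".join(argv))`.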
Variant calling
For variant calling, we used GATK v443 and SAMtools (bcftools/mpileup)44. In GATK, variant calling was performed using HaplotypeCaller, and variants were filtered using the following parameters: “QD < 2.0, FS > 60.0, MQ < 40.0, DP < 10, MQRankSum < -12.5, ReadPosRankSum < -8.0”. To obtain high-confidence variants (SNPs and InDels), these filtered GATK variants were compared to the SAMtools/mpileup variants, and the set of variants common to both was identified using bcftools (isec). These common variants were considered the final variants (VCF files). The commands used for data analysis are given in the supplementary information file.
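The logic of the hard filter and of the intersection between the two call sets can be sketched in Python as follows. The thresholds are those quoted above; the `Call` record, the toy calls, and the choice of (chromosome, position, reference allele, alternate allele) as the matching key are assumptions made for illustration, and the study itself performed these steps with GATK and bcftools rather than with custom code.

```python
from dataclasses import dataclass
from typing import Iterable, Optional, Set, Tuple

Key = Tuple[str, int, str, str]  # (chrom, pos, ref, alt)


@dataclass(frozen=True)
class Call:
    chrom: str
    pos: int
    ref: str
    alt: str
    qd: float
    fs: float
    mq: float
    dp: int
    # Rank-sum annotations can be absent (for example at homozygous sites).
    mq_rank_sum: Optional[float] = None
    read_pos_rank_sum: Optional[float] = None

    @property
    def key(self) -> Key:
        return (self.chrom, self.pos, self.ref, self.alt)


def fails_hard_filter(call: Call) -> bool:
    """True if any of the quoted filter expressions matches the call."""
    return (
        call.qd < 2.0
        or call.fs > 60.0
        or call.mq < 40.0
        or call.dp < 10
        or (call.mq_rank_sum is not None and call.mq_rank_sum < -12.5)
        or (call.read_pos_rank_sum is not None and call.read_pos_rank_sum < -8.0)
    )


def common_variants(gatk_calls: Iterable[Call], other_keys: Set[Key]) -> Set[Key]:
    """Keep GATK calls that pass the filter and are also in the second call set."""
    passing = {call.key for call in gatk_calls if not fails_hard_filter(call)}
    return passing & other_keys


# Hypothetical toy calls.
gatk = [
    Call("chr1", 100, "A", "G", qd=25.0, fs=1.2, mq=60.0, dp=35),
    Call("chr1", 200, "C", "T", qd=1.5, fs=0.0, mq=60.0, dp=40),   # fails QD
    Call("chr1", 300, "G", "A", qd=20.0, fs=2.0, mq=58.0, dp=8),   # fails DP
    Call("chr2", 50, "T", "TA", qd=18.0, fs=3.0, mq=59.0, dp=22),  # passes, GATK only
]
mpileup_keys = {("chr1", 100, "A", "G"), ("chr1", 200, "C", "T")}

final = common_variants(gatk, mpileup_keys)
print(sorted(final))
```

Requiring agreement between two callers trades sensitivity for specificity: variants seen by only one caller are discarded even when they pass every filter.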
Variant annotation and COG analysis
The identified genomic variants were analyzed using SnpEff (v4.3)64 to infer functional annotation and to classify the genomic variants (SNPs and InDels) and their potential effects on protein structure. A SnpEff predictor database file in binary format (.bin) was created from the reference genomes, and SnpEff was run with default parameters to locate each SNP within annotated transcripts or intronic regions. Different features of the variants, such as the transition/transversion ratio (Ts/Tv), gene ID, gene name, and the impact of each variant, were reported in the output file. The variants were classified based on their impact on protein function into modifier, low, moderate, and high impact types52. Further, genes showing DNA polymorphisms (SNPs and InDels) were functionally categorized into clusters of orthologous groups (COGs) using eggNOG-mapper (v.5)55 based on sequence homology.
Submission of nucleotide sequence and its accession number
The genomes of Ae. aegypti and Ae. albopictus have been deposited at GenBank under the Biosample accession numbers SAMN40183392 and SAMN40183418 and SRA accession numbers SRR28209348 and SRR28209347, respectively.
Data availability
The genome sequence data generated in this study have been submitted to the NCBI database under SRA accession numbers SRR28209348 and SRR28209347.
Change history
30 January 2025
A Correction to this paper has been published: https://doi.org/10.1038/s41598-025-86326-y
References
Egid, B. R. et al. Review of the ecology and behaviour of Aedes aegypti and Aedes albopictus in Western Africa and implications for vector control. Curr. Res. Parasitol. Vector-Borne Dis. 2, 100074 (2022).
Soni, S. et al. Dengue, Chikungunya, and Zika: The causes and threats of emerging and re-emerging arboviral diseases. Cureus 15(7), e41717 (2023).
World Health Organization. Dengue and severe dengue. WHO website (who.int).
Bhatt, S. et al. The global distribution and burden of dengue. Nature 496(7446), 504–507 (2013).
National Center for Vector Borne Diseases Control (NCVBDC). Dengue situation in India. NCVBDC website (mohfw.gov.in).
Banerjee, I. & Robinson, J. Dengue on the rise 2022–2023: A warning for Southern Asia. Res. Dev. Med. Med. Sci. 1, 153–163 (2023).
Tamil Nadu reports 922 dengue cases, 1 death in 15 days. Chennai News, Times of India (2024).
Nisha, R. R. & Saravanabavan, V. Dengue vector breeding ecology in Madurai district: Heat map cluster analysis. Int. J. Mosq. Res. 8(1, Part B), 95–104 (2021).
Saravanabavan, V., Balaji, D. & Preethi, S. Identification of dengue risk zone: A geo-medical study on Madurai city. GeoJournal 84, 1073–1087 (2019).
Balaji, D. & Saravanabavan, V. Geo spatial variation of dengue risk zone in Madurai city using autocorrelation techniques. GeoJournal 86(3), 1481–1501 (2021).
Kumar, N. P. et al. Morphological and molecular characterization of Aedes aegypti variant collected from Tamil Nadu, India. J. Vector Borne Dis. 59(1), 22–28 (2022).
Gupta, B. et al. Genetic diversity of Aedes aegypti (Diptera: Culicidae) in rural and urban settings in Tamil Nadu, India. Entomon 46(1), 73–80 (2021).
Srivastava, N. N. et al. Dengue virus serotypes circulating among Aedes mosquitoes in the Lucknow district of North India: Molecular identification and characterization. J. Pure Appl. Microbiol. 17(2), 1141 (2023).
Sarma, D. K. Molecular surveillance of dengue virus in field-collected Aedes mosquitoes from Bhopal, central India: evidence of circulation of a new lineage of serotype 2. Front. Microbiol. 14, 1260812 (2023).
Jangir, P. K. & Prasad, A. Spatial distribution of insecticide resistance and susceptibility in Aedes aegypti and Aedes albopictus in India. Int. J. Trop. Insect Sci. 42(2), 1019–1044 (2022).
Sumitha, M. K. et al. Genetic differentiation among Aedes aegypti populations from different eco-geographical zones of India. PLOS Negl. Trop. Dis. 17(7), e0011486 (2023).
Longbottom, J. et al. Aedes albopictus invasion across Africa: The time is now for cross-country collaboration and control. Lancet Global Health 11(4), e623–e628 (2023).
Powell, J. R. Mosquitoes on the move. Science 354, 971–972 (2016).
Gloria-Soria, A. et al. Global genetic diversity of Aedes aegypti. Mol. Ecol. 25(21), 5377–5395 (2016).
Powell, J. R. & Tabachnick, W. J. History of domestication and spread of Aedes aegypti-a review. Memórias do Instituto Oswaldo Cruz 108, 11–17 (2013).
Battaglia, V. et al. The worldwide spread of Aedes albopictus: New insights from mitogenomes. Front. Genetics 13, 931163 (2022).
Delatte, H. et al. Geographic distribution and developmental sites of Aedes albopictus (Diptera: Culicidae) during a Chikungunya epidemic event. Vector-Borne Zoonotic Dis. 8(1), 25–34 (2008).
Ahebwa, A., Hii, J., Neoh, K. B. & Chareonviriyaphap, T. Aedes aegypti and Ae. albopictus (Diptera: Culicidae) ecology, biology, behaviour, and implications on arbovirus transmission in Thailand. One Health 16, 100555 (2023).
Kraemer, M. U. Past and future spread of the arbovirus vectors Aedes aegypti and Aedes albopictus. Nat. Microbiol. 4(5), 854–863 (2019).
Reinhold, J. M., Lazzari, C. R. & Lahondère, C. Effects of the environmental temperature on Aedes aegypti and Aedes albopictus mosquitoes: A review. Insects 9(4), 158 (2018).
Mondal, N. The resurgence of dengue epidemic and climate change in India. Lancet 401(10378), 727–728 (2023).
Guideline for Integrated vector management for Aedes mosquito control. NVBDCP, India, 79767166351454408152_0.pdf (mohfw.gov.in)
Baig, M. M. et al. Susceptibility status of Aedes aegypti (Linnaeus) and Aedes albopictus (Skuse)(Diptera: Culicidae) to insecticides in Southern Odisha, India. Int. J. Mosq. Res. 8, 10–15 (2021).
Sumitha, M. K. et al. Status of insecticide resistance in the dengue vector Aedes aegypti in India: A review. J. Vector Borne Dis. 60(2), 116–124 (2023).
Soares, T. S., Torquato, R. J. S., Lemos, F. J. A. & Tanaka, A. S. Selective inhibitors of digestive enzymes from Aedes aegypti larvae identified by phage display. Insect Biochem. Mol. Biol. 43(1), 9–16 (2013).
Lee, Y. et al. Genome-wide divergence among invasive populations of Aedes aegypti in California. BMC Genomics 20(1), 1–10 (2019).
Schmidt, T. L. et al. Genome-wide SNPs reveal the drivers of gene flow in an urban population of the Asian Tiger Mosquito Aedes albopictus. PLOS Negl. Trop. Dis. 11(10), e0006009 (2017).
Rašić, G., Filipović, I., Weeks, A. R. & Hoffmann, A. A. Genome-wide SNPs lead to strong signals of geographic structure and relatedness patterns in the major arbovirus vector, Aedes aegypti. BMC Genomics 15, 1–12 (2014).
Bonizzoni, M. et al. Probing functional polymorphisms in the dengue vector Aedes aegypti. BMC Genomics 14, 1–10 (2013).
Jagdale, S., Bansode, S. & Joshi, R. Insect Proteases: Structural-Functional Outlook. In Proteases in Physiology and Pathology 451–473 (Springer, 2017).
Santiago, P. B. et al. Proteases of haematophagous arthropod vectors are involved in blood-feeding, yolk formation and immunity-a review. Parasit. Vect. 10, 1–20 (2017).
de Oliveira, S. P. et al. Wolbachia infection in Aedes aegypti mosquitoes alters blood meal excretion and delays oviposition without affecting trypsin activity. Insect Biochem. Mol. Biol. 87, 65–74 (2017).
Licciardi, S. et al. In vitro shared transcriptomic responses of Aedes aegypti to arboviral infections: Example of dengue and Rift Valley fever viruses. Parasit. Vect. 13, 1–10 (2020).
Matthews, B. J. et al. Improved reference genome of Aedes aegypti informs arbovirus vector control. Nature 563(7732), 501–507 (2018).
Palatini, U. et al. Improved reference genome of the arboviral vector Aedes albopictus. Genome Biol. 21(1), 1–29 (2020).
Bernard, V. et al. Whole genome sequences of Aedes aegypti (Linn.) field isolates from Southern India. Preprint at bioRxiv (2020).
Anand, S. Next generation sequencing of pooled samples: guideline for variants’ filtering. Sci. Rep. 6(1), 33735 (2016).
Franke, K. R. & Crowgey, E. L. Accelerating next generation sequencing data analysis: An evaluation of optimized best practices for genome analysis toolkit algorithms. Genomics Inf. 18(1), e10 (2020).
Danecek, P. et al. Twelve years of SAMtools and BCFtools. Gigascience 10(2), giab008 (2021).
Babbitt, G. A. & Schulze, K. V. Codons support the maintenance of intrinsic DNA polymer flexibility over evolutionary timescales. Genome Biol. Evol. 4(9), 954–965 (2012).
Stoltzfus, A. & Norris, R. W. On the causes of evolutionary transition: Transversion bias. Mol. Biol. Evol. 33(3), 595–602 (2016).
Brookes, A. J. The essence of SNPs. Gene 234(2), 177–186 (1999).
Moriyama, E. N. & Powell, J. R. Intraspecific nuclear DNA variation in Drosophila. Mol. Biol. Evol. 13(1), 261–277 (1996).
Wondji, C. S., Hemingway, J. & Ranson, H. Identification and analysis of single nucleotide polymorphisms (SNPs) in the mosquito Anopheles funestus, malaria vector. BMC Genomics 8, 1–13 (2007).
Shameer, K. et al. Interpreting functional effects of coding variants: Challenges in proteome-scale prediction, annotation and assessment. Briefings Bioinf. 17(5), 841–862 (2016).
Barreiro, L. B., Laval, G., Quach, H., Patin, E. & Quintana-Murci, L. Natural selection has driven population differentiation in modern humans. Nat. Genetics 40(3), 340–345 (2008).
Jain, M., Moharana, K. C., Shankar, R., Kumari, R. & Garg, R. Genomewide discovery of DNA polymorphisms in rice cultivars with contrasting drought and salinity stress response and their functional relevance. Plant Biotechnol. J. 12(2), 253–264 (2014).
Rajkumar, M. S., Garg, R. & Jain, M. Genome-wide discovery of DNA polymorphisms among chickpea cultivars with contrasting seed size/weight and their functional relevance. Sci. Rep. 8(1), 16795 (2018).
Rausell, A. et al. Analysis of stop-gain and frameshift variants in human innate immunity genes. PLoS Comput. Biol. 10(7), e1003757 (2014).
Huerta-Cepas, J. et al. eggNOG 5.0: A hierarchical, functionally and phylogenetically annotated orthology resource based on 5090 organisms and 2502 viruses. Nucl. Acids Res. 47(D1), D309–D314 (2019).
Ladomery, M. & Dellaire, G. Multifunctional zinc finger proteins in development and disease. Ann. Hum. Genetics 66(5–6), 331–342 (2002).
Lockwood, S. H. et al. The functional significance of common polymorphisms in zinc finger transcription factors. G3 Genes Genomes Genetics 4(9), 1647–1655 (2014).
Fedotova, A. A., Bonchuk, A. N., Mogila, V. A. & Georgiev, P. G. C2H2 zinc finger proteins: the largest but poorly explored family of higher eukaryotic transcription factors. Acta Naturae 33, 47–58 (2017).
Povelones, M., Waterhouse, R. M., Kafatos, F. C. & Christophides, G. K. Leucine-rich repeat protein complex activates mosquito complement in defense against Plasmodium parasites. Science 324(5924), 258–261 (2009).
Upton, L. M., Povelones, M. & Christophides, G. K. Anopheles gambiae blood feeding initiates an anticipatory defense response to Plasmodium berghei. J. Innate Immun. 7(1), 74–86 (2014).
Habtewold, T., Povelones, M., Blagborough, A. M. & Christophides, G. K. Transmission blocking immunity in the malaria non-vector mosquito Anopheles quadriannulatus species A. PLoS Pathogens 4(5), e1000070 (2008).
Nagy, E. & Maquat, L. E. A rule for termination-codon position within intron-containing genes: When nonsense affects RNA abundance. Trends Biochem. Sci. 23(6), 198–199 (1998).
Brenner, S., Stretton, A. O. W. & Kaplan, S. Genetic code: the ‘nonsense’ triplets for chain termination and their suppression. Nature 206(4988), 994–998 (1965).
Cingolani, P. et al. A program for annotating and predicting the effects of single nucleotide polymorphisms, SnpEff: SNPs in the genome of Drosophila melanogaster strain w1118; iso-2; iso-3. Fly 6(2), 80–92 (2012).
Bao, Y. Y. et al. Genomic insights into the serine protease gene family and expression profile analysis in the planthopper Nilaparvata lugens. BMC Genomics 15(1), 1–17 (2014).
Yang, L. et al. The genomic and transcriptomic analyses of serine proteases and their homologs in an endoparasitoid Pteromalus puparum. Dev. Comp. Immunol. 77, 56–68 (2017).
Phillips, D. R. & Clark, K. D. Bombyx mori and Aedes aegypti form multi-functional immune complexes that integrate pattern recognition, melanization, coagulants, and hemocyte recruitment. PLoS One 12(2), e0171447 (2017).
Veillard, F., Troxler, L. & Reichhart, J. M. Drosophila melanogaster clip-domain serine proteases: Structure, function and regulation. Biochimie 122, 255–269 (2016).
Gorman, M. J. & Paskewitz, S. M. Serine proteases as mediators of mosquito immune responses. Insect Biochem. Mol. Biol. 31(3), 257–262 (2001).
Kambris, Z. et al. Drosophila immunity: A large-scale in vivo RNAi screen identifies five serine proteases required for Toll activation. Curr. Biol. 16(8), 808–813 (2006).
Castillejo-López, C. & Häcker, U. The serine protease Sp7 is expressed in blood cells and regulates the melanization reaction in Drosophila. Biochem. Biophys. Res. Commun. 338(2), 1075–1082 (2005).
Tang, H., Kambris, Z., Lemaitre, B. & Hashimoto, C. Two proteases defining a melanization cascade in the immune system of Drosophila. J. Biol. Chem. 281(38), 28097–28104 (2006).
Stein, D., Roth, S., Vogelsang, E. & Nüsslein-Volhard, C. The polarity of the dorsoventral axis in the Drosophila embryo is defined by an extracellular signal. Cell 65(5), 725–735 (1991).
Ibrahim, D. M., Biehs, B., Kornberg, T. B. & Klebes, A. Microarray comparison of anterior and posterior Drosophila wing imaginal disc cells identifies novel wing genes. G3 Genes Genomes Genetics 3(8), 1353–1362 (2013).
Murugasu-Oei, B., Rodrigues, V., Yang, X. & Chia, W. Masquerade: a novel secreted serine protease-like molecule is required for somatic muscle attachment in the Drosophila embryo. Genes Dev 9(2), 139–154 (1995).
Soares, T. S., Watanabe, R. M., Lemos, F. J. & Tanaka, A. S. Molecular characterization of genes encoding trypsin-like enzymes from Aedes aegypti larvae and identification of digestive enzymes. Gene 489(2), 70–75 (2011).
Brackney, D. E., Foy, B. D. & Olson, K. E. The effects of midgut serine proteases on dengue virus type 2 infectivity of Aedes aegypti. Am. J. Trop. Med. Hyg. 79(2), 267 (2008).
Li, X.-Y., Si, F.-L., Zhang, X.-X., Zhang, Y.-J. & Chen, B. Characteristics of Trypsin genes and their roles in insecticide resistance based on omics and functional analyses in the malaria vector Anopheles sinensis. Pest. Biochem. Physiol. 201, 105883 (2024).
Ji, Y., Lu, T., Zou, Z. & Wang, Y. Aedes aegypti CLIPB9 activates prophenoloxidase-3 in the presence of CLIPA14 after fungal infection. Front. Immunol. 13, 927322 (2022).
Wang, H. C., Wang, Q. H., Bhowmick, B., Li, Y. X. & Han, Q. Functional characterization of two clip domain serine proteases in innate immune responses of Aedes aegypti. Parasit. Vect. 14(1), 1–13 (2021).
Shakeel, M., Xu, X., De Mandal, S. & Jin, F. Role of serine protease inhibitors in insect-host-pathogen interactions. Arch. Insect Biochem. Physiol. 102(3), e21556 (2019).
Gulley, M. M., Zhang, X. & Michel, K. The roles of serpins in mosquito immunology and physiology. J. Insect Physiol. 59(2), 138–147 (2013).
Meekins, D. A., Kanost, M. R. & Michel, K. Serpins in arthropod biology. Sem. Cell Dev. Biol. 62, 105–119 (2017).
Ligoxygakis, P. et al. A serpin mutant links Toll activation to melanization in the host defence of Drosophila. EMBO J. 21, 6330 (2002).
Jong, Z. W., Kassim, N. F. A., Naziri, M. A. & Webb, C. E. The effect of inbreeding and larval feeding regime on immature development of Aedes albopictus. J. Vect. Ecol. 42(1), 105–112 (2017).
Koenraadt, C. J., Kormaksson, M. & Harrington, L. C. Effects of inbreeding and genetic modification on Aedes aegypti larval competition and adult energy reserves. Parasit. Vect. 3, 1–11 (2010).
Lo, C. C. & Chain, P. S. Rapid evaluation and quality control of next generation sequencing data with FaQCs. BMC Bioinf. 15(1), 1–8 (2014).
Chen, S., Zhou, Y., Chen, Y. & Gu, J. fastp: an ultra-fast all-in-one FASTQ preprocessor. Bioinformatics 34(17), i884–i890 (2018).
Li, H. & Durbin, R. Fast and accurate long-read alignment with Burrows-Wheeler transform. Bioinformatics 26(5), 589–595 (2010).
Picard. http://broadinstitute.github.io/picard/. Accessed 30 Nov 2015
Li, H. et al. The sequence alignment/map format and SAMtools. Bioinformatics 25(16), 2078–2079 (2009).
Acknowledgements
The work was initiated when U.S.S. and A.D. were working at the then Centre for Research in Medical Entomology (now redesignated as a field unit of the Vector Control Research Centre, Puducherry), Madurai, Tamil Nadu, India, during 2016–2017 in collaboration with V.R. and M.M. We thank the then Director General, Indian Council of Medical Research, for intramural funding for genome sequencing and for encouragement. We thank Genotypic Pvt. Ltd, Bangalore, India, for sequencing services and technical assistance in this project, and Dr. Rajesh Gazra for his help with data analysis.
Funding
ICMR intramural funds were used for genome sequencing in this study.
Author information
Contributions
A.D. and U.S.S. conceptualized the study. M.M., V.R., U.S.S. and P.A. performed laboratory experiments. P.A. performed data analysis and generated all the figures and tables. P.A. and U.S.S. wrote the manuscript. A.D., B.N., M.M., V.R. and U.S.S. revised and approved the manuscript.
Ethics declarations
Competing interests
The authors declare no competing interests.
Additional information
Publisher's note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
The original online version of this Article was revised: The original version of this Article contained an error in the Data availability section, “Genome sequence data generated in this manuscript is submitted in GenBank database under SRA accession numbers SRR28209348 and SRR28209347.” now reads: “Genome sequence data generated in this manuscript is submitted in NCBI database under SRA accession numbers SRR28209348 and SRR28209347.”
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License, which permits any non-commercial use, sharing, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if you modified the licensed material. You do not have permission under this licence to share adapted material derived from this article or parts of it. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by-nc-nd/4.0/.
About this article
Cite this article
Acharya, P., Singh, U.S., Rajamannar, V. et al. Genome resequencing and genome-wide polymorphisms in mosquito vectors Aedes aegypti and Aedes albopictus from south India. Sci Rep 14, 22931 (2024). https://doi.org/10.1038/s41598-024-71484-2
DOI: https://doi.org/10.1038/s41598-024-71484-2