Introduction

In the Indian subcontinent, sand flies belong to two genera, Phlebotomus and Sergentomyia, with more than 70 species identified1,2. Of these, only one species, Phlebotomus (Euphlebotomus) argentipes (Annandale and Brunetti, 1908) sensu lato, is incriminated as a main vector of the Leishmania donovani complex (Kinetoplastida: Trypanosomatidae) and is responsible for the spread of anthroponotic visceral leishmaniasis (VL), also known as “kala-azar”, in Nepal, India and Bangladesh3,4,5,6. This same vector-parasite duo is implicated in transmitting cutaneous leishmaniasis (CL) in the Western Ghats of India7 and in Sri Lanka8. Other phlebotomine species, Ph. (Larroussius) major (Annandale, 1910) s.l. and Ph. (Adlerius) longiductus (Parrot, 1928), were recently identified as suspected vectors of the L. donovani complex, causing CL in northwestern India bordering the far western region of Nepal9. It is noteworthy that CL has been emerging from the same region of Nepal in recent years10,11, and an in-depth investigation into epidemiology, serology, and entomology to assess local transmission is currently underway. Other vector species transmitting Leishmania parasites in the Indian subcontinent are Ph. (Phlebotomus) papatasi (Scopoli, 1786), Ph. (Phlebotomus) salehi (Mesghali, 1965) and Ph. (Paraphlebotomus) sergenti (Parrot, 1917). The former two species transmit L. major, the causative agent of zoonotic cutaneous leishmaniasis (ZCL), and the latter transmits L. tropica causing ZCL in arid parts of Northwest India12,13,14. Given the diversity of vectors and parasites in the region, we aimed to collect up-to-date information on the distribution of phlebotomine sand flies, with a focus on Leishmania vectors, across a wide range of climatic and ecological settings in Nepal, regardless of their endemicity status.

In the context of the Indian subcontinent, VL has been slated for elimination as a public health problem by lowering the disease incidence to less than one case in 10,000 at the district level in Nepal, and at the sub-district level in India and Bangladesh by 2026 (most likely to be extended to 2030)15,16. Until 2019, VL was endemic in 18 out of 77 districts in Nepal, where local transmission occurred with reports of autochthonous cases; with records of asymptomatic infection in humans, and with the presence of a competent vector population17. In the same year, 53 non-endemic districts were labeled as endemicity doubtful districts due to the presence of VL cases but without evidence of local transmission18,19,20. The trend in the geographical expansion of VL cases has been observed from eastern to western parts and from lower (< 600 m asl) to higher altitudes (> 2000 m asl), particularly in areas that otherwise were considered ecologically unfavorable for the survival of the known vector species, Ph. argentipes21. Currently, VL cases have not been reported from only five of the 77 districts22,23. Further, the co-existence of VL along with CL at higher altitudes (> 1000 m asl) in hilly and mountainous areas poses an additional threat to the national VL elimination program, as the vector and parasite species present in the local human population remain unexplored11,24. Hence, integrated surveillance (disease, parasite and vectors) to monitor the circulating vectors and parasites in broader areas encompassing various ecological regions is deemed essential for planning and implementing tailored interventions with disease and vector control measures.

Focusing on vector surveillance, which integrates the process of collection, identification and reporting of sand fly species of public health importance, is key to the prospective entomological research aiming at controlling leishmaniasis25. Conventional approaches for the species identification of the phlebotomine sand flies are labor-intensive and time-consuming. They are based on minute morphological and anatomical characteristics that require skilled taxonomists26. Additionally, species complexes and phenotypic plasticity complicate morphological identifications. To overcome these difficulties, integrative taxonomic approaches, including morphology as well as the use of molecular tools such as DNA barcoding, are promising for species identification27,28. To this end, mitochondrial DNA genes display interesting features, especially the cytochrome c oxidase subunit I (COI) gene, which is extensively used in delineating sand fly species complexes worldwide29 as well as in Southeast Asia30,31,32,33,34,35,36.

In this study, we updated the geo-ecological distribution of phlebotomine sand fly species in Nepal, including the known vector of L. donovani, potential vectors and non-vector species, to provide baseline information for the national VL elimination program. We also assessed the use of the DNA barcoding method as a complementary tool for sand fly species identification and evaluated the genetic variation within and among the Phlebotomus species collected across Nepal.

Results

Distribution and diversity of sand flies in surveyed districts

Based on ecological region classification, the 43 districts included in this study comprised 14 in lowlands, 22 in hills and 7 in mountainous regions (Fig. 1). Altitudes of the collection sites ranged from 70 to 308 m in lowlands, 364–1680 m in hills and 1182–2960 m in mountainous districts.

Fig. 1

Map of Nepal with sampling locations of sand flies and their ecological regions. VL endemicity status at the district level based on 2016–2019 data17. The map was produced with QGIS (version 3.36.3) with open access shapefile downloaded from https://opendatanepal.com/dataset/new-political-and-administrative-boundaries-shapefile-of-nepal#.

A total of 8,132 sand flies were collected from all the surveyed districts. The known vector Ph. argentipes was recorded from all except three districts (Fig. 2a). These three districts were located in the mountainous region. Phlebotomus argentipes represented 45.18% of the total collection, followed by Ph. major s.l. (10.85%) and Ph. (Adlerius) spp. (9.49%). Other species were Ph. papatasi and Sergentomyia spp. from lowlands and hills (Fig. 2b, Supplementary Excel file S1). Ecological regions imposed a significant effect on sand fly abundance. Abundance of all sand fly species and Ph. argentipes per district were lower in hills (IRR = 0.49, CI at 95% = 0.24–0.94 and IRR = 0.22, CI at 95% = 0.10–0.45, respectively) and mountains (IRR = 0.47, CI at 95% = 0.20–1.26 and IRR = 0.06, CI at 95% = 0.02–0.18, respectively) as compared to lowlands. Results also indicate the higher abundance of Ph. (Adlerius) spp. and Ph. major s.l. in the mountains (IRR = 4.76, CI at 95% = 1.45–20.94 and IRR = 1.88, CI at 95% = 0.80–5.09, respectively) as compared to hills. There were negligible collections of these two species from the lowlands (Fig. 2b).

Fig. 2

Diversity and distribution of sand flies in Nepal; (a) in 43 surveyed districts and (b) in three ecological regions of the country based on the cross-sectional entomological survey conducted from 2017 to 2022. The map was produced with QGIS (version 3.36.3) with open access shapefile downloaded from https://opendatanepal.com/dataset/new-political-and-administrative-boundaries-shapefile-of-nepal#.

DNA-based species identifications

PCR and Sanger sequencing were successful for all 316 sand flies except one. We succeeded in the identification of two genera (Phlebotomus and Sergentomyia), seven subgenera (Euphlebotomus, Phlebotomus, Larroussius, Adlerius, Parrotomyia, Neophlebotomus, Sergentomyia) and six species (Ph. argentipes, Ph. papatasi, Se. babu, Se. iyengari, Se. punjabensis, Se. bailyi) based on morphology and the query results of the generated COI sequences against the available sequences in the BOLD and GenBank reference databases. However, 11 specimens of Ph. major s.l., a morphologically confirmed species, and six specimens of subgenus Adlerius were poorly matched (91.24% − 92.49%) with the available reference sequences in the BOLD/GenBank online databases. Species-level identification of one Phlebotomus and four Sergentomyia specimens remained inconclusive both by morphology and sequence analysis. Overall, 96.5% (304/315) of the sand fly specimens were successfully identified based on morphological characteristics, while 93% (293/315) had more than 96% pairwise identity match with the reference sequences available in the open-access databases (Table 1).

Table 1 District-wise sand fly species identification based on morphological characteristics and COI sequences similarity with reference to BOLD and GenBank databases.

Among the generated sequences, the majority (84.4%; 266/315) were obtained from female sand flies, of which 207 were from Ph. argentipes, the primary vector species of L. donovani (Table 1).

Genetic diversity estimates

In the 315 successfully generated sequences of phlebotomine sand flies (both Phlebotomus and Sergentomyia), 101 haplotypes were described, with a haplotype diversity of 0.933 ± 0.008, a nucleotide diversity (Pi) of 0.078 ± 0.006, an average number of nucleotide differences of 49.99, and 228 parsimony-informative sites (137 with two variants, 74 with three variants and 17 with four variants). Haplotype and nucleotide diversities of Ph. argentipes, Ph. papatasi, Ph. (Adlerius) sp., Ph. major s.l., Se. babu, Se. iyengari, Se. punjabensis, Se. bailyi and Se. (Un3) sp. were 0.88 and 0.004, 0.98 and 0.007, 0.73 and 0.008, 0.93 and 0.007, 0.82 and 0.010, 1.00 and 0.024, 1.00 and 0.014, 1.00 and 0.015, and 1.00 and 0.003, respectively. Within the genus Phlebotomus, the average number of nucleotide differences (k) was highest in Ph. (Adlerius) sp. (k = 5.13), followed by Ph. papatasi (k = 4.78), Ph. major s.l. (k = 4.40) and Ph. argentipes (k = 2.78). Within the genus Sergentomyia, k was highest in Se. iyengari (k = 15.60), followed by Se. bailyi (k = 9.33), Se. punjabensis (k = 9.07), Se. babu (k = 6.44) and Se. (Un3) sp. (k = 2). The genetic diversity of individual phlebotomine species at each collection site (district) is shown in Supplementary Table S2.
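The diversity indices reported above can be illustrated with a minimal Python sketch. It assumes Nei's standard definitions (haplotype diversity Hd = n/(n − 1) × (1 − Σp²), k as the mean number of pairwise differences, and Pi as k divided by alignment length) and a gap-free alignment; the study itself used DnaSP, whose handling of gaps and missing data may differ. The toy sequences and function names are hypothetical and are not data from this study.

```python
from collections import Counter
from itertools import combinations


def haplotype_diversity(sequences):
    """Nei's haplotype diversity: Hd = n/(n-1) * (1 - sum(p_i^2))."""
    n = len(sequences)
    if n < 2:
        raise ValueError("need at least two sequences")
    counts = Counter(sequences)  # identical strings = same haplotype
    sum_sq = sum((c / n) ** 2 for c in counts.values())
    return n / (n - 1) * (1 - sum_sq)


def mean_pairwise_differences(sequences):
    """Average number of nucleotide differences (k) over all sequence pairs."""
    pairs = list(combinations(sequences, 2))
    total = sum(sum(a != b for a, b in zip(s1, s2)) for s1, s2 in pairs)
    return total / len(pairs)


def nucleotide_diversity(sequences):
    """Pi = k divided by the alignment length (gap-free alignment assumed)."""
    return mean_pairwise_differences(sequences) / len(sequences[0])


# Toy alignment (hypothetical, 8 bp): three haplotypes among four sequences.
toy = ["ACGTACGT", "ACGTACGT", "ACGTACGA", "ACGAACGA"]
hd = haplotype_diversity(toy)
k = mean_pairwise_differences(toy)
pi = nucleotide_diversity(toy)
print(f"Hd = {hd:.3f}, k = {k:.3f}, Pi = {pi:.4f}")
```

Under these definitions the reported values are mutually consistent: for Ph. argentipes, k = 2.78 over the 639 bp alignment gives Pi ≈ 0.0044, close to the reported 0.0043.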

Species identification efficiency and barcoding gap analysis

The species identification success rate of generated COI barcode sequences (n = 315) based on the “Best Close Match” was 97% (305/315). The overall mean genetic distance within our database of generated sequences was 8.89% ± 0.66% and the maximum pairwise K2P distance was 24.25% ± 2.18% (Supplementary Excel file S2). The mean intraspecific K2P distance within our database of generated sequences ranged from 0.31% ± 0.22% in Se. (Un3) sp. to 2.51% ± 0.42% in Se. iyengari (Table 2). All the species in this study showed a relatively large genetic variation (deep intraspecific divergence; mean K2P distance > 0.25), while interspecific divergence ranged from 12.23% ± 1.39% (Se. babu and Se. punjabensis) to 23.45% ± 2.06% (Ph. argentipes and Se. (Un1) sp.) (Supplementary Excel file S3).
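The logic behind a "Best Close Match" success rate can be sketched as follows. This is a simplified Python illustration, not the code of the R package "spider" used in the study: the distance matrix, the species labels and the 1% threshold are hypothetical, and the four outcome categories ("correct", "incorrect", "ambiguous", "no id") follow the usual description of the method.

```python
def best_close_match(dist, species, threshold=0.01):
    """Classify each sequence as 'correct', 'incorrect', 'ambiguous' or 'no id'.

    dist      -- symmetric matrix (list of lists) of pairwise distances
    species   -- species label of each sequence, in the same order as dist
    threshold -- largest distance accepted as a match (assumed value)
    """
    results = []
    for i, row in enumerate(dist):
        # Compare the query with every other sequence in the library.
        others = [(d, species[j]) for j, d in enumerate(row) if j != i]
        best = min(d for d, _ in others)
        if best > threshold:
            results.append("no id")
            continue
        matched = {sp for d, sp in others if d == best}
        if matched == {species[i]}:
            results.append("correct")
        elif species[i] in matched:
            results.append("ambiguous")
        else:
            results.append("incorrect")
    return results


# Hypothetical library: two species with two sequences each, plus a singleton.
labels = ["A", "A", "B", "B", "C"]
dist = [
    [0.000, 0.004, 0.150, 0.160, 0.200],
    [0.004, 0.000, 0.150, 0.160, 0.200],
    [0.150, 0.150, 0.000, 0.008, 0.180],
    [0.160, 0.160, 0.008, 0.000, 0.180],
    [0.200, 0.200, 0.180, 0.180, 0.000],
]
outcome = best_close_match(dist, labels)
success_rate = outcome.count("correct") / len(outcome)
print(outcome, success_rate)
```

In this toy case the singleton has no conspecific within the threshold and is scored "no id", which is one way a success rate below 100% can arise even when no sequence is misassigned.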

Table 2 List of sand fly species barcoded in this study and retrieved from the open-access repositories (Old World phlebotomine vectors), including their pairwise intra- and nearest interspecies distances computed using the Kimura-2-parameter (K2P) model.

Excluding sequences of Sergentomyia but including Old World vector phlebotomine sand flies sequences mined from online repositories to the database (n = 1,400), the overall average divergence was 14.90% ± 1.05% and the maximum pairwise K2P distance was 24.95% ± 2.36% (Supplementary Excel file S4). Mean intraspecific genetic divergence was the lowest in Ph. transcaucasicus (0.08% ± 0.07%) and the highest in Ph. major (5.90% ± 0.67%) (Table 2). The lowest value of pairwise mean genetic divergence was assessed to identify the nearest neighbor among the species: Ph. perfiliewi and Ph. transcaucasicus were the closest species with a mean interspecific K2P distance of 2.26% ± 0.36%; the nearest species to Ph. argentipes was the unidentified Phlebotomus sp. with a mean interspecific K2P distance of 16.51% ± 1.71% (Table 2, Supplementary Excel file S5). The pairwise inter- and intraspecific divergence is presented as a boxplot (Fig. 3).

Fig. 3

Boxplot displaying inter- and intraspecific genetic distances of 1,400 COI sequences (BOLD, GenBank and generated), including 27 vector species of Old World phlebotomine sand flies (Phlebotomus). The calculation was based on the K2P nucleotide substitution model, using the package Spider and the software R. Thick horizontal lines inside the boxes represent the median and vertical dashed lines show the range. The boxes themselves represent the upper and lower quartiles. Outliers are displayed as open circles.

Neighbor-Joining tree

The Neighbor-Joining (NJ) tree shows distinct branching of the two genera and 12 species of sand flies collected in Nepal (Fig. 4). The nodes clustering sequences from individuals of the same species were supported by high bootstrap values (99–100%). Based on the NJ tree, the unspecified Phlebotomus species clustered closely with Ph. argentipes, supported by an 89% bootstrap value.

Fig. 4

Bootstrapped Neighbor-Joining (NJ) tree including 101 haplotypes (639 bp) representing 12 taxa of phlebotomine sand flies collected in Nepal (1,000 replications; Kimura two-parameter distances).

Additionally, most haplotypes from particular species or species complexes clustered together on the NJ tree constructed based on the database of all Phlebotomus vector species reported from Old World countries. Exceptions are sequences of Ph. major s.l. from Nepal, Jordan and Turkey, forming three supported clusters. Sequences of Ph. bergeroti and Ph. papatasi are divided into two highly supported clusters (Fig. 5, Supplementary PDF file S1).

Fig. 5

Bootstrapped Neighbor-Joining (NJ) tree including 616 haplotypes (639 bp) representing 36 taxa of genus Phlebotomus and one outgroup of genus Aedes (1,000 replications; Kimura two-parameter distances).

Haplotype and nucleotide diversities, and distribution of Ph. argentipes

Based on the 231 generated sequences of Ph. argentipes from six districts in Nepal, we found 36 haplotypes, with a haplotype diversity of 0.876 ± 0.013, a nucleotide diversity of 0.0043 ± 0.0002, and an average number of nucleotide differences of 2.78 (Supplementary Table S2). The dataset included 34 variable sites (11 singletons and 23 parsimony informative sites) (Supplementary Fig. S1). Among these haplotypes, H_2 was the most frequent (n = 62), followed by H_6 (n = 38) and H_4 (n = 30) (Supplementary Table S3). Similarly, the district-wise distribution of these haplotypes is shown in Supplementary Table S4.

Median-joining analysis

Of 422 sequences of Ph. argentipes (231 generated and 191 mined from BOLD and GenBank), 82 haplotypes were identified and included in the median-joining analysis. Haplotype H_2 was the most frequent one (n = 179), occurring in the Morang, Sunsari and Saptari districts of Nepal, Bihar, West Bengal, Kerala and Pondicherry regions in India and Delft island in Sri Lanka. Haplotype H_6 was the second most frequent (n = 38) and occurred in three districts of Nepal. Haplotype H_4 (n = 32) was present in five districts in Nepal, and Bihar and Kerala states in India. Less frequent haplotypes in terminal nodes represented more recently derived ones (Fig. 6). Locations in Nepal, India and Sri Lanka where COI sequences of Ph. argentipes were generated, or available from previous work (BOLD, GenBank) are shown in Supplementary Fig. S2.

Fig. 6

Median-joining network of Ph. argentipes (n = 422) from Nepal, India, Sri Lanka and Israel (i.e., for which sequences were available from online repositories) showing genetic relationships among COI haplotypes. The sizes of circles are proportional to haplotype frequency and vertical lines are proportional to the number of nucleotide substitutions separating the connected haplotypes. Circles were colored according to the geographical origin of the barcoded specimens (i.e., district and country, as displayed in the figure legend).

Discussion

The current study documents the distribution of the primary competent vector of L. donovani, Ph. argentipes, in most of the surveyed districts with reported VL cases in Nepal. Other potential vectors, Ph. (Adlerius) spp. and Ph. major s.l., were abundant in the Himalayan foothills (hills and mountainous regions). The DNA barcoding method successfully allocated sand flies to seven morphologically validated species, while five taxa were identified up to the genus level. Our results provide strong evidence supporting DNA barcoding as a complementary method for identifying Ph. argentipes, the major vector species of L. donovani. This is critically important for vector surveillance and control efforts aimed at sustaining VL elimination in Nepal.

COI sequences of Ph. argentipes, Ph. papatasi, Se. babu, Se. iyengari, Se. punjabensis and Se. bailyi had high pairwise identity (97–100%) with available COI sequences of the respective species in online DNA repositories. Specimens of Phlebotomus major s.l. collected from Nepal matched the available sequences poorly, although they were morphologically confirmed. Such variation might result from genetic differentiation linked, for example, to isolation by distance, leading to the formation of distinct geographical clusters37,38. The other five taxa also matched the available sequences poorly, which illustrates a limitation of the online databases: they hold reference sequences for only 20% of the described sand fly species worldwide39. Morphological identification of these taxa was also unsuccessful, possibly because of poorly mounted specimens, poorly established keys, or complicated anatomical characteristics that were difficult to interpret.

In the Neighbor-Joining tree, haplotypes of Ph. argentipes as well as other phlebotomine species were grouped with high bootstrap values32,36,40. Congeneric as well as conspecific clustering in the NJ tree supported the morphological identifications. Likewise, conspecific sequences clustered at the species level with high bootstrap values on the NJ tree that included all phlebotomine vector species from Old World countries. Some supported intraspecific branching was also found to be geographically structured by region or country of collection. For example, Ph. kandelakii is divided into two supported clusters (98%), one from Turkey and the other from India and Azerbaijan. Three clusters were also observed in the widely distributed Ph. major s.l. species sampled in Nepal, Jordan and Turkey (root bootstrap value 79%)41,42,43,44. In addition, Ph. perfiliewi and Ph. transcaucasicus clustered together, supporting their species complex status45,46. A similar situation was observed between Ph. martini and Ph. celiae47. Additional investigations might be required to determine whether some sequences mined from online repositories were initially misidentified (Fig. 5, Supplementary PDF file S1).

Nucleotide diversity of the Ph. argentipes population was very low in Nepal (Pi = 0.004), in contrast to the high genetic variation reported from Sri Lanka (Pi = 0.427); however, haplotype diversity was equivalent in both countries (Hd = 0.88)32. The median-joining network supported a demographic expansion of Ph. argentipes populations, with a star-like topology of haplotypes identified in Nepal and other countries. The most frequent haplotype, H_2, was reported from southern India, Sri Lanka and eastern Nepal. Some unique haplotypes were reported from hilly districts in the central and western parts of Nepal (Palpa and Surkhet). Sri Lanka did not share most of its prevalent haplotypes with India and Nepal (except three collected from Delft Island: H_40, H_49 and H_50), indicating a genetically distinct or isolated Ph. argentipes population48. Haplotype diversities of other important phlebotomine sand flies recorded from Nepal, i.e., Ph. (Adlerius) sp., Ph. major s.l. and Ph. papatasi, were also high (0.73 to 0.98), reflecting populations sampled from varying ecological regions49.

Based on the intra- and interspecific K2P distances and the NJ tree, the species of interest were successfully delimited to their predicted groups, leaving a few taxa unidentified. However, relying on a single mtDNA marker for molecular identification is not ideal; analyzing multiple markers is therefore recommended for delimiting closely related species39.

In Nepal, the phlebotomine fauna is poorly explored in terms of diversity and distribution, owing to the complexity of morphological identification. About 14 species of phlebotomine sand flies (eight species of Phlebotomus and six species of Sergentomyia) were reported from Nepal up to 200014,50, and no further investigation of sand fly diversity and distribution has been conducted since then. Over the last three decades, most investigations on sand flies have focused on Ph. argentipes because of its significance in L. donovani transmission. In recent years, visceral as well as cutaneous forms of leishmaniasis have spread to wide geo-ecological regions (hills and mountains), even to areas that were once considered unsuitable for transmission of the disease. Using morphology and DNA-based techniques (though on a small scale), we confirmed the occurrence of Ph. argentipes in most of the surveyed districts. Correct identification of vector species has significant implications for tailored vector control interventions. This study validated the use of DNA barcoding for the successful identification of vector sand flies, especially females. The large number of sequences obtained from female Ph. argentipes was part of the validation of morphologically identified specimens subjected to Leishmania infection and blood meal analysis, as described elsewhere51. Thus, the method proves useful for epidemiological and vector surveillance activities, especially in situations where taxonomic expertise is unavailable.

The presence of potential vectors from the Adlerius group (though species-level identification, both morphological and molecular, is still pending) and of Ph. major s.l. in areas above 1,000 m asl with rocky terrain and warm humid conditions confirms previous records from Nepal and its neighboring countries14,21,44,52. These competent vectors in high-altitude areas likely play a role in Leishmania transmission, as has been recorded from similar geo-ecological regions in the bordering Indian state of Uttarakhand9,53.

Most of the other poorly identified taxa were represented by single specimens. We therefore suggest collecting more sand flies from the areas where these specimens were found and applying an integrated taxonomic approach for species-level identification. The current study described phlebotomine sand fly diversity and distribution based on investigations primarily targeting the collection of the known vector. A nationwide general survey of sand flies across diverse habitats and representative geo-ecological regions along defined altitudinal gradients is needed to obtain a more precise picture of phlebotomine sand fly diversity and distribution.

Importantly, the cost of the DNA barcoding method remains a limiting factor for its large-scale application. It is recommended to select 5–10% of the sand fly samples collected during vector surveillance activities for molecular identification to accurately determine the sand fly species.

Conclusion

The study demonstrated the presence of the incriminated vector in most of the areas with active VL cases, accompanied by other potential vectors at high altitudes. This finding underscores the need for systematic entomological surveillance to sustain disease elimination. The DNA barcoding method was highly successful in identifying major vector species and can be used in epidemiological investigations and surveillance, for example, in situations where sand fly taxonomy experts are not available. The sequences generated during this study enrich the public reference DNA barcode databases. The current investigation calls for further studies on sand fly biodiversity in Nepal, at both the morphological and molecular levels.

Methods

Sand fly collection and morphological identification

Sand flies were collected from 43 districts (15 endemic and 28 endemicity doubtful) with reported cases of VL, as part of epidemiological and entomological assessments coordinated through the National VL elimination program between 2017 and 2022. The entomological collections were part of a number of activities, including cross-sectional transmission assessment, longitudinal surveillance for seasonality studies and insecticide resistance monitoring in Ph. argentipes. The geo-ecological settings in the surveyed districts varied from lowlands (“Terai”) with a tropical savannah climate to high hills and mountains experiencing a temperate climate with dry winters and hot or warm summers54. Lowlands in Nepal (67–300 m asl) are primarily a fertile Gangetic plain area, rich in agricultural lands, dense vegetation and water bodies. Hilly districts encompass a wide range of geography (> 300–2,500 m asl), including undulating terrain, deep valleys with scattered agricultural terraced lands and dense subtropical and coniferous forests. Mountainous districts are situated in the inner and high Himalayan region, with altitudes ranging from roughly 2,500 m to more than 8,000 m asl, and are characterized by a steep, rugged landscape with high peaks, deep valleys with scattered dwarf shrubs and alpine meadows55 (Fig. 1, Supplementary Table S1). In each surveyed district, we selected two or more villages, depending on the objectives of the individual studies. Sand flies were collected from households with or without cohabiting cattle, using Centers for Disease Control and Prevention (CDC) light traps installed inside the dwellings. In addition, we manually searched for and aspirated resting sand flies from inside corners of rooms, cracks and crevices of walls of houses and cattle sheds, cattle tying poles and around the cattle feeding troughs. All sand flies were preserved in 80% ethanol and transferred to the entomology laboratory at B.P. Koirala Institute of Health Sciences (BPKIHS, Dharan, Nepal) for further laboratory processing. These sand flies were identified based on the morphological and anatomical characteristics of male and female genitalia, pharyngeal teeth, cibarium and antennal segments with the help of the regional sand fly species identification keys14,56, a stereoscope and a light microscope.

Descriptive analysis of sand fly distribution and diversity

Sand flies collected during the cross-sectional survey in each of the 43 districts were analyzed for distribution and diversity. The diversity of sand fly species was represented proportionately in pie charts overlaid on the map of Nepal. We fitted generalized linear models (GLMs) with a negative binomial distribution to the entomological data to assess the association between sand fly abundance and explanatory variables such as ecological region. This model was chosen because the sand fly abundance data were over-dispersed, i.e., the variance was larger than the mean, and were not normally distributed. The calculation was done using the function ‘glm.nb’ from the R package “MASS”57. Results of the analysis are reported as incidence rate ratios (IRR) with 95% confidence intervals (CI).
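Two steps of this analysis lend themselves to a short illustration: checking for over-dispersion, and converting a coefficient from a log-link count model into an IRR. The Python sketch below is an assumption-laden simplification. It uses a Wald interval (exp(β ± 1.96 × SE)), whereas intervals from a fitted ‘glm.nb’ model are often profile-likelihood based and need not be symmetric on the log scale. The counts, coefficient and standard error are hypothetical and are not estimates from this study.

```python
import math
import statistics


def is_overdispersed(counts):
    """True when the sample variance exceeds the mean (a Poisson model would fit poorly)."""
    return statistics.variance(counts) > statistics.mean(counts)


def irr_with_ci(beta, se, z=1.96):
    """Convert a log-link coefficient and its standard error to (IRR, lower, upper).

    Uses a Wald interval; this is an approximation, not the profile-likelihood
    interval that some software reports.
    """
    return math.exp(beta), math.exp(beta - z * se), math.exp(beta + z * se)


# Hypothetical sand fly counts per trap night.
counts = [0, 2, 1, 15, 40, 3]
print("over-dispersed:", is_overdispersed(counts))

# Hypothetical coefficient for 'hills' relative to 'lowlands' (the reference level).
irr, lower, upper = irr_with_ci(beta=math.log(0.5), se=0.3)
print(f"IRR = {irr:.2f} (95% CI {lower:.2f} to {upper:.2f})")
```

An IRR below 1 whose interval excludes 1 indicates lower abundance than in the reference region; an interval that spans 1 is compatible with no difference.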

DNA extraction and PCR amplification

DNA extraction and PCR amplification were performed in 2019. By then, sand flies had been collected from only 12 districts across the country (2017–2019). Among these, we selected sand flies from eight districts (five endemic and three endemicity doubtful) that represented a wide range of ecological regions of the country (East, West, and also variation in topography) (Fig. 1) for molecular analysis. Genomic DNA was extracted from 316 sand fly specimens using the DNeasy® Blood and Tissue Kit (QIAGEN, Hilden, Germany), following the manufacturer’s protocol. A fragment of the mitochondrial cytochrome c oxidase subunit I (COI) gene (658 bp) was amplified with the universal primer pair LCO1490 (5′-GGTCAACAAATCATAAAGATATTGG-3′) and HCO2198 (5′-TAAACTTCAGGGTGACCAAAAAATCA-3′)58. Each amplification was performed in a volume of 25 µl containing 12.5 µl GoTaq Green master mix with 2 mM MgCl2 (Promega, USA), 0.4 µM of each of the primers (Biolegio, The Netherlands), 6.7 µl of PCR grade water (Himedia, India) and 5 µl of DNA template. The PCR profile was as follows: initial denaturation at 95 °C for 2 min, followed by 40 cycles of denaturation at 92 °C for 30 s, annealing at 50 °C for 45 s and extension at 72 °C for 60 s, and a final extension at 72 °C for 10 min. The PCR products, together with positive and negative controls, were loaded on a 2% agarose gel, stained with ethidium bromide, and examined under a gel documentation system (BIORAD and SYNGENE). Positive amplicons were outsourced to BaseClear (The Netherlands) and Macrogen (South Korea) for purification and sequencing.

Sequence editing and data analysis

All generated sequences were checked and edited to resolve ambiguities in BioEdit Sequence Alignment Editor 7.0.5.359. Each sequence was checked for stop codons, and primer residues were trimmed off. The edited sequences were saved as FASTA files and queried using the BOLD identification engine (www.boldsystems.org; Species Level Barcode Records option) and BLAST for GenBank (https://blast.ncbi.nlm.nih.gov; program option optimized for megablast). Multiple sequence alignment (ClustalW) and computation of nucleotide composition were performed in MEGA v.760. The number of haplotypes, polymorphic sites and nucleotide diversities were analyzed in DnaSP v.6 software61. The “Best Close Match” approach of identifying species based on DNA barcoding distances was used to estimate the relative frequency of identification success and was calculated in R using the function “bestCloseMatch” available in the package “spider”62. Under this approach, a query is considered correctly identified when all sequences with the smallest genetic distance to it are conspecific and that distance falls within the threshold covering 95% of all intraspecific distances63. Mean genetic distance estimates, pairwise sequence divergence and tree construction60,62, including all generated sequences from the present investigation (n = 315), were based on the Kimura-2-parameter (K2P) nucleotide substitution model, with 1,000 bootstrap replications. A neighbor-joining tree was constructed from the identified haplotypes (n = 101), including both genera, Phlebotomus and Sergentomyia.
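For readers unfamiliar with the K2P model, the distance between two aligned sequences is d = −½ ln[(1 − 2P − Q) √(1 − 2Q)], where P and Q are the proportions of sites differing by a transition and by a transversion, respectively. The Python sketch below implements this textbook formula. It is not the MEGA or "spider" code used in the study, and it assumes pairwise deletion of sites containing gaps or ambiguous bases; the example sequences are hypothetical.

```python
import math

PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def k2p_distance(seq1, seq2):
    """Kimura two-parameter distance between two aligned sequences.

    Sites where either sequence has a non-ACGT character (gap, N) are skipped.
    math.log raises ValueError if the sequences are too divergent for the model.
    """
    if len(seq1) != len(seq2):
        raise ValueError("sequences must be aligned to the same length")
    sites = transitions = transversions = 0
    for a, b in zip(seq1.upper(), seq2.upper()):
        if a not in "ACGT" or b not in "ACGT":
            continue
        sites += 1
        if a == b:
            continue
        # A<->G and C<->T are transitions; every other change is a transversion.
        if {a, b} <= PURINES or {a, b} <= PYRIMIDINES:
            transitions += 1
        else:
            transversions += 1
    if sites == 0:
        raise ValueError("no comparable sites")
    p = transitions / sites
    q = transversions / sites
    return -0.5 * math.log((1 - 2 * p - q) * math.sqrt(1 - 2 * q))


# Hypothetical 20 bp fragments differing by one transition (A>G) and one transversion (C>A).
ref = "ACGTACGTACGTACGTACGT"
alt = "GAGTACGTACGTACGTACGT"
d = k2p_distance(ref, alt)
print(f"K2P distance = {d:.4f}")
```

The K2P distance is always at least as large as the raw proportion of differing sites (0.10 in this toy pair), because it corrects for multiple substitutions at the same site.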

For further analyses, available COI sequences of Phlebotomus species – known vectors of Leishmania from Old World countries12,64 were mined from the public databases of BOLD and GenBank. Sequences less than 600 bp in length, species with singleton sequences and ambiguous sequences were discarded from the list. Generated Phlebotomus sequences (n = 276) and sequences from BOLD and GenBank that passed exclusion criteria (n = 1,124) were subsequently aligned (ClustalW) and trimmed to retain the overlapping standard barcode region. The mean genetic distance (K2P) between non-conspecific and conspecific sequences was computed in MEGA and R with the package “spider” using the function “sppDistMatrix”62. The barcoding gap assessment was based on the mean intraspecific and the minimum interspecific distances (the nearest neighbor)65. A haplotype Neighbor-Joining tree was further constructed, including phlebotomine vector species (BOLD/GenBank: 545 haplotypes, generated dataset: 70 haplotypes) and one outgroup taxon (Aedes aegypti; LC489421) (K2P; 1,000 bootstrap replications)66,67.
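The barcoding gap assessment described above can be sketched in Python as follows. For each species, the sketch computes the mean intraspecific distance and finds the nearest neighbor, taken here as the other species with the lowest mean interspecific distance. The distance matrix and species labels are hypothetical, the function name is illustrative, and the study's own calculation used MEGA and the R package "spider".

```python
from collections import defaultdict
from itertools import combinations


def barcoding_gap(dist, species):
    """Return {species: (mean_intra, nearest_neighbour, mean_inter_to_nearest)}.

    Species represented by a single sequence have no intraspecific distance
    and are therefore not reported (they can still be a nearest neighbour).
    """
    within = defaultdict(list)
    between = defaultdict(list)
    for i, j in combinations(range(len(species)), 2):
        a, b = species[i], species[j]
        if a == b:
            within[a].append(dist[i][j])
        else:
            between[frozenset((a, b))].append(dist[i][j])
    summary = {}
    for sp in sorted(within):
        mean_intra = sum(within[sp]) / len(within[sp])
        # Mean distance from this species to every other species.
        candidates = [
            (sum(v) / len(v), next(iter(pair - {sp})))
            for pair, v in between.items()
            if sp in pair
        ]
        inter, neighbour = min(candidates)
        summary[sp] = (mean_intra, neighbour, inter)
    return summary


# Hypothetical K2P distances for three species with two sequences each.
labels = ["A", "A", "B", "B", "C", "C"]
dist = [
    [0.00, 0.01, 0.05, 0.05, 0.20, 0.20],
    [0.01, 0.00, 0.05, 0.05, 0.20, 0.20],
    [0.05, 0.05, 0.00, 0.02, 0.15, 0.15],
    [0.05, 0.05, 0.02, 0.00, 0.15, 0.15],
    [0.20, 0.20, 0.15, 0.15, 0.00, 0.01],
    [0.20, 0.20, 0.15, 0.15, 0.01, 0.00],
]
gaps = barcoding_gap(dist, labels)
for sp, (intra, neighbour, inter) in gaps.items():
    print(f"{sp}: intra = {intra:.3f}, nearest = {neighbour}, inter = {inter:.3f}")
```

A species shows a barcoding gap when its distance to the nearest neighbor clearly exceeds its intraspecific distance; closely related pairs, such as members of a species complex, are where the gap tends to narrow.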

Finally, a median-joining network was built for Ph. argentipes, including the sequences generated in this study (n = 231) and available sequences from all other countries (n = 191), using the median-joining algorithm in NETWORK v.10.2 (fluxus-engineering.com)68.