Abstract
Anopheles mosquitoes are known as dominant vectors of malaria parasites and other viral and bacterial pathogens. Deciphering Anopheles genomes has opened a new era of research and allowed in-depth investigation of several molecular mechanisms involved in pathogen transmission. However, there is a lack of appropriate model systems for studying specific vector-pathogen interactions at the molecular level. Even inbred mosquito colonies represent genetically heterogeneous populations and require special care facilities. Insect cell cultures could serve as a useful alternative because they are easy to handle and maintain, especially for genomic studies requiring millions of cells per experiment. Here, we provide chromatin contact maps and a dataset of epigenetic characteristics, including histone mark profiles and RNA-seq data, as well as transfection conditions and promoter activity for the mosquito cell line MSQ43.
Background & Summary
The rapid, uncontrolled spread of insects drives the simultaneous expansion of human and animal diseases worldwide. According to Bradshaw et al., invasive insects cost a minimum of US$70.0 billion per year globally, while associated health costs exceed US$6.9 billion per year1. Mosquitoes, in particular, cause enormous damage to human health worldwide. The list of diseases transmitted by mosquitoes includes malaria, dengue, West Nile virus, chikungunya, yellow fever, filariasis, tularemia, dirofilariasis, Japanese encephalitis, Saint Louis encephalitis, Western and Eastern equine encephalitis, Venezuelan equine encephalitis, Ross River fever, Barmah Forest fever, La Crosse encephalitis, O’nyong-nyong virus (ONNV), and Zika virus, as well as the newly detected Keystone virus and Rift Valley fever2. In total, over one million people die from mosquito-borne diseases every year. Anopheles mosquitoes are responsible for the global spread of malaria in human populations, and An. stephensi is known as a primary vector of malaria in South Asia, capable of transmitting both P. falciparum and P. vivax parasites3 (Fig. 1a). Originating from Southeast Asia and the Arabian Peninsula, it is now a pervasive mosquito with an extensive geographical range, occupying territories from northern Thailand through the Indian subcontinent up to the Himalayan region, and from the Persian Gulf area across the Middle East towards southern China4. More recently, An. stephensi has been reported in the Lakshadweep Islands (2001)5, in the Horn of Africa (Djibouti, 2012)6, in Sri Lanka (2016–2017)7, in Ethiopia (2016–2018)8 and, most recently, in the Republic of Sudan (2019), reflecting its rapid geographical expansion from Asia towards the African continent. The invasion of An. stephensi into North Africa, where over 40% of sub-Saharan Africans live in urban environments, is even more worrying given its ability to survive in both rural and urban environments and its resistance to several classes of insecticides9.
MSQ43 cell line and karyotyping. (a) An. stephensi blood-feeding female (left panel; adult mosquito photo credit: CDC/James Gathany) and An. stephensi eggs (right panel); (b) adherent MSQ43 cell culture; (c) diploid karyotype showing chromosomes X, Y, 2 and 3; (d) the percentage of tetraploid cells within the population varies between 3% and 7%; (e) conventional PCR for genes located on the Y and X chromosomes and on chromosome arm 2R; PCR product sizes are indicated at the bottom.
Transmission potential and vectorial capacity vary among species of the genus Anopheles10,11. The biological reasons for these differences remain poorly understood and require further investigation. Mosquito immune responses to parasite invasion, as well as the molecular mechanisms activated during infection and involved in mosquito–Plasmodium interactions, remain challenging questions. Recent studies have identified several genomic loci that are probably involved in parasite transmission and the mosquito immune response12,13,14,15,16,17,18. Comparison of midgut chromatin between infected and non-infected mosquitoes revealed substantial differences in transcriptional and epigenetic profiles17,19. As molecular technologies improve, they create new opportunities to address key biological questions. Since functional genomic techniques often require tens of millions of cells per experiment, cell cultures remain the most practical approach. Immortalized insect cell lines were introduced in the 1960s and have generally been used for recombinant protein expression; more recently, they have emerged as a powerful tool for biomanufacturing in human medicine and veterinary science (reviewed in20). Additionally, dipteran cell cultures are reliable tools for studying vector-parasite interactions21 as well as insecticide efficiency and toxicity22. To date, 24 Anopheles cell lines are available and listed in the ExPASy Cellosaurus database23 (http://web.expasy.org/cellosaurus), including six cell lines established from An. stephensi. MSQ43 was originally established from first-instar larvae of An. stephensi as an adherent, monolayer, immortalized cell culture (Fig. 1b)24. Laboratory strains of An. stephensi can be categorized into three ecological variants25, and MSQ43 was derived from the variety mysorensis24.
Because well-annotated biological models are in high demand, especially for mosquitoes given their close molecular connection with the transmission of human diseases, the MSQ43 cell line could serve as an appropriate model system for mosquito genomic and epigenetic studies.
Here, we comprehensively characterized the epigenetic and chromatin profiles of MSQ43 and described its transcriptional patterns based on RNA-seq data. Three-dimensional (3D) chromatin interactions were analysed using the Hi-C approach. We produced 78 million unique reads, containing 64 million Hi-C contacts. ChIP-seq experiments, performed in two replicates, covered five histone marks: active (H3K27Ac, H3K4me3) and repressive (H2AK119Ubiq, H3K27me3, H3K9me3). On average, 7–15 million uniquely aligned reads were generated for each ChIP-seq library. To complement the epigenetic profiles, we generated an RNA-seq dataset and compared it with embryonic data from a recent study26. Finally, we optimized transfection conditions and assessed the activity of several mammalian and Drosophila promoters frequently used in research laboratories, specifying conditions for exogenous gene delivery into MSQ43 cells.
Methods
Cell culture
Adherent monolayer MSQ43 cells (Fig. 1b) were maintained at 28 °C in 5% CO2 at approximately 60–80% confluence in MEM (Minimum Essential Medium) with Earle's salts (Sigma Aldrich, #0275) containing 5% heat-inactivated fetal bovine serum (Sigma Aldrich, #F9665), 0.1% D-glucose (Sigma Aldrich, #G7021), 1 mM NEAA (Sigma Aldrich, #M7145), 2 mM L-glutamine (Thermo Fisher Scientific, #35050038), 1x vitamin solution (Thermo Fisher Scientific, #11120052), 2.2 g/L sodium bicarbonate, and 1% penicillin/streptomycin (Sigma Aldrich, #P4333). TrypLE Express (Thermo Fisher Scientific, #12604013) was used for harvesting and passaging. Cells were frozen in 10% DMSO in KSR (Thermo Fisher Scientific, #10828028).
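Preparing the complete medium from concentrated stocks is simple C1V1 = C2V2 arithmetic. The sketch below computes additive volumes for a chosen batch size. The stock concentrations (200 mM L-glutamine, 100x vitamins, 100x penicillin/streptomycin counted as 100%, and a hypothetical 10% w/v D-glucose stock) are assumptions for illustration and should be checked against the labels of the reagents actually used; NEAA and sodium bicarbonate are left out because their stock formats are not stated here.

```python
"""Volumes of additives for MSQ43 complete medium (illustrative sketch).

Stock concentrations below are ASSUMPTIONS, not values from the protocol.
Each entry is (name, stock concentration, final concentration) in matching units.
"""

ADDITIVES = [
    ("heat-inactivated FBS", 100.0, 5.0),     # % v/v: undiluted serum -> 5%
    ("L-glutamine", 200.0, 2.0),              # mM: assumed 200 mM stock -> 2 mM
    ("vitamin solution", 100.0, 1.0),         # x: assumed 100x stock -> 1x
    ("penicillin/streptomycin", 100.0, 1.0),  # %: 100x stock counted as 100% -> 1%
    ("D-glucose", 10.0, 0.1),                 # % w/v: hypothetical 10% stock -> 0.1%
]


def additive_volume_ml(final_volume_ml: float, stock: float, final: float) -> float:
    """Solve C1 * V1 = C2 * V2 for V1, the volume of stock to add."""
    if not 0 < final <= stock:
        raise ValueError("final concentration must be positive and not exceed the stock")
    return final_volume_ml * final / stock


def medium_recipe(final_volume_ml: float, additives=ADDITIVES) -> dict:
    """Return the volume (ml) of each additive plus the basal MEM needed to top up."""
    volumes = {name: additive_volume_ml(final_volume_ml, stock, final)
               for name, stock, final in additives}
    volumes["basal MEM"] = final_volume_ml - sum(volumes.values())
    return volumes


if __name__ == "__main__":
    for component, ml in medium_recipe(500.0).items():
        print(f"{component:>25}: {ml:6.1f} ml")
```

Under these assumed stocks, a 500 ml batch needs 25 ml of serum, 5 ml of each other listed additive and 455 ml of basal MEM.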
Transfection
Transfection was performed with Lipofectamine 3000 on MSQ43 cells and HEK293T cells (a human cell line used as a control) according to the manufacturer's recommendations. To test promoter activity, we used circular Addgene plasmids encoding red and green fluorescent proteins: 40259 and 25917 (to test pCMV), 80945 and 80946 (pCAG), 129717 and 129719 (pEF1a), 60511 (pRPBSA) and 112686 (pUbiquitin-63E), as well as a linear construct obtained from Addgene plasmid 25917 after digestion with the restriction enzymes NheI and NotI (pSFFV). Cells were collected 24 h after transfection for analysis by FACS (BD FACS Aria III) and fluorescence microscopy (ZOE Fluorescent Cell Imaging System).
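Transfection efficiency from FACS data is commonly summarized as the percentage of fluorescent cells above a gate set on untransfected cells. The sketch below assumes that per-cell fluorescence intensities have already been exported as arrays and places the gate at the 99.5th percentile of the negative control; both the export step and the percentile are assumptions, not the gating strategy reported for MSQ43.

```python
import numpy as np


def percent_positive(control, sample, quantile: float = 0.995):
    """Percentage of sample cells brighter than a gate set on untransfected cells.

    control, sample: 1D arrays of per-cell fluorescence (e.g. GFP channel, assumed export).
    quantile: fraction of control cells that fall below the gate (assumed 99.5%).
    Returns (percent_positive, gate_threshold).
    """
    control = np.asarray(control, dtype=float)
    sample = np.asarray(sample, dtype=float)
    if control.size == 0 or sample.size == 0:
        raise ValueError("control and sample must contain at least one event")
    threshold = float(np.quantile(control, quantile))
    pct = 100.0 * float(np.mean(sample > threshold))
    return pct, threshold


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    untransfected = rng.lognormal(mean=2.0, sigma=0.5, size=10_000)
    # simulated transfected well: 30% of cells carry a bright reporter
    transfected = np.concatenate([
        rng.lognormal(2.0, 0.5, 7_000),
        rng.lognormal(6.0, 0.5, 3_000),
    ])
    pct, gate = percent_positive(untransfected, transfected)
    print(f"gate = {gate:.1f}, positive = {pct:.1f}%")
```

Applying the same function to HEK293T cells transfected in parallel gives a positive-control reference for each plasmid.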
Karyotyping and chromosome number analysis
MSQ43 was established in 1969 from first-instar larval tissues of An. stephensi, but its karyotype had not been described. We performed karyotype analysis according to a standard cytogenetic protocol27. Metaphase spreads for karyotyping were prepared using the air-drying method. In brief, cells were treated with colcemid at a final concentration of 0.05 mg/ml for 1.5 h, exposed to a hypotonic solution, and fixed in methanol/glacial acetic acid (3:1). Chromosomes were counted after DAPI staining. In total, more than 400 metaphase spreads were examined to determine the modal chromosome number.
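Determining the modal number from the counted spreads, together with the fraction of tetraploid spreads shown in Fig. 1d, reduces to a frequency count. In the sketch below the per-spread counts are hypothetical, and a spread is called tetraploid only when it has exactly twice the diploid number, which is a simplifying assumption.

```python
from collections import Counter


def karyotype_summary(counts, diploid_n: int = 6):
    """Return (modal chromosome number, % tetraploid spreads) from per-spread counts.

    counts: chromosome number observed in each metaphase spread.
    A spread is counted as tetraploid if it has exactly 2 * diploid_n chromosomes
    (a simplifying assumption; near-tetraploid spreads are not included).
    """
    if not counts:
        raise ValueError("no metaphase spreads supplied")
    modal_n, _ = Counter(counts).most_common(1)[0]
    tetraploid = sum(1 for c in counts if c == 2 * diploid_n)
    return modal_n, 100.0 * tetraploid / len(counts)


if __name__ == "__main__":
    # hypothetical counts for 405 spreads, for illustration only
    spreads = [6] * 380 + [12] * 20 + [5] * 5
    mode, pct_4n = karyotype_summary(spreads)
    print(f"modal number = {mode}, tetraploid spreads = {pct_4n:.1f}%")
```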
We determined the diploid chromosome number as 2n = 6, with characteristics of a male genotype (Fig. 1c–e). Conventional PCR for the genes ASTEI00671, ASTEI00780, ASTEI01072, ASTEI07753 and Guy-1, located on chromosome arm 2R and on the X and Y chromosomes, confirmed this assignment (Fig. 1e; primer sequences are listed in Table S1). Analysis of sequencing read coverage showed that Chr X accumulated almost the same number of reads per 100 kb as the autosomal arms (Table S2). Because a single X chromosome in a male (XY) cell would be expected to yield roughly half the autosomal coverage, this finding suggests that Chr X was duplicated and, most likely, that two homologous X chromosomes merged to form a single X (Fig. 1c). Interestingly, in MSQ43 cells DAPI-positive pericentric heterochromatin occupies a substantially smaller portion of chromosome X than of chromosome Y. This X-chromosome organization contrasts with the previously published karyotype of An. stephensi mosquitoes from an Indian strain28, in which both X and Y chromosomes appear as acrocentric chromosomes carrying equally large blocks of DAPI-positive pericentric heterochromatin.
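The coverage argument can be made explicit: normalize read counts to reads per 100 kb for each chromosome arm and divide the X value by the mean autosomal value. For one X copy per diploid genome the ratio should be close to 0.5, whereas a ratio near 1, as observed for MSQ43 (Table S2), points to two X copies. The arm names, lengths and read counts in this sketch are hypothetical placeholders, not values from Table S2.

```python
def reads_per_100kb(read_counts: dict, lengths_bp: dict) -> dict:
    """Normalize raw read counts to reads per 100 kb for each chromosome or arm."""
    return {name: read_counts[name] / (lengths_bp[name] / 100_000) for name in read_counts}


def x_to_autosome_ratio(read_counts: dict, lengths_bp: dict,
                        x_name: str = "X", exclude=("Y",)) -> float:
    """Ratio of X coverage density to the mean autosomal coverage density.

    ~0.5 is expected for a single X copy per diploid genome; ~1.0 suggests two copies.
    Sequences listed in `exclude` (e.g. Y or unplaced scaffolds) are ignored.
    """
    density = reads_per_100kb(read_counts, lengths_bp)
    autosomal = [v for k, v in density.items() if k != x_name and k not in exclude]
    if not autosomal:
        raise ValueError("no autosomal arms supplied")
    return density[x_name] / (sum(autosomal) / len(autosomal))


if __name__ == "__main__":
    # hypothetical lengths (bp) and read counts, for illustration only
    lengths = {"X": 20_000_000, "2R": 40_000_000, "3L": 30_000_000}
    counts = {"X": 100_000, "2R": 200_000, "3L": 150_000}
    print(round(x_to_autosome_ratio(counts, lengths), 2))
```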
In situ Hi-C for MSQ43
Hi-C was performed in technical duplicates using 5–10 million cells per experiment, as described previously29. In brief, cells were harvested with TrypLE Express (Thermo Fisher Scientific, #12604013), dissociated into a single-cell suspension, washed twice in PBS, and fixed in 2% formaldehyde (Sigma Aldrich, #8775) in PBS for 15 minutes. Cells were then pelleted by centrifugation (500–900 g, 5 min), quenched with 2 M glycine solution, washed twice with PBS, flash-frozen, and stored at −80 °C until use. Lysis was performed for 60 minutes in total: 30 minutes on ice followed by 30 minutes at RT with overhead mixing. Cells were washed with lysis buffer and NEBuffer 3.1 and pelleted by centrifugation (900–1100 g, 7 min, 4 °C). Chromatin was opened with 0.3% SDS in NEBuffer 3.1. SDS was quenched by adding Triton X-100 to a final concentration of 1.8% and incubating at 37 °C for 30 minutes. The reaction buffer was replaced with fresh NEBuffer 3.1 containing 1.8% Triton X-100, and 400 U of DpnII (New England Biolabs, R0543M) was added for overnight incubation. The next day, the reaction was supplemented with an additional 200 U of DpnII to improve digestion efficiency. The overhangs were filled in and labeled with Biotin-14-dATP (Thermo Fisher Scientific, #19524016), and DNA fragments in close spatial proximity in the nucleus were ligated with 100 U of T4 DNA ligase (local suppliers) in an overnight reaction. DNA was extracted by the phenol/chloroform method and sonicated with a Covaris M220 instrument, and Hi-C libraries were prepared for Illumina sequencing using the Kapa Hyper Prep Kit (Roche, #KK8504) and the KAPA Single-Indexed Adapter Set A (Roche, #KK8701). Biotin pull-down was performed with Dynabeads (Thermo Fisher Scientific, #65601) after the adapter ligation step. The final PCR was performed with 6–8 amplification cycles. Libraries were sequenced on DNBSEQ sequencing systems (BGI, China).
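Because DpnII cuts at ^GATC, the theoretical restriction fragments that Hi-C processing uses to assign read ends can be derived directly from the genome sequence. The sketch below lists fragment coordinates for one sequence; it illustrates the principle and is not the restriction-site file format expected by Juicer. GATC is palindromic, so scanning one strand is sufficient.

```python
import re


def dpnii_fragments(seq: str):
    """Return 0-based, half-open (start, end) fragments produced by cutting at ^GATC.

    DpnII cleaves before the G of GATC; the site is palindromic, so one strand suffices.
    """
    seq = seq.upper()
    cuts = [m.start() for m in re.finditer("GATC", seq)]
    bounds = [0] + cuts + [len(seq)]
    # zero-length pieces arise when the sequence begins with GATC; drop them
    return [(start, end) for start, end in zip(bounds, bounds[1:]) if end > start]


if __name__ == "__main__":
    print(dpnii_fragments("AAGATCTTGATCCC"))  # [(0, 2), (2, 8), (8, 14)]
```

After fill-in and blunt-end ligation, two DpnII ends form the junction GATCGATC, which is the sequence used to recognize ligation products within chimeric reads.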
Processing of Hi-C experiments
Raw sequencing reads were mapped to the AsteI4 reference genome and then filtered and processed using the Juicer pipeline, modified as described previously30. Valid Hi-C read pairs were collected from two biological replicates, merged after confirming high concordance between them, and filtered for mapping quality ≥ 30. Complete Hi-C library statistics are provided in the Supplementary materials (Table S4).
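The MAPQ ≥ 30 filter and the usual cis/trans breakdown of valid pairs can be sketched as follows. The `Pair` record is a hypothetical, simplified stand-in for one line of a pairs file (it is not Juicer's merged_nodups layout), and the 20 kb cut-off for long-range cis contacts is an assumed threshold.

```python
from dataclasses import dataclass


@dataclass
class Pair:
    """Hypothetical minimal record for one valid Hi-C read pair."""
    chr1: str
    pos1: int
    chr2: str
    pos2: int
    mapq1: int
    mapq2: int


def filter_and_summarize(pairs, min_mapq: int = 30, long_range_bp: int = 20_000) -> dict:
    """Keep pairs whose BOTH ends have MAPQ >= min_mapq and count contact classes."""
    kept = [p for p in pairs if min(p.mapq1, p.mapq2) >= min_mapq]
    cis = [p for p in kept if p.chr1 == p.chr2]
    return {
        "kept": len(kept),
        "cis": len(cis),
        "trans": len(kept) - len(cis),
        "cis_long_range": sum(1 for p in cis if abs(p.pos2 - p.pos1) >= long_range_bp),
    }


if __name__ == "__main__":
    demo = [
        Pair("2R", 100, "2R", 50_000, 40, 35),  # kept, cis, long range
        Pair("2R", 100, "2R", 5_000, 60, 60),   # kept, cis, short range
        Pair("2R", 100, "X", 100, 30, 31),      # kept, trans
        Pair("3L", 100, "3L", 90_000, 10, 60),  # dropped: one end below MAPQ 30
    ]
    print(filter_and_summarize(demo))
```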
Immunocytochemistry
Cells were grown to approximately 60% confluence and fixed in 3% paraformaldehyde for 30 minutes at RT. Cell membranes were permeabilized with 0.1–0.2% Triton X-100 for 5 minutes. Nonspecific protein interactions were blocked with 5% BSA in PBS for 30 minutes at RT. Primary antibodies in dilution buffer (1% BSA, 0.1% Tween, PBS) were added and incubated overnight at 4 °C. The next day, after three washes in 0.2% Tween/PBS, secondary antibodies in dilution buffer were added and incubated for 2 h at RT. After three washes in 0.2% Tween/PBS, nuclei were stained with DAPI. Specific signal was visualized with an LSM 780 NLO confocal microscope (Zeiss).
Chromatin immunoprecipitation (ChIP-seq)
20–30 million MSQ43 cells were fixed in 1% paraformaldehyde for 15 minutes at RT with overhead mixing. After quenching with glycine, cells were lysed for 20 minutes in lysis buffer (10 mM Tris-HCl, 1 mM EDTA, 1% Triton X-100, 0.1% sodium deoxycholate, protease inhibitors) containing 0.5% SDS. Chromatin was sonicated in the same SDS-containing buffer using a Bandelin Sonopuls instrument at 75% power for 8 cycles of 30 s ON/90 s OFF. The total chromatin concentration was determined by Qubit, and 20–40 μg were used per ChIP. Before incubation with specific antibodies, chromatin was diluted with lysis buffer to 0.1–0.2% SDS and pre-cleared by incubation with Protein A magnetic beads (New England Biolabs, S1425S) for 2 h at 4 °C with slow rotation. During this time, another aliquot of Protein A magnetic beads was washed in PBS, combined with 5 μg of target antibody, and incubated for 2 h at 4 °C with slow rotation. The pre-clearing beads were then removed, and the pre-cleared chromatin was incubated with the antibody/magnetic bead complexes overnight at 4 °C with slow rotation. The next day, the beads were thoroughly washed in a series of buffers (Buffer 1: 10 mM Tris-HCl, 1 mM EDTA, 1% Triton X-100, 0.1% SDS, 0.1% sodium deoxycholate, protease inhibitors; Buffer 2: 500 mM NaCl, 10 mM Tris-HCl, 1 mM EDTA, 1% Triton X-100, 0.1% SDS, 0.1% sodium deoxycholate, protease inhibitors; Buffer 3: 0.25 M LiCl, 10 mM Tris-HCl, 1 mM EDTA, 0.5% NP-40, 0.5% sodium deoxycholate; Buffer TE/Triton: 10 mM Tris-HCl, 1 mM EDTA, 1% Triton X-100; TE buffer: 10 mM Tris-HCl, 1 mM EDTA). Cross-links were reversed and DNA was eluted in 100 μL elution buffer (10 mM Tris-HCl, 1 mM EDTA, 1% SDS) by incubation at 65 °C for 14 hours. After treatment with RNase A (New England Biolabs, T3018) and Proteinase K (New England Biolabs, P8107S), the magnetic beads were removed. DNA was purified using ChIP DNA Clean & Concentrator columns (Zymo Research, D5205).
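Two small calculations recur in this protocol: how much sonicated chromatin to take for 20–40 µg per ChIP given the Qubit reading, and how much SDS-free lysis buffer to add to bring 0.5% SDS down to 0.1–0.2%. The sketch assumes that the diluent contains no SDS (the lysis buffer as listed has none; SDS is added only for lysis and sonication) and uses a default target of 0.15%, the midpoint of the stated range; the example Qubit reading is hypothetical.

```python
def chromatin_volume_ul(conc_ng_per_ul: float, target_ug: float) -> float:
    """Volume of sonicated chromatin containing target_ug of DNA (Qubit reading in ng/ul)."""
    if conc_ng_per_ul <= 0:
        raise ValueError("concentration must be positive")
    return target_ug * 1_000 / conc_ng_per_ul


def sds_free_diluent_ul(sample_ul: float, sds_start_pct: float = 0.5,
                        sds_target_pct: float = 0.15) -> float:
    """SDS-free buffer to add so that C1 * V1 = C2 * (V1 + V_add)."""
    if not 0 < sds_target_pct <= sds_start_pct:
        raise ValueError("target SDS must be positive and not exceed the starting SDS")
    return sample_ul * (sds_start_pct / sds_target_pct - 1.0)


if __name__ == "__main__":
    vol = chromatin_volume_ul(conc_ng_per_ul=500.0, target_ug=30.0)  # hypothetical reading
    add = sds_free_diluent_ul(vol, sds_target_pct=0.1)
    print(f"take {vol:.0f} ul chromatin, add {add:.0f} ul SDS-free lysis buffer")
```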
ChIP-seq libraries were prepared using the Kapa Hyper Prep Kit (Roche, #KK8504) and the KAPA Single-Indexed Adapter Set A (Roche, #KK8701) and sequenced on the DNBSEQ platform.
Processing of ChIP-seq data
The quality of raw reads was assessed using FastQC31 and MultiQC32, and Illumina adapters were removed with Cutadapt33. Reads were then processed (trimming, alignment, peak calling, filtering, and quality control) with the ENCODE ChIP-seq AQUAS pipeline34, using AsteI4 as the reference genome. Consistency between replicates was assessed by computing RPKM values with the makeTagDirectory and analyzeRepeats.pl programs from the HOMER software35. The RPKM values were compared using Spearman's correlation coefficient calculated with the Python library scipy.stats, and hexbin plots were drawn with the matplotlib library.
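For reference, RPKM divides the reads assigned to a region by its length in kilobases and by the total number of mapped reads in millions. The function below implements that textbook definition on hypothetical counts; HOMER computes its values internally, and details such as which reads enter the library total may differ from this sketch.

```python
import numpy as np


def rpkm(counts, lengths_bp, total_mapped_reads: float) -> np.ndarray:
    """RPKM = reads / (region length in kb) / (total mapped reads in millions).

    counts: reads per gene body; lengths_bp: gene-body lengths in base pairs.
    """
    counts = np.asarray(counts, dtype=float)
    lengths_kb = np.asarray(lengths_bp, dtype=float) / 1e3
    if np.any(lengths_kb <= 0) or total_mapped_reads <= 0:
        raise ValueError("lengths and library size must be positive")
    return counts / lengths_kb / (total_mapped_reads / 1e6)


if __name__ == "__main__":
    print(rpkm([100, 50], [1_000, 2_000], total_mapped_reads=1_000_000))  # [100.  25.]
```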
RNA-seq
Cells were grown to approximately 60% confluence, washed twice with 1x PBS, harvested with TrypLE Express (Thermo Fisher Scientific, #12604013), counted, and transferred to 1.5 ml tubes. Total RNA was extracted using the Aurum™ Total RNA Mini Kit (Bio-Rad, #7326820): 1–2 million cells were lysed, and RNA was purified on a silica membrane in spin-column format following the spin protocol, including a 25-minute on-column DNase digestion. RNA concentration was measured with a Qubit 4 fluorometer using the Qubit RNA HS Assay Kit (Thermo Fisher Scientific, #Q32852), and samples were sent for RNA sequencing to the BGI facility (Hong Kong).
Processing of RNA-seq data
Raw reads were processed on usegalaxy.org35 using FastQC31, MultiQC32 and Trimmomatic36 for quality control and trimming. Trimmed reads were aligned to the AsteI4 reference genome using HISAT237. Transcripts were assembled with StringTie38 and merged with StringTie merge. Normalized FPKM values were generated with StringTie using the GTF file produced by StringTie merge. Replicate consistency and PCA were analysed in Python. RNA-seq data were compared between replicates and between tissues using FPKM values.
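The Methods state only that PCA was done in Python. One common way to obtain explained-variance percentages such as those in Fig. 3c is a singular value decomposition of log-transformed, gene-centered expression values. The sketch below assumes a genes × samples FPKM matrix and a log2(FPKM + 1) transform; both are assumptions rather than documented choices, and the demonstration data are simulated.

```python
import numpy as np


def pca_explained_variance(fpkm, pseudocount: float = 1.0):
    """PCA of samples from a genes x samples FPKM matrix.

    Returns (sample scores, explained variance ratio per component).
    """
    x = np.log2(np.asarray(fpkm, dtype=float) + pseudocount).T  # samples x genes
    x = x - x.mean(axis=0)                                       # center each gene
    u, s, _vt = np.linalg.svd(x, full_matrices=False)
    variance = s ** 2
    return u * s, variance / variance.sum()


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    base = rng.uniform(1, 100, size=500)            # simulated expression levels
    fold = rng.uniform(0.2, 5.0, size=500)          # simulated cell-type differences
    group_a = [base * rng.normal(1, 0.02, 500) for _ in range(3)]
    group_b = [base * fold * rng.normal(1, 0.02, 500) for _ in range(3)]
    matrix = np.column_stack(group_a + group_b)     # 500 genes x 6 samples
    scores, ratio = pca_explained_variance(matrix)
    print("PC1 %.1f%%, PC2 %.1f%%" % (100 * ratio[0], 100 * ratio[1]))
```

Computing the decomposition directly with NumPy keeps the sketch dependency-light; a library PCA class would give the same explained-variance ratios on the same centered input.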
Technical Validation
ChIP-seq data validation and quality control
We characterized the MSQ43 epigenome by generating chromatin immunoprecipitation–sequencing (ChIP-seq) maps for a comprehensive set of histone modifications: active (H3K27Ac), promoter-associated (H3K4me3), PRC1-mediated (H2AK119Ubiq) and PRC2-mediated (H3K27me3), as well as H3K9me3, an epigenetic mark of constitutive heterochromatin (Fig. 2).
ChIP-seq characteristics. (a–e) Antibody validation by immunostaining; (f–j) row-linked heatmaps showing the RPKM-normalized number of reads across a 10 kb genomic interval in 20 bp bins relative to the TSS. Heatmaps were generated from merged biological ChIP-seq replicates for all selected antibodies. Each row represents a single gene; black and white indicate high and low read densities, respectively; (k–o) metaplots showing the ChIP-seq signal distribution across an 8 kb genomic interval in 20 bp bins relative to the TSS position in A and B compartments (red and blue, respectively).
First, we verified that the selected antibodies are compatible with mosquito cells. The antibodies used in the study are listed in Table S3. Immunostaining experiments, performed using the protocol described in the Methods section, confirmed that all antibodies were suitable for chromatin immunoprecipitation (Fig. 2a–e). We then performed immunoprecipitation experiments following the protocol described in the Methods. The quality of the raw sequencing datasets was assessed with FastQC and MultiQC. The average base quality score for all ChIP-seq samples was in the range of 30–50 (green zone). On average, 40 million reads with a read length of ~150 bp were generated for each ChIP-seq sample. Raw reads were aligned to AsteI4 using HISAT2, with mapping rates ranging from 41% to 81%.
Consistency between replicates was assessed using HOMER RPKM-normalized read counts in gene bodies. Spearman's correlation coefficients for each pair of replicates were R ≈ 0.9 for all marks except H3K27me3, for which the correlation between replicates was slightly lower (R ≈ 0.8).
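The replicate comparison described in the Methods (Spearman's correlation with scipy.stats and hexbin plots with matplotlib) can be sketched as below. The inputs are assumed to be per-gene RPKM values for two replicates in the same gene order. The log2(RPKM + 1) transform is applied only for display, since Spearman's coefficient is rank-based and unaffected by monotonic transforms; the plot parameters are arbitrary choices.

```python
import numpy as np
from scipy.stats import spearmanr


def replicate_spearman(rep1, rep2) -> float:
    """Spearman's rank correlation between per-gene RPKM values of two replicates."""
    rho, _pvalue = spearmanr(rep1, rep2)
    return float(rho)


def hexbin_plot(rep1, rep2, out_png: str, label: str = "") -> None:
    """Save a log-scaled hexbin scatter of log2(RPKM + 1) for two replicates."""
    from matplotlib.figure import Figure  # imported lazily; no GUI backend needed

    x = np.log2(np.asarray(rep1, dtype=float) + 1)
    y = np.log2(np.asarray(rep2, dtype=float) + 1)
    fig = Figure(figsize=(4, 4))
    ax = fig.subplots()
    hb = ax.hexbin(x, y, gridsize=60, bins="log", mincnt=1)
    fig.colorbar(hb, ax=ax, label="genes per hexagon")
    ax.set_xlabel("replicate 1, log2(RPKM + 1)")
    ax.set_ylabel("replicate 2, log2(RPKM + 1)")
    ax.set_title(f"{label} rho = {replicate_spearman(rep1, rep2):.2f}")
    fig.savefig(out_png, dpi=150, bbox_inches="tight")
```

In practice the two arrays would come from the HOMER RPKM tables of a replicate pair, matched by gene identifier before the comparison.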
To study the relationship between the distribution of chromatin marks and gene activity, we built row-linked heatmaps showing the RPKM-normalized number of reads across a 10 kb genomic interval in 20 bp bins relative to the TSS, separately for A and B compartments (Fig. 2f–o). Genes in B compartments clearly display the epigenetic features of repressive histone marks: H3K9me3, H3K27me3 and H2AK119Ubiq signals were broadly enriched near TSSs in B compartments, whereas H3K27Ac and H3K4me3 were abundant near TSSs located in A compartments.
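The heatmap and metaplot construction (±5 kb around each TSS in 20 bp bins, split by compartment) can be sketched with NumPy. The sketch assumes a per-base, already RPKM-normalized signal array for one chromosome, TSS coordinates with strand, and an A/B label per gene from the Hi-C compartment analysis; the plots in Fig. 2 were produced with the authors' own tools, which may bin or normalize differently.

```python
import numpy as np


def tss_matrix(signal, tss_positions, strands, flank_bp: int = 5_000, bin_bp: int = 20):
    """Genes x bins matrix of mean signal in bins spanning TSS +/- flank_bp.

    Minus-strand windows are reversed so that every row runs upstream -> downstream.
    TSSs whose window would run off the chromosome are skipped; only the indices
    of retained TSSs are returned in `kept`.
    """
    if (2 * flank_bp) % bin_bp:
        raise ValueError("window length must be a multiple of the bin size")
    n_bins = 2 * flank_bp // bin_bp
    signal = np.asarray(signal, dtype=float)
    rows, kept = [], []
    for i, (tss, strand) in enumerate(zip(tss_positions, strands)):
        start, end = tss - flank_bp, tss + flank_bp
        if start < 0 or end > signal.size:
            continue
        binned = signal[start:end].reshape(n_bins, bin_bp).mean(axis=1)
        rows.append(binned[::-1] if strand == "-" else binned)
        kept.append(i)
    matrix = np.vstack(rows) if rows else np.empty((0, n_bins))
    return matrix, kept


def metaplot_by_compartment(matrix, compartments):
    """Average profile per compartment label (e.g. 'A', 'B') for the retained genes."""
    labels = np.asarray(compartments)
    return {str(c): matrix[labels == c].mean(axis=0) for c in np.unique(labels)}


if __name__ == "__main__":
    chrom = np.zeros(30_000)
    chrom[10_000:10_020] = 5.0                      # toy peak just downstream of a TSS
    mat, kept = tss_matrix(chrom, [10_000, 20_000, 100], ["+", "-", "+"])
    labels = np.array(["A", "B", "A"])[kept]        # compartment labels for kept genes
    profiles = metaplot_by_compartment(mat, labels)
    print(mat.shape, kept, {k: v.max() for k, v in profiles.items()})
```

Reusing the same `kept` gene order for every histone mark is what keeps the rows of the separate heatmaps linked.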
RNA-seq profiling and quality control
We generated gene expression profiles for the MSQ43 cell line in three replicates (Fig. 3). The average base quality score for all RNA-seq samples was in the range of 30–50 (green zone). On average, 40 million reads with a read length of 100–150 bp were generated per RNA-seq sample. Raw RNA-seq reads were aligned to AsteI4 using HISAT2 with a high mapping rate of about 80%.
RNA-seq characteristics. (a,b) Scatter plots showing a Spearman correlation coefficient above 0.975 for MSQ43 replicates and of 0.593 for the MSQ43/embryo pair. (c) Principal component analysis (PCA) plot, in which PC1 and PC2 explain 95.6% and 2.6% of the variability in gene expression, respectively. (d) Dendrogram showing clustering of RNA-seq replicates. (e) Heatmap showing clustering of RNA-seq replicates from the MSQ43 cell line and embryos. All data are based on RPKM values.
The reproducibility of biological replicates was analysed in several ways. Spearman correlation analysis yielded a coefficient above 0.975 for each pair of replicates (Fig. 3a shows a representative pair), indicating high reliability of the RNA-seq data. To provide a baseline for the between-replicate correlation and to assess the specificity of the RNA-seq results, we compared the MSQ43 RNA-seq data with the previously reported transcriptome of An. stephensi embryos. In this comparison, the Spearman correlation coefficient was lower (0.593), reflecting the difference between the embryo and MSQ43 transcriptomes (Fig. 3b). As an additional test of replicate concordance, PCA placed the three MSQ43 replicates at one pole of the plot and the embryo replicates at the other (Fig. 3c). The first principal component (PC1) explained 95.6% and the second (PC2) 2.4% of the variance between MSQ43 and embryonic data. Consistent with these results, hierarchical clustering of MSQ43 and embryonic RNA-seq data grouped the samples by cell type (Fig. 3d,e). Overall, these results indicate the high quality of the RNA-seq data produced.
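A dendrogram like the one in Fig. 3d can be approximated by average-linkage clustering on a 1 - Spearman distance between samples. The linkage method and the distance are assumptions for illustration, since the Methods do not specify them, and the demonstration data are simulated. Note that `scipy.stats.spearmanr` returns a full correlation matrix only when there are more than two samples.

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage
from scipy.spatial.distance import squareform
from scipy.stats import spearmanr


def cluster_samples(expr, n_clusters: int = 2):
    """Average-linkage clustering of samples (columns) using 1 - Spearman's rho.

    expr: genes x samples matrix with at least three samples.
    Returns (cluster labels per sample, linkage matrix for plotting a dendrogram).
    """
    expr = np.asarray(expr, dtype=float)
    if expr.shape[1] < 3:
        raise ValueError("need at least three samples for a correlation matrix")
    rho, _pvalue = spearmanr(expr)                 # columns are treated as variables
    dist = np.clip(1.0 - np.asarray(rho), 0.0, None)
    np.fill_diagonal(dist, 0.0)
    z = linkage(squareform(dist, checks=False), method="average")
    return fcluster(z, t=n_clusters, criterion="maxclust"), z


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    base = rng.uniform(1, 100, size=500)
    fold = rng.uniform(0.2, 5.0, size=500)
    cells = [base * rng.normal(1, 0.02, 500) for _ in range(3)]           # simulated MSQ43
    embryos = [base * fold * rng.normal(1, 0.02, 500) for _ in range(3)]  # simulated embryos
    labels, _ = cluster_samples(np.column_stack(cells + embryos))
    print(labels)
```

Passing the returned linkage matrix to `scipy.cluster.hierarchy.dendrogram` draws the corresponding tree.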
Usage Notes
Cell lines are a convenient model because they allow simple and efficient perturbation through the expression of exogenous genetic constructs. Such vectors can be used to overexpress specific coding or non-coding RNAs, deliver components required for targeted genome editing, or create reporter systems and biosensors. An important characteristic of a cell line is therefore its ability to be transfected with various genetic constructs. We first tested the activity of promoters in various genetic constructs available in our laboratory. The most widely used methods of cell transfection are chemical reagents (such as Lipofectamine) and electroporation; testing different methods makes it possible to choose the most effective one, since not all cell lines are transfected efficiently with Lipofectamine. In MSQ43 cells, we observed activity of the CMV (Fig. S2a, Table S5) and Ubiquitin-63E promoters (Fig. S2b, Table S5). When using the MSQ43 cell line for Anopheles genomics research, several limitations should be considered. First, as with all immortalized cell lines, MSQ43 may exhibit chromatin organization, transcriptomic profiles, and chromosomal alterations that differ from those of primary tissues. Second, evolutionary divergence in chromatin architecture may limit the applicability of MSQ43 for studying Anopheles species distantly related to An. stephensi. Nevertheless, MSQ43 remains a suitable model for investigating all three ecological variants of An. stephensi, as the absence of post-copulatory reproductive barriers among them25 suggests conserved epigenetic regulation.
Code availability
The code and tools used for data analysis are available at: https://github.com/Asirix1/Anopheles_project; https://github.com/kundajelab/chipseq_pipeline; https://github.com/open2c/coolpuppy; https://meme-suite.org/meme/.
References
Bradshaw, C. J. A. et al. Massive yet grossly underestimated global costs of invasive insects. Nature Communications 7, 12986 (2016).
Vector-borne diseases. https://www.who.int/news-room/fact-sheets/detail/vector-borne-diseases.
Sinka, M. E. et al. The dominant Anopheles vectors of human malaria in the Asia-Pacific region: occurrence data, distribution maps and bionomic précis. Parasites Vectors 4, 89 (2011).
Sinka, M. E. et al. A new malaria vector in Africa: Predicting the expansion range of Anopheles stephensi and identifying the urban populations at risk. Proc. Natl. Acad. Sci. USA. 117, 24900–24908 (2020).
Sharma, S. K. & Hamzakoya, K. K. Geographical Spread of Anopheles stephensi Vector of Urban Malaria, and Aedes aegypti, Vector of Dengue/DHF, in the Arabian Sea Islands of Lakshadweep, India. WHO Regional Office for South-East Asia. (2001).
Faulde, M. K., Rueda, L. M. & Khaireh, B. A. First record of the Asian malaria vector Anopheles stephensi and its possible role in the resurgence of malaria in Djibouti, Horn of Africa. Acta Tropica 139, 39–43 (2014).
Gayan Dharmasiri, A. G. et al. First record of Anopheles stephensi in Sri Lanka: a potential challenge for prevention of malaria reintroduction. Malar J 16, 326 (2017).
Carter, T. E. et al. Genetic diversity of Anopheles stephensi in Ethiopia provides insight into patterns of spread. Parasites Vectors 14, 602 (2021).
Ishtiaq, F., Swain, S. & Kumar, S. S. Anopheles stephensi (Asian Malaria Mosquito). Trends in Parasitology 37, 571–572 (2021).
Budiansky, S. Creatures of Our Own Making. Science 298, 80–86 (2002).
Neafsey, D. E. et al. The Evolution of the Anopheles 16 Genomes Project. G3 Genes|Genomes|Genetics 3, 1191–1194 (2013).
Osta, M. A., Christophides, G. K. & Kafatos, F. C. Effects of Mosquito Genes on Plasmodium Development. Science 303, 2030–2032 (2004).
Riehle, M. M. et al. Natural Malaria Infection in Anopheles gambiae Is Regulated by a Single Genomic Control Region. Science 312, 577–579 (2006).
Blandin, S. A. et al. Dissecting the Genetic Basis of Resistance to Malaria Parasites in Anopheles gambiae. Science 326, 147–150 (2009).
Mitri, C., Thiery, I., Bourgouin, C. & Paul, R. E. L. Density-dependent impact of the human malaria parasite Plasmodium falciparum gametocyte sex ratio on mosquito infection rates. Proc. R. Soc. B. 276, 3721–3726 (2009).
Crawford, J. E. et al. Evidence for Population-Specific Positive Selection on Immune Genes of Anopheles gambiae. G3 Genes|Genomes|Genetics 2, 1505–1519 (2012).
Ruiz, J. L. et al. Chromatin changes in Anopheles gambiae induced by Plasmodium falciparum infection. Epigenetics & Chromatin 12, 5 (2019).
Ruiz, J. L., Ranford-Cartwright, L. C. & Gómez-Díaz, E. The regulatory genome of the malaria vector Anopheles gambiae: integrating chromatin accessibility and gene expression. NAR Genomics and Bioinformatics 3, lqaa113 (2021).
Gómez-Díaz, E., Rivero, A., Chandre, F. & Corces, V. G. Insights into the epigenomic landscape of the human malaria vector Anopheles gambiae. Front. Genet. 5 (2014).
Rubio, N. R., Fish, K. D., Trimmer, B. A. & Kaplan, D. L. Possibilities for Engineered Insect Tissue as a Food Source. Front. Sustain. Food Syst. 3, 24 (2019).
Sandiford, S. L. et al. Cytoplasmic Actin Is an Extracellular Insect Immune Factor which Is Secreted upon Immune Challenge and Mediates Phagocytosis and Direct Killing of Bacteria, and Is a Plasmodium Antagonist. PLoS Pathog 11, e1004631 (2015).
Mak, M. et al. Triangulation of methods using insect cell lines to investigate insecticidal mode‐of‐action. Pest Management Science 77, 492–501 (2021).
Bairoch, A. The Cellosaurus, a Cell-Line Knowledge Resource. J Biomol Tech 29, 25–38 (2018).
Pudney, M. & Varma, M. G. R. Anopheles stephensi var. mysorensis: Establishment of a larval cell line (Mos. 43). Experimental Parasitology 29, 7–12 (1971).
Subbarao, S. K., Vasantha, K., Adak, T., Sharma, V. P. & Curtis, C. F. Egg‐float ridge number in Anopheles stephensi: ecological variation and genetic analysis. Medical Vet Entomology 1, 265–271 (1987).
Lukyanchikova, V. et al. Anopheles mosquitoes reveal new principles of 3D genome organization in insects. Nat Commun 13, 1960 (2022).
Prokhorovich, M. A. et al. Cultures of hESM human embryonic stem cells: Chromosomal aberrations and karyotype stability. Bull Exp Biol Med 144, 126–129 (2007).
Jiang, X. et al. Genome analysis of a major urban malaria vector mosquito, Anopheles stephensi. Genome Biol 15, 459 (2014).
Salnikov, P. et al. Structural variants in the Epb41l4a locus: TAD disruption and Nrep gene misregulation as hypothetical drivers of neurodevelopmental outcomes. Sci Rep 14, 5288 (2024).
Gridina, M. et al. A cookbook for DNase Hi-C. Epigenetics & Chromatin 14, 15 (2021).
Andrews, S. FastQC: a quality control tool for high throughput sequence data (2010).
Ewels, P., Magnusson, M., Lundin, S. & Käller, M. MultiQC: summarize analysis results for multiple tools and samples in a single report. Bioinformatics 32, 3047–3048 (2016).
Martin, M. Cutadapt removes adapter sequences from high-throughput sequencing reads. EMBnet j. 17, 10 (2011).
Landt, S. G. et al. ChIP-seq guidelines and practices of the ENCODE and modENCODE consortia. Genome Res. 22, 1813–1831 (2012).
The Galaxy Community et al. The Galaxy platform for accessible, reproducible, and collaborative data analyses: 2024 update. Nucleic Acids Research 52, W83–W94 (2024).
Bolger, A. M., Lohse, M. & Usadel, B. Trimmomatic: a flexible trimmer for Illumina sequence data. Bioinformatics 30, 2114–2120 (2014).
Kim, D., Paggi, J. M., Park, C., Bennett, C. & Salzberg, S. L. Graph-based genome alignment and genotyping with HISAT2 and HISAT-genotype. Nat Biotechnol 37, 907–915 (2019).
Pertea, M. et al. StringTie enables improved reconstruction of a transcriptome from RNA-seq reads. Nat Biotechnol 33, 290–295 (2015).
NCBI Sequence Read Archive https://identifiers.org/ncbi/insdc.sra:SRP532727 (2025).
Acknowledgements
This work was supported by the grant of the state program of the «Sirius» Federal Territory «Scientific and technological development of the «Sirius» Federal Territory» (Agreement №26-03, 27/09/2024) (Hi-C libraries preparation, data analysis, manuscript preparation). An. stephensi MSQ43 cell line was initially obtained through Malaria Research and Reference Reagent Resource Center (MR4) stocks and BEI Resources, NIAID, NIH. All computations were performed using nodes of the high-throughput cluster of the Novosibirsk State University (supported by the Ministry of Science and Higher Education of the Russian Federation, grant #FSUS-2024-0018), and bioinformatics resource center of the Institute of Cytology and Genetics (state project FWNR-2022-0019). We acknowledge the Center for Shared Use of Flow Cytometry SB RAS for providing FACS sorting facilities.
Author information
Contributions
V.F. and I.V.S. conceived and supervised the study. V.L. cultivated MSQ43 cells and performed Hi-C, ChIP-seq, and RNA-seq with help from M.G. and A.K. M.N. performed Hi-C data processing and technical validation with help from A.P. P.B. performed ChIP-seq and RNA-seq data processing and technical validation. M.N. and V.L. generated genome assembly and analyzed chromosomal rearrangements. A.K. optimized transfection conditions. All the authors contributed to the manuscript preparation.
Ethics declarations
Competing interests
The authors declare no competing interests.
Additional information
Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License, which permits any non-commercial use, sharing, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if you modified the licensed material. You do not have permission under this licence to share adapted material derived from this article or parts of it. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by-nc-nd/4.0/.
About this article
Cite this article
Lukyanchikova, V., Nuriddinov, M., Khabarova, A. et al. Chromatin landscape, transcriptomic and ChIP-seq profiling of Anopheles stephensi MSQ43 cell line. Sci Data 12, 1566 (2025). https://doi.org/10.1038/s41597-025-05721-2