Background & Summary

Bacillus thuringiensis (Bt) insecticidal proteins, used in sprayable formulations and transgenic crops, are the most promising alternatives to synthetic insecticides. However, the evolution of resistance in both field and laboratory insect populations is a serious roadblock to this technology. Achaea janata, a major castor crop pest in India, is controlled using a Bt-based formulation1 comprising Cry1 (Cry1Aa, Cry1Ab, and Cry1Ac) and Cry2 (Cry2Aa and Cry2Ab) genes2. Recent studies from our group reported extensive changes at the cellular and molecular levels in the midgut of A. janata exposed to a sublethal dose of the Bt formulation3,4,5. Over the past decade, reports of resistance against Bt toxins and of the underlying mechanisms have been emerging6,7. Long-term exposure to Cry toxin formulations promotes tolerance in larvae, which eventually leads to resistance6,7,8. Development of Bt resistance could be due to alterations in proteolytic cleavage of the Cry toxin, altered receptor binding or enhanced midgut regeneration responses9,10,11. With the advent of next-generation sequencing technology, it is now possible to characterize the entire repertoire of transcripts under different conditions and to predict the pathways involved in various molecular mechanisms. The RNA sequencing study presented here generated the first de novo transcriptome assembly of the castor semilooper, Achaea janata (Noctuidae: Lepidoptera), and compared gene expression signatures between toxin-exposed susceptible and tolerant larvae. This article is a first step in determining the primary basis for Cry tolerance in the pest, which could facilitate new long-term management strategies.

Methods

Toxin administration and sample preparation

A wild population of A. janata larvae, unexposed to pesticides, was collected in the field at the Indian Institute of Oil Seed Research, Hyderabad, India. The larvae were reared and maintained on castor leaves at 27 ± 2 °C, a 14:10 h (light:dark) photoperiod and 60–70% relative humidity for three generations at the insectary of the School of Life Sciences, University of Hyderabad, India. In the present de novo transcriptome analysis, 1/10 of the LD50 was used for the sublethal toxin exposure (Group ii) (Fig. 1), while an LD50 dose of the DOR Bt-1 formulation was administered to generate the tolerant population (Group iv) (Fig. 1)1. Larval batches (n = 100) designated as Cry-susceptible larvae and control larvae were exposed to toxin-coated and distilled-water-coated leaves, respectively. The midgut was isolated from 15 randomly selected surviving larvae from each batch every 12 h up to 48 h. In an earlier study, we noticed that larvae probably sense the toxin and avoid feeding on toxin-coated leaves after a short exposure. Hence, to account for any effect induced by starvation, an additional batch (Group iii) of 3rd instar larvae was maintained on moist filter paper and collected for midgut isolation every 12 h up to 48 h. All midgut dissections were carried out in ice-cold insect Ringer solution (130 mM NaCl, 0.5 mM KCl, and 0.1 mM CaCl2). The experiment was performed in duplicate. For the Cry-tolerant larval population, larvae (n = 100) in each generation were exposed to an LD50 dose, and the surviving insects were maintained through larval development, pupal molting, adult emergence and egg laying. The larvae hatched from the eggs were collected, reared to the 3rd instar and exposed to the LD50 Bt dose once again. This schedule was carried out for fifteen generations. The batch (n = 100) of Cry-tolerant larvae thus generated was exposed to toxin-coated leaves, and the midguts were isolated from fifteen randomly selected larvae after 24 h.
Total RNA was isolated from the midgut samples using a Trizol-based method. The RNA was quantified using a NanoDrop™ 8000 spectrophotometer, and its quality was assessed on a 1% formaldehyde denaturing agarose gel.

Fig. 1

Flow chart showing the methodology used for the present study.

Library preparation

Illumina 2 × 150 paired-end libraries were prepared as follows. Briefly, mRNA was enriched from the isolated total RNA and fragmented. The fragmented mRNA was used for first-strand cDNA synthesis, followed by second-strand generation, A-tailing and adapter ligation. Adapter-ligated products were purified and amplified by PCR. The PCR-amplified cDNA libraries were assessed for quality and quantity using the DNA High Sensitivity Assay Kit (Agilent Technologies).

Quality assessment prior to cluster generation and sequencing

The amplified libraries were analyzed using the Bioanalyzer 2100 and a High Sensitivity DNA chip (Agilent Technologies). After the Qubit concentration was obtained for each library, the libraries were loaded on the Illumina platform (2 × 150 bp chemistry) for cluster generation and sequencing. Data were generated on the Illumina HiSeq 2500 system; paired-end sequencing allowed the template fragments to be sequenced in both the forward and reverse directions. The library molecules bind to complementary adapter oligos on the paired-end flow cell. The adapters were designed to allow selective cleavage of the forward strands after re-synthesis of the reverse strand during sequencing. The copied reverse strand was then used to sequence from the opposite end of the fragment. Total RNA was subjected to paired-end library preparation with the Illumina TruSeq Stranded Total RNA Library Preparation Kit. The mean library size ranged from 357 bp to 567 bp across the 28 samples. The libraries were sequenced, and ~3.05 GB of high-quality data was generated per sample (Online-only Table 1).

Sequence analysis

Illumina 2 × 150 paired-end libraries were prepared using the Illumina TruSeq stranded mRNA Library Preparation Kit according to the manufacturer’s protocol (Illumina Inc.). The amplified libraries were analyzed on the Bioanalyzer 2100 with a High Sensitivity DNA chip (Agilent Technologies). The de novo master assembly was generated using the “SOAP-denovo-Trans (v1.03)” assembler (Short Oligonucleotide Analysis Package)12. For each data set, raw read quality was assessed and the reads were filtered with Trimmomatic (v.0.36)13. Transcripts were clustered using the CD-HIT (Cluster Database at High Identity with Tolerance) package14. The proteins predicted from the CDS (coding sequences) were subjected to a similarity search against NCBI’s non-redundant (nr) database using the BLASTP (Basic Local Alignment Search Tool) algorithm.

Data Records

The total raw sequencing data from 28 samples (14 samples, each sequenced in duplicate; the duplicates were derived from different pools of larvae and are biologically independent) were used for assembly in the present study. The data have been deposited in the NCBI SRA database, with identifier SRP1867015 and accession numbers SRR8617834, SRR8617835, SRR8617836, SRR8617837, SRR8617838, SRR8617839, SRR8617840, SRR8617841, SRR8617842, SRR8617843, SRR8617844, SRR8617845, SRR8617846, SRR8617847, SRR8617848, SRR8617849, SRR8617850, SRR8617851, SRR8617852, SRR8617853, SRR8617854, SRR8617855, SRR8617856, SRR8617857, SRR8617858, SRR8617859, SRR8617860 and SRR8617861, under BioProject PRJNA523326 and BioSample SAMN09241884. This Transcriptome Shotgun Assembly project has been deposited at DDBJ/ENA/GenBank under the accession GHGZ0000000016. The version described in this paper is the first version, GHGZ01000000.
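The 28 run accessions listed above are numerically consecutive, so the list can be regenerated programmatically, for example when scripting a download. The following is a minimal sketch; the variable names are illustrative and no particular download tool is assumed.

```python
# First and last run numbers, taken from the accession list in the text.
FIRST_RUN = 8617834
LAST_RUN = 8617861

# Build the full list of run accessions (both ends inclusive).
run_accessions = [f"SRR{number}" for number in range(FIRST_RUN, LAST_RUN + 1)]

for accession in run_accessions:
    print(accession)
```

The length of the generated list should match the 28 samples described in the text, which gives a quick sanity check before any data are fetched.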

Technical Validation

The SOAPdenovo-Trans assembler was used to generate a de novo transcriptome assembly from four experimental sets of midgut samples, viz. Group (i) susceptible larvae exposed to medium (water), Group (ii) susceptible larvae exposed to 1/10 of the LD50 dosage of DOR Bt-1 formulation, Group (iii) susceptible larvae subjected to starvation and Group (iv) tolerant larvae (reared for 15 generations) exposed to the LD50 dosage of DOR Bt-1 formulation (Fig. 1). A total of 174,066 transcripts were generated for the master assembly, with a transcriptome length of 100,247,510 bp (base pairs). A total of 136,818 unigenes were obtained using CD-HIT, and 35,559 coding sequences (CDS) were predicted by TransDecoder. The top-hit species distribution revealed that the largest share (23%) of the CDS aligned with Spodoptera litura, followed by Helicoverpa armigera and Heliothis virescens, all of which belong to the family Noctuidae in the order Lepidoptera.
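Summary figures of this kind (transcript count, total length, mean length, N50) can be derived from the sequence lengths alone. The sketch below is illustrative only: the function name and the toy lengths are assumptions, and it is not the script used to produce the reported statistics.

```python
from typing import Dict, List


def assembly_stats(lengths: List[int]) -> Dict[str, float]:
    """Return basic summary statistics for a list of transcript lengths (bp)."""
    if not lengths:
        raise ValueError("no sequences supplied")
    ordered = sorted(lengths, reverse=True)
    total = sum(ordered)
    # N50: the length at which the running sum, taken from the longest
    # transcript downwards, first reaches half of the total length.
    running = 0
    n50 = ordered[-1]
    for length in ordered:
        running += length
        if running * 2 >= total:
            n50 = length
            break
    return {
        "count": len(ordered),
        "total_bp": total,
        "mean_bp": total / len(ordered),
        "max_bp": ordered[0],
        "n50_bp": n50,
    }


# Toy lengths for illustration only; these are not values from the assembly.
toy_lengths = [200, 300, 500, 1000, 2000]
stats = assembly_stats(toy_lengths)
print(stats)

# Mean transcript length implied by the totals reported in the text.
reported_transcripts = 174_066
reported_total_bp = 100_247_510
reported_mean_bp = reported_total_bp / reported_transcripts
print(round(reported_mean_bp, 1))
```

Dividing the reported total length by the reported transcript count gives a mean transcript length of about 576 bp.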

Transcriptome assembly

The de novo master assembly of high-quality reads from the 28 processed samples was generated using the “SOAP-denovo-Trans (v1.03)” assembler12. For each data set, raw read quality (phred40) was assessed, and the reads were filtered with Trimmomatic (v.0.36) using the parameters ILLUMINACLIP:adapter.fasta:2:30:8 MINLEN:40 to remove adapter sequences and filter by quality score13. An average of 19 million clean reads was obtained. Statistics of the high-quality reads (total reads, base count and data size) are summarized in Online-only Table 1, and statistics of the assembled transcripts and their length distribution are presented in Table 1.
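To make the trimming parameters concrete, the sketch below assembles a Trimmomatic paired-end command line carrying the two steps quoted above. It assumes paired-end mode (the libraries are 2 × 150 bp); the jar path, the sample name and the FASTQ file naming are hypothetical, and the exact invocation used in the study is not stated in the text.

```python
from typing import List


def trimmomatic_pe_command(sample: str, jar: str = "trimmomatic-0.36.jar") -> List[str]:
    """Build an argument list for Trimmomatic in paired-end (PE) mode.

    File names are hypothetical placeholders derived from the sample name.
    """
    return [
        "java", "-jar", jar, "PE",
        # Input: forward and reverse reads.
        f"{sample}_R1.fastq.gz",
        f"{sample}_R2.fastq.gz",
        # Output: paired and unpaired survivors for each read direction.
        f"{sample}_R1.paired.fastq.gz",
        f"{sample}_R1.unpaired.fastq.gz",
        f"{sample}_R2.paired.fastq.gz",
        f"{sample}_R2.unpaired.fastq.gz",
        # Adapter clipping: adapter file, 2 seed mismatches,
        # palindrome clip threshold 30, simple clip threshold 8.
        "ILLUMINACLIP:adapter.fasta:2:30:8",
        # Drop reads shorter than 40 bases after trimming.
        "MINLEN:40",
    ]


command = trimmomatic_pe_command("sample01")
print(" ".join(command))
```

Returning a list rather than a single string keeps the arguments unambiguous if the command is later passed to a process launcher.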

Table 1 Statistics of assembled transcripts and transcript length distribution.

Clustering

To reduce redundancy and noise, one representative transcript was selected for each transcript cluster. Transcripts were clustered using the CD-HIT (Cluster Database at High Identity with Tolerance) package14. CD-HIT-EST v4.6.1 was used to remove shorter redundant transcripts that were 100% covered by other transcripts with more than 90% identity. The non-redundant clustered transcripts were then designated as unigenes (Table 2). CDS were predicted from the unigene sequences with TransDecoder at default parameters, which resulted in the identification of 35,559 CDS (Table 3).
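As an illustration of how representatives can be recovered from a clustering run, the sketch below parses text in the CD-HIT `.clstr` layout, where each cluster starts with a `>Cluster N` line and the representative member is marked with a trailing `*`. The function name and the toy input are assumptions made for this example; the transcript names are invented.

```python
import re
from typing import Dict, List, Optional

# Member lines look like: "0<TAB>2799nt, >name... *"
_MEMBER = re.compile(r">(.+?)\.\.\.")


def representatives(clstr_text: str) -> Dict[str, List[str]]:
    """Map each cluster representative to the names of all cluster members."""
    clusters: Dict[str, List[str]] = {}
    members: List[str] = []
    representative: Optional[str] = None
    for raw in clstr_text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line.startswith(">Cluster"):
            # A new cluster begins: store the previous one, if any.
            if representative is not None:
                clusters[representative] = members
            members, representative = [], None
            continue
        match = _MEMBER.search(line)
        if match is None:
            continue
        name = match.group(1)
        members.append(name)
        if line.endswith("*"):
            representative = name
    if representative is not None:
        clusters[representative] = members
    return clusters


# Toy input for illustration; names and lengths are invented.
toy_clstr = (
    ">Cluster 0\n"
    "0\t2799nt, >tx_a... *\n"
    "1\t2214nt, >tx_b... at +/99.50%\n"
    ">Cluster 1\n"
    "0\t1500nt, >tx_c... *\n"
)
unigenes = representatives(toy_clstr)
print(len(unigenes), "representative transcripts")
```

In this scheme the number of keys corresponds to the number of non-redundant representatives, which is the quantity the text refers to as unigenes.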

Table 2 Statistics of unigenes and length distribution.
Table 3 Statistics and length distribution of the predicted CDS.

Annotation

The proteins predicted from the CDS were subjected to a similarity search against NCBI’s non-redundant (nr) database using the BLASTP algorithm. Of the 35,559 proteins, 32,561 had hits and 2,998 had no hits (Annotation of each transcript of the assembled transcriptome)17. The top-hit species distribution revealed that most hits were against Spodoptera litura, followed by Helicoverpa armigera and Heliothis virescens (Fig. 2). Simultaneously, all protein sequences were searched for similarity against the NR, UniProt (Universal Protein Resource), KOG (EuKaryotic Orthologous Groups) and Pfam databases using BLASTP with an e-value threshold of 1e−5. The BLAST results against the four databases are summarized in Fig. 3.
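The hit/no-hit split described above can be sketched as a filter over BLAST tabular output. The example assumes the standard 12-column tabular layout (query, subject, identity, alignment length, mismatches, gap opens, query start/end, subject start/end, e-value, bit score); the function name and the toy rows are illustrative and do not come from the study's results.

```python
from typing import Dict, Iterable, Tuple

E_VALUE_CUTOFF = 1e-5  # threshold quoted in the text


def best_hits(rows: Iterable[str],
              cutoff: float = E_VALUE_CUTOFF) -> Dict[str, Tuple[str, float, float]]:
    """Keep, per query, the highest-scoring hit whose e-value passes the cutoff."""
    best: Dict[str, Tuple[str, float, float]] = {}
    for raw in rows:
        row = raw.strip()
        if not row or row.startswith("#"):
            continue
        fields = row.split("\t")
        query, subject = fields[0], fields[1]
        evalue, bitscore = float(fields[10]), float(fields[11])
        if evalue > cutoff:
            continue
        if query not in best or bitscore > best[query][2]:
            best[query] = (subject, evalue, bitscore)
    return best


# Toy rows for illustration; identifiers and scores are invented.
toy_rows = [
    "cds1\tsubjA\t98.0\t200\t4\t0\t1\t200\t1\t200\t1e-100\t380",
    "cds1\tsubjB\t80.0\t150\t30\t0\t1\t150\t1\t150\t1e-20\t120",
    "cds2\tsubjC\t35.0\t60\t39\t0\t1\t60\t1\t60\t0.5\t30",
]
all_queries = {row.split("\t")[0] for row in toy_rows}
hits = best_hits(toy_rows)
no_hits = all_queries - set(hits)
print(len(hits), "with hits;", len(no_hits), "without")
```

As a consistency check on the reported counts, 32,561 proteins with hits plus 2,998 without hits add up to the 35,559 predicted proteins.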

Fig. 2

Top-hit species distribution of most closely related insect species demonstrated using a horizontal bar graph.

Fig. 3

Venn diagram representation of annotated protein in different databases.

Differential expression

In this work, reads mapped to transcripts from control and Cry toxin-tolerant larvae were compared for the differential expression analysis. Count data were analysed using DESeq2 on the RStudio platform18. The differential expression analysis shows significant differences in the tolerant larval population compared with the susceptible population (Differential expression analysis)17. Of the 35,559 CDS analysed, 320 show significant variation (padj < 0.05). Some of these genes, such as (i) gi|1131919362| Ca2+-binding protein, RTX toxin-related, (ii) gi|1199381583| superoxide dismutase [Cu-Zn] 2-like, (iii) gi|315139350| serine protease 63, (iv) gi|1274141826| trypsin, alkaline C-like and (v) gi|1274136486| apolipophorins isoform X2, were upregulated, while (i) gi|123995301| ribosomal protein SA, (ii) gi|744619941| predicted: 60S ribosomal protein L8, (iii) gi|45219787| ribosomal protein S3A, (iv) gi|1344818460| alanine aminotransferase 1-like and (v) gi|501300966| ubiquitin were downregulated.
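The padj column reported by DESeq2 is, by default, a Benjamini-Hochberg adjusted p-value. The sketch below shows that adjustment and the padj < 0.05 cut on toy values. It is a plain illustration of the adjustment step, not a reimplementation of DESeq2 (which also fits a statistical model and may apply independent filtering), and the p-values are invented.

```python
from typing import List

PADJ_CUTOFF = 0.05  # significance threshold quoted in the text


def benjamini_hochberg(pvalues: List[float]) -> List[float]:
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value to the smallest so that each adjusted
    # value never exceeds the adjusted value of a larger raw p-value.
    for rank in range(m, 0, -1):
        index = order[rank - 1]
        running_min = min(running_min, pvalues[index] * m / rank)
        adjusted[index] = running_min
    return adjusted


# Toy p-values for illustration; these are not results from the study.
toy_pvalues = [0.001, 0.2, 0.02, 0.004, 0.5]
padj = benjamini_hochberg(toy_pvalues)
significant = [i for i, value in enumerate(padj) if value < PADJ_CUTOFF]
print(significant)
```

Filtering on the adjusted rather than the raw p-value controls the expected proportion of false positives among the genes called significant, which matters when tens of thousands of CDS are tested at once.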