A chromosome-level genome assembly of the cabbage aphid Brevicoryne brassicae

Wu, Jun; Li, Guomeng; Lin, Zhimou; Zhang, Yangzhi; Yu, Wenyuan; Hu, Rong; Zhan, Shuai; Chen, Yazhou

doi:10.1038/s41597-025-04501-2


Data Descriptor
Open access
Published: 28 January 2025


Scientific Data volume 12, Article number: 167 (2025)



Abstract

The cabbage aphid, Brevicoryne brassicae, is a major pest of Brassicaceae plants, causing significant yield losses annually. However, the lack of genomic resources has hindered progress in understanding this pest at the molecular level. Here, we present a high-quality, chromosome-level genome assembly for B. brassicae, based on PacBio HiFi long-read sequencing and Hi-C data. The final assembled genome size was 429.99 Mb, with a scaffold N50 of 93.31 Mb. Notably, 96.19% of the assembled sequences were anchored to eight chromosomes. The genome covered 99.24% of BUSCO genes and 95.16% of CEGMA genes, indicating a high level of completeness. By integrating high-coverage transcriptome data, we annotated 22,671 protein-coding genes and 3,594 lncRNA genes. Preliminary comparative genomic analyses focused on genes related to host colonization, such as chemosensory- and detoxification-related genes, as well as the cross-kingdom lncRNA Ya. In summary, this study presents a contiguous and complete genome for B. brassicae, which will advance our understanding of the molecular mechanisms underlying its host adaptation, pest behavior, and interaction with Brassicaceae plants.


Background & Summary

Brevicoryne brassicae, commonly known as the cabbage aphid, is a notorious pest that specializes in plants of the Brassicaceae family, including crops such as rapeseed, cabbage, and broccoli. B. brassicae damages plants directly by sucking sap from phloem tissues and indirectly by transmitting several plant viruses, which together result in significant yield losses in many Brassicaceae crops worldwide. B. brassicae is a nonhost-alternating species, meaning that its entire life cycle is completed on the herbaceous plants that typically serve as secondary hosts for host-alternating aphids¹. The life cycle includes a sexual generation and several asexual generations. During the winter, B. brassicae produces sexual forms and overwinters in the egg stage. In warm seasons and regions, the life cycle simplifies to parthenogenetic reproduction. Winged females (alates) (Fig. 1a) emerge when the population density increases and the host quality declines. Alates migrate to distant crops and produce offspring via parthenogenesis, leading to exponential population expansion and the escalation of aphid damage in the fields².

Since the first aphid genome, that of Acyrthosiphon pisum, was published in 2010³, dozens of aphid genomes have become available, including those of important agricultural pests such as Myzus persicae⁴,⁵, Aphis gossypii⁶, and Diuraphis noxia⁷, and of valuable resource insects such as Schlechtendalia chinensis⁸. These genomes have greatly facilitated research on these aphids, leading to a deeper understanding of the molecular mechanisms of aphid biology. In contrast, studies on B. brassicae have been limited, largely due to the lack of genomic resources. Although a genome of B. brassicae is available⁹, its quality and annotation need further improvement. Therefore, we constructed a high-quality B. brassicae genome at the chromosomal level using PacBio HiFi long reads and high-throughput chromosome conformation capture (Hi-C) data. We annotated the genome for both protein-coding genes and long non-coding RNA (lncRNA) transcripts, and performed phylogenetic and evolutionary analyses with different aphid genomes. Our efforts will offer substantial support for a deeper understanding of B. brassicae and for future studies of aphids.

After quality control and filtering, we obtained a total of 29.00 Gb (~67.44× depth) of PacBio long reads and 42.78 Gb (~99.49× depth) of Illumina short reads. These reads were assembled into 131 contigs with an N50 length of 16.79 Mb (Table 1). Chromosome scaffolding based on Hi-C data resulted in eight chromosomes that contained 96.19% of scaffold sequences. The chromosome number was confirmed by karyotype analysis (Fig. 1b). The final assembly has a total size of 429.99 Mb and includes eight chromosomes (Fig. 1c, Table 1). Chromosome lengths ranged from 16.74 Mb to 125.10 Mb (Fig. 1d). The genome assembly is highly complete in terms of gene content, with 99.24% of Hemiptera BUSCO (Benchmarking Universal Single-Copy Ortholog) genes and 95.16% of CEGMA (Core Eukaryotic Genes Mapping Approach) genes being present (Table 1), indicating a comprehensive representation of the gene set expected for this taxonomic group. Altogether, the assembly of the B. brassicae genome is contiguous, accurate, and complete.
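The sequencing depths quoted above follow from dividing the total sequenced bases by the assembly size. The following is a minimal sketch of that arithmetic, assuming 1 Gb = 1,000 Mb and taking the 429.99 Mb assembly size as the denominator; the function name is illustrative and is not part of the authors' pipeline.

```python
def sequencing_depth(total_bases_gb: float, genome_size_mb: float) -> float:
    """Return fold coverage: sequenced bases divided by genome size."""
    # Assumption: 1 Gb = 1,000 Mb (decimal units).
    return (total_bases_gb * 1000.0) / genome_size_mb


GENOME_MB = 429.99  # assembly size reported in the text

pacbio_depth = sequencing_depth(29.00, GENOME_MB)
illumina_depth = sequencing_depth(42.78, GENOME_MB)

print(f"PacBio: {pacbio_depth:.2f}x, Illumina: {illumina_depth:.2f}x")
```

With the values from the text, both depths agree with the reported figures to two decimal places.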

Table 1 Statistics for the genome assembly and annotation of B. brassicae. N.A.: not available.


We used a phylogenomic approach to assess the phylogenetic relationships among B. brassicae and 14 other hemipteran insects. Phylogenetic analysis revealed that B. brassicae diverged from D. noxia approximately 49.9 million years ago (MYA) and from other Macrosiphini species about 53.9 MYA (Fig. 2a). We also identified chromosome 1, the largest chromosome (125.10 Mb) in the B. brassicae genome, as the X chromosome, since it showed extensive synteny with the X chromosomes of M. persicae and A. pisum¹⁰ (Fig. 2b).

Using the chromosome-level genome assembly, we annotated protein-coding genes and lncRNA transcripts using evidence from 90.31 Gb of RNA sequencing (RNA-seq) data (63.60 Gb un-stranded and 26.71 Gb stranded). In total, 22,671 protein-coding genes and 3,594 lncRNA genes were annotated (Table 1). Among them, 22 lncRNA genes in the B. brassicae genome were identified as homologs of Ya1 in M. persicae, previously known as a virulence factor¹¹. Conservation of protein-coding sequences is high among different aphid species, whereas lncRNA sequences tend to be more divergent (Fig. 2a). We also identified 141.19 Mb of repetitive sequences, accounting for 32.84% of the genome assembly (Table 2). Additionally, we annotated 154 miRNA loci, 332 tRNA loci, 837 rRNA loci, and 124 snRNA loci (Table 3).

Table 2 Statistics of the transposable elements in the genome of B. brassicae.


Table 3 Genomic annotation of miRNA, tRNA, rRNA, and snRNA loci in the genome of B. brassicae.


We annotated chemosensory-related genes encoding gustatory receptors (GRs), odorant receptors (ORs), ionotropic receptors (IRs), odorant-binding proteins (OBPs), and chemosensory proteins (CSPs). The B. brassicae genome encodes 33 GRs, 25 ORs, 17 IRs, 10 OBPs, and 10 CSPs. In addition, we annotated detoxification-related genes and identified 48 genes for cytochrome P450 (P450), 28 for carboxyl/choline esterase (CCE), 21 for glutathione-S-transferase (GST), 49 for UDP-glycosyltransferase (UGT), 77 for ATP-binding cassette transporters (ABC), and 7 for myrosinases (MYR) (Fig. 2c).

This study presents a high-quality chromosome-level genome assembly of B. brassicae with comprehensive annotations. It provides a genomic resource for understanding the genetic, evolutionary, and ecological aspects of the cabbage aphid and may support integrated management of this pest.

Methods

Sample collection and genome sequencing

B. brassicae nymphs were collected from rapeseed fields in Lanzhou (35°56′39.062″ N, 104°8′49.009″ E), Gansu province, China. These aphids were subsequently reared in our laboratory on the Brassica napus variety Zhongshuang11 in a growth chamber set to 22°C and a 16/8 h light/dark cycle. Genomic DNA was extracted from 50 adult insects using the CTAB (cetyltrimethylammonium bromide) method¹² for Illumina, PacBio, and Hi-C sequencing. The quality of the genomic DNA was validated by 1.5% agarose gel electrophoresis and a NanoDrop 2000C spectrophotometer. Briefly, genomic DNA fragmented to a size of 350 bp was end-polished, A-tailed, and then ligated with the full-length adapter following the manual of the SMRTbell Express Template Prep Kit 2.0 (Pacific Biosciences, Menlo Park, CA, USA). Libraries with 350 bp inserts were constructed and sequenced on the Illumina NovaSeq 6000 platform. Raw reads were quality-filtered by removing adapter sequences and low-quality reads, and 42.78 Gb of clean data were ultimately obtained for subsequent analysis.

For PacBio HiFi sequencing, the genomic DNA was sheared into ~15 kb fragments using g-Tubes (Covaris, Woburn, MA, USA) and then purified with 0.45× AMPure PB beads (Beckman Coulter, Brea, CA, USA). The cleaned DNA fragments were used to construct SMRTbell libraries as described above. Fragments with sizes of 15–18 kb were selected using BluePippin (Sage Science, Beverly, MA, USA). After annealing the primers and binding Sequel DNA polymerase to the SMRTbell templates, the libraries were sequenced using one SMRT Cell 1M on the Sequel System (Biomarker Technologies). In total, 29.00 Gb of subreads with an average read length of 10.45 kb were obtained, producing an overall 67.44× coverage of the B. brassicae genome. The Hi-C technique was used to achieve chromosome-level assembly by identifying the contacts between different regions of chromatin filaments. The Hi-C library was constructed following the standard library preparation protocol and sequenced on the Illumina NovaSeq 6000 platform. In total, 60.78 Gb of 150-bp paired-end clean reads were obtained.

RNA extraction and transcriptome sequencing

Parthenogenetic aphids were collected for RNA-seq. Thirty first-instar nymphs, 15 apterous adults, and 15 alate adults collected from rapeseed plants in the lab were used separately for RNA extraction. Total RNA was extracted using TRIzol reagent (Invitrogen, Carlsbad, CA, USA). RNA quality was evaluated using 1.5% agarose gel electrophoresis, and the concentration was measured with a NanoDrop 2000C spectrophotometer (Thermo Fisher Scientific, Pittsburgh, PA, USA). RNA integrity was assessed with an Agilent 5400 Fragment Analyzer (Agilent, Santa Clara, CA, USA) following the manufacturer’s instructions. RNA-seq libraries were constructed using the NEBNext® Ultra™ RNA Library Prep Kit (NEB, Ipswich, MA, USA) following the manufacturer’s instructions. Libraries were then sequenced on the Illumina NovaSeq 6000 platform, and 63.60 Gb of un-stranded and 26.71 Gb of stranded 150-bp paired-end reads were obtained and used for gene prediction.

Genome estimation and assembly

PacBio HiFi long-read data were used to generate a contig-level assembly of the B. brassicae genome. WTDBG2 v2.5¹³ was used to generate a preliminary assembly, and Pilon v1.23¹⁴ was used for short-read correction. The resulting B. brassicae genome assembly consisted of 131 contigs with a total length of 429.99 Mb and a contig N50 of 16.79 Mb. Hi-C clean data (29.00 Gb), with low-quality reads and adaptor sequences removed, were mapped to the draft B. brassicae genome by BWA v0.7.10¹⁵ with the default parameters. Invalid read pairs, including dangling ends, re-ligation, self-circle, and dumped pairs, were further assessed and eliminated from uniquely aligned read pairs by HiC-Pro v2.10.0¹⁶. Valid interaction pairs for scaffold correction were processed by LACHESIS v2e27abb¹⁷ with the default parameters to cluster, order, and orient the contigs onto chromosomes. The draft assembly was examined for contamination by manually inspecting taxon-annotated GC content coverage plots, generated using BlobTools v1.0.1¹⁸,¹⁹. Ultimately, eight chromosomes with a scaffold N50 of 93.31 Mb were constructed, covering a span of 429.99 Mb and representing 96.19% of the draft genome assembly.
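For readers unfamiliar with the contiguity metric quoted above, the N50 is the length of the shortest sequence in the smallest set of longest sequences that together span at least half of the assembly. The following is a minimal sketch on made-up lengths; the actual contig and scaffold lengths are not listed here, and the function is illustrative rather than the tool the authors used.

```python
def n50(lengths):
    """Return the N50 of a collection of sequence lengths."""
    ordered = sorted(lengths, reverse=True)
    half = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        # The first length at which the cumulative sum reaches half
        # of the total is the N50.
        if running >= half:
            return length
    raise ValueError("n50() requires at least one sequence length")


# Hypothetical toy lengths (arbitrary units), not the real assembly.
toy_lengths = [50, 40, 30, 20, 10]
print(n50(toy_lengths))  # 40
```

In the toy example the total is 150, so the cumulative sum first reaches 75 at the second-longest sequence, giving an N50 of 40.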

Genomic repeat annotation

Repeat sequences mainly include tandem and interspersed repeats, the latter being primarily transposable elements (TEs). The TE sequences were annotated by a combination of homology-based and de novo approaches. Initially, a de novo repeat library based on the assembly sequences was generated using RepeatModeler v2.0.2a²⁰, LTR_FINDER²¹, and RepeatScout²² with default parameters. Subsequently, the predicted repeats were classified by PASTEClassifier v1.0²³ and combined with the Dfam v3.2 database²⁴ to construct a non-redundant, species-specific TE library. This library was used as the database for identifying TE sequences in the assembled genome by homology searching with RepeatMasker v4.10²⁵ and RepeatProteinMask²⁵. Ultimately, 141.19 Mb of TE sequences were identified, accounting for 32.84% of the genome assembly. Long terminal repeats (LTR) were the largest category of transposable elements, representing 17.47% of the genome, followed by DNA transposons (8.81%), unknown repeat sequences (6.62%), and long interspersed nuclear elements (LINE; 1.51%) (Table 2).

Protein-coding gene annotation

An integrated approach based on the B. brassicae transcriptome and protein homologs from other aphids was used to predict protein-coding genes on the repeat-masked reference genome. The RNA-seq data from pooled wingless/winged asexual females and nymphs were used. Reads were mapped to the reference genome using HISAT2 v2.2.1²⁶ with default parameters and processed by SAMtools v0.1.18²⁷. The alignment results were provided to Braker1²⁸, which generated a transcriptome-based gene set. Furthermore, protein-coding genes of related species, including M. persicae, A. pisum, and A. gossypii, and of the model insect Drosophila melanogaster were filtered for isoforms and provided to Braker2²⁹, which generated a second, homolog-based gene set. The two independent gene sets were compared at both the exon and transcript levels to generate a consensus gene set. To do this, models unique to one set, with no overlapping model in the other, were selected first. Models that differed between the two approaches were then checked against the homolog alignment and transcriptome evidence, and the best-supported model was retained.
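The first step of the consensus procedure described above (keeping models that have no overlapping counterpart in the other gene set, and setting aside overlapping models for evidence-based review) can be sketched as follows. This is a simplified illustration, not the authors' implementation: it compares whole-locus coordinates only, ignores strand and exon structure, and uses a hypothetical `Model` record with made-up coordinates.

```python
from typing import NamedTuple


class Model(NamedTuple):
    chrom: str
    start: int  # 1-based, inclusive (GFF-style); assumption for this sketch
    end: int
    name: str


def overlaps(a: Model, b: Model) -> bool:
    """True if two models share at least one base on the same sequence."""
    return a.chrom == b.chrom and a.start <= b.end and b.start <= a.end


def split_models(set_a, set_b):
    """Separate non-overlapping models from those needing manual review."""
    unique, conflicts = [], []
    for a in set_a:
        hits = [b for b in set_b if overlaps(a, b)]
        if hits:
            conflicts.append((a, hits))  # resolve later using evidence
        else:
            unique.append(a)
    # Models in set_b with no counterpart in set_a are also kept directly.
    unique.extend(b for b in set_b if not any(overlaps(a, b) for a in set_a))
    return unique, conflicts


# Hypothetical toy models standing in for the two gene sets.
rna_set = [Model("chr1", 100, 500, "rna1"), Model("chr1", 1000, 1500, "rna2")]
protein_set = [Model("chr1", 400, 800, "prot1"), Model("chr2", 100, 300, "prot2")]

unique, conflicts = split_models(rna_set, protein_set)
print([m.name for m in unique])
print([(a.name, [b.name for b in hits]) for a, hits in conflicts])
```

The pairwise comparison is quadratic in the number of models, which is acceptable for a sketch; a genome-scale comparison would normally use sorted intervals or an interval tree.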

Phylogenetic tree construction and genome synteny analyses

We identified 484 single-copy genes using OrthoFinder v2.5.4³⁰ based on protein sequences from 15 aphid genomes, including 7 Macrosiphini species (M. cerasi⁹, M. persicae⁵, Sitobion avenae³¹, Metopolophium dirhodum³², A. pisum¹⁰, B. brassicae, and D. noxia⁷), 5 Aphidini species (Rhopalosiphum maidis³³, Rhopalosiphum padi⁹, Aphis fabae⁹, A. gossypii⁶, and Aphis glycines⁹), and one species each from Chaitophorinae (Sipha flava³⁴), Lachnini (Cinara cedri³⁵), and Phylloxeridae (Daktulosphaira vitifoliae³⁶). The protein sequences of the single-copy genes were concatenated and aligned automatically by OrthoFinder to generate a multiple sequence alignment file, which was used for phylogenetic analysis. For the phylogenetic tree reconstruction, ProtTest v3.2³⁷ was used first and identified “JTT + I + G4” as the best model, which was then used in the maximum likelihood phylogenetic tree reconstruction with RAxML v8.2.12³⁸. To estimate divergence dates, we used the topology derived from the maximum likelihood (ML) analysis of first and second position nucleotides as the input tree. We incorporated a calibration point of 23.9 million years ago (MYA) between Metopolophium dirhodum and Acyrthosiphon pisum³⁹. This calibration was employed as the minimum age in soft-bound uniform priors, which were then applied in a Bayesian MCMCTree molecular dating analysis using PAML (Phylogenetic Analysis by Maximum Likelihood), with the requirement that the sites be present in at least 95% of the taxa⁴⁰. We used iTOL v6⁴¹ for tree visualization.

For the genome synteny analysis, the 1:1:1 orthologs among the B. brassicae, M. persicae, and A. pisum genomes were extracted from the OrthoFinder results and fed to MCScanX_h⁴², which was run with the “-b 2” option to obtain the inter-species collinearity among B. brassicae, M. persicae, and A. pisum. SynVisio⁴³ was used to visualize the genome synteny.
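Extracting 1:1:1 orthologs amounts to keeping only those orthogroups that contain exactly one gene from each of the three species. The following is a minimal sketch, assuming an OrthoFinder-style orthogroup table (tab-separated, one column per species, gene IDs within a cell separated by commas). The species column names and gene IDs are hypothetical, and the authors' actual extraction step is not described in more detail.

```python
import csv
import io


def one_to_one_orthogroups(tsv_text, species):
    """Return {orthogroup: {species: gene}} for groups with exactly
    one gene in every listed species."""
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    result = {}
    for row in reader:
        genes = {}
        for sp in species:
            ids = [g.strip() for g in (row.get(sp) or "").split(",") if g.strip()]
            if len(ids) != 1:
                break  # missing or multi-copy in this species: skip group
            genes[sp] = ids[0]
        else:
            result[row["Orthogroup"]] = genes
    return result


# Hypothetical toy table; real tables contain many more columns and rows.
toy_table = (
    "Orthogroup\tBbra\tMper\tApis\n"
    "OG1\tb1\tm1\ta1\n"
    "OG2\tb2, b3\tm2\ta2\n"
    "OG3\tb4\t\ta3\n"
)

orthologs = one_to_one_orthogroups(toy_table, ["Bbra", "Mper", "Apis"])
print(orthologs)
```

Only OG1 passes in the toy table: OG2 has two copies in one species and OG3 lacks a gene in another.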

Annotation of long non-coding RNAs

The process of identifying lncRNA genes in the B. brassicae genome was divided into three main steps. Firstly, the stranded RNA-seq reads were mapped to the B. brassicae genome. The raw reads were subjected to quality control using Fastp v0.23.4⁴⁴ with default parameters to ensure data integrity for downstream analyses. The processed reads were aligned to the B. brassicae genome using HISAT2 v2.2.1²⁶, and the aligned reads were assembled into transcripts by StringTie v2.2.1⁴⁵. GffRead v0.12.7⁴⁶ with the parameters “-V -H -U -N -P -J -M -K -Q -Y -Z -F --keep-exon-attrs” was used to extract the assembled transcript sequences. Secondly, LGC (Long Genomic Region Classifier) v1.0⁴⁷ was used to identify transcripts with non-coding features based on the relationship between ORF (open reading frame) length and GC content. In parallel, the assembled transcripts were subjected to CPC2 (Coding Potential Calculator 2)⁴⁸ to calculate their coding potential. The transcripts classified as non-coding by both LGC and CPC2 were taken as the putative lncRNAs. Thirdly, the putative lncRNA transcripts were screened against Rfam v14.3⁴⁹ to eliminate housekeeping RNAs, such as rRNA, tRNA, and snoRNA.
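The selection logic in the second and third steps reduces to set operations on transcript identifiers: take the transcripts called non-coding by both classifiers, then remove any that match a housekeeping RNA family. The following is a minimal sketch with hypothetical transcript IDs; parsing the actual LGC, CPC2, and Rfam outputs is omitted because their file layouts are not described here.

```python
def putative_lncrnas(lgc_noncoding, cpc2_noncoding, rfam_hits):
    """Intersect the two non-coding calls, then drop Rfam matches."""
    candidates = set(lgc_noncoding) & set(cpc2_noncoding)
    return candidates - set(rfam_hits)


# Hypothetical transcript IDs for illustration only.
lgc_calls = {"tx1", "tx2", "tx3", "tx5"}
cpc2_calls = {"tx2", "tx3", "tx4", "tx5"}
rfam_matches = {"tx5"}  # e.g. a transcript matching an rRNA family

lncrnas = putative_lncrnas(lgc_calls, cpc2_calls, rfam_matches)
print(sorted(lncrnas))  # ['tx2', 'tx3']
```

Requiring agreement between two classifiers is the more conservative choice: it trades some sensitivity for fewer coding transcripts misclassified as lncRNAs.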

To annotate the Ya gene family in the B. brassicae genome, the sequence of M. persicae Ya1 (MpYa1) was used as a query. BLASTn⁵⁰ was used to align the MpYa1 sequence against the annotated lncRNA transcripts with an E-value cutoff of less than 10⁻⁵ and an 80% similarity threshold. The final alignment identified 22 Ya genes in B. brassicae.
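The two thresholds above can be applied to BLAST results in a few lines. The following is a minimal sketch, assuming BLAST tabular output with the default twelve columns (percent identity in column 3, E-value in column 11) and assuming that the "80% similarity threshold" refers to percent identity. The hit lines and transcript IDs are hypothetical, not real results.

```python
def filter_blast_hits(tabular_text, max_evalue=1e-5, min_identity=80.0):
    """Return sorted subject IDs passing the E-value and identity cutoffs."""
    kept = set()
    for line in tabular_text.splitlines():
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and comment lines
        fields = line.split("\t")
        subject = fields[1]
        identity = float(fields[2])
        evalue = float(fields[10])
        if evalue < max_evalue and identity >= min_identity:
            kept.add(subject)
    return sorted(kept)


# Hypothetical hits: query, subject, %identity, length, mismatches,
# gap opens, qstart, qend, sstart, send, evalue, bitscore.
rows = [
    ("MpYa1", "lnc_001", "92.5", "300", "20", "2", "1", "300", "10", "309", "1e-80", "400"),
    ("MpYa1", "lnc_002", "75.0", "280", "70", "0", "1", "280", "5", "284", "1e-20", "150"),
    ("MpYa1", "lnc_003", "88.0", "40", "5", "0", "1", "40", "1", "40", "0.002", "30"),
]
toy_output = "\n".join("\t".join(r) for r in rows)

ya_candidates = filter_blast_hits(toy_output)
print(ya_candidates)  # ['lnc_001']
```

In the toy data, the second hit fails the identity cutoff and the third fails the E-value cutoff, so only the first subject is kept.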

Gene family identification

To annotate the detoxification- and chemosensory-related genes of B. brassicae, amino acid sequences of those genes reported in other aphid species were used as queries in DIAMOND v0.8.29⁵¹ to identify putative homologs with an E-value of less than 10⁻⁵. The identified sequences were further validated by domain annotation using PfamScan⁵² and by annotation with the Protein BLAST tool of the National Center for Biotechnology Information (NCBI)⁵³. The sequences of the detoxification-related genes, including cytochrome P450 (P450), carboxyl/choline esterase (CCE), glutathione S-transferase (GST), UDP-glycosyltransferase (UGT), ATP-binding cassette transporter (ABC), and myrosinase (MYR) genes, were downloaded from InsectBase v2.0⁵⁴. The sequences of the chemosensory-related genes, including gustatory receptors (GRs), odorant receptors (ORs), ionotropic receptors (IRs), odorant-binding proteins (OBPs), and chemosensory proteins (CSPs), were obtained from published papers⁵⁵,⁵⁶.

Karyotype analysis

The number of chromosomes was confirmed by karyotype analysis, indicating that diploid B. brassicae has 16 chromosomes (2n = 16). Gurr’s Giemsa R66 chromosome staining method⁵⁷ was used. Briefly, chromosome squash preparations were made from young embryos dissected from parthenogenetic adult aphids. The embryos were treated in 0.75% potassium chloride and then fixed in a freshly prepared mixture of absolute methanol and glacial acetic acid (3:1 by volume) for 10 minutes. Next, the embryos were carefully transferred on a pin tip to a clean microscope slide bearing a small drop of 45% propionic acid (5 minutes), squashed under a coverslip, and dried for 24 hours at room temperature. HCl solution (0.2 M) was applied dropwise for 30 minutes at room temperature, followed by rinsing with distilled water and immersion in a 5% saturated Ba(OH)₂ solution. The sample was then treated in a 60 °C constant-temperature water bath for 3 minutes. After this treatment, it was briefly processed in HCl solution (0.2 M) to stop the reaction with the strong base, rinsed with distilled water, and air-dried at room temperature. Subsequently, the sample was stained with 5% Giemsa stain (pH 7.0) for 30 minutes, air-dried at room temperature, and examined and photographed under an optical microscope.

Data Records

The genome sequencing data (PacBio, Illumina, and Hi-C) of B. brassicae have been submitted to the Sequence Read Archive (SRA) at the NCBI with accession numbers SRR28892727⁵⁸, SRR28892726⁵⁹, and SRR28892725⁶⁰ under BioProject PRJNA1099426. The assembled genome is deposited under the same BioProject at NCBI (JBHUPR000000000.1)⁶¹. The RNA-seq data generated in this study have been deposited in the SRA at the NCBI under BioProject accession number PRJNA1104693. This submission includes nine un-stranded RNA-seq datasets with accession numbers SRR28829023⁶², SRR28829022⁶³, SRR28829021⁶⁴, SRR28829020⁶⁵, SRR28829019⁶⁶, SRR28829018⁶⁷, SRR28829017⁶⁸, SRR28829016⁶⁹, and SRR28829015⁷⁰, and three stranded RNA-seq datasets with accession numbers SRR28892655⁷¹, SRR28892654⁷², and SRR28892653⁷³. The following files have been deposited in the Figshare database (https://doi.org/10.6084/m9.figshare.25583814.v3)⁷⁴: the B. brassicae genome assembly FASTA and GFF files; the annotation GTF files of protein-coding genes; the functional annotation files, including PFAM, KEGG, and GO; the annotation files of transposable elements, lncRNAs, and miRNAs; the annotation files of tRNA, rRNA, and snRNA loci; and the protein sequences of detoxification- and chemosensory-related genes.

Technical Validation

Assessing the validity of gene prediction and annotation

The number of chromosomes was confirmed by karyotype analysis, indicating that diploid B. brassicae has 16 chromosomes (2n = 16). BUSCO v5.7.0⁷⁵ was used to assess completeness against the Hemiptera database. In genome mode, 99.24% complete BUSCOs were identified (97.61% single-copy and 1.63% duplicated), with 0.36% fragmented and 0.40% missing BUSCOs. Similar results were obtained in protein mode: 98.65% complete BUSCOs were identified (96.73% single-copy and 1.91% duplicated), with 0.72% fragmented and 0.64% missing BUSCOs. CEGMA completeness was 95.16% based on 248 ultra-conserved core eukaryotic genes (CEGs).
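The BUSCO categories are related by simple arithmetic: complete equals single-copy plus duplicated, and complete, fragmented, and missing together account for all BUSCOs searched. The following is a minimal consistency check on the percentages reported above; the function name is illustrative. The protein-mode components sum to 98.64%, which agrees with the reported 98.65% to within rounding of the individual values.

```python
def busco_summary(single, duplicated, fragmented, missing):
    """Return (complete %, total %) from the four BUSCO category values."""
    complete = single + duplicated
    total = complete + fragmented + missing
    return round(complete, 2), round(total, 2)


# Percentages as reported in the text.
genome_mode = busco_summary(97.61, 1.63, 0.36, 0.40)
protein_mode = busco_summary(96.73, 1.91, 0.72, 0.64)

print("genome mode:", genome_mode)
print("protein mode:", protein_mode)
```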

Code availability

No specific script was used in this work. All commands and pipelines used in data processing were executed according to the manual and protocols of the corresponding bioinformatic software.

References

Pal, M. & Singh, R. Biology and ecology of the cabbage aphid, Brevicoryne brassicae (Linn.) (Homoptera: Aphididae): a review. J Aphidol. 27, 59–78 (2013).
Hughes, R. H. Population dynamics of the cabbage aphid, Brevicoryne brassicae (L.). J. Anim. Ecol. 32, 393–424 (1963).
The International Aphid Genomics Consortium. Genome sequence of the pea aphid Acyrthosiphon pisum. PLoS Biol. 8, e1000313 (2010).
Mathers, T. C. et al. Rapid transcriptional plasticity of duplicated gene clusters enables a clonally reproducing aphid to colonise diverse plant species. Genome Biol. 18, 27 (2017).
Mathers, T. C. et al. Chromosome-scale genome assemblies of aphids reveal extensively rearranged autosomes and long-term conservation of the X chromosome. Mol. Biol. Evol. 38, 856–875 (2020).
Zhang, S. et al. Chromosome-level genome assemblies of two cotton-melon aphid Aphis gossypii biotypes unveil mechanisms of host adaption. Mol Ecol Resour. 22, 1120–1134 (2022).
Nicholson, S. J. et al. The genome of Diuraphis noxia, a global aphid pest of small grains. BMC Genom. 16, 429 (2015).
Wei, H. Y. et al. Chromosome-level genome assembly for the horned-gall aphid provides insights into interactions between gall-making insect and its host plant. Ecol. Evol. 12, e8815 (2022).
Mathers, T. C. et al. Aphidinae comparative genomics resource (Version v2) [Data set]. Zenodo. (2022).
Li, Y., Park, H., Smith, T. E. & Moran, N. A. Gene family evolution in the pea aphid based on chromosome-level genome assembly. Mol. Biol. Evol. 36, 2143–2156 (2019).
Chen, Y. et al. An aphid RNA transcript migrates systemically within plants and is a virulence factor. Proc. Natl. Acad. Sci. USA. 117, 12763–12771 (2020).
Chen, H., Rangasamy, M., Tan, S. Y., Wang, H. & Siegfried, B. D. Evaluation of five methods for total DNA extraction from western corn rootworm beetles. PLoS One. 5, e11963 (2010).
Ruan, J. & Li, H. Fast and accurate long-read assembly with wtdbg2. Nat Methods. 17, 155–158 (2020).
Walker, B. J. et al. Pilon: an integrated tool for comprehensive microbial variant detection and genome assembly improvement. PLoS One. 9, e112963 (2014).
Li, H. & Durbin, R. Fast and accurate short read alignment with Burrows-Wheeler transform. Bioinformatics. 25, 1754–1760 (2009).
Servant, N. et al. HiC-Pro: an optimized and flexible pipeline for Hi-C data processing. Genome Biol. 16, 259 (2015).
Burton, J. N. et al. Chromosome-scale scaffolding of de novo genome assemblies based on chromatin interactions. Nat. Biotechnol. 31, 1119–1125 (2013).
Kumar, S. et al. Blobology: exploring raw genome data for contaminants, symbionts and parasites using taxon-annotated GC-coverage plots. Front. Genet. 4, 237 (2013).
Laetsch, D. R. & Blaxter, M. L. BlobTools: interrogation of genome assemblies. F1000Research. 6, 1287 (2017).
Flynn, J. M. et al. RepeatModeler2 for automated genomic discovery of transposable element families. Proc. Natl. Acad. Sci. USA. 117, 9451–9457 (2020).
Xu, Z. & Wang, H. LTR_FINDER: an efficient tool for the prediction of full-length LTR retrotransposons. Nucleic Acids Res. 35, W265–W268 (2007).
Price, A. L., Jones, N. C. & Pevzner, P. A. De novo identification of repeat families in large genomes. Bioinformatics. 21, i351–i358 (2005).
Hoede, C. et al. PASTEC: an automatic transposable element classification tool. PLoS One. 9, e91929 (2014).
Wheeler, T. J. et al. Dfam: a database of repetitive DNA based on profile hidden Markov models. Nucleic Acids Res. 41, D70–D82 (2012).
Tempel, S. Using and Understanding RepeatMasker. Methods Mol. Biol. 859, 29–51 (2012).
Kim, D., Paggi, J. M., Park, C., Bennett, C. & Salzberg, S. L. Graph-based genome alignment and genotyping with HISAT2 and HISAT-genotype. Nat. Biotechnol. 37, 907–915 (2019).
Li, H. et al. The Sequence Alignment/Map format and SAMtools. Bioinformatics. 25, 2078–2079 (2009).
Hoff, K. J., Lange, S., Lomsadze, A., Borodovsky, M. & Stanke, M. BRAKER1: unsupervised RNA-seq-based genome annotation with GeneMark-ET and AUGUSTUS. Bioinformatics. 32, 767–769 (2016).
Brůna, T., Hoff, K. J., Lomsadze, A., Stanke, M. & Borodovsky, M. BRAKER2: Automatic eukaryotic genome annotation with GeneMark-EP+ and AUGUSTUS supported by a protein database. NAR Genom Bioinform. 3, lqaa108 (2021).
Emms, D. M. & Kelly, S. OrthoFinder: phylogenetic orthology inference for comparative genomics. Genome Biol. 20, 1–14 (2019).
Byrne, S. et al. Genome sequence of the English grain aphid, Sitobion avenae and its endosymbiont Buchnera aphidicola. G3-Genes Genom Genet. 12, jkab418 (2021).
Zhu, B. et al. A high-quality chromosome-level assembly genome provides insights into wing dimorphism and xenobiotic detoxification in Metopolophium dirhodum (Walker). Res Sq. 1–24 (2022).
Chen, W. B. et al. Genome sequence of the corn leaf aphid (Rhopalosiphum maidis Fitch). Gigascience. 8, giz033 (2019).
Smith, T. E., Li, Y., Perreau, J. & Moran, N. A. Elucidation of host and symbiont contributions to peptidoglycan metabolism based on comparative genomics of eight aphid subfamilies and their Buchnera. PLoS Genet. 18, e1010195 (2022).
Julca, I. et al. Phylogenomics identifies an ancestral burst of gene duplications predating the diversification of aphidomorpha. Mol. Biol. Evol. 37, 730–756 (2019).
Li, Z. et al. Phylloxera and aphids show distinct features of genome evolution despite similar reproductive modes. Mol. Biol. Evol. 40, msad271 (2023).
Darriba, D., Taboada, G. L., Doallo, R. & Posada, D. ProtTest 3: fast selection of best-fit models of protein evolution. Bioinformatics. 27, 1164–1165 (2011).
Stamatakis, A. RAxML version 8: a tool for phylogenetic analysis and post-analysis of large phylogenies. Bioinformatics. 30, 1312–1313 (2014).
Hardy, N. B., Peterson, D. A. & Von Dohlen, C. D. The evolution of life cycle complexity in aphids: Ecological optimization or historical constraint? Evolution. 69, 1423–1432 (2015).
Yang, Z. PAML 4: phylogenetic analysis by maximum likelihood. Mol. Biol. Evol. 24, 1586–1591 (2007).
Letunic, I. & Bork, P. Interactive Tree of Life (iTOL) v6: recent updates to the phylogenetic tree display and annotation tool. Nucleic Acids Res. gkae268 (2024).
Wang, Y. P. et al. MCScanX: a toolkit for detection and evolutionary analysis of gene synteny and collinearity. Nucleic Acids Res. 40 (2012).
Bandi, V. K. SynVisio: a multiscale tool to explore genomic conservation. Thesis. Saskatoon, Saskatchewan, Canada: University of Saskatchewan. (2020).
Chen, S., Zhou, Y., Chen, Y. & Gu, J. fastp: an ultra-fast all-in-one FASTQ preprocessor. Bioinformatics. 34, i884–i890 (2018).
Pertea, M. et al. StringTie enables improved reconstruction of a transcriptome from RNA-seq reads. Nat. Biotechnol. 33, 290–295 (2015).
Pertea, G. & Pertea, M. GFF utilities: GffRead and GffCompare. F1000Res. 9, 304 (2020).
Wang, G. et al. Characterization and identification of long non-coding RNAs based on feature relationship. Bioinformatics. 35, 2949–2956 (2019).
Kang, Y. J. et al. CPC2: A fast and accurate coding potential calculator based on sequence intrinsic features. Nucleic Acids Res. 45, W12–W16 (2017).
Kalvari, I. et al. Rfam 14: expanded coverage of metagenomic, viral and microRNA families. Nucleic Acids Res. 49, D192–D200 (2021).
Altschul, S. F., Gish, W., Miller, W., Myers, E. W. & Lipman, D. J. Basic local alignment search tool. J. Mol. Biol. 215, 403–410 (1990).
Buchfink, B., Xie, C. & Huson, D. H. Fast and sensitive protein alignment using DIAMOND. Nat. Methods. 12, 59–60 (2015).
Paysan-Lafosse, T. et al. InterPro in 2022. Nucleic Acids Res. 51, D418–D427 (2023).
McGinnis, S. & Madden, T. L. BLAST: at the core of a powerful and diverse set of sequence analysis tools. Nucleic Acids Res. 32, W20–W25 (2004).
Yin, C. et al. InsectBase: a resource for insect genomes and transcriptomes. Nucleic Acids Res. 44, D801–D807 (2016).
Robertson, H. M., Robertson, E. C. N., Walden, K. K. O., Enders, L. S. & Miller, N. J. The chemoreceptors and odorant binding proteins of the soybean and pea aphids. Insect Biochem. Mol. Biol. 105, 69–78 (2019).
Kuang, Y. et al. Candidate odorant-binding protein and chemosensory protein genes in the turnip aphid Lipaphis erysimi. Arch. Insect Biochem. Physiol. 113, e22022 (2023).
Blackman, R. L. Chromosome numbers in the Aphididae and their taxonomic significance. Syst. Entomol. 5, 7–25 (1980).
NCBI Sequence Read Archive https://identifiers.org/ncbi/insdc.sra:SRR28892727 (2024).
NCBI Sequence Read Archive https://identifiers.org/ncbi/insdc.sra:SRR28892726 (2024).
NCBI Sequence Read Archive https://identifiers.org/ncbi/insdc.sra:SRR28892725 (2024).
NCBI GenBank https://identifiers.org/ncbi/insdc:JBHUPR000000000.1 (2024).
NCBI Sequence Read Archive https://identifiers.org/ncbi/insdc.sra:SRR28829023 (2024).
NCBI Sequence Read Archive https://identifiers.org/ncbi/insdc.sra:SRR28829022 (2024).
NCBI Sequence Read Archive https://identifiers.org/ncbi/insdc.sra:SRR28829021 (2024).
NCBI Sequence Read Archive https://identifiers.org/ncbi/insdc.sra:SRR28829020 (2024).
NCBI Sequence Read Archive https://identifiers.org/ncbi/insdc.sra:SRR28829019 (2024).
NCBI Sequence Read Archive https://identifiers.org/ncbi/insdc.sra:SRR28829018 (2024).
NCBI Sequence Read Archive https://identifiers.org/ncbi/insdc.sra:SRR28829017 (2024).
NCBI Sequence Read Archive https://identifiers.org/ncbi/insdc.sra:SRR28829016 (2024).
NCBI Sequence Read Archive https://identifiers.org/ncbi/insdc.sra:SRR28829015 (2024).
NCBI Sequence Read Archive https://identifiers.org/ncbi/insdc.sra:SRR28892655 (2024).
NCBI Sequence Read Archive https://identifiers.org/ncbi/insdc.sra:SRR28892654 (2024).
NCBI Sequence Read Archive https://identifiers.org/ncbi/insdc.sra:SRR28892653 (2024).
Chen, Y. Annotated reference genome of Brevicoryne brassicae. figshare https://doi.org/10.6084/m9.figshare.25583814.v3 (2024).
Simao, F. A., Waterhouse, R. M., Ioannidis, P., Kriventseva, E. V. & Zdobnov, E. M. BUSCO: assessing genome assembly and annotation completeness with single-copy orthologs. Bioinformatics. 31, 3210–3212 (2015).

Acknowledgements

This project was funded by the National Key Research and Development Program of China (project No. 2023YFF1000703) and supported by Hubei Hongshan Laboratory (project No. 2022hszd026 to YC), the Startup Foundation for Advanced Talents at HZAU (to YC), the First Class Discipline Construction Funds of College of Plant Science and Technology, Huazhong Agricultural University (project No. 2022ZKPY003 to YC), and the Wuhan Yingcai Talent Program (to YC).

Author information

These authors contributed equally: Jun Wu, Guomeng Li.

Authors and Affiliations

Hubei Hongshan Laboratory, Wuhan, 430070, China
Jun Wu, Guomeng Li, Zhimou Lin, Yangzhi Zhang, Wenyuan Yu, Rong Hu & Yazhou Chen
Hubei Insect Resources Utilization and Sustainable Pest Management Key Laboratory, College of Plant Science and Technology, Huazhong Agricultural University, Wuhan, 430070, China
Jun Wu, Guomeng Li, Zhimou Lin, Yangzhi Zhang, Wenyuan Yu, Rong Hu & Yazhou Chen
CAS Key Laboratory of Insect Developmental and Evolutionary Biology, Center for Excellence in Molecular Plant Sciences, Chinese Academy of Sciences, Shanghai, China
Shuai Zhan
CAS Center for Excellence in Biotic Interactions, University of Chinese Academy of Sciences, Beijing, China
Shuai Zhan

Authors

Jun Wu
Guomeng Li
Zhimou Lin
Yangzhi Zhang
Wenyuan Yu
Rong Hu
Shuai Zhan
Yazhou Chen

Contributions

Y.C. conceived and led the research. Y.C., G.L., J.W., Z.L. and Y.Z. were involved in sample collection, preparation and genome assembly. S.Z., Y.C., J.W. and G.L. contributed to gene prediction, annotation and data analysis. W.Y. conducted the karyotype analysis. Y.C. and R.H. contributed to data management. Y.C. and J.W. wrote the manuscript, and all authors read, revised and approved the final version. Y.C. supervised the project.

Corresponding authors

Correspondence to Shuai Zhan or Yazhou Chen.

Ethics declarations

Competing interests

The authors declare no competing interests.

Additional information

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License, which permits any non-commercial use, sharing, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if you modified the licensed material. You do not have permission under this licence to share adapted material derived from this article or parts of it. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by-nc-nd/4.0/.

About this article

Cite this article

Wu, J., Li, G., Lin, Z. et al. A chromosome-level genome assembly of the cabbage aphid Brevicoryne brassicae. Sci Data 12, 167 (2025). https://doi.org/10.1038/s41597-025-04501-2

Received: 08 May 2024
Accepted: 20 January 2025
Published: 28 January 2025
Version of record: 28 January 2025
DOI: https://doi.org/10.1038/s41597-025-04501-2
