Background & Summary

Cauliflower (Brassica oleracea var. botrytis L.) is a cultivated variety of B. oleracea in the family Brassicaceae that was domesticated from wild cabbage through selective breeding1,2,3. In recent years, cauliflower production has faced increasing challenges from multiple diseases, including black rot (Xanthomonas campestris pv. campestris), downy mildew (Hyaloperonospora parasitica), black spot (Alternaria brassicicola), soft rot (Pectobacterium spp.), clubroot (Plasmodiophora brassicae), and various viral infections, all of which adversely affect both yield and quality4,5,6. There is therefore a critical need to enrich cauliflower germplasm diversity and develop cultivars with enhanced resistance to pests and diseases. However, prolonged artificial selection in breeding programs has progressively reduced the genetic diversity of resistance genes in current breeding lines, limiting the development of highly resistant cauliflower cultivars3. Identifying disease-resistant materials and their associated resistance genes is thus essential for breeding cauliflower varieties with enhanced resistance7. Among plant defense mechanisms, the leaf cuticular wax layer serves as a key physical barrier against pathogen entry8,9, and its association with disease resistance has been reported in maize10, tomato11, cassava12, and rapeseed13. In addition, plants with higher leaf wax content exhibit greater drought resistance than those with lower wax accumulation14,15,16. Leaf wax is composed primarily of fatty acids, alcohols, ketones, and esters, and its biosynthesis is regulated by complex metabolic pathways17,18. Modulating the expression of wax biosynthetic genes can enhance cuticular wax deposition on the leaf surface, thereby strengthening the plant’s physical barrier against environmental stresses.

To date, most genomic studies on cuticular wax biosynthesis have focused on Arabidopsis thaliana, where over 190 genes involved in wax biosynthesis and transport have been identified18,19,20, and AtCER1 and AtCER4 have been well characterized as key genes involved in wax biosynthesis in leaves21,22,23,24. In Brassica rapa, the CER1 gene has been implicated in the regulation of cuticular wax biosynthesis25. In Brassica napus, another member of the Brassicaceae family, the wax-related gene BnWIN2CO1 shows significantly different expression between wild-type and wax-deficient mutant plants26. This gene participates in several signaling pathways, including those associated with biotic and abiotic stress responses and abscisic acid biosynthesis27,28,29. However, studies on wax-related regulatory genes in cauliflower have not yet been reported.

Long non-coding RNAs (lncRNAs) are transcripts longer than 200 bp that lack a long open reading frame (ORF) and have no protein-coding potential. In plants, lncRNAs serve as key regulators of gene expression and participate in diverse biological processes such as growth, development, and responses to environmental stress30,31. They exert their regulatory functions through multiple mechanisms, including transcriptional and post-transcriptional regulation and epigenetic modifications involving DNA methylation, antisense transcription and histone modification32,33. In cauliflower, lncRNAs play regulatory roles in transcription and may also participate in post-transcriptional gene regulation through interactions with miRNAs34. However, the regulatory role of lncRNAs in wax deficiency in cauliflower has not yet been reported. In this study, a wax-deficient mutant and its wild-type sister line were selected from cauliflower germplasm during breeding, and their leaf transcriptomes were sequenced. The resulting dataset supports the mapping and cloning of wax-related genes, lays a foundation for elucidating the disease resistance mechanisms associated with cauliflower wax, and offers a resource for future research on genetic mechanisms, genetic improvement and breeding strategies.

Methods

Sample collection

Two cauliflower lines were obtained from the Institute of Tropical Eco-agriculture, Yunnan Academy of Agricultural Sciences, Yuanmou, Yunnan, China. The wax-deficient mutant (12-2-1, WL) and the wild type (12-2-2, YL) are sister lines (Fig. 1). All seedlings were grown in a greenhouse under a 14 h light/10 h dark cycle at 25 °C during the light period and 20 °C during the dark period. Leaf tissues were collected from 30-day-old seedlings, with three biological replicates per line, immediately frozen in liquid nitrogen, and stored at −80 °C.

Fig. 1

Wild type (YL) and mutant type (WL) cauliflower plants. White bars represent 5 cm.

RNA extraction and library preparation

Total RNA was extracted using TRIzol Reagent (Invitrogen), and its concentration, purity, and integrity were assessed with a NanoDrop spectrophotometer (Thermo Scientific). Three micrograms of high-quality RNA were used for library preparation. mRNA was enriched using poly-T oligo-attached magnetic beads and fragmented in Illumina proprietary buffer at elevated temperature. First-strand cDNA was synthesized with random primers and SuperScript II, followed by second-strand synthesis using DNA Polymerase I and RNase H. After end repair and 3′ adenylation, Illumina PE adapters were ligated. Fragments of 400–500 bp were selected using the AMPure XP system (Beckman Coulter, Beverly, CA, USA), and the adapter-ligated DNA was enriched by 15 cycles of PCR. Final libraries were purified, quantified using an Agilent Bioanalyzer 2100 (Fig. S1), and sequenced on the Illumina NovaSeq 4000 platform (150 bp paired-end reads) by Panomix Biomedical Tech Co., Ltd. (Suzhou, China).

Quality control

Raw data were subjected to quality control and filtering using fastp (version 0.24.0)35. The first 13 bases from the 5′ end of both forward and reverse reads were removed, low-quality bases (Q < 20) were trimmed, and reads shorter than 15 bp were discarded under the default settings. The quality of the trimmed and filtered reads was then reassessed with FastQC (version 0.11.9; https://www.bioinformatics.babraham.ac.uk/projects/fastqc/) (Fig. S2). All downstream analyses were based on the high-quality clean reads.
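The three filtering rules described above can be illustrated with a minimal Python sketch. It is a toy re-statement of the rules, not fastp's actual algorithm: the function name trim_read is hypothetical, and trimming low-quality bases only from the 3′ end is an assumption, since the text does not state how the Q < 20 bases were removed.

```python
from typing import List, Optional, Tuple


def trim_read(
    seq: str,
    quals: List[int],
    head_clip: int = 13,   # bases removed from the 5' end (from the text)
    min_qual: int = 20,    # quality threshold (from the text)
    min_len: int = 15,     # minimum read length kept (from the text)
) -> Optional[Tuple[str, List[int]]]:
    """Toy filter: clip the 5' end, trim low-quality 3' bases, enforce a minimum length."""
    # Rule 1: drop the first `head_clip` bases.
    seq, quals = seq[head_clip:], quals[head_clip:]

    # Rule 2 (assumed 3'-end trimming): shorten the read while its last base is below Q20.
    end = len(seq)
    while end > 0 and quals[end - 1] < min_qual:
        end -= 1
    seq, quals = seq[:end], quals[:end]

    # Rule 3: discard reads that are now too short.
    if len(seq) < min_len:
        return None
    return seq, quals
```

A read returns None when it is discarded; otherwise the trimmed sequence and its Phred scores are returned together so that they stay aligned.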

Transcriptome analysis workflow

The clean reads were aligned to the reference genome (https://www.ncbi.nlm.nih.gov/datasets/genome/GCF_000695525.1/) using HISAT2 (version 2.2.1)36. The SAM files output by HISAT2 were then sorted and converted to BAM files using SAMtools (version 1.21). Finally, gene read counts were calculated using featureCounts (version 2.1.0)37, and gene expression levels (FPKM, Fragments Per Kilobase of transcript per Million mapped reads) were calculated using Perl scripts. Genes were retained if they had FPKM > 0.5 in at least two replicates of a sample.
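As an illustration of the normalization and filtering step, the following Python sketch applies the standard FPKM definition and the retention rule. The original calculation was done with Perl scripts that are not shown here, so the function names are hypothetical. Reading "at least two replicates of a sample" as "at least two of the replicates of WL or of YL" is an assumption.

```python
from typing import Dict, List


def fpkm(count: float, gene_length_bp: int, total_mapped_fragments: int) -> float:
    """FPKM = fragments * 1e9 / (gene length in bp * total mapped fragments)."""
    return count * 1e9 / (gene_length_bp * total_mapped_fragments)


def is_expressed(
    fpkm_by_line: Dict[str, List[float]],
    threshold: float = 0.5,
    min_replicates: int = 2,
) -> bool:
    """Keep a gene if, in any one line, enough replicates exceed the FPKM threshold."""
    return any(
        sum(value > threshold for value in replicates) >= min_replicates
        for replicates in fpkm_by_line.values()
    )


# Example: 500 fragments on a 2,000 bp gene in a library of 25 million mapped fragments.
example_fpkm = fpkm(500, 2000, 25_000_000)
```

Under this reading, a gene with FPKM values of 0.6, 0.7 and 0.1 in the three WL replicates is retained even if it is silent in YL, whereas a gene that exceeds 0.5 in only one replicate of each line is dropped.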

A total of 43.13 Gb of raw sequencing data was generated, from which more than 42.38 Gb of high-quality clean reads were retained after quality filtering, corresponding to approximately 285 million reads. The filtered reads exhibited quality scores of over 98% at Q20 and over 95% at Q30, with GC content ranging from 46.56% to 47.41% (Table 1).

Table 1 Summary of total reads, total bases, Q20, Q30, GC content and mapped rate of all RNA-seq samples.

Bioinformatics pipeline for lncRNA identification

To identify lncRNA transcripts, novel transcripts were assembled and merged (-F 0 -T 0) using StringTie (version 2.2.3)38 and GffCompare (version 0.12.9)39. Multiple filtering criteria were then applied to retain accurate lncRNA loci. First, novel transcripts with class codes “i”, “u”, “p” or “x” were selected as candidates. Second, transcripts shorter than 200 bp were excluded. Third, the protein-coding potential of the candidate transcripts was assessed with six approaches: CPC2 (score < 0.5)40, CNCI (score < 0)41, LGC42, ORF length predicted by OrfPredictor (ORF < 300 bp)43, and BLAST searches against the Pfam (https://ftp.ebi.ac.uk/pub/databases/Pfam/)44 and SwissProt (https://ftp.ebi.ac.uk/pub/databases/uniprot)45 protein databases. Finally, expression levels were considered, and lncRNAs with FPKM > 0.5 in at least two samples were retained.
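The filtering criteria above can be summarized as a single predicate. The Python sketch below is illustrative only: the Transcript record and its field names are hypothetical, the LGC result is reduced to a boolean because no LGC threshold is given in the text, and a database "hit" is treated as a simple yes/no because the BLAST cut-offs are not stated.

```python
from dataclasses import dataclass
from typing import List

LNCRNA_CLASS_CODES = {"i", "u", "p", "x"}  # class codes named in the text


@dataclass
class Transcript:
    class_code: str         # GffCompare class code
    length: int             # transcript length in bp
    cpc2_score: float       # CPC2 coding probability
    cnci_score: float       # CNCI score
    lgc_noncoding: bool     # assumed: True if LGC calls the transcript non-coding
    orf_length: int         # longest ORF from OrfPredictor, in bp
    has_protein_hit: bool   # assumed: True if any Pfam or SwissProt hit was found
    fpkm: List[float]       # FPKM in each sequenced sample


def is_candidate_lncrna(t: Transcript) -> bool:
    """Apply every filter in turn; a transcript must pass all of them."""
    return (
        t.class_code in LNCRNA_CLASS_CODES
        and t.length >= 200                        # shorter transcripts are excluded
        and t.cpc2_score < 0.5
        and t.cnci_score < 0
        and t.lgc_noncoding
        and t.orf_length < 300
        and not t.has_protein_hit
        and sum(v > 0.5 for v in t.fpkm) >= 2      # expressed in at least two samples
    )
```

Requiring all tools to agree, rather than taking a majority vote, matches the intersection shown in Fig. 2 and keeps the final set conservative.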

A total of 1,092 lncRNAs were identified using CPC2, CNCI, LGC, and OrfPredictor, in combination with homology searches against the Pfam and SwissProt protein databases (Figs. 2 and 3a). lncRNAs were shorter than mRNAs (Fig. 3b): the average lncRNA transcript length was 1,112.65 bp (median: 445 bp), whereas the average mRNA length was 1,687.56 bp (median: 1,493 bp). In addition, lncRNAs contained fewer exons than mRNAs, with a maximum of 6 exons (Fig. 3c).

Fig. 2

Number of lncRNA identified by different software tools.

Fig. 3

Characterization of lncRNAs. (a) Comparison of lncRNA and mRNA features. From outside to inside: mRNAs, the 1,092 lncRNAs, lncRNAs of WL, lncRNAs of YL and differentially expressed lncRNAs (DELs). (b) Comparison of mRNA and lncRNA transcript lengths. (c) Number of exons in mRNAs and lncRNAs.

Data Records

The raw sequencing data have been deposited in NCBI under the BioProject accession number PRJNA1188904 (ref. 46), with Sequence Read Archive (SRA) accessions SRR31445557, SRR31445558, SRR31445559, SRR31445560, SRR31445561 and SRR31445562. In the NCBI database, the samples are labeled MT and WT. In this study, WL corresponds to the MT samples (WL1, WL2 and WL3 correspond to MT1, MT2 and MT3, respectively), and YL corresponds to the WT samples (YL1, YL2 and YL3 correspond to WT1, WT2 and WT3, respectively).
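When working with the deposited files, the naming correspondence can be applied programmatically. The helper below is a minimal Python sketch with a hypothetical function name. It covers only the MT/WT to WL/YL renaming stated above, and does not assign SRA run accessions to individual samples, because that assignment is not given in the text.

```python
import re

# NCBI prefix -> prefix used in this study (from the text).
_PREFIX_MAP = {"MT": "WL", "WT": "YL"}


def to_study_name(ncbi_name: str) -> str:
    """Translate an NCBI sample label such as 'MT1' into the study label 'WL1'."""
    match = re.fullmatch(r"(MT|WT)([1-3])", ncbi_name)
    if match is None:
        raise ValueError(f"Unexpected sample name: {ncbi_name!r}")
    prefix, replicate = match.groups()
    return _PREFIX_MAP[prefix] + replicate


# All six samples, renamed for use alongside the figures and tables.
study_names = {name: to_study_name(name) for name in
               ["MT1", "MT2", "MT3", "WT1", "WT2", "WT3"]}
```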

Technical Validation

Biological replicates

In this study, to improve the reliability of the results, samples were collected in three biological replicates.

Assessment of the quality of mRNA and lncRNA

In this study, a transcriptome assembly of cauliflower was performed. In total, 24,529 expressed genes were identified and included in subsequent analyses (Fig. 4 and Table S1). To assess the reliability and reproducibility of the transcriptome data, correlation analysis and principal component analysis (PCA) were conducted on the expression profiles of all detected genes (Fig. 5a,b). The WL and YL samples were clearly separated on the basis of their expressed genes. Correlation analysis and PCA were also performed on the 1,092 expressed lncRNAs (Table S2). The results demonstrated high reproducibility among the samples (Fig. 5c), with PC1 and PC2 accounting for 28.2% and 21.4% of the total variation, respectively (Fig. 5d). In summary, the expression profiles of both mRNAs and lncRNAs differed clearly between the WL and YL samples.
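A sample-to-sample correlation check of this kind can be sketched in a few lines of Python. The example assumes Pearson correlation on log2(FPKM + 1) values. Neither the correlation coefficient nor the transformation is specified in the text, and the function names are hypothetical.

```python
import math
from typing import Dict, List, Tuple


def pearson(x: List[float], y: List[float]) -> float:
    """Pearson correlation coefficient of two equal-length vectors."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)  # undefined if either sample has zero variance


def correlation_matrix(expr: Dict[str, List[float]]) -> Dict[Tuple[str, str], float]:
    """Pairwise correlations between samples; `expr` maps sample name -> FPKM per gene."""
    # Assumed transformation: log2(FPKM + 1) to dampen the effect of highly expressed genes.
    logged = {s: [math.log2(v + 1) for v in values] for s, values in expr.items()}
    return {(a, b): pearson(logged[a], logged[b]) for a in logged for b in logged}
```

High coefficients among replicates of the same line, and lower coefficients between WL and YL, would be the pattern expected from the heatmaps in Fig. 5a,c.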

Fig. 4

Expression levels of the 24,529 mRNAs across samples.

Fig. 5

Principal component analysis (PCA) and correlation analysis. (a) Correlation heatmap of mRNA. (b) PCA analysis of mRNA in WL and YL samples. (c) Correlation heatmap of lncRNA. (d) PCA analysis of lncRNA in WL and YL samples.

Functional annotation and enrichment analysis

Further information on the differential expression analysis, functional annotation and enrichment analysis of mRNAs and lncRNAs, including detailed methods, results and datasets, has been deposited in the Figshare database (https://doi.org/10.6084/m9.figshare.27727356)47.

Usage Notes

This dataset46 was generated from a wax-deficient cauliflower mutant and its wild-type sister line. Under natural growth conditions, wild-type cauliflower leaves (YL, 12-2-2) have a wax layer, whereas the wax-deficient line (WL, 12-2-1) is a naturally occurring mutant identified during breeding selection without any artificial intervention. The wax layer plays a crucial role in cauliflower growth and development, particularly in plant immunity and defense. This dataset can therefore be used to investigate the relationship between wax deficiency and plant resistance in cauliflower breeding material.