Abstract
Maize leaf and sheath spot disease, caused by Epicoccum sorghinum, is an emerging disease of maize in China. To investigate its molecular pathogenesis, we sequenced the genome and infection transcriptomes of E. sorghinum strain NJC07. The genome was sequenced on the Oxford Nanopore GridION and Illumina NovaSeq 6000 platforms. This produced a near-complete, gapless nuclear genome assembly of 32.69 Mb at 285.20-fold depth, comprising 22 contigs (including 12 full-length chromosomes) with a contig N50 of 1.66 Mb (N50 contig number = 6), together with a complete mitochondrial genome assembly of 61.24 kb. The nuclear genome contains 11,779 protein-coding genes, including genes predicted to encode potential virulence/pathogenicity factors such as effectors and carbohydrate-active enzymes. Temporal RNA-Seq analysis revealed that 4,058 of the 11,779 genes were induced during maize infection, and a subset of these may be involved in fungal invasion and colonization of maize plants. Together, the genomic and transcriptomic data generated in this study provide a foundation for functional analysis of virulence and pathogenicity factors and for understanding the molecular mechanisms underlying E. sorghinum pathogenesis on maize.
Background and Summary
Maize (Zea mays L.) is one of the most important staple food crops, along with rice and wheat. Its annual production is 1.16 billion metric tons, and it ranks second only to wheat in cultivated area. Maize provides about 20% of the daily caloric intake of people in Africa and Mesoamerica. Besides being a primary food crop, it is used as animal feed and as a source of biofuel and biodegradable plastics1.
Fungal diseases are among the factors that threaten maize production worldwide, accounting for estimated yield losses of 10.6% in North America2 and 12% in Asia2,3,4. Recently, Chen et al. reported Epicoccum sorghinum Sacc. (syn. Phoma sorghina) causing maize leaf and sheath spot disease in China, the second-largest maize producer5. E. sorghinum is an ascomycete fungus in the sorghum grain-mold complex6 that causes significant economic losses by reducing crop yield, seed viability and kernel weight7. The fungus produces the mycotoxin tenuazonic acid, which renders sorghum grains unfit for use as food or feed8,9. The pathogen appears to employ a necrotrophic infection strategy, colonizing maize plants through wounds and causing leaf and sheath spot disease. It has a broad host range: in the last five years, E. sorghinum has been reported to cause leaf spot disease on several economically important crops in China, including rice (Oryza sativa), wheat (Triticum aestivum) and Chinese flowering cabbage (Brassica rapa ssp. parachinensis)10,11,12.
To unravel the molecular pathogenesis of the maize-infecting E. sorghinum strain NJC07, we sequenced its genome on the Oxford Nanopore GridION and Illumina NovaSeq 6000 platforms, generating 908,875 long reads (6,493,948,057 bp in total) and 23,800,000 paired-end 150 bp reads (3,570,000,000 bp in total), respectively. The Illumina paired-end reads contain 3,104,162,667 k-mers of 17 bp with an average k-mer depth of 88, giving an estimated genome size of 35,276,412 bp. Based on this estimate, the Oxford Nanopore and Illumina reads provide 184.10- and 101.20-fold genome coverage, respectively. The Oxford Nanopore reads were assembled into contigs, which were then corrected for sequencing errors using the Illumina paired-end reads, producing a 32,694,839 bp nuclear genome distributed among 22 contiguous sequences (contigs), with an N50 contig length of 1,661,624 bp and an N90 contig length of 1,010,772 bp. The assembly represents 93.08% of the estimated NJC07 genome size (Table 1). Notably, the assembly contains 12 telomere-to-telomere contigs (contig 4 [2,552,342 bp], contig 6 [2,123,232 bp], contig 9 [1,661,624 bp], contig 11 [1,559,590 bp], contig 12 [1,324,888 bp], contig 14 [1,357,136 bp], contig 15 [1,151,000 bp], contig 16 [1,058,833 bp], contig 17 [1,066,072 bp], contig 18 [1,094,033 bp], contig 19 [1,010,772 bp] and contig 20 [936,509 bp]), each carrying telomeric repeats ([CCCTAA]n or [TTAGGG]n) at its 5’ and 3’ ends, which indicates that they are full-length chromosomes (Fig. 1A). In addition, contig 23 (61,235 bp) represents the full-length mitochondrial genome of NJC07.
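The N50/N90 statistics and the telomere-to-telomere classification above can be illustrated with a short Python sketch. The contig lengths in the usage are toy values, and the telomere test is a hypothetical heuristic (a minimum number of CCCTAA copies in the first window and TTAGGG copies in the last window); the window size and copy threshold are assumptions, not the criteria used for this assembly.

```python
from typing import List, Tuple


def nx(lengths: List[int], x: float) -> Tuple[int, int]:
    """Return (Nx length, number of contigs needed), contigs sorted longest first.

    Nx is the length of the contig at which the running total first reaches
    x% of the assembly size, e.g. x = 50 for N50 and x = 90 for N90.
    """
    ordered = sorted(lengths, reverse=True)
    threshold = sum(ordered) * x / 100.0
    running = 0
    for count, length in enumerate(ordered, start=1):
        running += length
        if running >= threshold:
            return length, count
    raise ValueError("no contigs supplied")


def has_telomeres(seq: str, window: int = 200, min_copies: int = 4) -> bool:
    """Hypothetical heuristic: treat a contig as telomere-capped when its first
    `window` bases hold >= min_copies CCCTAA repeats and its last `window`
    bases hold >= min_copies TTAGGG repeats."""
    seq = seq.upper()
    head, tail = seq[:window], seq[-window:]
    return head.count("CCCTAA") >= min_copies and tail.count("TTAGGG") >= min_copies


# Toy usage: five contigs totalling 30 units.
print(nx([10, 8, 6, 4, 2], 50))  # (8, 2)
print(has_telomeres("CCCTAA" * 5 + "ACGT" * 100 + "TTAGGG" * 5))  # True
```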
The mitochondrial genome was annotated using MFannot (Université de Montréal) with parameters for yeast mitochondrial genes, revealing a total of 83 genes, including 24 conserved protein-coding genes (two copies each of atp6, cob, cox1, cox2, cox3, nad1, nad2, nad3, nad4, nad4L, nad5 and nad6), 4 structural RNA genes (the rRNA genes rnl and rns, and two copies of rnpB, which encodes the RNA subunit of RNase P) and 51 tRNAs (Fig. 1B, Supplementary Table 1).
The Epicoccum sorghinum strain NJC07 genome. (A) Circos plot of the nuclear genome (32,694,839 bp), with tracks from the outermost (A) to the innermost (H) representing genomic and transcriptomic features: contigs, GC content, gene content, effectors, carbohydrate-active enzymes (CAZymes), biosynthetic gene clusters, differentially expressed genes (DEGs) at 48 hours post-inoculation (hpi) and DEGs at 96 hpi. Contigs marked with asterisks are full-length chromosomes sequenced from telomere to telomere. Effectors are denoted by red dots (≤3% cysteine residues) and blue dots (cysteine-rich effectors, >3% cysteine residues). CAZymes are represented by red (cellulolytic), brown (hemicellulolytic), purple (ligninolytic), blue (pectinolytic), teal (starch-degrading) and green (inulin-degrading) dots. In the DEG tracks at 48 and 96 hpi, upregulated and downregulated genes are shown as red and green dots, respectively, and non-DEGs as grey dots. (B) Circos plot of the mitochondrial genome (61,235 bp), with tracks from the outermost to the innermost showing contig 23, GC content and gene content.
The assembled nuclear genome of NJC07 (32.69 Mb) is comparable in size to those of the sorghum-infecting strain USPMTOX48 (391 contigs; 33.4 Mb)13 and the sugarcane-infecting strain BS2-1 (22 contigs; 33.12 Mb)14 of E. sorghinum. To assess genetic diversity within E. sorghinum, we performed genome similarity and synteny analyses of NJC07, USPMTOX48 and BS2-1, which infect distinct hosts. We generated a whole-genome alignment of the three strains using CLC Genomics Workbench 20.0.4, with 50 bp seeds extended using the HOXD scoring matrix15 and a minimum alignment block length of 100 bp. Pairwise distance matrices were computed from the alignment blocks and expressed as alignment percentage (AP) and average nucleotide identity (ANI). NJC07 shares 99.04% of its genome (AP) with BS2-1 at 93.89% nucleotide identity (ANI), and 91.48% of its genome (AP) with USPMTOX48 at 82.08% nucleotide identity (ANI) (Fig. 2A). Whole-genome synteny analysis revealed that although the genomes have a similar architecture, they display a few genomic rearrangements (Fig. 2B). To map these rearrangements, we constructed pairwise whole-genome dot plots. The NJC07/BS2-1 dot plot shows that the two genomes are collinear across their lengths; however, two genomic rearrangements were observed. NJC07 contig 1 (5,848,000 bp) aligns with BS2-1 contigs 16 (2,497,106 bp), 19 (2,072,022 bp) and 18 (1,920,476 bp), each of which carries 5’ telomeric repeats, suggesting that the unusually large contig 1 contains at least three chromosome sequences (Fig. 2C).
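The two summary metrics can be made concrete with a minimal sketch. It assumes, hypothetically, that each alignment block is summarized by its aligned length and its number of identical positions; AP is then the aligned fraction of the query genome and ANI the identity pooled over the retained blocks. CLC Genomics Workbench computes these values internally, so this only illustrates the definitions given in the Fig. 2 legend, with the ≥100-bp block filter taken from the text.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class AlignmentBlock:
    length: int      # aligned positions in the block (hypothetical summary field)
    identical: int   # positions with identical nucleotides


def _kept(blocks: List[AlignmentBlock], min_block: int) -> List[AlignmentBlock]:
    # Discard blocks shorter than the minimum block length (100 bp in the text).
    return [b for b in blocks if b.length >= min_block]


def alignment_percentage(blocks: List[AlignmentBlock], genome_length: int,
                         min_block: int = 100) -> float:
    """Percentage of the query genome covered by retained alignment blocks."""
    aligned = sum(b.length for b in _kept(blocks, min_block))
    return 100.0 * aligned / genome_length


def average_nucleotide_identity(blocks: List[AlignmentBlock], min_block: int = 100) -> float:
    """Percentage of identical positions, pooled over retained blocks."""
    kept = _kept(blocks, min_block)
    aligned = sum(b.length for b in kept)
    if aligned == 0:
        return 0.0
    return 100.0 * sum(b.identical for b in kept) / aligned


blocks = [AlignmentBlock(600, 570), AlignmentBlock(300, 300), AlignmentBlock(50, 10)]
print(alignment_percentage(blocks, genome_length=1000))  # 90.0
print(round(average_nucleotide_identity(blocks), 2))      # 96.67
```

Pooling identities by block length, rather than averaging per-block percentages, keeps short blocks from dominating the ANI value.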
An inversion was also detected between the 3’ end of NJC07 contig 18 (a full-length chromosome; 1,094,033 bp) and the full-length BS2-1 contig 7 (473,309 bp) (Fig. 2C). This inversion may have arisen from a recombination event involving repetitive DNA elements or transposons at the 3’ end; such rearrangements are a potential source of genomic plasticity that could contribute to the adaptive evolution of these strains. The NJC07/USPMTOX48 dot plot shows that the two genomes are collinear across their lengths, except for several small indels and inversions (Fig. 2D).
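How a dot plot separates collinear regions from inversions can be sketched with shared k-mers: forward matches fall on the diagonal, whereas matches to the reverse complement fall on an anti-diagonal. This is a generic exact-k-mer illustration, not the alignment method used to draw Fig. 2C,D; the k-mer size and the random test sequences are assumptions.

```python
import random
from typing import Dict, List, Tuple

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq: str) -> str:
    return seq.translate(COMPLEMENT)[::-1]


def kmer_index(seq: str, k: int) -> Dict[str, List[int]]:
    """Map every k-mer of `seq` to the list of positions where it starts."""
    index: Dict[str, List[int]] = {}
    for i in range(len(seq) - k + 1):
        index.setdefault(seq[i:i + k], []).append(i)
    return index


def dot_plot_hits(x: str, y: str, k: int = 12) -> List[Tuple[int, int, str]]:
    """Return (x_pos, y_pos, strand) for every shared k-mer.

    '+' hits lie on diagonals (collinear); '-' hits come from reverse-complement
    matches and lie on anti-diagonals, the signature of an inversion.
    """
    index = kmer_index(x, k)
    hits = []
    for j in range(len(y) - k + 1):
        word = y[j:j + k]
        hits.extend((i, j, "+") for i in index.get(word, []))
        hits.extend((i, j, "-") for i in index.get(revcomp(word), []))
    return hits


# Toy genomes: y carries segment b inverted relative to x.
rng = random.Random(42)
a, b, c = ("".join(rng.choice("ACGT") for _ in range(n)) for n in (200, 100, 200))
hits = dot_plot_hits(a + b + c, a + revcomp(b) + c, k=12)
inverted = [(i, j) for i, j, strand in hits if strand == "-" and 200 <= i < 300]
print(len(inverted) >= 89)  # True: the inverted segment shows up on the '-' strand
```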
Comparative genome analysis of the Epicoccum sorghinum strains infecting maize (NJC07), sorghum (USPMTOX48) and sugarcane (BS2-1). (A) Average nucleotide identity (ANI; the average percentage of identical nucleotides within the aligned regions of two genomes; above the diagonal) and alignment percentage (AP; the average percentage of the two genomes covered by aligned regions; below the diagonal). (B) Synteny analysis of the three genomes. Colored regions within the contigs indicate local synteny blocks, and lines of matching colors connect synteny blocks across the three genomes. (C) Whole-genome dot plot showing collinearity between the NJC07 and BS2-1 genomes. The blue diagonal line indicates collinearity, and interruptions along the line represent genomic rearrangements such as inversions. (D) Whole-genome dot plot showing collinearity between the NJC07 and USPMTOX48 genomes.
The NJC07 genome has a GC content of 52.04% and a relatively low repeat content (689,318 bp, or 2.10% of the genome), including tandem repeats (SSRs) and interspersed repeats (transposons, SINEs and LINEs) (Supplementary Table 2). Genome annotation, i.e., mapping coding and non-coding genes, requires masking repetitive DNA elements in the assembly to ensure accurate gene prediction. We therefore masked the NJC07 genome using RepeatMasker with the RepBase library of fungal-specific repetitive DNA elements. The masked genome was annotated for protein-coding genes using three independent approaches, whose predictions were then reconciled. In the first approach, gene models were predicted ab initio using five programs, yielding 9,030 (Augustus), 8,308 (Genscan), 13,237 (GeneID), 10,446 (GlimmerHMM) and 10,715 (SNAP) gene models. In the second approach, GeMoMa predicted 11,017 gene models based on homology to the proteins encoded by the individual exons of genes from the closely related species E. nigrum. In the third approach, RNA-Seq data from NJC07 mycelia and NJC07-infected maize leaf tissues were used by PASA, which models gene structures from spliced alignments of transcripts to the NJC07 genome, and by TransDecoder, which identifies coding regions within transcript sequences; PASA and TransDecoder yielded 26,460 and 17,114 gene models, respectively. Finally, the gene models from the three approaches were reconciled using EVidenceModeler, giving a final set of 11,779 genes (Table 1, Supplementary Table 3, Supplementary Figure 1). In addition, the masked genome was annotated for non-coding genes using tRNAscan-SE and Infernal, identifying 168 tRNAs (Supplementary Table 4) and 60 rRNAs (Supplementary Table 5).
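The reconciliation step can be illustrated with a deliberately simplified weighted vote over exons. The source weights and score threshold below are hypothetical; EVidenceModeler itself builds complete consensus gene structures from weighted evidence, which this toy does not attempt.

```python
from collections import defaultdict
from typing import Dict, List, Set, Tuple

Exon = Tuple[int, int]  # (start, end) on the contig, 1-based inclusive


def weighted_exon_vote(predictions: Dict[str, Set[Exon]],
                       weights: Dict[str, float],
                       min_score: float) -> List[Exon]:
    """Keep exons whose summed source weight reaches min_score."""
    scores: Dict[Exon, float] = defaultdict(float)
    for source, exons in predictions.items():
        for exon in exons:
            scores[exon] += weights.get(source, 0.0)
    return sorted(exon for exon, score in scores.items() if score >= min_score)


# Hypothetical weights: transcript evidence trusted most, ab initio least.
predictions = {
    "augustus": {(100, 200), (300, 400)},
    "gemoma": {(100, 200), (300, 420)},
    "pasa": {(100, 200), (300, 400)},
}
weights = {"augustus": 1.0, "gemoma": 2.0, "pasa": 5.0}
print(weighted_exon_vote(predictions, weights, min_score=3.0))  # [(100, 200), (300, 400)]
```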
To assess the gene-space completeness of the NJC07 genome, we performed Benchmarking Universal Single-Copy Ortholog (BUSCO) analysis using 290 fungal BUSCO genes. Of these, 286 were present among the 11,779 predicted genes, indicating 98.62% completeness at the gene level and thus a high-quality, near-complete assembly (Table 1). The genic region (22,392,904 bp) represents 68.20% of the genome (32,835,252 bp), with an average gene size of 1,899 bp; the exonic (31,243 exons; 20,916,969 bp) and intronic (19,456 introns; 1,475,935 bp) regions represent 63.70% and 4.49% of the genome, respectively, with 2.65 exons (average exon length = 669.49 bp) and 1.65 introns (average intron length = 75.86 bp) per gene. Interestingly, despite sharing the same GC content (52%) and over 91% of their genomes, the maize-infecting strain NJC07 possesses 2,284 and 3,909 more genes than the sorghum-infecting strain USPMTOX48 (9,495 genes) and the sugarcane-infecting strain BS2-1 (7,870 genes) of E. sorghinum, respectively13,14. This discrepancy might be attributable to annotation strategy: the USPMTOX48 and BS2-1 genomes were annotated de novo, whereas the NJC07 annotation also incorporated RNA-Seq and homology evidence.
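The derived percentages and per-gene averages in this paragraph are simple ratios of the reported counts. The sketch below recomputes them, using 32,835,252 bp as the genome denominator and 11,779 as the gene count, as in the text.

```python
def pct(part: float, whole: float) -> float:
    return 100.0 * part / whole


GENES = 11_779
GENOME_BP = 32_835_252  # genome denominator used in the text

stats = {
    "BUSCO completeness (%)": pct(286, 290),
    "genic fraction (%)": pct(22_392_904, GENOME_BP),
    "exonic fraction (%)": pct(20_916_969, GENOME_BP),
    "intronic fraction (%)": pct(1_475_935, GENOME_BP),
    "exons per gene": 31_243 / GENES,
    "introns per gene": 19_456 / GENES,
    "mean exon length (bp)": 20_916_969 / 31_243,
    "mean intron length (bp)": 1_475_935 / 19_456,
}
for name, value in stats.items():
    print(f"{name}: {value:,.2f}")
```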
To profile the transcriptional regulation of the 11,779 NJC07 genes during maize infection, we performed RNA-Seq of E. sorghinum-infected maize leaf tissues collected at 48 and 96 hpi, as well as of E. sorghinum vegetative mycelia. Comparison of the infected tissues with the mycelia revealed 7,794 genes differentially expressed during in planta infection (|log2FC| ≥ 1.5, q < 0.05): 6,202 differentially expressed genes (DEGs; 52.63%; Fig. 3A) at 48 hpi and 6,922 DEGs (58.74%; Fig. 3B) at 96 hpi (Fig. 1A, Supplementary Table 6). Of the 7,794 DEGs, 5,330 were shared between 48 and 96 hpi (Fig. 3C). Among the DEGs, 3,357 and 3,658 genes were upregulated at 48 and 96 hpi, respectively (log2FC ≥ 1.5, q < 0.05), corresponding to 4,058 unique upregulated genes (Fig. 1A, Fig. 3A–C, Supplementary Table 6).
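The DEG calls and the Venn overlap can be sketched as follows. The thresholds mirror those in the text (|log2FC| ≥ 1.5, q < 0.05), while the per-gene results are toy values. With the reported counts, inclusion-exclusion gives 6,202 + 6,922 − 5,330 = 7,794, matching the total number of DEGs.

```python
from typing import Dict, Set, Tuple


def call_degs(results: Dict[str, Tuple[float, float]],
              min_abs_lfc: float = 1.5, max_q: float = 0.05) -> Tuple[Set[str], Set[str]]:
    """Split genes into (upregulated, downregulated) sets.

    `results` maps gene ID -> (log2 fold change, q value).
    """
    up, down = set(), set()
    for gene, (lfc, q) in results.items():
        if q >= max_q:
            continue  # not significant
        if lfc >= min_abs_lfc:
            up.add(gene)
        elif lfc <= -min_abs_lfc:
            down.add(gene)
    return up, down


def venn_counts(a: Set[str], b: Set[str]) -> Tuple[int, int, int, int]:
    """Return (only in a, shared, only in b, union size)."""
    return len(a - b), len(a & b), len(b - a), len(a | b)


res48 = {"g1": (2.0, 0.01), "g2": (-3.0, 0.001), "g3": (1.0, 0.01), "g4": (4.0, 0.2)}
res96 = {"g1": (1.6, 0.04), "g2": (0.5, 0.5), "g3": (-1.5, 0.01), "g4": (2.0, 0.01)}
up48, down48 = call_degs(res48)
up96, down96 = call_degs(res96)
print(venn_counts(up48 | down48, up96 | down96))  # (1, 1, 2, 4)

# Inclusion-exclusion check with the counts reported in the text.
assert 6202 + 6922 - 5330 == 7794
```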
Infection transcriptomes of the Epicoccum sorghinum-maize interaction. Volcano plots showing differentially expressed genes (DEGs) at 48 (A) and 96 (B) hours post-inoculation (hpi). Red and blue dots indicate DEGs that are up- and downregulated, respectively, during maize infection by E. sorghinum (|log2FC| ≥ 1.5, q < 0.05). (C) Venn diagram depicting unique and shared DEGs at 48 and 96 hpi.
Methods
Fungal and plant material
The E. sorghinum strain NJC07 was isolated in 2020 from infected maize leaf sheaths collected in a maize field in Nantong municipality, Jiangsu Province, China6. The strain was routinely maintained on potato dextrose agar (PDA) plates at 25°C under a 12-h light/12-h dark cycle. The maize inbred line B73, which is susceptible to leaf and sheath spot, was grown in a growth chamber as described previously16.
Genome sequencing
Fungal mycelia were harvested by flooding a five-day-old 9 cm PDA culture plate of NJC07 with 5 ml of ddH2O and scraping the colony surface. One ml of the mycelial suspension was transferred to a 500 ml flask containing 100 ml of liquid complete medium supplemented with 50 µg/ml chloramphenicol (Lablead Biotech, Beijing, China) and incubated in an orbital shaker at 120 rpm for two days under continuous light at 25°C. Mycelial balls were collected by filtering the culture through a single layer of Miracloth (EMD Millipore, Billerica, USA). Genomic DNA (gDNA) was extracted from the mycelial balls using the E.Z.N.A. Fungal DNA Kit (Omega Bio-Tek, Norcross, USA). gDNA quantity and quality were assessed on a QuantiFluor Fluorescence Quantitative System (Promega, Madison, USA) using the Quant-iT PicoGreen dsDNA Assay Kit (Thermo Fisher Scientific, Waltham, USA). gDNA integrity was checked by running a 200 ng gDNA sample on a 1% agarose gel.
For Illumina sequencing, 1 µg of high-quality gDNA was sheared using a Covaris E220 (Covaris, Brighton, UK), and fragments of ~400 bp were size-selected with BluePippin (Sage Science, Beverly, USA). The fragments were end-repaired (blunted) and 3’ dA-tailed using the NEBNext End Repair/dA-Tailing Module (New England Biolabs, Ipswich, USA). Illumina adapters were then ligated to the end-repaired ~400 bp fragments to generate a paired-end library using the TruSeq DNA LT Kit (Illumina, San Diego, USA). Library quality and size were evaluated on an Agilent 2100 Bioanalyzer using the Agilent DNA 1000 chip (Agilent Technologies, Palo Alto, USA). Five nM of the paired-end library was loaded on a NovaSeq 6000 (Illumina, San Diego, USA), generating 2 × 150 bp reads.
For Oxford Nanopore sequencing, 1 µg of high-quality, high-molecular-weight gDNA was fragmented using a Covaris E220 (Covaris, Brighton, UK), and large fragments were size-selected using BluePippin (Sage Science, Beverly, USA). Sequencing adapters were ligated to the end-repaired, 3’ dA-tailed large fragments using the SQK-LSK109 Ligation Sequencing Kit according to the manufacturer’s instructions (Oxford Nanopore Technologies, Oxford, UK). After adapter ligation, the library was purified using AMPure XP beads and quantified on a QuantiFluor Fluorescence Quantitative System (Promega, Madison, USA) using the Quant-iT PicoGreen dsDNA Assay Kit (Thermo Fisher Scientific, Waltham, USA). The library was sequenced on a GridION (Oxford Nanopore Technologies, Oxford, UK), generating reads of up to 157.84 kb.
Genome size estimation and genome assembly
The paired-end (Illumina) reads were processed using AdapterRemoval v217 to remove adapters and low-quality bases. SOAPec v2.018 was used to correct sequencing errors in the paired-end reads based on 17-mer frequencies. Trimmomatic v0.3619 was used to clip low-quality reads, using a sliding window of five nucleotides to remove bases with a Phred quality score ≤ 30. Jellyfish v2.020 was used to count 17-mers in the high-quality paired-end reads to estimate the size of the E. sorghinum NJC07 genome (genome size = total number of k-mers / average k-mer depth, where each read of length L contributes L − k + 1 k-mers).
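A toy version of the k-mer estimate described above, followed by the resulting fold coverage, is sketched below. Taking the most frequent depth (ignoring depth-1 k-mers) as the k-mer depth is an assumption: Jellyfish only produces the counts, and how the depth of 88 was chosen for NJC07 is not specified. Jellyfish is also commonly run on canonical k-mers, which this toy does not do.

```python
from collections import Counter
from typing import Iterable


def count_kmers(reads: Iterable[str], k: int = 17) -> Counter:
    """Count every k-mer; a read of length L contributes L - k + 1 k-mers.
    (A k-mer and its reverse complement are counted separately here.)"""
    counts: Counter = Counter()
    for read in reads:
        for i in range(len(read) - k + 1):
            counts[read[i:i + k]] += 1
    return counts


def peak_depth(counts: Counter) -> int:
    """Most frequent k-mer depth, ignoring depth-1 k-mers (mostly errors)."""
    histogram = Counter(counts.values())
    candidates = [d for d in histogram if d > 1] or list(histogram)
    return max(candidates, key=lambda d: histogram[d])


def genome_size(total_kmers: int, depth: float) -> float:
    return total_kmers / depth


def fold_coverage(total_bases: int, genome_bp: float) -> float:
    return total_bases / genome_bp


counts = count_kmers(["ACGTAC", "CGTACG"], k=3)
total = sum(counts.values())
print(total, peak_depth(counts), genome_size(total, peak_depth(counts)))  # 8 2 4.0

# Coverage from the read totals and the genome-size estimate reported in the text.
estimate = 35_276_412
print(round(fold_coverage(6_493_948_057, estimate), 1))  # Nanopore, about 184.1
print(round(fold_coverage(3_570_000_000, estimate), 1))  # Illumina, about 101.2
```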
The long Oxford Nanopore reads were assembled de novo using the long-read assemblers Falcon21 and Canu22, and the resulting contigs were corrected with the Illumina paired-end reads using Pilon v1.1823.
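The idea behind short-read polishing can be illustrated with a substitution-only majority vote over a read pileup. This is a simplified stand-in, not Pilon's algorithm (Pilon also corrects small indels and can reassemble local regions); the input format, gap-free reads already placed at 0-based start positions, and the min_depth threshold are assumptions.

```python
from collections import Counter
from typing import List, Tuple


def polish(draft: str, alignments: List[Tuple[int, str]], min_depth: int = 3) -> str:
    """Replace a draft base when a strict majority of covering reads disagree.

    `alignments` holds (0-based start on the draft, gap-free read sequence).
    Positions covered by fewer than `min_depth` reads are left unchanged.
    """
    columns = [Counter() for _ in draft]
    for start, read in alignments:
        for offset, base in enumerate(read):
            pos = start + offset
            if 0 <= pos < len(draft):
                columns[pos][base] += 1
    polished = []
    for ref_base, column in zip(draft, columns):
        depth = sum(column.values())
        if depth >= min_depth:
            base, votes = column.most_common(1)[0]
            polished.append(base if votes > depth / 2 else ref_base)
        else:
            polished.append(ref_base)
    return "".join(polished)


print(polish("ACGTTGCA", [(0, "ACGATGCA")] * 3))  # ACGATGCA
```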
Prediction of the repetitive elements
We used LTR_FINDER v1.05, MITE-Hunter v1.0.0r, RepeatScout v1.0.5 and PILER-DF v2.4 to catalog transposable DNA elements in the E. sorghinum NJC07 genome, based on structural homology and de novo prediction. The identified transposable DNA elements were classified using PASTEClassifier v1.0 and merged with the RepBase library of fungal-specific repetitive DNA elements24. Finally, RepeatMasker v4.0.625 utilizing the RepBase library was employed to predict the transposable DNA elements and mask the genome.
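Masking the identified repeats before gene prediction can be sketched as follows, assuming the repeats are available as 0-based, half-open intervals (a hypothetical input format; RepeatMasker writes its own annotation files). Soft masking lowercases repeat bases, hard masking replaces them with N, and overlapping intervals are merged before the masked fraction is computed.

```python
from typing import List, Tuple

Interval = Tuple[int, int]  # 0-based, half-open [start, end)


def mask(seq: str, repeats: List[Interval], soft: bool = True) -> str:
    """Soft-mask (lowercase) or hard-mask (N) the repeat intervals."""
    chars = list(seq)
    for start, end in repeats:
        for i in range(max(0, start), min(end, len(chars))):
            chars[i] = chars[i].lower() if soft else "N"
    return "".join(chars)


def masked_fraction(seq_len: int, repeats: List[Interval]) -> float:
    """Fraction of the sequence covered by the union of repeat intervals."""
    covered, cur_start, cur_end = 0, None, None
    for start, end in sorted(repeats):
        if cur_end is None or start > cur_end:
            if cur_end is not None:
                covered += cur_end - cur_start  # close the previous merged run
            cur_start, cur_end = start, end
        else:
            cur_end = max(cur_end, end)  # overlapping or adjacent: extend
    if cur_end is not None:
        covered += cur_end - cur_start
    return covered / seq_len


print(mask("ACGTACGTAC", [(2, 5)]))               # ACgtaCGTAC
print(mask("ACGTACGTAC", [(2, 5)], soft=False))   # ACNNNCGTAC
print(masked_fraction(10, [(0, 4), (2, 6), (8, 10)]))  # 0.8
```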
Genome annotation: Protein-coding gene models
We annotated the masked E. sorghinum NJC07 genome using three approaches: ab initio prediction using Augustus v2.426, Genscan27, GeneID v1.428, GlimmerHMM v3.0.429 and SNAP v2006.07.2830; evidence (RNA-Seq)-based prediction using PASA v2.0.231 and TransDecoder v2.032; and homology (E. nigrum)-based prediction using GeMoMa v1.3.133. The gene models predicted by these approaches were integrated using EVidenceModeler v2.0.232. To annotate the gene models functionally, their predicted proteins were searched with BLAST against seven databases (GO, KEGG, KOG, Pfam, Swiss-Prot, TrEMBL and NCBI-Nr) using an e-value cutoff of 1e-5.
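Filtering the functional-annotation searches at the stated e-value can be sketched for BLAST+ tabular output (-outfmt 6), whose twelve default columns are listed in FIELDS. Keeping only the highest-bitscore hit per query is an assumption made for illustration; the text does not state how multiple hits were resolved.

```python
import csv
from typing import Dict, Iterable, Tuple

FIELDS = ["qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
          "qstart", "qend", "sstart", "send", "evalue", "bitscore"]


def best_hits(lines: Iterable[str],
              max_evalue: float = 1e-5) -> Dict[str, Tuple[str, float, float]]:
    """Return {query: (subject, e-value, bitscore)} for the best hit per query."""
    best: Dict[str, Tuple[str, float, float]] = {}
    for row in csv.reader(lines, delimiter="\t"):
        if not row or row[0].startswith("#"):
            continue  # skip blank and comment lines
        rec = dict(zip(FIELDS, row))
        evalue, bitscore = float(rec["evalue"]), float(rec["bitscore"])
        if evalue > max_evalue:
            continue  # fails the e-value cutoff
        current = best.get(rec["qseqid"])
        if current is None or bitscore > current[2]:
            best[rec["qseqid"]] = (rec["sseqid"], evalue, bitscore)
    return best


hits = [
    "gene1\tP1\t90.0\t100\t10\t0\t1\t100\t1\t100\t1e-30\t200",
    "gene1\tP2\t80.0\t100\t20\t0\t1\t100\t1\t100\t1e-20\t150",
    "gene2\tP3\t30.0\t50\t35\t0\t1\t50\t1\t50\t0.01\t40",
]
print(best_hits(hits))  # {'gene1': ('P1', 1e-30, 200.0)}
```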
Evaluation of genome completeness
The E. sorghinum NJC07 nuclear genome carries 11,779 protein-coding gene models. These gene models were subjected to Benchmarking Universal Single-Copy Ortholog (BUSCO) analysis using BUSCO v2.034 with the fungi_odb9 database containing 290 BUSCOs specific to fungi. The proportion of mapped BUSCOs was used as an indicator of the genome’s completeness.
Genome annotation: Non-coding gene models
tRNAscan-SE35 and Infernal36 were used to predict tRNAs and rRNAs in the E. sorghinum NJC07 genome. Other non-coding RNA genes were predicted by BLAST searches of the E. sorghinum NJC07 genome against the Rfam database.
RNA-Seq
The third leaves of V3-stage B73 plants were inoculated with 6 mm mycelial plugs taken from the leading edges of seven-day-old PDA cultures of NJC07, following a randomized complete block design with three biological replicates. The inoculated plants were kept in a growth chamber at 25°C with a 16-h light/8-h dark photoperiod and 100% relative humidity. Infected tissues were harvested at 48 and 96 hpi and, together with NJC07 mycelia, used for RNA-Seq.
Total RNA was isolated from 100 mg of flash-frozen infected B73 leaf tissue or mycelia using the RNAprep Pure Plant Kit according to the manufacturer’s instructions (Tiangen, Beijing, China). RNA concentration and purity were measured on a NanoDrop 2000 (Thermo Fisher Scientific, Wilmington, USA), and RNA integrity was evaluated on an Agilent 2100 Bioanalyzer using the RNA Nano 6000 Assay Kit (Agilent Technologies, Palo Alto, USA). mRNA was isolated from 1 µg of total RNA (RNA integrity number > 7; rRNA 28S/18S ratio ≥ 1; OD260/280 ≥ 1.89; OD260/230 ≥ 2.2) using oligo(dT)-tethered magnetic beads. mRNA was fragmented in NEBNext First Strand Synthesis Reaction Buffer using a Covaris E220 (Covaris, Brighton, UK). First-strand cDNA was synthesized from the fragmented mRNA using random hexamer primers and M-MuLV Reverse Transcriptase, and second-strand cDNA was then synthesized by adding PCR buffer, dNTPs, RNase H and DNA polymerase I. The cDNA was purified using AMPure XP beads (Beckman Coulter, Beverly, USA), blunt-ended using exonuclease, 3’ dA-tailed using the NEBNext End Repair/dA-Tailing Module (New England Biolabs, Ipswich, USA) and ligated to NEBNext adaptors. To enrich cDNA fragments of approximately 240 bp, the end-repaired, adaptor-ligated fragments were size-selected using the AMPure XP system and then treated with USER enzyme (New England Biolabs, Ipswich, USA). PCR amplification was performed with Phusion high-fidelity DNA polymerase and universal PCR primers that carry barcodes and anneal to the ligated adaptors. Finally, the PCR products were purified using the AMPure XP system, and library quality was assessed on an Agilent 2100 Bioanalyzer. The resulting nine paired-end libraries (5 nM each) were pooled and loaded onto a NovaSeq X Plus (Illumina, San Diego, USA), producing 2 × 150 bp reads.
Raw paired-end reads (2 × 150 bp) were processed with AdapterRemoval v217 and clipped with Trimmomatic v0.3619. High-quality paired-end reads (length ≥ 50 bp; average Phred quality score ≥ 33 over a sliding window of five nucleotides) were mapped to the E. sorghinum NJC07 genome using HISAT v2.0.437, and the resulting SAM files were converted to BAM format using SAMtools38. Transcripts were assembled using StringTie v2.2.139, and DESeq240 was used to estimate transcript abundance, expressed as fragments per kilobase of transcript per million mapped reads (FPKM). Gene expression changes (log2 fold changes) were determined from the FPKM values of genes at 48 and 96 hpi relative to vegetative mycelia.
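The FPKM measure and a fold change computed from it are sketched below. The pseudocount is an assumption added to avoid division by zero. Note that DESeq2 itself models raw read counts with size-factor normalization rather than FPKM, so this only illustrates the definitions, not the statistical test that produces the q values.

```python
import math


def fpkm(fragments: int, length_bp: int, total_mapped: int) -> float:
    """Fragments per kilobase of transcript per million mapped fragments."""
    return fragments * 1e9 / (length_bp * total_mapped)


def log2_fold_change(treated: float, control: float, pseudocount: float = 1.0) -> float:
    """log2 ratio of treated to control expression, with an optional pseudocount."""
    return math.log2((treated + pseudocount) / (control + pseudocount))


# A 2 kb transcript with 1,000 of 10 million mapped fragments.
print(fpkm(1_000, 2_000, 10_000_000))                  # 50.0
print(log2_fold_change(200.0, 50.0, pseudocount=0.0))  # 2.0
```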
Data Records
The E. sorghinum NJC07 genome assembly is available at NCBI GenBank under accession GCA_040893085.141. The nuclear genome is distributed among 22 contigs (contigs 1 through 22; 32,694,839 bp), and the mitochondrial genome is a single contig (contig 23; 61,235 bp). The high-quality RNA-Seq reads from E. sorghinum NJC07 vegetative mycelia and from NJC07-infected maize leaf tissues (in planta fungal cDNAs) have been deposited in the NCBI Sequence Read Archive under the following accessions: SRR29843002 (48 hpi, Es-48hpi-BR2_1/Es-48hpi-BR2_2)42, SRR29843003 (48 hpi, Es-48hpi-BR1_1/Es-48hpi-BR1_2)43, SRR29843004 (vegetative mycelia, Es-myc-BR3_1/Es-myc-BR3_2)44, SRR29843005 (vegetative mycelia, Es-myc-BR2_1/Es-myc-BR2_2)45, SRR29843006 (96 hpi, Es-96hpi-BR3_1/Es-96hpi-BR3_2)46, SRR29843007 (96 hpi, Es-96hpi-BR2_1/Es-96hpi-BR2_2)47, SRR29843008 (96 hpi, Es-96hpi-BR1_1/Es-96hpi-BR1_2)48, SRR29843009 (48 hpi, Es-48hpi-BR3_1/Es-48hpi-BR3_2)49 and SRR29843010 (vegetative mycelia, Es-myc-BR1_1/Es-myc-BR1_2)50.
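For reanalysis, the accessions above can be organized into a sample sheet. The condition and replicate labels are transcribed from the run names listed here, and the URL pattern follows the identifiers.org links in the reference list; the tab-separated layout itself is a hypothetical convenience format.

```python
from collections import Counter

# Accession -> (condition, biological replicate), transcribed from the Data Records.
RUNS = {
    "SRR29843002": ("48 hpi", "BR2"),
    "SRR29843003": ("48 hpi", "BR1"),
    "SRR29843004": ("mycelia", "BR3"),
    "SRR29843005": ("mycelia", "BR2"),
    "SRR29843006": ("96 hpi", "BR3"),
    "SRR29843007": ("96 hpi", "BR2"),
    "SRR29843008": ("96 hpi", "BR1"),
    "SRR29843009": ("48 hpi", "BR3"),
    "SRR29843010": ("mycelia", "BR1"),
}


def sra_url(accession: str) -> str:
    return f"https://identifiers.org/ncbi/insdc.sra:{accession}"


def sample_sheet() -> str:
    """Tab-separated sheet sorted by condition, then replicate."""
    rows = sorted(RUNS.items(), key=lambda item: (item[1][0], item[1][1]))
    lines = ["accession\tcondition\treplicate\turl"]
    lines += [f"{acc}\t{cond}\t{rep}\t{sra_url(acc)}" for acc, (cond, rep) in rows]
    return "\n".join(lines)


print(sample_sheet())
```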
Technical Validation
We used Trimmomatic v0.3619 and SOAPec v2.018 to trim low-quality Illumina reads and correct sequencing errors, respectively. Jellyfish v2.020 was used to count 17 bp k-mers in the high-quality paired-end reads to estimate the genome size of E. sorghinum NJC07, calculated as the total number of k-mers divided by the average k-mer depth. The completeness of the E. sorghinum NJC07 genome assembly was assessed by BUSCO v2.034 analysis with the fungi_odb9 database, which includes 290 fungal BUSCOs. For RNA-Seq, samples were collected following a randomized complete block design with three biological replicates to support the statistical robustness and reproducibility of the results.
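The sliding-window quality trimming used here can be sketched in simplified form. It assumes Phred+33 quality encoding and cuts the read at the start of the first five-base window whose mean quality falls below a chosen threshold (30 here, following the value in the Methods); Trimmomatic's own SLIDINGWINDOW implementation differs in details. The 50 bp minimum length is taken from the RNA-Seq processing description.

```python
from typing import List, Optional, Tuple


def phred(qual: str, offset: int = 33) -> List[int]:
    """Decode an ASCII quality string (Phred+33 assumed) into scores."""
    return [ord(c) - offset for c in qual]


def sliding_window_trim(seq: str, qual: str, window: int = 5,
                        min_mean: float = 30.0) -> Tuple[str, str]:
    """Cut at the first window whose mean quality drops below min_mean."""
    scores = phred(qual)
    for start in range(len(scores) - window + 1):
        if sum(scores[start:start + window]) / window < min_mean:
            return seq[:start], qual[:start]
    return seq, qual


def keep_read(seq: str, qual: str, min_len: int = 50) -> Optional[Tuple[str, str]]:
    """Trim a read and drop it if fewer than min_len bases remain."""
    trimmed = sliding_window_trim(seq, qual)
    return trimmed if len(trimmed[0]) >= min_len else None


print(sliding_window_trim("ACGTACGTAC", "IIIIIIII##"))  # ('ACGTA', 'IIIII')
```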
Code availability
The software programs utilized in this study were referenced in the Methods section. If detailed parameters for the software were not specified, default parameters were used.
References
Ma, W. et al. Molecular genetics of anthracnose resistance in maize. J. Fungi. 8(5), e540 (2022a).
Mueller, D. S. et al. Corn yield loss estimates due to diseases in the United States and Ontario, Canada, from 2016 to 2019. Plant Health Prog. 21, 238–247 (2020).
Ma, W. et al. First report of Didymella glomerata causing Didymella leaf blight on maize. Plant Dis. 106(9), 2522 (2022b).
Mahuku, G. Maize pathology in Asia: opportunities and challenges for breeding disease-resistant maize. Proceedings of the Asian Regional Maize Workshop 10, 361–366 (2010).
Chen, T. et al. First report of Epicoccum sorghinum causing Leaf sheath and leaf Spot on Maize in China. Plant Dis. 105(11), 3741 (2021).
Navi, S. S. et al. Effects of wetness duration and grain development stages on sorghum grain mold infection. Plant Dis. 89, 872–878 (2005).
Forbes, G. A., Bandyopadhyay, R. & Garcia, G. A review of sorghum grain mold. ICRISAT (1992).
Shephard, G. S. et al. Reversed-phase high-performance liquid chromatography of tenuazonic acid and related tetramic acids. J. Chromatogr. 566, 195–205 (1991).
Shi, X. et al. Occurrence and management of the emerging pathogen Epicoccum sorghinum. Plant Dis. https://doi.org/10.1094/PDIS-03-24-0711-FE (2024).
Liu, L. M. et al. First report of leaf spot disease on rice caused by Epicoccum sorghinum in China. Plant Dis. 104(10), 2735 (2020).
Li, Q. et al. Assessing genetic resistance in wheat to black point caused by six fungal species in the Yellow and Huai wheat area of China. Plant Dis. 104(12), 3131–3134 (2020).
Yu, L. et al. First report of leaf spot caused by Epicoccum sorghinum on Chinese flowering cabbage (Brassica parachinensis) in China. Plant Dis. 103(11), 2966 (2019).
Oliveira, R. C. et al. Draft genome sequence of sorghum grain mold fungus Epicoccum sorghinum, a producer of tenuazonic acid. Genome Announc. 5, e01495–16 (2017).
Bao, Y. et al. Genome sequence of Phoma sorghina var. saccharum that causes sugarcane twisted leaf disease in China. Mol. Plant-Microbe Interact. 33(9), 1092–1094 (2020).
Chiaromonte, F., Yap, V. B. & Miller, W. Scoring pairwise genomic sequence alignments. Pac. Symp. Biocomput. 115–126 (2002).
Ma, W. et al. CRISPR/Cas9-mediated deletion of large chromosomal segments identifies a minichromosome modulating the Colletotrichum graminicola virulence on maize. Int. J. Biol. Macromol. 245, 125462 (2023).
Schubert, M., Lindgreen, S. & Orlando, L. AdapterRemoval v2: rapid adapter trimming, identification, and read merging. BMC Res. Notes. 9, 88 (2016).
Luo, R. B. et al. SOAPdenovo2: an empirically improved memory-efficient short-read de novo assembler. Gigascience. 1(1), 18 (2012).
Bolger, A. M., Lohse, M. & Usadel, B. Trimmomatic: a flexible trimmer for Illumina sequence data. Bioinformatics. 30(15), 2114–2120 (2014).
Marçais, G. & Kingsford, C. A fast, lock-free approach for efficient parallel counting of occurrences of k-mers. Bioinformatics. 27(6), 764–770 (2011).
Chin, C. S. et al. Phased diploid genome assembly with single-molecule real-time sequencing. Nat. Methods. 13, 1050–1054 (2016).
Koren, S. et al. Canu: scalable and accurate long-read assembly via adaptive k-mer weighting and repeat separation. Genome Res. 27, 722–736 (2017).
Walker, B. J. et al. Pilon: An integrated tool for comprehensive microbial variant detection and genome assembly improvement. PLoS ONE. 9(11), e112963 (2014).
Jurka, J. et al. Repbase Update, a database of eukaryotic repetitive elements. Cytogenet. Genome Res. 110, 462–467 (2005).
Smit, A., Hubley, R. & Green, P. RepeatMasker Open-3.0. http://repeatmasker.org (accessed 29 March 2021).
Stanke, M. & Morgenstern, B. AUGUSTUS: a web server for gene prediction in eukaryotes that allows user-defined constraints. Nucleic Acids Res. 32, W309–W312 (2005).
Burge, C. & Karlin, S. Prediction of complete gene structures in human genomic DNA. J. Mol. Biol. 268, 78–94 (1997).
Blanco, E., Parra, G. & Guigó, R. Using geneid to identify genes. Curr. Protoc. Bioinformatics. 4, 3 (2007).
Majoros, W. H., Pertea, M. & Salzberg, S. L. TigrScan and GlimmerHMM: two open source ab initio eukaryotic gene-finders. Bioinformatics. 20, 2878–2879 (2004).
Korf, I. Gene finding in novel genomes. BMC Bioinformatics. 5, 59 (2004).
Campbell, M. A. et al. Comprehensive analysis of alternative splicing in rice and comparative analyses with Arabidopsis. BMC Genomics. 7, 327 (2006).
Haas, B. et al. De novo transcript sequence reconstruction from RNA-seq using the Trinity platform for reference generation and analysis. Nat. Protoc. 8, 1494–1512 (2013).
Keilwagen, J. et al. Using intron position conservation for homology-based gene prediction. Nucleic Acids Res. 44, e89 (2016).
Simão, F. A. et al. BUSCO: assessing genome assembly and annotation completeness with single-copy orthologs. Bioinformatics. 31(19), 3210–3212 (2015).
Lowe, T. M. & Chan, P. P. tRNAscan-SE On-line: integrating search and context for analysis of transfer RNA genes. Nucleic Acids Res. 44(W1), W54–57 (2016).
Nawrocki, E. P. & Eddy, S. R. Infernal 1.1: 100-fold faster RNA homology searches. Bioinformatics. 29(22), 2933–2935 (2013).
Kim, D., Langmead, B. & Salzberg, S. L. HISAT: a fast spliced aligner with low memory requirements. Nat. Methods. 12(4), 357–360 (2015).
Li, H. et al. The Sequence Alignment/Map format and SAMtools. Bioinformatics. 25(16), 2078–2079 (2009).
Pertea, M. et al. StringTie enables improved reconstruction of a transcriptome from RNA-seq reads. Nat. Biotechnol. 33(3), 290–295 (2015).
Love, M. I., Huber, W. & Anders, S. Moderated estimation of fold change and dispersion for RNA-seq data with DESeq2. Genome Biol. 15, 550 (2014).
Bhadauria, V. Epicoccum sorghinum strain NJC07, whole genome sequencing project. Genbank https://identifiers.org/ncbi/insdc.gca:GCA_040893085.1 (2024).
NCBI Sequence Read Archive https://identifiers.org/ncbi/insdc.sra:SRR29843002 (2024).
NCBI Sequence Read Archive https://identifiers.org/ncbi/insdc.sra:SRR29843003 (2024).
NCBI Sequence Read Archive https://identifiers.org/ncbi/insdc.sra:SRR29843004 (2024).
NCBI Sequence Read Archive https://identifiers.org/ncbi/insdc.sra:SRR29843005 (2024).
NCBI Sequence Read Archive https://identifiers.org/ncbi/insdc.sra:SRR29843006 (2024).
NCBI Sequence Read Archive https://identifiers.org/ncbi/insdc.sra:SRR29843007 (2024).
NCBI Sequence Read Archive https://identifiers.org/ncbi/insdc.sra:SRR29843008 (2024).
NCBI Sequence Read Archive https://identifiers.org/ncbi/insdc.sra:SRR29843009 (2024).
NCBI Sequence Read Archive https://identifiers.org/ncbi/insdc.sra:SRR29843010 (2024).
Acknowledgements
This research was made possible through the generous support of the National Natural Science Foundation of China (Grant No. 32172363; V.B.) and the Chinese Universities Scientific Fund (Grant No. 10092004; V.B.). The funders had no roles in study design, data collection and analysis, decision to publish, or preparation of the manuscript.
Author information
Contributions
V.B. designed the research; P.L. provided the E. sorghinum strain NJC07; V.B. performed the research and data analysis; G.L. and X.G. participated in the sequencing of the E. sorghinum strain NJC07; V.B. wrote the paper.
Ethics declarations
Competing interests
The authors declare no competing interests.
Additional information
Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License, which permits any non-commercial use, sharing, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if you modified the licensed material. You do not have permission under this licence to share adapted material derived from this article or parts of it. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by-nc-nd/4.0/.
About this article
Cite this article
Bhadauria, V., Li, G., Gao, X. et al. Near-complete genome and infection transcriptomes of the maize leaf and sheath spot pathogen Epicoccum sorghinum. Sci Data 12, 261 (2025). https://doi.org/10.1038/s41597-025-04564-1
DOI: https://doi.org/10.1038/s41597-025-04564-1