Near-complete genome and infection transcriptomes of the maize leaf and sheath spot pathogen Epicoccum sorghinum

Bhadauria, Vijai; Li, Guangjun; Gao, Xinying; Laborda, Pedro

doi:10.1038/s41597-025-04564-1


Data Descriptor
Open access
Published: 13 February 2025


Vijai Bhadauria (ORCID: orcid.org/0000-0003-1982-0853)^1,2, Guangjun Li¹, Xinying Gao¹ & Pedro Laborda³

Scientific Data volume 12, Article number: 261 (2025)


Abstract

Maize leaf and sheath spot disease, caused by Epicoccum sorghinum, is an emerging disease of maize in China. To disentangle its molecular pathogenesis, we sequenced the genome and infection transcriptomes of E. sorghinum strain NJC07. The genome was sequenced on the Oxford Nanopore GridION and Illumina NovaSeq 6000 platforms, producing a near-complete, gapless nuclear genome assembly of 32.69 Mb at 285.20-fold depth, comprising 22 contigs (including 12 full-length chromosomes) with an N₅₀ contig number of 6 and an N₅₀ contig length of 1.66 Mb, as well as a complete mitochondrial genome assembly of 61.24 kb. The nuclear genome contains 11,779 protein-coding genes, including genes predicted to encode potential virulence/pathogenicity factors such as effectors and carbohydrate-active enzymes. Temporal RNA-Seq analysis revealed that 4,058 of the 11,779 genes were induced during maize infection, a subset of which may be involved in fungal invasion and colonization of maize plants. Together, the genomic and transcriptomic data generated in this study provide a foundation for the functional analysis of virulence and pathogenicity factors and for understanding the molecular mechanisms underlying E. sorghinum pathogenesis on maize.


Background and Summary

Maize (Zea mays L.) is one of the most important staple food crops, along with rice and wheat. With an annual production of 1.16 billion metric tons, it ranks second only to wheat in terms of acreage. Maize provides about 20% of the daily caloric intake for people in Africa and Mesoamerica. Besides being a primary food crop, it is also used as animal feed and a source of biofuel and biodegradable plastics¹.

Fungal diseases are among the factors that threaten maize production worldwide, accounting for estimated yield losses of 10.6% in North America² and 12% in Asia^2,3,4. Recently, Chen et al. reported Epicoccum sorghinum Sacc. (syn. Phoma sorghina) as the cause of maize leaf and sheath spot disease in China, the second-largest producer of maize⁵. E. sorghinum is an ascomycete fungus in the sorghum grain-mold complex⁶ that causes significant economic losses by reducing crop yield, seed viability and kernel weight⁷. The fungus produces the mycotoxin tenuazonic acid, which renders sorghum grains unfit for use as food or feed^8,9. The pathogen appears to employ a necrotrophic infection strategy, colonizing maize plants through wounds and causing leaf and sheath spot disease. It has a broad host range: in the last five years, E. sorghinum has been reported to cause leaf spot disease on several economically important crops in China, including rice (Oryza sativa), wheat (Triticum aestivum) and Chinese flowering cabbage (Brassica rapa ssp. parachinensis)^10,11,12.

To unravel the molecular pathogenesis of the maize-infecting E. sorghinum strain NJC07, we sequenced its genome on the Oxford Nanopore GridION and Illumina NovaSeq 6000 platforms, generating 908,875 long reads (6,493,948,057 bp in total) and 23,800,000 paired-end 150 bp reads (3,570,000,000 bp in total), respectively. The Illumina paired-end reads contain 3,104,162,667 k-mers of 17 bp with an average k-mer depth of 88, giving an estimated genome size of 35,276,412 bp. Based on this estimate, the Oxford Nanopore and Illumina reads provide 184.10- and 101.20-fold genome coverage, respectively. The Oxford Nanopore reads were assembled into contigs, which were subsequently corrected for sequencing errors using the Illumina paired-end reads, producing a 32,694,839 bp nuclear genome distributed among 22 contiguous sequences (contigs), with an N₅₀ contig length of 1,661,624 bp and an N₉₀ contig length of 1,010,772 bp. The assembly thus represents 93.08% of the estimated NJC07 genome size (Table 1). Notably, the assembly contains 12 telomere-to-telomere contigs (contig 4 [2,552,342 bp], contig 6 [2,123,232 bp], contig 9 [1,661,624 bp], contig 11 [1,559,590 bp], contig 12 [1,324,888 bp], contig 14 [1,357,136 bp], contig 15 [1,151,000 bp], contig 16 [1,058,833 bp], contig 17 [1,066,072 bp], contig 18 [1,094,033 bp], contig 19 [1,010,772 bp] and contig 20 [936,509 bp]), confirmed by the presence of telomeric repeats ((CCCTAA)ₙ or (TTAGGG)ₙ) at their 5’ and 3’ ends, indicating that they are full-length chromosomes (Fig. 1A). In addition, contig 23 (61,235 bp) represents the full-length mitochondrial genome of NJC07.
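The contiguity and chromosome-completeness statements above rest on two simple computations. The sketch below, assuming contig lengths and sequences are already loaded into Python, shows how an N50/N90-style statistic (returned as contig count and contig length, matching the "N₅₀ contig number/length" convention used in the Abstract) and a telomere check at both contig ends can be computed. The repeat-copy and window thresholds are hypothetical, not the authors' settings.

```python
from typing import Iterable, Tuple


def nx_stats(lengths: Iterable[int], fraction: float = 0.5) -> Tuple[int, int]:
    """Return (count, length) at the Nx threshold.

    Contigs are sorted longest first and their lengths accumulated; the
    function returns how many contigs were needed, and the length of the
    last one added, for the running total to reach `fraction` of the
    assembly size. fraction=0.5 gives the N50 count/length, 0.9 the N90.
    """
    ordered = sorted(lengths, reverse=True)
    if not ordered:
        raise ValueError("no contig lengths supplied")
    threshold = fraction * sum(ordered)
    running = 0
    for count, length in enumerate(ordered, start=1):
        running += length
        if running >= threshold:
            return count, length
    return len(ordered), ordered[-1]  # only reached if fraction > 1


def has_telomeric_ends(seq: str, min_copies: int = 4, window: int = 200) -> bool:
    """Heuristic: True if the first `window` bp contain >= min_copies tandem
    CCCTAA units and the last `window` bp contain >= min_copies tandem
    TTAGGG units. Thresholds are illustrative only."""
    seq = seq.upper()
    head, tail = seq[:window], seq[-window:]
    return ("CCCTAA" * min_copies in head) and ("TTAGGG" * min_copies in tail)


# Toy usage with made-up contigs:
print(nx_stats([10, 8, 6, 4, 2], 0.5), nx_stats([10, 8, 6, 4, 2], 0.9))
toy_chromosome = "CCCTAA" * 6 + "ACGT" * 100 + "TTAGGG" * 6
print(has_telomeric_ends(toy_chromosome))
```

Returning the count alongside the length makes the same function cover both halves of the "N₅₀ contig number/length" pair.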
The mitochondrial genome was annotated using MFannot (Université de Montréal) with parameters for yeast mitochondrial genes, revealing a total of 83 genes, including 24 conserved protein-coding genes (two copies each of atp6, cob, cox1, cox2, cox3, nad1, nad2, nad3, nad4, nad4L, nad5 and nad6), 4 non-tRNA structural RNA genes (rnl, rns and two copies of rnpB) and 51 tRNA genes (Fig. 1B, Supplementary Table 1).

Table 1 The Epicoccum sorghinum strain NJC07 nuclear genome assembly features.


The assembled nuclear genome of NJC07 (32.69 Mb) is comparable in size to those of the sorghum-infecting E. sorghinum strain USPMTOX48 (391 contigs; 33.4 Mb)¹³ and the sugarcane-infecting strain BS2-1 (22 contigs; 33.12 Mb)¹⁴. To assess genetic diversity within E. sorghinum, we performed genome similarity and synteny analyses among NJC07, USPMTOX48 and BS2-1, which infect distinct hosts. A whole-genome alignment of the three strains was created using CLC Genomics Workbench 20.0.4, with 50 bp seeds extended using the HOXD scoring matrix¹⁵ and a minimum alignment block length of 100 bp. Pairwise distance matrices were computed from the alignment blocks and expressed as alignment percentage (AP) and average nucleotide identity (ANI). NJC07 shares 99.04% of its genome (AP) with BS2-1 at 93.89% nucleotide identity (ANI), and 91.48% of its genome (AP) with USPMTOX48 at 82.08% nucleotide identity (ANI) (Fig. 2A). Whole-genome synteny analysis revealed that, although the genomes have a similar architecture, they display a few genomic rearrangements (Fig. 2B). To map these rearrangements, pairwise whole-genome dot plots were constructed. The NJC07/BS2-1 dot plot shows that the two genomes are collinear across their lengths; however, two genomic rearrangements were observed. NJC07 contig 1 (5,848,000 bp) aligns with BS2-1 contigs 16 (2,497,106 bp), 19 (2,072,022 bp) and 18 (1,920,476 bp), which carry 5’ telomeric repeats, suggesting that the exceptionally large contig 1 contains at least three chromosome sequences (Fig. 2C).
An inversion between the 3’ end of NJC07 contig 18 (a full-length chromosome; 1,094,033 bp) and the full-length BS2-1 contig 7 (473,309 bp) was also detected. It may have resulted from a recombination event facilitated by repetitive DNA elements or transposons at the 3’ end, suggesting a mechanism of genomic plasticity that could contribute to the adaptive evolution of these organisms (Fig. 2C). The NJC07/USPMTOX48 dot plot shows that the two genomes are collinear across their lengths, except for several small INDELs and inversions (Fig. 2D).
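As a rough illustration of how AP and ANI summarize alignment blocks, the following sketch uses a simplified, hypothetical block record (aligned length and number of identical positions on the query genome) and assumes the blocks do not overlap. CLC Genomics Workbench's exact internal calculation may differ; only the 100 bp minimum block length is taken from the text.

```python
from dataclasses import dataclass
from typing import Iterable, Tuple


@dataclass
class AlignmentBlock:
    """Hypothetical record for one aligned region between two genomes."""
    length: int      # aligned length on the query genome, in bp
    identical: int   # number of identical aligned positions


def ap_and_ani(blocks: Iterable[AlignmentBlock], query_length: int,
               min_block: int = 100) -> Tuple[float, float]:
    """Return (AP, ANI) as percentages.

    AP  = aligned query bases / query genome length
    ANI = identical positions / aligned positions, i.e. per-block identity
          averaged with weights proportional to block length.
    Blocks shorter than `min_block` bp are ignored.
    """
    kept = [b for b in blocks if b.length >= min_block]
    aligned = sum(b.length for b in kept)
    if aligned == 0:
        return 0.0, 0.0
    ap = 100.0 * aligned / query_length
    ani = 100.0 * sum(b.identical for b in kept) / aligned
    return ap, ani


# Toy usage: the 50 bp block falls below the 100 bp minimum and is dropped.
blocks = [AlignmentBlock(600, 570), AlignmentBlock(400, 380), AlignmentBlock(50, 50)]
print(ap_and_ani(blocks, query_length=2000))
```

Weighting identity by block length keeps many short, highly identical blocks from inflating ANI relative to the bases they actually cover.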

The NJC07 genome has a GC content of 52.04% and a relatively low repeat content (689,318 bp, or 2.10% of the genome), comprising tandem repeats (SSRs) and interspersed repeats (transposons, SINEs and LINEs) (Supplementary Table 2). Genome annotation, i.e., mapping coding and non-coding genes in a genome, requires masking these repetitive elements in the assembly to ensure accurate gene prediction. We therefore masked the NJC07 genome using RepeatMasker with the RepBase library of fungal-specific repetitive DNA elements. The masked genome was annotated for protein-coding genes using three independent approaches, whose results were then reconciled. In the first approach, gene models were predicted ab initio using five programs, yielding 9,030 (Augustus), 8,308 (Genscan), 13,237 (GeneID), 10,446 (GlimmerHMM) and 10,715 (SNAP) gene models. In the second approach, 11,017 gene models were predicted using GeMoMa by leveraging homology to the individual coding exons of genes from the closely related species E. nigrum. In the third approach, RNA-Seq data from NJC07 mycelia and NJC07-infected maize leaf tissues were used with PASA, which models gene structures from spliced alignments of transcripts to the NJC07 genome, and TransDecoder, which identifies coding regions within transcript sequences; PASA and TransDecoder yielded 26,460 and 17,114 gene models, respectively. Finally, the gene models from all three approaches were reconciled using EVidenceModeler, resulting in a final set of 11,779 genes (Table 1, Supplementary Table 3, Supplementary Figure 1). The masked genome was also annotated for non-coding genes using tRNAscan-SE and Infernal, identifying 168 tRNAs (Supplementary Table 4) and 60 rRNAs (Supplementary Table 5).

To assess the gene-level completeness of the NJC07 assembly, we performed Benchmarking Universal Single-Copy Ortholog (BUSCO) analysis using 290 fungal BUSCO genes. Of these, 286 were present among the 11,779 predicted genes, indicating that the NJC07 assembly is 98.62% complete at the gene level and is therefore a high-quality, near-complete assembly (Table 1). The genic region (22,392,904 bp) represents 68.20% of the genome (32,835,252 bp), with an average gene size of 1,899 bp. The exonic (31,243 exons; 20,916,969 bp) and intronic (19,456 introns; 1,475,935 bp) regions represent 63.70% and 4.49% of the genome, respectively, with 2.65 exons (average exon length = 669.49 bp) and 1.65 introns (average intron length = 75.86 bp) per gene. Interestingly, despite having the same GC content (52%) and sharing over 91% of their genomes, the maize-infecting strain NJC07 has 2,291 and 3,916 more genes than the sorghum-infecting strain USPMTOX48 (9,495 genes) and the sugarcane-infecting strain BS2-1 (7,870 genes) of E. sorghinum, respectively^13,14. This discrepancy in gene number may reflect differences in annotation strategy: the USPMTOX48 and BS2-1 genomes were annotated de novo, whereas the NJC07 annotation integrated ab initio, homology-based and RNA-Seq evidence.
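The gene-structure percentages, per-gene averages and BUSCO score above follow directly from the reported totals. The sketch below recomputes them as a consistency check. It treats the genic region as exons plus introns, which matches the reported totals (20,916,969 + 1,475,935 = 22,392,904 bp). The function name is hypothetical, and the function is not part of the annotation pipeline.

```python
def gene_structure_summary(n_genes, genome_bp, n_exons, exon_bp,
                           n_introns, intron_bp, busco_found, busco_total):
    """Derive genome-level percentages and per-gene averages from totals."""
    genic_bp = exon_bp + intron_bp   # genic region = exons + introns
    return {
        "genic_pct": 100 * genic_bp / genome_bp,
        "exonic_pct": 100 * exon_bp / genome_bp,
        "intronic_pct": 100 * intron_bp / genome_bp,
        "exons_per_gene": n_exons / n_genes,
        "introns_per_gene": n_introns / n_genes,
        "mean_exon_len": exon_bp / n_exons,
        "mean_intron_len": intron_bp / n_introns,
        "busco_pct": 100 * busco_found / busco_total,
    }


# Totals as reported for the NJC07 annotation.
stats = gene_structure_summary(
    n_genes=11_779, genome_bp=32_835_252,
    n_exons=31_243, exon_bp=20_916_969,
    n_introns=19_456, intron_bp=1_475_935,
    busco_found=286, busco_total=290,
)
for name, value in stats.items():
    print(f"{name}: {value:.2f}")
```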

To map the transcriptional regulation of the 11,779 NJC07 genes during maize infection, we performed RNA-Seq on E. sorghinum-infected maize leaf tissues collected at 48 and 96 hours post-inoculation (hpi), as well as on E. sorghinum vegetative mycelia. Comparison of the infected tissues with the mycelia revealed 7,794 genes differentially expressed during in planta infection (|log2FC| ≥ 1.5, q < 0.05): 6,202 differentially expressed genes (DEGs; 52.63%; Fig. 3A) at 48 hpi and 6,922 DEGs (58.74%; Fig. 3B) at 96 hpi (Fig. 1A, Supplementary Table 6). Of the 7,794 DEGs, 5,330 were shared between 48 and 96 hpi (Fig. 3C). Among the DEGs, 3,357 and 3,658 were upregulated at 48 and 96 hpi, respectively (log2FC ≥ 1.5, q < 0.05), corresponding to 4,058 unique upregulated genes (Fig. 1A, Fig. 3A–C, Supplementary Table 6).
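The DEG counts combine a per-gene threshold with set operations across time points. The minimal sketch below assumes that per-gene log2 fold changes and q-values are already available; the gene IDs and values are hypothetical. It labels genes using the stated cutoffs and recovers the overall DEG count by inclusion-exclusion.

```python
def classify_gene(log2fc, qvalue, lfc_cutoff=1.5, q_cutoff=0.05):
    """Label a gene 'up', 'down' or 'ns' using |log2FC| >= lfc_cutoff and q < q_cutoff."""
    if qvalue >= q_cutoff:
        return "ns"
    if log2fc >= lfc_cutoff:
        return "up"
    if log2fc <= -lfc_cutoff:
        return "down"
    return "ns"


def deg_sets(results):
    """results: {timepoint: {gene_id: (log2fc, qvalue)}} -> {timepoint: set of DEG ids}."""
    return {
        tp: {g for g, (lfc, q) in genes.items() if classify_gene(lfc, q) != "ns"}
        for tp, genes in results.items()
    }


# Toy example with hypothetical gene IDs and values.
toy = {
    "48hpi": {"g1": (2.1, 0.001), "g2": (-1.8, 0.01), "g3": (0.4, 0.5)},
    "96hpi": {"g1": (3.0, 0.001), "g3": (1.6, 0.02), "g4": (-0.2, 0.9)},
}
sets = deg_sets(toy)
shared = sets["48hpi"] & sets["96hpi"]
union = sets["48hpi"] | sets["96hpi"]

# Inclusion-exclusion with the counts reported in the text:
# |48 hpi| + |96 hpi| - |shared| = total DEGs
total_degs = 6_202 + 6_922 - 5_330  # = 7,794
print(sorted(union), total_degs)
```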

Methods

Fungal and plant material

The E. sorghinum strain NJC07 was isolated in 2020 from infected maize leaf sheaths collected in a maize field in Nantong municipality, Jiangsu Province, China⁶. The strain was routinely maintained on potato dextrose agar (PDA) plates at 25°C under a 12-h light/12-h dark cycle. The leaf and sheath spot-susceptible maize inbred line B73 was grown in a growth chamber as described previously¹⁶.

Genome sequencing

Fungal mycelia were harvested by flooding a five-day-old NJC07 culture on a 9 cm PDA plate with 5 ml of ddH₂O and scraping the surface. One millilitre of the mycelial suspension was transferred to a 500 ml flask containing 100 ml of liquid complete medium supplemented with 50 µg/ml chloramphenicol (Lablead Biotech, Beijing, China) and incubated in an orbital shaker at 120 rpm for two days under continuous light at 25°C. Mycelial balls were collected by filtering the culture through a single layer of Miracloth (EMD Millipore, Billerica, USA). Genomic DNA (gDNA) was extracted from the mycelial balls using the E.Z.N.A. Fungal DNA Kit (Omega Bio-Tek, Norcross, USA). The quantity and quality of gDNA were assessed on the QuantiFluor Fluorescence Quantitative System (Promega, Madison, USA) using the Quant-iT PicoGreen dsDNA Assay Kit (Thermo Fisher Scientific, Waltham, USA). The integrity of gDNA was checked by running 200 ng of gDNA on a 1% agarose gel.

For Illumina sequencing, 1 µg of high-quality gDNA was sheared using a Covaris E220 (Covaris, Brighton, UK), and fragments of ~400 bp were size-selected with BluePippin (Sage Science, Beverly, USA). The fragments were end-repaired (blunted) and 3’ dA-tailed using the NEBNext End Repair/dA-Tailing Module (New England Biolabs, Ipswich, USA). Illumina adapters were then ligated to the ~400 bp end-repaired fragments to generate a paired-end library using the TruSeq DNA LT Kit (Illumina, San Diego, USA). Library quality and size were evaluated on an Agilent 2100 Bioanalyzer using a DNA 1000 chip (Agilent Technologies, Palo Alto, USA). Five nM of the paired-end library was loaded on a NovaSeq 6000 (Illumina, San Diego, USA), generating 2 × 150 bp reads.

For Oxford Nanopore sequencing, 1 µg of high-quality, high-molecular-weight gDNA was fragmented using a Covaris E220 (Covaris, Brighton, UK), and large fragments were selected using BluePippin (Sage Science, Beverly, USA). Sequencing adapters were ligated to the end-repaired, 3’ dA-tailed large fragments using the SQK-LSK109 Ligation Sequencing Kit following the manufacturer’s instructions (Oxford Nanopore Technologies, Oxford, UK). After adapter ligation, the library was purified using AMPure XP beads and quantified on the QuantiFluor Fluorescence Quantitative System (Promega, Madison, USA) using the Quant-iT PicoGreen dsDNA Assay Kit (Thermo Fisher Scientific, Waltham, USA). The library was sequenced on a GridION (Oxford Nanopore Technologies, Oxford, UK), generating reads up to 157.84 kb long.

Genome size estimation and genome assembly

The Illumina paired-end reads were processed using AdapterRemoval v2¹⁷ to remove adapters and low-quality bases. SOAPec v2.0¹⁸ was used to correct sequencing errors on the basis of k-mer frequencies (k = 17). Trimmomatic v0.36¹⁹ was used to trim low-quality reads with a five-nucleotide sliding window, removing bases with a Phred quality score ≤ 30. Jellyfish v2.0²⁰ was used to count 17 bp k-mers in the high-quality paired-end reads, and the size of the E. sorghinum NJC07 genome was estimated as the total number of k-mers divided by the average k-mer depth (each read of length L contributes L − k + 1 k-mers).
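The k-mer-based estimate divides the total number of k-mers by the k-mer depth. The toy sketch below is not Jellyfish. It counts canonical k-mers in pure Python, discards low-depth (error-dominated) k-mers with a hypothetical `min_depth` cutoff, and takes the most frequent remaining depth as the depth. It also reproduces the coverage arithmetic from Background and Summary. With the reported values, 3,104,162,667 / 88 ≈ 35.27 Mb, which agrees with the reported 35,276,412 bp to within 0.01%, and the coverage ratios come out at roughly 184.1- and 101.2-fold.

```python
import random
from collections import Counter

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def canonical(kmer: str) -> str:
    """Return the lexicographically smaller of a k-mer and its reverse complement."""
    rc = kmer.translate(_COMPLEMENT)[::-1]
    return min(kmer, rc)


def count_kmers(reads, k):
    """Count canonical k-mers; a read of length L yields L - k + 1 k-mers."""
    counts = Counter()
    for read in reads:
        read = read.upper()
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            if set(kmer) <= set("ACGT"):   # skip k-mers containing N or other symbols
                counts[canonical(kmer)] += 1
    return counts


def estimate_genome_size(counts, min_depth=2):
    """Genome size ~ (total k-mers) / (peak k-mer depth).

    Depths below `min_depth` are treated as sequencing errors and ignored;
    this cutoff is illustrative, not a parameter reported in the paper.
    """
    histogram = Counter(counts.values())            # depth -> number of distinct k-mers
    usable = {d: n for d, n in histogram.items() if d >= min_depth}
    peak_depth = max(usable, key=usable.get)        # most common depth
    total_kmers = sum(d * n for d, n in usable.items())
    return total_kmers / peak_depth, peak_depth


def fold_coverage(total_bases, genome_size):
    """Sequencing depth = total sequenced bases / genome size."""
    return total_bases / genome_size


# Toy run: three error-free copies of a random 200 bp "genome" give depth 3.
rng = random.Random(0)
genome = "".join(rng.choice("ACGT") for _ in range(200))
counts = count_kmers([genome] * 3, k=11)
size, depth = estimate_genome_size(counts)
print(size, depth)   # size equals the 200 - 11 + 1 k-mer positions

# Arithmetic with the values reported for NJC07.
print(round(3_104_162_667 / 88))
print(round(fold_coverage(6_493_948_057, 35_276_412), 2))   # Nanopore
print(round(fold_coverage(3_570_000_000, 35_276_412), 2))   # Illumina
```

Real k-mer histograms have a heterozygosity/error shoulder at low depth, which is why the peak (rather than the arithmetic mean over all k-mers) is commonly used as the depth.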

The long Oxford Nanopore reads were assembled de novo using the long-read assemblers Falcon²¹ and Canu²², and the resulting contigs were corrected with the Illumina paired-end reads using Pilon v1.18²³.

Prediction of the repetitive elements

We used LTR_FINDER v1.05, MITE-Hunter v1.0.0r, RepeatScout v1.0.5 and PILER-DF v2.4 to catalog transposable elements in the E. sorghinum NJC07 genome by structural homology and de novo prediction. The identified elements were classified using PASTEClassifier v1.0 and merged with the RepBase library of fungal-specific repetitive DNA elements²⁴. Finally, RepeatMasker v4.0.6²⁵ was run with the RepBase library to identify the repetitive elements and mask the genome.
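RepeatMasker writes the masked genome itself. The sketch below only illustrates what masking does, assuming repeat coordinates are available as 0-based, half-open intervals (a hypothetical input format). Soft-masking lowercases repeat bases, which some gene predictors can use as a hint. Hard-masking replaces them with N. The repeat fraction counts each base once, even where annotations overlap.

```python
from typing import Iterable, List, Tuple

Interval = Tuple[int, int]   # 0-based, half-open [start, end)


def merge_intervals(intervals: Iterable[Interval]) -> List[Interval]:
    """Merge overlapping or touching intervals so bases are not double-counted."""
    merged: List[List[int]] = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1]:
            merged[-1][1] = max(merged[-1][1], end)
        else:
            merged.append([start, end])
    return [(s, e) for s, e in merged]


def mask_sequence(seq: str, intervals: Iterable[Interval], soft: bool = True) -> str:
    """Soft-mask (lowercase) or hard-mask (N) the given repeat intervals."""
    chars = list(seq)
    for start, end in merge_intervals(intervals):
        for i in range(max(0, start), min(len(chars), end)):
            chars[i] = chars[i].lower() if soft else "N"
    return "".join(chars)


def repeat_fraction(seq_length: int, intervals: Iterable[Interval]) -> float:
    """Fraction of bases covered by at least one repeat interval."""
    covered = sum(max(0, min(e, seq_length) - max(s, 0))
                  for s, e in merge_intervals(intervals))
    return covered / seq_length


print(mask_sequence("ACGTACGTAC", [(2, 5)]))
print(mask_sequence("ACGTACGTAC", [(2, 5)], soft=False))
print(repeat_fraction(10, [(2, 5), (4, 7)]))
```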

Genome annotation: Protein-coding gene models

We used three approaches to annotate protein-coding genes in the masked E. sorghinum NJC07 genome: ab initio prediction using Augustus v2.4²⁶, Genscan²⁷, GeneID v1.4²⁸, GlimmerHMM v3.0.4²⁹ and SNAP v2006.07.28³⁰; evidence-based (RNA-Seq) prediction using PASA v2.0.2³¹ and TransDecoder v2.0³²; and homology-based (E. nigrum) prediction using GeMoMa v1.3.1³³. The gene models predicted by these approaches were integrated using EVidenceModeler v2.0.2³². The predicted proteins were then functionally annotated by BLAST searches against seven databases (GO, KEGG, KOG, Pfam, Swiss-Prot, TrEMBL and NCBI-Nr) with an e-value cutoff of 1e-5.
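Functional annotation by BLAST usually reduces many hits per protein to one. The minimal sketch below assumes that the searches were written in BLAST+ tabular format (`-outfmt 6`, default 12 columns). It also assumes that the retained hit is the one with the highest bit score among hits passing the 1e-5 e-value cutoff; the paper does not specify this selection rule. The gene and subject IDs are hypothetical.

```python
import csv
from typing import Dict, Iterable, Tuple

# Default column order of BLAST+ tabular output (-outfmt 6).
OUTFMT6_COLUMNS = ["qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
                   "qstart", "qend", "sstart", "send", "evalue", "bitscore"]


def best_hits(lines: Iterable[str],
              max_evalue: float = 1e-5) -> Dict[str, Tuple[str, float, float]]:
    """Keep, for each query protein, the highest-bitscore hit with e-value <= max_evalue.

    Returns {query_id: (subject_id, bitscore, evalue)}.
    """
    best: Dict[str, Tuple[str, float, float]] = {}
    for row in csv.reader(lines, delimiter="\t"):
        if not row or row[0].startswith("#"):
            continue  # skip blank and comment lines
        rec = dict(zip(OUTFMT6_COLUMNS, row))
        evalue, bitscore = float(rec["evalue"]), float(rec["bitscore"])
        if evalue > max_evalue:
            continue
        query = rec["qseqid"]
        if query not in best or bitscore > best[query][1]:
            best[query] = (rec["sseqid"], bitscore, evalue)
    return best


example = [
    "gene0001\tsp|P00001\t80.0\t100\t20\t0\t1\t100\t1\t100\t1e-30\t150",
    "gene0001\tsp|P00002\t70.0\t100\t30\t0\t1\t100\t1\t100\t1e-20\t120",
    "gene0002\tsp|P00003\t40.0\t50\t30\t0\t1\t50\t1\t50\t0.01\t30",
]
print(best_hits(example))   # gene0002 fails the e-value cutoff
```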

Evaluation of genome completeness

The 11,779 protein-coding gene models of the E. sorghinum NJC07 nuclear genome were subjected to Benchmarking Universal Single-Copy Ortholog (BUSCO) analysis using BUSCO v2.0³⁴ with the fungi_odb9 dataset, which contains 290 BUSCOs specific to fungi. The proportion of BUSCOs detected was used as an indicator of genome completeness.

Genome annotation: Non-coding gene models

tRNAscan-SE³⁵ and Infernal³⁶ were used to predict tRNAs and rRNAs in the E. sorghinum NJC07 genome. Other non-coding RNA genes were predicted by BLAST searches of the E. sorghinum NJC07 genome against the Rfam database.

RNA-Seq

The third leaves of V3-stage B73 plants were inoculated with 6 mm mycelial plugs taken from the leading edges of seven-day-old NJC07 cultures on PDA plates, following a randomized complete block design with three biological replicates. The inoculated plants were kept in a growth chamber at 25°C with a 16-h light/8-h dark photoperiod and 100% relative humidity. Infected tissues were harvested at 48 and 96 hpi and, together with NJC07 mycelia, used for RNA-Seq.

Total RNA was isolated from 100 mg of flash-frozen infected B73 leaf tissue or mycelia using the RNAprep Pure Plant Kit following the manufacturer’s instructions (Tiangen, Beijing, China). RNA concentration and purity were estimated on a NanoDrop 2000 (Thermo Fisher Scientific, Wilmington, USA), and RNA integrity was evaluated on an Agilent 2100 Bioanalyzer using the RNA Nano 6000 Assay Kit (Agilent Technologies, Palo Alto, USA). mRNA was isolated from 1 µg of total RNA (RNA integrity number > 7; rRNA 28S/18S ratio ≥ 1; OD260/280 ≥ 1.89; OD260/230 ≥ 2.2) using oligo(dT) magnetic beads. mRNA fragmentation was carried out in NEBNext First Strand Synthesis Reaction Buffer using a Covaris E220 (Covaris, Brighton, UK). First-strand cDNA was synthesized from the fragmented mRNA using random hexamer primers and M-MuLV Reverse Transcriptase, and second-strand cDNA synthesis was performed by adding PCR buffer, dNTPs, RNase H and DNA polymerase I. The cDNA was purified using AMPure XP beads (Beckman Coulter, Beverly, USA). The cDNA fragments were blunted using exonuclease, 3’ dA-tailed using the NEBNext End Repair/dA-Tailing Module (New England Biolabs, Ipswich, USA) and ligated to NEBNext adaptors. To enrich cDNA fragments of ~240 bp, the end-repaired, adaptor-ligated fragments were size-selected using the AMPure XP system, after which USER enzyme (New England Biolabs, Ipswich, USA) was added. PCR amplification was performed with Phusion High-Fidelity DNA polymerase and universal PCR primers (carrying barcodes and annealing to the ligated adaptors). Finally, the PCR products were purified using the AMPure XP system, and library quality was assessed on an Agilent 2100 Bioanalyzer. The resulting nine paired-end libraries (5 nM per library) were pooled and loaded onto a NovaSeq X Plus (Illumina, San Diego, USA), producing 2 × 150 bp reads.

Raw paired-end reads (2 × 150 bp) were processed and trimmed using AdapterRemoval v2¹⁷ and Trimmomatic v0.36¹⁹, respectively. High-quality paired-end reads (length ≥ 50 bp; average Phred quality score ≥ 33 over a five-nucleotide sliding window) were mapped onto the E. sorghinum NJC07 genome using HISAT2 v2.0.4³⁷, and the resulting SAM files were converted to BAM files using SAMtools³⁸. Transcripts were assembled using StringTie v2.2.1³⁹, and DESeq2⁴⁰ was used to estimate transcript abundance, expressed as fragments per kilobase of transcript per million mapped fragments (FPKM). Gene expression changes (log2 fold changes) were calculated from the FPKM values of genes at 48 and 96 hpi relative to the vegetative mycelia.
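FPKM normalizes fragment counts by gene length and library size. The sketch below shows the FPKM formula and a simple ratio-of-means log2 fold change with a pseudocount, which is an illustrative assumption. DESeq2 itself models raw read counts with its own size-factor normalization and a negative binomial model, so this sketch is not a reimplementation of DESeq2.

```python
import math
from statistics import mean
from typing import Sequence


def fpkm(fragments: float, gene_length_bp: int, total_mapped_fragments: int) -> float:
    """Fragments per kilobase of gene length per million mapped fragments."""
    return fragments * 1e9 / (gene_length_bp * total_mapped_fragments)


def log2_fold_change(treated: Sequence[float], control: Sequence[float],
                     pseudocount: float = 1.0) -> float:
    """log2 ratio of mean FPKM (treated vs control).

    The pseudocount avoids division by zero for unexpressed genes; its value
    here is a hypothetical choice, not the paper's setting.
    """
    return math.log2((mean(treated) + pseudocount) / (mean(control) + pseudocount))


# Hypothetical gene: 500 fragments, 2 kb long, library of 10 million mapped fragments.
print(fpkm(500, 2_000, 10_000_000))
# Hypothetical FPKM triplicates: infected tissue vs vegetative mycelia.
print(log2_fold_change([7, 7, 7], [1, 1, 1]))
```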

Data Records

The E. sorghinum NJC07 genome assembly is available at NCBI GenBank under accession GCA_040893085.1⁴¹. The nuclear genome is distributed among 22 contigs (contigs 1 through 22; 32,694,839 bp), while the mitochondrial genome is a single contig (contig 23; 61,235 bp). The high-quality RNA-Seq reads from E. sorghinum NJC07 mycelia and from in planta fungal (E. sorghinum NJC07) cDNAs have been deposited in the NCBI Sequence Read Archive under the following accessions: SRR29843002 (48 hpi, Es-48hpi-BR2_1/Es-48hpi-BR2_2)⁴², SRR29843003 (48 hpi, Es-48hpi-BR1_1/Es-48hpi-BR1_2)⁴³, SRR29843004 (vegetative mycelia, Es-myc-BR3_1/Es-myc-BR3_2)⁴⁴, SRR29843005 (vegetative mycelia, Es-myc-BR2_1/Es-myc-BR2_2)⁴⁵, SRR29843006 (96 hpi, Es-96hpi-BR3_1/Es-96hpi-BR3_2)⁴⁶, SRR29843007 (96 hpi, Es-96hpi-BR2_1/Es-96hpi-BR2_2)⁴⁷, SRR29843008 (96 hpi, Es-96hpi-BR1_1/Es-96hpi-BR1_2)⁴⁸, SRR29843009 (48 hpi, Es-48hpi-BR3_1/Es-48hpi-BR3_2)⁴⁹ and SRR29843010 (vegetative mycelia, Es-myc-BR1_1/Es-myc-BR1_2)⁵⁰.
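For reanalysis, the nine SRA runs can be organized into a sample sheet. The sketch below transcribes the accession-to-sample mapping listed above into a dictionary; the condition labels ("48hpi", "96hpi", "mycelia") are hypothetical shorthand. It then groups the runs by condition. The reads themselves can be fetched with the SRA Toolkit (for example, fasterq-dump).

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# SRA run accession -> (condition, biological replicate), as listed in Data Records.
RUNS: Dict[str, Tuple[str, str]] = {
    "SRR29843002": ("48hpi", "BR2"),
    "SRR29843003": ("48hpi", "BR1"),
    "SRR29843004": ("mycelia", "BR3"),
    "SRR29843005": ("mycelia", "BR2"),
    "SRR29843006": ("96hpi", "BR3"),
    "SRR29843007": ("96hpi", "BR2"),
    "SRR29843008": ("96hpi", "BR1"),
    "SRR29843009": ("48hpi", "BR3"),
    "SRR29843010": ("mycelia", "BR1"),
}


def sample_sheet(runs: Dict[str, Tuple[str, str]]) -> Dict[str, List[Tuple[str, str]]]:
    """Group runs by condition, ordered by replicate: {condition: [(replicate, accession), ...]}."""
    groups: Dict[str, List[Tuple[str, str]]] = defaultdict(list)
    for accession, (condition, replicate) in runs.items():
        groups[condition].append((replicate, accession))
    return {cond: sorted(entries) for cond, entries in sorted(groups.items())}


for condition, entries in sample_sheet(RUNS).items():
    print(condition, ", ".join(f"{rep}={acc}" for rep, acc in entries))
```

Keeping the mapping in one dictionary makes it easy to check that each condition has its three biological replicates before running a differential expression analysis.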

Technical Validation

We used Trimmomatic v0.36¹⁹ and SOAPec v2.0¹⁸ to trim low-quality Illumina reads and correct sequencing errors, respectively. Jellyfish v2.0²⁰ was used to count 17 bp k-mers in the high-quality paired-end reads to estimate the genome size of E. sorghinum NJC07, calculated as the total number of k-mers divided by the average k-mer depth. The completeness of the E. sorghinum NJC07 genome assembly was assessed by BUSCO v2.0³⁴ analysis with the fungi_odb9 dataset, which includes 290 BUSCOs specific to fungi. For RNA-Seq, samples were collected following a randomized complete block design with three biological replicates to ensure statistical robustness and reproducibility of the results.

Code availability

The software programs utilized in this study were referenced in the Methods section. If detailed parameters for the software were not specified, default parameters were used.

References

Ma, W. et al. Molecular genetics of anthracnose resistance in maize. J. Fungi. 8(5), e540 (2022a).
Mueller, D. S. et al. Corn yield loss estimates due to diseases in the United States and Ontario, Canada, from 2016 to 2019. Plant Health Prog. 21, 238–247 (2020).
Ma, W. et al. First report of Didymella glomerata causing Didymella leaf blight on maize. Plant Dis. 106(9), 2522 (2022b).
Mahuku, G. Maize pathology in Asia: opportunities and challenges for breeding disease-resistant maize. Proceedings of the Asian Regional Maize Workshop 10, 361–366 (2010).
Chen, T. et al. First report of Epicoccum sorghinum causing Leaf sheath and leaf Spot on Maize in China. Plant Dis. 105(11), 3741 (2021).
Navi, S. S. et al. Effects of wetness duration and grain development stages on sorghum grain mold infection. Plant Dis. 89, 872–878 (2005).
Forbes, G.A., Bandyopadhyay, R., Garcia, G. A review of sorghum grain mold. ICRISAT (1992).
Shephard, G. S. et al. Reversed-phase high-performance liquid chromatography of tenuazonic acid and related tetramic acids. J. Chromatogr. 566, 195–205 (1991).
Shi, X. et al. Occurrence and management of the emerging pathogen Epicoccum sorghinum. Plant Dis. https://doi.org/10.1094/PDIS-03-24-0711-FE (2024).
Liu, L. M. et al. First report of leaf spot disease on rice caused by Epicoccum sorghinum in China. Plant Dis. 104(10), 2735 (2020).
Li, Q. et al. Assessing genetic resistance in wheat to black point caused by six fungal species in the Yellow and Huai wheat area of China. Plant Dis. 104(12), 3131–3134 (2020).
Yu, L. et al. First report of leaf spot caused by Epicoccum sorghinum on Chinese flowering cabbage (Brassica parachinensis) in China. Plant Dis. 103(11), 2966 (2019).
Oliveira, R. C. et al. Draft genome sequence of sorghum grain mold fungus Epicoccum sorghinum, a producer of tenuazonic acid. Genome Announc. 5, e01495–16 (2017).
Bao, Y. et al. Genome sequence of Phoma sorghina var. saccharum that causes sugarcane twisted leaf disease in China. Mol. Plant-Microbe Interact. 33(9), 1092–1094 (2020).
Chiaromonte, F., Yap, V.B. & Miller, W. Scoring pairwise genomic sequence alignments. Pac. Symp. Biocomput. 115–126 (2002).
Ma, W. et al. CRISPR/Cas9-mediated deletion of large chromosomal segments identifies a minichromosome modulating the Colletotrichum graminicola virulence on maize. Int. J. Biol. Macromol. 245, 125462 (2023).
Schubert, M., Lindgreen, S. & Orlando, L. AdapterRemoval v2: rapid adapter trimming, identification, and read merging. BMC Res. Notes. 9, 88 (2016).
Luo, R. B. et al. SOAPdenovo2: an empirically improved memory-efficient short-read de novo assembler. Gigascience. 1(1), 18 (2012).
Bolger, A. M., Lohse, M. & Usadel, B. Trimmomatic: a flexible trimmer for Illumina sequence data. Bioinformatics. 30(15), 2114–2120 (2014).
Marçais, G. & Kingsford, C. A fast, lock-free approach for efficient parallel counting of occurrences of k-mers. Bioinformatics. 27(6), 764–770 (2011).
Chin, C. S. et al. Phased diploid genome assembly with single-molecule real-time sequencing. Nat. Methods. 13, 1050–1054 (2016).
Koren, S. et al. Canu: scalable and accurate long-read assembly via adaptive k-mer weighting and repeat separation. Genome Res. 27, 722–736 (2017).
Walker, B. J. et al. Pilon: An integrated tool for comprehensive microbial variant detection and genome assembly improvement. PLoS ONE. 9(11), e112963 (2014).
Jurka, J. et al. Repbase Update, a database of eukaryotic repetitive elements. Cytogenet. Genome Res. 110, 462–467 (2005).
Smit, A., Hubley, R. & Green, P. RepeatMasker Open-3.0. http://repeatmasker.org (accessed 29 March 2021).
Stanke, M. & Morgenstern, B. AUGUSTUS: a web server for gene prediction in eukaryotes that allows user-defined constraints. Nucleic Acids Res. 32, W309–W312 (2005).
Burge, C. & Karlin, S. Prediction of complete gene structures in human genomic DNA. J. Mol. Biol. 268, 78–94 (1997).
Blanco, E., Parra, G. & Guigó, R. Using geneid to identify genes. Current Protoc Bioinformatics. 4, 3 (2007).
Majoros, W. H., Pertea, M. & Salzberg, S. L. TigrScan and GlimmerHMM: two open source ab initio eukaryotic gene-finders. Bioinformatics. 20, 2878–2879 (2004).
Korf, I. Gene finding in novel genomes. BMC Bioinformatics. 5, 59 (2004).
Campbell, M. A. et al. Comprehensive analysis of alternative splicing in rice and comparative analyses with Arabidopsis. BMC Genomics. 7, 327 (2006).
Haas, B. et al. De novo transcript sequence reconstruction from RNA-seq using the Trinity platform for reference generation and analysis. Nat. Protoc. 8, 1494–1512 (2013).
Keilwagen, J. et al. Using intron position conservation for homology-based gene prediction. Nucleic Acids Res. 44, e89 (2016).
Simão, F. A. et al. BUSCO: assessing genome assembly and annotation completeness with single-copy orthologs. Bioinformatics. 31(19), 3210–3212 (2015).
Lowe, T. M. & Chan, P. P. tRNAscan-SE On-line: integrating search and context for analysis of transfer RNA genes. Nucleic Acids Res. 44(W1), W54–57 (2016).
Nawrocki, E. P. & Eddy, S. R. Infernal 1.1: 100-fold faster RNA homology searches. Bioinformatics. 29(22), 2933–2935 (2013).
Kim, D., Langmead, B. & Salzberg, S. L. HISAT: a fast spliced aligner with low memory requirements. Nat. Methods. 12(4), 357–360 (2015).
Li, H. et al. The Sequence Alignment/Map format and SAMtools. Bioinformatics. 25(16), 2078–2079 (2009).
Pertea, M. et al. StringTie enables improved reconstruction of a transcriptome from RNA-seq reads. Nat. Biotechnol. 33(3), 290–295 (2015).
Love, M. I., Huber, W. & Anders, S. Moderated estimation of fold change and dispersion for RNA-seq data with DESeq2. Genome Biol. 15, 550 (2014).
Bhadauria, V. Epicoccum sorghinum strain NJC07, whole genome sequencing project. Genbank https://identifiers.org/ncbi/insdc.gca:GCA_040893085.1 (2024).
NCBI Sequence Read Archive. https://identifiers.org/ncbi/insdc.sra:SRR29843002 (2024).
NCBI Sequence Read Archive https://identifiers.org/ncbi/insdc.sra:SRR29843003 (2024).
NCBI Sequence Read Archive https://identifiers.org/ncbi/insdc.sra:SRR29843004 (2024).
NCBI Sequence Read Archive https://identifiers.org/ncbi/insdc.sra:SRR29843005 (2024).
NCBI Sequence Read Archive https://identifiers.org/ncbi/insdc.sra:SRR29843006 (2024).
NCBI Sequence Read Archive https://identifiers.org/ncbi/insdc.sra:SRR29843007 (2024).
NCBI Sequence Read Archive https://identifiers.org/ncbi/insdc.sra:SRR29843008 (2024).
NCBI Sequence Read Archive https://identifiers.org/ncbi/insdc.sra:SRR29843009 (2024).
NCBI Sequence Read Archive https://identifiers.org/ncbi/insdc.sra:SRR29843010 (2024).


Acknowledgements

This research was made possible through the generous support of the National Natural Science Foundation of China (Grant No. 32172363; V.B.) and the Chinese Universities Scientific Fund (Grant No. 10092004; V.B.). The funders had no roles in study design, data collection and analysis, decision to publish, or preparation of the manuscript.

Author information

Authors and Affiliations

Department of Plant Pathology, College of Plant Protection, China Agricultural University, Beijing, 100193, China
Vijai Bhadauria, Guangjun Li & Xinying Gao
Ministry of Agriculture and Rural Affairs-Key Laboratory of Crop Pest Monitoring and Green Management, China Agricultural University, Beijing, 100193, China
Vijai Bhadauria
School of Life Sciences, Nantong University, Nantong, 226019, China
Pedro Laborda

Authors

Vijai Bhadauria
Guangjun Li
Xinying Gao
Pedro Laborda

Contributions

V.B. designed the research; P.L. provided the E. sorghinum strain NJC07; V.B. performed the research and data analysis; G.L. and X.G. participated in the sequencing of the E. sorghinum strain NJC07; V.B. wrote the paper.

Corresponding author

Correspondence to Vijai Bhadauria.

Ethics declarations

Competing interests

The authors declare no competing interests.

Additional information

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary information

Supplementary Figure 1

Supplementary Tables

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License, which permits any non-commercial use, sharing, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if you modified the licensed material. You do not have permission under this licence to share adapted material derived from this article or parts of it. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by-nc-nd/4.0/.


About this article

Cite this article

Bhadauria, V., Li, G., Gao, X. et al. Near-complete genome and infection transcriptomes of the maize leaf and sheath spot pathogen Epicoccum sorghinum. Sci Data 12, 261 (2025). https://doi.org/10.1038/s41597-025-04564-1


Received: 30 July 2024
Accepted: 30 January 2025
Published: 13 February 2025
Version of record: 13 February 2025
DOI: https://doi.org/10.1038/s41597-025-04564-1