Background & Summary

Fungi of the genus Pyricularia can infect a variety of grasses1. Among them, Pyricularia oryzae (syn. Magnaporthe oryzae)2 infects a wide range of grass hosts and causes devastating blast disease on economically important crops such as rice and wheat3. The pathogenicity of the blast fungus is primarily determined by effectors4,5,6. The gain and loss of effector genes, along with genome structural variations associated with host adaptation, are largely mediated by transposable elements (TEs)7,8,9. A high-quality reference genome would facilitate understanding of the genome plasticity partially mediated by TEs and help trace the evolutionary trajectories associated with host jumps and host adaptation of these pathogenic fungi, while also benefiting durable control of blast disease10. Over the past decades, the genomes of hundreds of field isolates have been sequenced and analyzed11,12,13,14. However, genomes assembled from Illumina short reads are poorly suited to identifying structural variations, particularly in complex regions enriched with repetitive elements7. Following the first PacBio long-read assembly of the Guy11 genome from our lab7, several high-quality genomes from the genus Pyricularia have been reported15,16,17,18, but only one telomere-to-telomere genome, that of the rice blast isolate P131, has been published19. To date, seven core chromosomes have been identified in Pyricularia isolates16,19. Most of these studies focused on rice- and wheat-infecting isolates, while the genome structure of blast fungi infecting other hosts remains largely unknown. Previously, the genome of P1609, a P. penniseti isolate infecting the Cenchrus grass JUJUNCAO20, was assembled into 53 contigs using PacBio long-read technology. Comparative genomic analysis between P. penniseti and P. oryzae revealed rapid divergence in the repertoire of pathogenicity-associated genes21.

In this study, we generated a telomere-to-telomere, near-complete genome assembly of a new P. penniseti isolate, JC-1, which outperformed P1609 in conidiation and protoplast production. The genome of JC-1 was sequenced using the Oxford Nanopore Ultra-Long protocol, generating 27.7 Gb of reads (~650× genome coverage; Table 1), and was assembled with Canu v2.222. The assembly was polished with 4.45 Gb of Illumina short reads (~110× genome coverage; Table 1). The final assembly includes 10 chromosomes spanning 42.1 Mb (Tables 2, 3; Fig. 1). All 10 chromosomes contain telomere repeats at both ends, except for chr1, which lacks a telomere at its left end. This missing telomere is likely due to the assembly challenges posed by the highly repetitive nucleolar organizer region (NOR)23, which is known for its tandem repeats of ribosomal RNA (rRNA) genes24 (Fig. 1, Track A). In the final assembly, 31 copies of 18S rDNA spanning about 250 kb were identified. The genome of the P. penniseti isolate JC-1 contains eight chromosomes that share high collinearity with the seven core chromosomes of the P. oryzae isolate 70-15, indicating that the JC-1 genome has eight core chromosomes (chr1-8; Fig. 1). Notably, chr5 and chr7 (~2.0 Mb each) were much smaller than the other core chromosomes of JC-1 (Figs. 1, 2). In addition, we identified two small assembled chromosomes that share low collinearity with the core chromosomes of P. oryzae isolate 70-15 and are referred to as supernumerary chromosomes (mini1 and mini2; Fig. 1). Therefore, the P. penniseti isolate JC-1 assembly contains eight core chromosomes and two supernumerary mini-chromosomes. To confirm whether the eight core chromosomes are common among P. penniseti isolates, we performed Pulsed Field Gel Electrophoresis (PFGE) to assess the karyotype of four P. penniseti isolates collected from different areas (JC-1 to JC-4) and of the P. oryzae isolate Guy11, which lacks mini-chromosomes.
The PFGE results showed that all four P. penniseti isolates displayed two bands (between 1.81 and 2.35 Mb) representing the two small core chromosomes, as well as the two supernumerary mini-chromosomes (mini1 and mini2), whose sizes varied among isolates (Table 3; Fig. 2). The completeness of the P. penniseti isolate JC-1 genome assembly was estimated at 97.7% using benchmarking universal single-copy orthologs (BUSCO; Table 2). The assembled genome encodes 12,156 genes and contains 4.54% repetitive sequences (Tables 2, 4). The content of repetitive sequences in JC-1 was lower than that in rice-infecting P. oryzae isolates (>10%) but comparable to that in non-rice-infecting P. oryzae isolates (~5%)9. The JC-1 assembly showed high contiguity, with a contig N50 of 6.6 Mb (Table 2) and 19 telomeres (Fig. 1). The two supernumerary mini-chromosomes contained fewer genes and significantly more repetitive sequences than the core chromosomes (Fig. 3a,b). The JC-1 assembly should serve as a new high-quality reference genome for comparative and functional genomics studies of genome evolution, structural variation, and host adaptation among Pyricularia isolates infecting diverse hosts.
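The approximate coverage values quoted above can be reproduced by dividing sequencing yield by assembly size. The following is a minimal sketch of that arithmetic; the function name `fold_coverage` is an illustrative choice, and the inputs are the yields and assembly size reported in the text.

```python
def fold_coverage(total_bases: float, genome_size: float) -> float:
    """Return sequencing depth: total sequenced bases divided by genome size."""
    if genome_size <= 0:
        raise ValueError("genome_size must be positive")
    return total_bases / genome_size


# Values taken from the text (assembly size and read yields, in bases).
ASSEMBLY_SIZE = 42.1e6    # 42.1 Mb final assembly
NANOPORE_YIELD = 27.7e9   # 27.7 Gb of Nanopore ultra-long reads
ILLUMINA_YIELD = 4.45e9   # 4.45 Gb of Illumina short reads

nanopore_depth = fold_coverage(NANOPORE_YIELD, ASSEMBLY_SIZE)
illumina_depth = fold_coverage(ILLUMINA_YIELD, ASSEMBLY_SIZE)

print(f"Nanopore depth: {nanopore_depth:.0f}x")
print(f"Illumina depth: {illumina_depth:.0f}x")
```

The computed depths (roughly 658× and 106×) are consistent with the rounded values of ~650× and ~110× given in the text.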

Table 1 Summary of P. penniseti isolate JC-1 sequencing data.
Table 2 Genome assembly and annotation statistics of the two P. penniseti isolates JC-1 and P1609.
Table 3 Summary of P. penniseti isolate JC-1 chromosomes and genes.
Fig. 1 Genome features of Pyricularia penniseti isolate JC-1. Circos plot showing genomic features of P. oryzae isolate 70-15 (left) and P. penniseti isolate JC-1 (right). A, assembled nuclear chromosomes; B, distribution of repetitive sequences; C, gene density; D, GC content.

Fig. 2 PFGE image of the small chromosomes of the four Pyricularia penniseti isolates (JC-1 to JC-4) and the P. oryzae isolate Guy11. Hansenula wingei chromosomal DNA was used as the size marker. The red arrowheads indicate the two small core chromosomes.

Table 4 Comparison of repeat sequences between the two P. penniseti isolates JC-1 and P1609.
Fig. 3 Genomic features of the mini-chromosomes of Pyricularia penniseti isolate JC-1. Distribution of (a) genes and (b) repetitive sequences across the core and mini-chromosomes.

Methods

Fungal strains

The Pyricularia penniseti strain JC-1 was isolated from a leaf spot lesion on JUJUNCAO (Cenchrus fungigraminus; syn. Pennisetum giganteum Z. X. Lin)18 and is stored at the Fujian Universities Key Laboratory for Plant-Microbe Interaction, Fuzhou, Fujian Province, China. For fungal growth, the JC-1 strain was incubated on solid complete medium (CM) at 28 °C in the dark.

Sampling and DNA extraction

The P. penniseti isolate JC-1 was cultured in liquid complete medium (CM) at 110 rpm, 28 °C for 3 days. The mycelia were collected and washed twice using ddH2O. Genomic DNA was extracted from vegetative mycelia and used for genome sequencing as described previously by Zhong et al.14.

Nanopore and Illumina Whole Genome Sequencing

Genomic DNA was extracted using the GP1 method (Novogene, Beijing, China), and 100 kb size selection was performed using the high-pass protocol of the SageHLS HMW library system (Sage Science). An ultra-long library was prepared following the SQK-LSK114 protocol (Oxford Nanopore Technologies, Oxford, UK). A total of 400 ng of DNA library was loaded onto an R10.4.1 flow cell and sequenced on the PromethION platform (Oxford Nanopore Technologies) at Novogene Bioinformatics Technology Co., Ltd (Beijing, China). For Illumina sequencing, the library was constructed and sequenced on the Illumina NovaSeq 6000 platform at Novogene Bioinformatics Technology Co., Ltd (Beijing, China).

Genome assembly

The raw Nanopore long reads of JC-1 were assembled using Canu v2.224 with the following parameters: useGrid=false genomeSize=45m minReadLength=5000 minOverlapLength=2000 corOutCoverage=60 correctedErrorRate=0.1 corPartitionMin=10000 maxInputCoverage=100. A total of 11 assembled contigs were polished with Illumina short-read sequencing data using NextPolish v1.4.125 for three rounds.
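As an illustration of how the parameters listed above fit into a Canu invocation, the sketch below builds the command line as an argument list. The `key=value` parameters are those reported in the text; the output prefix, output directory, input file name, and the use of the `-p`, `-d`, and `-nanopore` options are assumptions made for this example and are not stated in the source.

```python
import shlex

# Parameters reported in the text for the Canu v2.2 run.
CANU_PARAMETERS = {
    "useGrid": "false",
    "genomeSize": "45m",
    "minReadLength": "5000",
    "minOverlapLength": "2000",
    "corOutCoverage": "60",
    "correctedErrorRate": "0.1",
    "corPartitionMin": "10000",
    "maxInputCoverage": "100",
}


def build_canu_command(reads, prefix="JC-1", out_dir="canu_out"):
    """Return a Canu command as an argument list.

    `reads`, `prefix`, and `out_dir` are hypothetical names chosen for
    illustration; only the key=value parameters come from the text.
    """
    cmd = ["canu", "-p", prefix, "-d", out_dir]
    cmd += [f"{key}={value}" for key, value in CANU_PARAMETERS.items()]
    cmd += ["-nanopore", reads]
    return cmd


cmd = build_canu_command("jc1_ultralong.fastq.gz")
print(shlex.join(cmd))
```

Building the command as a list keeps each parameter a separate argument, so it can be passed to `subprocess.run` without shell quoting issues.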

Evaluation of the genome assembly

To detect telomeres on the chromosomes, the sequence TTAGGG/CCCTAA (as reported by Brigati et al.26) was aligned to the JC-1 assembly using TIDK v.0.2.027 with the following commands: tidk explore --minimum 5 --maximum 12 genome.fa and tidk search --string TTTAGGG --dir outdir --output output genome. For visualization, the following command was used: tidk plot --tsv windows.tsv. The genome assembly quality was evaluated using BUSCO (Benchmarking Universal Single-Copy Orthologs) v5.5.028 with the “ascomycota_odb10” lineage as the reference dataset.
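The idea behind the telomere check can be illustrated with a short sketch that looks for tandem copies of TTAGGG, or its reverse complement CCCTAA, near each end of a sequence. This is not TIDK's algorithm; it is a simplified stand-in, and the window size (300 bp) and minimum copy number (5) are assumed values chosen for illustration.

```python
import re

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an uppercase DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def longest_tandem_run(seq: str, motif: str) -> int:
    """Largest number of back-to-back motif copies found anywhere in seq."""
    runs = re.findall(f"(?:{re.escape(motif)})+", seq)
    return max((len(run) // len(motif) for run in runs), default=0)


def telomere_ends(seq: str, motif: str = "TTAGGG",
                  window: int = 300, min_copies: int = 5) -> dict:
    """Report whether each end of seq carries a tandem telomeric repeat.

    `window` and `min_copies` are hypothetical thresholds, not values
    taken from the study.
    """
    seq = seq.upper()
    rc_motif = reverse_complement(motif)
    ends = {"left": seq[:window], "right": seq[-window:]}
    result = {}
    for name, end_seq in ends.items():
        copies = max(longest_tandem_run(end_seq, motif),
                     longest_tandem_run(end_seq, rc_motif))
        result[name] = copies >= min_copies
    return result


# Toy sequences (not real data): one with repeats at both ends, one
# lacking a repeat at its left end.
both_ends = "CCCTAA" * 8 + "ACGT" * 100 + "TTAGGG" * 8
right_only = "ACGTTGCA" * 60 + "TTAGGG" * 8

print(telomere_ends(both_ends))
print(telomere_ends(right_only))
```

A sequence such as `right_only` corresponds to the situation described for chr1, where a telomere is found at only one end.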

Generation of annotations

Protein-coding genes were annotated using BRAKER2 v2.1.629, which integrates ab initio gene predictions generated by AUGUSTUS v3.4.030 and GeneMark-EP+31 with homology evidence from fungal protein sequences in the OrthoDB fungal database. All high-confidence protein-coding genes predicted by BRAKER2 were used for the statistical and comparative genomic analyses in this study.

An ab initio transposable element (TE) library was constructed using RepeatModeler v1.0.832 with default parameters. RepeatMasker v3.3.033 was applied to perform a homology-based repeat search throughout the whole JC-1 genome using the constructed TE library.

Pulsed Field Gel Electrophoresis (PFGE)

Protoplast plugs were prepared using the CHEF (Contour-clamped homogeneous electric field) Genomic DNA Plug Kit (Bio-Rad, California, USA) according to the manufacturer’s instructions. In brief, the mycelia were collected and digested with 10 mg/mL Lysing Enzymes in 1 M sorbitol at 30 °C, 85 rpm for 3 h. The digested product was filtered through sterile Nytex nylon mesh with a 25 µm pore size34 and centrifuged at 4,500 rpm, 4 °C for 10 min to collect the protoplasts. The protoplasts were washed with SE buffer (1 M sorbitol, 50 mM EDTA) and adjusted to a concentration of 1 × 10^9 protoplasts/ml, then mixed with 2% low-melting agarose (Bio-Rad, California, USA) and transferred to plug molds to form protoplast plugs. The protoplast plugs were transferred to a 10 ml tube containing proteinase K buffer and incubated overnight at 50 °C in a water bath without agitation. After four washes with 1 × wash buffer at 25 °C, the plugs were immersed in 0.5 × TBE and stored at 4 °C. CHEF gel electrophoresis was conducted according to Orbach et al.34 with minor modifications. In brief, chromosomes were separated using a CHEF-DRII System (Bio-Rad, California, USA) on 1% Certified Megabase Agarose in 0.5 × TBE buffer at 2 V/cm, 14 °C, with a switching interval of 900 s for 96 h. The 0.5 × TBE buffer was replaced every two days.

Classification of mini- and core-chromosome assemblies

To classify the mini- and core-chromosome assemblies, macrosynteny relationships between JC-1 and the 70-15 (P. oryzae) reference genome were identified and plotted based on the results of MCScanX35 with default parameters. Finally, gene frequency, distribution of TEs, and collinear gene pairs between JC-1 and 70-15 were visualized using advanced Circos plots generated with TBtools-II v2.08636.
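One way to summarize collinearity per chromosome is the fraction of its genes that have a collinear partner in the reference genome. The sketch below is a hypothetical illustration of that summary, not the procedure used in this study, which relied on MCScanX output and visual inspection of the plots. The 0.5 threshold, the function names, and the toy gene identifiers are all assumptions.

```python
from collections import Counter


def collinear_fraction(gene_to_chrom, collinear_genes):
    """Fraction of genes on each chromosome that have a collinear partner.

    gene_to_chrom: dict mapping gene ID -> chromosome name.
    collinear_genes: iterable of gene IDs found in collinear blocks.
    """
    totals = Counter(gene_to_chrom.values())
    hits = Counter(
        gene_to_chrom[gene]
        for gene in set(collinear_genes)
        if gene in gene_to_chrom
    )
    return {chrom: hits[chrom] / totals[chrom] for chrom in totals}


def classify(fractions, threshold=0.5):
    """Label chromosomes by collinear fraction (threshold is hypothetical)."""
    return {
        chrom: "core" if fraction >= threshold else "supernumerary"
        for chrom, fraction in fractions.items()
    }


# Toy input (made-up gene IDs, not data from the JC-1 assembly).
toy_gene_to_chrom = {
    "g1": "chr1", "g2": "chr1", "g3": "chr1", "g4": "chr1",
    "g5": "mini1", "g6": "mini1", "g7": "mini1", "g8": "mini1",
}
toy_collinear = ["g1", "g2", "g3", "g5"]

fractions = collinear_fraction(toy_gene_to_chrom, toy_collinear)
labels = classify(fractions)
print(fractions)
print(labels)
```

In practice, the gene-to-chromosome map would come from the annotation and the collinear gene list from the MCScanX collinearity output.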

Data Records

The raw Illumina and Oxford Nanopore sequencing data have been deposited in the National Center for Biotechnology Information (NCBI) under BioProject PRJNA1146787, with accession numbers SRR3020789937 and SRR3020790038, respectively. The genome assembly was deposited under the same BioProject at NCBI, under the accession number JBGNXE000000000.139.

Technical Validation

Quality control of the Nanopore Ultra-Long reads was performed using NanoPack2 (https://doi.org/10.1093/bioinformatics/btad311). The read N50 of the Nanopore Ultra-Long reads was 100 kb, with an average Q score of 13.2. The Illumina sequencing reads had a GC content of 49.06%, with 96.96% and 92.00% of reads having quality scores of at least 20 and 30, respectively. The chromosome-level genome assembly has a size of 42.1 Mb and a contig N50 of 6.64 Mb. Genome completeness was evaluated using BUSCO (Benchmarking Universal Single-Copy Orthologs) v5.5.028 with the ascomycota_odb10 database. The BUSCO statistics were as follows: 97.7% complete (97.2% single-copy and 0.5% duplicated), 0.3% fragmented, and 2.0% missing (Table 2). Telomeres were searched for on each chromosome using TIDK v.0.2.027 with the telomeric sequence (TTAGGG/CCCTAA)26, and the results show that each chromosome contains telomeres. In conclusion, these results indicate high completeness of our JC-1 genome assembly.
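For readers unfamiliar with the N50 statistic used above for both reads and contigs, the sketch below shows the standard definition: the length L such that sequences of length at least L together contain at least half of the total bases. The lengths in the example are toy values, not those of the JC-1 assembly.

```python
def n50(lengths):
    """Return the N50 of a collection of sequence lengths.

    Sequences are sorted from longest to shortest and summed until the
    running total reaches half of all bases; the length at which that
    happens is the N50.
    """
    if not lengths:
        raise ValueError("lengths must not be empty")
    ordered = sorted(lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half_total:
            return length


# Toy contig lengths in Mb (illustrative only).
toy_lengths = [8, 6, 5, 4, 2, 1]
print(n50(toy_lengths))
```

For the toy input, the total is 26 and the cumulative sum first reaches half of it (13) at the second-longest sequence, so the N50 is 6.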