Background & Summary

Zoysia macrostachya Franch. & Sav. is a perennial grass species in the Poaceae family and the Chloridoideae subfamily1. Zoysia is a genus of warm-season grasses widely cultivated for turf applications, including golf courses, residential lawns, and commercial landscapes2,3. As of 2015, an estimated 10,375 ha of zoysiagrass were planted in the US, with 78% of it used on golf courses in the transition zone and 17% in the Southeast region4. This species is native to the coastal regions of Korea, Japan, and East Asia, where it thrives in saline environments5. Like other species in the Zoysia genus, Z. macrostachya exhibits traits of a recretohalophyte, featuring specialized salt glands on its leaves that enable salt excretion under saline conditions6,7,8. This adaptation, common in halophytes, allows the species to thrive in challenging habitats with high salinity. Morphologically, it is a predominantly littoral species growing up to 21 cm tall9. These characteristics make it an excellent model for exploring plant resilience to abiotic stresses.

Salinity stress is a critical challenge in agriculture, as it significantly reduces plant productivity and limits species distribution by inducing osmotic imbalance, ion toxicity, and oxidative stress10. Studies on the Zoysia genus have revealed diverse mechanisms of salinity tolerance11,12,13,14. For example, Z. japonica exhibits adaptive traits such as the exclusion and isolation of sodium ions from sensitive tissues and the effective compartmentalization of potassium ions15. Z. macrostachya exhibits enhanced tolerance, as shown by its lower membrane conductivity and reduced malondialdehyde accumulation under stress16. Furthermore, this species increases proline and soluble sugar levels while boosting peroxidase (POD) activity, further enhancing its resistance to salinity. The basis of this salinity tolerance may lie in species-specific structural variations and sequence variations, underscoring the importance of genomic studies to further explore these adaptations.

Advances in genomic studies have illuminated the evolutionary and adaptive mechanisms of Zoysia species. Notably, Z. japonica is the only Zoysia species with a chromosome-level reference genome, consisting of a 334 Mb draft assembly and 59,271 predicted protein-coding genes17. Genomic synteny analyses suggest that the Zoysia genus underwent a species-specific whole-genome duplication (WGD) event approximately 20.8 Mya, which contributed to enhanced salt tolerance through expanded cytochrome P450 and ABA biosynthetic gene families18. While these findings provide insights into genetic diversity and salinity tolerance, the genome of Z. macrostachya has the potential to offer a deeper understanding of the ecological and physiological adaptations to salinity within the genus.

In this study, we present a chromosome-level genome assembly for Zoysia macrostachya constructed using Oxford Nanopore long reads, Illumina short reads, and Omni-C sequencing data. Z. macrostachya has a tetraploid karyotype with 2n = 4x = 40 chromosomes1,19. The assembly spans 329.78 Mb across 20 haploid chromosomes, with a scaffold N50 of 19.24 Mb and telomeric repeats detected at both ends of every chromosome. Genome validation showed 97.8% complete BUSCOs, indicating a highly complete assembly. The gene annotation was likewise validated with BUSCO, identifying 4,754 (97.1%) complete BUSCOs. A total of 33,474 protein-coding genes were annotated, with 86.92% functionally characterized using the Swiss-Prot and eggNOG databases. Macrosynteny analysis revealed high gene-based synteny with related species, reflecting the dynamics of genome structure evolution within Chloridoideae. This high-quality genome establishes a valuable resource for future research on abiotic stress resilience and evolutionary genomics within the Zoysia genus.

Methods

Plant materials and sequencing

Z. macrostachya (ZN3169) was used in this study, collected from Seonyu Island, Gunsan-si, Jeonbuk-do, Republic of Korea (Fig. 1a). Genomic DNA was isolated from leaf samples using the SmartGene Plant DNA Extraction Kit II (SmartGene, Daejeon, Korea), following the manufacturer’s instructions. A 151 bp paired-end library was prepared using the xGen DNA Library Prep Kit for genomic DNA sequencing. Sequencing on the Illumina NovaSeq 6000 platform generated 20.40 Gb of raw data (Table S1). These data were used for genome size estimation and heterozygosity analysis (Fig. 1b).

For Oxford Nanopore long-read sequencing, a genomic DNA library was prepared using the ONT Ligation Sequencing Kit (SQK-LSK110) following the manufacturer’s guidelines. Sequencing was conducted on a MinION device with R9.4.1 flow cells, producing 30.48 Gb (92.32×) (Table S1).

For Omni-C library sequencing, the Dovetail Omni-C Kit (Dovetail Genomics, USA) was used following the provided instructions. Approximately 300 mg of Z. macrostachya leaf tissue was ground into a fine powder under liquid nitrogen. The powdered tissue was crosslinked with 37% formaldehyde and incubated for 10 minutes at room temperature. Chromatin was digested in situ using the DNase supplied in the kit, followed by end-repair, ligation, and purification of the resulting DNA fragments. The ligated DNA underwent proximity ligation to form long-range interactions and was subsequently reverse crosslinked to release the DNA. The extracted DNA was then purified and quantified with a Qubit Fluorometer (Thermo Fisher Scientific, USA). The Omni-C libraries were constructed using the Dovetail™ Library Module for Illumina (Dovetail Cat. No. 21004) and indexed with the Dovetail Dual Index Primer Set for Illumina (Dovetail Cat. No. 25005), with amplification performed according to the manufacturer’s guidelines. Library quantity and quality were evaluated using a Qubit 4.0 Fluorometer (Invitrogen Ltd, Paisley, UK) and a 4150 TapeStation (Agilent, Santa Clara, CA, USA), respectively. The library was sequenced on an Illumina NovaSeq 6000 platform (Illumina, San Diego, CA, USA) using paired-end reads of 151 bp, generating 27.43 Gb (83.07×) (Table S1).

For gene annotation analysis, total RNA was extracted from shoot and root samples using the SmartGene Plant RNA Extraction Kit (SmartGene, Daejeon, Korea) following the manufacturer’s instructions. mRNA was isolated using the Poly(A) RNA Selection Kit (Lexogen, Vienna, Austria) and used to construct 151 bp paired-end libraries with the xGen™ RNA Lib Prep Kit (Integrated DNA Technologies, Coralville, IA, USA). Sequencing of the RNA libraries was conducted on the Illumina NovaSeq 6000 platform, producing 8.88 Gb of raw data with 151 bp paired-end reads (Table S1).
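The depth values quoted for the sequencing libraries can be related to their yields with a short calculation. The sketch below is illustrative only: it assumes depth is the yield divided by the purged draft assembly length (330,146,791 bp), a denominator the text does not state explicitly, and the function name `coverage_depth` is hypothetical. Under that assumption the Nanopore and Illumina values match the reported figures, while the Omni-C value agrees only to within rounding of the reported yield.

```python
# Assumption (not stated in the source): depth = sequenced bases / assembly length,
# using the purged draft assembly length as the denominator.
ASSUMED_GENOME_BP = 330_146_791


def coverage_depth(yield_gb: float, genome_bp: int = ASSUMED_GENOME_BP) -> float:
    """Return mean sequencing depth (fold coverage) for a yield given in gigabases."""
    return yield_gb * 1e9 / genome_bp


if __name__ == "__main__":
    # Yields as reported in the text (rounded to two decimals there).
    reported_yields_gb = {
        "Illumina WGS": 20.40,
        "Oxford Nanopore": 30.48,
        "Omni-C": 27.43,
    }
    for library, yield_gb in reported_yields_gb.items():
        print(f"{library}: {coverage_depth(yield_gb):.2f}x")
```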

Fig. 1

Morphology and k-mer analysis of Zoysia macrostachya. (a) Morphology of Z. macrostachya in its natural habitat. (b) K-mer frequency distribution curve representing the genome size and heterozygosity of Z. macrostachya.

Genome assembly and chromosome-level scaffolding

A total of 67,546,030 Illumina read pairs were cleaned using Trimmomatic v0.3920. The parameters included removal of adapter sequences (ILLUMINACLIP:TruSeq3-PE.fa:2:30:10:2:keepBothReads), trimming of leading and trailing low-quality bases (Phred score < 10; LEADING:10, TRAILING:10), and removal of reads shorter than 50 bp (MINLEN:50). K-mer frequency analysis was performed using Jellyfish v2.3.021 with 21-mers to estimate the genome size and heterozygosity rate of the Z. macrostachya genome. Subsequently, GenomeScope 2.022 was employed with the options -k 21 -p 2 to analyze genome size and heterozygosity, including sub-genomes. The k-mer distribution exhibited two peaks, predicting a genome size of 324,246,238 bp and a heterozygosity rate of 2.58% (Fig. 1b).

The genome of Z. macrostachya was de novo assembled using NextDeNovo v2.5.023 with Oxford Nanopore long reads exceeding 1 kb in length. Minimap2 with a k-mer size of 19 was employed during consensus building to balance sensitivity and specificity in sequence alignments. Genome polishing was subsequently conducted using NextPolish v1.4.124, employing DNA-seq data from Illumina NovaSeq, comprising 135,092,060 reads with a total yield of 20.40 Gb, corresponding to a coverage depth of approximately 61.79×. Polishing was carried out over three iterative rounds, with BWA-MEM used for alignment of the short-read data. This initial assembly consisted of 95 contigs with a total length of 340,935,610 bp and a contig N50 of 7.50 Mb (Table 1). To refine the assembly and remove haplotypic duplications, contig overlaps were resolved using Purge_Dups25, which resulted in a purged assembly of 69 contigs. This refined draft genome had a total length of 330,146,791 bp and an improved contig N50 of 7.56 Mb.
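For readers unfamiliar with the N50 statistic used here, a minimal sketch of its standard definition follows: the length of the contig at which the cumulative length, summed from the longest contig downward, first reaches half of the total assembly length. The function name `n50` is illustrative and is not taken from any tool cited in the text.

```python
def n50(lengths):
    """Return the N50 of a collection of contig or scaffold lengths."""
    if not lengths:
        raise ValueError("at least one length is required")
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        # Integer comparison avoids floating-point issues at the halfway point.
        if running * 2 >= total:
            return length
```

Applied to the 69 purged contigs or the 20 final scaffolds, this definition would yield the contig and scaffold N50 values summarized in Table 1.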

Table 1 Genome assembly statistics of Z. macrostachya.

For chromosome-level genome scaffolding, Omni-C reads were aligned to the draft genome using BWA v0.7.17-r118826 with the -5SP option. Alignments were filtered to retain those with mapping quality (MAPQ) ≥ 1 and edit distance (NM) ≤ 3. The filtered Omni-C reads were then processed using the HapHiC v1.0.627 pipeline with the options --correct_nrounds 2 and --max_inflation 5 and a target chromosome count of 20 for scaffolding analysis (Fig. 2a). This step anchored the 69 contigs into 20 scaffolds, increasing the scaffold N50 to 19.24 Mb. The resulting scaffolded assembly initially contained 49 gaps. Subsequently, we performed manual curation in Juicebox_1.11.082128. For gap closing, we used YAGCloser (https://github.com/merlyescalona/yagcloser) with Oxford Nanopore long reads corrected by Ratatosk v0.9.029, which filled 23 of these gaps. The final chromosome-level assembly consists of 20 scaffolds with a total length of 329,773,341 bp and a scaffold N50 of 19.24 Mb, with 26 remaining gaps that total approximately 2,600 bp in length. The final assembly was validated using the TeloExplorer module of the quarTeT30 toolkit to confirm its structural integrity. Using the plant telomeric repeat sequence (AAACCCT), the analysis identified these canonical repeats at both ends of all 20 chromosomes, with copy numbers ranging from 12 to 650 (Fig. 2b). The detection of terminal repeats on every chromosome provides strong evidence for a high-quality chromosome-level assembly.
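The telomere check described above can be illustrated with a simplified sketch. This is not the TeloExplorer algorithm: it merely counts the longest run of back-to-back copies of the repeat unit (or its reverse complement) within a fixed window at each chromosome end. The window size and function names are assumptions introduced for illustration.

```python
import re

TELOMERE_UNIT = "AAACCCT"  # repeat unit reported in the text
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq: str) -> str:
    """Reverse complement of an uppercase DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def max_tandem_copies(window: str, unit: str) -> int:
    """Longest run of consecutive copies of `unit` (either strand) found in `window`."""
    window = window.upper()
    best = 0
    for motif in (unit, revcomp(unit)):
        for run in re.findall(f"(?:{motif})+", window):
            best = max(best, len(run) // len(motif))
    return best


def telomere_copies(chrom_seq: str, unit: str = TELOMERE_UNIT, window: int = 10_000):
    """Return (copies near the start, copies near the end) of one chromosome sequence.

    `window` is a hypothetical search width, not a documented TeloExplorer default.
    """
    return (
        max_tandem_copies(chrom_seq[:window], unit),
        max_tandem_copies(chrom_seq[-window:], unit),
    )
```

A chromosome would be considered capped at both ends when both returned counts are clearly above what would be expected by chance; the threshold for that call is left to the user in this sketch.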

Fig. 2

Overview of the Zoysia macrostachya genome. (a) Omni-C heatmap showing intra- and inter-chromosomal interactions based on the number of aligned Omni-C reads. (b) Locations of gaps and repeated telomeric sequences (AAACCCT) within the Z. macrostachya genome.

Genome annotation

To identify both known and novel repetitive elements in the Z. macrostachya genome, a database was constructed using RepeatModeler v2.0.4 (www.repeatmasker.org/RepeatModeler/) with default settings. Repeat sequences were subsequently predicted and masked utilizing RepeatMasker v4.1.5 (http://www.repeatmasker.org/). A total of 145,185,180 bp of repetitive sequences were identified, accounting for 44.03% of the entire Z. macrostachya genome (Table S2). Long terminal repeat (LTR) elements were the most abundant, with 61,197 elements spanning 69,542,551 bp, representing 21.09% of the genome.
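The repeat proportions above follow directly from the masked base counts. The sketch below shows two small helpers: one that measures the soft-masked (lowercase) fraction of a set of sequences, and one that converts a base count into a percentage of the genome. It assumes the denominator is the final assembly length of 329,773,341 bp, which reproduces the reported percentages but is not stated explicitly in the text; both function names are hypothetical.

```python
FINAL_ASSEMBLY_BP = 329_773_341  # assumed denominator (final assembly length)


def masked_fraction(sequences) -> float:
    """Fraction of bases that are lowercase (soft-masked) across the given sequences."""
    total = 0
    masked = 0
    for seq in sequences:
        total += len(seq)
        masked += sum(1 for base in seq if base.islower())
    return masked / total if total else 0.0


def percent_of_genome(bp: int, genome_bp: int = FINAL_ASSEMBLY_BP) -> float:
    """Express a base count as a percentage of the genome, rounded to two decimals."""
    return round(100 * bp / genome_bp, 2)


if __name__ == "__main__":
    print("All repeats (%):", percent_of_genome(145_185_180))
    print("LTR elements (%):", percent_of_genome(69_542_551))
```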

For gene annotation, the genome with softmasked repetitive sequences was utilized. The BRAKER3 pipeline31,32,33,34,35 was used to predict protein-coding genes in Z. macrostachya. This approach integrated two types of evidence: short-read RNA-seq data and protein homology information. Short RNA-seq reads were trimmed and filtered using Trimmomatic v0.3920 with parameters ILLUMINACLIP:TruSeq3-PE.fa:2:30:10:2:keepBothReads, LEADING:10, TRAILING:10, and MINLEN:50. Subsequently, the reads were further filtered with PRINSEQ-lite v0.20.436 using the parameters: -min_len 50, -min_qual_mean 15, -derep 14, -trim_qual_left 15, and -trim_qual_right 15. The cleaned reads were then aligned to the genome with HISAT2 v2.2.137, and the resulting alignment files were provided to the BRAKER3 pipeline31,32,33,34,35 along with protein sequences from 27 Liliopsida species obtained from NCBI (Table S3). These hints were applied to train GeneMark-ETP and AUGUSTUS for gene prediction. Statistical analysis of the gene annotation results was performed using the agat_sq_stat_basic.pl script from the AGAT software38. A total of 33,474 protein-coding genes with an average length of 2,544.83 bp were predicted in the Z. macrostachya genome (Table 2). The average exon length was 238.33 bp, while the average intron length was 381.49 bp.
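To make the exon and intron length statistics concrete, the sketch below derives them from exon coordinates in the GFF3 convention (1-based, inclusive). It is an illustration of the arithmetic only, not the AGAT implementation, and the function names and the input structure (a mapping from transcript ID to exon intervals) are assumptions.

```python
from statistics import mean


def exon_intron_lengths(exons):
    """Return (exon_lengths, intron_lengths) for one transcript.

    `exons` is a list of (start, end) pairs using 1-based inclusive coordinates.
    """
    ordered = sorted(exons)
    exon_lengths = [end - start + 1 for start, end in ordered]
    intron_lengths = [
        next_start - prev_end - 1
        for (_, prev_end), (next_start, _) in zip(ordered, ordered[1:])
    ]
    return exon_lengths, intron_lengths


def summarize(transcripts):
    """Mean exon and intron length over a {transcript_id: exons} mapping."""
    all_exons, all_introns = [], []
    for exons in transcripts.values():
        exon_lengths, intron_lengths = exon_intron_lengths(exons)
        all_exons.extend(exon_lengths)
        all_introns.extend(intron_lengths)
    return {
        "mean_exon": mean(all_exons) if all_exons else 0.0,
        "mean_intron": mean(all_introns) if all_introns else 0.0,
    }
```

Single-exon transcripts contribute no introns, so the intron mean is taken only over transcripts with at least two exons.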

Table 2 Statistics of Z. macrostachya protein-coding gene annotation.

Functional annotation of protein-coding genes was conducted using EnTAP v1.1.139. Protein sequences were compared against the UniProt Swiss-Prot40 and eggNOG41 databases via DIAMOND42, with an e-value threshold of 1e-5. Furthermore, KEGG43 terms and GO44 terms were assigned to the genes through eggNOG-mapper45,46. The result was visualized using TBtools47. Out of the 33,474 protein-coding genes, 29,095 genes (86.92% of total genes) were annotated based on the Swiss-Prot and eggNOG databases (Fig. 3). Among these, 18,921 genes were annotated using Swiss-Prot, while Gene Ontology (GO) annotations were assigned as follows: 19,582 genes for Biological Process (BP) terms, 20,059 genes for Cellular Component (CC) terms, and 18,740 genes for Molecular Function (MF) terms. Additionally, 13,182 genes were assigned to Kyoto Encyclopedia of Genes and Genomes (KEGG) pathways.
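The e-value filtering step can be sketched as follows for alignments in the standard 12-column BLAST tabular layout, which DIAMOND also emits (e-value in column 11, bit score in column 12). This is an illustration rather than EnTAP's internal logic: it assumes the threshold is inclusive and that the best hit per query is chosen by bit score, and the function name `best_hits` is hypothetical.

```python
def best_hits(tabular_lines, max_evalue: float = 1e-5):
    """Keep the highest-bit-score hit per query among hits passing the e-value cutoff.

    Returns {query_id: (subject_id, evalue, bitscore)}.
    """
    best = {}
    for line in tabular_lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and comments
        fields = line.rstrip("\n").split("\t")
        query, subject = fields[0], fields[1]
        evalue, bitscore = float(fields[10]), float(fields[11])
        if evalue > max_evalue:
            continue
        if query not in best or bitscore > best[query][2]:
            best[query] = (subject, evalue, bitscore)
    return best
```

The number of keys in the returned mapping, divided by the total gene count, gives the kind of annotation rate reported above (29,095 of 33,474 genes, or 86.92%).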

Fig. 3

Functional annotation of protein-coding genes in Zoysia macrostachya. Annotation results of protein-coding genes based on the Swiss-Prot and eggNOG databases. GO (Gene Ontology) includes BP (Biological Process), CC (Cellular Component), and MF (Molecular Function). KEGG (Kyoto Encyclopedia of Genes and Genomes) includes KO (KEGG Orthology).

Macrosynteny analysis

To compare collinearity among closely related species (Oropetium thomaeum and Zoysia japonica cv. Compadre), genome data were first downloaded from CoGe (gid: 51527) and NCBI GenBank (GCA_040438285.1), respectively. Gene homology-based macrosynteny was then analyzed using MCScan (JCVI v1.3.948) with the parameters --cscore 0.99 and --minspan 30 (Fig. 4). The analysis included three species belonging to the Chloridoideae subfamily of Poaceae. Among them, O. thomaeum is a diploid species with a chromosome number of n = 10, while Z. japonica, a member of the same genus as Z. macrostachya, is an allotetraploid species with 20 chromosomes (n = 20). Macrosynteny analysis revealed 18,316 homologous gene pairs between O. thomaeum and Z. japonica, covering 60.40% of the 28,930 genes in O. thomaeum and 37.26% of the 49,074 genes in Z. japonica (Fig. 4a). Similarly, 27,286 gene pairs were identified between Z. japonica and Z. macrostachya, representing 55.59% of the genes in Z. japonica and 81.30% of the 33,474 genes in Z. macrostachya. These findings highlight the synteny conserved among these species. Syntenic depth analysis, performed without the minimum collinearity restriction, indicated a 1:2 gene synteny between O. thomaeum and Z. japonica, consistent with the difference in their chromosome numbers (Fig. 4b,c). Moreover, the 1:1 gene synteny pattern observed between Z. japonica and Z. macrostachya is consistent with Z. macrostachya being tetraploid and containing two distinct subgenome types across its 20 chromosomes.
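The idea behind a syntenic depth ratio such as 1:2 can be illustrated with a simplified sketch. For each gene in the reference genome, it counts how many distinct syntenic blocks contain an anchor for that gene, and then tabulates those counts. This is a simplification and not the JCVI implementation, which may treat block spans differently; the input structure and function name are assumptions.

```python
from collections import Counter, defaultdict


def syntenic_depth(blocks) -> Counter:
    """Histogram of syntenic depth for reference genes.

    `blocks` is an iterable of syntenic blocks, each a list of
    (reference_gene, query_gene) anchor pairs. The result maps
    depth -> number of reference genes anchored in that many blocks.
    Genes with no anchors (depth 0) are not represented.
    """
    blocks_per_gene = defaultdict(set)
    for block_id, block in enumerate(blocks):
        for reference_gene, _query_gene in block:
            blocks_per_gene[reference_gene].add(block_id)
    return Counter(len(block_ids) for block_ids in blocks_per_gene.values())
```

Under this view, a 1:2 pattern means that most reference genes fall at depth 2 in the other genome, whereas a 1:1 pattern means that most fall at depth 1.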

Fig. 4

Macrosynteny among the genomes of Oropetium thomaeum, Zoysia japonica, and Zoysia macrostachya. (a) Macrosynteny showing syntenic regions among the genomes of three Poaceae species. Syntenic regions, defined as genomic segments with homology across 30 or more consecutive genes, are represented by gray lines connecting the corresponding regions. (b,c) Synteny patterns between species, indicating the number of syntenic blocks per gene: (b) between O. thomaeum and Z. japonica, and (c) between Z. japonica and Z. macrostachya.

Data Records

All raw sequencing data have been deposited in the National Center for Biotechnology Information (NCBI) Sequence Read Archive (SRA) database under the accession number SRP55532149. The genome assembly has been deposited in and is publicly available from the GenBank database under the accession number GCA_049640385.150. The genome annotation files, including GFF and FASTA formats, are available in figshare51.

Technical Validation

To validate the genome assembly, we aligned Oxford Nanopore long reads and Illumina short reads using Minimap2 v2.28-r1209 and Bowtie2 v2.4.4, respectively. The alignments showed mapping rates of 91.26% and 91.35%, respectively, indicating a high level of completeness in the assembled genome. Genome coverage and sequence quality were verified by searching for 4,896 Poales orthologs using Benchmarking Universal Single-Copy Orthologs (BUSCO) v5.2.2 in genome mode with the odb10 dataset52. Out of the total BUSCOs, 4,792 (97.8%) were identified as complete, with 3,595 (73.4%) classified as single-copy and 1,197 (24.4%) identified as duplicated (Fig. 5a). This is higher than the completeness of previously reported genomes of Zoysia species17.

To assess genome quality and the impact of heterozygosity, we performed a k-mer-based analysis using Merqury v1.353 with a k-mer database (k = 21) generated by meryl v1.3. Given the high heterozygosity rate of Z. macrostachya (2.58%), we applied a haplotype-purging step to mitigate potential assembly artifacts and evaluated assembly quality both before and after it. For context, the initial assembly prior to polishing had a k-mer completeness of 71.81% and a consensus quality value (QV) of 29.02. After polishing, the assembly showed a k-mer completeness of 73.54% and a QV of 36.15. After purging, completeness decreased slightly to 72.49%, while QV increased to 36.31. Thus, approximately one percentage point of k-mer completeness was lost in exchange for a slight gain in the base-level accuracy of the primary contigs. The final scaffolded assembly retained these values, with a completeness of 72.46% and a QV of 36.31.

Validation of the gene annotation was performed using BUSCO v5.2.2. Out of the total Poales BUSCOs, 4,754 (97.1%) were identified as complete, with 3,154 (64.4%) classified as single-copy and 1,600 (32.7%) identified as duplicated (Fig. 5b).

In the context of the Earth BioGenome Project (EBP) quality standards54, these metrics classify our assembly as a high-quality draft genome. It meets the EBP criteria for chromosome-level contiguity, with a 100% chromosome assignment rate into 20 final scaffolds (contig N50 = 7.56 Mb, scaffold N50 = 19.24 Mb). Furthermore, it exceeds the benchmark for completeness with a genome BUSCO score of 97.8%. Its base accuracy is high at QV 36.31 (>99.97% accuracy), although it falls below the EBP Q40 target. Collectively, these results establish our assembly as a robust and reliable foundation for subsequent genomic studies.
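The relationship between a consensus quality value and base accuracy quoted above follows the standard Phred scale, on which the per-base error probability is 10^(-QV/10). A minimal sketch of the conversion is given below; the function names are illustrative and are not part of Merqury.

```python
import math


def qv_to_error_rate(qv: float) -> float:
    """Per-base error probability implied by a Phred-scaled quality value."""
    return 10 ** (-qv / 10)


def qv_to_accuracy(qv: float) -> float:
    """Per-base accuracy implied by a Phred-scaled quality value."""
    return 1 - qv_to_error_rate(qv)


def error_rate_to_qv(error_rate: float) -> float:
    """Phred-scaled quality value for a given per-base error probability."""
    return -10 * math.log10(error_rate)


if __name__ == "__main__":
    for qv in (29.02, 36.15, 36.31, 40.0):
        print(f"QV {qv}: accuracy {100 * qv_to_accuracy(qv):.4f}%")
```

On this scale, QV 36.31 corresponds to an accuracy above 99.97%, and Q40 corresponds to one expected error per 10,000 bases.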

Fig. 5

Benchmarking Universal Single-Copy Orthologs (BUSCO) assessments of Zoysia macrostachya and previously reported Zoysia species for (a) genome and (b) gene annotation.