Introduction

Agropyron, a genus within the tribe Triticeae of the family Poaceae, comprises perennial grasses widely distributed across arid and semi-arid regions of Eurasia. As an important forage grass and ecological restoration plant, Agropyron plays a key role in grassland improvement, soil and water conservation, and biodiversity conservation1,2. These ecological and agronomic roles underscore the need for deeper study, especially of its genetic architecture, to advance both theoretical understanding and practical applications.

The taxonomy of Agropyron Gaertn has been progressively refined through decades of research integrating morphological, cytological, and genomic data. In 1933, the Soviet botanist S. A. Nevski proposed a narrow definition of the genus based on morphological traits, which delineated its core species. Later, in the late twentieth century, Dewey3 developed a modern classification system centered on the P chromosome group, which is shared by all species in the genus. Subsequent studies incorporated ploidy levels, clearly distinguishing diploid (2n = 14) and tetraploid (2n = 28) species4. This chromosome-based system has been continually refined and now recognizes three main species within Agropyron Gaertn., a framework that remains standard today5.

The classification of subordinate taxa has also relied on differences between broad-spike and narrow-spike forms. For instance, in 1960, the British cytotaxonomist Keith Jones categorized populations of flat-spiked wheatgrass into western broad-spike, eastern broad-spike, and narrow-spike types based on spike morphology6. In China, the mainstream classification has long followed the concept of flat-spiked wheatgrass established by Professor Geng Yili7, which is based on the Nevski system, and has remained relatively stable. However, significant challenges persist in precisely defining the broad-spike and narrow-spike clades. On the one hand, morphological traits are strongly influenced by environmental conditions, which often leads to subjective judgements when identifying transitional forms or closely related species. On the other hand, although chromosome-based classification is accurate, its complex experimental procedures and long timelines make it impractical for rapidly screening large germplasm collections. Developing molecular markers that distinguish the broad-spike and narrow-spike clades efficiently and accurately is therefore crucial. Such markers would resolve current ambiguities in classification and advance the precise identification and efficient use of Agropyron germplasm resources.

Chloroplasts are vital plant organelles that carry out photosynthesis and various other metabolic processes, and they also carry their own genome. The chloroplast genome is maternally inherited, structurally conserved, and evolves at a moderate rate, making it a valuable resource for plant phylogenetics, species identification, and genetic improvement research8. Since the first tobacco chloroplast genome was sequenced in the 1980s, chloroplast genomics has made significant progress9,10. In recent years, with the rapid advancement of sequencing technology, an increasing number of plant chloroplast genomes have been sequenced and analyzed, providing rich data for revealing the genetic code and evolutionary patterns of plant chloroplast genomes. Within the grass family, molecular studies have yielded substantial results. For example, Luo et al.11 conducted a population genetic study of Agropyron Gaertn across four populations from the Qinghai-Tibet Plateau, Central Asia, East Asia, and Europe using Accl and GBSSI gene sequences. The results revealed that Agropyron Gaertn exhibits rich genetic diversity at the population level, with the Central Asian population potentially serving as the center of differentiation. This finding suggests Central Asia as the center of origin for Agropyron Gaertn and provides theoretical support for the conservation and utilization of its genetic diversity. Such studies not only reveal the structural characteristics and evolutionary patterns of grass genomes but also provide an important molecular foundation for crop improvement and germplasm resource conservation12. However, complete chloroplast genome sequences for additional species within the genus Agropyron Gaertn remain scarce and have not been systematically analyzed and compared.
Therefore, characterizing the complete chloroplast genome sequences of seven species of the genus Agropyron Gaertn and two closely related species is of great significance for comprehensively revealing the genetic characteristics, evolutionary patterns, and potential functions of chloroplast genomes within the genus.

This study employed high-throughput sequencing technology to analyze the chloroplast genomes of seven Agropyron Gaertn species and two closely related species (Elymus trachycaulus and Elytrigia elongata), elucidating their genetic characteristics and evolutionary patterns. As representatives of closely related genera within the Triticeae13, the two related species can be used in comparative genomic analysis to screen for specific molecular markers that distinguish the broad-spike and narrow-spike clades within the genus Agropyron Gaertn, thereby providing a reference for taxonomic identification. This study aims to provide theoretical support for the precise utilization of Agropyron Gaertn germplasm resources and the establishment of an efficient identification system.

Materials and methods

Plant materials

This study examined seven species of the genus Agropyron Gaertn and two closely related species: ACPE (Agropyron cristatum var. pectiniforme)14,15, ACPL (Agropyron cristatum var. pluriflorum)14,15, AD (Agropyron dasystachyum var. subvillosum)16,17, ADP (Agropyron desertorum var. pilosiusculum)14,15, AS (Agropyron sibiricum f. sibiricum)14,15, AMV (Agropyron mongolicum var. villosum)14,15, ASP (Agropyron sibiricum f. pubiflorum)14,15, ET (Elymus trachycaulus)18 and EE (Elytrigia elongata)14,15,18,19,20. Their complete chloroplast genomes were assembled de novo and characterized (Table 1). Species identification was primarily based on the identification keys in Tomus 9 (Part 2) of the Chinese Academy of Sciences15 and Flora Intramongolica (Editio Tertia) Tomus 614. The morphological identification reference by Yan Weihong21 employs a chromosome-based identification system to categorize Agropyron Gaertn into broad-spike and narrow-spike clades. This identification has long adhered to the concept of narrow-spike Agropyron wheatgrass proposed by Professor Geng Yili22 based on the Nevski identification system. This study does not validate this morphological identification with molecular data but instead focuses on screening chloroplast molecular markers that can assist in distinguishing the two clades. All samples were collected from the Shalqin Experimental Station of the Grassland Research Institute, Chinese Academy of Agricultural Sciences (40°35′ N, 111°47′ E). Genomic DNA was extracted using a modified CTAB method. All samples are stored at the National Pasture Germplasm Resource Intermediate Repository (Hohhot, Inner Mongolia).

Table 1 Coded names and morphological characteristics of seven Agropyron species and two closely related species.

cpDNA sequencing and de novo assembly

Raw data were filtered using fastp v0.20.0 (https://github.com/OpenGene/fastp) to obtain clean data23. To reduce assembly complexity, the clean reads were aligned against a chloroplast genome database using Bowtie2 v2.2.4 (http://bowtie-bio.sourceforge.net/bowtie2/index.shtml) in very-sensitive-local mode. Reads that aligned were designated as chloroplast genome sequences (cpDNA sequences) for the project samples. Chloroplast genomes were then assembled with SPAdes v3.10.1 (http://cab.spbu.ru/software/spades/) using k-mer sizes of 55, 87, and 121, without relying on a reference genome. The complete chloroplast genomes of the seven Agropyron species and two closely related species (E. trachycaulus, E. elongata) have been deposited in the NCBI database under the following accession numbers: SAMN47853882 (E. elongata), SAMN47853883 (A. sibiricum), SAMN47853884 (E. trachycaulus), SAMN47853885 (A. cristatum var. pectiniforme), SAMN47853886 (A. cristatum var. pluriflorum), SAMN47853887 (A. mongolicum var. villosum), SAMN47853888 (A. desertorum var. pilosiusculum), SAMN47853889 (A. sibiricum f. pubiflorum) and SAMN47853890 (A. dasystachyum).

Chloroplast gene annotation

Two methods were employed to annotate the chloroplast genomes and improve annotation accuracy. First, chloroplast CDSs were predicted using Prodigal v2.6.3 (https://www.github.com/hyattpd/Prodigal)24, rRNAs using HMMER v3.1b2 (http://www.hmmer.org/)25, and tRNAs using ARAGORN v1.2.38 (http://www.ansikte.se/ARAGORN/)26. Second, gene sequences were extracted from published relatives in NCBI and aligned against the assembled sequences using BLAST v2.6 (https://blast.ncbi.nlm.nih.gov/Blast.cgi) to obtain a second annotation27. Finally, genes that differed between the two annotation sets were examined manually, erroneous or redundant annotations were removed, and multi-exon boundaries were defined to obtain the final annotation.

Chloroplast genome map

Maps of the nine chloroplast genomes were drawn using OGDRAW (https://chlorobox.mpimp-golm.mpg.de/OGDraw.html)28. MISA v1.0 (MIcroSAtellite identification tool, https://webblast.ipk-gatersleben.de/misa/) was used to identify chloroplast SSRs ranging from mononucleotide to hexanucleotide repeats29. RSCU values were analyzed using MEGA 7. Gene sequences were aligned using MAFFT v7.427 (https://mafft.cbrc.jp/alignment/software)30, and synonymous and non-synonymous substitution rates were calculated with KaKs_Calculator v2.0 (https://sourceforge.net/projects/kakscalculator2/)31. Global alignment of homologous gene sequences across species was also performed using MAFFT. Nucleotide diversity (Pi)32 for each gene was calculated using DnaSP v5 (http://www.ub.edu/dnasp/).

Analysis and identification of cpSSR and scattered repeat sequences

cpSSRs were identified using MISA v1.0 (MIcroSAtellite identification tool, https://webblast.ipk-gatersleben.de/misa/) with the parameters 1–8, 2–5, 3–3, 4–3, 5–3, and 6–3, where each pair gives the motif length and the minimum number of repeats (for example, 1–8 means a single-base motif repeated 8 or more times). Dispersed repeats were identified using vmatch v2.3.0 (http://www.vmatch.de/) combined with a Perl script33. Forward, reverse, complementary, and palindromic repeats were searched with a minimum length of 30 bp and an edit distance of less than 3 bp.
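
The SSR thresholds above can be illustrated with a minimal Python sketch. It is not MISA itself; it only applies the same motif-length and minimum-repeat pairs to perfect repeats using regular expressions. The function name `find_ssrs` and the toy sequence in the usage note are hypothetical.

```python
import re

# Minimum repeat counts per motif length, mirroring the thresholds quoted
# in the text (1-8, 2-5, 3-3, 4-3, 5-3, 6-3).
MIN_REPEATS = {1: 8, 2: 5, 3: 3, 4: 3, 5: 3, 6: 3}


def find_ssrs(seq, min_repeats=MIN_REPEATS):
    """Return sorted (start, motif, repeat_count) tuples for perfect SSRs.

    start is 0-based. Motifs that are themselves repeats of a shorter unit
    (e.g. "AA" or "ATAT") are skipped so that each locus is reported under
    its shortest motif only.
    """
    seq = seq.upper()
    hits = []
    for size, min_count in sorted(min_repeats.items()):
        # ([ACGT]{size}) captures one motif; \1{min_count-1,} requires it to
        # recur back-to-back, giving at least min_count copies in total.
        pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (size, min_count - 1))
        for match in pattern.finditer(seq):
            motif = match.group(1)
            is_compound = any(
                motif == motif[:k] * (size // k)
                for k in range(1, size)
                if size % k == 0
            )
            if is_compound:
                continue
            hits.append((match.start(), motif, len(match.group(0)) // size))
    return sorted(hits)
```

On a toy sequence built as `"GGC" + "A" * 11 + "CGT" + "ATATA" * 3 + "GG" + "TC" * 5 + "G"`, the function reports an (A)11, an (ATATA)3, and a (TC)5 locus. Imperfect and compound SSRs, which dedicated tools can also report, are outside the scope of this sketch.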

Chloroplast sequence homology analysis, collinearity identification, and phylogenetic tree construction

The boundaries between the IR and the LSC/SSC regions (LSC-IRb, IRb-SSC, SSC-IRa, and IRa-LSC) were visualized using CPJSdraw (http://cloud.genepioneer.com:9929/#/tool/alltool/detail/296)34. Phylogenetic analysis was performed using whole chloroplast genome sequences. The circular sequences were set to the same starting point, and multiple sequence alignment across species was performed using MAFFT v7.427 (--auto mode). The aligned data were analyzed with MrBayes v3.2.7a (http://nbisweden.github.io/MrBayes/) using the GTR + I + G model. Ngammacat was set to 5, and statefreqpr, revmat, pinvar, and shapepr were configured according to the optimal model identified by jModelTest; other parameters were left at their default values to construct Bayesian phylogenetic trees35. Genome alignment was performed using Mauve with default parameters36.
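
Setting a common starting point for circular sequences, as described above, can be sketched as follows. This is an illustrative Python helper, not the procedure used in the study: the function name `rotate_to_anchor` is hypothetical, and it assumes the anchor motif occurs exactly once on the given strand (for example, a short stretch from a single-copy region rather than from the inverted repeats).

```python
def rotate_to_anchor(seq, anchor):
    """Rotate a circular sequence so that it begins at the anchor motif.

    The sequence is treated as circular, so an anchor that spans the
    arbitrary linearisation point of the input is still found.
    Only the given strand is searched.
    """
    seq = seq.upper()
    anchor = anchor.upper()
    if not anchor or len(anchor) > len(seq):
        raise ValueError("anchor must be non-empty and no longer than seq")
    # Append the first len(anchor) - 1 bases so matches that wrap around
    # the end of the linear string become visible to str.find.
    wrapped = seq + seq[: len(anchor) - 1]
    pos = wrapped.find(anchor)
    if pos == -1:
        raise ValueError("anchor not found on this strand")
    return seq[pos:] + seq[:pos]
```

For example, `rotate_to_anchor("ACGT", "GT")` returns `"GTAC"`. Rotating every genome to the same anchor before alignment avoids artificial breakpoints caused by differing linearisation points.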

Results

Basic traits of the chloroplast genome

Sequencing of the chloroplast genomes from seven Agropyron Gaertn species and two closely related species (E. trachycaulus, E. elongata) yielded 17,707,085 to 20,670,776 clean paired-end reads per sample (Table 2). E. trachycaulus (ET) yielded 19,727,229 clean reads and had a complete chloroplast genome length of 135,037 bp. This is shorter than the genomes of the seven Agropyron Gaertn species (135,448–135,483 bp) but comparable to that of the closely related species E. elongata (EE, 135,067 bp) (Table 2 and Fig. 1).

Table 2 Basic characteristics of chloroplast genomes.
Fig. 1 Chloroplast genome map. Note: Forward-encoding genes are located on the outer side of the circle, while reverse-encoding genes are positioned on the inner side. The gray inner circle indicates GC content.

In terms of gene content, E. trachycaulus exhibits intermediate characteristics: its chloroplast genome contains 131 genes, comprising 40 tRNA genes, 8 rRNA genes, and 83 mRNA genes. This profile is identical to that of the seven Agropyron Gaertn species in the broad-spike clade. However, it differs from E. elongata (129 genes, 38 tRNA genes), with which it shares only the numbers of rRNA (8) and mRNA (83) genes. This combination, in which the total gene count matches the broad-spike clade of Agropyron Gaertn while the genome length approximates that of E. elongata, provides chloroplast genomic evidence for the taxonomic identification of E. trachycaulus. It reflects the phylogenetic relationship of E. trachycaulus with Agropyron Gaertn and also reveals genome structural signals indicating its differentiation toward the narrow-spike clade (Table 2).

The chloroplast genomes of all species are single circular molecules with a quadripartite structure: a large single-copy region (LSC), a small single-copy region (SSC), and two inverted repeat regions (IRa and IRb) (Fig. 1). This architecture is similar to that of chloroplast genomes in various plant species37. Within the Agropyron Gaertn chloroplast genome, the IR regions harbor 8 rRNA genes, 16 tRNA genes, and 14 mRNA genes; the SSC region contains 1 tRNA gene and 10 mRNA genes; and the LSC region contains 23 tRNA genes and 59 mRNA genes (Table 3 and Fig. 1). Although the overall GC content of the chloroplast genomes is similar across Agropyron Gaertn species (38.32%–38.34%), the GC content of the IR region (43.91%–44.01%) is markedly higher than that of the LSC region (36.28%–36.38%) and the SSC region (32.21%–32.26%). This difference is closely related to the enrichment of GC-rich rRNA genes in the inverted repeat region (Table 2).
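
The per-region GC comparison above amounts to a simple base count over each coordinate range. The following Python sketch shows one way to compute it; the function names and the toy coordinates in the usage note are hypothetical, and the region boundaries would in practice come from the genome annotation.

```python
def gc_percent(seq):
    """Return GC content as a percentage of unambiguous A/C/G/T bases."""
    seq = seq.upper()
    acgt = sum(seq.count(base) for base in "ACGT")
    if acgt == 0:
        raise ValueError("sequence contains no A/C/G/T bases")
    return 100.0 * (seq.count("G") + seq.count("C")) / acgt


def gc_by_region(genome, regions):
    """Compute GC content for named regions.

    regions maps a region name to (start, end) coordinates that are
    1-based and inclusive, as is usual in genome annotation files.
    """
    return {
        name: gc_percent(genome[start - 1:end])
        for name, (start, end) in regions.items()
    }
```

For a toy genome `"ATATATATGCGCGCATAT"` with hypothetical regions `{"LSC": (1, 8), "IR": (9, 14), "SSC": (15, 18)}`, the IR segment scores 100% and the other two 0%, which mirrors, in exaggerated form, the IR versus single-copy contrast reported in Table 2.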

Table 3 Genes annotated in the chloroplast genomes.

Among these genes, 14 (atpF, rpl2, rpl16, rps16, ndhA, ndhB, petB, petD, trnA-UGC, trnG-GCC, trnI-GAU, trnK-UUU, trnL-UAA, and trnV-UAC) contained one intron each, while the remaining two genes (ycf3 and rps12) contained two introns (Table 4). Of these, the rps12 gene exhibited the highest nucleotide diversity (Pi = 0.05198). Its high variability makes it a potential candidate marker for molecular-assisted taxonomic identification of the broad-spike/narrow-spike clades. rps12 is located in the IR region and ycf3 in the LSC region, while ndhA, which contains only one intron, is located in the SSC region. The remaining intron-containing genes are distributed in the LSC and IR regions (Table 4).

Table 4 Exon–intron structure of annotated genes in the chloroplast genomes.

Codon usage bias analysis

Relative synonymous codon usage (RSCU) quantifies how often a codon is used relative to the other codons for the same amino acid, revealing codon preferences within the chloroplast genome of Agropyron Gaertn. These preferences reflect natural selection, mutation, and genetic variation. An RSCU > 1 indicates that the codon is used more frequently than expected (positive bias); an RSCU < 1 indicates that it is used less frequently than expected (negative bias); and an RSCU = 1 indicates no bias38. Among the chloroplast genomes of seven species of Agropyron Gaertn and two closely related species (E. trachycaulus and E. elongata), 33 codons exhibit an RSCU value greater than 1. The codon with the highest RSCU value is AUG for methionine (Met), at 6.97, followed by UUA for leucine (Leu), at 2.074; the lowest is GUG for methionine (Met), at 0.03. Amino acid analysis revealed that methionine (Met), arginine (Arg), leucine (Leu), and serine (Ser) exhibited the highest occurrence frequencies. Tryptophan (Trp) was the only amino acid showing no bias (RSCU = 1.00), which is consistent with its being encoded by a single codon in the chloroplast genome. Among codons with RSCU > 1, 29 codons (96.67%) ended with A or U, while only 3 codons (3.33%) ended with G or C. This pattern aligns strongly with the AU-rich nature of the chloroplast genome and transcription optimization mechanisms (Table 5 and Fig. 2A). This preference pattern provides background for developing molecular markers for clade identification and aids in the selection of specific markers at the codon level.
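
To make the RSCU definition concrete, the sketch below computes it in Python under the textbook formula: the observed count of a codon divided by the mean count of all synonymous codons for the same amino acid. It is not the MEGA 7 implementation; the function name `rscu` is hypothetical, and amino acid assignments follow the standard genetic code. Under this definition, single-codon amino acids always score exactly 1, so values reported by tools that group alternative start codons differently need not match.

```python
from collections import Counter
from itertools import product

# Standard genetic code in TCAG order; "*" marks stop codons.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO)
}


def rscu(cds_list):
    """Return {codon: RSCU} pooled over in-frame coding sequences.

    RSCU = observed codon count / (amino-acid total / number of synonymous
    codons). Stop codons and amino acids that never occur are omitted.
    """
    counts = Counter()
    for cds in cds_list:
        cds = cds.upper().replace("U", "T")
        usable = len(cds) - len(cds) % 3  # ignore a trailing partial codon
        for i in range(0, usable, 3):
            codon = cds[i:i + 3]
            if codon in CODON_TABLE:  # skips codons with ambiguous bases
                counts[codon] += 1

    families = {}
    for codon, aa in CODON_TABLE.items():
        families.setdefault(aa, []).append(codon)

    values = {}
    for aa, codons in families.items():
        total = sum(counts[c] for c in codons)
        if aa == "*" or total == 0:
            continue
        expected = total / len(codons)
        for codon in codons:
            values[codon] = counts[codon] / expected
    return values
```

For the toy input `["TTATTATTACTGTGG"]`, leucine is encoded three times by TTA and once by CTG across its six codons, so TTA scores 4.5 and CTG 1.5, while the single Trp codon TGG scores 1.0.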

Table 5 Statistical analysis of codon usage bias in chloroplast genomes.
Fig. 2 Relative synonymous codon usage (RSCU) frequency of amino acids and codon repeats. Note: (A) Amino acid usage frequency calculated via RSCU; (B) Repeat sequence analysis under positive selection in chloroplast genomes. Forward, Palindromic, Reverse and Complement represent different repeat patterns.

Analysis of interspersed repeats

Analysis of repetitive sequences in seven species of Agropyron Gaertn and two closely related species (E. trachycaulus, E. elongata) revealed 29 to 39 forward repeats and 13 to 18 palindromic repeats. No reverse or complementary repeats were detected, which may reflect an evolutionary strategy for maintaining core functional stability in the chloroplast genome. Notably, E. trachycaulus (ET) and E. elongata (EE) had markedly more repeats in total than the seven Agropyron Gaertn species. This clade-specific repeat pattern can serve as a reference indicator for molecular marker-assisted taxonomic identification of the broad-spike/narrow-spike clades (Fig. 2B).
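
The four repeat categories mentioned above differ only in how the second copy relates to the first. The Python sketch below classifies a pair of equal-length segments by exact matching; it is illustrative only, the function name `repeat_type` is hypothetical, and unlike the search described in the Methods it does not tolerate mismatches.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def repeat_type(first, second):
    """Classify how the second repeat copy relates to the first.

    forward     : identical sequence in the same orientation
    palindromic : reverse complement of the first copy
    reverse     : first copy read backwards
    complement  : base-by-base complement, same orientation
    Returns None if no relationship holds. When several hold at once
    (possible for short, symmetric sequences), the first match in the
    order above is returned.
    """
    first, second = first.upper(), second.upper()
    if second == first:
        return "forward"
    if second == first.translate(COMPLEMENT)[::-1]:
        return "palindromic"
    if second == first[::-1]:
        return "reverse"
    if second == first.translate(COMPLEMENT):
        return "complement"
    return None
```

For example, with the first copy `"AACGT"`, a second copy `"AACGT"` is forward, `"ACGTT"` is palindromic, `"TGCAA"` is reverse, and `"TTGCA"` is complement.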

cpSSR analysis

SSR loci were most densely distributed in the LSC region, with total numbers varying among species (Table 6). All species contained SSRs ranging from mononucleotides to hexanucleotides, and clade-specific loci were identified. E. elongata possesses a unique (ATATA)3 pentanucleotide locus within the narrow-spike clade. The (TC)5 locus occurs only in the broad-spike taxa A. cristatum var. pluriflorum and A. desertorum var. pilosiusculum, and the (TAAA)4 locus occurs only in the broad-spike taxa A. sibiricum and A. sibiricum f. pubiflorum. E. trachycaulus shares the (CCATA)3 locus with E. elongata (common to the narrow-spike clade) but lacks the (ATATA)3 locus, which therefore distinguishes E. trachycaulus from E. elongata. SSR loci in the broad-spike clade are highly conserved, yet they differ in mononucleotide repeats. For example, A. mongolicum var. villosum lacks the (A)11 site, which can serve as a specific auxiliary marker for this taxon. The clade-specific and shared patterns of these SSR sites provide a candidate marker library for molecular-assisted taxonomic identification of the broad-spike/narrow-spike clades (Fig. 3).

Table 6 Statistics for simple sequence repeats (SSRs) in the chloroplast genomes of seven Agropyron species and two closely related species.
Fig. 3 Statistics on the number of SSR types in the chloroplast genomes of seven Agropyron species and two closely related species.

Analysis of chloroplast nucleotide diversity

Nucleotide diversity (Pi) is a key indicator of the degree of nucleotide sequence variation among species, and highly variable regions can serve as markers for population genetics research. Global alignment with MAFFT revealed that the rps12 gene within the large single-copy region (LSC) exhibited the highest genetic diversity, with a peak Pi value of 0.05198. Its high variability makes it a core candidate marker for molecular-assisted taxonomic identification of the broad-spike/narrow-spike clades. Further comparisons revealed that genetic variation in the single-copy regions (LSC and SSC) exceeded that in the inverted repeat (IR) regions. This difference is closely related to the high conservation of the IR regions, which is maintained through gene conversion during evolution (Fig. 4).
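
As a reference for how Pi is defined, the following Python sketch computes it from an alignment as the average number of pairwise nucleotide differences per site. It is not the DnaSP implementation; the function name `nucleotide_diversity` is hypothetical, and excluding every column that contains a gap or ambiguous base is an assumption of this sketch.

```python
from itertools import combinations


def nucleotide_diversity(aligned):
    """Return Pi for a list of equal-length aligned sequences.

    Pi = (sum of pairwise differences) / (number of pairs * sites compared).
    Columns containing anything other than A/C/G/T in any sequence are
    excluded from both the difference count and the site count.
    """
    seqs = [s.upper() for s in aligned]
    if len(seqs) < 2:
        raise ValueError("at least two sequences are required")
    length = len(seqs[0])
    if any(len(s) != length for s in seqs):
        raise ValueError("sequences must be aligned to the same length")

    sites = [i for i in range(length) if all(s[i] in "ACGT" for s in seqs)]
    if not sites:
        return 0.0

    differences = sum(
        sum(a[i] != b[i] for i in sites) for a, b in combinations(seqs, 2)
    )
    pairs = len(seqs) * (len(seqs) - 1) // 2
    return differences / (pairs * len(sites))
```

For the toy alignment `["ACGT", "ACGA", "ACGT"]`, two of the three sequence pairs differ at one of four sites, giving Pi = 2 / (3 × 4) ≈ 0.167. Applying the function gene by gene to the per-gene alignments would yield a profile comparable in form to Fig. 4.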

Fig. 4 Line chart of chloroplast gene nucleotide diversity. Note: The horizontal axis represents gene names; the vertical axis indicates Pi values.

Chloroplast boundary analysis

The chloroplast genome is circular, and the inverted repeat (IR) regions share four boundaries with the large single-copy (LSC) and small single-copy (SSC) regions: LSC-IRb, IRb-SSC, SSC-IRa, and IRa-LSC. During genome evolution, IR boundaries expand and contract, moving certain genes into the IR or the single-copy regions. CPJSdraw was therefore used to visualize the boundaries. The junctions between the IR and the LSC and SSC regions were compared across the chloroplast genomes of 10 Agropyron species, 1 Elymus species, 1 Elytrigia species, 1 Australopyrum species, and 1 Psathyrostachys species. Across all examined species, the rpl22 gene resides within the LSC and spans 450 bp; the rps19 and rps15 genes are located within IRb, with rps19 adjacent to the LSC and rps15 adjacent to the SSC; the ndhF gene lies within the SSC; the ndhH gene sits at the SSC/IRa boundary; copies of rps19 and rps15 are also present in IRa; and the psbA gene lies within the LSC. The boundary genes and their junction distances were consistent across the seven Agropyron species and two closely related species (Fig. 5).

Fig. 5 Comparative analysis of chloroplast genome IR boundaries. Note: Thin lines represent junction points among regions, displaying genes adjacent to the junctions.

Chloroplast sequence homology analysis

A collinearity analysis was performed on the chloroplast genomes of 10 Agropyron species, 1 Elytrigia species, 1 Elymus species, 1 Australopyrum species, and 1 Psathyrostachys species (Table S3). The results revealed homology across all genome sequences, with no significant insertions or deletions detected. The 14 chloroplast genomes were connected by a single red line, indicating highly conserved chloroplast genome structures without gene rearrangements (Table S2 and Fig. 6).

Fig. 6 Chloroplast sequence homology analysis. Note: Short blocks represent gene locations in the genome, where white indicates CDS, green indicates tRNA, red indicates rRNA, and connecting lines between colored blocks denote collinear relationships.

Phylogenetic tree analysis

Fourteen species from different genera within the Triticeae were selected for phylogenetic analysis, including 10 Agropyron species, 1 Elymus species, 1 Elytrigia species, 1 Australopyrum species, 1 Psathyrostachys species, and 1 cultivar from Psathyrostachys. The results indicate that the 14 species fall into two clades. Clade I comprises the 10 Agropyron species, whose chloroplast genomic characteristics (e.g., codon third-position A/U preference > 87%, forward repeat enrichment) align closely with the morphological criteria for the broad-spike clade (spike width > 5 mm, lanceolate glumes), supporting the broad-spike clade concept in the North American taxonomic system39. Clade II comprises four species from Elymus, Campeiostachys, Elytrigia, and Australopyrum, indicating that E. trachycaulus and E. elongata are more closely related to the Campeiostachys and Australopyrum species than to Agropyron. These phylogenetic results provide evolutionary-level molecular evidence for marker-assisted identification of the broad-spike/narrow-spike clades, corroborating the findings from SSR and IR length markers (Table 1 and Fig. 7).

Fig. 7 Phylogenetic tree based on 14 complete chloroplast genomes and related taxonomic clades.

Discussion

Structural variation in chloroplast genomes and its value as molecular markers

Agropyron Gaertn, an important perennial forage resource within the tribe Triticeae of the family Poaceae, has attracted considerable attention because of its strong stress tolerance and rich genetic diversity. Research indicates that the chloroplast genome of Agropyron Gaertn exhibits a typical quadripartite structure (LSC-IR-SSC-IR), ranging in size from 135 to 137 kb. It contains 130 to 134 annotated functional genes, including 89 to 91 protein-coding genes, 37 to 39 tRNAs, and 8 rRNAs, consistent with the chloroplast genome characteristics of most higher plants40. However, variations with taxonomic and phylogenetic significance were also identified within this conserved framework.

Coding gene variants: identification of core candidate gene markers

Nucleotide diversity (Pi) analysis revealed that the genes rps12 and ycf3 were highly variable. The Pi value of the rps12 gene in the LSC region reached 0.05198, the highest among all tested genes. This result agrees with findings from studies on related grasses (such as Setaria and Hordeum) that “high sequence polymorphism in the rps12 gene can serve as a taxonomic marker”41,42. Its sequence variation effectively distinguishes the broad-spike/narrow-spike clades, making it a core candidate gene for molecular marker-assisted identification. Additionally, the ycf3 gene in the LSC region shows potential for clade differentiation because of its structure, which includes two introns, and its sequence length polymorphism. This finding corroborates the conclusion by Xie et al.43 that “intron features of the ycf3 gene can serve as markers for species and clades identification” in the Poaceae, further supporting the reliability of this gene as an auxiliary taxonomic marker.

Variation in scattered repeat sequences: identification of clade-specific structural markers

Scattered repetitive sequences comprise only forward and palindromic types, reflecting genomic structural stability. Heidari et al.44 noted that forward repeats can promote local sequence amplification through slipped-strand mispairing, while palindromic repeats participate in transcription termination or RNA editing by forming stem-loop structures. The interaction between these two types may regulate genomic functional diversity. Regarding clade differences, the closely related species E. elongata has a total of 57 repetitive sequences, markedly more than the broad-spike clade of Agropyron Gaertn. Similar phenomena have been applied in interspecific hybrid identification within the Triticeae, where differences in repeat numbers have been confirmed as a key indicator for distinguishing hybrids from their parents45. Wicher et al.46 further confirmed that repeat sequence expansion in the Poaceae often accompanies fine-tuning of genomic architecture, potentially linked to adaptive potential. In addition, the distribution of repetitive sequences in this study, together with codon bias (96.67% of highly biased codons end with A/U), reflects the AU enrichment of the chloroplast genome in response to mutational pressure and natural selection47. This coevolutionary pattern was also observed in the genus Leymus48 within the Poaceae, revealing a common evolutionary principle in the chloroplast genomes of Triticeae plants and providing supplementary evidence for resolving phylogenetic relationships among clades.

SSR and IR region variation: screening of target marker for precise identification

The clade specificity of SSR loci provides direct clues for the precise identification of the broad-spike/narrow-spike clades: E. elongata of the narrow-spike clade possesses a unique (ATATA)3 pentanucleotide locus, A. desertorum var. pilosiusculum carries a (TC)5 site, and A. sibiricum f. pubiflorum carries a distinctive (TAAA)4 site. These sites serve as specific markers at the taxon and species levels, in line with the conclusion of Deng et al.49 in their Triticeae SSR study that “site combination patterns support auxiliary identification.” Furthermore, IR region length variation also holds clear taxonomic value: the IRb length in the narrow-spike clade (E. trachycaulus, E. elongata) is uniformly 20,813 bp, shorter than that in the broad-spike clade (21,530–21,547 bp). This structural difference can serve as an auxiliary indicator for rapid differentiation between the broad-spike and narrow-spike clades.

Molecular marker-assisted taxonomic identification systems and application

Phylogenetic analysis based on chloroplast genomes provides core support for defining the broad-spike/narrow-spike clades within Agropyron species: the Bayesian phylogenetic tree reveals two highly supported monophyletic clades, with branch clustering matching the broad-spike/narrow-spike phenotypic traits. Integrating the molecular markers, including SSR loci, inverted repeat (IR) length, and highly variable genes (rps12, ycf3), established a multidimensional molecular marker-assisted identification system for the broad-spike/narrow-spike clades.

Highly variable gene markers form the core of the system. The rps12 gene, which has the highest nucleotide diversity (Pi = 0.05198), is particularly informative. Its unique structure and copy number variation in the IR region, combined with the length polymorphism of the ycf3 gene intron in the LSC region, provide reliable sequence-level evidence for discriminating the broad-spike/narrow-spike clades. This supports findings by Wu et al.50 in Setaria and Han et al.51 in Agropyron Gaertn, confirming the universality of these genes in distinguishing closely related species.

Simple sequence repeat (SSR) loci offer a high-resolution identification tool. The study identified a (CCATA)3 locus, shared by all members of the narrow-spike clade, that serves as a reliable clade-specific marker. Furthermore, an (ATATA)3 locus unique to E. elongata cleanly distinguishes it from E. trachycaulus. This multi-locus strategy improves identification accuracy and specificity, an approach widely validated in Triticeae genomic research49.

Genomic structural variation enables rapid initial screening. We confirmed that the length of the IR region is stably 20,813 bp in the narrow-spike clade, which is shorter than the 21,530–21,547 bp range in the broad-spike clade. This macrostructural difference is easily detectable via conventional PCR and electrophoresis, making it an ideal screening marker for large-scale germplasm resources. This aligns with reports by Qin et al.52 in legumes and Jiang et al.53 in Setaria, who also found that IR region variation correlates with clade differentiation.

Scattered repetitive sequences provide supplementary corroborating evidence. The total number of repetitive sequences in the closely related species E. elongata and E. trachycaulus is significantly higher than that in the broad-spike Agropyron clades. This genomic structural difference offers further support for the classification.

In summary, the molecular markers identified in this study provide a practical tool for efficient and precise classification and identification of the broad-spike/narrow-spike Agropyron clades. They also serve as a reference for developing molecular markers in other Triticeae species, underscoring the broad value of chloroplast genomes in plant phylogenetics and taxonomy.

Conclusions

This study conducted an in-depth analysis of the chloroplast genomes of seven species of Agropyron Gaertn and two closely related species, revealing their evolutionary characteristics and taxonomic value. Analysis of chloroplast genome characteristics indicates that Agropyron Gaertn species exhibit a typical quadripartite structure (LSC-IR-SSC-IR), with genome sizes ranging from 135 to 137 kb and containing 131 genes. Among these, rps12 (Pi = 0.05198) and ycf3 were identified as core candidate genes for molecular marker-assisted taxonomic identification of the broad-spike/narrow-spike clades. The high GC content (43.91%–44.01%) in the IR region correlates with gene conversion mechanisms, while the trnK-UUU intron length variation (2487–2504 bp) in the LSC region serves as a supplementary marker. Specific combination patterns of chloroplast SSR loci (e.g., the (CCATA)3 locus present in all members of the narrow-spike clade) aid in distinguishing between the broad-spike and narrow-spike clades. Features of Elymus trachycaulus that closely resemble those of Elytrigia elongata, such as the high GC content (44.01%) in the IR region and the length of the LSC region (80,642 bp), support the traditional placement of Elymus trachycaulus in the narrow-spike clade at the molecular level. Phylogenetic analysis further confirms the evolutionary validity of the identification system for the broad-spike and narrow-spike clades within Agropyron Gaertn. This study identifies chloroplast molecular markers that aid taxonomic identification within Agropyron Gaertn and provide a tool for the precise identification, utilization, and conservation of Agropyron germplasm. Future work should focus on validating marker stability across larger populations and integrating gene markers to establish a more robust identification system.