Introduction

The genus Avena contains approximately 30 recognized oat species, including diploids, tetraploids and hexaploids1,2,3. Among them, the common oat (Avena sativa L., 2n = 6x = 42, AACCDD), a globally important cereal crop, was domesticated from the wild hexaploid Avena sterilis L. (2n = 6x = 42, AACCDD) more than 3000 years ago4. Closely related to both is Avena fatua L. (2n = 6x = 42, AACCDD), a weedy hexaploid oat that frequently invades agricultural fields alongside A. sativa. Unlike typical wild oats, A. fatua lacks a distinct natural geographic range and is predominantly associated with human-disturbed habitats. Its taxonomic status remains contentious: Ladizhinski and Zohary (1971) argued that A. fatua should be classified as a morphological type of a single biological species encompassing A. sativa and A. sterilis, given the absence of reproductive isolation5,6. However, this view is not universally accepted7, as many researchers maintain A. fatua as a separate species due to its unique ecological impact and invasive behavior.

Regardless of its taxonomic placement, A. fatua is well known as one of the worst annual weeds worldwide, causing enormous problems in agriculture, and its infestation continues to increase rapidly8,9 (Fig. 1a, b). Depending on plant density and relative time of emergence, A. fatua can reduce annual crop yields by as much as 70%10. In western Canada, over $500 million is spent on herbicides to control A. fatua annually10, and A. fatua also infests over 11 million ha of cropland in the Great Plains and Pacific Northwest regions, causing over $1 billion in annual crop losses10. Despite the significant damage it causes as a weed, A. fatua harbors valuable alleles that can be used to enhance disease and stress resistance in cultivated oats, such as resistance to crown rust and powdery mildew8,11,12. One successful example is the cultivar “Bayou18”13, which exhibits strong resistance to disease, drought, and barrenness, along with high yield and quality. This cultivar was developed through interspecific hybridization between A. fatua and the cultivated varieties “Jizhangyou4” and “Bayou9”. Therefore, genomic studies of A. fatua have the potential to further our understanding of its evolution and provide a valuable resource for the genetic improvement of cultivated oats.

Fig. 1: Phenotypes of Avena fatua and overview of the A. fatua accession W1004 assembly.

a The morphological features of A. fatua (W1004) and A. sativa, including the plant, seeds and spike type, and the herbicide tolerance of A. fatua and A. sativa. The photo shows A. fatua and A. sativa 9 days after treatment with 340 g/hm2 fenoxaprop-P-ethyl. b A. fatua plants growing in crop fields. The arrow indicates an A. fatua plant. c Syntenic blocks shared between A. fatua (W1004) and A. sterilis (W038)/A. sativa (Sanfensan). Gray lines connect matched gene pairs. Inversion blocks are highlighted in orange. The translocation and duplication blocks are highlighted in light green and blue, respectively.

As a cosmopolitan grass weed, A. fatua is well adapted to changing environmental conditions and grows not only in various climatic zones from the tropics to the polar circle, but also on nearly all types of soil8 (Fig. 1b; Supplementary Fig. 1a, b). A. fatua also exhibits strong phenotypic plasticity in morphological characteristics, including plant height, leaf size, tiller number and length, in response to changes in photoperiod and temperature under controlled environmental conditions14. In addition to its excellent adaptation to diverse extreme environments, the consistently high abundance of A. fatua in cropland may also be assisted by its seed shattering and relatively long persistence in the soil seed bank15,16 (Supplementary Fig. 1b). A. fatua exhibits strong competitive ability, high resistance to various abiotic and biotic stresses, and herbicide tolerance10,17,18 (Supplementary Fig. 1c), making it a good model for understanding plant adaptation to diverse environments and the genomic basis of its success.

Herbicide resistance in weeds is a global problem threatening crop production19,20. The continuous and rapid increase in herbicide resistance among weeds, driven by intensive herbicide use, has caused substantial crop yield losses and increased production costs21. Many herbicide-resistant weed species are polyploids22, such as the hexaploid wild oat and barnyard grass, but little is known about the molecular evolution of herbicide resistance in polyploids. Previous studies have demonstrated that metabolic tolerance often involves multiple detoxification enzyme families23,24, such as glutathione S-transferases (GSTs), which can conjugate glutathione to herbicide molecules, rendering them nontoxic25. Numerous studies have provided evidence for the important roles of enhanced GST activity or increased GST expression in herbicide resistance mechanisms26,27,28,29. Although numerous enzymes and related mechanisms associated with herbicide resistance have been identified and elucidated in weeds, the availability of genome sequences could accelerate research on these processes. Owing to the relatively limited genetic information, few such enzymes and mechanisms have been identified and elucidated in A. fatua30,31,32. Exploring herbicide-tolerance mechanisms and identifying the key loci responsible for strong herbicide resistance in A. fatua could aid in the development of new herbicides and the breeding of cultivated oats with strong herbicide resistance.

Here, we report a near-complete genome assembly for A. fatua (W1004) through the integration of HiFi, Nanopore, and Hi-C sequencing techniques. Additionally, we construct a comprehensive genomic variation map consisting of 768 wild and cultivated oats worldwide (288 cultivated oats from A. sativa, 443 wild oats from A. fatua, and 37 previously released33 wild oats data from A. sterilis). Population genetic analysis reveals the evolutionary history of A. sterilis, A. fatua and A. sativa. It also aids in the identification of highly divergent genomic regions between A. fatua and A. sativa, and the genes in these regions are largely associated with responses to diverse extreme environments. One highly divergent genomic locus on chromosome 4D with expanded GST genes in A. fatua is identified. By combining transcriptome and chromatin accessibility profiling with functional validation using knock-down and over-expression transgenic lines in oat, we demonstrate the crucial role of a recently expanded cluster of GST genes in conferring strong herbicide resistance in A. fatua. Our study provides resources for elucidating oat genome evolution and understanding the genetic basis of A. fatua’s adaptation to diverse and extreme environmental conditions.

Results

High-quality genome assembly, annotation and comparative genomics of A. fatua

To uncover the genome architecture of A. fatua, we obtained 306.43 Gb of PacBio HiFi long reads with an N50 length of 19.61 kb, 317.73 Gb of ONT ultra-long reads with an N50 of 74.51 kb, and 196.64 Gb of high-throughput chromosome conformation capture (Hi-C) data (Supplementary Table 1). The HiFi reads and ONT ultra-long reads were used to construct a preliminary assembly comprising 145 contigs with a contig N50 of 473.48 Mb. Using the Hi-C data, the assembled contigs were further anchored onto 21 pseudochromosomes (Supplementary Fig. 2; Supplementary Fig. 3a). As a result, we achieved a final assembly of 10.98 Gb with a contig N50 length of 473.48 Mb. More than 99.79% of sequences were anchored to the 21 chromosomes with only 22 gaps, and 10 of these chromosomes were gap-free (Supplementary Fig. 2; Table 1; Supplementary Tables 2–4). These chromosomes were further assigned to the A, C, and D subgenomes based on synteny with previously published hexaploid oat genomes3 and on subgenome-specific k-mers34 (Supplementary Fig. 3b, c).
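For readers less familiar with the contiguity metric quoted above, the following is a minimal sketch of how an N50 value can be computed from a list of contig or read lengths. The function name and the toy lengths are illustrative assumptions and are not taken from the study's pipeline.

```python
def n50(lengths):
    """Return the N50: the length L such that sequences of length >= L
    together contain at least half of the total bases."""
    ordered = sorted(lengths, reverse=True)
    if not ordered:
        raise ValueError("at least one sequence length is required")
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half_total:
            return length


# Toy example with hypothetical contig lengths (arbitrary units).
toy_contigs = [80, 70, 50, 40, 30, 20]
toy_n50 = n50(toy_contigs)
```

Because N50 depends only on the length distribution, it says nothing about correctness, which is why it is usually reported together with completeness and accuracy metrics.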

Table 1 Statistics of genome assembly and annotation for A. fatua accession W1004

This highly contiguous assembly enabled us to identify all telomere regions at 42 chromosomal ends (Supplementary Table 5). The Benchmarking Universal Single-Copy Orthologs (BUSCO) evaluation with the embryophyta_odb10 database demonstrated high genome completeness, with 98.50% of conserved orthologs successfully detected under the genome mode (Table 1; Supplementary Fig. 3d). To further evaluate its quality, we remapped all primary sequencing data to the A. fatua (W1004) genome assembly, achieving mapping rates of 99.94% for HiFi reads, 99.97% for ONT reads and 99.94% for Hi-C reads. Moreover, the consensus Quality Value (QV) of k-mer-based estimates reached 72.42, exceeding all of the published oat genomes (Supplementary Fig. 3e). These metrics showed that we have obtained a high-quality and near-complete A. fatua (W1004) genome assembly (Supplementary Fig. 2).
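To give a feel for what a consensus QV means, the sketch below converts between a quality value and a per-base error probability. It assumes the reported QV follows the usual Phred-scale definition, QV = -10·log10(error probability); the function names are illustrative and the k-mer-based estimation itself is not reproduced here.

```python
import math


def qv_to_error_rate(qv):
    """Per-base error probability implied by a Phred-scaled quality value."""
    return 10 ** (-qv / 10)


def error_rate_to_qv(error_rate):
    """Phred-scaled quality value implied by a per-base error probability."""
    if not 0 < error_rate <= 1:
        raise ValueError("error_rate must be in (0, 1]")
    return -10 * math.log10(error_rate)


# Under the Phred assumption, the reported QV of 72.42 corresponds to an
# error probability well below one error per million bases.
reported_qv_error = qv_to_error_rate(72.42)
```

Under this assumption, every additional 10 QV units corresponds to a tenfold lower per-base error probability.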

For genome annotation, we first characterized repetitive sequences, which comprise 9.70 Gb (88.38%) of the assembled genome (Table 1), consistent with other oat genomes. The majority (87.26-89.64%) of these repeats were transposons distributed across all three subgenomes (A: 87.83%, 3.02 Gb; C: 89.64%, 3.78 Gb; D: 87.26%, 2.85 Gb; Supplementary Table 6). Retrotransposons dominated the transposable element landscape, with Gypsy-type LTR retrotransposons representing the most abundant class (35.17% of total transposon length; Supplementary Fig. 3f). For protein-coding gene annotation, we integrated evidence from five tissue-specific RNA-seq datasets with ab initio predictions and homology-based searches. This comprehensive approach identified 135,470 high-confidence protein-coding genes (Table 1; Supplementary Table 7), establishing A. fatua (W1004) as having one of the most extensively annotated oat genomes. We also applied the same annotation procedure to other related oats genomes, including Sang2, SanfenSan3, and OT3098, identifying 135,528, 131,864, and 131,157 genes, respectively, to further facilitate their comparison (Supplementary Table 7).

To identify the closest extant relatives of each subgenome and to verify the allopolyploid origin of A. fatua, we conducted a phylogenetic analysis using 4709 single-copy orthologs from 19 (sub)genomes representing 10 Pooideae species (Fig. 2a; Supplementary Table 8). The results indicated that the divergence between A. fatua (W1004) and hulled oat (Sang) as well as naked oat (Sanfensan3) occurred recently, at 0.79–1.13 Mya, and A. fatua was identified as the closest outgroup to cultivated oats (Fig. 2a). To better capture the genomic features of the A. fatua (W1004) genome, we performed a comparative analysis of large-scale structural variations among A. fatua (W1004), the cultivated oat genome (Sanfensan) and A. sterilis (Fig. 1c; Supplementary Fig. 4). This analysis revealed numerous large-scale inversions specific to A. fatua (W1004), particularly on chromosomes 4A and 4C, which are absent in both A. sterilis and cultivated oats (Supplementary Fig. 4d–f).

Fig. 2: Phylogeny of Avena genomes and gene family analysis in A. fatua accession W1004.

a Phylogenetic relationship of 19 subgenomes from 10 species based on single-copy orthologs. Numbers in black represent the divergence time of each node (MYA, million years ago). The numbers of gene family expansion and contraction are enumerated below the species names in red and green, respectively. b Cluster analysis of gene families in four oat genomes. c Gene ontology enrichment analyses of expanded gene families in A. fatua (W1004). d The number of different types of GSTs in four oat genomes. e The number of Tau genes in the A, C and D subgenomes of four oat genomes. f GSTs located on the 1D and 4D chromosomes in A. fatua (W1004), Sang, Sanfensan, and OT3098. Red lines indicate GST genes. Source data are provided as a Source Data file.

Comparative genomic analysis revealed 472 expanded gene families in A. fatua (W1004) versus cultivated oats, including the stress-related ‘Gibberellin-insensitive, Repressor of ga1-3 and Scarecrow (GRAS)’, ‘Glutathione S-transferase (GST)’, and ‘UDP-Glycosyltransferase (UGT)’ families (Fig. 2b, c; Supplementary Fig. 5a). GRAS, an important gene family in plant development and response to multiple stresses35, showed notable expansion (187 vs. 162–179 in cultivars), particularly on the A subgenome (Supplementary Fig. 5a, b). In addition, homologs of many cloned disease resistance genes co-localized with R gene-rich regions in A. fatua (W1004) (Supplementary Note 1; Supplementary Fig. 5c–e; Supplementary Table 9). We also found that GSTs exhibited dramatic expansion (266 vs. 236–248 in cultivars), with Tau-class genes specifically amplified on chromosome 4D (13 genes vs. 9–11 in cultivars) (Fig. 2d–f; Supplementary Fig. 6). RNA-seq analysis demonstrated tissue-specific expression of these GSTs, with preferential accumulation in root and leaf (Supplementary Fig. 6e), consistent with their roles in detoxification and herbicide tolerance36,37. These expansions likely underlie A. fatua’s unique environmental adaptability.

Genetic relationship and evolutionary history of A. sterilis, A. fatua and A. sativa

To reveal the evolutionary relationships among wild, weedy and cultivated oats, we performed whole-genome resequencing for 443 A. fatua accessions collected from roadsides and fields. The sampled A. fatua accessions cover three representative ecological niches in northern China, including 267 accessions from the Tibet Plateau (AFT, average altitude ~3800 m), 71 accessions from Xinjiang (AFXJ, average altitude ~1006 m), and 105 accessions from Zhangjiakou (AFZJK, average altitude ~1416 m) (Fig. 3a; Supplementary Fig. 7a, b; Supplementary Data 1). We further sequenced a diverse collection of cultivated oats (A. sativa) consisting of 288 globally distributed accessions (Fig. 3b; Supplementary Table 10; Supplementary Data 1). In total, 58.16 Tb of resequencing data were generated, with an average depth of 7.29 (79.56 Gb per accession) (Supplementary Data 2). By combining the previously released genomic data of 37 wild (A. sterilis) accessions from different regions of Israel33 and taking the A. fatua (W1004) genome as the reference, we obtained 76,205,385 high-quality single-nucleotide polymorphisms (SNPs) across the genome, with 84.65% in intergenic and 1.97% in coding regions, respectively (Supplementary Tables 11 and 12). Subgenome D exhibited elevated SNP density, particularly on chromosomes 1D and 4D (Supplementary Fig. 7c; Supplementary Table 12).

Fig. 3: Population genetic analysis of A. sterilis, A. fatua and A. sativa.

a Geographic distribution of A. fatua accessions in China. The red dots represent A. fatua accessions in Zhangjiakou (Hebei province); the green dots represent A. fatua accessions in Xinjiang; the blue dots represent A. fatua accessions in Tibet. DEM data were obtained from the NASA SRTM global DEM (https://lpdaac.usgs.gov/) and the study area was extracted using ArcGIS 10.8. b Worldwide distribution of A. sterilis and A. sativa. The radius of each pie chart represents the sample size in each country and the colors indicate the proportions of hulled A. sativa (CSA, dark blue), naked A. sativa (CNU, yellow), and the wild relative A. sterilis (WST, red), respectively. c Model-based clustering analysis with different numbers of ancestral components (K = 2–4). Species names are indicated by the colored bar at the bottom. WST is A. sterilis; AFT, AFXJ, and AFZJK are A. fatua from Tibet, Xinjiang and Zhangjiakou; CSA is hulled oat of A. sativa, and CNU is naked oat of A. sativa. The world map was constructed using the R package ggplot2 with the Natural Earth dataset (http://www.naturalearthdata.com). d PCA plot of A. sterilis, A. fatua and A. sativa accessions. Different colors represent different groups as follows: red, WST (A. sterilis); green, AFT (A. fatua); orange, AFXJ (A. fatua); pink, AFZJK (A. fatua); dark blue, CSA (A. sativa) and yellow, CNU (A. sativa). e Summary of nucleotide diversity (π) and population divergence (FST) among A. sterilis, A. fatua and A. sativa (CSA and CNU). f Linkage disequilibrium measured as the squared correlation coefficient between genotypes (r2) in the WST, CSA, CNU and A. fatua groups. g Genome-wide screening and annotation of selective sweeps. The XP-CLR value between A. fatua and A. sativa was plotted across the 21 chromosomes. The orthologs of known adaptation-related genes are shown above the XP-CLR values. Source data are provided as a Source Data file.

To reveal the population structure of these oat populations, we performed principal components analysis (PCA) and ADMIXTURE analysis. To mitigate sample size bias among the three oat species, we performed SNP ascertainment using a balanced panel consisting of 30 randomly selected samples per species and used this SNP panel for subsequent analyses (Supplementary Fig. 8; see Methods). PCA revealed a clear separation of A. sterilis (WST) accessions from other populations along the first principal component, with hulled oats (CSA) forming an intermediate cline and naked oats (CNU) clustering at the opposite end, which is consistent with the notion that A. sativa was domesticated from A. sterilis4 (Fig. 3d). Along the second principal component, A. fatua accessions were distinctly separated from cultivated oats. Within A. fatua, accessions from different regions formed overlapping clusters, with Xinjiang accessions positioned intermediately between those from Zhangjiakou and Tibet, reflecting their geographic distribution (Fig. 3d). The ADMIXTURE analysis closely mirrored PCA results (Fig. 3c; Supplementary Fig. 7d), identifying A. sterilis as harboring a distinct genetic component at K = 2. At K = 3, A. sativa and A. fatua were modeled with predominantly different ancestral components. The lowest cross-validation error occurred at K = 4 (Supplementary Fig. 8b), where cultivated oats split into two components corresponding to hulled and naked varieties. Phylogenetic analysis further supported these patterns (Supplementary Fig. 7e), with oat accessions clustering into three major clades representing the three species, and naked oats forming a distinct subclade within cultivated oats. These results demonstrate that A. fatua accessions from northern China are genetically distinct from both cultivated and wild oats, consistent with their unique ecological niche and morphological characteristics. In PCA, A. fatua accessions clustered closer to naked oats than to hulled oats, forming a cline that suggests closer genetic affinity with naked oats (Fig. 3d; Supplementary Fig. 8c). The PCA pattern is also supported by D-statistics, as A. fatua populations positioned closer to naked oats share more alleles with them than other populations do (D(AFT/AFXJ, AFZJK; CNU, A. sterilis) < 0, 7 < |Z| < 31) (Supplementary Fig. 7f). Together, these results demonstrate a genetic connection between northern Chinese A. fatua and the widely cultivated naked oats, potentially reflecting either a direct evolutionary origin from naked oats or extensive historical gene flow between these groups. Moreover, ADMIXTURE analysis consistently detected a minor A. sterilis-related component in A. fatua populations that is absent in naked oats, indicating that additional hybridization occurred between A. fatua and A. sterilis.
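The D-statistics cited above can be illustrated with a minimal ABBA-BABA sketch based on derived-allele frequencies. This is a simplified illustration under stated assumptions, not the study's implementation: the sign convention (ABBA minus BABA, so that negative D means P1 shares more derived alleles with P3) differs between tools, the jackknife uses equally weighted blocks, and all names and toy frequencies are hypothetical.

```python
import math


def abba_baba(sites):
    """Sum ABBA and BABA pattern weights over SNPs.

    Each site is (p1, p2, p3, po): derived-allele frequencies in
    populations P1, P2, P3 and the outgroup O.
    """
    abba = sum((1 - p1) * p2 * p3 * (1 - po) for p1, p2, p3, po in sites)
    baba = sum(p1 * (1 - p2) * p3 * (1 - po) for p1, p2, p3, po in sites)
    return abba, baba


def d_statistic(sites):
    """D = (ABBA - BABA) / (ABBA + BABA); convention assumed here."""
    abba, baba = abba_baba(sites)
    if abba + baba == 0:
        raise ValueError("no informative sites")
    return (abba - baba) / (abba + baba)


def jackknife_z(blocks):
    """Delete-one-block jackknife (equal block weights, a simplification).

    blocks: list of lists of sites, e.g. one list per genomic block.
    Returns (D over all sites, Z-score).
    """
    d_full = d_statistic([s for block in blocks for s in block])
    n = len(blocks)
    leave_one_out = [
        d_statistic([s for j, block in enumerate(blocks) if j != i for s in block])
        for i in range(n)
    ]
    mean = sum(leave_one_out) / n
    variance = (n - 1) / n * sum((d - mean) ** 2 for d in leave_one_out)
    se = math.sqrt(variance)
    return d_full, (d_full / se if se > 0 else float("inf"))


# Toy data: BABA sites outnumber ABBA sites, so D is negative here.
baba_site, abba_site = (1, 0, 1, 0), (0, 1, 1, 0)
toy_blocks = [
    [baba_site, baba_site, abba_site],
    [baba_site, abba_site, baba_site, baba_site],
    [baba_site, baba_site, baba_site, abba_site, abba_site],
]
toy_d, toy_z = jackknife_z(toy_blocks)
```

A large |Z| (conventionally above about 3) is usually read as a significant excess of allele sharing, which is the sense in which the |Z| range quoted in the text supports asymmetric sharing.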

To investigate genetic diversity and divergence among the four groups, we calculated the nucleotide diversity (π) for each group and conducted a pairwise analysis of genetic distances (FST). The naked oat population (CNU) exhibited the lowest nucleotide diversity (π = 0.61 × 10−3), while the A. fatua population showed the highest genetic diversity (π = 1.06 × 10−3) among the four groups. Within A. fatua, genetic diversity decreased from high altitude (AFT) to low (AFXJ and AFZJK), with the highest π in Tibet province of China (Supplementary Table 13). The genetic differentiation between A. fatua and CNU (0.272) is lower than the comparisons between A. fatua and CSA/WST (0.337/0.345), further supporting their genetic connection (Fig. 3e). The lowest level of linkage disequilibrium (LD) decay distance was observed for WST and highest level of LD was observed for CNU (Fig. 3f), suggesting a substantial decrease in genetic diversity of CNU during its unique process of evolution and domestication history in China. Together, our results demonstrated the close relationship between A. fatua and A. sativa, and shed light on the genetic relationships and evolutionary history of A. sterilis, A. fatua from various geographical regions in China, and A. sativa worldwide.
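As an illustration of the two summary statistics used above, the sketch below computes nucleotide diversity (π) from allele counts and a pairwise FST using Hudson's estimator. The choice of Hudson's estimator is an assumption made for brevity (the estimator used in the study is described in its Methods, not here), and the function names and toy counts are hypothetical.

```python
def site_pi(alt_count, n_chrom):
    """Expected pairwise differences at one biallelic site."""
    if n_chrom < 2:
        raise ValueError("need at least two sampled chromosomes")
    return 2 * alt_count * (n_chrom - alt_count) / (n_chrom * (n_chrom - 1))


def nucleotide_diversity(sites, region_length):
    """Pi per base pair: sites is a list of (alt_count, n_chrom) for the
    variable positions; invariant positions contribute zero."""
    return sum(site_pi(k, n) for k, n in sites) / region_length


def hudson_fst(sites):
    """Hudson's FST as a ratio of sums over sites.

    Each site is (alt1, n1, alt2, n2): alternate-allele counts and
    sampled chromosome numbers in population 1 and population 2.
    """
    numerator = denominator = 0.0
    for alt1, n1, alt2, n2 in sites:
        p1, p2 = alt1 / n1, alt2 / n2
        numerator += (
            (p1 - p2) ** 2
            - p1 * (1 - p1) / (n1 - 1)
            - p2 * (1 - p2) / (n2 - 1)
        )
        denominator += p1 * (1 - p2) + p2 * (1 - p1)
    if denominator == 0:
        raise ValueError("no polymorphic sites")
    return numerator / denominator


# Toy example: one fixed difference and one shared polymorphism.
toy_sites = [(10, 10, 0, 10), (5, 10, 5, 10)]
toy_fst = hudson_fst(toy_sites)
toy_pi = nucleotide_diversity([(1, 2)], region_length=1000)
```

Lower π within a group together with higher FST against its relatives is the pattern the text interprets as reduced diversity in CNU and as relatively close affinity between A. fatua and CNU.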

Genomic signatures of selection underlying A. fatua’s environmental adaptability

A. fatua has long been noted for being adaptable to a wide range of extreme environments38. To uncover the genetic basis of this adaptability, we compared the population genomes of A. fatua and its close relative, A. sativa, using six complementary methods to identify genomic regions with selection signatures in the A. fatua population (Fig. 4a; Supplementary Fig. 9a): (I) Top 10% genomic regions with the greatest differences in the average read coverage of the whole-genome resequencing data from A. fatua and A. sativa accessions; (II) Top 10% genomic regions with the highest structural variation (SV) frequencies between A. fatua and all cultivated oat genomes (Supplementary Note 2; Supplementary Tables 14 and 15); (III) Cross-population extended haplotype homozygosity (XP-EHH) for the top 10% haplotypes; (IV) Cross-population composite likelihood ratio (XP-CLR) for the top 10% genomic regions (Supplementary Note 3); (V) Fixation statistics (FST) for the top 10% genomic regions; (VI) Nucleotide diversity ratio (πA. sativa/πA. fatua) for the top 10% genomic regions. Specifically, methods I and II search for differentiated genomic regions between the two species based on the pattern of large sequence alterations, while methods III to VI focus on SNP patterns. Individually, these approaches identified 1.11–1.71 Gb of regions (spanning 12,896–22,051 genes) (Figs. 3g, 4b, c; Supplementary Fig. 9b; Supplementary Note 3; Supplementary Data 3). Among them, we identified 2,417 candidate genes detected by at least five methods (Fig. 4d), suggesting strong divergence between A. sativa and A. fatua with potential selection in A. fatua. We focused on these genes to investigate the genomic basis of the differences between the two species in environmental adaptation and phenotypic traits. Functional enrichment analysis revealed strong associations with stress responses (e.g., oxidative stress, auxin signaling) and growth/development (Supplementary Fig. 10a). A total of 105 candidate genes are homologs of known stress-tolerance genes in rice and wheat (Fig. 4e; Supplementary Data 4), linked to heavy metal resistance (8), salt tolerance (22), heat resistance (4), cold resistance (13), drought resistance (14), disease resistance (41) and herbicide resistance (3). Another 51 genes were homologs of genes conferring important agronomic traits (Fig. 4e; Supplementary Data 4), including yield and quality (16), growth and development (25), sterility (9) and shattering (1). In addition, we found that stress-related genes exhibited lower sequence similarity between A. fatua and A. sativa than agronomic-related genes (Fig. 4e), suggesting significant divergence in environmental adaptation between A. fatua and A. sativa.
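The logic of combining several scans, taking the top 10% of windows per method and keeping genes supported by at least five methods, can be sketched as follows. This is an illustrative outline only: it assumes that each method yields one score per genomic window with higher values meaning stronger signal, and that each gene maps to a single window; the function names, window identifiers and gene names are hypothetical.

```python
def top_windows(scores, fraction=0.10):
    """Return the set of window IDs in the top `fraction` by score.

    scores: dict mapping window ID -> score (higher = stronger signal,
    an assumption that must be checked for each statistic).
    """
    n_keep = max(1, int(len(scores) * fraction))
    ranked = sorted(scores, key=scores.get, reverse=True)
    return set(ranked[:n_keep])


def genes_by_support(method_scores, gene_to_window, min_methods=5, fraction=0.10):
    """Count, for each gene, how many methods flag its window and keep
    genes supported by at least `min_methods` methods."""
    selected = [top_windows(s, fraction) for s in method_scores.values()]
    support = {
        gene: sum(window in windows for windows in selected)
        for gene, window in gene_to_window.items()
    }
    return {gene: n for gene, n in support.items() if n >= min_methods}


# Toy example: ten windows, six methods; five methods agree on window w3.
windows = [f"w{i}" for i in range(10)]
toy_methods = {
    name: {w: (1.0 if w == "w3" else 0.0) for w in windows}
    for name in ["coverage", "sv", "xpehh", "xpclr", "fst"]
}
toy_methods["pi_ratio"] = {w: (1.0 if w == "w7" else 0.0) for w in windows}
toy_genes = {"geneA": "w3", "geneB": "w7", "geneC": "w0"}
candidates = genes_by_support(toy_methods, toy_genes)
```

Requiring agreement across several statistics trades sensitivity for specificity, since each statistic alone responds to different confounders such as demography, local recombination rate or mapping artefacts.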

Fig. 4: Genome-wide scan for regions of genetic divergence between A. fatua and A. sativa.

a Whole-genome screening for signatures of divergent regions in the D subgenome between A. fatua and A. sativa using multiple methods. The tracks show (I) SNP density along the chromosomes in 1 Mb windows; (II) Top 10% genomic regions with the greatest differences in the average read coverage of the whole-genome resequencing data from A. fatua and A. sativa accessions; (III) Top 10% genomic regions with the highest structural variation (SV) frequencies between A. fatua and all cultivated oat genomes; (IV) Cross-population extended haplotype homozygosity (XP-EHH) for the top 10% haplotypes; (V) Cross-population composite likelihood ratio (XP-CLR) for the top 10% genomic regions; (VI) Fixation statistics (FST) for the top 10% genomic regions; (VII) Nucleotide diversity ratio (πA. sativa/πA. fatua) for the top 10% genomic regions; (VIII) Regions considered as highly differentiated common candidate regions were identified by more than five methods. The black box marks the highly differentiated region. b Statistics of the total length of divergent regions identified by different methods in the A, C, and D subgenomes. c Number of genes in divergent regions identified by different methods in the A, C, and D subgenomes. d UpSet plot showing the number of genes identified by four, five, and six methods. Pie charts show the total number and proportion of genes identified by four, five and six methods. e Top: The similarities of 156 environmental adaptability and agronomically important genes in A. fatua to their homologs in A. sativa. Bottom: The functional classification of the 156 environmental adaptability and agronomically important genes. f An example of a region on chromosome 4D that is highly divergent between A. fatua and A. sativa. The heatmap shows the genotypes of the two main haplotypes in this region. Syntenic relationship of the expanded GST region in A. fatua (W1004) compared to that in cultivated oats (Sanfensan, Sang and OT3098). The blue boxes indicate the GST genes. Red boxes indicate private GST genes in A. fatua (W1004). Source data are provided as a Source Data file.

This candidate gene set provides an important foundation for unveiling the genetic mechanism of the broad adaptation of A. fatua to diverse environments. To facilitate its utilization, we conducted a detailed analysis of these genes, incorporating information on SNP variations, resulting amino acid changes, allele frequencies across species, functional annotations of their rice homologs, and tissue-specific expression patterns (Supplementary Data 5 and 6). For example, 2Cg0008813, the ortholog of OsTPS28, encodes a terpene synthase that confers blast and blight resistance in rice39. In oat, this gene is highly expressed in stems and harbors nine non-synonymous SNPs distinguishing A. fatua from A. sativa (Supplementary Fig. 10b; Supplementary Data 5 and 6). Similarly, 5Dg0059047, an ortholog of HIS1 that confers resistance to benzobicyclon and other β-triketone herbicides in rice40, exhibits strong floral expression and contains three non-synonymous SNPs defining divergent haplotypes between A. fatua and A. sativa (Supplementary Fig. 10b; Supplementary Data 5 and 6).

We observed significant phenotypic divergence between A. fatua and A. sativa, including stronger seed shattering and longer awns in A. fatua (Fig. 1a; Supplementary Fig. 1b). Among our candidate genes, 3Cg0189326, a homolog of rice qSH1 that regulates seed shattering41, exhibited two major haplotypes. Haplotype1 was present in 97% of A. fatua accessions, whereas haplotype2 dominated cultivated oats (94%). Variant analysis revealed two non-synonymous SNPs and a 21-bp deletion in the first exon of 3Cg0189326 that are prevalent in most A. fatua accessions (Supplementary Fig. 10c). Long awns are crucial for seed dispersal and propagation. We identified 2Cg0006814, which encodes a cytokinin-activating enzyme homologous to LONG AND BARBED AWN1 (LABA1), known to control awn elongation in rice42. Haplotype analysis revealed a non-synonymous mutation defining two haplotypes: haplotype1 was present in 93% of the A. fatua accessions, in contrast to only 12% of the A. sativa accessions (Supplementary Fig. 10d). Furthermore, haplotype1 contained a 5-bp deletion in the upstream regulatory region, disrupting a predicted NAC transcription factor binding site. These mutations represent candidate functional variants underlying the observed interspecific variation in awn length (Supplementary Fig. 10d). Nevertheless, the functional relevance of these candidate genes remains to be experimentally validated in oat.

The distal region of chromosome 4D, which exhibits GST gene expansion in A. fatua, emerged as a strongly differentiated region between A. fatua and A. sativa across all six detection methods (Fig. 4a, f). This locus harbored two distinct haplotypes segregating between A. fatua and A. sativa, and the density of SVs and divergent SNPs was high in this region. The high FST values between the two species, together with the markedly low nucleotide diversity (π) in the A. fatua population in this interval, indicate that this genomic region was probably under positive selection in A. fatua (Fig. 4f). This result further supports the view that the GST gene cluster at this locus has contributed to environmental adaptation in A. fatua.

Overall, we provide a comprehensive catalog of selective sweeps in A. fatua, which could be a useful resource for the future identification of important genes associated with A. fatua’s environmental adaptability.

Multi-omics analysis of herbicide response mechanisms in A. fatua

A. fatua is reported to exhibit remarkable herbicide resistance38, and our herbicide screening experiments confirm that A. fatua exhibits a significantly higher survival rate following herbicide treatment than A. sativa (Supplementary Data 7). However, the genetic mechanisms underlying this trait remain unknown. To investigate this, we conducted RNA-seq analysis of leaf and root tissues at three time points (6 h, 24 h, and 7 d) following fenoxaprop-P-ethyl (85 g/hm2) treatment (Fig. 5a). Principal component analysis confirmed data reproducibility (Supplementary Fig. 11a; Supplementary Data 8). We identified 23,833 differentially expressed genes (DEGs) across treatments (Supplementary Fig. 11b–d; Supplementary Table 16), with 2685 and 559 showing consistent differential expression in leaf and root, respectively (Fig. 5b; Supplementary Fig. 11d). The number of DEGs increased with treatment duration (Fig. 5b; Supplementary Table 17). Functional enrichment revealed defense-related responses, including amino acid catabolism, auxin response, glutathione metabolism, and DNA damage repair (Supplementary Fig. 11e). Among 23 genes whose homologs are related to herbicide resistance in other plants, 21 and 20 responded to herbicide treatment in the leaf and root of A. fatua, respectively (Supplementary Fig. 11f, g). A total of 142 out of 266 GST genes showed differential expression (Fig. 5c; Supplementary Fig. 11h), supporting a role for GSTs in herbicide resistance.

Fig. 5: Transcriptome and chromatin accessibility profiling of herbicide response in A. fatua accession W1004.

a Photographs of A. fatua plants under 0 g/hm2 fenoxaprop-P-ethyl control (top) and 85 g/hm2 fenoxaprop-P-ethyl treatment (bottom) at different time points. b Statistics of the DEGs between plants under the 85 g/hm2 fenoxaprop-P-ethyl treatment (6 h, 24 h and 7 d) and control plants in root and leaf. Different colors represent the number of DEGs at different time points and tissues. c Expression heatmap for GST genes after herbicide treatment. d Heatmap showing the transcription and chromatin accessibility of DEGs. e TF family enrichment analysis in each group. f GO enrichment analysis in each group. P-values were calculated using a hypergeometric test (one-sided). g Dynamic transcription (top) and chromatin accessibility tracks (bottom) for three GST genes: 1Dg0078713, 2Dg0156249, and 7Dg0137778. Source data are provided as a Source Data file.

The dynamics of chromatin accessibility are an important factor regulating gene expression during various stresses43. To explore the potential impact of chromatin accessibility on the herbicide response, we generated assay for transposase-accessible chromatin with sequencing (ATAC-seq) libraries from leaves under 85 g/hm2 fenoxaprop-P-ethyl treatment and control conditions at two time points: 24 hours and 7 days. A total of 8931 and 10,152 differentially accessible regions (DARs) were identified, respectively, with strong enrichment in promoter regions (Supplementary Fig. 12a-c). Integration with RNA-seq data revealed that 1908 herbicide-responsive DEGs were associated with these chromatin accessibility changes, and these were classified into four distinct expression clusters (G1-G4, Fig. 5d). Transcription factors (TFs) that bind to cis-regulatory elements in open chromatin regions (OCRs) govern gene expression44. TF binding site enrichment patterns differed significantly among DEG-associated accessible regions in clusters G1-G4 (Fig. 5e; Supplementary Fig. 12d). SBP and bZIP were enriched in clusters G1 and G3, respectively; NAC and SBP were enriched in cluster G2; and MYB and TCP stood out in cluster G4. Furthermore, GO enrichment analysis revealed temporal patterns of highly expressed genes (Fig. 5f): prior to herbicide treatment, genes associated with oxygen binding, carbohydrate metabolic and hormone biosynthetic processes, protein-DNA complex and protein heterodimerization activity (G2 and G4) were highly expressed. After herbicide treatment, genes related to calcium ion and carbohydrate derivative binding, protein phosphorylation and kinase activity (G3), as well as genes related to response to oxidative stress, endogenous stimulus, hormone and stress (G1), were highly expressed. The key herbicide-responsive module, comprising genes linked to glutathione metabolism and including members of the GST family, was strongly induced post-treatment (Fig. 5f, g), suggesting a role for xenobiotic detoxification.

By integrating transcriptomic and chromatin accessibility analyses, we uncovered the dynamic genetic and epigenetic mechanisms underlying the response to herbicide treatment in A. fatua, providing a critical resource for understanding the herbicide tolerance in A. fatua.

qGST4D underlies herbicide tolerance in A. fatua

To further unravel the genomic basis of herbicide tolerance in A. fatua, we performed a genome-wide association study (GWAS) using survival rates following herbicide (fenoxaprop-P-ethyl) treatment as the phenotypic trait (Supplementary Fig. 13a). We identified a major association peak at the end of chromosome 4D (Fig. 6a; Supplementary Fig. 13b), overlapping with the locus noted in the gene expansion and selective sweep analyses above, which we term qGST4D (4D: 442-444 Mb). The lead SNP of the peak (P = 1.189251E-20) is located within the GST gene cluster of the A. fatua (W1004) genome (Fig. 6a). Moreover, combining evidence from multi-omics analyses and qRT-PCR validation, we found that 11 GST genes in the qGST4D locus are responsive to herbicide treatment (Fig. 5c; Supplementary Fig. 13c), indicating that the GST cluster might be the major contributor to herbicide resistance variation in the oat population and might drive the strong genome-wide association signal.

Fig. 6: qGST4D underlies herbicide tolerance in A. fatua.

a Top: Manhattan plot of GWAS for herbicide tolerance in oat. Bottom: Zoomed-in view of the significant association region on chromosome 4D (434.8-450.8 Mb). b Read mapping diagrams for 4Dg0135144 in A. fatua and A. sativa. c 4Dg0135144 was detected by PCR in 6 A. fatua and 6 A. sativa lines. The amplified fragment size (~1283 bp) matched that inferred from the genomic sequence, and the DNA marker ranged from 100 to 2000 bp. d Time-course qRT-PCR analysis of 4Dg0135144 expression before and after herbicide treatment. Plant materials were from roots and leaves at 6 h, 24 h, and 7 days post-treatment. AsACT2 was the internal control. Significance was determined by Student’s t test; error bars represent the SD of three biological replicates. e Screenshots of chromatin accessibility shown in the IGV browser. f, g Assessment of herbicide tolerance of the 4Dg0135144 knockdown lines. TRV1 and TRV2 empty vectors served as negative controls and the PDS gene as the positive control. Photographs were taken 7 days post-herbicide treatment. Scale bars, 5 cm. h Survival rates after a 7-day herbicide treatment; values are means ± SDs (n = 5 replicates). i Physiological indices of the 4Dg0135144 knockdown lines before and after herbicide treatment: catalase activity, soluble sugar content, and chlorophyll content. Leaf samples were taken 3 days post-treatment. Statistical significance was determined by Student’s t test. Values are means ± SDs (n = 6 replicates). j–l Over-expression of 4Dg0135144 enhanced plant herbicide tolerance. Photographs were taken 15 days post-herbicide treatment. Scale bars, 5 cm. T0 transgenic plants with pUBI::4Dg0135144-GFP and T1 transgenic plants with pUBI::4Dg0135144-Flag were used for herbicide phenotype evaluation. (In d, h and i, a two-sided Student’s t-test was used; *p < 0.05; **p < 0.01; ***p < 0.001; ****p < 0.0001; ns, p ≥ 0.05). Source data are provided as a Source Data file.

To further test the role of qGST4D in herbicide resistance, we focused on 4Dg0135144, a GST gene within the locus that is expanded in the A. fatua (W1004) genome. Read coverage (Fig. 6b; Supplementary Fig. 13d) and PCR (Fig. 6c) results confirmed that 4Dg0135144 is present in 96.74% of A. fatua accessions compared with 53.41% of A. sativa accessions. Time-course qRT-PCR showed that expression of 4Dg0135144 was significantly up-regulated after herbicide treatment in both roots and leaves (Fig. 6d). ATAC-seq also showed that chromatin accessibility in the upstream and downstream regions of 4Dg0135144 increased after herbicide treatment (Fig. 6e). To validate its function, we developed a tobacco rattle virus (TRV)-induced gene silencing system in A. fatua to obtain knockdown lines of 4Dg0135144, in which gene expression was successfully down-regulated to less than 50% (Supplementary Fig. 14a, b). After 7 days of herbicide treatment, the survival rates of both knockdown lines decreased significantly compared to the control (P = 0.0004 and 0.0213, two-tailed Student’s t test; Fig. 6f–h). Previous studies have shown that herbicide treatment can affect the burst of reactive oxygen species (ROS), plant cell permeability, and photosynthesis45,46,47,48,49,50. We therefore measured catalase (CAT) enzyme activity and the contents of soluble sugar and chlorophyll to evaluate the herbicide resistance of the 4Dg0135144 knockdown line. All three physiological indices were significantly lower in the knockdown line than in the control after herbicide treatment (CAT enzyme activity, P = 0.0003; soluble sugar content, P < 0.0001; chlorophyll content, P < 0.0001; two-tailed Student’s t test) (Fig. 6i). These results show that knockdown of 4Dg0135144 reduced herbicide resistance in A. fatua.
Moreover, we constructed the pUBI::4Dg0135144-GFP and pUBI::4Dg0135144-Flag vectors to generate transgenic plants over-expressing 4Dg0135144 (4Dg0135144-OE) in oat, with the transgenic line pUBI::GFP or the wild type (WT) as the negative control (Supplementary Fig. 14c–h). We found that 4Dg0135144-OE plants exhibited greater resistance to herbicide treatment than control plants (Fig. 6j-l), concomitant with increased survival rates (P = 0.0042, two-tailed Student’s t test) (Supplementary Fig. 14i) and elevated catalase (CAT) enzyme activity (line 2, P = 0.0007; line 6, P = 0.0003; two-tailed Student’s t test) (Supplementary Fig. 14j). These results demonstrate that 4Dg0135144, an expanded gene within the qGST4D locus, is a key determinant of herbicide tolerance in A. fatua. This finding further corroborates the functional importance of the GST gene cluster in this locus and its contribution to the evolution of enhanced herbicide tolerance in A. fatua.

Discussion

A. fatua is well known as one of the worst annual weeds worldwide and can adapt to diverse climatic and extreme conditions51. Its genome harbors numerous stress-resistant alleles, which, given the lack of reproductive barriers with cultivated oats (A. sativa), could serve as a valuable genetic resource for enhancing abiotic stress tolerance in cultivated oats52. In this study, we present a near-complete reference genome assembly and comprehensive annotation for A. fatua. This resource will enable precise gene discovery and accelerate molecular breeding efforts, including marker-assisted selection and interspecific hybridization, in cultivated oats (A. sativa). Moreover, through population genomic sequencing and analyses of 443 A. fatua accessions collected from three distinct ecological zones in northern China, we identified multiple genomic regions showing strong differentiation from cultivated oats and signatures of selection. These loci represent promising targets for investigating the genetic basis of A. fatua’s adaptation to extreme environments.

One of the key traits underlying the remarkable invasiveness of A. fatua is its herbicide tolerance. In this study, we performed multi-omics analyses to characterize the response to herbicide treatment, revealing coordinated regulatory dynamics at the transcriptomic and chromatin accessibility levels. These findings provide a valuable resource for investigating the molecular basis of herbicide tolerance in this invasive weed. Through genome-wide association studies, we identified a major locus associated with herbicide resistance. This locus harbors a GST gene cluster, with most A. fatua accessions carrying a haplotype characterized by GST gene expansion (Fig. 4f). Using knockdown and overexpression transgenic experiments, we functionally validated the contribution of one expanded GST gene to oat herbicide resistance. Our results demonstrate that copy number variation of GST genes underlies variation in herbicide resistance. At this locus, A. fatua and cultivated oat (Avena sativa) populations exhibit strong genetic differentiation. The A. fatua population is dominated by a single haplotype and shows signatures of selection (Fig. 4f). This selected haplotype is also present in cultivated and wild oats (A. sterilis), suggesting that the adaptive haplotype in A. fatua represents standing genetic variation in oats that was selectively favored in weedy populations, facilitating their adaptation.

Although the weedy oat is classified as a distinct species (A. fatua), it lacks reproductive isolation from its close relatives, the wild oat (A. sterilis) and cultivated oat (A. sativa), leaving its evolutionary status unresolved. Here, we generated a population genomic dataset comprising multiple A. sterilis accessions from Israel and globally representative A. sativa accessions, enabling us to assess the evolutionary origins of A. fatua populations collected from northern China. Population genetic analyses revealed that these weedy oats are genetically more closely related to cultivated oats, particularly to the naked oat (A. sativa ssp. nuda) varieties widely grown in China. This suggests two possible scenarios: (1) they originated via feralization of local cultivated oats, or (2) they experienced extensive historical gene flow with domesticated populations. However, more detailed demographic modeling is required to resolve their precise evolutionary origin. We acknowledge that our sampling was geographically limited. Given A. fatua’s cosmopolitan distribution, broader sampling is required to fully elucidate its genetic ancestry and establish a global understanding of its genomic basis, as A. fatua populations from different regions may have distinct origins, similar to patterns observed in other crop systems such as rice53.

Methods

Plant materials and genome sequencing

The A. fatua accession W1004 was originally collected in Lhasa, Tibet Autonomous Region, China (91.39 E, 29.84 N). For genome sequencing, high-quality genomic DNA was extracted from leaves using a DNA extraction kit. SMRTbell libraries were sequenced on a PacBio Sequel II system in HiFi mode, yielding a total of 306.43 Gb of circular consensus sequencing (CCS) data. Hi-C libraries were constructed from tender leaves and sequenced on the Illumina NovaSeq platform, generating a total of 196.64 Gb of sequencing data. The ONT ultra-long sequencing library was prepared with the Oxford Nanopore SQK-LSK109 kit and subsequently sequenced on a Nanopore PromethION platform. To extract RNA for gene annotation and gene expression analysis, samples from five tissues (root, stem, leaf, flower and seed) were harvested and frozen immediately in liquid nitrogen. Three independent replicates were used for RNA-seq. Sequencing was conducted by a commercial company (Annoroad Gene Technology, Beijing, China).

Genome assembly and quality assessment

The A. fatua (W1004) genome was assembled using hifiasm54 (v0.19.9-r616) with the parameters --hg-size 11g --ul and otherwise default settings, integrating PacBio HiFi and ONT ultra-long reads. Subsequently, the Hi-C reads were processed and aligned to the contigs using Juicer55 (v1.6), after which the contigs were clustered into chromosomes by 3D-DNA56 with default parameters. Finally, we used Juicebox57 Assembly Tools (v2.20) to manually correct errors and visualize the assembled results. Additionally, the three subgenomes (A, C, and D) were distinguished using SubPhaser34, relying on sequence similarity to published hexaploid oat genomes and on subgenome-specific k-mers.

For genome assessment, the Hi-C reads were mapped to the final assembly using BWA58 (v0.7.17), and the ONT and HiFi reads were mapped with Minimap259 (v2.28). The mapping rate of reads on the genome was calculated using SAMtools60 (v1.7). Then, BUSCO61 (v5.2.2) with the “embryophyta_odb10” database was used to evaluate genome completeness. Finally, we assessed the genome assembly using Merqury62 (v1.3) based on the hybrid 21-mer Merqury k-mer database built from HiFi reads.

Identification of telomere sequences

The telomeric repeat 5’-CCCTAAA-3’ and its seven-base reverse complement were searched directly in the genome sequence.
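The direct search described above can be illustrated with a minimal Python sketch. It is not the authors' script: the function names and the 10-kb end window are assumptions made for illustration, and only the motif itself comes from the text.

```python
import re

MOTIF = "CCCTAAA"  # telomeric repeat named in the text


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an upper-case DNA string."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def count_telomere_repeats(chrom_seq: str, window: int = 10_000) -> dict:
    """Count non-overlapping motif copies near both chromosome ends.

    `window` (bases inspected at each end) is an illustrative choice,
    not a value reported in the paper.
    """
    seq = chrom_seq.upper()
    # Match either the motif or its reverse complement (TTTAGGG).
    pattern = re.compile("|".join((MOTIF, reverse_complement(MOTIF))))
    return {
        "start": len(pattern.findall(seq[:window])),
        "end": len(pattern.findall(seq[-window:])),
    }


if __name__ == "__main__":
    toy = "CCCTAAA" * 5 + "ACGT" * 10 + "TTTAGGG" * 3
    print(count_telomere_repeats(toy, window=40))
```

A chromosome with repeat arrays at both ends would show high counts for both keys; how many copies count as a telomere is a threshold the text does not specify.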

Repeats and gene annotation

The sequenced genomes of Avena species are a central resource for comparative work on genome evolution. However, their full utility has been hampered by uneven genome annotation. To make better use of the released cultivated oat genomes, we performed repeat and gene reannotation for four genomes (A. fatua (W1004) and A. sativa cv. Sang, Sanfensan, and OT3098) using the same ab initio-based annotation strategy combined with our extensively collected and integrated evidence data. Repetitive sequences in the genome were identified based on ab initio prediction and homology searching. The reference genome was hard masked during gene annotation. For ab initio prediction, RepeatModeler63 (v2.0.2), LTR_FINDER64 (v1.07), LTRharvest65 (v1.5.10) and LTR_retriever66 (v2.9.0) were used to build a de novo repeat database. RepeatMasker67 was used to mask the genome and annotate the TEs using the library combined from RepeatModeler63 (v4.1.1), LTR_retriever66 (v2.9.0) and Repbase68 (v15.02). Tandem Repeats Finder69 (v.4.07b) was used to search for tandem repeats with default parameters.

To obtain high-quality annotation of protein-coding genes, we employed multiple methods for gene prediction, including transcriptome-based, de novo, and homology-based predictions. We used Augustus70 (v3.2.3) for de novo gene prediction. We constructed a comprehensive protein database using high-quality sequences from Brachypodium distachyon, Hordeum vulgare, Oryza sativa, Triticum aestivum and Avena species generated from third-generation sequencing. GenomeThreader71 (v1.7.3) was used for homology-based gene prediction. For transcript-based predictions, Trinity72 (v2.12) was used to assemble all the RNA-seq data, followed by alignment to the genome with the PASA73 (v2.3) software. In addition, the clean RNA-seq reads were mapped to the genome using HISAT274 (v2.2.1) to obtain transcriptome mapping data. Subsequently, StringTie75 (v2.1.6) was used to assemble the mapped reads into transcripts. The assembled transcripts were then used for open reading frame (ORF) prediction with TransDecoder (v.5.1.0) (https://github.com/TransDecoder/TransDecoder). Finally, EVidenceModeler76 (v1.1.1) was used to integrate the prediction results obtained by the above three methods to produce a non-redundant set of gene structures.

The completeness of the final gene set was evaluated by BUSCO61 (v5.2.2) with the “embryophyta_odb10” database. Functions of the protein-coding genes were annotated using Diamond77 (v.0.9.29) blastp with an E-value cutoff of 1e-5 to search against the NCBI non-redundant protein (NR) and SwissProt (http://web.expasy.org/docs/swiss-prot/guideline.html) databases. Protein domain and gene ontology term annotations were performed using InterProScan78 (v4.7), while KO terms were assigned through homology searches with KofamScan79.

Gene families and phylogenetic analysis

We used OrthoFinder80 (v2.3.14) with default parameters to identify orthologous gene families between A. fatua and 9 other plant genomes, including two Triticeae species (Secale cereale, Triticum aestivum) and seven Aveneae genomes (Avena strigosa, Avena atlantica, Avena longiglumis, Avena eriantha, Avena insularis, Avena sativa cv. Sang, and Avena sativa cv. Sanfensan). The phylogenetic relationships among these species were estimated using RAxML81 (v8.2.12) based on the protein sequences of single-copy ortholog families. Divergence times were estimated using the MCMCtree program embedded in PAML82 (v4.9). The expansion and contraction of orthologous gene families were assessed based on a maximum likelihood tree using CAFE83 (v4.2). A P value threshold of 0.05 was applied to determine which families had significant size changes between species. GO enrichment analysis of specific expanded gene families was performed using the R package clusterProfiler84. Collinear blocks between subgenomes were obtained by JCVI85 (v1.2.1) with default parameters.

Identification of gene families

Hidden Markov model profiles of gene domains were downloaded from Pfam (http://pfam.xfam.org/) for use in gene family identification. Hmmer86 (v3.1) and Diamond77 (v.0.9.29) blastp searches were used to identify putative genes in the A. fatua, A. sativa cv. Sang, A. sativa cv. Sanfensan, and OT3098 genomes. Protein sequences of gene family members from Arabidopsis thaliana were used as queries with an E-value cutoff of 1e-5.

GST family genes were further identified using InterProScan78 (v5.22) based on the keywords “GST” and “Glutathione S-transferase”. To classify the GST gene family, we first aligned the protein sequences of all identified GSTs together with reference GSTs from Oryza sativa and Arabidopsis thaliana (Supplementary Data 9) using MAFFT87 (v7.490). A maximum likelihood phylogenetic tree was then constructed with IQ-TREE88 (v2.2.0), and GST genes were assigned to established classes based on sequence similarity and conserved domain architecture. The chromosomal distribution of GST genes was visualized using TBtools89 (v2.121).

Identification of R genes

R genes in the four Avena genomes (A. fatua, A. sativa cv. Sang, A. sativa cv. Sanfensan, and OT3098) were identified using the RGAugury90 pipeline. The nucleotide-binding site (NB-ARC) domain-encoding genes were classified into subgroups based on their domain and motif structures: TN (Toll/interleukin-1 receptor (TIR) and NB-ARC), CN (coiled-coil (CC) and NB-ARC), NL (NB-ARC and leucine-rich repeat (LRR)), CNL (CC, NB-ARC and LRR), NB (NB-ARC), and TNL (TIR, NB-ARC and LRR).

To understand whether the identified R genes were related to the map positions of known cloned loci for powdery mildew, rust, and smut, DNA markers that co-segregated or flanked known genes were mapped to the A. fatua and cultivated oat genomes by Diamond77 (v.0.9.29) blastp analysis. The distribution of R genes and known quantitative trait loci are shown in Supplementary Fig. 5e.

Sampling and whole-genome sequencing

To collect A. fatua accessions in China, we surveyed three regions severely affected by common wild oats: Xinjiang, Tibet, and Hebei-Zhangjiakou. In various locations, we searched for weedy A. fatua along roadsides and in agricultural fields. Seeds from each plant were gathered, and their geographical coordinates were documented. All accessions were collected from farmlands such as wheat and barley fields, or from the margins of these agricultural areas10,11, which are well documented as typical habitats for A. fatua (https://cals.cornell.edu/weed-science/weed-profiles/wild-oat). The identification of A. fatua was based on well-established morphological characteristics10. Specifically, the collected individuals exhibited key features of A. fatua, including spontaneous seed shattering at maturity, long dark brown to black awns, and seeds with distinctly darker seed coats. These characteristics were consistently observed across accessions and are consistent with earlier reports10. In total, 443 A. fatua accessions with different geographic origins from Xinjiang, Tibet and Zhangjiakou of Hebei province were collected. Additionally, 37 A. sterilis accessions from Israel33 and 228 A. sativa accessions selected from a global collection were included for comparative analysis (Supplementary Table 10; Supplementary Data 1). Plants were grown in the greenhouses at Hebei University (Baoding City, Hebei province, China). Fresh young leaves from individual seedlings were gathered and immediately frozen in liquid nitrogen for subsequent DNA extraction. Genomic DNA was extracted using the cetyltrimethylammonium bromide (CTAB) method. The libraries were sequenced on the Illumina HiSeq X Ten platform, and a total of 58.16 TB of sequencing data were obtained, with an average depth of 7.29x for each sample.

Variant detection and annotation

The clean reads were mapped to the reference genome (W1004) using the ‘mem’ algorithm in BWA58 (0.7.17) with default parameters. The results were converted to BAM format using SAMtools60 (v1.7) and filtered for mapping quality with the ‘-q 30’ parameter, and the BAM files were sorted by Sambamba91 (v0.8.2). The MarkDuplicates module of the Genome Analysis Toolkit92 (GATK, v4.2.0.0) was used to mark PCR-duplicated read pairs. Then, genomic variants in GVCF format were identified for each accession by the HaplotypeCaller module of GATK. Subsequently, the modules CombineGVCFs, GenotypeGVCFs, SelectVariants and VariantFiltration were used for variant calling. Finally, the SNPs were filtered with the following parameters: “-filterExpression QD < 2.0 || FS > 60.0 || MQRankSum < -12.5 || ReadPosRankSum < -8.0 || SOR > 3.0 || MQ < 40.0”. Potentially false SNPs were further filtered using VCFtools93 (v0.1.16) with the following criteria: (1) MAF ≥ 5%; (2) missing rate ≤ 0.5%; (3) bi-allelic sites. High-quality SNPs were further annotated with SnpEff94 (v5.1).
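To make the logic of these site filters concrete, the following Python sketch re-expresses the thresholds as plain predicates. It is only an illustration, not a replacement for GATK or VCFtools: the function names and the dictionary/list input format are hypothetical, and treating a missing annotation as "not failing" is an assumption.

```python
# Thresholds copied from the filter expression in the text; a site is
# flagged when ANY single condition is true (the expression uses "||").
HARD_FILTERS = {
    "QD": lambda v: v < 2.0,
    "FS": lambda v: v > 60.0,
    "MQRankSum": lambda v: v < -12.5,
    "ReadPosRankSum": lambda v: v < -8.0,
    "SOR": lambda v: v > 3.0,
    "MQ": lambda v: v < 40.0,
}


def fails_hard_filter(info: dict) -> bool:
    """True if any available annotation crosses its threshold.

    Annotations absent from `info` are skipped (an assumption).
    """
    return any(k in info and test(info[k]) for k, test in HARD_FILTERS.items())


def minor_allele_frequency(genotypes: list) -> float:
    """MAF from alt-allele counts (0, 1, 2); None marks a missing call."""
    called = [g for g in genotypes if g is not None]
    if not called:
        return 0.0
    alt = sum(called) / (2 * len(called))
    return min(alt, 1.0 - alt)


def missing_rate(genotypes: list) -> float:
    """Fraction of samples without a genotype call."""
    return sum(g is None for g in genotypes) / len(genotypes)


def keep_site(info, genotypes, n_alt_alleles,
              min_maf=0.05, max_missing=0.005) -> bool:
    """Apply the bi-allelic, hard-filter, MAF and missing-rate criteria.

    max_missing=0.005 encodes the "0.5%" stated in the text as written.
    """
    return (
        n_alt_alleles == 1
        and not fails_hard_filter(info)
        and minor_allele_frequency(genotypes) >= min_maf
        and missing_rate(genotypes) <= max_missing
    )
```

For example, a bi-allelic site with `QD = 1.5` would be rejected regardless of its allele frequency, because one failing annotation is sufficient.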

Population genetic analyses

To ensure the quality of genotype data, we first performed principal component analysis (PCA) on the balanced panel (768 individuals) and projected the remaining individuals. Based on the first two principal components (PC1 and PC2), we excluded samples that exhibited significant outlier patterns, which may indicate contamination or mislabeling; in total, 17 outliers were removed. To perform population genetic analysis, we randomly picked 30 individuals from each of the A. sterilis, A. fatua and A. sativa populations to construct a sample-size-balanced panel. Specifically, within A. sativa, 15 individuals each were picked from hulled and naked oats; within A. fatua, 10 individuals were picked from each of the three regions (Zhangjiakou, Xinjiang, Tibet). Within the panel, SNPs were further filtered by keeping sites with a minor allele frequency greater than 5% and a missing rate lower than 10%, retaining 44,752,824 SNP sites. To reduce linkage disequilibrium and ensure even variant spacing, SNPs were further thinned by randomly keeping one site every 10 kb across the genome, retaining 838,840 SNPs for subsequent analyses. To reduce potential reference bias, we called pseudo-haploid genotypes at each site for each individual by randomly picking a read mapped to the site and taking the call from that read. A neighbor-joining (NJ) tree was constructed to assess the genetic relationships among the oat accessions. Pseudo-haploid genotype calls at each SNP for each individual were encoded as binary matrices, and pairwise Hamming distances were calculated using pdist from SciPy. Hierarchical clustering with the Ward method (linkage) was then used to generate the tree. Population structure was inferred using the ADMIXTURE95 program (v1.3.0) in supervised mode. To provide reference information for supervised clustering, the balanced panel consisting of 90 individuals was used as the reference panel.
Genotype data in PLINK BED96 format were analyzed across a range of K values, with 8 independent runs per K (--seed 1-10). The optimal number of ancestral populations was determined by identifying the run with the lowest cross-validation error (--cv). The supervised mode leveraged the predefined group labels of the reference individuals to estimate ancestry proportions for the remaining samples. The principal component analysis was performed on the balanced panel consisting of 90 individuals using smartpca (https://github.com/wenhedr/PCA_analysis_using_SmartPCA), with the remaining individuals projected. To compute D-statistics, we used the qpDstat program from the AdmixTools package97 (v6.0). The input genotype dataset was first converted to EIGENSTRAT format using convertf. The D-statistics were computed for selected quartets of populations based on a user-defined test list, each consisting of four populations (W, X, Y, Z).
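The pseudo-haploid calling and pairwise distance steps described in this section can be sketched in plain Python as follows. This is a simplified illustration rather than the authors' code: the text states that SciPy's pdist was used, whereas this sketch uses only the standard library, and the handling of missing calls (sites uncovered in either individual are ignored) is an assumption. Function names are hypothetical.

```python
import random
from itertools import combinations


def pseudo_haploid_call(read_alleles, rng):
    """Pick the allele carried by one randomly chosen read.

    `read_alleles` lists the allele (0 = ref, 1 = alt) seen on each read
    covering the site; returns None when no read covers it.
    """
    return rng.choice(read_alleles) if read_alleles else None


def hamming_distance(a, b):
    """Fraction of jointly called sites at which two individuals differ."""
    pairs = [(x, y) for x, y in zip(a, b) if x is not None and y is not None]
    if not pairs:
        return float("nan")
    return sum(x != y for x, y in pairs) / len(pairs)


def distance_matrix(calls):
    """Pairwise distances for a {sample_name: [calls per site]} mapping."""
    names = sorted(calls)
    dist = {(n, n): 0.0 for n in names}
    for a, b in combinations(names, 2):
        dist[(a, b)] = dist[(b, a)] = hamming_distance(calls[a], calls[b])
    return dist


if __name__ == "__main__":
    rng = random.Random(1)  # fixed seed so the toy example is repeatable
    reads = {"s1": [[0, 0, 1], [1], []], "s2": [[1, 1], [1, 0], [0]]}
    calls = {s: [pseudo_haploid_call(r, rng) for r in sites]
             for s, sites in reads.items()}
    print(distance_matrix(calls))
```

Sampling a single read per site gives every individual the same effective ploidy regardless of coverage, which is why the approach is commonly used to limit reference and depth bias in low-coverage data. The resulting matrix could then be passed to a clustering routine such as the Ward linkage mentioned in the text.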

Structural variation analysis

We selected the W1004 genome as the reference. The other cultivated genomes (A. sativa cv. Sanfensan) were aligned to the reference using MUMmer98 (v4.0.0) with the parameter -c 1000. The raw alignment results were further filtered using delta-filter with the parameters -m -i 90 -l 100. The resulting filtered delta files were used to detect structural variations using the SyRI99 (v.1.4) pipeline with default parameters, and the results were visualized with plotsr100 (v.1.4). To confirm the inversions identified between genomes, we performed reciprocal mapping of Hi-C data using the Juicer55 (v.1.6) pipeline. The Hi-C contact maps were visualized using Juicebox57 (v.2.20).

Identification of differentiated regions

To identify genomic regions of divergence, we scanned the genome using multiple methods: (I) We used BWA58 (v.0.7.17) to align the genome resequencing reads of A. fatua and A. sativa L. accessions to the W1004 reference genome and compared their sequence similarity. The average identity score was then estimated for each 1-Mb non-overlapping window. (II) We compared the genomes of cultivated oats with the reference A. fatua genome to identify structural variants (SVs) using SyRI99 (v.1.4). We calculated the SV frequency in 500-kb sliding windows. The top 10% of regions with the highest breakpoint frequencies were defined as SV hotspots, representing divergent regions between the two groups. (III) We used Selscan101 (v.2.0) to estimate XP-EHH, and the average XP-EHH score was then estimated for 500-kb sliding windows with 250-kb steps. (IV) XP-CLR values were calculated with the XP-CLR102 program (v1.1.2) using 500-kb sliding windows with 250-kb steps. (V) The population fixation index (FST) was calculated using VCFtools93 (v0.1.16). FST values were measured for each 500-kb window, with a step size of 250 kb. (VI) Genetic diversity (π) was calculated using VCFtools93 (v0.1.16) with a window size of 500 kb and a step size of 250 kb. The π ratio between the two groups was further calculated as π (A. sativa L.)/π (A. fatua). Outlier windows (top 10%) detected by at least five methods were considered candidate highly divergent regions. The distribution of the divergent regions was visualized with RectChr v1.38 (https://github.com/BGI-shenzhen/RectChr). Genes located in selective sweeps were considered candidate selected genes.
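The windowing and consensus step described above can be sketched in Python as follows. This is a hedged illustration only: the function names are hypothetical, every statistic is assumed to have been re-oriented so that larger values mean stronger divergence, and all methods are assumed to share the same window keys (the text uses 1-Mb windows for the identity scan, so some re-binning would be needed in practice).

```python
def sliding_windows(chrom_length, size=500_000, step=250_000):
    """Yield half-open (start, end) windows; the last ones are clipped."""
    start = 0
    while start < chrom_length:
        yield start, min(start + size, chrom_length)
        start += step


def top_fraction(scores, fraction=0.10):
    """Return the window keys whose scores fall in the top `fraction`.

    At least one window is always returned; ties are broken by the
    order in which windows appear in `scores`.
    """
    n_keep = max(1, int(len(scores) * fraction))
    ranked = sorted(scores, key=scores.get, reverse=True)
    return set(ranked[:n_keep])


def consensus_windows(method_scores, min_methods=5, fraction=0.10):
    """Windows flagged as outliers by at least `min_methods` methods.

    `method_scores` maps a method name to a {window_key: score} mapping.
    """
    counts = {}
    for scores in method_scores.values():
        for window in top_fraction(scores, fraction):
            counts[window] = counts.get(window, 0) + 1
    return {w for w, c in counts.items() if c >= min_methods}
```

Requiring agreement among five of the six scans trades sensitivity for specificity: a window that is extreme for only one statistic, for example because of locally low SNP density, is not reported.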

Identification of homologous genes in divergent regions

To identify orthologous genes between A. sativa and A. fatua, we first performed synteny analysis using JCVI85, anchored on a cloned gene from rice. Collinear gene pairs were extracted from the syntenic blocks between the rice gene and the subgenomes of both cultivated oat and A. fatua. To further validate orthology, we constructed phylogenetic trees using IQ-TREE88 based on the protein sequences of the collinear genes. Gene pairs that clustered together with high support were retained for downstream analysis. Finally, the gene structures of these candidate orthologs were examined; only those with highly similar exon-intron organization were designated as putative one-to-one orthologs.

Herbicide tolerance evaluation of the oat association panel

Survival rate (SR) tests were conducted in an association-mapping panel composed of 259 oat accessions (Supplementary Data 7). The herbicide tolerance of the panel was recorded and evaluated103. To compare herbicide resistance between A. fatua and A. sativa, 103 wild accessions and 156 cultivated accessions were selected. All of the accessions were randomly planted in a cultivation pool with a soil:vermiculite mixture at a 2:1 ratio, and 20 plants of each accession were grown per pot in each assay. The seedlings were grown in a greenhouse under a 12-h light/12-h dark cycle at 20 °C/15 °C with a relative humidity of 50% for assessment of the herbicide stress response or SR. When the seedlings reached the three-leaf stage, they were uniformly sprayed with a solution of 680 g/hm2 fenoxaprop-P-ethyl (Huaxing, GB/T22618-2008). Survival rate was assessed 10 days post-treatment by counting the number of living plants per pot.

RNA-seq data analyses

We performed RNA-seq under different herbicide treatments. Seedlings were treated with fenoxaprop-P-ethyl at 0 g/hm2 (Control) and 85 g/hm2 (Treatment). Leaf and root samples were then collected separately, in triplicate as biological replicates, at three time points (6 hours, 24 hours and 7 days), resulting in 36 RNA-seq libraries. The clean reads were mapped to the reference genome W1004 using HISAT274 (v2.2.1), and SAMtools60 (v1.7) was used to sort the resulting aligned reads. Gene expression values were normalized as transcripts per million (TPM). DESeq2104 software (v1.28.1) was used to identify DEGs with thresholds of Padj < 0.05 and |log2 (fold-change)| > 1. Of these, only genes with TPM ≥ 0.1 were retained as DEGs.
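As a small worked illustration of the normalization and thresholds above, the sketch below computes TPM from raw counts and applies the three DEG criteria. It does not reproduce DESeq2: the adjusted P value and log2 fold-change are taken as given inputs, the function names are hypothetical, and applying the TPM cutoff to the highest TPM among the compared samples is an assumption, since the text does not say which samples the cutoff refers to.

```python
def tpm(counts, lengths_bp):
    """Transcripts per million from raw read counts and gene lengths (bp).

    Counts are first divided by gene length in kb, then rescaled so that
    the values of one sample sum to one million.
    """
    rates = {g: counts[g] / (lengths_bp[g] / 1000) for g in counts}
    scale = sum(rates.values())
    if scale == 0:
        return {g: 0.0 for g in counts}
    return {g: r / scale * 1e6 for g, r in rates.items()}


def is_deg(padj, log2fc, max_tpm,
           padj_cut=0.05, lfc_cut=1.0, tpm_cut=0.1):
    """Apply the three thresholds stated in the text.

    `max_tpm` is assumed to be the highest TPM among the compared samples.
    """
    return padj < padj_cut and abs(log2fc) > lfc_cut and max_tpm >= tpm_cut


if __name__ == "__main__":
    sample = tpm({"g1": 100, "g2": 300}, {"g1": 1000, "g2": 3000})
    print(sample)  # both genes have the same length-normalized rate
    print(is_deg(padj=0.01, log2fc=-1.5, max_tpm=5.0))
```

Because TPM values always sum to one million within a sample, they are comparable as proportions across libraries of different depth, which is the usual reason for preferring them as an expression floor.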

Of the total DEG sets, 34 DEGs were present in all 6 groups. In the root, 572, 1013 and 1690 DEGs were present in 4 to 5 groups (>90% of the collection), 1834, 2960 and 5572 DEGs were present in 2 to 3 groups, and 419, 666 and 3074 DEGs were present in only one group (Fig. 5b; Supplementary Table 17). Likewise, in the leaf, 1755, 1771 and 1883 DEGs were present in 4 to 5 groups (>90% of the collection), 5595, 5655 and 6443 DEGs were present in 2 to 3 groups, and 1611, 1550 and 2539 DEGs were present in only one group (Fig. 5b; Supplementary Table 17). Heatmap visualization was performed using the R package pheatmap (https://CRAN.R-project.org/package=pheatmap).

ATAC-seq data analysis

Fastp105 (v0.23.2) was used to process raw reads by removing adapters, trimming low-quality bases, and filtering bad reads. Clean reads were aligned to the W1004 reference genome using the BWA58 mem algorithm (v0.7.17). The aligned reads were sorted and filtered using SAMtools60 (v1.7).

Peaks were called from BAM files using MACS2 (v2.2.9.1) (https://github.com/macs3-project/MACS) with the parameters ‘-q 0.05 -f BAMPE --nomodel --extsize 200 --shift -100 -g 10911912618’. Peaks located in promoter and genic regions were annotated to the genes they overlapped, while peaks in distal regions were annotated to the nearest gene/TSS. For quantification of ATAC-seq data, read counts under the reference peaks, along with normalized counts per million values (DBA_SCORE_TMM_READS_EFFECTIVE_CPM), were produced using the R package DiffBind106 (v2.16.2). Raw peak counts were used as input for identifying differentially accessible and differentially marked peaks with the R package DESeq2104 (v1.28.1). The thresholds were |log2 fold change| > 1, FDR < 0.01 and adjusted P < 0.05. Heatmaps centered on peaks were created using computeMatrix and plotHeatmap from deeptools107 (v3.5.5).
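
For reproducibility, the peak-calling parameters can be assembled into a full command line, as in the Python sketch below. The `callpeak` options used (`-t`, `-n`, `--outdir`, and the long options with a double hyphen) are standard MACS2 options. The file name, sample name and output directory are hypothetical placeholders.

```python
import shlex


def macs2_callpeak_args(bam_path, sample_name, out_dir,
                        genome_size=10911912618, q_value=0.05):
    """Build the argument list for one `macs2 callpeak` run."""
    return [
        "macs2", "callpeak",
        "-t", bam_path,           # treatment BAM (hypothetical path)
        "-f", "BAMPE",            # paired-end BAM input
        "-g", str(genome_size),   # effective genome size given in the text
        "-q", str(q_value),       # q-value cutoff given in the text
        "--nomodel",
        "--extsize", "200",
        "--shift", "-100",
        "-n", sample_name,        # output prefix (hypothetical)
        "--outdir", out_dir,      # output directory (hypothetical)
    ]


cmd = macs2_callpeak_args("sample.sorted.bam", "sample", "peaks")
command_line = shlex.join(cmd)  # printable form; pass `cmd` to subprocess.run to execute
```

Keeping the arguments as a list avoids shell-quoting problems, including the negative value passed to `--shift`.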

For each cluster, the over-representation analysis in the open chromatin was performed using the enricher function in the R package clusterProfiler84 (v3.19). Enrichment analysis of TF families for DEGs with open chromatin was performed using HOMER (v24.05.1) (https://github.com/bastienwirtz/homer.git).
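
The enricher function is based on a hypergeometric over-representation test. The Python sketch below shows that calculation for a single term; the function and parameter names are illustrative, and it omits the multiple-testing correction that clusterProfiler applies across terms.

```python
from math import comb


def hypergeom_enrichment_p(overlap, cluster_size, term_size, background_size):
    """Upper-tail hypergeometric P value, P(X >= overlap).

    overlap:         genes in the cluster annotated to the term
    cluster_size:    genes in the cluster
    term_size:       background genes annotated to the term
    background_size: all background genes
    """
    upper = min(cluster_size, term_size)
    total = comb(background_size, cluster_size)
    tail = sum(
        comb(term_size, i) * comb(background_size - term_size, cluster_size - i)
        for i in range(overlap, upper + 1)
    )
    return tail / total
```

For example, with a background of 10 genes, 5 of which carry the term, a 5-gene cluster made up entirely of term genes gives P = 1/252.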

qRT-PCR assay

Total RNA was extracted with TRIzol reagent (Thermo Fisher Scientific, 15596018CN) and reverse transcribed into cDNA using the PrimeScript-RT Reagent Kit (Takara, RR047A) according to the manufacturer’s instructions. The qRT-PCR assay was performed using TB Green Premix Ex Taq™ II (Takara, RR820A). All primers used in the qRT-PCR assays are listed in Supplementary Table 18.

Herbicide treatment

Wild oat seedlings were grown in pots. The indoor temperature was set to 20 °C during the day and 15 °C at night, with a 12-hour photoperiod and a relative humidity of (50 ± 5)%. At the two- to three-leaf stage after emergence, the seedlings were sprayed with fenoxaprop-P-ethyl (Huaxing, GB/T22618-2008) at a dose of 340 g/hm2 using a small spray bottle.

Virus-mediated gene silencing

To investigate the potential role of 4Dg0135144 in the herbicide tolerance of oat, we used the Tobacco Rattle Virus (TRV)-based virus-induced gene silencing (VIGS) system to knock down expression of the gene. The silencing vectors were generated by subcloning the target fragments into the TRV2 infection plasmid vector using the NC clone (NC Biotech, NC001). For the VIGS assay, Agrobacterium tumefaciens GV3101 strains carrying pTRV1 and the different pTRV2-derived vectors (TRV2, V-4Dg0135144-1/V-4Dg0135144-2) were mixed in a 1:1 ratio and supplemented with acetosyringone (AS; Coolaber, SL95131) (19.62 mg·L−1), cysteine (Cys; Amethyst, 52-90-4) (400 mg·L−1) and Tween-20 (Sigma, P2287) (5 ml·L−1). Infection was performed by vacuum infiltration108 at an Agrobacterium concentration of OD = 2.0 and a vacuum of 40 kPa.

Plasmids and cloning procedures

To generate the constructs for the virus-induced gene silencing (VIGS) assay, the target ORF fragments of 4Dg0135144 were amplified from W1004 cDNA and the target ORF fragment of AsPDS was amplified from A. sativa cv. Marvellous cDNA. Target fragments were designed to be 200–400 bp in length with 40–60% GC content, while avoiding repetitive sequences and secondary structures109. These fragments were then cloned into the TRV2 infection plasmid vector with the NC clone (NC Biotech, NC001) to generate pTRV2::4Dg0135144-1, pTRV2::4Dg0135144-2 and pTRV2::AsPDS. All primers are listed in Supplementary Table 18.
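
The length and GC-content criteria for candidate fragments can be checked programmatically, as in the Python sketch below. It covers only those two criteria; screening for repetitive sequences or secondary structure is not implemented. The function names are illustrative.

```python
def gc_content(sequence):
    """GC content of a DNA sequence, in percent."""
    seq = sequence.upper()
    if not seq:
        raise ValueError("sequence must not be empty")
    return 100.0 * (seq.count("G") + seq.count("C")) / len(seq)


def passes_design_rules(sequence, min_len=200, max_len=400,
                        min_gc=40.0, max_gc=60.0):
    """True if the fragment is 200-400 bp long with 40-60% GC content."""
    length_ok = min_len <= len(sequence) <= max_len
    return length_ok and min_gc <= gc_content(sequence) <= max_gc
```

For instance, a 240-bp fragment with 50% GC passes, whereas a 300-bp fragment made only of A and T does not.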

Determination of catalase activity and soluble sugar content

Plant material for the physiological indexes was collected from plants 7 days after herbicide treatment. Catalase activity and soluble sugar content were measured with the CAT activity detection kit (Solarbio, BC0200) and the Plant soluble sugar detection kit (Solarbio, BC0030), respectively.

Chlorophyll measurement

The leaves were incubated in 95% (v/v) ethanol for 5 d in the dark. Absorbance was measured at 665 and 649 nm. Chlorophyll content was calculated according to the following formula: (6.63 × A665 + 18.08 × A649)/g fresh weight110.
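
The chlorophyll formula can be expressed as a small Python function. The result is in the units of the cited formula; extract volume and dilution are not stated in the text and are therefore not modeled. The function name is illustrative.

```python
def chlorophyll_content(a665, a649, fresh_weight_g):
    """Chlorophyll content as (6.63 * A665 + 18.08 * A649) / g fresh weight."""
    if fresh_weight_g <= 0:
        raise ValueError("fresh_weight_g must be positive")
    return (6.63 * a665 + 18.08 * a649) / fresh_weight_g
```

With hypothetical readings A665 = 0.5 and A649 = 0.25 from 0.1 g of leaf tissue, the numerator is 3.315 + 4.52 = 7.835, which gives 78.35 per gram of fresh weight.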

A. sativa transformation

Mature embryos from healthy plants of Avena sativa cultivar ‘Bayou18’ grown in a well-conditioned greenhouse were collected and cultured on L3-M medium [4.6 g·L−1 L3 Base Salts with vitamins (Coolaber, PM1621), 30 g·L−1 maltose (Coolaber, CM7181), 4 g·L−1 phytagel (Coolaber, CP8581Z), 2 mg·L−1 2,4-D (Coolaber, PH105), 1 g·L−1 Dicamba (Coolaber, PH113)] until embryogenic callus was produced. Agrobacterium tumefaciens strain GV3101 carrying pUBI::4Dg0135144-GFP was cultured overnight at 28 °C in YEP medium and centrifuged at 3438 × g for 10 minutes at 25 °C. The pellet was then resuspended to OD = 0.5 in WLS solution [4.30209 g Linsmaier & Skoog Base Salts (Coolaber, PM1470), 100 μL 1000× MS vitamins (Coolaber, PML1780), 10 g glucose (Sangon Biotech, A501991-0500), 0.5 g MES (Sigma, M8250), H2O added to 1 L, pH 5.8]. Agrobacterium tumefaciens strain GV3101 carrying pUBI::4Dg0135144-GFP and Agrobacterium tumefaciens strain GV3101 carrying pUBI::TaWOX5 were mixed in equal proportions, and the selected embryogenic calli were soaked in the mixed suspension for 30 minutes. The embryogenic calli were removed and the residual bacterial liquid was blotted off with filter paper. The embryogenic calli were cultured on filter paper containing 75 μmol acetosyringone (AS) in the dark for 3 days and then on WLS-RES medium for 5 days. WLS-P5 medium [WLS-RES medium containing 0.5–5 mg·L−1 Basta (Sangon Biotech, A356357)] was used to screen the surviving calli under a stepwise concentration gradient, and the calli were then transferred to regeneration medium [4.6 g L3 Base Salts with vitamins (Coolaber, PM1621), 5 mg Zeatin (Coolaber, PH110), 20 g sucrose (Diamond, A100335-0250), 0.5 g MES (Sigma, M8250), 200 μL 12.5 g·L−1 CuSO4·5H2O (Sangon Biotech, A600063), 4 g phytagel (Coolaber, CP8581Z), H2O added to 1 L, pH 5.8] and grown to 3–5 cm.
Finally, roots were induced on rooting medium [4.6 g L3 Base Salts with vitamins (Coolaber, PM1621), 0.2 mg·mL−1 IBA (Coolaber, PH1031), 15 g sucrose (Diamond, A100335-0250), 0.5 g MES (Sigma, M8250), 4 g phytagel (Coolaber, CP8581Z), H2O added to 1 L, pH 5.8]. Transgenic plants confirmed with genetically modified PAT/bar colloidal gold test strips (BioRun, RIA03) or by qRT-PCR were used for phenotypic evaluation111,112,113.

The pUBI::4Dg0135144-Flag transgenic plants were also generated. The vector was constructed by homologous recombination into the pUBI::Flag plasmid vector using the ClonExpress II One Step Cloning Kit (Vazyme, C112). Hygromycin (5 mg·L−1) was used as the selection agent for screening positive seedlings. Transgenic plants of the T1 generation were confirmed by PCR, qRT-PCR and western blot. Positive transgenic plants were used for phenotypic evaluation.

Western blot assay

Total protein was extracted with extraction buffer [0.2 M NaCl, 5 mM MgCl2, 5 mM DTT, 20 mM Tris-HCl (pH 7.5), 0.03% Tween-20 (Ameresco), and 0.5 tablets of protease inhibitor (Roche)]. The supernatant was collected after centrifugation at 12,000 rpm for 15 min. Total proteins were examined by western blot analysis using α-tubulin (1:5000; EASYBIO, BE0031) as a loading control. Proteins were also probed with α-Flag (1:2000; Sigma, F1804). The secondary antibody was goat anti-mouse IgG (1:5000; EASYBIO, BE0102). The protein marker (Thermo Scientific, product #26616) was used in all western blot assays in this manuscript. Images were obtained with the instrument (BIO-01, O1900).

Statistical analyses

Statistical details for Fig. 6d, h, i and Supplementary Figs. 13c and 14a, b, e, g, i, j are provided in the figure legends and Methods. All statistical tests were carried out in R using two-tailed Student’s t-tests unless otherwise indicated. Each experiment involved at least three biological replicates.

Reporting summary

Further information on research design is available in the Nature Portfolio Reporting Summary linked to this article.