Abstract
Historical plant breeding, which optimizes phenotypes through selective crossing guided by phenotypic evaluation and molecular markers, is limited by evolutionary constraints that hinder rapid crop improvement. A new paradigm, precision breeding, circumvents these limitations by targeting genetic variants through functional molecular knowledge. To generate this knowledge at scale, sequence-based deep learning leverages high-quality genome sequence data to predict variant effects at base-pair resolution. When linked to agronomically important traits, these predictions enable breeders to prioritize variants for precision selection or editing. Although this approach is still in the early stages of development, we foresee three key applications for it: introgressing genes from distant breeding pools, purging deleterious mutations and designing new plant ideotypes. Looking ahead, refined computational models will facilitate targeted editing and the systematic redesign of complex physiological processes to address emerging breeding goals under shifting environmental conditions.
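As a rough illustration of how a sequence-based model could yield a variant effect at base-pair resolution, the sketch below scores a reference sequence and its single-base alternate and takes the difference, then ranks variants by the size of that difference. Everything in it is an assumption made for illustration: `toy_activity` and `TOY_MOTIF` are hypothetical stand-ins for a trained model, and the function names `variant_effect` and `prioritize` do not come from the article.

```python
# Hypothetical stand-in for a trained sequence-to-function model: it scores a
# DNA window by its best match to a made-up motif. A real model would be a
# trained neural network; nothing here is taken from the article.
TOY_MOTIF = "TATAAA"


def toy_activity(seq: str) -> float:
    """Return the best fraction of motif positions matched anywhere in seq."""
    k = len(TOY_MOTIF)
    best = 0
    for i in range(len(seq) - k + 1):
        matches = sum(a == b for a, b in zip(seq[i:i + k], TOY_MOTIF))
        best = max(best, matches)
    return best / k


def variant_effect(ref_seq: str, pos: int, alt: str, score=toy_activity) -> float:
    """Predicted effect = score(alternate sequence) - score(reference sequence)."""
    alt_seq = ref_seq[:pos] + alt + ref_seq[pos + 1:]
    return score(alt_seq) - score(ref_seq)


def prioritize(ref_seq, variants, score=toy_activity):
    """Rank (pos, alt) variants by absolute predicted effect, largest first."""
    scored = [(pos, alt, variant_effect(ref_seq, pos, alt, score))
              for pos, alt in variants]
    return sorted(scored, key=lambda v: abs(v[2]), reverse=True)


if __name__ == "__main__":
    reference = "GGCTATAAAGGC"  # made-up sequence containing the toy motif
    for pos, alt, delta in prioritize(reference, [(0, "A"), (5, "G")]):
        print(pos, alt, round(delta, 3))
```

In this toy setting, a substitution inside the motif receives a non-zero predicted effect, whereas one outside it scores zero. Linking such scores to agronomic traits, as the abstract describes, would require additional trait data that this sketch does not model.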
Author information
Contributions
The authors contributed equally to all aspects of the article.
Ethics declarations
Competing interests
The authors declare no competing interests.
Peer review
Peer review information
Nature Reviews Genetics thanks the anonymous reviewers for their contribution to the peer review of this work.
Related links
UniProtKB: https://www.uniprot.org/uniprotkb/statistics
Glossary
- Biological language models: Self-supervised models trained to predict the likelihood of elements in DNA or protein sequences.
- Breeding by design: Creation of new varieties with ideal genetic characteristics defined according to biological (for example, pathways) or computational models (such as crop growth models).
- Crop growth models (CGMs): Computational models consisting of multiple equations that represent physiological processes in plants and dynamically simulate crop growth in response to environmental inputs.
- Epistasis: The non-additive interaction between distinct genomic loci, that is, the genotype-by-genotype (G × G) interaction, in which variation at one locus modifies the effect of another locus.
- Genetic load: Reduction of fitness owing to the accumulation of deleterious mutations via multiple evolutionary factors (genetic drift, inbreeding, linked selection, recombination, migration).
- Genome-wide association study (GWAS): Study design that applies marker–trait association models at the scale of the entire genome to identify genetic variants associated with specific traits within a population.
- Genomic prediction: Statistical modelling approach for estimating breeding values of individuals within a breeding population by utilizing quantitative trait loci or variant effect estimates spread across the entire genome.
- Genotype-by-environment (G × E): Interaction where the effect of a genomic locus depends on environmental conditions.
- Introgression: Process by which genomic regions from one species or population are stably integrated into the gene pool of another (through repeated backcrossing or genetic modification).
- Knowledge distillation: Machine learning technique in which a smaller ‘student’ model is trained to replicate the predictions of a larger, more complex ‘teacher’ model, transferring knowledge while reducing model size and computational cost.
- Linkage disequilibrium: Nonrandom association of alleles at different loci, arising because of physical linkage and/or evolutionary factors (mutation, genetic drift, natural selection, population structure).
- Pleiotropy: Situation in which variation at a genomic locus influences multiple traits, reflecting multiple effects of a single causal variant (biological pleiotropy) or physical linkage between different causal variants (statistical pleiotropy).
- QTL mapping: Statistical association between variation at a genomic quantitative trait locus (QTL) and a molecular, cellular or organismal phenotype, given a particular genomic background and environmental context.
- Transcriptome-wide association study (TWAS): Method used to identify associations between gene expression levels and a specific trait.
- Variant effect: Biological effect that a specific genetic variant (alteration in the DNA sequence) has on phenotypes, given a particular genomic background and environmental context.
- Zero-shot prediction: Inference on a new task that a model was not explicitly trained on. In the context of biological language models, this generally involves estimating the likelihood of alleles (for example, nucleotides or amino acids) in sequences that were never observed.
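The linkage disequilibrium entry in this glossary can be made concrete with a small calculation. The sketch below is an illustration that does not come from the article: it computes the standard coefficient D = p_AB − p_A·p_B and the squared correlation r² = D² / (p_A(1 − p_A)·p_B(1 − p_B)) for two biallelic loci, assuming phased haplotypes with alleles coded 0 or 1. The function name `ld_stats` and the example data are hypothetical.

```python
def ld_stats(haplotypes):
    """Compute D and r^2 for two biallelic loci.

    haplotypes: list of (a, b) tuples, one per phased haplotype,
    with the allele at each locus coded as 0 or 1.
    """
    n = len(haplotypes)
    p_a = sum(a for a, _ in haplotypes) / n          # frequency of allele 1 at locus A
    p_b = sum(b for _, b in haplotypes) / n          # frequency of allele 1 at locus B
    p_ab = sum(1 for a, b in haplotypes if a == 1 and b == 1) / n
    d = p_ab - p_a * p_b                             # deviation from independence
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    if denom == 0:
        raise ValueError("both loci must be polymorphic")
    return d, d * d / denom


if __name__ == "__main__":
    # Made-up haplotypes: alleles always co-occur, so association is complete.
    linked = [(1, 1), (1, 1), (0, 0), (0, 0)]
    # Made-up haplotypes: every allele combination is equally frequent.
    unlinked = [(1, 1), (1, 0), (0, 1), (0, 0)]
    print(ld_stats(linked))
    print(ld_stats(unlinked))
```

An r² near 1 means that a genotyped marker is a good statistical proxy for a nearby variant, and an r² near 0 means that the two loci segregate independently in the sample.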
About this article
Cite this article
Ramstein, G.P., Zhai, J., Buckler, E.S. et al. Translating functional molecular knowledge into crop-breeding success. Nat Rev Genet (2026). https://doi.org/10.1038/s41576-026-00968-w
DOI: https://doi.org/10.1038/s41576-026-00968-w


