Abstract
Soybean [Glycine max (L.) Merr.] is a critical crop globally, valued for its protein and oil content. However, historical bottlenecks have constrained genetic diversity in soybean, particularly in high-latitude regions such as North Dakota, where environmental conditions necessitate maturity group (MG) 00 and 0 cultivars. This genetic diversity study examines the North Dakota State University (NDSU) soybean breeding program using pedigree, coefficient of parentage (CP), and SNP-based analyses. Pedigree tracing of 40 NDSU cultivars revealed a genetic base derived from 49 founders. CP analysis confirmed these findings, emphasizing dependence on limited germplasm, with the top ten founders accounting for over 70% of the genetic background and Mandarin (Ottawa) alone contributing 24%. SNP-based dendrograms and genetic relationship structures demonstrate the relationships among cultivars and founders. Notably, the specialty food grade natto cultivars formed a distinct cluster unrelated to commodity soybean. Population structure analyses emphasized the reliance on specific ancestral germplasm for breeding. This study underscores the need to diversify breeding materials to prevent genetic gain plateaus in MG 00 and 0 soybeans, thereby enhancing yield potential and adaptability in high-latitude regions.
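The pedigree-based quantities named above (founder contribution and coefficient of parentage) can be illustrated with a minimal sketch. It is not the software used in this study, and the pedigree, line names, and function names are hypothetical. It assumes that founders are unrelated, that every line is fully homozygous (so the CP of a line with itself is 1), and that each parent contributes half of its progeny's genome.

```python
from collections import defaultdict
from functools import lru_cache

# Hypothetical toy pedigree: line -> (parent 1, parent 2).
# Any name without an entry is treated as a founder (assumption).
PEDIGREE = {
    "LineA": ("Founder1", "Founder2"),
    "LineB": ("Founder1", "Founder3"),
    "CultivarX": ("LineA", "LineB"),
    "CultivarY": ("LineA", "Founder3"),
}

@lru_cache(maxsize=None)
def depth(line):
    """Generations between a line and its most distant founder."""
    if line not in PEDIGREE:
        return 0
    return 1 + max(depth(parent) for parent in PEDIGREE[line])

@lru_cache(maxsize=None)
def founder_contributions(line):
    """Expected genome fraction traced to each founder, as sorted pairs."""
    if line not in PEDIGREE:
        return ((line, 1.0),)
    totals = defaultdict(float)
    for parent in PEDIGREE[line]:
        for founder, share in founder_contributions(parent):
            totals[founder] += 0.5 * share  # each parent passes on half
    return tuple(sorted(totals.items()))

def mean_contributions(cultivars):
    """Average founder contribution across a set of cultivars."""
    totals = defaultdict(float)
    for cultivar in cultivars:
        for founder, share in founder_contributions(cultivar):
            totals[founder] += share / len(cultivars)
    return dict(totals)

@lru_cache(maxsize=None)
def coefficient_of_parentage(x, y):
    """CP between two lines under the assumptions stated above."""
    if x == y:
        return 1.0  # fully homozygous line compared with itself
    if x not in PEDIGREE and y not in PEDIGREE:
        return 0.0  # two distinct, unrelated founders
    # Expand the deeper line so an ancestor is never expanded
    # before its own descendant.
    if depth(x) < depth(y):
        x, y = y, x
    p1, p2 = PEDIGREE[x]
    return 0.5 * (coefficient_of_parentage(p1, y)
                  + coefficient_of_parentage(p2, y))

if __name__ == "__main__":
    print(mean_contributions(["CultivarX", "CultivarY"]))
    print(coefficient_of_parentage("CultivarX", "CultivarY"))
```

In this toy pedigree a founder's share simply halves with each generation between it and the cultivar. Averaging such shares over a set of released cultivars gives the kind of founder percentages reported in the abstract.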
Data availability
The genotypic datasets generated and/or analyzed during the current study are available in the NCBI Sequence Read Archive (SRA) repository: http://www.ncbi.nlm.nih.gov/bioproject/1235621.
Acknowledgements
Thank you to the Bilyeu lab at the USDA in Columbia, Missouri, for preparing the tissue and DNA samples for whole genome sequencing. Thank you to Brian Diers and Rex Nelson for their advice on data interpretation and historical soybean semantics.
Funding
The NDSU soybean breeding program and the cultivars it has developed are supported by the North Dakota Soybean Council and the USDA Hatch Project (Grant/Award Number: ND01505). Forrest Hanson was supported by the North Central Soybean Research Program project “SOYGEN3: Building capacity to increase soybean genetic gain for yield and composition through combining genomics-assisted breeding with characterization of future environments”.
Contributions
CD conceptualized the research; CD, NB, MH proposed and established methodology; FH, BH, GK, AM were responsible for data curation; FH, BH, GK analyzed the data and generated figures and tables; FH, BH, GK, CD wrote the manuscript; FH, BH, GK, NB, MH, AM, CD revised the manuscript. All authors read and approved the manuscript.
Ethics declarations
Competing interests
The authors declare no competing interests.
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License, which permits any non-commercial use, sharing, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if you modified the licensed material. You do not have permission under this licence to share adapted material derived from this article or parts of it. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by-nc-nd/4.0/.
About this article
Cite this article
Hanson, F., Harms, B., Kreutz, G. et al. Genetic diversity analysis of North Dakota public soybean breeding program cultivars. Sci Rep (2026). https://doi.org/10.1038/s41598-026-35464-y
DOI: https://doi.org/10.1038/s41598-026-35464-y