Genomic selection with GWAS-identified QTL markers enhances prediction accuracy for quantitative traits in poplar (Populus deltoides)

Guo, Chenchen; Yin, Tongming; Wu, Huaitong; Dai, Xiaogang; Chen, Yingnan; Wei, Suyun

doi:10.1038/s42003-025-08700-w

Article
Open access
Published: 18 August 2025

Chenchen Guo^1,2,3,4^na1,
Tongming Yin^1,2,3,4^na1,
Huaitong Wu^1,2,3,4,
Xiaogang Dai^1,2,3,4,
Yingnan Chen ORCID: orcid.org/0000-0002-0095-6040^1,2,3,4 &
…
Suyun Wei ORCID: orcid.org/0000-0003-3262-2771^1,2,3,4,5

Communications Biology volume 8, Article number: 1242 (2025)

Abstract

Poplar (Populus deltoides) serves as a model tree species with economic importance for wood and biomass production. Genetic improvement of traits is crucial for accelerating tree breeding programs. In this study, we systematically characterized phenotypic variation across ten traits related to growth, wood properties, disease resistance, and leaf morphology in 237 poplar accessions. Phenotypic variation analysis revealed substantial variability among individuals, with coefficients of variation ranging from 4.86% to 73.49%. Narrow-sense heritability estimates indicated genetic contributions ranging from 6.23% to 66.84% for the ten traits. A genome-wide association study identified 69 significant quantitative trait loci (QTL) distributed across various chromosomes, strongly associated with traits and implicating 130 annotated genes such as late embryogenesis abundant protein, uridine nucleosidase, and MYB transcription factor. Furthermore, the effects of QTL alleles were significantly correlated with phenotypic values. The integration of multi-trait QTL as random effects into genomic selection (GS) models significantly enhanced prediction accuracy, with an increase ranging from 0.06 to 0.48. Specifically, the Bayesian Ridge Regression (BRR) model exhibited superior prediction accuracy for multiple traits. This study provides critical insights into the genetic basis of important traits in poplar, facilitating accelerated breeding efforts and enhancing genetic gains in forestry.

Introduction

Poplar is one of the fastest-growing temperate trees worldwide and serves as an important source of wood and biomass energy. The worldwide distribution of natural poplar forests covers ~54.5 million hectares, while the total area of cultivated poplar plantations reaches 31.4 million hectares¹. With the increasing scale of poplar cultivation, breeding germplasm with fast growth, high quality, and strong disease resistance has become a research focus. Traditional poplar breeding strategies include establishing base populations, selecting appropriate parents for intra- or interspecific hybridization, screening superior individuals in progeny, and expanding them through asexual propagation^2,3. Natural populations provide rich genetic materials for plant breeding, and their genetic diversity suggests that some individuals may carry commercially valuable genes⁴. The collection of germplasm resources and the establishment of core collections can enhance the utilization of these genetic resources. Several studies have focused on constructing core germplasm banks of poplar and assessing their genetic diversity^5,6. However, due to the long generation cycles of forest trees, completing a breeding generation typically takes several to dozens of years. To meet the increasing demand for wood and biomass energy, accelerating the poplar breeding process has become increasingly critical.

With advancements in sequencing technology, it is now possible to efficiently and cost-effectively obtain dense and reliable genetic markers that cover the entire genome. Various methods have been widely employed in the genetic dissection and breeding of important traits in plants, achieving significant progress. Genome-wide association studies (GWAS) are a powerful tool for investigating complex traits using genome-wide single nucleotide polymorphisms (SNPs), and multiple statistical models have been developed to enhance computational efficiency and detection capability⁷. Association studies have revealed the genetic architecture of growth traits in poplar⁸. Using genome-wide association studies, genes associated with sex determination in poplar have been identified, marking a significant advancement in understanding the mechanisms of plant sex differentiation⁹. In terms of disease resistance traits, key genes associated with poplar’s resistance to leaf rust were identified through linkage analysis, providing a critical molecular foundation for breeding disease-resistant poplar varieties¹⁰. In Eucalyptus, SNPs associated with growth and wood properties identified through GWAS (top 1% or P < 10⁻³) significantly improved the prediction accuracy for all traits¹¹. Association studies not only enhance our understanding of the genetic mechanisms underlying important traits in trees but also lay the groundwork for accelerating molecular breeding efforts in forestry.

Trees typically enter their reproductive phase only after several years of growth. Genomic selection (GS), which utilizes dense markers covering the entire genome to predict genetic value for selection, significantly reduces the time required for testing hybrid progeny in forests, thereby enhancing the genetic gain and accelerating the breeding process in plants¹². For instance, the phenotypic selection cycle for oil palm requires 19 years, whereas the genomic selection cycle takes only 6 years¹³. Specifically, a model is constructed using a training population with known genotypes and phenotypes to estimate marker effects, allowing the prediction of breeding values for candidate individuals based solely on genotype data¹⁴. Prediction accuracy is influenced by the genetic relationship between the training and prediction populations¹⁵. Additionally, factors such as effective population size, genotype density, training population size, genetic complexity, and statistical methods also affect the accuracy of genomic selection^16,17.
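The train-then-predict workflow described above can be illustrated with a minimal sketch. It uses plain ridge regression with a fixed shrinkage parameter as a simplified stand-in for Bayesian ridge regression; the toy genotypes and phenotypes, the value of `lam`, and the function names `ridge_fit` and `predict` are assumptions made for illustration and are not the authors' implementation.

```python
def ridge_fit(X, y, lam):
    """Estimate marker effects b by solving (X'X + lam*I) b = X'y.

    X: list of rows (individuals) of marker codes; y: centered phenotypes.
    lam > 0 is a hypothetical, fixed shrinkage parameter.
    """
    n, p = len(X), len(X[0])
    # Augmented matrix [X'X + lam*I | X'y] for the normal equations.
    A = []
    for j in range(p):
        row = [sum(X[i][j] * X[i][k] for i in range(n)) + (lam if j == k else 0.0)
               for k in range(p)]
        row.append(sum(X[i][j] * y[i] for i in range(n)))
        A.append(row)
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(p):
        pivot = max(range(col, p), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        scale = A[col][col]
        A[col] = [v / scale for v in A[col]]
        for r in range(p):
            if r != col:
                factor = A[r][col]
                A[r] = [a - factor * b for a, b in zip(A[r], A[col])]
    return [A[j][p] for j in range(p)]


def predict(X_new, effects):
    """Predict genetic values from genotypes alone."""
    return [sum(g * b for g, b in zip(row, effects)) for row in X_new]


# Toy training population: genotypes coded -1/0/1, phenotypes centered.
train_X = [[1, 0], [-1, 0], [0, 1], [0, -1]]
train_y = [2.0, -2.0, 1.0, -1.0]
effects = ridge_fit(train_X, train_y, lam=2.0)

# Candidates have genotypes only; values are predicted from marker effects.
candidates = [[1, 1], [-1, 1]]
predicted = predict(candidates, effects)
```

A larger `lam` shrinks all marker effects more strongly toward zero, which is the role played by the common prior variance on marker effects in a Bayesian ridge model.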

Despite the advantages of including GWAS in genomic selection, the application of these methods in poplar genetic breeding remains to be further explored. In this study, we aimed (1) to systematically characterize ten traits related to growth, wood properties, disease resistance, and leaf morphology in 237 poplar germplasm accessions; (2) to identify significant SNP loci associated with traits through genome resequencing and genome-wide association studies; (3) to analyze the effects of alleles at quantitative trait loci (QTL) on phenotypes and elucidate the relationship between favorable alleles and trait values; and (4) to assess the impact of incorporating multi-trait QTL markers on the prediction accuracy of genomic selection. Our study provides valuable insights into the genetic basis of growth, wood properties, disease resistance, and leaf morphology in poplar and significantly advances the development of genetic breeding in poplar.

Results

Phenotypic variation analysis of growth, wood properties, disease resistance, and leaf morphology in P. deltoides

The growth, wood properties, disease resistance, and leaf morphology of P. deltoides were systematically analyzed, focusing on phenotypic traits including diameter at breast height (DBH), basic density (BD), hemicellulose content, cellulose content, lignin content, black spot disease (BSD) infection rate, leaf area (LA), leaf length (LL), leaf width (LW), and leaf vein angle (LVA). Frequency distribution analysis indicated that all traits followed a bell-shaped curve, suggesting an approximate normal distribution was suitable for subsequent analysis (Fig. 1, Supplementary Table 1). The Shapiro–Wilk test results indicated that DBH, BD, and LVA followed a normal distribution (P > 0.05). In terms of growth traits, the average DBH was 37.5 ± 6.26 cm, with a range from 21.4 cm to 54.1 cm, highlighting significant variability among the individuals and suggesting a notable impact of growth conditions and genetic background. Regarding wood properties, the average BD was 0.40 ± 0.03 g/cm³, ranging from 0.31 g/cm³ to 0.49 g/cm³, indicating substantial potential for variation in basic density, likely associated with environmental adaptability and genetic factors. Furthermore, the average contents of hemicellulose, cellulose, and lignin were 23.48 ± 2.07%, 54.77 ± 2.66%, and 12.93 ± 2.54%, respectively, demonstrating significant differences in wood composition among individuals, thereby providing foundational data for wood utilization and improvement. For disease resistance, the average BSD was 16.35 ± 12.02%, with a range from 0.62% to 75.51%, indicating considerable individual variation in disease resistance, which may influence performance in diverse disease environments. In the leaf morphological trait analysis, the averages for LA, LL, LW, and LVA were 99.65 ± 31.01 cm², 11.99 ± 1.68 cm, 12.30 ± 1.9 cm, and 64.17 ± 7.58°, with respective ranges of 46.57 cm² to 214.78 cm², 8.83 cm to 18.45 cm, 8.71 cm to 18.04 cm, and 47.63° to 90.31°. 
These results reveal substantial variability in leaf-related traits, suggesting potential directions for further genetic improvement. Overall, the coefficients of variation (CV) for the ten traits ranged from 4.86% to 73.49%, with BSD exhibiting the highest CV (73.49%) and hemicellulose the lowest (4.86%), further emphasizing the differing degrees of phenotypic variation among traits.
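As a quick check on how the coefficient of variation relates to the reported summary statistics, the sketch below computes CV = 100 × SD / mean for three of the traits. The means and standard deviations are the rounded values quoted in the text, so the results can differ slightly from the reported CVs (for example, about 73.5% here versus the reported 73.49% for BSD); the function name is an illustrative choice.

```python
def coefficient_of_variation(mean, sd):
    """Return the coefficient of variation as a percentage: 100 * SD / mean."""
    return 100.0 * sd / mean


# (mean, SD) pairs as reported in the text (rounded values).
reported = {
    "DBH (cm)": (37.5, 6.26),
    "BD (g/cm^3)": (0.40, 0.03),
    "BSD (%)": (16.35, 12.02),
}

cv = {trait: coefficient_of_variation(m, s) for trait, (m, s) in reported.items()}
for trait, value in cv.items():
    print(f"{trait}: CV = {value:.2f}%")
```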

**Fig. 1: Phenotypic variation analysis in growth, wood properties, disease resistance, and leaf morphology.**

Estimates of the narrow-sense heritability indicated clear differences in the genetic background of various traits (Supplementary Table 1), with the genetic contributions being 21.58% for DBH, 50.00% for BD, 9.28% for hemicellulose, 27.14% for cellulose, 6.23% for lignin, 66.51% for BSD, 26.43% for LA, 16.88% for LL, 30.46% for LW, and 66.84% for LVA. Correlation analysis revealed significant relationships among traits (Supplementary Fig. 1), with a significant positive correlation between DBH and BD (r = 0.39), and a significant negative correlation between BD and hemicellulose (r = –0.14). Additionally, lignin showed significant negative correlations with both hemicellulose (r = –0.54) and cellulose (r = –0.66). Furthermore, there were strong positive correlations among LA, LL, and LW, while BSD exhibited significant negative correlations with LA (r = –0.22), LL (r = –0.18), and LW (r = –0.17). These findings provide important theoretical foundations for understanding the genetic characteristics and phenotypic values of P. deltoides.

Identification of QTL for complex quantitative traits

We performed re-sequencing on 237 samples, yielding a total of 1375 GB of high-quality clean data, with each individual having over 5 GB of clean data. The sequencing data exhibited high quality, with Q30 values surpassing 90.47% (Supplementary Data 1). After aligning to the P. deltoides reference genome and filtering out low-quality markers and markers in linkage disequilibrium, a total of 685,181 SNPs that were evenly distributed across 19 chromosomes were identified (Supplementary Fig. 2). Based on geographic location, these 237 samples were categorized into three subgroups. The calculated linkage disequilibrium (LD) decay rates showed that the average decay distances for the Louisiana, Mississippi, and Texas subgroups were 4.1 kb, 6.9 kb, and 9.1 kb, respectively, with an overall average decay distance of 6.7 kb. LD therefore decays over a shorter distance in the Louisiana subgroup than in the other subgroups, possibly due to its geographical distribution characteristics. Phylogenetic tree analysis indicated that individuals from the three subgroups did not cluster together. Principal component analysis (PCA) revealed that PC1–PC3 together explained 15.65% of the genetic variation, roughly dividing the 237 accessions into three clusters. However, when the Admixture analysis was performed with K set to 2, 3, and 4, no clear population structure was detected (Supplementary Fig. 3). Furthermore, the pairwise fixation index (Fst) values computed among the geographically defined Louisiana, Mississippi, and Texas subgroups were 0.2 (Louisiana and Mississippi), 0.3 (Louisiana and Texas), and 0.4 (Texas and Mississippi), indicating a low level of genetic differentiation among these subgroups (Supplementary Fig. 4).
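LD decay curves are usually built from a pairwise LD statistic plotted against physical distance. The section does not state which statistic or software was used, so the sketch below assumes the common r² measure computed from phased haplotypes; the function name and the toy haplotypes are hypothetical.

```python
def ld_r2(haplotypes):
    """r^2 between two biallelic SNPs from phased haplotypes.

    haplotypes: list of (a, b) pairs, each 0/1 for the reference/alternate
    allele at SNP A and SNP B on the same chromosome copy.
    Both SNPs must be polymorphic, otherwise the denominator is zero.
    """
    n = len(haplotypes)
    p_a = sum(a for a, _ in haplotypes) / n
    p_b = sum(b for _, b in haplotypes) / n
    p_ab = sum(1 for a, b in haplotypes if a == 1 and b == 1) / n
    d = p_ab - p_a * p_b  # linkage disequilibrium coefficient D
    return d * d / (p_a * (1 - p_a) * p_b * (1 - p_b))


# Alleles always travel together: complete LD.
complete_ld = ld_r2([(1, 1), (1, 1), (0, 0), (0, 0)])
# Alleles combine independently: no LD.
no_ld = ld_r2([(1, 1), (1, 0), (0, 1), (0, 0)])
```

A decay distance such as the 4.1 kb to 9.1 kb values above is then typically read off as the distance at which the average r² falls to a chosen fraction of its maximum; that cutoff is also not specified in this section.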

To identify genetic variations associated with multiple complex quantitative traits, a GWAS was conducted using filtered SNPs combined with phenotypic data for ten traits. Using a significance threshold of P < 1 × 10⁻⁵, a total of 69 significant QTLs were identified, distributed across various chromosomes and associated with the following traits: 4 QTL for DBH, 8 QTL for BD, 9 QTL for hemicellulose, 7 QTL for cellulose, 3 QTL for lignin, 7 QTL for BSD, 6 QTL for LA, 11 QTL for LL, 5 QTL for LW, and 9 QTL for LVA (Supplementary Fig. 5, Supplementary Table 2). Four of these QTLs are shown in detail: chr2_6037140 and chr10_9264225 for cellulose, chr1_11210638 for LA, and chr7_15696963 for BSD (Fig. 2). Haplotype analysis of the candidate genes in these QTLs showed that Hap002 of EVM0002801 had a significantly higher cellulose content than Hap001 by 1.5%, and Hap003 of EVM0030857 had a significantly lower BSD by 8.43% compared to Hap001. Certain traits, particularly LL, showed clusters of QTLs within specific chromosome regions, suggesting potential linkage disequilibrium in these areas. Co-localization analysis further revealed an overlap of QTL associated with LW, LL, and LA on chromosome 1. Additionally, QTL associated with cellulose and lignin showed overlap on chromosome 2. These results indicate that these chromosomal regions may contain key genes regulating multiple traits.
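The per-trait QTL counts listed above can be tallied to confirm the total of 69 and to summarize them by trait category. The counts are taken from the text; the category grouping follows the four trait groups used throughout the article, and the variable names are illustrative.

```python
# QTL counts per trait, as listed in the text.
qtl_per_trait = {
    "DBH": 4, "BD": 8, "hemicellulose": 9, "cellulose": 7, "lignin": 3,
    "BSD": 7, "LA": 6, "LL": 11, "LW": 5, "LVA": 9,
}

# Trait groups used in the article.
categories = {
    "growth": ["DBH"],
    "wood properties": ["BD", "hemicellulose", "cellulose", "lignin"],
    "disease resistance": ["BSD"],
    "leaf morphology": ["LA", "LL", "LW", "LVA"],
}

total_qtl = sum(qtl_per_trait.values())
qtl_per_category = {
    name: sum(qtl_per_trait[trait] for trait in traits)
    for name, traits in categories.items()
}
print(total_qtl, qtl_per_category)
```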

**Fig. 2: Genome‑wide association study (GWAS) for ten traits in poplar.**

A total of 130 annotated genes were involved in the 69 identified significant QTL regions (Fig. 3, Supplementary Data 2), which were distributed across various chromosomes and implicated in the genetic regulation of important traits related to plant growth, wood properties, disease resistance, and leaf morphology. Gene Ontology (GO) enrichment analysis of these 130 candidate genes revealed significant enrichment in terms related to receptor serine/threonine kinase binding, cytoplasmic translation, magnesium ion transmembrane transport, dynein intermediate chain binding, chloroplast envelope, early endosome, cytoplasmic dynein complex, magnesium ion transmembrane transporter activity, malate transport, tryptophan biosynthetic process, etc. (Supplementary Fig. 6). These results suggest that the identified candidate genes may play essential roles in various cellular and metabolic processes. The key genes EVM0002801 and EVM0030857, identified as regulators of cellulose content and BSD, were annotated as late embryogenesis abundant protein and uridine nucleosidase, respectively. The co-effect loci associated with LW, LL, and LA involve a key functional gene, EVM0007952, annotated as a MYB transcription factor, which plays a crucial role in regulating plant growth, development, and stress responses. Further analysis revealed that the co-effect loci associated with cellulose and lignin involve several key functional genes, including EVM0014191, EVM0002801, and EVM0037792. Specifically, EVM0014191 encodes a desiccation protectant protein Lea14 homolog. EVM0037792 is annotated as having ubiquitin-protein transferase activity, a key function in protein degradation and cellular regulation.

**Fig. 3: Genomic distribution of quantitative trait loci (QTL).**

Effects of QTL alleles on trait phenotypes

We investigated the effects of alleles at QTL on phenotype and found significant differences in the impact of favorable and unfavorable alleles on the traits (Supplementary Data 3). Specifically, at locus chr13_1225902, allele A exhibited a significantly higher DBH compared to allele G. At locus chr4_5602349, the average BD was 0.40 g/cm³ for allele T and 0.36 g/cm³ for allele C. At locus chr11_16124994, the average hemicellulose content of allele A was 27.59%, compared to 23.34% for allele G. At locus chr2_7224508, allele A was associated with lower cellulose content, whereas allele C was associated with higher cellulose content. At locus chr1_27799505, the average lignin content of allele G was 13.25%, while that of allele A was 11.95%. Additionally, at locus chr7_10250865, the average BSD for allele T was 15.02%, compared to 29.66% for allele C. Furthermore, allele A at locus chr17_10568531, allele A at locus chr3_9907822, allele C at locus chr5_12597886, and allele A at locus chr13_15731917 exhibited higher mean values for LA, LL, LW, and LVA, respectively.

We further analyzed the influence of the number of favorable alleles on trait phenotypes. The distribution of the mean favorable allele frequency (MFAF) per individual among the 237 genotypes ranged from 0.03 to 0.74 across ten traits, with the highest MFAF of 0.74 for BSD and the lowest MFAF of 0.03 for LVA (Fig. 4). This result indicated that the relatively small population size and the low MFAF of favorable alleles for certain traits may have led to an overestimation of their effect size. Correlation analyses indicated significant associations between the number of favorable alleles and trait phenotypes. For instance, the number of favorable alleles in each individual was positively correlated with DBH, BD, and LA, with correlation coefficients of 0.37, 0.48, and 0.42, respectively, suggesting that individuals with a higher number of favorable alleles exhibited larger phenotypic values. Similarly, the number of favorable alleles showed a significant negative correlation with BSD, indicating that individuals with more favorable alleles had fewer disease lesions from black spot disease, thus exhibiting stronger disease resistance. Overall, the number of favorable alleles present in individuals significantly influences the breeding values of traits, positioning these favorable alleles as potential targets for future genetic improvement.
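The relationship between favorable allele count and phenotype can be sketched as follows: count the favorable alleles each individual carries across a trait's QTL, then compute the Pearson correlation between that count and the phenotype. The locus names, genotypes, and phenotype values below are invented for illustration and do not come from the study.

```python
from math import sqrt


def favorable_allele_count(genotypes, favorable):
    """Count favorable alleles carried by one individual across loci.

    genotypes: dict locus -> two-letter diploid genotype, e.g. "AG".
    favorable: dict locus -> favorable allele at that locus.
    """
    return sum(genotypes[locus].count(allele) for locus, allele in favorable.items())


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


# Hypothetical QTL for one trait and their favorable alleles.
favorable = {"q1": "A", "q2": "T", "q3": "C"}
individuals = [
    {"q1": "AA", "q2": "TT", "q3": "CC"},
    {"q1": "AG", "q2": "TT", "q3": "CG"},
    {"q1": "GG", "q2": "TC", "q3": "GG"},
    {"q1": "AG", "q2": "CC", "q3": "CG"},
]
phenotypes = [45.0, 40.0, 30.0, 33.0]  # hypothetical trait values

counts = [favorable_allele_count(g, favorable) for g in individuals]
r = pearson(counts, phenotypes)
```

For a trait where lower values are desirable, such as BSD infection rate, the favorable allele is the one associated with the lower mean, so the expected correlation is negative.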

**Fig. 4: Regression statistics between the number of favorable alleles and phenotypic data, and heatmap of favorable allele frequencies.**

Inclusion of multi-trait QTLs enhances prediction accuracy in GS models

We assessed the impact of incorporating multi-trait QTL into genomic selection models on prediction accuracy, comparing them with traditional methods such as Bayesian ridge regression (BRR), BayesA, and BayesC models. The results demonstrated that incorporating multi-trait QTLs significantly improved prediction accuracy across all ten traits (Fig. 5, Supplementary Data 4). For the growth trait, integrating the 10⁻³sig marker set into the BRR and BayesA models resulted in the highest improvement in DBH prediction accuracy, with an increase of 0.37 (P < 0.0001). For wood properties, lignin content showed the greatest improvement in prediction accuracy, with the BRR model achieving a significant increase of 0.48 over the baseline model when incorporating either the 10⁻⁴sig or the 10⁻³sig marker set (P < 0.0001). For BSD, the BayesA model showed the greatest improvement in prediction accuracy, with an increase of 0.15 when incorporating the 10⁻⁵sig marker set (P < 0.001). For leaf morphological traits, the BRR model incorporating the 10⁻⁴sig marker set resulted in the highest improvement in prediction accuracy for LL and LW, with an improvement of 0.44 for both traits (P < 0.0001). Specifically, for DBH, prediction accuracies of the BRR, BayesA, and BayesC models were 0.15 ± 0.04, 0.13 ± 0.06, and 0.08 ± 0.06, respectively, and these were consistently improved by incorporating multi-trait QTLs, with the BRR model integrating the 10⁻³sig marker set achieving the highest accuracy of 0.52 ± 0.03. Similarly, for BSD, incorporating either the 10⁻⁵sig or the 10⁻⁴sig marker set into the BRR model achieved the highest accuracy (0.52 ± 0.06), compared to baseline accuracies of 0.41 ± 0.08 (BRR), 0.36 ± 0.07 (BayesA), and 0.29 ± 0.06 (BayesC).
A comparison of the effects of different marker sets on model performance across the ten traits and the BRR, BayesA, and BayesC models showed that incorporating the 10⁻⁵sig and 10⁻²sig marker sets significantly improved prediction accuracy in 97% of the models, compared to the baseline models (Supplementary Data 4). In contrast, integrating 10⁻⁴sig and 10⁻³sig marker sets led to significantly improved prediction accuracy in all models. Multiple comparisons revealed that models incorporating the 10⁻³sig marker set exhibited significantly higher predictive accuracy than those using other marker sets, further highlighting the effectiveness of this approach (P < 0.05, Supplementary Fig. 7). Subsequently, multiple comparisons among the BRR, BayesA, and BayesC models demonstrated that BRR consistently outperformed both BayesA and BayesC across all ten traits, highlighting the robustness of the BRR model in leveraging multi-trait QTL information for genomic selection (P < 0.05, Supplementary Fig. 8).
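One plausible way to build nested marker sets such as 10⁻⁵sig through 10⁻²sig is to keep every SNP whose GWAS P-value falls below the threshold for at least one trait. This is an assumption about the construction, since the selection procedure is not spelled out in this section; the SNP names, P-values, and the function `multi_trait_marker_set` are hypothetical.

```python
def multi_trait_marker_set(gwas_p, threshold):
    """Return SNPs whose P-value is below `threshold` for at least one trait.

    gwas_p: dict trait -> dict SNP id -> GWAS P-value.
    """
    selected = set()
    for p_values in gwas_p.values():
        selected.update(snp for snp, p in p_values.items() if p < threshold)
    return sorted(selected)


# Hypothetical GWAS results for two traits and three SNPs.
gwas_p = {
    "DBH": {"snpA": 3e-6, "snpB": 2e-3, "snpC": 0.4},
    "BD": {"snpA": 0.2, "snpB": 5e-5, "snpC": 8e-3},
}

thresholds = [("1e-5sig", 1e-5), ("1e-4sig", 1e-4), ("1e-3sig", 1e-3), ("1e-2sig", 1e-2)]
marker_sets = {label: multi_trait_marker_set(gwas_p, thr) for label, thr in thresholds}
for label, snps in marker_sets.items():
    print(label, snps)
```

Under this construction each looser threshold yields a superset of the stricter one, so the sets trade a small number of strongly associated markers against a larger number of weaker signals.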

**Fig. 5: Comparison of prediction accuracy between genomic selection (GS) models incorporating multi-trait QTL identified by GWAS and traditional GS models (BRR, BayesA, and BayesC).**

Discussion

In this study, we genotyped 237 accessions from the P. deltoides germplasm plantation derived from natural populations in the southern United States to investigate the role of genetic variation in regulating complex traits such as growth, wood properties, disease resistance, and leaf morphology. Although these accessions were categorized into three subgroups based on geographical location, no definitive genetic structure was observed. This absence of distinct genetic differentiation may be attributed to the characteristics of poplar as a dioecious species, which allows for wind-mediated long-distance dispersal of pollen and seeds¹⁸. The extensive gene flow resulting from such dispersal may facilitate genetic exchange between populations, thereby accounting for the observed lack of genetic structure. However, PCA classified the 237 accessions into three distinct groups. The majority of the 237 germplasm accessions are distributed within the Mississippi River basin. River networks can function as natural corridors or barriers, influencing dispersal patterns and gene flow within populations. These localized factors may contribute to the observed population structure, potentially shaping genetic differentiation at a finer scale. This suggests that while extensive gene flow may reduce large-scale genetic differentiation, regional river systems may play a significant role in genetic structure within populations. These findings are consistent with previous studies indicating that P. deltoides individuals in Texas and Louisiana exhibit similar genetic structures at K = 2 and K = 3¹⁹. Furthermore, a genetic structure analysis of 384 unrelated P. deltoides individuals demonstrated that those from Missouri, Louisiana, Tennessee, and Iowa clustered together, reinforcing the idea of widespread gene exchange among populations⁵.

GWAS identified 4, 27, 7, and 31 significant SNPs (P < 1 × 10⁻⁵) associated with growth, wood properties, disease resistance, and leaf morphology, respectively, to elucidate the genetic basis of these complex traits. By leveraging recombination events in diverse germplasm throughout evolutionary history, GWAS significantly enhances the resolution for identifying alleles associated with these traits^8,20. Previous association analyses of 17 half-sib families of black poplar (P. nigra), 448 wild accessions of black cottonwood (P. trichocarpa), and 435 unrelated individuals of white poplar (P. tomentosa) also indicated the presence of polygenic structures^21,22,23. The identified association loci and candidate genes may vary based on the statistical methods and population characteristics. In this study, GWAS was performed using 198 germplasms, a sample size that may constrain the generalizability and robustness of the findings. Although significant association loci were identified, the relatively small population size may not fully capture the genetic structure underlying complex traits. As a result, the loci identified here may differ when applied to broader populations. To enhance the reliability and comprehensiveness of these findings, future studies should incorporate larger sample sizes to better capture the genetic architecture underlying the traits of interest. Nevertheless, conducting GWAS still poses challenges in balancing the risks of false positives and false negatives²⁴. Employing high-density SNP markers for association analysis proves to be more effective in obtaining valuable information^22,25. Overall, our findings contribute to an expanded understanding of the genetic structures associated with various quantitative traits in poplar.

Wood formation is a highly intricate process that involves vascular cambium proliferation, differentiation of xylem mother cells, secondary wall deposition, programmed cell death, and finally heartwood formation²⁶. The EVM0002801 gene identified in the GWAS for cellulose and lignin was annotated as a late embryogenesis abundant (LEA) protein. In poplar, LEA genes are essential for growth and development and play a significant role in the response to salt stress²⁷. Expression pattern analysis of the 53 LEA genes identified in poplar revealed that 26 of these genes were expressed in the phloem under normal growth conditions²⁸. This suggests that the EVM0002801 gene may play a potential role in regulating the accumulation of cellulose and lignin content during wood formation. Further functional validation is needed to fully understand its regulatory mechanisms. Plants have complex mechanisms to protect themselves from pathogen attacks. Genes encoding uridine nucleosidase in grapevines were activated in response to infection by anthracnose²⁹. Similarly, we identified the candidate gene associated with BSD as EVM0030857 with the functional annotation uridine nucleosidase. Leaf morphology represents a critical category of adaptive trait variation, making it a favorable system for evolutionary studies³⁰. Additionally, we observed a significant negative correlation between BSD and leaf morphology (LA, LL, and LW). Previous GWAS efforts have identified key genes affecting leaf serration traits in poplar³¹, highlighting the pivotal role of cellular polarity in leaf morphological variation³⁰. Notably, we found the same QTLs associated with LA, LL, and LW, and the EVM0007952 gene located within the QTL was functionally annotated as a MYB transcription factor.
Previous studies have shown that the MYB-related gene family regulates leaf morphology in Arabidopsis³² and maize³³, which is consistent with our findings that EVM0007952 also belongs to this family. Overall, existing evidence indicated that the candidate genes had been directly or indirectly validated for their impact on the target traits, providing a foundation for future functional validation. The candidate genes identified in this study enhance the understanding of the genetic mechanisms underlying growth, wood properties, disease resistance, and leaf morphology traits.

Tree breeding faces numerous challenges, including longer breeding cycles and larger field trial areas. Therefore, breeders urgently need to explore new methods to enhance the efficiency of forest breeding. In recent years, genomic selection has emerged as a powerful tool for accelerating tree breeding^17,34. Our study demonstrates that the BRR model significantly outperforms the BayesA and BayesC models in predicting ten traits. Our result is consistent with previous studies, indicating that the BRR model excels in predicting both the vertical and transverse diameters of pear fruit cores compared to BayesA, BayesB, and BayesC models³⁵. Additionally, in the context of predicting resistance to wheat rust, the BRR and BayesL models demonstrated higher accuracy than support vector regression models³⁶. In high-density marker datasets, the majority of SNPs exhibit neutral phenotypic effects, with only a limited number associated with specific traits^16,37,38. The accuracy of genomic selection relies heavily on the assumptions regarding marker effects within the model. Recent studies focusing on tree species have indicated that selecting markers with specific effects on traits can enhance the predictive accuracy of genomic selection^39,40. This enhancement has been corroborated by empirical studies across various crops and horticultural plants^41,42,43,44 and supported by relevant simulation studies⁴⁵. The incorporation of QTLs as fixed effects further strengthens the predictive capabilities of genomic selection^43,46,47,48. In this study, the integration of multi-trait QTLs as random effects within the BRR, BayesA, and BayesC models resulted in improved predictive accuracy for DBH, BD, hemicellulose, cellulose, lignin, BSD, LA, LL, and LW.
While previous studies have demonstrated the benefits of incorporating major QTLs as fixed effects, our results indicate that incorporating multi-trait QTLs as random effects can lead to significant improvements in predictive accuracy across multiple traits. This improvement may be attributed to the presence of moderate to strong phenotypic correlations among traits, as well as the identification of overlapping QTL regions. These observations suggest that certain genomic regions may exert pleiotropic effects or contain tightly linked loci influencing multiple traits. Accordingly, selecting SNPs based on combined GWAS results across traits enabled the models to capture shared genetic signals, improving their ability to borrow information across traits during training and thereby enhancing prediction accuracy.

Previous research showed that selecting markers through GWAS and using the same population to construct genomic selection models reduces prediction accuracy and leads to biased estimates of predicted breeding values⁴⁹. In contrast, GWAS was conducted on three different breeds of Nordic cattle, where higher accuracy was achieved in the populations used for GWAS, and the greatest improvement in genomic prediction was observed in the populations not included in the GWAS⁵⁰. The linkage between markers and QTLs may not be consistent across different populations^51,52. This inconsistency is not accounted for in genomic selection models for populations without preselected markers, which can lead to reduced prediction accuracy across populations. We conducted a GWAS and marker selection on the P. deltoides germplasm population, which may impact the application of GWAS results across different poplar populations to improve genomic prediction accuracy. This limitation can be overcome by conducting GWAS across multiple populations⁵³. In summary, including multi-trait QTLs as random effects within genomic selection models can significantly enhance the accuracy of trait predictions, thereby providing a robust approach to improving the efficiency of forest breeding programs.

In conclusion, this study identified the genomic sources of quantitative trait loci and their favorable alleles associated with growth, wood properties, disease resistance, and leaf morphology traits. These reliable markers, closely linked to the traits of interest, are significant for enhancing the accuracy of genomic selection in Populus breeding. Integrating these multi-trait QTL markers into genomic selection strategies can effectively accelerate the tree breeding process, thereby addressing the increasing demands of the timber industry.

Methods

Plant materials and phenotyping

The population of 237 germplasm accessions used in this study was derived from the Populus deltoides germplasm plantation (Supplementary Fig. 9, Supplementary Data 5). Specifically, 30, 90, and 117 germplasm accessions were collected from Texas, Louisiana, and Mississippi in the United States, respectively. The germplasm plantation was established in 1998 at the Sihong Forest Farm in Jiangsu Province, China, using a randomized block design, with a 6 × 6 m spacing and 12 ramets for each clone.

In this study, we measured ten traits related to growth performance, wood quality, disease resistance, and leaf morphology, including DBH, BD, hemicellulose content, cellulose content, lignin content, BSD infection rate, LA, LL, LW, and LVA. In detail, DBH was measured using a measuring tape at 1.3 m above ground level. Wood core samples were collected from the same height of each tree, and the basic wood density was determined using the maximum moisture content method¹. The collected wood core samples were oven-dried at 105 °C, ground and passed through a 40-mesh sieve, and hemicellulose, cellulose, and lignin contents were measured using the Van Soest method⁵⁴. In October of the current growing season, mature leaves were collected from each tree at different heights and orientations. Images of the leaves were captured using a Canon camera (Canon, Tokyo, Japan), arranged at a 1:1 scale, and digitized using ImageJ version 1.54f software⁵⁵ to measure the BSD-infected area, leaf area, leaf length, leaf width, and leaf vein angle (Supplementary Fig. 10). The BSD infection rate was calculated as the ratio of the infected area to the total leaf area. Spatial autocorrelation may influence disease spread and severity in forest experimental plots. To address micro-spatial variation in the analysis of BSD traits, spatial heterogeneity was corrected using the R package SpATS by considering row and column variables before estimating heritability⁵⁶. The narrow-sense heritability (h²) was estimated using the formula h² = V_G / (V_G + V_E), where V_G is the genotypic variance and V_E is the environmental variance. Pearson correlation analysis was conducted to explore the relationships between traits using the R packages Hmisc and corrplot.
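The two ratios defined in this paragraph can be illustrated with a minimal Python sketch. The function names and the numeric inputs are hypothetical and chosen only for illustration; in the study the variance components would come from a fitted model rather than being supplied by hand.

```python
def infection_rate(infected_area: float, total_leaf_area: float) -> float:
    """Ratio of infected area to total leaf area (both in the same units)."""
    if total_leaf_area <= 0:
        raise ValueError("total_leaf_area must be positive")
    return infected_area / total_leaf_area


def heritability(v_g: float, v_e: float) -> float:
    """h2 = V_G / (V_G + V_E), following the formula given in the text."""
    return v_g / (v_g + v_e)


# Hypothetical values, for illustration only (not taken from the study).
example_rate = infection_rate(infected_area=1.5, total_leaf_area=30.0)
example_h2 = heritability(v_g=2.0, v_e=6.0)
```

With these made-up inputs, one twentieth of the leaf is infected and a quarter of the phenotypic variance is attributed to V_G.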

Whole-genome resequencing and SNP detection

Leaf samples from each genotype were used for DNA extraction. Total DNA was isolated using the magnetic bead plant DNA extraction kit (Tiangen Biotech, Beijing, China). DNA quality was assessed by measuring purity with NanoDrop 2000 & 8000 spectrophotometers (Thermo Fisher Scientific, Waltham, USA), quantifying concentration with a Qubit fluorometer (Thermo Fisher Scientific, Waltham, USA), and evaluating integrity by 1% agarose gel electrophoresis (120 V, 45 min). Sequencing libraries for plant whole-genome resequencing were prepared using the Illumina DNA PCR-Free Prep, Tagmentation Kit (Illumina, San Diego, USA). Sequencing was performed on the Illumina NovaSeq 6000 platform (Beijing Annoroad Gene Technology Co., Beijing, China) using a paired-end 150 bp (PE150) strategy with a depth exceeding 10×. Raw reads were trimmed for quality using the Trimmomatic software⁵⁷, and high-quality clean reads were aligned to the P. deltoides reference genome⁹ using the BWA-MEM algorithm⁵⁸. Potential PCR duplicates were filtered, and single nucleotide polymorphism (SNP) calling was conducted using the Genome Analysis Toolkit (GATK) v4.5⁵⁹. SNP selection was performed using Plink v1.90b, retaining only SNPs located on chromosomes. Missing alleles were imputed using Beagle v5.4 software⁶⁰. Biallelic SNPs were extracted and filtered using VCFtools v0.1.16⁶¹, applying a missing rate threshold of 0.2 and excluding SNPs with a minor allele frequency below 0.05.
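The final filtering step can be sketched in Python as follows. This is a simplified illustration, not the VCFtools implementation: it assumes biallelic genotypes coded as alternate-allele counts (0, 1, 2) with `None` for missing calls, and it assumes that a missing rate of exactly 0.2 is retained. The function names are hypothetical.

```python
from typing import Optional, Sequence

Genotype = Optional[int]  # 0, 1 or 2 alternate alleles; None if missing


def missing_rate(genotypes: Sequence[Genotype]) -> float:
    """Fraction of individuals with no genotype call at this SNP."""
    return sum(g is None for g in genotypes) / len(genotypes)


def minor_allele_frequency(genotypes: Sequence[Genotype]) -> float:
    """Frequency of the rarer allele among called genotypes."""
    called = [g for g in genotypes if g is not None]
    if not called:
        return 0.0
    alt_freq = sum(called) / (2 * len(called))
    return min(alt_freq, 1.0 - alt_freq)


def passes_filters(genotypes: Sequence[Genotype],
                   max_missing: float = 0.2,
                   min_maf: float = 0.05) -> bool:
    """Keep a SNP if missing rate <= max_missing and MAF >= min_maf."""
    return (missing_rate(genotypes) <= max_missing
            and minor_allele_frequency(genotypes) >= min_maf)


# Hypothetical SNPs, for illustration only.
snp_common = [0, 1, 2, 1, 0, None, 1, 0, 0, 2]    # 10% missing, MAF about 0.39
snp_rare = [0] * 19 + [1]                          # MAF = 0.025
snp_sparse = [None, None, None, 0, 1, 2, 1, 0, 1, 2]  # 30% missing
```

Only `snp_common` would be retained under the thresholds stated in the text.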

Linkage disequilibrium, GWAS and QTL identification

The genetic structure of the population was assessed using PCA and phylogenetic tree analysis. PCA was performed with Plink v1.90b software⁶², while the phylogenetic tree was constructed using the SNPhylo software⁶³. Population structure was determined using the Admixture v1.3.0 software, and Fst between subpopulations was calculated using VCFtools v0.1.16. LD analysis was conducted using PopLDdecay software, with the parameter “-MaxDist 1000” to calculate the LD coefficients between SNPs and to evaluate LD decay⁶⁴. SNP filtering based on LD and Hardy-Weinberg Equilibrium (HWE) was performed using Plink v1.90b with the parameters “--indep-pairwise 50 5 0.2” and “--hwe 0.000001”.

Following quality control, GWAS was performed using the BLINK model from the GAPIT v3.0 package in R⁶⁵, based on phenotypic and genotypic data from all 237 individuals to provide a population-level overview of SNP-trait associations across the ten traits. The BLINK model is considered to have greater statistical efficiency, thereby enhancing the power to detect significant signals in GWAS⁶⁶. It identifies quantitative trait nucleotides (QTNs) using a generalized linear model (GLM), given by the following equation:

$$y=X\beta +Qv+e$$

(1)

where y is the vector of observed phenotypes, X is the matrix of molecular markers, $\beta$ is the vector of additive SNP effects treated as fixed effects, Q is the population structure matrix, v is the vector of population structure fixed effects, and $e$ is the vector of residuals. The BLINK model uses QTNs identified by GLM as covariates, incorporating LD information to further refine the identification of QTNs. A Bayesian information criterion (BIC) strategy is then applied to assess the accuracy of the QTNs, excluding false hypotheses and retaining true QTNs. The model included the first three principal components (PCs) derived from whole-genome SNPs as fixed effects to account for population structure. The deviation of observed P-values from their expected null distribution was evaluated using quantile-quantile (Q-Q) plots. To balance the need for stringent filtering while accommodating the broad scope of phenotypic analyses, a significance threshold (−log₁₀(P) = 5.00) was selected based on the adjusted Bonferroni correction threshold (−log₁₀(0.05/685181) = 7.13) to identify loci associated with the traits. The confidence intervals for significant SNPs were defined based on the average LD decay to r² = 0.1 among the subpopulations.
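The two thresholds mentioned above can be reproduced with a short Python sketch. The function names and the example P-values are hypothetical; the sketch only shows how a Bonferroni threshold on the −log₁₀ scale is obtained and how a relaxed threshold of 5.00 would be applied. Evaluating −log₁₀(0.05/685181) numerically gives a value of about 7.137, which the text reports as 7.13.

```python
import math
from typing import List, Sequence


def neg_log10_bonferroni(alpha: float, n_tests: int) -> float:
    """Bonferroni-corrected threshold expressed as -log10(alpha / n_tests)."""
    return -math.log10(alpha / n_tests)


def significant_indices(p_values: Sequence[float],
                        neg_log10_threshold: float) -> List[int]:
    """Indices of SNPs whose -log10(P) exceeds the threshold."""
    return [i for i, p in enumerate(p_values)
            if -math.log10(p) > neg_log10_threshold]


# Number of SNPs taken from the text; alpha = 0.05.
bonferroni = neg_log10_bonferroni(0.05, 685181)

# Hypothetical P-values, for illustration only.
hits = significant_indices([1e-6, 1e-4, 9e-6, 0.2], neg_log10_threshold=5.0)
```

Here `hits` contains the positions of the first and third P-values, the only ones below 1 × 10⁻⁵.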

QTL were delineated as the confidence intervals upstream and downstream of significant SNPs^67,68. Colocalization between traits was determined by assessing whether the QTLs identified for multiple traits through LD were located within the same genomic region. Candidate genes within the QTL regions were identified by aligning with the reference genome and annotated for their putative functions using eggNOG-mapper v.2⁶⁹. Haplotype analysis of candidate genes was performed using the R package geneHapR⁷⁰. The candidate gene sets were analyzed for GO enrichment using the R package clusterProfiler v4.10.1⁷¹. The LD relationships between SNPs within the QTL regions were further analyzed using LDBlockShow software⁷². SNP markers with P < 1 × 10⁻⁵ associated with the traits were used to identify favorable alleles. Genotypes associated with increased DBH, BD, hemicellulose, cellulose, lignin, BSD, LA, LL, and LW values, or with decreased BSD values, were considered to carry favorable alleles. The number of favorable alleles for each individual was defined as the number of loci at which that individual was homozygous for the favorable allele. The relationship between favorable alleles and phenotypes was visualized using the ggplot2 package in R. Additionally, a heatmap was generated using the pheatmap package in R to illustrate the distribution characteristics and clustering patterns of the favorable alleles across different poplar genotypes.
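The counting rule for favorable alleles can be sketched in Python. This is an illustrative reading of the definition above: each genotype is assumed to be an unphased pair of allele letters, and a locus counts only when both alleles match the favorable one. The function name and the example data are hypothetical.

```python
from typing import Sequence, Tuple


def count_homozygous_favorable(genotypes: Sequence[Tuple[str, str]],
                               favorable: Sequence[str]) -> int:
    """Number of loci at which both alleles equal the favorable allele."""
    if len(genotypes) != len(favorable):
        raise ValueError("one favorable allele is required per locus")
    return sum(1 for (a, b), fav in zip(genotypes, favorable)
               if a == fav and b == fav)


# Hypothetical individual genotyped at four trait-associated SNPs.
individual = [("A", "A"), ("A", "G"), ("T", "T"), ("C", "C")]
favorable_alleles = ["A", "G", "T", "G"]
n_favorable = count_homozygous_favorable(individual, favorable_alleles)
```

In this example the individual is homozygous favorable at the first and third loci, heterozygous at the second, and homozygous unfavorable at the fourth, so the count is 2.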

Construction of genomic selection model

Genomic selection models were constructed using the BRR, BayesA, and BayesC methods^12,73. To integrate GWAS information into GS, the marker sets selected by GWAS (defined in the cross-validation procedure below) were incorporated into a mixed model as a separate group of random effects, and the resulting Bayesian prediction model is formulated as follows:

$$y={\sum }_{i=1}^{l}{w}_{i}{\delta }_{i}+{\sum }_{k=1}^{m}{Z}_{k}{\gamma }_{k}+\varepsilon$$

(2)

where y is the phenotypic vector for n individuals, l is the number of markers in the GWAS-selected group, m is the number of remaining markers, ${w}_{i}$ is the genotype vector for n individuals at the i-th marker from the four SNP marker sets, ${\delta }_{i}$ is the effect of the i-th SNP marker in that group, ${Z}_{k}$ is the genotype vector for n individuals at the k-th of the other markers, and ${\gamma }_{k}$ is the effect of that k-th marker. The term $\varepsilon$ represents the random error component, where $\varepsilon \sim N\left(0,{I}_{n}{\sigma }^{2}\right)$.
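To make the role of the two marker groups in Eq. (2) concrete, the priors can be written out for the BRR case. This is a sketch under the assumption that each group receives its own Gaussian prior with a group-specific variance, which is how BGLR treats separate BRR terms in its linear predictor; the symbols $\sigma_{\delta}^{2}$ and $\sigma_{\gamma}^{2}$ are introduced here for illustration and do not appear in the original text.

$$\delta_{i} \sim N\left(0,\sigma_{\delta}^{2}\right),\; i=1,\dots ,l; \qquad \gamma_{k} \sim N\left(0,\sigma_{\gamma}^{2}\right),\; k=1,\dots ,m$$

Because $\sigma_{\delta}^{2}$ is estimated separately from $\sigma_{\gamma}^{2}$, the GWAS-selected markers can be shrunk less strongly than the remaining markers. Under BayesA or BayesC the Gaussian priors would be replaced by scaled-t or spike-and-slab priors, respectively.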

The predictive accuracy of GS was evaluated using five-fold cross-validation. The population of 237 accessions was randomly divided into five folds. In each iteration, four folds (190 accessions) were used as the training population, while the remaining fold (47 accessions) served as the validation population. To avoid bias in the cross-validation assessment caused by SNP selection from GWAS, both GWAS and SNP selection were conducted exclusively within the training population (190 accessions). Four SNP marker sets, identified as 10⁻⁵sig, 10⁻⁴sig, 10⁻³sig, and 10⁻²sig, were selected from the combined GWAS results of the ten traits, corresponding to significance thresholds of P < 1 × 10⁻⁵, P < 1 × 10⁻⁴, P < 1 × 10⁻³, and P < 1 × 10⁻², respectively. These marker sets were then used to optimize GS models within their respective training populations. This process was repeated across all five folds, and prediction accuracy was assessed by calculating the Pearson correlation coefficient between observed phenotypes and genomic estimated values. To ensure robustness, the entire cross-validation procedure was repeated ten times. By setting the same random seed, the same training individuals were maintained across all models to ensure comparability. All GS models were constructed using the BGLR v1.1.3 package in R⁷⁴.
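The cross-validation scheme can be outlined in Python. This is a minimal sketch, not the authors' pipeline: `fit_predict` is a hypothetical callback standing in for the per-fold GWAS, marker selection, and BGLR model fitting, and treating the "combined GWAS results" as the union of markers passing the threshold for any trait is an assumption. Note that 237 is not divisible by five, so the folds produced here contain 47 or 48 accessions.

```python
import math
import random
from typing import Callable, Dict, List, Sequence


def k_fold_indices(n: int, k: int = 5, seed: int = 1) -> List[List[int]]:
    """Shuffle 0..n-1 with a fixed seed and deal the indices into k folds."""
    rng = random.Random(seed)
    idx = list(range(n))
    rng.shuffle(idx)
    return [idx[f::k] for f in range(k)]


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation between observed and predicted values."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def select_markers(p_by_trait: Dict[str, Sequence[float]],
                   threshold: float) -> List[int]:
    """Markers with P < threshold for at least one trait (assumed union)."""
    n_markers = len(next(iter(p_by_trait.values())))
    return [j for j in range(n_markers)
            if any(p[j] < threshold for p in p_by_trait.values())]


def cross_validate(y: Sequence[float],
                   fit_predict: Callable[[List[int], List[int]], List[float]],
                   k: int = 5, seed: int = 1) -> List[float]:
    """Per-fold accuracy; fit_predict must use only the training indices."""
    folds = k_fold_indices(len(y), k, seed)
    accuracies = []
    for f, test in enumerate(folds):
        train = [i for g, fold in enumerate(folds) if g != f for i in fold]
        predicted = fit_predict(train, test)
        accuracies.append(pearson([y[i] for i in test], predicted))
    return accuracies
```

Reusing the same `seed` for every model keeps the training and validation individuals identical across models, mirroring the comparability requirement described above. Repeating the procedure ten times would correspond to calling `cross_validate` with ten different seeds.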

Statistics and reproducibility

Statistical analyses were performed using R software (v4.4.0). For Fig. 2c, two-tailed unpaired Student’s t tests were conducted, while two-tailed paired t-tests were applied in Supplementary Data 5. P-values were adjusted using the Benjamini–Hochberg method to control the false discovery rate. Linear regression analysis was used in Fig. 4 to assess the relationship between genomic prediction accuracy and transcriptomic correlation. For Supplementary Figs. 7 and 8, linear mixed models (LMMs) were fitted using the lme4 package (v1.1-37), with model type and marker set included as fixed effects, and cross-validation replicates modeled as a random effect to account for variability across resampling iterations. Multiple comparisons were performed using the emmeans package (v1.11.1) with Bonferroni correction. Residual diagnostics were conducted to confirm the assumptions of normality and homoscedasticity. Statistical significance was set at P < 0.05. Significance levels were indicated as: ***P < 0.001; **P < 0.01; *P < 0.05; ns not significant. Data are presented as mean ± standard deviation.
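For readers unfamiliar with the Benjamini–Hochberg adjustment mentioned above, the following Python sketch shows the standard step-up procedure. The function name and the example P-values are hypothetical; the analyses in the study were run in R, and this block is only meant to illustrate the calculation.

```python
from typing import List, Sequence


def benjamini_hochberg(p_values: Sequence[float]) -> List[float]:
    """Return BH-adjusted P-values in the original order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest P-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Hypothetical raw P-values, for illustration only.
raw = [0.01, 0.04, 0.03, 0.005]
adj = benjamini_hochberg(raw)
```

Each raw P-value is multiplied by the number of tests divided by its rank, and the running minimum ensures that adjusted values never decrease as the raw P-values decrease.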

Reporting summary

Further information on research design is available in the Nature Portfolio Reporting Summary linked to this article.

Data availability

The raw high-throughput sequencing data generated in this study are available in the NCBI Sequence Read Archive (SRA) under the BioProject accession number PRJNA1298496. Source data for all figures and tables are provided with this paper in the following supplementary files: Source data for Figs. 2c, 4, and 5 are included in Supplementary Data 6. Should any raw data files be needed in another format they are available from the corresponding author upon reasonable request.

Code availability

No custom code was developed or used in this study. All statistical analyses and genomic selection modeling were performed using publicly available software and R packages. All analyses were conducted in R (v4.4.0). Genomic selection models were constructed using the BGLR package (v1.1.3), with different ETA parameters set to model multiple marker effects. Linear mixed models were fitted using the lme4 package (v1.1-37). Detailed software and package versions are provided in the Methods section.

References

Zhang, Y., Fang, S., Tian, Y., Wang, L. & Lv, Y. Responses of radial growth, wood density and fiber traits to planting space in poplar plantations at a lowland site. J. For. Res. 33, 963–976 (2022).
du Cros, E. T. Breeding strategies with poplars in Europe. For. Ecol. Manag. 8, 23–39 (1984).
Ruotsalainen, S. Increased forest production through forest tree breeding. Scand. J. For. Res. 4, 333–334 (2014).
Benetka, V., Novotná, K. & Štochlová, P. Wild populations as a source of germplasm for black poplar (Populus nigra L.) breeding programmes. Tree Genet. Genomes 8, 1073–1084 (2012).
Chen, C., Chu, Y., Ding, C., Su, X. & Huang, Q. Genetic diversity and population structure of black cottonwood (Populus deltoides) revealed using simple sequence repeat markers. BMC Genet. 21, 2 (2020).
Storme, V. et al. Ex-situ conservation of black poplar in Europe: genetic diversity in nine gene bank collections and their value for nature development. Theor. Appl. Genet. 108, 969–981 (2004).
Tibbs Cortes, L., Zhang, Z. & Yu, J. Status and prospects of genome-wide association studies in plants. Plant Genome 14, e20077 (2021).
Du, Q. et al. Genetic architecture of growth traits in Populus revealed by integrated quantitative trait locus (QTL) analysis and association studies. New Phytol. 209, 1067–1082 (2016).
Xue, L. et al. Evidences for a role of two Y-specific genes in sex determination in Populus deltoides. Nat. Commun. 11, 5893 (2020).
Wei, S. et al. Identification of genes underlying the resistance to Melampsora larici-populina in an R gene supercluster of the Populus deltoides genome. Plant Dis. 104, 1133–1143 (2020).
Tan, B. & Ingvarsson, P. K. Integrating genome-wide association mapping of additive and dominance genetic effects to improve genomic prediction accuracy in Eucalyptus. Plant Genome 15, e20208 (2022).
Meuwissen, T. H. E., Hayes, B. J. & Goddard, M. E. Prediction of total genetic value using genome-wide dense marker maps. Genetics 157, 1819–1829 (2001).
Wong, C. K. & Bernardo, R. Genomewide selection in oil palm: increasing selection gain per unit time and cost with small populations. Theor. Appl. Genet. 116, 815–824 (2008).
Heffner, E. L., Sorrells, M. E. & Jannink, J.-L. Genomic selection for crop improvement. Crop Sci. 49, 1–12 (2009).
Grattapaglia, D. & Resende, M. D. V. Genomic selection in forest tree breeding. Tree Genet. Genomes 7, 241–255 (2011).
Alemu, A. et al. Genomic selection in plant breeding: key factors shaping two decades of progress. Mol. Plant 17, 552–578 (2024).
Grattapaglia, D. Status and perspectives of genomic selection in forest tree breeding. In Genomic selection for crop improvement: new molecular breeding strategies for crop improvement (eds. Varshney, R. K., Roorkiwal, M. & Sorrells, M. E.) 199–249 (Springer International Publishing, 2017).
Jiang, D., Wu, G., Mao, K. & Feng, J. Structure of genetic diversity in marginal populations of black poplar (Populus nigra L.). Biochem. Syst. Ecol. 61, 297–302 (2015).
Fahrenkrog, A. M. et al. Genome-wide association study reveals putative regulators of bioenergy traits in Populus deltoides. New Phytol. 213, 799–811 (2017).
Sterken, R. et al. Combined linkage and association mapping reveals CYCD5;1 as a quantitative trait gene for endoreduplication in Arabidopsis. Proc. Natl Acad. Sci. USA 109, 4678–4683 (2012).
Guerra, F. P. et al. Association genetics of chemical wood properties in black poplar (Populus nigra). New Phytol. 197, 162–176 (2013).
McKown, A. D. et al. Genome-wide association implicates numerous genes underlying ecological trait variation in natural populations of Populus trichocarpa. New Phytol. 203, 535–553 (2014).
Quan, M. et al. Genome-wide association studies reveal the coordinated regulatory networks underlying photosynthesis and wood formation in Populus. J. Exp. Bot. 72, 5372–5389 (2021).
Balding, D. J. A tutorial on statistical methods for population association studies. Nat. Rev. Genet. 7, 781–791 (2006).
Riedelsheimer, C. et al. Genome-wide association mapping of leaf metabolic profiles for dissecting complex traits in maize. Proc. Natl Acad. Sci. USA 109, 8872–8877 (2012).
Ye, Z.-H. & Zhong, R. Molecular control of wood formation in trees. J. Exp. Bot. 66, 4119–4131 (2015).
Cheng, Z. et al. Genome-wide search and structural and functional analyses for late embryogenesis-abundant (LEA) gene family in poplar. BMC Plant Biol. 21, 110 (2021).
Lan, T., Gao, J. & Zeng, Q.-Y. Genome-wide analysis of the LEA (late embryogenesis abundant) protein gene family in Populus trichocarpa. Tree Genet. Genomes 9, 253–264 (2013).
Li, Z. et al. The impact of Elsinoë ampelina infection on key metabolic properties in Vitis vinifera ‘Red Globe’ berries via multiomics approaches. Mol. Plant Microbe Interact. 35, 15–27 (2022).
Drost, D. R. et al. Genetical genomics of Populus leaf shape variation. BMC Plant Biol. 15, 166 (2015).
Liu, P. et al. Enhanced genome-wide association reveals the role of YABBY11-NGATHA-LIKE1 in leaf serration development of Populus. Plant Physiol. 17, 1702–1718 (2023).
Sun, Y., Zhou, Q., Zhang, W., Fu, Y. & Huang, H. ASYMMETRIC LEAVES1, an Arabidopsis gene that is involved in the control of cell differentiation in leaves. Planta 214, 694–702 (2002).
Theodoris, G., Inada, N. & Freeling, M. Conservation and molecular dissection of ROUGH SHEATH2 and ASYMMETRIC LEAVES1 function in leaf development. Proc. Natl Acad. Sci. USA 100, 6837–6842 (2003).
Grattapaglia, D., Plomion, C., Kirst, M. & Sederoff, R. R. Genomics of growth traits in forest trees. Curr. Opin. Plant Biol. 12, 148–156 (2009).
Sun, M. et al. Genomic selection of eight fruit traits in pear. Hortic. Plant J. 10, 318–326 (2024).
Ornella, L. et al. Genomic prediction of genetic values for resistance to wheat rusts. Plant Genome 5, 136–148 (2012).
Weber, S. E., Frisch, M., Snowdon, R. J. & Voss-Fels, K. P. Haplotype blocks for genomic prediction: a comparative evaluation in multiple crop datasets. Front. Plant Sci. 14, 1217589 (2023).
Al Kalaldeh, M. et al. Using imputed whole-genome sequence data to improve the accuracy of genomic prediction for parasite resistance in Australian sheep. Genet. Sel. Evol. 51, 32 (2019).
Thumma, B. R., Joyce, K. R. & Jacobs, A. Genomic studies with preselected markers reveal dominance effects influencing growth traits in Eucalyptus nitens. G3 Genes Genomes Genet. 12, jkab363 (2022).
Chen, Z.-Q., Klingberg, A., Hallingbäck, H. R. & Wu, H. X. Preselection of QTL markers enhances accuracy of genomic selection in Norway spruce. BMC Genom. 24, 147 (2023).
Kim, G. W. et al. Genomic selection with fixed-effect markers improves the prediction accuracy for Capsaicinoid contents in Capsicum annuum. Hortic. Res. 9, uhac204 (2022).
Merrick, L. F., Burke, A. B., Chen, X. & Carter, A. H. Breeding with major and minor genes: genomic selection for quantitative disease resistance. Front. Plant Sci. 12, 713667 (2021).
Sarinelli, J. M. et al. Training population selection and use of fixed effects to optimize genomic predictions in a historical USA winter wheat panel. Theor. Appl. Genet. 132, 1247–1261 (2019).
Su, J. et al. Multi-locus genome-wide association study and genomic prediction for flowering time in chrysanthemum. Planta 259, 13 (2024).
Bernardo, R. Genomewide selection when major genes are known. Crop Sci. 54, 68–75 (2014).
Shahinnia, F. et al. Genome-wide association study and genomic prediction of resistance to stripe rust in current central and northern European winter wheat germplasm. Theor. Appl. Genet. 135, 3583–3595 (2022).
Spindel, J. E. et al. Genome-wide prediction models that incorporate de novo GWAS are a powerful new tool for tropical rice improvement. Heredity 116, 395–408 (2016).
Xu, H. et al. Genome-wide association study and genomic selection of spike-related traits in bread wheat. Theor. Appl. Genet. 137, 131 (2024).
Veerkamp, R. F., Bouwman, A. C., Schrooten, C. & Calus, M. P. L. Genomic prediction using preselected DNA variants from a GWAS with whole-genome sequence data in Holstein–Friesian cattle. Genet. Sel. Evol. 48, 95 (2016).
Brøndum, R. F. et al. Quantitative trait loci markers derived from whole genome sequence data increases the reliability of genomic prediction. J. Dairy Sci. 98, 4107–4116 (2015).
de Roos, A. P. W., Hayes, B. J., Spelman, R. J. & Goddard, M. E. Linkage disequilibrium and persistence of phase in Holstein–Friesian, Jersey and Angus Cattle. Genetics 179, 1503–1512 (2008).
Wientjes, Y. C. J., Veerkamp, R. F. & Calus, M. P. L. Using selection index theory to estimate consistency of multi-locus linkage disequilibrium across populations. BMC Genet. 16, 87 (2015).
Raymond, B. et al. Genomic prediction for numerically small breeds, using models with pre-selected and differentially weighted markers. Genet. Sel. Evol. 50, 49 (2018).
Van Soest, P. J. & McQueen, R. W. The chemistry and estimation of fibre. Proc. Nutr. Soc. 32, 123–130 (1973).
Ferreira, T. & Rasband, W. ImageJ user guide. Image, 1, 155–161 (Fiji, 2012).
Rodríguez-Álvarez, M. X., Boer, M. P., van Eeuwijk, F. A. & Eilers, P. H. C. Correcting for spatial heterogeneity in plant breeding experiments with P-splines. Spat. Stat. 23, 52–71 (2018).
Bolger, A. M., Lohse, M. & Usadel, B. Trimmomatic: a flexible trimmer for Illumina sequence data. Bioinformatics 30, 2114–2120 (2014).
Li, H. & Durbin, R. Fast and accurate short read alignment with Burrows–Wheeler transform. Bioinformatics 25, 1754–1760 (2009).
McKenna, A. et al. The genome analysis toolkit: a MapReduce framework for analyzing next-generation DNA sequencing data. Genome Res. 20, 1297–1303 (2010).
Browning, B. L., Tian, X., Zhou, Y. & Browning, S. R. Fast two-stage phasing of large-scale sequence data. Am. J. Hum. Genet. 108, 1880–1890 (2021).
Danecek, P. et al. The variant call format and VCFtools. Bioinformatics 27, 2156–2158 (2011).
Purcell, S. et al. PLINK: a tool set for whole-genome association and population-based linkage analyses. Am. J. Hum. Genet. 81, 559–575 (2007).
Lee, T.-H., Guo, H., Wang, X., Kim, C. & Paterson, A. H. SNPhylo: a pipeline to construct a phylogenetic tree from huge SNP data. BMC Genomics 15, 162 (2014).
Zhang, C., Dong, S.-S., Xu, J.-Y., He, W.-M. & Yang, T.-L. PopLDdecay: a fast and effective tool for linkage disequilibrium decay analysis based on variant call format files. Bioinformatics 35, 1786–1788 (2019).
Wang, J. & Zhang, Z. GAPIT version 3: boosting power and accuracy for genomic association and prediction. Genomics Proteom. Bioinforma. 19, 629–640 (2021).
Huang, M., Liu, X., Zhou, Y., Summers, R. M. & Zhang, Z. BLINK: a package for the next level of genome-wide association studies with both individuals and markers in the millions. GigaScience 8, 1–12 (2018).
Tao, Y. et al. Large-scale GWAS in sorghum reveals common genetic control of grain size among cereals. Plant Biotechnol. J. 18, 1093–1105 (2020).
Zhang, L. et al. GWAS of grain color and tannin content in Chinese sorghum based on whole-genome sequencing. Theor. Appl. Genet. 136, 77 (2023).
Cantalapiedra, C. P., Hernández-Plaza, A., Letunic, I., Bork, P. & Huerta-Cepas, J. eggNOG-mapper v2: functional annotation, orthology assignments, and domain prediction at the metagenomic scale. Mol. Biol. Evol. 38, 5825–5829 (2021).
Zhang, R., Jia, G. & Diao, X. geneHapR: an R package for gene haplotypic statistics and visualization. BMC Bioinforma. 24, 199 (2023).
Yu, G., Wang, L.-G., Han, Y. & He, Q.-Y. clusterProfiler: an R package for comparing biological themes among gene clusters. OMICS J. Integr. Biol. 16, 284–287 (2012).
Dong, S.-S. et al. LDBlockShow: a fast and convenient tool for visualizing linkage disequilibrium and haplotype blocks based on variant call format files. Brief. Bioinform. 22, bbaa227 (2021).
Habier, D., Fernando, R. L., Kizilkaya, K. & Garrick, D. J. Extension of the bayesian alphabet for genomic selection. BMC Bioinforma. 12, 186 (2011).
Pérez, P. & de los Campos, G. Genome-wide regression and prediction with the BGLR statistical package. Genetics 198, 483–495 (2014).

Acknowledgements

The work was supported by the National Key Research and Development Plan of China (2021YFD2200202), the Key Research and Development Project of Jiangsu Province (BE2021366), and the National Natural Science Foundation of China (32471823).

Author information

These authors contributed equally: Chenchen Guo, Tongming Yin.

Authors and Affiliations

State Key Laboratory of Tree Genetics and Breeding, Nanjing Forestry University, Nanjing, China
Chenchen Guo, Tongming Yin, Huaitong Wu, Xiaogang Dai, Yingnan Chen & Suyun Wei
Co-Innovation Center for Sustainable Forestry in Southern China, Nanjing Forestry University, Nanjing, China
Chenchen Guo, Tongming Yin, Huaitong Wu, Xiaogang Dai, Yingnan Chen & Suyun Wei
Key Laboratory of Tree Genetics and Biotechnology of Educational Department of China, Nanjing Forestry University, Nanjing, China
Chenchen Guo, Tongming Yin, Huaitong Wu, Xiaogang Dai, Yingnan Chen & Suyun Wei
Key Laboratory of Tree Genetics and Silvicultural Sciences of Jiangsu Province, Nanjing Forestry University, Nanjing, China
Chenchen Guo, Tongming Yin, Huaitong Wu, Xiaogang Dai, Yingnan Chen & Suyun Wei
College of Information Science and Technology & Artificial Intelligence, Nanjing Forestry University, Nanjing, China
Suyun Wei

Contributions

T.Y. and S.W. conceived and designed the research. C.G. and S.W. analyzed the data and drafted the manuscript. C.G., H.W., X.D. and Y.C. collected the phenotypic data and performed the experiments. S.W., T.Y. and C.G. revised and finalized the manuscript. All authors revised the manuscript critically for intellectual content and read and approved the final version.

Corresponding author

Correspondence to Suyun Wei.

Ethics declarations

Competing interests

The authors declare no competing interests.

Peer review

Peer review information

Communications Biology thanks the anonymous reviewers for their contribution to the peer review of this work. Primary Handling Editors: Matteo Dell’Acqua and David Favero.

Additional information

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary information

Supplemental Information

Description of Additional Supplementary Files

Supplementary Data 1

Supplementary Data 2

Supplementary Data 3

Supplementary Data 4

Supplementary Data 5

Supplementary Data 6

Reporting Summary

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License, which permits any non-commercial use, sharing, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if you modified the licensed material. You do not have permission under this licence to share adapted material derived from this article or parts of it. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by-nc-nd/4.0/.

About this article

Cite this article

Guo, C., Yin, T., Wu, H. et al. Genomic selection with GWAS-identified QTL markers enhances prediction accuracy for quantitative traits in poplar (Populus deltoides). Commun Biol 8, 1242 (2025). https://doi.org/10.1038/s42003-025-08700-w

Received: 10 January 2025
Accepted: 08 August 2025
Published: 18 August 2025
Version of record: 18 August 2025
DOI: https://doi.org/10.1038/s42003-025-08700-w
