Abstract
Mendel1 studied in detail seven pairs of contrasting traits in pea (Pisum sativum), establishing the foundational principles of genetic inheritance. Here we investigate the genetic architecture that underlies these traits and uncover previously undescribed alleles for the four characterized Mendelian genes2,3,4,5,6,7, including a rare revertant of Mendel’s white-flowered a allele. Primarily, we focus on the three remaining uncharacterized traits and find that (1) an approximately 100-kb genomic deletion upstream of the Chlorophyll synthase (ChlG) gene disrupts chlorophyll biosynthesis through the generation of intergenic transcriptional fusion products, conferring the yellow pod phenotype of gp mutants; (2) a MYB gene with an upstream Ogre element insertion and a CLE peptide-encoding gene with an in-frame premature stop codon explain the v and p alleles, which disrupt secondary cell wall thickening and lignification, resulting in the parchmentless, edible-pod phenotype; and (3) a 5-bp exonic deletion in a CIK-like co-receptor kinase gene, in combination with a genetic modifier locus, is associated with the fasciated stem (fa) phenotype. Furthermore, we characterize genes and alleles associated with diverse agronomic traits, such as axil ring anthocyanin pigmentation, seed size and the ‘semi-leafless’ form. This study establishes a foundation for fundamental research, education in biology and genetics, and pea breeding practices.
Main
Pea is an Old World crop that was first cultivated about 10,000 years ago in the Fertile Crescent8. About three quarters of the cultivated area is for dry seed, and the remaining quarter is for use as a vegetable, with a total export value of about US$3 billion in 2022 (https://www.fao.org/faostat/en/#data/). Pea also has a minor use as a fodder crop and is often grown in home gardens. The nutritional and environmental advantages of this pulse crop, including the rising popularity of pea protein and its nitrogen-fixing root nodule symbiosis, have been discussed elsewhere9,10.
Pea is predominantly inbreeding, with large flowers and many easily distinguishable characteristics, making it an ideal model for Mendel’s studies of inheritance1,11. The seven variants that Mendel studied in detail were clearly distinguished in the seed catalogues of the time12, representing different agronomic forms, end uses or market types, as they still do today. Mendel’s work on peas was described by Allan Franklin as “The best experiments ever done”13. In addition to its important historical contribution to the development of genetics, pea continues to serve as an excellent plant model system, with approximately 100 pea genes characterized at the molecular level14. However, much remains unknown about the molecular nature of the seven pairs of contrasting traits that Mendel studied, even though the genetic loci were named more than a century ago15. The four cloned genes R (seed shape), Le (plant height), I (seed colour) and A (flower colour) have been characterized for some time2,3,4,5,6,7,16, but the extent and distribution of their natural allelic variation remains poorly understood in a genomic context14,17,18. The gene identities of Mendel’s three other traits, P and V (pod form), Gp (pod colour) and Fa (fasciation), are uncharacterized. Candidates for Gp and P have been proposed on the basis of specific genome-wide association studies (GWASs) and bi-parental mapping studies19,20,21; however, further work was needed to confirm or reject these proposals.
Here we present a sequence-based population genomic map and a trait-based phenotypic variation map, coupled with haplotype–phenotype association analyses across a wide range of traits in one of the world’s major Pisum germplasm collections14. We focus on elucidating the gene identities and genomic context of alleles underlying Mendel’s seven well-known traits. We further demonstrate how this approach can be expanded to uncover the molecular basis of a wide range of agronomic and horticultural characters with breeding potential.
Variation map of a Pisum collection
It is widely accepted that Pisum has two wild species, Pisum fulvum and Pisum elatius, and two independently domesticated taxa, Pisum abyssinicum and P. sativum (Supplementary Table 1). We selected a core diversity panel from the widely used John Innes Pisum Germplasm Collection14. This panel includes 500 accessions, selected using Corehunter 3 on the basis of prior genotyping data22,23. The selection was further supplemented with 130 lines previously chosen for other diversity studies (https://pcgin.org/), parents of mutant and mapping populations, and 67 lines comprising selected P. abyssinicum accessions and all accessions designated P. elatius, P. fulvum or Pisum humile (Fig. 1a). We conducted next-generation short-read whole-genome resequencing for these 697 Pisum accessions, generating approximately 80 Gb of clean reads with an average coverage of around 20× for each accession (Supplementary Table 2). We then built a genomic variation map encompassing 154.8 million high-quality single-nucleotide polymorphisms (SNPs) with respect to the ZW6 assembly18, as well as to the inbred pea cultivar Caméor version 1a17 (Supplementary Tables 3 and 4 and Supplementary Fig. 1). Population genomic diversity analysis reveals the pattern of accession relationships and population structure at high resolution23 (Supplementary Tables 5 and 6). We identify eight major Pisum groups (G1–G8) (Fig. 1b,d), with P. fulvum and P. abyssinicum forming distinct branches within G8. The remaining G8 accessions consist mainly of P. elatius, and G7 includes P. sativum together with other sub-types. P. sativum accessions are widely distributed across the groups G1 to G7. These accessions do not have a simple tree-like relationship, but instead form a reticulated network, reflecting the complex history of crossing and introgression between subgroups (Fig. 1e).
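The sequencing figures quoted above can be cross-checked with simple arithmetic. The sketch below assumes that the approximately 80 Gb of clean reads is a per-accession figure and that mean coverage is total sequenced bases divided by haploid genome size; the function name is illustrative and does not come from the study's pipeline.

```python
def implied_genome_size_gb(read_bases_gb: float, mean_coverage: float) -> float:
    """Genome size (Gb) implied by a read yield and a mean depth of coverage.

    Assumes coverage = sequenced bases / genome size, ignoring unmapped
    or filtered reads.
    """
    if mean_coverage <= 0:
        raise ValueError("mean_coverage must be positive")
    return read_bases_gb / mean_coverage


# Approximate values quoted in the text: ~80 Gb of clean reads at ~20x depth.
size_gb = implied_genome_size_gb(80.0, 20.0)
print(f"Implied genome size: {size_gb:.1f} Gb")
```

Under these assumptions the two quoted figures are mutually consistent with a genome of roughly 4 Gb; both inputs are rounded, so the result is only an order-of-magnitude check.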
a, Taxa types and other classifications indicated by colour on the right, including wild taxa (P. fulvum, P. elatius and other wild taxa listed in Supplementary Tables 1 and 5) and domesticated taxa (P. abyssinicum and P. sativum), further divided into cultivars, landraces and other, which mostly comprises genetic stocks. The number in brackets denotes the number of accessions for each classification. b, Admixture analyses at K = 3 (average of 5 runs), K = 5 (average of 3 runs) and K = 8 (one run that splits K = 5 groups). Accessions strongly assigned to admixture groups are colour-coded, with grey indicating admixture (see Supplementary Table 5). c, Distribution of phenotypes for Mendel’s seven pea traits, with initials labelled as follows: R (round, pale) versus W (wrinkled, black), seed shape; Y (yellow) versus G (green), cotyledon colour; P (pigmented, purple) versus W (white, pale), flower colour; I (inflated, pale) versus C (constricted, black), pod shape; G (green) versus Y (yellow), pod colour; A (axial, pale) versus T (terminal, black), flower position; and T (tall) versus D (dwarf), internode length. The bar length is proportional to internode length. d, Principal component analysis (PCA) of PLINK distance matrix for all accessions, with accessions having Q value > 0.75 colour-coded. e, Splits Tree61 analysis of accessions with Q value > 0.75, with colours matching PCA groups. f, Pisum genomic variation map across all seven chromosomes, including SNPs, insertions and deletions (<50 bp), large-scale structural variations (SVs) and the linkage disequilibrium (LD)-based haplotype map.
Within the diversity panel, we recorded phenotypic variation for each of Mendel’s seven pairs of contrasting traits (Fig. 1c and Supplementary Table 7). The association of this phenotypic diversity with genomic data (Fig. 1f and Supplementary Tables 8–12), coupled with linkage analysis from bi-parental mapping populations and gene functional investigations, elucidated the genetic architecture and allelic variants associated with each of these traits (Fig. 2 and Extended Data Fig. 1). We show that, for each trait, a small number of genetic loci contribute to the trait variation.
a, Images of the contrasting phenotypes of the seven traits. b, Manhattan plots from the GWAS showing the genomic regions with strong peaks associated with phenotypic differences of each trait as scored in this study and plotted against the ZW6 assembly. c, Gene models for R, I, A, Le, P and Fa and associated genomic regions (Gp and V), depicting the wild-type and natural mutant alleles underlying each of the seven traits. Text and illustrations marked in red indicate those identified in this study. Details are provided in the text and Supplementary Information. aa, amino acids.
Novel alleles in four characterized genes
We confirm the previous interpretations for the four characterized Mendelian genes (R, I, Le and A)2,3,4,5,6,7 and identify novel alleles (Fig. 2 and Supplementary Tables 13–16). For the R gene, explaining the round versus wrinkled seed phenotype, we detect a single strong, broad signal at the expected genomic position of PsSBEI2 and present the full-length sequence of the Ips-r element inserted in the last exon (Supplementary Fig. 2). R versus r is also the primary determinant of seed protein content variation (Supplementary Fig. 3), suggesting that altered seed starch quality and quantity indirectly affect the ratio of storage protein classes and overall protein content24. For I versus i4,5,6, which corresponds to yellow versus green cotyledons, we characterize two key alleles: i-1 (the insertion of a 5,696-nt TAR element, a Ty1-Copia long terminal repeat (LTR) retrotransposon) and i-2 (a novel 408-bp deletion in the promoter) of the Mg-dechelatase gene (Supplementary Figs. 4 and 5). The phenotypic variability associated with these alleles, driven by transcriptional differences and confounding genetic and environmental factors, is described in Supplementary Figs. 6 and 7. For the A gene, which encodes a bHLH transcription factor that is responsible for the presence or absence of anthocyanin pigmentation (A versus a)7, we present several novel alleles (Supplementary Fig. 8), including a rare intragenic suppressor mutation that restores the most abundant a allele to a functional wild-type A allele (Supplementary Fig. 9). Finally, for the Le gene, which encodes GA 3-oxidase1 (also known as GA 3β-hydroxylase), we observe five haplotypes (Supplementary Fig. 10), but the reduced-height le variants were found exclusively in haplotype 1, which carries the known G > A substitution at chromosome (chr.) 5:639,901,919.
Three previously uncharacterized traits
We investigated the gene identities and allelic variants underlying three of Mendel’s seven traits that remained poorly characterized16: ‘the difference in the colour of the unripe pod’ (Gp); ‘the difference in the shape of the ripe pod’ (conditioned by either of two loci, P or V); and ‘the difference in the position of the flowers’ (thought to be conditioned by either of two loci, Fa or Fas).
Pod colour
Although Gp is usually discussed in relation to pod colour, Mendel noted that this is just one feature of the gp mutant1. In mature flowering and fruiting plants, yellow tissues are seen in the petiole, rachis, tendrils and leaflet midribs of young leaves, as well as in the pedicel, peduncle and sepals (Fig. 3a and Supplementary Fig. 11). There are also substantial differences in chloroplast development25, as well as in the physiological and biochemical properties of pod and leaf tissues between green (Gp/Gp) and yellow-podded (gp/gp) varieties (Fig. 3b,c). Here we found that even the green leaves of gp lines show disturbed development of thylakoid membranes (Supplementary Fig. 12), which correlates with a productivity difference between Gp and gp isolines (Supplementary Fig. 13).
a, General view of near-isogenic plants (BC6 S1 generation from the cross JI0015 gp/gp x Caméor Gp/Gp) developed in this study. Pot diameter is 9 cm. b, Transmission electron microscopic (TEM) sections of pod mesocarp cells. Scale bars, 1 μm. c, TEM sections of leaflet spongy mesophyll cells. Note the poorly developed thylakoid membranes (arrows) in gp compared to Gp. Scale bars, 0.5 μm. d, An approximately 100-kb genomic deletion adjacent to ChlG is illustrated for gp compared to the reference genome ZW6 (Gp). The deletion event in gp lines is illustrated on the Gp reference genome by the dashed box. The approximately 100-kb deletion event was called according to genome assembly comparisons between JI0015 and JI2822 (ref. 62). Far right, expression of transcript types T0 and T1–T5 (Gp/Gp wild type; top right) and t0 and t1–t6 (gp/gp mutant; bottom right); numbers at ends of (or in) bars indicate the expression (in transcripts per million (TPM)). More details are provided in Supplementary Figs. 15–19 and Supplementary Tables 23–27. e, Crossing scheme for a complementation test between Caméor M4 TILLING line 411.1 carrying one lethal allele of ChlG and gp (JI0015), with the two types of expected F1 genotype. ChlGWT and ChlGW121* represent the wild-type and TILLING alleles of ChlG. WT represents the presence of the wild-type (Caméor) sequence between ChlG and the TIR-NBS-LRR gene, and Δgp represents the approximately 100-kb deletion, which co-segregates with gp. The question being addressed is whether ChlGW121*-WT complements gp (ChlGWT-Δgp). f, F1 pods segregating for green versus yellow. The number after the underscore is the plant number; the parental lines (TILL_6 het and JI0015) and wild-type Caméor are also shown. g, Codominant PCR marker test confirming that all plants presumed to be F1 are Gp/gp heterozygotes (top) and a dCAPS marker PCR test confirming that only the yellow-podded F1 plants inherited the ChlGW121* TILLING allele (bottom). 
M, DNA size marker (0.5–3 kb; 100-bp ladder lane from 1 kb to 0.5 kb and below).
All yellow-podded lines in the John Innes Pisum germplasm collection were found to be allelic to gp, confirming that there is only one known yellow pod locus; here we show that there is also only one gp allele. Genetic mapping and association genomics analysis found that all yellow-podded lines carried a deletion of around 100 kb within the GWAS interval that co-segregated with gp (Supplementary Figs. 14 and 15 and Supplementary Tables 17–22). With respect to the ZW6 assembly, this large genomic deletion removes one Ogre LTR retrotransposon and an open reading frame encoding a partial NBS-LRR gene, as well as part of exon 5 and the whole of exons 6 and 7 from a gene encoding a TIR-NBS-LRR protein (NLR (also known as Psat03G0414100)) (Fig. 3d).
This deletion is adjacent to the chlorophyll synthase gene (ChlG (also known as Psat03G0413700)), but the ChlG gene is structurally intact in all gp lines, with its encoded amino acid sequence being identical to the wild type. Mapping RNA-sequencing (RNA-seq) reads of the gp lines JI0015 and JI2366 to their matched genome assemblies predicted novel transcripts and isoform variants, including alternative splicing within the NLR–ChlG region (Fig. 3d and Supplementary Fig. 16). Transcriptional fusions, generated by intergenic splicing between the truncated NLR and ChlG, breakpoint read-through of the truncated NLR transcript, and intron read-through of ChlG transcripts were confirmed by PCR with reverse transcription and cDNA sequencing (Supplementary Fig. 17 and Supplementary Tables 23–25). It can be predicted that the encoded fusion or truncated ChlG proteins would not participate in chlorophyll biosynthesis, nor would they insert into the thylakoid membrane owing to either internalization or the lack of the transit peptide (Supplementary Fig. 18). RNA-seq and quantitative PCR (qPCR) data showed that in gp pods, functional ChlG transcript abundance was reduced to about 6% of that in the wild type (Fig. 3d). In gp mutants, the intact t0 transcript was more abundant in leaves than in pods, but the fused NLR–ChlG t1 and t2 transcript levels were similar in pods and leaves (Supplementary Fig. 19 and Supplementary Tables 26 and 27). We propose that transcriptional interference and aberrant transcripts disrupt ChlG function and chlorophyll synthesis, causing the yellowness of otherwise green tissues in the gp mutant.
To test whether Gp corresponds to ChlG, we obtained a TILLING (targeting-induced local lesions in genomes) mutant26 with a premature stop codon (W121*) in ChlG (Fig. 3e). This mutant could not be recovered as a homozygote, although the mutant allele was transmitted through both pollen and egg cells. We conclude that the homozygous mutation is embryo lethal, but not lethal in either gametophyte. We reasoned that the phenotype of a ∆gp/Gp, ChlGwt/ChlGW121* double heterozygote would be informative; if Gp did not correspond to ChlG, then the plant should be viable and green-podded, refuting our hypothesis. Conversely, if Gp did correspond to a functional ChlG, then it should be yellow-podded. Of the 16 F1 progeny that we derived from the cross between gp/gp and the TILLING mutant heterozygotes, 10 had yellow pods, and all yellow-podded F1 plants carried the ChlGW121* null allele (Fig. 3f,g). This result supported our hypothesis and confirmed that the gp mutant does not provide a fully functional ChlG.
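The segregation reported above (10 yellow-podded plants among 16 F1 progeny) can be compared with a simple Mendelian expectation. The sketch below assumes that the TILLING heterozygote transmits the ChlGW121* allele to half of its gametes, so that a 1:1 ratio of yellow to green pods is expected if the hypothesis holds; this 1:1 expectation and the chi-square goodness-of-fit test are illustrative choices, not an analysis reported in the text. With only 16 plants the chi-square approximation is rough.

```python
import math


def chi_square_one_to_one(count_a: int, count_b: int) -> tuple[float, float]:
    """Chi-square goodness-of-fit test of two classes against a 1:1 ratio.

    Returns (chi-square statistic, p-value) with 1 degree of freedom.
    For 1 df the survival function is erfc(sqrt(x / 2)).
    """
    total = count_a + count_b
    expected = total / 2
    chi2 = ((count_a - expected) ** 2 + (count_b - expected) ** 2) / expected
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value


# Counts from the text: 16 F1 progeny, 10 with yellow pods (so 6 green).
chi2, p_value = chi_square_one_to_one(10, 6)
print(f"chi2 = {chi2:.2f}, p = {p_value:.3f}")
```

Under these assumptions the observed 10:6 split does not depart significantly from 1:1, which is compatible with the double-heterozygote interpretation given in the text.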
The evidence presented above demonstrates that a ChlG deficiency mediates the mutant phenotype and establishes that ChlG is allelic to Gp. The wild-type alleles of other genes in the vicinity cannot rescue the gp mutant phenotype. While the detailed molecular and regulatory mechanism underlying this defect in chlorophyll synthesis remains to be established, our current understanding predicts that ablation of the NLR gene in a gp mutant, thereby removing the fused NLR–ChlG transcripts, would restore the wild-type green pod colour.
Pod shape
In 1537, Ruel described the difference in the shape of the ripe pod as “Valvulae etia recetes eorum quae nullo pedameto fulciuntur, ante que durescat, edendo sunt”27; roughly translated, this means “Those where the valves provide little support are to be eaten before they harden”, indicating that as today, these were vegetable peas. The lack of a sclerenchyma layer (pod parchment) is conditioned by the recessive allele at either (or both) of the genes: P and V16. It is uncertain which of these genes Mendel was discussing; he could have worked with either, or perhaps both. Our GWAS analysis identified regions that are statistically correlated with this phenotype, two of which correspond significantly to the expected positions of P and V (Fig. 2b), suggesting that both p and v alleles are relatively common. Additional signals detected may correspond to genes affecting pod wall thickness (N) or structure (Sin)28 (Extended Data Fig. 2a).
Within our 8.3 Mb GWAS peak on chromosome 1, the gene Psat01G0420500 had the greatest significance, which is consistent with a 0.92 Mb interval defined in the JI2822 × JI0816 F2 mapping population (Extended Data Fig. 2b–d). Psat01G0420500 encodes a dodeca-CLE peptide that is identical to the tracheary element differentiation inhibitory factor (TDIF) of Arabidopsis CLE4121,29 (designated here as PsCLE41; Supplementary Fig. 20). One allele within this single-exon gene, which carries an in-frame premature stop codon (R79*) upstream of the TDIF motif (Extended Data Fig. 2e–g), fully explains the p phenotype. CLE41 or CLE44 peptides are known to repress the formation of xylem30 in Arabidopsis thaliana and specify positional information that determines the rate and orientation of cell divisions in vascular tissue in conjunction with the receptor kinase PXY31. TDIF is proposed to function as a non-cell autonomous signalling peptide controlling cell fate32 and lignification33. This suggests a model for P whereby this TDIF peptide interacts with a PXY-like protein to specify pea pod sclerenchyma development via the established TDIF–PXY–WOX signalling pathway34,35.
The genomic interval corresponding to V, as identified by GWAS, spans a broad region (chromosome 6, 610–650 Mb) within which the most significant 2-Mb segment (chromosome 6, 629–631 Mb) overlaps with a 1.25-Mb interval (chromosome 6, 629.23–630.48 Mb) defined by an F2 mapping population36, narrowing the V candidate region (Extended Data Fig. 3a–e). A detailed haplotype–phenotype association study within this interval excluded several previously proposed gene candidates (Supplementary Tables 28 and 29 and Supplementary Fig. 21). We found that accessions with parchmentless pods, with or without the R79* mutation (ppvv or PPvv), are clustered into a single haplotype (Extended Data Fig. 3g) for Psat05G0804500, a homologue of the Arabidopsis MYB26 gene (referred to here as PsMYB26, which has also been designated as PsMYB73 (ref. 37); Supplementary Fig. 22). AtMYB26, which encodes a master transcription factor that directly activates its downstream NAC-domain transcription factors in A. thaliana38, has been reported to promote secondary wall thickening and lignification of the endothelium, and MYB26 is required for sclerenchyma formation in legume pods39. PsMYB26 exhibits tissue-specific expression in the pod endocarp and is the most highly differentially expressed gene between wild-type (P/P V/V) pods and mutant (P/P v/v) pods (Extended Data Fig. 3f,h and Supplementary Tables 30 and 31). RNA-seq and qPCR with reverse transcription (RT–qPCR) analyses of P/P v/v and p/p v/v mutant lines both show reduced expression of PsMYB26 and the PsNAC (Supplementary Information) compared with wild-type P/P V/V lines (Extended Data Fig. 4a–f and Supplementary Fig. 23). Furthermore, a virus-induced gene-silencing (VIGS) experiment followed by RT–qPCR analyses demonstrated the downregulation of PsMYB26 and PsNAC genes in the silenced lines (Extended Data Fig. 4g).
PsMYB26 is downregulated in p/p V/V mutant lines, suggesting that PsMYB26 and PsNAC have a role in controlling the parchmentless phenotype and supporting an epistatic relationship between PsCLE41 and PsMYB26 (Extended Data Fig. 4f–h).
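The narrowing of the V candidate region described above amounts to intersecting two genomic intervals. The sketch below illustrates that step using the chromosome 6 coordinates quoted in the text, converted to base pairs; the helper name and the tuple representation are assumptions made for illustration and are not part of the study's methods.

```python
from typing import Optional, Tuple

Interval = Tuple[int, int]  # (start_bp, end_bp) on the same chromosome


def intersect(a: Interval, b: Interval) -> Optional[Interval]:
    """Return the overlap of two intervals, or None if they do not overlap."""
    start = max(a[0], b[0])
    end = min(a[1], b[1])
    return (start, end) if start < end else None


# Coordinates quoted in the text (chromosome 6, ZW6), rounded as reported.
gwas_top_segment = (629_000_000, 631_000_000)   # most significant 2-Mb segment
f2_interval = (629_230_000, 630_480_000)        # 1.25-Mb F2 mapping interval

candidate = intersect(gwas_top_segment, f2_interval)
if candidate is not None:
    length_mb = (candidate[1] - candidate[0]) / 1_000_000
    print(f"Candidate region: {candidate[0]:,}-{candidate[1]:,} ({length_mb:.2f} Mb)")
```

Because the F2 interval lies entirely inside the top GWAS segment, the intersection is the F2 interval itself, which matches the 1.25-Mb figure given in the text.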
A 23 kb Ogre-type LTR retrotransposon insertion, located upstream of PsMYB26 and with the same polarity, is present in all P/P v/v genotypes in our diversity panel (Supplementary Fig. 24 and Supplementary Tables 32 and 33). Although PsMYB26 and the associated Ogre insertion are strong candidates for V, further work is needed to fully elucidate the detailed molecular mechanisms underlying the v allele.
Fasciation
Mendel used the name Pisum umbellatum when describing “the position of the flowers” on the stem of pea, a term previously used by Gerard40 to describe the fasciated form with an umbellate inflorescence. Fasciation in pea can vary in its severity, from stem bifurcation to an extreme clustering of flowers at the apex. There are several pea genes that, when mutant, confer a fasciated phenotype; of these, Fa (chromosome 4, linkage group IV) is considered to be the gene Mendel studied41,42.
GWAS analysis identified a broad signal (chromosome 4, 0–40 Mb) (Fig. 2b) that underwent further refinement through investigation of F2 populations. Bulked segregant analysis (BSA) narrowed this region down to a 15-Mb interval (Supplementary Fig. 25) and fine mapping led to the delineation of a 1.33-Mb candidate interval (chromosome 4, 18.18–19.51 Mb, ZW6) (Extended Data Fig. 5a–e and Supplementary Tables 18–20, 34 and 35). Within this 1.33-Mb interval, we found that all fasciated accessions were clustered together within haplotype 5 (Extended Data Fig. 5f); however, accession JI1713, also in haplotype 5, is usually not fasciated (see below). Analysis of each gene within this interval showed that only one gene, Psat04G0031700, co-segregated with fasciation. All fasciated accessions are clustered into Hap 3 of this gene, which is characterized by a 5-bp deletion in exon 2, creating a frameshift and premature stop codon, rendering the protein non-functional and explaining fasciation in fa lines (Extended Data Fig. 5g,h and Supplementary Table 36). This gene encodes a cell membrane-localized senescence-associated receptor-like kinase, a class of CLAVATA3 INSENSITIVE RECEPTOR KINASES (CIK) signalling (co-)receptor kinases, known for their role in maintaining the structure of the shoot apical meristem43 (designated as PsCIK2/3; Supplementary Fig. 26). Field phenotyping and microscopic observations of fasciated versus wild-type plants showed that the bunched apical flowers of the mutant are borne on a wider stem with additional vascular strands, derived from a broadened apical meristem (Extended Data Fig. 6a–e). Comparative transcriptome analysis in Caméor revealed high expression of PsCIK2/3, CLV3 and PsCLV2 in the stem (Extended Data Fig. 6f and Supplementary Tables 30 and 31). PsCLV1, also expressed in stems, showed its highest expression in root, whereas PsWUS is uniquely expressed in apical bud.
RT–qPCR analysis showed that PsCIK2/3 expression was significantly downregulated in both the apical bud and stem of the mutant line (JI0814, fa/fa) compared with a wild-type line (Caméor), whereas PsCLV3 was significantly ‘upregulated’ in the apical bud of the mutant line (Extended Data Fig. 6f–h). The ‘upregulation’ likely reflects the enlarged size of the fasciated apical meristem, although further investigation in pea is needed to clarify this point. Interactions between PsCIK2/3 and both PsCLV1 and PsCLV2 (Extended Data Fig. 6i,j) were revealed by yeast two-hybrid assays. We hypothesize that PsCIK2/3 functions within the conserved meristem homeostasis CLV3-WUS genetic regulatory pathway (Supplementary Fig. 27), regulating shoot apex development and meristem structure maintenance44.
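To illustrate why a 5-bp exonic deletion, such as the one described for the fa allele, is expected to truncate the encoded protein, the sketch below translates a short coding sequence before and after removing five bases. The sequence is a hypothetical toy example, not the Psat04G0031700 sequence, and the helper names are illustrative; only the standard genetic code is taken as given.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order.
BASES = "TCAG"
AMINO_ACIDS = (
    "FFLLSSSSYY**CC*W"
    "LLLLPPPPHHQQRRRR"
    "IIIMTTTTNNKKSSRR"
    "VVVVAAAADDEEGGGG"
)
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(cds: str) -> str:
    """Translate a coding sequence from its first base up to the first stop codon."""
    protein = []
    for i in range(0, len(cds) - 2, 3):
        amino_acid = CODON_TABLE[cds[i:i + 3]]
        if amino_acid == "*":
            break
        protein.append(amino_acid)
    return "".join(protein)


def delete_bases(seq: str, start: int, length: int) -> str:
    """Remove `length` bases beginning at 0-based position `start`."""
    return seq[:start] + seq[start + length:]


# Hypothetical toy coding sequence (NOT the real gene).
wild_type = "ATGGCTGAACTTGCTAAGGGTTCTGATCCTTGGTAA"
mutant = delete_bases(wild_type, start=6, length=5)  # 5 is not a multiple of 3

print("wild type:", translate(wild_type))
print("mutant:   ", translate(mutant))
```

Because five is not a multiple of three, every codon downstream of the deletion is read in a different frame, and in this toy case a stop codon appears almost immediately. The real allele would need to be checked against the actual gene model.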
A second minor GWAS signal was detected on chromosome 6, a region that was not previously associated with fasciation (Fig. 2). This genomic region was also identified in our bi-parental mapping populations (Extended Data Fig. 7a). In the JI2822 (Fa) × JI0816 (fa) F2 population, we observed that out of 397 individuals scored, 32 exhibited a wild-type phenotype but carried the recessive allele at fa (Supplementary Tables 18–20), which is consistent with the GWAS and BSA studies. This suggests a model whereby the recessive allele of a gene in this region of chr6LGII masks the fasciated phenotype. Accordingly, we designated this enigmatic second locus as modifier of fa (mfa). In this model, individuals that are recessive for both loci—the fa/fa mfa/mfa genotype—have a wild-type appearance (Extended Data Fig. 7b,c). This may also explain why some accessions, such as JI1713, that carry the 5 bp deletion in Psat04G0031700 (PsCIK2/3) usually are not fasciated, incidentally accounting for the secondary GWAS peak. Previous studies have highlighted complexity in the segregation of fasciation, with reports of both reversals of dominance and 2-factor segregation ratios (15:1), rather than the expected 1-factor segregation ratio (3:1), in F2 populations for some crosses45. These unusual features may, in part, be explained by the previously unrecognized gene Mfa. The nature of Mfa remains to be determined, but it resides within the interval ZW6 chr. 6:244,689,457-253,701,016 identified in this study.
In the F3 generation, the Fa–Mfa genetic model successfully explains the observed phenotypes on the basis of the genetic marker data (Extended Data Fig. 7). The fa/fa mfa/mfa genotypes can appear wild type, on the basis of the position of lateral inflorescence, but may also exhibit some features of fasciation, demonstrating an incomplete penetrance or variable expressivity of mfa, which requires further investigation. We propose that mfa/mfa homozygotes either delay or prevent the formation of structures that would cause a fa/fa plant to be scored as a fasciated phenotype (Supplementary Notes).
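The expected F2 class frequencies under the Fa–Mfa model can be worked out explicitly. The sketch below assumes that the two loci assort independently (they lie on different chromosomes), that the F1 is heterozygous at both loci, that the masking by mfa/mfa is fully penetrant, and that "carried the recessive allele at fa" means fa/fa homozygotes. The text notes that penetrance is in fact incomplete, so this is an idealized calculation rather than the authors' analysis.

```python
from fractions import Fraction

# Probability that an F2 plant is homozygous recessive at a single locus.
P_RECESSIVE = Fraction(1, 4)


def expected_f2_classes(n_plants: int) -> dict:
    """Expected F2 fractions and counts under the idealized Fa-Mfa model.

    Fasciated plants are fa/fa with at least one Mfa allele;
    fa/fa mfa/mfa plants are masked and scored as wild type.
    """
    masked = P_RECESSIVE * P_RECESSIVE           # fa/fa mfa/mfa
    fasciated = P_RECESSIVE * (1 - P_RECESSIVE)  # fa/fa Mfa/-
    wild_type = 1 - fasciated                    # includes the masked class
    fractions = {"wild_type": wild_type, "fasciated": fasciated, "masked": masked}
    return {name: (frac, float(frac * n_plants)) for name, frac in fractions.items()}


classes = expected_f2_classes(397)  # F2 population size quoted in the text
for name, (fraction, count) in classes.items():
    print(f"{name:>9}: {fraction} of plants, about {count:.1f} expected")
```

Under these assumptions the model predicts a 13:3 ratio of wild-type to fasciated plants, with about 25 of 397 plants in the masked fa/fa mfa/mfa class, which is of the same order as the 32 observed.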
Complex and quantitative traits
It has been argued that Mendel’s motivation in studying inheritance was related to an applied plant breeding programme46. To connect the traits and genes discovered in pea more closely with their agronomic interest and breeding value, we analysed 72 additional agriculturally relevant traits, including seed, pod, flower, leaf, root and plant architecture characteristics measured within our Pisum diversity panel (Extended Data Fig. 8a–c and Supplementary Table 37). A comprehensive GWAS established hundreds of significant marker–trait associations (Supplementary Table 38), including 14 previously cloned genes (Fig. 4a and Supplementary Tables 39 and 40), and determined the physical locations of 22 loci that were previously known from classical genetics to within an average genomic interval of 12 Mb. In addition to the four newly characterized genes associated with three of Mendel’s pea traits, our study uncovered dozens of potentially important new loci, many with breeding implications. Examples include the SDN locus (seed number per pod) and the SDY locus (seed total weight per plant), both of which relate to yield components (Extended Data Fig. 8g,i and Supplementary Table 39). The organ size locus (Os1), which controls pod width and seed weight, is validated below. These results demonstrate the high quality of our dataset and the robustness of the association genomics analyses, laying a solid foundation for future functional elucidation of pea traits and their application in breeding programmes.
a, Summary of the most significant trait–marker associations underlying a variety of agronomic traits presented as a combined Manhattan plot. Gene symbols shown in a circle correspond to Mendel’s loci. b, Manhattan plot of GWAS data for seed protein content, showing a peak overlapped with the R gene locus. c, Manhattan plot of GWAS data for the presence or absence of axial ring pigmentation, on a subset of phenotypic data excluding accessions carrying white flowers (a/a). These data were collected at Harbin (northern China, 2022). A peak at the expected genomic position of D is significantly associated with the accumulation of axillary anthocyanin, and the peak at chromosome 6 is the location of A. d, Genomic interval of D locus on chromosome 2 defined by recombinant inbred lines (RIL) mapping and GWAS analyses, further defined by bioinformatic analysis of FN mutants as a MYB gene cluster7,47,48, with the genes PsMYB104 and PsMYB106 both deleted in the d mutant line FN1218/6. The region outlined in red line indicates the approximate position of the deletion detected in FN1218/6 from mapping of sequence reads. e, Manhattan plot of GWAS data for Af/af (semi-leafless phenotype). Scale bar, 5 cm. f, Manhattan plot of GWAS data for hundred seed weight (HSW) and pod width (PW). The HSW and pod width genomic intervals span the same 8 Mb genomic region, named Organ Size 1 (PsOs1). Scale bar, 2 cm. g, Narrowed genomic interval of PsOs1 on chromosome 2 defined by two F2 mapping populations and BSA analysis (Methods) as a 1.01-Mb region encompassing 11 protein-coding genes, of which Psat02G0011300 (marked in yellow) is the most highly expressed gene. Photographs in c,e,f show the corresponding contrasting phenotypes.
Axil ring pigmentation
In his 1866 paper, Mendel noted the pleiotropic effects of the seed coat and flower colour trait (A versus a) and specifically referred to the presence or absence of axil ring pigmentation as one of these effects. The A gene regulates the presence or absence of anthocyanin pigmentation throughout the plant and a is epistatic to D, which regulates the pattern of axil ring pigmentation45. Axil pigmentation patterns in pea (Supplementary Fig. 29) are reminiscent of leaf marking in Trifolium47 and Medicago48, which are controlled by similar MYB transcription factors.
GWAS analysis revealed two strong signals associated with axil ring pigmentation (in coloured flower lines) (Fig. 4c). One of these corresponds to A (chromosome 6), whereas the other is at the expected position of D (chromosome 2), where there is a cluster of MYB genes49 (Supplementary Fig. 30 and Supplementary Table 41). The potential role of one of these MYB genes was investigated further by VIGS, which showed that the MYB-encoding gene Psat02G0138300 (also known as PsMYB16) affects the accumulation of the axil ring anthocyanin pigmentation (Supplementary Fig. 31). Furthermore, deletion of another two MYB genes at the same locus, PsMYB104 and PsMYB106 (ref. 37), in the induced fast neutron mutant line FN1218/6 resulted in the complete absence of axil ring pigmentation (Supplementary Figs. 32–34). We show that the FN1218/6 deletion is allelic to the d allele in JI0073 and JI2202 (P. abyssinicum, a taxon that lacks axil ring pigmentation) (Fig. 4d), implicating these genes as corresponding to D.
The results presented here reveal the genetic complexity of axil ring pigmentation regulated by D. There are multiple alleles of D within the MYB gene cluster, and many spontaneous conversions from one allelic form to another have been reported45, suggesting that it is the combination of alleles at several of these MYB genes that determines the presence, absence or pattern of this pigmentation. Both a and a2 are epistatic to D, and we can postulate that the MYB genes involved in the D and d phenotypes are part of a MYB (D)–bHLH (A)–WD40 (A2) complex7,50.
Organ size
Mendel examined the segregation of traits that have clear alternative states but noted that seed size (among other traits) differed between his parental lines, although he considered that this quantitative difference was not suitable for his analyses. Seed size in pea defines some market classes, such as the ‘marrowfat’ types, which have large, irregular-shaped seeds and a high protein content. Seed size has been the subject of quantitative trait locus (QTL) analyses51, and we have investigated this further within our diversity panel.
We found a significant locus on chromosome 2 that influences both pod width and hundred seed weight (HSW) (Fig. 4f and Supplementary Fig. 35), which is in a similar location to a previously described seed size QTL in Medicago and pea52. We designated this locus as PsOs1. Combining fine mapping and differential gene expression analysis, we identified Psat02G0011300 as a candidate gene for PsOs1 (Fig. 4g, Supplementary Figs. 36–38 and Supplementary Tables 42 and 43). This gene encodes a SIAMESE-related protein (SIM or SMR), a cyclin-dependent protein kinase inhibitor (CKI) that influences cell division and enlargement during the cell cycle and consequently alters plant cell size53. VIGS-based functional validation, coupled with a transgenic overexpression line in Arabidopsis (Supplementary Figs. 39–42), demonstrates the key role of PsOs1 in regulating seed weight and pod width.
Architectural innovation in pea breeding
The adoption of afila types represents the most important innovation in modern pea breeding54. These types include the ‘leafless’ (af/af st/st) and ‘semi-leafless’ (af/af St/St) varieties54. Figure 4e shows a strong GWAS signal for this character at the expected position of Af (afila) at the end of chromosome 2. In addition to the five haplotypes (Hap1–5) corresponding to the wild-type phenotypes (Af), our analysis identified three deletion haplotypes (Hap6, Hap7 and Hap8), which correspond to hap_4, hap_2 and hap_3, respectively54 (Supplementary Figs. 43 and 44), all of which remove the genes PALM1a and PALM1b, associated with the recessive allele (af). These deletion haplotypes are the most abundant and appear in similar relative proportions here and in the previous analysis54.
The presence of multiple af haplotypes confirmed independent origins of the afila alleles. Here, we refined the deletion end points at a nucleotide level resolution for the 30 afila lines showing that these haplotypes are not further differentiated by their deletion end points; the gene content of the deleted segments is thus more precisely defined (Supplementary Tables 44 and 45). This is important because afila alleles delete different combinations of flanking genes, notably PsNaOD1, PsNaOD2 and PsNaOD3, which affect seed yield and seed weight55.
Discussion
Despite the clarity of his 1866 paper, there is some dispute about what Mendel did in his detailed research. It has been argued that Mendel was not primarily interested in inheritance56,57, or that he had a pre-formed theory of inheritance that he sought to demonstrate, even to the extent of fabricating data to conform with his theory58. These views are mutually exclusive, and we reject them both46,59.
We have shown a remarkable diversity of mutational mechanisms in the genes associated with the seven pairs of contrasting traits that Mendel studied. There are several point mutations in a, one affecting the pattern of splicing and two different single nucleotide insertions affecting the reading frame, whereas le corresponds to an amino acid substitution caused by a missense mutation. There is a single nucleotide substitution that generates a premature stop codon in the CLE41 gene at the P locus, and insertion events of class I (i and v) and class II (r) transposons have been observed2,5. We have also uncovered additional novel types of variation, corresponding to DNA deletions that lead to loss of function, such as the remarkable case of gp, with a large DNA deletion upstream of ChlG, a promoter deletion in the i-2 allele, the fa allele, with a small deletion within an exon, and new alleles of a, with one or more deleted exons. An unexpected finding in this study was the existence of an intragenic suppressor allele of A that implies that the a allele was in existence long enough for this unlikely second site mutation to have occurred. This intragenic suppressor mutation corresponds to a shift in the position of an intron, which is rarely identified, even in inter-specific comparisons of many genes60.
The biological processes that these genes represent range from variation in the activity of enzymes in primary metabolism (r, i and gp), hormone interconversion (le), transcription factors for regulation of secondary metabolism (a) and cell wall thickening (v), the regulation of cell fate by a small signalling peptide (p), and a cell membrane (co-)receptor kinase (fa). The two green-versus-yellow phenotypic differences (cotyledon or pod colour) correspond to disruption in either the final step of chlorophyll synthesis (gp) or the first step of chlorophyll degradation (i). This difference between synthesis versus degradation explains which phenotype, green or yellow, corresponds to the dominant versus recessive allele. Although the elucidation of the biochemical and regulatory mechanisms underlying these genes is beyond the scope of this study, the genomic and genetic discoveries presented here are essential for advancing our understanding of Mendel’s pea traits. For example, on the basis of the discovery of the fused aberrant transcripts arising from the NLR–CHLG genomic region, we propose that transcript stability is altered by transcriptional interference during chlorophyll synthesis or through a nonsense-mediated decay pathway, leading to an increased degradation rate of ChlG transcripts.
A longstanding question in relation to Mendel’s pea work was whether the phenotypic variation he described corresponded to rare variants of genes that explain only a minor proportion of the genetic variation for that trait. Our GWAS analyses emphatically show that this is not the case. Indeed, in one case in which genetic heterogeneity was expected (fasciation), the variation that we detected corresponded to a single genetic locus (Fa), albeit with a previously unsuspected modifier locus (Mfa). There are three caveats to this claim. The first is that the parchmentless pod trait is (as has long been known) determined by either p or v, or the combination of these two distinct and independent genetic loci. A second caveat is that for the green versus yellow cotyledon phenotype, there are clearly multiple GWAS peaks, albeit with lower significance than that of I. This probably reflects the influence of the seed maturation process on the penetrance of this phenotype, as was noted by Mendel in his 1866 paper. Finally, we observed an unusual feature of the GWAS peak corresponding to Gp, where there is a broad shoulder corresponding to most of the short arm of this chromosome. The reason for this is unknown.
This raises two general questions regarding GWAS analyses in defining genetic variation underlying traits: whether broad GWAS peaks provide sufficient resolution to identify a manageable number of candidate genes; and how the positions of significant GWAS signals correspond to previously described genetic variants. We have seen that for the seven Mendelian traits (and D), the GWAS peaks are significant, and all correspond well to the expected genetic loci. Furthermore, in our broad survey of many other agronomic traits for genotype–phenotype associations, nearly all the GWAS peaks correspond to the location of previously described genetic loci. This demonstrates that pea is an excellent model system for association genomics studies and GWAS is a suitable first step for trait–gene discovery and functional elucidation. The reliability of GWAS in pea is partly due to the fact that an unusually high proportion of pea genes are single copy17, even though the pea genome is large and gene density is low throughout the chromosomes, maintaining a strong extended linkage disequilibrium. Presumably this is in part because of the strict inbreeding habit of pea.
We have shown how complementary approaches can narrow down these intervals to identify candidate genes. For the genes characterized in this study, GWAS intervals alone were insufficient to delineate small sets of candidate genes. Additional resources and experimental approaches are necessary, such as induced mutants and specific bi-parental mapping populations, comparative transcriptomes, and genetic functional validation. Future work will require innovative approaches and new technologies, such as long-read DNA and RNA sequencing, which could address the limitations of short-read sequencing technologies and are crucial for investigating how large-scale structural variations and transposons contribute to phenotypic changes. Furthermore, a mature genetic transformation system and targeted gene editing in pea are urgently needed. These tools would enable a detailed examination of the biochemical genetics underlying complex mutations, such as the aberrant transcripts produced at the gp locus, the MYB gene clusters at the D locus, and the LTR retrotransposon insertion upstream of the gene encoding the transcription factor MYB26 (V locus).
We leveraged the rich reservoir of genetic diversity from a global Pisum diversity panel and established a high-quality genomic and phenotypic variation map. The large numbers of genotype–phenotype associations that we have found represent the beginning of a new phase of systematic trait dissection at the molecular level in pea. This genotype-to-phenotype strategy can be expanded, with the identified genes and alleles projected into other crops, particularly legumes. This study is essential for pea basic research, education in biology and genetics, and breeding practices.
Methods
Plant materials and methods
Germplasm panel
A total of 697 accessions, maximizing genetic diversity, were selected from the JI Pisum Germplasm Collection for this study (Supplementary Table 1 and https://www.seedstor.ac.uk/). These germplasm accessions were introduced in 2019 to the Agricultural Genomics Institute at Shenzhen, Chinese Academy of Agricultural Sciences, China, where they are grown and investigated annually.
DNA extraction for whole-genome resequencing
Genomic DNA was extracted from approximately 50 mg leaf tissue of 3-week-old seedlings. Extraction used the oKtopure system (LGC Biosearch Technology) following tissue desiccation with silica for 48 h. A bespoke protocol was used with the following volumes per sample: 250 µl lysis buffer, 170 µl binding buffer, 20 µl sbeadex™ suspension, 300 µl PN1 wash buffer, 300 µl PN2 wash buffer, 300 µl PN2 wash buffer (×3 wash cycles) and 75 µl final elution buffer. For each accession, a minimum of 6 μg of genomic DNA was used to construct a 150-bp paired-end sequencing library with an insert size of 500 bp, following the manufacturer’s protocols (employing PCR-free methods). The library was subsequently sequenced on the DNBSEQ Platform at BGI-Shenzhen, resulting in ~80 Gb clean reads with a coverage of ~20× for each accession.
Phenotyping
DNA was extracted from a single plant whose seed was bulked up for progeny phenotyping. The diversity panel was planted in three different sites, Norwich, UK (52.62° N, 1.28° E), Shenzhen (Southern China, 22.61° N, 114.51° E) and Harbin (northern China, 45.86° N, 126.83° E). In China, four rounds of phenotyping were conducted. Specific subsets of accessions and some F2 populations were grown indoors in the greenhouse of Shenzhen Agricultural Field Farm, with 16 h of light/8 h of darkness. Phenotypes collected at the three stations (2020–2023) and a historical JIC phenotype dataset were curated in Seedstor (https://www.seedstor.ac.uk/). In Shenzhen, peas were planted in winter (October) and harvested in March the following year, whereas in Norwich and Harbin they were planted in spring (March to April) and harvested in August to October of the same year.
For the phenotyping of pod colour (green versus yellow-podded lines), a field trial of three 1 m² microplots of 100 seeds each was sown in spring 2023, in which BC6 S3 Gp/Gp and gp/gp seeds (selfed seed of S2 homozygotes) were mixed in a 1:1 ratio and sown at random in each plot. At the pod filling stage, the Gp plants were tagged and at plot harvest seed was collected from individual plants to determine the yield of Gp and gp homozygotes. Seeds were weighed and counted on a Data Count R25+ machine (https://data-technologies.com/). Pod length and width were measured on 25 randomly selected pods. For the phenotyping of organ size, pod width (PW) and HSW were measured in mature pods of the F2 and F2:3 populations after harvest. In the F2 populations, PW was assessed using 15 representative pods, divided into 3 groups of 5, with the total width of each group measured sequentially. For the F2:3 populations, PW was determined using 5 representative pods, with their total width measured in a similar manner. HSW was calculated by randomly weighing 100 seeds from each accession and repeating the process 3 times to obtain an average weight for each accession. Other more specific phenotypes were collected as described in Supplementary Table 37 and in line with published descriptors (https://www.seedstor.ac.uk/search-phenotypes.php).
Construction of the pea genomic variation map
Read mapping, SNP calling and SNP annotation
The trimmed clean reads of each accession were aligned against the reference genome of pea (P. sativum) cultivar, ZW6 (ref. 18) and Caméor v.1.0 (ref. 17), using BWA-MEM (v.0.7.17) with default parameters63. Unmapped, non-unique and duplicated reads were filtered out using SAMtools64,65 (v.1.9) and Picard (v.2.20.3-SNAPSHOT) before variants were called by a standard pipeline of Genome Analysis Toolkit (GATK65 v.4.1.2) and Sentieon66 (v.202112.01). SNPs were further filtered to remove low-quality variants defined as (1) SNPs with more than two alleles; (2) SNPs with QD < 2.0, FS > 60.0, MQ < 40.0, SOR > 3.0, MQRankSum < −12.5, ReadPosRankSum < −8.0; (3) SNPs with observed heterozygosity (Hobs) exceeding the maximum calculated value (Hobs_max) based on the inbreeding coefficient (F), where F was calculated as 1 − (Hobs/Hexp), with Hexp defined as 2p(1 − p) using the frequency of the non-reference allele, and Hobs_max was determined as 10 × (1 − Fmedian) × Hexp for variants with F > 0 and minor allele frequency (MAF) > 0.05; (4) SNPs with missing rate >20% and MAF < 0.01. SnpEff67 (v.4.3t) was used to annotate the SNPs, and functional significance was then categorized on the basis of their positions with respect to genes (intergenic regions, exons, introns, splicing sites, untranslated regions, upstream and downstream regions) and mutation consequences (missense, start codon gain or loss, stop codon gain or loss and splicing mutations).
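As an illustration of filtering criterion (3), the excess-heterozygosity check can be sketched in Python. This is a minimal sketch and not the pipeline used in the study: the function names and the input layout (one pair of observed heterozygosity and non-reference allele frequency per SNP) are hypothetical, and it assumes that Fmedian is the median of F over SNPs with F > 0 and MAF > 0.05, which is one reading of the description above.

```python
from statistics import median


def expected_heterozygosity(p):
    """Hexp = 2p(1 - p), with p the frequency of the non-reference allele."""
    return 2.0 * p * (1.0 - p)


def flag_excess_heterozygosity(sites):
    """Return one boolean per SNP; True means Hobs exceeds Hobs_max.

    `sites` is a list of (h_obs, p) tuples (hypothetical input layout).
    """
    stats = []
    for h_obs, p in sites:
        h_exp = expected_heterozygosity(p)
        # F = 1 - Hobs/Hexp; undefined for monomorphic sites (Hexp = 0).
        f = 1.0 - h_obs / h_exp if h_exp > 0 else None
        maf = min(p, 1.0 - p)
        stats.append((h_obs, h_exp, f, maf))

    # Assumption: the median F is taken over SNPs with F > 0 and MAF > 0.05.
    reference = [f for _, _, f, maf in stats
                 if f is not None and f > 0 and maf > 0.05]
    if not reference:
        raise ValueError("no SNP satisfies F > 0 and MAF > 0.05")
    f_median = median(reference)

    # Hobs_max = 10 * (1 - F_median) * Hexp, evaluated per SNP.
    return [h_obs > 10.0 * (1.0 - f_median) * h_exp
            for h_obs, h_exp, _, _ in stats]


if __name__ == "__main__":
    example = [(0.01, 0.3), (0.02, 0.5), (0.45, 0.4), (0.005, 0.2)]
    print(flag_excess_heterozygosity(example))
```

In a strongly inbred panel the median F is close to 1, so Hobs_max is a small fraction of Hexp and SNPs with many heterozygous calls (often a sign of collapsed paralogues) are flagged.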
Identification of indels, gene PAV and gene CNVs, and SV
Small insertion–deletion mutants (indels; ≤50 bp) were called using GATK (v.4.1.2) and filtered following the criteria: QD < 2.0 || low_QD || FS > 200.0 || high_FS || ReadPosRankSum < −20.0 || low_ReadPosRankSum before they were annotated using SnpEff (v.4.3t). Read depth variation from read mapping analysis was used to identify gene presence and absence variation (PAV) and gene copy number variation (CNV) through normalization and correction in statistical analyses, following five steps: (1), mapped read depth at each gene was counted for each accession; (2), a correction for read depth variation (RDV) was applied, accounting for highly similar genes through all-versus-all coding sequence (CDS) alignment using BLASTN. Recently duplicated genes were collapsed into representative genes to minimize depth bias, which were further normalized by dividing the corrected read depth of the gene by the average sequencing depth of the accession; (3) the distribution of read depth versus GC content was used to correct read depth bias for each gene resulting from differential GC contents; (4), read depth variation was corrected for genomic regions with insertions or deletions in the genome reference; (5), subspecies-unique and shared CNVs were characterized by calculating the number of accessions with different copy numbers for each gene within each subspecies.
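The depth normalization in step (2) and the per-subspecies tally in step (5) can be sketched as follows. This is a simplified illustration under stated assumptions, not the procedure used in the study: the function names, the dictionary layout and the rounding of normalized depth to an integer copy number are all hypothetical, and the similarity, GC and reference-indel corrections of steps (2) to (4) are omitted.

```python
from collections import Counter, defaultdict


def normalized_depth(gene_depth, accession_mean_depth):
    """Gene read depth divided by the accession's average sequencing depth."""
    return gene_depth / accession_mean_depth


def copy_number_counts(depths, mean_depths, subspecies):
    """Tally accessions per copy number for each (subspecies, gene) pair.

    depths:      {accession: {gene: read depth}}      (hypothetical layout)
    mean_depths: {accession: average sequencing depth}
    subspecies:  {accession: subspecies label}
    """
    counts = defaultdict(Counter)
    for accession, genes in depths.items():
        for gene, depth in genes.items():
            ratio = normalized_depth(depth, mean_depths[accession])
            # Assumption: copy number is the nearest integer to the ratio,
            # so a ratio near 0 is read as absence (PAV).
            copy_number = round(ratio)
            counts[(subspecies[accession], gene)][copy_number] += 1
    return counts


if __name__ == "__main__":
    depths = {"acc1": {"g1": 20.0, "g2": 0.5},
              "acc2": {"g1": 41.0, "g2": 19.0}}
    mean_depths = {"acc1": 20.0, "acc2": 20.0}
    subspecies = {"acc1": "subsp_A", "acc2": "subsp_A"}
    for key, tally in copy_number_counts(depths, mean_depths, subspecies).items():
        print(key, dict(tally))
```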
Different categories of structural variants (SVs: duplication, inversion, translocation and large-scale deletion or insertion) were detected on the basis of read mapping (read depth and read pair relationships) on PCR-duplicate-marked bam files using Delly (v 0.8.7) with default parameters; a summary of SVs identified is given in Supplementary Table 11.
Linkage disequilibrium analysis and pea haplotype map
A two-step LD pruning process was implemented to generate a high-quality core SNP dataset for the construction of a haplotype map68. Initially, SNPs were pruned on the basis of linkage disequilibrium (LD) using PLINK69, with a window size of 10 kb, a window step of one SNP, and an r² threshold of 0.8. A second round of LD pruning was conducted with a window size of 50 kb, a window step of one SNP, and the same r² threshold of 0.8. For population LD-based haplotype analysis, the filtered SNPs were phased using Beagle (v.21Apr21.304)70. Subsequently, haplotype blocks were delineated utilizing PLINK with specific parameters (--blocks no-pheno-req --blocks-max-kb 1000 --geno 0.1 --blocks-min-maf 0.05). To merge adjacent blocks maintaining significant LD, D’ statistic values were calculated between all SNPs of consecutive blocks. If the lower quartile (Q1) exceeded 0.98, the adjacent blocks were merged. After filtering for the inbreeding coefficient, HAPPE71 was employed to identify haplotype clusters (haplogroups) for each block.
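The block-merging rule can be illustrated with a short Python sketch. It is not the code used in the study: the function names are hypothetical, phased haplotypes are assumed to be encoded as 0/1 alleles, the absolute value of D’ is used, and Q1 is computed with the "inclusive" quantile method of the standard library, which is one of several possible conventions.

```python
from itertools import product
from statistics import quantiles


def d_prime(snp_a, snp_b):
    """|D'| between two biallelic SNPs given phased 0/1 alleles per haplotype."""
    n = len(snp_a)
    p_a = sum(snp_a) / n
    p_b = sum(snp_b) / n
    p_ab = sum(1 for a, b in zip(snp_a, snp_b) if a == 1 and b == 1) / n
    d = p_ab - p_a * p_b
    # D is scaled by its maximum attainable magnitude given the allele frequencies.
    if d >= 0:
        d_max = min(p_a * (1 - p_b), (1 - p_a) * p_b)
    else:
        d_max = min(p_a * p_b, (1 - p_a) * (1 - p_b))
    return abs(d) / d_max if d_max > 0 else 0.0


def should_merge(block1, block2, q1_threshold=0.98):
    """True if the lower quartile of D' over all cross-block SNP pairs exceeds the threshold."""
    values = [d_prime(a, b) for a, b in product(block1, block2)]
    if len(values) == 1:
        q1 = values[0]
    else:
        q1 = quantiles(values, n=4, method="inclusive")[0]
    return q1 > q1_threshold


if __name__ == "__main__":
    linked = [0, 0, 0, 0, 1, 1, 1, 1]
    unlinked = [0, 1, 0, 1, 0, 1, 0, 1]
    print(should_merge([linked, linked], [linked, linked]))
    print(should_merge([linked, linked], [unlinked, unlinked]))
```

Using the lower quartile rather than the mean means that at least three quarters of the cross-block SNP pairs must be in near-complete LD before two blocks are joined.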
Construction of mapping and validation populations
JI2822 × JI0816 F2 population
Lines JI0816 and JI2822 (Supplementary Table 18), both of short stature, are maintained in the John Innes Pisum germplasm collection (https://www.seedstor.ac.uk/). JI0816, also known as WBH 1185, has pink flowers, a fasciated stem and yellow pods lacking pod parchment, corresponding to the mutant alleles b, fa, gp and p, respectively. JI2822, a recombinant inbred line derived from the cross JI0015 × JI0399, is wild type at these four loci. JI0015 and JI0816 share the gp allele, indicating that these two lines had a common parent; therefore segments of the genetic map are devoid of segregating alleles. 1,000 F2 seeds from 9 F1 plants (JI2822 × JI0816) were sown at the JIC field station in spring 2022. DNA preps from 942 plants were prepared from individual leaflets using the Qiagen DNeasy protocol (https://www.qiagen.com). Of these, 405 were genotyped using an Axiom SNP array as described49. The phenotypic and genotypic data are available in Supplementary Tables 18–20, and the sequences corresponding to the Axiom markers are available in supplementary table 3 of Ellis et al.49.
JI0015 × JI0399 and JI2822 × JI2833
Three populations have been used for mapping Gp. The first to be used was the previously described recombinant inbred population JI0015 × JI0399 (Supplementary Table 21), later genotyped by Neogen, using an Infinium array as described previously51. The second was an F2 population derived from a cross between two of these RILs (JI2822 Gp/Gp and JI2833 gp/gp), which was screened using PCR for markers already mapped in JI0015 × JI0399 in order to identify informative individuals (Supplementary Table 22). These, together with selected RILs with informative recombination events were genotyped with Axiom markers as described elsewhere49,65. Gp also segregates in the JI2822 × JI0816 F2 population as described above. The marker data are available in the supplementary file: Gp mapping in JI0015 × JI0399 (Supplementary Tables 21 and 22).
Other F2 mapping populations and BSA
We selected parental lines with contrasting pairs of traits to map genetic loci of interest in F2 populations using mapping by sequencing72 of bulked segregants. For the genetic loci controlling the uncharacterized Mendelian traits, namely flower position (axial versus terminal), pod colour (yellow versus green) and pod shape (inflated versus constricted), crosses were made between Caméor (axial) × JI0814 (fasciated) and JI1995 (green pod) × JI2366 (yellow pod). F2 populations for the P/V loci (pod shape) were derived from the cross between JI0074 (P/P v/v) as the male parent and JI1995 (P/P V/V) as the female parent, and the cross between JI2822 × JI0816. F2 populations for the D locus (one (Dco) or two (Dw) axil rings of anthocyanin pigmentation) were derived from three crosses, with JI0191 (Dw), JI0794 (Dw) and JI1669 (Dw) as male parents and JI0328 (Dco) as the female parent. F2 populations for the Fn/Fna loci (flower number per node, fpn) were derived from four crosses, with JI0441 (1fpn), JI2410 (3fpn), JI0745 (2fpn) and JI0746 (3fpn) as male parents and JI1995 (2fpn) as the female parent. The marker and BSA analyses of the F2 populations were as described36.
Approximately 300 plants from the F2 population of each of these crosses were planted in Shenzhen, China. Wild-type and mutant bulked DNA samples were prepared by mixing equal amounts of DNA from 30 accessions with the dominant and recessive phenotypes, respectively. DNA was isolated from fresh leaves using the CTAB method73. Genome sequences at 50× depth were generated for each of the parents and the bulked samples. Short reads were aligned against the ZW6 reference genome using BWA-MEM (v.0.7.17) and SNPs were identified using SAMtools (v.1.9). The variation dataset was analysed using the G′ value method of the QTLseqr package (v.0.7.5.2).
Genetic mapping of Gp
Green versus yellow pod colour segregates in the recombinant inbred (RIL) population derived from the cross between JI0015 (gp/gp) and JI0399 (Gp/Gp). The JI0015 × JI0399 RIL population comprises 90 recombinant inbred lines, which, together with their parents, were genotyped using an Infinium array (Neogen) that detected 13,204 biallelic SNPs. This enabled us to position 5,209 PsCam markers on a genetic map (JI0015 × JI0399) and place Gp between the markers PsCam005046 and PsCam056084 (and their co-segregating markers). Additional mapping was undertaken using an Axiom SNP array with 84,691 features49 to genotype selected F2 progeny of a cross between JI2822 (Gp) and JI2833 (gp), together with RILs from JI0015 × JI0399 known to have recombination events at informative locations. JI2822 and JI2833 are both RILs from the JI0015 × JI0399 population. With respect to the ZW6 assembly18, this placed Gp between the Axiom markers AX-183865165 (chr. 3:320968993) and AX-183571028 (chr. 3:325580858) (JI0015 × JI0399). Analysis of an F2 population derived from crosses between JI2822 (Gp) and JI0816 (gp) placed Gp between the Axiom markers AX-183571050 (chr. 3:321020350) and AX-183879077 (chr. 3:324762848; Supplementary Table 18).
We performed several association genomics analyses for pod colour, including SNP-based GWAS, LD-based haplotype GWAS, k-mer-derived IBS-based haplotype GWAS and SV-based GWAS. All resulted in consistent and significant single GWAS peaks for pod colour located in the expected position of Gp, as seen in Manhattan plots (Supplementary Fig. 14).
Allelism tests for gp
Crosses were made between pairs of yellow-podded lines in the JIC germplasm collection (Supplementary Table 18). Seed and vegetative phenotypes were used to identify F1 progeny plants, and those accessions allelic, or non-allelic, to gp were identified by their yellow or green pod colour, respectively.
Near-isogenic lines for Gp versus gp
The JI0015 gp allele was introgressed into the Caméor background by sequential back-crossing and F1 progeny testing using a codominant PCR marker assay with one forward (25994_F) and two reverse (25994_15R and 25994_399R) primers (Supplementary Table 18). Gp (596 bp) and gp (688 bp) alleles were distinguished in a 35 cycle, 10s–30s–60s Touchdown PCR reaction that reduces the initial 62 °C annealing temperature to 50 °C in the first 10 cycles.
Marker development and QTL mapping for PsOs1
The organ size-related quantitative trait locus (PsOs1) was fine-mapped using 21 Kompetitive Allele Specific PCR (KASP) markers for SNPs distinguishing accessions JI0074 and JI1995 after whole-genome resequencing in the candidate region. Each KASP marker was designed with two allele-specific forward primers (Supplementary Table 47) and one common reverse primer, on the basis of 200-bp sequences upstream and downstream of target genic SNPs, following the standards of LGC Biosearch Technologies. The genetic linkage map was constructed using JoinMap v.4.0 software. Windows QTL Cartographer v.2.5 software facilitated inclusive composite interval mapping (ICIM) for identifying and analysing QTLs. A logarithm of odds (LOD) score of ≥3.0 was deemed indicative of a QTL.
Genome-wide association study
The multi-location and multi-season phenotypic dataset was used to perform GWASs with the SNP matrix using GEMMA (v.0.98.1)74, and employing the following parameters (gemma-0.98.1-linux-static -miss 0.9 -gk -o kinship.txt and gemma-0.98.1-linux-static -miss 0.9 -lmm -k kinship.txt). The structural variation matrix was used to test for association with phenotypic variation for each of the selected traits using the same parameters as above. The haplotype map was used to test for association with phenotypic variation for each of the selected traits using RTM-GWAS75 with the following parameters (rtm-gwas-gsc --vcf in.vcf --out out.matrix and rtm-gwas-assoc --vcf in.vcf --covar out.matrix.evec --no-gxe). We used GEMMA’s Wald tests and directly visualized the resulting P values as –log10-transformed values in Manhattan plots. In-house R scripts were employed for data plotting. To identify SNPs of interest, we applied a Bonferroni correction to the significance threshold (α = 0.1) based on the total number of tested variants (9,214,461 SNPs). This yielded a threshold of –log10(0.1/9,214,461), and any SNP surpassing this cutoff was considered noteworthy and highlighted in the Manhattan plots.
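For orientation, the Bonferroni-corrected cutoff can be evaluated numerically. The short Python sketch below uses only the values stated above (α = 0.1 and 9,214,461 tested SNPs); the function names are hypothetical and the snippet is an illustration, not part of the analysis scripts.

```python
import math


def bonferroni_neglog10(alpha, n_tests):
    """Genome-wide cutoff on the -log10(P) scale: -log10(alpha / n_tests)."""
    return -math.log10(alpha / n_tests)


def is_significant(p_value, threshold):
    """True if -log10(P) surpasses the cutoff."""
    return -math.log10(p_value) > threshold


# Values taken from the text: alpha = 0.1 and 9,214,461 tested SNPs.
threshold = bonferroni_neglog10(0.1, 9_214_461)

if __name__ == "__main__":
    print(f"-log10(P) cutoff: {threshold:.2f}")
    print(is_significant(1e-9, threshold), is_significant(1e-7, threshold))
```

The cutoff evaluates to approximately 7.96 on the –log10(P) scale, which corresponds to a P value of roughly 1.1 × 10⁻⁸.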
Orthologues and gene family analysis
Phylogenetic analyses were conducted on key gene families in this study, such as MYB, CLE and CIK/SERK, following a consistent workflow. Relevant A. thaliana orthologous genes containing the required domains were retrieved from TAIR (https://www.arabidopsis.org), and profile hidden Markov models (HMMs) were constructed using HMMER (v.3.1b1) on the basis of multiple sequence alignments generated by MAFFT (v.7.475). These HMMs were then employed to identify putative homologues in the pea (P. sativum) ZW6 genome. Multiple sequence alignments for each family were trimmed with trimAl (v.1.5.rev.0), and the best-fit amino acid substitution models were selected using ModelTest-NG (v.0.1.7). Phylogenetic trees were constructed by IQ-TREE (v.2.1.2) with 1,000 ultrafast bootstrap replicates. For synteny analysis for each gene family, we used OrthoFinder (v.2.5.4) to identify orthologous clusters among pea and related legumes (for example, Vicia sativa, Medicago truncatula, Cicer arietinum, Lotus japonicus, Vigna radiata, Phaseolus vulgaris and Glycine max), and visualized collinearity blocks with JCVI (v.1.2.7).
Gene functional validation experiments
Fast neutron mutants
Several Fast Neutron mutants from a population described by Domoney et al.76 were included in this project. These were: FN1453/1 sil-like; FN1091/4 lacking axil ring pigmentation, allelic to d; FN1218/6 lacking axil ring pigmentation, allelic to d; FN2026/7 coch2 candidate; FN2073/5 lacking axil ring pigmentation, not allelic to d; and FN2076/5 VicA FN deletion line.
Crosses were made between pairs of lines lacking axil ring pigmentation (Supplementary Fig. 32) to test for complementation. Where possible, vegetative phenotypes were used to identify F1 progeny plants, and those accessions allelic, or non-allelic, to d were identified by the absence, or presence of pigmented axil rings, respectively.
Complementation test
A reverse genetics screen for the ChlG gene in pea was carried out in an ethyl methanesulfonate-mutagenized targeting induced local lesions in genomes (TILLING) population in the Caméor background26. Line 411.1, with a G>A mutation 1,900 bp after the ATG, resulting in a W121* nonsense mutation, was identified. Eight M4 seeds were sown and seedlings were genotyped with a cleaved amplified polymorphic sequence (CAPS) marker (Supplementary Table 47). No seedlings were homozygous mutants (signified by a single undigested 1,125-bp band), 6 were heterozygous (signified by 3 bands of sizes 1,125 bp, 699 bp and 426 bp) and 2 were homozygous wild type (signified by 2 bands of sizes 699 bp and 426 bp). A complementation test was carried out by crossing heterozygous seedlings with a homozygous JI0015 gp mutant (13 crosses with male JI0015 and F1 identified by long internodes, and 3 crosses with female JI0015 and F1 identified by yellow cotyledons). Nine out of 16 F1 progeny plants had yellow pods, indicating non-complementation.
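The band patterns used to score the CAPS marker can be expressed as a small lookup. The following Python sketch encodes only the fragment sizes given above; the function name and the handling of unexpected patterns are hypothetical additions for illustration.

```python
UNDIGESTED_BP = 1125        # mutant allele: restriction site lost, amplicon uncut
DIGESTED_BP = (699, 426)    # wild-type allele: amplicon cut into two fragments

# The two digestion products should add up to the undigested amplicon.
assert sum(DIGESTED_BP) == UNDIGESTED_BP


def caps_genotype(band_sizes_bp):
    """Classify a seedling from the set of band sizes (bp) seen on the gel."""
    observed = frozenset(band_sizes_bp)
    if observed == {UNDIGESTED_BP}:
        return "homozygous mutant"
    if observed == set(DIGESTED_BP):
        return "homozygous wild type"
    if observed == {UNDIGESTED_BP, *DIGESTED_BP}:
        return "heterozygous"
    # Hypothetical fallback for partial digests or unexpected bands.
    return "ambiguous"


if __name__ == "__main__":
    for bands in ([1125], [699, 426], [1125, 699, 426], [699]):
        print(bands, "->", caps_genotype(bands))
```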
Virus-induced gene silencing
VIGS in peas was performed using a published methodology as described previously77. To target genes of interest, a 200–500 bp fragment from the CDS region of each gene was amplified. The primers for VIGS constructs, including VIGS-PsChlG, VIGS-PsOs1, VIGS-PsMYB26 and PsMYB16, are provided in Supplementary Table 47. SpeI/XbaI and EcoRI were used to linearize the pCAPE2 vector, which was kindly provided by Li et al.78, and corresponding fragments of gene targets were ligated into the vector to construct the vectors for VIGS assays. For VIGS-PsChlG, the negative control vector, pCAPE2-Con, was constructed in the same way by replacing the PsChlG fragment in pCAPE2-PsChlG with a 529-bp insert derived from a cDNA fragment of Bean Yellow Mosaic Virus (GenBank accession AJ622899). The positive control vector, pCAPE2-PDS, targeting the phytoene desaturase gene, was also provided by Li et al.78. These vectors were transferred into Agrobacterium tumefaciens (GV3101) (Shanghai Weidi Biotechnology) and VIGS assays were carried out following the protocol described by Constantin et al.79. In brief, Agrobacterium strains carrying these vectors were shaken separately until OD600 = 1.2, followed by the collection and resuspension of the bacteria in injection buffer (NaCl: 10 mM, CaCl2: 10 mM, acetosyringone: 0.1 mM) to a concentration of OD600 = 1.2. After resting for 2–3 h, the solutions of pCAPE2-target gene, pCAPE2-PDS (positive control) and pCAPE2-Con (negative control) were each mixed separately with pCAPE1 in equal proportions and injected into 10-day-old compound leaves of the recipient lines (Yunnan2070 or JI1995). pCAPE1 and pCAPE2 are plasmid vectors used to induce gene silencing in plants such as M. truncatula and P. sativum. After 24 h of darkness, the plants were transferred to long day conditions. New leaves of positive control plants bleached in about 10 days, indicating successful silencing of PDS.
The same VIGS procedure was used for the PsMYB16 gene within the D locus and for PsMYB26, the V candidate gene. For PsOs1, which is described in detail below, all gene-specific primers used for the VIGS constructs are listed in Supplementary Table 47.
Transformation, gene overexpression and silencing of PsOs1
The PsOs1 coding sequence of JI0074 was amplified (primers listed in Supplementary Table 47) and integrated into the pCAMBIA1305 vector, resulting in the pCAMBIA1305-PsOs1JI0074 construct. The plasmid was then introduced into A. tumefaciens GV3101, which was subsequently employed to transform A. thaliana (Col-0) via the floral dip technique. T3 generation homozygous transgenic Arabidopsis lines were selected for measurement of thousand-seed weight and the dimensions of elongated siliques.
GUS staining, GFP fluorescence observations and flow cytometry
The pCAMBIA1305-PsOs1JI1995 vector was constructed using the same methodology, with primers detailed in Supplementary Table 47. Both vectors, pCAMBIA1305-PsOs1JI1995 and pCAMBIA1305-PsOs1JI0074, were introduced into the A. tumefaciens strain GV3101. In these experiments, H2B-mCherry served as a nucleus marker. The agrobacteria were resuspended and infiltrated into Nicotiana benthamiana leaf epidermal cells using an infiltration buffer consisting of 10 mM MES (pH 5.6), 10 mM MgCl2, and 150 μM acetosyringone, at an OD600 of 0.8. Fluorescence was observed 48 h after infiltration using a confocal laser-scanning microscope.
To compare the promoter activities of JI0074 and JI1995 alleles of PsOs1, we cloned sequences 3,000-bp upstream of the coding region and inserted them into pCAMBIA1300-GUS, resulting in the constructs ProJI0074-GUS and ProJI1995-GUS. These were expressed in Nicotiana tabacum leaves and subsequently stained using a GUS Staining Kit (Coolaber Biotech). GUS activity was quantified using the GUS Gene Quantitative Detection Kit (Coolaber Biotech). For a detailed examination of PsOs1 expression patterns in Arabidopsis, various Arabidopsis tissues were sampled from ProJI0074-GUS transgenic plants. After ethanol decolourization, samples were observed and photographed under a microscope. Details of the primers used are provided in Supplementary Table 47.
Intact nuclei from pea pods were isolated using LB01 lysis buffer (Coolaber Biotech), followed by RNA removal and subsequent propidium iodide staining. The nuclei were then quantified using a CytoFLEX flow cytometer. A minimum of 20,000 nuclei were counted for each sample, and each experiment was replicated at least three times. Data analysis was conducted using FLOWJO software, and representative images were presented. The endoreduplication index (EI) was calculated using the formula: EI = [(0 × percentage of 2C nuclei) + (1 × percentage of 4C nuclei) + (2 × percentage of 8C nuclei) + (3 × percentage of 16C nuclei) + (4 × percentage of 32C nuclei)]/100.
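The EI formula above can be illustrated with a minimal Python sketch. The function name, the input layout (a mapping from ploidy class to percentage of nuclei) and the tolerance check are illustrative assumptions and are not taken from the analysis code used in this study; the example percentages are invented for demonstration only.

```python
# Weights follow the EI formula in the text: 2C -> 0, 4C -> 1, 8C -> 2, 16C -> 3, 32C -> 4.
EI_WEIGHTS = {"2C": 0, "4C": 1, "8C": 2, "16C": 3, "32C": 4}


def endoreduplication_index(percentages, tolerance=0.5):
    """Return the endoreduplication index from ploidy-class percentages.

    `percentages` maps ploidy class labels ("2C", "4C", ...) to the percentage
    of nuclei in that class. Missing classes are treated as 0%.
    The `tolerance` check (hypothetical, for illustration) guards against
    inputs that are fractions or counts rather than percentages.
    """
    total = sum(percentages.get(label, 0.0) for label in EI_WEIGHTS)
    if abs(total - 100.0) > tolerance:
        raise ValueError(f"percentages sum to {total}, expected about 100")
    weighted = sum(weight * percentages.get(label, 0.0)
                   for label, weight in EI_WEIGHTS.items())
    return weighted / 100.0


# Invented example: (0*40 + 1*30 + 2*20 + 3*8 + 4*2) / 100 = 1.02
example = {"2C": 40, "4C": 30, "8C": 20, "16C": 8, "32C": 2}
print(endoreduplication_index(example))
```

An EI of 0 corresponds to a sample in which all nuclei are 2C, and higher values indicate a larger share of nuclei that have undergone one or more endocycles.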
Yeast two-hybrid experiment
Yeast two-hybrid assays were conducted according to the protocols outlined in the Yeast Protocols Handbook (Clontech). The CDS of PsCIK2/3 was cloned into the bait plasmid pBT3-SUC, while the CDS of PsCLV1 or PsCLV2 was cloned into the prey plasmid pPR3-N. The primer sequences used for cloning are provided in Supplementary Table 47. These plasmids were co-transformed into the yeast strain NMY51 in different combinations. Transformants were initially screened on SD/-Trp/-Leu medium to confirm successful co-transformation. Interaction assays were then performed on SD/-Trp/-Leu/-His/-Ade medium containing the chromogenic substrate X-α-Gal at 30 °C to detect protein-protein interactions.
Anatomical studies and TEM
Confocal images were collected with a Leica TCS SP8 confocal laser-scanning microscope (Leica). After sampling, the shoot apices of Caméor and fa mutant line JI0814, the young leaves (gp/gp, JI2366; Gp/Gp, JI0817), and the pod walls of JI0074 and JI1995 were immediately preserved in formaldehyde/alcohol/acetic acid fixative. Paraffin sectioning was performed following established methodologies. Staining was conducted using safranin and fast green (JI0074 and JI1995) and toluidine blue (Caméor and JI0814). Prepared slides were scanned using a NanoZoomer, and cell quantification was carried out using NDP.view2 software. For pods, the resin block was sliced at 60–80 nm on an ultrathin slicer, and the slices were picked up on a 150-mesh copper mesh. The copper mesh was stained with a 2% uranyl acetate saturated alcohol solution in the dark for 8 min; washed 3 times with 70% alcohol; washed 3 times with ultrapure water; stained with a 2.6% lead citrate solution in the dark for 8 min; washed 3 times with ultrapure water; and slightly dried with filter paper. The copper mesh sections were placed in a copper mesh box and dried overnight at room temperature. Sections were observed under a transmission electron microscope and images were collected for analysis.
For TEM studies, pea leaflets and pods (18 days after flowering) were removed from BC3 S2 gp/gp and Gp/Gp plants, after 9 h of daylight. Tissue (1 mm2) pieces were placed in a solution of 2.5% (v/v) glutaraldehyde in 0.05 M sodium cacodylate, pH 7.3 for fixation. Samples were left overnight at room temperature and then processed for embedding (Leica EM TP embedding machine) by washing out the fixative with three successive 15-min washes in 0.05 M sodium cacodylate, followed by fixation in 1% (w/v) OsO4 in 0.05 M sodium cacodylate for 2 h at room temperature. After three 15 min washes in distilled water, samples were dehydrated in an ethanol series (30%, 50%, 70%, 95% and two changes of 100% ethanol), then infiltrated with LR White resin (London Resin Company) by successive changes of resin:ethanol mixes at room temperature (1:1 for 1 h, 2:1 for 1 h, 3:1 for 1 h, 100% resin for 1 h, then 100% resin for 16 h, and 100% resin for a further 8 h). Samples were polymerized in LR White resin at 60 °C for 16 h, then sectioned with a diamond knife (Leica UC7 ultramicrotome). Ultrathin sections (approximately 90 nm) were placed on 200 mesh formvar and carbon-coated copper grids (Agar Scientific). Sections were stained with 2% (w/v) uranyl acetate for 1 h and 1% (w/v) lead citrate for 1 min, washed in distilled water and air dried. Grids were viewed in a FEI Talos 200 C transmission electron microscope (FEI) at 200 kV and imaged using a Gatan OneView 4K × 4K digital camera (Gatan) to record DM4 files.
In situ hybridization
Tissues were rinsed with PBS and immediately placed in the in situ hybridization fixative solution for more than 12 h. Paraffin section preparation and in situ hybridization of the probes were performed according to standard protocols80. The sequences of the digoxigenin (DIG)-labelled antisense riboprobes used for in situ hybridization are provided in Supplementary Table 47. The hybridization signal was detected with an alkaline phosphatase-conjugated anti-DIG antibody (200-052-156, Jackson ImmunoResearch). Finally, images were obtained and analysed using a Pannoramic MIDI digital slide scanner (3DHISTECH).
RNA-seq, iso-seq and gene expression
RNA extraction and pea transcriptome
We built a transcriptome atlas from the reference line Caméor (Supplementary Table 30), and selected various accessions that display the contrasting pairs of traits studied here (Supplementary Table 31). In China, plant tissues (seed, root, nodule, leaflet, stem, flower, pod, stipule, tendril and apical bud) at different development stages (seedling, flowering and podding) were collected and fixed in Trizol before RNA extraction. Tissues were ground in liquid nitrogen and the FastPure Universal Plant Total RNA Isolation Kit (Vazyme Biotech) was used to extract total RNA, the quality of which was assessed by gel electrophoresis. For each sample, we performed short-read RNA-sequencing using the DNBSEQ Platform at BGI Group to generate 6–8 Gb raw RNA reads for each accession.
At JIC, RNA was prepared from young developing pods (flat pod stage, ~60–70 mm in length) of each of the parental and RI lines derived from the cross between JI0015 (gp/gp) and JI0399 (Gp/Gp). Developing seeds were removed from the pods, which were then rapidly frozen in liquid nitrogen. High-quality RNA lacking genomic DNA was extracted from 97 individual pod samples, using a Spectrum Plant Total RNA Kit (Sigma-Aldrich), and used for PCR with reverse transcription and RNA-seq experiments focussed on the identification and characterisation of gene candidates for gp. For the latter analysis, green-podded and yellow-podded RILs (95 in total) were assigned to three groups for each phenotype, ensuring that lines with contrasting plant phenotypes (e.g. plant height) were randomly distributed among the replicate groups (G1, G2 and G3 for green-podded RILs; Y1, Y2 and Y3 for yellow-podded RILs, with 15–17 RILs per pool). Equal amounts of RNA from every line within a group were pooled. RNA-seq (Illumina HiSeq4000) and initial bioinformatic analyses were carried out by the Earlham Institute.
We performed Iso-seq sequencing for a subset of accessions for the target organ at specific developmental stages. We used the Iso-Seq (v.4.0.0) pipeline to process PacBio SMRT Cell subreads and generate high-quality, full-length transcripts. First, subreads from each SMRT Cell were processed with ccs (v.3.4.1) to produce one circular consensus sequence per zero-mode waveguide, applying a minimum read quality of 0.9. Next, primer removal and demultiplexing were performed with lima (v.2.9.0) in Iso-Seq mode, removing unwanted primer combinations and orienting reads from 5′ to 3′. Full-length reads were then refined by trimming poly(A) tails and removing concatemers. When multiple SMRT cells were sequenced, the resulting full-length non-concatemer (FLNC) BAM files were merged before clustering. Iso-Seq cluster (v.4.0.0) was applied to produce polished consensus transcripts, partitioned into high-quality (HQ) and low-quality (LQ) sets on the basis of predicted accuracy. The final consensus transcripts were aligned to the ZW6 and JI0074 reference genomes using pbmm2 (v.1.14.99). Last, Iso-Seq collapse was used to collapse redundant isoforms and generate GFF files, which were converted to GFF3 with gffread (v.0.12.7) for downstream analyses.
RT–qPCR
RT–qPCR was conducted to analyse gene expression. Total RNA was extracted using the FastPure Plant Total RNA Isolation Kit (Vazyme Biotech) following the manufacturer’s instructions, including an on-column DNase I digestion step to remove genomic DNA. Subsequently, 1 μg of RNA was used for cDNA synthesis with the All-in-One First-Strand cDNA Synthesis SuperMix for qPCR kit (TransGen Biotech). Green qPCR SuperMix kit (TransGen Biotech) was used for amplification on a CFX384TM Real-Time System (Bio-Rad). The method of collecting plant material for the detection of P/PsCLE41 and V/PsMYB26 was as follows: (1) pods were collected 10 days after flowering. After removing the seeds, the pods were cut into pieces and frozen in liquid nitrogen; (2) apical buds were collected 14 days after emergence and, after removing the extra young leaves under a microscope, the buds were cut into pieces and frozen in liquid nitrogen; (3) stems were collected 14 days after emergence at the third node from the top of the stem, cut into pieces and frozen in liquid nitrogen; (4) seed cotyledons were collected 12 days after flowering; after removing the testa, the cotyledons were cut into pieces and frozen in liquid nitrogen.
To validate the Gp transcriptional fusion and aberrant transcripts, total RNA was reverse transcribed to cDNA using the HiScript III First Strand cDNA Synthesis Kit (+gDNA wiper, Vazyme Biotech). RT–qPCR analysis was conducted using Taq Pro Universal SYBR qPCR Master Mix (Vazyme Biotech), employing specific primers, with PsACTIN serving as the internal standard. Expression levels of genes were quantified relative to the control using the 2^(−ΔΔCT) method. All RT–qPCR results represent the mean ± s.d. from three separate biological experiments. The primers used for RT–qPCR are provided in Supplementary Table 47.
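As an illustration of the relative quantification step, the following Python sketch computes a fold change with the standard 2^(−ΔΔCT) calculation, in which the target gene is first normalized to the internal standard within each sample and then compared with the control. The function and argument names are hypothetical, the CT values are invented, and the calculation assumes approximately 100% amplification efficiency for both primer pairs; it is not the authors' analysis script.

```python
def relative_expression(ct_target_sample, ct_reference_sample,
                        ct_target_control, ct_reference_control):
    """Fold change of a target gene in a sample relative to a control.

    Each argument is a CT value (for replicated wells, pass the mean CT).
    The reference gene plays the role of the internal standard
    (PsACTIN in the text).
    """
    delta_ct_sample = ct_target_sample - ct_reference_sample
    delta_ct_control = ct_target_control - ct_reference_control
    delta_delta_ct = delta_ct_sample - delta_ct_control
    return 2.0 ** (-delta_delta_ct)


# Invented CT values: ΔCT(sample) = 24 - 18 = 6, ΔCT(control) = 22 - 18 = 4,
# ΔΔCT = 2, so the target is expressed at 2^(-2) = 0.25 of the control level.
print(relative_expression(24.0, 18.0, 22.0, 18.0))
```

A value below 1 indicates lower expression than in the control, and a value above 1 indicates higher expression.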
DNA methylation sequencing
Bisulfite treatment and library preparation were performed according to the standard protocol for methylation library construction. PE150 sequencing was performed using the Illumina NovaSeq X Plus sequencing platform with a 25B chip. Quality control and adapter trimming of the raw whole-genome bisulfite sequencing (WGBS) data were then performed using Trimmomatic (v.0.39). The resulting WGBS reads were mapped to the JI0074 reference genome using Bismark (v.0.23.0) and PCR duplicates were removed from the aligned reads.
Statistical methods
General statistical analysis
Statistical analyses were conducted in the R software suite (v.4.2; https://www.r-project.org/) unless otherwise stated. The correlation between quantitative traits was tested by calculating Pearson correlation coefficients and their P values using the ‘cor.test’ function, with the method set to ‘Pearson’. Traits collected at different locations and in different years were analysed by calculating their rank correlations by setting the option ‘method’ to ‘Spearman’. The correlation between qualitative traits was assessed with the chi-square test using the ‘chisq.test’ function in R. Gene expression levels in different lines or organs under different treatments or at different developmental stages (at least three biological replicates for each sample) were analysed using DESeq2 (ref. 81), in which the genes with a false discovery rate (Bonferroni) lower than 0.01 were defined as significantly regulated genes, unless otherwise specified in a figure legend. The min–max scaling (normalization) approach was used to calculate the expression level for comparison of each gene across stages and organs by the formula: Xscaled = (X − Xmin)/(Xmax − Xmin).
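The min–max scaling formula can be sketched in a few lines of Python. The function name and the handling of a gene with identical expression in every sample (returned here as all zeros to avoid division by zero) are illustrative assumptions; the source does not state how such cases were treated.

```python
def min_max_scale(values):
    """Scale one gene's expression values to the range [0, 1].

    Implements Xscaled = (X - Xmin) / (Xmax - Xmin) across the stages or
    organs supplied in `values`.
    """
    x_min, x_max = min(values), max(values)
    if x_max == x_min:
        # Assumption for this sketch: a constant profile maps to zeros.
        return [0.0 for _ in values]
    return [(x - x_min) / (x_max - x_min) for x in values]


# Invented expression values for one gene across three organs.
print(min_max_scale([2.0, 4.0, 6.0]))  # [0.0, 0.5, 1.0]
```

Because scaling is applied per gene, the result shows where each gene is relatively most and least expressed, but it does not preserve differences in absolute expression between genes.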
PCA (main text Fig. 1) was performed on the PLINK distance matrix using an Excel add-in downloaded from RIKEN, now available at https://systemsomicslab.github.io/compms/others/main.html#Statistics.
Statistics and reproducibility
All experiments were designed with explicit consideration of statistical power and reproducibility. Each experiment was independently repeated at least three times (biological replicates) with consistent results across repetitions. Microscopy analyses used at least three samples or technical replicates per experimental condition (such as Fig. 3b,c,g and Extended Data Figs. 4a–c,e and 6b,h,i), and phenotypic observations were validated across ≥3 independent growth cycles (Norwich, Shenzhen, Harbin, multiple sites and multiple years). Micrographs shown are representative of at least three independent experiments. Complete protocols and raw data supporting the conclusions are available in the Data availability and Supplementary Information. Data are presented as mean ± s.e.m., and statistical significance was determined using two-tailed Student’s t-tests (the ‘t.test’ function in R software v.4.2) in the analyses of the phenotypes, such as seed weight and pod width, between different accessions.
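For readers unfamiliar with the mean ± s.e.m. summary, a minimal Python sketch is given below. It uses the sample standard deviation (n − 1 denominator) divided by the square root of the number of observations; the function name and the example measurements are hypothetical, and the analyses in this study were run in R rather than with this code.

```python
import math
import statistics


def mean_sem(values):
    """Return (mean, standard error of the mean) for a list of measurements."""
    n = len(values)
    if n < 2:
        raise ValueError("at least two observations are needed for the s.e.m.")
    mean = statistics.mean(values)
    sem = statistics.stdev(values) / math.sqrt(n)  # stdev uses n - 1
    return mean, sem


# Invented measurements for one accession.
mean, sem = mean_sem([2.0, 4.0, 6.0])
print(f"{mean:.2f} ± {sem:.2f}")
```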
Population structure analysis
The core high-quality SNP dataset was used for population structural analyses. PCA and t-distributed stochastic neighbour embedding analyses were first performed using beta Python modules sklearn.decomposition and sklearn.manifold. ADMIXTURE82 (version 1.3.0) was employed to analyse the population structure, with K increasing from 2 to 16. The Q value represents the estimated proportion of an individual’s genome that originates from each inferred ancestral population cluster.
Genetic differentiation (Fst) and nucleotide diversity (π) were calculated with VCFtools (version 0.1.13). Fst scores were calculated within nonoverlapping 100-kb windows and π was calculated for each individual site and averaged across the genome for each group. LD was calculated on SNP pairs within a 500-kb window using PopLDdecay83 (version 3.31; https://github.com/BGI-shenzhen/PopLDdecay) and the decay was measured by the distance at which the squared Pearson correlation coefficient (r²) dropped to half of its maximum. Splits tree analysis of the PLINK distance matrix was performed using SplitsTree4 (ref. 61).
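The half-maximum criterion for LD decay can be illustrated with the following Python sketch, which takes binned distances and mean r² values and returns the distance at which r² first falls to half of its maximum. This is a hypothetical helper written for illustration: the linear interpolation between bins is an assumption of the sketch and is not a description of how PopLDdecay reports its results, and the example values are invented.

```python
def half_decay_distance(distances, r2_values):
    """Distance at which mean r^2 first drops to half of its maximum.

    `distances` and `r2_values` are equal-length sequences describing an
    LD decay curve (for example, bin midpoints in kb and mean r^2 per bin).
    Returns None if r^2 never reaches half of the maximum.
    """
    pairs = sorted(zip(distances, r2_values))
    if len(pairs) < 2:
        raise ValueError("need at least two (distance, r2) points")
    peak = max(range(len(pairs)), key=lambda i: pairs[i][1])
    threshold = pairs[peak][1] / 2.0
    for i in range(peak + 1, len(pairs)):
        d_prev, r_prev = pairs[i - 1]
        d_curr, r_curr = pairs[i]
        if r_curr <= threshold:
            if r_prev == r_curr:  # flat segment, no interpolation possible
                return d_curr
            # Linear interpolation between the two bins that bracket the threshold.
            fraction = (r_prev - threshold) / (r_prev - r_curr)
            return d_prev + fraction * (d_curr - d_prev)
    return None


# Invented curve: maximum r^2 is 0.8, so the threshold is 0.4,
# which is crossed midway between 20 kb and 40 kb.
print(half_decay_distance([0, 10, 20, 40, 80], [0.8, 0.6, 0.5, 0.3, 0.2]))
```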
Germplasm availability
All the germplasm described and used in this work is available to order from the John Innes Centre Germplasm Resources Unit (https://www.seedstor.ac.uk/). The 697 sequenced single seed descent lines derived from John Innes Pisum Germplasm accessions were imported in 2019 to the Agricultural Genomics Institute at Shenzhen (AGIS), Chinese Academy of Agricultural Sciences (CAAS), where they have been grown and phenotyped annually since then.
Reporting summary
Further information on research design is available in the Nature Portfolio Reporting Summary linked to this article.
Data availability
All whole-genome sequence data, transcriptome data, and population variant map data have been deposited at the National Genomics Data Center (NGDC) Genome Sequence Archive (GSA) under BioProject accession PRJCA023166. The datasets are available under GSA accession CRA014669, CRA023374 and CRA023375. Long-term phenotype curation information is available on SeedStor (https://www.seedstor.ac.uk). Data and analyses are also available from our established portal at https://mendelpea.com/dataAvailable and https://mendelpea.com/toolkits. Source data are provided with this paper.
Code availability
Code associated with this project is available at GitHub: https://github.com/ShifengCHENG-Laboratory/MendelPeaG2P.
References
Mendel, G. Versuche über Pflanzenhybriden. Verh. Naturforsch. Vereines Brünn 4, 3–47 (1866).
Bhattacharyya, M. K., Smith, A. M., Ellis, T. H. N., Hedley, C. & Martin, C. The wrinkled-seed character of pea described by Mendel is caused by a transposon-like insertion in a gene encoding starch-branching enzyme. Cell 60, 115–122 (1990).
Martin, D. N., Proebsting, W. M. & Hedden, P. Mendel’s dwarfing gene: cDNAs from the le alleles and function of the expressed proteins. Proc. Natl Acad. Sci. USA 94, 8907–8911 (1997).
Armstead, I. et al. Cross-species identification of Mendel’s i locus. Science 315, 73 (2007).
Sato, Y., Morita, R., Nishimura, M., Yamaguchi, H. & Kusaba, M. Mendel’s green cotyledon gene encodes a positive regulator of the chlorophyll-degrading pathway. Proc. Natl Acad. Sci. USA 104, 14169–14174 (2007).
Shimoda, Y., Ito, H. & Tanaka, A. Arabidopsis stay-green, Mendel’s green cotyledon gene, encodes magnesium-dechelatase. Plant Cell 28, 2147–2160 (2016).
Hellens, R. P. et al. Identification of Mendel’s white flower character. PLoS ONE 5, e13230 (2010).
Diamond, J. Evolution, consequences and future of plant and animal domestication. Nature 418, 700–707 (2002).
Dahl, W. J., Foster, L. M. & Tyler, R. T. Review of the health benefits of peas (Pisum sativum L.). Br. J. Nutr. 108, S3–S10 (2012).
Tsyganov, V. E. & Tsyganova, A. V. Symbiotic regulatory genes controlling nodule development in Pisum sativum L. Plants 9, 1741 (2020).
Abbott, S. & Fairbanks, D. J. Experiments on plant hybrids by Gregor Mendel. Genetics 204, 407–422 (2016).
Ellis, T. H. N., Hofer, J. M. I., Swain, M. T. & van Dijk, P. J. Mendel’s pea crosses: varieties, traits and statistics. Hereditas 156, 33 (2019).
Franklin, A. What Makes a Good Experiment? (Univ. Pittsburgh Press, 2016).
Smýkal, P. et al. Pea (Pisum sativum L.) in the genomic era. Agronomy 2, 74–115 (2012).
White, O. E. The present state of knowledge of heredity and variation in peas. Proc. Am. Phil. Soc. 56, 487–588 (1917).
Ellis, T. H. N., Hofer, J. M. I., Timmerman-Vaughan, G. M., Coyne, C. J. & Hellens, R. P. Mendel, 150 years on. Trends Plant Sci. 16, 590–596 (2011).
Kreplak, J. et al. A reference genome for pea provides insight into legume genome evolution. Nat. Genet. 51, 1411–1422 (2019).
Yang, T. et al. Improved pea reference genome and pan-genome highlight genomic features and evolutionary characteristics. Nat. Genet. 54, 1553–1563 (2022).
Shirasawa, K., Sasaki, K., Hirakawa, H. & Isobe, S. Genomic region associated with pod color variation in pea (Pisum sativum). G3 11, jkab081 (2021).
Zhang, P. et al. Fine mapping PsPS1, a gene controlling pod softness that defines market type in pea (Pisum sativum). Plant Breed 141, 418–428 (2022).
Liu, N. et al. Reference genome sequence and population genomic analysis of peas provide insights into the genetic basis of Mendelian and other agronomic traits. Nat. Genet. 56, 1964–1974 (2024).
De Beukelaer, H., Davenport, G. F. & Fack, V. Core hunter 3: flexible core subset selection. BMC Bioinformatics 19, 203 (2018).
Jing, R. et al. The genetic diversity and evolution of field pea (Pisum) studied by high throughput retrotransposon based insertion polymorphism (RBIP) marker analysis. BMC Evol. Biol. 10, 44 (2010).
Turner, S. R., Barratt, D. H. P. & Casey, R. The effect of different alleles at the r locus on the synthesis of seed storage proteins in Pisum sativum. Plant Mol. Biol. 14, 793–803 (1990).
Price, D. N., Smith, C. M. & Hedley, C. L. The effect of the gp gene on fruit development in Pisum sativum L. I. Structural and physical aspects. New Phytol. 110, 261–269 (1988).
Dalmais, M. et al. UTILLdb, a Pisum sativum in silico forward and reverse genetics tool. Genome Biol. 9, R43 (2008).
Ruel, J. De Natura Stirpium Libri Tres (Froben Press, Basel, 1537).
Ram, H., Hedau, N. K., Chaudhari, G. V. & Kant, L. Peas with zero shelling edible pods: a review. Sci. Hortic. 288, 110333 (2021).
Yaginuma, H., Hirakawa, Y., Kondo, Y., Ohashi-Ito, K. & Fukuda, H. A novel function of TDIF-related peptides: promotion of axillary bud formation. Plant Cell Physiol. 52, 1354–1364 (2011).
Ito, Y. et al. Dodeca-CLE peptides as suppressors of plant stem cell differentiation. Science 313, 842–845 (2006).
Etchells, J. P. & Turner, S. R. The PXY–CLE41 receptor ligand pair defines a multifunctional pathway that controls the rate and orientation of vascular cell division. Development 137, 767–774 (2010).
Hirakawa, Y. et al. Non-cell-autonomous control of vascular stem cell fate by a CLE peptide/receptor system. Proc. Natl Acad. Sci. USA 105, 15208–15213 (2008).
Fisher, K. & Turner, S. Pxy, a receptor-like kinase essential for maintaining polarity during plant vascular-tissue development. Curr. Biol. 17, 1061–1066 (2007).
Smit, M. E. et al. A PXY-mediated transcriptional network integrates signaling mechanisms to control vascular development in arabidopsis. Plant Cell 32, 319–335 (2020).
Etchells, J. P., Smit, M. E., Gaudinier, A., Williams, C. J. & Brady, S. M. A brief history of the TDIF‐PXY signalling module: balancing meristem identity and differentiation during vascular development. New Phytol. 209, 474–484 (2016).
Karaca, D. E. Molecular Mapping of the v Locus in Pea (Pisum sativum L.). Master of Science Thesis, Washington State Univ. (2019).
Yang, Y. et al. The pea R2R3-MYB gene family and its role in anthocyanin biosynthesis in flowers. Front. Genet. 13, 936051 (2022).
Yang, C. et al. Transcription factor MYB26 is key to spatial specificity in anther secondary thickening formation. Plant Physiol. 175, 333–350 (2017).
Parker, T. A., Lo, S. & Gepts, P. Pod shattering in grain legumes: emerging genetic and environment-related patterns. Plant Cell 33, 179–199 (2021).
Gerard, J. The Herball, or, Generall Historie of Plantes (John Norton, 1597).
Sinjushin, A. A. & Gostimsky, S. A. Fasciation in pea: basic principles of morphogenesis. Russ. J. Dev. Biol. 37, 375–381 (2006).
Sinjushin, A. & Gostimskii, S. Relationship between different fasciated lines of pea. Pisum Genetics 39, 16–18 (2007).
Hu, C. et al. A group of receptor kinases are essential for CLAVATA signalling to maintain stem cell homeostasis. Nat. Plants 4, 205–211 (2018).
Osipova, M. A. et al. WUSCHEL-RELATED HOMEOBOX5 gene expression and interaction of cle peptides with components of the systemic control add two pieces to the puzzle of autoregulation of nodulation. Plant Physiol. 158, 1329–1341 (2012).
Blixt, S. Mutation genetics in Pisum. Agric. Hort. Genet. 30, 1–293 (1972).
van Dijk, P. J., Jessop, A. P. & Ellis, T. H. N. How did Mendel arrive at his discoveries? Nat. Genet. 54, 926–933 (2022).
Albert, N. W., Griffiths, A. G., Cousins, G. R., Verry, I. M. & Williams, W. M. Anthocyanin leaf markings are regulated by a family of R2R3‐MYB genes in the genus Trifolium. New Phytol. 205, 882–893 (2015).
Wang, C. et al. The antagonistic MYB paralogs RH1 and RH2 govern anthocyanin leaf markings in Medicago truncatula. New Phytol. 229, 3330–3344 (2021).
Ellis, N. et al. Recombinant inbred lines derived from wide crosses in pisum. Sci. Rep. 13, 20408–20408 (2023).
Ramsay, N. A. & Glover, B. J. MYB–bHLH–WD40 protein complex and the evolution of cellular diversity. Trends Plant Sci. 10, 63–70 (2005).
Sawada, C. et al. An integrated linkage map of three recombinant inbred populations of pea (Pisum sativum L). Genes 13, 196–196 (2022).
D’Erfurth, I. et al. A role for an endosperm-localized subtilase in the control of seed size in legumes. New Phytol. 196, 738–751 (2012).
Yi, D. et al. The Arabidopsis SIAMESE-RELATED cyclin-dependent kinase inhibitors SMR5 and SMR7 regulate the DNA damage checkpoint in response to reactive oxygen species. Plant Cell 26, 296–309 (2014).
Tayeh, N. et al. afila, the origin and nature of a major innovation in the history of pea breeding. New Phytol. 243, 15 (2024).
Burstin, J. et al. Developmental genes have pleiotropic effects on plant morphology and source capacity, eventually impacting on seed protein content and productivity in pea. Plant Physiol. 144, 768–781 (2007).
Olby, R. Mendel no Mendelian? Hist. Sci. 17, 53–72 (1979).
Kampourakis, K. Mendel and the path to genetics: Portraying science as a social process. Sci. Educ. 22, 293–324 (2013).
Fisher, R. A. Has Mendel’s work been rediscovered? Ann. Sci. 1, 115–137 (1936).
Cheng, S. Gregor Mendel: the father of genetics who opened a biological world full of wonders. Mol. Plant 15, 1641–1645 (2022).
Stoltzfus, A., Logsdon, J. M., Palmer, J. D. & Doolittle, W. F. Intron “sliding” and the diversity of intron positions. Proc. Natl Acad. Sci. USA 94, 10739–10744 (1997).
Huson, D. H. & Bryant, D. Application of phylogenetic networks in evolutionary studies. Mol. Biol. Evol. 23, 254–267 (2006).
Rayner, T. et al. Rebalancing the seed proteome following deletion of vicilin-related genes in pea (Pisum sativum L.). J. Exp. Bot. https://doi.org/10.1093/jxb/erae518 (2024).
Li, H. & Durbin, R. Fast and accurate short read alignment with Burrows–Wheeler transform. Bioinformatics 25, 1754–1760 (2009).
Li, H. et al. The Sequence Alignment/Map format and SAMtools. Bioinformatics 25, 2078–2079 (2009).
DePristo, M. A. et al. A framework for variation discovery and genotyping using next-generation DNA sequencing data. Nat. Genet. 43, 491–498 (2011).
Freed, D., Aldana, R., Weber, J. A. & Edwards, J. S. The Sentieon Genomics Tools—a fast and accurate solution to variant calling from next-generation sequence data. Preprint at BioRxiv https://doi.org/10.1101/115717 (2017).
Cingolani, P. et al. A program for annotating and predicting the effects of single nucleotide polymorphisms, SnpEff. Fly 6, 80–92 (2012).
Wang, W. et al. Genomic variation in 3,010 diverse accessions of Asian cultivated rice. Nature 557, 43–49 (2018).
Purcell, S. et al. PLINK: a tool set for whole-genome association and population-based linkage analyses. Am. J. Hum. Genet. 81, 559–575 (2007).
Browning, S. R. & Browning, B. L. Rapid and accurate haplotype phasing and missing-data inference for whole-genome association studies by use of localized haplotype clustering. Am. J. Hum. Genet. 81, 1084–1097 (2007).
Feng, C. et al. HAPPE: a tool for population haplotype analysis and visualization in editable Excel tables. Front. Plant Sci. 13, 927407 (2022).
Schneeberger, K. Using next-generation sequencing to isolate mutant genes from forward genetic screens. Nat. Rev. Genet. 15, 662–676 (2014).
Aboul-Maaty, N. A.-F. & Oraby, H. A.-S. Extraction of high-quality genomic DNA from different plant orders applying a modified CTAB-based method. Bull. Natl. Res. Cent. 43, 25 (2019).
Zhou, X. & Stephens, M. Genome-wide efficient mixed-model analysis for association studies. Nat. Genet. 44, 821–824 (2012).
He, J. et al. An innovative procedure of genome-wide association analysis fits studies on germplasm population and plant breeding. Theor. Appl. Genet. 130, 2327–2343 (2017).
Domoney, C. et al. Exploiting a fast neutron mutant genetic resource in Pisum sativum (pea) for functional genomics. Funct. Plant Biol. 40, 1261 (2013).
Grønlund, M., Olsen, A., Johansen, E. I. & Jakobsen, I. Protocol: using virus-induced gene silencing to study the arbuscular mycorrhizal symbiosis in Pisum sativum. Plant Methods 6, 28 (2010).
Li, X. et al. BIGGER ORGANS and ELEPHANT EAR-LIKE LEAF1 control organ size and floral organ internal asymmetry in pea. J. Exp. Bot. 70, 179–191 (2019).
Constantin, G. D. et al. Virus‐induced gene silencing as a tool for functional genomics in a legume species. Plant J. 40, 622–631 (2004).
Brewer, P. B., Heisler, M. G., Hejátko, J., Friml, J. & Benková, E. In situ hybridization for mRNA detection in Arabidopsis tissue sections. Nat. Protoc. 1, 6 (2006).
Love, M. I., Huber, W. & Anders, S. Moderated estimation of fold change and dispersion for RNA-seq data with DESeq2. Genome Biol. 15, 550 (2014).
Alexander, D. H., Novembre, J. & Lange, K. Fast model-based estimation of ancestry in unrelated individuals. Genome Res. 19, 1655–1664 (2009).
Zhang, C., Dong, S.-S., Xu, J.-Y., He, W.-M. & Yang, T.-L. PopLDdecay: a fast and effective tool for linkage disequilibrium decay analysis based on variant call format files. Bioinformatics 35, 1786–1788 (2019).
Zhang, H., Lin, X., Han, Z., Qu, L.-J. & Chai, J. Crystal structure of PXY–TDIF complex reveals a conserved recognition mechanism among CLE peptide–receptor pairs. Cell Res. 26, 543–555 (2016).
Ellis, T. H. N. et al. Diversity of pod shape in Pisum. Diversity 13, 203 (2021).
Berdnikov, V. A. et al. The neoplastic pod gene (Np) may be a factor for resistance to the pest Bruchus pisorum L. Pisum Genet. 24, 3 (1992).
Sinjushin, A. & Liberzon, A. Contribution to genetic control of flower number in pea (Pisum sativum L.). Ratar. Povrt. 53, 4 (2016).
Weller, J. L. et al. A conserved molecular basis for photoperiod adaptation in two temperate legumes. Proc. Natl Acad. Sci. USA 109, 21158–21163 (2012).
Balarynová, J. et al. The loss of polyphenol oxidase function is associated with hilum pigmentation and has been selected during pea domestication. New Phytol. 235, 1807–1821 (2022).
Acknowledgements
The authors thank J. J. Doyle, M. Bennett, S. Huang, B. Lucas, D. Sanders and G. Moore for their invaluable support and comments for this project; colleagues for assistance in pea field trial and phenotyping work from experimental stations across northern and southern China; Y. Zhong and S. Liu; and L. Zhang for providing ZW1 genome assembly and annotation for comparison in this work; E. Jones and E. Crawford for plant phenotype data; M. Trick and S. Griffiths for valuable discussions at the John Innes Centre (JIC); the JIC NBI Computing Infrastructure for Science and JIC Bioinformatics groups for support in data handling and analysis; the JIC Field Trials and Horticultural Services teams for support with field and glasshouse experiments; the Molecular Genetics, Genotyping and DNA Extraction Platforms for support in experimental biology; and Bioimaging and Scientific Photography Platforms for phenotype visualization. This work was supported by the Program for Guangdong ‘ZhuJiang’ Introducing Innovative and Entrepreneurial Teams (2019ZT08N628), the National Natural Science Foundation of China (32022006), the Agricultural Science and Technology Innovation Program (CAAS-ASTIP-2021-AGIS-ZDRW202101), the Shenzhen Science and Technology Program (AGIS-ZDKY202002), the National Key Research and Development Program of China (2023YFF1000100), and the National Key R&D Program of China (grant number 2023YFA0914600) to S. Cheng, and the National Natural Science Foundation of China (32401853) to Y. Shi. The work in the UK was possible due to the long-term investment of the UK Research Infrastructure Biotechnology and Biological Sciences Research Council (UKRI-BBSRC) through Institute Strategic Programme (ISP) grants, Institute Development Grant funds, and the Germplasm Resources National Capability Programme (BBS/E/J/000PR8000) and the National Bioscience Research Infrastructure grant (BBS/E/JI/23NB0001). 
We also acknowledge support from UKRI-BBSRC grants BB/J004561/1, BB/W510695/1 and BBS/E/J/000PR9799, the UK Department for Environment, Food, and Rural Affairs (Defra) through the Pulse Crop Genetic Improvement Network (grants CH0103 and CH0111) and the Provision and Maintenance of the Pea Genebank to Facilitate R&D Need grant (C5515), and JIC through its Institute Strategic Fund. We acknowledge the source of the TILLING mutant used in the analysis of gp as UMR1403-INRAE-IPS2, UMR1347-Agroecologie, France and the European Union FP6 Integrated Project Grain Legumes, FOOD-CT-2004-506223.
Author information
Contributions
S. Cheng and N.C. coordinated and managed the international consortium. S. Cheng conceived the project and designed the manuscript. S. Cheng, C.F., Y. Shi and M.J. led the genomics, association genetics analysis, field trial and phenotyping, mapping genetics and functional validation work. C.F., Y. Shi and M.J. led genomics and bioinformatics pipeline development, haplotype–phenotype and association genetics analyses under supervision of S. Cheng. S. Cheng, C.F., B.C. and M.J. led the gene identity and variation discovery for Gp, P, V, Fa, D locus and Os1, as well as for the four cloned genes R, I, A and Le. B.C., Y. Shi, L.L., L. Wang and Y. Sun led the field trial and phenotyping, gene cloning, RNA-seq and VIGS gene-silencing experiments. B. Song, H.Z., H.C., X.Z., X.L., J. Hou, T.W., Q.Y., P.L., C.Z., Z.W. and Z.H. participated in the whole-genome resequencing and bioinformatics analysis. X.W.D. participated in bioinformatics analysis and computation support from Peking University. J. Hofer and N.E. undertook genetic and genomic analyses of A, D, Gp and Fa, including the allelism tests and TILLING validation of the nature of gp. M.D., A.F. and C.L. identified and provided ChlG TILLING mutant seeds. B. Steuernagel coordinated and managed bioinformatic and genomic data analysis at JIC. N.E. commented on experimental results and assisted in data analysis of all the project results. N.C., C.D., N.E. and L. Wingen selected the germplasm panel. M. Vickers and R.W. assembled the JI0015 genome reference under supervision of B. Steuernagel and, together with J.C. C.L., A.B., M.D., C.S., C.M., R.S., M. Vigouroux and G.T., supported the Gp genetic and genomic explorations under the coordination of C.D. E.V., M.A., A.H., N.E., J. Hofer, N.C., P.H.-P., S. Chundakkad and J.E.B. contributed to germplasm phenotyping the sequenced panel. M.A., L.S. and N.C. contributed through germplasm curation and R.H. contributed by digitizing germplasm data. S. Cheng, N.C. and C.D. secured the project funding. S. Cheng, N.E., C.F., J. Hofer, M.J. and N.C. prepared the main figures, extended data figures, supplementary figures, supplementary tables and Methods. S. Cheng and N.E. drafted the manuscript and S. Cheng finalized the manuscript, with additional help from C.F., B.C., J. Hofer, N.C., C.D. and B. Song. All authors read and approved the final manuscript.
Ethics declarations
Competing interests
The authors declare no competing interests.
Peer review
Peer review information
Nature thanks Aureliano Bombarely, Nataliya Yelina and the other, anonymous, reviewer(s) for their contribution to the peer review of this work. Peer review reports are available.
Additional information
Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Extended data figures and tables
Extended Data Fig. 1 Schematic illustration of the genetic loci for each of Mendel’s seven traits plotted along the seven chromosomes (linkage groups).
The four previously cloned genes (R, I, A, Le) are annotated in black text, while the remaining genes with their gene identities and variations, or proposed candidates and variants, elucidated in this study (P, V, Gp, Fa) are highlighted in red text. The Mfa genetic locus, where the gene identity remains unknown, is also marked. Difference in the form of the ripe pods on chromosome 1 (LGVI, PP/pp) and 5 (LGIII, VV/vv); yellow versus green cotyledons (II/ii) on chromosome 2 (LGI); round seed versus wrinkled seed (RR/rr) and the colour of unripe pods (GpGp/gpgp) on chromosome 3 (LGV); difference in the position of the flower (FaFa/fafa) on chromosome 4 (LGIV); tall versus dwarf plants (LeLe/lele) on chromosome 5 (LGIII); seed coat (and flower) colour (AA/aa) on chromosome 6 (LGII).
Extended Data Fig. 2 Gene identity and functional variation underlying parchmentless pods (P vs. p).
a, Manhattan plot from GWAS analysis for the parchmentless pod trait, based on the ZW6 genome reference. b, Close-up view of the Manhattan plot in the most significant region identified in panel a. c, F2 genetic mapping interval derived from the cross JI2822 x JI0816, showing the mapped locus between markers AX-183563747 and AX-183563750 (chr1: 380,049,894-380,967,975) (Supplementary Tables 18–20). d, Map of gene positions within the P interval with Psat01G0420500 encoding a tracheary element differentiation inhibition factor CLE41 indicated in yellow. e, Allelic and haplotype variations for Psat01G0420500. Note that Hap1 carries a silent A-to-C transversion at chr1_380699321, close to chr1_380699320 of Hap3, where the T-to-A transversion is responsible for the Arg79* nonsense mutation. f, Haplotypes of Psat01G0420500 corresponding to accessions with ‘parchmentless’ phenotypes. g, Predicted amino acid sequence of Psat01G0420500 indicating the position of the Arg79* mutation in relation to the TDIF84 motif. h, Gene expression patterns for the 10 candidate genes from the genomic interval, in various organs and developmental stages of Caméor, showing that Psat01G0420500 (PsCLE41) is expressed exclusively in pod. Three biological replicates were used for each sample. The Min-Max scaling (normalization) approach was used to calculate the expression level for each gene across stages and organs by the formula: X_scaled = (X - X_min)/(X_max - X_min), where X is the original gene expression value, X_scaled is the scaled value, X_min is the minimum value of X, and X_max is the maximum value of X.
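The Min-Max scaling formula given for panel h can be illustrated with a minimal sketch. The function name, the handling of a constant expression profile and the example values are assumptions made for illustration; they are not taken from the authors' pipeline.

```python
def min_max_scale(values):
    """Scale one gene's expression values to [0, 1] across samples.

    Implements X_scaled = (X - X_min) / (X_max - X_min).
    Assumption: a gene with identical expression in every sample
    (X_max == X_min) is mapped to 0.0 to avoid division by zero.
    """
    x_min, x_max = min(values), max(values)
    if x_max == x_min:
        return [0.0 for _ in values]
    return [(x - x_min) / (x_max - x_min) for x in values]


# Hypothetical expression values for one gene in three organs or stages.
expression = [2.0, 6.0, 10.0]
scaled = min_max_scale(expression)
print(scaled)  # [0.0, 0.5, 1.0]
```

Scaling each gene separately in this way puts genes with very different absolute expression levels on a common 0 to 1 range, so that organ specificity (for example, expression restricted to pod) is visible in a single heat map.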
Extended Data Fig. 3 V and parchmentless pods.
a, Manhattan plot of GWAS analysis based on the ZW6 genome reference for a subset of accessions carrying only the R79* allele (haplotype 3 in Extended Data Fig. 2e) of gene Psat01G0420500 and wild-type accessions (i.e. no v/v mutants), showing the P GWAS signal but not the V GWAS signal. b, Manhattan plot of GWAS analysis from a subset of accessions excluding those with haplotype 3 (Extended Data Fig. 2e) of gene Psat01G0420500 (i.e. no p/p but only the v/v mutants). This analysis shows only the V GWAS signal but not the P GWAS signal. c, Close-up view of the local details of the chromosome 5 GWAS peak corresponding to V. d, Genetic mapping of V vs v from previous studies36. e, Candidate genes (19) within the V genetic interval, with Psat05G0804500 (PsMYB26) highlighted in orange. f, Gene expression level in the pod tissues (8 days post flowering, three biological replicates for each sample) compared between the wild-type line (P/P V/V, JI1995) and the mutant line (P/P v/v, JI0074), across the 19 candidate genes within the interval. g, Allelic and haplotype variation in Psat05G0804500 (PsMYB26) across the diversity panel. All parchmentless accessions, except those carrying the R79* allele of P, are associated with a 23-kb Ogre retrotransposon element insertion. h, Gene expression patterns for the 19 candidate genes in various organs and developmental stages (three biological replicates for each sample) of Caméor, showing that Psat05G0804500 (PsMYB26) is expressed exclusively in pod and endocarp (red box). The normalization and statistical approach is the same as described in Extended Data Fig. 2h.
Extended Data Fig. 4 Anatomical and molecular characterization of pod endocarp development and associated pathways.
a-c, Microscopic imaging showing the anatomical patterns of the pod endocarp in wild type (P/P V/V, JI2776) at different developmental stages: 3 days, 8 days, and 12 days post flowering. d, Gene expression levels, measured by RNA-seq, of key genes involved in the conserved, well-established TDIF-PXY-WOX signalling pathway (PsCLE41, PsSERK, PsPXY, PsWOX4, PsWOX14) and two key component genes (PsMYB26, PsNAC) reported to be involved in secondary cell wall thickening and lignification in Arabidopsis. Expression data were obtained from various organs at different developmental stages (three biological replicates for each sample) in Caméor. The normalization and statistical approach was the same as described in Extended Data Fig. 2h. e, Comparative microscopic imaging of pod endocarp anatomical patterns at 12 days post flowering in four genotypes: P/P V/V (JI0190), p/p V/V (JI0466), P/P v/v (JI0074), and p/p v/v (JI0134). f, qRT-PCR analysis of PsCLE41, PsPXY, PsMYB26, and PsNAC in lines with different genotypes (P/P V/V, p/p V/V, P/P v/v, p/p v/v). The pod samples were obtained 8 days post flowering. Two different lines were selected for each genotype, with five biological replicates (n = 5) for each sample. Data are presented as mean ± SEM, and statistical significance was determined using a two-sided t-test (which also applies to g). g, qRT-PCR results of PsMYB26 and PsNAC compared between the control and the VIGS-silenced lines. Three to five biological replicates were used for each sample in both the control and silenced lines. h, Proposed model illustrating the functional roles of PsCLE41 and PsMYB26 in pod endocarp development and lignin biosynthesis.
Extended Data Fig. 5 Identification of PsCIK2/3 as a candidate gene for Fa associated with fasciation.
a, Manhattan plot of GWAS based on the ZW6 genome reference, revealing a significant peak for fasciation between 0 and 40 Mb on chromosome 4. b, Close-up of the Manhattan plot of GWAS in the region of the peak in panel a. c, Bulked segregant mapping analyses (BSA) from sequencing fasciated and wild-type bulks of the F2 populations derived from the crosses Caméor (Fa/Fa) x JI0814 (fa/fa) and JI2822 (Fa/Fa) x JI0816 (fa/fa), further refining the genetic interval for Fa. d, Fine mapping of the Fa locus using two populations. In Caméor x JI0814 (Mapping 1), the region was narrowed down to chr4: 18,144,306-19,945,776 using 8 pairs of KASP markers (Supplementary Table 34); in the JI2822 x JI0816 population (Mapping 2), the interval was further confined to chr4: 18,180,969-19,506,907 (marker interval AX-183636277-AX183633456, Supplementary Table 18). e, Local detail of the fine-mapped genomic interval from panel d, showing 20 protein-coding genes, with Psat04G0031700 (encoding a Senescence-Associated Receptor-Like Kinase, PsCIK2/3) highlighted in orange. f, Population-based haplotype clustering analysis across the diversity panel for the 1.33 Mb Fa region, identifying a cluster of fasciated accessions in Hap5. g, Haplotype clustering analysis of Psat04G0031700 (PsCIK2/3) reveals a 5-bp deletion associated with the fasciated phenotype, clustering all fasciated accessions into Hap3. h, Amino acid sequence alignment of PsCIK2/3 proteins from the wild-type line (JI2822, Fa, Psat04G0031700), the mutant line (JI0816, fa, Psat04G0031700-5bp), and the ortholog from Arabidopsis (AT2G23950.1, AtCIK2).
Extended Data Fig. 6 Functional characterization of PsCIK2/3.
a, Flowering stage phenotype of Caméor (Fa/Fa, left) and JI0814 (fa/fa, right). b, Longitudinal section of 40-day-old stems from Caméor and JI0814, stained with safranin and fast green; the red crosses mark the region of the shoot apical meristem (SAM) in both genotypes. c, Transverse section of the apical meristem of 14-day-old paraffin-embedded stems, from Caméor (left) and JI0814 (right), stained with toluidine blue. d, Quantification of the number of vascular bundles in the longitudinal section of 40-day-old stems of Caméor and JI0814. e, Cross-sectional area of the apical meristem in 14-day-old stems of Caméor and JI0814. Three biological replicates were used. ** indicates significance at P < 0.01 (Student’s t-test) in d and e. f, Gene expression level from RNA-seq data for key genes involved in the CLV3-WUS signalling pathway measured across different organs and developmental stages (three biological replicates for each sample) in Caméor. The normalization and statistical approach is the same as described in Extended Data Fig. 2h. g, qRT-PCR analysis of gene expression for key genes involved in the CLV3-WUS signalling pathway, comparing the apical bud and stem between the wild-type line (Caméor) and the fasciated line (JI0814). Three biological replicates (n = 3) were used for each sample. The samples were obtained 14 days post budding. Data are presented as mean ± SEM, and statistical significance was determined using a two-sided t-test. The calculation used H3 as the reference gene, with P ≤ 0.001 marked as “***”, P ≤ 0.01 marked as “**”, P ≤ 0.05 marked as “*”, and P > 0.05 marked as “ns”. h, In situ hybridization of PsCIK2/3 in the apical bud compared between the wild-type line (JI2716) and the fasciated line (JI0814). i, Subcellular localization of PsCIK2/3 in Nicotiana benthamiana, showing co-localization with the cell membrane. j, Yeast two-hybrid assay showing interaction of PsCIK2/3 with PsCLV1 and PsCLV2.
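The summary statistics and significance markers described for panel g can be sketched as follows. This is an illustrative sketch only: the function names and the replicate values are hypothetical, the SEM is assumed to use the sample standard deviation, and the P value itself would come from a separate two-sided t-test that is not reproduced here.

```python
import math
import statistics


def mean_sem(replicates):
    """Return (mean, SEM) for a list of biological replicates.

    Assumption: SEM = sample standard deviation / sqrt(n).
    """
    n = len(replicates)
    mean = statistics.mean(replicates)
    sem = statistics.stdev(replicates) / math.sqrt(n)
    return mean, sem


def significance_marker(p_value):
    """Map a P value to the marker scheme stated in the legend."""
    if p_value <= 0.001:
        return "***"
    if p_value <= 0.01:
        return "**"
    if p_value <= 0.05:
        return "*"
    return "ns"


# Hypothetical relative expression values for three replicates (n = 3).
mean, sem = mean_sem([1.0, 2.0, 3.0])
print(f"{mean:.2f} ± {sem:.2f}")
print(significance_marker(0.004))  # **
```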
Extended Data Fig. 7 Segregation analysis of Fa and Mfa.
a, Genotype data from the JI2822 x JI0816 F2 population, presented in an Excel spreadsheet format. The F2 individuals are sorted left to right according to their phenotype and their genotypic scores at Fa and Mfa. In the central upper part of the figure, homozygous JI0816 genotypes (fa/fa) are represented in yellow, homozygous JI2822 genotypes (Fa/Fa) are represented in green, and heterozygotes (Fa/fa) are represented in blue. In the central lower part of the figure, homozygous JI0816 genotypes (Mfa/Mfa) are represented in yellow, homozygous JI2822 genotypes (mfa/mfa) are represented in green, and heterozygotes (Mfa/mfa) are represented in blue. The limits of recombination intervals are marked by horizontal black lines. Wild-type (dark green) and fasciated (orange) phenotype scores are shown above the genotyping data. Homozygous and heterozygous genotypes at a proposed modifier locus, Mfa, are shown below the genotyping data. F2 individuals informative for the positioning of Fa are marked with a red box. b, Tables explaining a one-gene model of the summarised numerical data from panel a, where genotype fa/fa is fasciated. c, Tables explaining a two-gene model of the summarised numerical data from panel a, showing the postulated Fa Mfa interaction, where the dominant allele Mfa is required for fasciation to occur. In this model fa/fa mfa/mfa is wild type but fa/fa Mfa/_ is fasciated. In both tables the numbers in red are F2 individuals with unexpected genotype/phenotype combinations, which were further tested and confirmed in the F3 population (Supplementary Notes and Supplementary Fig. 28).
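The difference between the one-gene and two-gene models in panels b and c can be made concrete by enumerating the F2 genotypes expected from a doubly heterozygous F1 (Fa/fa Mfa/mfa). The sketch below assumes that Fa and Mfa assort independently and that all genotypes are equally viable; neither assumption is stated in the legend, and the function names are hypothetical.

```python
from fractions import Fraction
from itertools import product

FA_ALLELES = ("Fa", "fa")      # alleles contributed by each F1 gamete at Fa
MFA_ALLELES = ("Mfa", "mfa")   # alleles contributed by each F1 gamete at Mfa


def is_fasciated_one_gene(fa_pair):
    """One-gene model: fa/fa is fasciated regardless of Mfa."""
    return fa_pair == ("fa", "fa")


def is_fasciated_two_gene(fa_pair, mfa_pair):
    """Two-gene model: fa/fa is fasciated only if a dominant Mfa allele is present."""
    return fa_pair == ("fa", "fa") and "Mfa" in mfa_pair


def f2_fasciated_fraction(two_gene):
    """Expected fasciated fraction in the F2 under the chosen model.

    Assumption: independent assortment, so all 16 allele
    combinations are equally likely.
    """
    total = 0
    fasciated = 0
    for fa_pair in product(FA_ALLELES, repeat=2):
        for mfa_pair in product(MFA_ALLELES, repeat=2):
            total += 1
            if two_gene:
                fasciated += is_fasciated_two_gene(fa_pair, mfa_pair)
            else:
                fasciated += is_fasciated_one_gene(fa_pair)
    return Fraction(fasciated, total)


print(f2_fasciated_fraction(two_gene=False))  # 1/4
print(f2_fasciated_fraction(two_gene=True))   # 3/16
```

Under these assumptions the two-gene model predicts fewer fasciated F2 plants (3/16 rather than 1/4), because fa/fa mfa/mfa individuals remain wild type.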
Extended Data Fig. 8 Identification of genomic loci associated with major agronomic traits.
a, Multi-site phenotyping experiments were conducted to measure 79 traits in total from distinct climate zones at three different locations: Southern China (22°N, Shenzhen), Northern China (45°N, Harbin), and the UK (52°N, Norwich). Map created using the maps package (3.4.0) in R (version 4.2). b, Illustrative photographs and drawings of phenotypic data collected for different trait categories scored in this study. The points in the hexagon represent the total number of sub-traits collected for each category, with the red line indicating the total number of phenotypes assessed (Supplementary Table 37). c, Significant marker-trait associations (MTAs) and their genetic effects for component traits from seeds, pods, leaves, flowers, roots and plant architecture. The number of sub-traits for each category is shown in parentheses. Specific examples for some of the selected Manhattan plots are shown to explain: d, The acute vs. blunt pod tip phenotypes, corresponding to the Bt locus (a locus known from classical genetics alone)85. e, The pod neoplasm phenotype, the development of pustular-like growths, known as ‘neoplasms’. The locus known from classical genetics is Np86. f, Green pod vs purple pod phenotypes corresponding to the known genetic loci: Pur and Pu45. g, A new locus at the end of chromosome 3, underlying the seed number (SDN) per pod. h, Variations in flower number per axillary inflorescence corresponding to the known genetic loci: Fn and Fna87. i, A new locus on chromosome 3, underlying the total seed weight per plant (SDY), a yield component trait. j, Variation in flowering time corresponding to the Hr locus88. k, A new locus underlying flower size (FLS), on chromosome 1. l, Brown vs black hilum colour phenotypes, corresponding to the Pl locus89. m, A historical locus (Ser1) at chromosome 5, explaining the phenotypic variation in leaflet margin serration.
Supplementary information
This file contains Supplementary Notes, Supplementary Figs. 1–44 and references.
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License, which permits any non-commercial use, sharing, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if you modified the licensed material. You do not have permission under this licence to share adapted material derived from this article or parts of it. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by-nc-nd/4.0/.
About this article
Cite this article
Feng, C., Chen, B., Hofer, J. et al. Genomic and genetic insights into Mendel’s pea genes. Nature 642, 980–989 (2025). https://doi.org/10.1038/s41586-025-08891-6
DOI: https://doi.org/10.1038/s41586-025-08891-6