Abstract
Human craniofacial shape is highly variable yet highly heritable with numerous genetic variants interacting through multiple layers of development. Here, we hypothesize that Mendelian phenotypes represent the extremes of a phenotypic spectrum and, using achondroplasia as an example, we introduce a syndrome-informed phenotyping approach to identify genomic loci associated with achondroplasia-like facial variation in the general population. We compare three-dimensional facial scans from 43 individuals with achondroplasia and 8246 controls to calculate achondroplasia-like facial scores. Multivariate GWAS of the control scores reveals a polygenic basis for facial variation along an achondroplasia-specific shape axis, identifying genes primarily involved in skeletal development. Jointly modeling these genes in two independent control samples, both human and mouse, shows craniofacial effects approximating the characteristic achondroplasia phenotype. These findings suggest that both complex and Mendelian genetic variation act on the same developmentally determined axes of facial variation, providing insights into the genetic intersection of complex traits and Mendelian disorders.
Introduction
Genetic variation, in conjunction with environmental factors, influences developmental processes that drive phenotypic variation1,2. Rare major-effect variants and common variants have been identified through largely separate studies of monogenic and complex phenotypes, respectively3. However, recent advances have led to a far deeper understanding of the relationship between typical and syndromic development. One key conceptual hypothesis is that both typical and syndromic phenotypic variation occur predominantly along developmentally constrained directions of phenotypic change, or ‘axes of variation’. Both common and rare variants can act upon these axes, causing syndromic phenotypic variation to occur along the extremes of axes of typical phenotypic variation. For example, variants in the LRP5 gene are associated with monogenic forms of sclerosteosis and osteoporosis, as well as typical variation in bone mineral density in humans4. In mice, mutations that perturb cartilage development result in extreme craniofacial phenotypes along an axis of shape variation related to chondrocranial growth5,6,7. Findings consistent with this hypothesis3,8,9,10,11,12 highlight the potential importance of an integrated approach that incorporates both complex and Mendelian traits into the study of phenotypic variation.
Human facial shape shows great potential for such an integrated approach. Facial shape is an assemblage of highly variable, developmentally complex phenotypes that are largely genetically determined, involving both common and rare variants with a range of effect sizes13. Previous work suggests that rare polygenic facial traits and typical facial variation are genetically related and might share developmental axes, but it is unknown whether this is true for monogenic phenotypes as well14. Recent advances in three-dimensional (3D) image processing technology and genome-wide association studies (GWAS) have enabled the identification of hundreds of genetic loci associated with facial variation in the general population, yet collectively these only account for about 10% of facial phenotypic variance13. In addition, many rare large-effect variants have been discovered through the study of Mendelian disorders with craniofacial dysmorphism15. A well-known example is the recurrent pathogenic gain-of-function variant G380R in FGFR3 that causes achondroplasia (ACH), the most common form of skeletal dysplasia16. FGFR3 is a regulator of bone growth that is expressed in chondrocytes and mature osteoblasts, and increased FGFR3 signaling suppresses the proliferation and maturation of growth plate chondrocytes. This, in turn, impairs endochondral bone growth, resulting in rhizomelic limb shortening and short stature in ACH17,18. In the skull, premature fusion of skull base synchondroses leads to a shortened basicranium and a recognizable pattern of frontal bossing and midface hypoplasia in affected individuals19. The FGFR3 locus exhibits allelic heterogeneity, with some allelic disorders producing skeletal and/or facial features similar to those of ACH20,21.
Genotype-phenotype correlations have been established for FGFR3, but these cannot explain the modest range of variability that exists within the characteristic ACH phenotype as nearly all ACH patients share the same pathogenic G380R FGFR3 variant16,22. This suggests the presence of additional background genetic determinants that influence the severity of the ACH phenotype23. The factors underlying this variable phenotypic expressivity remain largely unknown.
If both typical and syndromic phenotypic effects converge on developmentally constrained axes, and phenotypic variation occurs principally along those axes, we would expect facial variation along a syndromic phenotypic axis to also be present to some degree in the general population. In this work, we tested this hypothesis by projecting the ACH facial phenotype onto an unselected control population to model ACH-derived facial variation as a quantitative trait, rather than a binary categorical (or monogenic) trait. Genetically mapping these traits in the control population revealed strong enrichment for genes involved in developmental processes that are key in the pathophysiology of ACH. Elements of the ACH phenotype could also be replicated in silico in two independent control samples, relying solely on the uncovered polygenic background. We discuss the implications of our findings for the broader field of Mendelian and complex trait genetics.
Results
ACH phenotype can be constructed from axes of typical facial variation
3D facial photographs were available from 43 individuals with ACH and 8246 unselected controls, all of European ancestry (Fig. 1a). We obtained homologous facial configurations by non-rigidly mapping an atlas (composed of 7160 points) to each individual image24. Controls were Procrustes aligned to a common coordinate system, and principal component analysis (PCA) was applied to capture the major axes of typical facial variation. Projecting individuals with ACH into the same coordinate space accurately described syndromic facial shape variation, with a 0.49 mm average error between the original ACH shapes and corresponding projections (Supplementary Fig. 1a). Regions of higher reconstruction error coincided with the clinically most distinct regions (e.g., nasion and forehead). As a comparison, the average PCA reconstruction error of the controls was 0.35 mm (Supplementary Fig. 1b). While ACH syndromic samples could be coded well as linear combinations of these axes (principal components) of typical variation, they showed greater variation overall and were generally found towards the tail end of the distribution (Supplementary Fig. 2).
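The projection step described above can be sketched as follows. This is a minimal illustration on toy data, assuming shapes are already Procrustes aligned and flattened to one row per individual; the function names, the number of retained components, and the use of a mean per-landmark Euclidean distance as the error measure are assumptions for illustration, not the study's implementation.

```python
import numpy as np

def fit_pca(controls, n_components):
    """Fit PCA on aligned control shapes (rows = individuals, cols = flattened xyz)."""
    mean = controls.mean(axis=0)
    # SVD of the centered data; the rows of vt are the principal axes.
    _, _, vt = np.linalg.svd(controls - mean, full_matrices=False)
    return mean, vt[:n_components]

def reconstruction_error(shapes, mean, axes):
    """Mean per-landmark distance between shapes and their projections onto the axes."""
    scores = (shapes - mean) @ axes.T          # coordinates in the control PCA space
    recon = mean + scores @ axes               # back-projection to landmark space
    diff = (shapes - recon).reshape(len(shapes), -1, 3)
    return np.linalg.norm(diff, axis=2).mean()

# Toy data (hypothetical): 200 controls and 5 cases, 10 landmarks x 3 coordinates.
rng = np.random.default_rng(0)
controls = rng.normal(size=(200, 30))
cases = rng.normal(size=(5, 30)) * 1.5
mean, axes = fit_pca(controls, n_components=10)
print(reconstruction_error(controls, mean, axes))
print(reconstruction_error(cases, mean, axes))
```

Comparing the two printed errors mirrors the comparison in the text: cases are reconstructed from axes that were estimated on controls only.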
Definition of ACH-informed phenotype as a quantitative trait
We compared 3D facial images of the ACH samples to the unselected controls (Fig. 1b) at multiple scales, starting from a global description of facial shape and gradually focusing on more local segments of shape variation (Supplementary Fig. 3a) determined by hierarchical spectral clustering25. By regressing facial shape onto syndrome status (ACH or control), we found that facial shape was significantly different between ACH and control samples in 41 out of the 63 facial segments (adjusted p < 0.05) (Supplementary Fig. 3b). For each of these significant segments, we established an ACH trait axis as the vector spanning the ACH and control shape means. These axes describe the facial shape effects associated with ACH (“ACH-derived facial trait”), such as frontal bossing and midfacial hypoplasia. Moving along the axes is equivalent to changing phenotypic severity (Fig. 2a).
a Achondroplasia (ACH) trait axis spanning the ACH and control mean shapes. Morphs on the left and right sides of the axis represent the extremes of the phenotypic spectrum. Controls (ID1-3) can be scored along the axis by measuring the angle between their individual vectors and the ACH trait vector. The facial variation of the three control individuals is visualized as a heatmap. Red areas on the facial shape correspond to a local outward deviation from the control mean shape, blue indicates inward deviation. b Distribution of the facial trait scores for the full face (segment 1) for both the ACH (in green) and control (in beige) datasets. Values smaller than 1 indicate more ACH-like; values greater than 1 indicate less ACH-like. The mean facial shape of the 5 lowest and highest-scoring individuals is shown for both ACH and control samples. c Manhattan plot of genome-wide associations. For each SNP, the lowest p-value (Canonical correlation analysis, right-tailed chi-square) across all 41 significant facial segments is plotted. The full horizontal line represents the genome-wide significance threshold (p = 5e-8). Candidate genes are annotated to each genome-wide significant locus (n = 19). Source data are provided as a Source Data file.
We quantified the extent to which the ACH-derived traits or shape effects exist in the control population by measuring facial similarity of the unselected controls to the ACH trait axes using the cosine distance, hereafter referred to as “syndrome-informed phenotyping” (Fig. 2a). This approach generated univariate scores, with controls that display ACH-like facial features having a low score (values close to 0), while individuals with an inverse phenotype (e.g., protrusion of the midface) have higher scores (values close to 2) (Fig. 2b). The control sample showed great variation in the ACH trait scores, yet a clear overlap was observed with the ACH cohort scored along the same axes (Supplementary Fig. 4). Furthermore, the ACH trait scores explained 2.6% of full facial shape variation in the control population, which is substantial, considering that sex and age explained 11.3% and 4.9% of variance in the same cohort, respectively.
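A minimal sketch of the scoring just described, assuming each shape is a flattened, aligned landmark vector. The trait axis is taken as the difference between the case mean and the control mean, and the score is the cosine distance (one minus the cosine of the angle) between an individual's deviation from the control mean and that axis. Function names and toy values are hypothetical.

```python
import numpy as np

def ach_trait_axis(case_shapes, control_shapes):
    """Vector pointing from the control mean shape to the case mean shape."""
    return case_shapes.mean(axis=0) - control_shapes.mean(axis=0)

def trait_scores(shapes, control_mean, axis):
    """Cosine distance between each individual's deviation and the trait axis.

    0 means the deviation points along the axis (syndrome-like),
    1 means it is orthogonal, 2 means it points the opposite way.
    An individual identical to the control mean has no defined score.
    """
    dev = shapes - control_mean
    cos = dev @ axis / (np.linalg.norm(dev, axis=1) * np.linalg.norm(axis))
    return 1.0 - cos

# Toy example in three dimensions (hypothetical values).
controls = np.array([[1.0, 0, 0], [-1.0, 0, 0], [0, 1.0, 0], [0, -1.0, 0]])
cases = np.array([[2.0, 0, 0], [4.0, 0, 0]])
axis = ach_trait_axis(cases, controls)
print(trait_scores(controls, controls.mean(axis=0), axis))
```

Because the score depends only on direction, not magnitude, an individual with a small but well-aligned deviation scores as syndrome-like as one with a large deviation in the same direction.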
Multivariate GWAS reveals polygenic background of ACH-derived facial traits
We next sought to identify SNPs associated with facial variation in the general population along the ACH trait axes. We combined the trait scores for all 41 significant facial segments into a matrix for the US and UK subsamples separately and performed a multivariate GWAS meta-analysis using canonical correlation analysis. Each subsample served in turn as the discovery and as the replication cohort before the two were stringently meta-analyzed, so replication was built into the design. In total, we identified 1780 SNPs that reached genome-wide significance (p < 5e-8). Significant SNPs were merged into 19 genomic loci that reached genome- and study-wide significance, revealing a polygenic basis for facial variation along a characteristic shape axis derived from ACH, a monogenic disorder (Fig. 2c and Supplementary Data 1). The 19 lead SNPs combined explained 1.20% and 1.19% of the multivariate ACH-like phenotype and 0.8% and 1.1% of total facial shape variation in the US and UK subsamples, respectively. While some lead SNPs correlated most with ACH-derived shape changes of the full face, others showed more localized effects, affecting only a specific aspect of the ACH facial phenotype (Supplementary Fig. 5).
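For a single SNP tested against a matrix of trait scores, canonical correlation analysis reduces to one canonical correlation, which equals the multiple correlation obtained by regressing the genotype on the traits. The sketch below illustrates this on toy data and attaches Bartlett's chi-square approximation as the test statistic. The choice of Bartlett's approximation, the function name, and the simulated data are assumptions for illustration; covariate adjustment and the meta-analysis step are omitted.

```python
import numpy as np

def cca_single_snp(genotypes, traits):
    """Canonical correlation between one SNP (n,) and a trait matrix (n, q).

    Returns (r, chi2, df). A p-value would be the right tail of a chi-square
    distribution with df degrees of freedom evaluated at chi2.
    """
    x = np.asarray(genotypes, dtype=float)
    y = np.asarray(traits, dtype=float)
    x = x - x.mean()
    y = y - y.mean(axis=0)
    # With one predictor, the squared canonical correlation is the R^2
    # of the least-squares regression of the genotype on the traits.
    beta, *_ = np.linalg.lstsq(y, x, rcond=None)
    resid = x - y @ beta
    r2 = float(np.clip(1.0 - (resid @ resid) / (x @ x), 0.0, 1.0))
    n, q = y.shape
    # Bartlett's approximation with p = 1 genotype variable and q traits.
    chi2 = -(n - 1 - (1 + q + 1) / 2.0) * np.log(1.0 - r2)
    return np.sqrt(r2), chi2, q

# Toy data (hypothetical): 500 individuals, 3 trait scores, one associated trait.
rng = np.random.default_rng(0)
snp = rng.binomial(2, 0.3, size=500)
scores = rng.normal(size=(500, 3))
scores[:, 0] += 0.5 * snp
print(cca_single_snp(snp, scores))
```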
We replicated these findings in several ways. First, 11/19 loci reached genome-wide significance in both arms of the meta-design independently, using the US subsample as the discovery cohort and the UK subsample as replication and vice versa. Second, we projected the ACH phenotype in an independent control cohort of Tanzanian ancestry (n = 2595) and found significant replications for 6/19 loci, despite the lower power and different LD structure of this cohort. Lastly, all the significantly associated loci have candidate genes in the immediate vicinity with well-established roles in craniofacial development and/or have previously been identified in GWAS of facial shape variation. Detailed results of the different replication strategies are provided in Supplementary Data 1. No significant associations were found for SNPs near FGFR3; neither did we find significant enrichment for associations with genes that interact with FGFR3 directly (p = 0.48) (Supplementary Data 2). However, STRING analysis of the GWAS-associated candidate genes showed plausible interactions with the FGFR3 network at higher levels (Supplementary Fig. 6).
Genetic loci associated with ACH-derived facial variation are enriched for processes related to skeletal development
Gene-set enrichment analysis26 of the 19 associated loci showed significant enrichment for biological processes related to cartilage growth and development, and skeletal development overall (Supplementary Data 3). To evaluate targeted enrichment of certain biological processes, we compared our findings to those of a GWAS of typical facial variation performed in the same unselected control group by White et al. 27. While the typical facial variation GWAS showed enrichment for a broad spectrum of processes related to embryonic development, the current ACH-informed GWAS was enriched for a specific subset of these biological processes, with all but two terms (30/32, 94%) also significantly enriched in the previous study27. There was a significantly higher enrichment for skeletal system developmental processes compared to the previous uninformed facial shape GWAS by White et al. 27 (Fig. 3a), with consistently higher enrichments for processes related to cartilage development such as chondrocyte differentiation and development, chondrocyte hypertrophy, and cartilage condensation in the ACH-informed GWAS (Supplementary Data 3). Interestingly, these same biological processes are at the core of ACH pathophysiology28. For other branches of system development (e.g., nervous system development, circulatory system development), we observed no significant difference between the ACH-informed and uninformed facial shape GWAS27 (Fig. 3a). Similarly, the ACH-informed GWAS genes were specifically enriched for skeletal developmental genes when compared against all genes previously identified through GWAS of facial shape (Fig. 3b), as well as against known craniofacial genes implicated in Mendelian syndromes13 (Supplementary Fig. 7a, b and Supplementary Data 4). A similar targeted enrichment was not observed in comparison to a negative control GWAS of inflammatory bowel disease29 (Supplementary Fig. 7c).
Relative fold enrichment of Gene Ontology (GO) biological processes enriched in the Achondroplasia (ACH) GWAS (diamonds) compared to different background sets. Terms with at least one gene in the background (circles) are shown for additional context. a ACH-informed GWAS versus uninformed GWAS of typical facial variation by White et al. 27. b ACH-informed GWAS versus all genes previously identified through GWAS of facial shape. Node size corresponds to the number of genes mapped to each process. P-values were estimated using a right-tailed hypergeometric test, alpha levels were adjusted to a 5% false discovery rate (FDR) using an adapted Benjamini-Hochberg procedure. Source data are provided as a Source Data file.
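The enrichment test named in the caption can be illustrated with a small sketch: a right-tailed hypergeometric p-value per GO term, followed by false discovery rate adjustment. The standard Benjamini-Hochberg procedure is used here as a stand-in, since the details of the adapted variant are not given in this section; function names and example p-values are hypothetical.

```python
from math import comb

def hypergeom_right_tail(k, K, n, N):
    """P(X >= k) when drawing n study genes from N background genes,
    of which K are annotated to the term, and observing k annotated genes."""
    upper = min(K, n)
    total = comb(N, n)
    return sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1)) / total

def benjamini_hochberg(pvalues):
    """Standard BH-adjusted p-values, returned in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):               # from the largest p-value down
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted

# Hypothetical p-values for four terms.
print(benjamini_hochberg([0.01, 0.04, 0.03, 0.5]))
```

Terms whose adjusted p-value falls below 0.05 would be reported as enriched at a 5% FDR under this standard procedure.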
Genetic loci associated with ACH-derived facial traits are not enriched for layer-specific murine growth plate gene expression
Following the observed enrichment for cartilage-related processes, we further tested whether the ACH-informed GWAS genes were preferentially expressed in certain epiphyseal layers or at specific chondrocyte maturation stages. We analyzed public gene expression data from murine growth plates and chondrocytes but found no correlation between MAGMA gene-level p-values and gene expression specificity per epiphyseal layer or chondrocyte maturation stage. Differential expression analysis showed no significant associations between gene-level p-values and changes in gene expression between the early and late chondrocyte maturation stages.
ACH-derived facial trait shows significant genetic correlations with other ACH-linked traits
We calculated the Spearman genetic correlation30 between the ACH-derived facial trait and five ACH-associated traits, including body height, infant head circumference, obstructive sleep apnea, lung volume, and sitting height ratio (Supplementary Data 5). Although these five traits are all associated with the pathogenic G380R FGFR3 variant in individuals with ACH, it is unknown if they also correlate with ACH-like facial features in the general population. We found significant genetic correlations (FDR-corrected p < 0.05) between the ACH-derived facial trait and body height, obstructive sleep apnea, and sitting height ratio (Table 1), likely pointing to the same skeletal system pathways that showed enrichment in the previous analysis. In individuals with ACH, reduced endochondral bone growth causes disproportionate short stature with increased sitting height ratio, and can also lead to narrowing of the upper airway, which in turn may increase the risk for obstructive sleep apnea31,32. The ACH-derived facial trait also showed a stronger genetic correlation with sitting height ratio than with height, while uninformed facial shape by White et al. 27 showed no differential correlation with these two traits. In line with the current findings, previous research revealed that genetic loci associated with sitting height ratio show an increased specificity for biological processes related to bone and cartilage, compared to height-associated loci33. We observed no significant genetic correlations between ACH-derived facial variation and either inflammatory bowel disease29 or hormone-sensitive cancer34, both of which served as negative control disorders without known associations to ACH.
ACH-like phenotype can be obtained in the absence of FGFR3 mutations
We extended the single SNP analysis to a multivariate genotype-phenotype (MGP) approach that maps the coordinated effects of marker variation of the GWAS-associated genes onto craniofacial shape7,35. In a sample of 1154 Diversity Outbred (DO) mice, the primary MGP-associated effect axis resembled an ACH-like phenotype, characterized by a shortened and rounder skull, even when Fgfr3 marker variation was not included in the model (Fig. 4a, b). Applying the same method to an independent sample of 6772 human cranial vault shapes36, the primary MGP-associated effect axis revealed a more globular appearance of the cranial vault combined with a relative increase in biparietal diameter (Fig. 4c, d). Similar features are also observed in individuals with ACH, where the narrowing of the skull base can lead to a more rounded calvarium (upper part of the neurocranium). A significant increase in biparietal diameter, but not antero-posterior diameter, of the skull has also been described in individuals with ACH37.
Genetic marker loadings for the multivariate genotype-phenotype mapping (MGP) of the GWAS candidate genes onto (a) mouse craniofacial shape and (c) human cranial vault shape. In (a), the relative contributions of the different founder genotypes of the Diversity Outbred mice are shown. In (c), the loadings per principal component are indicated. Genes are ordered by their relative contribution to the associated shape effects shown in (b) and (d), respectively. The top row shows the mean craniofacial shape colored according to the difference between the upper and lower extremes of the MGP shape axis. Red areas indicate a local inward deviation, and blue indicates an outward deviation. The middle row shows the upper extreme of the MGP shape axis. The bottom row shows the lower extreme of the same shape axis. Source data are provided as a Source Data file.
Discussion
Unraveling the complex relationship between genomic and phenotypic variation is a central problem in biology. In this work, we introduce a syndrome-informed phenotyping method to study the connections between the biology of phenotypic variation in the general population and syndromic variation observed in Mendelian disease, using facial variation in ACH as a case example. Facial features in ACH make up a distinct and recognizable phenotype22, represented by the significant facial shape differences between ACH individuals and controls for most parts of the face. In line with existing literature, we found that these differences largely result from changes in phenotypic extremeness9,38. ACH individuals could be well positioned along the extremes of the axes of typical facial variation, while some shape deviations remained in those regions of the face that constitute the characteristic ACH facial gestalt. Quantification of ACH-derived facial features in unselected controls showed that individuals vary along the ACH-derived shape axis and that ACH-like facial shape variation is clearly present in a subset of these control individuals. Though the ACH-derived phenotype axis is derived from a monogenic condition, GWAS of facial shape in an unselected control population revealed a polygenic background for the ACH-derived phenotype scores in this control population. Furthermore, ACH-like craniofacial variation could also be reproduced in two independent datasets of DO mice and control human cranial vaults relying solely on the uncovered polygenic background.
Finding a polygenic ACH-like signal in the outcross DO model is significant for several reasons. The DO is a heterogeneous stock of laboratory mice bred through repeated outcrossing of eight inbred founder strains to maximize genetic diversity39,40. As a random mosaic of haplotypes of these founders, each animal is genetically and phenotypically unique, closely resembling the variation found in human and other natural populations. Moreover, thanks to outcrossing over 27+ generations, recombination frequency is higher in the DO, allowing for genetic mapping of smaller regions and thus higher-resolution studies of complex traits, like craniofacial shape41. While mouse models are regularly used to validate complex disease GWAS loci using single or multiple gene knockout designs42, the MGP mapping in DO mice we performed here offers a different perspective by replicating a complex phenotype using a highly polygenic signal.
We observed no significant genetic associations with ACH-derived facial shape in the vicinity of the FGFR3 locus. Similarly, previous GWAS of human facial shape have not found significant associations near FGFR313. While the lack of associations between common variants near FGFR3 and ACH-derived variation does not rule out their contribution, it is also not unexpected given the clinical knowledge on genotype-phenotype correlations for FGFR3. Rare variants with large effects in FGFR3 do not always affect facial features, such as in hypochondroplasia, where the mutation causes short stature but no facial dysmorphism20. In addition, common variants in FGFR3 have been associated with idiopathic short stature21. This suggests that while both facial shape and height share a marked sensitivity to disturbances by specific large-effect variants in FGFR3, facial shape is not generally regulated by FGFR3, unlike height. These observations might indicate that facial and skeletal development have different tissue- and/or timepoint-specific sensitivities to disturbances by FGFR3, warranting further research.
All genes that were identified through the current GWAS had previously been linked to facial variation in the general population13, indicating that the ACH-informed phenotype is determined by genes that play a role in facial morphology more broadly. Interestingly, the polygenic background was specifically enriched for biological processes that are disturbed in ACH, such as chondrocyte hypertrophy and differentiation28, and genetic correlations were found between ACH-derived facial variation in controls and ACH-linked features. Strikingly, the genetic correlation with sitting height ratio, a trait enriched for chondrocyte developmental processes, was as high in the current GWAS as in a previous GWAS on typical facial variation with ten times more loci and thus much higher power27,33,43. These results are in line with previous findings that the effects of major mutations often co-align with the directions of effect linked to broader developmental processes that are affected by those mutations7.
The convergence of genetic effects onto shared axes of shape variation stems from the highly integrated nature of the human face6,44. While myriad genes can influence facial morphology, the potential directions (axes) in which facial shape can vary are constrained by the developmental processes on which they act6, as is illustrated by the genetic overlap between different polygenic facial phenotypes14. Here, we tested the phenotypic and genetic overlap between typical facial variation and a Mendelian facial phenotype. Our findings indicate that both typical human facial variation and facial variation associated with rare Mendelian syndromes occur along the same developmental axes. These developmental axes appear to be determined by a background of common polygenic variation. Rare Mendelian genetic variants with major effects appear to move individuals further toward the extreme end of these axes. The polygenic variation that underlies these developmental axes thus likely contributes to the range of variation seen in the corresponding Mendelian syndromes. Indeed, this may partially explain the occurrence of subclinical phenotypes in conditions such as orofacial clefts45, as well as a tendency for unaffected relatives of probands with craniofacial syndromes to sometimes themselves be misclassified as syndromic by an automated syndrome classification tool based on 3D facial imaging46. It is feasible that some disease-associated genetic variants perturb developmental processes that are not relevant to typical facial variation. In these instances, we would expect the direction of disease-associated facial shape variation to deviate significantly from typical facial variation, and we would not expect a polygenic basis for that direction of variation in the general population.
The finding that typical and syndromic facial variation are related through shared developmental axes also has implications for the mechanisms of variable expressivity and penetrance. This phenomenon likely occurs because developmental processes drive directions of variation on which multiple genomic and environmental influences may converge. For ACH, this would mean that the FGFR3 gain-of-function mutation produces a large-scale effect on an axis of variation that exists in the general population and is driven by variation in growth at the cranial synchondroses and cartilaginous growth centers in early craniofacial development. Mutations that alter chondrocyte proliferation or maturation in mice show directions of effect that broadly resemble ACH, including doming of the neurocranium, decreased cranial base flexion, and reduction in midfacial prognathism5,47. If individuals with ACH vary along this same multivariate axis of facial shape, modulation of the degree of cartilage proliferation could explain variation in phenotypic severity for ACH. Conversely, if individuals with ACH vary in directions orthogonal to this axis, this would point towards other developmental drivers of variation6.
The value of integrating common and Mendelian disease genetics was recently demonstrated by Blair et al. 9, who mapped heterogeneous symptom data to latent quantitative traits for various Mendelian diseases. Genomic association testing of the newly derived traits revealed common variants predictive of disease outcome; however, the inference of latent traits required phenotypes available at the biobank scale, limiting the applicability of that approach. The syndrome-informed framework we present here similarly uses quantitative trait scores for a Mendelian phenotype to reveal common variants associated with the Mendelian phenotype. Our method is applicable to many other phenotypes and, importantly, to relatively small sample sizes, which remain a major challenge in studies of rare diseases. Here, we studied ACH (n = 43) as proof of principle, but the syndrome-informed framework can also be generalized to other genetic disorders. For example, genomic analysis of Pierre Robin Sequence-derived phenotypic scores identified genetic variants near the SOX9 locus, which is commonly linked to the disorder, among other genetic loci that are thought to conjointly modulate the facial phenotype48. In addition, applying our approach to genetic conditions with a poorly understood pathophysiology could highlight developmental and biological pathways of importance. Similarly, by defining a shape axis based on a group of individuals with similar phenotypic features but unknown diagnoses, our approach could provide insight into the shared genetic etiology and impaired pathways in these individuals. These shape axes can be based on a multidimensional linear space as we did here, but deep learning frameworks of syndromic facial shape could also be used to extract phenotypes of interest49,50,51.
Lastly, of particular importance, the polygenic background identified using a syndrome-informed approach may highlight interesting targets to identify putative modifiers of phenotypic expression in monogenic disorders23.
In conclusion, genetically mapping ACH-derived phenotypic effects in the general population highlighted a polygenic basis for a shape axis determined by a monogenic disorder. Jointly modeling these candidate genes, in turn, revealed that ACH-like phenotypes can be generated without FGFR3. These findings have important implications for unraveling the relationship between discrete and continuous variation and for understanding the role of causative genes in Mendelian disorders. If causative genes act on already existing axes of variation determined by developmental processes, then they are causes in only a limited sense in the background developmental context. Disease-associated variants may be more productively seen as belonging to a larger set of potential perturbations capable of shifting phenotypes along developmentally determined directions of variation. This framework also promotes understanding of variable expressivity and penetrance in genetic disease, which is of great value to aid diagnosis and improve patient outcomes.
Methods
Ethics statement
This study was approved by the ethical review board of KU Leuven and University Hospital Leuven (S60568, Leuven, Belgium), and the University of Calgary (REB14-0340, Calgary, Canada). Local institutional approval was granted for access to the FaceBase Repository (S60658, Leuven, Belgium). The previously collected data on humans were obtained under the appropriate local ethical approvals, and all participants gave written informed consent prior to participation. Ethical approval for the ALSPAC study was obtained from the ALSPAC Ethics and Law Committee and the Local Research Ethics Committees. The work on mouse craniofacial shape was performed according to protocols approved and reviewed by animal care committees at the University of Calgary (AC13-0268) and the University of Alberta (AUP1149).
Sample composition
We obtained 3D facial photos, demographic data (age, sex, self-reported ethnicity), and clinical and/or molecular testing results of 70 individuals with achondroplasia from the online FaceBase repository (FB00000861 [https://doi.org/10.25550/TJ0])52. From this group we excluded individuals of self-reported non-European descent (n = 21) and those with incomplete or missing metadata (n = 6) to retain a curated sample of 43 individuals. Figure 1a illustrates the dataset broken down by age and self-reported sex.
The unselected control sample consisted of previously collected 3D facial images, demographics (age, self-reported sex, genomic ancestry), and imputed genotype data of 8246 unrelated individuals of European descent originating from the United States (US) and the United Kingdom (UK)19. The US dataset (n = 4680) included samples from the 3D Facial Norms cohort53,54 and studies at the Pennsylvania State University and Indiana University-Purdue University Indianapolis. Detailed sample characteristics and recruitment details are provided by White et al. 27. The UK sample (n = 3566) consisted of participants from the Avon Longitudinal Study of Parents and Children (ALSPAC), a UK-based birth cohort study55,56. A total of 14,541 pregnant women with an expected delivery date between 1 April 1991 and 31 December 1992 were initially recruited. Extensive information and biological samples have been collected from these mothers and their offspring at various time points; details can be found on the study website through a fully searchable data dictionary (http://www.bris.ac.uk/alspac/researchers/data-access/data-dictionary/). For both US and UK samples, European participants were identified by projecting them into a principal component space constructed using the 1000 Genomes Phase 3 dataset, as described in detail by White et al. 27. Participants with missing covariate information (e.g., age, sex) or with insufficient image quality were excluded. Figure 1a illustrates the age and sex distribution of the final dataset.
The replication cohort (n = 2595) consisted of 3D facial images, demographics (age, sex, height, weight, ancestry) and imputed genotype data from unselected Tanzanian individuals. Detailed sample characteristics and recruitment details are provided by Cole et al. 57,58.
Genotyping and imputation
Genotyping and imputation of the European control sample were performed as described previously27. In brief, genotypes of the three US subsamples were phased separately using SHAPEIT2 (v2.r900)59 and imputed to the 1000 Genomes Phase 3 reference panel60 using the Positional Burrows-Wheeler Transform pipeline (v3.1)61 of the Sanger Imputation Server (v0.0.6)62. SNP-level (INFO score < 0.8) and per-participant genotype-level (genotype probability < 0.9) filters were used to omit poorly imputed variants. Finally, a single US cohort was obtained by merging the subsamples and filtering the SNPs based on missingness across individuals (--geno 0.5), minor allele frequency (--maf 0.01), and Hardy-Weinberg equilibrium (p < 1e-6), resulting in 7,417,619 SNPs for analysis.
For the UK dataset, imputed genotypes were obtained directly from the ALSPAC database. SHAPEIT246 was used for pre-phasing of haplotypes, and imputation against the 1000 Genomes Phase 1 reference panel (Version 3)63 was performed using IMPUTE264. After post-imputation quality control, the UK dataset contained 8,629,873 SNPs for analysis. Because restrictions are in place against merging the ALSPAC genotypes with any other genotypes, these were held separately during the analysis.
In total, 7,417,619 SNPs were overlapping between US and UK datasets, which were used in subsequent genetic association analyses. SNPs on the X chromosome were coded 0/2 for hemizygous males, to match with the 0/1/2 coding for females.
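As an illustration, the X-chromosome coding can be sketched in a few lines of Python. The function name `code_x_dosage` and its arguments are hypothetical and are not taken from the analysis code. The sketch assumes that male X-linked genotypes arrive as allele counts of 0 or 1.

```python
def code_x_dosage(allele_count: int, is_male: bool) -> int:
    """Return an X-chromosome dosage on the 0/1/2 scale.

    Hemizygous males carry one copy, so their allele count (0 or 1)
    is doubled to 0 or 2; females keep the usual 0/1/2 coding.
    """
    if is_male:
        if allele_count not in (0, 1):
            raise ValueError("male X allele count must be 0 or 1")
        return 2 * allele_count
    if allele_count not in (0, 1, 2):
        raise ValueError("female X allele count must be 0, 1 or 2")
    return allele_count


# A male carrying the alternate allele is placed on the same scale
# as a homozygous female.
male_carrier = code_x_dosage(1, is_male=True)
female_het = code_x_dosage(1, is_male=False)
```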
Facial phenotyping and segmentation
3D facial images were acquired using three digital stereophotogrammetry systems (3dMDface, Vectra H1, Creaform Gemini) and one laser scanning system (Konica Minolta VI-900). We non-rigidly registered an average facial atlas to each 3D image using the MeshMonk toolbox (v.0.0.6)24 to obtain a standard facial representation defined by 7160 homologous quasi-landmarks. Due to the bilaterally paired construction of the quasi-landmarks constituting the atlas, registered images were symmetrized by averaging the original configuration and its bilaterally reflected copy following Procrustes superimposition. Images were visually inspected and excluded if the registration process had failed.
All symmetrized quasi-landmark configurations of the combined US and UK control sample were aligned by generalized Procrustes analysis (GPA) and adjusted for self-reported sex, age, and age-squared in a partial least-squares regression (PLSR, function plsregress in MATLAB 2017b and 2023b). To retain the original shape information as closely as possible, we performed all analyses on Procrustes coordinates without any projection into the tangent space. We opted for PLSR because it does not assume the absence of collinearity and it is broadly applicable to diverse cases; however, multiple linear regression could also be used here. Next, the facial shape was divided into 63 predefined global-to-local segments obtained by hierarchical spectral clustering of the unselected controls, as described elsewhere25, providing facial segments at five hierarchical levels of scale. In each segment separately, symmetrized and covariate-adjusted shapes were aligned again using GPA to focus on local shape variation and avoid statistical biases associated with global alignments, as regions with more quasi-landmarks can dominate the superimposition relative to those with fewer (e.g., cheeks vs. nose). Dimensionality was reduced by principal component analysis (PCA) and the optimal number of principal components (PCs) to retain was determined by parallel analysis. This yielded PCs that were highly correlated with the ACH trait axis, yet also devoid of random noise (Supplementary Methods, Supplementary Note 1 and Supplementary Figs. 8–10). We normalized the projections on each PC to have unit variance by dividing each projection by the standard deviation of all projections on that PC. In this variance standardization step, all PCs (i.e., axes of variation) are allowed to contribute equally, irrespective of the amount of variance that is carried by each PC. Angles measured in this normalized space closely align with perceived facial similarity65 (Supplementary Figs. 11, 12).
Next, the ACH sample was aligned onto the mean control shape using Procrustes superimposition, and shapes were corrected for the same sex and age covariates using the regression coefficients from the PLSR model of the control sample. We then applied the same facial segmentation to the ACH sample and participants were projected into each segment-specific principal component space, again normalizing by dividing each projection by the standard deviation of all projections from the control sample.
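The projection and variance standardization steps can be sketched as follows. This is a minimal NumPy illustration on simulated data, not the MATLAB pipeline used in the study. The function names are hypothetical, the number of retained PCs is passed in directly rather than chosen by parallel analysis, and Procrustes alignment and covariate adjustment are assumed to have been done beforehand.

```python
import numpy as np


def fit_normalized_pca(controls: np.ndarray, n_components: int):
    """Fit PCA on control shapes and return mean, axes and per-PC SD."""
    mean = controls.mean(axis=0)
    centered = controls - mean
    # Rows of vt are the principal directions, ordered by variance.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    axes = vt[:n_components].T            # (n_features, n_components)
    sd = (centered @ axes).std(axis=0, ddof=1)
    return mean, axes, sd


def project(shapes: np.ndarray, mean, axes, sd) -> np.ndarray:
    """Project shapes into the control PC space, scaled by control SD."""
    return ((shapes - mean) @ axes) / sd


rng = np.random.default_rng(0)
# Simulated stand-ins for covariate-adjusted segment shapes.
controls = rng.normal(size=(200, 6)) * np.array([5, 3, 2, 1, 0.5, 0.1])
patients = rng.normal(loc=1.0, size=(10, 6))

mean, axes, sd = fit_normalized_pca(controls, n_components=3)
control_scores = project(controls, mean, axes, sd)   # unit variance per PC
patient_scores = project(patients, mean, axes, sd)   # scaled by control SD
```

Because the patient projections are divided by the control standard deviations, both groups are expressed in the same standardized space, which is what allows angles between shape vectors to be compared across groups.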
To evaluate the extent to which syndromic facial variation was preserved by projecting ACH samples into a PC space obtained from unselected controls, we quantified the reconstruction error. An individual’s PC coefficients were transformed back to the original landmark space by multiplying them with the eigenvectors and adding centroid size. Subsequently, the reconstruction error was quantified as the root mean square error across all 7160 quasi-landmarks between the original image and its corresponding reconstruction. We report the mean reconstruction error (in mm) across the full sample.
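A minimal sketch of the error measure is given below. It assumes that the error is the root mean square of the per-landmark Euclidean distances between the original and reconstructed configurations, which is one reasonable reading of the description above. The function name is hypothetical.

```python
import numpy as np


def reconstruction_rmse(original: np.ndarray, reconstructed: np.ndarray) -> float:
    """RMSE (same units as the input, e.g. mm) across landmarks.

    Both inputs are (n_landmarks, 3) arrays of homologous landmarks.
    """
    squared_dist = np.sum((original - reconstructed) ** 2, axis=1)
    return float(np.sqrt(squared_dist.mean()))


# Toy example: every landmark is displaced by a vector of length 5.
original = np.zeros((4, 3))
reconstructed = np.tile([3.0, 4.0, 0.0], (4, 1))
error = reconstruction_rmse(original, reconstructed)
```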
Achondroplasia-informed phenotyping
For each of the 63 facial segments separately, an ACH-derived facial trait was defined as follows. First, in the variation standardized space, we established an ACH facial trait as the shape axis passing through the averages of the ACH and control samples. We then obtained univariate trait scores for each control individual by computing the cosine distance between their individual vector (going from the average of the control sample to their individual PC projections) and the ACH trait vector (going from the average of the control sample to the ACH average)66,67. Cosine distance is equal to 1 minus the cosine of the angle between these two shape vectors, with scores close to 0 describing an ACH-like phenotype, while scores close to 2 indicate an inverse phenotype. The scores were computed in a leave-one-out scheme such that each individual was excluded from learning the trait vectors on which they were scored. To evaluate the proportion of facial shape variation in the control population described by the ACH axis, we regressed global facial shape onto the univariate trait scores obtained for segment 1 (full face) using PLSR and reported R-squared as the percentage of phenotypic variation explained.
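The scoring step can be sketched as below. This is an illustrative NumPy version under stated assumptions: inputs are the variance-standardized PC projections, and the leave-one-out scheme is implemented by recomputing the control average without the individual being scored. The function name is hypothetical.

```python
import numpy as np


def ach_trait_scores(control_scores: np.ndarray, ach_scores: np.ndarray) -> np.ndarray:
    """Cosine distance of each control to the ACH trait axis (range 0 to 2)."""
    n = control_scores.shape[0]
    total = control_scores.sum(axis=0)
    ach_mean = ach_scores.mean(axis=0)
    scores = np.empty(n)
    for i in range(n):
        # Control average learned without individual i (leave-one-out).
        loo_mean = (total - control_scores[i]) / (n - 1)
        individual = control_scores[i] - loo_mean
        trait = ach_mean - loo_mean
        cosine = individual @ trait / (
            np.linalg.norm(individual) * np.linalg.norm(trait)
        )
        scores[i] = 1.0 - cosine
    return scores


# Toy example in a two-dimensional PC space.
control_scores = np.array([[1.0, 0.0], [-1.0, 0.0], [0.0, 1.0], [0.0, -1.0]])
ach_scores = np.array([[5.0, 0.0], [5.0, 0.0]])
scores = ach_trait_scores(control_scores, ach_scores)
```

In this toy example the first control points straight toward the ACH average and scores 0, while the second points in the opposite direction and scores 2.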
We additionally tested whether ACH facial shapes differed significantly from a matched control sample of equal size (Supplementary Data 6). In a random order, we matched each ACH sample to a control sample of the same sex that was closest in age. The selected control was then omitted from the pool of potential matches. We co-aligned the covariate-adjusted and symmetrized quasi-landmarks of both groups using GPA and regressed facial shape onto group membership using PLSR. For each segment separately, a p-value was generated by a permutation test on R-squared with 10,000 permutations. We determined the effective number of independent tests based on the eigenvalues of the correlation matrix of the univariate traits per segment and adjusted p-values accordingly (Sidak correction)68. Significant differences (adjusted p < 0.05) were observed in 41 out of 63 facial segments. To calculate phenotypic extremeness, we projected all individual facial shapes into the control PC space, calculated the Mahalanobis distances to the control mean, and computed p-values associated with these distances using the chi-squared cumulative distribution function. Supplementary Data 7 enumerates every individual’s distance and upper-tail probability, i.e., the probability of observing a Mahalanobis distance at least as large under the chi-squared reference distribution. To ensure our shape effects were robust to differences in sample size and composition, we completed a series of morphometric simulations. The distances between the ACH and control mean shapes are stable with varying sample sizes, and there is a high correlation between an individual’s trait scores, irrespective of the size of the control or ACH sample used to calculate the trait scores. Detailed methods and results are provided in the Supplementary Information file (Supplementary Methods, Supplementary Note 1 and Supplementary Figs. 13, 14).
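The sex- and age-matching procedure can be sketched as a simple greedy algorithm. The record layout (dictionaries with `id`, `sex` and `age` keys), the function name, and the random seed are assumptions made for illustration only.

```python
import random


def match_controls(cases, controls, seed=0):
    """Greedily match each case to the same-sex control closest in age.

    Cases are visited in random order and each selected control is
    removed from the pool, so no control is used twice.
    """
    rng = random.Random(seed)
    order = list(cases)
    rng.shuffle(order)
    pool = list(controls)
    matches = {}
    for case in order:
        candidates = [c for c in pool if c["sex"] == case["sex"]]
        if not candidates:
            raise ValueError(f"no same-sex control left for {case['id']}")
        best = min(candidates, key=lambda c: abs(c["age"] - case["age"]))
        matches[case["id"]] = best["id"]
        pool.remove(best)
    return matches


cases = [
    {"id": "a1", "sex": "F", "age": 10},
    {"id": "a2", "sex": "M", "age": 30},
]
controls = [
    {"id": "c1", "sex": "F", "age": 12},
    {"id": "c2", "sex": "F", "age": 40},
    {"id": "c3", "sex": "M", "age": 29},
    {"id": "c4", "sex": "M", "age": 5},
]
matches = match_controls(cases, controls)
```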
Genome-wide association study
For both US and UK datasets separately, we combined the ACH-derived trait scores across the 41 significant segments into a single phenotype matrix ([n x m] with nUS = 4680 controls, nUK = 3566 controls, and m = 41 facial segments). This phenotype matrix was tested for genome-wide SNP associations in a multivariate association framework using canonical correlation analysis (CCA) following White et al. 27. However, instead of performing a separate GWAS per facial segment, scores generated across multiple segments were now combined into a single multivariate GWAS. The GWAS was conducted following a two-stage design with built-in replication, with both US and UK cohorts alternating as the discovery and replication sets. First, we applied CCA in the discovery sample to obtain association p-values as well as the shape axis maximally correlated with each SNP. Next, the replication sample was projected onto this axis. Because the maximally correlated shape axes depend on the dataset, this enforced a strict consistency of the phenotype and, thus, a more stringent meta-analysis. The resulting univariate trait scores were subsequently tested for genetic associations in a linear regression model. Finally, discovery and replication p-values were aggregated in a meta-analysis using Stouffer’s method69. Per SNP, the GWAS design generated two meta-analysis p-values, metaUS and metaUK, reflecting the sample that served as the discovery set. Because CCA does not accommodate adjustments for covariates, we corrected the dependent (facial shape) and independent variables (genotypes) for height, weight, facial size, four genomic ancestry axes, and camera system using PLSR prior to GWAS.
The lowest meta-analysis p-value per SNP was selected and compared against the genome-wide Bonferroni threshold (p < 5e-8). To address the increased multiple testing burden, we empirically estimated the null distribution of the minimal meta-analysis p-value on 10,000 genotype permutations of 500 randomly selected SNPs. This allowed us to estimate the number of effective independent tests per SNP as 1.713 (SD = 0.080), obtained by dividing 0.05 by the fifth percentile of the empirical null distribution70 and averaging the estimate across the 500 SNPs. The study-wide significance threshold was subsequently estimated as p < 2.92e-8 (p < 5e-8 / 1.713). We observed 1780 SNPs at the level of genome-wide significance, which were clumped into 19 independent loci as follows. Starting from the lead SNP (lowest p-value), SNPs within 10 kb or within 1 Mb but in linkage disequilibrium (r2 > 0.01) were clumped into the same locus represented by the lead SNP. Next, considering the lead SNPs only, signals within 10 Mb and r2 > 0.01 were merged. Third, any locus with a singleton lead SNP was removed. For each of the lead SNPs, the nearest gene was assigned as the candidate gene.
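The threshold arithmetic above can be reproduced directly. The helper names below are hypothetical; the numbers are those reported in the text.

```python
def effective_tests(fifth_percentile: float, alpha: float = 0.05) -> float:
    """Effective number of tests implied by an empirical null distribution."""
    return alpha / fifth_percentile


def study_wide_threshold(genome_wide: float, m_eff: float) -> float:
    """Divide the genome-wide threshold by the effective number of tests."""
    return genome_wide / m_eff


threshold = study_wide_threshold(5e-8, 1.713)
print(f"{threshold:.2e}")  # 2.92e-08
```

For example, a fifth percentile of about 0.0292 in the permuted minimum p-values corresponds to roughly 1.713 effective tests, since 0.05 / 0.0292 ≈ 1.71.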
In the multivariate GWAS setup, CCA extracts the linear combination of the ACH trait scores for the 41 significant facial segments that maximally correlate with the SNP being tested. From the CCA loadings, we examined which of the facial segments contributed most to the observed GWAS signals to delineate the associated shape effects.
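When one side of a CCA is a single variable, as with one SNP, the canonical weights coincide (up to scale) with the least-squares coefficients from regressing that variable on the other set. The sketch below uses this equivalence on simulated data. It is an illustration under that assumption, not the implementation used in the study, and it omits the p-value computation. The function name and the simulated effect sizes are hypothetical.

```python
import numpy as np


def single_snp_cca(traits: np.ndarray, genotype: np.ndarray):
    """Return trait weights and the canonical correlation for one SNP."""
    y = traits - traits.mean(axis=0)
    x = genotype - genotype.mean()
    # Least-squares weights give the trait combination most correlated with x.
    weights, *_ = np.linalg.lstsq(y, x, rcond=None)
    latent = y @ weights
    r = float(np.corrcoef(latent, x)[0, 1])
    return weights, r


rng = np.random.default_rng(1)
genotype = rng.binomial(2, 0.3, size=500).astype(float)
traits = rng.normal(size=(500, 4))
traits[:, 0] += 0.3 * genotype   # simulated effect on segment score 0
traits[:, 2] -= 0.2 * genotype   # simulated effect on segment score 2

weights, r = single_snp_cca(traits, genotype)
single_r = [abs(np.corrcoef(traits[:, j], genotype)[0, 1]) for j in range(4)]
```

The relative magnitudes of `weights` play the role of the loadings examined in the text: traits with larger absolute weights contribute more to the SNP-associated latent phenotype.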
Replication
In addition to the built-in replication, we further validated the identified GWAS loci in two ways. First, we assessed whether these loci had previously been associated with craniofacial shape more broadly, regardless of the specific phenotype. We used LDlink71 to determine the LD between our 19 lead SNPs and SNPs previously associated with craniofacial variation13. SNPs were retained if r2 > 0.1 (Supplementary Data 1). We also report studies that identified SNPs in a ± 250 kb window surrounding the 19 lead SNPs, but for which LD was below the set threshold. This approach provides an overview of existing evidence of broader craniofacial association for the identified loci.
In a second approach, we performed a strict replication of the identified loci in an independent cohort of Tanzanian ancestry (n = 2595); this time focusing on the same craniofacial phenotype. Similar to the phenotyping pipeline of the European controls, we symmetrized and corrected the images from the Tanzanian dataset for sex, age2, height, weight, centroid size, genomic ancestry (#5), and closed eyes58 using PLSR. We applied the predefined facial segmentation from the European cohort to these images and performed PCA followed by parallel analyses to build a shape space per facial segment. We subtracted the European mean shape from the ACH individuals and then added the Tanzanian mean to each image. For each facial segment, ACH individuals were projected into the respective PCA space, and the mean ACH shape was computed. Using a leave-one-out approach, all Tanzanian individuals were then projected onto the ACH phenotype axis, i.e., the axis connecting the mean Tanzanian and mean ACH shapes, to generate univariate trait scores per segment. We combined the scores for all significant segments (n = 41) into a phenotype matrix and performed a multivariate GWAS using CCA as described above. For each of the 19 lead SNPs, a replication p-value was calculated as follows. Per locus, we identified all SNPs in LD (r2 > 0.1) with or within a 50 kb window of the discovery lead SNP (or proxy SNPs defined by LDproxy71 if the lead SNP was not directly genotyped). Next, we estimated an LD matrix for the identified SNPs in the Tanzanian cohort to determine the number of independent SNPs tested, and we adjusted the replication p-values using the Sidak correction. The minimum adjusted p-value per locus was considered the locus p-value, which was then adjusted to a 5% false discovery rate (FDR) using the Benjamini-Hochberg procedure72 (Supplementary Data 1).
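The two correction steps used for replication can be sketched with the standard library alone. The function names are hypothetical, and the number of independent SNPs per locus is taken as given rather than estimated from an LD matrix.

```python
def sidak(p: float, n_tests: float) -> float:
    """Sidak-adjusted p-value for n_tests (possibly non-integer) tests."""
    return 1.0 - (1.0 - p) ** n_tests


def benjamini_hochberg(p_values):
    """Benjamini-Hochberg adjusted p-values, returned in input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Locus p-value: smallest Sidak-adjusted p-value among SNPs at the locus.
snp_p = [0.004, 0.02, 0.3]
locus_p = min(sidak(p, n_tests=2.5) for p in snp_p)

# FDR adjustment across loci.
fdr = benjamini_hochberg([0.01, 0.04, 0.03, 0.005])
```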
Protein network analysis
We searched the STRING database73 for known interactions with FGFR3. We focused on high-confidence interactions (confidence score 0.7) derived from curated databases or experimentally determined (Supplementary Data 2). SNP p-value data were aggregated to gene-association scores (gene-level p-values), and we evaluated enrichment for associations with the FGFR3 network using MAGMA (v.1.08)74. Next, we performed protein-protein interaction analysis of the GWAS candidate genes and evaluated potential associations, direct or indirect, with the FGFR3 network using default settings (confidence score 0.4, including all interaction sources).
Gene set enrichment analysis
We used GREAT (v.4.0.4)26 to associate the 19 genetic loci to Gene Ontology (GO) annotations and calculated the enrichment of biological processes for these annotations. To assess targeted enrichment of processes specific to our ACH-informed phenotyping approach, we compared gene set enrichment of all biological processes that reached significance (FDR-adjusted p < 0.05) in the hypergeometric test to a recent GWAS of typical facial variation in the current European control sample by White et al. 27. In addition, we repeated GO term enrichment against three background sets of craniofacial-associated genes as summarized by Naqvi et al. 13. The first set consisted of all genes identified in 25 previously published GWAS of facial shape, the second set contained genes with known roles in Mendelian craniofacial disorders and/or orofacial clefting, and the third set is a combination of both GWAS and disease-associated genes. As a negative control, we repeated the analysis with results from a recent GWAS of inflammatory bowel disease as foreground set29.
A database of genes and annotated ontology terms was downloaded from the StringDB website (https://stringdb-static.org/download/protein.enrichment.terms.v11.5.txt.gz). For each term, τ, a hypergeometric p-value was calculated as
$$p_{\tau} = \sum_{i=k}^{\min(n,K)} \frac{\binom{K}{i}\binom{N-K}{n-i}}{\binom{N}{n}}$$
with N the total number of genes in the background set, n the total number of genes in the foreground set, K the number of genes with annotation τ in the background set, and k the number of genes with annotation τ in the foreground set. Alpha levels were adjusted to a 5% false discovery rate (FDR) using the Benjamini-Hochberg procedure72 with Li and Ji’s adaptation75, where Meff was estimated from the binary gene-term matrix.
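Using the symbols defined above, the enrichment test can be sketched as an upper-tail hypergeometric probability. Treating the test as upper-tailed (the probability of observing at least k annotated genes) is the usual convention for enrichment and is an assumption here. The function name is hypothetical.

```python
from math import comb


def hypergeom_enrichment_p(N: int, K: int, n: int, k: int) -> float:
    """P(X >= k) when drawing n genes from N, of which K carry the term."""
    upper = min(n, K)
    total = comb(N, n)
    favourable = sum(
        comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1)
    )
    return favourable / total


# Background of 10 genes with 4 annotated; foreground of 3 with 2 annotated.
p_term = hypergeom_enrichment_p(N=10, K=4, n=3, k=2)
```

In this small example, 40 of the 120 possible foreground sets contain at least two annotated genes, giving a p-value of one third.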
Gene expression in chondrocytes
We computed gene-level p-values (gene-association scores) based on the ACH-informed GWAS summary statistics using MAGMA74, as well as from published GWAS summary statistics of typical facial shape in the same unselected control population27, height76, and inflammatory bowel disease29.
We downloaded published microarray data from murine growth plate dissections from the GEO data repository77 (GSE87605 [https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE87605]). Probe identifiers were mapped to mouse genes using the Mouse Genome Informatics database78. Mouse gene names were subsequently mapped to their human homologs using the Ensembl BioMart tool79. We calculated gene expression specificity scores per epiphyseal layer by dividing the expression of each gene per layer by the total expression of that gene. We calculated gene expression Z-scores per chondrocyte maturation stage by averaging gene expression across the four available samples for day 3 of embryonic development (early maturation stage) and day 10 of embryonic development (late maturation stage). Finally, we evaluated the Pearson correlation between gene-level p-values, specificity scores for expression per epiphyseal layer, and Z-scores per chondrocyte maturation stage.
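The specificity score is a simple proportion, sketched below for a single gene. The layer names and expression values are placeholders chosen for illustration and are not taken from the dataset.

```python
def specificity_scores(expression_by_layer: dict) -> dict:
    """Divide a gene's expression in each layer by its total expression."""
    total = sum(expression_by_layer.values())
    if total == 0:
        raise ValueError("gene has zero total expression")
    return {layer: value / total for layer, value in expression_by_layer.items()}


# Hypothetical expression values for one gene across three layers.
gene_expression = {"layer_a": 2.0, "layer_b": 6.0, "layer_c": 2.0}
scores_by_layer = specificity_scores(gene_expression)
```

By construction the scores of a gene sum to 1 across layers, so a value near 1 indicates expression concentrated in a single layer.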
From the GEO data repository77, we downloaded murine chondrocyte RNAseq data (GSE225796 [https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE225796]) and used the DESeq function (DESeq2 package) in R (v.4.2.0) to perform differential gene expression analysis. DESeq2 transforms read counts based on size factors and dispersion, fits a negative binomial generalized linear model (GLM), performs a Wald significance test, and assesses differentially expressed genes based on a false discovery rate cutoff of 0.05 using the Benjamini-Hochberg procedure.
Genetic correlation
To assess the extent to which genome-wide profiles of association were shared with known ACH-related traits, we computed the Spearman correlation between two vectors of linkage disequilibrium (LD)-block stratified association p-values. This approach provides a multivariate, robust alternative to LD score regression (LDSC)80,81 for computing genetic correlations and is applicable to unsigned summary statistics yielded by CCA30. We collected publicly available genome-wide summary statistics for five traits with known associations with the achondroplasia phenotype (body height76, head circumference82, lung volume83, obstructive sleep apnea syndrome84 and sitting height ratio85) and for two putatively unrelated traits (hormone-sensitive cancer34 and inflammatory bowel disease29) to serve as negative controls. Details on the selected traits and links to relevant publications are summarized in Supplementary Data 5. LD scores were readily obtained from the 1000 Genomes European data46, and SNPs were filtered to HapMap3 SNPs, excluding SNPs in the Major Histocompatibility Complex region86. For each LD block, we computed the mean SNP -log10(p-value) and computed a rank-based Spearman correlation using the average association value for that LD block. We estimated the standard error of the Spearman correlation using a bootstrapping approach with 100 resampling cycles.
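The block-wise correlation can be sketched as follows, using only the standard library. The function names and toy inputs are hypothetical, the bootstrap estimate of the standard error is omitted for brevity, and block labels are assumed to be supplied per SNP.

```python
from collections import defaultdict
from math import log10


def block_means(p_values, blocks):
    """Mean -log10(p) per LD block."""
    sums, counts = defaultdict(float), defaultdict(int)
    for p, block in zip(p_values, blocks):
        sums[block] += -log10(p)
        counts[block] += 1
    return {block: sums[block] / counts[block] for block in sums}


def ranks(values):
    """Average ranks (1-based), with ties sharing their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            result[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return result


def pearson(a, b):
    mean_a, mean_b = sum(a) / len(a), sum(b) / len(b)
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / (var_a * var_b) ** 0.5


def spearman(a, b):
    """Spearman correlation as the Pearson correlation of ranks."""
    return pearson(ranks(a), ranks(b))


blocks = ["b1", "b1", "b2", "b2", "b3", "b3"]
trait_1 = block_means([1e-2, 1e-4, 1e-1, 1e-1, 1e-8, 1e-6], blocks)
trait_2 = block_means([1e-1, 1e-3, 0.5, 0.5, 1e-3, 1e-5], blocks)
shared = sorted(set(trait_1) & set(trait_2))
rho = spearman([trait_1[b] for b in shared], [trait_2[b] for b in shared])
```

Because only the ranking of blocks matters, the measure does not require signed effect sizes, which is why it suits the unsigned statistics produced by CCA.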
Multivariate genotype-phenotype mapping
We applied the GWAS candidate genes to the recent multivariate genotype-phenotype (MGP) model in Diversity Outbred (DO) mice7 in R (version 4.2.0). DO mice were generated by breeding eight founder inbred strains, to better mimic genetic diversity in humans39,40. Composition (n = 1154 samples), genotyping (n = 123,309 markers), and landmarking (n = 54 3D landmarks) of the DO sample are described in detail by Aponte et al. 7. In a regularized partial least squares model, the MGP method identifies axes of shape variation that maximally covary with genetic marker variation for the selected gene set. The regularization parameter was determined at 0.06 based on 10-fold cross-validation. For each of the genes, the MGP model outputs their overall contribution, or marker loadings, to the estimated shape axes. The principal axis of shape covariation is visualized directly onto the mouse craniofacial shape as a heatmap, representing the displacement along the surface normals with reference to the mean DO shape.
For 6772 multi-ancestry participants of the Adolescent Brain Cognitive Development (ABCD) study87, the outer head surface was extracted from magnetic resonance images as described by Goovaerts et al. 36. The MeshMonk toolbox24 was used to perform rigid and subsequently non-rigid surface registration using a full-head template comprising 28,218 quasi-landmarks. From this, we cropped out the area covering the cranial vault (n = 11,410 quasi-landmarks), encompassing the supraorbital ridge and extending toward the occipital bone. Shapes were then adjusted for age, sex, weight, height, cranial size, scanner site, and the first 10 genomic PCs using PLSR after GPA alignment. Following PCA and parallel analysis, 65 orthogonal axes of cranial vault shape variation were retained and normalized to unit variance. Using CCA, we optimized the linear combinations of ACH lead SNPs and vault shape PCs to extract a maximally correlated latent phenotype. For SNPs not found in the ABCD sample, we searched for proxy SNPs within the 1000 Genomes Phase 3 European sample and selected the SNP in the strongest LD with the original SNP, requiring r2 > 0.9. The latent phenotypic traits were visualized directly onto the head surface as a heatmap, representing the displacement along the surface normals with reference to the mean head surface.
Reporting summary
Further information on research design is available in the Nature Portfolio Reporting Summary linked to this article.
Data availability
Phenotype, genotype, and demographic data of the Achondroplasia cohort were collected previously and were obtained from the online FaceBase repository (FB00000861 [https://doi.org/10.25550/TJ0]). Access to these data requires proper institutional ethics approval and approval from the FaceBase data access committee. The UK control sample (n = 3566) consisted of participants from the Avon Longitudinal Study of Parents and Children (ALSPAC). The ALSPAC data were collected previously and will be made available to bona fide researchers on application to the ALSPAC Executive Committee (http://www.bris.ac.uk/alspac/researchers/data-access). The US control dataset (n = 4680) was collected previously and included samples from the 3D Facial Norms cohort and studies at the Pennsylvania State University and Indiana University-Purdue University Indianapolis. All the genotypic markers for the 3D Facial Norms dataset are available through the Database of Genotypes and Phenotypes (dbGaP) controlled-access repository (phs000949.v1.p1 [https://www.ncbi.nlm.nih.gov/projects/gap/cgi-bin/study.cgi?study_id=phs000949.v1.p1]). The 3D facial surface models are available through the FaceBase Consortium (FB00000491.01 [https://doi.org/10.25550/VWP]). The participants making up the PSU and IUPUI datasets were previously collected without broad data-sharing consent. Given the highly identifiable nature of both facial and genomic information and unresolved issues regarding risk to participants, we opted for a more conservative approach to participant recruitment. Broad data sharing of the raw data from these collections would thus be a legal and ethical violation of the informed consent obtained from the participants, and therefore, these data are not publicly accessible. This restriction is not because of any personal or commercial interests. Additional details can be requested from M.D.S. and S.W. for the PSU and IUPUI datasets, respectively. 
Phenotype data for the Tanzanian sample were collected previously. These data are accessible through the online FaceBase repository (FB00000667.01 [https://doi.org/10.25550/TX4]). Access to these data requires proper institutional ethics approval and approval from the FaceBase data access committee. Genotype data for the Tanzanian sample were deposited in dbGaP (phs000622.v1.p1 [https://www.ncbi.nlm.nih.gov/projects/gap/cgi-bin/study.cgi?study_id=phs000622.v1.p1]). All the data and detailed information for the ABCD Study, including MRI scans, genetic markers, and covariates, were collected previously and are available under restricted access through the ABCD data repository (https://nda.nih.gov/abcd/) upon completion of the relevant data use agreements. The ABCD data repository grows and changes over time. The ABCD data used in this report came from data release 3.0 (https://doi.org/10.15154/1519007 and https://doi.org/10.15154/1528459) and are described at https://nda.nih.gov/study.html?id=1926.
Publicly available data used were the 1000 G Phase 3 data (https://ftp.1000genomes.ebi.ac.uk/vol1/ftp/release/20130502/), the list of HapMap 3 SNPs excluding the MHC region (http://ldsc.broadinstitute.org/static/media/w_hm3.noMHC.snplist.zip), GWAS summary statistics from various sources (GWAS catalog accession numbers GCST90007256 (Normal facial variation GWAS), GCST004132 (Inflammatory bowel disease GWAS), GCST90102436 (Hormone-sensitive cancer GWAS), GCST90016668 (Lung volume GWAS) and GCST000817 (Body height GWAS); https://egg-consortium.org/Head-circumference-2022.html (Infant head circumference GWAS), https://www.finngen.fi/en/access_results (Obstructive sleep apnea GWAS)), hierarchical spectral facial clustering (https://doi.org/10.6084/m9.figshare.7649024), a database of genes and annotated ontology terms from the StringDB website (https://stringdb-static.org/download/protein.enrichment.terms.v11.5.txt.gz), murine chondrocyte RNAseq data from the GEO data repository (GSE225796 [https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE225796]), microarray data from murine growth plate dissections from the GEO data repository (GSE87605), and the multivariate genotype-phenotype model in Diversity Outbred mice (https://github.com/j0vid/MGP_shiny and https://doi.org/10.25550/1-731C). Summary data from the GWAS on the sitting height ratio are available from Dr Joel Hirschhorn (Joel.Hirschhorn@childrens.harvard.edu). The GWAS summary statistics generated in this study have been deposited in the GWAS Catalog under accession codes GCST90454200 (US database) and GCST90454201 (UK database). The post-GWAS analysis data and phenotyping robustness data generated in this study are provided in the Supplementary Data files and the Source Data file. Source data are provided with this paper.
References
Rahim, N. G., Harismendy, O., Topol, E. J. & Frazer, K. A. Genetic determinants of phenotypic diversity in humans. Genome Biol. 9, 215 (2008).
Hallgrimsson, B., Mio, W., Marcucio, R. S. & Spritz, R. Let’s Face it—complex traits are just not that simple. PLOS Genet. 10, e1004724 (2014).
Claussnitzer, M. et al. A brief history of human disease genetics. Nature 577, 179–189 (2020).
Littman, J., Yang, W., Olansen, J., Phornphutkul, C. & Aaron, R. K. LRP5, Bone mass polymorphisms and skeletal disorders. Genes 14, 1846 (2023).
Parsons, T. E., Downey, C. M., Jirik, F. R., Hallgrimsson, B. & Jamniczky, H. A. Mind the gap: Genetic manipulation of basicranial growth within synchondroses modulates calvarial and facial shape in mice through epigenetic interactions. PLOS ONE 10, e0118355 (2015).
Hallgrímsson, B. et al. Deciphering the palimpsest: Studying the relationship between morphological integration and phenotypic covariation. Evol. Biol. 36, 355–376 (2009).
Aponte, J. D. et al. Relating multivariate shapes to genescapes using phenotype-biological process associations for craniofacial shape. eLife 10, e68623 (2021).
Boyle, E. A., Li, Y. I. & Pritchard, J. K. An expanded view of complex traits: From polygenic to omnigenic. Cell 169, 1177–1186 (2017).
Blair, D. R., Hoffmann, T. J. & Shieh, J. T. Common genetic variation associated with Mendelian disease severity revealed through cryptic phenotype analysis. Nat. Commun. 13, 3675 (2022).
Crouch, D. J. M. et al. Genetics of the human face: Identification of large-effect single gene variants. Proc. Natl. Acad. Sci. USA 115, E676–E685 (2018).
Marouli, E. et al. Rare and low-frequency coding variants alter human adult height. Nature 542, 186–190 (2017).
O’Neal, W. K. & Knowles, M. R. Cystic fibrosis disease modifiers: Complex genetics defines the phenotypic diversity in a monogenic disease. Annu. Rev. Genom. Hum. Genet. 19, 201–222 (2018).
Naqvi, S. et al. Decoding the human face: Progress and challenges in understanding the genetics of craniofacial morphology. Annu. Rev. Genom. Hum. Genet. 23, 383–412 (2022).
Boehringer, S. et al. Genetic determination of human facial morphology: Links between cleft-lips and normal variation. Eur. J. Hum. Genet. 19, 1192–1197 (2011).
Hamosh, A., Scott, A. F., Amberger, J. S., Bocchini, C. A. & McKusick, V. A. Online mendelian inheritance in man (OMIM), a knowledgebase of human genes and genetic disorders. Nucleic Acids Res. 33, D514–D517 (2005).
Bellus, G. A. et al. Achondroplasia is defined by recurrent G380R mutations of FGFR3. Am. J. Hum. Genet. 56, 368–373 (1995).
Hoover-Fong, J. E. et al. Growth in achondroplasia including stature, weight, weight-for-height and head circumference from CLARITY: achondroplasia natural history study—a multi-center retrospective cohort study of achondroplasia in the US. Orphanet J. Rare Dis. 16, 522 (2021).
Zhou, Z.-Q., Ota, S., Deng, C., Akiyama, H. & Hurlin, P. J. Mutant activated FGFR3 impairs endochondral bone growth by preventing SOX9 downregulation in differentiating chondrocytes. Hum. Mol. Genet. 24, 1764–1773 (2015).
Rignol, G. et al. Longitudinal imaging of the skull base synchondroses demonstrate prevention of a premature ossification after recifercept treatment in mouse model of achondroplasia. JBMR 6, e10568 (2021).
Bober, M. B., Bellus, G. A., Nikkel, S. M. & Tiller, G. E. Hypochondroplasia. (University of Washington, Seattle, Seattle (WA), 1993).
Bellus, G. A. et al. Distinct missense mutations of the FGFR3 Lys650 codon modulate receptor kinase activation and the severity of the skeletal dysplasia phenotype. Am. J. Hum. Genet. 67, 1411–1421 (2000).
Pauli, R. M. Achondroplasia: a comprehensive clinical review. Orphanet J. Rare Dis. 14, 1 (2019).
Dudakovic, A., Nam, H. K., van Wijnen, A. J. & Hatch, N. E. Genetic background dependent modifiers of craniosynostosis severity. J. Struct. Biol. 212, 107629 (2020).
White, J. D. et al. MeshMonk: Open-source large-scale intensive 3D phenotyping. Sci. Rep. 9, 6085 (2019).
Claes, P. et al. Genome-wide mapping of global-to-local genetic effects on human facial shape. Nat. Genet. 50, 414–423 (2018).
McLean, C. Y. et al. GREAT improves functional interpretation of cis-regulatory regions. Nat. Biotechnol. 28, 495–501 (2010).
White, J. D. et al. Insights into the genetic architecture of the human face. Nat. Genet. 53, 45–53 (2021).
Schibler, L. et al. New insight on FGFR3-related chondrodysplasias molecular physiopathology revealed by human chondrocyte gene expression profiling. PLoS ONE 4, e7633 (2009).
de Lange, K. M. et al. Genome-wide association study implicates immune activation of multiple integrin genes in inflammatory bowel disease. Nat. Genet. 49, 256–261 (2017).
Naqvi, S. et al. Shared heritability of human face and brain shape. Nat. Genet. 53, 830–839 (2021).
Jordan, A. S., McSharry, D. G. & Malhotra, A. Adult obstructive sleep apnoea. Lancet 383, 736–747 (2014).
Tenconi, R. et al. Sleep-disordered breathing and its management in children with achondroplasia. Am. J. Med. Genet. A 173, 868–878 (2017).
Chan, Y. et al. Genome-wide analysis of body proportion classifies height-associated variants by mechanism of action and implicates genes important for skeletal development. Am. J. Hum. Genet. 96, 695–708 (2015).
Ahmed, M. et al. Considering hormone-sensitive cancers as a single disease in the UK biobank reveals shared aetiology. Commun. Biol. 5, 614 (2022).
Mitteroecker, P., Cheverud, J. M. & Pavlicev, M. Multivariate analysis of genotype–phenotype association. Genetics 202, 1345–1363 (2016).
Goovaerts, S. et al. Joint multi-ancestry and admixed GWAS reveals the complex genetics behind human cranial vault shape. Nat. Commun. 14, 7436 (2023).
DiMario, F. J., Ramsby, G. R., Burleson, J. A. & Greensheilds, I. R. Brain morphometric analysis in achondroplasia. Neurology 45, 519–524 (1995).
de Beer, G. The development of the vertebrate skull. Nature 142, 4–5 (1938).
Churchill, G. A., Gatti, D. M., Munger, S. C. & Svenson, K. L. The diversity outbred mouse population. Mamm. Genome 23, 713–718 (2012).
Svenson, K. L. et al. High-resolution genetic mapping using the Mouse Diversity outbred population. Genetics 190, 437–447 (2012).
Katz, D. C. et al. Facial shape and allometry quantitative trait locus intervals in the Diversity Outbred mouse are enriched for known skeletal and facial development genes. PLoS ONE 15, e0233377 (2020).
Loos, R. J. F. & Yeo, G. S. H. The bigger picture of FTO – the first GWAS-identified obesity gene. Nat. Rev. Endocrinol. 10, 51–61 (2014).
Del Pino, M., Ramos Mejía, R. & Fano, V. Leg length, sitting height, and body proportions references for achondroplasia: New tools for monitoring growth. Am. J. Med. Genet. A 176, 896–906 (2018).
Cole, J. B. et al. Human facial shape and size heritability and genetic correlations. Genetics 205, 967–978 (2017).
Roosenboom, J. et al. Facial characteristics and olfactory dysfunction: Two endophenotypes related to nonsyndromic cleft lip and/or palate. Biomed. Res. Int. 2015, 863429 (2015).
Hallgrímsson, B. et al. Automated syndrome diagnosis by three-dimensional facial imaging. Genet. Med. 22, 1682–1693 (2020).
Hallgrímsson, B. et al. The brachymorph mouse and the developmental-genetic basis for canalization and morphological integration. Evol. Dev. 8, 61–73 (2006).
Naqvi, S. et al. Precise modulation of transcription factor levels identifies features underlying dosage sensitivity. Nat. Genet. 55, 841–851 (2023).
Ferry, Q. et al. Diagnostically relevant facial gestalt information from ordinary photos. eLife 3, e02020 (2014).
Mahdi, S. S. et al. Multi-scale part-based syndrome classification of 3D facial images. IEEE Access 10, 23450–23462 (2022).
Hsieh, T.-C. et al. GestaltMatcher facilitates rare disease matching using facial phenotype descriptors. Nat. Genet. 54, 349–357 (2022).
Hallgrimsson, B., Spritz, R. A., Klein, O. & Washington, M. Developing 3D craniofacial morphometry data and tools to transform dysmorphology. FaceBase Consortium https://doi.org/10.25550/TJ0 (2017).
Weinberg, S. M., Marazita, M. L. & Raffensperger, Z. D. 3D Facial Norms. FaceBase Consortium https://doi.org/10.25550/VWP (2015).
Weinberg, S. M. et al. The 3D facial norms database: Part 1. A web-based craniofacial anthropometric and image repository for the clinical and research community. Cleft Palate Craniofac. J. 53, e185–e197 (2016).
Boyd, A. et al. Cohort profile: the ‘children of the 90s’–the index offspring of the Avon longitudinal study of parents and children. Int. J. Epidemiol. 42, 111–127 (2013).
Fraser, A. et al. Cohort profile: The Avon longitudinal study of parents and children: ALSPAC mothers cohort. Int. J. Epidemiol. 42, 97–110 (2013).
Cole, J. B. et al. Genomewide association study of African children identifies association of SCHIP1 and PDE8A with facial size and shape. PLOS Genet. 12, e1006174 (2016).
Liu, C. et al. Genome scans of facial features in East Africans and cross-population comparisons reveal novel associations. PLOS Genet. 17, e1009695 (2021).
Delaneau, O., Zagury, J.-F. & Marchini, J. Improved whole-chromosome phasing for disease and population genetic studies. Nat. Methods 10, 5–6 (2013).
Auton, A. et al. A global reference for human genetic variation. Nature 526, 68–74 (2015).
Durbin, R. Efficient haplotype matching and storage using the positional Burrows-Wheeler transform (PBWT). Bioinformatics 30, 1266–1272 (2014).
McCarthy, S. et al. A reference panel of 64,976 haplotypes for genotype imputation. Nat. Genet. 48, 1279–1283 (2016).
McVean, G. A. et al. An integrated map of genetic variation from 1,092 human genomes. Nature 491, 56–65 (2012).
Howie, B., Marchini, J. & Stephens, M. Genotype imputation with thousands of genomes. G3 1, 457–470 (2011).
Hill, H. et al. How different is different? Criterion and sensitivity in face-space. Front. Psychol. 2, 41 (2011).
Kuruvilla, F. G., Park, P. J. & Schreiber, S. L. Vector algebra in the analysis of genome-wide expression data. Genome Biol. 3, research0011.1 (2002).
Hoskens, H. et al. 3D facial phenotyping by biometric sibling matching used in contemporary genomic methodologies. PLOS Genet. 17, e1009528 (2021).
Galwey, N. W. A new measure of the effective number of tests, a practical tool for comparing families of non-independent significance tests. Genet. Epidemiol. 33, 559–568 (2009).
Willer, C. J., Li, Y. & Abecasis, G. R. METAL: fast and efficient meta-analysis of genomewide association scans. Bioinformatics 26, 2190–2191 (2010).
Kanai, M., Tanaka, T. & Okada, Y. Empirical estimation of genome-wide significance thresholds based on the 1000 Genomes Project data set. J. Hum. Genet. 61, 861–866 (2016).
Machiela, M. J. & Chanock, S. J. LDlink: a web-based application for exploring population-specific haplotype structure and linking correlated alleles of possible functional variants. Bioinformatics 31, 3555–3557 (2015).
Benjamini, Y. & Hochberg, Y. Controlling the false discovery rate: A practical and powerful approach to multiple testing. J. R. Stat. Soc. Ser. B 57, 289–300 (1995).
Szklarczyk, D. et al. STRING v11: protein-protein association networks with increased coverage, supporting functional discovery in genome-wide experimental datasets. Nucleic Acids Res. 47, D607–D613 (2019).
de Leeuw, C. A., Mooij, J. M., Heskes, T. & Posthuma, D. MAGMA: generalized gene-set analysis of GWAS data. PLOS Comput. Biol. 11, e1004219 (2015).
Li, J. & Ji, L. Adjusting multiple testing in multilocus analyses using the eigenvalues of a correlation matrix. Heredity 95, 221–227 (2005).
Lango Allen, H. et al. Hundreds of variants clustered in genomic loci and biological pathways affect human height. Nature 467, 832–838 (2010).
Edgar, R., Domrachev, M. & Lash, A. E. Gene expression omnibus: NCBI gene expression and hybridization array data repository. Nucleic Acids Res. 30, 207–210 (2002).
Blake, J. A. et al. Mouse genome database (MGD): Knowledgebase for mouse-human comparative biology. Nucleic Acids Res. 49, D981–D987 (2021).
Kinsella, R. J. et al. Ensembl BioMarts: a hub for data retrieval across taxonomic space. Database 2011, bar030 (2011).
Bulik-Sullivan, B. K. et al. LD Score regression distinguishes confounding from polygenicity in genome-wide association studies. Nat. Genet. 47, 291–295 (2015).
Bulik-Sullivan, B. et al. An atlas of genetic correlations across human diseases and traits. Nat. Genet. 47, 1236–1241 (2015).
Yang, X.-L. et al. Three novel loci for infant head circumference identified by a joint association analysis. Front. Genet. 10, 947 (2019).
Liu, Y. et al. Genetic architecture of 11 organ traits derived from abdominal MRI using deep learning. eLife 10, e65554 (2021).
Strausz, S. et al. Genetic analysis of obstructive sleep apnoea discovers a strong association with cardiometabolic health. Eur. Respir. J. 57, 2003091 (2021).
Bartell, E. et al. Genetics of skeletal proportions in two different populations. Preprint at https://doi.org/10.1101/2023.05.22.541772 (2023).
International HapMap 3 Consortium. Integrating common and rare genetic variation in diverse human populations. Nature 467, 52–58 (2010).
Casey, B. J. et al. The adolescent brain cognitive development (ABCD) study: Imaging acquisition across 21 sites. Dev. Cogn. Neurosci. 32, 43–54 (2018).
Acknowledgements
This work was supported by the Research Fund KU Leuven (BOF-C1, C14/15/081 & C14/20/081 to P.C.); the Research Program of the Research Foundation-Flanders (FWO, G0D1923N to P.C.); and NIH-NIDCR (R01-DE027023 to S.M.W.; U01DE024440 to R.A.S., O.D.K., and B.H.).
Achondroplasia sample. FaceBase data collection and analyses were supported by NIH-NIDCR (U01DE024440 to R.A.S., O.D.K., and B.H.).
Unselected control sample. Pittsburgh personnel, data collection, and analyses were supported by the National Institute of Dental and Craniofacial Research (U01-DE020078 to M.L.M. and S.M.W.; R01-DE016148 to M.L.M. and S.M.W.; R01-DE027023 to S.M.W.). Funding for genotyping was provided by the National Human Genome Research Institute (X01-HG007821 & X01-HG007485 to M.L.M.). Penn State personnel, data collection, and analyses were supported by the Center for Human Evolution and Development at Penn State, the Science Foundation of Ireland Walton Fellowship (04.W4/B643 to M.S.), the US National Institute of Justice (2008-DN-BX-K125 to M.S.; 2018-DU-BX-0219 to S.W.), and the US Department of Defense. IUPUI personnel, data collection, and analyses were supported by the National Institute of Justice (2015-R2-CX-0023, 2014-DN-BX-K031 & 2018-DU-BX-0219 to S.W.).
The UK Medical Research Council and Wellcome (grant no. 102215/2/13/2) and the University of Bristol provide core support for ALSPAC. The publication is the work of the authors and they will serve as guarantors for the contents of this paper. A comprehensive list of grant funding is available on the ALSPAC website (http://www.bristol.ac.uk/alspac/external/documents/grant-acknowledgements.pdf). ALSPAC GWAS data was generated by Sample Logistics and Genotyping Facilities at Wellcome Sanger Institute and LabCorp (Laboratory Corporation of America) using support from 23andMe. We are extremely grateful to all the families who took part in this study, the midwives for their help in recruiting them, and the whole ALSPAC team, which includes interviewers, computer and laboratory technicians, clerical workers, research scientists, volunteers, managers, receptionists, and nurses.
Human cranial vault. Data on human cranial vault shape were obtained from the Adolescent Brain Cognitive Development (ABCD) Study, held in the NIMH Data Archive (NDA). This is a multisite, longitudinal study designed to recruit more than 10,000 children aged 9–10 and follow them over 10 years into early adulthood. The ABCD Study is supported by the National Institutes of Health and additional federal partners under award numbers U01DA041048, U01DA050989, U01DA051016, U01DA041022, U01DA051018, U01DA051037, U01DA050987, U01DA041174, U01DA041106, U01DA041117, U01DA041028, U01DA041134, U01DA050988, U01DA051039, U01DA041156, U01DA041025, U01DA041120, U01DA051038, U01DA041148, U01DA041093, U01DA041089, U24DA041123, U24DA041147. A full list of supporters is available at https://abcdstudy.org/federal-partners.html. A listing of participating sites and a complete listing of the study investigators can be found at https://abcdstudy.org/consortium_members/. ABCD consortium investigators designed and implemented the study and/or provided data but did not necessarily participate in the analysis or writing of this report. This manuscript reflects the views of the authors and may not reflect the opinions or views of the NIH or ABCD consortium investigators.
Sitting height ratio GWAS. We thank Eric Bartell and Joel Hirschhorn for contributing summary statistics from their GWAS of sitting height ratio to this work.
Author information
Contributions
M.V., H.H., S.G., J.D., J.C., H.P., B.H., and P.C. conceptualized the study. M.V., H.H., and H.M. carried out the data curation. M.V., H.H., S.G., H.M., J.D., J.D.A., and P.C. wrote the software. M.V., H.H., S.G., and J.D. carried out the formal analysis and did the visualization. M.S., M.L.M., S.M.W., S.W., S.R., O.D.K., R.A.S., H.P., B.H., and P.C. were responsible for funding acquisition and data acquisition. H.P., B.H., and P.C. carried out the supervision. M.V., H.H., S.G., H.P., B.H., and P.C. wrote the original draft. M.V., H.H., S.G., H.M., J.D., J.D.A., J.C., M.S., M.L.M., S.M.W., S.W., S.R., O.D.K., R.A.S., H.P., B.H., and P.C. reviewed and edited the final manuscript.
Ethics declarations
Competing interests
The authors declare no competing interests.
Peer review
Peer review information
Nature Communications thanks Kaustubh Adhikari and the other anonymous reviewer(s) for their contribution to the peer review of this work. A peer review file is available.
Additional information
Publisher’s note: Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Open Access: This article is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License, which permits any non-commercial use, sharing, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if you modified the licensed material. You do not have permission under this licence to share adapted material derived from this article or parts of it. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by-nc-nd/4.0/.
About this article
Cite this article
Vanneste, M., Hoskens, H., Goovaerts, S. et al. Syndrome-informed phenotyping identifies a polygenic background for achondroplasia-like facial variation in the general population. Nat Commun 15, 10458 (2024). https://doi.org/10.1038/s41467-024-54839-1
DOI: https://doi.org/10.1038/s41467-024-54839-1