Introduction

Velopharyngeal dysfunction (VPD) refers to an inability to close the opening between the nasal cavity and the oral cavity during speech. This occurs because the muscular soft palate (velum) and lateral pharyngeal walls are physically unable to make a sufficient seal of the oral cavity from the nasal cavity during speech production. As a consequence, air tends to escape into the nasal cavity during speech, resulting in hypernasality and excess air emissions. The causes of VPD are heterogeneous and are the basis of three subtypes of VPD: velopharyngeal incompetency (caused by a lack of neuromotor competency), velopharyngeal mislearning (caused by maladaptive articulatory habits) and velopharyngeal insufficiency (VPI, caused by insufficient tissue or mechanical restriction)1,2. For example, a congenitally short palate and/or deep nasopharynx can alter the geometry of the velopharyngeal apparatus, such that the soft palate is no longer able to effectively create a seal against the posterior wall of the nasopharynx2,3. Although VPD can occur as the result of surgical procedures, such as adenoidectomies, the most common congenital cause of VPD is cleft palate (CP) or submucous cleft palate (smCP), which can occur as isolated malformations or as part of a syndrome4. After primary palatal repair surgeries, approximately 30% of CP patients require additional surgery for VPD1.

VPD can also occur in the absence of an overt orofacial cleft5, and some cases have been reported with autosomal-dominant inheritance6,7. Detailed phenotyping in these “isolated” VPD cases shows that these result from structural deficiencies in the anatomical components that comprise the velopharyngeal mechanism. The genetic basis of isolated VPD is poorly understood and the autosomal-dominant families have yet to be genetically mapped. However, given the association with CP and the structural deficiencies of the palate, genes and pathways implicated in the pathogenesis of secondary palate clefting may provide some clues8,9.

The purpose of the current study is to examine the influence of common genetic variants on VPD. To accomplish this, we performed a genome-wide association study (GWAS) on a sample of unaffected relatives from families with a history of CP or smCP, who had been assessed for VPD.

Materials and Methods

Participants

Our study sample consisted of 976 relatives (437 male, 539 female; mean age = 29.7 ± 16.29) within three degrees of relatedness of probands with an isolated CP. None of these relatives was affected with an overt CP, as established by both self-report and in-person assessment of their cleft status. The participants were recruited as part of the larger Pittsburgh Orofacial Cleft Study10 at different US and international sites: Pittsburgh (n = 281), St. Louis (n = 51), Texas (n = 299), Colorado (n = 39), Hungary (n = 99), Colombia (n = 16), Philippines (n = 171), and Puerto Rico (n = 20).

Speech assessment

Structured and spontaneous speech samples were recorded for all participants using a Canon 7D camera (Canon USA, Melville, NY). The structured speech paragraphs in English, Spanish and Tagalog can be found in the online supplemental material. Medical and surgical history was documented for all subjects, with a particular focus on speech pathology and palatal surgery. Using the Pittsburgh Weighted Speech Scale11, the speech samples were rated for the presence of VPD by an experienced speech and language pathologist (MF). A speech score was given based on the presence of audible nasal emission and nasal turbulence, nasality, phonation and articulation patterns. Subjects with a score higher than three (which is the cut-off for clinical significance) were considered to have VPD. Using this threshold, a total of 54 participants (20 males, 34 females) were diagnosed with VPD.

Genotyping, quality control, population structure and imputation

DNA was extracted from saliva or blood, and genotyped for 541787 SNPs on an Illumina HumanCore + Exome array plus 15890 SNPs of custom content covering candidate genes for overt clefts. Genetic data cleaning and quality control analyses were performed as described previously12. In brief, samples were interrogated for genetic sex, chromosomal aberrations, relatedness, genotype call rate, and batch effects. SNPs were interrogated for call rate, discordance among 72 duplicate samples, Mendelian errors among HapMap controls (parent-offspring trios), deviations from Hardy-Weinberg equilibrium, and sex differences in allele frequencies and heterozygosity. Filters applied to genotyped SNPs are described in Supplementary Table S1.

Imputation of non-genotyped variants was performed via IMPUTE213, using haplotypes from the 1000 Genomes Project Phase 3 as the reference. We converted imputed probabilities to most-likely genotypes using a genotype probability threshold of 0.9. We then filtered out imputed SNPs with an info score of <0.5. Masked variant analysis, in which genotyped SNPs were imputed in order to assess imputation quality, indicated high accuracy of imputation. Genetic association with VPD was tested for genotyped and imputed SNPs with MAF >5% that did not show evidence of extreme deviation from Hardy-Weinberg equilibrium.
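The post-imputation filtering described above can be illustrated with a minimal sketch. It is not the study's pipeline: the function names and example probabilities are hypothetical, the use of >= at the 0.9 threshold is an assumption, the allele-frequency calculation assumes diploid (autosomal) genotypes, and the Hardy-Weinberg filter is omitted.

```python
from typing import Optional, Sequence, Tuple

CALL_THRESHOLD = 0.9  # genotype probability threshold stated in the text
MIN_INFO = 0.5        # imputed SNPs with info score < 0.5 were filtered out
MIN_MAF = 0.05        # association testing was restricted to MAF > 5%


def hard_call(probs: Tuple[float, float, float]) -> Optional[int]:
    """Return the most likely genotype as an alternate-allele count (0, 1, 2),
    or None (missing) when no genotype reaches the probability threshold."""
    best = max(range(3), key=lambda g: probs[g])
    # Assumption: a call is made when the top probability is >= the threshold.
    return best if probs[best] >= CALL_THRESHOLD else None


def minor_allele_frequency(calls: Sequence[Optional[int]]) -> float:
    """Minor allele frequency over non-missing diploid genotype calls."""
    observed = [g for g in calls if g is not None]
    if not observed:
        return 0.0
    alt_freq = sum(observed) / (2 * len(observed))
    return min(alt_freq, 1.0 - alt_freq)


def keep_snp(info: float, calls: Sequence[Optional[int]]) -> bool:
    """Apply the info-score and MAF filters (the HWE filter is not shown)."""
    return info >= MIN_INFO and minor_allele_frequency(calls) > MIN_MAF


# Hypothetical genotype probabilities for four samples at one SNP.
example_probs = [
    (0.95, 0.05, 0.00),
    (0.05, 0.93, 0.02),
    (0.50, 0.45, 0.05),  # no genotype reaches 0.9, so the call is missing
    (0.00, 0.08, 0.92),
]
calls = [hard_call(p) for p in example_probs]
```

In this sketch, genotypes that fail the probability threshold are set to missing rather than dropped, so a SNP's allele frequency is computed only from confidently called samples.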

Association analysis

Genetic association with VPD was tested for SNPs with MAF >5% using a mixed-models approach as implemented in EMMAX14, which explicitly models the variance due to kinship (comprising both family relatedness and population structure) in the sample. The 54 unaffected relatives of patients with CP who were diagnosed with VPD were compared to the 922 unaffected relatives of patients with CP who did not show VPD. Sex, age, age², and site (as a proxy for language) were included as covariates. Principal components of ancestry were not included because variation due to population structure was already explicitly modeled by the kinship matrix. Autosomal SNP genotypes were modeled additively. For SNPs on the X-chromosome, genotypes were coded as 0, 1, and 2 for females, and were coded as 0 or 2 for males in order to maintain the same scale between sexes. The conventional Bonferroni-corrected threshold of 5 × 10⁻⁸ was set for genome-wide statistical significance; 5 × 10⁻⁶ was the threshold for suggestive hits.
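The genotype coding and covariate set described above can be sketched as follows. This is an illustration only and does not reproduce the EMMAX input format; the function names, the "M"/"F" sex labels, the 0/1 numeric coding of sex, and the example values are assumptions introduced here.

```python
from typing import Dict, Union


def code_genotype(alt_allele_count: int, sex: str, chromosome: str) -> int:
    """Additive genotype coding as described in the text.

    Autosomes and female X: 0, 1, or 2 copies of the alternate allele.
    Male X (hemizygous): 0 or 1 copies, rescaled to 0 or 2 so that males
    and females are on the same scale.
    """
    if chromosome == "X" and sex == "M":
        if alt_allele_count not in (0, 1):
            raise ValueError("male X-chromosome genotypes must be 0 or 1 copies")
        return 2 * alt_allele_count
    if alt_allele_count not in (0, 1, 2):
        raise ValueError("diploid genotypes must be 0, 1, or 2 copies")
    return alt_allele_count


def covariate_row(sex: str, age: float, site: str) -> Dict[str, Union[int, float, str]]:
    """Covariates named in the text: sex, age, age squared, and site."""
    return {
        "sex": 1 if sex == "M" else 0,  # assumption: arbitrary 0/1 coding
        "age": age,
        "age_sq": age ** 2,
        "site": site,                   # categorical proxy for language
    }


# Hypothetical participant, for illustration only.
row = covariate_row("F", 30.0, "Pittsburgh")
male_x = code_genotype(1, "M", "X")
female_x = code_genotype(1, "F", "X")
```

Rescaling male X-chromosome genotypes to 0 or 2 treats a hemizygous male carrier as equivalent to a homozygous female, which is the convention the text describes.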

Functional annotation

Potential genes of interest were identified based on physical proximity of ±500 kb from the lead SNP at each genome-wide significant locus. These genes were queried in the following online databases: The Mouse Genome Informatics (MGI) database15, which was used to annotate expression in relevant tissues and phenotypic consequences, the VISTA enhancer database16, which was used to annotate active enhancer elements in relevant tissues, and OMIM and PubMed, which were used to annotate human phenotypic information. The following genes were considered to be of interest during annotation: genes involved in orofacial clefting, in speech pathology caused by structural and central differences, and in craniofacial development.
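The proximity criterion above can be sketched as a simple window query. The gene symbols and coordinates below are hypothetical placeholders, and treating any overlap between a gene body and the ±500 kb window as "within" the window is an assumption; the text does not state whether distance was measured to the gene body or to the transcription start site.

```python
from dataclasses import dataclass
from typing import List, Sequence

WINDOW_BP = 500_000  # ±500 kb around the lead SNP, as stated in the text


@dataclass(frozen=True)
class Gene:
    symbol: str
    chrom: str
    start: int  # base-pair position of the gene start
    end: int    # base-pair position of the gene end


def genes_near(lead_chrom: str, lead_pos: int, genes: Sequence[Gene],
               window: int = WINDOW_BP) -> List[str]:
    """Return symbols of genes on the same chromosome whose body overlaps
    the interval [lead_pos - window, lead_pos + window]."""
    lo, hi = lead_pos - window, lead_pos + window
    return [g.symbol for g in genes
            if g.chrom == lead_chrom and g.start <= hi and g.end >= lo]


# Hypothetical annotation, for illustration only.
annotation = [
    Gene("GENE_A", "3", 600_000, 650_000),      # inside the window
    Gene("GENE_B", "3", 1_490_000, 1_600_000),  # overlaps the window edge
    Gene("GENE_C", "3", 1_600_001, 1_700_000),  # beyond the window
    Gene("GENE_D", "9", 900_000, 1_000_000),    # different chromosome
]
candidates = genes_near("3", 1_000_000, annotation)
```

The resulting candidate list would then be looked up manually or programmatically in the databases named above.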

Data availability

SNPs used in the current study are available in the dbGAP repository (https://www.ncbi.nlm.nih.gov/gap; accession number phs000774.v1.p1).

Compliance with Ethical Standards

Written informed consent was obtained from all individual participants in the study. All procedures performed in studies involving human participants were in accordance with the ethical standards of the institutional research committees and with the 1964 Helsinki declaration and its later amendments or comparable ethical standards: University of Pittsburgh IRB #PRO09060553 and IRB0405013 (covering both the Pittsburgh and Hungary sites); UT Health Committee for the Protection of Human Subjects #HSC-DB-09-0508 and #HSC-MS-03-090; Colorado Multiple Institutional Review Board #10-0055; Washington University in St. Louis HRPO #03-087; University of the Philippines Manila, IRB00002908, FWA00018728; University of Puerto Rico Medical Sciences Campus IRB, IRB protocol 0640111 l. The “Fundacion Clinica Noel de Medellin” IRB approved the study protocol for the subjects in Colombia; an IRB approval number is not available for this site.

Results

Genetic association

Our GWAS revealed five genome-wide significant associations at loci 3q29 (lead SNP: rs6583326, p = 2.86 × 10⁻⁸), 9p21.1 (lead SNP: rs2800342, p = 4.88 × 10⁻⁸), 12q21.31 (lead SNP: rs1133104, p = 1.96 × 10⁻⁸), 16p12.3 (lead SNP: rs12922822, p = 2.08 × 10⁻⁸) and 16p13.3 (lead SNP: rs13335236, p = 3.50 × 10⁻⁹) (Table 1). Two of these SNPs were intragenic: rs1133104 (12q21) is located in the last exon of CLEC4A, and rs13335236 (16p13) is located in an intron of PPL. The LocusZoom plots17 of these results are shown in Fig. 1, and the Manhattan and QQ plots are shown in Supplementary Figure S1. In addition, 15 loci showed a suggestive association (p < 5 × 10⁻⁶) with VPD (Table 1). LocusZoom plots of all suggestive associations are displayed in Supplementary Figure S2 and results are described in Supplementary Table S2.
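As a quick check of how the reported lead SNPs relate to the two thresholds, the following minimal sketch classifies p-values against the genome-wide and suggestive cut-offs. The p-values are those reported above; the function name and labels are introduced here for illustration. The 5 × 10⁻⁸ threshold is conventionally motivated as 0.05 divided by roughly one million independent common-variant tests.

```python
GENOME_WIDE = 5e-8  # genome-wide significance threshold
SUGGESTIVE = 5e-6   # threshold for suggestive hits


def classify(p_value: float) -> str:
    """Label a p-value using strict inequalities, as in the text."""
    if p_value < GENOME_WIDE:
        return "genome-wide significant"
    if p_value < SUGGESTIVE:
        return "suggestive"
    return "not significant"


# Lead SNP p-values as reported in the text.
lead_snps = {
    "rs6583326": 2.86e-8,
    "rs2800342": 4.88e-8,
    "rs1133104": 1.96e-8,
    "rs12922822": 2.08e-8,
    "rs13335236": 3.50e-9,
}
labels = {rsid: classify(p) for rsid, p in lead_snps.items()}
```

Note that rs2800342 (p = 4.88 × 10⁻⁸) passes the genome-wide threshold only narrowly.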

Table 1 Genome-wide significant (p < 5 × 10⁻⁸) and suggestive (p < 5 × 10⁻⁶) associations with VPD.
Figure 1

(a–e) LocusZoom plots of genome-wide significant (p < 5 × 10⁻⁸) associations. LocusZoom plots show the association (left y-axis; −log10-transformed p-values). Genotyped SNPs are represented by stars; imputed SNPs are represented by squares. Shading of the points represents the linkage disequilibrium (r², based on the 1000 Genomes Project) between each SNP and the top SNP. The blue overlay shows the recombination rate (right y-axis). Positions of genes are shown below the plot.

Several possible biologically relevant candidate genes (e.g., PCYT1A, FREM1) were located at the five genome-wide significant loci. To ensure a more comprehensive evaluation, genes within 500 kb of each lead SNP were queried for possible roles in orofacial clefting, speech development, and/or development of the nasopharyngeal region. Corroborating evidence, such as expression in relevant tissues or putative roles in relevant human syndromes, was found for eight of the 20 loci as discussed below.

Discussion

This GWAS of VPD identified five genome-wide significant associations and 15 suggestive associations. We were not able to replicate these results because, to the best of our knowledge, no suitable replication cohort is available. Although the genetic basis of VPD is largely unknown, several of these loci were located near potentially relevant candidate genes, including some previously implicated in orofacial clefting. PCYT1A, located 350 kb upstream of the lead SNP at the 3q29 locus, has been shown to be associated with increased risk for nonsyndromic cleft lip with or without cleft palate (NSCL/P), through an epistatic interaction with BHMT18. Furthermore, a microdeletion at this locus is associated with delayed development, especially in speech19. The association at this locus, however, is based on a single imputed SNP. We also observed borderline associations with several variants in TTC28 (locus 22q12.1). Conte and colleagues found an association between copy number variants in TTC28 and orofacial development20. A microdeletion in the same genetic region was found in a child with Pierre-Robin Sequence (including cleft palate) and Neurofibromatosis type 221. Moreover, the Ttc28 mouse mutant shows cranial abnormalities with abnormal maxillary morphology, further suggesting the potential for this gene to impact palatogenesis.

Several other associated loci contained candidate genes known to be involved more generally with craniofacial development. For instance, FREM1 (near locus 9p22.3) has been shown to play a role in the fusion of the nasal processes during gestation22, and was implicated in human upper lip morphology in a recent GWAS of facial shape23. In humans, mutations in this gene result in several Mendelian conditions affecting the midline facial structures, such as BNAR syndrome (OMIM #608980) and trigonocephaly (OMIM #190440)22,24. In 34% of cases, trigonocephaly is associated with speech and/or language delay25. KREMEN1 (near locus 22q12.2) is a modulator of WNT signaling, which is crucial for neural tube closure26. TFRC, located 500 kb from the lead SNP of locus 3q29, is involved in craniofacial morphogenesis by regulating TGFβ and BMP signaling activation27.

Interestingly, one of the identified loci contained genes involved in the neural control of speech production. NAGPA at locus 16p13.3 is involved in stuttering28 and focal epilepsy with speech disorders29, respectively. Neurophysiological dysfunction is a known cause of VPD4, so it is possible that variants in these genes might influence the movement of the muscles comprising the soft palate.

We have hypothesized that isolated VPD may represent a subclinical phenotype in families with a history of orofacial clefting10. In the context of orofacial clefting, subclinical phenotypes can be conceptualized as incomplete (or intermediate) expressions of the risk factors for the overt defect or as pleiotropic expressions in related tissues/structures. Such subclinical phenotypes have now been extensively documented in the clinically unaffected relatives of affected individuals within families affected with orofacial clefts10,30. A key assumption is that the overt and subclinical manifestations of the orofacial cleft phenotype share common etiological factors. If this model is correct, then at least some of the genes that underlie orofacial cleft susceptibility, particularly those involved in clefts of the secondary palate, may also underlie VPD.

If the hypothesis is correct that CP and VPD (partially) share their genetic etiology, it may be valuable to investigate the effect of genes known to be involved in the etiology of CP in subjects with VPD. However, our understanding of the genetic basis of isolated CP is largely incomplete. Associations between CP and SNPs in FAF131, FOXE132, and GRHL39,33 have been described. No variants in or near these genes were associated with VPD in our study cohort. This does not preclude the possibility that additional (yet to be identified) variants involved in CP will contribute to VPD and vice versa. Unlike overt CP, smCP is usually not immediately diagnosed at birth. A recent candidate gene study of six SNPs in loci strongly associated with orofacial clefting did not show any association between these SNPs and smCP34, leaving the genetic basis of smCP unknown. Since there is a high prevalence of VPD in patients with smCP, the loci identified in this study are also good candidates for both overt CP and smCP.

This study represents the first attempt to identify genetic variants associated with VPD. Although we observed several promising associations, we are not currently able to independently replicate any of these signals due to a lack of additional datasets. Our study was also limited by a lack of objective assessments of VPD, such as acoustic nasalance data during speech assessed by nasometry, or visualization of the velopharyngeal mechanism during speech assessed by video nasopharyngeal endoscopy. Including these types of assessments may yield additional insights into the genetic basis of VPD. Furthermore, the presence of smCP could not be determined in this dataset. Thus, it is possible that the presence of VPD is actually due to undiagnosed smCP. Although we hypothesize that VPD may be a subclinical phenotype of CP, we did not find strong evidence of association between the signals identified in this study and CP.