Contributions of common genetic variants to specific languages and to when a language is learned

Wong, Patrick C. M.; Kang, Xin; So, Hon-Cheong; Choy, Kwong Wai

doi:10.1038/s41598-021-04163-1

Published: 12 January 2022

Patrick C. M. Wong ORCID: orcid.org/0000-0002-6105-5027^1,2,3,
Xin Kang ORCID: orcid.org/0000-0002-1126-5771^1,2,4,5,
Hon-Cheong So^2,6 &
…
Kwong Wai Choy⁷

Scientific Reports volume 12, Article number: 580 (2022)

Abstract

Research over the past two decades has identified a group of common genetic variants explaining a portion of variance in native language ability. The present study investigates whether the same group of genetic variants are associated with different languages and languages learned at different times in life. We recruited 940 young adults who spoke from childhood Chinese and English as their first (native) (L1) and second (L2) language, respectively, who were learners of a new, third (L3) language. For the variants examined, we found a general decrease of contribution of genes to language functions from native to foreign (L2 and L3) languages, with variance in foreign languages explained largely by non-genetic factors such as musical training and motivation. Furthermore, genetic variants that were found to contribute to traits specific to Chinese and English respectively exerted the strongest effects on L1 and L2. These results seem to speak against the hypothesis of a language- and time-universal genetic core of linguistic functions. Instead, they provide preliminary evidence that genetic contribution to language may depend at least partly on the intricate language-specific features. Future research including a larger sample size, more languages and more genetic variants is required to further explore these hypotheses.

Introduction

Even before the publication of the first studies on the molecular genetics of a speech disorder^{1, 2}, researchers had hypothesized that developmental speech and language disorders were inherited (see Ludlow and Cooper³ for an early review). In the two decades since the first studies concerning FOXP2 and apraxia of speech^{1, 2}, a series of studies (e.g., see Newbury and Monaco⁴ for a review) have identified new genes that explain a small portion of variation in spoken and written language functions and disorders^{5, 6}. These latter studies often focused on common genetic variants and their associations with language-related traits (e.g., non-word repetition). Though the effect sizes are small, the study of common variants offers an important opportunity to investigate variation of language functions on a continuum. Subtle differences in language functions (e.g., lower proficiency in using a particular set of grammatical forms in language rather than a severe breakdown in communication) are more likely to be associated with primary language impairment and variations in success in acquiring foreign languages. These subtle differences differ from severe forms of speech and language impairment (e.g., childhood apraxia of speech) that are more likely to be associated with rarer genetic mutations (e.g., Thevenon et al.⁷). The focus of the present study is on common variants and subtle differences in language.

In addition to investigating the molecular pathways that give rise to the neurological functions of genes associated with language functions and disorders^8,9,10 and to identifying more new genes, we argue that the genetic studies of language should consider two additional questions concerning variation on a continuum. First, what can the genetics of language inform us about how languages are learned? Second, if an ultimate translational goal of the study of genetics of language is to develop a screening tool for primary language impairment, how can it be used for the more than 7000 languages that are currently spoken and languages that are learned at different times in life?

In both native^{11, 12} and foreign^{13, 14} language learning, a large degree of individual variability in learning success has been observed (see Kidd et al.¹⁵ for a review). Many factors have been attributed to individual variability, including socioeconomic background for native languages¹⁶, and memory¹⁷, music experience^18,19,20 and subtle neuroanatomical differences²¹ for foreign languages. At the lower end of individual variability is primary language impairment, which includes Developmental Language Disorder (formerly known as Specific Language Impairment) and dyslexia, which concerns impairment of language in the written modality. The vast majority of the studies were conducted to examine individual variability in native language, and more specifically English and other European languages as a native language.

To obtain a more comprehensive understanding of the genetic basis of language, research must consider not only genetic associations with native language on a continuum of proficiency level, but also foreign languages learned at different time points in life. Such an understanding would give us a clearer idea of whether the genetic effects on language functions are subject to developmental and learning factors. Languages that are learned later in life may require a different set of cognitive resources than languages learned in infancy, and these resources may be contributory sources of individual variability in L2 attainment²². If that is the case, the genes that explain individual differences in native language would not be the same as those in foreign languages. In fact, it may be the case that the genetic effects on native language would be larger than the effects on foreign languages. A better understanding would address long-standing debates in language learning about whether the learning of native and foreign languages is fundamentally different²³. As far as we are aware, with the exception of Waye et al.²⁴, who examined Chinese and English literacy in bilingual children and one gene, no genetic studies of language have yet investigated foreign language learning. Rimfeld et al.²⁵ examined the genetic contributions to foreign language learning using a twin sample but did not examine the molecular genetics of such contributions.

A more comprehensive understanding of the genetic basis of language must also investigate languages other than European languages. More than 7000 languages are spoken worldwide²⁶. The genes that have been attributed to language could be those that subserve language functions independent of language features (e.g., lexical retrieval, which is required for all languages) or functions that are specific to a linguistic feature (e.g., inflectional morphology, which occurs only in some languages). A real-world implication for understanding the language universal or specific nature of genetic association concerns whether the same genetic diagnosis of language impairment can be made only for a specific language or for any language. In recent years, genetic research has been extended to the examination of non-European languages such as Chinese^{27, 28}. However, with the notable exception of the work of Waye et al.²⁴, these studies of non-European languages did not examine the genetic associations of two languages within the same population. This makes it difficult to tease apart language and population specific effects, because these two factors often co-vary.

The present study covers young adult participants whose L1 and L2 are Chinese and English, respectively, who were students learning French, German, or Spanish as L3 at college level. The study aims to further our understanding of how common genetic variants are associated with language in three ways. First, while most studies to date on the genetic basis of language have focused on English-speaking individuals (see Devanna et al.²⁹ for a review), we asked whether the same genetic variants collectively demonstrate an extended effect on language ability that is measured in early adulthood in speakers of Chinese. To answer this question, we surveyed the literature on the genetic basis of language and identified a group of 28 genetic variants (Table 1). We then simultaneously examined their effects on the participants’ native, first language (L1) as measured by the Chinese subject test of the college entrance examination in Hong Kong. Table 2 summarizes the participant characteristics.

Table 1 SNPs of language-related genes hypothesized to be associated with language proficiency that we examined in the present study.

Table 2 Demographic information and phenotype scores of participants.

Second, we examined whether this same group of common variants, whose effects were studied for native language (cf Vaughn and Hernandez⁵², and Waye et al.²⁴ for bilingual speakers), would exert similar effects on a foreign, second language (L2) that was learned since early childhood with a relatively high proficiency. Foreign language proficiency was measured by the English subject test of the same college entrance examination in Hong Kong from the same group of participants. Third, we investigated whether the same genetic variants contribute to the learning of a new, third language (L3) in adulthood. We used the same group of participants, namely students at college-level modern language courses whose L3 ability was measured comprehensively by a composite series of classroom and laboratory tests (see SI for more information).

Our study tests two sets of hypotheses. The first hypothesizes that a group of genetic variants contributes to a set of core language functions that are universal across languages and independent of when learning occurs (whether the learned language is native or foreign). This group of genetic variants would contribute to the learning of L1, L2 and L3. Alternatively, we argue that different languages and languages learned at different times have different genetic underpinnings. As different language features are associated with different brain functions (e.g., the middle frontal gyrus is specific for Chinese reading)^{53, 54}, these functions would have different underlying neurogenetic processes. Differences may also be due to the possibility that languages that are learned at different times in life are subject to the influence of different sets of non-genetic factors⁵⁵. For example, the learning of new languages is subject to social factors such as motivation⁵⁶ that may not have the same influence on L1.

Results

We conducted two types of analyses to evaluate our hypotheses (see “Methods and materials” for details). First, we used stepwise regression to evaluate genetic (all 28 SNPs) and non-genetic (e.g., gender) contributions to each language (L1, L2 or L3) in three models. This method allows us to determine the unique variance explained by genetic and non-genetic factors for each language. However, a weakness of this approach is that we cannot simultaneously examine quantitatively whether the same genes or non-genetic factors also account for variance in the other two languages. Thus, following the stepwise regression, we constructed a structural equation model (SEM) that included the genetic variants found to contribute to any of the three languages in the regression models. These variants were entered into the SEM and their contributions to all three languages were tested simultaneously, along with non-genetic factors (Fig. 1). Because not all participants had all measures collected (genetic and non-genetic), we used listwise deletion to exclude those without complete data from the regression analyses⁵⁷, which resulted in fewer participants than the entire set (Tables 3, 4, 5, and 6 show the number of participants included for each type of analysis).

Table 3 The final model of bi-directional stepwise regression analyses for L1 proficiency using the original dataset.

Table 4 The final model of bi-directional stepwise regression analyses for L2 proficiency.

Table 5 The final model of bi-directional stepwise regression analyses for L3 proficiency with motivation variables included as additional predictors.

Table 6 Path coefficients of structural equation models (SEMs) for L1, L2, and L3.

Stepwise regression models

A bidirectional stepwise procedure was implemented to determine which hypothesized SNPs (if any) significantly explained the variation in language proficiency. In the first step, all 28 hypothesized SNPs were included as predictors of language proficiency, along with non-genetic variables. The final model had the best combination of independent variables for predicting language proficiency. Gender (∆R² = 0.02, FDR corrected P = 0.012), family SES (∆R² = 0.01, FDR corrected P = 0.028), two SNPs of DCDC2 (rs6456593, rs6940827) (∆R² = 0.01, FDR corrected P = 0.033; ∆R² = 0.01, FDR corrected P = 0.049), and one SNP of DRD2 (rs1800497) (∆R² = 0.02, FDR corrected P = 0.025) significantly predicted L1 proficiency (Table 3). Music training (∆R² = 0.03, FDR corrected P = 0.002), family SES (∆R² = 0.05, FDR corrected P < 0.001), two SNPs of FOXP2 (rs1852469, rs6980093) (∆R² = 0.02, FDR corrected P = 0.009; ∆R² = 0.01, FDR corrected P = 0.036) and one SNP of CNTNAP2 (rs2538991) (∆R² = 0.01, FDR corrected P = 0.046) were significant predictors of L2 proficiency (Table 4). Internal motivation (∆R² = 0.05, FDR corrected P < 0.001) and S100B (rs9722) (∆R² = 0.01, FDR corrected P = 0.046) were significant predictors of L3 proficiency (Table 5). Thus, for L1, the combined unique variances explained by common variants and non-genetic factors were 3.7% and 3.6%, respectively. For L2, they were 3.5% and 7.6%, respectively; and for L3, they were 0.9% and 5.4%, respectively.
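The two statistics reported for each predictor above, ∆R² and the FDR-corrected P value, can be illustrated with a minimal sketch. It assumes that ∆R² is the drop in R² when one predictor is removed from an ordinary least-squares model, and that the FDR correction is the Benjamini-Hochberg procedure; the text only says "FDR corrected", so the choice of procedure is an assumption. The function names are hypothetical and this is not the authors' analysis code.

```python
import numpy as np

def r_squared(X, y):
    """R^2 of an ordinary least-squares fit of y on a 2-D X (intercept added here)."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    design = np.column_stack([np.ones(len(y)), X])
    beta, *_ = np.linalg.lstsq(design, y, rcond=None)
    resid = y - design @ beta
    total = y - y.mean()
    return 1.0 - (resid @ resid) / (total @ total)

def delta_r_squared(X, y, col):
    """Unique variance of column `col`: full-model R^2 minus R^2 without that column."""
    X = np.asarray(X, dtype=float)
    reduced = np.delete(X, col, axis=1)
    return r_squared(X, y) - r_squared(reduced, y)

def bh_adjust(pvals):
    """Benjamini-Hochberg adjusted p-values (assumed here; the source does not name the method)."""
    p = np.asarray(pvals, dtype=float)
    m = p.size
    order = np.argsort(p)
    scaled = p[order] * m / np.arange(1, m + 1)
    # Enforce monotonicity, starting from the largest p-value and moving down.
    adjusted = np.minimum.accumulate(scaled[::-1])[::-1]
    out = np.empty(m)
    out[order] = np.clip(adjusted, 0.0, 1.0)
    return out
```

A bidirectional stepwise search would repeatedly add or drop predictors according to a criterion such as AIC; the sketch only shows how the per-predictor quantities could be computed once a final model is chosen.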

Structural equation modelling (SEM)

The stepwise regression approach reported above provided information about which of the 28 hypothesized genetic variants, as well as which non-genetic factors, contributed to each language individually. In order to examine the contribution of genetic and non-genetic factors simultaneously for the three languages, we used SEM⁵⁸ (see Fig. 1 for the metamodel). The SEM provided a statistically good fit, as indicated by the root mean square error of approximation (RMSEA) = 0.000 [CI 0.000–0.045], the standardized root mean square residual (SRMR) = 0.011, the robust Comparative Fit Index (CFI) = 1.000, the robust Tucker-Lewis Index (TLI) = 1.040, and the Yuan–Bentler scaling correction factor = 1.024. Table 6 presents path coefficients that represent the estimates of the connection strength between a unit change in genetic and non-genetic factors and the latent language proficiency variables. A positive coefficient means a unit increase in these factors leads to a direct and proportional increase in language proficiency, while a negative coefficient means that an increase in these factors leads to a direct and proportional decrease in language proficiency. We found that L1 proficiency was positively associated with Gender (standardized path coefficient 0.146) and DRD2 (rs1800497) (standardized path coefficient 0.110), but negatively associated with Family SES (standardized path coefficient − 0.122) and DCDC2 (rs6940827) (standardized path coefficient − 0.123). L2 proficiency was positively associated with L1 proficiency (standardized path coefficient 0.253), Family SES (standardized path coefficient 0.256), music (standardized path coefficient 0.150), and CNTNAP2 (rs2538991) (standardized path coefficient 0.094), but negatively associated with FOXP2 (rs1852469) (standardized path coefficient − 0.169).
L3 proficiency was positively associated with L2 proficiency (standardized path coefficient 0.295), internal motivation (standardized path coefficient 0.243), S100B (rs9722) (standardized path coefficient 0.112), and DCDC2 (rs6456593) (standardized path coefficient 0.106). Generally speaking, the SEM results converged with the stepwise regression results, even when proficiency levels for all three languages were considered together.
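The fit indices reported for the SEM can look odd at first glance (RMSEA of exactly 0.000, CFI of 1.000, TLI above 1). A minimal sketch using the standard textbook definitions shows how this combination arises whenever the model chi-square is smaller than its degrees of freedom. The chi-square values in the usage note are hypothetical, the sketch ignores the robust (Yuan–Bentler scaled) versions reported in the text, and some software divides by N rather than N − 1 in the RMSEA.

```python
import math

def rmsea(chi2, df, n):
    """Root mean square error of approximation; truncated at 0 when chi2 < df."""
    return math.sqrt(max(chi2 - df, 0.0) / (df * (n - 1)))

def cfi(chi2_m, df_m, chi2_b, df_b):
    """Comparative Fit Index of model (m) against the baseline model (b); bounded at 1."""
    num = max(chi2_m - df_m, 0.0)
    den = max(chi2_m - df_m, chi2_b - df_b, 0.0)
    return 1.0 if den == 0 else 1.0 - num / den

def tli(chi2_m, df_m, chi2_b, df_b):
    """Tucker-Lewis Index; not bounded, so it can exceed 1 when chi2_m < df_m."""
    base = chi2_b / df_b
    return (base - chi2_m / df_m) / (base - 1.0)
```

For example, with hypothetical values chi2_m = 8 on 10 degrees of freedom and a baseline chi2_b = 200 on 20 degrees of freedom, `rmsea` returns 0, `cfi` returns 1, and `tli` returns a value slightly above 1, the same qualitative pattern as in the reported fit.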

Discussion

We found little overlap in the genetic associations among the three languages that our participants learned at different times in life. This pattern of results can be seen when the three languages were examined individually or simultaneously. Instead, we found that different common genetic variants contribute to explaining variance of the three languages. The effects of genes on language seem to be language specific and are stronger for native than foreign languages. By contrast, the effects of non-genetic factors seem to be stronger for foreign than native languages.

We found two genes that contributed to explaining variance in L1 ability in our stepwise regression, DCDC2 and DRD2. Importantly, the significant DCDC2 variants were those found in other studies of Chinese, including rs6456593³⁴, and rs6940827³⁵, each contributing to about 1% of the variance in L1. DRD2 (rs1800497) was found to contribute significantly to about 1.6% of variance in our study. In a previous study, the same variant was found to explain variance in bilingual proficiency⁵², which confirmed the results of a previous artificial language learning study where young adults learned a morpho-phonological grammar⁴². We found two different genes associated with L2, namely CNTNAP2 and FOXP2, which combined explained about 3.5% of variance. CNTNAP2 (rs2538991), which is downregulated by FOXP2, is associated with non-word repetition in English⁵. Non-word repetition is a predictor of language impairment in English-speaking children⁵⁹. Interestingly, in Chinese, non-word repetition did not predict language impairment⁶⁰. Thus, the association of CNTNAP2 (rs2538991) with English only may support the language-specific hypothesis. The specific genetic variants of FOXP2 that we found to be associated with L2 included rs6980093, which was associated with verbal fluency (naming as many words as possible in a semantic category within 60 s) in two Italian samples⁴⁵, and rs1852469, which has been associated with speech sound disorders in a Chinese population⁴⁴. Compared to L1, the genetic effects on L3 are much weaker. For the common variants examined, S100B (rs9722) was the only significant contributor to L3 proficiency in the stepwise regression analysis, explaining about 1% of variance. S100B is highly expressed in the hippocampus⁶¹. Its association with the learning of a new language is consistent with the role of declarative memory in early stages of language learning⁶².
The pattern of results of the SEM converged with those of the stepwise regression, except that rs6456593 was also found to be associated with L3 but not L1. This difference does not change the preliminary conclusions of the study.

Table 7 summarizes the SNPs that we found to be significantly associated with language phenotypes in the present study. The risk alleles we found in the present study and other relevant studies are also listed. For the most part, our findings are consistent with those reported in the literature, with two exceptions. For rs1852469 and rs2538991, the allele which we found to be associated with weaker language ability was the opposite of what was found in Zhao et al.⁴⁴ and Vernes et al.⁵, respectively. In both cases, the allele frequencies in our sample were different from what was reported in those studies. While the allele frequencies we found for rs1852469 were consistent with what was reported in dbSNP (https://www.ncbi.nlm.nih.gov/snp/) (A>T), the opposite was found in Zhao et al.⁴⁴ (T>A), even though both samples were East Asian. For rs2538991, the allele frequencies were roughly equal for the European population that Vernes et al.⁵ studied, but for our East Asian sample, the A allele was clearly the minor allele.

Table 7 Risk alleles of SNPs that were reported to be linked with language abilities in the present and in the literature.

The amount of variance explained by any single SNP was about 1 to 2% in this study, which is seemingly large when compared to the effects found in GWAS (e.g., Okbay et al.⁶³). Only 28 SNPs were examined in the present study, and it is likely that overlapping variance with other SNPs that we did not investigate would be revealed were a GWAS to be conducted. Furthermore, because our candidate SNPs have been studied extensively in other studies, they represent those of larger effects and our replication here speaks to that. In addition to these explanations, it is important to acknowledge that smaller studies such as this one often result in overestimation of effect sizes⁶⁴ and even false positives.

Taken as a whole, the results may support the hypothesis that genetic associations are strongest for a specific language. Furthermore, genetic effects seem to be stronger for native than foreign languages. For L1, the amount of variance explained by genetic factors combined (3.7%) was much larger than that of any one of the significant non-genetic factors, including gender⁶⁵ (2.2% of variance explained) and family SES¹⁶ (1.4%). For L2, the best predictor was family SES⁶⁶ (5%), followed by music training^17,18,20 (2.6%). For L3, the best predictor was clearly the non-genetic factor of motivation (5.4%). Again, this finding is consistent with the results of previous non-genetic studies^{55, 67,68,69}, which found motivation to be the best predictor of learning a new language.

It is worth noting that the effect of family SES on L1 is in the negative direction in our sample. This is likely a unique finding to learning L1 and L2 in Hong Kong. In a longitudinal study in school children in Hong Kong, family income only predicted L2 (English) but not L1 (Chinese) proficiency⁷⁰. In early adulthood, this trend may lead to a negative association between family SES and L1 because of an emphasis on learning English for families of higher SES background, as learners from higher SES families are more likely to attend English-medium schools.

An important feature of our study is that we examined the genetic associations of three languages all within a single (Han Chinese) population and investigated the contributions of a group of genes that have been found to be related to language. This design allows us to more clearly study how the same group of genes are associated with different languages and languages learned at different times, without contamination by the co-varying factors of population and language. As far as we know, Waye et al.²⁴ are the only other researchers who have examined L1 and L2 within the same population. However, only the genetic variant rs3743205 of DYX1C1 was studied. Vaughn and Hernandez⁵² also examined two languages but did not report association results for each language independently, focusing instead only on bilingual proficiency, a measure of the balance of two languages.

Our study contributes to the decades-long debate in language learning about whether native and foreign languages are learned primarily with the same mental mechanisms. Our two hypotheses were aligned with the Linguistic Coding Differences Hypothesis (LCDH)⁷¹ and the Fundamental Difference Hypothesis (FDH)^{23, 72}. Under LCDH, a set of identical “core language functions” such as phonological and syntactic processes are required for the successful learning of any language at any time in life. In terms of genetics, this implies the same set of genetic variants for native and foreign languages. FDH hypothesizes an innate language learning system that is only accessible at the earliest time in life for learning an infant’s native language. Foreign language learning lacks access to this innate system. In genetic terms, it implies a group of genetic variants that are only associated with L1.

Wong et al.⁷³ hypothesized that dopamine-related genes are linked to individual differences in language learning. Vaughn and Hernandez⁵² tested this hypothesis and found a significant association between the dopamine-related genes COMT (rs4680) and DRD2 (rs1800497), and individual differences in achieving balanced bilingual proficiency. Wong et al.⁴², who used an artificial language in laboratory conditions rather than an authentic language, found a significant association between DRD2 (rs1800497) and the learning of morphophonology. Stein et al.⁷⁴ found a significant association between several SNPs of DRD2 (including rs1800497) and measures of native language, but only the vocabulary measure reached statistical significance after correction for multiple comparisons. Nevertheless, the findings from these previous studies are consistent with those of the present study. The dopamine hypothesis concerns a language-universal mechanism. Future research will need to explore why the present study only found a significant association with native language.

Our study has several limitations. First, although the genetic variants we examined were those that have been reported (and sometimes replicated) in research studies during the past two decades and are the most promising candidates for language, many more potential genetic variants remain to be examined. It is very likely that some of those genetic variants show an overlap across the three languages. But based on the best available information we have about genes and language, we designed our study and obtained interpretable findings that support one of the two hypotheses. A GWAS with a very large sample size is needed in the future. Second, although we have found differences in genetic associations across languages, it is still unclear whether they occur because of language features or because they are languages learned at different points in life. Our evidence provides support for both explanations. A much larger-scale study with a much larger sample size in the future would control for the different grouping of languages and when they are learned, which would allow for a more precise delineation of these two factors. Third, only Han Chinese participants were studied. Future research will need to sample different populations (see Carrion-Castillo et al.⁷⁵ and Becker et al.⁷⁶ for examples of studies of European samples), who may have subtle genetic differences that do not occur in such a restricted sample. Fourth, we did not collect data on participants’ time on L3, which may explain some of the variance in L3 proficiency.

In a unique sample of Han Chinese participants who have learned three different languages, we found differences in genetic associations that depend on the specific language and when the language is learned. Individual differences in L1 seem to be more highly associated with language-related genes, especially those that have been found to be related to impairment of Chinese. L2 seems to be more closely related to both genetic and non-genetic factors (musical background and family SES). L3 is most strongly related to the motivation of the learners who learn the new language. Our results did not lend support to the hypothesis that a common set of genetic factors contributes to all language learning. It is likely that language learning at different times in life involves different processing demands⁷⁷, which are underpinned by different neurogenetic factors. It is also likely that different language features involve different processing demands and, as a result, different neurogenetic factors contribute to different languages⁵⁴. The present study should be viewed as a preliminary step towards exploring the two primary hypotheses. Future research of a much larger scale is required to further explore the nature of genes and language.

Methods and materials

Participants

We recruited a total of 940 participants (696 females) between 18 and 25 years of age (Mean = 19.98, SD = 1.28) for our study through mass emails and advertisements in their language classes, after obtaining permission from the class teachers. Written informed consent was obtained from all participants. The research protocol was approved by the Joint Chinese University of Hong Kong – New Territories East Cluster Clinical Research Ethics Committee and the research was performed in accordance with the Declaration of Helsinki. All participants were native speakers of Cantonese of Han Chinese descent without any self-reported neurological or psychiatric disorders. They all scored within normal limits (at least 85) on the Test of Nonverbal Intelligence (4th Ed)⁷⁸ and passed a hearing screening at the frequencies of 500, 1 k, 2 k and 4 k Hz at 30 dB HL. All learned English as L2, and French, German, or Spanish as L3. Because these participants enrolled in this study over a 4-year period, not all variables were collected from every participant. Some data were also missing due to fatigue, coding errors and genotyping failures. Table 2 presents descriptive measures for the different participant variables.

Questionnaires

We collected demographic information on the participants, including their gender, date of birth, language background, family socioeconomic status (SES), and musical experiences. Family SES was determined by following the Hollingshead index⁷⁹ by coding parents’ educational levels and occupational prestige. Participants also completed the Modern Language (ML) Learner Questionnaire⁸⁰ to indicate their internal motivation, external motivation, anxiety, and attitudes to learning the L3. A data reduction process was used to derive four metrics related to this questionnaire (see SI).

Proficiency of L1, L2, and L3

The L1 and L2 proficiency of participants was measured by the composite scores of the Chinese and English language subjects, respectively, of the Hong Kong Diploma of Secondary Education Examination (HKDSE), the public examination for university entrance in Hong Kong, administered by the Hong Kong Examinations and Assessment Authority (HKEAA). HKDSE implements an annual calibration exercise to ensure that scores across years reflect the same levels of performance⁸¹. For both Chinese and English, the composite scores were calculated using subtests on reading, writing, speaking, and listening skills on a scale from 1 (lowest) to 7 (highest).

To obtain an overall measure of L3 proficiency, we collected laboratory-based and classroom-based data which covered reading, writing, speaking, and listening abilities for each third language, similar to L1 and L2. Laboratory-based measures included three types of data. First, a sample of passages read aloud from the “Frog, Where Are You?” story⁸² was transcribed, morphosyntactically tagged, and analyzed using the CLAN program of the TalkBank project⁸³. Second, the pronunciation of speech production was assessed by native speakers based on excerpts from the storytelling sample. Third, lexical access was calculated by using the accuracy rates of a picture naming task. Classroom-based measures were participants’ z-transformed exam scores of the L3 class. The final L3 proficiency index, known as the L3 Global score, was calculated using Principal Component Analysis based on these measures. Details regarding data collection, analysis, and reduction procedures for L3 proficiency are given in SI Materials and Methods.
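The reduction of several L3 measures to a single global score can be sketched as follows. This is a minimal illustration that assumes every measure is z-transformed and the score is the first principal component of the standardized data; the exact set of measures, any weighting, and the software used are described only in the SI, so the function names and the input layout (one row per participant, one column per measure) are hypothetical.

```python
import numpy as np

def zscore(a):
    """Column-wise z-transform using the sample standard deviation."""
    a = np.asarray(a, dtype=float)
    return (a - a.mean(axis=0)) / a.std(axis=0, ddof=1)

def first_principal_component(measures):
    """Return (scores, loadings, explained variance ratio) for the first component."""
    z = zscore(measures)
    # SVD of the standardized data: rows of vt are the principal axes.
    _, s, vt = np.linalg.svd(z, full_matrices=False)
    loadings = vt[0]
    # The sign of a component is arbitrary; orient it so that higher scores
    # correspond to higher values on the measures overall.
    if loadings.sum() < 0:
        loadings = -loadings
    scores = z @ loadings
    explained = s[0] ** 2 / np.sum(s ** 2)
    return scores, loadings, explained
```

Rows with missing measures would need to be dropped or imputed before calling the function; the sketch assumes complete data.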

Genes and SNP genotyping

Saliva samples were collected using Oragene (DNA Genotek) and used to extract the genomic DNA of participants. Extracted DNA samples were quantified with a NanoDrop Spectrophotometer and normalized to 5 ng/μl for use in genotyping. A commercially available Sequenom MassARRAY platform was used to genotype the SNPs. Table 1 presents the allele frequencies of our sample. For most SNPs, the allele frequencies in our sample are consistent with those reported for East Asians by the dbSNP database published by the National Center for Biotechnology Information (US) (https://www.ncbi.nlm.nih.gov/snp/).

In selecting our genetic candidates, our focus was on individual differences in language functions on a continuum and their association with common genetic variants, rather than rare forms of neurodevelopmental disorders or disorders that lead to language impairment as a secondary condition. SNPs of FOXP2 were included insofar as they were common variants and were associated with speech⁴⁴. We conducted a literature search for studies that had investigated individual differences in typical language functions or language impairment. For genetic variants associated with language impairment, we only considered language impairment as a primary condition (Developmental Language Disorder), excluding studies of autism, intellectual disability, and other neurodevelopmental disorders where language impairment of any modality is a secondary condition^{84,85,86,87,88,89,90}. We also excluded studies that examined rare deletions⁷, along with studies of genetic variants that are linked to stuttering without other traits related to abstract linguistic structures⁹¹. We only included variants of CNTNAP2 that have been associated with primary language conditions⁵. CNTNAP2 has been associated with language functions in Autism Spectrum Disorder (ASD) in children of European backgrounds⁹². In Chinese children with ASD, there are conflicting findings regarding the role of CNTNAP2 polymorphisms^{93, 94}. Given these uncertainties, SNPs that were associated with language in ASD but not with language as a primary condition were excluded. We also excluded SNPs that were in linkage disequilibrium with other SNPs in the study. Linkage disequilibrium (LD) among the SNPs on the same chromosome was calculated using the snpStats⁹⁵ package of R⁹⁶ (see Fig. S5 for the LD results). In the end, based on the results of previous studies which reported associations with language functions, we compiled a list of 28 SNPs as our candidates (see Table 1 for the references).
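As a rough illustration of what a pairwise LD check involves, the sketch below computes r² between two SNPs as the squared Pearson correlation of their allele counts. This is a common approximation and is not the snpStats routine used in the study; the function name and genotype vectors are hypothetical.

```python
def dosage_r2(snp_x, snp_y):
    """Squared Pearson correlation between two allele-count vectors (0, 1, 2)."""
    n = len(snp_x)
    if n != len(snp_y) or n < 2:
        raise ValueError("need two equally long vectors with at least 2 samples")
    mean_x = sum(snp_x) / n
    mean_y = sum(snp_y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(snp_x, snp_y))
    var_x = sum((a - mean_x) ** 2 for a in snp_x)
    var_y = sum((b - mean_y) ** 2 for b in snp_y)
    if var_x == 0 or var_y == 0:
        raise ValueError("r^2 is undefined for a monomorphic SNP")
    return cov * cov / (var_x * var_y)

# Hypothetical allele counts for six participants at three SNPs.
snp_a = [0, 1, 2, 1, 0, 2]
snp_b = [0, 1, 2, 1, 0, 2]  # identical to snp_a, so r^2 is 1
snp_c = [2, 0, 1, 1, 2, 0]

print(dosage_r2(snp_a, snp_b))  # 1.0
print(dosage_r2(snp_a, snp_c))  # 0.5625
```

SNP pairs with high r² carry largely redundant information, which is why one member of such a pair may be dropped from a candidate list.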

Statistical analysis

Because each analytic method has its own strengths and limitations, we opted to use multiple methods for our data analysis. Based on the practice of previous studies, we chose two methods: stepwise regression and structural equation modeling (SEM). We began our analysis with stepwise regression. In the stepwise regression model for each language, we used the 28 SNPs as genetic predictors and family SES, gender, and musical training as non-genetic predictors. For L3, we also analyzed the data with motivation measures as additional predictors. Standard linear additive SNP encoding was used to code the genotypes: homozygotes for the major allele were given a value of 2, heterozygotes a value of 1, and homozygotes for the minor allele a value of 0. Thus, a positive statistical relationship between a SNP and a language measure means that a higher load of the major allele is associated with better language proficiency.
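The additive encoding described above can be sketched as follows. The function name and the example genotypes are hypothetical, and the sketch assumes each genotype is given as a two-letter string and that the major allele of the SNP is known.

```python
def encode_additive(genotype, major_allele):
    """Count copies of the major allele in a two-letter genotype string."""
    if len(genotype) != 2:
        raise ValueError(f"expected a two-allele genotype, got {genotype!r}")
    return sum(1 for allele in genotype if allele == major_allele)

# Hypothetical genotypes for one SNP whose major allele is "G".
genotypes = ["GG", "GA", "AG", "AA"]
codes = [encode_additive(g, "G") for g in genotypes]
print(codes)  # [2, 1, 1, 0]
```

With this coding, a positive regression coefficient for a SNP corresponds to higher predicted proficiency per additional copy of the major allele.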

Stepwise regression

We included all 28 SNPs and non-genetic variables (gender, music training, and family SES for L1 and L2; these factors and motivational factors for L3) in stepwise regression models for L1, L2, and L3 separately. Stepwise regression is a method of fitting regression models in which the choice of predictive variables is made by an automatic procedure. The final model had the best combination of independent variables for predicting the dependent variable. For all models, a stepwise procedure in both directions was implemented via the MASS package⁹⁷ of R⁹⁶ to remove and add predictors based on their improvement to the Akaike information criterion (AIC). Final models of stepwise regression included all predictors that showed improvement to the AIC. Statistical significance of each variable was also indicated by the false discovery rate (FDR) corrected p values, which were calculated using the Benjamini–Hochberg method.
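Two ingredients of the procedure described above can be sketched in a few lines: the AIC comparison that decides whether a predictor is kept, and the Benjamini–Hochberg adjustment of p values. This is a simplified sketch, not the analysis script of the study. The AIC form used here, n·ln(RSS/n) + 2k for a Gaussian linear model, omits an additive constant, and all numbers are hypothetical.

```python
import math

def aic_gaussian(rss, n, k):
    """AIC of a linear model up to an additive constant.

    rss: residual sum of squares, n: sample size, k: number of coefficients.
    """
    return n * math.log(rss / n) + 2 * k

def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p values in the original order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted

# Hypothetical comparison: does adding one predictor lower the AIC?
aic_without = aic_gaussian(rss=50.0, n=100, k=3)
aic_with = aic_gaussian(rss=49.5, n=100, k=4)
keep_extra_predictor = aic_with < aic_without
print(keep_extra_predictor)  # False: the small drop in RSS does not offset the penalty

# Hypothetical raw p values for four retained predictors.
raw_p = [0.01, 0.04, 0.03, 0.20]
fdr_p = benjamini_hochberg(raw_p)
print([round(p, 4) for p in fdr_p])  # [0.04, 0.0533, 0.0533, 0.2]
```

A stepwise search repeats the AIC comparison for every candidate addition and removal until no single change lowers the criterion.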

Structural equation modelling

To quantify the statistical relationships of language proficiency and hypothesized SNPs, we fitted a structural equation model (SEM) using the lavaan package⁹⁸ of R⁹⁶. Demographic characteristics, including gender, music training, and family SES, and genetic variants that were associated with each language separately from stepwise regression models were considered independent variables in the data analysis. Proficiency in each language was treated as a latent variable. In the metamodels, we hypothesized that both non-genetic (e.g., gender, music training, and family SES) and genetic variables had effects on proficiency of each language (Fig. 1). For L3, motivation was additionally associated with proficiency⁵⁶. As proficiency levels among languages might be related, as found in our recent study⁵⁵, those relationships were also accounted for in the SEM. We used full information maximum likelihood (FIML) to account for missing data and robust SEs to account for non-normality. The goodness of fit for the tested model was established by the following indices: (i) χ² test with an estimated significance level P ≥ 0.05, (ii) χ²/df < 2, (iii) robust root mean square error of approximation (robust RMSEA) < 0.05 and an upper limit of the 95% confidence interval (CI) for robust RMSEA < 0.08, (iv) robust comparative fit index (robust CFI) and robust Tucker–Lewis Index (robust TLI) with values ≥ 0.90, and (v) standardized root mean square residual (SRMR) with a value lower than 0.10. We reported both unstandardized and standardized path coefficients (Table 6).
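The fit criteria listed above can be expressed as a simple checklist. The sketch below is illustrative only: the function names and all input values are hypothetical, and the RMSEA helper uses the standard uncorrected formula, whereas the study reports robust versions that apply a scaling correction.

```python
import math

def rmsea(chi2, df, n):
    """Uncorrected RMSEA point estimate from a model chi-square."""
    return math.sqrt(max(chi2 - df, 0.0) / (df * (n - 1)))

def check_fit(chi2, df, p_value, rmsea_value, rmsea_ci_upper, cfi, tli, srmr):
    """Evaluate each goodness-of-fit criterion and return a pass/fail map."""
    return {
        "chi2_p": p_value >= 0.05,
        "chi2_df_ratio": chi2 / df < 2,
        "rmsea": rmsea_value < 0.05 and rmsea_ci_upper < 0.08,
        "cfi_tli": cfi >= 0.90 and tli >= 0.90,
        "srmr": srmr < 0.10,
    }

# Hypothetical fit statistics (not results from the study).
chi2, df, n = 30.0, 24, 940
rmsea_value = rmsea(chi2, df, n)
result = check_fit(chi2, df, p_value=0.18, rmsea_value=rmsea_value,
                   rmsea_ci_upper=0.035, cfi=0.98, tli=0.97, srmr=0.03)
print(round(rmsea_value, 4))  # 0.0163
print(all(result.values()))   # True
```

Returning one flag per criterion makes it clear which index is responsible when a model fails the overall check.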

Data availability

All data needed to evaluate the conclusions in the paper are present in the paper and/or Supplementary Information. The numeric data and analysis scripts of this study will be available at Open Science Framework (https://osf.io/vkgmd/).

References

Fisher, S. E., Vargha-Khadem, F., Watkins, K. E., Monaco, A. P. & Pembrey, M. E. Localisation of a gene implicated in a severe speech and language disorder. Nat. Genet. 18, 168–170 (1998).
Lai, C. S. L., Fisher, S. E., Hurst, J. A., Vargha-Khadem, F. & Monaco, A. P. A forkhead-domain gene is mutated in a severe speech and language disorder. Nature 413, 519–523 (2001).
Ludlow, C. L. & Cooper, J. A. Genetic Aspects of Speech and Language Disorders (Academic Press, 1983).
Newbury, D. F. & Monaco, A. P. Genetic advances in the study of speech and language disorders. Neuron 68, 309–320 (2010).
Vernes, S. C. et al. A functional genetic link between distinct developmental language disorders. N. Engl. J. Med. 359, 2337–2345 (2008).
Scerri, T. S. et al. DCDC2, KIAA0319 and CMIP are associated with reading-related traits. Biol. Psychiatry 70, 237–245 (2011).
Thevenon, J. et al. 12p13.33 microdeletion including ELKS/ERC1, a new locus associated with childhood apraxia of speech. Eur. J. Hum. Genet. 21, 82–88 (2013).
Roll, P. et al. Molecular networks implicated in speech-related disorders: FOXP2 regulates the SRPX2/uPAR complex. Hum. Mol. Genet. 19, 4848–4860 (2010).
Vernes, S. C. et al. Foxp2 regulates gene networks implicated in neurite outgrowth in the developing brain. PLoS Genet. 7, e1002145 (2011).
Co, M., Hickey, S. L., Kulkarni, A., Harper, M. & Konopka, G. Cortical Foxp2 supports behavioral flexibility and developmental dopamine D1 receptor expression. Cereb. Cortex. 30, 1855–1870 (2020).
Street, J. A. & Dąbrowska, E. More individual differences in language attainment: How much do adult native speakers of English know about passives and quantifiers?. Lingua 120, 2080–2094 (2010).
Tanner, D. & Van Hell, J. G. ERPs reveal individual differences in morphosyntactic processing. Neuropsychologia 56, 289–301 (2014).
Ettlinger, M., Bradlow, A. R. & Wong, P. C. M. Variability in the learning of complex morphophonology. Appl. Psycholinguist. 35, 807–831 (2014).
Skehan, P. Individual differences in second language learning. Stud. Second Lang. Acquis. 13, 275–298 (1991).
Kidd, E., Donnelly, S. & Christiansen, M. H. Individual differences in language acquisition and processing. Trends. Cogn. Sci. 22, 154–169 (2018).
Fernald, A., Marchman, V. A. & Weisleder, A. SES differences in language processing skill and vocabulary are evident at 18 months. Dev. Sci. 16, 234–248 (2013).
Hamrick, P., Lum, J. A. G. & Ullman, M. T. Child first language and adult second language are both tied to general-purpose learning systems. Proc. Natl. Acad. Sci. USA. 115, 1487–1492 (2018).
Slevc, L. R. & Miyake, A. Individual differences in second-language proficiency: Does musical ability matter?. Psychol. Sci. 17, 675–681 (2006).
Wong, P. C. M., Skoe, E., Russo, N. M., Dees, T. & Kraus, N. Musical experience shapes human brainstem encoding of linguistic pitch patterns. Nat. Neurosci. 10, 420–422 (2007).
Wong, P. C. M. et al. ASPM-lexical tone association in speakers of a tone language: Direct evidence for the genetic-biasing hypothesis of language evolution. Sci. Adv. 6(22), eaba5090 (2020).
Wong, P. C. M. et al. Volume of left Heschl’s gyrus and linguistic pitch learning. Cereb. Cortex 18, 828–836 (2008).
Birdsong, D. Plasticity, variability and age in second language acquisition and bilingualism. Front. Psychol. 9, 81 (2018).
Bley-Vroman, R. The evolving context of the fundamental difference hypothesis. Stud. Second. Lang. Acquis. 31, 175–198 (2009).
Waye, M. M. Y., Siu, C. O., McBride, C., Ho, C. S. H. & Wong, C. W. Association of the DYX1C1 gene with Chinese literacy in a healthy Chinese population by latent class and LASSO analyses. In Proceedings of the Hawaii International Conference on System Sciences (HICSS) (University of Hawaii at Manoa, 2018).
Rimfeld, K., Dale, P. S. & Plomin, R. How specific is second language-learning ability? A twin study exploring the contributions of first language achievement and intelligence to second language achievement. Transl. Psychiatry 5, e638 (2015).
Eberhard, D. M., Simons, G. F. & Fennig, C. D. Ethnologue: Languages of the World https://www.ethnologue.com/ (2019).
Zhao, H., Chen, Y., Zhang, B. & Zuo, P. KIAA0319 gene polymorphisms are associated with developmental dyslexia in Chinese Uyghur children. J. Hum. Genet. 61, 745–752 (2016).
Lim, C.K.-P., Wong, A.M.-B., Ho, C.S.-H. & Waye, M.M.-Y. A common haplotype of KIAA0319 contributes to the phonological awareness skill in Chinese children. Behav. Brain. Funct. 10, 23 (2014).
Devanna, P., Dediu, D. & Vernes, S. C. The genetics of language: From complex genes to complex communication. In The Oxford Handbook of Psycholinguistics 865–898 (Oxford University Press, 2019).
Newbury, D. F. et al. CMIP and ATP2C2 modulate phonological short-term memory in language impairment. Am. J. Hum. Genet. 85, 264–272 (2009).
Einarsdottir, E. et al. Mutation in CEP63 co-segregating with developmental dyslexia in a Swedish family. Hum. Genet. 134, 1239–1248 (2015).
Whitehouse, A. J. O., Bishop, D. V. M., Ang, Q. W., Pennell, C. E. & Fisher, S. E. CNTNAP2 variants affect early language development in the general population. Genes. Brain. Behav. 11, 501–501 (2012).
Wang, J. et al. The interactive effect of genetic polymorphisms of IL-10 and COMT on cognitive function in schizophrenia. J. Psychiatr. Res. 136, 501–507 (2020).
Chen, Y., Zhao, H., Zhang, Y.-X. & Zuo, P.-X. DCDC2 gene polymorphisms are associated with developmental dyslexia in Chinese Uyghur children. Neural Regen. Res. 12(2), 259 (2017).
Mary, M. Y. et al. Study of genetic association with DCDC2 and developmental dyslexia in Hong Kong Chinese children. Clinical Practice & Epidemiology in Mental Health, 13(1), 104–114. https://doi.org/10.2174/1745017901713010104 (2017).
Zhang, Y. et al. Association of DCDC2 polymorphisms with normal variations in reading abilities in a Chinese population. PLoS ONE 11, e0153603 (2016).
Newbury, D. F. et al. Investigation of dyslexia and SLI risk variants in reading- and language-impaired subjects. Behav. Genet. 41, 90–104 (2011).
Dennis, M. Y. et al. A common variant associated with dyslexia reduces expression of the KIAA0319 gene. PLoS. Genet. 5, e1000436 (2009).
Matsson, H. et al. SNP variations in the 7q33 region containing DGKI are associated with dyslexia in the Finnish and German populations. Behav. Genet. 41, 134–140 (2011).
Kong, R. et al. Genetic variant in DIP2A gene is associated with developmental dyslexia in Chinese population. Am. J. Hum. Genet. 171, 203–208 (2015).
Lim, C. K., Ho, C. S., Chou, C. H. & Waye, M. M. Association of the rs3743205 variant of DYX1C1 with dyslexia in Chinese children. Behav. Brain. Funct. 7, 16 (2011).
Wong, P. C. M., Ettlinger, M. & Zheng, J. Linguistic grammar learning and DRD2-TAQ-IA polymorphism. PLoS ONE 8, e64983 (2013).
Zhang, Y. et al. Association of the DYX1C1 dyslexia susceptibility gene with orthography in a Chinese population. PLoS ONE 7, e42969 (2012).
Zhao, Y. et al. Association between FOXP2 gene and speech sound disorder in a Chinese population. Psychiatry Clin. Neurosci. 64, 565–573 (2010).
Mozzi, A. et al. A common genetic variant in FOXP2 is associated with language-based learning (Dis)abilities: Evidence from two Italian independent samples. Am. J. Med. Genet. B. Neuropsychiatr. Genet. 174, 578–586 (2017).
Rice, M. L., Smith, S. D. & Gayán, J. Convergent genetic linkage and associations to language, speech and reading measures in families of probands with specific language impairment. J. Neurodev. Disord. 1, 264–282 (2009).
Shao, S. et al. The roles of genes in the neuronal migration and neurite outgrowth network in developmental dyslexia: Single- and multiple-risk genetic variants. Mol. Neurobiol. 53, 3967–3975 (2016).
Venkatesh, S. K., Siddaiah, A., Padakannaya, P. & Ramachandra, N. B. Analysis of genetic variants of dyslexia candidate genes KIAA0319 and DCDC2 in an Indian population. J. Hum. Genet. 58, 531–538 (2013).
Cope, N. et al. Strong evidence that KIAA0319 on chromosome 6p is a susceptibility gene for developmental dyslexia. Am. J. Hum. Genet. 76, 581–591 (2005).
Bates, T. C. et al. Genetic variance in a component of the language acquisition device: ROBO1 polymorphisms associated with phonological buffer deficits. Behav. Genet. 41, 50–57 (2011).
Matsson, H. et al. Polymorphisms in DCDC2 and S100B associated with developmental dyslexia. J. Hum. Genet. 60, 399–401 (2015).
Vaughn, K. A. & Hernandez, A. E. Becoming a balanced, proficient bilingual: Predictions from age of acquisition & genetic background. J. Neurolinguistics 46, 69–77 (2018).
Siok, W. T., Perfetti, C. A., Jin, Z. & Tan, L. H. Biological abnormality of impaired reading is constrained by culture. Nature 431, 71–76 (2004).
Siok, W. T., Niu, Z., Jin, Z., Perfetti, C. A. & Tan, L. H. A structural–functional basis for dyslexia in the cortex of Chinese readers. Proc. Natl. Acad. Sci. USA. 105, 5561–5566 (2008).
Kang, X., Matthews, S., Yip, V. & Wong, P. C. M. Language and nonlanguage factors in foreign language learning: Evidence for the learning condition hypothesis. npj Sci. Learn. 6, 1–13 (2021).
Dörnyei, Z. & Ushioda, E. Motivation, Language Identity and the L2 Self (Multilingual Matters, 2009).
Allison, P. D. Missing Data (Quantitative Applications in the Social Sciences Book 136) (SAGE Publications Inc, 2001).
Bollen, K. A. & Noble, M. D. Structural equation models and the quantification of behavior. Proc. Natl. Acad. Sci. USA 108(Suppl 3), 15639–15646 (2011).
Dollaghan, C. & Campbell, T. F. Nonword repetition and child language impairment. J. Speech. Lang. Hear. Res. 41, 1136–1146 (1998).
Stokes, S. F., Wong, A.M.-Y., Fletcher, P. & Leonard, L. B. Nonword repetition and sentence repetition as clinical markers of specific language impairment: The case of Cantonese. J. Speech. Lang. Hear. Res. 49, 219–236 (2006).
Rothermundt, M., Peters, M., Prehn, J. H. M. & Arolt, V. S100B in brain damage and neurodegeneration. Microsc. Res. Tech. 60, 614–632 (2003).
Ullman, M. T. Contributions of memory circuits to language: The declarative/procedural model. Cognition 92, 231–270 (2004).
Okbay, A. et al. Genome-wide association study identifies 74 loci associated with educational attainment. Nature 533, 539–542 (2016).
Button, K. S. et al. Power failure: Why small sample size undermines the reliability of neuroscience. Nat. Rev. Neurosci. 14, 365–376 (2013).
Eriksson, M. et al. Differences between girls and boys in emerging language skills: Evidence from 10 language communities. Br. J. Dev. Psychol. 30, 326–343 (2012).
Kahn-Horwitz, J., Shimron, J. & Sparks, R. L. Weak and strong novice readers of English as a foreign language: Effects of first language and socioeconomic status. Ann. Dyslexia 56, 161–185 (2006).
Ripollés, P. et al. The role of reward in word learning and its implications for language acquisition. Curr. Biol. 24, 2606–2611 (2014).
Gardner, R. C. & Lambert, W. E. Motivational variables in second-language acquisition. Can. J. Psychol. 13, 266–272 (1959).
Dörnyei, Z. Conceptualizing motivation in foreign-language learning. Lang. Learn. 40, 75–78 (1990).
Li, T., McBride-Chang, C., Wong, A. & Shu, H. Longitudinal predictors of spelling and reading comprehension in Chinese as an L1 and English as an L2 in Hong Kong Chinese children. J. Educ. Psychol. 104, 286–301 (2012).
Sparks, R. L. Examining the linguistic coding differences hypothesis to explain individual differences in foreign language learning. Ann. Dyslexia 45, 187–214 (1995).
Bley-Vroman, R. What is the logical problem of foreign language learning? In Linguistic Perspectives on Second Language Acquisition 41–67 (Cambridge University Press, 1989).
Wong, P. C. M., Morgan-Short, K., Ettlinger, M. & Zheng, J. Linking neurogenetics and individual differences in language learning: The dopamine hypothesis. Cortex 48, 1091–1102 (2012).
Stein, C. M. et al. Association between AVPR1A, DRD2, and ASPM and endophenotypes of communication disorders. Psychiatr. Genet. 24, 191–200 (2014).
Carrion-Castillo, A. et al. Association analysis of dyslexia candidate genes in a Dutch longitudinal sample. Eur. J. Hum. Genet. 25, 452–460 (2017).
Becker, J. et al. Genetic analysis of dyslexia candidate genes in the European cross-linguistic. Eur. J. Hum. Genet. 22, 675–680 (2014).
Ullman, M. T. The declarative/procedural model: A neurobiological model of language learning, knowledge, and use. In Neurobiology of Language 953–968 (Academic Press, 2016).
Brown, L., Sherbenou, R. J. & Johnsen, S. K. Test of Nonverbal Intelligence (TONI 4) (PRO-ED, 2010).
Hollingshead, A. B. Four factor index of social status. Yale J. Sociol. 8, 21–51 (2011).
Dörnyei, Z. & Taguchi, T. Questionnaires in Second Language Research: Construction, Administration, and Processing (Routledge, 2009).
HKEAA. Grading Procedures and Standards-referenced Reporting in the HKDSE. http://www.hkeaa.edu.hk/DocLibrary/Media/Leaflets/HKDSE_SRR_A4booklet_Mar2018.pdf (2018).
Mayer, M. Frog, Where Are You? (Dial Press, 1967).
MacWhinney, B. The CHILDES Project: Tools for Analyzing Talk Vol. 8 (Erlbaum Associates, 2000).
O’Roak, B. J. et al. Exome sequencing in sporadic autism spectrum disorders identifies severe de novo mutations. Nat. Genet. 43, 585–589 (2011).
Hamdan, F. F. et al. De novo mutations in FOXP1 in cases with intellectual disability, autism, and language impairment. Am. J. Hum. Genet. 87, 671–678 (2010).
Horn, D. et al. Identification of FOXP1 deletions in three unrelated patients with mental retardation and significant speech and language deficits. Hum. Mutat. 31, E1851–E1860 (2010).
Hoischen, A. et al. De novo mutations of SETBP1 cause Schinzel-Giedion syndrome. Nat. Genet. 42, 483–485 (2010).
Marseglia, G. et al. 372 kb microdeletion in 18q12.3 causing SETBP1 haploinsufficiency associated with mild mental retardation and expressive speech impairment. Eur. J. Med. Genet. 55, 216–221 (2012).
Feng, G. et al. Neural preservation underlies speech improvement from auditory deprivation in young cochlear implant recipients. Proc. Natl. Acad. Sci. USA 115, E1022–E1031 (2018).
Ingvalson, E. M. & Wong, P. C. M. Training to improve language outcomes in cochlear implant recipients. Front. Psychol. 4, 263 (2013).
Kang, C. et al. Mutations in the lysosomal enzyme-targeting pathway and persistent stuttering. N. Engl. J. Med. 362, 677–685 (2010).
Rodenas-Cuadrado, P., Ho, J. & Vernes, S. C. Shining a light on CNTNAP2: Complex functions to complex disorders. Eur. J. Hum. Genet. 22, 171–178 (2014).
Zhang, T. et al. Association between CNTNAP2 polymorphisms and autism: A family-based study in the Chinese Han population and a meta-analysis combined with GWAS data of psychiatric genomics consortium. Autism Res. 12, 553–561 (2019).
Li, X. et al. Association analysis of CNTNAP2 polymorphisms with autism in the Chinese Han population. Psychiatr. Genet. 20, 113–117 (2010).
Clayton, D. snpStats: SnpMatrix and XSnpMatrix classes and methods. R package version 1.40.0 (2020).
R Core Team. R: A Language and Environment for Statistical Computing (R Foundation for Statistical Computing, 2019).
Venables, W. N. & Ripley, B. D. Modern Applied Statistics with S 4th edn. (Springer, 2002).
Rosseel, Y. lavaan: An R package for structural equation modeling. J. Stat. Softw. 48, 1–36 (2012).

Acknowledgements

General: We thank Kay Hoi Yi Wong, Kynthia Yip, Doris Lau, Danny Ip, Tsz Yin Wong, Mavis Chan, and a group of student research assistants and transcribers for their assistance with data collection and analysis. We also wish to thank the Modern Languages instructional team at the Chinese University of Hong Kong (led by Annette Frömel and Celia Carracedo Manzanera at the time of the research) for their assistance with participant recruitment and general advice. We thank Xiujuan Geng for advice on statistical analysis, and Kara Morgan-Short for comments on the Spanish data.

Funding

The Research Grants Council of Hong Kong (HSSPF #34000118), the Dr. Stanley Ho Medical Development Foundation, and the Department of Linguistics and Modern Languages and Lui Che Woo Institute of Innovative Medicine at the Chinese University of Hong Kong provided funding for this work.

Author information

Authors and Affiliations

Department of Linguistics and Modern Languages, The Chinese University of Hong Kong, Shatin, Hong Kong SAR, China
Patrick C. M. Wong & Xin Kang
Brain and Mind Institute, The Chinese University of Hong Kong, Shatin, Hong Kong SAR, China
Patrick C. M. Wong, Xin Kang & Hon-Cheong So
Department of Otorhinolaryngology, Head and Neck Surgery, The Chinese University of Hong Kong, Shatin, Hong Kong SAR, China
Patrick C. M. Wong
Research Centre for Language, Cognition and Language Application, Chongqing University, Chongqing, China
Xin Kang
School of Foreign Languages and Cultures, Chongqing University, Chongqing, China
Xin Kang
School of Biomedical Sciences, The Chinese University of Hong Kong, Shatin, Hong Kong SAR, China
Hon-Cheong So
Department of Obstetrics and Gynecology, The Chinese University of Hong Kong, Shatin, Hong Kong SAR, China
Kwong Wai Choy

Contributions

Conceptualization: P.C.M.W.; Methodology: P.C.M.W., K.X., H.C.S., K.W.C.; Investigation: K.X., K.W.C.; Visualization: K.X.; Supervision: P.C.M.W.; Writing—original draft: P.C.M.W., K.X.; Writing, review and editing: P.C.M.W., K.X., H.C.S., K.W.C.

Corresponding authors

Correspondence to Patrick C. M. Wong or Xin Kang.

Ethics declarations

Competing interests

P.C.M.W. declares that he is an owner of a startup company supported by a Hong Kong SAR Government technology startup scheme for universities. The other authors declare no conflict of interest.

Additional information

Publisher's note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.

About this article

Cite this article

Wong, P.C.M., Kang, X., So, HC. et al. Contributions of common genetic variants to specific languages and to when a language is learned. Sci Rep 12, 580 (2022). https://doi.org/10.1038/s41598-021-04163-1

Received: 07 June 2021
Accepted: 10 December 2021
Published: 12 January 2022
Version of record: 12 January 2022
DOI: https://doi.org/10.1038/s41598-021-04163-1
