Introduction

Neurofibromatosis type 1 (OMIM 613113) is a relatively common and well-known neurocutaneous disorder caused by pathogenic alterations of the NF1 gene, which has a high mutation rate1. NF1 is a multisystem disorder characterized by pigmentary skin features, multiple neurofibromas, optic pathway glioma (OPG), Lisch nodules, sphenoid dysplasia, and pseudoarthrosis of a long bone. The NF1 gene maps to chromosome 17q11.2, consists of 61 exons, and spans more than 282 kb of genomic DNA. It encodes neurofibromin, which participates in the RAS/MAPK, PI3K/mTOR, and cAMP signalling pathways2 and is regarded as a tumour suppressor that regulates cellular growth, proliferation, and differentiation3. Neurofibromin contains 9 functional domains, namely the cysteine/serine-rich domain (CSRD), tubulin-binding domain (TBD), GTPase-activating protein-related domain (GRD), Sec14-like domain (Sec14), pleckstrin homology-like domain (PH), HEAT-like repeat regions (HLR), C-terminal domain (CTD), nuclear localization signal region (NLS), and syndecan-binding region (SBR)4. NF1 can be diagnosed with reference to the revised NIH criteria, which include ≥ 6 café-au-lait macules, axillary or inguinal freckling, ≥ 2 neurofibromas of any type or one plexiform neurofibroma, ≥ 2 Lisch nodules, optic pathway glioma, a distinctive osseous lesion, and a heterozygous pathogenic NF1 variant5. The mutational spectrum of NF1 is broad and includes single nucleotide variants and small indels distributed across the entire gene, single- or multi-exon deletions and duplications, and whole-gene deletions. No clear genotype–phenotype correlation has been established except for a few specific variants6, such as the missense variant c.5425C > T p.(Arg1809Cys), which is associated with Noonan syndrome-like facial features, and the nonsense variant c.3826C > T p.(Arg1276*), which is associated with cardiovascular disorders7,8.
This lack of genotype–phenotype correlation makes it difficult for clinicians to predict the clinical outcome of their patients and to tailor follow-up plans to the molecular findings.

Single nucleotide variants and indels can generally be classified as truncating or non-truncating. Truncating variants shorten the protein and therefore readily disrupt its function9, whereas non-truncating variants often affect a protein binding site or an enzyme active site, or alter the 3D structure of the protein10,11,12. A protein domain is a functional and/or structural unit of a protein. Non-truncating variants of the same type in the same domain tend to produce similar phenotypes because they affect protein function in similar ways. Protein domains and variant types are also considered in the 2015 ACMG variant interpretation guidelines13, and an increasing number of genes have been found to have multiple disease-causing mechanisms associated with particular domains and variant types14,15. Domain-specific genotype–phenotype studies are therefore needed, and they should also take the variant type into account.

A previous domain-specific analysis showed that molecular evidence can guide clinical management16: patients with non-truncating variants in the Ras GTPase-activating protein domain (RAS-GAP, or GRD) had a significantly higher percentage of congenital heart anomalies. A comprehensive analysis covering the other domains of the gene, as well as both truncating and non-truncating variants, would facilitate variant interpretation and studies of disease mechanisms.

In this study, we used data from our own NF1 cohort and from 25 publications to investigate in depth the relationships between clinical features, protein domains, and variant types. The results can guide variant interpretation, benefit clinical care planning, and support investigations into disease-causing mechanisms.

Methods

Ethics statement

This project was reviewed and approved by the University of Hong Kong Human Research Ethics Committee (HREC reference number: EA240038). This was a secondary data analysis project, and informed consent was not required by the Committee. All procedures were performed in accordance with relevant guidelines and regulations.

Our cohort

Clinical features and genotype data were obtained from 738 individuals with molecularly confirmed NF1 in the Clinical Genetics Service Unit, Hong Kong Children’s Hospital, Hong Kong SAR, China (hereinafter referred to as the CGS cohort; for details, see Ho et al., 2022).

Public cohort

Of 121 PubMed search results for ‘NF1 variant’ and ‘genotype’/‘phenotype’ (searched on 17 December 2022), only 25 publications provided both the clinical features and the NF1 variant of each case7,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31,32,33,34,35,36,37,38,39,40. Because the large number of variants precluded re-classifying all of them according to the ACMG or ACGS guidelines or the ClinGen recommendations, sequence variants were instead filtered, and only variants with at least one pathogenic or likely pathogenic ClinVar submission were retained to construct the public cohort. In total, 925 NF1 sequence variants with their associated clinical features were obtained from these 25 publications.

Variant analysis

Transcript NM_000267.3 was used as the reference, both because it has the largest number of variants in the Leiden Open Variation Database (LOVD) NF1 database (https://databases.lovd.nl/shared/transcripts/00014502) and for comparison with the previous publication of our institute16. Based on this transcript, variants were classified as truncating or non-truncating according to their variant consequences41,42. Splice acceptor, splice donor, splice region, start-lost, nonsense, and frameshift variants, as well as microdeletions, were considered truncating. Missense, intronic, in-frame insertion, and in-frame deletion variants were considered non-truncating. Variants whose out-of-frame splicing effect is supported by RNA studies (c.1466A > G, c.479G > C, and c.5791T > C) were classified as truncating43.
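The classification rule above amounts to a lookup from variant consequence to variant type. The minimal sketch below assumes consequences are annotated with Sequence Ontology-style terms (for example `stop_gained` for nonsense variants); these label names, including the hypothetical `microdeletion` label, are our assumptions and not necessarily the annotation output used in the study. RNA-confirmed splicing variants override the predicted consequence.

```python
# Consequence labels are assumed Sequence Ontology-style terms; the actual
# annotation tool and label set used in the study are not specified.
TRUNCATING = {
    "splice_acceptor_variant",
    "splice_donor_variant",
    "splice_region_variant",
    "start_lost",
    "stop_gained",          # nonsense
    "frameshift_variant",
    "microdeletion",        # hypothetical label for (multi-)gene deletions
}
NON_TRUNCATING = {
    "missense_variant",
    "intron_variant",
    "inframe_insertion",
    "inframe_deletion",
}
# NM_000267.3 variants whose out-of-frame splicing is supported by RNA studies.
RNA_CONFIRMED_TRUNCATING = {"c.1466A>G", "c.479G>C", "c.5791T>C"}


def classify_variant(hgvs_c: str, consequence: str) -> str:
    """Return 'truncating' or 'non-truncating' for one variant."""
    # RNA evidence takes precedence over the predicted consequence,
    # e.g. a missense-annotated variant that actually disrupts splicing.
    if hgvs_c.replace(" ", "") in RNA_CONFIRMED_TRUNCATING:
        return "truncating"
    if consequence in TRUNCATING:
        return "truncating"
    if consequence in NON_TRUNCATING:
        return "non-truncating"
    raise ValueError(f"unhandled consequence: {consequence}")
```

For example, `classify_variant("c.1466A > G", "missense_variant")` returns `"truncating"` because of the RNA evidence, whereas c.5425C > T annotated as `missense_variant` is classified as non-truncating.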

The clinical features analysed in this study are listed in Table 1.

Table 1 Clinical features of individuals.

Protein domains were taken from a recent publication4, which defined them based on three previous publications44,45,46, rather than from UniProt, which annotates only two of these domains.

The numbers of truncating and non-truncating variants per domain were obtained with bedtools (v2.25.0)47. Fisher’s exact test was used to compare the distributions of categorical variables48, and a p-value < 0.001 was considered statistically significant.
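To illustrate the per-domain counting step, the sketch below tallies variants by domain and variant type using half-open `[start, end)` intervals, the convention of BED files used by bedtools. It is a plain-Python stand-in, not a reproduction of the bedtools command used in the study, and the domain names and coordinates are hypothetical placeholders rather than the published NF1 domain boundaries. Variants outside every domain are collected under `none`.

```python
from collections import Counter

# Placeholder domains in BED-style half-open [start, end) coordinates.
# Names and boundaries are hypothetical, NOT the published NF1 domain map.
DOMAINS = [
    ("DOMAIN_A", 100, 200),
    ("DOMAIN_B", 250, 400),
]


def count_per_domain(variants, domains=DOMAINS):
    """Tally (domain, variant_type) pairs for (position, variant_type) tuples.

    Positions must use the same coordinate system as `domains`. A variant
    that falls in several overlapping domains is counted once per domain,
    as an interval intersection would; one outside every domain is counted
    under 'none'.
    """
    counts = Counter()
    for pos, vtype in variants:
        hits = [name for name, start, end in domains if start <= pos < end]
        for name in hits or ["none"]:
            counts[(name, vtype)] += 1
    return counts
```

Because the intervals are half-open, a variant at position 200 falls outside the toy `DOMAIN_A` and is counted under `none`.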

Results

Demographic data

A total of 1,663 individuals were included in the NF1 dataset, comprising 809 females, 801 males, and 53 individuals of unreported sex. Of these, 1,210 (72.76%) were aged ≥ 8 years. Table 1 summarises the frequencies of the 32 clinical features. The three most frequent were café-au-lait macules (CALMs; 94.58%), freckling (71.04%), and cutaneous neurofibromas (42.92%), all of which are included in the NIH diagnostic criteria for NF15.

Domain VS Clinical Features

All 9 protein domains of the neurofibromin were studied (Fig. 1).

Fig. 1

Distribution of NF1 variants (Reference transcript: NM_000267.3). Stacked bar plot showing the number of unique variants in exons. Splice site mutations are included in the closest exon. Protein domains and tertiles are shown below the stacked bar plot. CSRD: cysteine/serine-rich domain. TBD: tubulin-binding domain. GRD: GTPase-activating protein-related domain. Sec14: Sec14-like domain. PH: pleckstrin homology-like domain. HLR: HEAT-like repeat regions. CTD: C-terminal domain. NLS: nuclear localization signal region. SBR: syndecan-binding region.

Considering truncating and non-truncating variants together, 34 of the 288 investigated pairs of domains and clinical features showed statistically significant associations (p-value < 0.001), all of which are novel (Table 2, Table S1, Figs. 2 and 3). For clinical features whose typical age of onset is ≥ 8 years, only individuals meeting this age criterion were included in the comparison.
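Each domain–feature comparison reduces to a 2 × 2 table (variant in the domain or not, feature present or absent) tested with Fisher’s exact test. The sketch below assumes a hypothetical per-patient record with `domain`, `age` and `features` keys and implements the two-sided test directly from the hypergeometric distribution so that it needs only the standard library; in practice an established implementation such as SciPy’s `scipy.stats.fisher_exact` or R’s `fisher.test` would normally be used. The optional `min_age` argument mirrors the age restriction described above.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test p-value for the 2x2 table [[a, b], [c, d]].

    Sums the hypergeometric probabilities of every table with the same row
    and column totals whose probability is no larger than the observed one.
    """
    row1, col1, n = a + b, a + c, a + b + c + d
    total = comb(n, col1)

    def prob(x):
        # Probability that the top-left cell equals x, given fixed margins.
        return comb(row1, x) * comb(n - row1, col1 - x) / total

    p_obs = prob(a)
    lo, hi = max(0, col1 - (n - row1)), min(row1, col1)
    probs = [prob(x) for x in range(lo, hi + 1)]
    # The small relative tolerance keeps tables tied with the observed one.
    return min(1.0, sum(p for p in probs if p <= p_obs * (1 + 1e-7)))


def compare_domain_feature(patients, domain, feature, min_age=None):
    """Build and test the 2x2 table for one (domain, feature) pair.

    `patients` is a list of dicts with hypothetical keys 'domain', 'age' and
    'features' (feature name -> True, False or None for unknown). Patients
    with unknown feature status are skipped, as are patients younger than
    `min_age` (or of unknown age) when `min_age` is given.
    """
    a = b = c = d = 0
    for p in patients:
        status = p["features"].get(feature)
        if status is None:
            continue
        if min_age is not None and (p["age"] is None or p["age"] < min_age):
            continue
        if p["domain"] == domain:
            if status:
                a += 1  # in domain, feature present
            else:
                b += 1  # in domain, feature absent
        elif status:
            c += 1      # outside domain, feature present
        else:
            d += 1      # outside domain, feature absent
    # Sample odds ratio; undefined when a denominator cell is zero.
    odds_ratio = (a * d) / (b * c) if b * c else None
    return (a, b, c, d), odds_ratio, fisher_exact_two_sided(a, b, c, d)
```

Looping this comparison over the 9 domains and 32 clinical features gives the 9 × 32 = 288 pairs described above, each of which is then checked against the p < 0.001 threshold.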

Table 2 Significant associations between variants in domains and clinical features (p-value < 0.001).
Fig. 2

Categories of clinical features associated with domains. The number in each sub-bar is the number of clinical features associated with a domain.

Fig. 3

Clinical features with statistically significant associations with domains.

Associations between domains and clinical features were further grouped by category, because clinical features of the same category are usually evaluated together during the diagnostic process (Fig. 2). The frequencies of clinical features included in the NIH criteria for neurofibromatosis type 1 differed between domains. In addition, variants in the CSRD, TBD, GRD, and PH domains were positively associated with clinical features overlapping with Noonan syndrome. These two groups, clinical features in the NIH criteria and features overlapping with Noonan syndrome, are discussed further below.

Clinical features in NIH criteria

Eight NIH criteria-related clinical features were included in this study: CALMs, freckling, Lisch nodules, optic pathway glioma, pseudoarthrosis of a long bone, sphenoid bone dysplasia, plexiform neurofibromas, and cutaneous neurofibromas. Half of these features were associated with domains; no association was found for freckling, optic pathway glioma, pseudoarthrosis of a long bone, or sphenoid bone dysplasia.

Patients with variants in the CSRD domain had significantly higher frequencies of three NIH criteria-related clinical features: Lisch nodules (47.50% vs. 40.39%, OR = 2.06, CI 1.44–2.95, p < 0.001), plexiform neurofibromas (35.52% vs. 16.63%, OR = 2.76, CI 1.91–3.97, p < 0.001), and cutaneous neurofibromas (70.62% vs. 48.83%, OR = 2.62, CI 1.78–3.59, p < 0.001).
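The odds ratios (OR) and confidence intervals (CI) reported in this section summarise 2 × 2 tables. The text does not state how the intervals were computed, so the formulas below are only a reference point, not necessarily the method used here. With a and b the numbers of patients with variants in the domain who do and do not have the feature, and c and d the corresponding numbers for other patients (our notation), the sample odds ratio and the commonly used Woolf (log-based) 95% interval are:

$$
\widehat{\mathrm{OR}} = \frac{p_1/(1-p_1)}{p_2/(1-p_2)} = \frac{a\,d}{b\,c}, \qquad p_1 = \frac{a}{a+b}, \quad p_2 = \frac{c}{c+d}
$$

$$
\mathrm{CI}_{95\%} = \exp\!\left(\ln \widehat{\mathrm{OR}} \pm 1.96\sqrt{\frac{1}{a} + \frac{1}{b} + \frac{1}{c} + \frac{1}{d}}\right)
$$

Software that reports the conditional maximum-likelihood estimate alongside Fisher’s exact test, such as R’s `fisher.test`, can give somewhat different values.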

Clinical features overlapping with Noonan-syndrome

Phenotypic overlap between neurofibromatosis type 1 and Noonan syndrome has been discussed in many studies19,49. In this study, we included nineteen clinical features that overlap with Noonan syndrome and investigated their domain-specific genotype–phenotype correlations. These features fall into four groups: 5 facial features, 6 neurological and behavioural features, 4 cardiovascular abnormalities, and 4 growth and musculoskeletal features.

Noonan-syndrome facial features

The five Noonan syndrome facial features in this study were broad nasal bridge, hypertelorism, ptosis, down-slanting palpebral fissures, and bulbous nose. Certain of these features were more frequent in patients with variants in the domains of the middle tertile of the NF1 gene (TBD, GRD, and PH) (Fig. 2).

Ptosis was positively associated with variants in both the GRD and the PH domains. Patients with GRD variants had a higher frequency of ptosis than other patients (36.52% vs. 15.23%, OR = 3.20, CI 2.17–4.70, p < 0.001), and a similar pattern was seen for patients with PH variants (63.64% vs. 15.97%, OR = 9.18, CI 5.23–16.44, p < 0.001). Hypertelorism was more frequent in patients with TBD variants (45.83% vs. 19.67%, OR = 3.45, CI 1.81–6.52, p < 0.001) and GRD variants (32.50% vs. 18.47%, OR = 2.12, CI 1.41–3.17, p < 0.001) than in other patients.

Neurological and behavioural features

The neurological and behavioural features in this study were delayed cognitive development, mental retardation, speech delay, attention deficit hyperactivity disorder (ADHD), autism spectrum disorder (ASD), and epilepsy. As with the Noonan syndrome facial features, certain neurological and behavioural features were more frequent in patients with variants in the middle tertile domains (TBD, GRD, and PH) of the NF1 gene (Fig. 2).

Delayed cognitive development was more frequent in patients with variants in the PH domain than in other patients (52.12% vs. 27.75%, OR = 2.83, CI 2.01–3.99, p < 0.001).

Likewise, higher frequencies of speech delay were observed in patients with variants in TBD (32.56% vs. 12.52%, OR = 3.37, CI 1.58–6.86, p < 0.001), GRD (24.31% vs. 11.25%, OR = 2.53, CI 1.56–4.05, p < 0.001) and PH domain (44.74% vs. 12.04%, OR = 5.89, CI 2.81–12.22, p < 0.001).

Similarly, patients with variants in the PH domain had a higher frequency of ADHD than patients without variants in this domain (37.84% vs. 14.34%, OR = 3.63, CI 1.67–7.62, p < 0.001).

Cardiovascular abnormalities

Cardiovascular abnormalities were more frequent in patients with variants in the GRD domain than in patients without variants in this domain (18.30% vs. 6.92%, OR = 3.01, CI 2.02–4.47, p < 0.001). This finding suggests that regular cardiac follow-up may be warranted for patients with variants in the GRD.

Growth and musculoskeletal features

Scoliosis, macrocephaly, short stature, and pectus deformity were the growth and musculoskeletal features included in this study. Variants in the PH domain were associated with pectus deformity (OR = 8.16, CI 3.80–17.14, p < 0.001) and short stature (OR = 3.15, CI 1.60–6.03, p < 0.001); that is, patients with PH domain variants had 8.16-fold and 3.15-fold higher odds of these features, respectively, than other patients. Short stature was also more frequent in patients with GRD variants than in other patients (29.31% vs. 12.18%, OR = 2.98, CI 1.78–4.95, p < 0.001).

Taken together, certain growth and musculoskeletal features were more frequent in patients with variants in the middle tertile domains (GRD and PH) of the NF1 gene.

Types of variants VS clinical features

To investigate whether different types of variants disrupt NF1 function by affecting a protein domain, we compared the frequencies of clinical features between individuals with a specific variant type in a domain and individuals with the same variant type outside that domain.
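This stratified comparison changes only which patients enter the 2 × 2 table: both rows are restricted to carriers of the same variant type. The sketch below assumes a hypothetical per-patient record with a `domain`, a `type` (`"truncating"` or `"non-truncating"`) and a `features` dictionary, and shows only the filtering and counting step; the resulting table can be passed to any Fisher’s exact test implementation.

```python
def stratified_counts(patients, domain, feature, variant_type):
    """2x2 counts restricted to one variant type (hypothetical record layout).

    Rows: variant in `domain` (row 0) vs. outside it (row 1).
    Columns: feature present (column 0) vs. absent (column 1).
    Only patients whose variant has `variant_type` and whose feature
    status is recorded are counted.
    """
    table = [[0, 0], [0, 0]]
    for p in patients:
        status = p["features"].get(feature)
        if p["type"] != variant_type or status is None:
            continue
        row = 0 if p["domain"] == domain else 1
        col = 0 if status else 1
        table[row][col] += 1
    return table
```

Restricting both groups to one variant type keeps a domain effect from being confounded with the overall difference in clinical features between truncating and non-truncating variants.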

For non-truncating variants, twenty-six pairs of clinical features and domains showed statistically significant associations (Table S2). Twenty-five of these are new findings; the remaining one, a higher frequency of congenital heart disease in patients with variants in the GRD domain, was reported in the previous study16. In contrast, the higher frequency of pectus abnormalities previously reported in patients with non-truncating variants in the GRD domain16 was not observed in this study, nor were the lower frequencies of two clinical features, Lisch nodules and cutaneous neurofibromas, previously reported for the same group16.

Two clinical features, epilepsy and hypertelorism, were positively associated with non-truncating variants in the PH domain, although no significant association with the PH domain was found when all variants were considered.

No pair of domain and clinical feature showed a statistically significant association (p-value < 0.001) when patients with truncating variants in a domain were compared with patients with truncating variants in other domains (Table S3).

To explore the association between variant type and clinical features regardless of domain, the frequency of each clinical feature was compared between patients with truncating variants and patients with non-truncating variants. Sixteen clinical features were significantly associated with either truncating or non-truncating variants (Table S4); for example, Lisch nodules and CALMs were more frequent in patients with truncating variants than in those with non-truncating variants.

Variant consequences VS clinical features

To investigate whether certain variant consequences are associated with clinical features, we compared the frequencies of clinical features between individuals with a specific variant consequence and individuals with any other variant consequence (Table S5). Four variant consequences (frameshift, stop-gained, missense, and in-frame deletion) were positively associated with a total of twenty clinical features.

The associations between variant consequences, domains, and clinical features were further studied for the three most common variant consequences in this study: missense (48%), stop-gained (15%), and frameshift (13%) variants. Twenty-eight clinical features were significantly associated with missense variants (Table S6), whereas only one, speech delay, was significantly associated with frameshift variants (Table S7), and none was significantly associated with stop-gained variants (Table S8).

Variants not in any domain

In addition to variants in domains, this study included 446 variants outside any domain: 245 truncating and 201 non-truncating. The distributions of clinical features were compared between variants in domains and variants outside any domain, both overall and by variant type. When all variants were considered, eighteen of the thirty-two clinical features were more frequent in patients with variants outside any known domain than in those with variants in domains (Table S9). The corresponding numbers were twenty-nine for truncating variants (Table S10) and two for non-truncating variants (Table S11).

Discussion

Larger sample collections yield more precise associations

To the best of our knowledge, the largest previous NF1 study of associations between domains and clinical features16 considered only the GRD domain and non-truncating variants. The present study is twice the size of that study and considers all domains and both variant types; of the associations with statistically significantly different frequencies (p-value < 0.001) that it identified, 99.17% (120/121) are novel. Consistent with the previous study, congenital heart disease was associated with non-truncating variants in the GRD domain. This study shows that the same analysis workflow can be applied to the other domains of the NF1 gene.

Despite these numerous new findings, considerable gaps in our understanding remain. For optic pathway glioma, two previous studies50,51 reported conflicting results, and the most recent study agrees with our finding that optic pathway glioma is not significantly associated with any domain. It is worth noting, however, that our data support a lower risk of optic pathway glioma in patients with PH domain variants. Our cohort included 970 patients aged 7 years and older, nearly three times as many participants as previous optic pathway glioma studies, and, surprisingly, none of the patients with PH domain variants was diagnosed with optic pathway glioma. Furthermore, every domain of NF1 was associated with at least one clinical feature except the NLS and SBR domains, which harboured only a few variants. Likewise, clinical features with low frequencies in our cohort were not associated with any domain. These observations suggest that larger and more diverse cohorts are still needed.

Overall, this analysis workflow can be extended to other genes, provided that clinical details and molecular information are collected properly. Large population-scale programmes, such as the 100,000 Genomes Project52, the Singapore 10K Genome Project53, the All of Us Research Program54, and the ongoing Hong Kong Genome Project55, can accelerate the discovery of genotype–phenotype associations by providing both clinical details and molecular information.

Diversified association patterns between categories of clinical features and domains

Different groups of clinical features showed different patterns of association with domains. Three clinical features in the NIH NF1 diagnostic criteria were significantly more frequent in patients with variants in the CSRD than in patients without variants in this domain.

Compared with patients without variants in the corresponding domain, patients with variants in the middle tertile domain GRD had increased frequencies of clinical features in all four categories that overlap with Noonan syndrome, whereas patients with TBD or PH variants had increased frequencies in only two and three of these categories, respectively. These distinct patterns suggest that variants in different domains may affect protein function in different ways, leading to damage in different organ systems.

The PH domain is the second smallest domain of neurofibromin (101 amino acids). The variant c.5425C > T p.(Arg1809Cys) in this domain is associated with features of Noonan syndrome, including hypertelorism, ptosis, low-set ears, webbed neck, and triangular face7. In this study, variants in the PH domain were positively associated with one Noonan syndrome facial feature, ptosis, confirming the previous finding based on a single variant and extending it to the domain level. Of the 42 patients with PH domain variants and ptosis, 31 carried c.5425C > T, suggesting that the association between the PH domain and Noonan features is driven mainly by this variant.

A previous study reported that the variant c.3826C > T p.(Arg1276*), which lies in the GRD domain, is associated with cardiovascular abnormalities8. Only two patients in our cohort carry this variant: one has neither congenital heart disease nor hypertension, and no information on these two features is available for the other. In this study, variants in the GRD domain were positively associated with two cardiovascular features, hypertension and congenital heart disease. The GRD domain remained positively associated with congenital heart disease when only non-truncating variants were considered, but no significant association was found when only truncating variants were considered. This suggests that the association between the GRD domain and cardiovascular features is driven mainly by non-truncating variants.

Truncating variants escaping nonsense-mediated decay

Most truncating variants introduce a premature termination codon, either directly (nonsense variants) or by shifting the reading frame (frameshift and out-of-frame splicing variants). Transcripts carrying a premature termination codon are generally expected to be eliminated by nonsense-mediated mRNA decay (NMD). The association of truncating variants with various clinical features in NF1 patients suggests that some abnormal transcripts may escape NMD and contribute to the phenotype, which is consistent with the hypothesis proposed in a previous study56.

In addition, the significant associations between truncating variants in specific domains and clinical features suggest that gene products of various lengths, in which different functional domains are retained, truncated, or partially altered, may lead to different clinical features. However, further RNA and protein expression studies are required to test this hypothesis.

Conclusions

To our knowledge, this is the first domain-specific and variant type-specific genotype–phenotype study of eight domains of the NF1 gene. Combining the clinical features of individuals with molecularly confirmed variants from the CGS cohort and from publications allowed us to associate phenotypes with domains and variant types, and the results may serve as a practical guide for clinical management. Nevertheless, continued comprehensive collection of molecular evidence and clinical features is required to advance NF1 research further.