Introduction

Facioscapulohumeral muscular dystrophy (FSHD) is an inherited autosomal genetic disorder characterised by initial symptoms and signs of skeletal muscle atrophy and weakness in the face, shoulders, and upper arms [1]. Muscle weakness progresses slowly throughout the lifespan, sometimes asymmetrically [2], and often spreads to other regions, including the lower limbs, respiratory muscles, and trunk muscles. This leads to disabilities such as loss of independent ambulation and wheelchair use, chronic respiratory dysfunction necessitating an assistive device [3, 4], and sleep disorders, chronic pain, and fatigue [5]. Facial weakness typically serves as an early indicator of disease onset, whereas lower-extremity involvement and wheelchair dependency indicate more advanced disease, although these are not observed in all cases because progression patterns vary among individuals. The disease often manifests in the second or third decade of life; however, the age at onset, levels of local penetrance, and overall disease severity vary widely among individuals with FSHD. Early-onset FSHD is defined by the clinical onset of facial involvement before 5 years of age and manifestations of shoulder girdle weakness before 10 years of age, and often shows faster disease progression and more frequent wheelchair dependence [6, 7]. Extramuscular and systemic symptoms, such as high-frequency hearing loss [8, 9], retinal abnormalities resembling Coats’ disease [10,11,12], spinal deformities [13], epilepsy/intellectual disability [14], and mild cardiac involvement [15], have rarely been observed. Early-onset or infantile FSHD accounts on average for approximately 10% of the total FSHD population [16]. These cases often present with more severe muscle phenotypes and more frequent systemic symptoms [13, 17], representing the severe end of the FSHD disease spectrum [18].

The prevalence of FSHD is estimated at 12:100,000 in the Dutch population, the highest to date [19]. Estimates are lower in other populations, partly because of fewer opportunities for genetic diagnosis, especially for asymptomatic or paucisymptomatic cases [20], which may depend on how active the local research environment is, and possibly because of differences among populations [21, 22]. A sex difference in disease severity has been suggested, with FSHD affecting males more severely and more frequently than females [23,24,25]; however, other studies reported that females were more likely to progress to wheelchair use and predominated in the category with a severe phenotype [26,27,28]. Resolving this discrepancy requires further study with attention to differences in the detailed classification of disease severity, age in relation to hormonal activity, and race. Furthermore, 10–30% of cases are considered de novo, including those derived from parental germline mutations and cases of gonadosomatic mosaicism arising during early embryogenesis [29], whereas a recent study in a cohort of Italian families estimated a frequency as low as 8% for truly de novo cases [30].

Basic genetic classification and common local epigenetic de-repression by complex genetic backgrounds cause FSHD onset

Numerous investigations in clinical genetics and epigenetics have reported the central role of ectopic activation of embryogenic and germline gene DUX4 (double homeobox 4) in muscle-specific pathology and the genetic necessity of the ‘permissive allele’ to allow ectopic DUX4 expression in skeletal muscles [31]. Earlier efforts to understand the genetic cause of this disease using linkage analysis of FSHD families identified an FSHD-associated region in 4q35, a subtelomeric region on chromosome 4 [32]. The 4q35 region contains a type of large tandem repeat unit named the D4Z4 macrosatellite or simply a D4Z4 repeat, each unit of which consists of a 3.3 kb CpG-rich sequence. A wide range of copy number variations, up to 150 units, are found in D4Z4 repeats in the human population. Several early studies have demonstrated that the level of DNA methylation on D4Z4 repeats in at least one allele is significantly low, that is, hypomethylated, in cells derived from patients with FSHD, including muscle cells and other cell types from different tissue origins, such as blood, skin, and saliva [33,34,35,36,37,38]. Several studies have reported that the level of DNA hypomethylation in the D4Z4 region is correlated with disease severity (details are described later).

Two major factors are linked to differences in DNA methylation levels of D4Z4 repeats in the 4q35 region: the copy number of D4Z4 repeat units (RUs) and mutations in chromatin regulators of this repeat. Most FSHD cases (~95%) carry a D4Z4 array within the ‘contraction’ range of 1–10 RU, and in clinical FSHD cases this contraction coincides with significant DNA hypomethylation of the D4Z4 repeat on the contracted allele [37]. FSHD cases associated with D4Z4 repeat contractions are classified as FSHD type 1 (FSHD1). Cases without a contracted D4Z4 repeat are classified as FSHD type 2 (FSHD2), in which additional mutations are found most frequently (80%) in SMCHD1 (the Structural Maintenance of Chromosomes flexible Hinge Domain-containing protein 1, OMIM: 614982) on chromosome 18p11 [39], or less frequently in DNMT3B (DNA methyltransferase 3B, OMIM: 602900) on chromosome 20q11 [40] and LRIF1 (the ligand-dependent nuclear receptor-interacting factor 1, OMIM: 615354) on chromosome 1p13 [41]. The D4Z4 repeat copy numbers in FSHD2 cases have previously been recognised as 10–20 RU; however, cases involving > 20 RUs have been reported, albeit less frequently [42]. The clinical features of patients with FSHD1 and FSHD2 are indistinguishable, suggesting that FSHD1 and FSHD2 result from the same pathophysiologic process [43]. SMCHD1 mutations are also found in symptomatic cases with 9–10 RU of D4Z4, a range conventionally categorised as FSHD1, and these cases are more severe than other FSHD1 cases with larger FSHD1 alleles (8–10 RU) [44,45,46]. Such combined cases are referred to as FSHD1 + 2. Symptomatic cases with relatively mild severity, as well as asymptomatic and non-penetrant cases, are often found among individuals with 8–10 RU and without SMCHD1 mutations [20, 47, 48], highlighting 8–10 RU as the ‘borderline’ or ‘grey zone’ of FSHD1 penetrance [49]. Interestingly, FSHD1 patients have rarely been found in the ‘grey zone’ in a Japanese and a Korean cohort [22]. This suggests potential differences in disease susceptibility among ethnic populations, though further research is needed to clarify whether these differences reflect ascertainment bias and diagnostic practices, genuine genetic background effects, or cultural factors such as diet. In addition, mutations in DNMT3B have been reported to act as disease modifiers that worsen disease progression in FSHD1 [40]. Based on these genotype-phenotype observations, it has been proposed that FSHD should be considered a disease continuum or spectrum that depends on multiple factors, such as D4Z4 repeat copy number, DNA methylation levels in the D4Z4 region, and mutations in chromatin regulator genes, rather than being divided by the traditional simple classification into two discrete categories, FSHD1 and FSHD2 [46]. Accordingly, the current common genetic diagnosis protocol allows for an overlapping category of FSHD1 and FSHD2 with 8–10 RU [49].

This concept of FSHD as a disease continuum also fits the clinical variability, in which a subset of patients manifests earlier and/or more severe disease progression. A rough inverse correlation between disease severity and the number of D4Z4 RUs is observed in FSHD1 across 1–10 RU [48, 50], although intrafamilial clinical variability and asymptomatic/non-penetrant cases are often observed above 4 RU [25]. Early-onset FSHD cases tend to be found within a shorter range of D4Z4 repeats (1–3 RU), although this shorter range of contraction does not always lead to a severe disease outcome; certain relatives carrying the same short allele as the proband can even be asymptomatic [51]. In addition, even monozygotic twins with FSHD show extreme variability in phenotype [52]. In contrast, affected carriers of a disease array of 7–10 D4Z4 RU, but not familial non-penetrant mutation carriers, have a greater reduction in D4Z4 DNA methylation [38]. Another study reported that, compared with non-manifesting familial subjects, FSHD1-affected subjects had lower DNA methylation and a stronger activation of the encoded gene in response to epigenetically derepressing chemicals [53]. These studies indicate that epigenetic derepression, represented by DNA hypomethylation, can serve as a significant hallmark and potential predictor of clinical FSHD progression.

Each unit of the D4Z4 repeat contains a single open reading frame (ORF) of a retrogene named DUX4, oriented in the forward direction [54]. Each ORF in the repeat lacks a polyadenylation signal (PAS) sequence to stabilise its transcripts. The exception is the most distal unit, which allows for stable transcription and subsequent translation, depending on whether PAS-containing elements are present in the sequence outside of the D4Z4 repeat. Human DUX4, like its murine orthologue Dux, is transiently activated during a limited window of early embryonic stages [55,56,57] and is expressed in tissues such as the testis and thymus (possibly reflecting expression in lymphocytes) in humans [58, 59], suggesting potential unknown functions, but it is normally tightly silenced in most later developmental stages and somatic tissues. In an evolutionarily conserved context, DUX4 can utilise a PAS-containing exon located 10 kbp downstream of the D4Z4 repeat (in the T2T-CHM13 reference genome; the distance varies among haplotypes), and this splicing pattern is considered irrelevant to FSHD [58]. Intense genetic investigations revealed that although the 4q35 region has a variety of haplotypes in the human population [60], only a class of haplotypes called 4qA is relevant to FSHD manifestation, whereas other haplotypes, such as 4qB, are not [47, 61]. The 4qA haplotypes exclusively contain a segment of DNA called pLAM and a 6.2 kb β-satellite repeat [62]. A SNP in the pLAM region creates a functional but non-canonical PAS (AUUAAA) immediately distal to the last D4Z4 unit, which stabilises DUX4 transcripts to produce mature mRNAs and allows DUX4 protein production [31]. Thus, the requirements of genetically typical cases of FSHD are conceptually summarised as the combination of DNA hypomethylation of the D4Z4 repeat and its cis-connected 4qA haplotype, which together form a permissive 4qA allele that potentially allows DUX4 activation and FSHD onset (Fig. 1).

Fig. 1

Schematic representation of the D4Z4 repeat array on chromosome 4q35.2, the genetic locus associated with FSHD. Each D4Z4 unit is 3.3 kb in length. FSHD is caused by contraction of these repeats (to 1–10 units) or mutations in chromatin regulators such as SMCHD1, DNMT3B and LRIF1, leading to hypomethylation of the region and aberrant expression of the DUX4 gene in the most distal repeat. DUX4 expression additionally requires a permissive 4qA allele containing a polyadenylation signal to stabilize its transcript. Cases caused by D4Z4 contraction per se are categorized as FSHD1, while those associated with variants in other genes are categorized as FSHD2. These standard criteria accommodate cases that overlap both categories (such as FSHD1 with modifier effects from SMCHD1 variants). In FSHD1, disease severity roughly correlates inversely with the number of D4Z4 repeats: cases with 1–3 repeats tend to be most severe, including more early-onset cases, those with 4–7 repeats moderately severe, and those with 8–10 repeats relatively mild. Asymptomatic cases are more frequently observed among individuals with longer repeat arrays.
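
The classification summarised in Fig. 1 can be written as a short decision rule. The following Python sketch is illustrative only and is not a diagnostic tool: the names `Genotype`, `classify`, and `severity_band` are hypothetical, D4Z4 hypomethylation is not modelled, and the repeat-unit bands are the approximate ranges given in the caption rather than strict cut-offs.

```python
from dataclasses import dataclass

# Chromatin regulator genes named in the text as FSHD2 genes.
FSHD2_GENES = frozenset({"SMCHD1", "DNMT3B", "LRIF1"})


@dataclass(frozen=True)
class Genotype:
    """Hypothetical, simplified genotype summary for one individual."""
    d4z4_units: int                     # repeat units on the allele of interest
    permissive_4qa: bool                # allele carries a PAS-containing 4qA haplotype
    regulator_variants: frozenset = frozenset()  # genes carrying pathogenic variants


def classify(g: Genotype) -> str:
    """Apply the simplified FSHD1 / FSHD2 / FSHD1+2 rule described in the caption."""
    if not g.permissive_4qa:
        return "no permissive allele"
    contracted = 1 <= g.d4z4_units <= 10
    has_variant = bool(FSHD2_GENES & g.regulator_variants)
    if contracted and has_variant:
        return "FSHD1+2"
    if contracted:
        return "FSHD1"
    if has_variant:
        return "FSHD2"
    return "unclassified"


def severity_band(units: int):
    """Rough severity trend for FSHD1 by repeat number; None outside 1-10 units."""
    if 1 <= units <= 3:
        return "most severe"
    if 4 <= units <= 7:
        return "moderately severe"
    if 8 <= units <= 10:
        return "relatively mild"
    return None


if __name__ == "__main__":
    print(classify(Genotype(3, True)))
    print(classify(Genotype(15, True, frozenset({"SMCHD1"}))))
    print(severity_band(9))
```

Because penetrance and severity vary widely even within one family, a rule of this kind describes only the genetic category, not the expected clinical course of an individual.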

DUX4 may have evolutionarily conserved biological functions that are essential during the early embryonic stages and in specific tissues that express its protein; however, when activated in skeletal muscle cells, DUX4 exerts its phenotypic effects through its transcriptional activity followed by multiple molecular pathways, leading to tissue-level degeneration [63,64,65,66,67,68,69,70]. The currently available cell and animal models for investigating DUX4 function and FSHD have been summarised in other review articles [71, 72], as have the molecular and cellular consequences of DUX4 activation in FSHD [73, 74]. Importantly, recent studies support an association between inflammatory magnetic resonance imaging (MRI) characteristics and the expression of genes regulated by DUX4, as well as other gene categories associated with FSHD disease activity, at the local level within skeletal muscle tissues, tracing disease progression in detail from inflammation or oedema to fat infiltration [75]. In contrast, a meta-analysis of published FSHD muscle biopsy gene expression studies revealed that PAX7 target gene repression in FSHD correlated with disease severity, independent of DUX4 target gene expression at the biopsy site [68, 76]. This apparently discrepant signature may arise from the epigenetic footprint of historical DUX4 activation during development and growth before disease onset, when DUX4 is already active [77], which later cumulatively affects stem cell function after growth and during regeneration [78]. Thus, given the central role of DUX4 in pathology, methodologies to silence DUX4 activation, degrade DUX4 mRNA, and prevent or halt DUX4 protein transcriptional activity and its downstream toxic effects are being developed as promising targeted therapeutic approaches for FSHD [79]. In addition, a strategy to replenish muscle cell resources may be necessary for functional improvement of damaged muscles at severely degenerated stages.

Learning from genetically atypical cases to understand essential genetic factors for FSHD onset

As described above, the genetic background of FSHD consists of large genomic loci and mutations at different loci. Although several clinical cases demonstrating FSHD symptoms can be genetically confirmed based on the typical requirements explained above, certain cases fall outside these criteria and yet may still arise from the same mechanism, in which a hypomethylated D4Z4 repeat with a cis-linked PAS-containing sequence forms a permissive allele that enables stabilised DUX4 expression. In addition, certain asymptomatic and non-penetrant cases, which are often found among family cohorts of patients with FSHD, can guide a more precise dissection of the essential genetic/epigenetic elements and disease modifiers of FSHD onset. Intense investigations of FSHD cases by molecular combing revealed complex patterns of rearrangements that differ from the typical sequence of the reference genome and can be overlooked by standard genetic diagnosis with Southern blot [80] (Fig. 2).

Fig. 2

Schematic representation of the D4Z4 region in atypical FSHD and non-FSHD cases. In the duplication type, although one D4Z4 repeat array is contracted, traditional Southern blotting may misinterpret the duplicated units as a single long repeat array, masking the contraction. While D4Z4 repeats exist throughout the genome, a similar repeat array occurs on chromosome 10q. Contractions in 10qD4Z4 alone do not cause FSHD; however, translocation of a permissive 4qA allele to the 10q locus can result in 10q-linked FSHD. In the D4Z4 proximally extended deletion (DPED) cases, a large proximal deletion removes the Southern blot probe region, making detection of the pathogenic allele difficult. In mosaicism, only a fraction of somatic cells harbour D4Z4-contracted alleles, leading to DUX4 expression and FSHD symptoms that are typically milder than in contraction-matched cases without mosaicism

Subgroups of haplotypes

Although the haplotypes of 4q35 and 10q26 are often categorised into groups such as 4qA, 4qB, and 10qB, according to the necessity of a PAS immediately distal to the D4Z4 repeat for FSHD manifestation, further subtypes and subgroups with sequence variations inside and outside the D4Z4 repeats exist [47, 60]. A detailed analysis of these haplotypes revealed the necessity of the PAS for DUX4 activation and FSHD onset [31]. Although at least 17 unique 4q haplotypes have been identified, only 4A161S, 4A161L, 4A159, and 4A168 have been reported to be associated with FSHD [73], and most of the other haplotypes, including 4qB and 10q, lack a functional PAS and are not permissive. Interestingly, 4A166, among the PAS-containing 4qA haplotypes, is not considered to be associated with FSHD, as no clinical FSHD cases have been found even when it carried a repeat contraction [47]. This indicates that certain critical genetic elements required for enhancing DUX4 activation may be altered, or that certain de novo elements may have formed that affect DUX4 activation, in this haplotype. However, other reports described 4A166 haplotypes with clinical phenotypes [25, 81], so the discrepancy among studies requires further clarification. A molecular genetic study reported a significant role of other nearby sequences in the 4qA β-satellite as a cis-acting element for DUX4 mRNA production [62], in addition to the PAS in 4qA pLAM. Further comparative investigations among subtypes may help identify the essential functional elements for DUX4 activation in FSHD muscle cells. Gene editing strategies targeting the PAS in the permissive allele to reduce DUX4 have yielded inconsistent results [82,83,84], possibly because distinct approaches affect the molecular behaviour of regulators by modifying or perturbing surrounding genetic elements associated with polyadenylation and transcription.

D4Z4 proximally extended deletion (DPED)

Genetic analysis in certain clinical cases demonstrated longer genomic deletions extending from contracted D4Z4 repeats to include sequences immediately upstream of the repeat, called D4Z4 proximally extended deletions (DPED) [80, 85,86,87]. These DPED cases manifested typical FSHD phenotypes, indicating that the deleted genomic region most likely does not play an essential role in FSHD pathology. An in vitro study suggested that myogenic regulatory elements that can activate DUX4 lie within the deleted region, although their significance is controversial in light of this clinical genetic observation [88]. In addition, this deleted region contains DUX4c, FRG2, and the long non-coding RNA DBE-T [87], which were previously suggested as contributing factors to FSHD. Their central significance remains uncertain; however, FRG2B, another FRG2 family member located near 10q D4Z4, could potentially compensate for the loss of FRG2 through functional redundancy. Moreover, as several studies have reported long-range three-dimensional conformational differences between non-FSHD and FSHD chromatin [88,89,90,91,92,93], the DPED cases do not exclude the possibility that regulatory elements beyond the deleted region can activate DUX4 [94].

Translocation of 4qA to 10q

The D4Z4 repeat and its partial fragments are scattered throughout the human genome [95]. 10q26, the subtelomeric region of chromosome 10, contains a D4Z4 repeat in which each unit has a sequence difference that is technically recognisable by restriction enzyme sensitivity in genetic tests [60]. 10q D4Z4, similar to 4q D4Z4, also varies in copy number, although contraction of 10q D4Z4 is not pathogenic when it is cis-linked to the prevalent typical 10q haplotypes [47]. Population genetic studies worldwide have revealed that translocation events occur between 4q and 10q in both non-FSHD and FSHD populations [80, 96,97,98,99]. The translocation of 4qA to 10q creates a permissive allele on chromosome 10 and leads to FSHD manifestation when the allele has a repeat contraction [30, 100]. The 10q-linked FSHD cases displaying a classical FSHD phenotype call into question the central pathological contribution of other 4qter genes, such as WWC2, SORBS2, FAT1, and FRG1, as these genes did not appear to be consistently dysregulated in those cells, whereas upregulation of DUX4 and its direct target FRG2 was conserved. Although these cases may justify DUX4 and D4Z4 repeats as the only critical coding elements explaining FSHD manifestation, DUX4 could be activated through three-dimensional access of cis-acting regulatory elements near D4Z4 on chromosome 10, which may again be active specifically in muscle, considering that enhancer-promoter communication can be triggered without proper chromatin regulation [101].

Duplication

In certain cases with clinical FSHD manifestations, cis D4Z4 array duplication or even triplication alleles with repeat contraction in either array and 4qA haplotype were found [80, 102,103,104]. Those alleles can cause FSHD onset both with or without variants of the FSHD2 genes. A typical Southern blot test may overlook those permissive alleles as the spacer sequence flanked by two D4Z4 arrays and repeat contraction on either array is not recognised, whereas the molecular combing assay and long-read sequencing assay can resolve this genetic rearrangement [80, 104]. These duplication cases also fit the model of the necessity of the permissive allele for FSHD onset, and DNA hypomethylation in the contracted array of the duplication alleles was also detected by nanopore analysis [104].

Although duplication cases can occur de novo in any of the permissive 4qA haplotypes, most cases were linked to 4qA-L, a European-specific 4qA class haplotype, indicating that these cases were derived from a limited number of ancestral origins in local regions. Interestingly, in one family, the proximal 17U array of the 17U + 2U duplication allele showed a methylation profile comparable to that of the distal 2U allele and lower than that of the 17U array of the 17U + 9U duplication allele [104], implying that DUX4 is transcribed from both the proximal and distal arrays or may even be predominantly from either array, which could be transcriptionally more actively regulated by distal elements as it is closer than the other. A detailed analysis of the duplicated alleles may be informative for understanding the mechanism of DUX4 activation, as chromatin conformation may also be affected, although hardly predicted, due to the potential insulator function of the D4Z4 sequence.

Gonosomal mosaicism

Mosaicism is often observed in FSHD, in which more than two chromosome 4 alleles, including at least one permissive allele present at a reduced proportion, are found in a single proband [29, 30, 80, 105]. Rearrangements that produce repeat contractions causing FSHD1 can occur during early cell division, leading to gonosomal mosaicism, which explains approximately half of the de novo cases [30, 106]. In theory, although the rate of transmission of the FSHD1 allele from a parent with gonosomal mosaicism to the offspring is lower than that from a non-mosaic patient, the offspring will be more severely affected than the mosaic parent because all cells in the offspring contain the permissive allele. These mosaicism cases emphasise that permissive alleles are critical to FSHD manifestation; however, it is unclear whether muscle cells with the permissive allele directly or indirectly affect cells without it in the tissue environment, or whether the clinical impact of the permissive allele and DUX4 activation is limited locally to the positive cells and their niches. Mosaicism often leads to disease manifestation in males, but less so in females [29, 30], indicating sex-dependent regulatory mechanisms or disease susceptibility.

Given the standard classifications of FSHD1 and FSHD2, these complex genetically atypical cases can compromise the interpretation of results obtained with the common diagnostic method of pulsed-field gel electrophoresis (PFGE) with Southern blotting. A recent consensus publication outlines the minimal requirements for genetic confirmation of both FSHD1 and FSHD2 [49]. The atypical rearrangements described above can be pathogenic per se (similar to FSHD1) or in combination with variants of genes causing FSHD2, which are discussed in detail next.

FSHD2 genes

Currently, variants that cause FSHD2 have been identified in SMCHD1, DNMT3B, and LRIF1. These factors are thought to regulate chromatin by promoting a closed state in the D4Z4 region. The pathogenic variants of SMCHD1 in FSHD2 are found across the whole coding region, the 3′UTR, and the introns of the gene, leading to haploinsufficiency or a dominant negative effect [38, 39, 107, 108]. In addition, mutations in SMCHD1 cause the unrelated disorder Bosma arhinia microphthalmia syndrome (BAMS) [109], in which the mutations are enriched exclusively in the extended ATPase domain and DNA hypomethylation is often observed in D4Z4 [110]. Certain identical pathogenic variants of SMCHD1 are associated with these two seemingly unrelated disorders, suggesting that BAMS, similar to FSHD2, is likely caused by complex oligogenic or multifactorial mechanisms that only partially overlap at the SMCHD1 level [111]. Heterozygous mutations in DNMT3B (DNA methyltransferase 3 beta, MIM:602900) have been found in FSHD2 cases, with disease manifestation depending on the PAS-containing 4qA allele and its repeat number [40]. Biallelic DNMT3B mutations have been reported in the autosomal recessive immunodeficiency, centromeric instability, and facial anomaly syndrome type 1 (ICF1 [OMIM: 242860]), in which DNMT3B activity is reduced, leading to DNA hypomethylation in a variety of repeat structures, including D4Z4 [40, 112,113,114]. A homozygous mutation in ligand-dependent nuclear receptor interacting factor 1 (LRIF1, also referred to as HBiX1, MIM:615354), affecting expression of only the longer isoform LRIF1L, was found in a Japanese FSHD2 case with 13 RUs in the permissive allele [41], whereas the heterozygous mutation may function as a disease modifier in the FSHD1 family. The FSHD2 genes appear to be involved in the establishment of high DNA methylation in the D4Z4 repeat. Interestingly, SMCHD1 knockout in FSHD myoblasts allowed DUX4 activation but did not recapitulate the DNA hypomethylation observed in cells derived from patients with FSHD2, suggesting that SMCHD1 is not actively involved in the maintenance of DNA methylation in somatic cells [115]. Together, these findings indicate that SMCHD1 plays a critical role in gene regulation and in the establishment, rather than the maintenance, of DNA methylation.

There remain genetically undiagnosed cases with typical FSHD clinical manifestations but without repeat contractions or mutations in known FSHD2 genes. Considering that all the known FSHD2 genes are related to X chromosome inactivation (Xi) in females [116, 117], the mechanisms of chromatin regulation in the two biological contexts likely overlap in part, and as yet unknown FSHD2 genes might be found among Xi-associated factors. Clinical and molecular investigations of proteins that bind the D4Z4 repeat have provided a list of potential FSHD2 candidates. Among these genes, variants in CTCF, DNMT1, DNMT3A, EZH2, and SUV39H1 have been reported to potentially contribute to FSHD pathology in clinical FSHD cohorts with permissive alleles [118], although functional analysis is needed to confirm them as FSHD2 genes and/or disease modifiers of FSHD1. To date, FSHD2 variants in candidate genes other than SMCHD1 and DNMT3B are extremely rare. This might be because the other candidate genes likely have more general functions in gene regulation across the genome, whereas SMCHD1 and DNMT3B have limited targets on the genome and/or act in certain biological contexts, and so may have minimal tissue-specific impacts on developmental progress when mutated.

DNA hypomethylation: a potential predictor of symptom outcome

Despite their distinct genetic backgrounds described above, significant DNA hypomethylation in the D4Z4 repeat of the permissive allele is common among FSHD1, FSHD2, and all atypical FSHD cases, indicating its potentially critical contribution to pathology. In FSHD1 cases, DNA hypomethylation occurs in the permissive allele but not in other alleles without repeat contraction, as the reduction in methylation is associated with repeat contraction. In contrast, in FSHD2, hypomethylation can occur in all D4Z4 repeats in both permissive and non-permissive alleles of chromosomes 4 and 10, as variants of chromatin regulators such as SMCHD1 potentially affect all arrays [37, 38]. Numerous studies have analysed D4Z4 DNA methylation in FSHD cases, non-FSHD controls, and non-penetrant carriers using distinct methods, such as methylation-sensitive restriction enzyme-based assays, bisulfite conversion-based assays, and long-read sequencing. However, certain studies applied protocols that may introduce significant bias in PCR amplicons targeting D4Z4 sites after bisulfite conversion, so caution is warranted when interpreting published results and selecting a method for analysis [119].

Jones et al. carefully compared two of the published protocols using identical genetically confirmed samples and concluded that their protocol successfully reproduced the expected differences in D4Z4 methylation, distinguishing FSHD1, FSHD2, and non-FSHD cases and potentially offering a widely accessible diagnostic for FSHD using saliva DNA. This method consists of two distinct types of assay. The ‘BSSX’ assay analyses the DR1 regions to represent the average level of DNA methylation across all D4Z4 units, with a certain preference for 4q D4Z4 because the primers have mismatches to typical 10q D4Z4. The ‘BSSA’ and ‘BSSL’ assays analyse the distal-most D4Z4 unit of the 4qA and 4qA-L haplotypes, respectively, to represent the methylation level of the distal unit of the permissive allele [119]. This method is compatible with high-throughput sequencing, as well as Sanger sequencing [120]. The combined outcomes of these assays distinguished between non-FSHD, FSHD1 and FSHD2, as only the BSSA/L assay revealed reduced methylation in FSHD1 samples, whereas both the BSSA/L and BSSX assays demonstrated reduced methylation in FSHD2.
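
The way the two assay readouts are combined can be expressed as a small decision function. The sketch below is an illustration under stated assumptions rather than the published analysis: the function name `interpret_assays` is hypothetical, and the methylation cut-offs are deliberately left as required arguments because the text does not give threshold values. The numbers in the usage lines are placeholders only.

```python
def interpret_assays(distal_methylation: float,
                     bssx_methylation: float,
                     *,
                     distal_cutoff: float,
                     bssx_cutoff: float) -> str:
    """Combine a distal-unit readout (BSSA or BSSL) with the BSSX readout.

    Values are methylation fractions between 0 and 1. The cut-offs must be
    supplied by the caller; they are NOT taken from the source text.
    """
    distal_low = distal_methylation < distal_cutoff
    array_low = bssx_methylation < bssx_cutoff

    if distal_low and array_low:
        # Both the permissive distal unit and the array-wide average are reduced.
        return "FSHD2-like"
    if distal_low:
        # Only the distal unit of the permissive allele is reduced.
        return "FSHD1-like"
    if array_low:
        # A pattern not described in the text; flag it instead of guessing.
        return "indeterminate"
    return "non-FSHD-like"


if __name__ == "__main__":
    # Placeholder cut-offs chosen purely for demonstration.
    cutoffs = dict(distal_cutoff=0.35, bssx_cutoff=0.35)
    print(interpret_assays(0.10, 0.60, **cutoffs))
    print(interpret_assays(0.10, 0.15, **cutoffs))
    print(interpret_assays(0.70, 0.65, **cutoffs))
```

Keeping the cut-offs as explicit arguments makes it harder to reuse a threshold calibrated for one protocol with data produced by another, which is the kind of bias the preceding paragraphs caution against.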

Recently, several independent studies have demonstrated that the methylation level of the most distal D4Z4 unit of the permissive allele correlates with disease severity and progression to a similar or even greater degree than the number of D4Z4 RUs of the permissive allele, motivating the use of this regional DNA methylation level as a predictor of disease progression [120, 121]. Another study reported that distal hypomethylation of the permissive allele was detected in a case of DPED, supporting its application in the differential diagnosis of patients with suspected FSHD, including those with complex structural variants [122].

Long-read sequencing methods have emerged as promising diagnostic tools as alternatives to bisulfite PCR-based methods. Several studies have shown that long-read sequencing can successfully identify the number of D4Z4 repeats, haplotypes and DNA methylation levels in each D4Z4 unit along the sequential order of all alleles without PCR amplification bias and DNA fragmentation [95, 123,124,125,126,127]. In this methodology, all alleles, even if one donor sample contains the same haplotypes, such as 4qA/4qA, potentially mixing contracted and non-contracted alleles in FSHD1, can be separately assembled and analysed for DNA methylation if reads that are long enough to cover the regions proximal and distal to the D4Z4 repeat in one read are obtained. In theory, this methodology appears tolerant to the atypical FSHD cases described above, which would otherwise require other types of more specialised methods, such as optical genome mapping and molecular combing for genotyping. However, using those methods, it might be difficult to separately analyse DNA methylation levels of more than one distal region of each D4Z4 array. Therefore, long-read sequencing can potentially offer an efficient, regular diagnostic method in the future if the run cost becomes more feasible to obtain statistically sufficient numbers of high-quality long reads for assembling each allele with a quantitative methylation level. Several published cases now apply Cas9-mediated enrichment of D4Z4-associated fragments, requiring additional procedures compared to the protocol for whole-genome sequencing [95, 123, 126], whereas whole-genome sequencing or other types of enrichment protocols may allow the detection of variants in FSHD2 genes and the feasible application of this technique with a few days of simple workflow.
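
The requirement that a single read cover the whole array plus sequence on both sides translates into a simple lower bound on read length. The following sketch computes that bound from the 3.3 kb unit size given in the text; the function names and the 5 kb flank are assumptions chosen for illustration, not values from the cited protocols.

```python
D4Z4_UNIT_KB = 3.3  # length of one D4Z4 unit, as stated in the text


def min_spanning_read_kb(repeat_units: int, flank_kb: float = 5.0) -> float:
    """Shortest read (in kb) that covers the array and a flank on each side.

    flank_kb is an assumed amount of unique proximal and distal sequence
    needed to anchor the read; adjust it to the locus and aligner in use.
    """
    if repeat_units < 1:
        raise ValueError("repeat_units must be at least 1")
    return repeat_units * D4Z4_UNIT_KB + 2 * flank_kb


def spans_array(read_kb: float, repeat_units: int, flank_kb: float = 5.0) -> bool:
    """True if a read of read_kb could span an array of the given size."""
    return read_kb >= min_spanning_read_kb(repeat_units, flank_kb)


if __name__ == "__main__":
    for units in (3, 10, 20, 40):
        print(units, "units need about", round(min_spanning_read_kb(units), 1), "kb")
```

The bound grows linearly with repeat number, so contracted alleles are far easier to span in one read than long non-contracted alleles from the same sample.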

Interestingly, long-read sequencing studies demonstrated that DNA methylation increased from the proximal to the distal unit within the D4Z4 repeat, suggesting a mechanistic dependency of distal D4Z4 repeat methylation on the number of D4Z4 RUs [95, 123]. The age of onset correlated with both the D4Z4 RU number and methylation in FSHD1 [128]. Statistical modelling studies have demonstrated that shorter D4Z4 repeats are predictive of an earlier age at onset, although longer disease duration may explain more severe disease progression better than repeat number does [28]. Recent clinical studies, some of which used the FSHD Comprehensive Clinical Evaluation Form (CCEF) for the systematic dissection of disease phenotype [129], still suggest significant clinical variation in FSHD1, especially among individuals whose contracted D4Z4 repeats fall in the longer range [130,131,132,133]. This variation implies the existence of unknown disease modifiers of FSHD. On the other hand, medical comorbidities and medication use appear to be strongly linked to potentially severe outcomes such as wheelchair dependence [28]. ‘Double trouble’, referring to the coexistence of another neuromuscular genetic disease, may explain certain atypical cases [133]. These observations call for careful specification of the intrinsic phenotype of FSHD [133] and of the secondary risk factors that accelerate disability. Evidence of anticipation, an intergenerational worsening of the disease, has been observed [134,135,136], which may be explained by parental mosaicism and/or intergenerational changes in distal D4Z4 methylation [127, 137].

DUX4 activation in muscles: leakage rather than active mode

Although D4Z4 hypomethylation has been observed in a variety of cells, such as blood cells, saliva, skin fibroblasts, pluripotent cells and myoblasts derived from patients with FSHD [37, 138], robust DUX4 activation is typically observed only in skeletal muscle biopsies, cultured differentiated myoblasts and cultured lymphocyte lineages of patients with FSHD [58, 64, 139, 140], demonstrating tissue/cell type specificity in the activation mode. Moreover, DUX4 is not equally activated across a muscle cell population; rather, it appears to be expressed sporadically at a certain stage of muscle differentiation, at a very low and highly variable frequency (roughly 1 in 1000 cells), as revealed by immunostaining [58, 141,142,143] and the single cell/nucleus transcriptome [144,145,146,147]. DUX4 activation in somatic muscle cells is not necessarily limited to patients with FSHD; it sometimes occurs in cells from individuals without FSHD, although at a significantly lower frequency [141]. The correlation between local inflammatory MRI scores and DUX4 expression in FSHD muscle tissues [75, 148, 149] may indicate that disease progression depends on the sporadic probability of DUX4 activation per cell. Mathematical simulations suggest that DUX4 drives significant cell death despite being expressed in only 0.8% of live cells [150]. Interestingly, the frequency of DUX4-positive cells in cell culture appears to be conserved per clone, independent of cell passage [53], indicating a deterministic regulatory mechanism underlying this sporadic behaviour. A reporter assay monitoring the DUX4 promoter showed sporadic activation, particularly for the sense orientation [151].
Importantly, in this study, the lentivirus-delivered promoter fragment (several fragments appeared to be randomly integrated into the genome per cell) showed obvious clonal and cell-type differences in burst rate. This emphasises the importance of the cellular context and the surrounding genomic landscape, which can mediate enhancer-promoter communication and eventually determine transcriptional burst frequency, rather than direct activation of the promoter. Nevertheless, given the correlation between the degree of epigenetic D4Z4 de-repression and DUX4 expression in an in vitro test among familial cohorts [53] and between disease severity and distal D4Z4 hypomethylation [120, 122], the level of hypomethylation likely regulates disease progression by affecting the frequency of DUX4 activation, as the promoter of translatable DUX4 lies in the most distal D4Z4 unit.

However, because methylation scores in bulk BSS analysis roughly and inversely represent the ratio of cells with hypomethylated permissive alleles, and therefore the majority of FSHD muscle cells likely show significant DNA hypomethylation, there is a clear discrepancy between the DUX4 frequency (less than 1%) and the ratio of hypomethylated alleles/cells (more than 50%). This indicates that D4Z4 DNA hypomethylation is not sufficient for myogenic activation of DUX4, which requires additional factors specific to the cell type and stage in the limited contexts in which DUX4 is activated. Several studies have examined potential context-dependent factors and pathways that directly activate DUX4, such as telomere shortening [152], PARP1 [153], herpesvirus infection [154, 155], oxidative stress [138], DNA damage [138, 156, 157], unstable G-quadruplexes [158], active chromatin regulators [159,160,161,162] and the SIX transcription factor family [163], as well as those that suppress DUX4, such as Wnt/β-catenin signalling [164], the β2-adrenergic receptor (β2AR) pathway via cyclic adenosine monophosphate (cAMP) dependent on p38 mitogen-activated protein kinases (MAPKs) and/or protein kinase A (PKA) [161, 165,166,167,168] and repressive chromatin regulators [159, 169,170,171,172]. Most studies have not evaluated whether the changes in DUX4 expression observed under these stresses, or after targeting these signals, reflected the frequency of DUX4-positive cells in a pool or the expression level per cell. Nevertheless, for instance, a physiological level of oxidative stress increased DUX4 expression in FSHD muscle cells, and this increase was correlated with frequency [138]. DUX4 activators can be theoretically categorised into two groups: the first is essential for basal expression, and the second has only an additional impact above the basal level. The SIX transcription factor family is involved in skeletal muscle development, although it is not exclusively expressed in the muscle lineage [173].
The second group is unlikely to be a major driver of myogenic DUX4 activation in FSHD but may explain variations in disease progression, such as left-right asymmetry. If a strong transcriptional activator bound directly to the DUX4 promoter, it would be expected to produce a DUX4-positive state more frequently than the rare ratios widely observed. Instead, there may be distal regulatory machinery that is less efficient and allows leakage of DUX4 activation. Overall, the cell-type specificity of DUX4 activation in FSHD is not well understood, and this gap is closely linked to the question of why FSHD is a ‘muscle-restricted’ disease.
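
The size of the discrepancy between DUX4 frequency and hypomethylation can be made concrete with a short calculation. The sketch below is illustrative only: it takes the approximate figures quoted above (DUX4-positive cells below 1%, hypomethylated cells above 50%) and assumes, for simplicity, that every DUX4-positive cell carries a hypomethylated permissive allele. The function name and the input values are hypothetical and are not taken from any cited study.

```python
def activation_given_hypomethylation(dux4_positive_fraction, hypomethylated_fraction):
    """Return P(DUX4-positive | hypomethylated).

    Assumption (not from the cited studies): every DUX4-positive cell is
    also hypomethylated, so the conditional probability is simply the
    ratio of the two population fractions.
    """
    if not 0.0 < hypomethylated_fraction <= 1.0:
        raise ValueError("hypomethylated_fraction must be in (0, 1]")
    if not 0.0 <= dux4_positive_fraction <= hypomethylated_fraction:
        raise ValueError("dux4_positive_fraction must be in [0, hypomethylated_fraction]")
    return dux4_positive_fraction / hypomethylated_fraction


# Boundary values from the text: fewer than 1% DUX4-positive cells,
# more than 50% hypomethylated cells.
upper_bound = activation_given_hypomethylation(0.01, 0.50)
print(f"At most {upper_bound:.0%} of hypomethylated cells would be DUX4-positive")
```

Because the true DUX4-positive fraction is below 1% and the true hypomethylated fraction is above 50%, the value computed at these boundary figures acts as an upper limit under the stated assumption, which is consistent with hypomethylation being permissive rather than sufficient.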

FSHD as a muscle disease

Based on clinical observations, FSHD affects a limited number of tissues, primarily the skeletal muscles. The concept of DUX4 in pathology theoretically suggests that disease conditions and tissue specificity can be determined by DUX4 activation, represented by the level/frequency of DUX4 expression and DUX4 biological effects, including toxicity through transcriptional activity and myogenesis perturbation by DUX4 binding [174] (Fig. 3). Based on this, the question of why the FSHD genetic background ‘does not’ affect most other organs can be answered by the absence or negligible levels of DUX4 activation and/or DUX4 biological effects.

Fig. 3
Model of DUX4 expression and downstream responses leading to FSHD onset and progression. The expression of DUX4 requires contraction of the D4Z4 repeat array, subsequent DNA hypomethylation and the presence of a permissive 4qA haplotype. Additionally, mutations in chromatin modifiers known to cause FSHD2, along with other intracellular signalling pathways or external stress factors, may contribute to a permissive epigenetic environment for DUX4 expression in the pre-DUX4 phase. Sporadic DUX4 expression in muscle tissue induces FSHD-specific pathological phenotypes downstream. In addition to the intrinsic cytotoxicity of DUX4, the cascade of molecular events it triggers contributes to irreversible pathological changes in muscle tissues in the post-DUX4 phase. Moreover, the progression of pathology may be influenced not only by intracellular responses but also by environmental factors through daily behaviours, which may either exacerbate or mitigate disease severity. Intrinsic genetic factors that influence disease severity at various points along these pathways act as disease modifiers in FSHD, potentially including genes and regulatory elements.

A complete atlas of somatic DUX4 expression patterns in FSHD is not currently available. Most cells in most organs in FSHD appear negative, whereas certain skeletal muscle lineages and peripheral blood mononuclear cells, especially lymphocytes, are positive [175], suggesting cell type-specific regulation. Extramuscular symptoms raise the question of whether any other somatic cell type expresses DUX4. Interestingly, a recent mouse study with muscle-specific DUX4 overexpression demonstrated a retinal abnormality resembling that of FSHD, implying that the muscle phenotype may be the origin, acting through an interorgan mechanism [70]. Hearing loss is bilateral [9], in contrast to the widely observed asymmetry in the muscle phenotype. The cell types directly affected in these extramuscular symptoms require further investigation to determine whether DUX4 is expressed in these cells or whether secreted/systemic factors derived from the affected muscles are responsible. Moreover, skeletal muscle fibres are among the few cell types with multinucleate potential. This multinucleate state, which results from the fusion of mononucleate myocytes into a shared cytoplasm, allows DUX4 to diffuse into other nuclei, causing the entire cell to undergo deterioration [176]. This characteristic may explain why skeletal muscles are uniquely vulnerable to DUX4 toxicity compared with other organs composed of mononuclear cells.
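
One way to see why multinucleation could matter is a simple probability sketch. It assumes, purely for illustration, that each nucleus activates DUX4 independently with the same probability (the roughly 1-in-1000 frequency mentioned earlier, used here as a per-nucleus value) and that a single DUX4-positive nucleus is enough to expose the whole fibre. The nuclei counts and the function name are hypothetical and do not come from the cited studies.

```python
def fraction_of_fibres_exposed(p_nucleus, nuclei_per_fibre):
    """Probability that a fibre contains at least one DUX4-positive nucleus.

    Assumptions (illustrative, not from the cited studies):
    - each nucleus activates DUX4 independently with probability p_nucleus;
    - one positive nucleus exposes the entire shared cytoplasm.
    """
    if not 0.0 <= p_nucleus <= 1.0:
        raise ValueError("p_nucleus must be in [0, 1]")
    if nuclei_per_fibre < 1:
        raise ValueError("nuclei_per_fibre must be at least 1")
    return 1.0 - (1.0 - p_nucleus) ** nuclei_per_fibre


# Hypothetical nuclei counts: a mononuclear cell versus fibres of growing size.
for n in (1, 10, 100, 1000):
    exposed = fraction_of_fibres_exposed(0.001, n)
    print(f"{n:>4} nuclei per fibre: {exposed:.1%} of fibres exposed")
```

Under these assumptions, the exposed fraction rises steeply with the number of nuclei sharing one cytoplasm, so a rare per-nucleus event can still affect a substantial share of fibres.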

Heterogeneity in intra-individual muscles

Certain muscles are affected earlier and others later in FSHD, a pattern that was recently captured elegantly using whole-body MRI scans for regional fat infiltration [177]. However, the mechanisms underlying this heterogeneity remain largely unknown. The facial muscles are among the earliest regions affected in typical clinical FSHD. Facial and trunk muscles differ in their developmental origin during mesodermal specification [178]. Differences in the characteristics of muscle stem cells with different developmental origins [179] may explain the regional preference for active disease. DUX4 leaves muscle cells biologically vulnerable to metabolic stress, such as oxidative stress [180,181,182,183] and abnormal iron accumulation [70]. The hyaluronic acid and autophagy pathways are involved in DUX4 toxicity and its prevention, respectively [184, 185]. The intrinsic susceptibility to such stresses and differences in the metabolic capacity of each pathway may explain the regional preference of affected muscles and phenotypic differences among individuals, implying potential protective disease modifiers and lifestyle-linked environmental factors behind these pathways (Fig. 3). Regarding sex bias, conflicting findings on sex hormones as protective factors have emerged from experimental and clinical studies [186,187,188,189,190], necessitating further clinical observations that consider the relevant temporal functions of sex hormones in skeletal muscles, including the establishment of a stem cell pool [191]. The potential protective effect of female hormones and melatonin may be mediated by an increase in miR-675, followed by both the inhibition of DUX4 toxicity and a reduction in DUX4 expression [192]. Sexual dimorphism in mitochondrial protein adaptation to exercise may also contribute to potential sex differences [193].
Muscle fibre types may contribute to regional heterogeneity because a reduction in sarcomeric force was observed in type II but not type I muscle fibres in FSHD [194], suggesting potentially higher susceptibility to DUX4 stress or a higher frequency of DUX4 activation in type II fibres. The topography of muscle abnormalities caused by Fat1 loss-of-function in mice resembled that of patients with FSHD; thus, lower FAT1 expression in the regional muscles affected at early stages of FSHD progression was proposed to be relevant to heterogeneity [195, 196].

The involvement of a perturbed immune system and inflammation in the pathology of FSHD muscle is pronounced. Immune cell infiltration [149, 197, 198] and elevated complement levels in the plasma and biopsies of FSHD subjects [149, 199] have been reported. Ectopic DUX4 expression in muscle cells activates immune mediators [63] and suppresses MHC Class I expression [200]. Muscle-specific chronic DUX4 expression in mice recapitulates the pathological features of FSHD muscles, including inflammation and fibrosis [201]. A bilateral comparison revealed that, in contrast to the often-observed intra-individual left–right asymmetric symptomatic manifestations captured by histopathological features in muscle biopsy, whole-muscle fat infiltration, the proportions of immune cell populations, and D4Z4 DNA methylation were likely symmetric [149], suggesting a systemic mechanism of disease progression that might be relevant to extramuscular symptoms through circulation.

Therapeutic development

Based on the current understanding of the central role of DUX4, various therapeutic approaches have been demonstrated in preclinical studies, with some progressing to clinical trials. Following the steps of the central dogma, these include: small molecules that suppress DUX4 transcription, such as a p38 inhibitor, which was tested in recent clinical trials but showed insufficient efficacy [166,167,168] (refer to NCT05397470 at ClinicalTrials.gov); epigenetic editing strategies that silence transcription by shifting the gene to a repressive state using inactive CRISPR (clustered regularly interspaced short palindromic repeats)-based targeting approaches [202,203,204], with one approach planned for clinical trials (refer to NCT06907875 at ClinicalTrials.gov); genetic editing strategies targeting the DUX4 locus, including removal of the PAS sequence to reduce DUX4 mRNA; and antisense oligonucleotides to degrade DUX4 mRNA, for which several clinical trials are ongoing or planned [79] (refer to NCT07038200 and NCT06131983 at ClinicalTrials.gov). Apart from DUX4-targeting strategies, clinical trials are ongoing for an antibody against myostatin, an interleukin-6 receptor antagonist, and a β2-adrenergic receptor agonist, which aim to achieve efficacy by blocking the suppressive effect of myostatin on muscle volume and by mitigating inflammation and potentially curtailing fibrofatty degeneration in FSHD, respectively (refer to NCT05548556, NCT06222827, and NCT06721299 at ClinicalTrials.gov). In summary, while no established treatment currently exists, numerous therapeutic approaches for FSHD are in various stages of development.

Conclusion

Research has revealed the essential mechanisms underlying FSHD onset, thereby accelerating the development of targeted treatments focused on DUX4 pathology. However, the diverse clinical presentations among individuals with FSHD, stemming from specific genetic conditions that lead to different outcomes, call for deeper investigation of this disease from a broader perspective. In this sense, it is essential to uncover the mechanism of sporadic DUX4 activation and the epigenetic regulation of the D4Z4 repeat, which would provide insights into potential factors that influence disease outcome through gene regulation. In addition, understanding the extent to which genetic factors can or cannot explain the clinical diversity is critical. This is particularly important when considering intrafamilial clinical differences and the observed variations across sex and race. This knowledge will not only support the development of better drug targets and therapeutic options but also improve individual care management on a daily basis. It may reveal strategies to delay disease progression without costly medical interventions and help foster an inclusive society for all individuals with genetically confirmed FSHD. Although beyond the scope of this review, the effects of daily behaviours, such as exercise, nutrition, and mental state, on disease progression are also important areas for exploration. The combination of cutting-edge scientific approaches and insights from the lived experiences of individuals with FSHD, such as patient-reported outcomes from large cohorts, will promote a harmonised understanding and comprehensive solutions spanning genetics, epigenetics and patient experience.