Introduction

Lupus nephritis (LN) is a common manifestation of systemic lupus erythematosus (SLE), affecting over 50% of patients and typically developing within the first five years after SLE diagnosis1,2,3. LN arises from an inflammatory response to immunogenic chromatin, such as oxidized mitochondrial DNA and nucleic acids in exosomes. Accumulated chromatin activates DNA/RNA sensors (TLR7, TLR8, TLR9, RIG1/MDA5–MAVS, CGAS–STING), leading to increased production of type I interferon and other cytokines4,5,6,7. Genetic polymorphisms (e.g., in BANK1, BLK, and TLR7) promote persistent B cell activation and autoantibody production8,9. Complement proteins also play a role, with reduced circulating levels and intra-renal deposition contributing to LN development10.

Several studies in Taiwan have identified genetic variants that contribute to LN. A cross-sectional analysis reported that, in the Taiwanese population, the HLA-DRB1*0301 and DRB1*1501 alleles significantly increase the risk of SLE, whereas the DRB1*1202 allele appears to be protective against LN11. Another study identified variants in the IFN-λ3/4 genes (rs8099917, rs12979860, rs4803217, and rs469415590) that significantly increase the risk of developing LN in patients with SLE12.

Our study aims to demonstrate the utility of genotype–phenotype association analysis for predicting and understanding LN, using data from participants in the Taiwan Precision Medicine Initiative (TPMI)13. We highlight four variants (rs1025129, rs80282109, rs516119, and rs134545) that were associated with LN. We further characterized rs80282109 (in an intron of the BACH2 gene) and rs1025129 (in the upstream region of the HGF gene), as BACH2 and HGF are crucial for immune cell function. These findings provide genetic insights into LN risk variants within the Taiwanese Han population.

Materials and methods

Ethics

This study used genotyping data from participants enrolled in accordance with the regulations of the Tri-Service General Hospital Institutional Review Board (IRB), under approval number 2-108-05-038. The organization and operation of the IRB comply with Good Clinical Practice (GCP) and the applicable laws and regulations. All participants provided written informed consent, allowing the collection and analysis of their data. Appropriate security measures were implemented to safeguard the data, and all research team members were required to uphold the confidentiality of identifiable information. Investigators were required to report any serious adverse events and unanticipated problems in accordance with governmental laws and regulatory requirements.

Participants enrolled in study

The study participants were recruited from Tri-Service General Hospital (TSGH) as part of the Taiwan Precision Medicine Initiative (TPMI), a collaborative effort between Academia Sinica and 16 leading medical centers in Taiwan. The primary objective of TPMI is to establish a comprehensive database containing detailed clinical information and genetic profiles of one million individuals from the Taiwanese Han population. In total, 27,529 participants were recruited through TPMI and underwent genotyping conducted by Academia Sinica. Participants were enrolled from various medical centers. Table 1 summarizes the age and sex distribution of TSGH participants included in this study, comprising individuals with lupus nephritis (LN), systemic lupus erythematosus (SLE), and controls. For the control group, participants were eligible if they had no autoimmune diseases, although other medical conditions were permitted. Exclusion criteria included a history of blood transfusion, leukemia, lymphoma, or prior chemotherapy.

Table 1 Participant information in this study.

For the experimental group, 244 individuals were diagnosed with SLE through diagnostic interviews conducted by eight clinical rheumatologists, following the 2019 European Alliance of Associations for Rheumatology (EULAR)/American College of Rheumatology (ACR) classification criteria. Among them, 63 individuals were identified as having lupus nephritis (LN), based on the presence of the renal disorder variable defined in the ACR criteria and/or biopsy confirmation according to the standards set by the International Society of Nephrology/Renal Pathology Society (ISN/RPS). All participants in the experimental group, similar to those in the control group, had no history of blood transfusion, leukemia, lymphoma, or chemotherapy.

Genotyping by TPM array

DNA extraction and variant identification proceeded in three steps: (1) DNA extraction: Genomic DNA was extracted and purified from 3 mL of peripheral blood collected in EDTA vacutainers using the QIAsymphony SP system (QIAGEN, Hilden, Germany). (2) Genotyping: Each participant's purified DNA was loaded onto the Taiwan Precision Medicine (TPM) array chip, and genotype signals from the TPM array were detected using the Axiom GeneTitan platform (Thermo Fisher Scientific, Sunnyvale, CA, USA). (3) Quality control and annotation: SNP data quality control (QC), SNP calling, and sample annotation were performed using the Axiom Analysis Suite (Thermo Fisher Scientific, Sunnyvale, CA, USA). These steps yielded the genotype data used to analyze genetic variation and its association with disease.

Association study

Figure 1 outlines the steps of the genotype–phenotype association analysis. We first analyzed genotyping data from the TPM array. Of the 493,852 SNPs on the TPM array, those with a low call rate (< 80%) were filtered out. The remaining SNP and phenotype data, comprising the LN (n = 63), SLE (n = 244), and control (n = 27,529) groups, underwent association analysis using the chi-squared test in PLINK 1.9 (https://zzz.bwh.harvard.edu/plink/)14. To identify variants that differ between SLE and LN, three traits were defined: LN (LN patients compared with controls), SLE (SLE patients compared with controls), and LN/SLE (LN patients compared with SLE patients without LN). We removed low-quality variants (minor allele frequency below 0.05; Hardy–Weinberg equilibrium p-value below 1 × 10⁻⁵) and selected variants with highly significant p-values (below 1 × 10⁻⁶) for further investigation.
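The per-variant association test described above can be illustrated with a minimal sketch. It assumes a basic allelic chi-squared test on a 2 × 2 table of allele counts (cases versus controls, alternate versus reference allele) with one degree of freedom and no continuity correction. The function name and the example counts are hypothetical and are not taken from the study data or from PLINK itself.

```python
import math


def allelic_chi2(case_alt, case_ref, ctrl_alt, ctrl_ref):
    """Pearson chi-squared test (1 df, no continuity correction) on a
    2x2 table of allele counts. Returns (chi2, p_value)."""
    table = [[case_alt, case_ref], [ctrl_alt, ctrl_ref]]
    total = case_alt + case_ref + ctrl_alt + ctrl_ref
    row_sums = [case_alt + case_ref, ctrl_alt + ctrl_ref]
    col_sums = [case_alt + ctrl_alt, case_ref + ctrl_ref]
    if min(row_sums + col_sums) == 0:
        raise ValueError("every row and column must contain at least one allele")

    chi2 = 0.0
    for i in range(2):
        for j in range(2):
            expected = row_sums[i] * col_sums[j] / total
            chi2 += (table[i][j] - expected) ** 2 / expected

    # For 1 degree of freedom, the upper-tail probability of a chi-squared
    # statistic x equals erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p_value


# Hypothetical allele counts: 63 cases (126 alleles), 1000 controls (2000 alleles).
chi2, p = allelic_chi2(case_alt=40, case_ref=86, ctrl_alt=300, ctrl_ref=1700)
print(f"chi2 = {chi2:.2f}, p = {p:.3g}")
```

A variant would then be carried forward only if its p-value fell below the chosen significance threshold (1 × 10⁻⁶ in this study).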

Fig. 1

Genotype–phenotype association analysis pipeline used in this study. Case groups (LN: 63 patients; SLE: 244 patients) and the control group (the other 27,529 participants) were genotyped using the TPM array chip. Data from 493,852 SNPs were filtered, and the 239,080 SNPs that passed were tested with the chi-squared test to detect risk factors.
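The variant filtering step can be sketched as follows, assuming per-variant genotype counts are available. The thresholds mirror those stated in the methods (call rate of at least 80%, minor allele frequency of at least 0.05, Hardy–Weinberg p-value of at least 1 × 10⁻⁵). One assumption to note: this sketch uses a chi-squared goodness-of-fit approximation for the Hardy–Weinberg check, whereas PLINK applies an exact test, so borderline variants may be classified differently. The function names are hypothetical.

```python
import math


def hwe_chi2_p(n_aa, n_ab, n_bb):
    """Chi-squared goodness-of-fit p-value (1 df) for Hardy-Weinberg
    proportions. This is an approximation, not PLINK's exact test."""
    n = n_aa + n_ab + n_bb
    p = (2 * n_aa + n_ab) / (2 * n)
    q = 1.0 - p
    if p == 0.0 or q == 0.0:
        return 1.0  # monomorphic site: no deviation can be tested
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    observed = (n_aa, n_ab, n_bb)
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    return math.erfc(math.sqrt(chi2 / 2.0))


def passes_qc(n_aa, n_ab, n_bb, n_samples,
              min_call_rate=0.80, min_maf=0.05, min_hwe_p=1e-5):
    """Return True if a variant passes the call-rate, MAF, and HWE filters."""
    called = n_aa + n_ab + n_bb
    if called == 0:
        return False
    call_rate = called / n_samples
    freq_a = (2 * n_aa + n_ab) / (2 * called)
    maf = min(freq_a, 1.0 - freq_a)
    return (call_rate >= min_call_rate
            and maf >= min_maf
            and hwe_chi2_p(n_aa, n_ab, n_bb) >= min_hwe_p)


# Hypothetical genotype counts for one variant typed in 1000 samples.
print(passes_qc(n_aa=250, n_ab=500, n_bb=250, n_samples=1000))
```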

Variant annotations and functional analysis

Genes related to the significant variants were annotated against the RefSeq database (https://www.ncbi.nlm.nih.gov/refseq/) as implemented in wANNOVAR (https://wannovar.wglab.org/)15, a tool for annotating the functional consequences of genetic variants. To assess allele frequencies across diverse populations, we used the following publicly available databases:

  1. 1000 Genomes Project16: This comprehensive resource catalogs common human genetic variation, drawing from openly consented samples of healthy individuals.

  2. Genome Aggregation Database (gnomAD)17: An international collaboration that harmonizes exome and genome sequencing data from various large-scale projects, making summary data accessible to the scientific community.

  3. Taiwan BioBank (https://taiwanview.twbiobank.org.tw/index): A locally curated human biobank in Taiwan that integrates lifestyle, environmental factors, clinical medicine, and biological markers to create a foundation for personalized medicine and health in Taiwan18.

Results

According to the genotype–phenotype association results, eight significant variants (p-value < 10⁻⁶) were identified across the LN, SLE, and LN/SLE traits when the association patterns were compared (Fig. 2). In the Q–Q plot (Fig. S1A), notable variant points lie above the diagonal line, indicating a potential association with disease. The genomic inflation factor, lambda, calculated as the ratio of the median observed chi-squared statistic to the median expected chi-squared statistic, was close to 1, suggesting negligible inflation and supporting the robustness of our test statistics. Additional quality control analyses, including principal component analysis (PCA) to assess population stratification, are shown in Fig. S1B. The PCA plot shows no distinct clustering patterns, suggesting the absence of population stratification among the study samples. To assess potential kinship among samples, identity-by-descent (IBD) analysis was performed. No sample exhibited a low identity-by-state (IBS) genetic distance (< 0.1) to any other, indicating that no significant relatedness was detected among the experimental samples. Finally, we tested individual genotype success rates at missing-genotype thresholds of 0.01, 0.02, 0.03, 0.04, 0.05, 0.1, 0.2, 0.3, 0.4, and 0.5. After applying a missing genotype rate threshold of 0.01, 95.8% of samples remained (Fig. S2). Taken together, these checks suggest that our genotype–phenotype association findings are minimally biased. A search of the genotype–phenotype association catalog database showed that variants rs73366469 and rs117026326 have previously been associated with SLE19,20,21, whereas the other variants had no recorded association with SLE or other immune diseases.
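The genomic inflation factor mentioned above can be computed as in the following sketch, which assumes 1-df chi-squared statistics for all tested variants are available as a list. The constant 0.4549364 is the median of the chi-squared distribution with one degree of freedom. The genomic control adjustment shown here (dividing each statistic by lambda, with lambda floored at 1) follows a common convention and is an assumption, not a confirmed detail of the analysis in this study. The function names are hypothetical.

```python
import statistics

# Median of the chi-squared distribution with 1 degree of freedom.
CHI2_1DF_MEDIAN = 0.4549364


def genomic_inflation(chi2_stats):
    """Lambda: median observed chi-squared statistic divided by the
    median expected under the null (1 df)."""
    return statistics.median(chi2_stats) / CHI2_1DF_MEDIAN


def gc_adjust(chi2_stats):
    """Genomic-control adjustment: divide each statistic by lambda.
    Lambda below 1 is treated as 1 (assumed convention), so statistics
    are never inflated by the correction."""
    lam = max(genomic_inflation(chi2_stats), 1.0)
    return [x / lam for x in chi2_stats]


# Hypothetical test statistics for a handful of variants.
stats = [0.10, 0.30, 0.45, 0.80, 2.50, 30.0]
print(f"lambda = {genomic_inflation(stats):.3f}")
```

A lambda close to 1 indicates that the bulk of the test statistics follow the null distribution, so the few large statistics are unlikely to be driven by systematic inflation such as population stratification.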

Fig. 2

Manhattan plot of genotype–phenotype association results in four traits. A total of 239,080 variants were tested in TSGH TPMI participants using the chi-squared test. Eight highly significant variants were selected at p-value < 10⁻⁶ (bold points A-I on the plot): rs12508616, rs2027856, rs117026326, rs73366469, rs76267797, rs80282109, rs516119, and rs134545. Variants marked with a star have recorded associations with SLE in the genotype–phenotype association catalog database. Circles from the inside to the outside: LN, SLE, and LN/SLE.

Annotation analysis of these variants (Table 2) showed that four LN-related variants, rs1025129, rs80282109, rs516119, and rs134545, map to different gene locations, including BACH2 (BTB Domain And CNC Homolog 2), CACNA2D1 (Calcium Voltage-Gated Channel Auxiliary Subunit Alpha2delta 1), HGF (Hepatocyte Growth Factor), SOX1 (SRY-Box Transcription Factor 1), and TTC28 (Tetratricopeptide Repeat Domain 28). All variants showed highly significant p-values, and the p-values adjusted using the Genomic Control (GC) method (Reference) also remained below 10⁻⁶. Interestingly, variant rs516119 showed an inverse association with LN (odds ratio < 1), indicating that carriers of this variant are less likely to develop the disease. Furthermore, comparison of allele frequencies across populations (Table 3) showed that the SLE-related variants (rs117026326, rs76267797, and rs73366469) are frequent not only in the Taiwanese population (TPMI and Taiwan BioBank) but also in other Asian populations. In contrast, the LN-related variants showed lower allele frequencies in Taiwanese and Asian populations than in other populations, reflecting the complex genetic diversity underlying LN in Taiwan and Asia. Further examination of allele frequencies among participants (Fig. 3) showed that the LN-associated variants distinguish LN patients from SLE patients and controls and may be causal genetic factors in Taiwanese LN patients. The allele frequencies of rs1025129, rs80282109, and rs134545 were higher in LN patients than in SLE patients and controls. Moreover, the SLE-associated variants rs73366469 and rs117026326 had similar frequencies in LN patients, so these two variants could be risk factors for both LN and SLE.
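The interpretation of an odds ratio below 1 as a protective direction of effect can be made concrete with a small sketch. It assumes an allelic odds ratio computed from a 2 × 2 table of allele counts, with a Wald-type 95% confidence interval on the log scale. The function name and the counts are hypothetical and do not reproduce the values in Table 2.

```python
import math


def odds_ratio_ci(case_alt, case_ref, ctrl_alt, ctrl_ref, z=1.96):
    """Allelic odds ratio with a Wald confidence interval computed on the
    log scale. Returns (odds_ratio, lower, upper)."""
    if min(case_alt, case_ref, ctrl_alt, ctrl_ref) <= 0:
        raise ValueError("all four cell counts must be positive")
    odds_ratio = (case_alt * ctrl_ref) / (case_ref * ctrl_alt)
    # Standard error of log(OR): square root of the sum of reciprocal counts.
    se = math.sqrt(1 / case_alt + 1 / case_ref + 1 / ctrl_alt + 1 / ctrl_ref)
    log_or = math.log(odds_ratio)
    return odds_ratio, math.exp(log_or - z * se), math.exp(log_or + z * se)


# Hypothetical counts in which the alternate allele is rarer in cases.
or_, lo, hi = odds_ratio_ci(case_alt=10, case_ref=90, ctrl_alt=30, ctrl_ref=70)
print(f"OR = {or_:.2f} (95% CI {lo:.2f} to {hi:.2f})")
```

An odds ratio below 1 whose interval excludes 1 indicates that the allele is less common in cases than in controls, which is the pattern described for rs516119.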

Table 2 Variants identified from highly significant GWAS results.
Table 3 Allele frequencies of variants across multiple large genetic projects.
Fig. 3

Mutation frequency of the significant variants in each group. Mutation frequency is shown on a green-to-red color scale from low to high. The LN and SLE groups showed similar frequencies for the SLE trait. Variants rs134545, rs516119, rs1025129, and rs80282109 were observed at high frequency in the LN group for the LN and LN/SLE traits.

Based on their genomic locations and the surrounding gene structure, the LN-related variants may affect genes through exon–intron splicing and regulation of gene expression. Figure S3 shows that two LN-related variants, rs80282109 and rs134545, lie in intronic regions of BACH2 and TTC28, respectively. The other two variants, rs1025129 and rs516119, are located in the upstream regions of HGF and SOX1, respectively.

Discussion

LN presents significant clinical challenges due to its prevalence and complex treatment. Careful management is needed to mitigate long-term complications. Genetic factors greatly influence SLE and LN pathogenesis, with first-degree relatives of SLE patients having a significantly increased risk: 316 times higher for twins, 23 times for siblings, and 11 times for parents22. Over 100 susceptibility loci linked to SLE and LN involve genes related to lymphocyte function, immune signaling, DNA clearance, complement pathway, and renal injury23,24. Known mechanisms may indirectly influence LN development, despite the lack of direct proof connecting specific SNPs/genes to LN22.

Numerous studies in Taiwan have investigated genetic loci among SLE patients. One study analyzed 2429 SLE patients and 48,580 controls in Taiwan using GWAS and polygenic risk scores (PRS). Multiple PRS models combining SLE, ANA, anti-dsDNA, and anti-Sm markers improved SLE prediction (AUC 0.64). Novel genes and HLA haplotypes (HLA-DQA1*01:01, HLA-DQB1*05:01) were also identified, aiding early diagnosis and precision medicine25. Another study investigated six SNPs in the TLR3, TLR7, and TLR8 genes in 795 Taiwanese SLE patients and 1162 controls. TLR7 rs3853839-G was significantly associated with SLE susceptibility in females and with several clinical features. TLR8 rs3764880-G was linked to oral ulcers and pericardial effusion. TLR3 rs3775296-T was associated with photosensitivity and anemia. Certain TLR7/TLR8 haplotypes influenced SLE risk, suggesting that TLR variation contributes to SLE susceptibility and phenotypic diversity26.

The variant rs1025129, located in the upstream regulatory region of the HGF gene, is of particular interest because of the role of HGF in tissue regeneration and anti-inflammatory responses. HGF interacts with the c-Met receptor, activating signaling cascades such as the PI3-kinase and STAT-3 pathways27,28,29. HGF also promotes podocyte migration and epithelial-to-mesenchymal transformation, which is essential for renal injury repair in LN30. This variant may modulate HGF gene expression and thereby influence cell migration. In vivo studies show that peak podocyte apoptosis, which coincides with TGF-β1 expression and initial albuminuria, occurs before mesangial expansion in glomerulosclerosis models, highlighting the role of TGF-β1 in fibrosis and inflammation31,32. The balance between HGF and TGF-β is critical for LN prognosis, emphasizing the dual role of TGF-β in renal disease progression and nephrotoxicity33,34.

Notably, the variant rs80282109, located in an intronic region of the BACH2 gene, illustrates the intricate link between genetic variation and immune regulation. BACH2 is critical for the development and function of B and T lymphocytes and plays a pivotal role in maintaining immune homeostasis. Within B cells, BACH2 is essential for balancing differentiation into antibody-producing plasma cells against preserving the capacity of B cells to mature and react to antigens35. In T cells, BACH2 controls the differentiation of CD4+ T cells into various subtypes, such as regulatory T cells (Tregs) and effector T cells36. Elevated levels of BACH2 suppress “myeloid genes” in pre- and pro-B cells, steering them away from the myeloid lineage37. A study found that LN patients who experienced more frequent relapses exhibited higher microRNA-148a (miR-148a) expression in naive and memory B cells, alongside lower BACH2 expression in B lymphocytes38. MiR-148a is prevalent in B cells and plasma cells and regulates critical B cell transcription factors such as BACH2, enhancing plasma cell differentiation and maintaining B cell tolerance39,40. Regulation of BACH2 by miR-148a suppresses the maturation and homeostasis of B lymphocytes41. Thus, variant rs80282109 may influence the splicing and subsequent expression of BACH2, potentially altering its function and affecting the immune regulatory pathways crucial to LN pathogenesis.

Conversely, rs516119, positioned upstream of the SOX1 gene, showed an inverse association with LN (odds ratio < 1), suggesting a reduced likelihood of disease development in carriers of this variant. The SOX gene family predominantly regulates sex determination and development, with certain subgroups playing critical roles in neuronal development. Within this family, SOX1 belongs to the SOXB1 subclass, whose members are expressed in neuronal tissues and are believed to regulate cell differentiation42. Other reports indicate that SOX4, a member of a different subgroup within the SOX gene family, is associated with the pathogenesis of LN, as it is essential for the survival of pro-B cells in mice43 and plays a role in TGF-β signaling, suppressing the Th2 immune response by regulating GATA344.

The intronic variant rs134545, located in the middle region of the TTC28 gene, could influence gene splicing or regulatory elements and thereby affect gene expression. Given the involvement of TTC28 in the mitotic cycle, this variant might have a broad impact on cell division and proliferation within immune cells. However, the link between TTC28 and LN remains underexplored in existing research. Previous studies indicate that TTC28 regulates the mitotic cell cycle and localizes to the midbody45. In the vascular endothelial cells of the blood–brain barrier (BBB), TTC28 interacts with ICAM1, a membrane receptor involved in multiple sclerosis (MS) pathogenesis. This interaction facilitates the transcellular migration of inflammatory T cells into the CNS46,47, a key mechanism in MS development.

However, our study has limitations, including reliance on specific genotyping technologies like the TPM array chip and bioinformatics tools, affecting sensitivity and specificity. Additionally, the cross-sectional design limits understanding of temporal gene expression changes and disease progression, necessitating longitudinal studies for comprehensive assessment.

Conclusion

This genetic screening study among Taiwanese LN patients identified four significant variants: rs1025129 (upstream of HGF), rs80282109 (intronic region in BACH2), rs516119 (upstream of SOX1), and rs134545 (intronic region in TTC28). Variant rs1025129 may influence HGF expression, crucial for tissue regeneration and inflammation mitigation. Variant rs80282109 might alter the role of BACH2 in immune regulation. Variant rs516119 shows a potential protective effect against LN. Variant rs134545 in TTC28 could impact cell division and proliferation. These findings highlight the need for functional assays to explore their effects on gene expression and protein function, aiding in understanding LN pathogenesis. This genetic survey marks a significant step toward personalized medical approaches for diagnosing and managing LN, leading to targeted therapies and enhanced predictive models of disease susceptibility.

Limitations

The present study was limited by a small sample size, which precluded adequate assessment of statistical power in the GWAS analysis. Therefore, this investigation should be considered a preliminary exploration of genetic variants associated with LN and SLE. Future studies will focus on expanding participant recruitment to achieve a sufficient sample size, thereby ensuring robust statistical power and enabling more definitive conclusions.