Introduction

The global impact of the coronavirus disease 2019 (COVID-19) has underscored the urgent need to understand the more precise molecular mechanisms underlying its clinical manifestations, particularly the factors contributing to severe disease outcomes1. The clinical spectrum of COVID-19 ranges from asymptomatic cases to severe respiratory failure, yet the underlying molecular and omics drivers of these varied outcomes remain elusive. COVID-19 can trigger a cytokine storm—an excessive immune response characterized by the overproduction of cytokines—that results in tissue damage and widespread inflammation, which are strongly linked to severe disease2. In the most critical cases, the cytokine storm spreads to multiple organs, ultimately causing multi-organ failure and death3.

In recent years, big-data-driven precision multi-omics approaches have emerged as powerful tools for dissecting the complex biological interactions that underpin pathologies. By integrating multiple layers of omics data, such as genomics, transcriptomics, and epigenomics, these precision medicine approaches can provide a comprehensive understanding of disease mechanisms in conditions such as COVID-19. DNA methylation, the most common epigenetic modification, plays a crucial role in regulating gene expression and can be influenced by infections as well as other stressors4,5. Several studies have demonstrated that epigenetic regulation contributes to COVID-19 severity4,6,7,8,9. DNA methylation was previously considered more stable than other epigenetic modifications. However, recent studies have revealed that DNA methylation changes can occur more rapidly than previously thought10. This is particularly evident when cells are exposed to challenging environments, such as infection caused by direct invasion of pathogens. RNA expression analysis is the most informative molecular method for measuring the effect and severity of infectious diseases, such as COVID-19, because it directly and rapidly reflects the epigenetic changes caused by viral infection11. Therefore, Expression Quantitative Trait Methylation (eQTM) analysis has been successfully employed in several studies to investigate the relationship between DNA methylation and gene expression12,13,14. However, despite their potential, eQTM approaches have rarely been applied to investigate the connection between DNA methylation and RNA expression in relation to COVID-19 severity.

In this study, we employed a cis-eQTM analysis to investigate the integrated patterns of DNA methylation and RNA expression in 46 patients with varying degrees of COVID-19 severity. Our objective was to determine the molecular mechanisms associated with severe outcomes during the acute phase of the disease and to assess the dynamic changes in multi-omics of these molecular signatures during the recovery phase. These observations underscore the dynamic nature of epigenetic and transcriptomic changes in COVID-19 progression and recovery, providing valuable insights into potential healthcare and therapeutic targets for mitigating severe outcomes.

Results

Dynamics of gene-regulating CpGs and gene expression levels in COVID-19 severity

We analyzed blood-based multi-omic differences between severe-critical (SC) and mild-moderate (MM) cases during the acute phase of infection in 46 patients hospitalized with COVID-19 (Fig. 1A). The clinical follow-up period for all patients is presented in Fig. 1B. We confirmed their severity status based on clinical lab values, including blood cell counts and inflammatory biomarker levels. Notably, SARS-CoV-2 viral load, as measured by PCR cycle threshold (Ct) values for multiple gene targets (N, E, and R genes), did not differ significantly between severity groups (Supplementary Fig. 1), suggesting that differences in clinical outcomes were not attributable to viral burden alone but to host-intrinsic factors, including immune and epigenetic responses.

Fig. 1: Overall study design and follow-up blood sample collection for cis-eQTM analysis in COVID-19 patients.

A Overview of blood sampling and analysis design. The marker discovery cohort consisted of hospitalized COVID-19 patients, stratified into Mild-Moderate (MM) (N = 37) and Severe-Critical (SC) (N = 9) groups, and was used to identify severity-associated cis-eQTM markers. The comparison cohort was composed of two independent reference groups: the ‘Convalescent group’ (N = 90), comprising individuals who recovered from COVID-19 and provided a single blood sample 4 to 12 weeks post-infection, and the ‘Healthy Controls’ group (N = 344), consisting of pre-pandemic individuals with no history of SARS-CoV-2 infection. B Clinical follow-up duration of the 46 hospitalized COVID-19 patients. The green and red points indicate the MM and SC groups, respectively. Sample C19-C014(*), which was excluded from DEG analysis due to QC failure, was retained in the methylation analysis. In panel (A), the DNA, RNA, and blood icons were adapted from Flaticon (https://www.flaticon.com), and the remaining elements were created using icons available in Microsoft PowerPoint.

Leave-one-out (LOO) analysis between SC and MM COVID-19 groups identified 648 hypermethylated and 1296 hypomethylated CpG sites in the severe disease. Cis-eQTM analysis further revealed that 732 of these differentially methylated CpGs (113 hypermethylated CpGs and 619 hypomethylated CpGs) were correlated with 928 genes located within 1 Mbp of the differentially methylated CpGs (DMCs). Transcriptome-wide differential expression analysis identified 297 differentially expressed genes (DEGs) between SC and MM COVID-19 groups that were consistently observed in at least seven LOO iterations. Of these, 283 genes were up-regulated and 14 down-regulated in the severe disease. The up-regulated genes were enriched in pathways related to erythrocyte dynamics, innate immune activation, platelet aggregation, and acetylcholine receptor signaling (Supplementary Fig. 2A), whereas down-regulated genes were predominantly involved in DNA mismatch repair (MMR) pathway (Supplementary Fig. 2B).
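The 1 Mbp cis window described above can be illustrated with a minimal Python sketch. It is not the authors' pipeline: the coordinates and gene names are invented, and measuring distance to the nearest edge of the gene body (rather than, for example, to the transcription start site) is an assumption made only for illustration.

```python
from typing import NamedTuple

CIS_WINDOW_BP = 1_000_000  # 1 Mbp window, as stated in the text


class CpG(NamedTuple):
    chrom: str
    pos: int


class Gene(NamedTuple):
    name: str
    chrom: str
    start: int
    end: int


def cis_pairs(cpgs, genes, window=CIS_WINDOW_BP):
    """Return (CpG, gene name) pairs where the gene lies within `window` bp of the CpG.

    Assumption: distance is measured to the nearest edge of the gene body and is
    zero when the CpG falls inside the gene.
    """
    pairs = []
    for cpg in cpgs:
        for gene in genes:
            if gene.chrom != cpg.chrom:
                continue
            if cpg.pos < gene.start:
                dist = gene.start - cpg.pos
            elif cpg.pos > gene.end:
                dist = cpg.pos - gene.end
            else:
                dist = 0
            if dist <= window:
                pairs.append((cpg, gene.name))
    return pairs


# Toy coordinates, invented for illustration only.
toy_cpgs = [CpG("chr1", 5_000_000), CpG("chr2", 500)]
toy_genes = [
    Gene("GENE_A", "chr1", 5_400_000, 5_450_000),  # 400 kbp downstream: paired
    Gene("GENE_B", "chr1", 9_000_000, 9_100_000),  # 4 Mbp away: not paired
    Gene("GENE_C", "chr2", 2_000_000, 2_100_000),  # about 2 Mbp away: not paired
]
toy_pairs = cis_pairs(toy_cpgs, toy_genes)
```

Each candidate pair produced this way would then be tested for a methylation-expression correlation; the nested loop is adequate for a few thousand DMCs but would need an interval index for genome-wide use.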

Integration of methylation and transcriptomic data through cis-eQTM analysis refined these genes to 16 DEGs that were proximal to 30 DMCs (Supplementary Table 1). Among these, 15 genes were up-regulated and a single gene, GCNT4, was down-regulated. The overlap of these DEGs with cis-eQTM loci indicates their expression is likely regulated by local methylation status (Supplementary Table 1).

From the final set of cis-eQTM genes, six key genes and their ten regulatory CpGs were selected for further analysis based on their previous association with COVID-19 severity (Table 1). These genes are mechanistically linked to viral entry, immune regulation, and oxidative stress, all major hallmarks of severe disease. FURIN, SRXN1, and FKBP5, which were up-regulated in the SC group, had strong negative correlations with their associated CpG methylation levels (ρ = –0.602, –0.582 to –0.510, and –0.551, respectively; P = 0.003, 0.004 to 0.021, and 0.009, respectively), indicating that hypomethylation at these loci is linked to increased gene expression. FOXO3 and IL18RAP also showed moderate inverse correlations (ρ = –0.501 and –0.503; P = 0.025 and 0.024), suggesting methylation-dependent regulation. In contrast, GCNT4, the only down-regulated gene among the six, was positively correlated with methylation levels (ρ = 0.528; P = 0.015).
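To make the sign convention of these correlations concrete, the following Python sketch computes a rank correlation between methylation and expression for one CpG-gene pair. The use of Spearman's coefficient is an assumption suggested by the ρ notation, not something confirmed in this section, and the sample values are invented.

```python
def average_ranks(values):
    """Rank values from 1..n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)


def spearman_rho(x, y):
    """Spearman's rho: Pearson correlation of the rank-transformed data."""
    return pearson(average_ranks(x), average_ranks(y))


# Invented values: methylation (%) falls as expression (VST units) rises.
toy_methylation = [80.0, 72.5, 65.0, 51.0, 40.0]
toy_expression = [5.1, 5.9, 6.4, 7.8, 9.0]
toy_rho = spearman_rho(toy_methylation, toy_expression)  # close to -1
```

A negative coefficient corresponds to the hypomethylation-with-up-regulation pattern reported for FURIN, SRXN1, and FKBP5, whereas a positive coefficient corresponds to the pattern reported for GCNT4.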

Table 1 Six key genes associated with the severity of COVID-19 in the acute phase

We examined the distributions of DNA methylation and expression of the six prioritized genes according to the infection phase. Mean methylation differences between MM and SC COVID-19 patients were statistically significant for all ten CpGs (Supplementary Table 2). The convalescent group, with samples collected from 4 to 12 weeks post-COVID-19 infection, showed no significant differences in methylation levels compared to the healthy control group (no viral infection at the time of collection). Notably, the MM group during the acute phase showed no significant differences from the convalescent individuals in methylation at nine of the ten CpGs, the exception being chr20:1166154, which regulates SRXN1 expression (Fig. 2A, Supplementary Fig. 3). The pattern was mirrored at the gene expression level (Fig. 2B). All six key genes were either up- or down-regulated in the SC group compared to the MM group. Three of the six genes (FURIN, SRXN1, and FKBP5) displayed significant differences in mean expression only between severe patients and convalescent individuals, but not between the MM and convalescent groups (Supplementary Table 3). As for markers with no previous report on COVID-19 severity, all 24 DMCs showed significantly altered methylated proportions (Supplementary Fig. 3). Among the ten DEGs with no past association with COVID-19 severity, nine genes (all except GABARAPL2) displayed significant expression changes between MM and SC (Supplementary Fig. 4).

Fig. 2: Multiomic Profiling of Six Key cis-eQTM Loci Reveals COVID-19 Severity-Associated Methylation and Gene Expression Changes Across the Infection Timeline.

A, C DNA methylation levels (% methylated CpGs) at representative cis-eQTM sites, stratified by clinical group: Healthy controls (n = 310), Mild-Moderate (acute phase; n = 37), Severe-Critical (acute phase; n = 9), and Convalescent individuals (n = 90). Panel A presents five cell-type dependent cis-eQTM CpGs, whereas panel C illustrates a cell-type independent CpG. B, D Gene expression levels (VST count) for the matched loci are displayed in panels B (cell-type proportion dependent) and D (cell-type proportion independent) across clinical groups: Healthy controls (n = 35), Mild-Moderate (acute phase; n = 36), Severe-Critical (acute phase; n = 9), and Convalescent individuals (n = 90). Boxplots display the median, interquartile range (IQR), and individual data points. Statistical significance was evaluated using Welch’s t-test for methylation and the Wilcoxon rank-sum test for gene expression, with nominal P-values. Significance thresholds are indicated as follows: ns (P ≥ 0.05), * (P < 0.05), ** (P < 0.01), *** (P < 0.001), **** (P < 0.0001). VST variance-stabilizing transformation.

Leukocyte-adjusted cis-eQTM analysis reveals FKBP5 methylation as a cell-type proportion-independent marker of severe COVID-19

Acute SARS-CoV-2 infection disrupts circulating leukocyte distributions. We therefore repeated the cis-eQTM analysis while additionally adjusting for neutrophil, lymphocyte, monocyte, eosinophil, and basophil proportions obtained from clinical blood counts. Consistent with this premise, a direct comparison of the measured cell proportions revealed marked baseline differences between severity groups: the SC group exhibited a higher neutrophil fraction and a lower monocyte fraction than the MM group, whereas lymphocyte, eosinophil, and basophil proportions were indistinguishable (Supplementary Fig. 5). Around a week after admission, the distributions of all five cell types converged, and no between-group differences persisted (Supplementary Fig. 5), indicating that the leukocyte imbalance in severe disease is a transient feature that resolves between the acute and recovery phases. After incorporating these cell-type covariates, 29 of the original 30 CpG-gene pairs lost statistical significance, demonstrating that most apparent methylation-expression associations were confounded by cell-type heterogeneity. The single CpG that retained significance, chr6:36697843, remained inversely associated with the expression of its neighboring gene, FKBP5 (Fig. 2C, D; Supplementary Table 4). The persistence of the chr6:36697843-FKBP5 cis-eQTM after rigorous adjustment establishes FKBP5 as a cell-type proportion-independent epigenetic marker of disease severity.

Resolution of acute-phase epigenetic alterations during COVID-19 recovery

We next examined whether the severity-associated epigenetic alterations observed during the acute phase were reversible during recovery. Specifically, we focused on the 30 CpGs and their corresponding 16 genes identified from the cis-eQTM analysis. Methylation and expression profiles were examined in the hospitalized MM and SC groups, an independent convalescent cohort, and healthy controls. A heatmap of the methylation β-values across all COVID-19 patients revealed a distinct group-wise separation during the acute phase, with SC patients displaying pronounced hyper- or hypo-methylation relative to MM group (Fig. 3A). Notably, this severity-dependent stratification diminished substantially during the recovery phase, and methylation profiles became more similar between the SC and MM groups (Supplementary Fig. 6). To further assess the restoration trajectory, we visualized the dynamics using principal component analysis (PCA). The DNA methylation PCA (Fig. 3B) revealed that while MM group largely overlapped with convalescent and healthy controls, the SC group exhibited only a partial shift, especially along PC2. In the gene expression PCA (Fig. 3C), both SC and MM groups shifted almost completely toward the healthy cluster by 2–3 weeks post-infection, indicating rapid normalization of transcriptional responses (Supplementary Fig. 7). Taken together, although directional recovery was evident, the residual separation of DNA methylation levels from the healthy baseline implies that full epigenetic reversion may require a longer time course, especially for those with severe disease.

Fig. 3: Multiomic Landscape of COVID-19 Recovery in Mild-Moderate and Severe-Critical Groups.

A Heatmap depicting DNA methylation profiles of cis-eQTM markers in COVID-19 patients across infection phases (acute vs. recovery) and severity groups (Mild-Moderate vs. Severe-Critical). Differentially methylated CpG sites are indicated by color: hypermethylated sites are shown in red and hypomethylated sites in blue. Infection phases are represented as acute (pink) and recovery (light blue), while severity groups are differentiated by color: Mild-Moderate (dark green) and Severe-Critical (dark red). The β-value scale, ranging from 0 to 100, indicates the proportion of methylation at each CpG site. B Principal component analysis (PCA) of DNA methylation restoration between the acute and recovery phases in COVID-19 patients and healthy controls. C PCA of gene expression restoration between acute and recovery phases in COVID-19 patients. Arrows track group-level transitions, with ellipses denoting 3 standard deviations from the group mean. Methyl. Status Methylation Status.

To examine individual-level changes, we analyzed the methylation difference between the acute and recovery phases for each patient in the SC group (Supplementary Fig. 8). Most patients exhibited significant β-value shifts across both hypermethylated and hypomethylated CpGs, consistent with the overall trend of epigenetic normalization. One patient (C19-C058) was an exception. Unlike other patients in the SC group, this patient showed virtually no methylation change between the two time points. Clinical metadata indicated that C19-C058 was classified as “Critical”, had a high Charlson Comorbidity Index (CCI = 5), and remained on oxygen therapy at the time of recovery-phase blood collection.

Discussion

Using early‑phase whole blood multi-omics from clinically well-characterized COVID‑19 patients, we mapped CpG‑to‑transcript links with cis‑eQTM analysis and uncovered a focused set of “reactive” CpGs whose methylation shifts reflected both disease severity and subsequent recovery. During the acute phase, coordinated hypo‑ or hyper‑methylation at these sites corresponded to sharp transcriptional changes that distinguished severe‑critical from mild-moderate cases. After correcting for transient leukocyte imbalances, only one CpG–RNA pair, FKBP5, remained independently associated with severity, highlighting it as a core epigenetic signal rather than a by‑product of immune‑cell redistribution. Longitudinal follow‑up demonstrated that both methylation and gene expression at the reactive loci returned to baseline within 4–12 weeks, underscoring the dynamic and largely reversible nature of COVID‑19‑induced epigenetic programming. These findings establish a compact, cell‑type composition–independent CpG-RNA signature that captures the trajectory from acute pathology to convalescence, providing possible prognostic biomarkers with mechanistic insight pertaining to host response and recovery during COVID-19.

By distinguishing between severe and mild cases based on oxygen therapy and intensive care unit (ICU) admission, we explored the molecular underpinnings associated with distinct clinical trajectories. Although there were only ten severe cases in this study, they represented 27% of the data, comparable to the proportion reported in previous large-scale studies. For example, a February 2020 study of 72,314 COVID-19 cases in China found that the majority (81%) of patients experienced mild to moderate symptoms, while 14% suffered from severe respiratory issues, and 5% progressed to critical conditions, including respiratory failure15.

This study is among the first to apply cis-eQTM analysis for dissecting gene regulatory programs linked to COVID-19 severity. We identified six genes with severity-associated methylation and expression changes: SRXN1, FURIN, IL18RAP, FOXO3, and FKBP5, which were up-regulated in concert with hypomethylation, and GCNT4, which was down-regulated. These genes are not only epigenetically regulated but also biologically implicated in pathogenic mechanisms recognized as hallmarks of COVID-19 severity. Notably, SRXN1, which was hypomethylated and up-regulated in severe cases across five distinct CpG loci, encodes Sulfiredoxin 1, a critical antioxidant mitigating reactive oxygen species (ROS) damage. Its dysregulation amplifies oxidative stress, and the gene has been repeatedly reported to be highly expressed in both blood and lung tissues of severe COVID-19 patients16,17,18,19,20. FURIN, another gene up-regulated in severe cases, facilitates viral entry by cleaving the S1/S2 junction of the SARS-CoV-2 spike protein21. Its expression correlates with disease progression and has been identified as a potential target for antiviral therapy22,23. Our results also revealed significant upregulation of IL18RAP and FOXO3 in severe COVID-19 cases. Both genes are mechanistically linked to hyper-inflammatory responses and respiratory dysfunction. IL18RAP, a key mediator of IL-18 signaling, likely contributes to excessive immune activation and cardiopulmonary complications through inflammasome-driven IL-18 release, a pathway implicated in macrophage activation syndrome and multi-organ failure24,25,26. In parallel, FOXO3, which regulates immune balance and oxidative stress, was associated with increased oxygen demand and inflammatory lung injury, suggesting that its overexpression may exacerbate pulmonary damage in severe disease27,28,29.

Interestingly, GCNT4 was the only gene found to be epigenetically downregulated in severe patients. A prior proteomic study reported its elevation in respiratory failure among severe COVID-19 cases30. This discrepancy may reflect differences in regulatory control between the transcriptome and proteome or patient cohort characteristics.

To account for the known shifts in immune cell proportions during acute infection, such as neutrophilia and lymphopenia, we repeated the cis-eQTM analysis while adjusting for estimated immune cell-type proportion. Most CpG–gene associations lost significance following this correction, suggesting that they were biased by inflammation-induced cell redistribution. The persistence of the FKBP5 association after cell-type adjustment suggests that its regulation may reflect a cell-intrinsic epigenetic signal, strongly relevant to the pathophysiology of severe COVID-19. FKBP5 encodes FKBP51, an Hsp90 co‑chaperone that attenuates glucocorticoid‑receptor (GR) signaling31,32. By modulating GR sensitivity it fine‑tunes hypothalamic‑pituitary‑adrenal (HPA) feedback to stress and intersects with NF‑κB/RIG‑I pathways that shape innate‑immune and inflammatory responses33,34. Transcriptomic studies show FKBP5 up‑regulation in corticosteroid‑treated airway epithelium and in brains of fatal COVID‑19 cases35, suggesting that GR‑linked stress and treatment responses converge on FKBP5 during advanced disease, although its causal role in COVID‑19 severity remains inconclusive. While these findings point to a functional association between FKBP5 activity and COVID‑19 pathophysiology, further mechanistic and longitudinal studies are required to clarify whether FKBP5 is a driver of severe COVID-19.

The remaining ten cis-eQTM genes not covered in depth in this study may still contribute to the broader molecular response to SARS-CoV-2 infection. Further investigation is warranted to elucidate their roles, which could reveal indirect effects on disease severity or related complications. Taken together, the blood-based multi-omic markers discovered here appear to reflect pulmonary lesions and associated clinical symptoms. These markers could help predict respiratory complications and monitor disease severity.

In the recovery phase, spanning two to three weeks post-infection, we observed a relatively fast restoration of both the epigenome and transcriptome to levels akin to those of COVID-19 convalescence, across both severe and mild patient cohorts. The gene regulatory mechanisms observed in severe COVID-19 survivors, who avoided fatal outcomes, did not show significant disparities when compared to those in mild cases. This indicates that the identified CpGs are reactive to both the acute infection and recovery over the relatively short time period, reflecting the rapidly evolving pathological trajectory of the patients. Despite these overall trends, our findings also point to heterogeneity in recovery trajectories, particularly among SC patients. In a subset of individuals, epigenetic reversal was less pronounced, and in one notable case (C19-C058), virtually no change in DNA methylation was observed between the acute and follow-up samples. This individual had a high CCI, remained on oxygen therapy at follow-up, and their recovery-phase sample was taken just 10 days after the acute-phase time point. While these clinical factors may explain the apparent lack of epigenetic normalization, the case highlights that not all patients follow the same recovery timeline—and that molecular reversal may lag behind clinical resolution in more complex or prolonged disease courses.

In conclusion, these observations demonstrate that both cell-intrinsic and extrinsic epigenetic markers are responsive to COVID-19 pathophysiology and capable of tracking the acute-to-recovery transition. However, the variability across individuals also points to the importance of integrating clinical context and extending the observation window to fully understand the persistence, resolution, or relapse of molecular alterations—especially in relation to long-term sequelae, such as long-COVID.

Our investigation had several important limitations. First, the severe COVID-19 cases in this study varied in the timing and duration of oxygen therapy following hospitalization (Fig. 1B), which constrains our ability to establish the exact association between the identified biomarkers and disease severity. Second, the limited number of samples in the severe group makes it difficult to draw general conclusions. Third, the lack of standardized control over the recovery phase, which ranged from two to three weeks per patient, introduces variability that may influence the interpretation of results. Additionally, the separate sample collection periods in 2021 and 2022 add complexity, because different SARS-CoV-2 variants (Delta and Omicron) were predominant during these times. The variants differed in transmissibility, virulence, and immune evasion, likely influencing disease severity and treatment response36. Lastly, cytokine measurements were not performed in this study, which limits our ability to directly evaluate the contribution of cytokine-mediated inflammatory responses to disease severity. Given the established role of cytokine storms in severe COVID-19 cases, future studies integrating cytokine profiling with epigenetic and transcriptomic data would offer a more comprehensive understanding of immune dysregulation in acute infection.

Materials and methods

Samples and clinical characteristics

We collected whole blood samples from 46 patients diagnosed with COVID-19 from Ulsan University Hospital (UUH), Ulsan, Republic of Korea. This study has received approval from the Institutional Review Board (IRB) of UUH and Ulsan National Institute of Science and Technology (UNIST) (IRB No.: UUH-2021-04-011-004, UNISTIRB-21-15-A). All ethical regulations relevant to human research participants were followed.

For the hospitalized patient group, whole blood samples were collected twice: once during the acute phase (at the time of hospital admission) and once during the recovery phase (2 to 3 weeks after hospitalization). For the convalescent patient group, whole blood samples were obtained between 4 and 12 weeks after COVID-19 diagnosis. These individuals had been previously hospitalized due to COVID-19 but had fully recovered at the time of sample collection. Sampling took place in two periods, the first in 2021 and the second in 2022. The acute phase of infection was defined as the period from hospitalization due to COVID-19 to one week thereafter, while the subsequent period of up to three weeks was defined as the recovery phase. The healthy control group consisted of 344 whole blood samples (309 samples for methylation, 34 for gene expression studies, and one sample for both omics) from the Korean Genome Project (KGP), approved by the IRB at UNIST in Ulsan, South Korea (IRB No.: UNISTIRB-21-66-A).

Patient classification of COVID-19 severity

Patients were assigned a severity category upon diagnosis according to the Food and Drug Administration (FDA) severity categorization: “Mild”, “Moderate”, “Severe”, or “Critical” (Supplementary Table 5)37,38. For analysis, we further grouped the patients into two discrete groups: MM for the mild and moderate categories and SC for the severe and critical categories. SC patients suffered from respiratory symptoms, with 80% requiring oxygen therapy, such as nasal prong, high-flow nasal cannula, or both, compared to 14.6% of the MM group. In addition, 40% of the SC cases were admitted to the ICU, whereas none of the MM cases were (Table 2).

Table 2 Baseline Characteristics of COVID-19 Hospitalized Patients, Convalescent Patients, and Non-infected Healthy Individuals

Clinical information

We collected clinical data from informed and consenting patients alongside whole blood samples. The dataset included routine laboratory measurements, smoking history, and viral load estimates based on PCR Ct values (Supplementary Fig. 1). To investigate clinical correlates of disease severity, we compared these parameters across patient groups stratified by COVID-19 severity. Clinical variables with insufficient sample size (fewer than three observations in any group) were excluded from downstream analyses to ensure statistical robustness.

Target bisulfite sequencing and methylation quantification

Frozen whole blood samples collected in EDTA tubes (Becton Dickinson, #367856) were used for genomic DNA (gDNA) extraction, performed using the DNeasy Blood & Tissue Kit (Qiagen, #69506) according to the manufacturer’s instructions.

Bisulfite-converted DNA libraries were prepared using the SureSelectXT Methyl-Seq Target Enrichment System Kit (Agilent, #G9651), following the manufacturer’s protocol. Sequencing was performed on the NovaSeq 6000 platform (Illumina), generating 150 bp paired-end reads.

Raw sequencing reads were processed using Fastp (v0.23.1) to remove adapters and low-quality bases39, with the following parameters: --cut_front 1 --cut_right 0 --cut_tail 1 --detect_adapter_for_pe --trim_front1 15 --trim_front2 15 --trim_tail1 0 --trim_tail2 0 --cut_mean_quality 20 --n_base_limit 1 --average_qual 20 34. Read quality was assessed before and after filtering reads using FastQC (v0.11.9)40.

Filtered reads were aligned to the human reference genome (GRCh38.p13) using Bismark (v0.23.1), a bisulfite-aware aligner41. Methylation calling and quantification were conducted using the methylKit R package (v1.20.0)42. To annotate CpG sites and associate them with proximal genes or regulatory features, we used the annotatr R package (v1.20.0)43.

mRNA sequencing and expression quantification

Frozen whole blood samples collected in PAXgene® Blood RNA Tubes (PreAnalytiX, #762174) were used for total RNA extraction. RNA integrity and concentration were assessed using a Qubit 2.0 Fluorometer (Thermo Fisher Scientific, #Q32866) and the Qubit RNA HS Assay Kit (Thermo Fisher Scientific, #Q32854). For mRNA enrichment, 200 ng of total RNA per sample was processed using the Dynabeads mRNA Purification Kit (Thermo Fisher Scientific, #01152851), which depletes rRNA and isolates polyadenylated transcripts using oligo(dT) beads.

Library preparation was performed using the MGIEasy RNA Directional Library Prep Set (MGI, #MG1000006386), following the manufacturer’s protocol. Fragment size distribution was checked using 4150 TapeStation (Agilent, #G2992A) with the cDNA D1000 ScreenTape (Agilent, #5067-5582). Final libraries were quantitated using the Qubit 2.0 Fluorometer (Thermo Fisher Scientific, Q32866). RNA sequencing was run on DNBSEQ-T7RS (MGI) platform, generating 150 bp paired-end reads.

RNA-seq reads were trimmed and quality-filtered using Fastp (v0.23.1) with the same parameters as for DNA methylation processing. Read quality was verified using FastQC (v0.11.9).

Filtered reads were aligned to the human reference genome (GRCh38.p13) using STAR (v2.7.10b), a splice-aware aligner, with default settings44. Gene-level quantification was performed using RSEM (v1.3.3) using default options45, with gene annotations from GENCODE v42 (GFF3 format). Raw expression values were derived from the expected counts generated by RSEM.

Batch effect correction of gene expression data

To integrate gene expression data from an independent cohort of 35 healthy individuals, we implemented rigorous batch effect correction to ensure comparability. All samples were processed using the same bioinformatic pipeline as the RNA-seq data from COVID-19 patients to avoid potential in silico batch effects. We applied the ComBat-seq method implemented in the sva R package (v3.54.0)46 to remove batch effects arising from differences in sequencing platforms (i.e., NovaSeq 6000 vs. DNBSEQ-T7RS) while preserving the biological condition (i.e., Healthy vs. COVID-19 status).

Normalization of gene expression data

To account for differences in sequencing depth and library composition, raw expected gene counts were normalized using the DESeq2 R package (v1.38.3)47. A variance stabilizing transformation (VST) was subsequently applied to the normalized counts. VST approximates a log2 transformation while decoupling mean-variance relationships of RNA-seq data, thereby improving the accuracy and interpretability of downstream statistical analyses. VST-transformed values were used for pairwise comparisons stratified by COVID-19 severity, allowing expression differences to be interpreted in fold-change-like units. These transformed values were also used for principal component analysis (PCA) to assess global expression patterns and sample clustering.

Discovery of differentially methylated CpGs (DMCs) associated with COVID-19 severity

methylKit (R package; v1.20.0)42 was used to discover differentially methylated CpGs (DMCs) associated with COVID-19 severity, with patient age and sex as covariates. We used a leave-one-out (LOO) approach for cross-validation, repeating DMC discovery with one sample left out in each round. Methylation sites that met the thresholds of absolute methylation difference (|meth.diff|) > 10 and FDR < 0.05 were selected as significant markers of severity. P-values were adjusted with the Benjamini-Hochberg correction. We retained only the markers that were identified in every fold.
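The fold-wise filtering and intersection can be sketched as follows. This is a minimal illustration, not the pipeline used in the study: it assumes each LOO round has already produced a table of methylation differences and FDR values, and the function names, CpG identifiers, and numbers are all hypothetical.

```python
def significant_cpgs(fold_result, min_abs_diff=10.0, max_fdr=0.05):
    """Return CpGs in one LOO round that pass both thresholds.

    fold_result: dict mapping CpG id -> (meth_diff, fdr).
    """
    return {cpg for cpg, (diff, fdr) in fold_result.items()
            if abs(diff) > min_abs_diff and fdr < max_fdr}


def consensus_markers(fold_results):
    """Keep only CpGs that are significant in every LOO round."""
    per_fold = [significant_cpgs(result) for result in fold_results]
    return set.intersection(*per_fold) if per_fold else set()


# Toy results for three LOO rounds (hypothetical values).
toy_folds = [
    {"cpgA": (25.0, 0.001), "cpgB": (-12.0, 0.03), "cpgC": (8.0, 0.001)},
    {"cpgA": (22.0, 0.004), "cpgB": (-11.0, 0.08), "cpgC": (15.0, 0.010)},
    {"cpgA": (27.0, 0.002), "cpgB": (-13.0, 0.02), "cpgC": (14.0, 0.020)},
]
markers = consensus_markers(toy_folds)
```

In the toy data only `cpgA` survives: `cpgB` misses the FDR threshold in the second round and `cpgC` misses the difference threshold in the first, which shows how requiring agreement across all folds removes markers that depend on a single sample.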

Leukocyte-adjusted discovery of DMCs associated with COVID-19 severity

We repeated the LOO-based DMC discovery as described above, this time adjusting for age, sex, and blood cell-type composition (%), including neutrophils, lymphocytes, monocytes, basophils, and eosinophils. This analysis included only samples with available cell count measurements recorded during clinical data collection, comprising 24 MM and 9 SC patients. The significance thresholds and marker selection criteria were identical to those used in the leukocyte-unadjusted analysis.

Discovery of DEGs associated with COVID-19 severity

DEGs associated with COVID-19 severity were identified using the DESeq2 R package (v1.38.3). Given the limited number of SC cases (n = 9), we employed a LOO cross-validation strategy to enhance the robustness of DEG discovery. In each iteration, one SC sample was excluded, and differential expression analysis was performed between the remaining SC and mild cases. This procedure was repeated nine times, each time omitting a different SC sample. Genes were considered significantly differentially expressed if they satisfied the following thresholds: absolute log₂ fold change (|log₂FoldChange|) > 1.3 and false discovery rate (FDR) < 0.05. To define robust DEGs (DEG-LOOs), we retained only those genes that exhibited consistent directionality (either upregulation or downregulation in SC patients) in at least seven of the nine LOO iterations.
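The selection of DEG-LOOs can be sketched as a vote across iterations. The sketch assumes that a gene counts toward a direction only in iterations where it also passes both significance thresholds; that reading is an assumption, as are the function name, gene labels, and toy values.

```python
from collections import Counter


def robust_degs(fold_results, min_abs_lfc=1.3, max_fdr=0.05, min_folds=7):
    """Return {gene: "up" | "down"} for genes significant in the same
    direction in at least `min_folds` LOO iterations.

    fold_results: list of dicts mapping gene -> (log2_fold_change, fdr).
    """
    up, down = Counter(), Counter()
    for result in fold_results:
        for gene, (lfc, fdr) in result.items():
            if fdr < max_fdr and abs(lfc) > min_abs_lfc:
                (up if lfc > 0 else down)[gene] += 1
    robust = {}
    for gene in set(up) | set(down):
        if up[gene] >= min_folds:
            robust[gene] = "up"
        elif down[gene] >= min_folds:
            robust[gene] = "down"
    return robust


# Nine toy iterations (hypothetical): geneX passes in 8, geneY in 6, geneZ in 7.
toy_iterations = [
    {
        "geneX": (2.0 if i < 8 else 1.0, 0.01),
        "geneY": (-1.5, 0.01 if i < 6 else 0.2),
        "geneZ": (-2.0, 0.01 if i < 7 else 0.2),
    }
    for i in range(9)
]
selected = robust_degs(toy_iterations)
```

Here `geneX` and `geneZ` are retained as up- and downregulated respectively, while `geneY` is dropped because it reaches significance in only six iterations.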

Gene ontology enrichment of DMCs and DEGs

To interpret the biological functions linked to differential DNA methylation and gene expression in relation to COVID-19 severity, Gene Ontology (GO) enrichment analysis was conducted separately for genes linked to DMCs and for DEGs. GO enrichment analysis was performed using ShinyGO v0.82 (https://bioinformatics.sdstate.edu/go82/)48, focusing on the Biological Process (GO:BP) category. The gene universe was defined as all genes expressed in the RNA-seq dataset (for DEGs) or all genes targeted by measured CpGs (for DMCs). Statistical significance was assessed using a hypergeometric test with Benjamini–Hochberg correction. GO terms with FDR < 0.05 were considered significantly enriched.
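For readers unfamiliar with the test, the sketch below shows a one-sided hypergeometric enrichment P-value and the Benjamini-Hochberg adjustment in plain Python. It illustrates the statistics only and is not ShinyGO's implementation; the function names and example numbers are hypothetical.

```python
from math import comb


def hypergeom_enrichment_p(overlap, universe, annotated, query):
    """P(X >= overlap) when `query` genes are drawn without replacement from
    a universe of `universe` genes, `annotated` of which carry the GO term."""
    upper = min(annotated, query)
    favourable = sum(comb(annotated, i) * comb(universe - annotated, query - i)
                     for i in range(overlap, upper + 1))
    return favourable / comb(universe, query)


def benjamini_hochberg(pvalues):
    """Return BH-adjusted P-values in the original order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest P-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, pvalues[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


# Hypothetical example: 12 of 100 query genes carry a term that annotates
# 200 of the 15,000 genes in the universe.
p_term = hypergeom_enrichment_p(overlap=12, universe=15000, annotated=200, query=100)
fdr = benjamini_hochberg([0.01, 0.04, 0.03, 0.20])
```

The choice of universe matters because it sets the expected overlap: restricting it to expressed genes or to genes covered by measured CpGs, as described above, avoids inflating enrichment for terms whose genes could never have been detected.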

Prioritization of severity-associated CpG sites via cis-eQTM (expression quantitative trait methylation) analysis

We performed an eQTM analysis to assess the relationship between CpG methylation and gene expression (DESeq2-normalized read counts) for the DMCs and DEGs discovered in the previous steps (Supplementary Fig. 9). Spearman correlation, computed with scipy.stats (v1.4.1), was used to quantify this relationship, with a significance threshold of absolute correlation coefficient > 0.5. Gene coordinates and transcription start sites (TSS) were extracted from GENCODE v42 (GTF format). For cis-association analysis, we restricted the search space to CpG sites located within a ±1 Mbp window of the corresponding gene’s TSS. This window was selected to capture local regulatory effects of CpG methylation on gene expression.
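A minimal sketch of the cis-eQTM search is given below. It uses a plain-Python Spearman coefficient (Pearson correlation of average ranks) so that the example is self-contained, whereas the analysis itself used scipy.stats. The data layout, function names, and identifiers such as `cpg1` and `GENE1` are hypothetical.

```python
import math


def average_ranks(values):
    """Rank values from 1..n, assigning tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def spearman_rho(x, y):
    """Spearman correlation: Pearson correlation of the average ranks."""
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    vx = sum((a - mx) ** 2 for a in rx)
    vy = sum((b - my) ** 2 for b in ry)
    return cov / math.sqrt(vx * vy) if vx > 0 and vy > 0 else float("nan")


def cis_eqtm(cpgs, genes, methylation, expression,
             window=1_000_000, min_abs_rho=0.5):
    """Return (cpg, gene, rho) for same-chromosome pairs whose CpG lies
    within `window` bp of the gene TSS and whose |rho| exceeds the threshold."""
    hits = []
    for cpg, (cpg_chrom, cpg_pos) in cpgs.items():
        for gene, (gene_chrom, tss) in genes.items():
            if cpg_chrom != gene_chrom or abs(cpg_pos - tss) > window:
                continue
            rho = spearman_rho(methylation[cpg], expression[gene])
            if abs(rho) > min_abs_rho:
                hits.append((cpg, gene, rho))
    return hits


# Toy inputs (hypothetical): per-sample values listed in the same sample order.
toy_cpgs = {"cpg1": ("chr1", 1_200_000), "cpg2": ("chr1", 5_000_000),
            "cpg3": ("chr2", 1_000_500)}
toy_genes = {"GENE1": ("chr1", 1_000_000)}
toy_meth = {"cpg1": [10, 20, 30, 40, 50], "cpg2": [10, 20, 30, 40, 50],
            "cpg3": [10, 20, 30, 40, 50]}
toy_expr = {"GENE1": [9.0, 7.5, 7.0, 5.2, 4.1]}
pairs = cis_eqtm(toy_cpgs, toy_genes, toy_meth, toy_expr)
```

Only `cpg1` is paired with `GENE1` in the toy data: `cpg2` lies 4 Mbp from the TSS and `cpg3` is on a different chromosome, so neither is tested even though their methylation values are identical.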

Statistics and reproducibility

Welch’s t-test (a two-sample independent t-test assuming unequal variances) was performed to test for differences in mean methylation values between groups at different infection stages using scipy.stats (v1.4.1). The normality of the omics data distributions was checked with the Shapiro-Wilk test using scipy.stats (v1.4.1) (Supplementary Tables 6 and 7). P-values were adjusted with the Benjamini-Hochberg approach using statsmodels.stats (v0.13.2). The Wilcoxon rank-sum test, implemented in scipy.stats (v1.4.1), was used to test for statistically significant differences in gene expression and clinical values across severities. Principal components for DNA methylation and gene expression were computed using sklearn.decomposition.PCA (v0.23.2). All visualizations were drawn using matplotlib (v3.5.3) and seaborn (v0.11.0). The sample sizes for each analysis are indicated in the figure legends and supplementary tables. No separate replicate experiments were conducted. Instead, we assessed the robustness of our marker selection by applying a LOO cross-validation approach during the identification of DMCs and DEGs. Detailed descriptions of the LOO procedure are provided in the respective sections for DMC and DEG analyses.
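To make the unequal-variance test concrete, the sketch below computes the Welch t statistic and the Welch-Satterthwaite degrees of freedom from first principles. It is an illustration with hypothetical names and toy values; the analysis itself relied on scipy.stats, where `ttest_ind(a, b, equal_var=False)` is the corresponding call.

```python
import math
from statistics import mean, variance


def welch_t(a, b):
    """Return (t, df) for Welch's two-sample t-test.

    Each group contributes its own sample variance, so the groups do not
    need to share a common variance.
    """
    va = variance(a) / len(a)  # squared standard error of mean(a)
    vb = variance(b) / len(b)  # squared standard error of mean(b)
    t = (mean(a) - mean(b)) / math.sqrt(va + vb)
    # Welch-Satterthwaite approximation to the degrees of freedom.
    df = (va + vb) ** 2 / (va ** 2 / (len(a) - 1) + vb ** 2 / (len(b) - 1))
    return t, df


# Toy methylation percentages for two groups (hypothetical values).
group_a = [1, 2, 3, 4, 5]
group_b = [2, 4, 6, 8, 10]
t_stat, dof = welch_t(group_a, group_b)
```

The two-sided P-value is then obtained from a t distribution with `dof` degrees of freedom, which is generally non-integer and smaller than the pooled value of n₁ + n₂ − 2 when the variances differ.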

Reporting summary

Further information on research design is available in the Nature Portfolio Reporting Summary linked to this article.