Introduction

Coronavirus disease (COVID-19) is caused by severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2). This highly contagious respiratory illness, first identified in late 2019 in Wuhan, China, spread worldwide and rapidly evolved into a global pandemic. The pandemic has drawn global attention because of its impact on public health, economies, and daily life1. The clinical outcomes of COVID-19 are highly variable, ranging from asymptomatic or mild flu-like symptoms to severe pneumonia and acute respiratory distress syndrome requiring intensive care2,3. Multiple host factors, including genetics4, epigenetics5, gene expression6, advanced age, gender and multiple pre-existing health conditions7,8,9, are implicated in influencing individual response to SARS-CoV-2. These factors therefore contribute to disease susceptibility and severity, and inform potential treatment approaches. Since the onset of the pandemic, there have been remarkable global efforts to advance our understanding of COVID-19 and the interaction between the virus and the human host, which led to the successful development and implementation of vaccination strategies aimed at controlling the spread of the disease10,11.

The host immunological response to SARS-CoV-2 infection is still not well understood despite these efforts. Other coronaviruses are known to trigger an exuberant host inflammatory response, ultimately resulting in severe lung injury on account of cytokine storm12,13,14,15,16,17, which also appears to be the common cause of mortality in COVID-1918,19. Recent transcriptomic studies of patients with COVID-19 demonstrated a pro-inflammatory host cytokine profile20 and a shift in immune cell populations along with cytokine storm and adaptive immune dysfunction21. Other studies have identified circulating RNA biomarkers22 and highlighted the functional pathways involved in SARS-CoV-2 pathogenesis23. Additionally, studies have revealed that severe and critical cases of COVID-19 exhibit elevated plasma levels of inflammatory cytokines such as IL2, IL6, IL7, IL10 and TNFα in comparison to mild cases24,25, suggesting a crucial role for these cytokines in COVID-19 progression. Postmortem findings from COVID-19 lung biopsies revealed interstitial mononuclear inflammatory infiltrates in lung tissues26. Elucidating the underlying molecular mechanisms of these pathophysiologic differences that contribute to the clinical sequelae of SARS-CoV-2 infection is important to predict disease outcome and tailor treatment strategies. Additionally, accumulated SARS-CoV-2 mutations in the spike protein resulted in the emergence of variants such as Alpha, Delta, and Omicron, with differential transmissibility, pathogenicity, and capacity for immune evasion, thereby altering the epidemiological and clinical landscape of the pandemic and posing specific challenges to clinical management and public health strategies27. This is the first study from Qatar that relies on high-throughput transcriptomic gene expression and clinical data to further our understanding of the underlying molecular processes determining disease severity and mortality in a multiethnic SARS-CoV-2-infected population.

Results

Clinical characteristics of COVID-19 study patients

The study encompassed 228 individuals with a positive SARS-CoV-2 RT-PCR result, featuring a spectrum of clinical presentations ranging from asymptomatic to severe and critical cases (Fig. S1). Notably, 30.7% (n = 70) of the laboratory-confirmed COVID-19-positive patients were asymptomatic, 40.3% (n = 92) were mildly symptomatic or had mild pneumonia, and the remaining 29% (n = 66) were classified as severe or critical, with a documented mortality of 12% (n = 28) (Table 1). Baseline demographic and clinical characteristics are summarized in Table 1 and supplementary Table 1 (additional clinical data).
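The proportions quoted above follow directly from the group counts. A minimal sketch of that arithmetic, assuming only the counts stated in the text (the helper name `percent` is illustrative, not part of the study's analysis):

```python
# Cohort counts as reported in the paragraph above.
TOTAL = 228
groups = {
    "asymptomatic": 70,
    "mild symptomatic or mild pneumonia": 92,
    "severe or critical": 66,
}
DEATHS = 28


def percent(count, total=TOTAL):
    """Return count as a percentage of total, rounded to one decimal place."""
    return round(100 * count / total, 1)


# Share of each severity group and of documented deaths in the whole cohort.
shares = {name: percent(n) for name, n in groups.items()}
mortality_share = percent(DEATHS)

for name, share in shares.items():
    print(f"{name}: {share}%")
print(f"mortality: {mortality_share}%")
```

The three groups add up to the full cohort of 228. Rounded rather than truncated, the middle group comes out at 40.4%, marginally above the 40.3% quoted.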

Table 1 Demographic characteristics of study participants stratified by asymptomatic, mild symptomatic, mild pneumonia, severe and critical COVID-19 cases.

The prevalence of diabetes (53.5%, n = 61/114) and hypertension (58%, n = 67/115) was higher in the severe/critical group. Inflammatory biomarkers such as IL-6 and CRP, known to be predictive markers in COVID-19 patients28, were significantly elevated in the COVID-19 cohort suffering from severe/critical disease (Table 1). Additionally, thromboembolic complications were commonly observed in severe COVID-19 cases, with significantly elevated D-dimer and CRP levels (Table 1).

Comprehensive gene expression profiling in patients with varying COVID-19 severity

Transcriptomic changes in patients with varying clinical severity of SARS-CoV-2 infection were analyzed by RNA-seq, with asymptomatic cases serving as controls. The global patterns of expression values appeared similar and consistent across the samples (Fig. S2). The overall variance in samples measured by principal component analysis (PCA) (Fig. 1A) and hierarchical clustering analysis (Fig. S3) demonstrated discernible gene expression patterns between the different severity groups. Further analysis aimed to identify genes orchestrating the levels of severity in COVID-19 patients. The study assessed significant differentially expressed genes (DEGs) in pairwise comparisons with an adjusted P-value < 0.05 (Table S2-S4). Overall, when compared to asymptomatic cases, 7 genes were differentially expressed in mild symptomatic patients (3 upregulated, 4 downregulated); 520 genes in mild pneumonia patients (446 upregulated, 74 downregulated); 2236 genes in severe patients (1490 upregulated, 746 downregulated); and 2860 genes in critical patients (1828 upregulated, 1032 downregulated) (Fig. S3; Table S2-S4). Additionally, the analysis of DEGs revealed variations in gene expression within peripheral blood mononuclear cells (PBMCs), reflecting changes associated with different levels of COVID-19 severity and identifying both common and unique gene signatures (Fig. 1C & D; Table S5). Notably, gene expression displayed significant variability, with certain genes showing downregulation of up to 30-fold.
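The notion of common and unique gene signatures across comparisons can be illustrated with plain set operations. The sketch below is only an illustration: the gene identifiers are placeholders rather than results from this study, and the function names are hypothetical.

```python
# Hypothetical DEG sets for three comparisons against asymptomatic cases.
# The identifiers g1..g6 are placeholders, not genes reported in the study.
deg_sets = {
    "mild_pneumonia": {"g1", "g2", "g3"},
    "severe": {"g2", "g3", "g4", "g5"},
    "critical": {"g3", "g4", "g6"},
}


def common_signature(sets):
    """Genes present in every comparison (the centre of a Venn diagram)."""
    return set.intersection(*sets.values())


def unique_signature(sets, name):
    """Genes found only in the named comparison."""
    others = [genes for key, genes in sets.items() if key != name]
    return sets[name] - set.union(*others)


common = common_signature(deg_sets)
only_critical = unique_signature(deg_sets, "critical")
print(sorted(common), sorted(only_critical))
```

Running the same two functions separately on up-regulated and down-regulated gene lists would give the two kinds of overlap that a pair of Venn diagrams displays.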

Fig. 1

Landscape of differential gene expression analysis and Partial Least-Square Discriminant Analysis (PLS-DA). (A) PLS-DA score plot for transcriptomics data, showing severity separation along principal component 1 (PC1, 22.3%). (B) Volcano plot of differentially expressed genes between COVID-19 transcriptomic profiles of different severity groups of patients, showing the fold change for each gene and its significance level. Colored dots show genes with an adjusted P-value < 0.05 and an absolute log2 (fold change) > 1. Blue dots represent down-regulated genes, and red dots represent up-regulated genes. Venn diagrams demonstrate the number of overlapping (C) differentially up-regulated genes and (D) differentially down-regulated genes between COVID-19 subjects.
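The colouring rule in the volcano plot caption amounts to a two-threshold classification of each gene. A minimal sketch, assuming the cutoffs stated in the caption (adjusted P-value < 0.05 and absolute log2 fold change > 1); the function name and the example values are hypothetical and do not come from the study's tables:

```python
def classify_gene(log2_fc, adj_p, p_cutoff=0.05, lfc_cutoff=1.0):
    """Label a gene 'up', 'down' or 'ns' from its log2 fold change and adjusted P-value."""
    if adj_p < p_cutoff and abs(log2_fc) > lfc_cutoff:
        # Positive fold change corresponds to red dots, negative to blue dots.
        return "up" if log2_fc > 0 else "down"
    return "ns"


# Illustrative (log2 fold change, adjusted P-value) pairs; not study data.
examples = [(2.5, 0.001), (-1.8, 0.01), (0.4, 0.001), (3.0, 0.2)]
labels = [classify_gene(lfc, p) for lfc, p in examples]
print(labels)
```

A gene must pass both thresholds to be coloured, so a large fold change with a weak adjusted P-value, or a significant but small change, is left unlabelled.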

Identification of differentially expressed genes and partial least-square discriminant analysis (PLS-DA) of COVID-19 patients

In order to highlight the differences between the COVID-19 severity groups, the transcriptomic profiles of each severity group (asymptomatic, mild symptomatic, mild pneumonia, severe and critical) were evaluated against one another, and the DEGs are presented in volcano plots using an adjusted P-value < 0.05 and an absolute log2 (fold change) > 1 as cutoffs (Fig. 1B). The number of genes and their respective fold changes were highest in the comparison between the asymptomatic and critical ICU cohorts. Interestingly, a significant number of genes were downregulated in all symptomatic groups in comparison to asymptomatic controls, including the SARS-CoV-2 entry protein transmembrane protease serine 2 (TMPRSS2), cytotoxic and regulatory T cell molecule (CRTAM) and forkhead box P3 (FOXP3) (Fig. 2A). Conversely, there was significant upregulation of angiotensin-converting enzyme 2 (ACE2) and key proteins in COVID-19 pathogenesis, including arginase 1 (ARG1), Fas-associated death domain protein (FADD), hepatocyte growth factor (HGF) and interleukin 10 (IL10) (Fig. 2A). PLS-DA showed separation according to severity, wherein 22.3% of the variation was attributed to principal component 1 (PC1) and minimal separation was observed along principal component 2 (PC2) (Fig. 1A). Additionally, the variable importance in projection (VIP) analysis identified the top 30 contributing transcripts (VIP ≥ 1.3) (Fig. 2B). These top-ranked genes play an important role in hypoxia, apoptosis as well as innate and adaptive immune responses (Table 2). Transcripts that increased in severe/critical patients but decreased in mild cases included genes related to mitochondrial function, ionic transport and immune regulation (TMEM11, TXN and SLC22A4). Adaptive immunity genes, such as IL10RA, TRIM44, PRKCH, MAPRE2, PEA15, CD4, FYN, NFATC, RFTN1, GATA3, SMAD3, PRKCQ, CX3CR1, and CD247, exhibited suppression with increasing disease severity. Similarly, genes involved in fundamental cellular processes, including gene expression regulation, cell apoptosis, proliferation, and migration, such as PRKX, TSPYL1, SMAD3, EVL, RRP1B, KDM2B, and TSHZ1, were downregulated as disease severity increased. Furthermore, genes implicated in cytoskeletal dynamics, intracellular and vesicular trafficking, and protein degradation, including ARL4C, SLC22A4, EVL, SIDT1, ANKRD40, TMEM11, TIAM1, RASA3 and RNF125, demonstrated a similar pattern of downregulation (Fig. S3; Table S2-S4).

Fig. 2

Comparison of differentially expressed genes. (A) Gene expression of several key proteins. Dot plot displaying the normalized counts of the SARS-CoV-2 entry proteins ACE2 and TMPRSS2 and of the key proteins ARG1, FOXP3, CRTAM, FADD, HGF and IL-10. Asterisks show significant differences (one-way ANOVA test; significance codes: >0.05 ‘ns’, < 0.05 ‘*’, < 0.01 ‘**’, < 0.001 ‘***’, 0 ‘****’). Points represent individual patient data. P-values were calculated using a two-tailed Wilcoxon test. (B) VIP represents the contribution score of the DEGs that contribute most to the differentiation between COVID-19 patient severity groups. The top 30 discriminatory genes (VIP scores ≥ 1.3) are listed from top to bottom, ranked by their VIP scores. Changes in expression for each gene are presented, wherein blue represents a decrease and red represents an increase in transcript expression.

Table 2 Top-ranked transcripts based on partial least-square discriminant analysis (PLS-DA) (Top 30 transcripts, VIP ≥ 1.3).
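For readers unfamiliar with the VIP score used to rank these transcripts, the definition commonly used for PLS models is given below. This is the textbook form, offered as a reference under the assumption that the analysis followed it; the source does not state which implementation was used. Here p is the number of transcripts, A the number of PLS components, w_{ja} the weight of transcript j on component a, and SS_a the sum of squares of the response explained by component a (all symbols are introduced here for illustration).

```latex
\mathrm{VIP}_j \;=\; \sqrt{\, p \,
  \frac{\displaystyle\sum_{a=1}^{A} \mathrm{SS}_a
        \left( \frac{w_{ja}}{\lVert \mathbf{w}_a \rVert} \right)^{2}}
       {\displaystyle\sum_{a=1}^{A} \mathrm{SS}_a} }
```

Because the squared VIP scores average to 1 across all variables, values above 1 are conventionally read as above-average contributors, so the cutoff of 1.3 used for Table 2 is a somewhat stricter choice.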

Functional enrichment analysis of differentially regulated genes

Functional enrichment analysis was performed on the differentially expressed genes (DEGs) to elucidate their roles in specific cellular biological processes using ShinyGO (version 0.8)54. The up-regulated genes in severe and critical patients were enriched in biological processes of “activation of innate immune cells”, specifically neutrophil activation and degranulation, leukocyte degranulation, granulocyte activation and inflammatory response (Fig. 3A). In addition, a series of inflammation-related processes were activated, such as “regulation of acute inflammatory response” (Fig. 3B). In contrast, downregulated genes were mainly enriched in biological processes of “activation of adaptive immune cells”, including cell and biological adhesion, T cell differentiation and activation, and lymphocyte activation (Fig. 3). Furthermore, KEGG pathway analysis performed on DEGs in severe COVID-19 samples identified “Metabolic pathways”, “Cell cycle”, “Th1 and Th2 cell differentiation”, and “Th17 cell differentiation” as enriched pathways (Fig. 4). The up-regulated genes in critical ICU-admitted cases were enriched in the pathways “Cell cycle”, “Ferroptosis”, and “metabolism”, whereas the down-regulated genes in the critical cohort were enriched in cytokine-related pathways such as “Th1 and Th2 cell differentiation”, “Cytokine-cytokine receptor interaction”, “Cell adhesion molecules”, “Th17 cell differentiation”, “Natural killer cell mediated cytotoxicity”, “T cell receptor signaling pathway”, “Antigen processing and presentation”, and “Viral protein interaction with cytokine and cytokine receptor” (Fig. 4). Further enriched pathways are shown in Fig. S4.
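Enrichment of a gene list in a GO term or KEGG pathway is typically assessed with an over-representation test. The sketch below shows the generic hypergeometric form of such a test; it is an illustration of the idea and not necessarily the exact procedure or multiple-testing correction applied by ShinyGO. The function name and all numbers are hypothetical.

```python
from math import comb


def enrichment_p_value(background_size, pathway_size, deg_count, overlap):
    """Upper-tail hypergeometric probability of seeing at least `overlap`
    pathway genes among `deg_count` genes drawn from the background."""
    max_overlap = min(pathway_size, deg_count)
    total = comb(background_size, deg_count)
    tail = sum(
        comb(pathway_size, k) * comb(background_size - pathway_size, deg_count - k)
        for k in range(overlap, max_overlap + 1)
    )
    return tail / total


# Hypothetical example: 20000 background genes, a 150-gene pathway,
# 500 DEGs, 15 of which fall in the pathway. Not study data.
p = enrichment_p_value(20000, 150, 500, 15)
print(f"{p:.3g}")
```

In practice the raw P-values from all tested terms are adjusted for multiple testing before the terms are ranked, which is why the figure legends refer to adjusted P-values.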

Fig. 3

GO-term enrichment analysis of expressed genes. Common genes-based GO-term analysis sorted by adjusted P-values. GO term functional enrichment by Biological Process (BP) was performed for (A-B) up- and down-regulated genes in COVID-19 mild pneumonia patients, (C-D) up- and down-regulated genes in COVID-19 severe patients, and (E-F) up- and down-regulated genes in COVID-19 critical patients.

Fig. 4

KEGG pathway enrichment analysis of expressed genes. Common genes-based KEGG pathway enrichment analysis sorted by adjusted P-values. KEGG pathway enrichment analysis for (A-B) up- and down-regulated genes in COVID-19 severe patients and (C-D) up- and down-regulated genes in COVID-19 critical patients.

Diminished intracellular immunity and potent neutrophils activation gene signature in severe and critical cohort of COVID-19 patients

Previous clinical studies have demonstrated that COVID-19 patients experience a cytokine storm, reminiscent of the response observed in SARS-CoV as well as MERS infections15,24,55,56. To visualize the difference in the immune response between the severity groups, we used the log2 fold change values of the DEGs associated with T cell functionality, neutrophil activation and plasma cytokines to plot a heatmap (Fig. S5 and S6). In severe and critical patients, increased gene expression was observed for granulocyte- and monocyte-associated molecules (eosinophil-derived neurotoxin; RNASE2), and for the neutrophil activation-associated CD177 and neutrophil elastase (ELANE) (Fig. S5A). However, these patients had lower expression of lymphocyte-associated molecules including CD4, CD247, KLRB1, and TCF7 (Fig. S5A). Similarly, a decrease in the expression of several cytokine genes such as CCL4, CXCL8, IFNG, IL17B, IL23A, IL34, and of genes associated with T/NK cell effector functionality including CCL4, CIMAP7, GZMA, GZMK, HCST, ID2, was identified in severe/critical patients. Analysis of interferon (IFN) and pattern recognition receptor (PRR)-induced signaling indicated that antiviral IFN responses and virus-sensing PRR signaling pathways were differentially regulated according to disease severity, with dysregulated expression of type I and III interferons in severe and ICU COVID-19 cases (Fig. S5B and C). Furthermore, platelets in severe/critical COVID-19 patients had a hyperactivated phenotype with altered gene expression (Fig. S5D)57. In agreement with published research, we observed an upregulation of genes related to transforming growth factors, interferons, and interleukins, such as IFITM1, IFITM3 and IL10, in severe and critical cases20,23,58.

Lymphocyte survival genes are dysregulated in severely ill patients infected with SARS-CoV-2

Previous clinical and autopsy reports have indicated reduced lymphocyte counts in critically ill COVID-19 patients with an increase in neutrophil to lymphocyte ratio (Table 1)25,59,60,61. The data from this study reveal that several genes involved in apoptosis and P53 signaling pathways (Fig. S5E and Fig. S6), including BAD, ATF4, CTSB, CTSO, PRF1 and TP53I3, are dysregulated. Interestingly, the macroautophagy machinery is among the most enriched biological processes, implicating it in disease severity. Gene expression analysis within this ontology term accordingly revealed significantly altered autophagosome and autophagy-related genes in patients classified as severe or critical (Fig. S6)62. For instance, autophagy regulatory genes such as MAP1LC3, which positively regulates autophagosome formation63,64, were globally upregulated in severe and critical COVID-19 patients.

Gene expression profiling revealed significant differences in circulating immune cells

The composition of circulating immune cells varies with COVID-19 disease severity42. To further explore this finding in our samples, cell populations were estimated by deconvolution of RNA-seq data (Fig. 5; Fig. S7). Our analysis revealed a higher granulocyte proportion in severe and critical patients, with a lower proportion of lymphocytes and natural killer cells (Fig. 5A). Following deconvolution and adjustment by total granulocyte fraction, an increased proportion of neutrophils and a decreased proportion of eosinophils were observed, while no significant changes were detected in basophils (Fig. 5D and Fig. S7). Furthermore, while no difference in monocytes was observed across severity groups, the monocyte fraction showed a high expression of CD14+ and low expression of CD16+ monocytes, suggesting an increase in the classical monocyte population with increasing severity65 (Fig. S7). In agreement with the clinical data, we observed a decrease in pan T cells (including T cell subsets) and naïve B cells, whereas no change in memory B cells was found (Fig. 5B and Fig. S7). Intriguingly, CD56bright NK cells and disease severity had an inverse correlation66 (Fig. 5C). Moreover, in severe/critical patients a higher percentage of myeloid versus lymphoid origin cells correlated with the clinical data that demonstrated a higher neutrophil to lymphocyte ratio, further corroborating the hyper-inflammatory responses observed in this cohort (Fig. 5D; Table 1).

Fig. 5

Estimated circulating immune cell populations. Proportions of circulating immune cell populations estimated from transcriptome data. (A) The average proportions of blood cells were predicted from transcriptome data using a deconvolution algorithm. (B) Lymphocyte subpopulations expressed as a proportion of the absolute number of lymphocytes. (C) Natural killer cell subpopulations expressed as a proportion of the absolute number of natural killer cells. (D) The proportion of neutrophils and the neutrophil to lymphocyte ratio (NLR) predicted from transcriptome data. The proportion of lymphocytes corresponds to the sum of the proportions of T cells, B cells, and NK cells. Asterisks show significant differences (one-way ANOVA test; significance codes: >0.05 ‘ns’, < 0.05 ‘*’, < 0.01 ‘**’, < 0.001 ‘***’, 0 ‘****’). Points represent individual patient data. P-values were calculated using a two-tailed Wilcoxon test.
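The NLR described in panel (D) can be computed from deconvolved proportions as follows. This is a minimal sketch of the definition given in the legend (lymphocytes as the sum of T, B and NK cell proportions); the dictionary keys, the function name and the example values are hypothetical and do not represent any patient in the cohort.

```python
def neutrophil_lymphocyte_ratio(proportions):
    """Compute NLR from estimated cell-type proportions of one sample."""
    # Lymphocyte proportion as defined in the legend: T + B + NK cells.
    lymphocytes = (
        proportions["T cells"] + proportions["B cells"] + proportions["NK cells"]
    )
    if lymphocytes == 0:
        raise ValueError("lymphocyte proportion is zero; NLR is undefined")
    return proportions["neutrophils"] / lymphocytes


# Illustrative proportions for a single sample; not study data.
sample = {"neutrophils": 0.60, "T cells": 0.20, "B cells": 0.05, "NK cells": 0.05}
nlr = neutrophil_lymphocyte_ratio(sample)
print(round(nlr, 2))
```

Because both numerator and denominator are proportions of the same sample, the ratio is independent of how the remaining cell types are scaled.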

Modular co-expression analyses identified six key modules of hub genes in COVID-19 severity groups

Functional enrichment analysis suggested a systemic change in immune response-associated genes and pathways. To identify key groups of co-expressed genes, namely modules, we performed comprehensive modular co-expression analysis. Our analysis revealed six modules of co-expressed genes that either work together or are similarly co-regulated in our cohort (Fig. 6). Among the six modules, M1 (Fig. 6C) and M2 (Fig. 6D) were significantly downregulated (Fig. 6A), whereas M5 and M6 (Fig. 6E and F) were significantly upregulated in severe and critical groups (Fig. 6A). Over-representation analysis of the six modules uncovered several commonly enriched pathways related to the innate and adaptive immune system, including interferon signaling, cytokine-cytokine receptor interaction, and complement activation and coagulation cascades (Fig. 6C-F), in addition to cell cycle-related pathways (Fig. S8). Within the identified modules, hub genes were defined as the most connected genes, and included genes associated with neutrophil activation, innate immune response to viral infection, cell cycle pathways and ribosomes (Fig. 6B; Table S6). These findings underscore significant molecular alterations in the innate and adaptive immune system, as well as significant changes in cell cycle during SARS-CoV-2 infection.
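Defining hubs as the most connected genes of a module can be illustrated by ranking nodes of an interaction network by degree. The sketch below is a generic degree count over an edge list and is not CEMiTool's own procedure; the gene identifiers and the function name are placeholders.

```python
from collections import Counter


def top_hubs(edges, n=2):
    """Return the n genes with the highest number of connections."""
    degree = Counter()
    for gene_a, gene_b in edges:
        # Each undirected edge adds one connection to both endpoints.
        degree[gene_a] += 1
        degree[gene_b] += 1
    return [gene for gene, _ in degree.most_common(n)]


# Hypothetical module network; g1..g5 are placeholder identifiers.
module_edges = [
    ("g1", "g2"), ("g1", "g3"), ("g1", "g4"),
    ("g1", "g5"), ("g2", "g3"), ("g2", "g4"),
]
hubs = top_hubs(module_edges, n=2)
print(hubs)
```

In this toy network g1 has four connections and g2 has three, so they are returned as the two hubs; node size in an interaction plot would scale with the same degree values.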

Fig. 6

Co-expression gene modules present in blood transcriptomes of COVID-19 patients. (A) Bubble heatmap showing the results of gene set enrichment analysis, indicating the module (M) activity in COVID-19 patients. Circle size and color reflect the normalized enrichment score (NES), as determined by CEMiTool. (B) Hub genes contained in the enriched modules. (C) Interaction plot for M1, (D) M2, (E) M5 and (F) M6, which contain genes enriching different immunity-associated pathways, shown by the GO BP and KEGG pathway bar plots on the right side of panels C-F. The most connected genes (hubs) are highlighted inside rectangles. The node size is proportional to its degree of interactivity. The bar plots indicate the top 10 enriched pathways from the over-representation analysis of modules M1 (C), M2 (D), M5 (E) and M6 (F).

Discussion

This study explores the transcriptome of venous blood from individuals diagnosed with different severity levels of COVID-19 disease. Global RNA-seq was performed on 228 COVID-19-positive patients, including asymptomatic, mild symptomatic, mild pneumonia, severe and critical cases (Table 1 and Fig. S1)67,68. Whereas previous studies have investigated SARS-CoV-2 infection by comparing hospitalized to non-hospitalized patients or ICU to non-ICU cases, and others categorized patients by illness type or severity level, we examined five COVID-19 severity cohorts, in particular distinguishing between severe and critical cases, and including comparisons with asymptomatic patients. In addition, this study includes a diverse multi-ethnic cohort, exploring gene expression and immune responses across different ethnic groups. As the first transcriptomic investigation from the Middle East with large, ethnically varied samples, this research provides valuable insights into how genetic and environmental factors influence COVID-19 severity, to foster the development of inclusive treatment strategies.

Notably, the critical cohort of COVID-19 patients displayed the highest number of differentially expressed genes (DEGs) compared to asymptomatic individuals (Fig. 1). COVID-19 severity was also found to correlate with the expression of proteins that facilitate SARS-CoV-2 viral uptake, such as ACE2 and TMPRSS2 (Fig. 2). The relationship between viral load and disease severity, however, remains unclear according to current literature69,70. The VIP analysis pinpointed the IL-10 receptor as the top predictor of disease severity, showing an inverse correlation and aligning with clinical data that report elevated IL-10 levels in severe cases (Fig. 2)24. Additionally, GO analysis revealed significant enrichment in biological processes related to innate immunity and the complement system, along with notable suppression of lymphocyte function, likely through disrupted T cell signaling/exhaustion and calcium homeostasis. The signaling pathways and genes involved in innate immunity, in particular those related to neutrophil activation, secondary to Th-17 induction, as well as interferon signaling pathways, were upregulated, especially in severe and critical ICU cases of COVID-19 (Figs. 3, 4 and 5 and S5). The cumulative upregulation of these pathways indicates increased cytokine or chemokine activity and the enhancement of inflammatory processes reliant on innate immune responses, consistent with prior literature71,72,73,74,75. The differential responses observed among the groups can be attributed to the variability in disease progression. Patients with more severe or critical disease outcomes exhibited more pronounced type I interferon responses63,64. Furthermore, KEGG pathway analysis demonstrated an upregulation of innate immunity genes linked to cytokine storm, including genes associated with complement and cytokine production such as IL-18, interferon-based antiviral responses, reactive oxygen species, matrix metalloproteinase (MMP) formation, secretory granule formation, and exocytosis (Fig. 4). The identified DEGs align with clinical data showing an elevated total white blood cell count accompanied by an increased number of circulating neutrophils (Table 1), which is supported by DEGs associated with increased granulocyte and neutrophil levels (Fig. 5). Additionally, the study highlighted a decrease in the NK cell population and in the pro-inflammatory CD16high monocyte fraction following pulmonary damage, which worsens with oxygen dependency in ICU patients (Fig. 5). Similarly, Guardela et al. reported 50 genes in a longitudinal study of a small cohort of ICU cases indicating NK cell and monocyte impairment with exaggerated neutrophil responses76, similar to previous studies75,77.

The impact of SARS-CoV-2 infection severity on the impairment of adaptive immune responses has been extensively documented24,42,78,79,80. This study adds to the growing body of evidence on the impaired anti-viral adaptive immune response, in which the lymphocyte count decreases with increasing COVID-19 severity, a decline that is linked to apoptosis (Table 1). As pneumonia manifests in SARS-CoV-2-infected patients, several differentially expressed genes (DEGs) promoting apoptosis are upregulated, while those exhibiting anti-apoptotic properties, including BCL-2 and genes involved in P53 pathways such as CDK6 and CDK2 in cell cycle regulation, are downregulated (Fig. 6). It is well established that IL-10 signaling suppresses activated T cells and neutrophils81,82. The reduced expression of IL-10R in ICU patients may lead to unregulated neutrophil activation. Conversely, lymphocytes that are unresponsive to IL-10 signaling are likely to upregulate pro-apoptotic pathways, contributing to the reduction in CD4 and CD8 T lymphocyte populations in severe and ICU COVID-19 cases. This targeted suppression of T cells (Fig. 5) is likely due to downregulation of genes involved downstream of TCR signaling, such as CD4, FYN, NFATC, PRKCH, MAPRE2, PEA15, GATA3, PRKCQ, and CD247, which limits clonal expansion and induces apoptosis (Fig. 2). Specifically, the downregulation of IL-7R correlates with decreased lymphocyte counts, as IL-7 signaling is crucial for the maintenance of T and B cells83,84. Additionally, in our cohort, genes involved in cytotoxic anti-viral responses, such as granzymes, are significantly downregulated in severely ill COVID-19 patients. The observed decrease in lymphocyte count, together with the expression of genes associated with disrupted T cell activation85, highlights dysregulated effector functions of adaptive T cell-mediated immunity. This includes both the suppression of adaptive immune responses and the induction of early apoptosis, which affects lymphocyte maintenance.

Thromboembolic complications are a major concern in severe COVID-19, with multiple studies reporting high rates of venous and arterial thromboses in hospitalized patients86,87,88,89,90,91. A thromboembolism incidence of 27.6% was observed in ICU patients86, while other studies reported rates as high as 31% despite prophylaxis92. A meta-analysis showed a pooled VTE prevalence of ~31% in critically ill COVID-19 patients, increasing to nearly 50% with systematic screening90. These rates highlight the distinct coagulopathy of COVID-19, characterized by elevated cytokines, endothelial injury, and microthrombi87. Elevated D-dimer levels are strongly associated with disease severity and thrombotic risk93,94. Similarly, elevated CRP levels were linked to higher VTE risk, ICU admission, and mortality, particularly when both markers were elevated93,95. These findings underscore the interplay of inflammation and coagulopathy in severe COVID-19.

SARS-CoV-2 has evolved through mutations, particularly in the spike protein, giving rise to variants with distinct profiles in transmissibility, virulence, and immune evasion. These evolutionary changes have significantly shaped the trajectory of the pandemic, influencing both clinical outcomes and public health responses. In contrast to our data from the first COVID-19 wave, which identified key immune transcriptomic signatures associated with disease severity, later waves were driven by the emergence of major SARS-CoV-2 variants of concern, including Alpha, Beta, Gamma, Delta, and Omicron. Each variant presented unique challenges: Alpha (B.1.1.7) showed increased transmissibility and a higher risk of severe outcomes96, whereas Beta (B.1.351) and Gamma (P.1) exhibited considerable immune evasion97 without clearly increasing virulence. Notably, the Delta variant was linked to increased disease severity and higher hospitalization rates, whereas Omicron (B.1.1.529), with its numerous spike protein mutations, has been associated with widespread reinfections and breakthrough cases due to immune escape27, while generally leading to less severe disease, possibly owing to prior immunity and its preference for upper respiratory tract replication98. The emergence of highly mutated variants has also led to partial escape from immunity acquired through prior infection or vaccination. Future research should aim to incorporate variant-specific immune transcriptomic analyses, thereby enhancing our understanding of severity markers across diverse SARS-CoV-2 lineages and informing precision public health responses.

While this study provides valuable insights, it has limitations. The severe cohort could not be age- and gender-matched with the asymptomatic cases due to the predominance of older individuals among those with severe disease and the higher proportion of males in our sample, a reflection of case presentation and sampling conditions. Future research should incorporate larger, demographically balanced cohorts to enable more accurate stratification. Additionally, the cross-sectional design limits causal inference. Longitudinal studies would provide deeper insights into the progression of immune responses over time and their relationship to disease outcomes. Nevertheless, with a sample size of 228 patients, one of the largest RNA-seq cohorts to date addressing five severity groups from a multi-ethnic population (Table S7) using blood samples, this study is well equipped to capture a substantial amount of biological variability. To address potential confounding effects, we implemented quality control steps, data preprocessing, and normalization. While RNA samples obtained from bedside samples using PAXgene tubes are well preserved, these tubes are not suitable for single-cell RNA sequencing because their bulk RNA stabilization prevents the isolation and analysis of individual cells. Additionally, while the immune cell abundances described in this manuscript have not been validated through flow cytometry, the reliability of the deconvolution methods used has been previously confirmed with flow cytometry measurements99,100. While several studies have addressed related questions, our research uniquely focuses on aspects of the COVID-19 disease spectrum that have not been examined in a single study.

Conclusions

Understanding immune responses in COVID-19 patients is crucial for evaluating treatment efficacy, predicting disease prognosis, and elucidating the variability in disease severity. Despite extensive research on SARS-CoV-2, including efforts in vaccine development aimed at reducing disease severity rather than traditional prevention, the factors driving the clinical spectrum of COVID-19 remain poorly understood. The degree of variability in clinical presentation observed in COVID-19 is exceptional among emerging highly transmissible pathogens, highlighting a unique challenge in understanding and managing COVID-19101. SARS-CoV-2 infection can manifest across a spectrum of illnesses, from asymptomatic or mild flu-like symptoms to severe pulmonary disease, which may be accompanied by significant morbidity and mortality102. Although the pandemic has lasted for four years, our comprehension of the host immune response across the spectrum of SARS-CoV-2 infection remains incomplete. This study contributes to the field by elucidating the transcriptomic differences among individuals with COVID-19, encompassing asymptomatic and mildly symptomatic cases, as well as those with mild pneumonia, severe disease, and critical conditions. To the best of our knowledge, this represents the first large-scale investigation, comprising 228 patients, that delineates the gene expression-based clinical spectrum of COVID-19 within the Gulf region.

Methods

Study design

This cross-sectional study includes 228 of the 279 COVID-19 patients initially recruited by the national public health care provider (Hamad Medical Corporation, HMC) in Qatar between July 2020 and August 2021, identified via public screening programs or hospital admission (Fig. S1). The samples analyzed in this study were collected in Doha, Qatar, between July 14 and October 20, 2020, during the first wave of COVID-19 and prior to vaccination in Qatar. All cases were diagnosed in laboratories reporting to public health care facilities by RT-PCR on throat or nasopharyngeal swabs, using the TaqPath COVID-19 Combo Kit (Thermo Fisher Scientific, Waltham, Massachusetts) or the Cobas SARS-CoV-2 Test (Roche Diagnostics, Rotkreuz, Switzerland). Consenting patients provided 2.5 ml and 4 ml of venous blood in PAXgene and EDTA tubes, respectively, at the time of diagnosis, post-isolation, or hospitalization; 228 of these samples are reported in this research following quality assurance (Fig. S1). The study cohort comprised residents of Qatar of several nationalities, categorized into five groups following the WHO COVID-19 guidelines103: asymptomatic (n = 70), mild symptomatic (n = 41), mild pneumonia (n = 51), severe (n = 33), and critical (n = 33). Hospitalized patients with severe to critical disease received oxygen support in the intensive care unit (ICU); 31 of these patients died of respiratory failure. All participants were followed for a period of 3 months, with ICU mortality assessed at discharge and overall mortality recorded at the end of the follow-up period. Clinical, demographic, and anthropometric data, along with medical history and COVID-19 severity, were retrieved from HMC’s electronic healthcare system (CERNER) with patient consent. The cohort included 188 male participants, with ages ranging from 21 to 87 years.
The study was conducted in accordance with all guidelines and regulations, and the protocol was approved by the Institutional Review Boards (IRBs). None of the patients included had previously diagnosed malignancies. However, pre-existing chronic conditions were noted in some study participants (bronchial asthma in 9 patients, liver cirrhosis in 5, end-stage renal disease in 7, cardiovascular disease in 29, and cerebrovascular disease in 14). Importantly, no other infections were identified in the patients’ clinical records. We had limited access to female patients (n = 39) because of the predominance of male cases and other constraints during sample collection.

Sample preparation and RNA isolation

Patients’ venous blood was collected in PAXgene Blood RNA Tubes (PreAnalytiX) and mixed to ensure complete lysis, and total RNA was extracted and purified using the Blood miRNA kit from Qiagen (Cat. No: 763134, PreAnalytiX GmbH, Hombrechtikon, Switzerland) according to the manufacturer’s instructions. The concentration and quality of RNA were assessed with an Agilent Bioanalyzer 2100 (Agilent Technologies, CA).

Library construction and sequencing

Poly-A-containing mRNA molecules were purified from 1 µg of total RNA using oligo(dT)-attached magnetic beads. The mRNA molecules were fragmented into small pieces using fragmentation reagent, and cDNA was then synthesized using SuperScript III (Invitrogen, MA). Libraries for sequencing were prepared with the mRNA Library Prep Kit (BGI) according to the manufacturer’s instructions, and paired-end sequencing was performed on the DNBSEQ high-throughput platform.

Data analysis

The raw data were subjected to QC analyses using the FastQC tool (version 0.11.9) (http://www.bioinformatics.babraham.ac.uk/projects/fastqc). Gene expression was quantified using HTSeq104. Differentially expressed genes were identified using the DESeq2 pipeline105. P-values were calculated using a paired, two-sided Wilcoxon test, and adjusted P-values (pAdj) were obtained using the Benjamini–Hochberg method. Genes with log2 fold change > 1 or < − 1 and pAdj < 0.05 were considered significant. For the significance of each GSEA category, significantly regulated gene sets were evaluated with the Kolmogorov–Smirnov statistic. Demographic data were analyzed using the Kruskal–Wallis rank sum test and Fisher’s exact test. A value of p < 0.05 was considered statistically significant (*p < 0.001, **p < 0.0001, ***p < 0.00001). Functional enrichment analysis was performed on the list of differentially expressed genes using GO enrichment, Ingenuity Pathway Analysis (IPA), and the clusterProfiler and ShinyGO (version 0.80) R packages54,106. KEGG pathway enrichment analysis was conducted using the R package clusterProfiler to identify significantly enriched biological pathways associated with the gene expression data107,108. Gene co-expression network analysis was performed using CEMiTool109.
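As an illustration of the significance criteria described above, the following minimal Python sketch applies the Benjamini–Hochberg adjustment to a set of raw P-values and then filters genes by |log2 fold change| > 1 and pAdj < 0.05. It is not the pipeline used in this study, which relied on DESeq2 in R; the function names, gene labels, and numerical values are hypothetical and serve only to make the thresholds concrete.

```python
def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(pvalues)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so that adjusted values
    # never increase as the rank decreases (monotonicity step).
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, pvalues[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


def select_degs(genes, log2fc, pvalues, lfc_cutoff=1.0, padj_cutoff=0.05):
    """Keep genes with |log2FC| > lfc_cutoff and BH-adjusted p < padj_cutoff."""
    padj = benjamini_hochberg(pvalues)
    return [
        (gene, lfc, q)
        for gene, lfc, q in zip(genes, log2fc, padj)
        if abs(lfc) > lfc_cutoff and q < padj_cutoff
    ]


# Hypothetical toy input (not data from this study).
genes = ["GENE_A", "GENE_B", "GENE_C", "GENE_D"]
log2fc = [2.3, -1.8, 0.4, 1.2]
pvalues = [0.001, 0.01, 0.02, 0.6]

padj = benjamini_hochberg(pvalues)
degs = select_degs(genes, log2fc, pvalues)
print([gene for gene, _, _ in degs])
```

In this toy input, GENE_C is excluded by the fold-change cutoff and GENE_D by the adjusted P-value cutoff, so only GENE_A and GENE_B are retained.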

Circulating cell populations

A variety of computational methods have recently been developed to estimate cell type composition from bulk RNA sequencing data, a process commonly referred to as decomposition or deconvolution. Widely used tools include MCP-Counter, xCell, and CIBERSORT, though they present certain limitations, including incomplete representation of immune cell types. In this study, we employed the Gene Expression Deconvolution Interactive Tool (GEDIT)110, which infers the proportions of transcriptionally active circulating immune cell types based on gene expression profiles. GEDIT includes 11 distinct reference matrices optimized for different sample types; we utilized the LM22 and ImmunoStates matrices with default settings. A comprehensive description of the GEDIT algorithm is provided in Fig. S9.
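The general idea behind reference-based deconvolution can be sketched as a constrained regression problem: a bulk expression profile is modelled as a non-negative mixture of cell-type reference profiles, and the fitted weights are rescaled to proportions. The minimal Python sketch below solves a small non-negative least-squares problem by projected gradient descent. It is not the GEDIT implementation and omits GEDIT-specific preprocessing; the signature matrix, cell-type labels, and mixing proportions are hypothetical values chosen for illustration.

```python
def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(a * x for a, x in zip(row, vector)) for row in matrix]


def transpose(matrix):
    """Return the transpose of a matrix given as a list of rows."""
    return [list(col) for col in zip(*matrix)]


def nnls_projected_gradient(signature, mixture, n_iter=500):
    """Minimise ||S w - b||^2 subject to w >= 0 by projected gradient descent.

    signature: genes x cell-types reference matrix S
    mixture:   bulk expression vector b (one value per gene)
    """
    sig_t = transpose(signature)
    # Step size 1/L, where L is the squared Frobenius norm of S,
    # an upper bound on the largest eigenvalue of S^T S.
    lipschitz = sum(v * v for row in signature for v in row)
    weights = [0.0] * len(signature[0])
    for _ in range(n_iter):
        residual = [p - b for p, b in zip(matvec(signature, weights), mixture)]
        gradient = matvec(sig_t, residual)
        # Gradient step followed by projection onto the non-negative orthant.
        weights = [max(0.0, w - g / lipschitz) for w, g in zip(weights, gradient)]
    return weights


def to_proportions(weights):
    """Rescale non-negative weights so that they sum to 1."""
    total = sum(weights)
    return [w / total for w in weights] if total > 0 else list(weights)


# Hypothetical reference profiles: 4 marker genes x 3 cell types.
CELL_TYPES = ["neutrophil", "T cell", "monocyte"]
SIGNATURE = [
    [10.0, 1.0, 0.0],
    [1.0, 8.0, 1.0],
    [0.0, 1.0, 9.0],
    [2.0, 2.0, 2.0],
]
TRUE_PROPORTIONS = [0.5, 0.3, 0.2]

# Simulated bulk sample: a noise-free mixture of the reference profiles.
MIXTURE = matvec(SIGNATURE, TRUE_PROPORTIONS)

estimated = to_proportions(nnls_projected_gradient(SIGNATURE, MIXTURE))
for name, value in zip(CELL_TYPES, estimated):
    print(f"{name}: {value:.3f}")
```

Because the simulated mixture is noise-free and the reference profiles are well separated, the estimated proportions recover the simulated ones; with real bulk RNA-seq data the fit is approximate and depends strongly on the choice of reference matrix.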