Introduction

Post-viral syndromes in the coronavirus family were observed in previous diseases related to coronavirus, such as severe acute respiratory syndrome (SARS) and Middle East respiratory syndrome (MERS)1,2,3. Patients reported experiencing fatigue, myalgia, and psychiatric symptoms for as long as four years, and some individuals who survived SARS were symptomatic for up to 7 and 15 years4. A few months into the COVID-19 pandemic, reports of protracted COVID-19 cases began to surface. Initially, healthcare professionals (HCPs) dismissed these concerns, attributing them to anxiety and stress. However, as the frequency of such cases continued to rise, HCPs embarked on a systematic inquiry into the potential enduring consequences of COVID-195,6.

This long-term constellation of symptoms and signs, currently labeled as long COVID syndrome (LC), has become widely recognized among scientific communities. There are various definitions of LC, but the commonly utilized definition includes symptoms persisting beyond three months after the initial onset of the disease7,8. LC represents a major public health challenge, affecting approximately 20% of COVID-19 survivors9. Previous literature shows that 41.7% of patients experience persistent symptoms two years post-infection, with 14.1% unable to return to work10. Chronic fatigue syndrome affects 21.4% of laboratory-confirmed cases, creating a substantial burden on healthcare systems globally11. Less common symptoms encompass mental and cognitive disorders, headaches, joint and chest pains, hair loss, and cardiac and gastrointestinal issues. These symptoms can last six months or more after symptom onset. LC strikes individuals of all disease severities, including mild cases and younger adults who escaped hospitalization and respiratory support12. Even children, including those who had asymptomatic COVID-19, can endure debilitating symptoms for extended periods13.

LC pathophysiology is characterized by interconnected mechanisms that collectively contribute to its multisystem clinical manifestations. Viral persistence involves prolonged SARS-CoV-2 detection in multiple tissue compartments14, while mitochondrial dysfunction results in impaired cellular bioenergetics and oxidative stress15. Concurrently, immune dysregulation manifests through persistent inflammatory responses and autoimmune processes16. Additionally, autonomic nervous system dysfunction presents as cardiovascular dysautonomia affecting heart rate variability17. These mechanisms interact synergistically to produce the complex symptomatology observed in LC patients.

While some studies have attempted to understand the predictors of LC development among COVID-19 patients18,19, most of these studies were small-sized and they did not use a standardized approach to defining LC or confirming COVID-19 infection20. Further investigation and research are needed to understand the range and variability of LC symptoms fully. In our study, we aim to fill the gap in the literature by assessing the prevalence and predictors of LC symptoms, their association with demographic characteristics, COVID-19 severity, and the quality of life of COVID-19 survivors by conducting a global cross-sectional study. Further, identifying patients at risk of LC is important to offer follow-up care and plan population-level public health measures.

Methods

Study settings

From April 2022 to January 2023, we conducted a multinational cross-sectional survey using online questionnaires. Eligibility spanned individuals aged 18 and above in 33 selected countries who had a PCR-confirmed COVID-19 diagnosis. Participants self-reported the timing of their COVID-19 infection and symptom onset, with PCR confirmation required for inclusion. The questionnaire development process is detailed in Supplementary File 1.

Sampling and data source

We used convenience sampling to recruit participants by distributing the online survey. The sample size of 348 was based on a standard calculation assuming an expected proportion (P) of 50%, a common conservative choice that maximizes the sample size with minimal prior assumptions. This was applied per country to ensure representativeness. We did not use different P values for each country to maintain consistency. Since we have 43 predictors in our regression model, we used the Events Per Variable (EPV) method to calculate the minimum total sample size. With 10 EPV, a minimum of 860 participants was required. To adopt a more conservative approach with 20 EPV, a total of 1720 participants was needed from all countries. Collaborators received instructions on the data collection strategy, with a central investigator from each country overseeing the process to ensure balanced participation.
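The EPV figures above can be reproduced with a short calculation. The sketch below is illustrative only: it assumes that the 50% expected proportion is also used as the expected event fraction, which is not stated explicitly in the text but is consistent with the reported totals of 860 and 1720. The function name and its parameters are hypothetical.

```python
import math


def epv_sample_size(n_predictors: int, events_per_variable: int,
                    event_proportion: float) -> int:
    """Minimum total sample size so that the expected number of events
    reaches n_predictors * events_per_variable.

    event_proportion is the assumed share of participants who experience
    the outcome (an assumption here: 0.5, matching the stated P of 50%).
    """
    if not 0 < event_proportion <= 1:
        raise ValueError("event_proportion must be in (0, 1]")
    required_events = n_predictors * events_per_variable
    return math.ceil(required_events / event_proportion)


# 43 predictors, as stated in the text.
for epv in (10, 20):
    print(f"EPV={epv}: minimum N = {epv_sample_size(43, epv, 0.5)}")
```

Under these assumptions, 10 EPV corresponds to 860 participants and 20 EPV to 1720 participants.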

Data collection and handling

We collected participation details, demographics, medical history, COVID-19 infection course, post-COVID symptoms (neurological/cardiac/respiratory/mental health), and quality of life (physical/emotional/social impacts). Symptom assessment focused on those persisting at the time of survey completion. Participants were specifically asked to report symptoms that occurred after COVID-19 infection and persisted for > 12 weeks, distinguishing them from acute illness symptoms. Participants were explicitly asked about PCR confirmation status and symptom duration. In addition, data were obtained on vaccination status, treatments, and long-term effects. Data were gathered through Google Forms and shared via social media (Facebook, Twitter, WhatsApp, and LinkedIn) with repeated postings. Each participant was allowed a single survey response to prevent duplicates. In addition, we manually reviewed the timestamps for any inconsistencies. Responses from each country were automatically translated into English and then reviewed by a bilingual translator. Finally, we compiled all data into a single datasheet for analysis.

Primary outcome

The primary outcome of interest was the identification of predictors of LC in PCR-positive COVID-19 patients. Long COVID syndrome was defined as “Development of signs and symptoms during or after an infection consistent with findings typical of COVID‑19 that continue for more than 12 weeks and cannot be matched to an alternative diagnosis”21,22. This operational definition ensured capture of participants with symptoms consistent with LC, distinct from acute infection symptoms.

Statistical analysis

We used the Kolmogorov–Smirnov Z test to assess normality. Descriptive statistics summarized percentages for categorical variables, and both means (± standard deviation [SD]) and medians with interquartile ranges [IQR] for continuous data. Correlations between continuous variables and LC development were evaluated using point biserial coefficients to identify important variables for subsequent analysis. The Chi-Square test examined associations between the outcome variable and categorical variables. For logistic regression, we tested the linearity of continuous variables using the Box-Tidwell test. We then conducted a univariate logistic regression for reference. To account for intervariable confounding in the multivariate model, a preliminary logistic regression model was built to predict LC development, followed by multicollinearity assessment with variance inflation factors (VIFs). Only one variable from each set of collinear variables was retained in the final model. The model performance was assessed with Tjur’s R2 and the Akaike information criterion (AIC). Analyses were conducted using R (version 4.3.2; R Project for Statistical Computing), with statistical significance set at a two-tailed P value < 0.05.
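For readers less familiar with two of the statistics named above, the following minimal sketch shows how a point biserial coefficient and Tjur’s R2 are defined. It is written in Python for illustration and is not the authors’ R code. The function names and the toy inputs are hypothetical, and the predicted probabilities would in practice come from the fitted logistic model.

```python
import math
from statistics import mean, pstdev


def point_biserial(outcomes, values):
    """Point biserial correlation between a binary outcome (0/1) and a
    continuous variable; equivalent to Pearson's r on the same data."""
    group1 = [x for y, x in zip(outcomes, values) if y == 1]
    group0 = [x for y, x in zip(outcomes, values) if y == 0]
    n1, n0 = len(group1), len(group0)
    n = n1 + n0
    if n1 == 0 or n0 == 0:
        raise ValueError("both outcome groups must be non-empty")
    spread = pstdev(values)  # population SD of the continuous variable
    return (mean(group1) - mean(group0)) / spread * math.sqrt(n1 * n0 / n**2)


def tjur_r2(outcomes, predicted_probs):
    """Tjur's coefficient of discrimination: mean predicted probability
    among cases minus mean predicted probability among non-cases."""
    cases = [p for y, p in zip(outcomes, predicted_probs) if y == 1]
    controls = [p for y, p in zip(outcomes, predicted_probs) if y == 0]
    if not cases or not controls:
        raise ValueError("both outcome groups must be non-empty")
    return mean(cases) - mean(controls)


# Toy data for illustration only (not study data).
toy_outcomes = [1, 1, 0, 0, 0]
toy_probs = [0.6, 0.4, 0.2, 0.3, 0.1]
toy_ages = [45, 52, 30, 28, 35]
print("Tjur's R2:", round(tjur_r2(toy_outcomes, toy_probs), 3))
print("Point biserial r:", round(point_biserial(toy_outcomes, toy_ages), 3))
```

A Tjur’s R2 close to 0 means that the model assigns similar predicted probabilities to participants with and without the outcome, and a value close to 1 means near-perfect separation.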

Ethical considerations

Our study protocol was approved by the Institutional Review Board of the Faculty of Medicine, Tanta University, Gharbia, Egypt (Approval number: 35960/10/22) and by the Faculty of Pharmacy, Applied Science Private University, Amman, Jordan (Approval number: 2022-PHA-30). Informed consent was obtained from all study participants at the start of the online questionnaire after explaining the goal and methods of the study. No personal data were collected. The study was conducted in adherence to the tenets of the Declaration of Helsinki.

Results

Population characteristics

A total of 26,125 participants responded to our questionnaire. Of them, 1,127 were excluded due to incomplete or invalid responses. Of the remaining 24,998 participants with reported COVID-19 infection, we included 11,801 participants who had a PCR-confirmed COVID-19 infection. According to the prespecified criteria, these patients were then classified as those with LC (N = 2335, 19.8%) or without LC (N = 9466, 80.2%). In the total sample, 7196 (61%) were males, and the mean age was 31.4 ± 8.5 years. The most frequently observed symptoms of Long COVID were chest pain (n = 610, 30.0%), shortness of breath (n = 898, 27.1%), dysgeusia (n = 849, 25.5%), insomnia (n = 254, 26.7%), muscle/joint symptoms (n = 1603, 24.4%), fatigue (n = 1622, 24.2%), and gastrointestinal symptoms (n = 1129, 23.3%). Detailed countries and participant numbers are presented in Supplementary Table 1, and the baseline characteristics are illustrated in Tables 1 and 2. Univariate estimates for different predictors are available in Supplementary Table 2.
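The participant flow and the headline proportions reported above can be checked with simple arithmetic. The sketch below uses only counts stated in the text. The variable names and the helper `pct` are hypothetical.

```python
# Counts taken directly from the text.
responded = 26_125
excluded = 1_127
pcr_confirmed = 11_801
with_lc = 2_335
without_lc = 9_466
males = 7_196


def pct(part: int, whole: int) -> float:
    """Percentage of `whole` represented by `part`, to one decimal place."""
    return round(100 * part / whole, 1)


# Internal consistency of the flow.
assert responded - excluded == 24_998
assert with_lc + without_lc == pcr_confirmed

print("With LC (%):", pct(with_lc, pcr_confirmed))
print("Without LC (%):", pct(without_lc, pcr_confirmed))
print("Males (%):", pct(males, pcr_confirmed))
```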

Table 1 Demographic and clinical characteristics of 11,801 COVID-19 patients, stratified by long-COVID status.
Table 2 Prevalence of acute and chronic symptoms among COVID-19 patients with and without long-COVID.

Exploratory data analysis

Ethnicity, number of vaccine doses, and smoking were negatively correlated with the outcome; however, all correlations were weak (Table 3). Seventeen categorical variables, mostly dichotomous, were significantly associated with the development of LC in the single-variable analysis (Tables 1 and 2). The five predictors with the highest odds were the presence of chest pain (OR 2.01; 95% CI 1.80, 2.23), both malaise and muscle symptoms (OR 1.98; 95% CI 1.80, 2.18), pre-existing GI disease (OR 1.92; 95% CI 1.60, 2.29), and shortness of breath (OR 1.83; 95% CI 1.66, 2.01). We excluded the number of vaccine doses from further analysis due to multicollinearity.

Table 3 Correlation coefficients between demographic/clinical factors and long-COVID development.

Predictors of Long COVID

The multivariable logistic regression model revealed 25 significant predictors of Long COVID, with adjusted odds ratios (AOR) as follows. Significant predictors of a higher risk of developing LC in order of effect size magnitude were ICU admission (AOR 2.08; 95% C.I. 1.36, 3.18; P = 0.001), female sex (AOR 1.8; 95% C.I. 1.61, 2.02; P < 0.001), fatigue during the infection (AOR 1.6; 95% C.I. 1.43, 1.78; P < 0.001), identifying as Hispanic (AOR 1.53; 95% C.I. 1.26, 1.85; P < 0.001), pre-existing gastrointestinal disease (AOR 1.48; 95% C.I. 1.22, 1.8; P < 0.001), muscle and joint pain (AOR 1.44; 95% C.I. 1.29, 1.61; P < 0.001), developing shortness of breath (AOR 1.43; 95% C.I. 1.28, 1.6; P < 0.001), dysgeusia (AOR 1.41; 95% C.I. 1.26, 1.59; P < 0.001), chest pain (AOR 1.33; 95% C.I. 1.17, 1.51; P < 0.001), being diagnosed with migraine (AOR 1.3; 95% C.I. 1.13, 1.49; P < 0.001), re-infection with COVID-19 (AOR 1.2; 95% C.I. 1.07, 1.34; P = 0.002) and older age (AOR 1.008; 95% C.I. 1.004, 1.012; P < 0.001).

On the other hand, the following factors were associated with a lower risk of developing LC: identifying as African American (AOR 0.37; 95% C.I. 0.29, 0.47; P < 0.001) or Asian (AOR 0.5; 95% C.I. 0.42, 0.59; P < 0.001), receiving COVID-19 vaccine (AOR 0.62; 95% C.I. 0.54, 0.71; P < 0.001), developing conjunctivitis (AOR 0.68; 95% C.I. 0.53, 0.87; P = 0.002), having Indian origin (AOR 0.68; 95% C.I. 0.52, 0.88; P = 0.004), being a current smoker (AOR 0.71; 95% C.I. 0.53, 0.95; P = 0.022), pre-existing heart disease (AOR 0.73; 95% C.I. 0.54, 0.98; P = 0.041) or anemia (AOR 0.77; 95% C.I. 0.63, 0.93; P = 0.008), and developing rash (AOR 0.77; 95% C.I. 0.61, 0.97; P = 0.028), runny nose (AOR 0.81; 95% C.I. 0.73, 0.91; P < 0.001), cough (AOR 0.85; 95% C.I. 0.77, 0.95; P = 0.003), or fever (AOR 0.86; 95% C.I. 0.77, 0.96; P = 0.005). It is noteworthy that Tjur’s R2 of the model was 0.08, indicating that much of the variability in LC occurrence was not accounted for by the logistic regression model (Table 4).
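As an aid to reading the adjusted odds ratios, the sketch below shows how a reported AOR and its confidence interval map back to a log-odds coefficient and an approximate standard error, and how a per-unit AOR rescales to a larger increment. It rests on two assumptions that the text does not state: that the intervals are Wald-type intervals on the log-odds scale, and that age entered the model per year. The function names are hypothetical.

```python
import math

Z_95 = 1.959964  # standard normal quantile for a two-sided 95% interval


def log_odds_and_se(aor: float, ci_low: float, ci_high: float):
    """Recover the log-odds coefficient and an approximate standard error
    from an odds ratio and its 95% CI (assumes a Wald interval)."""
    beta = math.log(aor)
    se = (math.log(ci_high) - math.log(ci_low)) / (2 * Z_95)
    return beta, se


def scale_odds_ratio(aor_per_unit: float, units: float) -> float:
    """Odds ratio for a `units`-sized increase, given a per-unit odds ratio."""
    return aor_per_unit ** units


# ICU admission, as reported: AOR 2.08 (95% C.I. 1.36, 3.18).
beta, se = log_odds_and_se(2.08, 1.36, 3.18)
print(f"ICU admission: beta = {beta:.3f}, SE = {se:.3f}")

# Age, as reported: AOR 1.008; assumed here to be per year of age.
print(f"Age, per 10 years: AOR = {scale_odds_ratio(1.008, 10):.3f}")
```

On this reading, the small per-year AOR for age corresponds to roughly 8% higher odds per decade, which places it on a scale more comparable with the binary predictors.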

Table 4 Multivariable logistic regression analysis of long-COVID risk factors: adjusted odds ratios for demographic, clinical, and symptom-based predictors.

Discussion

This multinational study of 11,801 PCR-confirmed COVID-19 patients (19.8% developed Long COVID) identified 25 significant predictors through multivariable logistic regression analysis. The strongest risk factors for Long COVID development were ICU admission, female sex, acute-phase fatigue, Hispanic ethnicity, pre-existing gastrointestinal disease, and acute symptoms including muscle/joint pain, shortness of breath, dysgeusia, and chest pain. Conversely, protective factors included African American and Asian ethnicity, COVID-19 vaccination, conjunctivitis, Indian origin, current smoking, and, unexpectedly, pre-existing heart disease and anemia, along with certain acute symptoms like cough and fever. These findings emphasize the heterogeneous nature of Long COVID and highlight the need for personalized risk-stratification approaches to better understand the underlying mechanisms driving symptom persistence in COVID-19 survivors.

Moreno-Pérez and colleagues investigated the prevalence and factors associated with LC. Their study found that LC was detected in approximately 50.9% of patients, with varying incidence rates based on COVID-19 severity. Notably, patients with severe pneumonia had a cumulative incidence of 58.2%, while those with mild pneumonia and without pneumonia had 36.6% and 37.0%, respectively. Unlike in our findings, age, sex, comorbidities, and acute COVID-19 severity were not independent predictors of LC in their cohort. However, lung opacities > 50% and higher heart rates at admission in severe pneumonia cases were identified as independent predictors of LC23.

The extant literature exhibits variability regarding the correlation between the female gender and the presence of LC. Some preliminary investigations have indicated an elevated occurrence of fatigue and other symptoms in women24,25, while other studies have not identified a gender association26,27,28. Discrepancies in ethnicity, geographical location, and socio-economic status could account for these divergent findings. Hormones might contribute to the perpetuation of the hyperinflammatory state experienced during the acute phase even after convalescence29,30, and there is evidence of heightened production of IgG antibodies in females during the early stages of the disease, which could potentially lead to more favorable outcomes for women31. However, this also may play a role in the persistence of disease manifestations.

Similar to our findings, previous literature showed that advanced age was associated with persistent fatigue, musculoskeletal discomfort, and impaired pulmonary function, reflecting a decline in organ performance and a comparatively slow recovery capacity25. Baratta et al. also corroborated the link between disease severity and enduring symptoms32. Nevertheless, instances of LC have previously been documented among non-hospitalized individuals with mild, self-limiting illness23,24,25.

Budhiraja et al. observed that approximately 15% of individuals in their study reported symptoms persisting for over 4 weeks, with 11% experiencing symptoms for more than one year. Specific symptoms differed by vaccination status, with certain symptoms more prevalent among vaccinated individuals and others more frequent among the unvaccinated cohort. However, vaccination overall, including the vaccine type administered, did not significantly influence the prevalence or duration of LC33. In contrast, Viet-Thi et al. reported an association between vaccination and the improvement of LC symptoms34. One hypothesis advanced to explain these observations, previously proposed by Molnar et al., is that the illness is driven by a persisting viral reservoir and/or circulating viral fragments. The COVID-19 vaccine could stimulate the immune system to eradicate these persisting viral antigens, with consequent improvement in symptoms35. An additional survey of 900 LC patients, conducted by a UK-based advocacy group, reported that 56.7% of participants noticed an improvement in their symptoms following their initial COVID-19 vaccine dose. However, the precise underlying mechanism remains to be established36.

The protective association of COVID-19 vaccination may be explained by the vaccine’s ability to reduce viral load and duration of infection, potentially limiting tissue damage and subsequent inflammatory cascades37. The inverse relationship between certain acute symptoms (conjunctivitis, rash, runny nose, cough, and fever) and LC risk might reflect different immunological responses or viral tropism patterns that paradoxically result in more effective viral clearance38. The protective effect observed in current smokers is counterintuitive but has been noted in previous studies and might relate to nicotine’s immunomodulatory effects on ACE2 expression or cytokine responses39,40. Similarly, the protective associations with pre-existing heart disease and anemia could suggest altered baseline inflammatory states or medication effects that inadvertently modify the post-infection immune response41. These findings underscore the complex pathophysiology of LC and highlight the need for mechanistic studies to understand these protective relationships.

Our analysis identified multiple predictors for LC development that included both demographic and disease-specific criteria e.g. clinical manifestations, severity, and re-infection. Interestingly, receiving even one dose of the COVID-19 vaccination was associated with lower odds of developing LC. However, our study was not free of limitations that warrant consideration. First, from a statistical standpoint, our logistic regression model yielded a low pseudo-R2 value (0.08) and a high AIC (10,776). While this preserves the interpretation of predictor direction and relative importance, the low R2 limits the interpretability of the absolute magnitude of the effect sizes, and the high AIC suggests considerable model complexity that may indicate non-linear relationships in the data. Second, the study’s methodological design has inherent constraints. The cross-sectional nature precludes causal inference, and our reliance on a self-report online questionnaire introduces potential for self-selection and recall bias. Although eligibility required self-reported PCR confirmation, these results could not be independently verified. Our distribution method via social media also prevented the calculation of a formal response rate, and the exclusion of incomplete responses may have introduced sampling bias. Finally, the ethnicity-based analyses were not adjusted for country of residence, which may have allowed for residual confounding. Future research is needed to corroborate these findings, potentially with longitudinal designs, and to identify additional predictors that better explain the variance in LC occurrence.

Clinical implications and future directions

The identification of key predictors for LC has critical clinical relevance. Risk stratification based on factors such as ICU admission, gender, acute-phase symptoms, and pre-existing conditions can help clinicians proactively monitor high-risk patients and initiate early supportive interventions. Conversely, recognition of protective factors, such as vaccination, certain ethnic backgrounds, and specific acute symptoms, may inform public health strategies and patient counseling. The observed protective effect of even a single vaccine dose reinforces the need for continued vaccination advocacy, particularly in populations with lower uptake.

Future research should prioritize prospective, longitudinal studies incorporating objective biomarkers, imaging, and validated clinical assessments to overcome the limitations of self-reported, cross-sectional data. Mechanistic studies are essential to elucidate the immunological, virological, and genetic underpinnings of both risk and protective factors, particularly in understanding paradoxical findings such as the protective association with smoking and pre-existing heart disease. Furthermore, predictive models incorporating non-linear dynamics and multi-omic data should be developed to enhance individualized care pathways. Ultimately, a precision medicine approach integrating demographic, clinical, and immunologic profiles is needed to inform targeted therapies and rehabilitation strategies for LC patients.

Conclusion

Our large-scale analysis identified key LC predictors, with ICU admission, female sex, and acute fatigue as primary risk factors, while African American and Asian ethnicities and receiving even one dose of the vaccination showed protective effects. Future research should employ longitudinal designs with objective clinical measures, explore additional biological and psychosocial predictors, and investigate non-linear relationships to better understand long COVID pathogenesis. Priority should be given to developing more robust predictive models that can guide clinical decision-making and targeted interventions for high-risk populations.