Introduction

Globally, millions of people were infected with the severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2), causing coronavirus disease 2019 (COVID-19)1,2. COVID-19 is now recognized as a multi-organ disease with a broad spectrum of manifestations3 and is associated with substantial morbidity and mortality1.

Post COVID-19 condition, an important sequela of SARS-CoV-2 infection, is defined as symptoms that appear three months after the onset of the infection, either as new symptoms following initial recovery from an acute COVID-19 episode or as persisting symptoms from the initial illness, lasting for at least two months. Symptoms cannot be explained by an alternative diagnosis, may fluctuate or relapse and generally have an impact on everyday functioning4. A prevalence of 32–46% of COVID-19-related symptoms at three months5,6,7 and a prevalence of 41% at > 12 months post SARS-CoV-2 infection have been reported6. In a population-based study, 50.7% of study participants reported at least one COVID-19-related symptom at nine months post infection8. A variety of persistent symptoms in post COVID-19 condition were reported including fatigue, headache, attention disorder, hair loss, dyspnoea, cough, chest pain, myalgia, joint pain, impaired mobility, cognitive impairment, olfactory and gustatory dysfunction, sleep disorder, depression, anxiety9. The most commonly reported symptoms one year after the infection were fatigue, dyspnoea, sleep disorders and myalgia5. These symptoms decreased physical and mental health-related quality of life and could reduce independence in activities of daily living7,9.

Exercise plays an important role in the prevention, but also in the therapy of numerous diseases10,11,12. A protective effect of regular physical activity against a severe course of the disease in the case of an infection with SARS-CoV-2 has been suggested in retrospective observational studies. Persistent inactivity, on the other hand, may intensify COVID-19 related symptoms13,14. The guideline on Long-/Post-COVID of the German Society of Pneumology and Respiratory Medicine and other German societies recommended a controlled instruction in physical activity or a dosed physical training for patients with post COVID-19 symptoms such as fatigue, cough and concentration problems15. A recently published report and meta-analysis by Torres and Gradidge16 suggested that exercise rehabilitation interventions improved cardiorespiratory fitness and pulmonary function, functional and physical capacity and health related quality of life in patients with post COVID-19 condition. Various types of rehabilitation interventions were implemented in the studies reviewed, including aerobic exercise, flexibility, proprioception, breathing, respiratory exercise, muscular endurance exercise, gymnastic and balance. Telehealth and home-based exercise programs likewise demonstrated beneficial effects. The authors emphasized the need for future studies with robust randomised controlled trials, as most of the included studies in the report had no control group. Furthermore, as most of the studies investigated a multi-disciplinary approach to rehabilitation, more research on the effectiveness of specific, independent interventions with a detailed description on volume and pattern of progression was encouraged16,17. The need for quality reporting in exercise interventions in health and disease to replicate the exercise program in different clinical settings was highlighted in another recent systematic review (‘If exercise is medicine, why don’t we know the dose?’)18. 
Evidence on post COVID-19 condition > 12 months after infection remains limited, as most studies have investigated the condition 3 to 12 months after infection. Even though some studies investigated telehealth and home-based exercise programs, there is still limited knowledge on the effectiveness of unsupervised interventions, especially in specific, independent exercise interventions.

The objective of the Physical activity & post COVID-19 condition (SPOVID) randomized controlled pilot study was to investigate the general feasibility of a 12-week unsupervised aerobic exercise training program in participants with persistent symptoms (fatigue, concentration problems, breathing problems or headache) of moderate severity > 12 months after infection with SARS-CoV-2. The intervention program aimed at meeting the weekly World Health Organization physical activity guidelines19. Adherence to the intervention procedure was the primary feasibility outcome. Multiple symptom-related, psychosocial, spiroergometric and body composition parameters were collected at study baseline and at follow-up examination in order to quantify potential effects of the aerobic exercise training program. This was a strictly explorative pilot study using descriptive analyses to inform future research testing the efficacy of an unsupervised aerobic exercise training program (i.e., no hypotheses were stated, no group differences were tested, no outcome efficacy cut-offs were provided, and no formal a priori feasibility thresholds, e.g., for recruitment, retention or adherence, were defined).

Methods

Study design and study population

The SPOVID pilot study was a randomized controlled intervention study (parallel group design) embedded in the POSTCOVE study, a prospective cohort study of 800 participants. The POSTCOVE study aimed to assess persistent symptoms, overall health status and (sub-)clinical markers for cardiovascular, metabolic and respiratory conditions in participants > 12 months after SARS-CoV-2 infection. Recruitment for the POSTCOVE cohort took place between September 2021 and May 2023. Eligible participants for POSTCOVE were residents of the City of Essen, Germany, aged 18–75 years, who had been registered by local health authorities as having tested positive for SARS-CoV-2 infection, with a date of first infection between February 2020 (the beginning of the first pandemic wave in Essen) and November 2020.

The SPOVID pilot study was explorative in nature; therefore, no formal sample size calculation was performed. Based on the assumption that approximately 10% of the POSTCOVE cohort would meet the specific inclusion criteria for the SPOVID pilot study (see below), it was assumed feasible to recruit 60 participants during the first ~9 months of the POSTCOVE recruitment period. This number of participants has been described as being adequate for pilot studies with the aim of assessing potential feasibility problems with a moderate to high prevalence20.

From the POSTCOVE cohort, all participants aged between 18 and 70 years (n = 479) who had completed study examination by July 22, 2022, were then screened for at least one of the self-reported, on-going COVID-19-related symptoms: fatigue, concentration difficulties, breathing problems or headache (see Fig. 1). Of these, participants with sufficient German language proficiency (n = 215) were invited to participate in the SPOVID pilot intervention study. 66 (30.7%) individuals were willing to participate, gave informed consent and underwent their baseline examination (T0) at the Department of Cardiology and Angiology, Elisabeth Hospital in Essen, between May and August 2022. This time frame was chosen to enable all participants to begin the intervention during late spring or summer.

Fig. 1 Flow chart of study population.

At T0, participants received a comprehensive internistic-cardiological health assessment to evaluate general physical resilience and identify potential clinical contraindications for participation in a structured physical intervention program. Further exclusion criteria included a current SARS-CoV-2 infection; being classified as a trained or higher-level athlete according to the participant classification framework proposed by McKay et al. (2022)21; chronic fatigue syndrome; pregnancy or breastfeeding; current in-patient treatment; or unwillingness to begin and consistently adhere to an exercise training program during the study period. If any contraindications were identified, participants were not randomized and were referred for further medical evaluation. Four participants were excluded due to cardiovascular problems. None of the included participants met diagnostic criteria for chronic fatigue syndrome, as none reported post-exertional malaise in particular and all expressed confidence in their ability to follow a regular training program.

Of the 66 participants who gave informed consent to participate in the SPOVID pilot study, 62 met all inclusion criteria and had no medical contraindications (Fig. 1). These participants were individually randomized in a 1:1 ratio to either the intervention or control group, stratified by sex and age group (≤ 60 years/>60 years). Randomization was performed using the online service ALEA22, which employs the minimisation technique described by Pocock and Simon23 that minimizes imbalance in the distributions of treatment numbers within the levels of each individual stratification factor.

A sport scientist, blinded to the results of the medical examination, entered the relevant participant information (sex and age) into the ALEA software tool to determine the random group allocation and then generated an enrolment form containing the participant’s data and allocation result. A total of 31 participants were allocated to the intervention group and 31 to the control group (Fig. 1).

All enrolment forms were securely stored in a folder accessible only to the sport scientist responsible for randomisation. While the sport scientist guiding the intervention and supervising participants, as well as the participants themselves and the data analyst, were aware of the group allocation results, the medical staff conducting the physical examination, including spiroergometry, body composition examination and face-to-face interviews, were blinded to group allocation. Participants were instructed not to reveal their group allocation during the follow-up examination (T1).

T1 examination took place 12 weeks after baseline to assess the effects of the exercise intervention.

The intervention phase and all follow-up examinations were completed by December 2022.

A total of eight participants (n = 5 from the intervention group, n = 3 from the control group) dropped out due to the following reasons: time constraints related to training adherence or diary documentation (n = 2), personal reasons (n = 1), relocation (n = 1), inability to participate in the T1 examination (n = 2), injury unrelated to the intervention program (n = 1), and acute illness unrelated to the intervention program (n = 1). Consequently, 54 participants completed the T1 examination (Fig. 1).

The study was conducted in accordance with the guidelines and recommendations for ensuring Good Epidemiological Practice24, was approved by the ethics committee of the University Duisburg-Essen (approval number: 22-10565-BO), and was monitored throughout its course. The study was performed in accordance with the ethical principles of the Declaration of Helsinki. The study was registered with the German Ministry of Education and Science prior to its start (#FKZ 01EP2104A/B; https://www.gesundheitsforschung-bmbf.de/de/spovid-sport-long-covid-syndrom-14348.php; registration date: January 12, 2021).

Aerobic exercise training program

The intervention group underwent a 12-week, unsupervised aerobic exercise training program focused on low intensities aligning with zones two and three of a five-zone intensity scale25. Individual training zones, along with the corresponding heart rates, were determined based on ventilatory thresholds derived from an incremental exercise test. Training zones two and three and their corresponding exercise doses were positioned below the first ventilatory threshold and between the first and second ventilatory thresholds, respectively. Prior to training initiation, the exercise group received guidance on setting aerobic exercise intensity by the same specially trained sport scientist in a single two-hour session. Subsequently, participants were instructed to independently engage in running or walking-based aerobic exercise sessions three times a week, in accordance with their individual fitness levels and corresponding training zones. Although some participants reported engaging in regular physical activity prior to enrolling in the study (Table 1), the training program still represented an increase in systematically planned, training-related physical activity for all participants. To enhance engagement and training effectiveness, the program included a blend of steady-state and interval training, along with progressive weekly increases in both training volume and intensity. Every fourth week was designed as a recovery week with reduced training volume, establishing an undulatory loading scheme (Supplementary Table S1). Bi-weekly telephone check-ins were utilized for tracking progress throughout the training period.

Table 1 Characteristics of intention-to-treat analysis population at baseline examination (T0).

Participants in the control group were asked to maintain their habitual physical activity patterns between baseline (T0) and follow-up (T1) and were offered the training intervention program after the T1 examination. All participants (intervention and control) documented their training adherence and additional physical activity using an online training diary (REGmon)26.

Parameters

At T0 and T1, standardised face-to-face interviews with a cardiologist were conducted to assess the following clinical parameters: the intensity of persistent self-perceived symptoms > 12 months after SARS-CoV-2 infection (fatigue, concentration difficulties, headache, dyspnea), rated on a 0–10 numeric scale (10 = highest intensity); SARS-CoV-2 re-infection prior to the T0 examination; and current physical performance, rated on a 0–10 scale (0 representing “very restricted” and 10 representing “very powerful”). In addition to past surgeries, accidents, hospitalizations and present disability, the following previous or current comorbidities and risk factors were collected: cardiovascular, metabolic, neurological, mental, gastrointestinal, orthopaedic and other lung disease, asthma and/or allergies, arterial hypertension, diabetes mellitus, dyslipoproteinemia and smoking status (categorized as current, former or never).

At T1, prevalent diseases, accidents, surgeries and disabilities occurring between T0 and T1 were assessed.

At T0, a separate standardized face-to-face interview with a sport scientist collected data on prior sporting activities and post-infection exercise behaviour. Information was also collected on current health status (assessed on a 1–5 scale, 1= “very good” to 5= “poor”), quality of life (assessed on a 1–4 scale with 1= “very bad” to 4= “good”), satisfaction with life in general and with health (assessed on a 1–4 scale with 1= “very dissatisfied” to 4=“very satisfied”), and sleep quality in the past month (assessed on a 1–3 scale with 1=“very good” to 3=“very bad”). Further, depressive symptoms were assessed using the 15-item Centre for Epidemiologic Studies Depression Scale (CES-D) with a range of 0 to 45 points, higher scores indicating greater symptom burden27. Also, physical activity-related health competence (PAHCO) was rated using the 42-item PAHCO questionnaire (condensed into 10 first-order scales and additionally pooled into three second-order scores representing movement competence, control competence and self-regulation competence)28. Symptoms of depression and physical activity-related health competence were collected at T0 and T1 in standardized and validated self-administered questionnaires. Mean scores of the three second-order scores of the PAHCO questionnaire, with a range of 1 to 4, were used in the analysis, with higher values indicating higher competence.

The internistic-cardiological health check to determine general physical resilience and potential clinical contraindications included the following examinations: vital parameter status (heart rate, blood pressure, respiratory rate), auscultation of the heart, abdominal palpation, presence of edema or bowel sounds, NYHA-classification to categorize heart failure, transthoracic echocardiography, 12-channel ECG and body plethysmography to assess pulmonary function. Laboratory analyses were performed to determine a complete blood count and to collect several additional biomarkers such as Troponin T, pro-BNP, e-CRP, D-dimer and 25-OH-Vit.-D status. Parameters of the bioelectrical impedance analysis and spiroergometry on the treadmill ergometer including exercise ECG were analysed to assess the influence of the exercise training program. In the bioelectrical impedance analysis, height, weight, body mass index, skeletal muscle mass and body fat were determined with a standard stadiometer and a bioelectrical impedance analysis system (InBody Deutschland, Eschborn, Germany). To determine peak oxygen consumption (\(\dot {V}\)O2peak), peak power output (Wpeak), peak heart rate (HRpeak), peak respiratory exchange ratio (RERpeak), as well as W and HR at ventilatory thresholds 1 (VT1) and 2 (VT2), W and HR at lactate thresholds, and heart rate recovery during the first 3 min after test cessation (HRR2), a stepwise incremental cycle ergometer test was conducted using a Cyclus 2 ergometer (RBM elektronik-automation GmbH, Leipzig, Germany). The test involved spiroergometry, exercise electrocardiography (ECG), and blood lactate diagnostics. Commencing with an initial resistance of 50 W, resistance increased by 25 W every 3 min until subjective exhaustion. Participants were free to cycle at their preferred pedal rate, as the Cyclus 2 ergometer maintains a constant power condition independent of pedal cadence.

Gas exchange data were continuously collected using a breath-by-breath gas collection system (Metalyzer 3B, Cortex Biophysik GmbH, Leipzig, Germany). Gas calibrations were performed before each test in accordance with the manufacturer’s instructions. A rolling average over 30 s was applied for respiratory data smoothing, and the highest 30-second rolling averages during the test were defined as \(\dot {V}\)O2peak and RERpeak. Wpeak was calculated as follows: Wpeak = Wf + [(t/D x P)], where Wf represents the value of the last completed workload (W), t is the time (s) the last uncompleted workload was sustained, D is the duration (s) of each stage, and P is the power output difference between workloads. W and HR at VT1 and VT2 were determined visually by combining four methods: the ventilatory equivalent method, the excess carbon dioxide method, the V-slope method, and the end-tidal method. Two trained sports scientists independently and blindly assessed each participant’s graphic data, followed by a conference to reconcile any differences and arrive at a consensus for each threshold.
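
The Wpeak formula and the 30-second rolling average described above can be sketched as follows. This is a minimal illustration, not the study's analysis code: the function names, the `(time_s, value)` sample layout and the decision to ignore windows shorter than 30 s are assumptions made here. The default stage duration (180 s) and power increment (25 W) follow the test protocol described in the text.

```python
from typing import List, Optional, Tuple


def peak_power(last_completed_w: float, seconds_in_last_stage: float,
               stage_duration_s: float = 180.0, step_w: float = 25.0) -> float:
    """Wpeak = Wf + (t / D) * P, with t and D in seconds and Wf, P in watts."""
    if not 0.0 <= seconds_in_last_stage <= stage_duration_s:
        raise ValueError("time in the uncompleted stage must lie within one stage")
    return last_completed_w + (seconds_in_last_stage / stage_duration_s) * step_w


def peak_rolling_average(samples: List[Tuple[float, float]],
                         window_s: float = 30.0) -> Optional[float]:
    """Highest mean over any trailing time window of length window_s.

    samples: (time_s, value) pairs sorted by time, e.g. breath-by-breath VO2.
    Windows that would start before the first sample are skipped (assumption).
    """
    best = None
    start = 0
    running_sum = 0.0
    for end, (t_end, value) in enumerate(samples):
        running_sum += value
        # Drop samples that fall outside the window (t_end - window_s, t_end].
        while samples[start][0] <= t_end - window_s:
            running_sum -= samples[start][1]
            start += 1
        if t_end - samples[0][0] >= window_s:
            window_mean = running_sum / (end - start + 1)
            if best is None or window_mean > best:
                best = window_mean
    return best
```

For example, a participant who completed the 150 W stage and sustained the next stage for 90 s would be assigned `peak_power(150, 90)`, i.e. 162.5 W.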

HR was monitored and recorded during the test via a 12-lead ECG (custo cardio 100/ERG BT, custo med GmbH, Ottobrunn, Germany), and HRpeak as well as HRR after the test were determined from the data. Capillary whole-blood samples were obtained from the earlobe before the test, during the last 15 s of each stage, and at the point of exhaustion. These samples were analyzed for lactate (La) using 20-µL capillaries, hemolyzed in 1-mL microtest tubes, and subjected to amperometric-enzymatic analysis using the Biosen C-Line Sport (EKF-diagnostic GmbH, Barleben, Germany). From the resulting lactate values, W and HR at aerobic (2 mmol/l) and anaerobic lactate thresholds (4 mmol/l) were determined, following the methodologies outlined by Mader et al.29 and Heck et al.30.
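
Reading W or HR at the fixed 2 mmol/l and 4 mmol/l lactate thresholds from stage-wise data can be illustrated as below. The sketch assumes linear interpolation between the two stages that bracket the target concentration; the text does not state which interpolation was used, and the function name and data layout are hypothetical.

```python
from typing import List, Optional, Tuple


def value_at_lactate(stages: List[Tuple[float, float]],
                     target_mmol_l: float) -> Optional[float]:
    """Interpolate W (or HR) at a fixed lactate concentration.

    stages: (lactate_mmol_l, value) pairs in test order, one per stage.
    Returns None if the target concentration is never crossed.
    Linear interpolation between stages is an assumption of this sketch.
    """
    for (la0, v0), (la1, v1) in zip(stages, stages[1:]):
        if la0 < target_mmol_l <= la1:
            fraction = (target_mmol_l - la0) / (la1 - la0)
            return v0 + fraction * (v1 - v0)
    return None


# Hypothetical stage data: (lactate in mmol/l, power in W)
example = [(1.0, 50.0), (1.5, 75.0), (2.5, 100.0), (3.5, 125.0), (5.5, 150.0)]
power_at_aerobic_threshold = value_at_lactate(example, 2.0)
power_at_anaerobic_threshold = value_at_lactate(example, 4.0)
```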

Training diary

Participants from both the intervention and the control group maintained an online training diary (REGmon)26, which served as a tool for tracking their adherence to the training plan and to document any training that was completed outside the study conditions. The digital platform enabled participants to log both external and internal training loads, including the date of the training session, the type of activity (such as jogging, walking, cycling, swimming, or rowing), training volume, and training intensity. Training duration was documented in terms of time spent on each exercise session, while mean training intensity was rated by the subjects following each training session using a 10-point category-ratio (CR-10) rating of perceived exertion (RPE) scale31,32,33. The internal training load for each training session was calculated using the session rating of perceived exertion (session-RPE) method32,34. This method involved multiplying the absolute training duration in minutes by the training intensity. The online training diary also served as a repository for recording significant events that could affect training progress. Participants documented occurrences such as injuries, illness, or other relevant circumstances that might influence their training progression.
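
As an illustration of the session-RPE calculation, the sketch below multiplies session duration in minutes by the CR-10 rating and sums the result per training week. The `Session` record and its field names are hypothetical and do not reflect the REGmon data model.

```python
from dataclasses import dataclass
from typing import Dict, Iterable


@dataclass
class Session:
    week: int            # training week (1 to 12)
    duration_min: float  # absolute session duration in minutes
    rpe_cr10: float      # perceived exertion on the CR-10 scale


def session_load(session: Session) -> float:
    """Internal training load = duration (min) x CR-10 rating, in arbitrary units."""
    if not 0.0 <= session.rpe_cr10 <= 10.0:
        raise ValueError("CR-10 rating must lie between 0 and 10")
    return session.duration_min * session.rpe_cr10


def weekly_loads(sessions: Iterable[Session]) -> Dict[int, float]:
    """Sum the session loads within each training week."""
    totals: Dict[int, float] = {}
    for s in sessions:
        totals[s.week] = totals.get(s.week, 0.0) + session_load(s)
    return totals


# Hypothetical diary entries
diary = [Session(1, 30, 3), Session(1, 45, 4), Session(2, 40, 5)]
loads = weekly_loads(diary)
```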

Description of the intention-to-treat, per-protocol and as-treated analysis populations

In the intention-to-treat (ITT) analysis, all participants with non-missing information (n = 54) were included according to their randomization (n = 26 in intervention group, n = 28 in control group) (Table 2). The per-protocol (PP) population consisted exclusively of participants without major protocol deviation within their assigned group, defined as at least 27 documented training sessions within training zones two and three of the five-zone intensity scale25 for intervention group participants and less than 18 documented training sessions for control group participants. Three participants from the intervention group with 18–26 exercise sessions at training zones two and three had to be excluded, because they were not concordant with either of the two groups. Likewise, two participants from the intervention group who stopped training for more than 6 weeks (out of the 12-week training program) due to documented infection or injury, as well as six participants (n = 4 of the intervention group, n = 2 of the control group) with SARS-CoV-2 re-infection between T0 and T1, were excluded from the PP analysis. This resulted in a per-protocol population of 37 participants (n = 14 in intervention group, n = 23 in control group) (Table 2). The as-treated (AT) analysis classified participants according to the actual training they reported rather than the study group they were assigned to, applying the same definition of training volume and intensity for the intervention and control group as in the PP population. Accordingly, three participants randomized to the intervention group reporting less than 18 exercise sessions were assigned to the control group. Conversely, three participants randomized to the control group reporting more than 27 exercise sessions within training zone two or three were assigned to the intervention group. This resulted in an AT population of 43 participants (n = 17 in intervention group, n = 26 in control group) (Table 2).
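
The population definitions can be made concrete with a small sketch. The record fields and function names are hypothetical, and one reading of the text is assumed: a single session count per participant is compared against the thresholds of at least 27 and less than 18, and the exclusions for a training stop of more than 6 weeks and for re-infection between T0 and T1 also apply in the AT population (which is what the reported group sizes imply).

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Participant:
    assigned: str                    # "intervention" or "control" (randomized group)
    qualifying_sessions: int         # documented sessions counted toward the thresholds
    long_training_stop: bool = False # training stopped for more than 6 weeks
    reinfected: bool = False         # SARS-CoV-2 re-infection between T0 and T1


def as_treated_group(p: Participant) -> Optional[str]:
    """Group by reported training; None means excluded from the AT population."""
    if p.long_training_stop or p.reinfected:
        return None
    if p.qualifying_sessions >= 27:
        return "intervention"
    if p.qualifying_sessions < 18:
        return "control"
    return None  # 18 to 26 sessions: concordant with neither group


def per_protocol_group(p: Participant) -> Optional[str]:
    """Keep only participants whose reported training matches their assignment."""
    group = as_treated_group(p)
    return group if group == p.assigned else None
```

Under this reading, a participant randomized to the intervention who documented 10 sessions drops out of the PP population but counts as a control in the AT population.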

Table 2 Description of the population size of the intention-to-treat, per-protocol and as-treated analysis population.

Statistical analyses

Adherence to the intervention procedure was the primary feasibility outcome. This was a strictly explorative pilot study using descriptive analyses. No hypotheses were stated, no group differences were tested, no outcome efficacy cut-offs were provided, and no formal a priori feasibility thresholds were defined. Data were evaluated using intention-to-treat (ITT, n = 26 intervention and n = 28 control group), per-protocol (PP, n = 14 intervention and n = 23 control group) and as-treated (AT, n = 17 intervention and n = 26 control group) analyses (Table 2). As this study was designed as a feasibility study and there was no primary endpoint, all parameters that could be potentially affected by the training intervention were explored in separate statistical models to assess direction and strength of group differences in parameter change between T0 and T1. In the ITT, PP and AT analyses, mean values and standard deviations (SD) were generated for the intervention and control group at T0 and T1. Scores and Likert-scale categorical parameters were treated as continuous parameters in the analyses. Furthermore, the difference of mean values (T1 minus T0) was calculated separately for the intervention and control group, to assess the change in the respective parameter over time in each group. Additionally, linear regression models for each parameter were fitted including a time-dependent dummy variable (1 = T1, 0 = T0), a treatment group dummy variable (1 = intervention group, 0 = control group) and an interaction term between time-dependent and treatment group dummy variables to calculate the difference in differences (DID) with corresponding 95% confidence interval (95%-CI). The DID indicates the difference in change of the intervention group compared to the control group over time.
Taking the different response scale widths (number of scale points ranged from 3 to 46 points) of the analysed parameters into account and for better comparability of the magnitude of effect sizes across parameters, effect parameters and confidence intervals were described per standard deviation (SD) by dividing the DID by the SD of the respective parameter at T0 in the ITT analysis population. Due to the different response scale directions (some descending from “very good” to “bad” and others ascending) positive DID did not automatically represent indication of a stronger improvement in the intervention group. Therefore, the direction of the treatment effect was reported with a “+” in the last column of each table to intuitively spot when the direction of DID went in the direction of the hypothesis (i.e., stronger improvement in the intervention group). Clopper-Pearson 95% confidence intervals were calculated in order to assess whether the proportion of DID consistent with the hypothesis could be expected by chance35. Mean values and difference of mean values were calculated using SAS® software v9.0436 and (SD-scaled) difference in difference with 95%-CI were calculated in R v4.1.237.
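
A minimal sketch of the DID point estimate and its SD scaling is given below. In a linear model that contains only the time dummy, the group dummy and their interaction, the least-squares interaction coefficient equals the difference between the two groups' mean changes, so the point estimate can be computed from the four cell means. The function names are hypothetical, the baseline SD is taken here as the sample SD over both groups pooled (an assumption), and the confidence interval from the regression model is not reproduced.

```python
from statistics import mean, stdev
from typing import Sequence


def difference_in_differences(int_t0: Sequence[float], int_t1: Sequence[float],
                              ctl_t0: Sequence[float], ctl_t1: Sequence[float]) -> float:
    """(T1 - T0 in the intervention group) minus (T1 - T0 in the control group)."""
    change_intervention = mean(int_t1) - mean(int_t0)
    change_control = mean(ctl_t1) - mean(ctl_t0)
    return change_intervention - change_control


def sd_scaled_did(did: float, baseline_values: Sequence[float]) -> float:
    """Express the DID in units of the baseline (T0) standard deviation."""
    return did / stdev(baseline_values)


# Hypothetical symptom ratings on a 0-10 scale
int_t0, int_t1 = [4.0, 6.0], [2.0, 4.0]
ctl_t0, ctl_t1 = [5.0, 5.0], [5.0, 5.0]
did = difference_in_differences(int_t0, int_t1, ctl_t0, ctl_t1)
scaled = sd_scaled_did(did, int_t0 + ctl_t0)
```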

Direction of difference in differences (DID)

The direction of the difference in differences (DID), calculated as (T1-T0 intervention group) - (T1-T0 control group), was indicated with a “+” when the DID was consistent with a stronger improvement in the intervention group: a positive DID for parameters where higher values represent better health/wellbeing/fitness, or a negative DID for parameters where lower values represent better health/wellbeing/fitness. It was indicated with a “-” in all other cases.
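
The sign rule and the Clopper-Pearson interval for the proportion of “+” directions can be sketched as follows. The `higher_is_better` flag and the function names are hypothetical. The exact interval is normally obtained from beta-distribution quantiles; here it is found by bisection on the binomial tail probabilities, a substitution chosen only to keep the example dependency-free.

```python
from math import comb
from typing import Callable, Tuple


def direction_sign(did: float, higher_is_better: bool) -> str:
    """'+' if the DID points to a stronger improvement in the intervention group."""
    improved = did > 0 if higher_is_better else did < 0
    return "+" if improved else "-"


def binom_cdf(k: int, n: int, p: float) -> float:
    """P(X <= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k + 1))


def _solve_increasing(f: Callable[[float], float], target: float) -> float:
    """Bisection for f(p) = target, where f is increasing on [0, 1]."""
    lo, hi = 0.0, 1.0
    for _ in range(60):
        mid = (lo + hi) / 2
        if f(mid) < target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def clopper_pearson(k: int, n: int, alpha: float = 0.05) -> Tuple[float, float]:
    """Exact two-sided (1 - alpha) confidence interval for k successes in n trials."""
    # Lower limit: P(X >= k | p) = alpha / 2
    lower = 0.0 if k == 0 else _solve_increasing(
        lambda p: 1 - binom_cdf(k - 1, n, p), alpha / 2)
    # Upper limit: P(X <= k | p) = alpha / 2, i.e. P(X > k | p) = 1 - alpha / 2
    upper = 1.0 if k == n else _solve_increasing(
        lambda p: 1 - binom_cdf(k, n, p), 1 - alpha / 2)
    return lower, upper
```

Calling `clopper_pearson(k, n)` with the number of “+” directions `k` among `n` parameters gives the interval that can be compared against the 50% expected if directions were arbitrary.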

Results

Description of study population

Table 1 describes the characteristics of the ITT analysis population at T0 stratified by study group. The intervention and control groups consisted of 42% and 57% female participants, respectively. Mean age was 52 years in both groups. A median of 101.3 and 99.1 weeks between SARS-CoV-2 infection and T0 examination was observed for the intervention and control group, respectively. The average time between T0 and T1 examination was slightly higher in the intervention group. Supplementary Figure S1 displays the time between T0 and T1 for each participant stratified by group. Overall, few participants showed strong deviation from the target time of 12 weeks between examinations with more extreme values in the control group. The strongest deviations (16.5, 17.0, 20.9 and 21.1 weeks) in the intervention group were the consequence of documented acute illness (SARS-CoV-2 re-infection and influenza) and injuries for the duration of 2 to 4 weeks during or at the end of the exercise training period. In one case, no training was recorded because of an injury.

Five participants (21%) in the intervention group and 10 participants (36%) in the control group documented a SARS-CoV-2 re-infection before T0 examination (Table 1). Most participants performed regular (68% intervention group, 48% control group) or occasional (20% intervention, 39% control group) former sport activities and 79% in the intervention group and 70% in the control group reported sport activities after their first SARS-CoV-2 infection. There were 7 former competitive athletes in the intervention and 6 in the control group. Mean baseline self-rated quality of life and satisfaction with health were between fair and good; physical activity-related health competence (movement, control and self-regulation competence), SARS-CoV-2-related symptoms, current performance and sleep quality were self-rated as intermediate (Table 3).

Table 3 Baseline (T0) and follow-up (T1) means and standard deviation (SD) in intervention and control group, difference between T1 and T0 within groups, difference in differences of intervention and control group with corresponding 95% confidence intervals (95%-CI) of self-rated symptom-related and psychosocial parameters in intention-to-treat analysis.

Orthopaedic diseases were the most reported comorbidities (previous or current) at T0 with 54% in the intervention and 46% in the control group. The largest difference between groups was noted for asthma and allergies (42% vs. 61%) (Table 1). Acute diseases reported to be most prevalent between T0 and T1 examination were cardiovascular and orthopaedic diseases (Table 4). Four participants of the intervention group and two participants of the control group indicated a SARS-CoV-2 re-infection during the intervention period. More surgeries and hospitalizations not related to the exercise training program occurred in the intervention group between T0 and T1 compared to the control group (Table 4).

Table 4 Disease, surgery, accident and hospitalization of intention-to-treat analysis population reported to be prevalent between baseline (T0) and follow-up examination (T1).

Description of training diary data

Planned quantitative training load data (i.e., number of training weeks, total training sessions, weekly training sessions, total training duration, weekly training duration, average training intensity, total training load, and weekly training load) and actual mean (± SD) training load data of the intervention group and the control group are presented in Supplementary Tables S2–S4. In the ITT analysis, both the total average training load (duration × intensity, determined using the session rating of perceived exertion (session-RPE) method32) of the intervention group and the total average training load of the control group were slightly higher than the planned training load for the intervention group. In the PP analysis and, ultimately, in the AT analysis, larger differences in quantitative training load data were observed between the intervention group and the control group. The average quantitative training load in the PP analysis and, especially, in the AT analysis was clearly higher in the intervention group than the planned training load. In contrast, the control group’s average quantitative training load was notably below the planned training load for the intervention group. However, even though the mean training load data corresponded to the planned training load data, there were substantial SDs in all training load data (i.e., sessions, duration, intensity and training load) in all analysis groups, indicating that a large number of participants did not strictly adhere to the training plan. For example, only 8 (out of 26) of the intervention group participants in the ITT analysis and 10 (out of 17) of the intervention group participants in the AT analysis documented 36 training sessions or more in total according to the training diary entries. Likewise, only 13 of the intervention group participants in the AT analysis reached the planned total training loads.
On the other hand, a few participants far exceeded the planned number of training sessions and the planned total training load, resulting in high mean values in the descriptive training load data.
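The session-RPE training load described above (duration × intensity) can be illustrated with a minimal Python sketch. The diary entries, class name and function names below are hypothetical and are not taken from the study data; the sketch only assumes that duration is recorded in minutes and intensity on a 0–10 perceived exertion scale, and it uses the 36-session threshold mentioned above as the default target.

```python
from dataclasses import dataclass


@dataclass
class Session:
    duration_min: float  # session duration in minutes
    rpe: float           # session rating of perceived exertion (0-10 scale)


def session_load(session: Session) -> float:
    """Session-RPE training load in arbitrary units: duration x intensity."""
    return session.duration_min * session.rpe


def total_load(sessions: list[Session]) -> float:
    """Sum of session loads over all documented sessions."""
    return sum(session_load(s) for s in sessions)


def met_session_target(sessions: list[Session], target: int = 36) -> bool:
    """True if at least `target` sessions were documented."""
    return len(sessions) >= target


# Hypothetical diary with three documented sessions (illustration only).
diary = [Session(30, 3), Session(40, 4), Session(25, 5)]
print(total_load(diary))          # 30*3 + 40*4 + 25*5 = 375
print(met_session_target(diary))  # False, only 3 of 36 sessions documented
```

Because the load is a product, a late or inaccurate intensity entry distorts the load of that session proportionally, which is why delayed diary entries matter for the quantitative data.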

Compliance

Insufficient adherence to the training plan is also supported by the qualitative feedback from participants during the bi-weekly telephone conversations. For instance, most participants frequently engaged in activities other than running or walking-based aerobic endurance exercise, such as cycling, swimming, strength training, or soccer. Furthermore, many participants indicated throughout the unsupervised training period that they did not want to or were unable to engage in interval training or fartlek and, consequently, did not incorporate these into their routines. Overall, both the training documentation in the online training diary and the feedback from the regularly conducted telephone conversations indicated that the majority of participants did not strictly adhere to the guidelines for training intensity, volume, frequency, or type of physical activity, either because they could not, due to diseases or injuries unrelated to the intervention, or because they did not want to. Finally, the training documentation itself must also be questioned, as many participants did not record entries in their online training diary immediately after each session, as prescribed. Instead, they did so only several days to weeks after a training session (sometimes upon request) and then documented multiple sessions at once. This undermines the credibility of the quantitative training load data, especially in terms of reported training intensities and thus the calculated training load.

Outcome description: ITT, PP and AT analysis approaches

Self-rated symptom-related and psychosocial health parameters

In the ITT analysis of the self-rated symptom-related and psychosocial parameters, 11 out of 14 parameters showed a positive direction of DID (marked with a “+”; Table 3). A positive direction means that the DID, calculated as (T1 − T0 intervention group) − (T1 − T0 control group), favors the intervention group: the DID is positive for parameters where higher values represent better health/wellbeing/fitness, and negative for parameters where lower values represent better health/wellbeing/fitness. The strongest SD-standardized DID was observed in headache, with an effect of almost one SD. The absolute DID in headache was −2.54 [−6.05, 0.98] points, resulting from an improvement between T0 and T1 of 2.57 points (range: 0–10 points) on the rating scale in the intervention group and an almost constant average headache symptom rating in the control group. The DIDs of the other parameters with a positive direction were weaker, with non-existent to moderate SD-standardized effect sizes (Table 3).
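As a minimal sketch of the difference-in-differences (DID) calculation defined above, the following Python snippet computes a DID from the within-group changes and checks whether its sign favors the intervention group. The function names are hypothetical. The control-group change of −0.03 points is not reported directly; it is only the value implied by the rounded headache figures above and should be read as an assumption. The SD-standardization is not shown because its denominator is not specified in this section.

```python
def did(change_intervention: float, change_control: float) -> float:
    """Difference in differences: (T1 - T0 intervention) - (T1 - T0 control)."""
    return change_intervention - change_control


def favors_intervention(did_value: float, higher_is_better: bool) -> bool:
    """True if the sign of the DID corresponds to a '+' direction of effect."""
    return did_value > 0 if higher_is_better else did_value < 0


# Headache severity (0-10 rating scale, lower values are better).
# -2.57 is the reported change in the intervention group; -0.03 is the
# control-group change implied by the reported DID (assumption, rounded).
headache_did = did(-2.57, -0.03)
print(round(headache_did, 2))                                     # -2.54
print(favors_intervention(headache_did, higher_is_better=False))  # True
```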

The PP analysis was consistent with the ITT analysis in the direction of DID estimates for most parameters, and effect sizes were slightly stronger in almost all parameters with a positive direction of DID (Table 5). Comparing the AT analysis with the PP analysis, the direction of DID remained unchanged in most parameters and SD-standardized DIDs were slightly lower (Table 6).

Table 5 Baseline (T0) and follow-up (T1) means and standard deviation (SD) in intervention and control group, difference between T1 and T0 within groups, difference in differences of intervention and control group with corresponding 95% confidence intervals (95%-CI) of self-rated symptom-related and psychosocial parameters in per-protocol analysis.
Table 6 Baseline (T0) and follow-up (T1) means and standard deviation (SD) in intervention and control group, difference between T1 and T0 within groups, difference in differences of intervention and control group with corresponding 95% confidence intervals (95%-CI) of self-rated symptom-related and psychosocial parameters in as-treated analysis.

Spiroergometric parameters

In the ITT analysis of spiroergometric parameters, 11 out of 15 parameters showed a positive direction of DID (Table 7). SD-standardized DIDs were very low to moderate, with the strongest in heart rate at the thresholds and in heart rate recovery.

Table 7 Baseline (T0) and follow-up (T1) means and standard deviation (SD) in intervention and control group, difference between T1 and T0 within groups, difference in differences of intervention and control group with corresponding 95% confidence intervals (95%-CI) of spiroergometric and body composition parameters in the intention-to-treat analysis.

In the PP analysis, most parameters showed a direction of DID consistent with the ITT analysis. No clear changes in SD-standardized DID compared to the ITT analysis were observed (Table 8). The direction of DID in the AT analysis was consistent with the PP analysis, and DID changed only slightly, with the strongest changes in peak power (DID: 9.56 [−35.16, 54.28]) and power at the 2 mmol lactate threshold (DID: 10.02 [−28.90, 48.94]) (Table 9).

Table 8 Baseline (T0) and follow-up (T1) means and standard deviation (SD) in intervention and control group, difference between T1 and T0 within groups, difference in differences of intervention and control group with corresponding 95% confidence intervals (95%-CI) of spiroergometric and body composition parameters in per-protocol analysis.
Table 9 Baseline (T0) and follow-up (T1) means and standard deviation (SD) in intervention and control group, difference between T1 and T0 within groups, difference in differences of intervention and control group with corresponding 95% confidence intervals (95%-CI) of spiroergometric and body composition parameters in as-treated analysis.

Body composition parameters

In the ITT analysis of the body composition parameters, weight, BMI and body fat mass indicated a positive direction of DID, with BMI showing the strongest SD-standardized DID (−0.26 [−1.04, 0.53]) (Table 7). In the PP and AT analyses of the bioelectrical impedance analysis data, the direction of DID was consistent with the ITT analysis, with slightly weaker effect sizes (Tables 8 and 9).

The proportion (p̂) of parameters with a positive direction of DID was 0.76 (25 out of 33 parameters) in the ITT analysis, with a corresponding Clopper-Pearson 95% CI of 0.58 to 0.89. This indicated that the observed proportion of positive DID may exceed what would be expected by chance. The proportions of positive DID in the PP (p̂ = 0.67 [0.48, 0.82]) and AT (p̂ = 0.70 [0.51, 0.84]) analyses were slightly lower.
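The Clopper-Pearson interval reported above can be reproduced approximately with a minimal, standard-library-only Python sketch. It inverts the binomial tail probabilities by bisection. The function names are illustrative and are not taken from the study's analysis code, which is not described in this section. Only the ITT counts (25 of 33) are stated in the text, so only those are used.

```python
from math import comb


def binom_cdf(k: int, n: int, p: float) -> float:
    """P(X <= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k + 1))


def solve_increasing(g, target: float, iters: int = 60) -> float:
    """Bisection on [0, 1] for g(p) = target, where g increases with p."""
    lo, hi = 0.0, 1.0
    for _ in range(iters):
        mid = (lo + hi) / 2
        if g(mid) < target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def clopper_pearson(x: int, n: int, alpha: float = 0.05) -> tuple[float, float]:
    """Exact (Clopper-Pearson) confidence interval for x successes in n trials."""
    # Lower bound: the p at which P(X >= x) equals alpha / 2.
    lower = 0.0 if x == 0 else solve_increasing(
        lambda p: 1 - binom_cdf(x - 1, n, p), alpha / 2)
    # Upper bound: the p at which P(X <= x) equals alpha / 2,
    # i.e. P(X > x) equals 1 - alpha / 2.
    upper = 1.0 if x == n else solve_increasing(
        lambda p: 1 - binom_cdf(x, n, p), 1 - alpha / 2)
    return lower, upper


# ITT analysis: 25 of 33 parameters with a positive direction of DID.
lower, upper = clopper_pearson(25, 33)
print(round(lower, 2), round(upper, 2))  # should be close to the reported 0.58 to 0.89
```

Bisection is used here only to avoid a dependency on a statistics library; an inverse beta distribution function would give the same bounds.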

Discussion

Here we report on the limited feasibility of a 12-week unsupervised aerobic exercise training intervention among participants experiencing persistent, moderately severe symptoms > 12 months after SARS-CoV-2 infection. The intervention proved feasible in terms of safety and tolerability, as participants did not report any adverse events or safety concerns related to the intervention. However, feasibility was constrained by low adherence to the training plan and incomplete maintenance of the training diary, which limits the ability to draw firm conclusions regarding efficacy. This exploratory pilot study used descriptive analyses to inform future research testing the efficacy of an unsupervised aerobic exercise training program.

The present study shows weak indications of a potential reduction in self-rated symptom severity and improvement in overall well-being. Across most investigated parameters, including symptom-related, psychosocial, spiroergometric, and body composition outcomes, effect sizes show a positive direction even in the ITT analysis, which represents the most conservative analysis approach. The strongest standardized DIDs are observed for symptom-related and psychosocial parameters. Nonetheless, all explored group differences remain small to moderate, with low precision of effect size estimates (i.e., wide 95% confidence intervals) due to the small sample size. Several factors may contribute to the modest effect sizes in the present study. Because the intervention aimed to prevent initial overstress and exhaustion, the initial training load was kept low. Consequently, participants only reached the recommendations of the World Health Organization (WHO) for physical activity (> 150 min of light to moderate physical activity per week) at the end of the training plan. This low initial training load, defined as intensity × duration, may contribute to the very low effect sizes, particularly in the spiroergometric parameters. Higher training loads, in terms of intensity and/or duration, appear to be necessary to achieve efficacy at the cardiovascular level.

A randomized controlled trial (RCT) by Jimeno-Almazán et al., investigating higher training loads in terms of increased intensities, indicated more substantial improvements in both psychosocial and cardiovascular fitness-related parameters in the intervention group (n = 19) compared to the control group (n = 20) in patients with post-COVID-19 condition. The 8-week supervised resistance and endurance training program demonstrated that interval training twice a week (4–6 × 3–5 min) at a rating of perceived exertion (RPE, according to the 6–20 Borg scale) of 16, coupled with steady-state training (30–60 min) once a week at an RPE of 12, in addition to resistance training, was safe, well-tolerated, and more effective than the control condition, in which participants followed the self-management WHO rehabilitation leaflet38. Additionally, other supervised intervention studies indicated that higher training intensities in the form of high-intensity interval training (HIIT) were safely tolerated and may improve functional capacity in individuals recently hospitalized for severe COVID-1939,40. A higher training load in aerobic exercise training programs could also be achieved through longer training durations.

In the present study, some participants exceeded the planned training duration, resulting in higher mean training loads compared to the planned training loads in the as-treated (AT) analysis population. The mean difference in relative VO2peak (1.8 ml/kg/min) from baseline to follow-up in the present AT population is comparable to the results of the 8-week supervised intervention by Jimeno-Almazán et al. (2.1 ml/kg/min), despite the unsupervised nature and lower intensity of the present intervention.

The present unsupervised intervention reveals limited compliance, as many participants did not strictly adhere to the training plan and failed to consistently maintain the training diary. In general, supervised interventions facilitate higher compliance, likely due to structured oversight and real-time feedback. Other unsupervised intervention studies involving previously hospitalized COVID-19 patients addressed this by monitoring compliance using intensity measurement tools. For example, Li et al. utilized heart rate telemetry devices, while Kortianou et al. measured heart rate and oxygen saturation41,42. Despite the use of objective load parameters to monitor compliance, these unsupervised interventions also showed limited efficacy across most of the investigated parameters, comparable to the findings of the present study.

However, in the RCT by Li et al., an unsupervised, multidisciplinary 6-week exercise program demonstrated superior effects on functional exercise capacity (6-minute walking test), lower limb muscle strength, and physical health-related quality of life in the intervention group (n = 59) compared to the control group (n = 61). No effects were observed on pulmonary function parameters or mental health-related quality of life41. Similarly, the observational study by Kortianou et al. (n = 22) investigated a combination of an unsupervised exercise program (including strength and endurance training) with one-hour supervised tele-rehabilitation sessions every 10 days. This approach resulted in improvements in the analyzed psychological parameters, specifically anxiety, depression, and quality of life, but showed limited effects in most physical performance test parameters42, which is consistent with findings from the present study.

In the intervention described here, no objective intensity measures were employed during training. Participants were instructed to self-monitor their training intensity exclusively using Borg’s CR10 scale. Although rating of perceived exertion (RPE) is a widely used tool to gauge training intensity, its accurate application and interpretation require practice and experience. The use of an RPE scale among inexperienced individuals may therefore have compromised the quality and consistency of training, particularly during interval training sessions, and could have led to inefficient training outcomes33.

In summary, unsupervised aerobic exercise interventions for individuals with post-COVID-19 condition must ensure a sufficient training load without provoking overstress and exhaustion. High compliance is essential in unsupervised interventions, yet it appears to be more achievable in supervised interventions. Further high-quality evidence is required to evaluate the efficacy of unsupervised aerobic exercise training programs in individuals with persistent symptoms after SARS-CoV-2 infection. Future studies should place particular emphasis on monitoring both training load and participant compliance.

The present study, designed as a pilot study to explore adherence to an unsupervised aerobic exercise training program for participants with persistent symptoms > 12 months after SARS-CoV-2 infection, has a limited sample size. Given the broad range of health-related parameters assessed to generate hypotheses for future studies, the statistical power is insufficient to precisely estimate small to moderate group differences. Additionally, a higher dropout rate among female participants in the intervention group leads to a slight difference in sex distribution after randomization, which may affect the comparability of groups.

Another limitation is the absence of an objective measurement tool for the adherence to the predefined exercise intensity in the unsupervised training setting. Other feasibility domains (e.g., recruitment rate, retention, outcome completion, tolerability) were not comprehensively evaluated. There may have been potential isomorphism due to the use of single questions to measure aspects of quality of life in the present analysis. For some participants, the interval between baseline and follow-up examinations exceeds 12 weeks, potentially influencing the consistency and comparability of results. This extended interval is primarily due to acute illness, injuries, or participants’ difficulties in scheduling the follow-up (T1) assessment in time.

This study indicates limited feasibility of a 12-week unsupervised aerobic exercise training intervention for participants with persistent symptoms of moderate severity > 12 months after infection with SARS-CoV-2, as a large proportion of participants in the intervention group did not strictly adhere to the training plan. As a result, statements about the efficacy are restricted. The recommendation for an unsupervised aerobic exercise training intervention in this population is therefore limited. Nonetheless, the present findings offer important insights to inform the design of future studies investigating the efficacy of unsupervised aerobic exercise training programs in individuals with persistent symptoms after SARS-CoV-2 infection.