Background

In December 2019, a novel coronavirus (SARS-CoV-2) was identified in Wuhan, China. A few weeks later, in January 2020, the World Health Organization (WHO) confirmed human-to-human transmission. The virus then spread rapidly across the planet, leading to the COVID-19 pandemic. By April 2024, this pandemic had resulted in more than 7 million deaths and 770 million infections worldwide1,2, making it the most significant pandemic since the 1918 Spanish flu3.

Coronaviruses can infect different animals and cause moderate to severe respiratory infections in humans. Although most COVID-19 symptoms resembled those of a typical respiratory disease, including fever, fatigue and cough4,5, a significant proportion of infections progressed to a more severe, even critical, form of the disease, potentially involving dyspnea, acute respiratory distress syndrome and multiple organ failure. In addition to the acute form of the disease, the infection could lead to persistent health problems (e.g., fatigue, shortness of breath, and cognitive problems) known as post-COVID-19 syndrome or long COVID3,6.

The pandemic revealed that individuals who contracted the disease were not permanently immunized and could be reinfected after a relatively short period of 7 to 12 months7,8. This variability in reinfection time can be attributed to individual sensitivities, infection severity, and the evolution of the SARS-CoV-2 virus through various variants. These various mutations generate different immune responses that impact the immunity of individuals who have contracted the disease.

In this context, we aimed to study hybrid immunity, the immunity conferred by a combination of infection, reinfection and vaccination, by identifying and characterizing SARS-CoV-2 reinfection profiles. To address the temporal complexity of hybrid immunity, we applied machine learning techniques to data from the Biobanque québécoise de la COVID-19 (BQC19) to group individuals solely according to their temporal pattern of vaccination, infection, and reinfection. We then characterized the resulting groups in terms of sociodemographic and clinical factors to highlight any hidden patterns or characteristics that could lead to similar temporal sequences across our population.

Methods

BQC19 is a multicenter database involving a network of 11 Quebec hospitals with five partner academic institutions and has been described elsewhere9. In brief, this province-wide initiative collects, stores and shares data and blood samples from COVID-19 patients, both severe and non-severe cases. The biobank contains several datasets about participants’ characteristics, events and certain biological data (e.g., RNA, DNA, serum, plasma and peripheral blood mononuclear cells). It also includes longitudinal follow-up for 24 months following hospitalization (inpatient) or PCR testing (outpatient).

This study was approved by the ethics committee of the Centre universitaire de santé McGill within the framework of the project Determining the impact of hybrid immunity on the evolving landscape of host responses to SARS-CoV-2 in the Biobanque Québécoise de la COVID-19 (BQC19), and all methods were performed in accordance with the relevant guidelines and regulations. Each BQC19 enrolling site has established a consent process that reflects the BQC19’s standard operating procedures, and all participants provided informed consent before the start of the study. Full details are available in a previously published article9.

The source population consisted of 6,272 participants included between March 2020 and August 2023. To be included in the study cohort, participants had to 1) be at least 18 years of age, 2) have a documented primary infection with a date, and 3) have a documented secondary infection (reinfection) with a date. For each individual, the longitudinal data available for the study ranged between two months and three years. Because the follow-up period varied between participants, the number of events differed from one individual to another.

Data management was performed through an iterative process of data exploration and processing of the highlighted elements. Data exploration included frequency and mode for categorical variables, and measures of central tendency (mean and median) and dispersion (minimum, maximum, standard deviation (SD) and variance) for numerical variables. Stratified analyses were also performed (e.g., by sex or occupation) with parametric (t-test or ANOVA) or nonparametric (Wilcoxon or Kruskal‒Wallis) statistical tests as required. Post-hoc tests (Tukey’s range test or Dunn’s test) were also conducted when appropriate. Data management iterations also included standardization of value domains and cross-validation between variables to check consistency between data that were correlated or linked by context. The various BQC19 data collections were merged to reduce missing values in the main dataset, mostly for event details. Process mining analysis was applied to support visualization of event sequences and flows10.

To achieve the aim of grouping individuals according to their specific event patterns, variables of interest about infection, reinfection and vaccination were used for the clustering (temporal variables). Firstly, as the date of primary infection did not exist in the initial dataset, it was reconstructed from the date of each participant’s first positive PCR test. Secondly, the reinfection variable had to be reconstructed since the dataset contained two variables with conflicting information (missing or different data). The variable was redefined as follows: 1) the value of the two variables when they were identical, 2) the nonmissing value when one of the variables was null, and 3) the earliest of the dates when the two variables did not contain the same value. In some cases (N = 96), more than one reinfection event was documented, and only the first documented reinfection for each individual was systematically retained. When subsequent observations were not explicitly identified as longitudinal reinfection follow-ups, they were compared with the other available dates and were not considered as a new reinfection if the date of a subsequent observation was within 14 days of the previous reinfection. This period corresponds to the internal delay set by BQC19 to consider a reinfection. Finally, vaccination dates were directly extracted from the dataset.
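The reconstruction rules for the reinfection date can be sketched in a few lines. This is a minimal illustration, not the authors' R code: the function names, the use of `None` for a missing date, and the example dates in the test are assumptions made for the sketch. Only the three reconciliation rules and the 14-day threshold come from the text.

```python
from datetime import date
from typing import List, Optional


def reconcile_reinfection_date(a: Optional[date], b: Optional[date]) -> Optional[date]:
    """Combine two possibly conflicting reinfection dates into one value."""
    if a is None:
        return b  # rule 2: keep the nonmissing value (None if both are missing)
    if b is None:
        return a  # rule 2
    return min(a, b)  # rule 1 when identical, rule 3 (earliest date) otherwise


def distinct_reinfections(dates: List[date], min_gap_days: int = 14) -> List[date]:
    """Keep only observations more than `min_gap_days` after the last retained one."""
    kept: List[date] = []
    for d in sorted(dates):
        # An observation within 14 days of the previous reinfection is not a new one.
        if not kept or (d - kept[-1]).days > min_gap_days:
            kept.append(d)
    return kept
```

Taking the minimum covers both the identical and the conflicting case, so the three rules collapse into two branches on missingness.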

Participants were grouped using an agglomerative hierarchical clustering analysis based on the dissimilarity between their temporal trajectories. Specifically, we first generated a dissimilarity matrix using Dynamic Time Warping (DTW). To do so, we applied DTW on multidimensional time series constructed using the temporal variables of interest (infection, vaccinations and reinfections). Each patient was therefore represented by a matrix containing a time series for each event of interest. These individual matrices served as inputs to the DTW algorithm. A visual representation of the input data and the analytical sequence can be found as Supplementary Figure S1 online. Dynamic time warping was chosen as it was the most suitable method for the type of data in this study, especially as it takes into account temporal deformations in order to align time sequences11.
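To make the dissimilarity step concrete, the following is a minimal pure-Python sketch of multidimensional DTW and the resulting pairwise matrix. It is not the dtw/dtwclust implementation used in the study. The encoding of each participant (one vector of cumulative event counts per time step, in the order infection, vaccination, reinfection), the Euclidean local cost and the unconstrained warping window are all assumptions made for illustration; the actual input layout is shown in Supplementary Figure S1.

```python
import math
from typing import List, Sequence

Series = List[Sequence[float]]  # one feature vector per time step (hypothetical encoding)


def dtw_distance(x: Series, y: Series) -> float:
    """Classic DTW with Euclidean local cost between multidimensional points."""
    n, m = len(x), len(y)
    inf = float("inf")
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = math.dist(x[i - 1], y[j - 1])
            # Extend the cheapest of the three admissible predecessor alignments.
            cost[i][j] = local + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return cost[n][m]


def dissimilarity_matrix(series: List[Series]) -> List[List[float]]:
    """Symmetric matrix of pairwise DTW distances with a zero diagonal."""
    k = len(series)
    dmat = [[0.0] * k for _ in range(k)]
    for a in range(k):
        for b in range(a + 1, k):
            dmat[a][b] = dmat[b][a] = dtw_distance(series[a], series[b])
    return dmat


# Toy trajectories: p2 follows the same order as p1 but more slowly; p3 starts with a vaccine.
p1 = [(1, 0, 0), (1, 1, 0), (1, 1, 1)]
p2 = [(1, 0, 0), (1, 0, 0), (1, 1, 0), (1, 1, 1)]
p3 = [(0, 1, 0), (1, 1, 0), (1, 1, 1)]
dmat = dissimilarity_matrix([p1, p2, p3])
```

In this toy case the distance between p1 and p2 is zero because warping absorbs the extra time step, whereas p3 differs from p1 in event order and therefore keeps a positive distance.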

Hierarchical clustering was then performed using Ward’s minimum variance method on the DTW-based dissimilarity matrix to group participants by minimizing within-cluster variance in their time-aligned trajectories. The number of groups was determined using the average silhouette statistic12. Multiple numbers of groups were compared, and the optimal number was selected based on the highest average silhouette statistic, while ensuring that each group contained enough individuals to allow meaningful interpretation. For each group, a process map was generated to support the visualization of its specific temporal sequence of events (infections, vaccinations and reinfections)13.
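The selection rule described above (highest average silhouette, subject to a minimum group size) can be sketched as follows. The sketch assumes that candidate partitions have already been produced, for example by cutting a Ward dendrogram at different heights; the Ward step itself is not reproduced here. The function names, the `min_size` parameter and the tie-breaking toward the smaller number of groups are illustrative assumptions, since the text does not state a numeric size threshold.

```python
from collections import Counter
from typing import Dict, List, Optional, Sequence, Tuple


def average_silhouette(dmat: Sequence[Sequence[float]], labels: Sequence[int]) -> float:
    """Average silhouette width from a precomputed dissimilarity matrix (needs >= 2 clusters)."""
    clusters: Dict[int, List[int]] = {}
    for idx, lab in enumerate(labels):
        clusters.setdefault(lab, []).append(idx)
    total = 0.0
    for i, lab in enumerate(labels):
        own = clusters[lab]
        if len(own) == 1:
            continue  # a singleton contributes a silhouette of 0 by convention
        a = sum(dmat[i][j] for j in own if j != i) / (len(own) - 1)
        b = min(
            sum(dmat[i][j] for j in members) / len(members)
            for other, members in clusters.items()
            if other != lab
        )
        if max(a, b) > 0:
            total += (b - a) / max(a, b)
    return total / len(labels)


def pick_partition(
    dmat: Sequence[Sequence[float]],
    candidates: Dict[int, List[int]],
    min_size: int,
) -> Tuple[Optional[int], float]:
    """Return the number of groups with the best silhouette among balanced partitions."""
    best_k, best_score = None, float("-inf")
    for k, labels in sorted(candidates.items()):
        if min(Counter(labels).values()) < min_size:
            continue  # skip partitions containing a group too small to interpret
        score = average_silhouette(dmat, labels)
        if score > best_score:  # strict comparison keeps the smaller k on ties
            best_k, best_score = k, score
    return best_k, best_score
```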

To describe the different groups obtained from the cluster analysis based on the temporal sequence and highlight potential characteristics leading to similar sequences, we used variables related to sociodemographic characteristics (e.g., age, sex at birth, BMI), participants’ condition, habits and environment (e.g., smoking and drug use status, occupation, household information) and the context of the contagion (e.g., number of reinfections, number of vaccines received when infection or reinfection occurs, wave of the pandemic, predominant variant). However, as variant sequencing data were only available for a small proportion of the cohort (n = 20), the actual variant was consequently not included in the variables of interest. We instead used waves defined according to an internal BQC19 timeline to ensure comparability with other related work. This timeline was visually compared to publicly available data (e.g., Our World In Data14,15), and no substantial differences were observed (see Supplementary Figure S2 online). Finally, delays between each pair of events were calculated and used to describe each cluster. All statistical analyses previously mentioned were used to describe each variable stratified by cluster.
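As an illustration of the delay calculation, the sketch below computes the number of days between every pair of dated events for one participant. The event labels and the dictionary layout are hypothetical; the study's actual variable names are not given in the text.

```python
from datetime import date
from itertools import combinations
from typing import Dict, Tuple


def pairwise_delays(events: Dict[str, date]) -> Dict[Tuple[str, str], int]:
    """Delay in days between each pair of events, ordered from earlier to later."""
    ordered = sorted(events.items(), key=lambda item: item[1])
    # combinations() preserves the chronological order, so delays are never negative.
    return {
        (first, second): (second_date - first_date).days
        for (first, first_date), (second, second_date) in combinations(ordered, 2)
    }
```

Sorting by date before pairing means the same function applies whether a participant's sequence starts with an infection or with a vaccine dose.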

Data cleaning and analyses were performed using R, version 4.3.0 (ref. 16) with RStudio, version 2023.12.1.402 (ref. 17), supported by the package tidyverse, version 2.0.0 (ref. 18). Hierarchical clustering and dynamic time warping were performed using the packages dtw, version 1.23-1 (ref. 13), dtwclust, version 5.5.12 (ref. 19) and proxy, version 0.4-27 (ref. 20). Process mining for event visualization was performed using the package bupaR, version 0.5.4 (ref. 10).

Results

To achieve the objective of grouping individuals according to their pattern of infection, reinfection and vaccination sequences, 318 participants were included in the study based on the previously defined inclusion criteria (see Supplementary Figure S3 online). Table 1 reports the characteristics of individuals in the study cohort, and Table 2 shows the characteristics of their COVID-19 episodes. Among the participants, 230 were women (72.3%), and 141 were healthcare workers (44.3%). The average age was 43 years (SD 13.8), and a total of 31 participants were reinfected twice (9.3%), including 6 who were reinfected three times (1.9%). The average number of vaccine doses received at primary infection was 1.08 (SD 1.30), and the average numbers of doses at the first, second and third reinfections were 2.36 (SD 0.876), 2.29 (SD 0.864), and 2.50 (SD 0.548), respectively.

Table 1 Sociodemographic characteristics of the study population between 2020/03 and 2023/08 and clusters.
Table 2 COVID characteristics of the study population between 2020/03 and 2023/08 and clusters.

Figure 1 presents the global sequence of events identified for the cohort. Boxes represent events, while edges represent the temporal sequence between events. All boxes and edges are annotated with the relative frequency of each event or transition, as well as the median time for each transition. We used the sum of the medians of the various transitions as an approximation of sequence duration for illustrative purposes. Although medians are not additive and their sum does not represent the actual median of the complete sequence, it can support the identification of overall trends. To avoid confusion, this approximation is hereafter referred to as summed medians. Mapping revealed that 56.3% of individuals in the cohort began their sequence with the primary infection, while 43.7% started with a first dose of vaccine. Among all those who received the first dose, regardless of the previous event, 83.3% were subsequently vaccinated a second time, within a median of 87 days. Among those who received a second dose, regardless of the previous trajectory, 50.3% then received a third dose within a median of 191 days, while 16% contracted their first infection within a median of 183 days.
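A small numerical example shows why summed medians are only an approximation. The delays below are invented for illustration and do not come from BQC19; the function names are likewise hypothetical.

```python
from statistics import median
from typing import Sequence


def summed_medians(transitions: Sequence[Sequence[float]]) -> float:
    """Sum of the per-transition medians (the approximation used in the text)."""
    return sum(median(delays) for delays in transitions)


def median_of_totals(transitions: Sequence[Sequence[float]]) -> float:
    """Median of each participant's total duration over all transitions."""
    totals = [sum(per_person) for per_person in zip(*transitions)]
    return median(totals)


# Three hypothetical participants, two consecutive transitions (delays in days).
i_to_v1 = [100, 140, 300]
v1_to_v2 = [90, 20, 80]

approx = summed_medians([i_to_v1, v1_to_v2])   # 140 + 80
exact = median_of_totals([i_to_v1, v1_to_v2])  # median of 190, 160 and 380
```

Here the summed medians give 220 days while the true median of the complete sequence is 190 days, because the participant with the median first delay is not the one with the median second delay.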

Fig. 1 Sequencing map of infection, reinfection and vaccination events in the cohort. The map was generated using process mining techniques. Boxes indicate main events and their relative frequency through the whole cohort. Edges represent the consecutive sequence of events and display the median time between each event as well as the relative frequency of participants following a specific sequence of events.
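The quantities displayed on such a map can be derived from the event traces with a directly-follows count. The sketch below is not the bupaR implementation: the trace layout (event label plus day offset) is hypothetical, and the definition of an edge's share as the fraction of occurrences of the source event that are immediately followed by the target event is an assumption about how relative frequency is computed.

```python
from collections import defaultdict
from statistics import median
from typing import Dict, List, Tuple

Trace = List[Tuple[str, int]]  # (event label, day offset), sorted chronologically


def directly_follows(traces: List[Trace]) -> Dict[Tuple[str, str], Dict[str, float]]:
    """Relative frequency and median delay for each observed transition."""
    delays: Dict[Tuple[str, str], List[int]] = defaultdict(list)
    occurrences: Dict[str, int] = defaultdict(int)
    for trace in traces:
        for label, _ in trace:
            occurrences[label] += 1
        # Pair every event with its immediate successor in the same trace.
        for (src, t_src), (dst, t_dst) in zip(trace, trace[1:]):
            delays[(src, dst)].append(t_dst - t_src)
    return {
        edge: {"share": len(ds) / occurrences[edge[0]], "median_days": median(ds)}
        for edge, ds in delays.items()
    }
```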

Sociodemographic, participant’s condition, habits and environment characteristics

To group participants based on their infection (I), vaccination (V), and reinfection (R) patterns, we used event sequences as time series in a cluster analysis. Among the different numbers of clusters tested (2 to 6), the five- and six-cluster solutions both yielded the highest average silhouette coefficient (0.65), compared to 0.53, 0.58, and 0.62 for the two-, three-, and four-cluster solutions, respectively. The five-cluster solution was retained as it provided a more balanced partition, avoiding the creation of clusters with very small sample sizes. The first cluster included 138 participants (43.4%), while the others included 42 (13.2%), 11 (3.5%), 51 (16%) and 76 individuals (23.9%), respectively. Although small in size, cluster 3 was retained in the analysis due to its distinct temporal pattern. There was no significant difference between clusters in terms of age (p = 0.137) or body mass index (BMI) (p = 0.545), and the proportion of males and females in each cluster was similar to the cohort proportion. However, in the first three clusters, the proportion of healthcare workers was lower (between 21.4% and 36.4%) than the cohort proportion (44.3%), whereas in the remaining clusters it was higher, at 51% and 76.3%. The detailed characteristics of each group are shown in Tables 1 and 2.

Temporal description

The following section describes each group in terms of temporal sequence and presents illustrations of these sequences (Fig. 2A to Fig. 2E).

Fig. 2 Sequences and mapping of events of interest, grouped by cluster, where A) is cluster 1, B) cluster 2, C) cluster 3, D) cluster 4 and E) cluster 5. Event maps present the relative frequency of each trajectory (%) and the median time in days between events (I = Infection, V1 = 1st dose of vaccine, V2 = 2nd dose of vaccine, V3 = 3rd dose of vaccine, R1 = 1st reinfection, R2 = 2nd reinfection, R3 = 3rd reinfection).

Cluster 1. Participants in the first cluster were mostly infected in the first three waves of the pandemic (Table 2). As shown in Fig. 2A, which presents the event mapping for this group, individuals were first infected and then received two doses of vaccine, with summed medians of 225.5 days (median I-V1 139.5; median V1-V2 86). The sequence then split: 44.2% of the group were reinfected, while the remainder received their third dose of vaccine, both events occurring within similar median timescales (185 and 181 days, respectively). Patients who received this third dose were reinfected after a median of 118 days. Reinfection occurred in waves five, six and seven of the pandemic.

Cluster 2. Similarly to cluster 1, most individuals in this group were infected in the first waves (66.7%), although a few contracted their primary infection later in the pandemic (Table 2). Almost the entire group (97.6%) was infected before a first dose of vaccine (Fig. 2B). The sequence of the entire group converged toward a reinfection, either following a first dose (47.6%, summed medians 380.5 days (172; 208.8) from primary infection) or directly following the initial infection (52.4%, median 270 days). Some individuals contracted the disease twice without any vaccine in their sequence, which means that this group is distinctive in containing patients with natural immunity rather than hybrid immunity (N = 21). Reinfection occurred in waves five, six and seven, but mainly in waves five and six (61.9%).

Cluster 3. In this cluster, individuals were mainly infected in the second and third waves of the pandemic (81.9%), with none having been infected in the first wave (Table 2). Thus, on average, they were infected slightly later than the previous two groups. This is notable given that, as Fig. 2C shows, the entire group started their event sequence with the first vaccine dose. However, the delay between this dose and primary infection was relatively short, with mean and median delays of 49.5 and 16 days, respectively. After infection, all individuals in the group received a second dose before splitting into two subsequences: those who were reinfected (36.4%) within a median of 215.5 days and those who received a third dose of vaccine (63.6%) within a median of 186 days. The latter were reinfected after a median of 162 additional days. Within this group, 45.5% of individuals were reinfected in the fifth wave, and the remaining 54.5% were reinfected after the sixth wave; no reinfection occurred in wave six for this group.

Cluster 4. The fourth cluster was mainly infected in the fourth (31.4%) and fifth waves (58.8%), positioning the group, in terms of timeline of infection, between cluster 3 and cluster 5 (Table 2). For the entire group, the sequence of events began with two vaccine doses, as shown in Fig. 2D. For a majority (98%), the subsequent event was the primary infection. After this event, the group separated into two distinct trajectories: toward the third vaccine dose (37.3%, median delay 95 days) or toward reinfection (62.8%, median delay 226 days). Individuals who received the third dose after their primary infection were reinfected within a median of 198 days. In terms of delay from first vaccination, individuals with the V1-V2-I-R sequence had a summed medians delay of 479 days (71; 182; 226 days), compared with 546 days (71; 182; 95; 198 days) for those with the V1-V2-I-V3-R sequence. Reinfections in this group occurred mainly from the seventh wave onward. Notably, one individual in this group presented the main temporal sequence of group 5 (V1-V2-V3-I-R1). Further analysis revealed that this individual received the third dose only one day before the first infection, which makes the trajectory closer to that of group 4 (V1-V2-I-R1).

Cluster 5. The last cluster contained the individuals who were most vaccinated prior to their primary infection. Indeed, 98.7% had received their third dose at the time of initial infection. Reinfection occurred later than in the other four groups, i.e., during waves five and six and mainly from wave seven onward (52.6%). This represents a median delay from infection of 235.5 days.

In addition, we noted interesting differences between the groups. As presented previously, at the time of their primary infection, individuals in the first cluster had not received a vaccine, individuals in the third had mostly received one dose, individuals in the fourth had received two doses (average 2.02; 95% confidence interval (CI): 1.98–2.06), and those in the fifth had received three doses (2.99; CI: 2.96–3.02). The difference between groups was statistically significant (p < 0.001), except between clusters 1 and 2 (p = 0.876) and between clusters 3 and 4 (p = 0.232). Regarding doses at first reinfection, there was evidence of a statistical difference in the average number of vaccine doses, except between cluster 1 (2.55; CI: 2.46–2.64) and clusters 3 (2.64; CI: 2.30–2.98) and 4 (2.39; CI: 2.25–2.53), and between cluster 3 and clusters 4 and 5 (2.99; CI: 2.964–3.016). In clusters containing individuals who were reinfected twice (clusters 1, 2 and 5), there was a statistical difference in doses at second reinfection between clusters 1 and 2 (p < 0.001) and between clusters 2 and 5 (p < 0.001). However, for individuals who were reinfected three times (clusters 1 and 5), there was no evidence of a statistical difference between groups.

Follow-up duration

In terms of follow-up duration, an analysis of the distributions revealed certain disparities. Group 1 showed a moderate spread around the median (729 days) but was distinguished by the presence of outliers, indicating that some participants in this group had markedly shorter or longer follow-up times than the rest. This group was significantly different from groups 2, 4 and 5 (p < 0.001). Group 3 stood out for its homogeneity, low variability and absence of outliers, suggesting uniform follow-up (interquartile range (IQR) = 190 days, median 694 days), and showed no statistical difference from any of the other groups. Group 2, on the other hand, showed the greatest heterogeneity, with a much wider interquartile range (479 days), suggesting substantial differences in follow-up duration between participants in this group. Despite visually similar distributions with close medians and moderate variability (medians of 562 and 676 days; IQRs of 191 and 151 days), there was a significant difference in follow-up duration between groups 4 and 5 (p < 0.001).
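The descriptive statistics used in this comparison (median, IQR and outliers) can be computed as sketched below. The 1.5 × IQR fence for flagging outliers is an assumption, since the text does not state which outlier rule was used, and the function name is hypothetical. The `inclusive` quantile method is chosen here because it matches the default quantile definition in R.

```python
from statistics import median, quantiles
from typing import Dict, List


def follow_up_summary(days: List[float]) -> Dict[str, object]:
    """Median, interquartile range and boxplot-style outliers of follow-up durations."""
    q1, _, q3 = quantiles(days, n=4, method="inclusive")
    iqr = q3 - q1
    low, high = q1 - 1.5 * iqr, q3 + 1.5 * iqr  # assumed Tukey fences
    return {
        "median": median(days),
        "iqr": iqr,
        "outliers": [d for d in days if d < low or d > high],
    }
```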

In summary, the cohort participants were grouped into five clusters, and their characterization revealed that the clusters followed a temporal progression according to the infection timing and its positioning across the pandemic waves. Reinfections, on the other hand, occurred from the fifth wave onward. The most highly vaccinated groups appear to have been infected and consequently reinfected later in the pandemic. Some groups featured a greater proportion of healthcare workers, while for others, it was the trajectory and their timeframes that were of interest. There were some disparities in follow-up duration, which need to be considered when drawing conclusions from the results.

Discussion

This project aimed to study hybrid immunity by identifying and characterizing SARS-CoV-2 reinfection profiles. Using machine learning techniques, we grouped individuals from BQC19 according to characteristics leading to similar patterns of vaccination, infection, and reinfection in a five-cluster classification.

The study showed no significant differences between the groups in terms of sociodemographic variables, except for the proportion of healthcare workers. For this variable, groups 4 (51%) and 5 (76.3%) had a higher proportion than the cohort (44.3%). These same groups had a more sustained initial vaccination sequence than the other groups did (2 doses and 3 doses, respectively, before primary infection). This seemed consistent with the vaccination policies in place during the pandemic for this at-risk population in close contact with the virus. The results therefore suggest that these policies had a positive impact, given that, for this group, primary infection occurred later during the pandemic. However, this finding, implying that healthcare workers in the cohort were infected late (59.6% of them), differs from the results of Carazo et al. (2023) in their study about healthcare workers’ protection against Omicron BA.2 reinfection conferred depending on the primary infection variant, where, for around the same period, approximately 20.7% of healthcare workers were infected21. The difference may be explained by the inclusion criteria for the documented dates in our study, which reduced the size of our cohort. This contrasts with their study, which exploited data sources from the Ministry of Health and Social Services that were potentially more exhaustive at this level.

Similarly, it was possible to observe that group 3, which received a first vaccine before being infected, was spared in wave 1. Thus, compared with groups 1 and 2, which received no vaccine prior to infection, group 3’s primary infection occurred later in waves 2 and 3, allowing us to hypothesize that although one dose was missing to achieve so-called complete vaccine immunization, the first vaccine dose may have generated a positive impact by delaying the initial infection. However, this hypothesis must be interpreted with caution, given the small number of individuals in the group.

The first group also presented interesting features in terms of vaccination efficacy. Following the second vaccine dose, the trajectory of the individuals split in two, some toward reinfection (median of 185 days later) and the others toward the third dose (median of 181 days later). This separation occurred within an almost identical median time, which might suggest that the policy of administering the third dose was relatively synchronized with a weakening of immunity. This timeframe is in line with the results of Asamoah-Boaheng et al. (2023), showing that antibody levels decrease with a half-life of 94 days and plateau at 294 days22. Although these results relate to mRNA vaccines and vaccine type was not a variable in the present study, the results remain consistent. Also, the third dose appeared to have delayed reinfection by 118 days (median) compared with patients who received only two doses. This suggests that an earlier third dose could potentially have delayed more reinfections. Group 3 had a similar separation between the reinfection event and the third dose. This group, whose sequence prior to separation was V1-I-V2, compared with the first group’s I-V1-V2, had a longer median time to reinfection (215 vs. 185 days). Similarly, when comparing the time from second dose to reinfection via the third dose, group 3 had a longer summed medians time (348 vs. 299 days). This might suggest that, in terms of hybrid immunity, being infected between two vaccine doses could offer slightly longer-lasting immunity, but the small size of group 3 makes it difficult to draw a definitive conclusion.

The study also revealed a group of patients who had not been vaccinated and, consequently, had not achieved hybrid immunity. This same group also contained individuals who had received only one vaccine, implying that they had not fully achieved hybrid immunity, given that full vaccine immunity required 2 doses. Thus, based exclusively on the sequence of events and their temporality, individuals with partial or nonexistent hybrid immunity were grouped together by the clustering algorithm. Despite its interest, this finding needs to be nuanced according to the variance in follow-up duration within the group. In fact, some individuals may simply not have had the follow-up time required to be fully vaccinated. Despite this limitation, it is worth mentioning that the algorithm’s data-driven grouping of these participants supports the finding of Sanchez-de Prada et al. (2024) that there is no significant difference between individuals vaccinated once within five months of infection and those who were not vaccinated at all.

The results showed that vaccination has a positive effect in delaying infection or reinfection. They also showed that the temporality of events greatly influenced the formation of groups by the algorithm, in the sense that primary infections and reinfections are distributed according to a temporal progression, from group 1 (the earliest infections) to group 5 (the latest).

This study has several strengths. First, we used data collected from the very beginning of the pandemic. The data management process is also a strength of this study, as substantial work was performed to consolidate the data, allowing us to increase our sample size. Finally, the use of machine learning made it possible to identify more complex patterns by taking events and their temporality into account, using a data-driven approach. The method enabled us to consider not only the delay between events but also their chronology, thereby accounting for pandemic waves. Thus, by first forming the groups and then characterizing them using variables that were not used in the clustering process, we were able to highlight elements that were more difficult to identify using conventional methods. Unsupervised techniques also reveal interesting avenues for future investigations. In particular, we intend to use existing genomic data in BQC19 to characterize the groups, including the use of random forests to determine the most relevant variables for this purpose. As genomic data were available only for a rather small number of participants, it was not possible to include them in this study, as our sample size was already reduced by the inclusion criterion requiring a documented reinfection with a date.

While there are strengths, there are also some limitations to this work. First, this study is based on longitudinal data obtained from individuals who agreed to participate in the BQC19 and its follow-up, which may introduce a selection bias. Also, even though significant attention was given to data management, the sample size remained small. This is mostly due to our inclusion criteria, which required participants to have been reinfected. Second, while the infection and vaccination dates were properly collected within the datasets, we had to establish a strategy to correct the reinfection date, as two variables were present within the same dataset. However, the dates were the same for the vast majority of participants, and the same treatment was applied otherwise, limiting potential biases. In addition, the delay used to define reinfection in this study differed from the delay found in the literature and could be considered a limitation. Given the lack of official consensus in the scientific community regarding this timeframe, we chose the threshold (14 days) used within the BQC19 research community to enable meaningful comparisons. However, even with this short delay, the number of patients concerned was relatively small and, according to the distribution of delays between reinfections, increasing the threshold would have only a minimal impact on the number of individuals. Also, results showing differences between groups need to be interpreted with caution due to the variance in follow-up time within groups, given that some individuals may simply not have had the follow-up time required to be fully vaccinated. Further analysis would be relevant to assess the impact of the difference in follow-up time between participants. Similarly, some events may have been missed if they occurred outside the hospitals participating in BQC19.

Additionally, although it ensures consistency with other BQC19 work, using pandemic waves as a proxy for dominant circulating variants in the absence of variant-level data has limitations, given that multiple variants may have co-circulated during the same wave. Given the role of variants in immune escape, the reinfection profiles obtained should be interpreted with caution, as they may reflect unmeasured heterogeneity in terms of variants. Finally, the use of summed medians can distort the estimation of overall times. However, it was only used to provide an overall idea of the temporality of the sequences and the global trajectory, as it is not the actual median of the whole sequence.

Conclusion

To our knowledge, this is the first study using data from the Biobanque québécoise de la COVID-19 to investigate reinfection patterns and hybrid immunity using a data-driven approach. In addition to highlighting the effectiveness of vaccination policies, it identified, by leveraging machine learning techniques on complex multidimensional time series, distinct groups and COVID-19 patterns of infection, reinfection and vaccination, thus providing interesting insights for further investigation. It also highlights that beyond the sequence of events, the temporal delays between events seem to play an important role in the acquisition of primary and secondary infections. In terms of hybrid immunity, the results of this study suggest that an infection between two vaccines could offer greater immunity. This finding should be treated with caution, however, given the size of the group from which it is drawn and the disparity in follow-up times. In any case, this is an interesting perspective to pursue. The delay between events played a determining role in the formation of the study groups. Consequently, their consideration in the development and adaptation of health policies, particularly regarding vaccine administration and boosters, is necessary. In addition, the study shows that machine learning algorithms represent, for public health practices, an innovative and complementary approach to analyze health data and discover hidden information that can have an impact on public health decisions. These two approaches, which combine delay analysis and machine learning, offer promising perspectives for future work, particularly in preparation for possible pandemics.