Introduction

Understanding and predicting the trajectory of infectious diseases is important in planning an effective public health response. Reported case data depend heavily on testing modalities and practices which typically change over time, resulting in considerable uncertainty in the case ascertainment rate (CAR; the fraction of infections that are officially reported). During the COVID-19 pandemic, many countries relied primarily on symptom-based testing programs to inform situational awareness and public health responses. In Aotearoa New Zealand, the CAR for COVID-19 has been influenced by factors such as access to testing, a shift from healthcare worker-administered polymerase chain reaction (PCR) tests to self-administered rapid antigen tests (RATs), reduction in rates of symptomatic and severe disease due to rising population immunity, relaxation of testing requirements and recommendations, and/or lack of perceived need to test or ‘pandemic fatigue’1,2,3. As a result, over time, officially reported cases of COVID-19 have become a less reliable measure of levels of SARS-CoV-2 infection.

Data on hospital admissions and deaths are more consistent and are less affected by testing practices and behavioural change than reported cases but are subject to additional delays4 that limit their usefulness for understanding disease dynamics. Infection prevalence surveys5 that aim to regularly test a representative sample of the population are the gold standard for tracking the spread of infectious disease, but these surveys are resource-intensive, making them harder to justify as countries move out of the acute phase of the pandemic. The UK was the only country to implement regular representative national SARS-CoV-2 prevalence surveys6,7, and there are no current plans for similar surveys in New Zealand.

Wastewater surveillance, where levels of SARS-CoV-2 RNA in wastewater samples are measured, can provide additional data on the prevalence of the virus that are unaffected by individual testing and self-reporting behaviours. Wastewater surveillance (also known as wastewater-based epidemiology or WBE) also has the potential to contribute to an integrated global network for disease surveillance8,9,10. These data, however, can be highly variable and subject to other biases, such as rainwater dilution, sampling methodologies, and changing locations of selected sampling sites. To realise this potential, appropriate models and analytical tools are needed to deliver epidemiological insights from raw data.

Two previous studies have presented novel methodologies for the real-time estimation of the effective reproduction number using wastewater data11,12, while others have leveraged or extended these methods13,14,15,16. One study used reported cases to estimate the reproduction number and then fitted a model to estimate this quantity from wastewater data17. Another study used wastewater data to fit a mathematical model of multiple viral strains18 from which estimates of the reproduction number can be derived. Other studies have analysed wastewater data but did not use it to estimate the reproduction number19,20. Only11 presented a model for simultaneously considering clinical and wastewater data, however they assume a fixed ascertainment rate. No previous work has combined wastewater-based epidemiology with reported cases to infer changes in case ascertainment over time.

Semi-mechanistic models based on the renewal equation are a popular method for epidemic forecasting and estimation of the instantaneous reproduction number21,22,23. Such methods are robust to constant under-ascertainment of cases, but may be biased by rapid changes in CAR and cannot provide any information about the total number of infections. In this paper, we extend the renewal equation framework21,22,23 for reproduction number estimation to incorporate wastewater time-series data. The model treats the instantaneous reproduction number and CAR as hidden states and reported cases and quantity of viral RNA in wastewater as observed states. We use a sequential Monte Carlo approach to infer the hidden states. We apply the model to national data from Aotearoa New Zealand on reported COVID-19 cases and the average number of SARS-CoV-2 genome copies per person per day measured in municipal wastewater samples between January 2022 and March 2023. Because the relationship between infections and wastewater concentration is only determined in the model up to an overall scaling constant, it cannot be used to infer the absolute CAR but can be used to estimate relative changes in case ascertainment over time. The model is designed to be regularly updated as new data become available, producing real-time estimates of the effective reproduction number and relative change in CAR. The model has been used to support situational awareness via regular reports to the New Zealand Ministry of Health from November 2022 to date.

From March 2020 until December 2021 New Zealand used strict border controls and intermittent non-pharmaceutical interventions to suppress and eliminate transmission of SARS-CoV-2. By the beginning of 2022, there had been a cumulative total of around three confirmed cases of COVID-19 per 1000 people and around 90% of the population over 12 years old had received at least two doses of the Pfizer-BioNTech vaccine. From October 2021, interventions were progressively eased, and in January 2022, the B.1.1.529 (Omicron) variant began to spread in the community, causing the first large wave of infection. Since then community transmission has been sustained, with multiple further waves of infection being driven by various Omicron subvariants. Between 1 January 2022 and 31 March 2023, there was a cumulative total of around 440 confirmed cases per 1,000 people, most of which were from self-administered RATs. During this period, SARS-CoV-2 concentration was regularly measured at various wastewater treatment plants, providing an additional data source on changes in community prevalence over time.

We model the epidemic dynamics and the observed case and wastewater data at the national level, aggregating over New Zealand’s population of 5.1 million and ignoring regional variations. This is similar to other studies that have aggregated regional case and/or wastewater data to produce national-level estimates in countries with a comparable population size24,25,26. Our methodology could, in principle, be applied at a finer geographical scale, although this would come at the cost of higher levels of noise.

Methods

Data

National daily reported cases of COVID-19 were obtained from the New Zealand Ministry of Health27. Until February 2022, these cases were diagnosed solely by healthcare-administered PCR testing. From February 2022, in response to the rapid increase in reported cases, RATs were widely distributed. Since then, the vast majority of reported cases have been from self-administered RATs, with results reported via an online portal. Hence, data on the number of tests conducted are not available. Reported cases are shown in Fig. 1. As these data exhibit a clear day-of-the-week effect, we remove the weekly trend before fitting the model (see Supplementary Material Section 1.2 for details).

Fig. 1: Reported daily cases and wastewater data between 1 January 2022 and 31 March 2023 in Aotearoa New Zealand.
Fig. 1: Reported daily cases and wastewater data between 1 January 2022 and 31 March 2023 in Aotearoa New Zealand.
Full size image

a Reported daily cases of COVID-19. The black line shows the adjusted case series with the multiplicative day-of-the-week effect removed (see Supplementary Material Section 1.2). b SARS-CoV-2 genome copies per person per day in sampled wastewater. The two outliers in wastewater data arise from estimates of a high wastewater flow rate in Wellington following high rainfall. Since rainfall is a source of noise in wastewater sampling, we retain these samples in our analysis. c Proportion of the total population covered by sampled wastewater catchments. Reported case data were obtained from the New Zealand Ministry of Health27 and wastewater data were obtained from ESR28.

SARS-CoV-2 concentration data from wastewater samples tested by the Institute for Environmental Science and Research (ESR) were used for this study28. Wastewater samples were collected every week at municipal wastewater treatment plants located throughout the country, serving communities with populations ranging from 400 to over 500,000 people. Typically 70–90% of the national population connected to reticulated wastewater was covered by wastewater sampling in any given week (60–124 sites, usually sampled twice per week). Each site-level measurement was normalised to provide an estimate of the number of genome copies per person per day for that site (see Supplementary Material Section 1.1). Typically multiple sites were sampled per day and, for each day that had at least one sample, we calculated the catchment-population-weighted average of the genome copies per person (see Fig. 1). Because we do not attempt to model regional variations, we assumed this provided a series of representative observations of the average concentration of genomic material in the national wastewater.

Hidden state model

We construct a state-space model (Fig. 2) consisting of time-varying hidden states (the instantaneous reproduction number Rt, daily case ascertainment rate CARt, and daily infection incidence It) and time-varying observed states (daily reported cases of COVID-19 Ct and daily wastewater observations Wt)29. We use subscript s:t to refer to all values between day s and t inclusive.

Fig. 2: Diagram of the state-space model.
Fig. 2: Diagram of the state-space model.
Full size image

Schematic of the model showing the dependency between hidden-states (dashed circles) and the observed data (solid circles). Rt is the instantaneous reproduction number on day t, CARt is the case ascertainment rate on day t, It is the number of new infections on day t, Ct is the number of reported cases on day t, and Wt is the observed wastewater, measured as the total genome copies per person per day for the sites that were sampled on day t. I1:t denotes the set of states \(\left\{{I}_{1},{I}_{2},\ldots ,{I}_{t}\right\}\). In practice, the current infections It, reported cases Ct and wastewater Wt depend only on recent values of It as specified by the generation interval distribution, the infection-to-reporting distribution, and infection-to-shedding distribution, respectively (see Methods).

We assume the hidden states Rt and CARt follow independent Gaussian random walks, encoding the fact we expect them to vary continuously over time. We also assume that the hidden state It follows a Poisson renewal process, a simple epidemic model commonly used when estimating Rt21. Thus our state-space transitions are governed by:

$$({R}_{t}| {R}_{t-1}) \sim {N}_{(0,\infty )}({R}_{t-1},{\sigma }_{R}{R}_{t-1})$$
(1)
$$(CA{R}_{t}| CA{R}_{t-1}) \sim {N}_{(0,1)}(CA{R}_{t-1},{\sigma }_{CAR})$$
(2)
$$({I}_{t}| {R}_{t},{I}_{1:t-1}) \sim Poisson\left({R}_{t}\sum\limits_{u=1}^{t-1}{g}_{u}{I}_{t-u}\right)$$
(3)

Parameters σR and σCAR determine how quickly Rt and CARt vary. The standard deviation of the transition distribution for Rt → Rt+1 is given by σRRt, which means that Rt varies more rapidly at larger values. The distribution for Rt was truncated on (0, ) and for CARt on (0, 1). Finally, gu is the pre-determined generation time distribution, describing the proportion of transmission events that occur u days after infection (see Supplementary Material Section 2.7).

We assume that the expected number of reported cases \({\mu }_{t}^{c}\) at time t is equal to CARt multiplied by the convolution of past infections with the infection-to-reporting distribution Lu:

$${\mu }_{t}^{c}=CA{R}_{t}\sum\limits_{u=1}^{t}{I}_{t-u}{L}_{u}$$
(4)

Similarly, we assume that the expected number of genome copies \({\mu }_{t}^{w}\) detected per person at time t is equal to the convolution of past infections with the infection-to-shedding distribution ωu, multiplied by a fixed parameter α representing the average total detectable genome copies shed into the wastewater by an infectious individual, divided by total national population size N:

$${\mu }_{t}^{w}=\frac{\alpha }{N}\sum\limits_{u=1}^{t}{I}_{t-u}{\omega }_{u}$$
(5)

We model reported cases using a negative binomial distribution:

$$({C}_{t}| CA{R}_{t},{I}_{1:t}) \sim NegBin\left(r={k}_{c},p=\frac{{k}_{c}}{{k}_{c}+{\mu }_{t}^{c}}\right)$$
(6)

which has mean \({\mu }_{t}^{c}\) and variance \({\mu }_{t}^{c}\left(1+\frac{{\mu }_{t}^{c}}{{k}_{c}}\right)\). A negative binomial distribution is used to account for noise in the observations beyond that predicted by a binomial distribution. This is a common choice in other methods of reproduction number estimation23,30.

The observed wastewater data Wt is the total genome copies per person from the wastewater sites sampled on day t. We model this using a shape-scale gamma distribution:

$$({W}_{t}| {I}_{1:t}) \sim \left\{\begin{array}{ll}\Gamma \left({k}_{w}{{{{{{{{\rm{pop}}}}}}}}}_{t},\frac{{\mu }_{t}^{w}}{{k}_{w}{{{{{{{{\rm{pop}}}}}}}}}_{t}}\right)\quad &\,{{\mbox{if}}}\,\,\,{{{{{{{{\rm{pop}}}}}}}}}_{t} > 0\hfill\\ {{{{{{{\mathcal{I}}}}}}}}({W}_{t}=0)\quad\hfill &\,{{\mbox{if}}}\,\,\,{{{{{{{{\rm{pop}}}}}}}}}_{t}=0\end{array}\right.$$
(7)

which has mean \({\mu }_{t}^{w}\) and variance \(\frac{{({\mu }_{t}^{w})}^{2}}{{k}_{w}{{{{{{{{\rm{pop}}}}}}}}}_{t}}\). This assumes that the observed daily data are independent draws from the national distribution, which may not hold if there are regional differences between the subsets of sites that are sampled on different days. In practice, any such differences will be absorbed into the variance of the daily observation distribution via fitting of the dispersion parameter kw. Since we marginalise out the effect of this parameter when presenting results, the increased uncertainty associated with regional variability is propagated through to the credible intervals. The variable popt refers to the total population in the catchment areas of the sampled wastewater sites on day t. Setting the variance of the observation distribution to be inversely proportional to popt allows the model to account for increased variability around the national mean on days when fewer or smaller sites were sampled. \({{{{{{{\mathcal{I}}}}}}}}\) is the indicator function, so on days when no sites were sampled, the probability of observing no wastewater samples is set to 1, and the model fits to case data alone.

Consistent with previous models31,32, this formulation assumes that the expected population shedding rate is proportional to the number of infected individuals, with observations drawn from a distribution around this mean. We used a gamma distribution, which is a reasonably flexible choice for a non-negative continuous random variable. However other distributions could be considered, such as a Weibull or log-normal.

In the absence of additional information, we are unable to estimate α, which is proportional to the average total genome copies shed by an infected individual over the course of their infection. This means we are unable to estimate the absolute value of CARt. Instead, we run the model with a range of different values for α, and estimate the change in CARt relative to its initial value. This additionally requires the assumption that α is constant over time, which is unlikely to be true in general and is a key limitation of our model (see Discussion).

In practice, the range of values of α that we used (see Table 1) was chosen by calibrating model output for the number of infections with external sources of information. Firstly, we compared model output to the number of cases in a cohort of around 20,000 border workers who were tested weekly between January and July 202233. Secondly, around 40% of all 20–25-year-olds (an age group unlikely to have a higher CAR than older adults) reported a case of COVID-19 in the 6 months from 1 February to 31 July 202227. This suggests that the overall CAR for this period was likely to be at least 0.4, which translates to an approximate upper bound of 4 million for the total cumulative number of infections up to 31 July 2022. Neither of these observations definitively determines the number of infections as they are subject to approximation, bias and uncertainty, but they nevertheless serve to bracket the likely range of values for the parameter α.

Table 1 Parameter values used in the model

The infection-to-reporting and infection-to-shedding distributions are calculated as the convolution of the incubation period distribution with the onset-to-reporting and onset-to-shedding distribution respectively. The incubation period is modelled as a Weibull distribution with a mean of 2.9 days and a standard deviation of 2.0 days34. The onset-to-reporting distribution is estimated empirically from New Zealand case data extracted on 16 September 2022, representing over 1.2 million cases, and has a mean of 1.8 days and a standard deviation of 1.8 days. The onset-to-shedding distribution comes from ref. 35 and has a mean of 0.7 days and a standard deviation of 2.6 days. The resulting infection-to-reporting distribution has a mean of 5.8 days and a standard deviation of 2.6, and the resulting infection-to-shedding distribution has a mean of 5.2 days and a standard deviation of 2.9 days (see Supplementary Fig. 1).

The model is solved using a bootstrap filter36 with fixed-lag resampling. This produces estimates for the marginal posterior distribution of the hidden states at each time step. The random walk step variance parameters (σR and σCAR) and observation variance parameters (kc and kw) are estimated using a particle marginal Metropolis-Hastings Markov chain Monte Carlo method. We use an uninformative uniform prior distributions for these parameters, with the exception of σCAR, where we use an informative prior distribution to ensure an appropriate level of smoothness in our estimates of CARt. Different parameter values are fitted in three-month blocks to allow for some variation over time. See Supplementary Material Section 2 for further details of the numerical method.

Reporting summary

Further information on research design is available in the Nature Portfolio Reporting Summary linked to this article.

Results

Reproduction number, relative case ascertainment, and infection incidence

The estimated value of the reproduction number Rt (Fig. 3a) increases from around 1 at the beginning of 2022 to a peak of 2.46 (95% CrI 2.04, 3.20) on 18 February 2022 (95% CrI 10 Feb, 23 Feb), corresponding to the sharp increase in cases seen during the first Omicron wave, which was a mixture of the BA.1 and BA.2 variants37. The estimated value of Rt drops below 1 on 1 March 2022 (95% CrI 25 Feb, 5 Mar) and infection incidence peaked on 28 February 2022 (95% CrI 23 Feb, 7 Mar), suggesting this is when the wave peaked.

Fig. 3: Results for Aotearoa New Zealand data from 1 January 2022 to 31 March 2023.
Fig. 3: Results for Aotearoa New Zealand data from 1 January 2022 to 31 March 2023.
Full size image

a Instantaneous reproduction number Rt, b relative case ascertainment rate (compared to the central estimate on 1 April 2022), c wastewater data Wt measured in genome copies per person per day and d reported cases Ct. Results assume the average total shedding per infection does not vary over time (α = 3 × 109). Solid lines present central estimates. Shaded regions show 95% credible intervals on the value of the hidden states (subplots a and b) and 95% credible intervals on the expected reported cases and wastewater data (darker shaded regions in subplots c and d) and 95% credible intervals on the prediction distribution for wastewater data and reported cases (lighter shaded regions in subplots c and d). Black dots show the observed data.

The estimated CAR (Fig. 3b) increases rapidly between mid-February and mid-March 2022. RATs became widely available for the first time in the last week of February 2022. This likely led to a substantial increase in case ascertainment as the testing system, which had previously relied solely on laboratory-processed PCR tests, had become overwhelmed3. The estimated CAR approximately halves between April and July 2022, when a second wave of infection caused by the BA.5 Omicron subvariant33,37 occurred. This second wave was visible in both reported cases and wastewater sampling, with estimated peak infections occurring on 7 July 2022 (95% CrI 3 Jul, 12 Jul). The estimated CAR increases somewhat between mid-2022 and early 2023, with a noticeable dip in December 2022, possibly reflecting reduced testing during the Christmas and summer school holiday period (from mid-December to late-January/early-February). Alternatively, the estimated increase in CAR from mid-2022 could be explained by a decrease in the average genome copies shed by an infected individual α, although without further information we are unable to discern changes in α. Overall, the model provides a reasonably good fit to the observed data on cases and wastewater (Fig. 3c, d).

Figure 4a, b shows the estimated daily incidence and cumulative infections for three values of α, corresponding to estimated CAR values on 1 April 2022 of 0.42 (95% CrI 0.35, 0.50), 0.61 (95% CrI. 0.51, 0.71), and 0.80 (95% CrI. 0.67, 0.93), for α = 2 × 109, 3 × 109 and 4 × 109, respectively. For comparison, the graphs also show the number of cases per capita in a cohort of approximately 20,000 border workers who were tested weekly between January and July 202233, scaled according to population size. These data were used to help inform the range of values of α selected (see Methods).

Fig. 4: Sensitivity of results to the choice of α.
Fig. 4: Sensitivity of results to the choice of α.
Full size image

Estimated a daily infections It, b cumulative infections \(\mathop{\sum }_{s = 0}^{t}{I}_{s}\), c case ascertainment rate CARt, d relative case ascertainment rate (compared to the central estimate on 1 April 2022) and e instantaneous reproduction number, Rt. Results are presented for three values of α: 2 × 109, 3 × 109 and 4 × 109. Solid lines show central estimates and coloured regions are the 95% CrIs. Estimates and credible intervals on cumulative infections are calculated by taking cumulative sums of the estimates and credible intervals in panel (a). Black dots in panels (a, b) show the number of per capita cases in a cohort of regularly tested border workers, scaled according to population size. The horizontal dashed black lie in panel (b) shows the New Zealand population at the end of 2022 (5.15 million people)54. While changing α results in different estimates of infections and absolute CAR, the relative CAR and reproduction number estimates are robust to different values, provided α remains relatively constant.

Whilst peak reported cases (adjusted for the day-of-the-week effect) in the second wave were only 49% of the peak in the first wave (10,879 vs 22,038 respectively), under the assumption of constant α, the central estimate from the model suggests that true infections peaked at approximately 78% of the peak of the initial wave (Fig. 4a). Figure 4c–e shows the estimated absolute and relative CAR and R. These panels show that, while we are uncertain about the absolute level of infections and CAR, the relative CAR and reproduction number estimates are robust to reasonable choices for (constant) α.

Fitting the model to case data alone instead of cases and wastewater (see Supplementary Fig. 7) produced qualitatively similar estimates of Rt, but with greater temporal fluctuations. Fitting the model to wastewater data alone led to substantially wider credible intervals, although the overall trend was similar. Estimates of the relative CAR are only possible when fitting to case and wastewater data simultaneously.

Parameter estimates

The estimated standard deviation σR of the random walk on Rt was greatest in the first time period (1 Jan – 31 Mar 2022)—see Table 2. This is unsurprising as it coincided with the rapid increase and then decrease in incidence associated with the first Omicron wave. σR decreased in the second period (1 Apr–30 Jun 2022) and then remained relatively constant throughout the remaining periods (1 Jul 2022–31 Mar 2023). The estimated standard deviation σCAR of the random walk on CARt was also estimated to be greatest in the first time period, although this is primarily because we applied a prior distribution with a higher mean in this period (see Supplementary Material Section 2.5).

Table 2 Central estimates and 95% credible intervals for estimated model parameters in each time period

The estimated variance parameters, kc and kw, for cases and wastewater observations, were lowest in the first time period (1 Jan 2022–31 Mar 2022). This implies there is more variability in the data that is not explained by the model in this time period, possibly as a consequence of the sharper variations in incidence compared to the later time periods. A less consistent weekly pattern in reported cases during the first time period, and higher levels of noise in wastewater observations at the low concentrations seen at the beginning of 2022, could also be contributing factors.

Discussion

Wastewater-based epidemiology has been used globally for COVID-19 surveillance and has been shown to be a useful public health tool for policy and public health responses38. We present a semi-mechanistic model that combines reported cases with wastewater data to estimate the time-varying reproduction number and CAR. This work demonstrates the value of wastewater-based epidemiology and how the additional data that it provides can be combined with traditional monitoring (e.g. reported cases) to learn more about the state of an epidemic, disease dynamics, and the true number of infections in the community. This provides useful information to inform the public health response.

To make reliable estimates of the state of the epidemic from reported cases, it is essential to understand how case ascertainment changes with time. For example, are there fewer cases because there are fewer infections or because fewer people are reporting? We apply our model to national data from Aotearoa New Zealand and derive insights into changes in case ascertainment that would not be possible using case data alone. Reported cases during the second wave in July 2022 were substantially lower than in the first wave in February and March 2022. However, the model infers that there was a substantial drop in case ascertainment between these waves, and the true number of infections was likely more similar in each wave. The reduced CAR during the second and subsequent waves may have been due to a higher number of reinfections with individuals displaying fewer symptoms or due to “pandemic fatigue” and reduced compliance with public health measures, including testing. This type of insight would not be possible without regular wastewater surveillance data and without a robust analytical framework in which to integrate these data with traditional epidemiological data streams.

We apply our model to the first period of widespread community transmission of SARS-CoV-2 in New Zealand. During this time, rapid antigen tests were freely available to everyone, there was a requirement to report positive results, and a mandatory isolation period for cases with financial support via employers. Partly as a result of these factors, the CAR, while lower than in the previous elimination phase, was still reasonably high. The mandatory isolation period was removed in September 2023, which led to a substantial drop in case ascertainment. For the datasets we consider, similar (albeit noisier) estimates for the reproduction number could be obtained from case data alone. However, in a context where case ascertainment is low and/or unrepresentative, wastewater data are likely to add even greater value compared to using reported cases. In contrast, in a low-prevalence context (e.g., pre-Omicron in New Zealand), the applicability of the method would be constrained by the amount of noise in the wastewater data. In this situation, wastewater surveillance may be better used for presence/absence monitoring, for example, as an early warning system for the presence of infection in specific catchments, as opposed to quantitative estimation39.

Strengths of our model include the fact that it has relatively minimal data requirements, requiring only time series for reported cases and wastewater concentrations. The model can be fitted to datasets in which different sites are sampled on different days and some days have no observed data. This means that it could be readily applied in other jurisdictions with wastewater surveillance programs, either for SARS-CoV-2 or other pathogens such as influenza viruses38,40. It is a relatively simple model with minimal mechanistic assumptions and parsimonious parameterisation. This avoids the need for assumptions about time-varying contact patterns, transmission rates, and the level of prior immunity that are required by more complex mechanistic models. The model we present here was operationalised by ESR in late 2022, and results for Rt and relative CAR are regularly provided to the Ministry of Health to inform situational awareness and decision-making.

There are several limitations to this model and the results. We assume that the average number of genome copies shed by an infected individual (represented by the parameter α) was constant between January 2022 and March 2023 and did not depend on the infecting variant or history of prior infection or vaccination. It is possible that some of the inferred changes in CAR may be partly explained by these factors. For example, some of the inferred increase in case ascertainment between October and December 2022 may have been due to decreasing α, caused by a combination of new immune evasive subvariants displacing the previously dominant BA.5 variant41 and/or an increase in the proportion of reinfections or asymptomatic infections27. Although estimates of viral shedding rates per infected individual are available31,32, the value of α may also depend on the physical characteristics of the wastewater collection system, sample collection method, and the method used to quantify the concentration of SARS-CoV-2 RNA in samples. Therefore, α is likely to vary between jurisdictions and will require recalibration using local data.

As we are unable to estimate the true value of α, we are unable to estimate the absolute CAR. Nonetheless, relative CAR is a useful metric and, given an estimated range of values for α, we are able to provide plausible bounds on the total number of infections (Fig. 4).

Wastewater surveillance does not provide any information on how infections are distributed among population groups (e.g. age groups, ethnicity) and biases in self-administered testing mean that case counts are not representative either. This information is important for assessing the clinical burden of disease and addressing health inequities42. Thus, other approaches are needed to determine the distribution of disease burden, such as representative sampling7,43, cohort studies44 or sentinel surveillance45,46. Although wastewater surveillance could, in principle, be used to investigate differences in prevalence and case ascertainment between sites and/or regions, this would require adaptions to our method that are beyond the scope of this study.

As our model is flexible, future work could integrate hospitalisations (such as in ref. 47) and death data. In principle, this could allow the effects of varying CAR and varying rates of shedding per infection to be separated. However, this would additionally require the effects of age, immunity, ethnicity, and other variables on clinical severity to be accounted for.

Although national-level approaches to situational awareness and reproduction number estimation are common25,48,49, particularly in countries such as New Zealand with relatively small population size, this ignores regional variations. Results should, therefore, be interpreted as national averages, which could mask demographic and spatial heterogeneity. Our model could be implemented at a regional level so that local epidemic dynamics can be compared, although this would be subject to increasing levels of noise in the wastewater data at finer spatial scales. This paper has focused on modelling for inference: understanding epidemic dynamics that have already occurred. However, the state-space transition model coupled with the estimated parameters provides a natural method for forecasting23,50. Forecasts generated using this state-space transition model naturally incorporate increasing uncertainty about the future reproduction number and CAR.

While this model focused on COVID-19, there is a wealth of genetic information within municipal wastewater that could also benefit from modelling. The detection and concentration of viral, bacterial and anti-microbial resistance genes within wastewater have the ability to inform public health decision-making in a number of ways, especially as methodology is refined allowing more rapid turnaround times. As many jurisdictions seek to retain the wastewater capabilities they built during the pandemic phase of COVID-19 (and to diversify microbial targets), there is an ‘opportunity springboard’ to build tools that can predict the trajectories and spread of pathogens. Modelling has a key role to play in this journey.