Table 4 Summary of model characteristics.

From: Disease surveillance based on Internet-based linear models: an Australian case study of previously unmodeled infection diseases

| Model | Training period¹ | Google Trends data² | Keyword selection³ | Model name⁴ |
|-------|------------------|---------------------|--------------------|-------------|
| 1 | 52 weeks | Raw data | Continuous | 52RC |
| 2 | 52 weeks | Wavelet transformed | Continuous | 52WC |
| 3 | 104 weeks | Raw data | Continuous | 104RC |
| 4 | 104 weeks | Wavelet transformed | Continuous | 104WC |
| 5 | 156 weeks | Raw data | Continuous | 156RC |
| 6 | 156 weeks | Wavelet transformed | Continuous | 156WC |
| 7 | 52 weeks | Raw data | Set | 52RS |
| 8 | 52 weeks | Wavelet transformed | Set | 52WS |
| 9 | 104 weeks | Raw data | Set | 104RS |
| 10 | 104 weeks | Wavelet transformed | Set | 104WS |
| 11 | 156 weeks | Raw data | Set | 156RS |
| 12 | 156 weeks | Wavelet transformed | Set | 156WS |

  1. The training period denotes how many weeks of data are available to the model for fitting, keyword selection and wavelet construction. This period was also used to determine the best lag for the keywords used in these models (but was restricted to the 2009–2011 data).
  2. Indicates the search metrics data available to the model.
  3. In producing forecasts for the holdout data (2012–2013), continuous models can reselect keywords at each time point using the previous 52, 104 or 156 weeks of data; set models use a fixed selection of keywords determined using only the 2009–2011 data.
  4. Models are named by combining the number of weeks of data visible to them (52/104/156), the format of the search metric data (raw/wavelet transformed; R/W) and the method of keyword selection (continuous/set; C/S).
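
The naming scheme in the last note can be illustrated with a short Python sketch that builds the twelve model names from the three design factors. This is only an illustration of the naming convention; the function and variable names (`model_name`, `enumerate_models`, `TRAINING_WEEKS` and so on) are hypothetical and do not come from the paper.

```python
from itertools import product

# Design factors as listed in the table. The codes follow the naming note;
# the Python identifiers themselves are illustrative assumptions.
TRAINING_WEEKS = (52, 104, 156)
DATA_FORMATS = {"R": "Raw data", "W": "Wavelet transformed"}
KEYWORD_SELECTION = {"C": "Continuous", "S": "Set"}


def model_name(weeks, data_code, selection_code):
    """Combine weeks, data format code and selection code, e.g. 52 + R + C -> '52RC'."""
    return f"{weeks}{data_code}{selection_code}"


def enumerate_models():
    """List all factor combinations in the same order as the table rows."""
    models = []
    # Keyword selection varies slowest and data format fastest, matching the table.
    combos = product(KEYWORD_SELECTION, TRAINING_WEEKS, DATA_FORMATS)
    for number, (sel, weeks, fmt) in enumerate(combos, start=1):
        models.append({
            "model": number,
            "training_period": f"{weeks} weeks",
            "data": DATA_FORMATS[fmt],
            "keyword_selection": KEYWORD_SELECTION[sel],
            "name": model_name(weeks, fmt, sel),
        })
    return models


if __name__ == "__main__":
    for row in enumerate_models():
        print(row["model"], row["name"])
```

Running the script prints one line per model, pairing each model number with its name, so the output can be checked row by row against the table.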