Table 2. Entire dataset, training sets, and validation sets for the two waves of the Brazilian COVID-19 outbreak.

	CBC (+)	CBC (−)
Gender	COVID-19 (+)	COVID-19 (−)	Influenza-A (+)	Influenza-B (+)	Influenza-H1N1 (+)	Other viruses (+)
Entire data
Male	11.3%	34.0%	46.7%	46.5%	48.4%	59.5%
	(122,793)	(369,787)	(3160)	(1384)	(4108)	(20,107)
Female	10.3%	44.4%	53.3%	53.5%	51.6%	40.5%
	(111,673)	(482,453)	(3604)	(1588)	(4380)	(13,691)
Training set: first wave data
Male	12.9%	9.8%	4.2%	2.1%	6.0%	12.8%
	(5859)	(4469)	(1895)	(975)	(2742)	(5825)
Female	12.1%	15.2%	4.9%	2.8%	6.9%	10.3%
	(5527)	(6918)	(2223)	(1214)	(3118)	(4656)
Validation set: first wave data
Male	4.9%	37.6%	1.0%	<0.1%	1.4%	2.3%
	(5808)	(44,637)	(1113)	(188)	(1660)	(2710)
Female	4.7%	43.4%	1.1%	<0.1%	1.6%	1.7%
	(5647)	(51,550)	(1343)	(134)	(1842)	(2028)
Training set: second wave data
Male	25.9%	10.5%	2.0%	1.0%	3.1%	6.3%
	(24,104)	(9770)	(1895)	(975)	(2742)	(5825)
Female	24.2%	15.1%	2.3%	1.3%	3.3%	5.0%
	(22,404)	(14,088)	(2223)	(1214)	(3118)	(4656)
Validation set: second wave data
Male	4.5%	38.9%	0.4%	<0.1%	0.6%	1.0%
	(11,860)	(101,655)	(1113)	(188)	(1660)	(2710)
Female	4.3%	48.1%	0.5%	<0.1%	0.7%	0.8%
	(11,021)	(125,776)	(1343)	(134)	(1842)	(2028)

Training sets were obtained by applying the inclusion–exclusion criteria to the entire data and then downsampling the COVID-19 (−) class in the training sets to account for class imbalance. We took October 1st as the split point between the first- and second-wave data to exclude possible incubation periods before the start of the second wave in early November. Accordingly, the first-wave validation set covers late June to late September, and the second-wave validation set covers early October to late February. N = 1,138,728 CBCs in the entire dataset.
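The date split and the downsampling described in this note can be illustrated with a minimal sketch. It is not the authors' pipeline. The record layout (`date`, `label`), the label strings, the function names, the `neg_per_pos` ratio, the random seed, and the year used in the example dates are all assumptions made for illustration; the note specifies only the October 1st split point and that the COVID-19 (−) class was downsampled in the training sets.

```python
import random
from datetime import date


def split_by_wave(records, split_point):
    """Split records into (first wave, second wave) at split_point.

    Records dated before split_point go to the first wave; records dated
    on or after it go to the second wave.
    """
    first = [r for r in records if r["date"] < split_point]
    second = [r for r in records if r["date"] >= split_point]
    return first, second


def downsample_negatives(records, neg_per_pos=1.0, seed=0):
    """Randomly downsample only the COVID-19 (-) class.

    neg_per_pos is a hypothetical parameter: the number of COVID-19 (-)
    records kept per COVID-19 (+) record. All other classes are kept whole.
    """
    negatives = [r for r in records if r["label"] == "covid_neg"]
    others = [r for r in records if r["label"] != "covid_neg"]
    n_pos = sum(1 for r in records if r["label"] == "covid_pos")
    n_keep = min(len(negatives), round(neg_per_pos * n_pos))
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    return others + rng.sample(negatives, n_keep)


# Toy data; the year 2020 is an assumption used only for this example.
records = (
    [{"date": date(2020, 9, 15), "label": "covid_pos"} for _ in range(3)]
    + [{"date": date(2020, 9, 15), "label": "covid_neg"} for _ in range(12)]
    + [{"date": date(2020, 10, 1), "label": "covid_neg"} for _ in range(5)]
)

first_wave, second_wave = split_by_wave(records, date(2020, 10, 1))
train_first = downsample_negatives(first_wave, neg_per_pos=1.0)
```

Downsampling is applied only to the training portion here, which mirrors the table: the validation sets keep their heavily skewed COVID-19 (−) proportions.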
