Table 4 Classification results from the random forest (RF) prediction model, by residence, water source type, and model scenario.

From: Addressing gaps in data on drinking water quality through data integration and machine learning: evidence from Ethiopia

| | Number of households | Correctly classified: Contaminated (%) | Correctly classified: Not contaminated (%) | Misclassified: Contaminated (%) | Misclassified: Not contaminated (%) | Total correctly classified (%) |
|---|---|---|---|---|---|---|
| a. Full model | | | | | | |
| National | 883 | 76.3 | 12.1 | 7.5 | 4.1 | 88.4 |
| Urban | 288 | 47.2 | 30.6 | 12.2 | 10.1 | 77.8 |
| Rural | 595 | 90.4 | 3.2 | 5.2 | 1.2 | 93.6 |
| Piped on-premises | 214 | 41.1 | 38.3 | 9.8 | 10.7 | 79.4 |
| Public standpipe | 103 | 77.7 | 13.6 | 4.9 | 3.9 | 91.3 |
| Truck, vendor | 41 | 56.1 | 4.9 | 26.8 | 12.2 | 61.0 |
| Rainwater | 6 | 100.0 | 0.0 | 0.0 | 0.0 | 100.0 |
| Protected spring/well | 245 | 86.5 | 2.4 | 9.4 | 1.6 | 89.0 |
| Unprotected spring/well | 159 | 97.5 | 0.6 | 1.9 | 0.0 | 98.1 |
| Surface water | 99 | 100.0 | 0.0 | 0.0 | 0.0 | 100.0 |
| b. Scenarios | | | | | | |
| Water Source only | 883 | 80.4 | 0.0 | 19.6 | 0.0 | 80.4 |
| Water Source Plus | 883 | 76.2 | 9.4 | 10.2 | 4.2 | 85.6 |
| Geospatial only | 883 | 76.7 | 11.4 | 8.2 | 3.7 | 88.1 |
| Geospatial Plus | 883 | 76.1 | 10.6 | 8.9 | 4.3 | 86.7 |

Note: The study included 4,688 households in total; the Number of households column gives the number of households in the test data only.
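In each row, Total correctly classified (%) is the sum of the two Correctly classified columns (nationally, 76.3 + 12.1 = 88.4). The four classification cells partition the test households, so after rounding they sum to about 100%. The Python sketch below checks both relations against the values transcribed from this table. It also shows how Table 4-style row percentages follow from confusion-matrix counts. The helper names `consistent` and `row_percentages` are illustrative and not from the paper. The counts in the usage example are hypothetical: they were back-calculated to reproduce the Protected spring/well percentages, because the table reports percentages only. The table also does not say whether the Misclassified labels refer to the true or the predicted class, so the sketch treats them only as two labelled cells.

```python
# Values transcribed from Table 4:
# (row, households, correct_contaminated, correct_not_contaminated,
#  misclassified_contaminated, misclassified_not_contaminated, total_correct)
ROWS = [
    ("National", 883, 76.3, 12.1, 7.5, 4.1, 88.4),
    ("Urban", 288, 47.2, 30.6, 12.2, 10.1, 77.8),
    ("Rural", 595, 90.4, 3.2, 5.2, 1.2, 93.6),
    ("Piped on-premises", 214, 41.1, 38.3, 9.8, 10.7, 79.4),
    ("Public standpipe", 103, 77.7, 13.6, 4.9, 3.9, 91.3),
    ("Truck, vendor", 41, 56.1, 4.9, 26.8, 12.2, 61.0),
    ("Rainwater", 6, 100.0, 0.0, 0.0, 0.0, 100.0),
    ("Protected spring/well", 245, 86.5, 2.4, 9.4, 1.6, 89.0),
    ("Unprotected spring/well", 159, 97.5, 0.6, 1.9, 0.0, 98.1),
    ("Surface water", 99, 100.0, 0.0, 0.0, 0.0, 100.0),
    ("Water Source only", 883, 80.4, 0.0, 19.6, 0.0, 80.4),
    ("Water Source Plus", 883, 76.2, 9.4, 10.2, 4.2, 85.6),
    ("Geospatial only", 883, 76.7, 11.4, 8.2, 3.7, 88.1),
    ("Geospatial Plus", 883, 76.1, 10.6, 8.9, 4.3, 86.7),
]


def consistent(row, tol_total=0.15, tol_sum=0.2):
    """Check a row's internal arithmetic, allowing for rounding to 0.1.

    Each displayed value may be off by up to 0.05, so two values summed and
    compared with a third can differ by up to 0.15, and four values summed
    can differ from 100 by up to 0.2.
    """
    _, _, cc, cn, mc, mn, total = row
    total_ok = abs((cc + cn) - total) <= tol_total
    partition_ok = abs((cc + cn + mc + mn) - 100.0) <= tol_sum
    return total_ok and partition_ok


def row_percentages(correct_c, correct_n, mis_c, mis_n):
    """Turn four confusion-matrix counts into Table 4-style row percentages.

    Each cell is a share of all test households in the row; the final
    value is overall accuracy, i.e. correctly classified / all households.
    """
    n = correct_c + correct_n + mis_c + mis_n
    cells = (correct_c, correct_n, mis_c, mis_n)
    pcts = tuple(round(100 * k / n, 1) for k in cells)
    accuracy = round(100 * (correct_c + correct_n) / n, 1)
    return pcts + (accuracy,)


if __name__ == "__main__":
    for row in ROWS:
        print(f"{row[0]:<25} consistent={consistent(row)}")
    # Hypothetical counts back-calculated to match the Protected spring/well row.
    print(row_percentages(212, 6, 23, 4))  # (86.5, 2.4, 9.4, 1.6, 89.0)
```

Every transcribed row passes both checks. The largest deviations are 0.1 percentage points, as in the Protected spring/well total of 89.0 against 86.5 + 2.4 = 88.9, and these are consistent with independent rounding of each cell.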