Table 3 Mean distance error (km) per model on test data, restricted to countries included in the training data.

From: Geographical classification of malaria parasites through applying machine learning to whole genome sequence data

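| Parasite | Region | Location | LIN-R | LOG-C | CNN-R | CNN-C |
|---|---|---|---|---|---|---|
| P. falciparum | West Africa | Benin | 700 | 4 | 354 | 45 |
| | | Burkina Faso | 374 | 96 | 161 | 88 |
| | | Gambia | 775 | 132 | 317 | 107 |
| | | Ghana | 401 | 48 | 193 | 52 |
| | | Guinea | 751 | 515 | 459 | 402 |
| | | Ivory Coast | 630 | 681 | 695 | 728 |
| | | Mali | 563 | 345 | 208 | 271 |
| | | Mauritania | 615 | 676 | 382 | 410 |
| | | Nigeria | 1039 | 329 | 1169 | 329 |
| | | Senegal | 1354 | 274 | 565 | 263 |
| | East Africa | Kenya | 693 | 200 | 297 | 117 |
| | | Tanzania | 707 | 3 | 289 | 0 |
| | | Uganda | 1198 | 1581 | 856 | 1856 |
| | Horn of Africa | Ethiopia | 568 | 0 | 124 | 0 |
| | Central Africa | Cameroon | 635 | 28 | 184 | 0 |
| | SC Africa | DRC | 477 | 2 | 34 | 0 |
| | Southern Africa | Madagascar | 490 | 6 | 1543 | 0 |
| | | Malawi | 968 | 432 | 1018 | 530 |
| | SEA | Bangladesh | 743 | 9 | 159 | 0 |
| | | Cambodia | 312 | 18 | 112 | 21 |
| | | Laos | 276 | 121 | 152 | 53 |
| | | Myanmar | 360 | 10 | 559 | 0 |
| | | Thailand | 247 | 7 | 39 | 7 |
| | | Vietnam | 356 | 90 | 199 | 0 |
| | South America | Colombia | 2052 | 0 | 4832 | 0 |
| | | Peru | 1820 | 7 | 2535 | 0 |
| | Oceania | PNG | 488 | 0 | 697 | 0 |
| | Mean | | 470 | 93 | 245 | 77 |
| P. vivax | Horn of Africa | Ethiopia | 334 | 0 | 142 | 0 |
| | South Asia | India | 500 | 0 | 517 | 0 |
| | SEA | Cambodia | 638 | 25 | 648 | 0 |
| | | China | 2751 | 1033 | 704 | 1463 |
| | | Myanmar | 616 | 311 | 350 | 311 |
| | | Thailand | 604 | 0 | 288 | 0 |
| | | Vietnam | 156 | 0 | 578 | 0 |
| | SSEA | Malaysia | 213 | 0 | 957 | 0 |
| | South America | Brazil | 3080 | 0 | 2773 | 6 |
| | | Colombia | 1057 | 0 | 667 | 0 |
| | | Mexico | 134 | 0 | 1502 | 0 |
| | | Peru | 755 | 0 | 574 | 0 |
| | Oceania | PNG | 175 | 0 | 1103 | 0 |
| | Mean | | 890 | 33 | 819 | 36 |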

Abbreviations: DRC, Democratic Republic of Congo; PNG, Papua New Guinea; CNN, convolutional neural network; LOG-C, multinomial logistic regression classifier; CNN-C, CNN deep learner classifier; LIN-R, penalised linear regression model; CNN-R, penalised CNN regression model; SC, South Central; SEA, South East Asia; SSEA, Southern SEA.
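The table does not say how a sample's distance error is computed or how the Mean row is averaged. The Mean row is not the unweighted average of the country rows: for example, the P. falciparum LIN-R country values average about 726 km, while the table reports 470 km. It may therefore be averaged over test samples rather than over countries. The sketch below makes three assumptions that are not taken from the source: each sample's error is the great-circle (haversine) distance between predicted and true coordinates, the Earth radius is 6371 km, and the record layout and helper names (`haversine_km`, `mean_error_by_country`, `overall_mean_error`) are hypothetical. It computes both per-country and per-sample means.

```python
import math
from collections import defaultdict

# Assumption: mean Earth radius in km; the source does not state which value was used.
EARTH_RADIUS_KM = 6371.0


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two points given in decimal degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    # Clamp to guard against floating-point values marginally above 1.
    return 2 * EARTH_RADIUS_KM * math.asin(min(1.0, math.sqrt(a)))


def mean_error_by_country(records):
    """Hypothetical helper: records are (country, true_lat, true_lon, pred_lat, pred_lon).

    Returns a dict mapping each country to its mean distance error in km.
    """
    totals = defaultdict(float)
    counts = defaultdict(int)
    for country, t_lat, t_lon, p_lat, p_lon in records:
        totals[country] += haversine_km(t_lat, t_lon, p_lat, p_lon)
        counts[country] += 1
    return {country: totals[country] / counts[country] for country in totals}


def overall_mean_error(records):
    """Hypothetical helper: mean error over all samples, so countries with more
    test samples carry more weight than in a plain average of country means."""
    errors = [haversine_km(t_lat, t_lon, p_lat, p_lon)
              for _, t_lat, t_lon, p_lat, p_lon in records]
    return sum(errors) / len(errors)


if __name__ == "__main__":
    # Toy records with made-up coordinates, for illustration only.
    demo = [
        ("A", 0.0, 0.0, 0.0, 0.0),
        ("A", 0.0, 0.0, 0.0, 1.0),
        ("B", 10.0, 10.0, 10.0, 10.0),
    ]
    print(mean_error_by_country(demo))
    print(overall_mean_error(demo))
```

In the toy data, country A has two samples and country B has one. The per-sample mean weights A twice as heavily as B, so it differs from the average of the two per-country means. A discrepancy of this kind could explain why the Mean row does not match the simple average of the country rows.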