Table 4 Mean distance error (km) per model on test data for unseen geographies.

From: Geographical classification of malaria parasites through applying machine learning to whole genome sequence data

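| Parasite | Location | LIN-R | LOG-C | CNN-R | CNN-C |
|---|---|---|---|---|---|
| P. falciparum | Cambodia | 496 | 669 | 322 | 628 |
| | Cameroon | 959 | 1545 | 1472 | 1636 |
| | DRC | 1150 | 2331 | 2531 | 2456 |
| | Ethiopia | 1118 | 1760 | 1252 | 1394 |
| | Myanmar | 703 | 731 | 470 | 728 |
| | Peru | 9050 | 4050 | 5856 | 2400 |
| | Mean | 2246 | 1848 | 1983 | 1540 |
| P. vivax | Cambodia | 591 | 323 | 1709 | 564 |
| | Ethiopia | 2499 | 5174 | 3528 | 4140 |
| | Malaysia | 459 | 1594 | 3617 | 2064 |
| | Peru | 2376 | 2943 | 1196 | 2852 |
| | Mean | 1481 | 2508 | 2512 | 2405 |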

Abbreviations: CNN, Convolutional Neural Network; DRC, Democratic Republic of Congo; LIN-R, penalised linear regression model; LOG-C, multinomial logistic regression classifier; CNN-R, penalised CNN regression model; CNN-C, CNN deep learning classifier.
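
The Mean rows in Table 4 appear to be unweighted averages of the per-location errors for each model. That is an inference from the arithmetic, not something the table states. The following minimal Python sketch recomputes those averages from the values copied out of the table and compares them with the reported means. The names `TABLE_4`, `REPORTED_MEANS`, `location_means`, and `best_model` are illustrative and do not come from the source.

```python
# Per-location mean distance errors (km), copied from Table 4.
# Tuple order follows the table columns: LIN-R, LOG-C, CNN-R, CNN-C.
MODELS = ("LIN-R", "LOG-C", "CNN-R", "CNN-C")

TABLE_4 = {
    "P. falciparum": {
        "Cambodia": (496, 669, 322, 628),
        "Cameroon": (959, 1545, 1472, 1636),
        "DRC": (1150, 2331, 2531, 2456),
        "Ethiopia": (1118, 1760, 1252, 1394),
        "Myanmar": (703, 731, 470, 728),
        "Peru": (9050, 4050, 5856, 2400),
    },
    "P. vivax": {
        "Cambodia": (591, 323, 1709, 564),
        "Ethiopia": (2499, 5174, 3528, 4140),
        "Malaysia": (459, 1594, 3617, 2064),
        "Peru": (2376, 2943, 1196, 2852),
    },
}

# The "Mean" rows as reported in Table 4.
REPORTED_MEANS = {
    "P. falciparum": (2246, 1848, 1983, 1540),
    "P. vivax": (1481, 2508, 2512, 2405),
}


def location_means(rows):
    """Unweighted mean of each model's column across locations.

    Assumption: every location counts equally, regardless of how many
    test samples it contributed.
    """
    columns = zip(*rows.values())
    return tuple(sum(col) / len(col) for col in columns)


def best_model(means):
    """Name of the model with the lowest mean distance error."""
    return MODELS[means.index(min(means))]


if __name__ == "__main__":
    for parasite, rows in TABLE_4.items():
        recomputed = location_means(rows)
        for model, ours, theirs in zip(MODELS, recomputed, REPORTED_MEANS[parasite]):
            print(f"{parasite:14s} {model}: recomputed {ours:7.1f} km, reported {theirs} km")
        print(f"{parasite:14s} lowest mean error: {best_model(recomputed)}")
```

Under this assumption the recomputed averages agree with the reported Mean rows to within 1 km, a gap that is plausibly explained by rounding of the per-location values before publication.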