Table 1 Performance comparison of several global ancestry inference algorithms

From: Neural ADMIXTURE for rapid genomic clustering

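| Dataset | Algorithm | Δ(Q, Q_GT) | RMSE(Q, Q_GT) | RMSE(F, F_GT) | Runtime (CPU) | Runtime (GPU) |
|---|---|---|---|---|---|---|
| All-Chms | ADMIXTURE | 0.042 | 0.153 | 0.062 | >1 day | |
| All-Chms | AlStructure | 0.064 | 0.159 | 0.032 | 06:04:28 | |
| All-Chms | TeraStructure | 0.033 | 0.133 | | 02:12:46 | |
| All-Chms | HaploNet | 0.026 | 0.114 | | | 03:17:00 |
| All-Chms | Neural ADMIXTURE | 0.025 | 0.108 | 0.011 | 00:11:21 | 00:01:32 |
| Chm-22 | ADMIXTURE | 0.048 | 0.161 | 0.068 | 02:56:29 | |
| Chm-22 | fastSTRUCTURE | 0.055 | 0.162 | | 03:31:00 | |
| Chm-22 | AlStructure | 0.116 | 0.256 | 0.068 | 00:46:49 | |
| Chm-22 | TeraStructure | 0.050 | 0.170 | | 00:43:48 | |
| Chm-22 | HaploNet | 0.053 | 0.170 | | | 01:09:29 |
| Chm-22 | Neural ADMIXTURE | 0.033 | 0.140 | 0.016 | 00:05:46 | 00:00:45 |
| Chm-22-Sim | ADMIXTURE | 0.046 | 0.197 | 0.067 | 09:48:18 | |
| Chm-22-Sim | fastSTRUCTURE | 0.069 | 0.237 | | >1 day | |
| Chm-22-Sim | AlStructure | 0.126 | 0.286 | 0.076 | 02:51:36 | |
| Chm-22-Sim | TeraStructure | 0.040 | 0.175 | | 06:37:14 | |
| Chm-22-Sim | HaploNet | 0.026 | 0.113 | | | 02:07:54 |
| Chm-22-Sim | Neural ADMIXTURE | 0.011 | 0.070 | 6.02 × 10−3 | 00:20:41 | 00:01:34 |
| PAB | ADMIXTURE | 1.44 × 10−4 | 0.010 | 5.97 × 10−3 | 03:31:01 | |
| PAB | AlStructure | 1.45 × 10−3 | 0.026 | 7.83 × 10−3 | 05:10:42 | |
| PAB | TeraStructure | 1.97 × 10−4 | 0.012 | | 01:13:38 | |
| PAB | HaploNet | 0.039 | 0.248 | | | 02:37:09 |
| PAB | Neural ADMIXTURE | 4.34 × 10−3 | 0.055 | 7.01 × 10−3 | 00:14:27 | 00:01:48 |
| Synthetic | ADMIXTURE | 1.37 × 10−4 | 0.011 | 0.028 | 00:08:06 | |
| Synthetic | AlStructure | 2.74 × 10−4 | 0.014 | 0.030 | 00:03:07 | |
| Synthetic | TeraStructure | 1.13 × 10−3 | 0.032 | | 00:03:28 | |
| Synthetic | HaploNet | 0.022 | 0.123 | | | 00:04:04 |
| Synthetic | Neural ADMIXTURE | 8.60 × 10−4 | 0.030 | 0.028 | 00:01:25 | 00:00:12 |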

Metrics are reported on the training data. The root mean squared error RMSE(F, F_GT), as defined in the Methods section, was not computed for fastSTRUCTURE, TeraStructure, and HaploNet because the first two lack an allele frequency matrix and the third lacks interpretability. HaploNet was not run on CPU because its resource and time requirements exceed system capabilities. Runtimes are wall-clock times in HH:MM:SS format. A runtime of ">1 day" indicates that the algorithm could not finish on the described hardware within 24 h and had to be run on alternative hardware for longer. The best-performing method for a given metric is highlighted in bold.
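
Because runtimes are given as HH:MM:SS wall-clock strings, comparing them requires converting to seconds first. The following is a minimal sketch of that arithmetic, using the Chm-22 runtimes from Table 1; the helper names `hms_to_seconds` and `speedup` are illustrative and do not come from the paper or its software.

```python
def hms_to_seconds(runtime: str) -> int:
    """Convert an HH:MM:SS wall-clock string to a number of seconds."""
    hours, minutes, seconds = (int(part) for part in runtime.split(":"))
    return hours * 3600 + minutes * 60 + seconds


def speedup(baseline: str, candidate: str) -> float:
    """Ratio of baseline runtime to candidate runtime (hypothetical helper)."""
    return hms_to_seconds(baseline) / hms_to_seconds(candidate)


# Chm-22 runtimes copied from Table 1.
admixture_cpu = "02:56:29"
neural_admixture_cpu = "00:05:46"
neural_admixture_gpu = "00:00:45"

print(round(speedup(admixture_cpu, neural_admixture_cpu), 1))  # 30.6
print(round(speedup(admixture_cpu, neural_admixture_gpu), 1))  # 235.3
```

Entries reported as ">1 day" are lower bounds rather than measurements, so they cannot be parsed this way and would need separate handling.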