Table 4 Validation of imputation results with genotypes at 21 sites on custom Sequenom SpectroCHIP on all samples.

From: RETRACTED ARTICLE: 11,670 whole-genome sequences representative of the Han Chinese population from the CONVERGE project

SNP              RSID        REF  ALT  FREQ   N Samples  Con (%)  Pearson r2
chr1:11205058    rs1057079   C    T    0.8    11645      99.85    0.998
chr1:47398743    rs3890011   G    C    0.525  11625      99.95    0.999
chr2:141751592   rs13007735  G    A    0.595  11652      99.8     0.998
chr2:204824283   rs10172036  T    G    0.434  9561       94.69    0.957
chr2:99779131    rs2516835   T    C    0.651  11533      99.58    0.997
chr3:186443018   rs1656922   T    C    0.356  11634      99.64    0.996
chr7:47968927    rs2686817   C    A    0.522  11621      99.09    0.992
chr8:143310815   rs11167136  G    A    0.498  11618      98.61    0.987
chr9:125424507   rs70156     A    C    0.5    11632      94.8     0.962
chr10:120917445  rs2275111   G    A    0.707  11651      99.54    0.995
chr10:95279506   rs2293277   A    T    0.565  11506      95.62    0.966
chr12:120995332  rs2292681   G    A    0.767  11653      99.88    0.998
chr13:31233063   rs3742302   G    A    0.153  11651      99.91    0.998
chr14:20665840   rs4981088   G    A    0.507  11508      93.92    0.954
chr15:77344793   rs11737     T    A    0.415  11628      99.69    0.997
chr15:90226947   rs7169981   C    A    0.317  11615      98.92    0.99
chr16:20986506   rs3115438   C    T    0.481  11423      99.79    0.998
chr17:5991344    rs2302836   C    T    0.829  11652      96.52    0.955
chr18:61170721   rs1455556   T    C    0.546  11631      97.73    0.983
chr20:50238545   rs2235862   A    G    0.237  11649      99.8     0.997
chr20:52786219   rs2296241   G    A    0.417  11638      93.99    0.96

  1. The table shows concordance between SNP genotypes from low-coverage sequence data and genotypes at 21 sites typed on a Sequenom SpectroCHIP in all samples. The first five columns give the chromosome and position (SNP), the rs identifier (RSID), the reference allele (REF) on Human Genome Reference GRCh37.p5, the alternative allele (ALT) called in CONVERGE, and the alternative allele frequency (FREQ) in CONVERGE. The next column gives the number of samples (N Samples) with Sequenom genotypes at each of the 21 sites. The last two columns compare the imputed data with the Sequenom genotypes at the 21 sites. Percentage concordance (Con (%)) was calculated per site between hard-called genotypes derived from imputed genotype probabilities (the genotype with the maximum imputed probability was called where that probability was > 0.9) and Sequenom genotypes called in the same samples at the same loci. Pearson r2 was also computed per site between imputed allele dosages and Sequenom genotypes.
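
The two per-site metrics described in the footnote can be illustrated with a minimal Python sketch. It is not the CONVERGE pipeline: the function names, the toy data, and several details are assumptions made for illustration. In particular, it assumes genotypes are coded as the number of ALT alleles (0, 1 or 2), that samples without a hard call are excluded from the concordance denominator, and that the allele dosage is the expected ALT allele count P(het) + 2·P(hom ALT).

```python
def hard_call(probs, threshold=0.9):
    """Return the genotype (0, 1 or 2 ALT copies) with the highest imputed
    probability, or None when that probability is not above the threshold."""
    best = max(range(3), key=lambda g: probs[g])
    return best if probs[best] > threshold else None


def concordance_percent(imputed_probs, array_genotypes, threshold=0.9):
    """Percentage of samples whose hard-called genotype equals the array
    genotype. Assumption: uncalled or missing samples are skipped."""
    compared = 0
    matches = 0
    for probs, truth in zip(imputed_probs, array_genotypes):
        call = hard_call(probs, threshold)
        if call is None or truth is None:
            continue
        compared += 1
        matches += int(call == truth)
    return 100.0 * matches / compared if compared else float("nan")


def allele_dosage(probs):
    """Expected ALT allele count from (P(0), P(1), P(2)). Assumed definition."""
    return probs[1] + 2.0 * probs[2]


def pearson_r2(xs, ys):
    """Squared Pearson correlation between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        return float("nan")  # correlation undefined for a constant vector
    return (sxy * sxy) / (sxx * syy)


# Toy data for one hypothetical site: five samples, not taken from the table.
imputed = [
    (0.98, 0.02, 0.00),
    (0.01, 0.97, 0.02),
    (0.00, 0.05, 0.95),
    (0.50, 0.45, 0.05),  # max probability below 0.9, so no hard call
    (0.03, 0.95, 0.02),
]
array = [0, 1, 2, 1, 2]

con = concordance_percent(imputed, array)
r2 = pearson_r2([allele_dosage(p) for p in imputed], array)
print(f"Con (%) = {con:.2f}, Pearson r2 = {r2:.3f}")
```

Under these assumptions the concordance uses only confidently called samples, while r2 uses the dosages of all samples, so the two columns need not rank sites identically.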