Table 1. SNP sets achieving record-matching accuracy comparable to that of the full data.

From: Toward minimal SNP sets for record-matching with CODIS STR profiles

 

| Condition | SNP panel size | One-to-one: Median | One-to-one: Range | SNP query: Median | SNP query: Range | STR query: Median | STR query: Range | Needle-in-haystack: Median | Needle-in-haystack: Range |
|---|---|---|---|---|---|---|---|---|---|
| A. Baseline – randomly selected SNPs | | | | | | | | | |
| Random selection | 9000 | 1 | [1–1] | 1 | [0.997–1] | 1 | [0.998–1] | 0.995 | [0.962–1] |
| B. Locus characteristics | | | | | | | | | |
| Pop-MAF ≥ 1% | 1800 | 1 | [1–1] | 1 | [0.997–1] | 1 | [0.998–1] | 0.992 | [0.784–1] |
| Pop-MAF ≥ 5% | 1800 | 1 | [1–1] | 1 | [0.997–1] | 1 | [0.998–1] | 0.992 | [0.896–1] |
| C. Combinations of locus characteristics | | | | | | | | | |
| MAF ≥ 5%, Distance ≤ 0.125 Mb | 900 | 1 | [1–1] | 1 | [0.998–1] | 1 | [0.997–1] | 0.990 | [0.879–1] |
| MAF ≥ 10%, Distance ≤ 0.125 Mb | 900 | 1 | [1–1] | 1 | [0.998–1] | 1 | [0.998–1] | 0.992 | [0.866–1] |
| Pop-MAF > 0%, Distance ≤ 0.0625 Mb | 900 | 1 | [1–1] | 1 | [0.998–1] | 1 | [0.998–1] | 0.990 | [0.939–1] |
| Pop-MAF > 0%, Distance ≤ 0.125 Mb | 900 | 1 | [1–1] | 1 | [0.998–1] | 1 | [0.997–1] | 0.990 | [0.799–1] |
| Pop-MAF ≥ 1%, Distance ≤ 0.0625 Mb | 900 | 1 | [1–1] | 1 | [0.997–1] | 1 | [0.998–1] | 0.992 | [0.936–1] |
| Pop-MAF ≥ 5%, Distance ≤ 0.125 Mb | 900 | 1 | [1–1] | 1 | [0.998–1] | 1 | [0.998–1] | 0.994 | [0.949–1] |

(A) Randomly selected SNPs. (B) SNPs with a specific characteristic (MAF, pop-MAF, distance to CODIS STR, and Davg conditions). (C) SNPs with two or more characteristics (combinations among MAF or pop-MAF, distance to CODIS STR, and Davg conditions). Full numerical results for the SNP selection strategies appear in Tables S5 and S6A.
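As a rough illustration of the kind of combined condition listed under (C), the following minimal Python sketch filters candidate SNPs by a MAF threshold and a maximum distance to the nearest CODIS STR, then draws a panel of a fixed size. It is not the authors' pipeline: the `Snp` record, the `select_panel` function, the toy values, and the choice of uniform random sampling among eligible SNPs are all assumptions made for illustration, as is treating the panel size as a single total rather than a per-locus count. The sketch uses an inclusive threshold (≥), so the "Pop-MAF > 0%" rows would need a strict comparison instead.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class Snp:
    """Hypothetical SNP record; field names are illustrative only."""
    snp_id: str
    maf: float          # minor allele frequency as a fraction (0.05 = 5%)
    distance_mb: float  # distance to the nearest CODIS STR, in Mb


def select_panel(snps, min_maf, max_distance_mb, panel_size, seed=0):
    """Keep SNPs meeting both conditions, then sample a fixed-size panel.

    Assumption: the panel is drawn uniformly at random from eligible SNPs.
    """
    eligible = [
        s for s in snps
        if s.maf >= min_maf and s.distance_mb <= max_distance_mb
    ]
    if len(eligible) < panel_size:
        raise ValueError(
            f"only {len(eligible)} eligible SNPs for a panel of {panel_size}"
        )
    rng = random.Random(seed)  # seeded so the draw is reproducible
    return rng.sample(eligible, panel_size)


if __name__ == "__main__":
    # Synthetic toy data, not taken from the study.
    candidates = [
        Snp("rs_a", 0.20, 0.050),
        Snp("rs_b", 0.01, 0.050),  # fails the MAF threshold
        Snp("rs_c", 0.30, 0.500),  # fails the distance threshold
        Snp("rs_d", 0.10, 0.100),
    ]
    panel = select_panel(candidates, min_maf=0.05,
                         max_distance_mb=0.125, panel_size=2)
    print(sorted(s.snp_id for s in panel))
```

A condition such as "MAF ≥ 5%, Distance ≤ 0.125 Mb" would correspond here to `min_maf=0.05, max_distance_mb=0.125`; how the study ranks or samples SNPs within the eligible set is not stated in this table.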