Figure 1

Population structure analysis of the South Asian dataset. (a) Principal Component Analysis (PCA) of the curated dataset with (left) and without (right) external reference populations. (b) ADMIXTURE analysis of the curated dataset at K = 6 (cross-validation (CV) error = 0.4274). For South Asian populations from the 1000 Genomes Project (1000 GP), a subset of 15 samples is shown. Population group abbreviations: T-AA, Austroasiatic-speaking tribal populations; T-DR, Dravidian-speaking tribal populations; T-TB, Tibeto-Burman-speaking tribal populations; nT-DR, Dravidian-speaking non-tribal populations; nT-IE, Indo-European-speaking non-tribal populations; PAK, Pakistan; BEB, Bangladesh; STU, Sri Lanka; YRI, Yoruba in Ibadan, Nigeria; CEU, Utah residents with Northern and Western European ancestry; CHB, Han Chinese in Beijing, China; MXL, individuals with Mexican ancestry from Los Angeles, California. Full names of all populations within each group are given in Supplementary Table S3.