Abstract
The Cyprus Genome Project characterizes the genetic landscape of the Cypriot population to address the lack of population-specific data in global repositories. We employed a serial independent pool-sequencing (pool-seq) strategy to sequence DNA from 10,000 healthy bone marrow donors, randomized into 10 independent biological replicates. This design offers operational and statistical advantages. Operationally, it enables the processing of a large cohort that would be cost-prohibitive with individual sequencing. Statistically, the 10 independent biological replicates allowed true low-frequency variants to be distinguished from sequencing artifacts and enabled the calculation of empirical confidence intervals for variant frequencies. We used both Whole Exome Sequencing and a targeted gene panel (813 genes) to maximize read depth and sensitivity. The study identified over 4 million variants, including more than 100,000 variants absent from the gnomAD v4.1 and ClinVar databases. Validation against published clinical cohorts confirmed high concordance (r > 0.92). The results highlight significant differences between local and global allele frequencies, including pathogenic variants that are common in Cyprus but rare globally. The results, including an interactive genome map with full annotations from gnomAD v4.1 and ClinVar, are publicly accessible at www.cyprusgenome.org, with the aim of advancing healthcare and facilitating future clinical research.
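The replicate-based reasoning in the abstract can be illustrated with a minimal sketch. It is not the authors' pipeline: the function names, the read counts, the equal pool size of 1,000 donors (2,000 chromosomes) per replicate, the support threshold, and the use of a Student-t interval across replicates are all assumptions made for illustration.

```python
from math import sqrt
from statistics import mean, stdev

# Two-sided 95% Student-t critical value for 9 degrees of freedom,
# i.e. for exactly 10 replicate pools (assumed interval method).
T_CRIT_95_DF9 = 2.262


def replicate_frequencies(alt_reads, total_reads):
    """Per-pool allele frequency: alternate reads divided by total reads.

    Assumes every pool has non-zero depth at the site.
    """
    return [alt / total for alt, total in zip(alt_reads, total_reads)]


def empirical_ci(freqs, t_crit=T_CRIT_95_DF9):
    """Mean frequency and a t-based interval from between-replicate spread."""
    centre = mean(freqs)
    half_width = t_crit * stdev(freqs) / sqrt(len(freqs))
    return centre, max(0.0, centre - half_width), min(1.0, centre + half_width)


def replicate_support(freqs, min_freq=0.0005):
    """Count pools in which the variant reaches min_freq.

    The default 0.0005 is a hypothetical threshold: one allele copy
    among 2,000 chromosomes, assuming 1,000 donors per pool.
    """
    return sum(f >= min_freq for f in freqs)


# Hypothetical read counts at one site across 10 pools.
depth = [2000] * 10
consistent = replicate_frequencies([12, 9, 15, 11, 10, 13, 8, 14, 12, 10], depth)
sporadic = replicate_frequencies([0, 0, 0, 7, 0, 0, 0, 0, 0, 0], depth)

estimate, lower, upper = empirical_ci(consistent)
print(f"frequency {estimate:.4f}, 95% interval [{lower:.4f}, {upper:.4f}]")
print("pools supporting consistent variant:", replicate_support(consistent))
print("pools supporting sporadic variant:", replicate_support(sporadic))
```

A variant seen in every pool at a similar frequency is more plausibly real than one seen in a single pool, and the spread between pools gives an interval without assuming a sequencing error model. A real analysis would also need to account for unequal depth and pool composition.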
Data availability
The datasets generated and/or analysed during the current study are available in the European Variation Archive (EVA) repository at https://www.ebi.ac.uk/ena/browser/view/PRJEB89856, with study accession number PRJEB89856 and analysis accession number ERZ29062157.
Author information
Contributions
A. Antoniades and J.C. contributed equally to this manuscript. P.C. provided the main concept and idea of the manuscript and the BioRender license used to create Fig. 1. C.B., D.V. and E.C. wrote the main manuscript text. P.V. contributed to the statistical analysis. A. Aristodimou and D.V. performed all computational analysis. A.M., A.P., A.K., P.G. and Y.K. performed all laboratory experiments. All authors reviewed the manuscript.
Ethics declarations
Competing interests
The authors declare no competing interests.
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License, which permits any non-commercial use, sharing, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if you modified the licensed material. You do not have permission under this licence to share adapted material derived from this article or parts of it. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by-nc-nd/4.0/.
About this article
Cite this article
Antoniades, A., Chi, J., Brown, C. et al. Population genetic variation characterised through serial independent pool-seq: the Cyprus Genome Project. Sci Rep (2026). https://doi.org/10.1038/s41598-026-44707-x
DOI: https://doi.org/10.1038/s41598-026-44707-x