Population genetic variation characterised through serial independent pool-seq: the Cyprus Genome Project

Antoniades, Athos; Chi, Jianxiang; Brown, Cameron; Vogazianos, Paris; Charalambous, Emily; Vrachnos, Dimitris; Aristodimou, Aristos; Miltiadous, Andri; Papaloizou, Andri; Koumouli, Anita; Gerasimou, Petroula; Kyprianou, Yiannos; Costeas, Paul

doi:10.1038/s41598-026-44707-x

Article
Open access
Published: 19 March 2026

Athos Antoniades^2,na1,
Jianxiang Chi^5,na1,
Cameron Brown^2,
Paris Vogazianos^2,3,
Emily Charalambous^2,
Dimitris Vrachnos^2,
Aristos Aristodimou^2,
Andri Miltiadous^1,
Andri Papaloizou^1,
Anita Koumouli^1,
Petroula Gerasimou^1,
Yiannos Kyprianou^1 &
Paul Costeas^1,4,5

Scientific Reports (2026)

We are providing an unedited version of this manuscript to give early access to its findings. Before final publication, the manuscript will undergo further editing. Please note there may be errors present which affect the content, and all legal disclaimers apply.

Abstract

The Cyprus Genome Project characterizes the genetic landscape of the Cypriot population to address the lack of population-specific data in global repositories. We employed a serial independent pool-sequencing (pool-seq) strategy to sequence DNA from 10,000 healthy bone marrow donors, randomized into 10 independent biological replicates. This design was chosen for its operational and statistical advantages. Operationally, it enables the processing of a large cohort that would be cost-prohibitive to sequence individually. Statistically, the 10 independent biological replicates allowed true low-frequency variants to be distinguished from sequencing artifacts and enabled the calculation of empirical confidence intervals for variant frequencies. We used both Whole Exome Sequencing and a targeted gene panel (813 genes) to maximize read depth and sensitivity. The study identified over 4 million variants, including >100,000 variants absent from the gnomAD v4.1 and ClinVar databases. Validation against published clinical cohorts confirmed high concordance (r > 0.92). The results highlight significant differences between local and global allele frequencies, including pathogenic variants that are common in Cyprus but rare globally. The results, including an interactive genome map with full annotations from gnomAD v4.1 and ClinVar, are publicly accessible at www.cyprusgenome.org, with the aim of advancing healthcare and facilitating future clinical research.
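
The replicate-based reasoning in the abstract can be illustrated with a minimal sketch. The abstract does not specify how the empirical confidence intervals or the artifact filter were computed, so everything below is an assumption made for illustration: the function name `summarise_variant`, the t-based 95% interval, the `min_supporting_pools` threshold, and the example frequencies are all hypothetical and not taken from the paper.

```python
from math import sqrt
from statistics import mean, stdev

# Two-sided 95% Student-t critical value for 9 degrees of freedom,
# i.e. for exactly 10 replicate pools (hard-coded assumption).
T_CRIT_95_DF9 = 2.262


def summarise_variant(pool_freqs, min_supporting_pools=2):
    """Summarise one variant's allele frequency across 10 replicate pools.

    pool_freqs: per-pool alternate-allele frequencies (0..1), one per pool.
    min_supporting_pools: hypothetical threshold, not taken from the paper.
    """
    if len(pool_freqs) != 10:
        raise ValueError("this sketch assumes exactly 10 replicate pools")

    centre = mean(pool_freqs)
    # Standard error of the mean, estimated from the between-pool spread.
    std_err = stdev(pool_freqs) / sqrt(len(pool_freqs))
    half_width = T_CRIT_95_DF9 * std_err

    # Count the pools in which the variant was observed at all.
    supporting = sum(1 for f in pool_freqs if f > 0)

    return {
        "mean_freq": centre,
        "ci_low": max(0.0, centre - half_width),   # clamp to valid range
        "ci_high": min(1.0, centre + half_width),
        "supporting_pools": supporting,
        "replicated": supporting >= min_supporting_pools,
    }


# Made-up frequencies: a variant observed in every pool...
consistent = summarise_variant(
    [0.010, 0.012, 0.008, 0.011, 0.009, 0.010, 0.013, 0.007, 0.010, 0.010]
)
# ...and one observed in a single pool only, as an artifact might be.
singleton = summarise_variant([0.02] + [0.0] * 9)

print(consistent)
print(singleton)
```

In this sketch a variant seen in only one of the 10 pools is flagged as not replicated, while a variant seen consistently across pools receives an interval whose width reflects the between-pool spread. A real pipeline would likely also weight pools by read depth, which is omitted here.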

Data availability

The datasets generated and/or analysed during the current study are available in the European Variation Archive (EVA) repository at https://www.ebi.ac.uk/ena/browser/view/PRJEB89856, with study accession number PRJEB89856 and analysis number ERZ29062157.

Author information

Athos Antoniades and Jianxiang Chi contributed equally to this work.

Authors and Affiliations

Karaiskakio Foundation, 15, Nicandrou Papamina Avenue, Nicosia, 2032, Cyprus
Andri Miltiadous, Andri Papaloizou, Anita Koumouli, Petroula Gerasimou, Yiannos Kyprianou & Paul Costeas
Stremble Ventures Ltd, Ko 8 Germasogeia, Limassol, 4045, Cyprus
Athos Antoniades, Cameron Brown, Paris Vogazianos, Emily Charalambous, Dimitris Vrachnos & Aristos Aristodimou
European University Cyprus, 6 Diogenis Str, Engomi Nicosia, 2404, Cyprus
Paris Vogazianos
Cyprus Cancer Research Institute, 1 University Avenue, CCRI Nicola David-Pinedo Building, Aglantzia, 2109, Cyprus
Paul Costeas
The Centre for the Study of Haematological and other Malignancies, Nicandrou Papamina Avenue, Nicosia, 2032, Cyprus
Jianxiang Chi & Paul Costeas

Contributions

A. Antoniades and J.C. contributed equally to this manuscript. P.C. provided the main concept and idea of the manuscript and provided the BioRender license used to create Fig. 1. C.B., D.V. and E.C. wrote the main manuscript text. P.V. contributed to the statistical analysis. A. Aristodimou and D.V. performed all computational analysis. A.M., A.P., A.K., P.G. and Y.K. performed all laboratory experiments. All authors reviewed the manuscript.

Corresponding author

Correspondence to Athos Antoniades.

Ethics declarations

Competing interests

The authors declare no competing interests.

Additional information

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary Information

Below is the link to the electronic supplementary material.

Supplementary Material 1 (DOCX)

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License, which permits any non-commercial use, sharing, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if you modified the licensed material. You do not have permission under this licence to share adapted material derived from this article or parts of it. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by-nc-nd/4.0/.

About this article

Cite this article

Antoniades, A., Chi, J., Brown, C. et al. Population genetic variation characterised through serial independent pool-seq: the Cyprus Genome Project. Sci Rep (2026). https://doi.org/10.1038/s41598-026-44707-x

Received: 31 October 2025
Accepted: 13 March 2026
Published: 19 March 2026
DOI: https://doi.org/10.1038/s41598-026-44707-x