Abstract
The Kolbroek pig is indigenous to South Africa and a breed of choice for smallholder farmers, mainly because of characteristics such as disease resistance and adaptability to tropical agroecological environments. Despite these desirable traits, the genomic architecture of this breed has not been explored. In this study, we report a high-quality genome assembly of the South African Kolbroek pig, sequenced at 31× coverage through a combination of PacBio Sequel IIe HiFi and Illumina NovaSeq 6000 Omni-C sequencing. The assembled genome is 2.6 Gb in length and comprises 83 scaffolds, including 19 chromosome-size scaffolds with 138.7 Mb. BUSCO completeness was 95.5%. Genome annotation and structure prediction identified 22,025 genes with protein-coding potential. The genome provides an opportunity to investigate genetic variation across multiple pig breeds and serves as a genetic resource for developing breeding programs for the conservation and improvement of the Kolbroek pig.
Code availability
Genetic analysis was performed on the Galaxy Europe platform (https://usegalaxy.eu) using the workflows of the Vertebrate Genome Project (VGP) pipeline (https://galaxyproject.org/projects/vgp/workflows/). The tools and their versions associated with the VGP assembly pipeline are listed in Supplementary Table 1. Additional analyses that were performed are also listed. Because this pipeline is under development, the versions that were used are specified. Where incompatibility issues were encountered, alternate versions that provided options for the required input data were used. The VGP group updates its pipelines to incorporate newer versions of software or resolve dependency issues.
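Because tool versions change as the pipeline is updated, it can help to record them directly from an exported workflow file. The following is a minimal sketch, not the procedure used in this study: it assumes a workflow exported from Galaxy as JSON (a .ga file) in which each entry under "steps" carries "tool_id" and "tool_version" fields. The function name tool_versions and the example workflow content are hypothetical and serve only as illustration.

```python
import json


def tool_versions(workflow: dict) -> list[tuple[str, str]]:
    """Return sorted, de-duplicated (tool_id, tool_version) pairs.

    Assumes the layout of an exported Galaxy workflow: a "steps" mapping
    whose values are dicts with "tool_id" and "tool_version" keys.
    """
    pairs = set()
    for step in workflow.get("steps", {}).values():
        tool_id = step.get("tool_id")
        if tool_id:  # input steps have no tool_id and are skipped
            pairs.add((tool_id, step.get("tool_version") or "unknown"))
    return sorted(pairs)


# Hypothetical, heavily trimmed workflow export (illustration only;
# the tool names and versions below are placeholders, not real tools).
example = json.loads("""
{
  "name": "example-workflow",
  "steps": {
    "0": {"type": "data_input", "tool_id": null, "tool_version": null},
    "1": {"type": "tool", "tool_id": "example_assembler", "tool_version": "1.0.0"},
    "2": {"type": "tool", "tool_id": "example_scaffolder", "tool_version": "2.1"}
  }
}
""")

# Print one tab-separated line per tool, suitable for pasting into a table.
for tool_id, version in tool_versions(example):
    print(f"{tool_id}\t{version}")
```

For a real export, the example string would be replaced by the contents of the downloaded workflow file, loaded with json.load.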
Acknowledgements
We acknowledge funding and support from the University of South Africa. In addition, we acknowledge the Vertebrate Genome Project for their support when executing the pipeline. We would also like to thank the staff at Inqaba Biotech for completing the library preparation of the collected samples as well as the HiFi sequencing. We would like to thank the staff at the University of Stellenbosch, where the Omni-C data were produced. We would like to acknowledge Prof Jasper Rees, Dr Sikhumbuzo Mbizeni, and Dr Thivhilaheli Richard Netshirovha for their assistance during sample collection and analyses. We would also like to thank Prof Cuthbert Banga for his assistance with critical reading. We would like to acknowledge Galaxy Europe for hosting the data and supplying computing resources. This article forms part of the objectives of the Africa Biogenome Project. A special thanks to the team in Mapholi Labs for the resources and for managing the project.
Author information
Contributions
Conceptualization: R.M.S., L.T.N., N.O.M., S.M., A.D.; Data Curation: R.M.S., A.H.M.; Formal Analysis: R.M.S., L.T.N., T.T., S.M., A.H.M.; Funding Acquisition: N.O.M., T.M.; Investigation: R.M.S.; Methodology: R.M.S., L.T.N., S.M., A.H.; Project Administration: N.O.M., T.M.; Resources: L.T.N., S.M.; Software: R.M.S., A.H.M., T.T.; Validation: A.H.M., R.M.S., L.T.N., S.M., T.T.; Visualisation: R.M.S.; Writing of the original draft: R.M.S., A.H.M.; Reviewing and editing of the manuscript: All authors.
Ethics declarations
Competing interests
The authors declare no competing interests.
Additional information
Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.
About this article
Cite this article
Smith, R.M., Molotsi, A.H., Nesengani, L.T. et al. A chromosome-level genome assembly of the South African indigenous, Kolbroek pig, Sus scrofa domesticus. Sci Data (2026). https://doi.org/10.1038/s41597-026-07002-y
DOI: https://doi.org/10.1038/s41597-026-07002-y


