A chromosome-level genome assembly of the South African indigenous, Kolbroek pig, Sus scrofa domesticus

Smith, Rae Marvin; Molotsi, Annelin Henriehetta; Nesengani, Lucky Tendani; Tshilate, Thendo Stanley; Mdyogolo, Sinebongo; Hlongwane, Nompilo Lucia; Masebe, Tracy Madimabi; Djikeng, Appolinaire; Mapholi, Ntanganedzeni Olivia

doi:10.1038/s41597-026-07002-y

Data Descriptor
Open access
Published: 09 March 2026

Scientific Data (2026)

We are providing an unedited version of this manuscript to give early access to its findings. Before final publication, the manuscript will undergo further editing. Please note there may be errors present which affect the content, and all legal disclaimers apply.

Abstract

The Kolbroek pig is indigenous to South Africa and is a breed of choice for smallholder farmers, mainly because of characteristics such as disease resistance and adaptability to tropical agroecological environments. Despite these desirable traits, the genomic architecture of this breed has not been explored. In this study, we report a high-quality genome assembly of the South African Kolbroek pig, sequenced at 31× coverage using a combination of PacBio Sequel IIe HiFi and Illumina NovaSeq 6000 Omni-C sequencing. The assembled genome is 2.6 Gb in length and comprises 83 scaffolds, including 19 chromosome-size scaffolds with 138.7 Mb. BUSCO completeness was 95.5%. Genome annotation and structure prediction identified 22,025 genes with protein-coding potential. The genome provides an opportunity to investigate genetic variation across multiple pig breeds and serves as a genetic resource for developing breeding programs for the conservation and improvement of the Kolbroek pig.
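As a rough illustration of what the reported depth implies, sequencing coverage is conventionally the total number of sequenced bases divided by the genome size. The sketch below applies that relation to the two figures given in the abstract (2.6 Gb and 31×). It assumes the coverage was calculated against the assembled length, which the abstract does not state, and the function names are hypothetical.

```python
def total_bases_for_coverage(genome_size_bp: float, coverage: float) -> float:
    """Approximate number of sequenced bases implied by a given coverage depth."""
    return genome_size_bp * coverage


def coverage_from_bases(total_bases: float, genome_size_bp: float) -> float:
    """Coverage depth obtained from a given amount of sequence data."""
    return total_bases / genome_size_bp


# Figures taken from the abstract; using the assembly length as the
# denominator is an assumption made for illustration only.
GENOME_SIZE_BP = 2.6e9
COVERAGE = 31

implied_bases = total_bases_for_coverage(GENOME_SIZE_BP, COVERAGE)
print(f"Implied sequence data: {implied_bases / 1e9:.1f} Gb")
```

Under these assumptions, the reported depth corresponds to roughly 80.6 Gb of sequence data.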

Code availability

Genetic analysis was performed on the Galaxy Europe platform (https://usegalaxy.eu) using the Vertebrate Genome Project (VGP) workflows (https://galaxyproject.org/projects/vgp/workflows/). The tools associated with the VGP assembly pipeline, and their versions, are listed in Supplementary Table 1, together with the additional analyses that were performed. Because this pipeline is under development, the versions that were used are specified. Where incompatibility issues were encountered, alternate versions that provided options for the required input data were used. The VGP group updates its pipelines to incorporate newer versions of software or to resolve dependency issues.

Data availability

The data are registered under BioProject PRJNA1227266 and BioSample SAMN46977218. The sequencing data are available in the SRA under SRP576325¹³, and the Sus scrofa assembly is available in GenBank under accession GCA_055447695.1³² (JBLUWV000000000). The annotation files are available at Figshare²⁵.
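For readers who want to script access to these records, the following minimal sketch collects the accessions listed above and builds resolver links for the two that appear as identifiers.org URLs in the reference list. The dictionary keys, the function names, and the regular expressions are illustrative assumptions. The patterns only sanity-check the general shape of each accession and are not an official specification.

```python
import re

# Accessions as given in the Data availability statement.
ACCESSIONS = {
    "bioproject": "PRJNA1227266",
    "biosample": "SAMN46977218",
    "sra_study": "SRP576325",
    "assembly": "GCA_055447695.1",
    "genbank_master": "JBLUWV000000000",
}

# Illustrative shape checks only (assumed, not an official grammar).
PATTERNS = {
    "bioproject": r"PRJNA\d+",
    "biosample": r"SAMN\d+",
    "sra_study": r"SRP\d+",
    "assembly": r"GCA_\d{9}\.\d+",
    "genbank_master": r"[A-Z]{6}\d{9}",
}

# Resolver prefixes copied from the URLs used in the reference list.
RESOLVER_PREFIXES = {
    "sra_study": "https://identifiers.org/ncbi/insdc.sra:",
    "assembly": "http://identifiers.org/insdc.gca:",
}


def looks_valid(kind: str) -> bool:
    """Return True if the stored accession matches the illustrative pattern."""
    return re.fullmatch(PATTERNS[kind], ACCESSIONS[kind]) is not None


def resolver_url(kind: str) -> str:
    """Build an identifiers.org link for kinds that have a known prefix."""
    return RESOLVER_PREFIXES[kind] + ACCESSIONS[kind]


for kind in ACCESSIONS:
    print(kind, ACCESSIONS[kind], looks_valid(kind))
print(resolver_url("assembly"))
print(resolver_url("sra_study"))
```

Keeping the accessions in one mapping makes it easy to reuse them in download scripts without retyping identifiers.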

References

1. Halimani, T. E., Dzama, K., Chimonyo, M. & Muchadeyi, F. C. Some insights into the phenotypic and genetic diversity of indigenous pigs in southern Africa. South African Journal of Animal Science 42, 507–510 (2012).
2. Nicholas, G. Kolbroek – the unique local breed. Farmer’s Weekly (1999).
3. Hlongwane, N. L., Hadebe, K., Soma, P., Dzomba, E. F. & Muchadeyi, F. C. Genome Wide Assessment of Genetic Variation and Population Distinctiveness of the Pig Family in South Africa. Frontiers in Genetics 11, 344 (2020).
4. Hlongwane, N. L. et al. Identification of Signatures of Positive Selection That Have Shaped the Genomic Landscape of South African Pig Populations. Animals (Basel) 14, 236 (2024).
5. Ramsay, K. R. Sustainable housing: Indicators and implications (2002).
6. Chimonyo, M. & Dzama, K. Estimation of genetic parameters for growth performance and carcass traits in Mukota pigs. Animal (Cambridge, England) 1, 317–323 (2007).
7. Halimani, T. E., Muchadeyi, F. C., Chimonyo, M. & Dzama, K. Pig genetic resource conservation: The Southern African perspective. Ecological Economics 69, 944–951 (2010).
8. Mathobela, R. M., Molotsi, A. H., Marufu, M. C., Strydom, P. E. & Mapiye, C. Transitioning opportunities for sub-Saharan Africa’s small-scale urban pig farming towards a sustainable circular bioeconomy. International Journal of Agricultural Sustainability 22 (2024).
9. Wy, S. et al. Chromosome-level genome assembly of the Korean minipig (Sus scrofa). Sci Data 11, 840–8 (2024).
10. Andrews, S. FastQC A Quality Control tool for High Throughput Sequence Data.
11. Larivière, D. et al. Scalable, accessible and reproducible reference genome assembly and evaluation in Galaxy. Nature Biotechnology 42, 367–370 (2024).
12. Ranallo-Benavidez, T. R., Jaron, K. S. & Schatz, M. C. GenomeScope 2.0 and Smudgeplot for reference-free profiling of polyploid genomes. Nat Commun 11, 1432 (2020).
13. Smith, R. M. et al. Kolbroek HIFI and Omni-C. https://identifiers.org/ncbi/insdc.sra:SRP576325 (2025).
14. Cheng, H., Concepcion, G. T., Feng, X., Zhang, H. & Li, H. Haplotype-resolved de novo assembly using phased assembly graphs with hifiasm. Nature Methods 18, 170–175 (2021).
15. Vasimuddin, M., Misra, S., Li, H. & Aluru, S. Efficient Architecture-Aware Acceleration of BWA-MEM for Multicore Systems (IEEE, May 2019).
16. Zhou, C., McCarthy, S. A. & Durbin, R. YaHS: yet another Hi-C scaffolding tool. Bioinformatics 39 (2023).
17. Wood, D. E., Lu, J. & Langmead, B. Improved metagenomic analysis with Kraken 2. Genome Biology 20, 1 (2019).
18. Harry, E. PretextView (Paired REad TEXTure Viewer): A desktop application for viewing pretext contact maps.
19. Price, A. L., Jones, N. C. & Pevzner, P. A. De novo identification of repeat families in large genomes. Bioinformatics 21, 351 (2005).
20. Brown, T. et al. Genome Annotation and Other Post-Assembly Workflows for the Tree of Life.
21. Bao, Z. & Eddy, S. R. Automated De Novo Identification of Repeat Sequence Families in Sequenced Genomes. Genome Research 12, 1269–1276 (2002).
22. Benson, G. Tandem repeats finder: a program to analyze DNA sequences. Nucleic Acids Research 27, 573–580 (1999).
23. Bailly-Bechet, M., Haudry, A. & Lerat, E. “One code to find them all”: a perl tool to conveniently parse RepeatMasker output files. Mobile DNA 5, 13 (2014).
24. Gabriel, L., Becker, F., Hoff, K. J. & Stanke, M. Tiberius: end-to-end deep learning with an HMM for gene prediction. Bioinformatics (Oxford, England) 40 (2024).
25. Kolbroek Annotation Files. https://doi.org/10.6084/m9.figshare.28754990 (2025).
26. Rhie, A., Walenz, B. P., Koren, S. & Phillippy, A. M. Merqury: reference-free quality, completeness, and phasing assessment for genome assemblies. Genome Biology 21, 1–27 (2020).
27. Manni, M., Berkeley, M. R., Seppey, M., Simão, F. A. & Zdobnov, E. M. BUSCO Update: Novel and Streamlined Workflows along with Broader and Deeper Phylogenetic Coverage for Scoring of Eukaryotic, Prokaryotic, and Viral Genomes. Molecular Biology and Evolution 38, 4647–4654 (2021).
28. Formenti, G. et al. Gfastats: conversion, evaluation and manipulation of genome sequences using assembly graphs. Bioinformatics 38, 4214–4216 (2022).
29. Challis, R., Richards, E., Rajan, J., Cochrane, G. & Blaxter, M. BlobToolKit – Interactive Quality Assessment of Genome Assemblies. G3: Genes, Genomes, Genetics 10, 1361–1374 (2020).
30. Sscrofa11.1: NCBI GenBank. http://identifiers.org/insdc.gca:GCA_000003025.6 (2017).
31. O’Halloran, D. M. fastQ_brew: module for analysis, preprocessing, and reformatting of FASTQ sequence data. BMC Res Notes 10, 275–4 (2017).
32. Smith, R. M. et al. Kolbroek Assembly. GenBank http://identifiers.org/insdc.gca:GCA_055447695.1 (2026).
33. Wang, Y. et al. A chromosome-level genome of Chenghua pig provides new insights into the domestication and local adaptation of pigs. Int. J. Biol. Macromol. 270, 131796 (2024).
34. Chenghua: NCBI GenBank. http://identifiers.org/insdc.gca:GCA_037447515.1 (2024).
35. Ma, H. et al. Long‐read assembly of the Chinese indigenous Ningxiang pig genome and identification of genetic variations in fat metabolism among different breeds. Molecular Ecology Resources 22, 1508–1520 (2022).
36. Ningxiang: NCBI GenBank. http://identifiers.org/insdc.gca:GCA_020567905.1 (2021).
37. Cabanettes, F. & Klopp, C. D-GENIES: dot plot large genomes in an interactive, efficient and simple way. PeerJ 6, e4958 (2018).

Acknowledgements

We acknowledge funding and support from the University of South Africa. In addition, we acknowledge the Vertebrate Genome Project for their support when executing the pipeline. We would also like to thank the staff at Inqaba Biotech for completing the library preparation of the collected samples as well as the HiFi sequencing. We would like to thank the staff at the University of Stellenbosch, where the Omni-C data were produced. We would like to acknowledge Prof Jasper Rees, Dr Sikhumbuzo Mbizeni, and Dr Thivhilaheli Richard Netshirovha for their assistance during sample collection and analyses. We would also like to thank Prof Cuthbert Banga for his assistance with critical reading. We would like to acknowledge Galaxy Europe for hosting the data and supplying computing resources. This article forms part of the objectives of the Africa Biogenome Project. Special thanks go to the team in Mapholi Labs for the resources and for managing the project.

Author information

Authors and Affiliations

Department of Life and Consumer Sciences, College of Agriculture and Environmental Sciences, University of South Africa, Roodepoort, South Africa
Rae Marvin Smith, Sinebongo Mdyogolo, Tracy Madimabi Masebe & Ntanganedzeni Olivia Mapholi
Department of Agriculture and Animal Health, College of Agriculture and Environmental Sciences, University of South Africa, Roodepoort, South Africa
Annelin Henriehetta Molotsi, Lucky Tendani Nesengani, Thendo Stanley Tshilate, Nompilo Lucia Hlongwane, Appolinaire Djikeng & Ntanganedzeni Olivia Mapholi
International Livestock Research Institute, Nairobi, Kenya
Appolinaire Djikeng

Contributions

Conceptualization: R.M.S., L.T.N., N.O.M., S.M., A.D.; Data Curation: R.M.S., A.H.M.; Formal Analysis: R.M.S., L.T.N., T.T., S.M., A.H.M.; Funding Acquisition: N.O.M., T.M.; Investigation: R.M.S.; Methodology: R.M.S., L.T.N., S.M., A.H.; Project Administration: N.O.M., T.M.; Resources: L.T.N., S.M.; Software: R.M.S., A.H.M., T.T.; Validation: A.H.M., R.M.S., L.T.N., S.M., T.T.; Visualisation: R.M.S.; Writing of the original draft: R.M.S., A.H.M.; Reviewing and editing of the manuscript: All authors.

Corresponding authors

Correspondence to Rae Marvin Smith or Ntanganedzeni Olivia Mapholi.

Ethics declarations

Competing interests

The authors declare no competing interests.

Additional information

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary information

Supplementary information (DOCX)

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.

About this article

Cite this article

Smith, R.M., Molotsi, A.H., Nesengani, L.T. et al. A chromosome-level genome assembly of the South African indigenous, Kolbroek pig, Sus scrofa domesticus. Sci Data (2026). https://doi.org/10.1038/s41597-026-07002-y

Received: 09 May 2025
Accepted: 27 February 2026
Published: 09 March 2026
DOI: https://doi.org/10.1038/s41597-026-07002-y