Scientific Data
A chromosome-level genome assembly of the South African indigenous, Kolbroek pig, Sus scrofa domesticus
  • Data Descriptor
  • Open access
  • Published: 09 March 2026

  • Rae Marvin Smith (ORCID: orcid.org/0000-0002-1379-3949) 1,
  • Annelin Henriehetta Molotsi 2,
  • Lucky Tendani Nesengani (ORCID: orcid.org/0000-0003-2678-8425) 2,
  • Thendo Stanley Tshilate 2,
  • Sinebongo Mdyogolo 1,
  • Nompilo Lucia Hlongwane 2,
  • Tracy Madimabi Masebe (ORCID: orcid.org/0000-0002-1300-9077) 1,
  • Appolinaire Djikeng 2,3 &
  • …
  • Ntanganedzeni Olivia Mapholi (ORCID: orcid.org/0000-0003-4507-6669) 1,2

Scientific Data (2026)

We are providing an unedited version of this manuscript to give early access to its findings. Before final publication, the manuscript will undergo further editing. Please note there may be errors present which affect the content, and all legal disclaimers apply.

Subjects

  • Agricultural genetics
  • Genome

Abstract

The Kolbroek pig is indigenous to South Africa and a breed of choice for smallholder farmers, mainly because of characteristics such as disease resistance and adaptability to tropical agroecological environments. Despite these desirable traits, the genomic architecture of this breed has not been explored. In this study, we report a high-quality genome assembly of the South African Kolbroek pig, sequenced at 31× coverage through a combination of PacBio Sequel IIe HiFi and Illumina NovaSeq 6000 Omni-C sequencing. The assembled genome is 2.6 Gb in length and comprises 83 scaffolds, of which 19 are chromosome-size scaffolds with 138.7 Mb. BUSCO completeness is 95.5%. Genome annotation and structure prediction identified 22,025 genes with protein-coding potential. The genome provides an opportunity to investigate genetic variation across multiple pig breeds and serves as a genetic resource for developing breeding programs for the conservation and improvement of the Kolbroek pig.
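
The assembly figures quoted in the abstract (total length and scaffold count) are the kind of summary that can be recomputed from any FASTA file. The following is a minimal Python sketch of that calculation, with scaffold N50 added as a common contiguity metric. It is not the authors' method (reference 28 lists gfastats for this purpose), the function names are hypothetical, the input is toy data rather than the Kolbroek assembly, and the abstract does not state which metric the 138.7 Mb figure refers to.

```python
from typing import Dict, Iterable, List


def read_fasta_lengths(lines: Iterable[str]) -> List[int]:
    """Return the length of every sequence found in FASTA-formatted lines."""
    lengths: List[int] = []
    current = 0
    in_record = False
    for raw in lines:
        line = raw.strip()
        if line.startswith(">"):
            # A new header closes the previous record, if any.
            if in_record:
                lengths.append(current)
            current = 0
            in_record = True
        elif line and in_record:
            current += len(line)
    if in_record:
        lengths.append(current)
    return lengths


def n50(lengths: List[int]) -> int:
    """Largest length L such that sequences of length >= L hold at least half of all bases."""
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if running * 2 >= total:
            return length
    return 0  # empty input


def summarize(lengths: List[int]) -> Dict[str, int]:
    """Scaffold count, total assembled bases, and N50 for a list of lengths."""
    return {"scaffolds": len(lengths), "total_bp": sum(lengths), "n50": n50(lengths)}


# Toy input for illustration only (three tiny "scaffolds").
toy_fasta = [">s1", "ACGTACGTAC", ">s2", "ACGTAC", ">s3", "ACGT"]
print(summarize(read_fasta_lengths(toy_fasta)))
```

For a real assembly, the same functions could be fed an open file handle, for example `read_fasta_lengths(open("assembly.fa"))`, where the file name is a placeholder.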

Code availability

Genetic analysis was performed on the Galaxy Europe platform (https://usegalaxy.eu) using the workflows of the Vertebrate Genome Project (VGP) pipeline (https://galaxyproject.org/projects/vgp/workflows/). The tools associated with the VGP assembly pipeline, and their versions, are listed in Supplementary Table 1. Additional analyses that were performed are also listed. Because the pipeline is under development and the VGP group updates it to incorporate newer software versions or resolve dependency issues, the versions that were used are specified. Where incompatibility issues were encountered, alternate versions that provided options for the required input data were used.

Data availability

The BioProject is PRJNA1227266. The SRA data may be found via SRP576325 (ref. 13), and the GenBank accession for Sus scrofa is GCA_055447695.1 (ref. 32) / JBLUWV000000000; the BioSample is SAMN46977218. The annotation files are available at Figshare (ref. 25).
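
As an illustration of how the accessions above can be turned into resolvable links, the following Python sketch checks each accession against a simple pattern and builds an identifiers.org URL of the same compact form used in reference 32 (prefix:accession). The regular expressions are illustrative approximations rather than official INSDC specifications, the `bioproject` and `biosample` prefixes are assumptions of this sketch, and `resolver_url` is a hypothetical helper. No network request is made.

```python
import re

# Accessions as given in the Data availability statement.
ACCESSIONS = {
    "bioproject": "PRJNA1227266",
    "biosample": "SAMN46977218",
    "insdc.sra": "SRP576325",
    "insdc.gca": "GCA_055447695.1",
}

# Illustrative accession patterns (assumed, deliberately simplified).
PATTERNS = {
    "bioproject": r"PRJ[DEN][A-Z]\d+",
    "biosample": r"SAM[NED][A-Z]?\d+",
    "insdc.sra": r"[SED]R[APRSXZ]\d+",
    "insdc.gca": r"GC[AF]_\d{9}\.\d+",
}


def resolver_url(prefix: str, accession: str) -> str:
    """Return an identifiers.org URL for a compact identifier, or raise ValueError."""
    if prefix not in PATTERNS:
        raise ValueError(f"unknown prefix: {prefix}")
    if not re.fullmatch(PATTERNS[prefix], accession):
        raise ValueError(f"{accession!r} does not look like a {prefix} accession")
    return f"https://identifiers.org/{prefix}:{accession}"


for prefix, accession in ACCESSIONS.items():
    print(resolver_url(prefix, accession))
```

Validating before building the URL catches transcription slips, such as a citation number fused onto the end of an accession, before a link is published.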

References

  1. Halimani, T. E., Dzama, K., Chimonyo, M. & Muchadeyi, F. C. Some insights into the phenotypic and genetic diversity of indigenous pigs in southern Africa. South African Journal of Animal Science 42, 507–510 (2012).

  2. Nicholas, G. Kolbroek – the unique local breed. Farmer’s Weekly (1999).

  3. Hlongwane, N. L., Hadebe, K., Soma, P., Dzomba, E. F. & Muchadeyi, F. C. Genome Wide Assessment of Genetic Variation and Population Distinctiveness of the Pig Family in South Africa. Frontiers in genetics 11, 344 (2020).

  4. Hlongwane, N. L. et al. Identification of Signatures of Positive Selection That Have Shaped the Genomic Landscape of South African Pig Populations. Animals (Basel) 14, 236 (2024).

  5. Ramsay, K. R. Sustainable housing: Indicators and implications (2002).

  6. Chimonyo, M. & Dzama, K. Estimation of genetic parameters for growth performance and carcass traits in Mukota pigs. Animal (Cambridge, England) 1, 317–323 (2007).

  7. Halimani, T. E., Muchadeyi, F. C., Chimonyo, M. & Dzama, K. Pig genetic resource conservation: The Southern African perspective. Ecological economics 69, 944–951 (2010).

  8. Mathobela, R. M., Molotsi, A. H., Marufu, M. C., Strydom, P. E. & Mapiye, C. Transitioning opportunities for sub-Saharan Africa’s small-scale urban pig farming towards a sustainable circular bioeconomy. International journal of agricultural sustainability 22 (2024).

  9. Wy, S. et al. Chromosome-level genome assembly of the Korean minipig (Sus scrofa). Sci Data 11, 840 (2024).

  10. Andrews, S. FastQC A Quality Control tool for High Throughput Sequence Data.

  11. Larivière, D. et al. Scalable, accessible and reproducible reference genome assembly and evaluation in Galaxy. Nature biotechnology 42, 367–370 (2024).

  12. Ranallo-Benavidez, T. R., Jaron, K. S. & Schatz, M. C. GenomeScope 2.0 and Smudgeplot for reference-free profiling of polyploid genomes. Nat Commun 11, 1432 (2020).

  13. Smith, R. M. et al. Kolbroek HIFI and Omni-C. https://identifiers.org/ncbi/insdc.sra:SRP576325 (2025).

  14. Cheng, H., Concepcion, G. T., Feng, X., Zhang, H. & Li, H. Haplotype-resolved de novo assembly using phased assembly graphs with hifiasm. Nature Methods 18, 170–175 (2021).

  15. Vasimuddin, M., Misra, S., Li, H. & Aluru, S. Efficient Architecture-Aware Acceleration of BWA-MEM for Multicore Systems (IEEE, May 2019).

  16. Zhou, C., McCarthy, S. A. & Durbin, R. YaHS: yet another Hi-C scaffolding tool. Bioinformatics 39 (2023).

  17. Wood, D. E., Lu, J. & Langmead, B. Improved metagenomic analysis with Kraken 2. Genome Biology 20, 1 (2019).

  18. Harry, E. PretextView (Paired REad TEXTure Viewer): A desktop application for viewing pretext contact maps.

  19. Price, A. L., Jones, N. C. & Pevzner, P. A. De novo identification of repeat families in large genomes. Bioinformatics 21, 351 (2005).

  20. Brown, T. et al. Genome Annotation and Other Post-Assembly Workflows for the Tree of Life.

  21. Bao, Z. & Eddy, S. R. Automated De Novo Identification of Repeat Sequence Families in Sequenced Genomes. Genome Research 12, 1269–1276 (2002).

  22. Benson, G. Tandem repeats finder: a program to analyze DNA sequences. Nucleic Acids Research 27, 573–580 (1999).

  23. Bailly-Bechet, M., Haudry, A. & Lerat, E. “One code to find them all”: a perl tool to conveniently parse RepeatMasker output files. Mobile DNA 5, 13 (2014).

  24. Gabriel, L., Becker, F., Hoff, K. J. & Stanke, M. Tiberius: end-to-end deep learning with an HMM for gene prediction. Bioinformatics (Oxford, England) 40 (2024).

  25. Kolbroek Annotation Files, https://doi.org/10.6084/m9.figshare.28754990 (2025).

  26. Rhie, A., Walenz, B. P., Koren, S. & Phillippy, A. M. Merqury: reference-free quality, completeness, and phasing assessment for genome assemblies. Genome Biology 21, 1–27 (2020).

  27. Manni, M., Berkeley, M. R., Seppey, M., Simão, F. A. & Zdobnov, E. M. BUSCO Update: Novel and Streamlined Workflows along with Broader and Deeper Phylogenetic Coverage for Scoring of Eukaryotic, Prokaryotic, and Viral Genomes. Molecular Biology and Evolution 38, 4647–4654 (2021).

  28. Formenti, G. et al. Gfastats: conversion, evaluation and manipulation of genome sequences using assembly graphs. Bioinformatics 38, 4214–4216 (2022).

  29. Challis, R., Richards, E., Rajan, J., Cochrane, G. & Blaxter, M. BlobToolKit – Interactive Quality Assessment of Genome Assemblies. G3: Genes, Genomes, Genetics 10, 1361–1374 (2020).

  30. Sscrofa11.1: NCBI GenBank. http://identifiers.org/insdc.gca:GCA_000003025.6 (2017).

  31. O’Halloran, D. M. fastQ_brew: module for analysis, preprocessing, and reformatting of FASTQ sequence data. BMC Res Notes 10, 275 (2017).

  32. Smith, R. M. et al. Kolbroek Assembly. GenBank http://identifiers.org/insdc.gca:GCA_055447695.1 (2026).

  33. Wang, Y. et al. A chromosome-level genome of Chenghua pig provides new insights into the domestication and local adaptation of pigs. Int. J. Biol. Macromol. 270, 131796 (2024).

  34. Chenghua: NCBI GenBank. http://identifiers.org/insdc.gca:GCA_037447515.1 (2024).

  35. Ma, H. et al. Long‐read assembly of the Chinese indigenous Ningxiang pig genome and identification of genetic variations in fat metabolism among different breeds. Molecular ecology resources 22, 1508–1520 (2022).

  36. Ningxiang: NCBI GenBank. http://identifiers.org/insdc.gca:GCA_020567905.1 (2021).

  37. Cabanettes, F. & Klopp, C. D-GENIES: dot plot large genomes in an interactive, efficient and simple way. PeerJ 6, e4958 (2018).

Acknowledgements

We acknowledge funding and support from the University of South Africa. In addition, we acknowledge the Vertebrate Genome Project for their support when executing the pipeline. We thank the staff at Inqaba Biotech for completing the library preparation of the collected samples as well as the HiFi sequencing, and the staff at the University of Stellenbosch, where the Omni-C data were produced. We acknowledge Prof Jasper Rees, Dr Sikhumbuzo Mbizeni, and Dr Thivhilaheli Richard Netshirovha for their assistance during sample collection and analyses. We also thank Prof Cuthbert Banga for his assistance with critical reading. We acknowledge Galaxy Europe for hosting the data and supplying computing resources. This article forms part of the objectives of the Africa Biogenome Project. Special thanks go to the team in Mapholi Labs for the resources and for managing the project.

Author information

Authors and Affiliations

  1. Department of Life and Consumer Sciences, College of Agriculture and Environmental Sciences, University of South Africa, Roodepoort, South Africa

    Rae Marvin Smith, Sinebongo Mdyogolo, Tracy Madimabi Masebe & Ntanganedzeni Olivia Mapholi

  2. Department of Agriculture and Animal Health, College of Agriculture and Environmental Sciences, University of South Africa, Roodepoort, South Africa

    Annelin Henriehetta Molotsi, Lucky Tendani Nesengani, Thendo Stanley Tshilate, Nompilo Lucia Hlongwane, Appolinaire Djikeng & Ntanganedzeni Olivia Mapholi

  3. International Livestock Research Institute, Nairobi, Kenya

    Appolinaire Djikeng

Contributions

Conceptualization: R.M.S., L.T.N., N.O.M., S.M., A.D.; Data Curation: R.M.S., A.H.M.; Formal Analysis: R.M.S., L.T.N., T.T., S.M., A.H.M.; Funding Acquisition: N.O.M., T.M.; Investigation: R.M.S.; Methodology: R.M.S., L.T.N., S.M., A.H.; Project Administration: N.O.M., T.M.; Resources: L.T.N., S.M.; Software: R.M.S., A.H.M., T.T.; Validation: A.H.M., R.M.S., L.T.N., S.M., T.T.; Visualisation: R.M.S.; Writing of the original draft: R.M.S., A.H.M.; Reviewing and editing of the manuscript: All authors.

Corresponding authors

Correspondence to Rae Marvin Smith or Ntanganedzeni Olivia Mapholi.

Ethics declarations

Competing interests

The authors declare no competing interests.

Additional information

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary information

Supplementary information (download DOCX)

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.

About this article

Cite this article

Smith, R.M., Molotsi, A.H., Nesengani, L.T. et al. A chromosome-level genome assembly of the South African indigenous, Kolbroek pig, Sus scrofa domesticus. Sci Data (2026). https://doi.org/10.1038/s41597-026-07002-y

  • Received: 09 May 2025

  • Accepted: 27 February 2026

  • Published: 09 March 2026

  • DOI: https://doi.org/10.1038/s41597-026-07002-y

Scientific Data (Sci Data)

ISSN 2052-4463 (online)
