A chromosome-scale reference genome of the Banna miniature inbred pig

Chen, Hong-Man; Xu, Kai-Xiang; Yan, Chen; Zhao, Heng; Jiao, De-Ling; Si, Si; Liu, Zheng-Xi; Peng, Guo-Ying; Jamal, Muhammad Ameen; Lv, Min-Juan; Wang, Pei; Zeng, Yang-Zhi; Zhao, Hong-Ye; Wang, Ming-Shan; Wei, Hong-Jiang

doi:10.1038/s41597-024-04201-3

Download PDF

Data Descriptor
Open access
Published: 18 December 2024

A chromosome-scale reference genome of the Banna miniature inbred pig

Hong-Man Chen^1,2,3^na1,
Kai-Xiang Xu^1,3^na1,
Chen Yan ORCID: orcid.org/0000-0002-9238-4499⁴,
Heng Zhao^1,3,
De-Ling Jiao^1,3,
Si Si⁴,
Zheng-Xi Liu⁴,
Guo-Ying Peng^1,2,
Muhammad Ameen Jamal^1,4,
Min-Juan Lv^1,2,
Pei Wang^1,3,
Yang-Zhi Zeng¹,
Hong-Ye Zhao^1,3,
Ming-Shan Wang^4,5 &
…
Hong-Jiang Wei ORCID: orcid.org/0000-0002-5663-1093^1,2,3,4

Scientific Data volume 11, Article number: 1345 (2024) Cite this article

4264 Accesses
4 Citations
2 Altmetric
Metrics details

Subjects

Abstract

The Banna miniature inbred pig (BN) is an intensively inbred line for biomedical research and xenotransplantation due to its low individual variation and stable genetic background. Although it is originated from the Diannan miniature pig (DN), substantial genetic changes have actually occurred. However, the lack of a BN reference genome has limited studies on the complete genomic architecture and utilization as a biomedical model. Here, we present a high-quality genome for BN using PacBio HiFi and Hi-C sequencing technologies, with a total length of 2.66 Gb, a scaffold N50 of 143.60 Mb, and 97.59% of the sequences anchored to chromosomes. Its BUSCO score is 96.30%, higher than porcine reference assembly and DN. The genome contains 48.49% of repeats, 19,756 protein-coding genes, and 7,207 non-coding RNAs according to our annotation. The OMArk score shows a proteome completeness and consistency of 99.58% and 93.62%, respectively. These findings indicate that the chromosome-scale genome of BN provides a valuable resource for studying the genetic basis of inbreeding, facilitating further research and clinical applications.

A chromosome-level genome assembly of the Korean minipig (Sus scrofa)

Article Open access 03 August 2024

A high-quality chromosome-level genome assembly and annotation of Anqing Six-end-white pig (Sus scrofa)

Article Open access 20 October 2025

Long- and short-read RNA sequencing from five reproductive organs of boar

Article Open access 05 October 2023

Background & Summary

Miniature pigs play a critical role as a source of protein for humans, and represent a promising alternative solution to the shortage of human organs for transplantation. Their relatively small body size, physiological similarities to humans and potential for genetic manipulation make them ideal candidates for xenotransplantation^1,2. The breakthrough of knocking out the PERV gene in Bama miniature pigs³, along with the development of 13-gene edited miniature pigs in 2021⁴, represents the pinnacle of genetic engineering to create suitable organ donor pigs. Since 2022, several commercialized pig-to-human heart and kidney xenotransplantations have been successfully performed in the United States^5,6,7,8, marking significant progress toward clinical application. Despite the advances that have attracted worldwide attention, concerns remain here regarding the large size of commercial pigs, and the cross-match compatibility between pig cells and human cells⁹.

A step forward in the use of miniature pigs as organ donors has been achieved on May 17, 2024, when a genetically engineered liver from the Diannan miniature pig (DN) was successfully transplanted into a 71-year-old Chinese patient suffering from liver cancer¹⁰. The Banna miniature inbred pig (BN), developed from the inbreeding of the DN, a breed native to southwest China, offers inherent advantages for low individual variation along with its close resemblance of anatomical and physiological characteristics to humans^1,11,12. With more than 40 years of inbreeding history, the BN pigs provide a valuable model for biomedical research and xenotransplantation^1,11,12,13. Nevertheless, there is a dearth of genomic studies on BN pigs, and the lack of a high-quality genome for BN hampers their widespread use as a biological biomedical model and makes it more difficult to fully comprehend their genetic architecture.

Recent developments in bioinformatics tools and sequencing technologies, along with decreasing sequencing costs, have significantly advanced porcine genomics research. Here, we performed a genome assembly of the BN by integrating high-throughput chromosome conformation capture (Hi-C) in conjunction with PacBio high-fidelity (HiFi) data. We obtained approximately 92 Gb of HiFi clean reads with an average length of 14.00 Kb, achieving a sequencing depth of 36.80× (based on a genome size of 2.5 Gb), and 142 Gb (56.80×) of Hi-C clean reads. This resulted in a contig-level genome of 296 contigs with a contig N50 of 94.41 Mb. After scaffolding and final assembly, we produced a high-quality chromosomal genome of 2.66 Gb with 44 scaffolds, and a scaffold N50 of 143.60 Mb (Sscrofa11.1, 138.97 Mb). The final assembled genome size of BN was comparable to published porcine reference genomes (2.5 Gb of Sscrofa11.1, Duroc) and DN (2.65 Gb), with an anchoring rate of 97.59%, and strong collinearity with the Duroc genome. Additionally, the final BN genome had a higher scaffold N90 value and longer average and maximum scaffold lengths compared to Duroc and DN. Using the BUSCO scoring method, 96.3% of the 9,226 core genes in the mammalia_odb10 dataset were completely assembled in the BN genome.

We detected 1.29 Gb of repeat sequences which account for 48.49% of the genome, of which 99.63% were classified as known repeat elements. The annotation of the BN genome predicted a total of 19,756 protein-coding genes (PCGs), which was slightly lower than those already identified in Duroc (22,063 PCGs) and DN (21,447 PCGs). Evaluation using OMArk revealed that 99.58% of the annotated genes were complete, of which 93.62% showed consistent lineage placement, thus demonstrating high assembly completeness and annotation accuracy. In addition, 7207 non-coding RNAs (ncRNAs) were annotated in the BN genome. We also determined the mitochondrial (MT) genome of BN with a length of 16,711 bp and annotated 37 genes. This high-quality chromosome assembly of the BN genome establishes a robust cornerstone for understanding inbreeding and advancing research.

Methods

Sample collection and sequencing

The Ethics Committee of Yunnan Agricultural University approved our sampling pipeline for animal experimental procedures (Approval No. 202105009). A 33-day old male fetal sample (0028A3) from the A3 family of BN was collected from the BN farm in Jinghong City, Xishuangbanna Prefecture, Yunnan Province, China. The collected tissues were rapidly frozen in liquid nitrogen, and then stored at −80 °C until use. Whole genomic DNA was isolated from muscle by the standard phenol-chloroform protocol.

The extracted genomic DNA was purified, concentrated, and quantified using Nanodrop 1000 and Qubit assays before being prepared into the SMRTbell library. The library was sequenced on the PacBio Sequel® II systems utilizing continuous long read mode according to the official protocols at Novogene (Tianjin, China). Raw sequencing data was generated using the third-generation Circular Consensus Sequencing (CCS) default parameters of SMRT Link v8.0 (https://github.com/PacificBiosciences/pbcommand) to produce highly accurate HiFi reads.

The Hi-C library was prepared by cell cross-linking, tissue lysis, chromatin digestion with the enzyme MboI, proximal chromatin DNA ligation, biotin labeling, and DNA purification. The ends of sheared fragments with insert sizes ranging from 300 to 500 base pairs (bp) were repaired and specifically enriched. The Hi-C sequencing libraries were then amplified, quality controlled, and sequenced on the Illumina NovaSeq. 6000 platform at Novogene (Tianjin, China). After sequencing, adapter sequences were trimmed, and low-quality paired-end reads were removed using the fastp v0.22.0¹⁴ with default settings.

Genome assembly

For the contig assembly of the BN genome, we obtained 92 Gb (36.80×, estimated by genome size of 2.5 Gb) of PacBio HiFi clean reads, with a read N50 of 14.00 Kb (the average read length of 13.99 kb and the longest read of 42.35 Kb), and 142 Gb (56.80×) of Hi-C clean reads. These data were assembled using the Hi-C integrated assembly approach of Hifiasm v0.16.1-r375¹⁵ with default settings except for “-t 128 -r 9” (Fig. 1a). The preliminary assembly genome at contig-level contained 296 contigs with a contig N50 of 94.41 Mb, the longest contig of 204.73 Mb, and a total size of 2.66 Gb (Table 1). For scaffolding, the Hi-C clean reads were aligned to the contig-level assembly using Bwa v0.7.18¹⁶ with default parameters, and the SAM conversion to BAM using SAMtools v1.18¹⁷, which generated uniquely mapped paired-end reads. Subsequently, we used YaHS v1.2a.1¹⁸ with “-e GATC” to generate a scaffold-level draft genome with an N50 of 142.98 Mb.

Table 1 Summary statistics for the BN assembly.

Full size table

For the MT genome, the MT sequence (NC_000845.1) of the reference Sscrofa11.1 was retrieved from the National Center for Biotechnology Information (NCBI, https://www.ncbi.nlm.nih.gov/datasets/genome/GCF_000003025.6/). We employed MitoHiFi v3.2.1¹⁹ for MT assembly based on the draft scaffolds with default parameters (except for “-t 32 -d -o 2”), yielding a final MT genome with a length of 16,711 bp, which had 37 genes (two rRNAs, 22 tRNAs, and 13 protein-coding genes), and GC content of 39.59% (Fig. 1b). The MT genome map was visualized by CGView²⁰.

For final assembly, we employed RaGoo²¹ (parameters: “-t 128 -b -C”) and manual corrections to accurately anchor scaffolds to 20 chromosomes by addressing large-scale inversions and translocations. We generated a Hi-C heatmap with strong interactions within 20 chromosome-level scaffolds using HiC-Pro v3.1.0²² and visualized it by EndHiC²³ (Fig. 1c). Finally, we obtained a high-quality chromosome-level genome consisting of 44 scaffolds, including 20 chromosomes with an anchoring rate of 97.59%, with a size of 2.66 Gb, an N50 length of 143.60 Mb, an N95 length of 58.41 Mb, longest scaffold of 292.93 Mb, and a GC content of 42.57% (Fig. 2a–c, Tables 1 and 2). Of the 9,226 BUSCO groups, we identified 8,880 complete BUSCOs (96.3%), including 8,845 complete and single-copy BUSCOs (95.9%), and 35 complete and duplicated BUSCOs (0.4%). Fragmented BUSCOs and missing BUSCOs were 72 (0.8%) and 274 (2.9%), respectively.

Table 2 Genome assembly statistics of BN as compared to its close relative DN and Duroc.

Full size table

Assessment of the genome assembly

Firstly, Merqury v.1.3²⁴ was applied to estimate the consensus quality value (QV) and the k-mer completeness score. The optimal k-mer length was first determined as 21. The 21-kmer dataset was generated from the HiFi clean reads using the Meryl v 1.3.0²⁴. We then evaluated Merqury’s QV based on the 21-kmer dataset and the primary assembly, and found a k-mer completeness of 98.37 and a k-mer-based QV of 71.64, demonstrating a high completeness and accuracy (Fig. 1d). Secondly, we evaluated the BUSCO completeness using BUSCO v5.5.0^25,26 with the mammalia_odb10 database. The assembled contigs exhibited 96.2% completeness for conserved genes, while the final scaffolds achieved 96.3% with a total length of 2.66 Gb and a GC content of 42.57% (Fig. 2, Table 1). In the Hi-C heatmap, there was a well-organized interaction pattern within the chromosomal region, indicating that the final scaffolds were assembled without apparent large-scale structural errors (Fig. 1c). Additionally, the final assembly in scaffolds achieved the highest completeness in BUSCO assessments, along with the longest N50, N90, N95, average, and maximum scaffold lengths compared to the Duroc and DN assemblies. (Table 1).

Repeat element annotation

The repetitive elements of the BN genome were annotated by the automated Earl Grey TE annotation pipeline v4.1.0²⁷ with “-g genome -s ‘sus_scrofa’ -r ‘sus_scrofa’ -o OutDir -d yes -t 128”, configured with RepeatMasker v4.1.5²⁸ with Dfam v3.7²⁹, RepeatModeler v2.0.5³⁰, RepeatScout v1.0.6³¹, Tandem Repeat Finder v4.09³², RECON v1.08³³, LTR _retriever v2.9.4³⁴, and LTR_FINDER³⁵. In total, 1.29 Gb (48.49%) of the final chromosome-level genome was masked as repeats. Of these, retroelements were the major type, occupying 36.98% (982.26 Mb) of the genome, with long interspersed repetitive elements (LINEs) being the most abundant at 29.98% (796.33 Mb), followed by long terminal repeats (LTRs) at 5.41% (143.68 Mb), short interspersed repetitive elements (SINEs) at 1.59% (42.24 Mb), and penelope elements at 0.00% (14.18 Kb). In addition, DNA transposons accounted for 2.30% (61.01 Mb), rolling circle elements for 0.01% (0.29 Mb), unclassified elements for 0.37% (9.81 Mb), and other elements for 8.84% (234.79 Mb) (Fig. 3a).

Protein-coding genes prediction

For the gene prediction, LiftOn v1.0.2³⁶ with parameters for “-g ref.GFF -o genome.gff3 -copies -sc 0.95 genome ref.fasta -t 96” was used, which is an excellent genome annotation tool within the same species. We identified 23,853 PCGs based on the latest release of the Sscrofa 11.1.112 GFF3 file from Ensembl (https://www.ensembl.org/Sus_scrofa/Info/Index), with an average gene length of 44,630.63 bp (Fig. 3b). The average number of coding sequences (CDSs, mean length: 159.36 bp), and exons (mean length: 272.65 bp) in each gene was 21.15 and 22.68, respectively. We then assessed the predicted protein gene sequences for BUSCO completeness using OMArk v0.3.0³⁷. The results demonstrated 99.16% completeness (n: 13,050 OMArk), consisting of 42.04% single-copy BUSCOs, 57.12% duplicated BUSCOs, and only 0.84% missing BUSCOs, achieving the same level of completeness as the reference genome (Fig. 3b). The proteome consistent assessment showed high consistency with 97.64% consistent lineage placement, 1.16% inconsistent lineage placement, 1.2% unknown, and no contamination.

To better reveal the unique genomic characteristics of the BN pig, we further predicted gene models using a combined pipeline with Braker3 v3.0.8³⁸ and Gene Model Mapper v1.9³⁹ (GeMoMa; Fig. 1a). First, we prepared RNA-seq clean reads by using fastp to process 100 transcriptome raw datasets from various tissues of Sus scrofa, which were downloaded from the NCBI database (Table S1). These reads were then aligned to the genome using hisat2 v2.2.1⁴⁰ and sorted with SAMtools. Next, we used the RNA-seq data aligned to the target genome along with a self-curated protein dataset for gene prediction and identification in Braker3. The protein dataset, used for homology-based annotation, was integrated from all non-redundant vertebrate protein sequences in OrthoDB v11⁴¹, Artiodactyla proteins (accessed from NCBI on March 29, 2024), and pig reference proteins (GCF_000003025.6). Seven reference genomes were then selected: Pig (GCF_000003025.6), Human (GCF_000001405.40), Mouse (GCF_000001635.27), Cattle (GCF_002263795.3), Horse (GCF_002863925.1), Goat (GCF_001704415.2), and Dog (GCF_011100685.1). These reference genomes and their associated annotations, the Braker3 results, and all mapped RNA-seq BAM data were used to improve the annotation with GeMoMa. As a result, we annotated 19,756 protein-coding genes (PCGs) with an average length of 10.26 Kb, of which 19,271 genes had at least one untranslated region (UTR) identified (Fig. 3b, Table 1). Functional annotation using the online eggNOG-mapper v2.1.12⁴² revealed that 19,460 genes were successfully annotated with functional information. Moreover, OMArk evaluation showed that 99.58% of the annotated genes were intact, with 93.62% maintaining consistent lineage positions, underscoring the completeness of the assembly and the accuracy of the annotation.

Non-coding RNAs annotation

The microRNA (miRNA), small nucleolar RNA (snRNA), small RNA (sRNA), long non-coding RNA (lncRNA), transfer RNA (tRNA) and ribosomal RNA (rRNA) were annotated by Infernal v1.1.433⁴³ with Rfam database. In total, we predicted 7207 non-coding RNAs (ncRNAs), including 911 miRNAs, 2145 snRNAs, nine sRNAs, 174 lncRNAs, 3013 tRNAs, 453 rRNAs, four frameshift elements, 21 IRESs, ten ribozymes, and 467 other ncRNAs (Fig. 3c).

Genome collinearity analysis

Genomic collinearity analyses of the CDSs were conducted between BN (contigs, draft scaffolds, final scaffolds) and the reference genome Sscrofa 11.1 using JCVI v1.4.15⁴⁴ with the LiftOn annotation results (Fig. 4a–c). We observed that as the assembly quality of the BN genome improved, its clustering, continuity, and collinearity improved significantly, resulting in a high-quality genome at the chromosome level. Furthermore, genome synteny assessment of BN and reference was performed using the MUMmer v4.0.0 with default settings⁴⁵ (Fig. 4d).

To validate the high collinearity between BN and Duroc, we used the genomes of four other pig breeds (including DN⁴⁶, Cross-bred⁴⁷, Ossabaw⁴⁸, and Nanchukmacdon (NCMD)⁴⁹) with genomic annotations for further collinearity comparison (Fig. 5). The results revealed that the strongest collinearity of BN with Duroc, followed by NCMD, Cross-bred, Ossabaw, and DN pigs, which indicated a high-quality chromosome-scale pig genome of our final scaffold assembly. Overall, the strong collinearity found in the CDSs, and the genome collinearity underscored the high quality of the BN sequencing and assembly. These results confirmed the accuracy of the genome assembly and annotation.

Data Records

The PacBio clean reads (92 Gb; 36.80×) and the Hi-C clean reads (142 Gb; 56.80×) have been deposited in the Genome Sequence Archive (GSA)⁵⁰ under the accession number CRA017752 with the run accession numbers CRR1232679-CRR1232683⁵¹. The final chromosome assembly has been deposited in the Genome Warehouse (GWH)⁵² of the National Genomics Data Center (NGDC) under the BioProject number PRJCA025149 with the accession number GWHEUVB00000000.1⁵³ and NCBI GenBank under the accession JBGNFW000000000⁵⁴. The annotations of the genome have been deposited in the Science Data Bank⁵⁵ and FigShare⁵⁶.

Technical Validation

The completeness and accuracy of the BN genome assembly and annotation were evaluated by multiple methods. With respect to the completeness of the genome assembly, the BUSCO analysis using the “mammal_odb10” dataset showed that the total genome size was 2.66 Gb, with a contig N50 of 94.41 Mb, and a scaffold N50 of 143.60 Mb, which was clearly higher than the 138.97 Mb for Duroc and 137.35 Mb for DN, showcasing superior continuity and quality (Fig. 2, Table 1). Our final genome also boasted the longest total length of non-N bases, the fewest number of scaffolds, and the highest anchoring rate (97.59%) compared with Duroc (97.34%) and DN (90.45%) (Table 2). The contact map of Hi-C interaction for our BN genome assembly revealed 20 scaffolds at the chromosome level, indicating a high accuracy of the genome assembly (Fig. 1c). To evaluate the completeness and consistency of the annotation, OMArk analysis was conducted using the Artiodactyls as the ancestral clade. As expected, compared to Duroc (99.2% completeness and 98.35% consistency) and DN (92.67% completeness and 95.11% consistency), the annotated protein-coding gene models of BN were more complete (99.16% and 99.58%) and more consistent (97.64% and 93.62% in gene classification and structure) using either LiftOn or Braker3 with GeMoMa annotation (Fig. 3b). This indicates that the gene content was not only complete but also of very high overall quality. We utilized JCVI for CDSs collinearity analysis to compare the contigs, draft scaffolds, and final assembly of BN with Duroc, demonstrating strong alignment at the chromosome level (Fig. 4a–c). We also employed MUMmer for whole-genome collinearity analysis, which confirmed the high accuracy of our assembly quality (Fig. 4d). Furthermore, collinearity analysis with other four other publicly available porcine genomes with PCGs annotation revealed that BN had the highest synteny and consistency with the reference genome (Fig. 5). Collectively, these metrics demonstrated that the assembly outcomes of BN were meticulously executed, indicating superior assembly and annotation quality.

Code availability

All software and pipelines used in this study were performed according to the guidelines and protocols of the respective published bioinformatics tools. Software versions and parameters are clearly described in the Methods section. Default parameters were used unless explicitly stated. No custom code was generated for these analyses.

References

Wei, H. J. et al. Comparison of the efficiency of Banna miniature inbred pig somatic cell nuclear transfer among different donor cells. PLoS One 8, e57728, https://doi.org/10.1371/journal.pone.0057728 (2013).
Article ADS CAS PubMed PubMed Central Google Scholar
Zhang, L. et al. Development and Genome Sequencing of a Laboratory-Inbred Miniature Pig Facilitates Study of Human Diabetic Disease. iScience 19, 162–176, https://doi.org/10.1016/j.isci.2019.07.025 (2019).
Article ADS CAS PubMed PubMed Central Google Scholar
Niu, D. et al. Inactivation of porcine endogenous retrovirus in pigs using CRISPR-Cas9. Science 357, 1303–1307, https://doi.org/10.1126/science.aan4187 (2017).
Article ADS CAS PubMed PubMed Central Google Scholar
Yue, Y. N. et al. Extensive germline genome engineering in pigs. Nat Biomed Eng 5, 134–143, https://doi.org/10.1038/s41551-020-00613-9 (2021).
Article CAS PubMed Google Scholar
Pan, W. et al. Cellular dynamics in pig-to-human kidney xenotransplantation. Med, https://doi.org/10.1016/j.medj.2024.05.003 (2024).
Schmauch, E. et al. Integrative multi-omics profiling in human decedents receiving pig heart xenografts. Nat Med 30, 1448–1460, https://doi.org/10.1038/s41591-024-02972-1 (2024).
Article CAS PubMed Google Scholar
Griffith, B. P. et al. Genetically Modified Porcine-to-Human Cardiac Xenotransplantation. N Engl J Med 387, 35–44, https://doi.org/10.1056/NEJMoa2201422 (2022).
Article CAS PubMed PubMed Central Google Scholar
Montgomery, R. A. et al. Results of Two Cases of Pig-to-Human Kidney Xenotransplantation. N Engl J Med 386, 1889–1898, https://doi.org/10.1056/NEJMoa2120238 (2022).
Article CAS PubMed Google Scholar
Cheung, M. D. et al. Spatiotemporal immune atlas of a clinical-grade gene-edited pig-to-human kidney xenotransplant. Nat Commun 15, 3140, https://doi.org/10.1038/s41467-024-47454-7 (2024).
Article ADS CAS PubMed PubMed Central Google Scholar
Smriti, M. First pig-to-human liver transplant recipient “doing very well”. Nature 630, 18–18, https://doi.org/10.1038/d41586-024-01613-4 (2024).
Article CAS Google Scholar
Liu, Z. et al. Long- and short-read RNA sequencing from five reproductive organs of boar. Sci Data 10, 678, https://doi.org/10.1038/s41597-023-02595-0 (2023).
Article ADS CAS PubMed PubMed Central Google Scholar
Wang, P. et al. Transcriptomic analysis of testis and epididymis tissues from Banna mini-pig inbred line boars with single-molecule long-read sequencing. †. Biol Reprod 108, 465–478, https://doi.org/10.1093/biolre/ioac216 (2023).
Article PubMed Google Scholar
Sun, X. et al. Study of renal function matching between Banna Minipig Inbred line and human. Transplant Proc 36, 2488–9, https://doi.org/10.1016/j.transproceed.2004.07.060 (2004).
Article CAS PubMed Google Scholar
Chen, S., Zhou, Y., Chen, Y. & Gu, J. fastp: an ultra-fast all-in-one FASTQ preprocessor. Bioinformatics 34, i884–i890, https://doi.org/10.1093/bioinformatics/bty560 (2018).
Article CAS PubMed PubMed Central Google Scholar
Cheng, H., Concepcion, G. T., Feng, X., Zhang, H. & Li, H. Haplotype-resolved de novo assembly using phased assembly graphs with hifiasm. Nat Methods 18, 170–175, https://doi.org/10.1038/s41592-020-01056-5 (2021).
Article CAS PubMed PubMed Central Google Scholar
Li, H. Toward better understanding of artifacts in variant calling from high-coverage samples. Bioinformatics 30, 2843–51, https://doi.org/10.1093/bioinformatics/btu356 (2014).
Article CAS PubMed PubMed Central Google Scholar
Li, H. et al. The Sequence Alignment/Map format and SAMtools. Bioinformatics 25, 2078–9, https://doi.org/10.1093/bioinformatics/btp352 (2009).
Article CAS PubMed PubMed Central Google Scholar
Zhou, C., McCarthy, S. A. & Durbin, R. YaHS: yet another Hi-C scaffolding tool. Bioinformatics 39 https://doi.org/10.1093/bioinformatics/btac808 (2023).
Uliano-Silva, M. et al. MitoHiFi: a python pipeline for mitochondrial genome assembly from PacBio high fidelity reads. BMC Bioinf. 24, 288, https://doi.org/10.1186/s12859-023-05385-y (2023).
Article CAS Google Scholar
Stothard, P. & Wishart, D. S. Circular genome visualization and exploration using CGView. Bioinformatics 21, 537–9, https://doi.org/10.1093/bioinformatics/bti054 (2005).
Article CAS PubMed Google Scholar
Alonge, M. et al. RaGOO: fast and accurate reference-guided scaffolding of draft genomes. Genome Biol 20, 224, https://doi.org/10.1186/s13059-019-1829-6 (2019).
Article PubMed PubMed Central Google Scholar
Servant, N. et al. HiC-Pro: an optimized and flexible pipeline for Hi-C data processing. Genome Biol 16, 259, https://doi.org/10.1186/s13059-015-0831-x (2015).
Article CAS PubMed PubMed Central Google Scholar
Wang, S. et al. EndHiC: assemble large contigs into chromosome-level scaffolds using the Hi-C links from contig ends. BMC Bioinf. 23, 528, https://doi.org/10.1186/s12859-022-05087-x (2022).
Article CAS Google Scholar
Rhie, A., Walenz, B. P., Koren, S. & Phillippy, A. M. Merqury: reference-free quality, completeness, and phasing assessment for genome assemblies. Genome Biol 21, 245, https://doi.org/10.1186/s13059-020-02134-9 (2020).
Article CAS PubMed PubMed Central Google Scholar
Simão, F. A., Waterhouse, R. M., Ioannidis, P., Kriventseva, E. V. & Zdobnov, E. M. BUSCO: assessing genome assembly and annotation completeness with single-copy orthologs. Bioinformatics 31, 3210–2, https://doi.org/10.1093/bioinformatics/btv351 (2015).
Article CAS PubMed Google Scholar
Manni, M., Berkeley, M. R., Seppey, M., Simão, F. A. & Zdobnov, E. M. BUSCO Update: Novel and Streamlined Workflows along with Broader and Deeper Phylogenetic Coverage for Scoring of Eukaryotic, Prokaryotic, and Viral Genomes. Mol Biol Evol 38, 4647–4654, https://doi.org/10.1093/molbev/msab199 (2021).
Article CAS PubMed PubMed Central Google Scholar
Baril, T., Galbraith, J. & Hayward, A. Earl Grey: A Fully Automated User-Friendly Transposable Element Annotation and Analysis Pipeline. Mol Biol Evol 41, https://doi.org/10.1093/molbev/msae068 (2024).
Tarailo-Graovac, M. & Chen, N. Using RepeatMasker to identify repetitive elements in genomic sequences. Curr Protoc Bioinformatics Chapter 4, 4.10.1–4.10.14, https://doi.org/10.1002/0471250953.bi0410s25 (2009).
Article PubMed Google Scholar
Storer, J., Hubley, R., Rosen, J., Wheeler, T. J. & Smit, A. F. The Dfam community resource of transposable element families, sequence models, and genome annotations. Mob DNA 12, 2, https://doi.org/10.1186/s13100-020-00230-y (2021).
Article CAS PubMed PubMed Central Google Scholar
Flynn, J. M. et al. RepeatModeler2 for automated genomic discovery of transposable element families. Proc Natl Acad Sci USA 117, 9451–9457, https://doi.org/10.1073/pnas.1921046117 (2020).
Article ADS CAS PubMed PubMed Central Google Scholar
Price, A. L., Jones, N. C. & Pevzner, P. A. De novo identification of repeat families in large genomes. Bioinformatics 21(Suppl 1), i351–8, https://doi.org/10.1093/bioinformatics/bti1018 (2005).
Article CAS PubMed Google Scholar
Benson, G. Tandem repeats finder: a program to analyze DNA sequences. Nucleic Acids Res 27, 573–80, https://doi.org/10.1093/nar/27.2.573 (1999).
Article CAS PubMed PubMed Central Google Scholar
Bao, Z. & Eddy, S. R. Automated de novo identification of repeat sequence families in sequenced genomes. Genome Res 12, 1269–76, https://doi.org/10.1101/gr.88502 (2002).
Article CAS PubMed PubMed Central Google Scholar
Ou, S. & Jiang, N. LTR_retriever: A Highly Accurate and Sensitive Program for Identification of Long Terminal Repeat Retrotransposons. Plant Physiol 176, 1410–1422, https://doi.org/10.1104/pp.17.01310 (2018).
Article CAS PubMed Google Scholar
Xu, Z. & Wang, H. LTR_FINDER: an efficient tool for the prediction of full-length LTR retrotransposons. Nucleic Acids Res 35, W265–8, https://doi.org/10.1093/nar/gkm286 (2007).
Article PubMed PubMed Central Google Scholar
Chao, K. H. et al. Combining DNA and protein alignments to improve genome annotation with LiftOn. bioRxiv, https://doi.org/10.1101/2024.05.16.593026 (2024).
Nevers, Y. et al. Quality assessment of gene repertoire annotations with OMArk. Nat Biotechnol, https://doi.org/10.1038/s41587-024-02147-w (2024).
Gabriel, L. et al. BRAKER3: Fully automated genome annotation using RNA-seq and protein evidence with GeneMark-ETP, AUGUSTUS and TSEBRA. bioRxiv https://doi.org/10.1101/2023.06.10.544449 (2024).
Article PubMed PubMed Central Google Scholar
Keilwagen, J., Hartung, F. & Grau, J. GeMoMa: Homology-Based Gene Prediction Utilizing Intron Position Conservation and RNA-seq Data. Methods Mol Biol 1962, 161–177, https://doi.org/10.1007/978-1-4939-9173-0_9 (2019).
Article CAS PubMed Google Scholar
Kim, D., Paggi, J. M., Park, C., Bennett, C. & Salzberg, S. L. Graph-based genome alignment and genotyping with HISAT2 and HISAT-genotype. Nat Biotechnol 37, 907–915, https://doi.org/10.1038/s41587-019-0201-4 (2019).
Article CAS PubMed PubMed Central Google Scholar
Kuznetsov, D. et al. OrthoDB v11: annotation of orthologs in the widest sampling of organismal diversity. Nucleic Acids Res 51, D445–d451, https://doi.org/10.1093/nar/gkac998 (2023).
Article CAS PubMed Google Scholar
Cantalapiedra, C. P., Hernández-Plaza, A., Letunic, I., Bork, P. & Huerta-Cepas, J. eggNOG-mapper v2: Functional Annotation, Orthology Assignments, and Domain Prediction at the Metagenomic Scale. Mol Biol Evol 38, 5825–5829, https://doi.org/10.1093/molbev/msab293 (2021).
Article CAS PubMed PubMed Central Google Scholar
Nawrocki, E. P. & Eddy, S. R. Infernal 1.1: 100-fold faster RNA homology searches. Bioinformatics 29, 2933–5, https://doi.org/10.1093/bioinformatics/btt509 (2013).
Article CAS PubMed PubMed Central Google Scholar
Tang, H., Bowers, J. E., Wang, X., Ming, R., Alam, M. & Paterson, A. H. Synteny and collinearity in plant genomes. Science 320, 486–8, https://doi.org/10.1126/science.1153917 (2008).
Article ADS CAS PubMed Google Scholar
Marçais, G., Delcher, A. L., Phillippy, A. M., Coston, R., Salzberg, S. L. & Zimin, A. MUMmer4: A fast and versatile genome alignment system. PLoS Comput Biol 14, e1005944, https://doi.org/10.1371/journal.pcbi.1005944 (2018).
Article ADS CAS PubMed PubMed Central Google Scholar
Xie, H. B. et al. African Suid Genomes Provide Insights into the Local Adaptation to Diverse African Environments. Mol. Biol. Evol. 39 https://doi.org/10.1093/molbev/msac256 (2022).
Warr, A. et al. An improved pig reference genome sequence to enable pig genetics and genomics research. GigaScience 9 https://doi.org/10.1093/gigascience/giaa051 (2020).
Zhang, Y., Fan, G., Liu, X., Skovgaard, K., Sturek, M. & Heegaard, P. M. H. The genome of the naturally evolved obesity-prone Ossabaw miniature pig. iScience 24, 103081, https://doi.org/10.1016/j.isci.2021.103081 (2021).
Article ADS CAS PubMed PubMed Central Google Scholar
Kwon, D. et al. A chromosome-level genome assembly of the Korean crossbred pig Nanchukmacdon (Sus scrofa). Sci Data 10, 761, https://doi.org/10.1038/s41597-023-02661-7 (2023).
Article CAS PubMed PubMed Central Google Scholar
Chen, T. et al. The Genome Sequence Archive Family: Toward Explosive Data Growth and Diverse Data Types. Genomics Proteomics Bioinformatics 19, 578–583, https://doi.org/10.1016/j.gpb.2021.08.001 (2021).
Article PubMed PubMed Central Google Scholar
Genome Sequence Archive (GSA) https://ngdc.cncb.ac.cn/gsa/browse/CRA017752 (2024).
Chen, M. et al. Genome Warehouse: A Public Repository Housing Genome-scale Data. Genomics Proteomics Bioinformatics 19, 584–589, https://doi.org/10.1016/j.gpb.2021.04.001 (2021).
Article PubMed PubMed Central Google Scholar
Genome Warehouse https://ngdc.cncb.ac.cn/gwh/Assembly/85936/show (2024).
NCBI GenBank https://identifiers.org/ncbi/insdc.gca:GCA_041937265.1 (2024).
Chen, H. M. & Wei, H. J. Annotation of the genome for the Banna miniature inbred pig. Science Data Bank https://doi.org/10.57760/sciencedb.10833 (2024).
Chen, H. M. & Wei, H. J. The annotation of the genome for the Banna miniature inbred pig. Figshare https://doi.org/10.6084/m9.figshare.26537974.v1 (2024).

Download references

Acknowledgements

This work was supported by grants from the Major Science and Technology Project of Yunnan Province (202102AA100054, 202102AA310047), the Yunnan Fundamental Research Projects (202301AW070012), Yunnan Province (202305AH340006). We are also grateful to all individuals and institutions who have supported this project.

Author information

These authors contributed equally: Hong-Man Chen, Kai-Xiang Xu.

Authors and Affiliations

Yunnan Province Key Laboratory for Porcine Gene Editing and Xenotransplantation, Yunnan Agricultural University, Kunming, 650201, China
Hong-Man Chen, Kai-Xiang Xu, Heng Zhao, De-Ling Jiao, Guo-Ying Peng, Muhammad Ameen Jamal, Min-Juan Lv, Pei Wang, Yang-Zhi Zeng, Hong-Ye Zhao & Hong-Jiang Wei
Faculty of Animal Science and Technology, Yunnan Agricultural University, Kunming, 650201, China
Hong-Man Chen, Guo-Ying Peng, Min-Juan Lv & Hong-Jiang Wei
College of Veterinary Medicine, Yunnan Agricultural University, Kunming, 650201, China
Hong-Man Chen, Kai-Xiang Xu, Heng Zhao, De-Ling Jiao, Pei Wang, Hong-Ye Zhao & Hong-Jiang Wei
Key Laboratory of Genetic Evolution & Animal Models, State Key Laboratory of Genetic Resources and Evolution, Kunming Institute of Zoology, Chinese Academy of Sciences, Kunming, 650223, China
Chen Yan, Si Si, Zheng-Xi Liu, Muhammad Ameen Jamal, Ming-Shan Wang & Hong-Jiang Wei
Yunnan Laboratory of Molecular Biology of Domestic Animals, Kunming Institute of Zoology, Chinese Academy of Sciences, Kunming, 650223, China
Ming-Shan Wang

Authors

Hong-Man Chen
View author publications
Search author on:PubMed Google Scholar
Kai-Xiang Xu
View author publications
Search author on:PubMed Google Scholar
Chen Yan
View author publications
Search author on:PubMed Google Scholar
Heng Zhao
View author publications
Search author on:PubMed Google Scholar
De-Ling Jiao
View author publications
Search author on:PubMed Google Scholar
Si Si
View author publications
Search author on:PubMed Google Scholar
Zheng-Xi Liu
View author publications
Search author on:PubMed Google Scholar
Guo-Ying Peng
View author publications
Search author on:PubMed Google Scholar
Muhammad Ameen Jamal
View author publications
Search author on:PubMed Google Scholar
Min-Juan Lv
View author publications
Search author on:PubMed Google Scholar
Pei Wang
View author publications
Search author on:PubMed Google Scholar
Yang-Zhi Zeng
View author publications
Search author on:PubMed Google Scholar
Hong-Ye Zhao
View author publications
Search author on:PubMed Google Scholar
Ming-Shan Wang
View author publications
Search author on:PubMed Google Scholar
Hong-Jiang Wei
View author publications
Search author on:PubMed Google Scholar

Contributions

H.-J.W., M.-S.W. and H.-Y.Z., conceived this research. K.-X.X., H.Z., D.-L.J., M.-J.L., M.-A.J. and P.W., collected the samples, and contributed to the experiments. H.-M.C., C.Y., S.S. and Z.-X.L., performed the analysis, and G.-Y.P., prepared the data and software. Y.-Z.Z., developed and maintained the inbred lines. H.-M.C., drafted the manuscript. All authors read and approved the final version of the manuscript.

Corresponding authors

Correspondence to Hong-Ye Zhao, Ming-Shan Wang or Hong-Jiang Wei.

Ethics declarations

Competing interests

The authors declare no competing interests.

Additional information

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary information

Table s1

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License, which permits any non-commercial use, sharing, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if you modified the licensed material. You do not have permission under this licence to share adapted material derived from this article or parts of it. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by-nc-nd/4.0/.

Reprints and permissions

About this article

Cite this article

Chen, HM., Xu, KX., Yan, C. et al. A chromosome-scale reference genome of the Banna miniature inbred pig. Sci Data 11, 1345 (2024). https://doi.org/10.1038/s41597-024-04201-3

Download citation

Received: 23 July 2024
Accepted: 29 November 2024
Published: 18 December 2024
Version of record: 18 December 2024
DOI: https://doi.org/10.1038/s41597-024-04201-3