Abstract
Although advances in long-read sequencing technology and genome assembly techniques have facilitated the study of genomes, little is known about the genomes of unique Chinese indigenous breeds, including the Huai pig. The Huai pig is an ancient domestic pig breed that is well documented for its redder meat color and higher forage tolerance compared to European domestic pigs. In the present study, we sequenced and assembled the Huai pig genome using PacBio, Hi-C, and Illumina sequencing technologies. The final highly contiguous chromosome-level Huai pig genome spans 2.53 Gb with a scaffold N50 of 138.92 Mb. The Benchmarking Universal Single-Copy Orthologs (BUSCO) completeness score for the assembled genome was 95.33%. In total, 23,389 protein-coding genes were annotated in the Huai pig genome, along with 45.87% repetitive sequences. Overall, this study provides new foundational resources for future genetic research on Chinese domestic pigs.
Background & Summary
The pig (Sus scrofa) is a crucial livestock species that supplies staple protein to humans and serves as an important biomedical model owing to its anatomical and physiological similarities to humans. Belonging to the Suidae family, S. scrofa (wild boars and domestic pigs) is the only species that has spread across multiple continents1, and it was domesticated by humans approximately 9-10 thousand years ago (kya)2. The Huai pig is an important Chinese domestic pig, recorded in the Compendium of Materia Medica. It is an ancient breed that has been prevalent for 2 ky in northern Jiangsu Province, China3. Huai pigs are well documented for their high meat quality, redder meat color, high forage tolerance, and lower growth rate than European domestic pigs4,5,6. A series of genetic studies have been conducted to dissect the characteristics of Huai pigs at the molecular level. For example, a transcriptome study revealed significant differences in meat quality and muscle fiber content between the muscles of Huai pigs and Duroc pigs and identified related candidate genes7.
Genomic data is a powerful tool to explain the characteristics of distinct pigs. The recent pig reference genome, Sscrofa11.1, has significantly contributed to our understanding of the genetic basis of distinct phenotypes and evolutionary processes involved in porcine domestication8. To address the limited diversity in the reference genome9, several studies have assembled pig genomes of different breeds, including the Meishan pig10, Ningxiang pig11, and others12,13. However, high-quality genome assembly of Huai pigs is still lacking, and there is a strong demand for chromosome-level genomes for this breed.
In this study, we assembled the first chromosome-level Huai pig genome using PacBio long reads, Illumina short reads, and Hi-C data (Table 1). The genome size of the Huai pig was estimated to be approximately 2.56 Gb according to the k-mer analysis of 197.78 Gb (79.11×) Illumina reads (Fig. 1b). The final genome assembly had a size of 2,533,275,462 bp, comprising 2,044 contigs with an N50 size of 11.37 Mb. After chromosome-level anchoring, 2.43 Gb (96.05%) of the assembled contigs were anchored onto 20 chromosomes (Fig. 1c), with a scaffold N50 of 138.92 Mb (Table 2). In addition, we annotated 23,389 protein-coding genes in this assembly with a mean of 8.70 exons per gene (Fig. 2a, Table 3). Four types of non-coding RNAs, namely transfer RNAs (tRNAs), ribosomal RNAs (rRNAs), microRNAs (miRNAs), and small nuclear RNAs (snRNAs), were also identified in the Huai pig assembly (Fig. 2b). The repetitive elements in the Huai pig assembly were also annotated, and 45.87% of the assembly (about 1.17 Gb) was classified as repetitive sequence (Table 3). Among all repeat elements, long interspersed nuclear elements (LINEs) were the most abundant class, accounting for 20.67% of the entire genome (Figure S1). Our research offers a versatile resource applicable to pig breeding and a foundation for the future exploration of the genetic mechanisms of porcine traits.
The genome assembly of Huai pig. (a) Workflow for the genome assembly of Huai pig. (b) The frequency distribution of k-mer for Huai pig genome (k = 17). (c) A contact map at a 500-kb resolution of chromosome-level assembly in the Huai pig is shown. The color gradient in the accompanying bar represents the contact density, which transitions from red (high density) to green (low density) within the plot. (d) QV scores of chromosomes in the Huai pig assembly.
Genome annotation of Huai pig assembly. (a) Distribution of features in the Huai pig genome. For the outer to inner regions, each circle represents the GC content, transposable elements number, and gene number in the 500 kb nonoverlapping windows. (b) Statistics for the annotated non-coding RNAs of the Huai pig assembly. (c) Comparison of gene features between the Huai pig assembly and Sscrofa11.1. Gene features include the lengths of genes, CDS, exons, and introns. (d) Sequence divergence of repetitive elements in the Huai pig assembly.
Methods
Sample collection
A male Huai pig from Nanjing, Jiangsu Province, China (31.5267°N, 120.5875°E), was sampled for de novo assembly. Seven tissues (heart, liver, spleen, lung, kidney, muscle, and adipose) were collected from this individual, immediately frozen in liquid nitrogen, and stored at −80 °C until RNA extraction. Blood samples were collected for DNA extraction. All animal experiments were performed under the guidance of ethical regulations from the Institutional Animal Care and Use Committee (IACUC) at the China Agricultural University (Beijing, People’s Republic of China; Approval No. AW60604202-1-1).
DNA isolation and sequencing for genomes
Genomic DNA was extracted from whole blood using the DNeasy Blood & Tissue Kit (QIAGEN, Hilden, Germany). For long-read sequencing, four SMRTbell libraries were constructed using a Pacific Biosciences SMRTbell Template Prep Kit (Pacific Biosciences, Menlo Park, California, USA). Libraries were evaluated using an Agilent 4200 Bioanalyzer (Agilent Technologies, Santa Clara, California, USA). After size selection, the constructed libraries were sequenced on a Pacific Biosciences Sequel II platform (Pacific Biosciences, Menlo Park, California, USA). For short-read sequencing, a paired-end library with an insert size of ~300 bp was constructed using the TruSeq Nano DNA Sample Preparation Kit (Illumina, San Diego, California, USA). In total, 197.78 Gb of 150 bp paired-end reads were generated using an Illumina HiSeq 2000 platform. These reads were used to estimate the genome size of Huai pigs and to refine the assembly.
Approximately 10 mL of blood collected from the same Huai pig was used for the Hi-C experiment. Blood was initially crosslinked in a 2% formaldehyde solution for 15 min, and the reaction was halted by the addition of glycine. After isolating the nuclei, the chromatin was digested with MboI. The sticky ends of the digested fragments were biotinylated, diluted, and randomly ligated14. Subsequently, biotin-labeled DNA fragments were sheared by ultrasonication, followed by blunt-end repair and A-tailing. The adapters were then ligated to the DNA fragments, and polymerase chain reaction (PCR) amplification was performed to construct the Hi-C library. After quality control, the Hi-C library was sequenced using an Illumina paired-end sequencing platform with 2 × 150 bp reads.
Transcriptome sequencing
Total RNA was extracted from each tissue sample using the TRIzol-based RNA extraction kit (Invitrogen, Carlsbad, CA, USA). RNA degradation and contamination were monitored using 1% agarose gel electrophoresis. The total RNA concentration was quantified using a Qubit RNA Assay Kit on a Qubit 2.0 Fluorometer (Life Technologies, Carlsbad, CA, United States). RNA sequencing libraries with insert sizes ranging from 250 to 350 bp were prepared using Kapa RiboErase (Roche, Basel, Switzerland). Subsequently, all libraries were sequenced on an Illumina NovaSeq 6000 S4 platform, following the manufacturer’s instructions to obtain transcriptome profiles.
Genome size estimation and de novo assembly
Before de novo assembly, we estimated the genome size of Huai pigs using the k-mer method. Adapters and low-quality bases (base quality [Q] values < 20) in the 197.78 Gb Illumina paired-end reads were trimmed using TrimGalore (v0.6.1)15. These high-quality reads were subjected to 17-mer frequency distribution analysis using Jellyfish (v2.3.0)16. The k-mer depth distribution computed using Jellyfish exhibited a distinct peak depth. Subsequently, the genome size of Huai pigs was calculated using the following formula: genome size = K-num/K-depth, where K-num represents the total number of k-mers, and K-depth corresponds to the depth at the main peak of the k-mer frequency distribution.
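The formula above can be illustrated with a minimal Python sketch. It assumes the k-mer histogram is available as a mapping from depth to the number of distinct k-mers at that depth (for example, parsed from a two-column depth/count table). The function name, the `min_depth` cutoff used to skip low-depth error k-mers, and the toy histogram are hypothetical and are not taken from the analysis described here.

```python
from typing import Dict


def estimate_genome_size(histogram: Dict[int, int], min_depth: int = 5) -> float:
    """Estimate genome size as K-num / K-depth from a k-mer depth histogram.

    histogram maps a k-mer depth to the number of distinct k-mers observed
    at that depth. min_depth is a hypothetical cutoff (an assumption of this
    sketch) that excludes the low-depth peak caused by sequencing errors.
    """
    # Keep only k-mers at or above the cutoff.
    solid = {depth: count for depth, count in histogram.items() if depth >= min_depth}
    if not solid:
        raise ValueError("no k-mers at or above min_depth")

    # K-depth: the depth at which the distribution peaks.
    k_depth = max(solid, key=solid.get)

    # K-num: total number of k-mer occurrences, i.e. sum of depth * count.
    k_num = sum(depth * count for depth, count in solid.items())

    return k_num / k_depth


# Toy histogram: an error peak at depths 1-2 and a main peak at depth 10.
toy_histogram = {1: 1000, 2: 200, 9: 50, 10: 100, 11: 50}
toy_size = estimate_genome_size(toy_histogram)
```

Whether error k-mers are excluded before summing, and how the main peak is located in a heterozygous genome, are choices that change the estimate; this sketch makes the simplest choice for each.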
PacBio subreads were used to perform de novo genome assembly using Falcon software (v2018.03.12)17. The primary assembly was polished using Pilon (v1.23)18 with the aforementioned filtered Illumina paired-end reads. Two rounds of iterative error correction were conducted to ensure assembly accuracy, yielding the final polished contigs. Hi-C reads at over 100× coverage were used to connect the primary contigs and construct a pseudo-chromosome-level genome. After removing adapter sequences and low-quality bases, these reads were aligned to the primary genome assembly using the aln and sampe commands from the Burrows-Wheeler Aligner (BWA v0.7.17)19. The alignment results and contigs from the assembly were used as inputs for LACHESIS (https://github.com/shendurelab/LACHESIS)20, with the cluster number set to 20, to anchor the contigs onto pseudo-chromosomes. The chromosome names of the Huai pig genome were assigned based on the alignment between the Sscrofa11.1 reference pig genome21 (S. scrofa) and the Huai pig genome, which was performed using blastn (v2.10.1+)22. Subsequently, the chromosome-level genome was manually optimized using JuiceBox (v2.20.00)23. Then, the PacBio subreads were corrected using LoRDEC (v0.9)24 with the Illumina paired-end reads of the same sample. The chromosome-level genome was gap-filled using TGS-GapCloser (v1.2.1)25 with the corrected PacBio long reads.
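Contiguity of an assembly such as this one is commonly summarized by the N50 of its contigs or scaffolds. As a minimal sketch of that statistic (the function name and the toy lengths are illustrative assumptions, and this is not the script used to produce the reported values), N50 is the length of the sequence at which the cumulative length, counted from the longest sequence down, first reaches half of the total assembly length:

```python
from typing import Iterable


def n50(lengths: Iterable[int]) -> int:
    """Return the N50 of a collection of contig or scaffold lengths."""
    ordered = sorted(lengths, reverse=True)
    if not ordered:
        raise ValueError("at least one sequence length is required")

    total = sum(ordered)
    running = 0
    for length in ordered:
        running += length
        # Stop at the first sequence that brings the running sum to >= 50%.
        if running * 2 >= total:
            return length

    # Unreachable for non-empty input; kept so the function always returns.
    return ordered[-1]


# Toy example with five hypothetical contig lengths (in bp).
toy_n50 = n50([2, 3, 4, 5, 6])
```

Applied to the lengths of all contigs before anchoring and to all scaffolds after anchoring, the same function gives the contig N50 and scaffold N50, respectively.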
Genome quality assessment
To assess the completeness and accuracy of the newly assembled Huai pig genome, we conducted the following validations. First, we mapped the whole-genome sequencing short reads of the same Huai pig against the genome using BWA to estimate the single-base accuracy of the assembly. In addition, the quality value (QV) score of each chromosome was assessed with short reads using Merqury (v1.3)26. The CEGMA (v2-2.5)27 pipeline with the parameter “--mam” was also run against the new assembly. BUSCO (v5.0.0)28, based on the lineage dataset mammalia_odb10 (creation date: 2019-11-20), was employed to assess the completeness of the generated genome. Furthermore, 1,341,928 pig EST sequences were downloaded from the UCSC database29 and aligned to the Huai pig genome using Minimap2 (v2.17)30.
Repetitive landscape and genome annotation
Homology-based and de novo methods were applied to repeat annotation. Tandem Repeats Finder (TRF, v4.09)31 and RepeatModeler (v2.0.1)32 were used to generate the de novo repeat library for the Huai pig genome, which comprised tandem and interspersed repeats. This de novo repeat library, together with the Repbase33 library, was used for the homology search of repeats through RepeatMasker (v4.1.2, https://www.repeatmasker.org/).
Gene prediction was conducted in the repeat-masked genome through a combination of three independent approaches: ab initio prediction, homology-based prediction, and transcriptome-based prediction. For ab initio gene prediction, BRAKER2 (v2.1.6)34 and GlimmerHMM (v3.0.4)35 were used with their default parameters. For homology-based prediction, protein sequences from human (Homo sapiens)36, mouse (Mus musculus)37, cow (Bos taurus)38, sheep (Ovis aries)39, and Sscrofa11.1 were used, and the prediction was conducted by GeMoMa (v1.9)40. For transcriptome-based prediction, RNA-Seq data were aligned to the Huai pig assembly by HISAT2 (v2.2.1)41 with default parameters. StringTie (v2.1.6)42 and TransDecoder (v5.5.0; https://github.com/TransDecoder/TransDecoder) were used to assemble the transcripts and convert the candidate coding regions into gene models. In parallel, these RNA-Seq data were also de novo assembled by Trinity (v2.1.1)43, and PASA (v2.5.3)44 was employed to predict the gene structure. Finally, the gene models predicted through the three aforementioned approaches were combined by EvidenceModeler (v2.1.0)45 into a non-redundant set of gene structures. Protein-coding genes were functionally annotated against six databases: GO, KEGG, KOG, Swiss-Prot, TrEMBL, and NR.
The tRNAs were predicted by tRNAscan-SE (v2.0.9)46, while the rRNA fragments were detected by barrnap (v0.9, https://github.com/tseemann/barrnap). The miRNAs and snRNAs were identified by searching the Rfam database (release 14.10) using INFERNAL (v1.1.4)47.
Genome collinearity analysis and validation of structural variants (SVs) in the Huai pig genome
To verify the quality of the Huai pig genome, six public chromosome-level pig genomes (Sscrofa11.1, USMARC48, Ningxiang49, Meishan50, Bama miniature51, and Diannan Small-ear pig52) were used to conduct the collinearity analysis. MCScanX53 was used to identify colinear blocks, and the genome collinearity graph was generated using jcvi54.
To validate the differences between the Huai pig genome and other pig genomes, the Huai pig genome and the other five pig genomes (USMARC, Ningxiang, Meishan, Bama miniature, and Diannan Small-ear pig) were aligned to the Sscrofa11.1 reference genome, and four methods were applied to identify the SVs: Assemblytics (v1.2.1)55, smartie-sv56, SVMU57, and SyRI (v1.3)58. Specifically, the pipelines of Assemblytics and SVMU were performed on the nucmer (v4.0.0rc1)59 alignment (-c 1000 --maxgap=500). Pairwise genome alignments generated with Minimap2 served as inputs for SyRI. For insertions and deletions, we merged the four result sets using SURVIVOR (v1.0.7)60 with the parameters “1000 3 1 0 0 50” and identified candidate insertions and deletions supported by at least three methods. For the inversions, we only considered the results detected by both SyRI and SVMU.
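The consensus step for insertions and deletions can be pictured with a simplified Python sketch. It is not SURVIVOR's algorithm: it chains calls of the same type on the same chromosome whose positions lie within a maximum distance of the previous call, and keeps clusters supported by a minimum number of distinct callers. Reading the parameters “1000 3 1 0 0 50” as a 1,000 bp maximum breakpoint distance, at least three supporting callers, type-aware merging, and a 50 bp minimum SV length is an assumption of this sketch, as are the `SVCall` record, the function name, and the toy calls.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class SVCall:
    """One structural variant call (hypothetical record layout)."""
    caller: str   # e.g. "Assemblytics", "smartie-sv", "SVMU", "SyRI"
    chrom: str
    pos: int      # breakpoint position on the reference
    svtype: str   # "INS" or "DEL"
    length: int   # SV length in bp


def consensus_svs(calls: List[SVCall], max_dist: int = 1000,
                  min_callers: int = 3, min_len: int = 50) -> List[List[SVCall]]:
    """Cluster nearby same-type calls and keep multi-caller clusters."""
    # Drop calls shorter than the minimum SV length.
    kept = [c for c in calls if c.length >= min_len]
    # Sort so that candidates for the same cluster are adjacent.
    kept.sort(key=lambda c: (c.chrom, c.svtype, c.pos))

    clusters: List[List[SVCall]] = []
    for call in kept:
        if clusters:
            prev = clusters[-1][-1]
            same_locus = (prev.chrom == call.chrom
                          and prev.svtype == call.svtype
                          and call.pos - prev.pos <= max_dist)
            if same_locus:
                clusters[-1].append(call)
                continue
        clusters.append([call])

    # Keep clusters supported by enough distinct callers.
    return [cl for cl in clusters if len({c.caller for c in cl}) >= min_callers]


# Toy calls: three callers agree on one insertion near chr1:1000.
toy_calls = [
    SVCall("Assemblytics", "chr1", 1000, "INS", 300),
    SVCall("smartie-sv", "chr1", 1200, "INS", 310),
    SVCall("SyRI", "chr1", 1500, "INS", 295),
    SVCall("SVMU", "chr1", 50000, "INS", 300),
    SVCall("Assemblytics", "chr1", 1100, "DEL", 400),
    SVCall("SVMU", "chr2", 1000, "INS", 30),
]
toy_consensus = consensus_svs(toy_calls)
```

In the toy input only the insertion near chr1:1000 survives, because it is the only locus reported by three different callers.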
The 300 bp Huai pig-specific insertion identified in the ENPP5 gene was validated by PCR amplification of amplicon(s) spanning 655–900 bp of gDNA flanking the insert, including the breakpoint between the gDNA and the insert. Primers that hybridized to the gDNA flanking the insert were designed using Primer3 software (https://sourceforge.net/projects/primer3/). Amplification was performed using 2 × EasyTaq® PCR SuperMix (AS111-12). Each reaction contained 1 μL of each primer (10 μM), 1 μL of genomic DNA (about 80 ng), 12.5 μL of 2 × EasyTaq PCR SuperMix, and 1 μL of ddH2O. Thermocycling was performed for 30 cycles with an annealing temperature of 58 °C and an extension time of one minute. The PCR product of the predicted size was identified by agarose gel electrophoresis in different pig breeds that were homozygous or heterozygous for the insert.
Data Records
The assembled genome has been deposited at NCBI GenBank under the accession number JBGKAQ000000000 (ref. 61). The raw sequencing data of this genome and the RNA-Seq data of seven tissues are available at NCBI SRA under the project PRJNA1147173 (ref. 62). The genome and the raw sequencing data are also publicly accessible in the GSA database (https://ngdc.cncb.ac.cn/gsa/) under the accession number PRJCA024381 (ref. 63). Additionally, files containing the protein-coding gene annotation, non-coding RNA prediction, and repeat annotation of the Huai pig have been deposited in the Figshare database (ref. 64). The datasets supporting the genome collinearity analysis and genomic variant validation can also be accessed in the same Figshare record (ref. 64).
Technical Validation
Various methods have been applied to evaluate the completeness and accuracy of Huai pig assembly. First, the Huai pig genome assessment using Merqury26 revealed a consensus quality score of 32.86, equivalent to a base accuracy of 99.95%. Evaluation of the Huai pig genome using the CEGMA software indicated that 91.13% of the 248 full-length genes in the core gene set were predicted. Simultaneously, approximately 95.33% (8,795 of 9,226) of the single-copy orthologous genes in the “mammalia_odb10” data set were identified in our assembled genome (Table 4), similar to the Sscrofa11.1 reference genome. Furthermore, we aligned Illumina short reads (~79.11×) from the same individual against this assembly, resulting in a mapping rate and genome coverage of 98.59% and 99.38%, respectively. Finally, 1,341,928 EST sequences belonging to pigs were downloaded from the UCSC database and aligned with the Huai pig genome. The results revealed that 93.59% of the EST sequences (coverage rate > 90%) matched the Huai pig genome. These results indicated that the Huai pig genome assembly was of high quality. The ultimate predicted gene set comprised 23,389 protein-coding genes, and the functional analysis revealed that 92.96% of the predicted genes were annotated in at least one of the six public databases (Table 3). Simultaneously, the gene features in the Huai pig genome revealed similar length distributions for coding sequences, genes, exons, and introns to Sscrofa11.1 (Fig. 2c).
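The relationship between the consensus quality score and base accuracy quoted above follows the Phred scale, where accuracy = 1 − 10^(−QV/10). The short Python sketch below reproduces that conversion and the BUSCO percentage from the counts given in the text; the function names are illustrative and not part of any tool used in this study.

```python
def qv_to_accuracy(qv: float) -> float:
    """Convert a Phred-scaled consensus quality value to base accuracy (0-1)."""
    # A QV of Q corresponds to an expected per-base error rate of 10^(-Q/10).
    return 1.0 - 10.0 ** (-qv / 10.0)


def percent(part: int, whole: int) -> float:
    """Return part / whole expressed as a percentage."""
    if whole <= 0:
        raise ValueError("whole must be positive")
    return 100.0 * part / whole


# Values taken from the text: QV of 32.86, and 8,795 of 9,226 BUSCO genes.
accuracy_pct = round(qv_to_accuracy(32.86) * 100.0, 2)
busco_pct = round(percent(8795, 9226), 2)
```

Rounded to two decimals, these evaluate to the 99.95% base accuracy and 95.33% BUSCO completeness reported for the assembly.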
In addition, the Huai pig genome demonstrates strong collinearity with the Sscrofa11.1 reference genome and other public chromosome-level pig genomes (Fig. 3a). A total of 3,239 insertions and 1,400 deletions may be specific to Huai pigs (Figure S4). Notably, a 300 bp insertion was located in the first CDS of the ENPP5 gene (Fig. 3b) and validated by PCR.
Genome collinearity analysis and validation of Huai pig-specific SVs. (a) Colinear blocks identified between the Huai pig genome and other pig breeds’ genomes. (b) Insertion of ENPP5 gene. The left diagram shows the structure of the ENPP5 gene and the 300 bp insertion located in its CDS. The diagram on the right-hand side shows the PCR results for this insertion. The numbers 1 to 8 represent different breeds: Duroc, Landrace, Yorkshire, Bama miniature, Diannan Small-ear, Ningxiang, Meishan, and Huai pigs.
Code availability
No specific code was used in this study. The data analyses adhered to the manuals and protocols offered by the creators of the corresponding bioinformatics tools, the parameter settings of which were outlined in the methods section.
References
Groenen, M. A. M. A decade of pig genome sequencing: a window on pig domestication and evolution. Genet. Sel. Evol. 48 (2016).
Frantz, L. et al. The Evolution of Suidae. Annu. Rev. Anim. Biosci. 4, 61–85 (2016).
Wang, X. et al. Genetic Evaluation and Population Structure of Jiangsu Native Pigs in China Revealed by SINE Insertion Polymorphisms. Animals 12, 1345 (2022).
Liu, H. et al. Genome-Wide Association Study and FST Analysis Reveal Four Quantitative Trait Loci and Six Candidate Genes for Meat Color in Pigs. Front. Genet. 13 (2022).
Cheng, P. Livestock Breeds of China. (Food and Agriculture Organization of the United Nations, Rome, 1985).
Yeqiu, Z. et al. Effects of rice bran source high fibre diet on growth performance and intestine function of Suhuai pigs. J. Nanjing Agric. Univ. (2016).
Li, X. et al. Transcriptomic Profiling of Meat Quality Traits of Skeletal Muscles of the Chinese Indigenous Huai Pig and Duroc Pig. Genes 14, 1548 (2023).
Warr, A. et al. An improved pig reference genome sequence to enable pig genetics and genomics research. Gigascience 9 (2020).
Sherman, R. M. et al. Assembly of a pan-genome from deep sequencing of 910 humans of African descent. Nat. Genet. 51, 30–+ (2019).
Zhou, R. et al. The Meishan pig genome reveals structural variation-mediated gene expression and phenotypic divergence underlying Asian pig domestication. Mol. Ecol. Resour. 21, 2077–2092 (2021).
Ma, H. M. et al. Long-read assembly of the Chinese indigenous Ningxiang pig genome and identification of genetic variations in fat metabolism among different breeds. Mol. Ecol. Resour. (2021).
Zhang, L. et al. Development and Genome Sequencing of a Laboratory-Inbred Miniature Pig Facilitates Study of Human Diabetic Disease. Iscience 19, 162‐+ (2019).
Zhang, Y. et al. The genome of the naturally evolved obesity-prone Ossabaw miniature pig. iScience 24 (2021).
Wang, M. et al. Evolutionary dynamics of 3D genome architecture following polyploidization in cotton. Nat. Plants 4, 90–97 (2018).
Martin, M. Cutadapt removes adapter sequences from high-throughput sequencing reads. EMBnet J. 17, 10–12 (2011).
Marcais, G. & Kingsford, C. A fast, lock-free approach for efficient parallel counting of occurrences of k-mers. Bioinformatics 27, 764–70 (2011).
Chin, C. S. et al. Phased diploid genome assembly with single-molecule real-time sequencing. Nat Methods 13, 1050–1054 (2016).
Walker, B. J. et al. Pilon: An Integrated Tool for Comprehensive Microbial Variant Detection and Genome Assembly Improvement. Plos One 9 (2014).
Li, H. & Durbin, R. Fast and accurate short read alignment with Burrows-Wheeler transform. Bioinformatics 25, 1754–1760 (2009).
Burton, J. N. et al. Chromosome-scale scaffolding of de novo genome assemblies based on chromatin interactions. Nat. Biotechnol. 31, 1119–+ (2013).
NCBI GenBank https://identifiers.org/ncbi/insdc.gca:GCA_000003025.6 (2017).
Camacho, C. et al. BLAST+: architecture and applications. BMC Bioinformatics 10, 421 (2009).
Durand, N. C. et al. Juicebox Provides a Visualization System for Hi-C Contact Maps with Unlimited Zoom. Cell Syst. 3, 99–101 (2016).
Salmela, L. & Rivals, E. LoRDEC: accurate and efficient long read error correction. Bioinformatics 30, 3506–14 (2014).
Xu, M. et al. TGS-GapCloser: A fast and accurate gap closer for large genomes with low coverage of error-prone long reads. GigaScience 9, giaa094 (2020).
Rhie, A., Walenz, B. P., Koren, S. & Phillippy, A. M. Merqury: reference-free quality, completeness, and phasing assessment for genome assemblies. Genome Biol. 21, 245 (2020).
Parra, G., Bradnam, K., Ning, Z., Keane, T. & Korf, I. Assessing the gene space in draft genomes. Nucleic Acids Res. 37, 289–297 (2009).
Simao, F. A., Waterhouse, R. M., Ioannidis, P., Kriventseva, E. V. & Zdobnov, E. M. BUSCO: assessing genome assembly and annotation completeness with single-copy orthologs. Bioinformatics 31, 3210–3212 (2015).
Karolchik, D. et al. The UCSC Table Browser data retrieval tool. Nucleic Acids Res. 32, D493–D496 (2004).
Li, H. Minimap2: pairwise alignment for nucleotide sequences. Bioinformatics 34, 3094–3100 (2018).
Benson, G. Tandem repeats finder: a program to analyze DNA sequences. Nucleic Acids Res 27, 573–80 (1999).
Flynn, J. M. et al. RepeatModeler2 for automated genomic discovery of transposable element families. Proc. Natl. Acad. Sci. 117, 9451–9457 (2020).
Bao, W., Kojima, K. K. & Kohany, O. Repbase Update, a database of repetitive elements in eukaryotic genomes. Mob. DNA 6, 1–6 (2015).
Bruna, T., Hoff, K. J., Lomsadze, A., Stanke, M. & Borodovsky, M. BRAKER2: automatic eukaryotic genome annotation with GeneMark-EP+ and AUGUSTUS supported by a protein database. NAR Genom Bioinform 3, lqaa108 (2021).
Majoros, W. H., Pertea, M. & Salzberg, S. L. TigrScan and GlimmerHMM: two open source ab initio eukaryotic gene-finders. Bioinformatics 20, 2878–2879 (2004).
NCBI GenBank https://identifiers.org/ncbi/insdc.gca:GCA_000001405.29 (2022).
NCBI GenBank https://identifiers.org/ncbi/insdc.gca:GCA_000001635.9 (2020).
NCBI GenBank https://identifiers.org/ncbi/insdc.gca:GCA_002263795.2 (2018).
NCBI GenBank https://identifiers.org/ncbi/insdc.gca:GCA_002742125.1 (2017).
Keilwagen, J., Hartung, F. & Grau, J. GeMoMa: Homology-Based Gene Prediction Utilizing Intron Position Conservation and RNA-seq Data. Methods Mol. Biol. Clifton NJ 1962, 161–177 (2019).
Kim, D., Paggi, J. M., Park, C., Bennett, C. & Salzberg, S. L. Graph-based genome alignment and genotyping with HISAT2 and HISAT-genotype. Nat. Biotechnol. 37, 907–915 (2019).
Pertea, M. et al. StringTie enables improved reconstruction of a transcriptome from RNA-seq reads. Nat. Biotechnol. 33, 290–295 (2015).
Haas, B. J. et al. De novo transcript sequence reconstruction from RNA-seq using the Trinity platform for reference generation and analysis. Nat. Protoc. 8, 1494–1512 (2013).
Haas, B. J. et al. Improving the Arabidopsis genome annotation using maximal transcript alignment assemblies. Nucleic Acids Res. 31, 5654–5666 (2003).
Haas, B. J. et al. Automated eukaryotic gene structure annotation using EVidenceModeler and the program to assemble spliced alignments. Genome Biol. 9 (2008).
Chan, P. P. & Lowe, T. M. tRNAscan-SE: Searching for tRNA genes in genomic sequences. Methods Mol. Biol. Clifton NJ 1962, 1 (2019).
Nawrocki, E. P. & Eddy, S. R. Infernal 1.1: 100-fold faster RNA homology searches. Bioinformatics 29, 2933–2935 (2013).
NCBI GenBank https://identifiers.org/ncbi/insdc.gca:GCA_002844635.1 (2017).
NCBI GenBank https://identifiers.org/ncbi/insdc.gca:GCA_020567905.1 (2021).
NCBI GenBank https://identifiers.org/ncbi/insdc.gca:GCA_017957985.1 (2021).
NCBI GenBank https://identifiers.org/ncbi/insdc.gca:GCA_007644095.1 (2019).
National Genomics Data Center https://ngdc.cncb.ac.cn/gwh/Assembly/1052/show (2020).
Wang, Y. et al. MCScanX: a toolkit for detection and evolutionary analysis of gene synteny and collinearity. Nucleic Acids Res. 40, e49 (2012).
Tang, H. et al. JCVI: A versatile toolkit for comparative genomics analysis. iMeta 3, e211 (2024).
Nattestad, M. & Schatz, M. C. Assemblytics: a web analytics tool for the detection of variants from an assembly. Bioinformatics 32, 3021–3023 (2016).
Kronenberg, Z. N. et al. High-resolution comparative analysis of great ape genomes. Science 360 (2018).
Chakraborty, M., Emerson, J. J., Macdonald, S. J. & Long, A. D. Structural variants exhibit widespread allelic heterogeneity and shape variation in complex traits. Nat. Commun. 10, 4872 (2019).
Goel, M., Sun, H., Jiao, W.-B. & Schneeberger, K. SyRI: finding genomic rearrangements and local sequence differences from whole-genome assemblies. Genome Biol. 20, 1–13 (2019).
Marçais, G. et al. MUMmer4: A fast and versatile genome alignment system. PLOS Comput. Biol. 14, e1005944 (2018).
Jeffares, D. C. et al. Transient structural variations have strong effects on quantitative traits and reproductive isolation in fission yeast. Nat. Commun. 8 (2017).
NCBI GenBank https://identifiers.org/ncbi/insdc:JBGKAQ000000000 (2024).
NCBI Sequence Read Archive https://identifiers.org/ncbi/insdc.sra:SRP526475 (2024).
National Genomics Data Center https://ngdc.cncb.ac.cn/bioproject/browse/PRJCA024381 (2024).
Du, H. & Liu, J.-F. The chromosomal-level genome represents the gene evolution and genetic variants in the Huai pig. Figshare https://doi.org/10.6084/m9.figshare.25804891.v2 (2024).
Acknowledgements
This work was financially supported by the National Key Research and Development Program of China (2021YFD1200801), the Earmarked Fund for China Agriculture Research System (No. CARS-pig-35), the National Natural Science Foundations of China (32302708), Science and Technology Program of Guizhou Province (Qian Kehe Support [2022] Key 032), and the 2115 Talent Development Program of China Agricultural University. We would like to thank the High-performance Computing Platform of China Agricultural University for computing support.
Author information
Contributions
J-F.L. conceived and designed the experiments. H.D. designed the analytical strategy and performed the analyses. S.L. participated in the PCR experiment and revised the manuscript. Q.H. assisted in writing the manuscript. L.Z. supervised the study and reviewed the manuscript. All authors read and approved the final manuscript.
Ethics declarations
Competing interests
The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.
Additional information
Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License, which permits any non-commercial use, sharing, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if you modified the licensed material. You do not have permission under this licence to share adapted material derived from this article or parts of it. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by-nc-nd/4.0/.
About this article
Cite this article
Du, H., Lu, S., Huang, Q. et al. Chromosome-level genome assembly of Huai pig (Sus scrofa). Sci Data 11, 1072 (2024). https://doi.org/10.1038/s41597-024-03921-w