Abstract
Parent-of-origin effects refer to the phenomenon whereby gene expression and the corresponding phenotype depend on whether an allele is inherited from the father or the mother, and uncovering the underlying regulatory mechanisms remains a challenging task. To address this, we designed three sets of trio families by crossing divergent pig breeds to generate sufficient heterozygous loci, and collected back fat and longissimus dorsi for multi-omics sequencing. The parental phases of sequencing reads are efficiently determined by leveraging long-read sequencing technology. We generate a phased multi-omics resource in hybrid pigs, identifying phase-specific expression at 10,516 genes as well as phase-specific signals at 104,708 methylated regions, 132,602 histone modification peaks, 25,667 CTCF binding sites, and, based on in situ Hi-C, 7884 topologically associating domain boundaries and 8573 chromatin loops. Our results reveal that, although nearly 83% of gene expression differences between parental phases are regulated by DNA methylation, a subset is still influenced by other epigenetic modifications. This study not only highlights complex imprinting regulatory models at the H19-IGF2 and SGCE-PEG10 loci but also provides insight into the inheritance of advantageous traits in hybrid offspring.
Introduction
The majority of mammalian genomes are diploid, and the paternal and maternal haploid genomes are generally regarded as functionally equivalent1. However, an imbalance between the two haploid genomes is already evident at zygote formation2. This is known as the parent-of-origin effect (POE): the expression of a gene depends on whether it was inherited from the mother or the father. POE can have a significant impact on embryonic development and the onset of human disease3,4,5,6, and has been utilized in livestock breeding to exploit heterosis7. The most fundamental step in investigating POE is accurately distinguishing between the paternal and maternal phases, yet assigning gene expression and epigenetic data to parental phases remains a substantial challenge in diploid organisms. The main constraints on detecting POE through phasing have been: (1) the absence of defined trio pedigree information, which prevents identification of the parental origins of gene expression and genetic information8; (2) the scarcity and uneven distribution of the heterozygous SNPs used for phasing, which results in inadequate coverage across the genome9; and (3) the reduced efficacy of haplotype phasing when SNPs are the sole phase markers, without long reads to construct haplotype blocks10.
In modern agricultural breeding, POE plays a crucial role in the inheritance of genetic advantages in hybrid individuals and the formation of complex traits, contributing substantially to economic traits such as meat quality, fat deposition, and metabolic efficiency11,12,13. Whereas early studies inferred POE from phenotypic differences between parents and offspring, modern POE research examines the roles of parental genes at the levels of gene expression and epigenetics. Studies of DNA methylation and chromatin accessibility have identified POE in hybrid pigs14, and DNA methylation and gene expression have been used to uncover POE in hybrid bovines15 and maize endosperm16. These studies show that POE research not only provides a novel theoretical foundation for traditional animal genetic breeding but also offers more precise tools for analyzing the genetic advantages of future hybrid individuals.
POE has been widely attributed to parent-specific methylation patterns that control imbalanced gene expression17. However, recent studies have observed that differences in histone modifications and three-dimensional chromatin structure are also associated with POE18,19, and these epigenetic differences have been reported to colocalize with expression quantitative trait loci (eQTL)20. These findings indicate that the mechanisms underlying POE are likely complex and that single-layer epigenetic data may be insufficient to fully elucidate the causes of POE events. Integrating diverse epigenetic data can therefore provide a more comprehensive view of the models and mechanisms that underpin POE.
In this study, we crossed six genetically and phenotypically divergent pigs (Large White x Erhualian, Duroc x Liangguang, and Berkshire x Ganxi) to construct Euro-Chinese trio families, maximizing heterozygosity and ensuring broad applicability. Building on this, we developed a phasing strategy assisted by long reads. In parallel, we generated transcriptomes and multiple epigenetic datasets, including DNA methylation, histone modifications (H3K27ac, H3K4me3, H3K4me1, H3K27me3), CTCF binding, and Hi-C. By applying phased analysis across these datasets, we identified differential gene expression and transcription between parental phases, thereby expanding the repertoire of imprinted genes in pigs. We also integrated the multi-omic data to gain further insight into POE on genes from multiple perspectives.
Results
Experimental design and quality control (QC) of data sets
To obtain sufficient heterozygosity in the offspring to accurately distinguish paternal and maternal haplotypes and their corresponding sequencing reads, we crossed three pairs of European commercial and Chinese indigenous pig breeds (Large White x Erhualian, Duroc x Liangguang, and Berkshire x Ganxi) to generate three sets of trio families, each containing two hybrid offspring (Fig. 1a). For convenience, the haploid genomes of hybrid pigs inherited from the Large White and Erhualian breeds are denoted LE_P and LE_M, those inherited from the Duroc and Liangguang breeds DL_P and DL_M, and those from the Berkshire and Ganxi breeds BG_P and BG_M; numbers are appended to indicate biological replicates. Ear tissue from the six parental pigs was collected for short-read whole-genome sequencing (WGS). From the six hybrid offspring, aged 90-100 days, back fat (BF) and longissimus dorsi (LD) samples were collected for multi-omic analyses, including short-read WGS, long-read sequencing (ONT), RNA sequencing (RNA-seq), full-length transcriptome sequencing (PacBio CCS), and genome-wide DNA methylation sequencing (GM-seq). In addition, CUT&Tag sequencing data were obtained from 50 samples of the two tissues, targeting four histone modifications (H3K4me1, H3K4me3, H3K27ac, H3K27me3) and one transcription factor (CTCF), and Hi-C sequencing of five LD samples was included to explore 3D chromatin structure (Fig. 1a, Supplementary Fig. 1a, b). In total, 6.19 billion WGS short reads, 88.33 million long reads, 0.91 billion RNA-seq reads, 4.37 million full-length transcriptome reads, 8.61 billion DNA methylation sequencing reads, 1,285 billion Hi-C sequencing reads, and 1.83 billion CUT&Tag sequencing reads were obtained in this study (Supplementary Data 1–4).
a Multi-omic study design comparing BF and LD, and the paternal and maternal phases of hybrid pigs generated by crossing European commercial pigs with Chinese indigenous pigs. Elements were provided by Servier Medical Art (https://smart.servier.com/), licensed under CC BY 4.0 (https://creativecommons.org/licenses/by/4.0/). b Pearson correlations of gene expression between BF and LD, and between biological replicates (DL1, DL2, LE1, LE2, BG1), calculated using normalized read coverage within gene bodies across the entire genome. c Average DNA methylation levels in BF and LD tissues near protein-coding genes, including at the transcription start site (TSS) and transcription termination site (TES). d Pearson correlations among assays for different histone modifications and CTCF, between BF and LD and between biological replicates (DL1, DL2, LE1, LE2, BG1), calculated from normalized signals within 1 kb windows across the whole genome. e Average histone modification and CTCF signals near protein-coding genes, at the TSS and TES. f High-quality Hi-C contact map at resolutions of 100 kb, 25 kb, 5 kb, and 1 kb, with zoomed-in areas highlighted by yellow boxes. Source data are provided as a Source Data file.
To ensure the reliability of our findings, we performed a thorough quality assessment of all datasets. For RNA-seq, gene expression profiles clearly distinguished BF from LD (Fig. 1b). For DNA methylation, 64,048,702 and 64,167,233 methylated loci (CpGs) were identified in BF and LD, respectively. The methylation profile showed a bimodal distribution, with most sites either fully methylated or completely unmethylated (Supplementary Fig. 1d–f). Notably, BF exhibited higher overall methylation levels than LD (Supplementary Fig. 1c), consistent with previous reports21. As expected, methylation was markedly depleted at transcription start sites (TSSs) in both tissues (Fig. 1c). For CUT&Tag data, the aligned fragment size distribution was enriched around 200 bp and 400 bp, consistent with the expected lengths of nucleosome monomers and dimers, respectively (Supplementary Fig. 1g). In total, we identified 47,037 to 195,802 consensus peaks across the different histone modifications and CTCF in the two tissues, with high reproducibility (60%–87.84%) relative to previously reported pig datasets22 (Supplementary Fig. 1h). Compared with data from the same assays in other mammals (human23,24 and mouse24,25,26,27,28,29,30), H3K4me3 showed higher evolutionary conservation than the other assays (Supplementary Fig. 1i, j). Furthermore, epigenetic signals were slightly more conserved between pigs and humans than between pigs and mice, supporting previous findings on the evolutionary conservation of epigenetic landscapes among humans, pigs, and mice31 and reinforcing the notion that pigs could be a more suitable biomedical model32.
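The reproducibility rate against published pig peaks is not defined in detail here. One common definition, assumed in the sketch below, is the fraction of our consensus peaks that overlap at least one reference peak. Peaks are taken as half-open (chrom, start, end) intervals, and the function name is hypothetical.

```python
from bisect import bisect_right
from collections import defaultdict


def reproducibility_rate(query_peaks, reference_peaks):
    """Fraction of query peaks overlapping >= 1 reference peak.

    Peaks are (chrom, start, end) tuples with half-open coordinates.
    This is an assumed definition, for illustration only.
    """
    by_chrom = defaultdict(list)
    for chrom, start, end in reference_peaks:
        by_chrom[chrom].append((start, end))

    # Merge overlapping reference intervals so that, per chromosome,
    # starts and ends are both sorted and binary search is valid.
    merged = {}
    for chrom, intervals in by_chrom.items():
        intervals.sort()
        out = []
        for start, end in intervals:
            if out and start <= out[-1][1]:
                out[-1][1] = max(out[-1][1], end)
            else:
                out.append([start, end])
        merged[chrom] = ([s for s, _ in out], [e for _, e in out])

    if not query_peaks:
        return 0.0
    hits = 0
    for chrom, start, end in query_peaks:
        if chrom not in merged:
            continue
        starts, ends = merged[chrom]
        i = bisect_right(starts, end - 1)  # reference intervals starting before `end`
        if i > 0 and ends[i - 1] > start:  # the last of them reaches past `start`
            hits += 1
    return hits / len(query_peaks)


query = [("chr1", 100, 200), ("chr1", 500, 600), ("chr2", 10, 20)]
reference = [("chr1", 150, 300), ("chr1", 590, 700)]
print(reproducibility_rate(query, reference))  # 2 of 3 query peaks overlap
```

Merging the reference intervals first keeps each lookup to a single binary search per peak, which matters when hundreds of thousands of peaks are compared.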
Hierarchical clustering based on the signals of the four histone modifications and CTCF grouped samples first by assay and then by tissue, with the active histone modifications (H3K27ac, H3K4me3, and H3K4me1) clustering separately from the repressive modification H3K27me3 (Fig. 1d). The profiles of histone modifications and CTCF around TSSs show that active histone modifications are preferentially located at TSSs (Fig. 1e), in agreement with their established association with active transcription. Our Hi-C dataset generated an average of nine billion valid read pairs, achieving up to 1 kb resolution, which enables the capture of subtle genome interactions (Fig. 1f, Supplementary Fig. 1k–m). B compartments had significantly lower GC content and TSS density than A compartments (Supplementary Fig. 1n). The lengths of topologically associating domains (TADs) were mainly 200 to 300 kb, and TAD boundaries showed TSS enrichment (Supplementary Fig. 1o, p). On average, 39,817 chromatin loops per sample were identified at 1 kb resolution, with loop lengths predominantly within 1 Mb, a range broadly recognized as functionally relevant for cis-regulatory interactions33 (Supplementary Fig. 1q). The average interaction strength decreased significantly as loop length increased (Supplementary Fig. 1r). To the best of our knowledge, this is the most comprehensive phased multi-epigenomic atlas in pigs, providing an invaluable foundation for systematically understanding how POE affects porcine genes.
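A/B compartments are conventionally called from the leading eigenvector of the Pearson correlation matrix of the distance-normalized (observed/expected) contact map, with its sign oriented so that GC-rich bins form the A compartment. The text does not name the tool used, so the pure-Python sketch below only illustrates that standard approach on a toy intra-chromosomal matrix; it is not the authors' pipeline, and dedicated Hi-C toolkits implement the same idea on real, balanced matrices.

```python
import math


def observed_over_expected(m):
    """Divide each contact by the mean contact count at its diagonal offset."""
    n = len(m)
    expected = [sum(m[i][i + d] for i in range(n - d)) / (n - d) for d in range(n)]
    return [[m[i][j] / expected[abs(i - j)] if expected[abs(i - j)] > 0 else 0.0
             for j in range(n)] for i in range(n)]


def pearson_matrix(rows):
    """Pearson correlation between every pair of rows."""
    def standardize(v):
        mu = sum(v) / len(v)
        centered = [x - mu for x in v]
        norm = math.sqrt(sum(x * x for x in centered))
        return [x / norm for x in centered] if norm > 0 else [0.0] * len(v)
    z = [standardize(r) for r in rows]
    return [[sum(a * b for a, b in zip(zi, zj)) for zj in z] for zi in z]


def leading_eigenvector(c, iters=200):
    """Power iteration. A correlation matrix is positive semidefinite, so this
    converges to the eigenvector of the largest eigenvalue (given a gap)."""
    n = len(c)
    v = [1.0 + 0.01 * i for i in range(n)]  # arbitrary, non-degenerate start
    for _ in range(iters):
        w = [sum(c[i][j] * v[j] for j in range(n)) for i in range(n)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    return v


def label_compartments(ev, gc):
    """Orient the eigenvector so that the higher-GC side is called 'A'."""
    def mean(xs):
        return sum(xs) / len(xs) if xs else 0.0
    pos_gc = mean([g for e, g in zip(ev, gc) if e > 0])
    neg_gc = mean([g for e, g in zip(ev, gc) if e <= 0])
    flip = pos_gc < neg_gc
    return ["A" if (e > 0) != flip else "B" for e in ev]


def call_compartments(contacts, gc):
    corr = pearson_matrix(observed_over_expected(contacts))
    return label_compartments(leading_eigenvector(corr), gc)


# Toy 6-bin matrix: bins 0-2 and 3-5 form two blocks with distance decay.
toy = [[(3 if (i < 3) == (j < 3) else 1) / (1 + abs(i - j)) for j in range(6)]
       for i in range(6)]
print(call_compartments(toy, [0.45, 0.45, 0.45, 0.40, 0.40, 0.40]))
```

GC orientation is needed because an eigenvector's sign is arbitrary; without it, "A" and "B" could swap between samples or phases.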
Phasing and tracing hybrid genomes
Accurately detecting genetic and epigenetic differences between the parental phases is essential for a deeper understanding of POE mechanisms. To explore phase-specific gene expression and epigenetic regulation at the whole-genome level, we developed a multi-omics phasing strategy, designated phase-tag, based on genome sequencing data from the trios (Fig. 2a).
a Acquisition of phased SNP information of hybrids based on the WGS (top) and phasing flow of multi-omics sequencing data (bottom). b Phasing accuracy evaluated by switch error rate for each hybrid. c Phasing efficiency across all autosomes in sample DL1. d SNP density among all autosomes in sample DL1. e Comparison of shared and differential k-mers (in millions) obtained through phase-tag and trio binning phasing strategies in sample DL1. f IGV viewer showing a phasing example for a specific region. From top to bottom: SNP information for parents and offspring (gray, homozygous; dark blue, heterozygous), reads distribution across the parental phases (blue for paternal reads; red for maternal reads), and distribution of unassignable reads. The red bars highlight SNPs that are heterozygous in both parents and offspring, which can be utilized for the phase-tag phasing strategy, but not for the trio binning strategy. Source data are provided as a Source Data file.
First, the variants of the offspring were phased by combining information from the trio pedigrees, long reads, and short reads34. The reads were then tagged with their parental origins using the WhatsHap software35. Through the phase-tag strategy, we obtained an average of 11,084,942 heterozygous variants per hybrid offspring for tracing genetic information to its parent of origin (Supplementary Data 5). We also evaluated the accuracy and efficiency of the phasing pipeline. The switch error rate, defined as the proportion of adjacent heterozygous variant pairs whose relative phase is incorrectly inferred, ranged from 0.19% to 0.25% across all hybrid offspring, which is within the normal range36,37 (Fig. 2b), and the proportion of reads assigned to the paternal or maternal phase across the whole genome ranged from 48.93% to 63.02% (Fig. 2c, Supplementary Fig. 2a). Because the phasing pipeline is highly dependent on SNP density and uniformity, we examined the genome-wide distribution of SNPs to evaluate the assignability of reads across the genome (Fig. 2d, Supplementary Fig. 2b). The mean interval between variants was approximately 250 bp, but some regions with SNP intervals > 10 kb remained (e.g., chr1:279,860,000-279,890,000 and chr13:172,534,000-172,550,000), which hinders assignment of reads to their parental origins. We also assessed the conventional read-phasing approach, trio binning38, which uses k-mers specific to the father or the mother to distinguish paternal and maternal reads. The k-mers generated by phase-tag and trio binning overlapped considerably (86% shared); however, phase-tag recovered more phase-specific k-mers than trio binning.
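The two quality metrics above can be made concrete with a short sketch. It assumes that, for each heterozygous site, both a reference phase (for example, one derived from the parental genotypes) and the inferred phase are encoded as the haplotype (0 or 1) carrying the alternative allele; a switch error is counted whenever the relative phase of two consecutive sites disagrees with the reference. Gap detection simply reports SNP intervals above the 10 kb threshold mentioned in the text. Function names are hypothetical.

```python
def switch_error_rate(truth, inferred):
    """Fraction of consecutive heterozygous-site pairs whose relative phase is wrong.

    truth/inferred: haplotype index (0 or 1) carrying the ALT allele at each
    heterozygous site, in genomic order. A global swap of the two haplotypes
    is not an error, so only changes in agreement between neighbours count.
    """
    if len(truth) != len(inferred):
        raise ValueError("truth and inferred must cover the same sites")
    if len(truth) < 2:
        return 0.0
    agree = [t == p for t, p in zip(truth, inferred)]
    switches = sum(a != b for a, b in zip(agree, agree[1:]))
    return switches / (len(agree) - 1)


def large_snp_gaps(positions, min_gap=10_000):
    """Intervals between adjacent SNPs longer than `min_gap` bp (poorly phaseable)."""
    pos = sorted(positions)
    return [(a, b) for a, b in zip(pos, pos[1:]) if b - a > min_gap]


truth = [0, 1, 0, 0, 1]
print(switch_error_rate(truth, [0, 1, 1, 1, 0]))  # one switch over 4 adjacent pairs
print(switch_error_rate(truth, [1, 0, 1, 1, 0]))  # global swap only: no switch errors
print(large_snp_gaps([100, 350, 600, 20_000]))
```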
Loci that are heterozygous in both parents and the offspring cannot be resolved by trio binning, whereas phase-tag can phase them by hitchhiking on adjacent phased variants (Fig. 2e, Supplementary Fig. 2c). As illustrated in the selected region (chr1:742,381-747,833), long-read sequencing provides phase information over long continuous segments, enabling heterozygous SNPs shared by both parents and the offspring to be assigned to parental phases, which the trio binning method cannot do (Fig. 2f). Our phasing analysis combines highly heterozygous variants with long reads to obtain more phasing information, facilitating phased multi-omics integration at the genome-wide level.
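The difference between the two strategies at a site that is heterozygous in both parents can be illustrated with a toy example. In trio binning, a read is binned by the parent-specific k-mers it contains. In allele-based tagging (a simplified version of the idea behind WhatsHap's haplotag step, not its actual algorithm), a read is assigned by comparing its bases with the offspring's phased haplotypes. The sketch assumes the phased haplotypes were already resolved from neighbouring informative variants on the same long reads; sequences, positions, and function names are hypothetical.

```python
def parent_specific_kmers(father_kmers, mother_kmers):
    """Trio binning: k-mers seen in only one parent mark that parent's haplotype."""
    return father_kmers - mother_kmers, mother_kmers - father_kmers


def bin_read_by_kmers(read_kmers, paternal_only, maternal_only):
    p = len(read_kmers & paternal_only)
    m = len(read_kmers & maternal_only)
    if p == m:
        return "unassigned"
    return "paternal" if p > m else "maternal"


def tag_read_by_alleles(read_bases, phased_sites):
    """Allele-based tagging.

    read_bases: {position: base observed in the read}
    phased_sites: {position: (paternal_base, maternal_base)} for the offspring
    """
    p = m = 0
    for pos, base in read_bases.items():
        if pos not in phased_sites:
            continue
        paternal, maternal = phased_sites[pos]
        if base == paternal:
            p += 1
        elif base == maternal:
            m += 1
    if p == m:
        return "unassigned"
    return "paternal" if p > m else "maternal"


# Both parents are A/G at position 100, so no k-mer spanning it is parent-specific.
father = {"ACGTA", "ACGTG"}
mother = {"ACGTA", "ACGTG"}
pat_only, mat_only = parent_specific_kmers(father, mother)
print(bin_read_by_kmers({"ACGTA"}, pat_only, mat_only))   # unassigned
# The offspring's phased haplotypes place A on the paternal copy.
print(tag_read_by_alleles({100: "A"}, {100: ("A", "G")}))  # paternal
```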
Differential landscape between paternal and maternal phase on gene expression, DNA methylation, histone modification and chromatin conformation
By assigning multi-omic sequencing reads to paternal and maternal phases, we were able to directly observe differences in expression and epigenetic modifications associated with POE. In total, 7466 and 6544 phase-specifically expressed genes (PSEs) were identified in BF and LD, respectively, with 3848 genes shared between the two tissues (Fig. 3a). We also analyzed differentially expressed transcripts (DETs) between parental phases. Because transcripts vary greatly among individuals, we focused on transcripts consistently present across samples, identifying 5006 such DETs in BF and 2600 in LD (Supplementary Fig. 3a, c). These consistent DETs were significantly enriched in pathways associated with tissue function (Supplementary Fig. 3b, d). Interestingly, several genes contained DETs despite not being classified as PSEs, including CKM and TPM2 in LD and CIDEA in BF (Supplementary Fig. 3e–g).
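The statistical test used to call PSEs is not described in this section. A common approach for allelic imbalance, assumed here for illustration, is an exact two-sided binomial test of phased read counts against a 50:50 expectation. The minimum-coverage and significance thresholds below are hypothetical, and in practice the p-values would also be corrected for multiple testing.

```python
from math import comb


def binom_test_half(k, n):
    """Exact two-sided binomial test of k successes in n trials against p = 0.5."""
    tail = sum(comb(n, i) for i in range(min(k, n - k) + 1)) / 2 ** n
    return min(1.0, 2 * tail)


def phase_call(paternal_reads, maternal_reads, alpha=0.05, min_total=10):
    """Classify one gene in one sample from its phased read counts."""
    n = paternal_reads + maternal_reads
    if n < min_total:
        return None  # too few phased reads to test
    if binom_test_half(paternal_reads, n) >= alpha:
        return "balanced"
    return "paternal" if paternal_reads > maternal_reads else "maternal"


print(phase_call(30, 5))  # paternal
print(phase_call(6, 4))   # balanced
print(phase_call(3, 2))   # None
```

Because the null distribution is symmetric at p = 0.5, doubling the smaller tail gives the exact two-sided p-value.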
a The frequency of identified PSEs across all samples of BF and LD. b Enrichment of PSMRs in different genomic elements of BF and LD. c The frequency of identified PSHMs and PSCTCF across all samples within LD. d The proportion of phase-specific compartments and the distribution of A vs. B and B vs. A compartments in LD. The chi-squared test was used to examine the frequency of compartment switching. e The number and proportion of the different types of PSTAD boundaries identified. f An overview of the number of PSEs associated with single or multiple phase-specific epigenetic marks. Source data are provided as a Source Data file.
To quantitatively assess the association between phase-resolved epigenetic features and POE on PSEs, we also performed phasing analysis on the epigenetic datasets. Genome-wide methylation levels showed no significant deviation between parental phases and followed the diploid pattern described above, with reduced methylation at the TSS and increased methylation across the gene body (Supplementary Fig. 4a). A total of 107,784 phase-specific methylated regions (PSMRs) were identified in BF and 103,799 in LD, ranging from 50 to 200 bp in length (Supplementary Fig. 4b). BF and LD showed similar patterns: PSMRs hypermethylated in either phase were predominantly enriched in the 3’UTR and 5’UTR, and to a lesser extent in exons and introns (Fig. 3b).
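The criteria for calling PSMRs are likewise not given here. One simple, commonly used formulation, assumed for this sketch, compares methylated and unmethylated CpG calls from paternal and maternal reads within a region using Fisher's exact test and additionally requires a minimum difference in methylation level. The 0.2 threshold and the function names are hypothetical.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]]."""
    row1, col1, n = a + b, a + c, a + b + c + d

    def prob(x):  # hypergeometric probability of x in the top-left cell
        return comb(col1, x) * comb(n - col1, row1 - x) / comb(n, row1)

    p_obs = prob(a)
    lo, hi = max(0, row1 + col1 - n), min(row1, col1)
    # Sum all tables at most as likely as the observed one (small tolerance for ties).
    return min(1.0, sum(prob(x) for x in range(lo, hi + 1) if prob(x) <= p_obs * (1 + 1e-7)))


def call_psmr(pat_meth, pat_unmeth, mat_meth, mat_unmeth, min_diff=0.2, alpha=0.05):
    """Return which phase is hypermethylated in a region, or None."""
    if pat_meth + pat_unmeth == 0 or mat_meth + mat_unmeth == 0:
        return None
    pat_level = pat_meth / (pat_meth + pat_unmeth)
    mat_level = mat_meth / (mat_meth + mat_unmeth)
    p = fisher_exact_two_sided(pat_meth, pat_unmeth, mat_meth, mat_unmeth)
    if p < alpha and abs(pat_level - mat_level) >= min_diff:
        return "paternal-hyper" if pat_level > mat_level else "maternal-hyper"
    return None


print(call_psmr(10, 0, 0, 10))  # paternal-hyper
print(call_psmr(5, 5, 5, 5))    # None
```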
Phase-specific histone modifications (PSHMs) and phase-specific CTCF binding (PSCTCF) were also identified. In BF, 30,721 phase-specific peaks were detected for H3K27ac, 9030 for H3K4me3, 35,538 for H3K4me1, 15,795 for H3K27me3, and 10,859 for CTCF. In LD, the corresponding numbers were 27,945 for H3K27ac, 9406 for H3K4me3, 34,721 for H3K4me1, 17,903 for H3K27me3, and 16,960 for CTCF. On average, these phase-specific signals accounted for 13.3% of H3K27ac peaks, 12.04% of H3K4me3 peaks, 9.11% of H3K4me1 peaks, 7.03% of H3K27me3 peaks, and 6.52% of CTCF peaks (Supplementary Data 6), comparable to the approximately 10% allele-specific variation in chromatin states observed in human populations39. Moreover, 465 PSHMs and PSCTCFs in LD and 310 in BF were shared by more than half of the samples (Fig. 3c, Supplementary Fig. 4f). This relatively low consistency may be attributable to differences in breed and in the cell-type composition of the bulk tissue samples.
Phase-specific 3D chromatin structures were identified from phased Hi-C maps. We found 672 compartments that switched from compartment A in the paternal phase to compartment B in the maternal phase and 679 showing the reverse pattern, together accounting for 5.80% of all compartments. The distribution of these phase-specific compartments between the paternal and maternal phases did not differ significantly (chi-squared test, p = 0.439) (Fig. 3d). The phased TADs showed characteristics similar to those observed in the diploid data (Supplementary Fig. 4c, d). A total of 7884 phase-specific TAD boundaries (PSTAD boundaries) were identified, 65.6% of which reflected typical boundary differences (Merge, Shifted, and Split)40, indicating the emergence of novel TAD boundaries in one phase and the formation of phase-specific subTADs (Fig. 3e). Compared with non-PSTAD boundaries, PSTAD boundaries showed fewer CTCF binding sites and deviated from the canonical CTCF motif (Supplementary Fig. 4e). To explore chromatin interactions at the parental-phase level, we assessed the resolution of the phased Hi-C maps and found a maximum resolution of 4.5 kb. Loop calling was therefore performed on the phased Hi-C maps at 5 kb resolution, identifying an average of 11,039 loops in the paternal phase and 11,402 in the maternal phase. Differential analysis revealed 4480 paternal-specific loops and 4093 maternal-specific loops (FDR < 0.05) (Supplementary Fig. 4g).
With the phased multi-omics database established, we integrated the omics layers to investigate the association between PSEs and phase-specific epigenetic modifications in BF and LD. The proportions of PSEs associated with a single epigenetic mark were similar in the two tissues (32.87% in BF and 31.41% in LD), whereas 35.53% of PSEs in BF and 52.15% in LD were associated with more than one epigenetic mark (Fig. 3f, Supplementary Fig. 4h). Although PSEs were predominantly associated with PSMRs (89.73% in BF and 70.21% in LD), we also observed associations between PSEs and PSHMs, PSCTCF, PSTAD boundaries, and PSLoops.
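The FDR threshold applied to differential loops implies a multiple-testing correction. The Benjamini-Hochberg procedure is the usual choice, although the correction used here is not stated. A minimal sketch with hypothetical per-loop p-values:

```python
def benjamini_hochberg(pvals):
    """Benjamini-Hochberg adjusted p-values, returned in input order."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonic adjusted values.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvals[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


loop_pvals = [0.01, 0.04, 0.03, 0.5]
fdr = benjamini_hochberg(loop_pvals)
significant = [i for i, q in enumerate(fdr) if q < 0.05]
print(fdr, significant)
```

Note that a raw p-value of 0.03 does not survive here: after adjustment it rises above 0.05, which is why the loop counts above are reported at an FDR rather than a raw p-value cutoff.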
In conventional pig breeding, commercial pigs are produced by three-way crossbreeding to better exploit the phenotypic advantages of the parental lines. To better understand how POE influences the inheritance of advantageous traits in hybrids, we classified PSEs associated with phase-specific epigenetic modifications in BF and LD according to paternal (European origin) or maternal (Chinese origin) preference and performed functional enrichment analysis for each tissue (Supplementary Fig. 4i–l). In BF, paternal-dominant PSEs were predominantly enriched in lipid biosynthetic process pathways, suggesting that hybrids follow the Western commercial breeds' pattern of low lipid biosynthesis and high hydroxy compound metabolism. In contrast, maternal-dominant PSEs were primarily enriched in carboxylic acid metabolic process, metabolism of lipids, glycerolipid metabolism, and response to toxic substances, processes that accelerate carboxylic acid and fat metabolism. In LD, paternal-dominant PSEs were significantly enriched in nervous system development, carbohydrate metabolism, and protein catabolism, whereas maternal-dominant PSEs were enriched in organic hydroxy compound metabolism and lipid metabolism. These findings suggest that hybrids acquire advantageous traits through parent-specific preferences in gene expression and biological pathways.
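Functional enrichment tools commonly score each pathway with a one-sided hypergeometric test (equivalent to a one-sided Fisher's exact test); the specific tool and gene background used in this analysis are not stated here. The sketch below computes the probability of observing at least k pathway genes among n PSEs, given K pathway genes in a background of N genes. All counts are hypothetical.

```python
from math import comb


def hypergeom_enrichment_p(k, K, n, N):
    """P(X >= k) for X ~ Hypergeometric(N background genes, K in pathway, n drawn)."""
    upper = min(K, n)
    return sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1)) / comb(N, n)


# Hypothetical: 12 of 200 paternal-dominant PSEs fall in a 150-gene pathway,
# against a background of 15,000 expressed genes (about 2 expected by chance).
print(hypergeom_enrichment_p(12, 150, 200, 15_000))
```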
POE regulated by DNA methylation
POE primarily involves genomic imprinting, which is widely recognized to be driven by differences between maternal and paternal DNA methylation patterns17. To investigate genome-wide POE regulated by DNA methylation, we examined the positional relationships between PSEs and PSMRs. The distribution of PSEs closely resembled that of PSMRs, and the PSEs included 31 known imprinted genes, representing 70% of those in the imprinted gene database, consistent with expectation (Fig. 4a, Supplementary Fig. 5a). Compared with PSEs, imprinted genes show lower tissue or developmental-stage specificity but maintain a more conserved expression preference across tissues14. To expand the catalog of imprinted genes in pigs, we defined PSEs with a consistent phase preference in more than half of the tested samples as imprinted-like genes. We identified 683 imprinted-like genes in BF and 693 in LD, of which 121 were tissue-shared, including 16 imprinted genes previously reported in both pig and human (Supplementary Fig. 5b, c and Supplementary Data 7). Functional enrichment analysis revealed that tissue-shared imprinted-like genes were significantly enriched in pathways associated with fundamental metabolic processes and the cell cycle (Supplementary Fig. 5e). Tissue-specific imprinted-like genes were enriched in pathways linked to tissue-specific functions (Supplementary Fig. 5d, f), with BF-specific imprinted-like genes enriched in fatty acid biosynthesis and LD-specific imprinted-like genes involved in actin cytoskeleton organization.
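The imprinted-like definition can be expressed as a simple filter over per-sample calls. The sketch assumes each sample contributes a call of "paternal", "maternal", "balanced", or None (not testable), and reads "more than half of the tested samples" as a strict majority of the samples that have a call. Gene names other than PEG3 are placeholders.

```python
def imprinted_like(calls, min_fraction=0.5):
    """Genes whose phase preference is shared by more than `min_fraction` of tested samples.

    calls: {gene: [per-sample call]}, each call in
    {"paternal", "maternal", "balanced", None}; None = not testable.
    """
    result = {}
    for gene, per_sample in calls.items():
        tested = [c for c in per_sample if c is not None]
        if not tested:
            continue
        for direction in ("paternal", "maternal"):
            if sum(c == direction for c in tested) / len(tested) > min_fraction:
                result[gene] = direction
    return result


calls = {
    "PEG3": ["paternal", "paternal", "balanced"],
    "GENE_X": ["paternal", "maternal", None],
    "GENE_Y": [None, None],
}
print(imprinted_like(calls))  # {'PEG3': 'paternal'}
```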
a Genome-wide (autosomal) distribution of PSMRs and PSEs identified in LD, with imprinted genes showing significant expression differences between phases labeled at their genomic positions. b Correlation between differential expression levels of imprinted-like genes and differential methylation levels in their promoters in LD. The cor.test function was used to calculate the Pearson correlation coefficient (r) and the corresponding p-values from the two-sided t-test. Red points indicate imprinted genes across various samples. c Tracks for PSMR, DNA methylation levels, and gene expression levels of PEG3 between the paternal and maternal phases in BF and LD. d An example of ADH1C demonstrating phase-specific DNA methylation levels in the promoter region and phase-specific gene expression levels across samples of BF and LD. e Tracks for PSMR, DNA methylation levels, and gene expression levels of ADH1C between the paternal and maternal phases in BF and LD. f Different transcript types of ADH1C in BF and LD. g Expression levels of ADH1C transcripts between the paternal and maternal phases in BF and LD. Source data are provided as a Source Data file.
To assess how PSMRs contribute to POE, we analyzed the correlation between differential promoter methylation and differential gene expression between the parental phases, using imprinted-like genes as a model. A significant negative correlation was observed in both LD (p = 0.0079) and BF (p = 3e-5) (Fig. 4b, Supplementary Fig. 5g). Among tissue-shared imprinted-like genes, 53.71% were negatively regulated by PSMRs within their promoters. For example, four known imprinted genes, PEG3, SNRPN, NDN, and MAGEL2, exhibited complete maternal methylation in the promoter and paternal-specific expression (Fig. 4c, Supplementary Fig. 5h, i), supporting direct regulation of imprinted genes by PSMRs. Notably, the PSMR in the ADH1C promoter was negatively correlated with its preferential expression and was highly conserved in both BF and LD (Fig. 4d, e): in the paternal phase the promoter was highly methylated and expression was low, whereas the maternal phase showed low methylation and high expression. This suggests that ADH1C may be a novel imprinted gene regulated by DNA methylation. Further analysis of ADH1C transcripts revealed higher transcript diversity in BF, but both tissues displayed similar maternal expression of the predominant transcript. Moreover, all transcripts in the maternal phase exhibited reduced DNA methylation in the promoter and first exon, further explaining the phase-specific expression (Fig. 4f, g).
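The correlation in Fig. 4b is computed with R's cor.test, which reports Pearson's r and a two-sided t-test with n - 2 degrees of freedom. The same quantities can be sketched in Python as below, assuming both inputs are paternal-minus-maternal differences per gene. Obtaining the p-value additionally requires the Student t distribution (for example, scipy.stats.t.sf), which is omitted to keep the sketch dependency-free. The data are hypothetical.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def t_statistic(r, n):
    """t statistic for H0: rho = 0, with n - 2 degrees of freedom (requires |r| < 1)."""
    return r * math.sqrt((n - 2) / (1 - r * r))


# Hypothetical per-gene differences (paternal minus maternal).
delta_promoter_meth = [0.6, 0.4, -0.1, -0.5, 0.2, -0.3]
delta_log2_expr = [-2.1, -1.0, 0.3, 1.8, -0.2, 0.9]
r = pearson_r(delta_promoter_meth, delta_log2_expr)
print(r, t_statistic(r, len(delta_log2_expr)))  # negative r: more methylation, less expression
```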
For tissue-specific imprinted-like genes, 28.64% of BF-specific and 31.99% of LD-specific genes were negatively regulated by PSMRs in their promoters, highlighting distinct PSMR patterns between BF and LD. For instance, the BF-specific imprinted-like gene RPS11, which has been reported to promote lipid synthesis and accumulation41, displayed higher maternal expression together with higher paternal promoter methylation in BF but not in LD (Supplementary Fig. 5j). Conversely, the LD-specific imprinted-like gene KPTN, reported to encode an actin-related protein42, exhibited higher paternal expression accompanied by higher maternal promoter methylation in LD, whereas BF showed balanced expression and methylation between parental phases (Supplementary Fig. 5k). These findings illustrate that unbalanced promoter methylation is closely linked to phase-specific gene expression imbalance, particularly in tissue-shared imprinted-like genes. Moreover, tissue-specific imprinted-like genes are regulated by tissue-specific PSMRs, reflecting tissue-related functional specificity. However, PSEs cannot be attributed to methylation alone, suggesting that other mechanisms also underlie POE.
POE regulated by PSHM and phase-specific chromatin structure
Studies of human cell lines have reported that allele-specific histone modifications are highly enriched at loci with allele-specific expression, including known imprinted regions43, suggesting that PSHMs may also contribute to POE on PSEs. We observed significant positive correlations between PSEs and the signal levels of differential H3K27ac, H3K4me3, H3K4me1, and CTCF in both BF and LD, with the exception of differential H3K27me3, supporting a potential role of PSHMs in regulating POE (Fig. 5a, Supplementary Fig. 6a). We therefore focused on the influence of PSHMs and PSCTCF on PSEs in both promoter regions and distal regulatory regions, independent of PSMRs.
a In LD, the correlation between PSEs and intensities of different PSHMs and PSCTCF within promoter regions and regulatory regions. b The number of PSEs associated with PSHMs and PSCTCF in promoter regions and loop-mediated regulatory regions in LD. c In LD, the proportion of PSEs with PSHMs or PSCTCFs in the promoter region (top). The proportion of PSEs with loop connections that have PSHMs or PSCTCFs in the regulatory regions (bottom). Pink indicates PSEs with one or more PSHMs or PSCTCFs colocalized in the region of interest; Blue indicates PSEs without any PSHMs or PSCTCFs colocalized in the region of interest. d, e An example of a tissue-shared PSE, THYN1, regulated by similar PSHM patterns in both BF (d) and LD (e). f, g An example of a tissue-specific PSE, IMMP2L, regulated by PSHM in BF (f) but exhibiting balanced regulation in LD (g). Source data are provided as a Source Data file.
In promoter regions, we identified a total of 1643 PSEs associated with PSHMs across BF and LD, of which 196 are tissue-shared, 639 are BF-specific, and 808 are LD-specific. Specifically, 333 (LD) and 433 (BF) PSEs were associated with differential H3K27ac, 218 (LD) and 387 (BF) with differential H3K4me3, 303 (LD) and 284 (BF) with differential H3K4me1, 136 (LD) and 103 (BF) with differential H3K27me3, and 263 (LD) and 192 (BF) with differential CTCF binding (Fig. 5b, Supplementary Fig. 6b). These account for 18.37% (LD) and 11.18% (BF) of all PSEs regulated in promoter regions (Fig. 5c, Supplementary Fig. 6c). Tissue-conserved PSEs were regulated by similar PSHMs and showed the same phase preference in both BF and LD. For example, THYN1 and PLAC8 show higher paternal expression in both tissues, which correlates with increased paternal H3K27ac and H3K4me3 signals at their promoters (Fig. 5d, e, Supplementary Fig. 6d, e). Both genes play crucial roles in regulating cell division, differentiation, and apoptosis, further supporting the view that conserved PSEs are involved in fundamental biological processes44 (https://www.proteinatlas.org/).
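The promoter-level association can be sketched as an interval-overlap query between phase-specific peaks and a window around each PSE's TSS. The promoter definition is not specified in this section; the ±2 kb window, the input layout, and the coordinates below (including those given for THYN1) are assumptions for illustration.

```python
def promoter_window(tss, strand, upstream=2000, downstream=2000):
    """Half-open promoter interval around a TSS, respecting strand."""
    if strand == "+":
        return tss - upstream, tss + downstream
    return tss - downstream, tss + upstream


def promoter_marks(genes, peaks, upstream=2000, downstream=2000):
    """Map each gene to the set of phase-specific marks overlapping its promoter.

    genes: [(name, chrom, tss, strand)]; peaks: [(chrom, start, end, mark)].
    """
    hits = {}
    for name, chrom, tss, strand in genes:
        lo, hi = promoter_window(tss, strand, upstream, downstream)
        marks = {mark for c, s, e, mark in peaks if c == chrom and s < hi and e > lo}
        if marks:
            hits[name] = marks
    return hits


# Hypothetical coordinates, not real gene positions.
genes = [("THYN1", "chr9", 10_000, "+"), ("GENE_Z", "chr9", 50_000, "-")]
peaks = [
    ("chr9", 11_500, 12_500, "H3K27ac"),
    ("chr9", 9_000, 9_500, "H3K4me3"),
    ("chr9", 52_500, 53_000, "H3K4me1"),
]
print(promoter_marks(genes, peaks))
```

The scan is quadratic for clarity; genome-scale analyses would sort intervals or use an interval tree instead.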
Furthermore, tissue-specific PSEs are regulated by the corresponding tissue-specific PSHMs. For instance, MUSK, an LD-specific PSE involved in muscle development45, exhibits higher maternal expression, which correlates with increased maternal H3K27ac signal at its promoter (Supplementary Fig. 6f, g). Similarly, IMMP2L, a BF-specific PSE associated with body weight gain in mice46, shows higher maternal expression and elevated maternal H3K27ac signal at its promoter in BF, whereas its expression and histone modification signals are balanced between parental phases in LD (Fig. 5f, g). These results indicate that PSHMs can mediate tissue-specific regulation, reflecting the tissue specificity of POE.
In regulatory regions, we identified 99 PSEs associated with differential H3K27ac, 49 with differential H3K4me3, 121 with differential H3K4me1, 30 with differential H3K27me3, and 87 with differential CTCF binding in LD (Fig. 5b). These account for 15.04% of loop-mediated PSEs (Fig. 5c). A typical example is MTSS1, which shows higher paternal expression correlated with increased H3K27ac and H3K4me1 signals in an upstream regulatory region connected to its promoter by a phase-balanced loop (Fig. 6a). To validate this enhancer-promoter interaction (EPI), we first performed luciferase reporter assays in pig fibroblast cells to confirm the regulatory activity of the element. Compared with the controls, the cloned element produced significantly higher luciferase signals, indicating transcriptional activation activity (Fig. 6b, Supplementary Data 8). Additionally, 3C-qPCR in LD tissue confirmed the interaction between the promoter and the regulatory element (Fig. 6c, Supplementary Data 9). To further substantiate the functional relevance of this interaction, we performed CRISPR/Cas9-mediated deletion of the enhancer region in pig fibroblast cells. Enhancer knockout led to a significant reduction in MTSS1 expression (Fig. 6d, e, Supplementary Fig. 6h and Supplementary Data 11), supporting a critical regulatory role of this enhancer in gene activation. In addition, the paternally highly expressed WSCD1 exhibited higher maternal H3K27me3 signals in its regulatory region (Supplementary Fig. 6i). These results suggest that PSHM at regulatory elements can control PSE.
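Loop-mediated associations add one level of indirection to the promoter test: the PSE promoter must overlap one loop anchor, and the phase-specific peak must overlap the other. The sketch below is a minimal illustration assuming half-open intervals and a hypothetical `(mark, Interval)` peak layout; phase-direction checks are omitted for brevity and it does not reproduce the study's exact procedure.

```python
from typing import NamedTuple


class Interval(NamedTuple):
    chrom: str
    start: int  # half-open [start, end)
    end: int


def overlaps(a: Interval, b: Interval) -> bool:
    return a.chrom == b.chrom and a.start < b.end and b.start < a.end


def loop_mediated_marks(promoter: Interval, loops, peaks) -> set:
    """Marks found at the far anchor of any loop whose near anchor overlaps the promoter.

    loops: iterable of (anchor1, anchor2) Interval pairs
    peaks: iterable of (mark, Interval) pairs for phase-specific peaks
    """
    marks = set()
    for a1, a2 in loops:
        # Loops are undirected, so test both anchor orientations.
        for near, far in ((a1, a2), (a2, a1)):
            if overlaps(near, promoter):
                marks.update(mark for mark, iv in peaks if overlaps(iv, far))
    return marks
```

In practice one would also require the peak's phase bias to agree with the expression bias, as in the promoter-level case.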
a MTSS1 is regulated by the PSHM within the loop-mediated regulatory region in LD. b Plasmid construction for the luciferase reporter assay (top). The luciferase activity of the reporter genes was used to examine the activity of the regulatory element. Bar plots, from left to right, represent the following groups: no vector, control plasmid, cells transfected with a recombinant plasmid containing the promoter sequence, and cells transfected with a recombinant plasmid containing both the enhancer and promoter sequences (bottom). c The 3C-qPCR assay for MTSS1 was designed to include the genomic regions predicted by Hi-C to interact, restriction enzyme sites, histone modification peaks identified by Cut&Tag, and the positions of the dual-luciferase reporter insertion sequences (top). The PCR detection model is illustrated in the lower-left panel: through crosslinking, ligation, and subsequent reverse crosslinking, the two distal genomic regions are joined, reducing the intervening sequence to less than 300 bp and thereby enabling detection by qPCR (lower-left). The PCR results are presented as a bar chart in the lower-right panel. gDNA input and IgG-enriched DNA served as positive and negative controls, respectively, using primers targeting the GAPDH promoter region. To further validate the experimental results, the proximal flanking sequence of the detected region was amplified as an internal control. Detection of ligation products in the +ligase group confirmed successful ligation of the distal regions following reverse crosslinking (lower-right). d CRISPR/Cas9-mediated deletion strategy and sgRNA design targeting the MTSS1 enhancer. e Relative expression of MTSS1 (mean ± SD) following enhancer disruption by each sgRNA separately. In all bar graphs related to functional validation, each data point represents a technical replicate, with a consistent sample size (n = 3).
Two-tailed t-tests were performed to calculate the corresponding P-values. Source data are provided as a Source Data file.
PSTAD boundaries are characterized by a high prevalence of chromatin interaction imbalances between the parental phases, which may facilitate the formation of PSEs19. We therefore examined PSEs within 200 kb surrounding the PSTAD boundaries and identified 791 (12.0%) PSEs and 25.9% (7/27) of the imprinted genes in LD (Supplementary Data 10). These included imprinted gene clusters whose imprinting mechanisms have been well investigated and shown to be associated with PSTAD boundaries, such as the SNRPN and H19-IGF2 loci19. We next investigated associations between PSLoops and PSEs. A total of 789 PSE promoters (12.03%) and 12 imprinted gene promoters co-localized with PSLoop anchors (Supplementary Data 10). For example, the maternally imprinted gene KBTBD6 exhibited paternal-specific loops anchored at its promoter region (Supplementary Fig. 6j). The distal anchor of this loop showed paternal-specific CTCF binding, providing a plausible explanation for the observed loop differences. These findings indicate that PSMR alone is insufficient to explain POE mechanisms; PSHM and phase-specific 3D chromatin structures also play crucial roles in regulating PSE.
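Counting PSEs near PSTAD boundaries reduces to a nearest-neighbour search on sorted boundary coordinates. The sketch below assumes the window is applied symmetrically (±`window` bp) around each gene's TSS; whether the study measured from the TSS or the gene body, and how "200 kb surrounding" was centred, are not specified here, so both are assumptions exposed as inputs.

```python
import bisect
from collections import defaultdict


def genes_near_boundaries(genes, boundaries, window=200_000):
    """Names of genes whose TSS lies within `window` bp of any PSTAD boundary.

    genes: iterable of (name, chrom, tss); boundaries: iterable of (chrom, position).
    """
    by_chrom = defaultdict(list)
    for chrom, pos in boundaries:
        by_chrom[chrom].append(pos)
    for positions in by_chrom.values():
        positions.sort()

    hits = []
    for name, chrom, tss in genes:
        positions = by_chrom.get(chrom, [])
        i = bisect.bisect_left(positions, tss)
        # The closest boundary is either just before or just after the TSS.
        neighbours = positions[max(0, i - 1): i + 1]
        if any(abs(tss - b) <= window for b in neighbours):
            hits.append(name)
    return hits
```

Sorting once per chromosome keeps each lookup logarithmic, which matters when thousands of PSEs are tested against thousands of boundaries.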
Multi-omic analysis reveals a complex regulatory mechanism underlying POE
The regulation of PSE also depends on the interplay of multiple phase-specific epigenetic modifications, and studying this interplay provides valuable insight into the complexity of POE. Previous studies have shown that complex POE is typically controlled by imprinting control regions (ICRs)47,48,49, which regulate phase-specific methylation and, in turn, alter other epigenetic modifications between parental phases. In both tissues, we performed a colocalization analysis of PSMR within PSEs associated with complex POE, examining its relationship with other epigenetic modifications. H3K27ac, H3K4me3, H3K4me1, and CTCF were predominantly colocalized with the hypomethylated phase, whereas H3K27me3 was primarily colocalized with the hypermethylated phase (Fig. 7a). These findings may be explained by the distinct affinities of active histone modifications for different DNA methylation states, as well as the methylation-sensitive binding properties of CTCF50,51,52. They also suggest potential interactions among different epigenetic marks that give rise to complex POE on PSEs. To further elucidate complex POE models, we investigated the imprinted genes IGF2 and H19, which have been well characterized in humans and mice53,54, using multi-omic data in pigs. At the DNA methylation level, complete methylation of the paternal phase and complete absence of methylation on the maternal phase at an ICR between H19 and IGF2 lead to paternal expression of IGF2 and maternal expression of H19 (Fig. 7b). At the CTCF level, the ICR contains CTCF binding motifs. Methylation of the paternal ICR prevents CTCF binding, whereas the unmethylated maternal ICR permits CTCF binding. CTCF binding to the maternal ICR forms an insulator between IGF2 and H19, blocking enhancer activation of IGF2 and promoting H19 expression (Fig. 7b).
In the paternal phase, methylation of the ICR disrupts CTCF binding, thereby preventing the formation of an insulator between H19 and IGF2 and allowing the enhancer to activate IGF2 expression. At the chromatin conformation level, distinct TADs are observed between parental phases: a larger TAD containing the enhancers, H19, the ICR, and IGF2 is observed in the paternal phase, whereas two subTADs (one containing the enhancer and H19, the other containing the ICR and IGF2) are observed in the maternal phase. The presence of a TAD boundary between H19 and IGF2 on the maternal phase enhances the interaction of regulatory elements with the nearby H19 while increasing insulation from the more distal IGF2 (Fig. 7b, c). Our findings not only provide comprehensive support for the current POE model of IGF2 and H19 in humans and mice but also reveal that PSHM in the ICR co-localizes upstream of H19. Specifically, in the maternal phase, this site exhibited higher H3K27ac and H3K4me3 signals, indicative of a more active promoter, which may facilitate the initiation of H19 transcription in the maternal phase. To identify additional stable POE cases that could serve as models of complex POE, we screened PSEs simultaneously associated with multiple phase-specific epigenetic features. We found that SGCE–PEG10, resembling the regulatory pattern of H19–IGF2, is governed by a combination of phase-specific epigenetic modifications. The promoter regions of SGCE and PEG10 exhibit a PSMR characterized by hypermethylation in the maternal phase and hypomethylation in the paternal phase, in accordance with previous reports14,55. PSCTCF binding, along with differential H3K27ac and H3K4me3 signals, co-localized with the PSMR. CTCF binds specifically to the unmethylated (paternal) phase, which also shows higher H3K27ac and H3K4me3 signals.
Conversely, in the maternal phase, promoter hypermethylation coincides with loss of CTCF binding and depletion of active histone marks (Fig. 7d). Hi-C analysis provided additional insight into the chromatin context underlying phase-specific regulation of SGCE-PEG10. In the paternal phase, the region upstream of the PSMR resides within an active A compartment, whereas in the maternal phase it lies within a repressive B compartment. The PSCTCF creates a PSTAD boundary extending to the promoter regions of SGCE and PEG10, establishing a stable PSTAD in the paternal phase owing to the convergent orientation of the CTCF motifs at the PSTAD boundaries56. Within this PSTAD, a PSLoop connects the upstream H3K27ac signal with the promoter regions of both genes (Fig. 7d). To validate the functional relevance of this distal regulatory interaction, we combined dual-luciferase reporter assays, 3C-qPCR, and CRISPR/Cas9-mediated enhancer disruption. In pig fibroblast cells, the cloned elements exhibited robust transcriptional activation activity, producing significantly higher luciferase signals than the controls (Supplementary Fig. 7a, Supplementary Data 8). In LD tissue, 3C-qPCR further confirmed physical interactions between the enhancer and promoter elements (Supplementary Fig. 7b, Supplementary Data 9). Moreover, deletion of the enhancer sequences in pig fibroblast cells led to a significant decrease in the expression of SGCE and PEG10, further supporting the functional importance of these enhancers (Supplementary Fig. 7c–e, Supplementary Data 11). Finally, comparison of chromatin interaction frequencies between phases revealed a significant paternal-specific deviation in the region 100–250 kb upstream of the PSMR (paired two-sided t-test, p = 0.000125, Fig. 7e), reinforcing the insulation function of this boundary.
Together, these results demonstrate that the PSMR in the SGCE and PEG10 promoters acts as an ICR, forming PSTAD boundaries by regulating CTCF binding and enabling phase-specific EPI, leading to the imprinted expression of SGCE-PEG10 (Fig. 7f).
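The upstream comparison in Fig. 7e pairs each distance bin between the two phases, so the test operates on per-bin paternal minus maternal differences. The sketch below computes the paired t statistic with the standard library only; the bin layout (signed offsets, negative values upstream) is a hypothetical choice. The two-sided p-value then follows from Student's t distribution with n − 1 degrees of freedom, which is what `scipy.stats.ttest_rel` reports alongside the statistic.

```python
import math
from statistics import mean, stdev


def upstream_bins(profile, near=100_000, far=250_000):
    """Keep (paternal, maternal) pairs for bins whose upstream distance lies in [near, far].

    profile: iterable of (offset_bp, paternal_count, maternal_count);
    negative offsets are upstream of the PSMR.
    """
    return [(p, m) for off, p, m in profile if near <= -off <= far]


def paired_t(pairs):
    """Paired t statistic and degrees of freedom for per-bin paternal - maternal differences."""
    diffs = [p - m for p, m in pairs]
    n = len(diffs)
    if n < 2:
        raise ValueError("need at least two paired bins")
    se = stdev(diffs) / math.sqrt(n)
    if se == 0:
        raise ValueError("differences have zero variance")
    return mean(diffs) / se, n - 1
```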
a The number of PSEs with PSHM in the opposite direction from PSMR and with PSHM in the same direction as PSMR in BF and LD. b The well-known POE model of the H19-IGF2 locus, along with the epigenetic landscape of parental allele-specific modifications at this locus. c Schematic diagram of the regulation of H19 and IGF2 in the paternal and maternal phases. d The 3D chromatin structure and epigenetic landscape of SGCE-PEG10 in the paternal and maternal phases. The blue column highlights the ICR of SGCE-PEG10. The tracks from top to bottom represent ICR-related elements, including PSTAD boundaries, phase-specific compartments, PSLoops, PSCTCF, PSMR, differential DNA methylation levels, differential expression levels, and PSHMs between the paternal and maternal phases. e Chromatin interactions within 500 kb upstream and downstream of the PSMR in the paternal and maternal phases, showing a significant difference in the frequency of chromatin interactions in the 100–250 kb upstream region. Box plots show the median (center line) and the 25th and 75th percentiles (box edges); whiskers extend to 1.5× the interquartile range above and below the box. A paired two-sided t-test was used to assess differences in chromatin interactions between phases. f Schematic diagram of the regulation of SGCE-PEG10 in the paternal and maternal phases.
Discussion
In this study, a wide range of pig breeds from Europe and China were used to construct Euro-Chinese hybrids. The number of heterozygous SNPs was approximately 5.04 million in European pigs and about 7.90 million in Chinese breeds. In hybrids, the number of heterozygous SNPs was around 10.20 million, higher than in either European or Chinese indigenous breeds (Supplementary Data 5). Because distinct variants between parental haploids are the basis for phasing, the higher number of heterozygous SNPs in hybrids corresponds to a roughly 30% (relative to Chinese breeds) to 100% (relative to European breeds) increase in phasing potential compared with purebred individuals. To avoid the phasing failures of the trio binning strategy at sites where both parents and the offspring are heterozygous, we phased hybrids by combining pedigree information with long-read sequencing to accurately distinguish paternal and maternal alleles at heterozygous loci. This approach increases phasing potential by an additional 5–6% by resolving loci that are heterozygous in both parents and offspring. Compared with a previous study, which reported that 24% of reads could be assigned to one haplotype57, our approach increased the proportion of assignable reads to an average of 55%.
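The trio-plus-long-read strategy can be summarized in three steps: phase child-heterozygous sites from parental genotypes, assign each read to a phase by majority vote over the phased sites it covers, and use those assigned reads to phase sites that the trio alone leaves ambiguous (all three individuals heterozygous). The Python sketch below illustrates this logic under simplifying assumptions (biallelic sites, error-free allele calls, a one-vote margin, hypothetical dictionary layouts); it is not the authors' implementation.

```python
from collections import Counter


def phase_site(child, father, mother):
    """(paternal, maternal) alleles at a child-heterozygous site, or None.

    Genotypes are 2-tuples of allele strings. None is returned when the child is
    homozygous, or when zero or both assignments are Mendelian-compatible
    (e.g. father, mother and child all heterozygous).
    """
    a, b = child
    if a == b:
        return None
    fits = [(p, m) for p, m in ((a, b), (b, a)) if p in father and m in mother]
    return fits[0] if len(fits) == 1 else None


def assign_read(read, phased, min_margin=1):
    """Assign a read by majority vote over the trio-phased sites it covers.

    read: {site: allele observed on the read}; phased: {site: (paternal, maternal)}.
    """
    votes = Counter()
    for site, allele in read.items():
        if site in phased:
            pat, mat = phased[site]
            if allele == pat:
                votes["paternal"] += 1
            elif allele == mat:
                votes["maternal"] += 1
    margin = votes["paternal"] - votes["maternal"]
    if margin >= min_margin:
        return "paternal"
    if -margin >= min_margin:
        return "maternal"
    return "unassigned"


def rescue_ambiguous(reads, phased, ambiguous_sites):
    """Phase trio-ambiguous sites using alleles carried by reads already assigned to a phase."""
    rescued = {}
    for site in ambiguous_sites:
        tally = {"paternal": Counter(), "maternal": Counter()}
        for read in reads:
            phase = assign_read(read, phased) if site in read else "unassigned"
            if phase != "unassigned":
                tally[phase][read[site]] += 1
        if tally["paternal"] and tally["maternal"]:
            pat = tally["paternal"].most_common(1)[0][0]
            mat = tally["maternal"].most_common(1)[0][0]
            if pat != mat:
                rescued[site] = (pat, mat)
    return rescued
```

Long reads make the rescue step effective because a single read is likely to span both a trio-informative site and an ambiguous one.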
For genes with a parental expression preference, we performed GO and KEGG enrichment analyses separately for the paternal and maternal phases to better understand the phenotypic advantages that hybrids derive from each parent. We identified distinct functional enrichments between paternally and maternally preferred genes in both BF and LD. These differences align with the phenotypic characteristics of European and Chinese breeds. Specifically, European breeds are associated with higher efficiency in carbohydrate metabolism, lipid metabolism, and protein degradation with muscle-related traits, supporting rapid growth and lower fat deposition. In contrast, Chinese breeds exhibit higher activity in amino acid metabolism, organic hydroxy compound metabolism, and lipid metabolism in fat tissues, along with stronger responses to toxic stress. These metabolic traits are closely linked to advantages in endurance, stress adaptation, and fat distribution. Hybrids preferentially over-express genes from the parent with the dominant trait and under-express genes from the parent with the inferior trait, which potentially explains heterosis, whereby hybrids combine the advantages of both parents. Using multi-omics data, we further explored the molecular mechanisms by which POE influences the preferential expression of advantageous parental alleles in hybrid offspring. We discovered novel imprinting features of ADH1C, which is regulated by PSMR and highly expressed from the maternal phase in both BF and LD. ADH1C has been reported to play key cell-autonomous roles in adipocyte differentiation58 and is related to carcass traits and marbling in pigs59, reflecting the advantages of Chinese origin in fat deposition and intramuscular fat formation. Among tissue-specific PSEs, we identified KPTN, which displays a paternal expression bias only in LD and is influenced by PSMR. KPTN has been reported to be significantly associated with loin muscle depth (-log10(p) = 3.4803) in pheGWAS (https://pigbiobank.ipiginc.com/home) of pigs.
Similarly, IMMP2L exhibits a maternal expression bias only in BF, where it is regulated by PSHM, and it is significantly associated with teat number (-log10(p) = 6.8557) in pheGWAS (https://pigbiobank.ipiginc.com/home) of pigs. Regarding PSE regulation by phase-specific regulatory elements, we experimentally validated that MTSS1 is regulated by a paternal EPI. MTSS1 has also recently been reported to influence actin filament polymerization and cytoskeletal remodeling through its interaction with actin60. These findings further highlight the advantageous fat traits of the Chinese breeds and the beneficial muscle traits of the European breeds, and they enhance our understanding of the epigenetic regulation underlying the inheritance of advantageous traits in hybrid offspring, particularly traits associated with parent-specific gene expression.
To date, DNA methylation is the only epigenetic modification in mammals that has been definitively shown to cause genomic imprinting61,62,63. Hypermethylation of a gene's promoter region directly represses transcription on a single phase, as seen in known imprinted genes such as SNRPN, PEG3, and PON2. Our results demonstrate that PSHM, in addition to DNA methylation, can independently cause PSE in both promoter and regulatory regions. This is also supported by Li et al.’s discovery of haplotype-specific silencers affecting ASE regulation in rice64. Recent research has even proposed the existence of non-canonical imprinting in mice that is independent of DNA methylation65. From a three-dimensional genomic perspective, the distinct spatial folding of paternal and maternal chromosomes often provides a differential physical environment for gene regulation, as previously reported for KCNQ1-KCNQ1OT1, SNRPN, and DLK1-DIO319,66,67. CTCF plays a pivotal role at TAD boundaries, where it stabilizes the cohesin complex68. Our observations indicate that CTCF binding and DNA methylation also appear to be antagonistic between phases, consistent with the methylation-sensitive binding of CTCF reported in a variety of cell lines and species52,69,70. These findings suggest that focusing only on methylation-driven POE may be inadequate, as multiple epigenetic modifications have the potential to contribute to POE.
The regulatory mechanisms and models of imprinting are widely regarded as prime examples for elucidating gene expression regulation1. A notable example is the intricate imprinting regulatory model of H19-IGF2, which was initially proposed in mice71. Over the past few decades, numerous studies have investigated the regulatory mechanism of H19-IGF2 in pigs. Early studies established the relationship between the imprinting status of porcine IGF2 and DNA methylation at the H19 differentially methylated region (DMR) and IGF2 DMRs 1 and 272. Subsequently, a 3D genome study of pig haploids reported chromatin 3D interactions similar to those observed in the mouse model73. Despite these advances, direct evidence in pigs that DNA methylation alters 3D chromatin structure and EPI, as observed in the mouse model, has remained elusive. We observed clear differences between the parental phases at the H19-IGF2 locus in hybrid muscle tissue across multiple hierarchical levels of chromatin structure, as well as in DNA methylation, histone modifications, and gene expression. These results indicate that the imprinting regulatory model of H19-IGF2 established in mice also applies to pigs, highlighting the remarkable conservation of imprinting mechanisms across species. A comparable complex POE regulatory mechanism was also identified at the SGCE-PEG10 locus. Recent research on this locus in pigs has primarily focused on phase-specific DNA methylation and phase-specific chromatin accessibility in the promoter region14. In addition, our data provide strong evidence for a relationship between methylation and 3D chromatin structural changes. Furthermore, we identified and experimentally validated phase-specific enhancer regulation at this locus.
This discovery offers a systematic insight into another complex imprinting regulatory model mediated by methylation-induced chromatin modifications, thus enhancing our understanding of the genomic imprinting regulatory network.
In summary, by enhancing phasing resolution in Euro-Chinese hybrids, we systematically dissected how DNA methylation, histone modifications, and 3D chromatin architecture contribute to POE in muscle and fat tissues. Our study expands the classical imprinting regulatory models of H19–IGF2 and SGCE–PEG10 in pigs and provides a framework for understanding allele-specific epigenetic regulation in complex trait inheritance.
Methods
Ethics statement
All animal procedures were performed in accordance with the guidelines for the care and use of laboratory animals issued by the Ministry of Agriculture and Rural Affairs of China. The study protocol was approved by the Animal Welfare and Ethics Committee of Jiangxi Agricultural University (Approval No. JXAU2020023, approved on December 17, 2021).
Pig materials
We selected three pairs of domestic pig breeds as parents, together with their hybrid offspring. The three crosses, Large White × Erhualian, Berkshire × Ganxi, and Duroc × Liangguang, each included two biological replicates of hybrid offspring. Back fat (BF) and longissimus dorsi (LD) samples were collected from each hybrid pig at 90–100 days of age. The samples were snap-frozen in liquid nitrogen and stored at −80 °C. To verify true hybrids and family information, we estimated breed proportions and performed parentage assignment in each family. SNPs were genotyped using the CC1 Porcine SNP50K BeadChip (Illumina, San Diego, CA, USA), and GenomeStudio software was used to call SNP genotypes74. Breed proportions were estimated using the ADMIXTURE75 program with default parameters, and identity-by-descent (IBD) and Mendelian errors were calculated with PLINK (v1.9)76.
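The Mendelian-error check used to verify family information rests on a simple rule: at an autosomal SNP, the child must carry one allele transmissible from each parent. The study used PLINK v1.9 for this; the minimal sketch below only illustrates the rule, assuming genotypes coded as alternate-allele counts (0/1/2), `None` for missing calls, and autosomal SNPs only (sex chromosomes need different rules).

```python
from itertools import product

# Alleles a parent can transmit, with genotypes coded as alternate-allele counts.
TRANSMITTED = {0: (0,), 1: (0, 1), 2: (1,)}


def is_mendelian_error(father: int, mother: int, child: int) -> bool:
    """True if the child's genotype cannot arise from one allele of each parent."""
    possible = {f + m for f, m in product(TRANSMITTED[father], TRANSMITTED[mother])}
    return child not in possible


def mendelian_error_rate(trios) -> float:
    """Fraction of fully genotyped autosomal SNPs with a Mendelian error.

    trios: iterable of (father, mother, child) genotypes; None marks a missing call.
    """
    tested = errors = 0
    for father, mother, child in trios:
        if None in (father, mother, child):
            continue  # skip SNPs with any missing genotype
        tested += 1
        errors += is_mendelian_error(father, mother, child)
    return errors / tested if tested else 0.0
```

A true trio should show an error rate close to the genotyping error rate, whereas a misassigned parent produces many errors at SNPs where that parent is homozygous.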
Luciferase reporter assay
Primary pig fibroblast cells were isolated as described in our previous study77. The isolated cells were cultured in Dulbecco’s modified Eagle’s medium (DMEM, Gibco, USA) supplemented with 12% fetal bovine serum (ExCell Bio, Australia), 100 IU/ml penicillin, and 100 µg/ml streptomycin in a humidified atmosphere containing 5% CO2 at 37 °C. The promoter regions of MTSS1, SGCE, and PEG10 were synthesized and cloned into the pGL3-basic reporter vector78 using HindIII and XhoI sites at the 5’ and 3’ ends, respectively. Enhancer fragments were further delineated based on Cut&Tag signals, and the DNA fragment of each enhancer was cloned into the pGL3-promoter reporter plasmid using BamHI and SalI sites at the 5’ and 3’ ends, respectively. All restriction enzymes and T4 DNA ligase were purchased from New England Biolabs (Ipswich, MA, USA). Gel extraction, plasmid purification and PCR purification kits were obtained from IGE Biotechnology (Guangzhou, China). To examine the activity of the identified enhancers regulating MTSS1 and SGCE-PEG10, cells were seeded into 24-well plates at 1 × 10⁵ cells per well 1 day before transfection. The following day, cells were co-transfected with plasmids (900 ng pGL3/pGL3-promoter DNA insert/pGL3-promoter-enhancer DNA insert and 100 ng pRL-TK as an internal control); untransfected cells served as the negative control, and an empty plasmid (pGL3) served as the control group. Transfection was carried out using Lipofectamine 3000 (Invitrogen) according to the manufacturer’s instructions. After 48 h, firefly and Renilla luciferase activities were measured using the Dual Luciferase Reporter Assay System Kit (Promega, E1910)79 and a multifunctional microplate reader (Biotek, USA) according to the manufacturer’s instructions. Relative luciferase activity was calculated as the ratio of firefly to Renilla luciferase activity.
All data were obtained from three replicates, and statistical significance was assessed with a two-tailed t-test.
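The normalization described above divides each firefly reading by its Renilla internal control, and groups are then compared with a two-tailed t-test. A minimal standard-library sketch follows; the Methods do not state whether a pooled-variance (Student) or Welch test was used, so the pooled form here is an assumption. `scipy.stats.ttest_ind(a, b)` uses the same pooled form by default and also returns the two-sided p-value.

```python
import math
from statistics import mean, variance


def relative_activity(firefly, renilla):
    """Firefly / Renilla ratio per well (Renilla from the co-transfected control plasmid)."""
    if len(firefly) != len(renilla):
        raise ValueError("firefly and Renilla readings must be paired")
    return [f / r for f, r in zip(firefly, renilla)]


def fold_over_control(ratios, control_ratios):
    """Express each ratio relative to the mean ratio of the control group."""
    base = mean(control_ratios)
    return [r / base for r in ratios]


def student_t(a, b):
    """Two-sample Student's t statistic (pooled variance) and degrees of freedom."""
    n1, n2 = len(a), len(b)
    pooled = ((n1 - 1) * variance(a) + (n2 - 1) * variance(b)) / (n1 + n2 - 2)
    t = (mean(a) - mean(b)) / math.sqrt(pooled * (1 / n1 + 1 / n2))
    return t, n1 + n2 - 2
```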
Quantitative analysis of chromatin conformation capture (3C-qPCR) assay
The 3C-qPCR assay was conducted to validate the interaction between the regulatory element and the promoter80. Briefly, LD tissue (~2 g) was crosslinked with 2% paraformaldehyde in PBS buffer under vacuum for 30 min on ice. The crosslinking reaction was stopped by adding ice-cold 2 M glycine (final concentration 0.125 M) for 5 min on ice. Nuclei were then isolated from the crosslinked tissue and purified with nuclei extraction (NE) buffer. To prepare for digestion, the purified nuclei pellets were resuspended in 1.2× restriction buffer and treated with 0.2% SDS at 37 °C for 20 min. The SDS was then quenched by incubation with 2% Triton X-100 for 30 min. 3C analysis was performed on the MTSS1 or SGCE-PEG10 locus using three restriction enzymes (REs): EcoRI and HindIII first, followed independently by Sau96I. Overnight digestion at 37 °C was carried out with 400 U of each RE, and digestion was stopped by incubation at 65 °C for 20 min. Ligation was carried out with 100 U of T4 DNA ligase, initially at 16 °C for 5 h and then at room temperature for 45 min. Crosslinks were reversed overnight at 65 °C using proteinase K, and DNA was recovered by phenol/chloroform extraction and ethanol precipitation. Relative interaction frequencies between fragments were calculated from quantitative PCR (qPCR) data obtained with SYBR Green I master mix. gDNA input and IgG beads were tested with the GAPDH promoter primer set as positive and negative controls, respectively. To account for primer amplification efficiencies and to assess specific ligation events, two controls were used within each group: a sample with ligase (+ligase) and a sample without ligase (−ligase) as a negative control. Each primer pair set in the qPCR dataset was normalized to the free beads (IgG) control sample.
For quantification purposes, the chromatin state of two additional possible interactions was assumed to be stable between groups when compared to the nascent short unbroken gDNA in the target 3C site near the DNA-protein (crosslink region) complex. All results in this study represent the average of three replicates.
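The Methods describe ±ligase and internal-control amplicons but not the exact quantification formula. One common choice, shown here purely as an assumption, is a ΔCt normalization of each ligation-product Ct to the internal-control Ct of the same sample (for example the proximal flanking amplicon described for Fig. 6c), followed by the ratio of the +ligase to the −ligase signal. Equal primer efficiencies are also assumed.

```python
from statistics import mean


def relative_signal(ct_ligation, ct_internal, efficiency=2.0):
    """Ligation product relative to the internal-control amplicon (delta-Ct method).

    Assumes both primer pairs amplify with the same efficiency (2.0 = perfect doubling).
    """
    return efficiency ** (ct_internal - ct_ligation)


def ligation_enrichment(plus_ligase, minus_ligase, efficiency=2.0):
    """Mean +ligase signal divided by mean -ligase signal.

    Each argument is a list of (ct_ligation, ct_internal) replicate pairs.
    """
    plus = mean(relative_signal(ct_l, ct_i, efficiency) for ct_l, ct_i in plus_ligase)
    minus = mean(relative_signal(ct_l, ct_i, efficiency) for ct_l, ct_i in minus_ligase)
    return plus / minus
```

An enrichment well above 1 indicates that the amplicon depends on ligation, i.e. that the two distal fragments were joined in the crosslinked chromatin.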
CRISPR/Cas9-mediated targeted disruption of enhancer regions
To investigate the regulatory function of the MTSS1 and SGCE-PEG10 enhancer regions, we designed two specific sgRNAs for each enhancer using the CRISPOR online tool (http://crispor.tefor.net), selecting sequences with minimal predicted off-target effects (MIT specificity score > 95). The sgRNAs were cloned into the pSpCas9(BB)-2A-Puro (PX459) (#62988) vector using Golden Gate assembly. Primary pig fibroblasts were cultured under the conditions described above and seeded to reach 70–80% confluency at the time of electroporation. Cells were detached using 0.25% trypsin (Gibco, USA), centrifuged at 200 × g for 5 min, and resuspended in 100 µL Transter-E buffer containing a total of 10 µg sgRNA plasmid mixture (5 µg for each of the two sgRNAs). Electroporation was performed using a dual-pulse program (FF113 + CA137) in Lonza Nucleocuvette™ vessels77. Following electroporation, cells were transferred into 60 mm dishes containing 5 mL of pre-warmed complete medium. At 48 h post-electroporation, puromycin (2 µg/mL) was added for 2 days of selection, after which cells were collected for downstream analysis. Enhancer deletion was validated by PCR amplification of genomic DNA from both wild-type and knockout cells. The primers for the MTSS1 enhancer region were MTSS1_F: ACTGGGGGCTTGATGTGTAA and MTSS1_R: TTGTGGACCTAATGGTGGTG; those for the SGCE–PEG10 enhancer region were SGCE–PEG10_F: AGGGAGTGGAACGATGTGTA and SGCE–PEG10_R: CCGCCTGTTCACCTCCTG. To assess the impact of enhancer deletion on gene expression, RT–qPCR was performed to quantify MTSS1, SGCE and PEG10 transcript levels in cells before and after enhancer deletion. Results are presented as the mean of three replicates.
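The Methods do not name the quantification model for the RT-qPCR of MTSS1, SGCE, and PEG10. If the widely used Livak 2^−ΔΔCt method was applied, knockout-versus-wild-type fold changes and their mean ± SD could be computed as in the sketch below; the reference transcript is hypothetical (none is named in the text) and 100% primer efficiency is assumed.

```python
from statistics import mean, stdev


def ddct_fold_changes(gene_ko, ref_ko, gene_wt, ref_wt):
    """Per-replicate 2^-ddCt fold changes of knockout cells relative to wild type.

    gene_*: Ct values of the target transcript (e.g. MTSS1)
    ref_*:  Ct values of a reference transcript (hypothetical; not named in the text)
    """
    wt_dct = mean(g - r for g, r in zip(gene_wt, ref_wt))
    return [2.0 ** -((g - r) - wt_dct) for g, r in zip(gene_ko, ref_ko)]


def summarize(values):
    """Mean and sample SD, matching the mean +/- SD bar plots."""
    return mean(values), stdev(values)
```

A fold change below 1 corresponds to reduced expression after enhancer deletion.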
Short-read WGS (Illumina) library construction and sequencing
Genomic DNA was extracted, and 0.2 μg per sample was used to construct libraries with the NEBNext® Ultra™ DNA Library Prep Kit for Illumina (NEB, USA, Catalog #: E7370L) following the manufacturer’s protocol, with index codes added to each sample. Briefly, DNA samples were randomly sheared to ~350 bp by sonication. The fragmented DNA then underwent end polishing, A-tailing, adapter ligation, and PCR amplification. Finally, PCR products were purified with the AMPure XP system (Beverly, USA). Library quality was assessed with the Agilent 5400 system (Agilent, USA), and library concentration was quantified by real-time PCR (1.5 nM). Qualified libraries were sequenced on Illumina platforms using a PE150 strategy at Novogene Bioinformatics Technology Co., Ltd (Beijing, China).
Long-read WGS (ONT) library construction and sequencing
A total of 49 μl genomic DNA was sheared with g-TUBE to fragments of the desired length; 1 μl was used to assess the quality and quantity of the fragmented DNA, and the remainder was processed according to the manufacturer’s instructions. Briefly, fragmented DNA was subjected to NEBNext FFPE DNA repair and NEBNext end-repair/dA-tailing. The DNA was then ligated to the adapter using T4 ligase (from the NEBNext Quick Ligation Module). Following adapter ligation, the products were purified with AMPure XP beads in accordance with the ONT purification protocol. The final qualified DNA libraries were sequenced on the ONT PromethION platform.
RNA-seq library construction and sequencing
The backfat and longissimus dorsi muscle of the hybrid progeny were collected, and total RNA was isolated from each sample. Sequencing libraries were generated using the NEBNext Ultra RNA Library Prep Kit for Illumina (NEB, USA, Catalog #: E7530L) as recommended by the manufacturer, and index codes were added to each sample. In brief, mRNA was purified from total RNA using poly-T oligo-attached magnetic beads. After fragmentation, random hexamer primers were used for first-strand cDNA synthesis. Second-strand cDNA synthesis was then performed with dUTP instead of dTTP. Subsequently, end repair, A-tailing, adapter ligation, size selection, USER enzyme digestion, amplification, and purification were sequentially performed to produce the final directional library. Finally, library quality was assessed with the Agilent 5400 system (Agilent, USA), and library concentration was quantified by real-time PCR (1.5 nM). The quantified libraries were sequenced on the Illumina platform.
Genomic methylation library construction and sequencing
Genomic methylation libraries were constructed using TET-assisted pyridine borane sequencing (TAPS)81, and genome-wide methylation levels were measured at Geneplus-Beijing Institute. Genomic DNA was extracted following the manufacturer’s instructions, and the purity, integrity, and concentration of the DNA samples were checked. Qualified samples were spiked with a defined proportion of negative and positive control DNA. Genomic DNA was randomly sheared to 200–300 bp with a Covaris S220, followed by end repair, A-tailing, and ligation of unique adapters for the DNB sequencing platform. TAPS treatment was then performed: 5-methylcytosine (5mC) and 5-hydroxymethylcytosine (5hmC) were oxidized to 5-carboxylcytosine (5caC) by the TET2 enzyme, and 5caC was reduced by pyridine borane to dihydrouracil (DHU), which serves as a template for PCR. During the subsequent PCR amplification, DHU is copied as thymine, so methylated cytosines (5mC) are read as thymine (T). The product was quantified with a Qubit fluorometer, its length distribution was assessed, and the qualified library was sequenced on the MGI T7 platform.
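Because TAPS converts modified cytosines to thymine while leaving unmodified cytosines unchanged (the reverse of bisulfite sequencing), the methylation level at a CpG is the fraction of covering reads that show T; this fraction reflects 5mC plus 5hmC. The sketch below assumes gap-free, forward-strand alignments given as hypothetical `(start, sequence)` pairs; reverse-strand reads (G-to-A at the CpG guanine), indels, and C/T SNPs, which matter in heterozygous hybrids, are not handled. Computing the same level separately from paternal and maternal reads is the basis for calling phase-specific methylation.

```python
def cpg_positions(ref: str) -> list:
    """0-based positions of the C in every CpG of the reference."""
    ref = ref.upper()
    return [i for i in range(len(ref) - 1) if ref[i:i + 2] == "CG"]


def taps_levels(ref: str, reads) -> dict:
    """Per-CpG modification level from forward-strand TAPS reads.

    reads: iterable of (start, seq) gap-free alignments to `ref`.
    A T at a reference CpG cytosine indicates 5mC/5hmC; a C indicates unmodified.
    Uncovered CpGs map to None.
    """
    counts = {p: [0, 0] for p in cpg_positions(ref)}  # [modified (T), unmodified (C)]
    for start, seq in reads:
        for offset, base in enumerate(seq.upper()):
            site = counts.get(start + offset)
            if site is None:
                continue  # not a CpG cytosine
            if base == "T":
                site[0] += 1
            elif base == "C":
                site[1] += 1
    return {p: (m / (m + u) if m + u else None) for p, (m, u) in counts.items()}
```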
CUT&Tag library construction and sequencing
CUT&Tag was performed using the NovoNGS® CUT&Tag 2.0 High-Sensitivity Kit (NovoProtein, Catalog #N259-YH01, RRID: AB_2886259) and optimized as previously described82. Briefly, cells were bound to Concanavalin A-coated magnetic beads, and the cell membranes were permeabilized with digitonin. Guided by the antibody, pA-Tn5 transposase binds the DNA near the target protein, resulting in factor-targeted tagmentation. During tagmentation, adapters are added to both ends of the DNA fragments, which are then enriched by PCR to form sequencing-ready libraries. After PCR, libraries were purified with AMPure beads, and library quality was assessed on the Agilent Bioanalyzer 2100 system. Clustering of the index-coded samples was performed on a cBot Cluster Generation System using the TruSeq PE Cluster Kit v3-cBot-HS (Illumina) according to the manufacturer’s instructions. The libraries were sequenced on the Illumina NovaSeq platform at Tianjin Novogene Bioinformatic Technology Co., Ltd (Beijing, China), generating 150 bp paired-end reads.
In situ Hi-C library construction and sequencing
Longissimus dorsi tissue of hybrid pigs aged 90-100 days was sampled to construct Hi-C libraries following a previously reported protocol33 with minor modifications. Briefly, longissimus dorsi tissue was crosslinked in 40 ml of 2% formaldehyde solution for 15 min, and glycine was then added to quench the crosslinking reaction. The crosslinked tissue was ground in liquid nitrogen, and nuclei were isolated by three rounds of centrifugation, comprising filtration at 4000 rpm and 4 °C for 20 min and extraction at 14,000 rpm and 4 °C for 10 min. The chromatin was then solubilized with dilute SDS and incubated for 10 min at 65 °C. After the SDS was quenched with Triton X-100, chromatin was digested overnight with 400 units of the restriction enzyme MboI at 37 °C on a rocking platform. The DNA ends were labeled with biotin-14-dCTP (Invitrogen, 19518-018), and the crosslinked fragments were ligated. After proximity ligation, crosslinks were reversed by incubation with proteinase K (Invitrogen, 25530-031) at 65 °C, and DNA was extracted with phenol-chloroform. T4 DNA polymerase (NEB, M0203L) was used to remove biotin-labeled nucleotides (biotin-C) from the ends of unligated fragments. The DNA was sheared by sonication into 100-500 bp fragments. Fragment ends were repaired with T4 DNA polymerase, T4 polynucleotide kinase (NEB, M0201), and Klenow DNA polymerase, and biotin-tagged fragments were enriched with streptavidin magnetic beads. The fragment ends were A-tailed with Klenow (NEB, M0212L) and ligated to Illumina paired-end sequencing adaptors (Illumina, PN 1001782, 1001783). Hi-C libraries were constructed following Illumina protocols and sequenced as 150 bp paired-end (PE150) reads on Illumina HiSeq instruments.
Iso-seq library construction and sequencing
Total RNA was isolated from each sample according to the manufacturer’s guidelines. RNA integrity was checked on an Agilent 2100 Bioanalyzer using the RNA 6000 Nano kit, and samples with RIN > 7 were used for Iso-Seq library preparation. First- and second-strand cDNA synthesis were performed according to PacBio instructions. After optimization of the number of PCR cycles, PCR amplification was performed, and the PCR products were purified with SMRTbell cleanup beads. The amplified library then underwent end repair, A-tailing, SMRTbell adapter ligation, nuclease treatment, and purification to produce the final sequencing library. Library quality was assessed with an Agilent 2100 Bioanalyzer and a Qubit fluorometer. The final library was sequenced on the PacBio sequencing platform at Novogene Bioinformatics Technology Co., Ltd (Beijing, China).
Phasing pipeline construction
WGS data mapping
Raw reads from the short-read (Illumina) WGS libraries were adapter-trimmed, and reads shorter than 100 bp were removed with fastp (v0.23.2)83. The resulting high-quality reads were mapped to the Sus scrofa 11.1 reference genome with the BWA (v0.7.17)84 mem algorithm, and reads with high mapping quality (MAPQ > 30) were retained. PCR duplicates were then removed, and the reads were sorted and converted into BAM files with SAMtools (v1.9)85. PromethION long-read sequencing data from the hybrid offspring were mapped to Sus scrofa 11.1 with Minimap2 (v2.17)86 using the “-ax map-ont” option, and the mapped reads from all libraries were merged and sorted with SAMtools85.
SNP calling
BAM files from the short-read (second-generation) WGS data were processed as follows to obtain reliable variant calls in VCF format. Initial variant calling was performed with the Genome Analysis Toolkit (GATK, v4.1.4.1)87. GATK AddOrReplaceReadGroups and MarkDuplicates were used to add sample read-group information and to mark PCR duplicates, respectively. GATK HaplotypeCaller and SelectVariants were then used to call variants and to extract SNPs. After variant calling, high-quality heterozygous SNPs were retained for subsequent phasing analysis.
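The Methods do not state the thresholds behind “high-quality heterozygous SNPs”. The sketch below shows one way to pull biallelic SNPs that are heterozygous in the offspring from uncompressed VCF lines; the QUAL and DP cut-offs (min_qual, min_dp) and the function names are hypothetical placeholders, not the values or code used in this study.

```python
from typing import Iterable, Iterator, Optional, Tuple


def parse_gt(gt_field: str) -> Optional[Tuple[int, ...]]:
    """Return allele indices of a GT string such as '0/1' or '1|0'; None if missing."""
    alleles = gt_field.replace("|", "/").split("/")
    if any(a == "." for a in alleles):
        return None
    return tuple(int(a) for a in alleles)


def het_snps(vcf_lines: Iterable[str], sample: str,
             min_qual: float = 30.0, min_dp: int = 10) -> Iterator[Tuple[str, int, str, str]]:
    """Yield (chrom, pos, ref, alt) for biallelic SNPs heterozygous in `sample`.

    min_qual and min_dp are hypothetical thresholds chosen for illustration only.
    """
    sample_idx = None
    for line in vcf_lines:
        if line.startswith("##"):
            continue  # meta-information lines
        fields = line.rstrip("\n").split("\t")
        if line.startswith("#CHROM"):
            sample_idx = fields.index(sample)  # column holding this sample's genotype
            continue
        if sample_idx is None:
            raise ValueError("VCF header line (#CHROM) not found before records")
        chrom, pos, _id, ref, alt, qual = fields[:6]
        # Keep biallelic single-nucleotide variants only.
        if len(ref) != 1 or len(alt) != 1 or alt in (".", "*"):
            continue
        if qual == "." or float(qual) < min_qual:
            continue
        values = dict(zip(fields[8].split(":"), fields[sample_idx].split(":")))
        gt = parse_gt(values.get("GT", "."))
        if gt is None or len(gt) != 2 or gt[0] == gt[1]:
            continue  # missing, non-diploid, or homozygous genotype
        dp = values.get("DP", ".")
        if dp == "." or int(dp) < min_dp:
            continue
        yield chrom, int(pos), ref, alt
```

Reading VCF text directly keeps the example dependency-free; for bgzipped files a library such as pysam or cyvcf2 would normally be used instead.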
Data phasing
For phasing, we first prepared variants.vcf (a VCF file recording the genetic variants of the trio), the mapped files father.bam, mother.bam, offspring.bam1 (short-read), and offspring.bam2 (long-read), a trio.ped file describing the family relationships among the samples, and ref.fasta (the reference genome in FASTA format). The diploid offspring samples were phased with WhatsHap phase (v1.4)35 using the following command: whatshap phase --reference=ref.fasta --ped=trio.ped -o phased.vcf variants.vcf father.bam mother.bam offspring.bam. Next, phased.vcf was used with WhatsHap haplotag (v1.4)35 to tag reads in the short-read and long-read WGS BAM files as belonging to the paternal or the maternal phase. The tagged BAM files were visualized with the Integrative Genomics Viewer (IGV, v2.6.1)88.
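Trio phasing rests on a simple Mendelian rule: at a site where the offspring is heterozygous, a parent that carries only one of the two offspring alleles fixes which allele it transmitted. The minimal sketch below illustrates that rule only. WhatsHap's pedigree mode combines such constraints with read-level information in a joint model, and this is not a reimplementation of it; the function name and genotype representation are hypothetical.

```python
from typing import Optional, Tuple

Genotype = Tuple[str, str]  # unordered pair of alleles, e.g. ("A", "G")


def assign_parental_alleles(child: Genotype, father: Genotype,
                            mother: Genotype) -> Optional[Tuple[str, str]]:
    """Return (paternal_allele, maternal_allele) for a heterozygous child site.

    Returns None when the child is homozygous, when both allele orientations are
    compatible with the parents (e.g. all three heterozygous, so genotypes alone
    cannot decide), or when no orientation is Mendelian-consistent.
    """
    a, b = child
    if a == b:
        return None  # homozygous: nothing to phase
    candidates = [(pat, mat) for pat, mat in ((a, b), (b, a))
                  if pat in father and mat in mother]
    if len(candidates) != 1:
        return None  # 0 = Mendelian conflict, 2 = ambiguous from genotypes alone
    return candidates[0]
```

Sites that return None here are exactly where read-based evidence, such as long reads spanning several heterozygous SNPs, is needed to extend the phase.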
Evaluation and methods comparison
We assessed the performance of the phasing pipeline by calculating the switch error and the phasing efficiency. The switch error was calculated as the proportion of father-specific (or mother-specific) k-mers from the parents that are present among the maternal-specific (or paternal-specific) k-mers from the offspring. The WGS data of the father and mother were converted into k-mers with KMC (v3.2.1)89. Father-specific and mother-specific k-mers were generated by subtracting the paternal and maternal k-mer sets from each other with kmc_tools simple using the parameters “-ci1 -cx1 -ci1 kmers_subtract”. In addition, the phased SNPs from each trio were substituted into the reference genome to reconstruct the offspring's parental genomes, which were then converted into k-mers with KMC (v3.2.1)89. Paternal-specific and maternal-specific k-mers were generated from these parental genomes with kmc_tools simple using the same parameters (“-ci1 -cx1 -ci1 kmers_subtract”). Phasing efficiency was defined as the percentage of reads in the tagged BAM file labeled “HP1” or “HP2” out of the total number of reads. To compare phasing methods, the phase-tag and trio-binning38 approaches were each applied to the WGS data of every sample to extract phase-specific reads. These reads were processed into k-mers and exported with the kmc_tools dump operation, and the k-mer sets from the two methods were compared to assess the consistency of the phasing approaches.
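The two evaluation metrics can be sketched in Python, assuming the k-mer sets have already been exported from KMC (for example as text dumps) and loaded as Python sets of strings. switch_error follows the definition above, taking the parent-specific k-mer set as the denominator (our reading of the text). phasing_efficiency counts reads whose WhatsHap HP tag is 1 or 2; the optional pysam helper skips secondary and supplementary alignments, which is an assumption about what “total number of reads” means. All function names are hypothetical.

```python
from typing import Iterable, Iterator, Optional, Set


def switch_error(parent_specific: Set[str], opposite_phase_offspring: Set[str]) -> float:
    """Fraction of parent-specific k-mers found in the offspring's opposite phase.

    Example: father-specific k-mers (from the parents' WGS) checked against the
    offspring's maternal-specific k-mers (from the reconstructed phased genome).
    """
    if not parent_specific:
        raise ValueError("parent_specific k-mer set is empty")
    return len(parent_specific & opposite_phase_offspring) / len(parent_specific)


def phasing_efficiency(hp_tags: Iterable[Optional[int]]) -> float:
    """Percentage of reads whose HP tag is 1 or 2 (None marks an untagged read)."""
    total = tagged = 0
    for hp in hp_tags:
        total += 1
        if hp in (1, 2):
            tagged += 1
    return 100.0 * tagged / total if total else 0.0


def hp_tags_from_bam(path: str) -> Iterator[Optional[int]]:
    """Yield the HP tag (or None) of each primary read in a haplotagged BAM.

    Requires pysam; excluding secondary/supplementary records is an assumption.
    """
    import pysam  # imported lazily so the pure functions above work without pysam

    with pysam.AlignmentFile(path, "rb") as bam:
        for read in bam.fetch(until_eof=True):
            if read.is_secondary or read.is_supplementary:
                continue
            yield read.get_tag("HP") if read.has_tag("HP") else None
```

Usage would be along the lines of phasing_efficiency(hp_tags_from_bam("offspring.tagged.bam")), where the file name is illustrative.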
RNA-seq dataset processing and PSEs identification
Raw RNA sequencing (RNA-seq) data were preprocessed in the same way as the short-read WGS data. High-quality reads were mapped to the Sus scrofa 11.1 reference genome with HISAT2 (v2.2.0)90 and sorted with SAMtools85. The resulting BAM files were tagged with WhatsHap haplotag (v1.4)35 using the phased SNPs, and reads were assigned to paternal.bam and maternal.bam. Gene expression (counts and TPM) was calculated for the unphased BAM file, paternal.bam, and maternal.bam with StringTie (v2.2.1)91 using the parameters for strand-specific RNA-seq92. For PSE identification, genes with fewer than 10 counts in either phase were filtered out, and a binomial test was performed in R (v4.1.1); genes with FDR < 0.05 and |log2FC| > 1 were considered PSEs.
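The PSE test can be sketched as an exact two-sided binomial test of paternal versus total counts under an expected ratio of 0.5, followed by Benjamini-Hochberg correction. Assumptions: the test uses raw phase counts with p = 0.5, and log2FC is computed as log2(paternal/maternal) from the same counts; the study's R code may differ in these details. The exact big-integer tail sum gives the same result as R's binom.test for p = 0.5 but becomes slow for very large counts, where scipy.stats.binomtest is a practical replacement. Function names are hypothetical.

```python
import math
from typing import Dict, List, Tuple


def binom_two_sided_half(k: int, n: int) -> float:
    """Exact two-sided binomial p-value for k successes in n trials under p = 0.5."""
    m = min(k, n - k)
    # The distribution is symmetric, so the two-sided p-value is twice the smaller tail.
    tail = sum(math.comb(n, i) for i in range(m + 1)) / 2 ** n
    return min(1.0, 2.0 * tail)


def bh_adjust(pvalues: List[float]) -> List[float]:
    """Benjamini-Hochberg adjusted p-values, as in R's p.adjust(method = "BH")."""
    n = len(pvalues)
    order = sorted(range(n), key=lambda i: pvalues[i])
    adjusted = [0.0] * n
    running_min = 1.0
    for rank in range(n, 0, -1):  # from the largest p-value down
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * n / rank)
        adjusted[i] = running_min
    return adjusted


def call_pse(counts: Dict[str, Tuple[int, int]], min_count: int = 10,
             fdr: float = 0.05, min_abs_lfc: float = 1.0) -> List[str]:
    """Return genes with phase-specific expression from (paternal, maternal) counts."""
    kept = {g: c for g, c in counts.items() if min(c) >= min_count}
    genes = list(kept)
    pvals = [binom_two_sided_half(kept[g][0], sum(kept[g])) for g in genes]
    qvals = bh_adjust(pvals)
    result = []
    for gene, q in zip(genes, qvals):
        pat, mat = kept[gene]
        lfc = math.log2(pat / mat)  # paternal over maternal
        if q < fdr and abs(lfc) > min_abs_lfc:
            result.append(gene)
    return result
```

With counts {"A": (100, 20), "B": (50, 45), "C": (5, 100), "D": (30, 20)}, gene C is removed by the count filter, B is not significant, D falls below the fold-change cut-off, and only A is returned.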
DNA methylation dataset processing and PSMR identification
Raw genomic methylation (GM) sequencing data were preprocessed in the same way as the short-read WGS data. High-quality reads were aligned to the Sus scrofa 11.1 reference genome with BWA (v0.7.17)84, and the mapped reads were then sorted and indexed with SAMtools (v1.9)85. PCR duplicates were marked with sambamba (v0.8.2)93 for further analysis. Only CpG methylation was analyzed. Methylation was called for each sample with asTair (v3.3.2)81 using the following parameters: -m mCtoT --skip_clip_overlap False --minimum_base_quality 20 --minimum_mapping_quality 10 --ignore_orphans True --max_depth 10000 --start_clip 0 --end_clip 5 --context CpG. Methylation of the positive and negative control sequences was called with MethylDackel (v0.6.1)94 using the following parameters: --mCtoT -p 10 -q 20 --cytosine_report -r L --keepDupes. The mapped BAM files were divided into paternal.bam and maternal.bam following the same procedure as for the RNA-seq data, and methylation levels were called with asTair (v3.3.2)81 using the same parameters. Differentially methylated sites (DMSs) were defined as sites at which the methylation level differs by more than 0.3 (absolute difference) between the parental phases. Based on this criterion, a PSMR was defined as at least two contiguous DMSs in which adjacent sites are no more than 500 bp apart and the direction of the methylation difference is consistent.
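The PSMR definition can be sketched as a single pass over position-sorted sites. Assumptions: each site carries paternal and maternal methylation levels in [0, 1]; “contiguous” is read as consecutive DMSs, so non-differential CpGs lying between two DMSs do not break a run; and a run ends at a chromosome change, a gap above 500 bp, or a change in direction. The names Site and call_psmrs are hypothetical.

```python
from typing import List, Tuple

# (chrom, pos, paternal_level, maternal_level); levels are fractions in [0, 1]
Site = Tuple[str, int, float, float]


def call_psmrs(sites: List[Site], min_diff: float = 0.3, max_gap: int = 500,
               min_sites: int = 2) -> List[Tuple[str, int, int, str, int]]:
    """Merge DMSs into regions; returns (chrom, start, end, direction, n_dms)."""
    # DMSs: absolute paternal-maternal difference above min_diff, sorted by position.
    dms = sorted((s for s in sites if abs(s[2] - s[3]) > min_diff),
                 key=lambda s: (s[0], s[1]))
    runs: List[List[Site]] = []
    for site in dms:
        if runs:
            prev = runs[-1][-1]
            same_dir = (site[2] > site[3]) == (prev[2] > prev[3])
            if site[0] == prev[0] and site[1] - prev[1] <= max_gap and same_dir:
                runs[-1].append(site)  # extend the current run
                continue
        runs.append([site])  # start a new run
    regions = []
    for run in runs:
        if len(run) >= min_sites:
            direction = "pat>mat" if run[0][2] > run[0][3] else "mat>pat"
            regions.append((run[0][0], run[0][1], run[-1][1], direction, len(run)))
    return regions
```

Region boundaries here are the first and last DMS positions; whether the study extended regions beyond these sites is not stated.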
CUT&Tag dataset processing and identification of PSHM and PSCTCF
Raw CUT&Tag data were preprocessed in the same way as the short-read WGS data. High-quality reads were mapped to the Sus scrofa 11.1 reference genome with the BWA84 mem algorithm. PCR duplicates were marked with Picard v2.20 (https://broadinstitute.github.io/picard/), and uniquely mapped reads were extracted with SAMtools85 and saved as BAM files. Peaks for each histone modification and for CTCF were called with MACS2 (v2.1.1)95 using the parameters “-f BAMPE -q 0.05 --nomodel --shift -0 --keep-dup all”. Narrow peaks identified in different samples were merged with bedtools merge (v2.25.0)96 to generate diploid consensus peaks for each assay97. As for the RNA-seq data, the BAM files were tagged by parental phase with WhatsHap haplotag (v1.4)35. Peaks for each parental phase were also called with MACS2 (v2.1.1)95 using the same parameters as for the diploid data. Peaks of each assay from the paternal and maternal phases were then filtered against the corresponding diploid consensus peaks. Phased peak density was calculated with bedtools coverage (v2.25.0)96 from the filtered phased peaks and the phased BAM files. PSHMs and PSCTCFs were identified from the peak density in both parental phases in R (v4.1.1): after peaks with low density (read coverage < 10) were removed, a binomial test was performed with thresholds of FDR < 0.05 and |log2FC| > 1.
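The Methods state that phased peaks were filtered against the diploid consensus peaks without naming the rule. Assuming it means keeping phased peaks that overlap at least one consensus peak (as bedtools intersect -u would), the step can be sketched as below. Coordinates are BED-style half-open, and the lookup relies on the consensus peaks being non-overlapping, as bedtools merge output is. Function names are hypothetical.

```python
import bisect
from collections import defaultdict
from typing import Dict, List, Tuple

Interval = Tuple[str, int, int]  # BED-style (chrom, start, end), half-open


def build_index(consensus: List[Interval]) -> Dict[str, Tuple[List[int], List[int]]]:
    """Sort merged (non-overlapping) consensus peaks per chromosome."""
    by_chrom: Dict[str, List[Tuple[int, int]]] = defaultdict(list)
    for chrom, start, end in consensus:
        by_chrom[chrom].append((start, end))
    index = {}
    for chrom, ivs in by_chrom.items():
        ivs.sort()
        index[chrom] = ([s for s, _ in ivs], [e for _, e in ivs])
    return index


def filter_phased_peaks(phased: List[Interval], consensus: List[Interval]) -> List[Interval]:
    """Keep phased peaks that overlap at least one diploid consensus peak."""
    index = build_index(consensus)
    kept = []
    for chrom, start, end in phased:
        if chrom not in index:
            continue
        starts, ends = index[chrom]
        # Last consensus peak starting before this peak ends. Because merged peaks
        # do not overlap, it also has the largest end among all such candidates.
        i = bisect.bisect_left(starts, end) - 1
        if i >= 0 and ends[i] > start:
            kept.append((chrom, start, end))
    return kept
```

A peak that only touches a consensus peak end-to-end (for example 200-250 against 100-200) is not kept, matching half-open overlap semantics.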
Hi-C map construction and 3D chromatin structure identification
Paired-end Hi-C reads from different libraries were mapped separately to the Sus scrofa 11.1 reference genome with the Juicer (v1.6)98 pipeline, using the parameters “-s MboI -S early” and default settings otherwise. The “merged_nodups.txt” files, which record all mapped chimeric read pairs, were filtered for MAPQ > 30 and then merged with “mega.sh” (included with Juicer). Hi-C matrices were generated as .hic files with Juicer Tools (v1.22.01) at resolutions of 100 kb, 25 kb, 10 kb, 5 kb, and 1 kb. A/B compartments were identified with HOMER (v4.11)99 at a resolution of 100 kb and classified using gene density and GC content: bins with positive PC1 values were assigned to compartment A and bins with negative values to compartment B. TADs were called with hicFindTADs from HiCExplorer (v3.7.2)100 with default parameters at a resolution of 20 kb. Loops were identified with Mustache (v1.0.1)101 using “--resolution 1000” and the optional parameters “--pThreshold 0.05 --normalization weight”. Only loops supported by ≥ 3 read pairs were retained.
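Because the sign of a principal component is arbitrary, PC1 has to be oriented before its sign can be read as A or B. A minimal sketch, assuming per-bin PC1 values and GC content are already available as equal-length lists: flip PC1 when it correlates negatively with GC content, then call positive bins A and negative bins B, leaving bins with PC1 exactly 0 unassigned. Gene density could be used in the same way; this illustrates the orientation idea and is not HOMER's internal procedure. Function names are hypothetical.

```python
import math
from typing import List, Optional


def pearson(x: List[float], y: List[float]) -> float:
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        raise ValueError("correlation undefined for a constant input")
    return cov / (sx * sy)


def assign_compartments(pc1: List[float], gc: List[float]) -> List[Optional[str]]:
    """Orient PC1 to correlate positively with GC content, then call A/B by sign."""
    if pearson(pc1, gc) < 0:
        pc1 = [-v for v in pc1]  # GC-rich bins should end up with positive PC1
    return ["A" if v > 0 else "B" if v < 0 else None for v in pc1]
```

Orienting per chromosome (or per chromosome arm) is common practice, since the eigenvector is computed separately for each.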
Phased Hi-C map and phase-specific 3D chromatin structure identification
For each sample, the “merged_nodups.txt” file from the diploid Hi-C map was processed with the diploid.sh script (included with Juicer) and the VCF file of phased SNPs to generate paternal.hic and maternal.hic. Compartments in the phased Hi-C maps were identified with the same method as for the diploid Hi-C maps. Any 100 kb bin assigned to different compartment types in the paternal and maternal phases was classified as a phase-specific compartment. PSTAD boundaries were identified by comparing the parental Hi-C maps with TADCompare102. Phase-specific loops were identified between paternal.hic and maternal.hic with Mustache (v1.0.1)101 using the optional parameters “-pt 0.05 -pt2 0.05 -r 5000 -st 0.88”.
Full-length transcripts dataset processing and DET identification
Iso-Seq data were processed with IsoSeq3 (v3.1.0) from PacBio Bioconda (https://github.com/PacificBiosciences/pbbioconda). For each BF and LD sample, the primary processing comprised consensus sequence generation, primer removal and demultiplexing, and refinement to obtain full-length non-concatemer reads. A FOFN (file of file names) was then created to merge the BF and LD samples, generating the clustered.bam file, which was processed with IsoSeq3 cluster and polish to create high-quality, full-length transcript sequences. These transcripts were mapped to the Sus scrofa 11.1 reference genome with GMAP (v2019-06-10)103. The high-quality (HQ) transcripts were then processed with Cupcake (v29.0.0; https://github.com/Magdoll/cDNA_Cupcake) to collapse redundant isoforms and remove 5′-end degraded transcripts, yielding the final transcript panels for BF and LD. The phased RNA-seq data of each sample were mapped to the transcript panel with kallisto (v0.48.0)104, and the results were converted into expression matrices (counts and TPM). DETs were identified with a binomial test between phases in R after removing transcripts with fewer than 100 counts, using thresholds of P-value < 5e-8 and |log2FC| > 1.
CTCF motif identification
Sequences corresponding to the peaks of interest were extracted from the reference genome. The CTCF motif was obtained from the JASPAR 2022 database (http://jaspar2022.genereg.net/matrix/MA0139.1). The sequences were scanned for this known motif with the FIMO online tool (https://meme-suite.org/meme/tools/fimo), and the top three matches for each peak were selected to determine the number and orientation of CTCF motifs.
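Selecting the top three FIMO matches per peak and counting motif orientations can be sketched with the standard library. Assumptions: the input is FIMO's tab-separated output as written by MEME Suite 5 (a header row with columns including sequence_name, strand, and p-value, plus trailing “#” comment lines), and matches are ranked by ascending p-value, since the text does not say which ranking was used. The function name is hypothetical.

```python
import csv
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def top_motif_hits(fimo_lines: Iterable[str], top_n: int = 3) -> Dict[str, Dict[str, int]]:
    """Summarize the best FIMO matches per peak as counts of '+' and '-' strand motifs."""
    # Drop blank lines and the '#' comment lines FIMO appends after the table.
    rows = [line for line in fimo_lines if line.strip() and not line.startswith("#")]
    hits: Dict[str, List[Tuple[float, str]]] = defaultdict(list)
    for rec in csv.DictReader(rows, delimiter="\t"):
        hits[rec["sequence_name"]].append((float(rec["p-value"]), rec["strand"]))
    summary = {}
    for peak, recs in hits.items():
        best = sorted(recs)[:top_n]  # lowest p-values first
        summary[peak] = {
            "n": len(best),
            "+": sum(1 for _, s in best if s == "+"),
            "-": sum(1 for _, s in best if s == "-"),
        }
    return summary
```

Passing an open file handle works as well as a list of lines, since the function only iterates over its input.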
Multi-omics annotation of POE on PSEs
In each sample, the promoters or gene bodies of PSEs were used as target regions to examine their colocalization with the different phase-specific epigenetic signals. PSMRs, PSHMs, and PSCTCFs were directly intersected with the promoters and gene bodies of PSEs to establish potential associations. PSEs potentially associated with PSTAD boundaries were identified as those located within a 200 kb region surrounding a PSTAD boundary. PSLoop-associated PSEs were identified as those with one of the loop anchors located within the PSE promoter. PSEs regulated by PSMRs were identified through the colocalization of their promoters with PSMRs. Similarly, PSEs regulated by PSHMs were identified either by direct colocalization with their promoters or through looping interactions between the promoter and distal PSHMs.
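The window and anchor rules above can be sketched as interval tests. Assumptions, since the text leaves them open: the “200 kb region” is read as a window centred on each PSTAD boundary (±100 kb, adjustable via half_window), a PSE counts as near a boundary if its gene body overlaps that window, an anchor counts as “within” a promoter if the two intervals overlap, and promoter intervals are supplied by the caller because the promoter definition is not given here. Names are hypothetical.

```python
from typing import Dict, List, Tuple

Interval = Tuple[str, int, int]  # (chrom, start, end), half-open


def overlaps(a: Interval, b: Interval) -> bool:
    """True if two half-open intervals on the same chromosome share at least 1 bp."""
    return a[0] == b[0] and a[1] < b[2] and b[1] < a[2]


def pses_near_boundaries(gene_bodies: Dict[str, Interval],
                         boundaries: List[Tuple[str, int]],
                         half_window: int = 100_000) -> List[str]:
    """PSEs whose gene body overlaps a window centred on any PSTAD boundary."""
    windows = [(chrom, max(0, pos - half_window), pos + half_window)
               for chrom, pos in boundaries]
    return [gene for gene, body in gene_bodies.items()
            if any(overlaps(body, w) for w in windows)]


def pses_with_promoter_anchor(promoters: Dict[str, Interval],
                              loops: List[Tuple[Interval, Interval]]) -> List[str]:
    """PSEs with at least one PSLoop anchor overlapping their promoter."""
    return [gene for gene, prom in promoters.items()
            if any(overlaps(prom, a1) or overlaps(prom, a2) for a1, a2 in loops)]
```

The all-pairs comparison is fine for a few thousand PSEs; for genome-wide sets an interval index (or bedtools window and intersect) scales better.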
Multi-Omics visualization
All multi-omics tracks were visualized with pyGenomeTracks (v3.7)105. Methylation levels are shown as fractions ranging from 0 to 1.0. Gene expression, histone modification, and CTCF signals were converted into RPKM-normalized bigWig (bw) files with deepTools (v3.5.1)106 using the --normalizeUsing RPKM option before visualization. Hi-C maps were converted from interaction matrices (.hic files) into normalized .cool files for visualization.
Statistics and reproducibility
In this study, Pearson correlation analyses with two-sided t-tests were performed with the cor.test function in R (v4.1.1). For both PSE and PSHM, p-value < 0.05 and |log2FC| > 1 were considered indicative of statistical significance. In the experimental validation section, two-sided t-tests were used to assess the significance of differences between the control and experimental groups. The exact sample sizes are indicated in the legends of the box plots to ensure transparency and reproducibility.
Reporting summary
Further information on research design is available in the Nature Portfolio Reporting Summary linked to this article.
Data availability
The raw sequence data reported in this paper have been deposited in the Genome Sequence Archive107 in the National Genomics Data Center108, China National Center for Bioinformation / Beijing Institute of Genomics, Chinese Academy of Sciences, under accession code CRA026879. Source data are provided with this paper.
Code availability
All code used in this study is available in the GitHub repository at https://github.com/ChenyuLi0627/Mechanism-of-Parent-of-origin-Effects-revealed-by-Multi-omic-data-in-Euro-Chinese-Hybrid-Pigs and on Zenodo at https://doi.org/10.5281/zenodo.15708739. Publicly available software used in this study is listed in the Methods.
References
Barlow, D. P. & Bartolomei, M. S. Genomic imprinting in mammals. Cold Spring Harb. Perspect. Biol. 6, a018382 (2014).
Toda, E., Ohnishi, Y. & Okamoto, T. An imbalanced parental genome ratio affects the development of rice zygotes. J. Exp. Bot. 69, 2609–2619 (2018).
Pliota, P. et al. Selfish conflict underlies RNA-mediated parent-of-origin effects. Nature 628, 122–129 (2024).
Falls, J. G., Pulford, D. J., Wylie, A. A. & Jirtle, R. L. Genomic imprinting: implications for human disease. Am. J. Pathol. 154, 635–647 (1999).
Lee, S. & Wevrick, R. Identification of novel imprinted transcripts in the Prader-Willi syndrome and Angelman syndrome deletion region: further evidence for regional imprinting control. Am. J. Hum. Genet. 66, 848–858 (2000).
Mabb, A. M., Judson, M. C., Zylka, M. J. & Philpot, B. D. Angelman syndrome: insights into genomic imprinting and neurodevelopmental phenotypes. Trends Neurosci. 34, 293–303 (2011).
Peeters, K., Eppink, T., Ellen, E., Visscher, J. & Bijma, P. Indirect genetic effects for survival in domestic chickens (Gallus gallus) are magnified in crossbred genotypes and show a parent-of-origin effect. Genetics 192, 705–713 (2012).
Sakamoto, Y. et al. Phasing analysis of lung cancer genomes using a long read sequencer. Nat. Commun. 13, 3464 (2022).
Han, Z. et al. Diploid genome architecture revealed by multi-omic data of hybrid mice. Genome Res. 30, 1097–1106 (2020).
Zhang, X., Wu, R., Wang, Y., Yu, J. & Tang, H. Unzipping haplotypes in diploid and polyploid genomes. Comput. Struct. Biotechnol. J. 18, 66–72 (2020).
Nezer, C. et al. An imprinted QTL with major effect on muscle mass and fat deposition maps to the IGF2 locus in pigs. Nat. Genet. 21, 155–156 (1999).
Lawson, H. A., Cheverud, J. M. & Wolf, J. B. Genomic imprinting and parent-of-origin effects on complex traits. Nat. Rev. Genet. 14, 609–617 (2013).
Neugebauer, N., Luther, H. & Reinsch, N. Parent-of-origin effects cause genetic variation in pig performance traits. Animal 4, 672–681 (2010).
Quan, J. et al. Multi-omic characterization of allele-specific regulatory variation in hybrid pigs. Nat. Commun. 15, 5587 (2024).
MacPhillamy, C. et al. DNA methylation analysis to differentiate reference, breed, and parent-of-origin effects in the bovine pangenome era. GigaScience 13, giae061 (2024).
Waters, A. J. et al. Parent-of-origin effects on gene expression and DNA methylation in the maize endosperm. Plant Cell 23, 4221–4233 (2011).
Hofmeister, R. J. et al. Parent-of-Origin inference for biobanks. Nat. Commun. 13, 6668 (2022).
Guo, Z. et al. Global epigenomic analysis indicates that epialleles contribute to allele-specific expression via allele-specific histone modifications in hybrid rice. BMC Genomics 16, 1–9 (2015).
Richer, S. et al. Widespread allele-specific topological domains in the human genome are not confined to imprinted gene clusters. Genome Biol. 24, 40 (2023).
Ye, J. et al. Transition of allele-specific DNA hydroxymethylation at regulatory loci is associated with phenotypic variation in monozygotic twins discordant for psychiatric disorders. BMC Med. 21, 491 (2023).
Schachtschneider, K. M. et al. Adult porcine genome-wide DNA methylation patterns support pigs as a biomedical model. BMC Genomics 16, 1–18 (2015).
Kern, C. et al. Functional annotations of three domestic animal genomes provide vital resources for comparative and agricultural research. Nat. Commun. 12, 1821 (2021).
ENCODE Project Consortium. An integrated encyclopedia of DNA elements in the human genome. Nature 489, 57 (2012).
Mikkelsen, T. S. et al. Comparative epigenomic analysis of murine and human adipogenesis. Cell 143, 156–169 (2010).
Yue, F. et al. A comparative encyclopedia of DNA elements in the mouse genome. Nature 515, 355–364 (2014).
Lee, J.-E. et al. H3K4 mono- and di-methyltransferase MLL4 is required for enhancer activation during cell differentiation. eLife 2, e01503 (2013).
Kang, S. et al. Identification of nuclear hormone receptor pathways causing insulin resistance by transcriptional and epigenomic analysis. Nat. Cell Biol. 17, 44–56 (2015).
Joshi, S. et al. TEAD transcription factors are required for normal primary myoblast differentiation in vitro and muscle regeneration in vivo. PLoS Genet. 13, e1006600 (2017).
Noordermeer, D. et al. The dynamic architecture of Hox gene clusters. Science 334, 222–225 (2011).
Mousavi, K. et al. eRNAs promote transcription by establishing chromatin accessibility at defined genomic loci. Mol. Cell 51, 606–617 (2013).
Villar, D. et al. Enhancer evolution across 20 mammalian species. Cell 160, 554–566 (2015).
Pan, Z. et al. Pig genome functional annotation enhances the biological interpretation of complex traits and human disease. Nat. Commun. 12, 5848 (2021).
Rao, S. S. et al. A 3D map of the human genome at kilobase resolution reveals principles of chromatin looping. Cell 159, 1665–1680 (2014).
Pollard, M. O., Gurdasani, D., Mentzer, A. J., Porter, T. & Sandhu, M. S. Long reads: their purpose and place. Hum. Mol. Genet. 27, R234–R241 (2018).
Patterson, M. et al. WhatsHap: weighted haplotype assembly for future-generation sequencing reads. J. Comput. Biol. 22, 498–509 (2015).
Ebert, P. et al. Haplotype-resolved diverse human genomes and integrated analysis of structural variation. Science 372, eabf7117 (2021).
Liao, W.-W. et al. A draft human pangenome reference. Nature 617, 312–324 (2023).
Low, W. Y. et al. Haplotype-resolved genomes provide insights into structural variation and gene content in Angus and Brahman cattle. Nat. Commun. 11, 2071 (2020).
McDaniell, R. et al. Heritable individual-specific and allele-specific chromatin signatures in humans. Science 328, 235–239 (2010).
Cresswell, K. G. & Dozmorov, M. G. TADCompare: an R package for differential and temporal analysis of topologically associated domains. Front. Genet. 11, 158 (2020).
Wu, J. et al. PHA-4/FoxA senses nucleolar stress to regulate lipid accumulation in Caenorhabditis elegans. Nat. Commun. 9, 1195 (2018).
Dutta, P., Maiti, I., Ghose, A., Chauhan, R. & Maiti, S. The actin-associated protein Kaptin modulates F-actin barbed-end dynamics. bioRxiv 2023.10.30.564586 (2023).
Prendergast, J. G., Tong, P., Hay, D. C., Farrington, S. M. & Semple, C. A. A genome-wide screen in human embryonic stem cells reveals novel sites of allele-specific histone modification associated with known disease loci. Epigenetics Chromatin 5, 1–13 (2012).
Mao, M. et al. Multifaced roles of PLAC8 in cancer. Biomark. Res. 9, 1–10 (2021).
Punga, A. R., Maj, M., Lin, S., Meinen, S. & Rüegg, M. A. MuSK levels differ between adult skeletal muscles and influence postsynaptic plasticity. Eur. J. Neurosci. 33, 890–898 (2011).
Clarke, R. A. et al. Immp2l Enhances the Structure and Function of Mitochondrial Gpd2 Dehydrogenase. Int. J. Mol. Sci. 25, 990 (2024).
Henckel, A. et al. Histone methylation is mechanistically linked to DNA methylation at imprinting control regions in mammals. Hum. Mol. Genet. 18, 3375–3383 (2009).
Kim, J. Multiple YY1 and CTCF binding sites in imprinting control regions. Epigenetics 3, 115–118 (2008).
Yang, H. et al. Allele-specific H3K9me3 and DNA methylation co-marked CpG-rich regions serve as potential imprinting control regions in pre-implantation embryo. Nat. Cell Biol. 24, 783–792 (2022).
Onuchic, V. et al. Allele-specific epigenome maps reveal sequence-dependent stochastic switching at regulatory loci. Science 361, eaar3146 (2018).
King, A. D. et al. Reversible regulation of promoter and enhancer histone landscape by DNA methylation in mouse embryonic stem cells. Cell Rep. 17, 289–302 (2016).
Hashimoto, H. et al. Structural basis for the versatile and methylation-dependent binding of CTCF to DNA. Mol. Cell 66, 711–720.e3 (2017).
Bell, A. C. & Felsenfeld, G. Methylation of a CTCF-dependent boundary controls imprinted expression of the Igf2 gene. Nature 405, 482–485 (2000).
Hark, A. T. et al. CTCF mediates methylation-sensitive enhancer-blocking activity at the H19/Igf2 locus. Nature 405, 486–489 (2000).
Ahn, J. et al. The landscape of genomic imprinting at the porcine SGCE/PEG10 locus from methylome and transcriptome of parthenogenetic embryos. G3: Genes, Genomes, Genetics 10, 4037–4047 (2020).
Nora, E. P. et al. Molecular basis of CTCF binding polarity in genome folding. Nat. Commun. 11, 5612 (2020).
Gigante, S. et al. Using long-read sequencing to detect imprinted DNA methylation. Nucleic Acids Res. 47, e46 (2019).
Gautheron, J. et al. ADH1B, the adipocyte-enriched alcohol dehydrogenase, plays an essential, cell-autonomous role in human adipogenesis. Proc. Natl Acad. Sci. USA 121, e2319301121 (2024).
Kumar, H. et al. Transcriptome analysis to identify long non coding RNA (lncRNA) and characterize their functional role in back fat tissue of pig. Gene 703, 71–82 (2019).
Matskova, L., Zheng, S., Kashuba, E., Ernberg, I. & Aspenström, P. MTSS1: beyond the integration of actin and membrane dynamics. Cell. Mol. Life Sci. 81, 1–11 (2024).
Li, E., Beard, C. & Jaenisch, R. Role for DNA methylation in genomic imprinting. Nature 366, 362–365 (1993).
Ma, X. et al. Parental variation in CHG methylation is associated with allelic-specific expression in elite hybrid rice. Plant Physiol. 186, 1025–1041 (2021).
Gimelbrant, A., Hutchinson, J. N., Thompson, B. R. & Chess, A. Widespread monoallelic expression on human autosomes. Science 318, 1136–1140 (2007).
Ouyang, W. et al. Haplotype mapping of H3K27me3-associated chromatin interactions defines topological regulation of gene silencing in rice. Cell Rep. 42, 112350 (2023).
Raas, M. W., Zijlmans, D. W., Vermeulen, M. & Marks, H. There is another: H3K27me3-mediated genomic imprinting. Trends Genet. 38, 82–96 (2022).
Zhao, J. et al. Genome-wide identification of polycomb-associated RNAs by RIP-seq. Mol. Cell 40, 939–953 (2010).
Zhang, H. et al. Long noncoding RNA-mediated intrachromosomal interactions promote imprinting at the Kcnq1 locus. J. Cell Biol. 204, 61–75 (2014).
Monteagudo-Sánchez, A., Noordermeer, D. & Greenberg, M. V. The impact of DNA methylation on CTCF-mediated 3D genome organization. Nat. Struct. Mol. Biol. 31, 404–412 (2024).
Wang, H. et al. Widespread plasticity in CTCF occupancy linked to DNA methylation. Genome Res. 22, 1680–1688 (2012).
Zuo, Z., Roy, B., Chang, Y. K., Granas, D. & Stormo, G. D. Measuring quantitative effects of methylation on transcription factor–DNA binding affinity. Sci. Adv. 3, eaao1799 (2017).
Murrell, A., Heeson, S. & Reik, W. Interaction between differentially methylated regions partitions the imprinted genes Igf2 and H19 into parent-specific chromatin loops. Nat. Genet. 36, 889–893 (2004).
Braunschweig, M. H., Owczarek-Lipska, M. & Stahlberger-Saitbekova, N. Relationship of porcine IGF2 imprinting status to DNA methylation at the H19 DMD and the IGF2 DMRs 1 and 2. BMC Genet. 12, 1–10 (2011).
Lin, Y. et al. Haplotype-resolved 3D chromatin architecture of the hybrid pig. Genome Res. 34, 310–325 (2024).
Marshall, T., Slate, J., Kruuk, L. & Pemberton, J. Statistical confidence for likelihood-based paternity inference in natural populations. Mol. Ecol. 7, 639–655 (1998).
Alexander, D. H. & Lange, K. Enhancements to the ADMIXTURE algorithm for individual ancestry estimation. BMC Bioinforma. 12, 246 (2011).
Purcell, S. et al. PLINK: a tool set for whole-genome association and population-based linkage analyses. Am. J. Hum. Genet. 81, 559–575 (2007).
Liu, W. et al. Efficient delivery of a large-size Cas9-EGFP vector in porcine fetal fibroblasts using a Lonza 4D-Nucleofector system. BMC Biotechnol. 23, 29 (2023).
Kang, S. W., You, S., Wong, E. A. & Halawani, M. E. E. Normalization of transfection efficiency using the β-lactamase gene of the pGL3 luciferase vector in primary anterior pituitary cells. BioTechniques 33, 326–330 (2002).
Xu, Y. Z., Kanagaratham, C., Jancik, S. & Radzioch, D. Promoter deletion analysis using a dual-luciferase reporter system. Gene Regulation: Methods and Protocols, 79–93 (2013).
Hagege, H. et al. Quantitative analysis of chromosome conformation capture assays (3C-qPCR). Nat. Protoc. 2, 1722–1733 (2007).
Liu, Y. et al. Bisulfite-free direct detection of 5-methylcytosine and 5-hydroxymethylcytosine at base resolution. Nat. Biotechnol. 37, 424–429 (2019).
Kaya-Okur, H. S. et al. CUT&Tag for efficient epigenomic profiling of small samples and single cells. Nat. Commun. 10, 1930 (2019).
Chen, S., Zhou, Y., Chen, Y. & Gu, J. fastp: an ultra-fast all-in-one FASTQ preprocessor. Bioinformatics 34, i884–i890 (2018).
Li, H. Aligning sequence reads, clone sequences and assembly contigs with BWA-MEM. arXiv preprint arXiv:1303.3997 (2013).
Li, H. et al. The sequence alignment/map format and SAMtools. Bioinformatics 25, 2078–2079 (2009).
Li, H. Minimap2: pairwise alignment for nucleotide sequences. Bioinformatics 34, 3094–3100 (2018).
McKenna, A. et al. The Genome Analysis Toolkit: a MapReduce framework for analyzing next-generation DNA sequencing data. Genome Res. 20, 1297–1303 (2010).
Thorvaldsdóttir, H., Robinson, J. T. & Mesirov, J. P. Integrative Genomics Viewer (IGV): high-performance genomics data visualization and exploration. Brief. Bioinforma. 14, 178–192 (2013).
Kokot, M., Długosz, M. & Deorowicz, S. KMC 3: counting and manipulating k-mer statistics. Bioinformatics 33, 2759–2761 (2017).
Kim, D., Paggi, J. M., Park, C., Bennett, C. & Salzberg, S. L. Graph-based genome alignment and genotyping with HISAT2 and HISAT-genotype. Nat. Biotechnol. 37, 907–915 (2019).
Pertea, M. et al. StringTie enables improved reconstruction of a transcriptome from RNA-seq reads. Nat. Biotechnol. 33, 290–295 (2015).
Pertea, M., Kim, D., Pertea, G. M., Leek, J. T. & Salzberg, S. L. Transcript-level expression analysis of RNA-seq experiments with HISAT, StringTie and Ballgown. Nat. Protoc. 11, 1650–1667 (2016).
Tarasov, A., Vilella, A. J., Cuppen, E., Nijman, I. J. & Prins, P. Sambamba: fast processing of NGS alignment formats. Bioinformatics 31, 2032–2034 (2015).
Schröder, C. et al. Regions of common inter-individual DNA methylation differences in human monocytes: genetic basis and potential function. Epigenetics Chromatin 10, 1–18 (2017).
Feng, J., Liu, T., Qin, B., Zhang, Y. & Liu, X. S. Identifying ChIP-seq enrichment using MACS. Nat. Protoc. 7, 1728–1740 (2012).
Quinlan, A. R. & Hall, I. M. BEDTools: a flexible suite of utilities for comparing genomic features. Bioinformatics 26, 841–842 (2010).
Nord, A. S. et al. Rapid and pervasive changes in genome-wide enhancer usage during mammalian development. Cell 155, 1521–1531 (2013).
Durand, N. C. et al. Juicer provides a one-click system for analyzing loop-resolution Hi-C experiments. Cell Syst. 3, 95–98 (2016).
Heinz, S. et al. Simple combinations of lineage-determining transcription factors prime cis-regulatory elements required for macrophage and B cell identities. Mol. Cell 38, 576–589 (2010).
Wolff, J. et al. Galaxy HiCExplorer: a web server for reproducible Hi-C data analysis, quality control and visualization. Nucleic Acids Res. 46, W11–W16 (2018).
Roayaei Ardakany, A., Gezer, H. T., Lonardi, S. & Ay, F. Mustache: multi-scale detection of chromatin loops from Hi-C and Micro-C maps using scale-space representation. Genome Biol. 21, 1–17 (2020).
Cresswell, K. G. & Dozmorov, M. G. TADCompare: an R package for differential and temporal analysis of topologically associated domains. Front. Genet. 11, 496697 (2020).
Wu, T. D. & Watanabe, C. K. GMAP: a genomic mapping and alignment program for mRNA and EST sequences. Bioinformatics 21, 1859–1875 (2005).
Bray, N. L., Pimentel, H., Melsted, P. & Pachter, L. Near-optimal probabilistic RNA-seq quantification. Nat. Biotechnol. 34, 525–527 (2016).
Lopez-Delisle, L. et al. pyGenomeTracks: reproducible plots for multivariate genomic datasets. Bioinformatics 37, 422–423 (2021).
Ramírez, F. et al. deepTools2: a next generation web server for deep-sequencing data analysis. Nucleic Acids Res. 44, W160 (2016).
Chen, T. et al. The genome sequence archive family: toward explosive data growth and diverse data types. Genomics, Proteom. Bioinforma. 19, 578–583 (2021).
Database Resources of the National Genomics Data Center, China National Center for Bioinformation in 2025. Nucleic Acids Res. 53, D30–D44 (2025).
Acknowledgements
We are grateful to the members of the authors’ laboratories for their valuable contributions and insightful discussions. This work was supported by the National Key R&D Program of China (2020YFA0509500) and the Major Science and Technology Research and Development Projects of the Jiangxi Provincial Department of Science and Technology (20232ACF01002).
Author information
Contributions
L.H. and M.L. conceived and designed the research. Z.Z. and K.L. supervised and guided the project. C.L. and M.G. collected pig samples with the help of Z.H. C.L. and M.G. conducted all bioinformatics analyses and wrote the manuscript. L.H. and Z.Z. reviewed and edited the manuscript.
Ethics declarations
Competing interests
The authors declare no competing interests.
Peer review
Peer review information
Nature Communications thanks Lingzhao Fang, and the other, anonymous, reviewer(s) for their contribution to the peer review of this work. A peer review file is available.
Additional information
Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License, which permits any non-commercial use, sharing, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if you modified the licensed material. You do not have permission under this licence to share adapted material derived from this article or parts of it. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by-nc-nd/4.0/.
About this article
Cite this article
Li, C., Ge, M., Long, K. et al. Mechanism of parent-of-origin effects revealed by multi-omic data in Euro-Chinese hybrid pigs. Nat. Commun. 16, 7542 (2025). https://doi.org/10.1038/s41467-025-62243-6
DOI: https://doi.org/10.1038/s41467-025-62243-6