ABO genotype alters the gut microbiota by regulating GalNAc levels in pigs

Yang, Hui; Wu, Jinyuan; Huang, Xiaochang; Zhou, Yunyan; Zhang, Yifeng; Liu, Min; Liu, Qin; Ke, Shanlin; He, Maozhang; Fu, Hao; Fang, Shaoming; Xiong, Xinwei; Jiang, Hui; Chen, Zhe; Wu, Zhongzi; Gong, Huanfa; Tong, Xinkai; Huang, Yizhong; Ma, Junwu; Gao, Jun; Charlier, Carole; Coppieters, Wouter; Shagam, Lev; Zhang, Zhiyan; Ai, Huashui; Yang, Bin; Georges, Michel; Chen, Congying; Huang, Lusheng

doi:10.1038/s41586-022-04769-z

Article
Published: 27 April 2022

Hui Yang¹^na1,
Jinyuan Wu¹^na1,
Xiaochang Huang¹,
Yunyan Zhou¹,
Yifeng Zhang¹,
Min Liu¹,
Qin Liu¹,
Shanlin Ke¹,
Maozhang He¹,
Hao Fu¹,
Shaoming Fang¹,
Xinwei Xiong¹,
Hui Jiang¹,
Zhe Chen¹,
Zhongzi Wu¹,
Huanfa Gong¹,
Xinkai Tong¹,
Yizhong Huang¹,
Junwu Ma¹,
Jun Gao¹,
Carole Charlier ORCID: orcid.org/0000-0002-9694-094X^1,2,
Wouter Coppieters²,
Lev Shagam ORCID: orcid.org/0000-0003-1325-3085²,
Zhiyan Zhang¹,
Huashui Ai ORCID: orcid.org/0000-0002-2859-8855¹,
Bin Yang¹,
Michel Georges ORCID: orcid.org/0000-0003-4124-2375^1,2^na2,
Congying Chen ORCID: orcid.org/0000-0001-7112-448X¹^na2 &
Lusheng Huang ORCID: orcid.org/0000-0002-6940-667X¹^na2

Nature volume 606, pages 358–367 (2022)

Abstract

The composition of the intestinal microbiome varies considerably between individuals and is correlated with health¹. Understanding the extent to which, and how, host genetics contributes to this variation is essential yet has proved to be difficult, as few associations have been replicated, particularly in humans². Here we study the effect of host genotype on the composition of the intestinal microbiota in a large mosaic pig population. We show that, under conditions of exacerbated genetic diversity and environmental uniformity, microbiota composition and the abundance of specific taxa are heritable. We map a quantitative trait locus affecting the abundance of Erysipelotrichaceae species and show that it is caused by a 2.3 kb deletion in the gene encoding N-acetyl-galactosaminyl-transferase that underpins the ABO blood group in humans. We show that this deletion is a ≥3.5-million-year-old trans-species polymorphism under balancing selection. We demonstrate that it decreases the concentrations of N-acetyl-galactosamine in the gut, and thereby reduces the abundance of Erysipelotrichaceae that can import and catabolize N-acetyl-galactosamine. Our results provide very strong evidence for an effect of the host genotype on the abundance of specific bacteria in the intestine combined with insights into the molecular mechanisms that underpin this association. Our data pave the way towards identifying the same effect in rural human populations.

**Fig. 1: Intestinal microbiota of the healthy pig.**

**Fig. 2: Heritability of microbiota composition in mosaic pigs.**

**Fig. 3: A miQTL affecting Erysipelotrichaceae species.**

**Fig. 4: A 3.5-million-year-old deletion in the pig *ABO* orthologue causes the miQTL.**

**Fig. 5: The miQTL acts by decreasing GalNAc concentrations and affects GalNAc-using bacteria.**

**Fig. 6: The GalNAc operon organization and transcriptome response of miQTL-responsive bacteria.**

Data availability

All the 16S rRNA sequencing data, the metagenomics sequence data and the RNA-seq data were submitted to the GSA database under accession numbers CRA006230, CRA006239, CRA006240 and CRA006216. The genotype data were deposited at the GVM (http://bigd.big.ac.cn/gvm/getProjectDetail?project=GVM000310) in the GSA database under accession number GVM000310. The GWAS summary statistics are available at Figshare (https://doi.org/10.6084/m9.figshare.19313960). The whole-genome sequencing data of experimental pigs have been deposited in the GSA database (https://ngdc.cncb.ac.cn/gsa/browse/CRA006383) under accession number CRA006383. The source data are available at GitHub (https://github.com/yanghuijxau/Manuscript-microbiota-ABO).

Code availability

Code to replicate the findings and the source data are available at GitHub (https://github.com/yanghuijxau/Manuscript-microbiota-ABO).

References

Kundu, P., Blacher, E., Elinav, E. & Pettersson, S. Our gut microbiome: the evolving inner self. Cell 171, 1481–1493 (2017).
Rothschild, D. et al. Environment dominates over host genetics in shaping human gut microbiota. Nature 555, 210–215 (2018).
O’Hara, E., Neves, A. L. A., Song, Y. & Guan, L. L. The role of the gut microbiome in cattle production and health: driver or passenger? Annu. Rev. Anim. Biosci. 8, 199–220 (2020).
Schmidt, T. S. B., Raes, J. & Bork, P. The human gut microbiome: from association to modulation. Cell 172, 1198–1215 (2018).
Polderman, T. J. C. et al. Meta-analysis of the heritability of human traits based on 50 years of twin studies. Nat. Genet. 47, 702–709 (2015).
Polubriaginof, F. C. G. et al. Disease heritability inferred from familial relationships reported in medical records. Cell 173, 1692–1704 (2018).
Benson, A. K. et al. Individuality in gut microbiota composition is a complex polygenic trait shaped by multiple environmental and host genetic factors. Proc. Natl Acad. Sci. USA 107, 18933–18938 (2010).
Yatsunenko, T. et al. Human gut microbiome viewed across age and geography. Nature 486, 222–227 (2012).
Goodrich, J. K. et al. Human genetics shape the gut microbiome. Cell 159, 789–799 (2014).
Blekhman, R. et al. Host genetic variation impacts microbiome composition across human body sites. Genome Biol. 16, 191 (2015).
Turpin, W. et al. Association of host genome with intestinal microbial composition in a large healthy cohort. Nat. Genet. 48, 1413–1417 (2016).
Bonder, M. J. et al. The effect of host genetics on the gut microbiome. Nat. Genet. 48, 1407–1412 (2016).
Wang, J. et al. Genome-wide association analysis identifies variation in vitamin D receptor and other host factors influencing the gut microbiota. Nat. Genet. 48, 1396–1406 (2016).
Hughes, D. A. et al. Genome-wide associations of human gut microbiome variation and implications for causal inference analyses. Nat. Microbiol. 5, 1079–1087 (2020).
Sankararaman, S. et al. The genomic landscape of Neanderthal ancestry in present-day humans. Nature 507, 354–357 (2014).
Patterson, N. et al. Genetic evidence for complex speciation of humans and chimpanzees. Nature 441, 1103–1108 (2006).
Donaldson, G. P., Lee, S. M. & Mazmanian, S. K. Gut biogeography of the bacterial microbiota. Nat. Rev. Microbiol. 14, 20–32 (2016).
Radjabzadeh, D. et al. Diversity, compositional and functional differences between gut microbiota of children and adults. Sci. Rep. 10, 1040 (2020).
Goodrich, J. K. et al. Genetic determinants of the gut microbiome in UK twins. Cell Host Microbe 19, 731–743 (2016).
Cooling, L. Blood groups in infection and host susceptibility. Clin. Microbiol. Rev. 28, 801–870 (2015).
Rühlemann, M. C. et al. Genome-wide association study in 8,956 German individuals identifies influence of ABO histo-blood groups on gut microbiome. Nat. Genet. 53, 147–155 (2021).
Lopera-Maya, E. E. et al. Effect of host genetics on the gut microbiome in 7,738 participants of the Dutch microbiome project. Nat. Genet. 54, 143–151 (2022).
Qin, Y. et al. Combined effects of host genetics and diet on human gut microbiota and incident disease in a single population cohort. Nat. Genet. 54, 134–142 (2022).
Choi, M. K. et al. Determination of complete sequence information of the human ABO blood group orthologous gene in pigs and breed differences in blood type frequencies. Gene 640, 1–5 (2018).
Wang, S. et al. Design of glycosyl transferase inhibitors: serine analogues as pyrophosphate surrogates? ChemPlusChem 80, 1525–1532 (2015).
Ségurel, L. et al. The ABO blood group is a trans-species polymorphism in primates. Proc. Natl Acad. Sci. USA 109, 18493–18498 (2012).
Groenen, M. A. M. A decade of pig genome sequencing: window on pig domestication and evolution. Genet. Sel. Evol. 48, 23–32 (2016).
Ravcheev, D. A. & Thiele, I. Comparative genomic analysis of the human gut microbiome reveals a broad distribution of metabolic pathways for the degradation of host-synthesized mucin glycans and utilization of mucin-derived monosaccharides. Front. Genet. 8, 111 (2017).
Tailford, L. A. et al. Mucin glycan foraging in the human gut microbiome. Front. Genet. 6, 81 (2015).
Lien, K. A., Sauer, W. C. & He, J. M. Dietary influences on the secretion into and degradation of mucin in the digestive tract of monogastric animals and humans. J. Anim. Feed Sci. 10, 223–245 (2001).
Brinkkötter, A. B., Klöss, H., Alpert, C.-A. & Lengeler, J. W. Pathways for the utilization of N-acetyl-galactosamine and galactosamine in Escherichia coli. Mol. Microbiol. 37, 125–135 (2000).
Rodionov, D. A. et al. Genomic encyclopedia of sugar utilization pathways in the Shewanella genus. BMC Genom. 11, 494 (2010).
Leyn, S. A., Gao, F., Yang, C. & Rodionov, D. A. N-acetylgalactosamine utilization pathway and regulon in proteobacteria. J. Biol. Chem. 287, 28047–28056 (2012).
Hu, Z., Patel, I. R. & Mukherjee, A. Genetic analysis of the roles of agaA, agaI, and agaS genes in the N-acetyl-d-galactosamine and d-galactosamine catabolic pathways in Escherichia coli strains O157:H7 and C. BMC Microbiol. 13, 94 (2013).
Bidart, G. N., Rodriguez-Diaz, J., Monedero, V. & Yebra, M. J. A unique gene cluster for the utilization of the mucosal and human milk-associated glycans galacto-N-biose and lacto-N-biose in Lactobacillus casei. Mol. Microbiol. 93, 521–538 (2014).
Zhang, H. et al. Two novel regulators of N-acetyl-galactosamine utilization pathway and distinct roles in bacterial infections. Microbiol. Open 4, 983–1000 (2015).
Lawrence, J. Selfish operons: the evolutionary impact of gene clustering in prokaryotes and eukaryotes. Curr. Opin. Genet. Dev. 9, 642–648 (1999).
Koonin, E. V. Evolution of genome architecture. Int. J. Biochem. Cell Biol. 41, 298–306 (2009).
Lombard, V. et al. The carbohydrate-active enzymes database (CAZy) in 2013. Nucleic Acids Res. 42, D490–D495 (2014).
Rahfeld, P. et al. An enzymatic pathway in the human gut microbiome that converts A to universal O type blood. Nat. Microbiol. 4, 1475–1485 (2019).
Rahfeld, P. et al. Prospecting for microbial α-N-acetylgalactosaminidases yields a new class of GH31 O-glycanase. J. Biol. Chem. 294, 16400–16415 (2019).
Chen, Y. et al. ABO blood group and susceptibility to severe acute respiratory syndrome. JAMA 293, 1450–1451 (2005).
Ellinghaus, D. et al. The ABO blood group locus and a chromosome 3 gene cluster associate with SARS-CoV-2 respiratory failure in an Italian-Spanish genome-wide association analysis. Preprint at medRxiv https://doi.org/10.1101/2020.05.31.20114991 (2020).
Blancher, A. Evolution of the ABO supergene family. ISBT Sci. Ser. 8, 201–206 (2013).
Makivuokko, H. et al. Association between the ABO blood group and the human intestinal microbiota composition. BMC Microbiol. 12, 94 (2012).
Davenport, E. R. et al. ABO antigen and secretor statuses are not associated with gut microbiota composition in 1,500 twins. BMC Genom. 17, 941–955 (2016).
Kurilshikov, A. et al. Large-scale association analyses identify host factors influencing human gut microbiome composition. Nat. Genet. 53, 156–165 (2021).
Malmuthuge, N., Griebel, P. J. & Guan, L. L. Taxonomic identification of commensal bacteria associated with the mucosa and digesta throughout the gastrointestinal tracts of preweaned calves. Appl. Environ. Microbiol. 80, 2021–2028 (2014).
Hansen, M. E. B. et al. Population structure of human gut bacteria in a diverse cohort from rural Tanzania and Botswana. Genome Biol. 20, 16 (2019).
Warr, A. et al. An improved pig reference genome sequence to enable pig genetics and genomics research. Gigascience 9 (2019).
Li, H. & Durbin, R. Fast and accurate long-read alignment with Burrows-Wheeler transform. Bioinformatics 26, 589–595 (2010).
Li, H. et al. The Sequence Alignment/Map format and SAMtools. Bioinformatics 25, 2078–2079 (2009).
Rimmer, A. et al. Integrating mapping-, assembly- and haplotype-based approaches for calling variants in clinical sequencing applications. Nat. Genet. 46, 912–918 (2014).
Chang, C. C. et al. Second-generation PLINK: rising to the challenge of larger and richer datasets. Gigascience 4, 7 (2015).
Browning, S. R. & Browning, B. L. Rapid and accurate haplotype phasing and missing-data inference for whole-genome association studies by use of localized haplotype clustering. Am. J. Hum. Genet. 81, 1084–1097 (2007).
Coppieters, W., Karim, L. & Georges, M. SNP-based quantitative deconvolution of biological mixtures: application to the detection of cows with subclinical mastitis by whole genome sequencing of tank milk. Genome Res. 30, 1201–1207 (2020).
Bolger, A. M., Lohse, M. & Usadel, B. Trimmomatic: a flexible trimmer for Illumina sequence data. Bioinformatics 30, 2114–2120 (2014).
Magoč, T. & Salzberg, S. L. FLASH: fast length adjustment of short reads to improve genome assemblies. Bioinformatics 27, 2957–2963 (2011).
Edgar, R. C. Search and clustering orders of magnitude faster than BLAST. Bioinformatics 26, 2460–2461 (2010).
Rognes, T., Flouri, T., Nichols, B., Quince, C. & Mahe, F. VSEARCH: a versatile open source tool for metagenomics. PeerJ 4, e2584 (2016).
Wang, Q., Garrity, G. M., Tiedje, J. M. & Cole, J. R. Naive Bayesian classifier for rapid assignment of rRNA sequences into the new bacterial taxonomy. Appl. Environ. Microbiol. 73, 5261–5267 (2007).
Schloss, P. D. et al. Introducing mothur: open-source, platform-independent, community-supported software for describing and comparing microbial communities. Appl. Environ. Microbiol. 75, 7537–7541 (2009).
Cheema, M. U. & Pluznick, J. L. Gut microbiota plays a central role to modulate the plasma and fecal metabolomes in response to angiotensin II. Hypertension 74, 184–193 (2019).
Zhou, X. & Stephens, M. Genome-wide efficient mixed-model analysis for association studies. Nat. Genet. 44, 821–824 (2012).
Yang, J., Lee, S. H., Goddard, M. E. & Visscher, P. M. GCTA: a tool for genome-wide complex trait analysis. Am. J. Hum. Genet. 88, 76–82 (2011).
Dixon, P. VEGAN, a package of R functions for community ecology. J. Veg. Sci. 14, 927–930 (2003).
Visscher, P. M. et al. Assumption-free estimation of heritability from genome-wide identity-by-descent sharing between full siblings. PLoS Genet. 2, e41 (2006).
Ziyatdinov, A. et al. lme4QTL: linear mixed models with flexible covariance structure for genetic studies of related individuals. BMC Bioinform. 19, 68 (2018).
Haseman, J. K. & Elston, R. C. The investigation of linkage between a quantitative trait and a marker locus. Behav. Genet. 2, 3–19 (1972).
Aulchenko, Y. S., Ripke, S., Isaacs, A. & van Duijn, C. M. GenABEL: an R library for genome-wide association analysis. Bioinformatics 23, 1294–1296 (2007).
Willer, C. J., Li, Y. & Abecasis, G. R. METAL: fast and efficient meta-analysis of genomewide association scans. Bioinformatics 26, 2190–2191 (2010).
Kolmogorov, M., Yuan, J., Lin, Y. & Pevzner, P. A. Assembly of long, error-prone reads using repeat graphs. Nat. Biotechnol. 37, 540–546 (2019).
Vaser, R., Sovic, I., Nagarajan, N. & Sikic, M. Fast and accurate de novo genome assembly from long uncorrected reads. Genome Res. 27, 737–746 (2017).
Li, H. Aligning sequence reads, clone sequences and assembly contigs with BWA-MEM. Preprint at arXiv https://arxiv.org/abs/1303.3997 (2013).
Walker, B. J. et al. Pilon: an integrated tool for comprehensive microbial variant detection and genome assembly improvement. PLoS ONE 9, e112963 (2014).
Harris, R. S. Improved Pairwise Alignment of Genomic DNA. PhD thesis, Pennsylvania State Univ. (2007).
Li, H. Minimap2: pairwise alignment for nucleotide sequences. Bioinformatics 34, 3094–3100 (2018).
Dobin, A. et al. STAR: ultrafast universal RNA-seq aligner. Bioinformatics 29, 15–21 (2013).
Liao, Y., Smyth, G. K. & Shi, W. FeatureCounts: an efficient general purpose program for assigning sequence reads to genomic features. Bioinformatics 30, 923–930 (2014).
Ai, H. et al. Adaptation and possible ancient interspecies introgression in pigs identified by whole-genome sequencing. Nat. Genet. 47, 217–225 (2015).
Van der Auwera, G. A. et al. From FastQ data to high confidence variant calls: the genome analysis Toolkit best practices pipeline. Curr. Protoc. Bioinform. 43, 11.10.11–11.10.33 (2013).
DePristo, M. A. et al. A framework for variation discovery and genotyping using next-generation DNA sequencing data. Nat. Genet. 43, 491–498 (2011).
Weir, B. S. & Cockerham, C. C. Estimating F-statistics for the analysis of population structure. Evolution 38, 1358–1370 (1984).
Nei, M. F-statistics and analysis of gene diversity in subdivided populations. Ann. Hum. Genet. 41, 225–233 (1977).
Kim, D., Paggi, J. M., Park, C., Bennett, C. & Salzberg, S. L. Graph-based genome alignment and genotyping with HISAT2 and HISAT-genotype. Nat. Biotechnol. 37, 907–915 (2019).
Hunt, M. et al. Circlator: automated circularization of genome assemblies using long sequencing reads. Genome Biol. 16, 294 (2015).
Hyatt, D. et al. Prodigal: prokaryotic gene recognition and translation initiation site identification. BMC Bioinform. 11, 119 (2010).
Chen, S., Zhou, Y., Chen, Y. & Jia, G. fastp: an ultra-fast all-in-one FASTQ preprocessor. Bioinformatics 34, i884–i890 (2018).
Li, D. et al. MEGAHIT v1.0: a fast and scalable metagenome assembler driven by advanced methodologies and community practices. Methods 102, 3–11 (2016).
Uritskiy, G. V., DiRuggiero, J. & Taylor, J. MetaWRAP—a flexible pipeline for genome-resolved metagenomic data analysis. Microbiome 6, 158 (2018).
Parks, D. H., Imelfort, M., Skennerton, C. T., Hugenholtz, P. & Tyson, G. W. CheckM: assessing the quality of microbial genomes recovered from isolates, single cells, and metagenomes. Genome Res. 25, 1043–1055 (2015).
Olm, M. R., Brown, C. T., Brooks, B. & Banfield, J. F. dRep: a tool for fast and accurate genomic comparisons that enables improved genome recovery from metagenomes through de-replication. ISME J. 11, 2864–2868 (2017).
Kanehisa, M., Sato, Y. & Morishima, K. BlastKOALA and GhostKOALA: KEGG tools for functional characterization of genome and metagenome sequences. J. Mol. Biol. 428, 726–731 (2016).
Segata, N., Bornigen, D., Morgan, X. C. & Huttenhower, C. PhyloPhlAn is a new method for improved phylogenetic and taxonomic placement of microbes. Nat. Commun. 4, 2304 (2013).
Li, M. et al. Aldolase B suppresses hepatocellular carcinogenesis by inhibiting G6PD and pentose phosphate pathways. Nat. Cancer 1, 737–747 (2020).
Nanchen, A., Fuhrer, T. & Sauer, U. Determination of metabolic flux ratios from ¹³C-experiments and gas chromatography-mass spectrometry data: protocol and principles. Methods Mol. Biol. 358, 177–197 (2007).
van Winden, W. A. et al. Correcting mass isotopomer distributions for naturally occurring isotopes. Biotechnol. Bioeng. 80, 477–479 (2002).
Staley, C. et al. Stable engraftment of human microbiota into mice with a single oral gavage following antibiotic conditioning. Microbiome 5, 87 (2017).
Langmead, B. & Salzberg, S. L. Fast gapped-read alignment with Bowtie 2. Nat. Methods 9, 357–359 (2012).
Love, M. I., Huber, W. & Anders, S. Moderated estimation of fold change and dispersion for RNA-seq data with DESeq2. Genome Biol. 15, 550 (2014).
Momozawa, Y. et al. IBD risk loci are enriched in multigenic regulatory modules encompassing putative causative genes. Nat. Commun. 9, 2427 (2018).
Bushnell, B. BBMap: A Fast, Accurate, Splice-Aware Aligner (version 38.82) https://sourceforge.net/projects/bbmap/ (2014).
Köster, J. & Rahmann, S. Snakemake: a scalable bioinformatics workflow engine. Bioinformatics 28, 2520–2522 (2012).
Bolyen, E., Rideout, J. R., Dillon, M. R., Bokulich, N. A. & Caporaso, J. G. Reproducible, interactive, scalable and extensible microbiome data science using QIIME 2. Nat. Biotechnol. 37, 852–857 (2019).
Callahan, B. J. et al. DADA2: high-resolution sample inference from Illumina amplicon data. Nat. Methods 13, 581–583 (2016).
Ghodsi, M., Liu, B. & Pop, M. DNACLUST: accurate and efficient clustering of phylogenetic marker genes. BMC Bioinform. 12, 271 (2011).
Altschul, S. F., Gish, W., Miller, W., Myers, E. W. & Lipman, D. J. Basic local alignment search tool. J. Mol. Biol. 215, 403–410 (1990).
Srivastava, A. et al. Genomes of the mouse collaborative cross. Genetics 206, 537–556 (2017).
Yu, N. et al. Global patterns of human DNA sequence variation in a 10-kb region on chromosome 1. Mol. Biol. Evol. 18, 214–222 (2001).
The 1000 Genomes Project Consortium. A map of human genome variation from population-scale sequencing. Nature 467, 1061–1073 (2010).
Frantz, L. A. F. et al. Evidence of long-term gene flow and selection during domestication from analyses of Eurasian wild and domestic pig genomes. Nat. Genet. 47, 1141–1148 (2015).
Charlier, C. et al. NGS-based reverse genetic screen for common embryonic lethal mutations compromising fertility in livestock. Genome Res. 26, 1333–1341 (2016).
Georges, M., Charlier, C. & Hayes, B. Harnessing genomic information for livestock improvement. Nat. Rev. Genet. 20, 135–156 (2019).
Geraldes, A. et al. Inferring the history of speciation in house mice from autosomal, X-linked, Y-linked and mitochondrial genes. Mol. Ecol. 17, 5349–5363 (2008).
Huang, H. et al. Fine-mapping inflammatory bowel disease loci to single-variant resolution. Nature 547, 173–178 (2017).
Suzuki, T. A. & Nachman, M. W. Spatial heterogeneity of gut microbial composition along the gastrointestinal tract in natural populations of house mice. PLoS ONE 11, e0163720 (2016).
Vuik, F. E. R. et al. Composition of the mucosa-associated microbiota along the entire gastrointestinal tract of human individuals. UEG J. 7, 897–907 (2019).
Rowe, J. A. et al. Blood group O protects against severe Plasmodium falciparum malaria through the mechanism of reduced rosetting. Proc. Natl Acad. Sci. USA 104, 17471–17476 (2007).
Robinson, M. G., Tolchin, D. & Halpern, C. Enteric bacterial agents and the ABO blood groups. Am. J. Hum. Genet. 23, 135–145 (1971).
Camus, D., Bina, J. C., Carlier, Y. & Santoro, F. ABO blood groups and clinical forms of schistosomiasis mansoni. Trans. R. Soc. Trop. Med. Hyg. 71, 182 (1977).
Pereira, F. E. L., Bortolini, E. R., Carneiro, J. L. A., da Silva, C. R. M. & Neves, R. C. A, B, O blood groups and hepatosplenic form of schistosomiasis mansoni (Symmer’s fibrosis). Trans. R. Soc. Trop. Med. Hyg. 73, 238 (1979).
Ndamba, J., Gomo, E., Nyazema, N., Makaza, N. & Kaondera, K. C. Schistosomiasis infection in relation to the ABO blood groups among school children in Zimbabwe. Acta Trop. 65, 181–190 (1997).
Chaudhuri, A. & De, S. Cholera and blood groups. Lancet 2, 404 (1977).
Boren, T. et al. Attachment of Helicobacter pylori to human gastric epithelium mediated by blood group antigens. Science 262, 1892–1895 (1993).
Lindesmith, L. et al. Human susceptibility and resistance to Norwalk virus infection. Nat. Med. 9, 548–553 (2003).
Galili, U. in α-Gal and Anti-Gal (eds Galili, U. & Avila, J. L.) Vol. 32, 1–23 (Springer, 1999).
Prather, R. S., Shen, M. & Dai, Y. Genetically modified pigs for medicine and agriculture. Biotechnol. Genetic Eng. Rev. 25, 245–266 (2008).

Acknowledgements

We thank Y. He, S. Xiao, W. Li, Y. Guo and Y. Xing for assistance in the construction of the experimental mosaic pig populations; Y. Su and J. Li for preparation of reagents and management of samples; Y. Momozawa, R. Mariman, M. Mni, L. Karim and M. Dekkers for generating the CEDAR-1 16S rRNA data; the staff at the Jiangxi Department for Education, the Ministry of Science and Technology of P. R. China, the Ministry of Agriculture and Rural Affairs of P. R. China, and the Jiangxi Department of Science and Technology for their long-term support of the swine heterogeneous stock project; and the members of the MIQUANT consortium for comments and discussions. L.H. is supported by the National Natural Science Foundation of China (31790410) and the National pig industry technology system (CARS-35); C. Chen by the National Natural Science Foundation of China (31772579); H.Y. by the National Postdoctoral Program for Innovative Talent (no. BX201700102); L.S. by the FNRS IBD-GI-Seq project; M.G. by the Chinese Thousand Talents Program, the Belgian EOS ‘Miquant’ project and the FNRS (CDR ‘GEM’ project). C. Charlier is a senior research associate at the FNRS.

Author information

These authors contributed equally: Hui Yang, Jinyuan Wu
These authors jointly supervised this work: Michel Georges, Congying Chen, Lusheng Huang

Authors and Affiliations

National Key Laboratory for Swine Genetic Improvement and Production Technology, Ministry of Science and Technology of China, Jiangxi Agricultural University, Nanchang, PR China
Hui Yang, Jinyuan Wu, Xiaochang Huang, Yunyan Zhou, Yifeng Zhang, Min Liu, Qin Liu, Shanlin Ke, Maozhang He, Hao Fu, Shaoming Fang, Xinwei Xiong, Hui Jiang, Zhe Chen, Zhongzi Wu, Huanfa Gong, Xinkai Tong, Yizhong Huang, Junwu Ma, Jun Gao, Carole Charlier, Zhiyan Zhang, Huashui Ai, Bin Yang, Michel Georges, Congying Chen & Lusheng Huang
Unit of Animal Genomics, GIGA-Institute and Faculty of Veterinary Medicine, University of Liege, Liege, Belgium
Carole Charlier, Wouter Coppieters, Lev Shagam & Michel Georges

Contributions

H.Y. analysed the 16S rRNA sequence data, performed GWAS, meta-analyses and local association analyses, computed heritabilities of individual taxa, contributed to ABO genotyping and analysed the effect of the 2.3 kb deletion on taxa abundance. J.W. analysed the composition of the microbiome, including PCoA analyses, β- and α-diversity, correlations between kinship and microbiome dissimilarities, isolated the OTU476-like strains, performed the GalNAc feeding experiments, measured the concentrations of GalNAc in the caecal lumen, analysed the GalNAc import and use pathway in the MAGs, and contributed to ABO genotyping. X.H. participated in 16S rRNA sequencing (F₆) and GWAS (F₆). Y. Zhou performed metagenome sequencing analysis, analysed the GalNAc import and use pathway in MAGs, analysed the RNA-seq data from caecum samples and contributed to ABO genotyping. Y. Zhang participated in the preparation of the genotype data from whole-genome sequence information, participated in the computation of the genomic contribution of the different breeds in the F₆ and F₇ generation and the definition of expected mapping resolution, performed LD analyses, performed eQTL analysis for the ABO gene, participated in the characterization and sequence analysis of the ABO gene, including definition of the 2.3 kb deletion, and in the balancing selection and trans-species polymorphism analyses. M.L. assisted with the isolation of the OTU476-like strains, the GalNAc feeding experiments and genotyping of the ABO gene. Q.L. assisted with measuring the concentrations of GalNAc in caecal lumen. S.K., M.H., H.F., S.F., X.X., H.J., Z.C. and J.G. assisted with the experiments. Z.Z., X.T., Z.W., H.G. and Y.H. assisted with the preparation of genotype data from whole-genome sequencing data and conducted the analysis of the Nanopore data of the ABO region. J.M. assisted with the construction of the mosaic population. H.A. assisted with the bioinformatic analysis of the ABO region, de novo assembly of the A allele, and evolutionary analysis of the ABO alleles. L.S. analysed the effect of ABO genotype on intestinal microbiota composition in humans. W.C. assisted in the analysis of the sequencing data for the trans-species polymorphisms. C. Charlier supervised the characterization of the ABO gene and the 2.3 kb deletion and the corresponding haplotype structure in the F₀, F₆ and F₇ population and for the trans-species polymorphism. B.Y. prepared the genotype data of whole-genome variants, assisted with raising the heterogeneous stock, and participated in the computation of the genomic contribution of the different breeds in the F₆ and F₇ generation and the definition of expected mapping resolution. M.G. supervised the bioinformatic and statistical analyses, performed bioinformatic and statistical analyses, and wrote the paper. C. Chen codesigned the study, supervised experiments, supervised bioinformatic and statistical analyses of gut microbiome, and wrote the paper. L.H. created the swine heterogeneous stock, designed the study, directed the project, supervised the experiments and analyses, and wrote the paper.

Corresponding authors

Correspondence to Michel Georges, Congying Chen or Lusheng Huang.

Ethics declarations

Competing interests

The authors declare no competing interests.

Peer review

Peer review information

Nature thanks Catherine Lozupone, Vincent Plagnol and the other, anonymous, reviewer(s) for their contribution to the peer review of this work. Peer reviewer reports are available.

Additional information

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Extended data figures and tables

Extended Data Fig. 1 Generating a large mosaic pig population for genetic analysis of complex phenotypes.

(a) Rotational breeding design used for the generation of a large mosaic pig population for the genetic analysis of complex phenotypes, with sampling scheme for faeces (D25, D120, D240), luminal content of the ileum (IC) and caecum (CC), and mucosal scrapings in the ileum (IM) and caecum (CM). BX: Bamaxiang, EH: Erhualian, LA: Laiwu, TB: Tibetan, LW: Large White, LD: Landrace, PT: Piétrain, WD: White Duroc. (b) Average similarity (1 – π) between allelic sequences sampled within and between the eight founder breeds. The colour intensity ranges from black (breeds with lowest allelic similarity: BX vs WD, 1 - 4.3x10⁻³) to bright red (breed with highest allelic similarity: WD, 1 - 1.8x10⁻³). The acronyms for the breeds are as in (a). More than 30 million variants with MAF ≥ 3% segregate in this population, i.e. more than one variant every 100 base pairs. This is slightly lower than the 40 million high quality variants segregating in the mouse collaborative cross¹⁰⁸. (c) Comparison of the average nucleotide diversity (π, i.e. the proportion of sites that differ between two chromosomes sampled at random in the population(s)) within and between European (Eur) and Asian (As) domestic pigs, and between modern European (HS_Eur), Asian humans (HS_As), Neanderthal (Neand) and Chimpanzee (Pan Trogl). The average nucleotide diversity within the four Chinese founder breeds was ~2.5x10⁻³ and within the four European founder breeds ~2.0x10⁻³. By comparison, π-values within African and within Asian/European human populations are ~9x10⁻⁴ and ~8x10⁻⁴, respectively^109,110. Thus, against intuition (as domestication is often assumed to have severely reduced effective population size) the within population diversity is >2-fold higher in domestic pigs than in human populations, as previously reported^111,112,113. Nucleotide diversities between Chinese founder breeds and between European founder breeds were ~3.6x10⁻³ and ~2.5x10⁻³, respectively, i.e. 1.44-fold and 1.25-fold higher than the respective within-breed π-values. These π-values are of the same order of magnitude as the sequence divergence between Homo sapiens and Neanderthals/Denisovans (~3x10⁻³, ref. ¹⁵). By comparison, π-values between Africans, Asians and Europeans are typically ≤ ~1x10⁻³ (ref. ¹⁰⁹). The nucleotide diversity between Chinese and European breeds averaged ~4.3x10⁻³. This π-value is similar to the divergence between M. domesticus and M. castaneus¹¹⁴, and close to half the ~1% difference between chimpanzee and human¹⁶. Note that Chinese and European pig breeds are derived from Chinese and European wild boars, respectively, which are thought to have diverged ~1 million years ago²⁷, while M. domesticus and M. castaneus are thought to have diverged ≤ 500,000 years ago¹¹⁴. (d) Autosome-specific estimates of the genomic contributions of the eight founder breeds in the F6 and F7 generation. We used a linear model incorporating all variants to estimate the average contribution of the eight founder breeds in the F6 and F7 generation at genome and chromosome level⁵⁶. At genome-wide level, the proportion of the eight founder breed genomes ranged from 11.2% (respectively 11.5%) to 14.1% (14.7%) in the F6 (F7) generations. At chromosome-specific level, the proportion of the eight founder breeds ranged from 6.7% (respectively 4.9%) to 20.7% (22.1%) in the F6 (F7) generations. The genomic contribution of the eight founder breeds in the F6 and F7 generation is remarkably uniform and close to expectations (i.e. 12.5%) both at genome-wide and chromosome-wide level, suggesting comparable levels of genetic diversity across the entire genome. This does not preclude that more granular examination may reveal local departures from expectations, or under-representation of incompatible allelic combinations at non-syntenic loci.
(e-f) Indicators of achievable mapping resolution in the F6 generation: (e) Frequency distribution (density) of the number of variants in high LD (r² ≥ 0.9) with an “index” variant (computed separately for each variant considered in turn as the “index”), corresponding to the expected size of “credible sets” in GWAS¹¹⁵. The red vertical line corresponds to the genome-wide median. The green vertical line corresponds to the mapping resolution achieved in this study for the ABO locus (see hereafter). (f) Frequency distribution (density) of the maximum distance between an index variant and a variant in high LD (r² ≥ 0.9) with it, defining the spread of credible sets. Red and green vertical lines are as in (e).
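
The definition of π in panel (c) and the fold ratios quoted there can be checked with a few lines of Python. This is only an illustrative sketch: the toy haplotypes and the function names are hypothetical, and only the π-values and the 12.5% expectation are taken from the caption.

```python
from itertools import combinations


def pairwise_diff(seq_a, seq_b):
    """Proportion of aligned sites at which two sequences differ."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    mismatches = sum(a != b for a, b in zip(seq_a, seq_b))
    return mismatches / len(seq_a)


def nucleotide_diversity(haplotypes):
    """Average pairwise difference (pi) over all pairs of haplotypes."""
    pairs = list(combinations(haplotypes, 2))
    return sum(pairwise_diff(a, b) for a, b in pairs) / len(pairs)


# Toy haplotypes (hypothetical, 10 aligned sites each).
toy = ["ACGTACGTAC", "ACGTACGTAT", "ACGAACGTAC"]
pi_toy = nucleotide_diversity(toy)

# Fold ratios between-breed / within-breed, using the caption's approximate values.
fold_chinese = 3.6e-3 / 2.5e-3   # between vs within Chinese founder breeds
fold_european = 2.5e-3 / 2.0e-3  # between vs within European founder breeds

# Expected genomic contribution of each of the eight founder breeds.
expected_share = 1 / 8

print(f"toy pi = {pi_toy:.4f}")
print(f"fold ratios: {fold_chinese:.2f}, {fold_european:.2f}")
print(f"expected founder share: {expected_share:.1%}")
```

Real estimates are of course computed genome-wide from called variants rather than from short aligned strings; the sketch only makes the arithmetic behind the quoted ratios explicit.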

Extended Data Fig. 2 Characterizing the age- and location-specific composition of the intestinal microbiome of the healthy pig.

(a) Definition of a core intestinal microbiome of the pig. A total of 58 OTUs that were annotated to 21 taxa were identified in >95% of day 120 and 240 faeces and caecum content samples of both F6 and F7 generations, hence defined as core bacterial taxa. (b) The compositions of the porcine and human intestinal microbiota are closer to each other than either is to that of the mouse. Boxplots are as in Fig. 1c. The numbers of samples available for analysis were 1281 pigs, 106 humans and 6 mice. (c) Abundances (F6-F7 averages when available) of the 43 families represented in Fig. 1b in the seven sample types relative to the sample type in which they are the most abundant (red – blue scale). The families are ordered according to the sample type in which they are the most abundant. The colour-code for phyla is as in Fig. 1b. Columns are added for comparison with mouse and human. Mouse data are from Fig. 1 in Suzuki & Nachman¹¹⁶, and human data from Fig. 6 in Vuik et al.¹¹⁷. P_I: proximal ileum, D_IL: distal ileum, C: caecum, CO: colon, RE: rectum, F: faeces. The families differing the most with regard to location-specific distribution between species include Helicobacteraceae, Veillonellaceae, Lactobacillaceae and Streptococcaceae.

Extended Data Fig. 3 Evaluating the heritability of intestinal microbiota composition in the mosaic pig population.

Correlation between heritability estimates of taxa/OTUs in F6 and F7 generation by sample type (D25, D120, D240, CC and IC). Correlation coefficients (r) and associated p-values (p) were computed using heritability estimates that were pre-corrected for bacterial abundance (residuals of linear model). Heritability estimates indeed tend to slightly increase with taxa abundance. Yet, results show that this effect cannot account for the observed correlations between F6 and F7 estimates in D120, D240 and CC, hence pointing towards genuine genetic effects. The shaded areas correspond to the 95% confidence region for the regression fit. Correlation coefficients and two-sided p-values were computed using Spearman’s rank-based method. Reported p-values are nominal (i.e. uncorrected for multiple testing).
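
The two-step procedure described in this caption (regress heritability on abundance, then rank-correlate the residuals between generations) can be sketched as follows. The numbers are hypothetical toy values, and the simple least-squares fit stands in for the linear model of the Methods, whose exact covariates are not given here.

```python
from statistics import mean


def ols_residuals(y, x):
    """Residuals of the least-squares fit y ~ intercept + slope * x."""
    mx, my = mean(x), mean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = my - slope * mx
    return [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]


def average_ranks(values):
    """1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1  # mean of 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


def pearson(a, b):
    ma, mb = mean(a), mean(b)
    num = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    den = (sum((x - ma) ** 2 for x in a) * sum((y - mb) ** 2 for y in b)) ** 0.5
    return num / den


def spearman(a, b):
    """Spearman's rho computed as the Pearson correlation of the ranks."""
    return pearson(average_ranks(a), average_ranks(b))


# Hypothetical per-taxon values: log abundance and heritability in F6 and F7.
log_abundance = [-4.0, -3.5, -3.0, -2.5, -2.0, -1.5]
h2_f6 = [0.05, 0.20, 0.10, 0.30, 0.15, 0.40]
h2_f7 = [0.08, 0.18, 0.12, 0.33, 0.14, 0.38]

# Remove the abundance trend from each generation, then correlate what is left.
res_f6 = ols_residuals(h2_f6, log_abundance)
res_f7 = ols_residuals(h2_f7, log_abundance)
rho = spearman(res_f6, res_f7)
print(f"Spearman rho of abundance-corrected heritabilities: {rho:.2f}")
```

A positive rho on the residuals indicates agreement between generations that abundance alone does not explain, which is the argument the caption makes.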

Extended Data Fig. 4 Identifying a microbiota QTL (miQTL) with major effect on the abundance of Erysipelotrichaceae species by whole genome sequence based GWAS.

(a) Schematic illustration of the samples and SNPs used for the two types of analyses (abundance and presence/absence) performed for miQTL mapping. (b) (Upper) Distribution of log(1/p) values for 1,527 sets of 11 p-values obtained in 11 data-series for a SNP x taxon x analysis model combination that yielded a genome-wide significant signal (p < 5 x 10⁻⁸) in the 12th data-series. (Lower) Distribution of log(1/p) values for 1,527 sets of 11 p-values obtained in the same data-series and with the same analysis model as in (upper) but with randomly selected SNP x taxon combinations matching the ones in (upper) for MAF and taxa abundance. Log(1/p) values were computed using GenABEL as described in Methods. Corresponding p-values are nominal and two-sided. (c) Correlation between the average (F6 and F7) taxon heritability, and the average (F6 and F7) number of genome-wide significant (p ≤ 5 × 10⁻⁸) miQTL for D240 faecal samples. The shaded area corresponds to the 95% confidence region for the regression fit. Correlation coefficient and associated p-values are Spearman’s. (d) QQ plot for 1,527 (number of signals (SNP x taxon x model x one data series in one cohort) exceeding the genome-wide log(1/p) threshold value of 7.3) sets of ≤ 5-7 p-values (same SNP x taxon x model, all data series in the other cohort) for real SNPs (Blue: quantitative model; Green: binary model), and matched sets of ≤ 5-7 p-values corresponding to randomly selected SNP x taxon combinations matched for MAF and abundance or presence/absence rate (Brown: quantitative model; Yellow: binary model). Log(1/p) values were computed using GenABEL as described in Methods. Corresponding p-values are nominal and two-sided. (e) Same QQ plot as in (d) after removal of all SNPs in the chromosome 1: 272.8-273.1Mb interval. Log(1/p) values were computed using GenABEL as described in Methods. Corresponding p-values are nominal and two-sided.
(f) Distribution of the association log(1/p) values and corresponding signed z-scores for SNP 1_272907239 and 31 p-75-a5 OTUs (red) and 83 Erysipelotrichaceae (yellow) OTUs, showing an enrichment of effects with same sign as for OTU476 and OTU327. Log(1/p) values were computed using Metal (v3.0) as described in Methods. Corresponding p-values are nominal and two-sided. See also Supplemental discussion 1.
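
The log(1/p) threshold of 7.3 quoted in panel (d) is simply the genome-wide significance level of 5 × 10⁻⁸ on a base-10 log scale. The sketch below makes that conversion explicit and shows one common way to build the points of a QQ plot; the helper name and the example p-values are hypothetical, and the plotting positions (i + 0.5)/n are an assumption rather than the authors' stated choice.

```python
import math

GENOME_WIDE_P = 5e-8
threshold = -math.log10(GENOME_WIDE_P)  # log(1/p) scale, about 7.3


def qq_points(p_values):
    """Return (expected, observed) -log10(p) pairs, most significant first.

    Expected values assume p-values are uniform on (0, 1) under the null,
    using the plotting positions (i + 0.5) / n.
    """
    n = len(p_values)
    observed = sorted((-math.log10(p) for p in p_values), reverse=True)
    expected = [-math.log10((i + 0.5) / n) for i in range(n)]
    return list(zip(expected, observed))


# Hypothetical p-values for illustration only.
points = qq_points([0.5, 0.01, 0.2, 1e-9])
n_significant = sum(obs > threshold for _, obs in points)
print(f"threshold = {threshold:.2f}, signals above it: {n_significant}")
```

Points lying above the diagonal (observed larger than expected) indicate an excess of small p-values, which is how the replication signal in panels (d) and (e) is read.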

Extended Data Fig. 5 The chromosome 1 miQTL is caused by a 2.3 kb deletion in the orthologue of the human ABO gene.

(a) Breakpoints of the 2.3 kb deletion showing the role of a duplicated SINE sequence in mediating an intra-chromosomal recombination. (b) Illustrative example of allelic balance for the cG146C SNP in an AA homozygote and of allelic imbalance for the same SNP in an AO heterozygote. (c) (Upper) eQTL analysis for the porcine AO gene maximizing at the exact position of the 2.3 kb deletion (p = 1.9x10⁻⁴³) and showing the additive effect of the A allele increasing transcript levels ~3-fold (inset; FPKM: Fragments Per Kilobase of transcript per Million mapped reads). The “n’s” correspond to the number of animals of each genotype available for analysis. Boxplots are as in Fig. 1c. (Lower) Genome wide eQTL scan for the porcine ABO gene showing the strong cis-eQTL signal on chromosome 1. eQTL analysis was conducted with GEMMA (v0.97)⁶⁴. Reported log-transformed p-values are nominal and two-sided. (d) Effect of N-acetyl-galactosaminyl transferase genotype (AA, AO or OO) on abundance of OTU327 and p-75-a5 in the twelve data series. Absence of an effect of N-acetyl-galactosaminyl transferase genotype (AA, AO or OO) on abundance of E. coli in the twelve data series. Sample sizes are as in STable 4.1. Boxplots are as in Fig. 3d. (e) Abundance of OTU476, OTU327 and p-75-a5 in the twelve data series. Violin plots with indication of the median. Numbers (n’s) are as in STable 4.1. See also Supplemental discussion 2.

Extended Data Fig. 6 cis-eQTL analyses in the vicinity of the chromosome 1 miQTL support the causality of the 2.3 kb deletion.

(a) Cis-eQTL analysis for the porcine N-acetyl-galactosaminyl transferase (“ABO”), GBGT1, LCN1 (=OBP2B), MED22 and SURF6 genes in caecum. The blue triangle corresponds to the top SNP for the miQTL. The red triangles correspond to the top SNPs for the respective cis-eQTL. Only for N-acetyl-galactosaminyl transferase are blue and red variants the same. eQTL analyses were conducted with GEMMA (v0.97)⁶⁴. Reported log-transformed p-values are nominal and two-sided. (b) Effect of AO genotype on the expression levels of the corresponding genes in caecum. There was no evidence for an effect of AO genotype on the expression of any of these genes other than ABO. The number of AA, AO and OO samples available for cis-eQTL analysis for each gene are given (n). Boxplots are as in Fig. 1c. We tested the difference in gene expression level between pairs of genotype classes using a two-sided t-test. (c) Effect of the top cis-eQTL SNPs (blue triangles in (a)) on OTU476 abundance. Only the top cis-eQTL SNP for ABO has an effect on OTU476 abundance. The number of AA, AO and OO samples available for miQTL analysis for each gene are given (n). Boxplots are as in Fig. 1c. We tested the difference in bacterial abundance between pairs of genotype classes using a two-sided t-test.

Extended Data Fig. 7 The 2.3 kb deletion in the orthologue of the human ABO gene is 3.5 million years old and under balancing selection.

(a) UPGMA tree based on nucleotide diversities between 14 AA and 34 OO animals in windows of increasing size (0.5 to 40 kb) centred on the 2.3 kb deletion in the porcine N-acetyl-galactosaminyl transferase gene (porcine O allele). PA: Phacochoerus africanus, SC: Sus cebifrons, SV: Sus verrucosus, SU: Sus scrofa vittatus, CB: Chinese wild boar, RB: Russian wild boar, EB: European wild boar, ERH: Erhualian, BX: Bamaxiang, T: Tibetan, LA: Laiwu, LR: Landrace, LW: Large White, PI: Piétrain, WD: White Duroc. Context: To gain additional insights into the age of the porcine O allele, we generated phylogenetic trees of the A and O alleles of 14 AA and 34 OO animals including domestic pigs, wild boars, Visayan and Javanese warty pigs, and common African warthog. Examination of their local SNP genotypes (50K window encompassing the ABO gene) reveals traces of ancestral recombinations between O and A haplotypes as close as 300 and 800 base pairs from the proximal and distal deletion breakpoints, respectively, as well as multiple instances of homoplasy that may either be due to recombination, gene conversion or recurrent de novo mutations. On their own, these signatures support the old age of the O allele. We constructed UPGMA trees based on nucleotide diversity for windows ranging from 500 bp to 40 kb centred on the 2.3 kb deletion. Smaller windows have a higher likelihood to compare the genuine ancestral O versus A states, yet yield less robust trees because they are based on a smaller number of variants. Larger windows will increasingly be contaminated with recombinant A-O haplotypes blurring the sought signal. Indeed, for windows ≥ 20 kb, the gene tree corresponds to the species tree, while for windows ≤ 15 kb the tree sorts animals by AA vs OO genotype. For all windows ≤ 15 kb the Sus cebifrons O allele maps outside of the Sus scrofa O allele supporting a deep divergence (rather than hybridization) and hence the old age of the O allele.
Of note, for windows ≤1.2 kb, the warthog A allele is more closely related to the Sus A alleles than to the Sus O alleles (ED7A). This suggests that the O allele may be older than the divergence of the Phacochoerus and Sus A alleles, i.e. > 10 MYA. It will be interesting to study larger numbers of warthog to see whether the same 2.3 kb deletion exists in this and other related species as well. (b) Alignment of ~900 base pairs of the O alleles of domestic pigs (Bamaxiang), European and Asian wild boars, and Sus cebifrons demonstrating that these are identical-by-descent. The SINE element that is presumed to have mediated the recombinational event that caused the 2.3 kb deletion is highlighted in red. Context: To further support their identity-by-descent we aligned ~900 base pairs (centred on the position of the 2.3 kb deletion) of the O alleles of domestic pig, European and Asian wild boars and Sus cebifrons. The sequences were nearly identical, further supporting our hypothesis. It is noteworthy that the old age of the “O” allele must have contributed to the remarkable mapping resolution (≤3 kb) that was achieved in this study. In total, 42 variants were in near perfect LD (r² ≥ 0.9) with the 2.3 kb deletion in the F0 generation, spanning 2,298 bp (1,522 on the proximal side, and 762 on the distal side of the 2.3 kb deletion). This 2.3 kb span is lower than genome-wide expectations (17th percentile), presumably due to the numerous cross-overs that have accrued since the birth of the 2.3 kb deletion that occurred in the distant past. Yet the number of informative variants within this small segment is higher than the genome-wide average (57th percentile), also probably due at least in part to the accumulation of numerous mutations since the remote time of coalescence of the A and O alleles (see Fig. 1d in main text).
(c) QQ plots for the effect of AO genotype on 150 phenotypes pertaining to meat quality, growth, carcass composition, hematology, health, and other phenotypes in the F6 and F7 generation. P-values were obtained using a mixed model followed by meta-analysis (weighted Z score) across the F6 and F7 generations as described in Methods. Log-transformed p-values used for the QQ plot are nominal and two-sided. Context: Our findings in Suidae are reminiscent of the trans-species polymorphism of the ABO gene in primates attributed to balancing selection²⁶. The phenotypes driving balancing selection remain largely unknown, yet a tug of war with pathogens is usually invoked: synthesized glycans may affect pathogen adhesion, toxin binding or act as soluble decoys, while naturally occurring antibodies may be protective^20,44. In humans, the O allele may protect against malaria¹¹⁸, E. coli and Salmonella enteric infection¹¹⁹, SARS-CoV-1⁴², SARS-CoV-2⁴³ and schistosomiasis^120,121,122, while being a possible risk factor for cholera¹²³, H. pylori¹²⁴ and norovirus infection¹²⁵. Whatever the underlying selective force, it appears to have operated independently in at least two mammalian branches (primates and Suidae), over exceedingly long periods of time, and over broad geographic ranges, hence pointing towards its pervasive nature. To gain insights into what selective forces might underpin the observed balanced polymorphism, we tested the effect of porcine AO genotype on >150 traits measured in the F6 and F7 generations pertaining to carcass composition, growth, meat quality, hematological parameters, disease resistance and behaviour. No significant effects were observed when accounting for multiple testing, including those pertaining to immunity and disease resistance. (d) Expression profile of the AO gene in a panel of adult and embryonic porcine tissues (own RNA-Seq data).
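
For readers unfamiliar with the weighted Z-score meta-analysis mentioned in panel (c), the following minimal sketch combines two cohorts. It assumes weights proportional to the square root of the sample size, a common convention that the caption does not confirm; the p-values, effect directions and sample sizes are hypothetical.

```python
import math
from statistics import NormalDist

_STD_NORMAL = NormalDist()


def signed_z(p_two_sided, effect_sign):
    """Convert a two-sided p-value and an effect direction into a signed z-score."""
    z = _STD_NORMAL.inv_cdf(1 - p_two_sided / 2)
    return math.copysign(z, effect_sign)


def weighted_z_meta(p_values, signs, sample_sizes):
    """Combine cohorts using weights proportional to sqrt(sample size) (assumed)."""
    weights = [math.sqrt(n) for n in sample_sizes]
    zs = [signed_z(p, s) for p, s in zip(p_values, signs)]
    numerator = sum(w * z for w, z in zip(weights, zs))
    z_meta = numerator / math.sqrt(sum(w * w for w in weights))
    p_meta = 2 * (1 - _STD_NORMAL.cdf(abs(z_meta)))
    return z_meta, p_meta


# Hypothetical F6 and F7 results for one trait, effects in the same direction.
z_meta, p_meta = weighted_z_meta(
    p_values=[0.04, 0.03], signs=[+1, +1], sample_sizes=[1000, 1200]
)
print(f"combined z = {z_meta:.2f}, combined two-sided p = {p_meta:.4f}")
```

When the two cohorts agree in direction the combined p-value is smaller than either input; when they disagree the signed z-scores partly cancel, which is why the effect direction has to be carried along with each p-value.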

Extended Data Fig. 8 The chromosome 1 miQTL affects caecal N-acetyl-D-galactosamine (GalNAc) concentrations which are correlated with the abundance of Erysipelotrichaceae species within AO genotype: theory.

(a) ABO and α-gal epitopes in pigs and humans. The glycosyltransferase gene located on 9q34.2 and underpinning the human ABO blood group is characterized in most human populations by three major alleles: (i) I^A encoding an α-3-N-acetyl-D-galactosaminyltransferase that adds GalNAc to H and Lewis antigens (yielding the A antigen) on various glycoproteins including mucins secreted in the intestinal lumen, (ii) I^B encoding an α-3-D-galactosyltransferase that adds galactose to the same antigens (yielding the B antigen), and (iii) the inactive I^O null allele that precludes expression of both the A and the B antigen. Mutations in the fucosyltransferase 2 gene (FUT2) preclude formation of the H antigen on secreted proteins and hence the detection of A and B antigens in secretions²⁰. The pig orthologue of the human ABO glycosyltransferase gene is located on the telomeric end of porcine chromosome 1q, and is characterized by two major alleles: (i) the A allele, encoding an α-3-N-acetyl-D-galactosaminyltransferase that adds GalNAc to H and Lewis antigens, similar to the human I^A allele, and (ii) the O allele corresponding to a null allele as a result of a 2.3 kb deletion, similar to the human I^O allele²⁴. Thus, the B antigen (Galα1-3(Fucα1-2)Galβ1-4GlcNAc-R) is not observed in pig populations. However, what is found abundantly on the surface of cells in many tissues is the so-called “α-gal epitope” (Galα1-3Galβ1-4GlcNAc-R), which results from the addition of a galactose to the Galβ1-4GlcNAc-R precursor by an α1,3galactosyltransferase encoded by the GGTA1 gene. The orthologue of the GGTA1 gene is non-functional in humans and Old World non-human primates, which, however, have high titers of circulating anti-α-gal antibodies contributing to acute rejection of xenografts^126,127.
(b) Identifying whether changes in GalNAc concentration are the cause of the observed changes in abundance of Erysipelotrichaceae species by searching for a correlation between the two phenotypes “within AO genotype”. (b1) If AO genotype is associated with the abundance of Erysipelotrichaceae species and GalNAc concentrations by virtue of different molecular mechanisms (for instance because they involve distinct causative mutations albeit in linkage disequilibrium, or because the gene has another, as yet unknown, activity that is causing the change in bacterial abundance independently of its glycosyltransferase activity), there is no reason to expect a correlation between bacterial abundance and GalNAc concentration within AO genotype (red horizontal lines in the dotted circles). There is of course a correlation across genotypes that is due to the fact that AO genotype has a (direct or indirect) effect on both phenotypes. (b2) If, on the other hand, AO genotype causes the change in GalNAc concentration (which is very likely given its known enzymatic activity) which then causes the change in the abundance of Erysipelotrichaceae species, one can expect that bacterial abundance and GalNAc concentration will be correlated, also within AO genotype, as indicated by the sloped red lines within the dotted ellipses. This is what is observed with the real data.
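
The distinction between scenarios (b1) and (b2) can be illustrated numerically by removing the genotype means from both phenotypes and correlating what remains. The sketch below uses made-up toy values and a Pearson correlation for brevity (the actual analysis reports Spearman's rank-based test); all variable and function names are hypothetical.

```python
from statistics import mean


def pearson(a, b):
    ma, mb = mean(a), mean(b)
    num = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    den = (sum((x - ma) ** 2 for x in a) * sum((y - mb) ** 2 for y in b)) ** 0.5
    return num / den


def center_within_groups(values, groups):
    """Subtract each genotype's mean from the values of its members."""
    group_means = {
        g: mean(v for v, gg in zip(values, groups) if gg == g) for g in set(groups)
    }
    return [v - group_means[g] for v, g in zip(values, groups)]


genotype = ["OO"] * 4 + ["AO"] * 4 + ["AA"] * 4
galnac = [1, 2, 1, 2, 4, 5, 4, 5, 7, 8, 7, 8]

# Scenario b1 (toy): genotype shifts both traits, but they vary independently within genotype.
abundance_b1 = [1, 1, 2, 2, 4, 4, 5, 5, 7, 7, 8, 8]
# Scenario b2 (toy): abundance tracks GalNAc, also within genotype.
abundance_b2 = [1.0, 2.2, 0.9, 2.1, 4.1, 5.0, 3.8, 5.2, 7.0, 8.1, 6.9, 8.3]

galnac_c = center_within_groups(galnac, genotype)
pooled_b1 = pearson(galnac, abundance_b1)
within_b1 = pearson(galnac_c, center_within_groups(abundance_b1, genotype))
within_b2 = pearson(galnac_c, center_within_groups(abundance_b2, genotype))

print(f"b1: pooled r = {pooled_b1:.2f}, within-genotype r = {within_b1:.2f}")
print(f"b2: within-genotype r = {within_b2:.2f}")
```

In the (b1) toy data the pooled correlation is high while the within-genotype correlation vanishes, whereas in (b2) the correlation persists after genotype is accounted for, which is the pattern the caption describes for the real data.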

Extended Data Fig. 9 The chromosome 1 miQTL affects caecal N-acetyl-D-galactosamine (GalNAc) concentrations which are correlated with the abundance of Erysipelotrichaceae species within AO genotype: results.

(a) Positive correlation between caecal GalNAc concentrations and bacterial abundance (upper panels: p-75-a5; lower panels: OTU327) “within AO genotype”. GalNAc concentrations and bacterial abundances were corrected for batch effects and AO genotype and scaled between 0 and 1 to equalize residual variance. Correlations were computed using all samples jointly and Spearman’s rank-based test; corresponding p-values (nominal; two-sided) are given (left panels). Regression lines are shown for the different AO genotypes separately (right panels); all of them are positive. Note that the scatter plots for p-75-a5 are not identical but very similar to those for OTU476 (Fig. 5b, c). This is because OTU476 accounts for most of the p-75-a5 genus in caecum content (see also Extended Data Fig. 5). These data can therefore not be considered to be independent. The shaded areas correspond to the 95% confidence regions for the regression fit. (b) Comparison of the free GalNAc concentrations in caecal content of OO, AO and AA pigs as well as in caecal content of germ-free mice gavaged with 200mg/kg GalNAc. Concentrations were determined in freeze-dried caecal content powder using LC-MS/MS. Number of analyzed samples are given (n). Boxplots are as in Fig. 1c.

Extended Data Fig. 10 The chromosome 1 miQTL affects bacteria with a functional GalNAc import and catabolic pathway.

Presence anywhere in the genome (green), presence in close proximity to agaS (red), or absence (black) of the orthologues of 24 genes implicated in the GalNAc TR/CP pathway in the genome of (i) two OTU476 like strains (4-15-1 and 4-8-110), (ii) 248 MAGs assigned to the Erysipelotrichaceae family, and (iii) 2,863 MAGs assigned to other bacterial families. The two lanes on the right of the three panels correspond to the Regulon (red) and Pathway (green) score respectively. Both scores range from 0 (black) to 6 (bright red or green). Means (range) for the corresponding dataset are given on top. P-values (nominal, two-sided, uncorrected) of the pathway and regulon scores were computed using a linear model described in Methods.

Extended Data Fig. 11 Different GalNAc operon structure and transcriptome response in miQTL-sensitive versus -insensitive GalNAc utilizing bacteria.

Maps of GalNAc “operons” in one of the two OTU476-like strains (NB: The organization of the GalNAc gene cluster was identical in both 4-15-1 and 4-8-110 strains), and six MAGs assigned respectively to an Erysipelotrichaceae, E. coli (an Enterobacteriaceae), a Collinsella (a Coriobacteriaceae), a Fusobacteriaceae, a Firmicutes and a Clostridium. Identified Open Reading Frames (ORFs) are represented as coloured boxes. Genes implicated in GalNAc import and catabolism are in red if they are part of the cluster and in green if located elsewhere in the genome. Genes with a known function unrelated to GalNAc are in blue. ORFs with uncharacterized gene product in gray. Gene acronyms are given next to the corresponding boxes. ORFs transcribed from the top (respectively bottom) strand are above (below) the dotted line. The respective transcriptional directions are marked by the arrows. The source of information used to confirm the map order is given (finished genome, multiple MAGs, single contig).

Extended Data Fig. 12 No effect of ABO genotype on intestinal Erysipelotrichaceae abundance in human.

Volcano and QQ plots for 43 (V1-V2), 20 (V3-V4) and 9 (V5-V6) OTUs classified as Erysipelotrichaceae for the contrasts (a) [AA, AO and AB] versus [BB, BO and OO], (b) [BB, BO and AB] versus [AA, AO and OO], and (c) [OO] versus [all others]. The shaded areas correspond to the 95% confidence intervals of the spread of the QQ plot under the null hypothesis of no QTL. The actual points are always within these intervals, so the null hypothesis cannot be rejected. P-values (nominal, two-sided) were computed using the linear model described in Methods and hereafter. See also Supplemental discussion 3.

About this article

Cite this article

Yang, H., Wu, J., Huang, X. et al. ABO genotype alters the gut microbiota by regulating GalNAc levels in pigs. Nature 606, 358–367 (2022). https://doi.org/10.1038/s41586-022-04769-z

Received: 29 July 2020
Accepted: 19 April 2022
Published: 27 April 2022
Version of record: 01 June 2022
Issue date: 09 June 2022
DOI: https://doi.org/10.1038/s41586-022-04769-z
