ABO genotype alters the gut microbiota by regulating GalNAc levels in pigs

Abstract

The composition of the intestinal microbiome varies considerably between individuals and is correlated with health1. Understanding the extent to which, and how, host genetics contributes to this variation is essential yet has proved to be difficult, as few associations have been replicated, particularly in humans2. Here we study the effect of host genotype on the composition of the intestinal microbiota in a large mosaic pig population. We show that, under conditions of exacerbated genetic diversity and environmental uniformity, microbiota composition and the abundance of specific taxa are heritable. We map a quantitative trait locus affecting the abundance of Erysipelotrichaceae species and show that it is caused by a 2.3 kb deletion in the gene encoding N-acetyl-galactosaminyl-transferase that underpins the ABO blood group in humans. We show that this deletion is a ≥3.5-million-year-old trans-species polymorphism under balancing selection. We demonstrate that it decreases the concentrations of N-acetyl-galactosamine in the gut, and thereby reduces the abundance of Erysipelotrichaceae that can import and catabolize N-acetyl-galactosamine. Our results provide very strong evidence for an effect of the host genotype on the abundance of specific bacteria in the intestine combined with insights into the molecular mechanisms that underpin this association. Our data pave the way towards identifying the same effect in rural human populations.

Fig. 1: Intestinal microbiota of the healthy pig.
Fig. 2: Heritability of microbiota composition in mosaic pigs.
Fig. 3: A miQTL affecting Erysipelotrichaceae species.
Fig. 4: A 3.5-million-year-old deletion in the pig ABO orthologue causes the miQTL.
Fig. 5: The miQTL acts by decreasing GalNAc concentrations and affects GalNAc-using bacteria.
Fig. 6: The GalNAc operon organization and transcriptome response of miQTL-responsive bacteria.

Data availability

All the 16S rRNA sequencing data, the metagenomics sequence data and the RNA-seq data were submitted to the GSA database under accession numbers CRA006230, CRA006239, CRA006240 and CRA006216. The genotype data were deposited at the GVM (http://bigd.big.ac.cn/gvm/getProjectDetail?project=GVM000310) under accession number GVM000310. The GWAS summary statistics are available at Figshare (https://doi.org/10.6084/m9.figshare.19313960). The whole-genome sequencing data of experimental pigs have been deposited in the GSA database (https://ngdc.cncb.ac.cn/gsa/browse/CRA006383) under accession number CRA006383. The source data are available at GitHub (https://github.com/yanghuijxau/Manuscript-microbiota-ABO).

Code availability

The code to replicate the findings and the source data are available at GitHub (https://github.com/yanghuijxau/Manuscript-microbiota-ABO).

References

  1. Kundu, P., Blacher, E., Elinav, E. & Pettersson, S. Our gut microbiome: the evolving inner self. Cell 171, 1481–1493 (2017).

  2. Rothschild, D. et al. Environment dominates over host genetics in shaping human gut microbiota. Nature 555, 210–215 (2018).

  3. O’Hara, E., Neves, A. L. A., Song, Y. & Guan, L. L. The role of the gut microbiome in cattle production and health: driver or passenger? Annu. Rev. Anim. Biosci. 8, 199–220 (2020).

  4. Schmidt, T. S. B., Raes, J. & Bork, P. The human gut microbiome: from association to modulation. Cell 172, 1198–1215 (2018).

  5. Polderman, T. J. C. et al. Meta-analysis of the heritability of human traits based on 50 years of twin studies. Nat. Genet. 47, 702–709 (2015).

  6. Polubriaginof, F. C. G. et al. Disease heritability inferred from familial relationships reported in medical records. Cell 173, 1692–1704 (2018).

  7. Benson, A. K. et al. Individuality in gut microbiota composition is a complex polygenic trait shaped by multiple environmental and host genetic factors. Proc. Natl Acad. Sci. USA 107, 18933–18938 (2010).

  8. Yatsunenko, T. et al. Human gut microbiome viewed across age and geography. Nature 486, 222–227 (2012).

  9. Goodrich, J. K. et al. Human genetics shape the gut microbiome. Cell 159, 789–799 (2014).

  10. Blekhman, R. et al. Host genetic variation impacts microbiome composition across human body sites. Genome Biol. 16, 191 (2015).

  11. Turpin, W. et al. Association of host genome with intestinal microbial composition in a large healthy cohort. Nat. Genet. 48, 1413–1417 (2016).

  12. Bonder, M. J. et al. The effect of host genetics on the gut microbiome. Nat. Genet. 48, 1407–1412 (2016).

  13. Wang, J. et al. Genome-wide association analysis identifies variation in vitamin D receptor and other host factors influencing the gut microbiota. Nat. Genet. 48, 1396–1406 (2016).

  14. Hughes, D. A. et al. Genome-wide associations of human gut microbiome variation and implications for causal inference analyses. Nat. Microbiol. 5, 1079–1087 (2020).

  15. Sankararaman, S. et al. The genomic landscape of Neanderthal ancestry in present-day humans. Nature 507, 354–357 (2014).

  16. Patterson, N. et al. Genetic evidence for complex speciation of humans and chimpanzees. Nature 441, 1103–1108 (2006).

  17. Donaldson, G. P., Lee, S. M. & Mazmanian, S. K. Gut biogeography of the bacterial microbiota. Nat. Rev. Microbiol. 14, 20–32 (2016).

  18. Radjabzadeh, D. et al. Diversity, compositional and functional differences between gut microbiota of children and adults. Sci. Rep. 10, 1040 (2020).

  19. Goodrich, J. K. et al. Genetic determinants of the gut microbiome in UK twins. Cell Host Microbe 19, 731–743 (2016).

  20. Cooling, L. Blood groups in infection and host susceptibility. Clin. Microbiol. Rev. 28, 801–870 (2015).

  21. Rühlemann, M. C. et al. Genome-wide association study in 8,956 German individuals identifies influence of ABO histo-blood groups on gut microbiome. Nat. Genet. 53, 147–155 (2021).

  22. Lopera-Maya, E. E. et al. Effect of host genetics on the gut microbiome in 7,738 participants of the Dutch microbiome project. Nat. Genet. 54, 143–151 (2022).

  23. Qin, Y. et al. Combined effects of host genetics and diet on human gut microbiota and incident disease in a single population cohort. Nat. Genet. 54, 134–142 (2022).

  24. Choi, M. K. et al. Determination of complete sequence information of the human ABO blood group orthologous gene in pigs and breed differences in blood type frequencies. Gene 640, 1–5 (2018).

  25. Wang, S. et al. Design of glycosyl transferase inhibitors: serine analogues as pyrophosphate surrogates? ChemPlusChem 80, 1525–1532 (2015).

  26. Ségurel, L. et al. The ABO blood group is a trans-species polymorphism in primates. Proc. Natl Acad. Sci. USA 109, 18493–18498 (2012).

  27. Groenen, M. A. M. A decade of pig genome sequencing: a window on pig domestication and evolution. Genet. Sel. Evol. 48, 23–32 (2016).

  28. Ravcheev, D. A. & Thiele, I. Comparative genomic analysis of the human gut microbiome reveals a broad distribution of metabolic pathways for the degradation of host-synthesized mucin glycans and utilization of mucin-derived monosaccharides. Front. Genet. 8, 111 (2017).

  29. Tailford, L. A. et al. Mucin glycan foraging in the human gut microbiome. Front. Genet. 6, 81 (2015).

  30. Lien, K. A., Sauer, W. C. & He, J. M. Dietary influences on the secretion into and degradation of mucin in the digestive tract of monogastric animals and humans. J. Anim. Feed Sci. 10, 223–245 (2001).

  31. Brinkkötter, A. B., Klöss, H., Alpert, C.-A. & Lengeler, J. W. Pathways for the utilization of N-acetyl-galactosamine and galactosamine in Escherichia coli. Mol. Microbiol. 37, 125–135 (2000).

  32. Rodionov, D. A. et al. Genomic encyclopedia of sugar utilization pathways in the Shewanella genus. BMC Genom. 11, 494 (2010).

  33. Leyn, S. A., Gao, F., Yang, C. & Rodionov, D. A. N-acetylgalactosamine utilization pathway and regulon in proteobacteria. J. Biol. Chem. 287, 28047–28056 (2012).

  34. Hu, Z., Patel, I. R. & Mukherjee, A. Genetic analysis of the roles of agaA, agaI, and agaS genes in the N-acetyl-d-galactosamine and d-galactosamine catabolic pathways in Escherichia coli strains O157:H7 and C. BMC Microbiol. 13, 94 (2013).

  35. Bidart, G. N., Rodriguez-Diaz, J., Monedero, V. & Yebra, M. J. A unique gene cluster for the utilization of the mucosal and human milk-associated glycans galacto-N-biose and lacto-N-biose in Lactobacillus casei. Mol. Microbiol. 93, 521–538 (2014).

  36. Zhang, H. et al. Two novel regulators of N-acetyl-galactosamine utilization pathway and distinct roles in bacterial infections. Microbiol. Open 4, 983–1000 (2015).

  37. Lawrence, J. Selfish operons: the evolutionary impact of gene clustering in prokaryotes and eukaryotes. Curr. Opin. Genet. Dev. 9, 642–648 (1999).

  38. Koonin, E. V. Evolution of genome architecture. Int. J. Biochem. Cell Biol. 41, 298–306 (2009).

  39. Lombard, V. et al. The carbohydrate-active enzymes database (CAZy) in 2013. Nucleic Acids Res. 42, D490–D495 (2014).

  40. Rahfeld, P. et al. An enzymatic pathway in the human gut microbiome that converts A to universal O type blood. Nat. Microbiol. 4, 1475–1485 (2019).

  41. Rahfeld, P. et al. Prospecting for microbial α-N-acetylgalactosaminidases yields a new class of GH31 O-glycanase. J. Biol. Chem. 294, 16400–16415 (2019).

  42. Chen, Y. et al. ABO blood group and susceptibility to severe acute respiratory syndrome. JAMA 293, 1450–1451 (2005).

  43. Ellinghaus, D. et al. The ABO blood group locus and a chromosome 3 gene cluster associate with SARS-CoV-2 respiratory failure in an Italian-Spanish genome-wide association analysis. Preprint at medRxiv https://doi.org/10.1101/2020.05.31.20114991 (2020).

  44. Blancher, A. Evolution of the ABO supergene family. ISBT Sci. Ser. 8, 201–206 (2013).

  45. Makivuokko, H. et al. Association between the ABO blood group and the human intestinal microbiota composition. BMC Microbiol. 12, 94 (2012).

  46. Davenport, E. R. et al. ABO antigen and secretor statuses are not associated with gut microbiota composition in 1,500 twins. BMC Genom. 17, 941–955 (2016).

  47. Kurilshikov, A. et al. Large-scale association analyses identify host factors influencing human gut microbiome composition. Nat. Genet. 53, 156–165 (2021).

  48. Malmuthuge, N., Griebel, P. J. & Guan, L. L. Taxonomic identification of commensal bacteria associated with the mucosa and digesta throughout the gastrointestinal tracts of preweaned calves. Appl. Environ. Microbiol. 80, 2021–2028 (2014).

  49. Hanson, M. E. B. et al. Population structure of human gut bacteria in a diverse cohort from rural Tanzania and Botswana. Genome Biol. 20, 16 (2019).

  50. Warr, A. et al. An improved pig reference genome sequence to enable pig genetics and genomics research. Gigascience 9 (2019).

  51. Li, H. & Durbin, R. Fast and accurate long-read alignment with Burrows-Wheeler transform. Bioinformatics 26, 589–595 (2010).

  52. Li, H. et al. The Sequence Alignment/Map format and SAMtools. Bioinformatics 25, 2078–2079 (2009).

  53. Rimmer, A. et al. Integrating mapping-, assembly- and haplotype-based approaches for calling variants in clinical sequencing applications. Nat. Genet. 46, 912–918 (2014).

  54. Chang, C. C. et al. Second-generation PLINK: rising to the challenge of larger and richer datasets. Gigascience 4, 7 (2015).

  55. Browning, S. R. & Browning, B. L. Rapid and accurate haplotype phasing and missing-data inference for whole-genome association studies by use of localized haplotype clustering. Am. J. Hum. Genet. 81, 1084–1097 (2007).

  56. Coppieters, W., Karim, L. & Georges, M. SNP-based quantitative deconvolution of biological mixtures: application to the detection of cows with subclinical mastitis by whole genome sequencing of tank milk. Genome Res. 30, 1201–1207 (2020).

  57. Bolger, A. M., Lohse, M. & Usadel, B. Trimmomatic: a flexible trimmer for Illumina sequence data. Bioinformatics 30, 2114–2120 (2014).

  58. Magoč, T. & Salzberg, S. L. FLASH: fast length adjustment of short reads to improve genome assemblies. Bioinformatics 27, 2957–2963 (2011).

  59. Edgar, R. C. Search and clustering orders of magnitude faster than BLAST. Bioinformatics 26, 2460–2461 (2010).

  60. Rognes, T., Flouri, T., Nichols, B., Quince, C. & Mahe, F. VSEARCH: a versatile open source tool for metagenomics. PeerJ 4, e2584 (2016).

  61. Wang, Q., Garrity, G. M., Tiedje, J. M. & Cole, J. R. Naive Bayesian classifier for rapid assignment of rRNA sequences into the new bacterial taxonomy. Appl. Environ. Microbiol. 73, 5261–5267 (2007).

  62. Schloss, P. D. et al. Introducing mothur: open-source, platform-independent, community-supported software for describing and comparing microbial communities. Appl. Environ. Microbiol. 75, 7537–7541 (2009).

  63. Cheema, M. U. & Pluznick, J. L. Gut microbiota plays a central role to modulate the plasma and fecal metabolomes in response to angiotensin II. Hypertension 74, 184–193 (2019).

  64. Zhou, X. & Stephens, M. Genome-wide efficient mixed-model analysis for association studies. Nat. Genet. 44, 821–824 (2012).

  65. Yang, J., Lee, S. H., Goddard, M. E. & Visscher, P. M. GCTA: a tool for genome-wide complex trait analysis. Am. J. Hum. Genet. 88, 76–82 (2011).

  66. Dixon, P. VEGAN, a package of R functions for community ecology. J. Veg. Sci. 14, 927–930 (2003).

  67. Visscher, P. M. et al. Assumption-free estimation of heritability from genome-wide identity-by-descent sharing between full siblings. PLoS Genet. 2, e41 (2006).

  68. Ziyatdinov, A. et al. lme4QTL: linear mixed models with flexible covariance structure for genetic studies of related individuals. BMC Bioinform. 19, 68 (2018).

  69. Haseman, J. K. & Elston, R. C. The investigation of linkage between a quantitative trait and a marker locus. Behav. Genet. 2, 3–19 (1972).

  70. Aulchenko, Y. S., Ripke, S., Isaacs, A. & van Duijn, C. M. GenABEL: an R library for genome-wide association analysis. Bioinformatics 23, 1294–1296 (2007).

  71. Willer, C. J., Li, Y. & Abecasis, G. R. METAL: fast and efficient meta-analysis of genomewide association scans. Bioinformatics 26, 2190–2191 (2010).

  72. Kolmogorov, M., Yuan, J., Lin, Y. & Pevzner, P. A. Assembly of long, error-prone reads using repeat graphs. Nat. Biotechnol. 37, 540–546 (2019).

  73. Vaser, R., Sovic, I., Nagarajan, N. & Sikic, M. Fast and accurate de novo genome assembly from long uncorrected reads. Genome Res. 27, 737–746 (2017).

  74. Li, H. Aligning sequence reads, clone sequences and assembly contigs with BWA-MEM. Preprint at arXiv https://arxiv.org/abs/1303.3997 (2013).

  75. Walker, B. J. et al. Pilon: an integrated tool for comprehensive microbial variant detection and genome assembly improvement. PLoS ONE 9, e112963 (2014).

  76. Harris, R. S. Improved Pairwise Alignment of Genomic DNA. PhD thesis, Pennsylvania State Univ. (2007).

  77. Li, H. Minimap2: pairwise alignment for nucleotide sequences. Bioinformatics 34, 3094–3100 (2018).

  78. Dobin, A. et al. STAR: ultrafast universal RNA-seq aligner. Bioinformatics 29, 15–21 (2013).

  79. Liao, Y., Smyth, G. K. & Shi, W. FeatureCounts: an efficient general purpose program for assigning sequence reads to genomic features. Bioinformatics 30, 923–930 (2014).

  80. Ai, H. et al. Adaptation and possible ancient interspecies introgression in pigs identified by whole-genome sequencing. Nat. Genet. 47, 217–225 (2015).

  81. Van der Auwera, G. A. et al. From FastQ data to high confidence variant calls: the genome analysis Toolkit best practices pipeline. Curr. Protoc. Bioinform. 43, 11.10.11–11.10.33 (2013).

  82. DePristo, M. A. et al. A framework for variation discovery and genotyping using next-generation DNA sequencing data. Nat. Genet. 43, 491–498 (2011).

  83. Weir, B. S. & Cockerham, C. C. Estimating F-statistics for the analysis of population structure. Evolution 38, 1358–1370 (1984).

  84. Nei, M. F-statistics and analysis of gene diversity in subdivided populations. Ann. Hum. Genet. 41, 225–233 (1977).

  85. Kim, D., Paggi, J. M., Park, C., Bennett, C. & Salzberg, S. L. Graph-based genome alignment and genotyping with HISAT2 and HISAT-genotype. Nat. Biotechnol. 37, 907–915 (2019).

  86. Hunt, M. et al. Circlator: automated circularization of genome assemblies using long sequencing reads. Genome Biol. 16, 294 (2015).

  87. Hyatt, D. et al. Prodigal: prokaryotic gene recognition and translation initiation site identification. BMC Bioinform. 11, 119 (2010).

  88. Chen, S., Zhou, Y., Chen, Y. & Jia, G. fastp: an ultra-fast all-in-one FASTQ preprocessor. Bioinformatics 34, i884–i890 (2018).

  89. Li, D. et al. MEGAHIT v1.0: a fast and scalable metagenome assembler driven by advanced methodologies and community practices. Methods 102, 3–11 (2016).

  90. Uritskiy, G. V., DiRuggiero, J. & Taylor, J. MetaWRAP—a flexible pipeline for genome-resolved metagenomic data analysis. Microbiome 6, 158 (2018).

  91. Parks, D. H., Imelfort, M., Skennerton, C. T., Hugenholtz, P. & Tyson, G. W. CheckM: assessing the quality of microbial genomes recovered from isolates, single cells, and metagenomes. Genome Res. 25, 1043–1055 (2015).

  92. Olm, M. R., Brown, C. T., Brooks, B. & Banfield, J. F. dRep: a tool for fast and accurate genomic comparisons that enables improved genome recovery from metagenomes through de-replication. ISME J. 11, 2864–2868 (2017).

  93. Kanehisa, M., Sato, Y. & Morishima, K. BlastKOALA and GhostKOALA: KEGG tools for functional characterization of genome and metagenome sequences. J. Mol. Biol. 428, 726–731 (2016).

  94. Segata, N., Bornigen, D., Morgan, X. C. & Huttenhower, C. PhyloPhlAn is a new method for improved phylogenetic and taxonomic placement of microbes. Nat. Commun. 4, 2304 (2013).

  95. Li, M. et al. Aldolase B suppresses hepatocellular carcinogenesis by inhibiting G6PD and pentose phosphate pathways. Nat. Cancer 1, 737–747 (2020).

  96. Nanchen, A., Fuhrer, T. & Sauer, U. Determination of metabolic flux ratios from 13C-experiments and gas chromatography-mass spectrometry data: protocol and principles. Methods Mol. Biol. 358, 177–197 (2007).

  97. van Winden, W. A. et al. Correcting mass isotopomer distributions for naturally occurring isotopes. Biotechnol. Bioeng. 80, 477–479 (2002).

  98. Staley, C. et al. Stable engraftment of human microbiota into mice with a single oral gavage following antibiotic conditioning. Microbiome 5, 87 (2017).

  99. Langmead, B. & Salzberg, S. L. Fast gapped-read alignment with Bowtie 2. Nat. Methods 9, 357–359 (2012).

  100. Love, M. I., Huber, W. & Anders, S. Moderated estimation of fold change and dispersion for RNA-seq data with DESeq2. Genome Biol. 15, 550 (2014).

  101. Momozawa, Y. et al. IBD risk loci are enriched in multigenic regulatory modules encompassing putative causative genes. Nat. Commun. 9, 2427 (2018).

  102. Bushnell, B. BBMap: A Fast, Accurate, Splice-Aware Aligner (version 38.82) https://sourceforge.net/projects/bbmap/ (2014).

  103. Köster, J. & Rahmann, S. Snakemake: a scalable bioinformatics workflow engine. Bioinformatics 28, 2520–2522 (2012).

  104. Bolyen, E., Rideout, J. R., Dillon, M. R., Bokulich, N. A. & Caporaso, J. G. Reproducible, interactive, scalable and extensible microbiome data science using QIIME 2. Nat. Biotechnol. 37, 852–857 (2019).

  105. Callahan, B. J. et al. DADA2: high-resolution sample inference from Illumina amplicon data. Nat. Methods 13, 581–583 (2016).

  106. Ghodsi, M., Liu, B. & Pop, M. DNACLUST: accurate and efficient clustering of phylogenetic marker genes. BMC Bioinform. 12, 271 (2011).

  107. Altschul, S. F., Gish, W., Miller, W., Myers, E. W. & Lipman, D. J. Basic local alignment search tool. J. Mol. Biol. 215, 403–410 (1990).

  108. Srivastava, A. et al. Genomes of the mouse collaborative cross. Genetics 206, 537–556 (2017).

  109. Yu, N. et al. Global patterns of human DNA sequence variation in a 10-kb region on chromosome 1. Mol. Biol. Evol. 18, 214–222 (2001).

  110. The 1000 Genomes Project Consortium. A map of human genome variation from population-scale sequencing. Nature 467, 1061–1073 (2010).

  111. Frantz, L. A. F. et al. Evidence of long-term gene flow and selection during domestication from analyses of Eurasian wild and domestic pig genomes. Nat. Genet. 47, 1141–1148 (2015).

  112. Charlier, C. et al. NGS-based reverse genetic screen for common embryonic lethal mutations compromising fertility in livestock. Genome Res. 26, 1333–1341 (2016).

  113. Georges, M., Charlier, C. & Hayes, B. Harnessing genomic information for livestock improvement. Nat. Rev. Genet. 20, 135–156 (2019).

  114. Geraldes, A. et al. Inferring the history of speciation in house mice from autosomal, X-linked, Y-linked and mitochondrial genes. Mol. Ecol. 17, 5349–5363 (2008).

  115. Huang, H. et al. Fine-mapping inflammatory bowel disease loci to single-variant resolution. Nature 547, 173–178 (2017).

  116. Suzuki, T. A. & Nachman, M. W. Spatial heterogeneity of gut microbial composition along the gastrointestinal tract in natural populations of house mice. PLoS ONE 11, e0163720 (2016).

  117. Vuik, F. E. R. et al. Composition of the mucosa-associated microbiota along the entire gastrointestinal tract of human individuals. UEG J. 7, 897–907 (2019).

  118. Rowe, J. A. et al. Blood group O protects against severe Plasmodium falciparum malaria through the mechanism of reduced rosetting. Proc. Natl Acad. Sci. USA 104, 17471–17476 (2007).

  119. Robinson, M. G., Tolchin, D. & Halpern, C. Enteric bacterial agents and the ABO blood groups. Am. J. Hum. Genet. 23, 135–145 (1971).

  120. Camus, D., Bina, J. C., Carlier, Y. & Santoro, F. ABO blood groups and clinical forms of schistosomiasis mansoni. Trans. R. Soc. Trop. Med. Hyg. 71, 182 (1977).

  121. Pereira, F. E. L., Bortolini, E. R., Carneiro, J. L. A., da Silva, C. R. M. & Neves, R. C. A, B, O blood groups and hepatosplenic form of schistosomiasis mansoni (Symmer’s fibrosis). Trans. R. Soc. Trop. Med. Hyg. 73, 238 (1979).

  122. Ndamba, J., Gomo, E., Nyazema, N., Makaza, N. & Kaondera, K. C. Schistosomiasis infection in relation to the ABO blood groups among school children in Zimbabwe. Acta Trop. 65, 181–190 (1997).

  123. Chaudhuri, A. & De, S. Cholera and blood groups. Lancet 2, 404 (1977).

  124. Boren, T. et al. Attachment of Helicobacter pylori to human gastric epithelium mediated by blood group antigens. Science 262, 1892–1895 (1993).

  125. Lindesmith, L. et al. Human susceptibility and resistance to Norwalk virus infection. Nat. Med. 9, 548–553 (2003).

  126. Galili, U. in α-Gal and Anti-Gal (eds Galili, U. & Avila, J. L.) Vol. 32, 1–23 (Springer, 1999).

  127. Prather, R. S., Shen, M. & Dai, Y. Genetically modified pigs for medicine and agriculture. Biotechnol. Genetic Eng. Rev. 25, 245–266 (2008).

Acknowledgements

We thank Y. He, S. Xiao, W. Li, Y. Guo and Y. Xing for assistance in the construction of the experimental mosaic pig populations; Y. Su and J. Li for preparation of reagents and management of samples; Y. Momozawa, R. Mariman, M. Mni, L. Karim and M. Dekkers for generating the CEDAR-1 16S rRNA data; the staff at the Jiangxi Department for Education, the Ministry of Science and Technology of P. R. China, the Ministry of Agriculture and Rural Affairs of P. R. China, and the Jiangxi Department of Science and Technology for their long-term support of the swine heterogeneous stock project; and the members of the MIQUANT consortium for comments and discussions. L.H. is supported by the National Natural Science Foundation of China (31790410) and the National Pig Industry Technology System (CARS-35); C. Chen by the National Natural Science Foundation of China (31772579); H.Y. by the National Postdoctoral Program for Innovative Talent (no. BX201700102); L.S. by the FNRS IBD-GI-Seq project; M.G. by the Chinese Thousand Talents Program, the Belgian EOS ‘Miquant’ project and the FNRS (CDR ‘GEM’ project). C. Charlier is a senior research associate at the FNRS.

Author information

Contributions

H.Y. analysed the 16S rRNA sequence data, performed GWAS, meta-analyses and local association analyses, computed heritabilities of individual taxa, contributed to ABO genotyping and analysed the effect of the 2.3 kb deletion on taxa abundance. J.W. analysed the composition of the microbiome, including PCoA analyses, β- and α-diversity, correlations between kinship and microbiome dissimilarities, isolated the OTU476-like strains, performed the GalNAc feeding experiments, measured the concentrations of GalNAc in the caecal lumen, analysed the GalNAc import and use pathway in the MAGs, and contributed to ABO genotyping. X.H. participated in 16S rRNA sequencing (F6) and GWAS (F6). Y. Zhou performed metagenome sequencing analysis, analysed the GalNAc import and use pathway in MAGs, analysed the RNA-seq data from caecum samples and contributed to ABO genotyping. Y. Zhang participated in the preparation of the genotype data from whole-genome sequence information, participated in the computation of the genomic contribution of the different breeds in the F6 and F7 generation and the definition of expected mapping resolution, performed LD analyses, performed eQTL analysis for the ABO gene, participated in the characterization and sequence analysis of the ABO gene, including definition of the 2.3 kb deletion, and in the balancing selection and trans-species polymorphism analyses. M.L. assisted with the isolation of the OTU476-like strains, the GalNAc feeding experiments and genotyping of the ABO gene. Q.L. assisted with measuring the concentrations of GalNAc in caecal lumen. S.K., M.H., H.F., S.F., X.X., H.J., Z.C. and J.G. assisted with the experiments. Z.Z., X.T., Z.W., H.G. and Y.H. assisted with the preparation of genotype data from whole-genome sequencing data and conducted the analysis of the Nanopore data of the ABO region. J.M. assisted with the construction of the mosaic population. H.A. assisted with the bioinformatic analysis of the ABO region, de novo assembly of the A allele, and evolutionary analysis of the ABO alleles. L.S. analysed the effect of ABO genotype on intestinal microbiota composition in humans. W.C. assisted in the analysis of the sequencing data for the trans-species polymorphisms. C. Charlier supervised the characterization of the ABO gene and the 2.3 kb deletion and the corresponding haplotype structure in the F0, F6 and F7 population and for the trans-species polymorphism. B.Y. prepared the genotype data of whole-genome variants, assisted with raising the heterogeneous stock, and participated in the computation of the genomic contribution of the different breeds in the F6 and F7 generation and the definition of expected mapping resolution. M.G. supervised the bioinformatic and statistical analyses, performed bioinformatic and statistical analyses, and wrote the paper. C. Chen codesigned the study, supervised experiments, supervised bioinformatic and statistical analyses of gut microbiome, and wrote the paper. L.H. created the swine heterogeneous stock, designed the study, directed the project, supervised the experiments and analyses, and wrote the paper.

Corresponding authors

Correspondence to Michel Georges, Congying Chen or Lusheng Huang.

Ethics declarations

Competing interests

The authors declare no competing interests.

Peer review

Peer review information

Nature thanks Catherine Lozupone, Vincent Plagnol and the other, anonymous, reviewer(s) for their contribution to the peer review of this work. Peer reviewer reports are available.

Additional information

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Extended data figures and tables

Extended Data Fig. 1 Generating a large mosaic pig population for genetic analysis of complex phenotypes.

(a) Rotational breeding design used for the generation of a large mosaic pig population for the genetic analysis of complex phenotypes, with sampling scheme for faeces (D25, D120, D240), luminal content of the ileum (IC) and caecum (CC), and mucosal scrapings in the ileum (IM) and caecum (CM). BX: Bamaxiang, EH: Erhualian, LA: Laiwu, TB: Tibetan, LW: Large White, LD: Landrace, PT: Piétrain, WD: White Duroc. (b) Average similarity (1 – π) between allelic sequences sampled within and between the eight founder breeds. The colour intensity ranges from black (breeds with lowest allelic similarity: BX vs WD, 1 - 4.3x10−3) to bright red (breed with highest allelic similarity: WD, 1 - 1.8x10−3). The acronyms for the breeds are as in (a). More than 30 million variants with MAF ≥ 3% segregate in this population, i.e. more than one variant every 100 base pairs. This is slightly lower than the 40 million high quality variants segregating in the mouse collaborative cross108. (c) Comparison of the average nucleotide diversity (π, i.e. the proportion of sites that differ between two chromosomes sampled at random in the population(s)) within and between European (Eur) and Asian (As) domestic pigs, and between modern European (HSEur), Asian humans (HSAs), Neanderthal (Neand) and Chimpanzee (Pan Trogl). The average nucleotide diversity within the four Chinese founder breeds was ~2.5x10−3 and within the four European founder breeds ~2.0x10−3. By comparison, π-values within African and within Asian/European human populations are ~9x10−4 and ~8x10−4, respectively109,110. Thus, against intuition (as domestication is often assumed to have severely reduced effective population size) the within population diversity is >2-fold higher in domestic pigs than in human populations, as previously reported111,112,113. Nucleotide diversities between Chinese founder breeds and between European founder breeds were ~3.6x10−3 and ~2.5x10−3, respectively, i.e. 1.44-fold and 1.25-fold higher than the respective within-breed π-values. These π-values are of the same order of magnitude as the sequence divergence between Homo sapiens and Neanderthals/Denisovans (~3x10−3, ref. 15). By comparison, π-values between Africans, Asians and Europeans are typically ≤ ~1x10−3 (ref. 109). The nucleotide diversity between Chinese and European breeds averaged ~4.3x10−3. This π-value is similar to the divergence between M. domesticus and M. castaneus114, and close to half the ~1% difference between chimpanzee and human16. Note that Chinese and European pig breeds are derived from Chinese and European wild boars, respectively, which are thought to have diverged ~1 million years ago27, while M. domesticus and M. castaneus are thought to have diverged ≤ 500,000 years ago114. (d) Autosome-specific estimates of the genomic contributions of the eight founder breeds in the F6 and F7 generation. We used a linear model incorporating all variants to estimate the average contribution of the eight founder breeds in the F6 and F7 generation at genome and chromosome level56. At genome-wide level, the proportion of the eight founder breed genomes ranged from 11.2% (respectively 11.5%) to 14.1% (14.7%) in the F6 (F7) generations. At chromosome-specific level, the proportion of the eight founder breeds ranged from 6.7% (respectively 4.9%) to 20.7% (22.1%) in the F6 (F7) generations. The genomic contribution of the eight founder breeds in the F6 and F7 generation is remarkably uniform and close to expectations (i.e. 12.5%) both at genome-wide and chromosome-wide level, suggesting comparable levels of genetic diversity across the entire genome. This does not preclude that more granular examination may reveal local departures from expectations, or under-representation of incompatible allelic combinations at non-syntenic loci. (e-f) Indicators of achievable mapping resolution in the F6 generation: (e) Frequency distribution (density) of the number of variants in high LD (r2 ≥ 0.9) with an “index” variant (computed separately for all variants considered sequentially as the “index”), corresponding to the expected size of “credible sets” in GWAS115. The red vertical line corresponds to the genome-wide median. The green vertical line corresponds to the mapping resolution achieved in this study for the ABO locus (see hereafter). (f) Frequency distribution (density) of the maximum distance between an index variant and a variant in high LD (r2 ≥ 0.9) with it, defining the spread of credible sets. Red and green vertical lines are as in (e).
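
The nucleotide diversity π used in panels (b) and (c) is defined above as the proportion of sites that differ between two chromosomes sampled at random. As a minimal illustration only, and not the authors' pipeline, the Python sketch below computes π as the mean pairwise difference over a set of aligned toy sequences and reproduces the 1.44-fold and 1.25-fold ratios from the rounded π-values quoted in the caption. The function names and the toy sequences are hypothetical.

```python
from itertools import combinations


def pairwise_diff(seq_a, seq_b):
    """Proportion of aligned sites that differ between two sequences."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    mismatches = sum(1 for a, b in zip(seq_a, seq_b) if a != b)
    return mismatches / len(seq_a)


def nucleotide_diversity(seqs):
    """Mean pairwise difference (pi) over all distinct pairs of sequences."""
    pairs = list(combinations(seqs, 2))
    return sum(pairwise_diff(a, b) for a, b in pairs) / len(pairs)


# Toy alignment (hypothetical data, three chromosomes of ten sites each).
toy = ["ACGTACGTAC", "ACGTACGTAT", "ACGAACGTAC"]
pi_toy = nucleotide_diversity(toy)

# Ratios of between-breed to within-breed diversity, using the rounded
# values quoted in the caption.
fold_chinese = 3.6e-3 / 2.5e-3   # between vs within Chinese founder breeds
fold_european = 2.5e-3 / 2.0e-3  # between vs within European founder breeds

print(round(pi_toy, 4), round(fold_chinese, 2), round(fold_european, 2))
```

With real data π would be computed from genome-wide variant calls rather than from short aligned strings; the sketch only makes the definition and the fold-change arithmetic explicit.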

Extended Data Fig. 2 Characterizing the age- and location-specific composition of the intestinal microbiome of the healthy pig.

(a) Definition of a core intestinal microbiome of the pig. A total of 58 OTUs that were annotated to 21 taxa were identified in >95% of day 120 and 240 faeces and caecum content samples of both F6 and F7 generations, hence defined as core bacterial taxa. (b) The compositions of the porcine and human intestinal microbiota are closer to each other than either is to that of the mouse. Boxplots are as in Fig. 1c. The numbers of samples available for analysis were 1281 pigs, 106 humans and 6 mice. (c) Abundances (F6-F7 averages when available) of the 43 families represented in Fig. 1b in the seven sample types relative to the sample type in which they are the most abundant (red – blue scale). The families are ordered according to the sample type in which they are the most abundant. The colour-code for phyla is as in Fig. 1b. Columns are added for comparison with mouse and human. Mouse data are from Fig. 1 in Suzuki & Nachman116, and human data from Fig. 6 in Vuik et al117. P_I: proximal ileum, D_IL: distal ileum, C: caecum, CO: colon, RE: rectum, F: faeces. The families differing the most with regards to location-specific distribution between species include Helicobacteraceae, Veillonellaceae, Lactobacillaceae and Streptococcaceae.

Extended Data Fig. 3 Evaluating the heritability of intestinal microbiota composition in the mosaic pig population.

Correlation between heritability estimates of taxa/OTUs in F6 and F7 generation by sample type (D25, D120, D240, CC and IC). Correlation coefficients (r) and associated p-values (p) were computed using heritability estimates that were pre-corrected for bacterial abundance (residuals of linear model). Heritability estimates indeed tend to slightly increase with taxa abundance. Yet, results show that this effect cannot account for the observed correlations between F6 and F7 estimates in D120, D240 and CC, hence pointing towards genuine genetic effects. The shaded areas correspond to the 95% confidence region for the regression fit. Correlation coefficients and two-sided p-values were computed using Spearman’s rank-based method. Reported p-values are nominal (i.e. uncorrected for multiple testing).
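
The two-step procedure described in this caption (remove the dependence of heritability on abundance, then correlate the residuals between generations with Spearman's method) can be sketched as follows. This is an illustrative sketch under stated assumptions: a simple one-predictor linear regression is assumed for the pre-correction, all function names are hypothetical, and the numbers are made up rather than taken from the study.

```python
def ranks(values):
    """1-based ranks; tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = mean_rank
        i = j + 1
    return result


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length lists."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def spearman(x, y):
    """Spearman correlation: Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))


def residuals(y, x):
    """Residuals of the least-squares fit y ~ intercept + slope * x."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxx = sum((a - mx) ** 2 for a in x)
    slope = sum((a - mx) * (b - my) for a, b in zip(x, y)) / sxx
    return [b - (my + slope * (a - mx)) for a, b in zip(x, y)]


# Made-up values for six taxa: log abundance and heritability per generation.
log_abundance = [0.5, 1.0, 1.5, 2.0, 2.5, 3.0]
h2_f6 = [0.05, 0.20, 0.10, 0.35, 0.15, 0.40]
h2_f7 = [0.08, 0.18, 0.12, 0.30, 0.20, 0.38]

res_f6 = residuals(h2_f6, log_abundance)
res_f7 = residuals(h2_f7, log_abundance)
rho = spearman(res_f6, res_f7)
print(round(rho, 3))
```

A correlation that persists after this correction cannot be explained by abundance alone, which is the argument the caption makes for D120, D240 and CC.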

Extended Data Fig. 4 Identifying a microbiota QTL (miQTL) with major effect on the abundance of Erysipelotrichaceae species by whole genome sequence based GWAS.

(a) Schematic illustration of the samples and SNPs used for the two types of analyses (abundance and presence/absence) performed for miQTL mapping. (b) (Upper) Distribution of log(1/p) values for 1,527 sets of 11 p-values obtained in 11 data-series for a SNP x taxon x analysis model combination that yielded a genome-wide significant signal (p < 5 x 10−8) in the 12th data-series. (Lower) Distribution of log(1/p) values for 1,527 sets of 11 p-values obtained in the same data-series and with the same analysis model as in (upper) but with randomly selected SNP x taxon combinations matching the ones in (upper) for MAF and taxa abundance. Log(1/p) values were computed using GenABEL as described in Methods. Corresponding p-values are nominal and two-sided. (c) Correlation between the average (F6 and F7) taxon heritability, and the average (F6 and F7) number of genome-wide significant (p ≤5 × 10−8) miQTL for D240 faecal samples. The shaded area corresponds to the 95% confidence region for the regression fit. Correlation coefficient and associated p-values are Spearman’s. (d) QQ plot for 1,527 (number of signals (SNP x taxon x model x one data series in one cohort) exceeding the genome-wide log(1/p) threshold value of 7.3) sets of ≤ 5-7 p-values (same SNP x taxon x model, all data series in the other cohort) for real SNPs (Blue: quantitative model; Green: binary model), and matched sets of ≤ 5-7 p-values corresponding to randomly selected SNP x taxon combinations matched for MAF and abundance or presence/absence rate (Brown: quantitative model; Yellow: binary model). Log(1/p) values were computed using GenABEL as described in Methods. Corresponding p-values are nominal and two-sided. (e) Same QQ plot as in (d) after removal of all SNPs in the chromosome 1: 272.8-273.1Mb interval. Log(1/p) values were computed using GenABEL as described in Methods. Corresponding p-values are nominal and two-sided. (f) Distribution of the association log(1/p) values and corresponding signed z-scores for SNP 1_272907239 and 31 p-75-a5 OTUs (red) and 83 Erysipelotrichaceae (yellow) OTUs, showing an enrichment of effects with same sign as for OTU476 and OTU327. Log(1/p) values were computed using Metal (v3.0) as described in Methods. Corresponding p-values are nominal and two-sided. See also Supplemental discussion 1.
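
The log(1/p) threshold of 7.3 quoted in panel (d) is the base-10 transform of the genome-wide significance level p = 5 × 10−8. The short Python sketch below checks that arithmetic and shows one common way of building the points of a QQ plot. It is illustrative only: the expected quantiles use the (i − 0.5)/n plotting positions, which is an assumed convention and not necessarily the one used by the authors, and the function names are hypothetical.

```python
import math


def log_inv_p(p):
    """log10(1/p), i.e. -log10(p)."""
    return -math.log10(p)


# Genome-wide significance threshold on the log(1/p) scale.
threshold = log_inv_p(5e-8)
print(round(threshold, 1))


def qq_points(p_values):
    """(expected, observed) log(1/p) pairs under a uniform null."""
    n = len(p_values)
    observed = sorted(log_inv_p(p) for p in p_values)
    expected = sorted(log_inv_p((i - 0.5) / n) for i in range(1, n + 1))
    return list(zip(expected, observed))


# Made-up p-values; points far above the diagonal indicate excess signal.
points = qq_points([0.5, 0.01, 0.2, 0.8, 0.03])
```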

Extended Data Fig. 5 The chromosome 1 miQTL is caused by a 2.3 kb deletion in the orthologue of the human ABO gene.

(a) Breakpoints of the 2.3 kb deletion showing the role of a duplicated SINE sequence in mediating an intra-chromosomal recombination. (b) Illustrative example of allelic balance for the cG146C SNP in an AA homozygote and of allelic imbalance for the same SNP in an AO heterozygote. (c) (Upper) eQTL analysis for the porcine AO gene maximizing at the exact position of the 2.3 kb deletion (p = 1.9x10−43) and showing the additive effect of the A allele increasing transcript levels ~3-fold (inset; FPKM: Fragments Per Kilobase of transcript per Million mapped reads). The “n’s” correspond to the number of animals of each genotype available for analysis. Boxplots are as in Fig. 1c. (Lower) Genome wide eQTL scan for the porcine ABO gene showing the strong cis-eQTL signal on chromosome 1. eQTL analysis was conducted with GEMMA (v0.97)64. Reported log-transformed p-values are nominal and two-sided. (d) Effect of N-acetyl-galactosaminyl transferase genotype (AA, AO or OO) on abundance of OTU327 and p-75-a5 in the twelve data series. Absence of an effect of N-acetyl-galactosaminyl transferase genotype (AA, AO or OO) on abundance of E. coli in the twelve data series. Sample sizes are as in STable 4.1. Boxplots are as in Fig. 3d. (e) Abundance of OTU476, OTU327 and p-75-a5 in the twelve data series. Violin plots with indication of the median. Numbers (n’s) are as in STable 4.1. See also Supplemental discussion 2.

Extended Data Fig. 6 cis-eQTL analyses in the vicinity of the chromosome 1 miQTL support the causality of the 2.3 kb deletion.

(a) Cis-eQTL analysis for the porcine N-acetyl-galactosaminyl transferase (“ABO”), GBGT1, LCN1 (=OBP2B), MED22 and SURF6 genes in caecum. The blue triangle corresponds to the top SNP for the miQTL. The red triangles correspond to the top SNPs for the respective cis-eQTL. Only for N-acetyl-galactosaminyl transferase are blue and red variants the same. eQTL analyses were conducted with GEMMA (v0.97)64. Reported log-transformed p-values are nominal and two-sided. (b) Effect of AO genotype on the expression levels of the corresponding genes in caecum. There was no evidence for an effect of AO genotype on the expression of any of these genes other than ABO. The numbers of AA, AO and OO samples available for cis-eQTL analysis for each gene are given (n). Boxplots are as in Fig. 1c. We tested the difference in gene expression level between pairs of genotype classes using a two-sided t-test. (c) Effect of the top cis-eQTL SNPs (blue triangles in A) on OTU476 abundance. Only the top cis-eQTL SNP for ABO has an effect on OTU476 abundance. The numbers of AA, AO and OO samples available for miQTL analysis for each gene are given (n). Boxplots are as in Fig. 1c. We tested the difference in bacterial abundance between pairs of genotype classes using a two-sided t-test.

Extended Data Fig. 7 The 2.3 kb deletion in the orthologue of the human ABO gene is 3.5 million years old and under balancing selection.

(a) UPGMA tree based on nucleotide diversities between 14 AA and 34 OO animals in windows of increasing size (0.5 to 40 kb) centred on the 2.3 kb deletion in the porcine N-acetyl-galactosaminyl transferase gene (porcine O allele). PA: Phacochoerus africanus, SC: Sus cebifrons, SV: Sus verrucosus, SU: Sus scrofa vittatus, CB: Chinese wild boar, RB: Russian wild boar, EB: European wild boar, ERH: Erhualian, BX: Bamaxiang, T: Tibetan, LA: Laiwu, LR: Landrace, LW: Large White, PI: Piétrain, WD: White Duroc. Context: To gain additional insights into the age of the porcine O allele, we generated phylogenetic trees of the A and O alleles of 14 AA and 34 OO animals including domestic pigs, wild boars, Visayan and Javanese warty pigs, and common African warthog. Examination of their local SNP genotypes (50K window encompassing the ABO gene) reveals traces of ancestral recombinations between O and A haplotypes as close as 300 and 800 base pairs from the proximal and distal deletion breakpoints, respectively, as well as multiple instances of homoplasy that may either be due to recombination, gene conversion or recurrent de novo mutations. On their own, these signatures support the old age of the O allele. We constructed UPGMA trees based on nucleotide diversity for windows ranging from 500 bp to 40 kb centred on the 2.3 kb deletion. Smaller windows have a higher likelihood to compare the genuine ancestral O versus A states, yet yield less robust trees because they are based on a smaller number of variants. Larger windows will increasingly be contaminated with recombinant A-O haplotypes blurring the sought signal. Indeed, for windows ≥ 20 kb, the gene tree corresponds to the species tree, while for windows ≤ 15 kb the tree sorts animals by AA vs OO genotype. For all windows ≤ 15 kb the Sus cebifrons O allele maps outside of the Sus scrofa O allele supporting a deep divergence (rather than hybridization) and hence the old age of the O allele. Of note, for windows ≤1.2 kb, the warthog A allele is more closely related to the Sus A alleles than to the Sus O alleles (ED7A). This suggests that the O allele may be older than the divergence of the Phacochoerus and Sus A alleles, i.e. > 10 MYA. It will be interesting to study larger numbers of warthog to see whether the same 2.3 kb deletion exists in this and other related species as well. (b) Alignment of ~900 base pairs of the O alleles of domestic pigs (Bamaxiang), European and Asian wild boars, and Sus cebifrons demonstrating that these are identical-by-descent. The SINE element that is presumed to have mediated the recombinational event that caused the 2.3 kb deletion is highlighted in red. Context: To further support their identity-by-descent we aligned ~900 base pairs (centred on the position of the 2.3 kb deletion) of the O alleles of domestic pig, European and Asian wild boars and Sus cebifrons. The sequences were nearly identical further supporting our hypothesis. It is noteworthy that the old age of the “O” allele must have contributed to the remarkable mapping resolution (≤3 kb) that was achieved in this study. In total, 42 variants were in near perfect LD (r2 ≥ 0.9) with the 2.3 kb deletion in the F0 generation, spanning 2,298 bp (1,522 on the proximal side, and 762 on the distal side of the 2.3 kb deletion). This 2.3 kb span is lower than genome-wide expectations (17th percentile), presumably due to the numerous cross-overs that have accrued since the birth of the 2.3 kb deletion that occurred in the distant past. Yet the number of informative variants within this small segment is higher than the genome-wide average (57th percentile), also probably due at least in part to the accumulation of numerous mutations since the remote time of coalescence of the A and O alleles (see Fig. 1d in main text). (c) QQ plots for the effect of AO genotype on 150 phenotypes pertaining to meat quality, growth, carcass composition, hematology, health, and other phenotypes in the F6 and F7 generation. P-values were obtained using a mixed model followed by meta-analysis (weighted Z score) across the F6 and F7 generations as described in Methods. Log-transformed p-values used for the QQ plot are nominal and two-sided. Context: Our findings in Suidae are reminiscent of the trans-species polymorphism of the ABO gene in primates attributed to balancing selection26. The phenotypes driving balancing selection remain largely unknown yet a tug of war with pathogens is usually invoked: synthesized glycans may affect pathogen adhesion, toxin binding or act as soluble decoys, while naturally occurring antibodies may be protective20,44. In humans, the O allele may protect against malaria118, E. coli and Salmonella enteric infection119, SARS-CoV-1 (ref. 42), SARS-CoV-2 (ref. 43) and schistosomiasis120,121,122, while being a possible risk factor for cholera123, H. pylori124 and norovirus infection125. Whatever the underlying selective force, it appears to have operated independently in at least two mammalian branches (primates and Suidae), over exceedingly long periods of time, and over broad geographic ranges, hence pointing towards its pervasive nature. To gain insights into what selective forces might underpin the observed balanced polymorphism, we tested the effect of porcine AO genotype on >150 traits measured in the F6 and F7 generations pertaining to carcass composition, growth, meat quality, hematological parameters, disease resistance and behaviour. No significant effects were observed when accounting for multiple testing, including those pertaining to immunity and disease resistance. (d) Expression profile of the AO gene in a panel of adult and embryonic porcine tissues (own RNA-Seq data).
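
Panel (c) combines results across the F6 and F7 generations with a weighted Z-score meta-analysis. A minimal sketch of that general technique is given below. It assumes weights equal to the square root of each cohort's sample size, a common choice that the caption itself does not specify, and it takes a two-sided p-value plus an effect direction per cohort as input. Function names, sample sizes and p-values are hypothetical.

```python
import math
from statistics import NormalDist

_NORMAL = NormalDist()  # standard normal distribution


def signed_z(p_two_sided, effect_sign):
    """Convert a two-sided p-value and an effect direction into a z-score."""
    z = _NORMAL.inv_cdf(1 - p_two_sided / 2)
    return math.copysign(z, effect_sign)


def weighted_z_meta(p_values, signs, sample_sizes):
    """Weighted Z-score meta-analysis (weights assumed to be sqrt(n))."""
    weights = [math.sqrt(n) for n in sample_sizes]
    zs = [signed_z(p, s) for p, s in zip(p_values, signs)]
    numerator = sum(w * z for w, z in zip(weights, zs))
    z_meta = numerator / math.sqrt(sum(w * w for w in weights))
    p_meta = 2 * (1 - _NORMAL.cdf(abs(z_meta)))
    return z_meta, p_meta


# Two cohorts with consistent effect directions reinforce each other.
z_same, p_same = weighted_z_meta([0.05, 0.05], [1, 1], [100, 100])

# Opposite directions with equal weights cancel out.
z_opp, p_opp = weighted_z_meta([0.05, 0.05], [1, -1], [100, 100])
```

The contrast between the two calls illustrates why the direction of effect matters: only signals that agree in sign across generations gain significance when combined.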

Extended Data Fig. 8 The chromosome 1 miQTL affects caecal N-acetyl-D-galactosamine (GalNAc) concentrations which are correlated with the abundance of Erysipelotrichaceae species within AO genotype: theory.

(a) ABO and α-gal epitopes in pigs and human. The glycosyltransferase gene located on 9q34.2 and underpinning the human ABO blood group is characterized in most human populations by three major alleles: (i) IA encoding a α-3-N-acetyl-D-galactosaminyltransferase that is adding GalNAc to H and Lewis antigens (yielding the A antigen) on various glycoproteins including mucins secreted in the intestinal lumen, (ii) IB encoding a α-3-D-galactosyltransferase that is adding galactose to the same antigens (yielding the B antigen), and (iii) the inactive IO null allele that precludes expression of either the A or the B antigen. Mutations in the fucosyltransferase 2 gene (FUT2) preclude formation of the H antigen on secreted proteins and hence the detection of A and B antigens in secretions20. The pig orthologue of the human ABO glycosyltransferase gene is located on the telomeric end of porcine chromosome 1q, and is characterized by two major alleles: (i) the A allele, encoding a α-3-N-acetyl-D-galactosaminyltransferase that is adding GalNAc to H and Lewis antigens, similar to the human IA allele, and (ii) the O allele corresponding to a null allele as a result of a 2.3 kb deletion similar to the human IO allele24. Thus, the B antigen (Galα1-3(Fucα1-2)Galβ1-4GlcNAc-R) is not observed in pig populations. However, what is found abundantly on the surface of cells in many tissues is the so-called “α-gal epitope” (Galα1-3Galβ1-4GlcNAc-R), which results from the addition of a galactose to the Galβ1-4GlcNAc-R precursor by a α1,3galactosyltransferase encoded by the GGTA1 gene. The orthologue of the GGTA1 gene is non-functional in human and Old World non-human primates, which, however, have high titers of circulating anti-α-gal antibodies contributing to acute rejection of xenografts126,127. (b) Identifying whether changes in GalNAc concentration are the cause of the observed changes in abundance of Erysipelotrichaceae species by searching for a correlation between the two phenotypes “within AO genotype”. (b1) If AO genotype is associated with the abundance of Erysipelotrichaceae species and GalNAc concentrations by virtue of different molecular mechanisms (for instance because they involve distinct causative mutations albeit in linkage disequilibrium, or because the gene has an as yet unknown other activity that is causing the change in bacterial abundance, independently of its glycosyltransferase activity), there is no reason to expect a correlation between bacterial abundance and GalNAc concentration within AO genotype (red horizontal lines in the dotted circles). There is of course a correlation across genotypes that is due to the fact that AO genotype has a (direct or indirect) effect on both phenotypes. (b2) If, on the other hand, AO genotype causes the change in GalNAc concentration (which is very likely given its known enzymatic activity) which then causes the change in the abundance of Erysipelotrichaceae species, one can expect that bacterial abundance and GalNAc concentration will be correlated, also within AO genotype, as indicated by the sloped red lines within the dotted ellipses. This is what is observed with the real data.
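
The reasoning in (b1) and (b2) can be made concrete with a toy calculation. The Python sketch below uses made-up numbers for nine animals: it removes the genotype effect by centring each phenotype on its genotype-class mean and then correlates the residuals. For brevity it uses Pearson's correlation, whereas the analysis of the real data reported in this paper used Spearman's rank-based method, and all names and values are hypothetical.

```python
from collections import defaultdict


def center_within_groups(values, groups):
    """Subtract each group's mean from the values belonging to that group."""
    sums, counts = defaultdict(float), defaultdict(int)
    for v, g in zip(values, groups):
        sums[g] += v
        counts[g] += 1
    return [v - sums[g] / counts[g] for v, g in zip(values, groups)]


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length lists."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


genotype = ["OO", "OO", "OO", "AO", "AO", "AO", "AA", "AA", "AA"]
galnac = [1.0, 1.2, 1.4, 2.0, 2.2, 2.4, 3.0, 3.2, 3.4]

# Scenario b1: abundance depends on genotype only; the variation inside
# each genotype class is unrelated to GalNAc.
abund_b1 = [1.3, 1.0, 1.3, 2.3, 2.0, 2.3, 3.3, 3.0, 3.3]

# Scenario b2: abundance tracks GalNAc, also inside each genotype class.
abund_b2 = [0.9, 1.3, 1.5, 2.1, 2.1, 2.5, 2.9, 3.3, 3.5]

galnac_res = center_within_groups(galnac, genotype)
r_across_b1 = pearson(galnac, abund_b1)
r_within_b1 = pearson(galnac_res, center_within_groups(abund_b1, genotype))
r_within_b2 = pearson(galnac_res, center_within_groups(abund_b2, genotype))
print(round(r_across_b1, 2), round(r_within_b1, 2), round(r_within_b2, 2))
```

In the toy b1 data the across-genotype correlation is strong while the within-genotype correlation vanishes; in the b2 data the correlation survives the removal of the genotype effect, which is the pattern the caption reports for the real data.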

Extended Data Fig. 9 The chromosome 1 miQTL affects caecal N-acetyl-D-galactosamine (GalNAc) concentrations which are correlated with the abundance of Erysipelotrichaceae species within AO genotype: results.

(a) Positive correlation between caecal GalNAc concentrations and bacterial abundance (upper panels: p-75-a5; lower panels: OTU327) “within AO genotype”. GalNAc concentrations and bacterial abundances were corrected for batch effects and AO genotype and scaled between 0 and 1 to equalize residual variance. Correlations were computed using all samples jointly and Spearman’s rank-based test; corresponding p-values (nominal; two-sided) are given (left panels). Regression lines are shown for the different AO genotypes separately (right panels); all of them are positive. Note that the scatter plots for p-75-a5 are not identical but very similar to those for OTU476 (Fig. 5b, c). This is because OTU476 accounts for most of the p-75-a5 genus in caecum content (see also Extended Data Fig. 5). These data can therefore not be considered to be independent. The shaded areas correspond to the 95% confidence regions for the regression fit. (b) Comparison of the free GalNAc concentrations in caecal content of OO, AO and AA pigs as well as in caecal content of germ-free mice gavaged with 200 mg/kg GalNAc. Concentrations were determined in freeze-dried caecal content powder using LC-MS/MS. Numbers of analyzed samples are given (n). Boxplots are as in Fig. 1c.

Extended Data Fig. 10 The chromosome 1 miQTL affects bacteria with a functional GalNAc import and catabolic pathway.

Presence anywhere in the genome (green), presence in close proximity to agaS (red), or absence (black) of the orthologues of 24 genes implicated in the GalNAc TR/CP pathway in the genome of (i) two OTU476-like strains (4-15-1 and 4-8-110), (ii) 248 MAGs assigned to the Erysipelotrichaceae family, and (iii) 2,863 MAGs assigned to other bacterial families. The two lanes on the right of the three panels correspond to the Regulon (red) and Pathway (green) score respectively. Both scores range from 0 (black) to 6 (bright red or green). Means (range) for the corresponding dataset are given on top. P-values (nominal, two-sided, uncorrected) of the pathway and regulon scores were computed using a linear model described in Methods.

Extended Data Fig. 11 Different GalNAc operon structure and transcriptome response in miQTL-sensitive versus -insensitive GalNAc utilizing bacteria.

Maps of GalNAc “operons” in one of the two OTU476-like strains (NB: The organization of the GalNAc gene cluster was identical in both 4-15-1 and 4-8-110 strains), and six MAGs assigned respectively to an Erysipelotrichaceae, E. coli (an Enterobacteriaceae), a Collinsella (a Coriobacteriaceae), a Fusobacteriaceae, a Firmicutes and a Clostridium. Identified Open Reading Frames (ORFs) are represented as coloured boxes. Genes implicated in GalNAc import and catabolism are in red if they are part of the cluster and in green if located elsewhere in the genome. Genes with a known function unrelated to GalNAc are in blue. ORFs with uncharacterized gene product in gray. Gene acronyms are given next to the corresponding boxes. ORFs transcribed from the top (respectively bottom) strand are above (below) the dotted line. The respective transcriptional directions are marked by the arrows. The source of information used to confirm the map order is given (finished genome, multiple MAGs, single contig).

Extended Data Fig. 12 No effect of ABO genotype on intestinal Erysipelotrichaceae abundance in human.

Volcano and QQ plots for 43 (V1-V2), 20 (V3-V4) and 9 (V5-V6) OTUs classified as Erysipelotrichaceae for the contrasts (a) [AA, AO and AB] versus [BB, BO and OO], (b) [BB, BO and AB] versus [AA, AO and OO], and (c) [OO] versus [all others]. The shaded areas correspond to the 95% confidence intervals of the spread of the QQ plot under the null hypothesis of no QTL. The actual points are always within these intervals, so the null hypothesis cannot be rejected. P-values (nominal, two-sided) were computed using the linear model described in Methods and hereafter. See also Supplemental discussion 3.

About this article

Cite this article

Yang, H., Wu, J., Huang, X. et al. ABO genotype alters the gut microbiota by regulating GalNAc levels in pigs. Nature 606, 358–367 (2022). https://doi.org/10.1038/s41586-022-04769-z

  • DOI: https://doi.org/10.1038/s41586-022-04769-z
