Background & Summary

The gastrointestinal microbiota are vast in number and diverse in composition1,2. Compared to monogastric animals, ruminant gastrointestinal microbiota exhibit higher diversity3. Studies have shown4 that the gut microbiota of ruminants mainly consists of bacteria, fungi, archaea, and viruses, with bacteria being the most abundant, accounting for approximately 80% of the total microbial population in the gut5. The gut microbiome is an important regulatory factor in host health and physiological functions, playing a critical role in the host’s growth and development6, nutritional metabolism7,8, immune regulation9,10,11, and disease resistance12,13. Goats, as important livestock with strong adaptability and high economic value, have been shown to have a strong correlation between their rumen or intestinal microbiota and the fatty acids in their longissimus dorsi muscle, revealing specific bacteria associated with these fatty acids14. The gastrointestinal microbiota of goats significantly affects the digestibility of dietary calcium, with Prevotella species in the rumen being beneficial to the digestion of dietary calcium, supporting normal physiological functions and growth15. However, previous research has mainly focused on the rumen and intestinal microbiota of adult goats, emphasizing the impact of microbial communities on the health and production performance of mature animals. In contrast, studies on the microbiota of fetal and young goats are limited, especially regarding the establishment of gastrointestinal microbiota during the fetal stage and the early microbial community’s impact on goat growth and development, which restricts our understanding of microbial dynamics throughout the goat’s life cycle and its early effects on immunity and nutrient absorption.

With the advancement of molecular biology techniques, researchers have utilized high-throughput sequencing to analyze the gut microbiota community structure of ruminants under various breeds, growth stages, and feeding environments16. 16S rRNA sequencing and metagenomic sequencing have become essential tools for studying microbiota community structure and function. Studies have found that factors such as breed17, age18, and feeding environment19 influence the composition of animal gastrointestinal microbiota. Metagenomic sequencing, which involves sequencing the total DNA of microbial communities, provides a comprehensive understanding of microbiota structure and function, uncovering functional genes and metabolic pathways, and revealing the relationship between microbiota and host growth and development20, as well as disease occurrence21. Currently, 16S rRNA sequencing and metagenomic sequencing are widely used in gastrointestinal microbiota research.

Recent advances in agricultural microbiome research underscore the value of cross-species methodological integration. For instance, Wang et al. employed metagenomics to constructed a chicken multi-kingdom microbiome catalog (CMKMC)22, while our study establishes a dataset combining 16S rRNA sequencing (fetal goat) and metagenomics (7-day-old goat). This dual approach addresses technical challenges in low-biomass fetal environments while enhancing functional insights in neonates. Both studies focused on the host gut microbiome, with bacteria dominating during initial colonization. The eukaryotic microbial catalog from Wang et al. further aids annotation of eukaryotic signals in our dataset, whereas our protocols for prenatal sampling offer methodological insights for embryonic microbiome studies.

This study focuses on the 16S rRNA gene sequencing of the two stomachs (rumen and reticulum) as well as the large and small intestines of fetal Hunan local goats (Hutianshi Goat). Additionally, metagenomic sequencing was performed on the contents of the four stomachs (rumen, omasum, abomasum, and reticulum) and the large intestine (cecum, rectum, colon) and small intestine (ileum, jejunum) of 7-day-old goat kids. Understanding the interaction mechanisms between gut microbiota composition and ruminant health status will help in developing effective preventive and management strategies. The generation of these data provides an important molecular basis for understanding the interaction mechanisms between goat gastrointestinal microbiota and their host. This study provides the 16S rRNA sequencing data of fetal goat gastrointestinal microbiota (PRJNA1159466) and metagenomic sequencing data of 7-day-old goat kid gastrointestinal microbiota (PRJNA1160040). These datasets not only enrich the microbiome database of Hunan local goats but also provide solid data support for future functional gene research, microbiota-host interaction mechanism analysis, and precision breeding strategies. By making these high-quality microbiome data publicly available, we aim to promote further academic understanding of goat gastrointestinal microbiota ecology, foster healthy farming practices, and encourage sustainable development in animal husbandry. Additionally, these data provide a valuable reference resource for livestock microbiome research globally and hold significant application prospects and scientific value.

Methods

Ethical declaration

Animal handling and experimental procedures were approved by the animal protection and utilization committee of Hunan Agricultural University (protocol number: HAU ACC 2022120). All animal treatments and experiments comply with the guidelines for ethical review of animal welfare in the national standards of the People’s Republic of China (151). This study does not involve any endangered or protected species, so no additional specific permits are required in addition to standard ethical approvals.

Experimental animal sample collection and DNA

All animals in this study were sourced from the Hutianshi Goat breeding farm in Xiangtan City. Details of the sample classification and its sequence reading counts are shown in Tables 1, 2. The content of Table 1 includes the individuals of the fetal goat group, the sample name labels, the number of effective sequences obtained from denoising the contents of each part of the fetal goat through 16S rRNA sequencing, and the quantity of high-quality sequences after removing chimeras. The content of Table 2 includes the individual and sample name annotations for the 7-day-old goat kid group, the number of final effective reads obtained from metagenomic sequencing of the contents from various parts of the 7-day-old goat kid, the number of contigs generated after data processing and assembly, the number of genes predicted, and the number of Genesets obtained after redundancy removal. The Hutianshi Goats were all raised under standard conditions with free access to water and feed. Six adult (720 ± 30 days) female goats (22.37 ± 4.93 kg) with similar body weights were selected at 90 ± 10 days of pregnancy. They were fasted for 12 hours before slaughter, with free access to water. At the Hutianshi Goat slaughterhouse, experienced personnel performed anesthesia and exsanguination according to commercial practices, followed by skinning. Contents from the reticulum, rumen, small intestine, and large intestine of four fetal goats were collected, totaling 13 samples, which were quickly frozen in liquid nitrogen and then stored at −80 °C in a freezer. Subsequently, under the same feeding conditions, three 7-day-old goat kids were randomly selected for slaughter and sampling. Professional personnel collected contents from the four stomachs (rumen, omasum, abomasum, and reticulum), large intestine (cecum, colon, and rectum), and small intestine (ileum and jejunum) of each goat kid, yielding a total of 27 samples, which were rapidly frozen in liquid nitrogen and transported back to the laboratory for storage at −80 °C. All experimental procedures involving goats in this study were conducted in strict compliance with the Guidelines for Ethical Review of Experimental Animal Welfare and institutional protocols. The research protocol underwent formal review and approval by the Animal Experimental Ethics Committee of Hunan Agricultural University (protocol number: HAU ACC 2022120) to ensure adherence to ethical principles. The DNA extraction was performed using the TGuide S96 magnetic bead-based soil/fecal genomic DNA extraction kit.

Table 1 The fetal goat sample classification and its sequence reading counts.
Table 2 The 7-day-old goat kid sample classification and its sequence reading counts.

DNA sequencing

For 16S rRNA sequencing, the TruSeq Nano DNA LT Library Prep Kit from Illumina was used to prepare the sequencing library, and paired-end sequencing of the community DNA fragments was performed on the Illumina MiSeq/NovaSeq platform. For metagenomic sequencing, the VAHTS™ Universal Plus DNA Library Prep Kit for Illumina was used to prepare the library. The constructed libraries were sequenced on the Illumina NovaSeq 6000 platform, with a sequencing strategy of PE150. The raw sequencing data were provided in FASTQ format, with quality control and trimming performed by the sequencing laboratory. The taxonomic identification and species abundance data of non-redundant high-quality bins are provided in attachments 1–2.

Bioinformatics analysis

We performed bioinformatics analysis on both 16S rRNA and metagenomic data. Raw data from high-throughput sequencing were first screened based on sequence quality, and problematic samples were re-sequenced or supplemented. The sequences passing the initial quality filter were sorted by index and barcode information, and the barcode sequences were removed. Sequence denoising or OTU clustering was performed according to the QIIME2 dada2 pipeline or the Vsearch software23 pipeline. Specifically, for the QIIME2 analysis (version 2019.4), the qiime cutadapt trim-paired command was used to trim primer sequences, discarding sequences that did not match primers. Then, qiime dada2 denoise-paired was used to perform quality control, denoising, joining, and chimera removal via DADA2. For bacterial or archaeal 16S rRNA genes, the Greengenes database (Release 13.8, http://greengenes.secondgenome.com/) was the default reference database24, though the Silva database (Release 132, http://www.arb-silva.de) can also be used25. The classify-sklearn algorithm in QIIME226 (https://github.com/QIIME2/q2-feature-classifier) was applied for species annotation of each ASV or representative sequence of each OTU using the pre-trained Naive Bayes classifier with default parameters.

For metagenomic data, raw reads obtained from sequencing were subjected to quality control using fastp27 and bowtie228, resulting in clean reads for subsequent bioinformatics analysis. These clean reads were assembled, and coding genes were predicted to construct a non-redundant gene set. The non-redundant gene set was then annotated for function and taxonomy using both general and specialized databases, and species composition and abundance were assessed. Metagenomic assembly was performed using MEGAHIT29, with contig sequences shorter than 300 bp being filtered out. The assembly results were evaluated using the QUAST software30. Gene prediction was performed using MetaGeneMark31 (http://exon.gatech.edu/meta_gmhmmp.cgi, Version 3.26) with default parameters to identify coding regions in the genome. Redundancy was removed from the protein sequences using MMseqs 232 (https://github.com/soedinglab/mmseqs2, Version 12-113e3), setting the protein sequence similarity threshold to 90% and the coverage threshold to 80%, thus constructing a non-redundant gene set.

Metagenomic functional annotation was carried out by aligning the sequences with general database such as NR (Non-Redundant Protein Database), GO (Gene Ontology), KEGG (Kyoto Encyclopedia of Genes and Genomes), eggNOG (Evolutionary Genealogy of Genes: Non-supervised Orthologous Groups), Pfam (Protein Families Database), SwissProt, and special database such as CAZy (Carbohydrate-active Enzymes Database), CARD (Comprehensive Antibiotic Research Database), PHI-base (Pathogen Host Interactions Database), CYPED (The Cytochrome P450 Engineering Database), QS (Quorum Sensing), and BacMet (Antibacterial Biocide And Metal Resistance Genes Database). Table 3 provides detailed annotation information, including the number of genes annotated into each database.

Table 3 Statistics of annotation results in the function database.

BLASTP was used to analyze the NCBI non-redundant protein database. For annotation, the protein sequences of the non-redundant genes were subjected to BLAST alignment with the corresponding database (e-value set to 1e-5). The most similar sequence in the respective database was identified, and the annotation information corresponding to that sequence was used as the annotation for the sequencing genome gene. The database was constructed from the assembled metagenomic sequences of the combined samples. All reported aligned sequences were further subjected to a reverse BLASTN search in the NCBI non-redundant nucleotide database. Based on the alignment results from the non-redundant gene to the corresponding database, the corresponding annotation information was considered as the annotation for the sequencing genome genes.

Method and Cohort limitations

While this dual-sequencing study provides initial insights into microbial colonization shifts from prenatal to neonatal ruminants, several analytical constraints require acknowledgment: (1) Taxonomic resolution discordance between 16S rRNA (genus-level) and metagenomic (species-level) profiling complicates cross-developmental continuity assessments. (2) Restricted fetal cohort size (n = 3) may underpower rare taxa detection (<0.01% abundance) and inflate individual variation bias. These inherent challenges in developmental microbiome research highlight the necessity for longitudinal designs integrating synchronized multi-omics and expanded cohorts.

Data Records

The sequencing data used in this study have been deposited in the NCBI Sequence Read Archive (SRA). The assembled metagenomic data are available under the NCBI accession number PRJNA1160040, the SRA under accession number SRR30679764, SRR30679765, SRR30679766, SRR30679767, SRR30679768, SRR30679769, SRR30679770, SRR30679771, SRR30679772, SRR30679773, SRR30679774, SRR30679775, SRR30679776, SRR30679777, SRR30679778, SRR30679779, SRR30679780, SRR30679781, SRR30679782, SRR30679783, SRR30679784, SRR30679785, SRR30679786, SRR30679787, SRR30679788, SRR30679789, SRR3067979033. Additionally, the 16S rRNA sequencing data have been deposited in the NCBI SRA under the accession number PRJNA1159466 and the SRA under accession number SRR30634697, SRR30634698, SRR30634708, SRR30634719, SRR30634725, SRR30634726, SRR30634727, SRR30634728, SRR30634729, SRR30634730, SRR30634731, SRR30634741, SRR3063474234.

Technical Validation

The raw reads from 16S rRNA sequencing was first mass filtered using Trimmomatic, followed by primer sequence identification and removal using Cutadapt. Later, for the clustering method, “denoising (dada2)” was selected, and the dada2 package in R was used for further quality control, splicing of double-ended reads, and removal of chimeras. For the clustering method, we chose “similarity clustering”, and we used USEARCH to splice the double-ended reads and remove the chimeras (UCHIME, resulting in high-quality sequences for subsequent analysis. The raw reads obtained by metagenomic sequencing contain low-quality sequences, and the parameters −5 -W 50 -M 20 -l 60 -n 0 -g -A need to be used to filter the raw reads to obtain high-quality clean reads, and the parameters --seed 123456 -I 200 -X 1000 --un-conc-gz are used to align with the host genome sequence. Host contamination (https://asia.ensembl.org/Capra_hircus/Info/Index, Capra_hircus V1.0) is removed for subsequent information analysis. As shown in Fig. 1, as the amount of sequencing increases, the Observed_species index curve approaches flattening, indicating that the sequencing depth is sufficient to reflect the vast majority of microbial information in the sample.

Fig. 1
figure 1

Rarefaction Curve. (a) Rarefaction curves of fetal goat. The x-axis represents the sequencing depth, and the y-axis shows the boxplot of the Observed_species index calculated 10 times. (b) Rarefaction curves of fetal goat. The x-axis represents the amount of sequencing data randomly sampled; the y-axis represents the number of observed OTUs; this indicates that the number of OTU species measured at the same sequencing depth differs among different samples.