Background & Summary

Large yellow croaker (LYC, Larimichthys crocea), one of the most economically important nearshore fishery species in China, has high economic value and is favored by consumers both domestically and internationally1. However, during farming and production, the individual growth rates of LYC raised in the same pond vary considerably, limiting the development of the LYC industry. Growth performance is a key indicator of aquaculture efficiency, and it is well-documented that fish from the same batch, raised in identical conditions, can exhibit differences in growth2,3,4. In addition, growth differences are observed between male and female fish5,6,7. Therefore, investigating the mechanisms underlying growth variations in LYC is essential for improving economic returns and breeding better strains.

Fish represent a dynamic, complex system, with the gut hosting various microbial communities. Genetic factors play a crucial role in shaping the intestinal microbiota, and in turn, the intestinal microbiota provide essential physiological functions to the host8,9. As such, intestinal flora is an indispensable “multifunctional organ” critical for host biology. Fish growth is regulated by multiple factors, including genetic inheritance, intestinal microbiota, and nutrient metabolism10,11,12,13,14. The gut plays a vital role in nutrient digestion and absorption, while the microbiota residing there produce essential nutrients, such as short-chain fatty acids, vitamins, and amino acids. Intestinal microbiota dysbiosis can alter microbial composition and diversity, promoting pathogen proliferation while depleting symbiotic bacteria, ultimately impairing the ability of the host to digest and absorb nutrients15,16,17,18,19,20,21. Maintaining the dynamic balance of intestinal flora is therefore crucial for host health22,23,24. Furthermore, intestinal microbiota influence growth by regulating host metabolism through various biochemical pathways25.

Growth performance is governed by multiple genes, and transcriptomic analysis can effectively identify genes associated with growth pathways26,27. In addition, intestinal microbes and animal hosts form an inseparable “living community,” with the microbiota influencing fish growth by modulating gene expression and nutrient metabolism. However, little is known about how the interactions among intestinal microbiota, nutrient metabolism, and genetic factors influence LYC growth. We therefore employed an integrated multi-omics approach to analyze intestinal tissues and contents, with the aim of elucidating the molecular mechanisms underlying growth variations in LYC (Fig. 1).

Fig. 1 Workflow for intestinal sample collection and data processing.

The integrated metagenomic, transcriptomic and metabolomics analyses generated high-quality datasets. The gut content metagenome yielded 81.05 Gb of raw data and 517 million clean reads28 (Table 1). Species annotation revealed that most reads were unclassified, with 0.01%–0.24% at the phylum level, 0.45%–11.87% at the genus level, and 2.5%–39.83% at the species level (Table 2). The gut transcriptome produced 87.72 Gb of raw data and 568 million clean reads29 (Table 3). In addition, metabolomics analysis identified 1,776 metabolites in the intestinal contents30.

Table 1 Metagenomic sequencing of intestinal contents.
Table 2 Metagenomic classification of intestinal contents of LYC, showing the relative abundance (%) of each taxon based on species annotations.
Table 3 Transcriptomic sequencing data of intestinal samples.

Methods

Experimental design and sample collection

The LYC strain used in this study was Fufa No.1. Fish were obtained from the Fuding Breeding Base of the East China Sea Fisheries Research Institute. They were 18 months old and derived from the same parents. Fish with similar genetic backgrounds were selected and reared under identical environmental conditions to minimize genetic divergence and interindividual variation. One hundred LYCs were randomly sampled from culture cages, and the weight of each fish was measured using an electronic balance. Based on these measurements, the nine heaviest females and nine heaviest males were classified as the high-growth group (IWHF and IWHM, respectively), while the nine lightest females and nine lightest males were categorized as the slow-growth group (IWLF and IWLM, respectively). The mean weights of IWHF, IWHM, IWLF, and IWLM were 574.28 ± 37.48 g, 502.12 ± 56.22 g, 191.03 ± 18.87 g, and 189.88 ± 8.86 g, respectively (Table 1). Midgut sections were harvested from nine fish per group under aseptic conditions, and the intestinal contents of three fish per group were pooled and stored at −80 °C in 2 mL sterile centrifuge tubes for metagenome, transcriptome, and metabolome analyses (n = 3).
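The weight-based grouping described above can be sketched in a few lines of Python. This is an illustrative sketch only, not the authors' procedure: the function name, the tuple layout, and the requirement that each sex has at least twice the group size are assumptions introduced here.

```python
from typing import Dict, List, Tuple

# (fish_id, sex as "F" or "M", body weight in grams); hypothetical record layout
Fish = Tuple[str, str, float]


def assign_growth_groups(fish: List[Fish], n_per_group: int = 9) -> Dict[str, List[str]]:
    """Return fish IDs for the IWHF, IWHM, IWLF and IWLM groups.

    Within each sex, the n lightest fish form the slow-growth group (IWL*)
    and the n heaviest fish form the high-growth group (IWH*).
    """
    groups: Dict[str, List[str]] = {}
    for sex in ("F", "M"):
        # Sort fish of this sex from lightest to heaviest.
        ranked = sorted((f for f in fish if f[1] == sex), key=lambda f: f[2])
        if len(ranked) < 2 * n_per_group:
            raise ValueError(f"not enough fish of sex {sex} to form both groups")
        groups[f"IWL{sex}"] = [f[0] for f in ranked[:n_per_group]]
        groups[f"IWH{sex}"] = [f[0] for f in ranked[-n_per_group:]]
    return groups
```

Calling `assign_growth_groups(records)` on the 100 weighed fish would return four lists of nine IDs each, provided at least 18 fish of each sex were sampled.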

Sex identification

LYC fins were collected and stored in a 2 mL centrifuge tube. Genomic DNA was extracted using a DN37-Marine Animal Tissue Genomic DNA Rapid Extraction Kit (Aidlab Biotechnologies Co., Ltd., Beijing, China), and DNA quality was assessed via 1% agarose gel electrophoresis. Sex-specific single nucleotide polymorphism markers31 were used for PCR to identify sex. The following primers were used for PCR on a commercial PCR instrument (Bio-Rad): 6F (5′-ATCTGTCAACCACTGTATCATCTG-3′), 6R (5′-GGATGGCGTTTGGCTGAG-3′), and 6F-T (5′-CATCCCCAGACCTCCACT-3′). Primers were synthesized by Sangon Bioengineering (Shanghai) Co., Ltd. The cycling conditions were as follows: initial denaturation at 94 °C for 5 min, followed by 29 cycles of denaturation at 94 °C for 30 s, annealing at 64 °C for 30 s, and extension at 72 °C for 45 s. The reaction mixture (20 μL) included 2 μL of template DNA, 0.53 μL each of the common (6F-1 and 6R-1) and specific (6F-T) primers, 10 μL of 2 × Hieff Robust PCR Master Mix (with dye), and 6.4 μL of ultrapure water. Amplicons were visualized by 1.5% agarose gel electrophoresis.
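As a quick arithmetic check of the reaction mixture, the component volumes can be summed and scaled for a batch of reactions. The sketch below assumes that 0.53 μL refers to each of the three primers, which is what brings the total to roughly 20 μL; the dictionary keys and the pipetting overage are hypothetical and are not taken from the protocol.

```python
from typing import Dict

# Component volumes in microlitres, as listed in the text.
# ASSUMPTION: 0.53 uL is the volume of EACH of the three primers.
REACTION_UL: Dict[str, float] = {
    "template_dna": 2.0,
    "primer_common_forward": 0.53,
    "primer_common_reverse": 0.53,
    "primer_specific_6F_T": 0.53,
    "master_mix_2x": 10.0,
    "ultrapure_water": 6.4,
}


def total_volume_ul(components: Dict[str, float]) -> float:
    """Sum of all component volumes for a single reaction."""
    return sum(components.values())


def scale_for_reactions(components: Dict[str, float], n_reactions: int,
                        overage: float = 0.1) -> Dict[str, float]:
    """Scale volumes for n reactions with a fractional overage (hypothetical 10%)."""
    factor = n_reactions * (1.0 + overage)
    return {name: round(vol * factor, 2) for name, vol in components.items()}
```

Under this assumption the components add up to 19.99 μL, consistent with the stated 20 μL reaction volume.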

Metagenomic sample preparation and analysis

Approximately 200 mg intestinal contents were placed in 2 mL centrifuge tubes, and total microbial DNA was extracted using the OMEGA Mag-Bind Soil DNA Kit (M4015) (Omega Bio-Tek, Norcross, GA, USA) according to manufacturer instructions. DNA concentration was measured, and DNA quality was determined by 1% agarose gel electrophoresis. DNA samples were stored at −20 °C for further analyses. These samples were used to construct sequencing libraries with 400 bp insert sizes using the Illumina TruSeq Nano DNA LT Library Preparation Kit. Each library was sequenced on the Illumina NovaSeq platform (Illumina, USA; PE150 strategy) at Personal Biotechnology Co., Ltd. (Shanghai, China).

Raw sequencing reads were processed to obtain high-quality reads for downstream analyses. Adapters were removed using Cutadapt (v1.2.1)32, and low-quality reads were trimmed with a sliding-window algorithm in fastp33. Next, reads were aligned to the LYC genome using BMTagger to remove host contamination. Quality-filtered reads were then taxonomically classified using Kraken234 with a RefSeq-derived database that included genomes from archaea, bacteria, viruses, fungi, protozoans, metazoans, and Viridiplantae. Reads assigned to metazoans were excluded from downstream analysis. Contigs were assembled using Megahit v1.1.235 with meta-large preset parameters. These contigs (longer than 300 bp) were subsequently pooled and clustered with MMseqs236 in the “easy-linclust” mode, with a sequence identity threshold of 0.95 and coverage of 90% of the shorter contig. The lowest common ancestor taxonomy of non-redundant contigs was assigned by aligning them against the NCBI-nt database with MMseqs2 in the “taxonomy” mode. Contigs assigned to Viridiplantae or Metazoa were excluded from further analyses. Gene prediction within the contigs was achieved using MetaGeneMark v3.2537. Coding DNA sequences were clustered with MMseqs2 in the “easy-cluster” mode, with a protein sequence identity threshold of 0.90 and coverage of 90% of the shorter contig. Gene abundances were calculated by mapping high-quality reads from each sample to the predicted gene sequences using salmon in the quasi-mapping-based mode with “--meta --minScoreFraction=0.55”, and abundance in metagenomes was normalized as copies per kilobase per million mapped reads. Functional annotation of non-redundant genes was performed with MMseqs2 in the “search” mode against the KEGG, eggNOG, and CAZy protein databases.
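The normalization step can be illustrated with a short Python sketch. The text does not give the exact formula, so this assumes an RPKM-like definition (read count divided by gene length in kilobases and by the total mapped reads in millions); the function name and input layout are hypothetical, and the quantification tool's own output may be computed differently.

```python
from typing import Dict


def copies_per_kb_per_million(counts: Dict[str, float],
                              lengths_bp: Dict[str, int]) -> Dict[str, float]:
    """Normalize per-gene read counts by gene length and sequencing depth.

    ASSUMED formula: count / (length in kb) / (total mapped reads in millions).
    """
    total_mapped = sum(counts.values())
    if total_mapped == 0:
        raise ValueError("no mapped reads in this sample")
    per_million = total_mapped / 1e6
    return {
        gene: count / (lengths_bp[gene] / 1000.0) / per_million
        for gene, count in counts.items()
    }
```

With this definition, two genes with equal read counts differ in normalized abundance only by the ratio of their lengths, which is the purpose of the per-kilobase term.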

Transcriptome sample preparation and analysis

Total RNA was isolated using TRIzol (Invitrogen Life Technologies), and RNA concentration, quality, and integrity were determined using a NanoDrop spectrophotometer (Thermo Scientific). Three micrograms of RNA were used as input material for RNA sample preparation, after which sequencing libraries were established. The products were purified using the AMPure XP system and quantified using the Agilent High Sensitivity DNA Assay on Bioanalyzer 2100 (Agilent). Sequencing was then performed on the NovaSeq 6000 platform (Illumina) by Shanghai Personal Biotechnology Co. Ltd. The sequencing platform converted image files into FASTQ format (raw data), and sequencing data were processed using fastp v0.22.0 to filter out low-quality reads, producing high-quality sequences (clean data) for further analyses. The reference genome and gene annotation files were obtained from public genome databases. Filtered reads were aligned to the reference genome using HISAT2 v2.1.0. Gene expression was quantified using HTSeq v0.9.1 to calculate read counts for each gene, and expression was normalized with fragments per kilobase per million mapped reads. Differential expression between two comparison groups was analyzed using DESeq v1.38.3, with thresholds set at |log2FoldChange| > 1 and P < 0.05.
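The expression normalization and the differential-expression thresholds can be written out as a small Python sketch. It uses the standard FPKM definition and the cutoffs stated above (|log2FoldChange| > 1 and P < 0.05); the function names are illustrative and do not correspond to functions in HTSeq or DESeq.

```python
def fpkm(fragments: float, gene_length_bp: int, total_mapped_fragments: int) -> float:
    """Fragments per kilobase of transcript per million mapped fragments."""
    if gene_length_bp <= 0 or total_mapped_fragments <= 0:
        raise ValueError("gene length and library size must be positive")
    return fragments * 1e9 / (gene_length_bp * total_mapped_fragments)


def is_differentially_expressed(log2_fold_change: float, p_value: float,
                                min_abs_log2fc: float = 1.0,
                                max_p: float = 0.05) -> bool:
    """Apply the thresholds from the text; both inequalities are strict."""
    return abs(log2_fold_change) > min_abs_log2fc and p_value < max_p
```

For example, a gene with a log2 fold change of −1.8 and P = 0.01 passes, whereas a gene with a log2 fold change of exactly 1 does not, because the threshold is a strict inequality.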

Metabolome sample preparation and analysis

Untargeted LC–MS/MS metabolomics was employed to identify gut metabolites. Intestinal samples were thawed at 4 °C, and an appropriate amount was added to a precooled methanol/acetonitrile/water solution (2:2:1, v/v), followed by vortexing, sonication at low temperature for 30 min, incubation at −20 °C for 10 min, and centrifugation at 14,000 × g for 20 min at 4 °C. The supernatant was dried under vacuum, reconstituted in 100 μL of aqueous acetonitrile solution (1:1 acetonitrile/water, v/v) for mass spectrometry, vortexed, centrifuged at 14,000 × g for 15 min at 4 °C, and injected for analysis. Ultrahigh-performance liquid chromatography (Vanquish UHPLC; Thermo) coupled with Orbitrap was used for analysis. Hydrophilic interaction liquid chromatography was performed using a 2.1 mm × 100 mm ACQUITY UPLC BEH Amide 1.7 μm column (Waters, Ireland). Electrospray ionization in positive and negative ion modes was used for detection. Raw mass spectrometry data were converted into mzXML files using ProteoWizard v3.0.8789 MSConvert and imported into the freely accessible XCMS v3.12.0. Peak selection parameters included the following: centWave m/z = 10 ppm, peakwidth = c(10, 60), and prefilter = c(10, 100). Peak grouping parameters included the following: bw = 5, mzwid = 0.025, and minfrac = 0.5. Isotope and adduct annotation were performed using Collection of Algorithms of MEtabolite pRofile Annotation (CAMERA). Variables with >50% nonzero measurement values in at least one group were retained for further analysis. Metabolites were identified by comparing accurate m/z values (<10 ppm) and MS/MS spectra against in-house databases and authentic standards.
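Two of the filtering rules above lend themselves to a brief Python illustration: the retention of variables with more than 50% nonzero values in at least one group, and the mass accuracy tolerance of 10 ppm. This is a minimal sketch under the assumption that intensities are available as per-group lists; the function names are hypothetical and are not part of XCMS or CAMERA.

```python
from typing import Dict, List


def passes_nonzero_filter(intensities_by_group: Dict[str, List[float]],
                          min_fraction: float = 0.5) -> bool:
    """Keep a feature if more than min_fraction of values are nonzero in any group."""
    for values in intensities_by_group.values():
        if not values:
            continue
        nonzero_fraction = sum(1 for v in values if v != 0) / len(values)
        if nonzero_fraction > min_fraction:
            return True
    return False


def ppm_error(observed_mz: float, reference_mz: float) -> float:
    """Signed mass error in parts per million relative to the reference m/z."""
    return (observed_mz - reference_mz) / reference_mz * 1e6


def within_mass_tolerance(observed_mz: float, reference_mz: float,
                          tolerance_ppm: float = 10.0) -> bool:
    """True if the absolute mass error is below the tolerance (strict, as in the text)."""
    return abs(ppm_error(observed_mz, reference_mz)) < tolerance_ppm
```

A feature detected in two of three samples of one group would be retained even if it were absent from every other group, because the rule only requires one group to exceed the 50% threshold.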

Data Records

Raw metagenomic and transcriptomic data are available through the NCBI Sequence Read Archive (SRA). The BioProject ID for metagenomics data is PRJNA1110616 and the accession number is SUB14429435 (https://identifiers.org/ncbi/insdc.sra:SRP507162). The BioProject ID for transcriptomic data is PRJNA1115810 and the accession number is SUB14468536 (https://identifiers.org/ncbi/insdc.sra:SRP509549). Raw metabolomics data were uploaded to MetaboLights under the study ID MTBLS10000 (http://www.ebi.ac.uk/metabolights/MTBLS10000). All datasets are publicly available.

Technical Validation

We performed quality control on metagenomic, transcriptomic, and metabolomic data. For metagenomic data, raw sequencing data were filtered to remove low-quality sequences, short reads, sequences containing ambiguous bases, and those containing adapters. Such sequences can significantly interfere with downstream analyses. Therefore, raw data underwent rigorous screening and filtering processes.

We employed fastp v0.23.2 to identify potential adapter sequences at the 3′-end (rare cases of pass detection occurred) and truncate sequences accordingly. A minimum match length of 3 bp was required for adapter detection, allowing a base mismatch rate of up to 20%. After adapter trimming, sequence quality was screened using the sliding-window approach. The window size was set to 5 bp, sliding from the first base at the 5′-end. The average base quality within the window had to be at least Q20 (where Q20 corresponds to a base-call accuracy of 99%), and sequences were truncated from the base at the 3′-end of the first window with an average quality value lower than Q20. Sequences shorter than 50 bp were removed, in addition to those containing ambiguous bases.
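The sliding-window rule can be made concrete with a short Python sketch. It is not the fastp implementation. The exact cut position is an assumption here: the read is cut at the start of the first window whose mean quality falls below Q20, so that window and everything 3′ of it are discarded. The function names and the use of integer Phred scores as input are also illustrative choices.

```python
from typing import List, Optional, Tuple


def trim_sliding_window(seq: str, quals: List[int], window: int = 5,
                        min_mean_q: float = 20.0) -> Tuple[str, List[int]]:
    """Scan 5' to 3' and cut at the first window with mean quality below the threshold."""
    if len(seq) != len(quals):
        raise ValueError("sequence and quality lengths differ")
    cut = len(seq)
    for start in range(0, len(seq) - window + 1):
        mean_q = sum(quals[start:start + window]) / window
        if mean_q < min_mean_q:
            cut = start  # ASSUMPTION: drop the failing window and all bases after it
            break
    return seq[:cut], quals[:cut]


def filter_read(seq: str, quals: List[int],
                min_len: int = 50) -> Optional[Tuple[str, List[int]]]:
    """Trim, then discard reads shorter than min_len or containing ambiguous bases (N)."""
    trimmed_seq, trimmed_quals = trim_sliding_window(seq, quals)
    if len(trimmed_seq) < min_len or "N" in trimmed_seq.upper():
        return None
    return trimmed_seq, trimmed_quals
```

A read whose quality collapses only in its last few bases therefore loses just its 3′ tail, while a read that drops below Q20 early is shortened below 50 bp and removed entirely.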

To evaluate data quality before and after filtering, fastp was used to determine sequencing data quality for both raw and filtered data. The read retention rate was >95% (Table 1), indicating the completeness and reliability of the metagenomic dataset.

For transcriptomic analysis, a series of quality control steps were performed to ensure data accuracy and reliability, including the removal of adapter sequences, elimination of low-quality regions at the beginning of each read, and exclusion of reads containing indeterminate bases. RNA-seq data exhibited a high mapping rate, ranging from 89.67% to 93.29% (Table 3). These statistics confirmed that high-quality RNA-seq reads were obtained for downstream analysis.

To assess the reproducibility and quality of untargeted metabolomics data, Pearson correlation analysis was performed on quality control samples. The correlation coefficients for both positive and negative ion modes exceeded 0.99 (Fig. 2), indicating a high degree of similarity in expression patterns between samples. This analysis confirmed the high quality of the metabolomic dataset, further validating data completeness and reliability.
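The reproducibility check amounts to computing pairwise Pearson correlation coefficients between quality control injections. The sketch below shows the calculation in plain Python, assuming each QC sample is represented as a list of feature intensities in the same feature order; the function names and data layout are hypothetical, and any transformation applied to intensities before correlation is not specified in the text.

```python
import math
from itertools import combinations
from typing import Dict, List


def pearson(x: List[float], y: List[float]) -> float:
    """Pearson correlation coefficient between two equal-length intensity vectors."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("vectors must have equal length of at least 2")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant vector")
    return cov / math.sqrt(var_x * var_y)


def min_pairwise_correlation(qc_samples: Dict[str, List[float]]) -> float:
    """Smallest Pearson coefficient over all pairs of QC samples."""
    return min(pearson(qc_samples[a], qc_samples[b])
               for a, b in combinations(sorted(qc_samples), 2))
```

Comparing the value returned by `min_pairwise_correlation` against 0.99 for each ion mode reproduces the kind of criterion reported above.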

Fig. 2 Pearson correlation analysis. (a) Positive and (b) negative ion modes.