Abstract
Large yellow croaker (Larimichthys crocea) is a highly economically important marine fish species in China. However, substantial individual variations in growth performance have emerged as a limiting factor for the sustainable development of the large yellow croaker industry. Gut microbiota plays a crucial role in fish growth and development by regulating metabolic processes. To explore these dynamics, we employed metagenomics, transcriptomics, and untargeted metabolomics to comprehensively analyze the structure of the intestinal microbiome and its relationship with intestinal metabolism and host gene expression. We constructed association models for “gut microbiota–differentially expressed genes”, “differentially expressed genes–metabolites,” and “gut microbiota–metabolites.” Sequencing data and LC–MS/MS raw data have been deposited in NCBI and MetaboLights databases for public access. Our findings offer critical insights into the molecular mechanisms underlying growth variations in L. crocea and provide valuable data for the selective breeding of improved strains.
Background & Summary
Large yellow croaker (LYC, Larimichthys crocea), one of the most economically important nearshore fishery species in China, has high economic value and is favored by consumers both domestically and internationally1. However, during farming and production, the individual growth rates of LYC raised in the same pond vary considerably, limiting the development of the LYC industry. Growth performance is a key indicator of aquaculture efficiency, and it is well-documented that fish from the same batch, raised in identical conditions, can exhibit differences in growth2,3,4. In addition, growth differences are observed between male and female fish5,6,7. Therefore, investigating the mechanisms underlying growth variations in LYC is essential for improving economic returns and breeding better strains.
Fish represent a dynamic, complex system, with the gut hosting various microbial communities. Genetic factors play a crucial role in shaping the intestinal microbiota, and in turn, the intestinal microbiota provide essential physiological functions to the host8,9. As such, intestinal flora is an indispensable “multifunctional organ” critical for host biology. Fish growth is regulated by multiple factors, including genetic inheritance, intestinal microbiota, and nutrient metabolism10,11,12,13,14. The gut plays a vital role in nutrient digestion and absorption, while the microbiota residing there produce essential nutrients, such as short-chain fatty acids, vitamins, and amino acids. Intestinal microbiota dysbiosis can alter microbial composition and diversity, promoting pathogen proliferation while depleting symbiotic bacteria, ultimately impairing the ability of the host to digest and absorb nutrients15,16,17,18,19,20,21. Maintaining the dynamic balance of intestinal flora is therefore crucial for host health22,23,24. Furthermore, intestinal microbiota influence growth by regulating host metabolism through various biochemical pathways25.
Growth performance is governed by multiple genes, and transcriptomic analysis can effectively identify genes associated with growth pathways26,27. In addition, intestinal microbes and animal hosts form an inseparable “living community,” with the microbiota influencing fish growth by modulating gene expression and nutrient metabolism. However, little is known about how the interactions among intestinal microbiota, nutrient metabolism, and genetic factors influence LYC growth. We therefore employed an integrated multi-omics approach to comprehensively analyze intestinal tissues and contents, with the aim of elucidating the molecular mechanisms underlying growth variations in LYC (Fig. 1).
The integrated metagenomic, transcriptomic and metabolomics analyses generated high-quality datasets. The gut content metagenome yielded 81.05 Gb of raw data and 517 million clean reads28 (Table 1). Species annotation revealed that most reads were unclassified, with 0.01%–0.24% at the phylum level, 0.45%–11.87% at the genus level, and 2.5%–39.83% at the species level (Table 2). The gut transcriptome produced 87.72 Gb of raw data and 568 million clean reads29 (Table 3). In addition, metabolomics analysis identified 1,776 metabolites in the intestinal contents30.
Methods
Experimental design and sample collection
The LYC strain used in this study was Fufa No.1. LYCs were obtained from the Fuding Breeding Base of the East China Sea Fisheries Research Institute. LYCs were 18 months of age and from the same parents. LYCs with similar genetic backgrounds were selected and reared under identical environmental conditions to minimize genetic divergence and interindividual variations. One hundred LYCs were randomly sampled from culture cages, and the weight of each fish was measured using an electronic balance. Based on these measurements, the nine heaviest females and nine heaviest males were classified as the high-growth group (IWHF and IWHM, respectively), while the nine lightest females and nine lightest males were categorized as the slow-growth group (IWLF and IWLM, respectively). The mean weights of IWHF, IWHM, IWLF, and IWLM were 574.28 ± 37.48 g, 502.12 ± 56.22 g, 191.03 ± 18.87 g, and 189.88 ± 8.86 g, respectively (Table 1). Midgut sections were harvested from nine fish per group under aseptic conditions, and the intestinal contents of three fish per group were pooled and stored at −80 °C in 2 mL sterile centrifuge tubes for metagenome, transcriptome, and metabolome analyses (n = 3).
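The grouping rule described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `Fish` record, the function names, and the synthetic weights are hypothetical, and the pooling helper assumes that the nine fish of a group are split into three pools of three, which is one reading of "n = 3".

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Fish:
    """Hypothetical record for one weighed fish (not from the original study)."""
    fish_id: str
    sex: str          # "F" or "M", as determined by the sex-specific PCR
    weight_g: float


def assign_growth_groups(fish, n_per_group=9):
    """Return the n lightest and n heaviest fish of each sex.

    Keys follow the group labels in the text: IWHF/IWHM (high growth)
    and IWLF/IWLM (slow growth).
    """
    groups = {}
    for sex in ("F", "M"):
        ranked = sorted((f for f in fish if f.sex == sex),
                        key=lambda f: f.weight_g)
        if len(ranked) < 2 * n_per_group:
            raise ValueError(f"not enough fish of sex {sex} to form two groups")
        groups[f"IWL{sex}"] = ranked[:n_per_group]    # lightest fish
        groups[f"IWH{sex}"] = ranked[-n_per_group:]   # heaviest fish
    return groups


def pool_in_threes(members, pool_size=3):
    """Assumed pooling scheme: consecutive fish are combined into one sample."""
    return [members[i:i + pool_size] for i in range(0, len(members), pool_size)]


# Synthetic example data (weights are made up purely for illustration).
example_fish = [
    Fish(f"fish{i:03d}", "F" if i % 2 == 0 else "M", 150.0 + 5.0 * i)
    for i in range(100)
]
example_groups = assign_growth_groups(example_fish)
example_pools = {name: pool_in_threes(members)
                 for name, members in example_groups.items()}
```

Ranking within each sex, rather than across all 100 fish, keeps the sex-related growth difference from driving the high/slow split.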
Sex identification
LYC fins were collected and stored in a 2 mL centrifuge tube. Genomic DNA was extracted using a DN37-Marine Animal Tissue Genomic DNA Rapid Extraction Kit (Aidlab Biotechnologies Co., Ltd., Beijing, China), and DNA quality was assessed via 1% agarose gel electrophoresis. Sex-specific single nucleotide polymorphism markers31 were used for PCR to identify sex. The following primers were used for PCR on a commercial PCR instrument (Bio-Rad): 6F (5′-ATCTGTCAACCACTGTATCATCTG-3′), 6R (5′-GGATGGCGTTTGGCTGAG-3′), and 6F-T (5′-CATCCCCAGACCTCCACT-3′). Primers were synthesized by Sangon Bioengineering (Shanghai) Co., Ltd. The cycling conditions were as follows: initial denaturation at 94 °C for 5 min, followed by 29 cycles of denaturation at 94 °C for 30 s, annealing at 64 °C for 30 s, and extension at 72 °C for 45 s. The reaction mixture (20 μL) included 2 μL of template DNA, 0.53 μL each of the common (6F-1 and 6R-1) and specific (6F-T) primers, 10 μL of 2 × Hieff Robust PCR Master Mix (with dye), and 6.4 μL of ultrapure water. Amplicons were visualized by 1.5% agarose gel electrophoresis.
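As a quick consistency check on the protocol above, the reaction volumes and cycling times can be tallied in Python. This sketch assumes that 0.53 μL is the volume of each of the three primers (the reading under which the components sum to roughly 20 μL); the dictionary keys and variable names are illustrative only.

```python
# Component volumes in microlitres, as listed in the text.
# Assumption: 0.53 uL applies to EACH of the three primers.
reaction_ul = {
    "template_dna": 2.0,
    "common_forward_primer": 0.53,
    "common_reverse_primer": 0.53,
    "specific_primer_6F_T": 0.53,
    "master_mix_2x": 10.0,
    "ultrapure_water": 6.4,
}
total_volume_ul = sum(reaction_ul.values())

# Cycling programme: (step name, temperature in deg C, duration in seconds).
initial_denaturation = ("initial denaturation", 94, 5 * 60)
cycle_steps = [
    ("denaturation", 94, 30),
    ("annealing", 64, 30),
    ("extension", 72, 45),
]
n_cycles = 29

# Hold time only; ramping between temperatures is not included.
hold_time_s = initial_denaturation[2] + n_cycles * sum(s for _, _, s in cycle_steps)

print(f"total reaction volume: {total_volume_ul:.2f} uL")
print(f"programmed hold time: {hold_time_s / 60:.2f} min")
```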
Metagenomic sample preparation and analysis
Approximately 200 mg intestinal contents were placed in 2 mL centrifuge tubes, and total microbial DNA was extracted using the OMEGA Mag-Bind Soil DNA Kit (M4015) (Omega Bio-Tek, Norcross, GA, USA) according to manufacturer instructions. DNA concentration was measured, and DNA quality was determined by 1% agarose gel electrophoresis. DNA samples were stored at −20 °C for further analyses. These samples were used to construct sequencing libraries with 400 bp insert sizes using the Illumina TruSeq Nano DNA LT Library Preparation Kit. Each library was sequenced on the Illumina NovaSeq platform (Illumina, USA; PE150 strategy) at Personal Biotechnology Co., Ltd. (Shanghai, China).
Raw sequencing reads were processed to obtain high-quality reads for downstream analyses. Adapters were removed using Cutadapt (v1.2.1)32, and low-quality reads were trimmed with a sliding-window algorithm in fastp33. Next, reads were aligned to the LYC genome using BMTagger to remove host contamination. Quality-filtered reads were then taxonomically classified using Kraken234 with a RefSeq-derived database that included genomes from archaea, bacteria, viruses, fungi, protozoans, metazoans, and Viridiplantae. Reads assigned to metazoans were excluded from downstream analysis. Contigs were assembled using MEGAHIT (v1.1.2)35 with meta-large preset parameters. These contigs (longer than 300 bp) were subsequently pooled and clustered with MMseqs236 in the “easy-linclust” mode, with a sequence identity threshold of 0.95 and coverage of 90% of the shorter contig. The lowest common ancestor taxonomy of non-redundant contigs was assigned by aligning them against the NCBI-nt database with MMseqs2 in the “taxonomy” mode. Contigs assigned to Viridiplantae or Metazoa were excluded from further analyses. Gene prediction within the contigs was achieved using MetaGeneMark (v3.25)37. Coding DNA sequences were clustered with MMseqs2 in the “easy-cluster” mode, with a protein sequence identity threshold of 0.90 and coverage of 90% of the shorter sequence. Gene abundances were calculated by mapping high-quality reads from each sample to the predicted gene sequences using Salmon in the quasi-mapping-based mode with “--meta --minScoreFraction 0.55”, and abundance in metagenomes was normalized as copies per kilobase per million mapped reads. Functional annotation of non-redundant genes was performed with MMseqs2 in the “search” mode against the KEGG, eggNOG, and CAZy protein databases.
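Two of the simpler steps above, the contig filter and the abundance normalization, can be illustrated with a short Python sketch. The text does not give the exact normalization formula, so the function below assumes the usual length-and-depth scaling (mapped read count divided by gene length in kilobases and by the total mapped reads in millions). The function names and the toy inputs are hypothetical.

```python
EXCLUDED_TAXA = {"Viridiplantae", "Metazoa"}


def keep_contig(length_bp, assigned_taxon, min_length_bp=300):
    """Keep contigs longer than 300 bp that are not assigned to plants or animals."""
    return length_bp > min_length_bp and assigned_taxon not in EXCLUDED_TAXA


def copies_per_kilobase_per_million(read_counts, gene_lengths_bp):
    """Normalize mapped read counts by gene length and sequencing depth.

    Assumed formula: count / (gene length in kb) / (total mapped reads / 1e6).
    """
    total_mapped = sum(read_counts.values())
    if total_mapped == 0:
        raise ValueError("no mapped reads in this sample")
    per_million = total_mapped / 1_000_000
    return {
        gene: count / (gene_lengths_bp[gene] / 1000) / per_million
        for gene, count in read_counts.items()
    }


# Toy example: two genes with equal read counts but different lengths.
toy_counts = {"gene_a": 500_000, "gene_b": 500_000}
toy_lengths = {"gene_a": 1_000, "gene_b": 2_000}
toy_abundance = copies_per_kilobase_per_million(toy_counts, toy_lengths)
```

With equal read counts, the longer gene receives half the normalized abundance, which is the purpose of the per-kilobase term.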
Transcriptome sample preparation and analysis
Total RNA was isolated using TRIzol (Invitrogen Life Technologies), and RNA concentration, quality, and integrity were determined using a NanoDrop spectrophotometer (Thermo Scientific). Three micrograms of RNA were used as input material for RNA sample preparation, after which sequencing libraries were constructed. The products were purified using the AMPure XP system and quantified using the Agilent High Sensitivity DNA Assay on a Bioanalyzer 2100 (Agilent). Sequencing was then performed on the NovaSeq 6000 platform (Illumina) by Shanghai Personal Biotechnology Co., Ltd. The sequencing platform converted image files into FASTQ format (raw data), and sequencing data were processed using fastp v0.22.0 to filter out low-quality reads, producing high-quality sequences (clean data) for further analyses. The reference genome and gene annotation files were obtained from public genome databases. Filtered reads were aligned to the reference genome using HISAT2 v2.1.0. Gene expression was quantified using HTSeq v0.9.1 to calculate read counts for each gene, and expression was normalized as fragments per kilobase of transcript per million mapped reads (FPKM). Differential expression between two comparison groups was analyzed using DESeq v1.38.3, with thresholds set at |log2FoldChange| > 1 and P < 0.05.
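The normalization and the significance thresholds stated above can be written out as a minimal Python sketch. It uses the standard FPKM definition and applies the two cut-offs exactly as given in the text; it is not a substitute for the count modelling performed by the differential expression software, and the function names are illustrative.

```python
def fpkm(fragments, gene_length_bp, total_mapped_fragments):
    """Fragments per kilobase of transcript per million mapped fragments."""
    if gene_length_bp <= 0 or total_mapped_fragments <= 0:
        raise ValueError("gene length and library size must be positive")
    return fragments * 1e9 / (gene_length_bp * total_mapped_fragments)


def is_differentially_expressed(log2_fold_change, p_value,
                                lfc_cutoff=1.0, p_cutoff=0.05):
    """Apply the thresholds from the text: |log2FoldChange| > 1 and P < 0.05."""
    return abs(log2_fold_change) > lfc_cutoff and p_value < p_cutoff


# Toy example: 100 fragments on a 2 kb gene in a library of 10 million fragments.
toy_fpkm = fpkm(100, 2_000, 10_000_000)
toy_call = is_differentially_expressed(log2_fold_change=-1.8, p_value=0.003)
```

Both inequalities are strict, so a gene with a fold change of exactly two (log2 value of 1) or a P value of exactly 0.05 is not called.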
Metabolome sample preparation and analysis
Untargeted LC–MS/MS metabolomics was employed to identify gut metabolites. Intestinal samples were thawed at 4 °C, and an appropriate amount was added to a precooled methanol/acetonitrile/water solution (2:2:1, v/v), followed by vortexing, sonication at low temperature for 30 min, incubation at −20 °C for 10 min, and centrifugation at 14,000 × g for 20 min at 4 °C. The supernatant thus obtained was dried under vacuum, reconstituted in 100 μL of aqueous acetonitrile solution (1:1 acetonitrile/water, v/v) for mass spectrometry, vortexed, centrifuged at 14,000 × g for 15 min at 4 °C, and injected for analysis. Ultrahigh-performance liquid chromatography (Vanquish UHPLC; Thermo) coupled with an Orbitrap mass spectrometer was used for analysis. Hydrophilic interaction liquid chromatography was performed using a 2.1 mm × 100 mm ACQUITY UPLC BEH Amide 1.7 μm column (Waters, Ireland). Electrospray ionization in positive and negative ion modes was used for detection. Raw mass spectrometry data were converted into mzXML files using ProteoWizard v3.0.8789 MSConvert and imported into the freely accessible XCMS v3.12.0. Peak selection parameters included the following: centWave m/z tolerance = 10 ppm, peakwidth = c(10, 60), and prefilter = c(10, 100). Peak grouping parameters included the following: bw = 5, mzwid = 0.025, and minfrac = 0.5. Isotope and adduct annotation were performed using the Collection of Algorithms of MEtabolite pRofile Annotation (CAMERA). Variables with >50% nonzero measurement values in at least one group were retained for further analysis. Metabolites were identified by comparing accurate m/z values (<10 ppm) and MS/MS spectra against in-house databases and authentic standards.
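The two numeric rules at the end of this procedure, the nonzero-value filter and the 10 ppm mass tolerance, can be sketched in Python as follows. The function names and the example m/z values are hypothetical and chosen only to illustrate the arithmetic; MS/MS spectral matching is not covered.

```python
def ppm_error(observed_mz, reference_mz):
    """Signed mass error in parts per million."""
    return (observed_mz - reference_mz) / reference_mz * 1e6


def within_mass_tolerance(observed_mz, reference_mz, tolerance_ppm=10.0):
    """True if the absolute mass error is below the tolerance (<10 ppm in the text)."""
    return abs(ppm_error(observed_mz, reference_mz)) < tolerance_ppm


def keep_feature(intensities_by_group, min_nonzero_fraction=0.5):
    """Retain a feature if >50% of its values are nonzero in at least one group."""
    for values in intensities_by_group.values():
        if not values:
            continue
        nonzero_fraction = sum(1 for v in values if v != 0) / len(values)
        if nonzero_fraction > min_nonzero_fraction:
            return True
    return False


# Illustrative values only: one feature measured in two groups of three samples.
toy_feature = {"IWHF": [0.0, 0.0, 1520.0], "IWLF": [880.0, 910.0, 0.0]}
toy_keep = keep_feature(toy_feature)                     # kept via the IWLF group
toy_match = within_mass_tolerance(180.0648, 180.0634)    # made-up m/z pair
```

Checking the fraction per group, instead of across all samples, preserves features that are present in one growth group and absent in another.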
Data Records
Raw metagenomic and transcriptomic data are available through the NCBI Sequence Read Archive (SRA). The BioProject ID for metagenomics data is PRJNA1110616 and the accession number is SUB14429435 (https://identifiers.org/ncbi/insdc.sra:SRP507162). The BioProject ID for transcriptomic data is PRJNA1115810 and the accession number is SUB14468536 (https://identifiers.org/ncbi/insdc.sra:SRP509549). Raw metabolomics data were uploaded to MetaboLights under the study ID MTBLS10000 (http://www.ebi.ac.uk/metabolights/MTBLS10000). All datasets are publicly available.
Technical Validation
We performed quality control on metagenomic, transcriptomic, and metabolomic data. For metagenomic data, raw sequencing data were filtered to remove low-quality sequences, short reads, sequences containing ambiguous bases, and those containing adapters. Such sequences can significantly interfere with downstream analyses. Therefore, raw data underwent rigorous screening and filtering processes.
We employed fastp v0.23.2 to identify potential adapter sequences at the 3′-end (rare cases of pass detection occurred) and truncate sequences accordingly. A minimum match length of 3 bp was required for adapter detection, allowing a base mismatch rate of up to 20%. After adapter trimming, sequence quality was screened using the sliding-window approach. The window size was set to 5 bp, sliding from the first base at the 5′-end. The average base quality within the window had to be at least Q20 (where Q20 corresponds to a base-call accuracy of 99%), and sequences were truncated from the base at the 3′-end of the first window with an average quality value lower than Q20. Sequences shorter than 50 bp were removed, in addition to those containing ambiguous bases.
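The sliding-window rule can be made concrete with a small Python sketch. It is not the trimming tool's implementation: the function names are illustrative, and the cut position is an assumption (the failing window and everything 3′ of it are dropped, which is one reading of the description above). The Phred relation used is the standard one, accuracy = 1 − 10^(−Q/10).

```python
def phred_to_accuracy(q):
    """Base-call accuracy for a Phred score Q: 1 - 10^(-Q/10)."""
    return 1 - 10 ** (-q / 10)


def sliding_window_keep_length(quals, window=5, min_mean_q=20):
    """Number of 5'-end bases kept after sliding-window trimming.

    The window slides one base at a time from the 5' end. At the first window
    whose mean quality is below the threshold, the read is cut at the start of
    that window (assumed cut position).
    """
    for start in range(0, len(quals) - window + 1):
        if sum(quals[start:start + window]) / window < min_mean_q:
            return start
    return len(quals)


def passes_filters(seq, quals, min_len=50):
    """Trim, then drop reads shorter than 50 bp or containing ambiguous bases."""
    trimmed = seq[:sliding_window_keep_length(quals)]
    return len(trimmed) >= min_len and "N" not in trimmed.upper()


# Toy read: 60 high-quality bases followed by 10 low-quality bases.
toy_quals = [30] * 60 + [10] * 10
toy_keep = sliding_window_keep_length(toy_quals)
toy_pass = passes_filters("A" * 70, toy_quals)
```

Averaging over a window tolerates isolated low-quality bases while still removing the quality drop-off that is typical of read 3′ ends.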
To evaluate data quality before and after filtering, fastp was used to determine sequencing data quality for both raw and filtered data. The read retention rate was >95% (Table 1), indicating the completeness and reliability of the metagenomic dataset.
For transcriptomic analysis, a series of quality control steps were performed to ensure data accuracy and reliability, including the removal of adapter sequences, elimination of low-quality regions at the beginning of each read, and exclusion of reads containing indeterminate bases. RNA-seq data exhibited a high mapping rate, ranging from 89.67% to 93.29% (Table 3). These statistics confirmed that high-quality RNA-seq reads were obtained for downstream analysis.
To assess the reproducibility and quality of untargeted metabolomics data, Pearson correlation analysis was performed on quality control samples. The correlation coefficients for both positive and negative ion modes exceeded 0.99 (Fig. 2), indicating a high degree of similarity in expression patterns between samples. This analysis confirmed the high quality of the metabolomic dataset, further validating data completeness and reliability.
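For reference, the Pearson coefficient used in this check can be computed as in the following Python sketch. The function names and the input layout (one intensity vector per quality control injection, with features in the same order) are assumptions made for illustration; no values from the dataset are reproduced here.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length vectors."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("vectors must have the same length (at least 2)")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant vector")
    return sxy / math.sqrt(sxx * syy)


def pairwise_qc_correlations(qc_samples):
    """Correlate every pair of QC injections (dict: sample name -> intensities)."""
    names = sorted(qc_samples)
    return {
        (a, b): pearson(qc_samples[a], qc_samples[b])
        for i, a in enumerate(names)
        for b in names[i + 1:]
    }


def all_pairs_exceed(correlations, threshold=0.99):
    """True if every pairwise coefficient is above the threshold."""
    return all(r > threshold for r in correlations.values())
```

In practice the function would be applied separately to the positive and negative ion mode feature tables, since the two modes are reported separately in the text.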
Code availability
All software versions used for data processing have been precisely described in the manuscript to ensure repeatability.
References
Chen, Y. et al. Growth characteristics of cage-cultured large yellow croaker Larimichthys crocea. Aquaculture Reports 16, 100242 (2020).
Wei, Z. et al. Dietary hydroxyproline improves the growth and muscle quality of large yellow croaker Larimichthys crocea. Aquaculture 464, 497–504 (2016).
Zhang, Y. et al. Differences of gut microbiota and lipid metabolism in Chinese perch (Siniperca chuatsi) with different growth rates. Aquaculture Research 53, 1766–1781 (2022).
Zhu, F., Sun, H., Jiang, L., Zhang, Q. & Liu, J. Genome-wide association study for growth-related traits in golden pompano (Trachinotus ovatus). Aquaculture 572, 739549 (2023).
Su, X., Ji, D., Yao, J., Zou, Y. & Yan, M. Comparative Analysis of Intestinal Characteristics of Largemouth Bass (Micropterus salmoides) and Intestinal Flora with Different Growth Rates. Fishes 7, 65 (2022).
Ma, R. et al. Bacterial Flora in the Gill Tissues and Intestinal Tracts of Male and Female Chinese Mitten Crabs (Eriocheir sinensis) with Different Diets in a Mud Pond. Current Microbiology 78, 2291–2297 (2021).
Zhang, B.-Y. et al. Comparative study on growth, digestive function and intestinal microbial composition of female Chinese mitten crab Eriocheir sinensis selected at different growth stages in rice-crab culture systems. Aquaculture 554, 738120 (2022).
Huang, Q. et al. Diversity of gut microbiomes in marine fishes is shaped by host-related factors. Molecular Ecology 29, 5019–5034 (2020).
Gauzens, B. et al. Quantitative description of six fish species’ gut contents and prey abundances in the Baltic Sea (1968–1978). Scientific Data 11, 236 (2024).
Lee, Y. M. et al. Microbiota control of maternal behavior regulates early postnatal growth of offspring. Science Advances 7, eabe6563 (2021).
Liu, Y. et al. Gut microbial and metabolic characterization of Atlantic salmon (Salmo salar) challenged with Aeromonas salmonicida. Aquaculture 570, 739420 (2023).
Naya-Català, F., Simó-Mirabet, P., Calduch-Giner, J. & Pérez-Sánchez, J. Transcriptomic profiling of Gh/Igf system reveals a prompted tissue-specific differentiation and novel hypoxia responsive genes in gilthead sea bream. Scientific Reports 11, 1–13 (2021).
Mitev, K. & Taleski, V. Association between the gut microbiota and obesity. Open access Macedonian journal of medical sciences 7, 2050 (2019).
Beckman, B. R. Perspectives on concordant and discordant relations between insulin-like growth factor 1 (IGF1) and growth in fishes. General and Comparative Endocrinology 170, 233–252 (2011).
Yin, Z. et al. Early life intervention using probiotic Clostridium butyricum improves intestinal development, immune response, and gut microbiota in large yellow croaker (Larimichthys crocea) larvae. Frontiers in Immunology 12, 640767 (2021).
Diwan, A. D., Harke, S. N., Gopalkrishna & Panche, A. N. Aquaculture industry prospective from gut microbiome of fish and shellfish: An overview. Journal of Animal Physiology and Animal Nutrition 106, 441–469 (2022).
Jahan, N., Islam, S. M. M., Rohani, M. F., Hossain, M. T. & Shahjahan, M. Probiotic yeast enhances growth performance of rohu (Labeo rohita) through upgrading hematology, and intestinal microbiota and morphology. Aquaculture 545, 737243 (2021).
Verdegem, M. C. J. & Schrama, J. W. Effect of dietary NSP level and bile acid supplementation on nutrient digestibility and the bile acid metabolism in rainbow trout (Oncorhynchus mykiss). Aquaculture 561, 738724 (2022).
Pirr, S. & Viemann, D. Host Factors of Favorable Intestinal Microbial Colonization. Frontiers in Immunology 11, 584288 (2020).
Medina‐Félix, D., Garibay‐Valdez, E., Vargas‐Albores, F. & Martínez‐Porchas, M. Fish disease and intestinal microbiota: A close and indivisible relationship. Reviews in Aquaculture 15, 820–839 (2023).
Deng, Y. et al. Gut–Liver Immune Response and Gut Microbiota Profiling Reveal the Pathogenic Mechanisms of Vibrio harveyi in Pearl Gentian Grouper (Epinephelus lanceolatus♂ × E. fuscoguttatus♀). Frontiers in Immunology 11, 607754 (2020).
Tran, N. T. et al. Application of heat-killed probiotics in aquaculture. Aquaculture 548, 737700 (2022).
Angthong, P., Uengwetwanit, T., Arayamethakorn, S., Chaitongsakul, P. & Rungrassamee, W. Bacterial analysis in the early developmental stages of the black tiger shrimp (Penaeus monodon). Scientific Reports 10, 4896 (2020).
Li, X., Yu, Y., Li, C. & Yan, Q. Comparative study on the gut microbiotas of four economically important Asian carp species. Science China-Life Sciences 61, 696–705 (2018).
Ridlon, J. M. & Gaskins, H. R. Another renaissance for bile acid gastrointestinal microbiology. Nature Reviews Gastroenterology & Hepatology 21, 1–17 (2024).
Wang, T. et al. Transcriptome Analysis Reveals Differences in Gene Expression in the Muscle of the Brown-Marbled Grouper (Epinephelus fuscoguttatus) with Different Growth Rates. Fishes 8, 309 (2023).
Lin, Z. et al. Comparative transcriptome analysis of mixed tissues of black porgy (Acanthopagrus schlegelii) with differing growth rates. Aquaculture Research 52, 5800–5813 (2021).
NCBI Sequence Read Archive https://identifiers.org/ncbi/insdc.sra:SRP507162 (2024).
NCBI Sequence Read Archive https://identifiers.org/ncbi/insdc.sra:SRP509549 (2024).
Huang, H. MetaboLights MTBLS10000 http://www.ebi.ac.uk/metabolights/MTBLS10000 (2024).
Lin, X. Y., Xiao, S. J., Li, W. B. & Wang, Z. Y. Development and validation of sex-specific SNP markers in Larimichthys crocea. Journal of fisheries of China 42, 1329–1337 (2018).
Martin, M. Cutadapt removes adapter sequences from high-throughput sequencing reads. EMBnet.journal 17, 10–12 (2011).
Chen, S., Zhou, Y., Chen, Y. & Gu, J. fastp: an ultra-fast all-in-one FASTQ preprocessor. Bioinformatics 34, i884–i890 (2018).
Wood, D. E., Lu, J. & Langmead, B. Improved metagenomic analysis with Kraken 2. Genome Biology 20, 257 (2019).
Li, D., Liu, C.-M., Luo, R., Sadakane, K. & Lam, T.-W. MEGAHIT: an ultra-fast single-node solution for large and complex metagenomics assembly via succinct de Bruijn graph. Bioinformatics 31, 1674–1676 (2015).
Steinegger, M. & Söding, J. MMseqs2 enables sensitive protein sequence searching for the analysis of massive data sets. Nature Biotechnology 35, 1026–1028 (2017).
Zhu, W., Lomsadze, A. & Borodovsky, M. Ab initio gene identification in metagenomic sequences. Nucleic Acids Research 38, e132 (2010).
Acknowledgements
This work was supported by the National Key Research and Development Program of China [grant number 2022YFD2401004] and the Central Nonprofit Basic Scientific Research Project for the Scientific Research Institutes of China [grant number 2023XT05].
Ethics declarations
Competing interests
The authors declare no competing interests.
Additional information
Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License, which permits any non-commercial use, sharing, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if you modified the licensed material. You do not have permission under this licence to share adapted material derived from this article or parts of it. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by-nc-nd/4.0/.
About this article
Cite this article
Huang, H., Cheng, Z., Wang, Y. et al. Multi-omics dataset of individual variations in growth performance of large yellow croaker. Sci Data 12, 90 (2025). https://doi.org/10.1038/s41597-025-04429-7
DOI: https://doi.org/10.1038/s41597-025-04429-7