Introduction

The deterioration in the quality of refrigerated pork is primarily due to the proliferation of microorganisms1. In particular, microbial contamination that occurs during slaughter can reduce freshness. If microbes in meat grow to high levels and cause odors, exudates, and slime, the meat is considered unsuitable for human consumption (EU Regulation (EC) No 178/2002, Article 14). Under the related EU regulation, if contamination at the production site exceeds 5 × 10⁶ CFU/g (total bacterial count), improvements in hygiene practices during production are required (European Commission Regulation (EC) No 2073/2005). Microbial counts are typically determined by plate counting2, which destroys the food material and causes economic losses3. These losses also make it difficult to test many samples4. Thus, it is crucial to develop a non-destructive method to predict pork freshness, ensuring consumer safety and maintaining trust in the meat industry.

Traditional methods for assessing meat freshness include sensory evaluation, total microbial content analysis, and the measurement of volatile basic nitrogen (VBN) levels5. However, these conventional approaches are slow and require sample destruction, which limits their practicality6,7. Recently, innovative non-destructive technologies have been developed for monitoring meat spoilage6,7,8,9. Spectroscopic methods utilize information from overtones and O–H, C–H, and N–H bonds to measure VBN and total bacterial count (TBC)6,8,9. Smart films detect spoilage through color changes in response to volatile compounds such as CO2, H2S, and volatile nitrogen emitted during meat spoilage7. These innovations enhance food safety and quality assurance.

Pork contains about 70% water, which is found within muscle fibers, between muscle fibers, cell membranes, muscle cells, and muscle bundles10. In meat, water is classified into bound water, immobilized water, and free water5. Both immobilized and free water can exude to the surface during storage due to structural changes in the meat5. These structural changes are influenced by the contamination and proliferation of spoilage microbes11. The exuded water is referred to as drip5. It was reported that drip can represent the metabolites of meat5, and that meat quality can be predicted through drip analysis12. Thus, because drip reflects the activity of spoilage microbes, it can be used to assess the condition of pork non-destructively.

Metabolomics, the study of metabolites produced by intracellular biochemical reactions, has been used in food science related to food quality and safety4. The metabolites in meat can be altered by the growth of spoilage microbes13. Kim et al.4 applied metabolites to a classification model to categorize meat spoilage. Additionally, metabolites have been used to distinguish between fresh and frozen meat using selected biomarkers through multivariate models14. Therefore, by utilizing an appropriate model, the metabolites in drip can be analyzed to effectively predict meat spoilage and microorganisms.

When modeling with metabolomics data, it is essential to select and reduce the number of variables15, because a high correlation among variables can lead to multicollinearity, which may cause errors in the estimation of model coefficients and lead to overfitting15. Additionally, reducing the number of variables makes model training more efficient and the resulting model easier to interpret16. To reduce the number of variables, various variable selection models are used16,17. For instance, the least absolute shrinkage and selection operator (LASSO) applies L1 regularization to drive the coefficients of less important variables to zero, automatically performing variable selection and dimension reduction, thus preventing model overfitting and reducing complexity18. On the other hand, Random Forest (RF) is an ensemble learning model in which each decision tree independently learns from bootstrap samples and randomly selects variables during tree construction17. This randomness increases the diversity of the overall model, preventing overfitting and reducing variance17. Modeling based on the selected variables is also very important16. Ridge regression uses L2 regularization to shrink the coefficients of all variables, ensuring model stability and helping prevent overfitting16. Meanwhile, Support Vector Regression (SVR) uses the kernel trick to capture patterns in the data and model complex relationships19. In this process, SVR aims to include as many data points as possible within the set margin, contributing to the model’s generalization performance19. Thus, by using variables chosen by LASSO and RF, Ridge regression and SVR leverage the strengths of each technique to predict complex biological information from metabolomics data.
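For reference, the L1 and L2 penalties described above can be written in their standard textbook form. The notation here is introduced only for illustration and is not taken from the original analysis: y_i is the response (for example, a bacterial count), x_ij is the scaled level of metabolite j in sample i, β_j is its coefficient, and λ ≥ 0 is the tuning parameter, whose scaling convention varies between software packages.

$$\hat{\beta}^{\mathrm{LASSO}}=\mathop{\arg\min}\limits_{\beta_{0},\beta}\left\{\frac{1}{2n}\sum_{i=1}^{n}\left(y_{i}-\beta_{0}-\sum_{j=1}^{p}x_{ij}\beta_{j}\right)^{2}+\lambda\sum_{j=1}^{p}\left|\beta_{j}\right|\right\}$$

$$\hat{\beta}^{\mathrm{Ridge}}=\mathop{\arg\min}\limits_{\beta_{0},\beta}\left\{\frac{1}{2n}\sum_{i=1}^{n}\left(y_{i}-\beta_{0}-\sum_{j=1}^{p}x_{ij}\beta_{j}\right)^{2}+\lambda\sum_{j=1}^{p}\beta_{j}^{2}\right\}$$

The absolute-value penalty can set individual coefficients exactly to zero, which is why LASSO acts as a variable selector, whereas the squared penalty only shrinks coefficients toward zero and therefore keeps every variable in the model.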

Therefore, this study aimed to predict the freshness of pork using metabolites from the drip and predictive models. Additionally, it intended to estimate the proportion of microorganisms contaminating the pork.

Results and discussion

Changes in pork quality characteristics during storage

The pH of the pork did not differ significantly during storage. The pH was 5.91 on the first day of storage and 5.62 on day 27 (Fig. 2a). The ideal pH range for pork reported by the pork producer’s council20 is 5.6 to 5.9. The pH of pork is significantly influenced by microbial composition21,22. An increase in pH during storage is linked to the predominance of spoilage microorganisms such as Brochothrix thermosphacta, Shewanella putrefaciens, and Pseudomonas spp.22, while lactic acid bacteria are associated with pH decreases22. Thus, pH changes in pork during storage are likely driven by microbial composition.

Fig. 1: Workflow for variable selection and modeling to predict total microbial counts and microbial composition in pork using drip.

LOOCV leave-one-out cross-validation, RMSE root mean square error.

The drip loss in pork increased significantly from 2.12% initially to 11.41% by day 27 of storage (Fig. 2b). The increase in drip is influenced by the breakdown of muscle tissue11, and Chen et al.23 reported that the destruction of muscle cell structure in pork affects drip loss. Thus, the increased drip loss might be affected by changes in the muscle tissues over the storage period.

The total aerobic bacterial (TAB) count increased significantly from 3.82 log CFU/g on the first day of storage to 7.53 log CFU/g by day 20 (Fig. 3a). It was reported that enzymes secreted by microbes influence changes in the structural tissues of meat24, which might have affected the drip loss in pork. To identify the microbes contaminating the pork, microbial compositions were analyzed using 16S rRNA sequencing on days 1, 14, and 20 (Fig. 3b). The proportions of Enterobacteriaceae, Lactobacillaceae, and Leuconostoc increased in all three carcasses over the storage period. Microbes contaminate the pork at the time of slaughter25, and their composition can change depending on storage conditions1. The increase in these taxa seems to be influenced by the facultative anaerobic conditions of vacuum storage, likely affecting quality factors such as pH (Fig. 3).

Fig. 2: Quality changes in pork during storage period.

a pH. b Drip loss. Different letters (a–d) indicate significant differences (p < 0.05).

Fig. 3: Microbial changes in pork during storage.

a Total aerobic bacterial (TAB) counts. b Microbial composition changes across different carcass repetitions (P1, P2, and P3).

During the storage period, a total of 63 metabolites were qualitatively analyzed in the drip using 1H-13C HSQC 2D NMR (Table S1). Overall, 30 carboxylic acids and derivatives, 12 organooxygen compounds, 7 purine nucleotides, 4 organonitrogen compounds, and 10 other compounds were identified. The metabolites were quantitatively analyzed using 1H zg30 NMR (Table S2). The metabolites that significantly increased over the storage period were acetate, agmatine, alanine, asparagine, aspartate, cadaverine, ethanol, fucose, glutamate, glutamine, inosine, isoleucine, leucine, lysine, malate, methionine, phenylalanine, proline, serine, threonine, tyramine, tyrosine, uridine, valine, and phosphocholine. The increase in microbes during the storage period can enhance protein degradation, which increases the levels of amino acids3. In particular, biogenic amines such as cadaverine and tyramine can be produced by Lactobacillaceae such as Leuconostoc26. Additionally, polar amino, carboxyl, and hydroxyl groups allow these compounds to mix easily into the drip and be exuded27. Thus, the metabolite changes in the drip could be influenced by the TAB. Therefore, a model of the TAB was developed using the drip metabolites to predict meat freshness.

Selection of metabolites for modeling using mathematical methods and machine learning

Metabolites for TAB modeling were selected using LASSO regression and RF. LASSO regression was applied to the entire set of metabolites, with the variable selection optimized through cross-validation, and the model was then rerun with the selected metabolites18. The metabolites selected by LASSO were ADP, carnosine, creatine, ethanol, fumarate, glutamine, glycerol, hypoxanthine, IMP, malate, O-acetylcarnitine, tyrosine, and uracil. RF selected variables with an importance score of 10 or higher after 10 runs28. The metabolites chosen by RF were acetate, adenosine, alanine, asparagine, aspartate, cadaverine, hypoxanthine, IMP, isoleucine, and tyramine. The set of metabolites selected by each method was then used separately for modeling.
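The way LASSO discards variables can be illustrated with a minimal coordinate-descent sketch in plain Python. This is not the glmnet implementation used in the study; the data, the metabolite names (`metabolite_A`, `metabolite_B`), and the penalty value are all hypothetical, and the columns are assumed to be centered.

```python
def soft_threshold(z, gamma):
    """Shrink z toward zero by gamma; values inside [-gamma, gamma] become exactly 0."""
    if z > gamma:
        return z - gamma
    if z < -gamma:
        return z + gamma
    return 0.0


def lasso_coordinate_descent(X, y, lam, n_iter=200):
    """Minimize (1/2n) * sum((y - X b)^2) + lam * sum(|b|) by cyclic coordinate descent.

    X is a list of rows with centered columns; y is a centered response.
    """
    n, p = len(X), len(X[0])
    beta = [0.0] * p
    for _ in range(n_iter):
        for j in range(p):
            rho = 0.0  # correlation of feature j with the partial residual
            zj = 0.0   # squared norm of feature j
            for i in range(n):
                pred_without_j = sum(X[i][k] * beta[k] for k in range(p) if k != j)
                rho += X[i][j] * (y[i] - pred_without_j)
                zj += X[i][j] ** 2
            beta[j] = soft_threshold(rho / n, lam) / (zj / n)
    return beta


# Hypothetical toy data: the response depends only on the first column.
names = ["metabolite_A", "metabolite_B"]
X = [[-1.5, 0.5], [-0.5, -1.0], [0.5, 1.0], [1.5, -0.5]]
y = [2.0 * row[0] for row in X]

beta = lasso_coordinate_descent(X, y, lam=0.5)
selected = [name for name, b in zip(names, beta) if b != 0.0]
print(beta, selected)
```

In this toy case the coefficient of the uninformative column is set exactly to zero and the informative one is shrunk, which is the behavior that lets LASSO act as a selector before a second model is fitted.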

To assess the effectiveness of feature selection in reducing multicollinearity, the variance inflation factor (VIF) was calculated for all metabolites before and after selection (Table S3). Prior to selection, all metabolites exhibited extreme multicollinearity (VIF = ∞). After LASSO selection, the VIF values of the selected metabolites were markedly reduced, with ADP (9.960), ethanol (3.411), fumarate (4.693), glycerol (4.048), hypoxanthine (9.413), IMP (6.154), malate (7.505), O-acetylcarnitine (2.626), tyrosine (1.833), and uracil (5.400) all below 10. Similarly, RF selection reduced multicollinearity, with VIF values of 3.352 for acetate, 1.310 for adenosine, 2.528 for cadaverine, 3.267 for hypoxanthine, 2.674 for IMP, and 4.617 for tyramine. These results indicate that both LASSO and RF contributed to eliminating redundant variables while preserving essential predictors for modeling. Each set of selected metabolites was subsequently used for modeling to predict TAB.
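As a reminder of what these numbers mean, the standard definition of the VIF for metabolite j is given below, where R_j² denotes the coefficient of determination obtained when metabolite j is regressed on all other metabolites in the set (this symbol is introduced here for illustration).

$$\mathrm{VIF}_{j}=\frac{1}{1-R_{j}^{2}}$$

Reading the reported values through this formula, a VIF of 9.960 for ADP corresponds to R_j² = 1 − 1/9.960 ≈ 0.900, and a VIF of 1.833 for tyrosine corresponds to R_j² = 1 − 1/1.833 ≈ 0.454. A VIF of ∞ corresponds to R_j² = 1, meaning that the metabolite is an exact linear combination of the others; this necessarily occurs whenever the number of predictors is at least as large as the number of observations.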

Modeling using mathematical method and machine learning for TAB

Metabolites selected by each method were modeled using Ridge regression (a mathematical method) and SVR (a machine learning method). When the metabolites selected by LASSO regression were applied to Ridge regression (Eq. (1)) and SVR (Eq. (2)), the R² values were 0.968 and 0.970, respectively. Using the metabolites selected by RF, Ridge regression (Eq. (3)) and SVR (Eq. (4)) models were constructed, and the R² values were 0.847 and 0.876, respectively. The developed models were validated using RMSE values obtained through LOOCV (Fig. 4). The RMSE values were 0.283, 0.387, 0.598, and 0.519 for Eq. (1) through Eq. (4), respectively. Previous studies have demonstrated that LASSO and Ridge regression offer significant advantages in predictive modeling. For instance, Wang et al.29 showed that a LASSO-logistic regression model provided superior predictive accuracy for identifying neurotoxicity biomarkers. Similarly, Zhang et al.30 found that LASSO regression outperformed other models in predicting neonatal sepsis, achieving the highest accuracy. Furthermore, Krstic et al.31 reported that LASSO excelled in variable selection, while Ridge regression stabilized models by effectively handling multicollinearity. Therefore, selecting the metabolites with LASSO regression and modeling them with Ridge regression (Eq. (1)) gave the best overall performance, combining a high R² (0.968) with the lowest RMSE (0.283).

$$\begin{aligned}\mathrm{y}={}&6.2400+0.0462(\mathrm{ADP})-0.2203(\mathrm{Carnosine})-0.2249(\mathrm{Creatine})+0.0916(\mathrm{Ethanol})\\&+0.1236(\mathrm{Fumarate})+0.1817(\mathrm{Glutamine})+0.1891(\mathrm{Glycerol})+0.1692(\mathrm{Hypoxanthine})\\&-0.2165(\mathrm{IMP})+0.1741(\mathrm{Malate})-0.1527(\text{O-acetylcarnitine})+0.1954(\mathrm{Tyrosine})+0.1206(\mathrm{Uracil})\end{aligned}$$
(1)
$$\begin{aligned}\mathrm{y}={}&-0.0006+0.0800(\mathrm{ADP})-0.2090(\mathrm{Carnosine})-0.1729(\mathrm{Creatine})+0.0150(\mathrm{Ethanol})\\&+0.1105(\mathrm{Fumarate})+0.1270(\mathrm{Glutamine})+0.1898(\mathrm{Glycerol})+0.1692(\mathrm{Hypoxanthine})\\&-0.2190(\mathrm{IMP})+0.0372(\mathrm{Malate})-0.1589(\text{O-acetylcarnitine})+0.1605(\mathrm{Tyrosine})+0.1014(\mathrm{Uracil})\end{aligned}$$
(2)
$$\begin{aligned}\mathrm{y}={}&6.2400+0.1261(\mathrm{Acetate})-0.2644(\mathrm{Adenosine})+0.1813(\mathrm{Alanine})+0.0920(\mathrm{Asparagine})\\&+0.2162(\mathrm{Aspartate})+0.0434(\mathrm{Cadaverine})+0.3270(\mathrm{Hypoxanthine})\\&-0.4480(\mathrm{IMP})+0.0287(\mathrm{Isoleucine})-0.3851(\mathrm{Tyramine})\end{aligned}$$
(3)
$$\begin{aligned}\mathrm{y}={}&-0.0008+0.1612(\mathrm{Acetate})+0.0206(\mathrm{Adenosine})+0.3128(\mathrm{Alanine})+0.0366(\mathrm{Asparagine})\\&+0.4623(\mathrm{Aspartate})+0.1616(\mathrm{Cadaverine})+0.2302(\mathrm{Hypoxanthine})\\&-0.6273(\mathrm{IMP})-0.5543(\mathrm{Isoleucine})-0.5472(\mathrm{Tyramine})\end{aligned}$$
(4)
Fig. 4: RMSE values obtained using LOOCV. The RMSE values were calculated cumulatively as the number of features increased.

RMSE root mean square error, LOOCV leave-one-out cross-validation. Model 1: Ridge regression using metabolites selected by LASSO regression. Model 2: Support vector regression (SVR) using metabolites selected by LASSO regression. Model 3: Ridge regression using metabolites selected by Random Forest. Model 4: SVR using metabolites selected by Random Forest.
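To show how a fitted linear model such as Eq. (1) would be applied to a new drip sample, the sketch below evaluates the published intercept and coefficients in Python. The function name `predict_tab_eq1` and the example inputs are hypothetical. The text does not state how the metabolite values were scaled before fitting, so the inputs are assumed here to be on the same scale as the values used to fit the model, with 0 taken as the reference level.

```python
# Intercept and coefficients copied from Eq. (1) (Ridge regression on LASSO-selected metabolites).
EQ1_INTERCEPT = 6.2400
EQ1_COEFFICIENTS = {
    "ADP": 0.0462,
    "Carnosine": -0.2203,
    "Creatine": -0.2249,
    "Ethanol": 0.0916,
    "Fumarate": 0.1236,
    "Glutamine": 0.1817,
    "Glycerol": 0.1891,
    "Hypoxanthine": 0.1692,
    "IMP": -0.2165,
    "Malate": 0.1741,
    "O-acetylcarnitine": -0.1527,
    "Tyrosine": 0.1954,
    "Uracil": 0.1206,
}


def predict_tab_eq1(scaled_levels):
    """Return the value of Eq. (1) for a dict of metabolite levels.

    Assumption: scaled_levels uses the same (unstated) scaling as the fitted model.
    """
    missing = set(EQ1_COEFFICIENTS) - set(scaled_levels)
    if missing:
        raise ValueError(f"missing metabolites: {sorted(missing)}")
    return EQ1_INTERCEPT + sum(
        coef * scaled_levels[name] for name, coef in EQ1_COEFFICIENTS.items()
    )


# Hypothetical sample: every metabolite at the reference level (0) ...
reference = {name: 0.0 for name in EQ1_COEFFICIENTS}
# ... and a second one with hypoxanthine one unit higher and IMP one unit lower.
shifted = dict(reference, Hypoxanthine=1.0, IMP=-1.0)

print(predict_tab_eq1(reference))
print(predict_tab_eq1(shifted))
```

The second sample illustrates the signs of the coefficients: higher hypoxanthine and lower IMP both raise the value predicted by Eq. (1).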

In the early post-mortem period, metabolite changes were primarily influenced by intrinsic enzymatic processes rather than microbial activity32, as indicated by minimal bacterial proliferation (Fig. 3a). Metabolites such as IMP and hypoxanthine (Table S2) are typical byproducts of post-mortem ATP degradation, reflecting endogenous biochemical processes32. As storage progressed, microbial proliferation increased significantly (Fig. 3a). For example, Lactobacillaceae species contributed to the production of ethanol and biogenic amines such as cadaverine and tyramine13,33. In general, microbial nucleic acid metabolism and energy production processes are reflected in metabolites such as ADP, IMP, hypoxanthine, and uracil34,35. ADP is produced when energy is released from ATP, while IMP and hypoxanthine are produced through the breakdown of nucleic acids34. Uracil, a component of RNA, is directly involved in the metabolism of microbial genetic material35. Amino acid-related metabolites such as glutamine, tyrosine, carnosine, and creatine have crucial roles in the synthesis and breakdown of proteins by microbes36. O-acetylcarnitine and glycerol can be affected by microbial fatty acid metabolism37.

Therefore, each metabolite was influenced not only by the total bacterial count but also by the composition of specific microbes. Consequently, we also verified whether the metabolites in the drip could be used to predict specific microbial compositions.

Mathematical method for predicting microbial composition model

Elastic net regression, which combines the LASSO and Ridge models15, was used for prediction. Metabolites were entered as independent variables and microbial composition values as dependent variables, and the results are presented in Table 1. The model’s fit was assessed using the Spearman correlation coefficient (rs), which was above 0.95 for Microbacterium, Pseudomonas, Stenotrophomonas, Acinetobacter, Brevundimonas, Enterococcaceae, Leuconostoc, and Enterococcus. For Enhydrobacter, Enterobacteriaceae, Brochothrix, Lactobacillaceae, Lactobacillus, Lactococcus, and Lactobacillales, the rs values were above 0.9. Additionally, rs values above 0.8 were observed for the remaining microbes. A Spearman correlation coefficient is interpreted as follows: absolute values between 0.3 and 0.5 suggest a weak to moderate correlation; values between 0.5 and 0.7 indicate a moderate to strong correlation; and values above 0.7 are typically considered to reflect a strong correlation38. Thus, the elastic net regression model is considered to give highly reliable results. Because rs reflects only the agreement in rank order, the model was additionally validated using the RMSE, which measures how close the predictions are to the actual values (Table 1). The RMSE is the square root of the mean squared error between predicted and actual values39. RMSE values in the range of 0.15 to 0.19 indicate that the elastic net regression model has a high prediction accuracy. Consequently, the metabolites in the drip can also be used to predict the specific composition of microbes through mathematical modeling.
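The two evaluation measures used above answer different questions, which a short Python sketch can make concrete. The observed and predicted proportions below are hypothetical and are not taken from Table 1; the functions follow the standard definitions of Spearman's rank correlation (Pearson correlation of average ranks) and the RMSE.

```python
import math


def average_ranks(values):
    """Rank values from 1..n, giving tied values the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1
    return ranks


def pearson(a, b):
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / math.sqrt(var_a * var_b)


def spearman(a, b):
    """Spearman's rs: Pearson correlation computed on the ranks."""
    return pearson(average_ranks(a), average_ranks(b))


def rmse(actual, predicted):
    """Square root of the mean squared difference between actual and predicted values."""
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual))


# Hypothetical proportions of one taxon in four samples.
observed = [0.10, 0.25, 0.40, 0.55]
predicted = [0.12, 0.20, 0.45, 0.50]

rs = spearman(observed, predicted)
error = rmse(observed, predicted)
print(rs, error)
```

Here the predictions preserve the rank order perfectly, so rs equals 1, yet the RMSE is still non-zero. This is why a magnitude-based measure is reported alongside the rank correlation.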

Table 1 Microbial composition prediction modeling using elastic net regression (N = 1800, n = 600)

We conducted variable selection and modeling to predict TAB and microbial composition in pork using drip. For variable selection, we used LASSO regression and RF, while Ridge regression and SVR were used for the modeling phase. The combination of LASSO regression for variable selection followed by Ridge regression proved to be the most appropriate method for predicting TAB, as evidenced by an R² value of 0.968 and an RMSE of 0.283. Additionally, elastic net regression, which integrates LASSO and Ridge regression, was effective for predicting microbial compositions. Therefore, the combination of LASSO and Ridge regression provides high accuracy, making it a valuable tool for predicting meat freshness and microbial composition. Traditional methods, such as total bacterial count (TBC), typically require over 48 h and involve destructive sampling, leading to material loss and delayed results. In contrast, our method using drip metabolites can reduce the analysis time to just a few hours while maintaining non-destructive sampling, offering a more efficient and practical solution for meat freshness assessment. However, this study has limitations due to the small sample size, which may affect the statistical robustness and generalizability of the findings. While the model showed high accuracy through cross-validation, its performance in real-world industrial environments has not been fully validated. Factors such as sample variability, environmental conditions, and measurement inconsistencies could influence the model’s robustness in practical applications. Further research is needed under various conditions, including packaging, storage, drip collection methods, and additional batch experiments, to assess the model’s generalizability. Moreover, studies should distinguish between metabolite changes from microbial activity and those from natural aging. Thus, this study serves as a preliminary investigation into methods of measuring spoilage using drip metabolomic analysis and underscores the need for future research based on these preliminary results.

Methods

Sample preparation

Pork loin (Longissimus thoracis) from both sides of three crossbred ((Landrace × Yorkshire) × Duroc) pigs was acquired from three individual butcher shops in different locations. Each pig was slaughtered 2 days prior to purchase. After trimming both ends, each loin was sliced into 7 pieces, each 11 cm thick. The sliced samples were vacuum-packaged (HFV-600L; Hankook Fujee Machinery Co., Ltd., Hwaseong, Korea) in a polyethylene/nylon bag (oxygen permeability, 22.5 mL/m² per 24 h at an atmospheric pressure of 60% relative humidity and 25 °C; moisture vapor transmission rate, 4.7 g/m² per 24 h at 100% relative humidity and 25 °C). In total, 30 loin slices (5 days × 3 pigs × 2 sides) were incubated at 4 ± 2 °C and sampled on days 1, 8, 14, 20, and 27 for subsequent analyses.

pH

The pH was measured using a slight modification of the method described by Jung et al.40. Each 3 g sample was homogenized with 27 mL of distilled water using an Ultra-Turrax T25 homogenizer (Ika-Werke) at a speed of 9600 rpm for 30 s. Following this, the mixtures were centrifuged in a Continent 512R centrifuge (Hanil) at 2265 × g for 10 min and filtered using Whatman No. 4 paper (Whatman plc, Maidstone, UK). The pH of the samples was measured using a Seven2Go pH meter and an InLab Expert Go-ISM pH probe (Mettler-Toledo International Inc., Schwerzenbach, Switzerland). The pH meter was first calibrated at room temperature using buffer solutions with pH values of 4.0, 7.0, and 9.21. The pH of each sample was measured twice, and the average of these measurements was taken as the final value.

Drip loss

Drip loss in pork was assessed using the method described by Choi et al.41. Each piece of pork was weighed prior to packaging and reweighed after storage. The drip loss was then determined from the difference in weight before and after storage using Eq. (5).

$$\mathrm{Drip}\,\mathrm{loss}\,(\%)=\frac{\mathrm{original\; sample\; weight}-\mathrm{sample\; weight\; after\; incubation}}{\mathrm{original\; sample\; weight}}\times 100$$
(5)
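As a worked instance of Eq. (5), the short Python sketch below computes the percentage drip loss from two weights. The function name and the example weights are hypothetical; the weights were chosen only so that the result matches the 11.41% reported for day 27.

```python
def drip_loss_percent(original_weight_g, weight_after_storage_g):
    """Drip loss (%) as defined in Eq. (5)."""
    if original_weight_g <= 0:
        raise ValueError("original weight must be positive")
    return (original_weight_g - weight_after_storage_g) / original_weight_g * 100.0


# Hypothetical slice weighing 100.00 g before packaging and 88.59 g after storage.
loss = drip_loss_percent(100.00, 88.59)
print(round(loss, 2))
```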

Total aerobic bacterial counts

The total aerobic bacterial (TAB) count was determined following the methods described by Ismail et al.42. Initially, 10 g of the sample was placed into a sterile bag containing 90 mL of a 0.85% saline solution. The mixture was then agitated using a BagMixer 400 P (Interscience Ind., St. Nom, France). Serial dilutions were then prepared to obtain countable concentrations. Afterward, 100 μL aliquots of these dilutions were spread on plate count agar (Difco Laboratories, Detroit, MI, USA) and incubated at 37 °C for 48 h. After incubation, the bacterial colonies were counted. The final results were recorded in log CFU/g.
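The conversion from a colony count to log CFU/g follows from the volumes given above and can be sketched in Python as follows. The function name, the colony count, and the dilution level are hypothetical, and the calculation assumes that 1 g of sample occupies about 1 mL, so that 10 g in 90 mL of saline is treated as a 1:10 homogenate.

```python
import math


def log_cfu_per_g(colonies, further_tenfold_dilutions,
                  plated_ml=0.1, sample_g=10.0, diluent_ml=90.0):
    """Convert a plate count to log10 CFU per gram of sample.

    further_tenfold_dilutions counts the tenfold dilution steps made after the
    initial homogenate. Assumption: 1 g of sample is treated as 1 mL.
    """
    if colonies <= 0:
        raise ValueError("at least one colony is needed to take a logarithm")
    homogenate_factor = (sample_g + diluent_ml) / sample_g  # 10 for 10 g + 90 mL
    cfu_per_ml_plated = colonies / plated_ml
    cfu_per_g = cfu_per_ml_plated * 10 ** further_tenfold_dilutions * homogenate_factor
    return math.log10(cfu_per_g)


# Hypothetical plate: 42 colonies from 100 uL of the third tenfold dilution.
example_log_count = log_cfu_per_g(42, 3)

# Hygiene criterion of 5 x 10^6 CFU/g cited in the Introduction, on the log scale.
limit_log_count = math.log10(5e6)

print(round(example_log_count, 2), round(limit_log_count, 2))
print(example_log_count < limit_log_count)
```

Expressing the regulatory criterion on the same log scale makes it directly comparable with the log CFU/g values reported in the Results.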

Metabolites

The NMR analysis of metabolites was carried out using the method described by Kim et al.4. Initially, 1 mL of pork drip collected from pork loin samples was mixed with 4 mL of 0.6 M perchloric acid and homogenized at 16,000 rpm for 1 min with an Ultra-Turrax T25 homogenizer (Ika-Werke). This homogenate was then centrifuged at 3000 × g for 20 min at 4 °C in a Continent 512 R centrifuge (Hanil Co., Ltd., Incheon, Korea). The supernatant was poured into a new tube, and its pH was adjusted to 7.0 using KOH, measured by a SevenGo pH meter (Mettler-Toledo, Schwerzenbach, Switzerland), followed by another centrifugation under the same conditions. The supernatant was then filtered through Whatman No. 1 filter paper (Whatman plc) and freeze-dried for 4 h (Freezer Dryer 18, Labco Corp., Kansas City, USA). The dried sample was rehydrated with 1 mM 3-(trimethylsilyl)propionic acid-2,2,3,3-d4 (TSP) in D2O, adjusted to pH 7.4 with a 20 mM phosphate buffer, then vortexed and incubated at 37 °C for 10 min. The samples were centrifuged at 17,800 × g for 20 min in an HM-150IV centrifuge (Hanil Co., Ltd., Incheon, Korea), and the prepared samples were placed into 5 mL NMR tubes for analysis using a Bruker 850 MHz NMR spectrometer (Bruker Biospin GmbH, Baden-Württemberg, Germany). The analysis employed the zg30 pulse sequence in Bruker’s Topspin 4.2.0 software, collecting data at 298 K. The entire process from metabolite extraction to NMR measurement took approximately 5 h.

The NMR spectra used 64 K data points, a sweep width of 17,007.803 Hz, and 128 scans with an acquisition time of 4.20 s. TSP resonance was used as a reference for chemical shift calibration in both qualitative and quantitative analyses. Manual baseline corrections and peak identifications were conducted using 2D NMR spectra techniques such as correlation spectroscopy (COSY) and heteronuclear single quantum coherence (HSQC). COSY was performed with 2 K data points in the t2 domain and 128 increments in the t1 domain at 16 scans, covering a spectral width of 11 ppm. HSQC utilized 2 K data points in the t2 domain and 256 increments in the t1 domain at 16 scans, with spectral widths of 223 ppm and 11 ppm for the f1 and f2 axes, respectively, with a coupling constant of 145 Hz to establish delay durations for short-range correlations. Peak identifications were referenced against the Human Metabolome Database (HMDB). Quantification of the metabolites from 1H NMR spectra was processed using the Chenomx NMR Suite 10.0 (Chenomx, Inc., Edmonton, Alberta, Canada), with TSP as the internal standard and metabolite concentrations expressed in mg/dL, using five replicates for each sample.

DNA extraction and sequencing of microorganisms in pork

The method for extracting DNA from microorganisms in pork was modified from the techniques described by Kim et al.4. To assess the changes in microbial composition during storage, 16S rRNA analysis was performed on samples collected on days 1, 14, and 20, representing the initial, mid-point, and late stages of the storage period, respectively. Five grams of pork mixed with 10 mL of sterile saline (0.85%) solution was shaken on a gyratory shaker for 15 min. The mixture was centrifuged at 200 × g for 5 min. The supernatant was collected in a new test tube and then centrifuged at 12,000 × g at 4 °C for 10 min. After centrifugation, the supernatant was discarded, and the pellet was stored at −70 °C.

DNA extraction, polymerase chain reaction (PCR) amplification, and amplicon preparation for sequencing were performed as described by Mun et al.43. Genomic DNA was extracted from the cell pellet using the DNeasy PowerSoil Kit (Qiagen, Hilden, Germany). The primers targeted the V3-4 region of the 16S rRNA gene: forward, 5′-TCGTCGGCAGCGTCAGATGTGTATAAGAGACAGGTGCCAGCMGCCGCGGTAA-3′; reverse, 5′-GTCTCGTGGGCTCGGAGATGTGTATAAGAGACAGGGACTACHVGGGTWTCTAAT-3′. The DNA libraries were sequenced using the Illumina iSeq 100 platform (Illumina, San Diego, CA, USA). The fastq files obtained from the iSeq 100 sequencing data were analyzed using Mothur (v.1.41). In Mothur, paired reads were combined into contigs using the make.contigs command and filtered for quality using the screen.seqs command. The sequences were aligned using the SILVA database v.138, and chimeric sequences were removed using the VSEARCH program v.2.11.1. Taxonomic classification was assessed using the Greengenes format database 14, and chloroplast, archaeal, mitochondrial, and eukaryotic sequences were removed from the dataset. Low-abundance taxonomic units and singletons were removed using the Mothur subroutine “split.abund”, and a distance of 0.03 (97% sequence similarity) was used to classify operational taxonomic units (OTUs). Sequence data used in this study were deposited in the Short Read Archive under project number PRJNA1144167.

Statistical analyses and modeling

Pork quality values and quantified metabolites were statistically analyzed using analysis of variance (ANOVA) in SAS software (Version 9.4, SAS Institute Inc., Cary, NC, USA). Differences among the means for each storage date were assessed using Tukey’s multiple comparison test. Statistical significance was set at p < 0.05.

The analysis of metabolomics data in this study was conducted using the R programming language and several packages. The modeling process is illustrated in Fig. 1. During the variable selection process, the glmnet package was utilized to implement LASSO regression. LASSO applies L1 regularization to drive the coefficients of less important variables to zero, effectively selecting variables18. RF analysis was performed using the randomForest package, where each decision tree independently learned from bootstrap samples and evaluated the importance of variables to select significant ones. RF selected metabolites with an importance score of 10 or higher after 10 iterations. Each set of selected metabolites from LASSO and RF was subsequently used for modeling. For modeling with the selected variables, the Ridge regression was implemented using the ridge package, and the SVR was implemented using the e1071 package. The Ridge regression applied L2 regularization to reduce the coefficients of all variables, and a linear kernel was used for the SVR. The performance of the model was evaluated using the coefficient of determination (R2) and the root mean square error (RMSE). R2 indicates how well a model explains the variability in the data, with values ranging from 0 to 1; values closer to 1 signify a better model fit44. RMSE is calculated by taking the square root of the average of the squared differences between the predicted and actual values, where lower values indicate higher predictive accuracy39. Leave-one-out cross-validation (LOOCV) was used to calculate the RMSE for assessing the model’s prediction accuracy.
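The LOOCV procedure described above can be sketched in plain Python as follows. This is an illustrative stand-in, not the R code used in the study (which relied on the ridge and e1071 packages): the ridge fit uses the standard closed-form solution on centered data, and the toy data, the penalty value, and all function names are hypothetical.

```python
import math


def solve_linear(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][c] * x[c] for c in range(r + 1, n))) / M[r][r]
    return x


def fit_ridge(X, y, lam):
    """Ridge fit on centered data: beta = (Xc'Xc + lam*I)^-1 Xc'yc; intercept is not penalized."""
    n, p = len(X), len(X[0])
    x_mean = [sum(row[j] for row in X) / n for j in range(p)]
    y_mean = sum(y) / n
    Xc = [[row[j] - x_mean[j] for j in range(p)] for row in X]
    yc = [v - y_mean for v in y]
    A = [[sum(Xc[i][a] * Xc[i][b] for i in range(n)) + (lam if a == b else 0.0)
          for b in range(p)] for a in range(p)]
    rhs = [sum(Xc[i][a] * yc[i] for i in range(n)) for a in range(p)]
    beta = solve_linear(A, rhs)
    intercept = y_mean - sum(b * m for b, m in zip(beta, x_mean))
    return intercept, beta


def loocv_rmse(X, y, lam):
    """Hold out each sample once, refit on the rest, and pool the squared errors."""
    squared_errors = []
    for i in range(len(X)):
        X_train, y_train = X[:i] + X[i + 1:], y[:i] + y[i + 1:]
        intercept, beta = fit_ridge(X_train, y_train, lam)
        prediction = intercept + sum(b * v for b, v in zip(beta, X[i]))
        squared_errors.append((y[i] - prediction) ** 2)
    return math.sqrt(sum(squared_errors) / len(squared_errors))


# Hypothetical toy data: two predictors and an exactly linear response.
X = [[0.0, 1.0], [1.0, 0.0], [2.0, 1.0], [3.0, 0.0], [4.0, 2.0]]
y = [4.0 + 0.5 * a + 0.2 * b for a, b in X]

print(loocv_rmse(X, y, lam=0.0), loocv_rmse(X, y, lam=1.0))
```

Because every prediction is made for a sample that was excluded from the fit, the pooled RMSE estimates out-of-sample error rather than goodness of fit. On this noise-free toy data the unpenalized model recovers the held-out values almost exactly, while the penalized model shows a small shrinkage-induced error.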

Elastic net regression was applied to model the relationships between the metabolomics data and the microbial compositions using the glmnet package45. This method combines LASSO and Ridge regularization to perform variable selection and coefficient shrinkage simultaneously, thereby enhancing the predictive power and interpretability of the model. Before modeling, the data were augmented with noise using the dplyr and readr packages46. A total of 600 data points were generated for each storage period (days 1, 14, and 20) using Monte Carlo simulation5, resulting in 1800 data points (n = 600, N = 1800) from the original data (n = 6, N = 18). The performance of the model was evaluated using Spearman’s correlation coefficient, which measures the rank agreement between predicted and observed values. Additionally, RMSE was used to validate the model’s efficacy.
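One plausible form of the noise augmentation described above is sketched below in Python. The text does not specify the noise model, so everything here is an assumption made for illustration: each simulated point is drawn by picking one of the six original values at random and adding Gaussian noise with a standard deviation of 5% of that value, truncated at zero. The function name and the measurement values are hypothetical.

```python
import random


def augment_with_noise(values, n_new, rel_sd=0.05, seed=0):
    """Monte Carlo augmentation: resample the original values and add Gaussian noise.

    Assumptions (not from the source): noise SD is rel_sd times the sampled value,
    and negative results are truncated to 0.
    """
    rng = random.Random(seed)
    augmented = []
    for _ in range(n_new):
        base = rng.choice(values)
        noisy = base + rng.gauss(0.0, rel_sd * abs(base))
        augmented.append(max(0.0, noisy))
    return augmented


# Hypothetical levels of one metabolite (n = 6 per storage day, N = 18 in total).
original = {
    "day 1": [10.2, 9.8, 10.5, 10.1, 9.9, 10.0],
    "day 14": [12.4, 12.9, 11.8, 12.1, 12.6, 12.2],
    "day 20": [15.3, 14.7, 15.9, 15.1, 14.8, 15.5],
}

augmented = {
    day: augment_with_noise(vals, n_new=600, seed=index)
    for index, (day, vals) in enumerate(original.items())
}
total_points = sum(len(points) for points in augmented.values())
print(total_points)
```

With 600 simulated points for each of the three storage days, the augmented data set contains 1800 points, matching the n = 600 and N = 1800 stated above.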