Introduction

The change of color in plants, particularly in tobacco leaves, holds significant importance as it serves as a visual indicator of their growth, development, and maturity stages1. This phenomenon is underpinned by intricate biological processes involving protein regulation, material metabolism, and interactions among various molecules within the plant tissues2. Understanding these color changes is crucial for assessing the optimal harvest time, predicting the quality of tobacco products, and ensuring efficient curing processes3.

Tobacco curing is a crucial step in tobacco production. It involves not only the physical drying of tobacco leaves but also complex biochemical reactions and internal material transformations, which directly affect the final quality and commercial value of the leaves4. Color change is the most obvious and most easily observed characteristic during curing. Three main categories of chemical components affect the color of flue-cured tobacco leaves. The first category is pigments, such as carotene and lutein. The second category is the products of polyphenol conversion by enzymatic browning reactions. The change in leaf color from green to yellow indicates that the enzymatic process has been completed5. The third category is the complexes produced by non-enzymatic browning reactions6.

The physiological and biochemical reactions during tobacco curing span multiple levels, including moisture evaporation, carbohydrate metabolism, pigment conversion, protein degradation, aroma production and redox reactions. Together, these processes shape the final quality of tobacco leaves7. Pigments play a crucial role in determining the color and appearance of tobacco leaves during curing8. Throughout curing, the degradation of chlorophyll and the concurrent increase in carotenoids represent a pivotal shift in pigment composition that significantly affects leaf color9. In addition to pigments such as chlorophylls, carotenoids and anthocyanins, which directly affect visible color, polyphenols, particularly those involved in oxidation reactions, contribute indirectly to the color development and overall appearance of cured tobacco leaves10. The non-enzymatic browning reaction, also known as the Maillard reaction, gives tobacco leaves a special aroma, improves their aroma quality and changes their color11. Some conventional compounds, such as reducing sugars and starch, are important substrates for these biochemical reactions and have a significant effect on color. The temperature and duration of curing also affect leaf color12,13.

With the development of precise and intelligent tobacco curing, some researchers have used spectroscopy to predict the content of chemical components in tobacco14,15. However, spectrometers are costly compared with spectrophotometers and are difficult to adapt to production needs16. Machine learning algorithms can analyze and summarize complex information efficiently and are widely used for both linear and nonlinear problems17,18. They have found significant applications in food quality control, food safety and predictive analytics19,20. However, few studies have reported the use of machine learning algorithms to predict the chemical components of tobacco based on color quantification.

The primary objectives of this study were to (i) identify the color values closely related to the chemical components of tobacco leaves during curing, (ii) establish and validate prediction models for the chemical components of tobacco leaves based on machine learning algorithms, and (iii) provide a theoretical basis and methodological reference for intelligent monitoring of tobacco curing status and for improving the quality of tobacco curing.

Materials and methods

Plant materials

The tobacco variety used in this study was K326, cultivated in Nanping City, Fujian Province, in southeastern China. To ensure that the samples were reliable and representative, only plants with uniform growth, closely matching leaf colors, similar leaf sizes and consistent field quality were chosen. The middle and upper leaves were harvested and conventionally cured in a tobacco baking room. The curing process was divided into three stages: the yellowing stage, the color fixing stage and the dry tendon stage21,22. Samples were taken at 10 time points during curing: 0 h, 24 h, 48 h, 60 h, 72 h, 84 h, 96 h, 108 h, 120 h and 132 h. At each time point, samples were taken from the upper, middle and lower layers of the baking room. Each time, nine representative intact leaves were selected at the sampling port of a given layer, and every three leaves were pooled as one sample, so that three samples were obtained per layer at each time point. A total of 180 samples were taken.
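The sampling design can be enumerated as a quick check on the stated total. The sketch below is only illustrative: it assumes the total of 180 arises from 10 time points, 3 baking-room layers, 3 pooled samples per layer and 2 leaf positions (middle and upper), which the text implies but does not state explicitly. The function and field names are hypothetical.

```python
from itertools import product

# Sampling design as described in the text. The factor of two leaf positions
# (middle and upper) is an assumption used to reach the stated total of 180.
TIME_POINTS_H = [0, 24, 48, 60, 72, 84, 96, 108, 120, 132]
LAYERS = ["upper", "middle", "lower"]   # layers of the baking room
LEAF_POSITIONS = ["middle", "upper"]    # leaf positions on the plant
SAMPLES_PER_LAYER = 3                   # nine leaves pooled as three samples of three


def build_sample_index():
    """Return one record per pooled sample in the assumed sampling grid."""
    return [
        {"time_h": t, "layer": layer, "leaf_position": pos, "replicate": rep}
        for t, layer, pos, rep in product(
            TIME_POINTS_H, LAYERS, LEAF_POSITIONS, range(1, SAMPLES_PER_LAYER + 1)
        )
    ]


samples = build_sample_index()
print(len(samples))  # 10 x 3 x 2 x 3 = 180
```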

The samples were first used to determine the color values and moisture content of the tobacco leaves, and then the main veins were removed. The leaf samples were frozen at -20 °C and lyophilized using a freeze dryer (FreeZone2.5Plus, Labconco, USA), and the milled sample powder was used to determine the chemical components. The stationary phase and mobile phase used for the determination of pigments and polyphenols were of chromatographic grade, and the remaining reagents were of analytical grade. All reagents were purchased from Kemiou Chemical Reagent Co., Ltd, Tianjin, China.

Analysis of samples

Determination of color values

During the curing process, a portable spectrophotometer (Ci64, X-rite, USA) was used to determine the lightness value (L*), greenness/redness value (a*), blueness/yellowness value (b*), color ratio (H*), hue (H°) and saturation (C*) of tobacco leaves. Each tobacco leaf was measured at six detection points according to a previously reported method23.
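For readers unfamiliar with the derived color values, the sketch below shows how saturation (chroma) and hue angle follow from a* and b* under the standard CIELAB definitions. The text does not define the color ratio H*; taking it as a*/b* is an assumption made here for illustration only, and the function name and input readings are hypothetical.

```python
import math


def derived_color_values(a_star, b_star):
    """Derive chroma, hue angle and an a*/b* ratio from CIELAB a* and b*."""
    chroma = math.hypot(a_star, b_star)                        # C* = sqrt(a*^2 + b*^2)
    hue_deg = math.degrees(math.atan2(b_star, a_star)) % 360   # hue angle in [0, 360)
    # Assumption: the color ratio H* is taken as a*/b*; the text does not define it.
    ratio = a_star / b_star if b_star != 0 else float("nan")
    return {"C*": chroma, "H_deg": hue_deg, "H*": ratio}


# Illustrative readings only (not measured data): a greenish-yellow leaf.
values = derived_color_values(a_star=-3.0, b_star=4.0)
print(values)
```

Using `atan2` rather than a plain arctangent of b*/a* keeps the hue angle in the correct quadrant when a* is negative, as it is for green leaves early in curing.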

Determination of pigment content

A 2 g sample was placed in a 50 mL conical flask, 25 mL of 90% acetone was added, and the mixture was sonicated for 20 min. About 2 mL of the mixture was taken from the flask and filtered through a 0.45 μm organic membrane, and the filtrate was collected in a high performance liquid chromatography (HPLC) vial. The contents of lutein and β-carotene in the filtrate were determined by HPLC24.

Determination conditions of HPLC: The separation of pigments was achieved using a reversed-phase C18 column with a particle size of 4 μm and dimensions of 3.9 mm internal diameter by 150 mm in length. The mobile phase consisted of (A) isopropanol and (B) acetonitrile at an 80% concentration, applied with a gradient elution technique at a flow rate of 1.5 mL/min. The optimal gradient program was set as follows: 100% B from 0 to 40 min, followed by 100% A from 40 to 46 min. The column was maintained at a temperature of 30 °C, and the sample injection volume was 10 µL. Detection was performed at a wavelength of 448 nm.

Determination of polyphenol content

A 0.1 g sample was placed in a 50 mL conical flask, 20 mL of 50% methanol was added, and the mixture was sonicated for 20 min. About 2 mL of the mixture was then taken from the flask and filtered through a 0.45 μm hydrophilic membrane, and the filtrate was collected in an HPLC vial. The polyphenol content of the filtrate was determined by HPLC25,26.

Determination conditions of HPLC: The separation of polyphenols was conducted on a reversed-phase C18 column featuring a particle size of 5 μm, an internal diameter of 4.6 mm, and a length of 250 mm. The mobile phase used was composed of two solutions: (A) a mixture of water, methanol, and acetic acid in a ratio of 10:88:2 (v/v/v), and (B) a mixture of water, methanol, and acetic acid in a ratio of 88:10:2 (v/v/v). Gradient elution was employed at a flow rate of 1 mL/min. The optimal gradient program was as follows: 100% A from 0 to 16.5 min, a transition to 80% A and 20% B from 16.5 to 30 min, and finally 20% A and 80% B from 30 to 40 min. The column was kept at a constant temperature of 30 °C, and the injection volume was 10 µL. Detection was carried out at a wavelength of 340 nm.

Determination of moisture content and conventional chemical components

The moisture content on a wet basis was determined by the oven-drying method. Ten tobacco leaves were placed in an oven (DHG-9140A, China) and dried at 105 °C ± 3 °C for at least 6 h to determine the moisture content of the tobacco leaves27.
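As a worked illustration of the wet-basis calculation, the sketch below expresses the mass lost on drying as a percentage of the fresh mass. This is the usual definition of wet-basis moisture content; the function name and the example masses are hypothetical and are not taken from the study.

```python
def moisture_wet_basis(fresh_mass_g, dry_mass_g):
    """Moisture content on a wet basis, as a percentage of fresh mass."""
    if fresh_mass_g <= 0 or not 0 <= dry_mass_g <= fresh_mass_g:
        raise ValueError("expected fresh mass > 0 and 0 <= dry mass <= fresh mass")
    return (fresh_mass_g - dry_mass_g) / fresh_mass_g * 100.0


# Illustrative masses only (not measured data): 50 g fresh, 10 g after drying.
moisture_percent = moisture_wet_basis(fresh_mass_g=50.0, dry_mass_g=10.0)
print(moisture_percent)
```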

The conventional chemical components were determined using a continuous flow analyzer (AA3, SEAL Analytical, Germany)28. They included nicotine, total nitrogen, reducing sugar, total sugar and starch.

Determination of nicotine: A 0.25 g tobacco sample was extracted with 25 mL of water in a 50 mL Erlenmeyer flask. The nicotine in the extract reacted with p-aminobenzenesulfonic acid and cyanogen chloride, which was produced by the on-line reaction of potassium cyanide and chloramine T. The reaction products were determined at 460 nm by a colorimeter29.

Determination of total nitrogen: A 0.1 g sample was weighed into a digestion tube, and 0.1 g of mercuric oxide, 1.0 g of potassium sulfate and 5 mL of concentrated sulfuric acid were added. After digestion and decomposition, the nitrogen was converted into ammonia. Under alkaline conditions, the ammonia was oxidized to chloramine by sodium hypochlorite and then reacted with sodium salicylate to produce an indigo dye, which was determined at 660 nm30.

Determination of water-soluble sugars (reducing sugar and total sugar): A 0.25 g sample was extracted in a 50 mL flask with 25 mL of 5% aqueous acetic acid. The sugars in the extract reacted with p-hydroxymethyl hydrazide in an alkaline medium at 85 °C to produce a yellow azo compound. The colorimetric determination was performed at 410 nm31.

Determination of starch: A 0.25 g sample was weighed and ultrasonically extracted with 25 mL of 80% ethanol-saturated sodium chloride solution for 30 min to remove interfering substances. The extract was discarded, and the residue was ultrasonically extracted with 40% perchloric acid for 10 min. The starch reacted with iodine under acidic conditions and was determined at 570 nm32.

Determination of free amino acids

An automatic amino acid analyzer (Hitachi 8900, Japan) was used to determine the content of free amino acids in tobacco samples33.

Construction of prediction model

Sample set selection

To ensure that the components to be predicted were uniformly distributed in the training set and the validation set, the gradient quality method was used to divide the tobacco leaf sample set. All samples were sorted in ascending order of the content of the component to be predicted, and one sample was then taken at equal intervals for the validation set. In this way, all Fujian tobacco leaf samples were divided into a modeling (training) set and a validation set at a ratio of 3 : 1. The Yunnan tobacco sample set served as the test set for the model. Sample treatment and assay methods for the Yunnan samples were the same as those for the Fujian samples (Fig. 1).
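The division described above can be sketched as follows. This is a minimal illustration, not the authors' code: it assumes that "equal interval" means every fourth sample in the ranked list goes to the validation set, which yields the stated 3 : 1 ratio. Which position within each group of four is taken is also an assumption, and the function name and example contents are hypothetical.

```python
def split_by_sorted_interval(contents, interval=4):
    """Split sample indices into modeling and validation sets.

    Samples are ranked by the content of the component to be predicted, and
    every `interval`-th ranked sample goes to the validation set, so
    interval=4 gives a modeling : validation ratio of about 3 : 1.
    """
    order = sorted(range(len(contents)), key=lambda i: contents[i])
    modeling, validation = [], []
    for rank, idx in enumerate(order):
        # Assumption: the last sample of each group of `interval` is held out.
        if rank % interval == interval - 1:
            validation.append(idx)
        else:
            modeling.append(idx)
    return modeling, validation


# Illustrative contents only (not measured data).
contents = [12.1, 3.4, 8.8, 15.0, 5.2, 9.9, 1.7, 11.3]
modeling_idx, validation_idx = split_by_sorted_interval(contents)
print(modeling_idx, validation_idx)
```

Because the held-out samples are spread evenly along the ranked contents, both sets cover the full range of the component, which is the stated purpose of the method.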

Fig. 1 The specific division of tobacco data.

Modeling method

Four algorithms, namely partial least squares regression (PLSR), ridge regression (RR), support vector machine (SVM) and random forest (RF), were used to construct prediction models for the chemical components of tobacco leaves during curing. The L*, a*, b*, H*, H° and C* values were used as independent variables, and the content of the chemical component to be predicted was used as the dependent variable. A systematic grid search was used to optimize the model parameters. Through 10-fold cross-validation, the best prediction model was taken as the one with the smallest root mean square error of the training set (RMSET).
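A grid search with 10-fold cross-validation of this kind could look like the sketch below, written with scikit-learn. The text says only that Python 3.9 was used, so the choice of scikit-learn, the parameter grid, the random seeds and the synthetic data are all assumptions for illustration. The sketch selects the parameter combination with the smallest mean cross-validated RMSE, which is one reading of the selection criterion described above.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import GridSearchCV, KFold

# Synthetic stand-in data (not measured data): six color features per sample
# (placeholders for L*, a*, b*, H*, H° and C*) and one target component.
rng = np.random.default_rng(0)
X = rng.normal(size=(120, 6))
y = 2.0 * X[:, 0] - X[:, 1] + rng.normal(scale=0.1, size=120)

# Hypothetical parameter grid; the grids actually searched are not given in the text.
param_grid = {"n_estimators": [50, 100], "max_depth": [3, None]}

search = GridSearchCV(
    estimator=RandomForestRegressor(random_state=0),
    param_grid=param_grid,
    cv=KFold(n_splits=10, shuffle=True, random_state=0),
    scoring="neg_root_mean_squared_error",  # maximizing this minimizes RMSE
)
search.fit(X, y)

best_rmse = -search.best_score_  # mean RMSE over the 10 folds for the best grid point
print(search.best_params_, round(best_rmse, 3))
```

The same pattern applies to the other three algorithms by swapping the estimator (for example `Ridge`, `SVR` or `PLSRegression`) and its parameter grid.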

Model evaluation

The coefficient of determination (R2), RMSET, root mean square error of validation (RMSEV) and residual prediction deviation (RPD) were used to evaluate the performance of the models. The smaller the root mean square error and the closer R2 is to 1, the higher the accuracy of the model. RPD is an index of the overall prediction performance of the model. When RPD ≥ 2, the model can predict the target components accurately. When 1.4 ≤ RPD < 2, the model can predict the target components only roughly34,35.

$$R^{2}=1-\frac{\sum_{i=1}^{n}\left(Y_{i}-Y_{i}^{pre}\right)^{2}}{\sum_{i=1}^{n}\left(Y_{i}-Y_{i}^{avg}\right)^{2}}$$
(1)
$$RMSE=\sqrt{\frac{\sum_{i=1}^{n}\left(Y_{i}-Y_{i}^{pre}\right)^{2}}{n}}$$
(2)
$$RPD=\sqrt{\frac{\sum_{i=1}^{n}\left(Y_{i}-Y_{i}^{avg}\right)^{2}}{\sum_{i=1}^{n}\left(Y_{i}-Y_{i}^{pre}\right)^{2}}}$$
(3)

Note: $Y_{i}$ and $Y_{i}^{pre}$ are the true value and the estimated value of sample i, respectively, $Y_{i}^{avg}$ is the average of the true values, and n is the number of samples.
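The three evaluation indices can be computed directly from measured and predicted values, as in the sketch below. It follows Eqs. (1) to (3), taking the numerator of Eq. (3) as a sum of squared deviations from the mean, and adds the RPD thresholds quoted in the text. The function names and the example values are hypothetical.

```python
import math


def r2_rmse_rpd(measured, predicted):
    """Compute R^2, RMSE and RPD from measured and predicted values."""
    if len(measured) != len(predicted) or len(measured) < 2:
        raise ValueError("need two sequences of equal length with at least 2 values")
    n = len(measured)
    mean = sum(measured) / n
    ss_res = sum((y - p) ** 2 for y, p in zip(measured, predicted))  # residual sum of squares
    ss_tot = sum((y - mean) ** 2 for y in measured)                  # total sum of squares
    if ss_res == 0 or ss_tot == 0:
        raise ValueError("RPD or R^2 is undefined when a sum of squares is zero")
    r2 = 1.0 - ss_res / ss_tot
    rmse = math.sqrt(ss_res / n)
    rpd = math.sqrt(ss_tot / ss_res)
    return r2, rmse, rpd


def rpd_category(rpd):
    """Classify a model by the RPD thresholds given in the text."""
    if rpd >= 2.0:
        return "accurate prediction"
    if rpd >= 1.4:
        return "rough prediction"
    return "below the rough-prediction threshold"


# Illustrative values only (not measured data).
measured = [1.0, 2.0, 3.0, 4.0, 5.0]
predicted = [1.1, 1.9, 3.2, 3.8, 5.0]
r2, rmse, rpd = r2_rmse_rpd(measured, predicted)
print(round(r2, 3), round(rmse, 3), round(rpd, 2), rpd_category(rpd))
```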

Data analysis

All experiments were performed in triplicate, with results reported as mean value ± standard deviation. Significance testing for differences was conducted using SPSS Statistics 21.0 (SPSS Inc., Shanghai, China), and a probability value of P ≤ 0.05 indicated that differences between means were statistically significant. Origin 2021 software (OriginLab Corporation, USA) was used to draw the correlation heat maps. Python 3.9 was used to build the models.

Results and discussion

Significance analysis

Variations of color values of tobacco leaves

The color values of the middle and upper tobacco leaves followed highly consistent trends during curing (Table 1). The L*, b* and C* values first increased and then gradually decreased. The a* and H* values first increased significantly, after which the rate of increase slowed. The H° value increased rapidly and then tended to be stable. The L*, b* and C* values increased significantly with curing time and then decreased significantly after 60–72 h. The a*, H* and H° values increased significantly during the first 0–48 h of curing.

There were some differences between the color values of the middle and upper tobacco leaves. The L* value of the upper leaves reached its maximum at 48 h, and that of the middle leaves reached its maximum at 72 h. The b* and C* values reached their maxima at 24–48 h of curing, after 24 h for the upper leaves and 48 h for the middle leaves. During curing, the b* and C* values of the two leaf positions differed significantly at 48 h and 60 h, with the values of the upper leaves significantly higher than those of the middle leaves. The color change of tobacco leaves during curing could be divided into two stages. In the first stage (0–72 h), the color changed from yellowish green to light yellow, and in the second stage (72–132 h), it changed from light yellow to dark orange. This is consistent with the results reported by Meng et al.36. In terms of moisture, in the early stage of curing the moisture content decreased slowly and the leaves retained a high moisture content (50–80%), which made them appear brighter, so the L* value reached its maximum. In the later stage of curing, the moisture content decreased rapidly, the yellowing of the leaves slowed down, and the color deepened at an increasing rate. Therefore, the a* value increased slightly and the b* value decreased6.

Table 1 The variations of color values in tobacco leaves at different curing times.

The changes of pigments content and polyphenol content in tobacco leaves

Overall, the trends in pigment and polyphenol contents of the middle and upper tobacco leaves during curing were generally consistent (Table 2). The pigment content decreased significantly at the beginning of curing and did not change significantly from 48 h to the end of curing. During curing, the pigments in the tobacco leaves were oxidized and decomposed under the action of lipoxygenase to form intermediate products such as violaxanthin, geraniol and ionone8. This may be the reason for the decrease in pigment content.

The polyphenol content increased significantly from the beginning to the end of curing. No significant change was observed in the neochlorogenic acid, caffequinic acid, chlorogenic acid, rutin and kaempferol glycoside contents at curing times of 84–132 h, and the total polyphenol content did not differ significantly from 84 h to 132 h. The changes in polyphenols were pronounced during curing, and the total polyphenol content increased significantly owing to the cleavage and enzymatic decomposition of phenolic glycosides. Under the action of peroxidase and polyphenol oxidase, polyphenols are easily oxidized to quinones and their polymers, which range from light red to dark brown, so that the color of the tobacco leaves changes from yellow to varying degrees of tan37.

Table 2 The changes of pigment and polyphenol contents in tobacco leaves at different curing times.

Moisture content and conventional chemical components changes of tobacco leaves

The moisture contents of the upper and middle tobacco leaves in the baking room were significantly different (Table 3). During curing, the moisture content decreased progressively, reaching 15–20% by the end of curing. Before 48 h of curing, the moisture content changed little. It decreased substantially from 48 h to 96 h and then tended to be stable. These results indicate that water loss from the tobacco leaves was slow in the early stage (0–48 h), accelerated in the middle stage (48–96 h) and stabilized in the later stage (96–132 h). This is basically consistent with the results of Condorí et al.2, in which the stages were divided at 55 h and 117 h.

According to Table 3, the nicotine and total nitrogen contents of both leaf positions remained basically stable during curing. The reducing sugar and total sugar contents increased rapidly and then tended to be stable, whereas starch declined rapidly and then stabilized. The reducing sugar, total sugar and starch contents tended to be stable after 60 h of curing. Comparing the two leaf positions, the nicotine and total nitrogen contents were higher in the upper leaves than in the middle leaves, the reducing sugar and total sugar contents were higher in the middle leaves than in the upper leaves, and the changes in starch content were basically the same. The nicotine, total nitrogen, reducing sugar and total sugar contents differed significantly between leaf positions during curing. The nicotine and total nitrogen contents of the upper leaves were about 1% and 0.5% higher than those of the middle leaves, respectively, which may be related to differences in nitrogen content, water content and sunlight exposure29. The differences in reducing sugar and total sugar contents between the two positions gradually increased from 0 to 10% and 15%, respectively, indicating that these contents were strongly affected by curing. This may be related to the accumulation of internal macromolecules in the two leaf positions, the tightness of the tissue structure and the degree to which each is affected by curing32.

Table 3 The changes of moisture content and conventional chemical components in tobacco leaves at different curing times.

Free amino acid content changes of tobacco leaves

A total of 21 free amino acids were detected in the tobacco leaves (Tables 4, 5 and 6), including 18 protein amino acids and 3 non-protein amino acids. The 18 protein amino acids were categorized according to their side chains: (1) aromatic amino acids, including phenylalanine (Phe), tryptophan (Trp) and tyrosine (Tyr); (2) acidic amino acids, including aspartic acid (Asp) and glutamic acid (Glu); (3) basic (alkaline) amino acids, including lysine (Lys), histidine (His) and arginine (Arg); (4) aliphatic amino acids, including alanine (Ala), glycine (Gly), isoleucine (Ile), leucine (Leu) and valine (Val); (5) hydroxy amino acids, including threonine (Thr) and serine (Ser); (6) sulfur-containing amino acids: cystine (Cys); (7) amide amino acids: asparagine (Asn); and (8) imino acids: proline (Pro). The three non-protein amino acids were β-alanine (β-Ala), β-aminoisobutyric acid (β-AiBA) and γ-aminobutyric acid (γ-ABA).

On the whole, the content of most free amino acids increased first and then decreased during the curing process, which may be due to the degradation and transformation of macromolecular substances (proteins, etc.) in the tobacco leaves during the curing process3,11. The degradation and transformation of macromolecular substances (proteins, etc.) mainly occurred in the yellowing stage and the early color fixing stage. The content of free amino acids decreased to a certain extent during the curing process, mainly due to the non-enzymatic browning reaction between amino acids and reducing sugars, which not only increased the aroma of tobacco leaves, but also increased the proportion of orange leaves33.

As curing progressed, Phe, Trp, His and Asn first increased significantly and then decreased gradually. Cys, Val, Pro and β-Ala gradually increased. Tyr and γ-ABA gradually declined, while the acidic amino acids, Lys, Thr and Ser first decreased significantly and then remained stable. The remaining amino acids showed no stable trend. During curing, the aromatic amino acids in the tobacco leaves degrade, which affects the quantity and quality of the tobacco’s aroma. They also undergo non-enzymatic browning reactions, generating key intermediate products such as Amadori compounds, which subsequently lead to the formation of melanoidins and cause the tobacco leaves to brown to a certain extent38,39.

Table 4 The changes of aromatic, acidic and alkaline amino acid contents in tobacco leaves at different curing times.
Table 5 The changes of aliphatic, hydroxy and amide amino acid contents in tobacco leaves at different curing times.
Table 6 The changes of sulfur-containing, imino, non-protein and total amino acid contents in tobacco leaves at different curing times.

Correlation of color values and chemical components of tobacco leaves

As shown in Fig. 2, the L* value was significantly positively correlated with moisture, scopoletin, reducing sugar and total sugar contents, and significantly negatively correlated with β-carotene content. The a* value was significantly negatively correlated with moisture, lutein, β-carotene, scopoletin and starch contents, and significantly positively correlated with the other chemical components. A possible explanation is that the decrease in the wet-basis moisture content of tobacco leaves affected the activity of polyphenol oxidase (PPO) and the content of malondialdehyde (MDA), which in turn changed the polyphenol content40. The b* value was significantly positively correlated with moisture and reducing sugar contents. Because the Maillard reaction accelerates at higher temperatures, sugars and amino acids readily interact to form brown compounds, resulting in a darker color25.

The L* value was significantly positively correlated with Trp (Fig. 3). The a* value was significantly negatively correlated with Asp, Thr and Ser, and significantly positively correlated with Pro and total free amino acids. The b* value was significantly positively correlated with Phe and Trp. Overall, the change in apparent color of tobacco leaves during curing was closely related to the content of chemical components of tobacco leaves.
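A correlation analysis of this kind can be sketched in a few lines. The text does not state which correlation coefficient was used, so the Pearson coefficient below is an assumption, and the significance test (P ≤ 0.05) is omitted. The function names and the example values are hypothetical and are not measured data.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length with at least 2 values")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return sxy / math.sqrt(sxx * syy)


def correlation_table(color_values, components):
    """Return {color name: {component name: r}} for every pair of columns."""
    return {
        c_name: {k_name: pearson_r(c_vals, k_vals) for k_name, k_vals in components.items()}
        for c_name, c_vals in color_values.items()
    }


# Illustrative values only (not measured data).
color_values = {"L*": [60.0, 62.0, 64.0, 66.0], "a*": [4.0, 3.0, 2.0, 1.0]}
components = {"moisture": [70.0, 72.0, 74.0, 76.0]}
table = correlation_table(color_values, components)
print(table)
```

A table in this form maps directly onto a heat map, with color values on one axis and chemical components on the other.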

Fig. 2 Heat map of the correlation between color values, moisture content, pigments content, polyphenols content and conventional chemical components of tobacco leaves during curing (P ≤ 0.05).

Fig. 3 Heat map of the correlation between color values and free amino acids content of tobacco leaves during curing (P ≤ 0.05). Note: Phe: phenylalanine, Try: tryptophan, Tyr: tyrosine, Asp: aspartic acid, Glu: glutamic acid, Lys: lysine, His: histidine, Arg: arginine, Ala: alanine, Gly: glycine, Ile: isoleucine, Leu: leucine, Val: valine, Thr: threonine, Ser: serine, Cys: cystine, Asn: asparagine, Pro: proline, β-Ala: β-alanine, β-AiBA: β-aminoisobutyric acid, γ-ABA: γ-aminobutyric acid.

Construction and validation of prediction model for chemical components of tobacco leaves during curing process

The PLSR, RR, SVM and RF algorithms were used to construct prediction models for the chemical components of tobacco leaves during flue-cured tobacco curing. As shown in Table 7, the prediction models for moisture, lutein, β-carotene, total polyphenols, reducing sugar, total sugar, starch and some free amino acids constructed by the four modeling methods based on color values all performed well. Among them, the RF algorithm gave the highest accuracy for the moisture, total polyphenols, reducing sugar, total sugar, starch and free amino acid models, and the RR algorithm gave the highest accuracy for the lutein and β-carotene models. The validation-set R2 of these indicators was high, and the RPD was generally higher than 2.0. This indicated that it was feasible to predict the content of some chemical components in tobacco leaves by color quantification. Zhu15 proposed a tobacco chemical component analysis method based on a neural network (TCCANN) that performed simultaneous quantitative analysis of multiple chemical components of tobacco using near-infrared (NIR) hyperspectral imagery. However, the proposed TCCANN could not determine the chemical components of tobacco with complete accuracy. In this study, a more accurate starch prediction model was constructed by combining color quantification with machine learning algorithms. The reason may be that RF is a non-linear ensemble algorithm, which is suitable for modeling high-dimensional data samples41.

Table 7 Construction and validation of PLSR, RR, SVM and RF models based on color values.

The scatterplots show the RR models for two chemical components (lutein and β-carotene) in Fig. 4 and the RF models for fourteen chemical components (moisture, total polyphenols, reducing sugar, total sugar, starch and some free amino acids) in Fig. 5. The differences between estimated and measured values are small, and most of the points are evenly and compactly distributed along the diagonal. The closer the points are to the diagonal, the more accurate the predicted values are and the better the model fits. Clear linear relationships between the predicted and measured values of the sixteen chemical components are observed, indicating that some chemical components can be predicted by color quantification.

Fig. 4 The RR of modeling and verification of content prediction model in curing process.

Fig. 5 The RF of modeling and verification of content prediction model in curing process. Note: Thr: threonine, Asp: aspartic acid, Ala: alanine, Ser: serine, Pro: proline.

The RF estimation models for moisture, reducing sugar, total sugar and starch contents constructed from the Fujian Nanping tobacco samples had high accuracy (model validation set: R2 > 0.90, RPD > 2.0). On the one hand, the color values were significantly correlated with moisture, reducing sugar, total sugar and starch in the correlation analysis. On the other hand, the RF algorithm handles multicollinearity among the color values well, which improves the accuracy of the models for these four components. Therefore, these four high-accuracy models were tested on samples from outside the modeling set (Qujing, Yunnan). As shown in Table 8, the test-set R2 and RPD of the moisture content model were 0.84 and 3.51, respectively. The test-set R2 and RPD of the reducing sugar content model were 0.73 and 3.91, respectively. The test-set R2 and RPD of the total sugar content model were 0.77 and 3.95, respectively. The test-set R2 and RPD of the starch content model were 0.82 and 5.27, respectively. These results showed that the RF algorithm had high accuracy and a stable fitting effect in estimating the contents of moisture, reducing sugar, total sugar and starch in tobacco leaves outside the modeling set during curing.

Table 8 Construction and test of RF model based on color values.

Fig. 6 shows that the measured and estimated values in the test set of the higher-accuracy RF estimation models are evenly distributed near the 1 : 1 line. The performance of the estimation models constructed in Fujian is considered good, as indicated by the R2 and RPD values of the four chemical components in the test set of samples from Yunnan, which were external to the model. The models can thus make fairly accurate estimations of these four chemical components. The samples that were not accurately estimated can be attributed to regional differences in cultivation conditions and climate, among other factors12. This also indicates that there is room for improvement in model performance. In future work, adjustments to model parameters, the addition of more relevant features and the use of more advanced algorithms will be explored to address these issues.

Fig. 6 Test scatter plot of RF prediction model of moisture, reducing sugar, total sugar and starch content estimation during the curing process.

Conclusions

Changes in the surface color of tobacco leaves are related to the internal chemical content of the tobacco. Analysis of the correlation between color values and chemical components showed that the a* value of tobacco leaves was significantly negatively correlated with moisture, lutein, β-carotene, scopoletin, starch, Asp, Thr and Ser, and that the b* value was significantly positively correlated with moisture content and reducing sugar content. The PLSR, RR, SVM and RF algorithms were used to establish prediction models for the chemical components of tobacco leaves during flue-cured tobacco curing. The RF algorithm performed best in predicting tobacco moisture, reducing sugar, total sugar and starch during curing, with R2 values above 0.90 and RPD values above 2.0 in the model validation sets. These results indicate that it is possible to monitor the chemical components of tobacco in real time during curing. Based on the color values of tobacco, machine learning algorithms can be used to establish prediction models that quickly and accurately predict the moisture, lutein, β-carotene, total polyphenols, reducing sugar, total sugar, starch and some free amino acids in tobacco during curing. This provides a theoretical basis and methodological reference for intelligent monitoring of tobacco curing status and for improving the quality of tobacco. Further research will focus on applying the prediction models to devices that monitor images of tobacco leaves, to enable the rapid acquisition of chemical component data during curing.