Abstract
Color is one of the most important indicators for characterizing tobacco quality and is strongly related to variations in chemical components. To clarify the relationship between changes in tobacco color and chemical components, we established several prediction models of chemical components from the color values of tobacco using machine learning algorithms. Correlation analysis showed that tobacco moisture content was highly significantly correlated with parameters such as a*, H* and H°, that the reducing sugar and total sugar contents were significantly correlated with the color values, and that the starch content was highly significantly correlated with all color values except b* and C*. The random forest models performed best in predicting tobacco moisture, reducing sugar, total sugar and starch, with validation-set R2 values above 0.90 and RPD values above 2.0. The consistency between predictions and measurements verified the feasibility of using color values to predict some chemical components of tobacco leaves with high accuracy, an approach with distinct advantages and potential for real-time monitoring of some chemical components during tobacco curing.
Introduction
The change of color in plants, particularly in tobacco leaves, holds significant importance as it serves as a visual indicator of their growth, development, and maturity stages1. This phenomenon is underpinned by intricate biological processes involving protein regulation, material metabolism, and interactions among various molecules within the plant tissues2. Understanding these color changes is crucial for assessing the optimal harvest time, predicting the quality of tobacco products, and ensuring efficient curing processes3.
Tobacco curing is a crucial step in tobacco production. It involves not only the physical drying of tobacco leaves but also complex biochemical reactions and internal material transformations, which directly affect the final quality and commercial value of the leaves4. The color change is the most obvious and most easily observed characteristic during tobacco curing. Three main categories of chemical components affect the color of flue-cured tobacco leaves. The first category is pigments such as carotene and lutein. The second category is the polyphenols converted by enzymatic browning reactions. The change of tobacco color from green to yellow indicates that the enzymatic process has been completed5. The third category is the complexes produced by non-enzymatic browning reactions6.
The physiological and biochemical reactions during tobacco curing cover multiple levels, including moisture evaporation, carbohydrate metabolism, pigment conversion, protein degradation, aroma production, and redox reactions. These processes together shape the final quality of tobacco leaves7. Pigments play a crucial role in determining the color and appearance of tobacco leaves during the curing process8. Throughout the curing process, the degradation of chlorophyll and the concurrent increase in carotenoids represent a pivotal shift in pigment components that significantly impacts the color of the leaves9. In addition to pigments such as chlorophylls, carotenoids, and anthocyanins that directly affect visible color, polyphenols, particularly those involved in oxidation reactions, also indirectly contribute to the color development and overall appearance of cured tobacco leaves10. Non-enzymatic browning reaction is also known as Maillard reaction. It gives special aroma to tobacco leaves, optimizes the aroma quality of tobacco leaves, and changes the color of tobacco leaves11. Some conventional compounds, such as reducing sugars and starch, are important substrates for biochemical reactions and have a significant effect on color. Meanwhile, the temperature and time of curing also affect the color of tobacco leaves12,13.
With the development of precision and intelligence in tobacco curing, some tobacco workers use spectroscopy to predict the chemical components content of tobacco14,15. Spectrometers are costly compared with spectrophotometers and difficult to adapt to production needs16. Machine learning algorithms can better analyze and summarize all kinds of complex information scientifically and efficiently, and are widely used to deal with linear and nonlinear problems17,18. Machine learning algorithms have shown significant applications in aspects of food quality control, food safety, and predictive analytics19,20. However, few studies reported the use of machine learning algorithms to predict the chemical components of tobacco based on color quantification.
The primary objectives of this study are to (i) clarify which color values are closely related to the chemical components of tobacco leaves during the curing process, (ii) establish and validate prediction models of the chemical components of tobacco leaves based on machine learning algorithms, and (iii) provide a theoretical basis and method reference for intelligent monitoring of tobacco curing status and for improving the quality of tobacco curing.
Materials and methods
Plant materials
The tobacco variety used in this study was K326, cultivated in Nanping City, Fujian Province, southeastern China. To ensure that the samples were reliable and representative, only tobacco plants with uniform growth, closely matching leaf colors, similar leaf sizes, and consistent field quality were chosen as experimental subjects. The middle and upper leaves were taken and conventionally cured in the tobacco baking room. The curing process was divided into three stages: the yellowing stage, the color fixing stage and the dry tendon stage21,22. Samples were taken at 10 time points during curing: 0 h, 24 h, 48 h, 60 h, 72 h, 84 h, 96 h, 108 h, 120 h and 132 h. At every time point, samples were taken from the upper, middle and lower layers of the baking room. Each time, nine representative intact tobacco leaves were selected at the sampling port of the same layer, and every three leaves were combined into one sample, so that three samples were taken from each layer at each sampling time point. A total of 180 samples were taken.
The samples were first used to determine the color values and the moisture content of the tobacco leaves, and then the main veins of the leaves were removed. The leaf samples were frozen at -20 °C and lyophilized using a freeze dryer (FreeZone2.5Plus, Labconco, USA), and the milled sample powder was used to determine the chemical components. The stationary phase and mobile phase used for the determination of pigments and polyphenols were of chromatographic grade, and the remaining reagents were of analytical grade. All reagents were purchased from Kemiou Chemical Reagent Co., Ltd, Tianjin, China.
Analysis of samples
Determination of color values
During the curing process, a portable spectrophotometer (Ci64, X-rite, USA) was used to determine the lightness value (L*), greenness/redness value (a*), blueness/yellowness value (b*), color ratio (H*), hue (H°) and saturation (C*) of the tobacco leaves. Each tobacco leaf was measured at six detection points according to the previously reported method23.
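The relationship between the measured a* and b* values and the derived saturation and hue values can be illustrated with a minimal Python sketch. It assumes that C* and H° follow the standard CIELAB definitions of chroma and hue angle, and that the six detection points of a leaf are averaged; neither detail is stated in the text, and the readings below are hypothetical. The color ratio H* is left out because its formula is not given here.

```python
import math

def chroma(a_star: float, b_star: float) -> float:
    """Standard CIELAB chroma C*: distance from the neutral axis in the a*-b* plane."""
    return math.hypot(a_star, b_star)

def hue_angle_deg(a_star: float, b_star: float) -> float:
    """Standard CIELAB hue angle in degrees, mapped to the range [0, 360)."""
    return math.degrees(math.atan2(b_star, a_star)) % 360.0

def mean_reading(readings):
    """Average (L*, a*, b*) over the detection points of one leaf (assumed step)."""
    n = len(readings)
    return tuple(sum(r[i] for r in readings) / n for i in range(3))

# Hypothetical (L*, a*, b*) readings at six detection points on one leaf.
points = [
    (62.1, -6.4, 38.2),
    (61.5, -5.9, 37.6),
    (63.0, -6.8, 39.1),
    (60.8, -5.5, 36.9),
    (62.4, -6.1, 38.0),
    (61.9, -6.3, 38.4),
]

l_mean, a_mean, b_mean = mean_reading(points)
print(f"L* = {l_mean:.2f}, a* = {a_mean:.2f}, b* = {b_mean:.2f}")
print(f"C* = {chroma(a_mean, b_mean):.2f}, hue = {hue_angle_deg(a_mean, b_mean):.1f} deg")
```

A negative a* with a positive b* places the hue angle between 90° and 180°, the yellowish-green region expected for fresh leaves.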
Determination of pigment content
A 2 g sample was placed in a 50 mL conical flask, 25 mL of 90% acetone was added, and the mixture was sonicated for 20 min. About 2 mL of the mixture was taken from the flask and filtered through a 0.45 μm organic membrane, and the filtrate was collected in a high performance liquid chromatography (HPLC) vial. The contents of lutein and β-carotene in the filtrate were determined by HPLC24.
Determination conditions of HPLC: The separation of pigments was achieved using a reversed-phase C18 column with a particle size of 4 μm and dimensions of 3.9 mm internal diameter by 150 mm in length. The mobile phase consisted of (A) isopropanol and (B) acetonitrile at an 80% concentration, applied with a gradient elution technique at a flow rate of 1.5 mL/min. The optimal gradient program was set as follows: 100% B from 0 to 40 min, followed by 100% A from 40 to 46 min. The column was maintained at a temperature of 30 °C, and the sample injection volume was 10 µL. Detection was performed at a wavelength of 448 nm.
Determination of polyphenol content
A 0.1 g sample was placed in a 50 mL conical flask, 20 mL of 50% methanol was added, and the mixture was sonicated for 20 min. About 2 mL of the mixture was then removed from the flask and filtered through a 0.45 μm hydrophilic membrane. The filtrate was collected in an HPLC vial. The polyphenol content of the filtrate was determined by HPLC25,26.
Determination conditions of HPLC: The separation of polyphenols was conducted on a reversed-phase C18 column featuring a particle size of 5 μm, an internal diameter of 4.6 mm, and a length of 250 mm. The mobile phase used was composed of two solutions: (A) a mixture of water, methanol, and acetic acid in a ratio of 10:88:2 (v/v/v), and (B) a mixture of water, methanol, and acetic acid in a ratio of 88:10:2 (v/v/v). Gradient elution was employed at a flow rate of 1 mL/min. The optimal gradient program was as follows: 100% A from 0 to 16.5 min, a transition to 80% A and 20% B from 16.5 to 30 min, and finally 20% A and 80% B from 30 to 40 min. The column was kept at a constant temperature of 30 °C, and the injection volume was 10 µL. Detection was carried out at a wavelength of 340 nm.
Determination of moisture content and conventional chemical components
The wet-basis moisture content was determined by the oven-drying method. Ten tobacco leaves were placed in an oven (DHG-9140 A, China) and dried at 105 °C ± 3 °C for at least 6 h to determine their moisture content27.
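As a small illustration, the wet-basis moisture calculation can be sketched in Python using the usual definition (mass lost on drying divided by the fresh mass). The function name and the example masses are hypothetical and are not taken from the study.

```python
def moisture_wet_basis(fresh_mass_g: float, dry_mass_g: float) -> float:
    """Wet-basis moisture content (%) = (fresh mass - dry mass) / fresh mass * 100."""
    if fresh_mass_g <= 0:
        raise ValueError("fresh mass must be positive")
    if not 0 <= dry_mass_g <= fresh_mass_g:
        raise ValueError("dry mass must lie between 0 and the fresh mass")
    return (fresh_mass_g - dry_mass_g) / fresh_mass_g * 100.0

# Hypothetical masses of a leaf sample before and after oven drying.
fresh, dry = 52.40, 11.53
print(f"moisture (wet basis): {moisture_wet_basis(fresh, dry):.1f} %")
```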
The conventional chemical components were determined with a continuous flow analyzer (AA3, SEAL Analytical, Germany)28. They include nicotine, total nitrogen, reducing sugar, total sugar and starch.
Determination of nicotine: A total of 0.25 g tobacco samples was extracted with 25 mL water in a 50 mL Erlenmeyer flask. The nicotine in the extract reacted with p-aminobenzenesulfonic acid and cyanogen chloride, which was produced by the on-line reaction of potassium cyanide and chloramine T. The reaction products were determined at 460 nm by a colorimeter29.
Determination of total nitrogen: A 0.1 g sample was weighed into a digestion tube, and 0.1 g of mercuric oxide, 1.0 g of potassium sulfate and 5 mL of concentrated sulfuric acid were added. After digestion and decomposition, the nitrogen was converted into ammonia. Under alkaline conditions, ammonia was oxidized to chloramine by sodium hypochlorite and then reacted with sodium salicylate to produce an indigo dye, which was determined at 660 nm30.
Determination of water-soluble sugars (reducing sugar and total sugar): A 0.25 g sample was extracted in a 50 mL flask with 25 mL of 5% aqueous acetic acid. The sugars in the extract reacted with p-hydroxybenzoic acid hydrazide in an alkaline medium at 85 °C to produce a yellow azo compound. The colorimetric determination was performed at 410 nm31.
Determination of starch: 0.25 g sample was weighed and ultrasonically extracted with 25 mL of 80% ethanol-saturated sodium chloride solution for 30 min to remove the interfering substances in tobacco products. The extract was discarded and ultrasonically extracted with 40% perchloric acid for 10 min. The starch reacted with iodine under acidic conditions and was determined at 570 nm32.
Determination of free amino acids
An automatic amino acid analyzer (Hitachi 8900, Japan) was used to determine the content of free amino acids in tobacco samples33.
Construction of prediction model
Sample set selection
To ensure that the components to be predicted were uniformly distributed in the training set and the validation set, the gradient quality method was used to divide the tobacco leaf sample set. All samples were sorted in ascending order of the content of the component to be predicted, and then samples were taken at equal intervals as validation set samples. All Fujian tobacco leaf samples were divided into a modeling (training) set and a validation set at a ratio of 3:1. The test set for the model was the Yunnan tobacco sample set, whose sample treatment and assay methods were consistent with those of the Fujian samples (Fig. 1).
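The sorted, equal-interval split described above can be sketched as follows. This is only an illustration under stated assumptions: the interval of four and the choice of every fourth sample as the validation sample are picked to reproduce the 3:1 ratio, and the sample values are synthetic. The exact offset used in the study is not given.

```python
import random

def gradient_split(samples, key, interval=4):
    """Sort samples by the target content, then send every `interval`-th
    sample to the validation set and the rest to the modeling set."""
    ordered = sorted(samples, key=key)
    modeling, validation = [], []
    for position, sample in enumerate(ordered, start=1):
        if position % interval == 0:
            validation.append(sample)
        else:
            modeling.append(sample)
    return modeling, validation

# Synthetic example: 180 samples with a hypothetical starch content (%).
rng = random.Random(0)
samples = [{"id": i, "starch": rng.uniform(2.0, 30.0)} for i in range(180)]

modeling_set, validation_set = gradient_split(samples, key=lambda s: s["starch"])
print(len(modeling_set), len(validation_set))
```

Because the validation samples are spread evenly along the sorted order, both sets span nearly the same range of the target component.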
Modeling method
Four algorithms, namely partial least squares regression (PLSR), ridge regression (RR), support vector machine (SVM) and random forest (RF), were used to construct prediction models of the conventional chemical components of tobacco leaves during curing. The L*, a*, b*, H*, H° and C* values were used as independent variables, and the content of the chemical component to be predicted was used as the dependent variable. A systematic grid search was used to optimize the model parameters. Through 10-fold cross-validation, the best prediction model was taken as the one with the smallest root mean square error of the training set (RMSET).
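The idea of selecting a parameter by grid search with 10-fold cross-validation can be illustrated with a deliberately simplified, dependency-free Python sketch. To keep it short, it swaps in a single-predictor ridge regression with a closed-form solution rather than the six-variable models used in the study; the data, the penalty grid and all function names are hypothetical.

```python
import math
import random

def fit_ridge_1d(x, y, lam):
    """Closed-form ridge fit for one predictor with an unpenalized intercept."""
    n = len(x)
    x_mean, y_mean = sum(x) / n, sum(y) / n
    sxy = sum((xi - x_mean) * (yi - y_mean) for xi, yi in zip(x, y))
    sxx = sum((xi - x_mean) ** 2 for xi in x)
    slope = sxy / (sxx + lam)
    return slope, y_mean - slope * x_mean

def rmse(y_true, y_pred):
    """Root mean square error between two equal-length sequences."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))

def kfold_indices(n, k, seed=0):
    """Shuffle the indices 0..n-1 and deal them into k disjoint folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]

def cv_rmse(x, y, lam, k=10, seed=0):
    """Mean held-out RMSE over k folds for one candidate penalty."""
    scores = []
    for held_out in kfold_indices(len(x), k, seed):
        held = set(held_out)
        train = [i for i in range(len(x)) if i not in held]
        slope, intercept = fit_ridge_1d([x[i] for i in train], [y[i] for i in train], lam)
        preds = [slope * x[i] + intercept for i in held_out]
        scores.append(rmse([y[i] for i in held_out], preds))
    return sum(scores) / k

def grid_search(x, y, grid, k=10):
    """Return the penalty with the smallest cross-validated RMSE and all scores."""
    scores = {lam: cv_rmse(x, y, lam, k) for lam in grid}
    return min(scores, key=scores.get), scores

# Synthetic data: a hypothetical a* value versus moisture content (%).
rng = random.Random(42)
a_star = [rng.uniform(-10.0, 15.0) for _ in range(60)]
moisture = [70.0 - 2.0 * a + rng.gauss(0.0, 1.5) for a in a_star]

grid = [0.0, 0.1, 1.0, 10.0, 100.0]
best_lam, scores = grid_search(a_star, moisture, grid)
print("best penalty:", best_lam)
```

The same loop structure applies to any model with tunable parameters: every grid point is scored on identical folds, and the point with the lowest cross-validated error is kept.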
Model evaluation
The coefficient of determination (R2), RMSET, the root mean square error of validation (RMSEV), and the residual prediction deviation (RPD) were used to evaluate the performance of the models. The smaller the root mean square error and the closer R2 is to 1, the higher the accuracy of the model. RPD is an index of the overall prediction performance of the model. When RPD ≥ 2, the model can predict the target component accurately; when 1.4 ≤ RPD < 2, the model can only predict the target component roughly34,35.
Note: yi, ŷi and ȳ are the true value of sample i, the estimated value of sample i and the average of the true values, respectively, and n is the number of samples.
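The three evaluation indices can be computed with a short Python sketch. Since the defining equations are not reproduced in the text, the commonly used forms are assumed here: R2 as one minus the ratio of residual to total sum of squares, RMSE as the root of the mean squared residual, and RPD as the standard deviation of the true values (with an n - 1 denominator, an assumption) divided by the RMSE. The example values are hypothetical.

```python
import math

def evaluate(y_true, y_pred):
    """Return (R2, RMSE, RPD) for paired true and estimated values."""
    n = len(y_true)
    if n != len(y_pred) or n < 2:
        raise ValueError("need two equal-length sequences with at least 2 values")
    y_mean = sum(y_true) / n
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))  # residual sum of squares
    ss_tot = sum((t - y_mean) ** 2 for t in y_true)             # total sum of squares
    r2 = 1.0 - ss_res / ss_tot
    rmse = math.sqrt(ss_res / n)
    sd = math.sqrt(ss_tot / (n - 1))  # sample standard deviation of the true values
    return r2, rmse, sd / rmse

# Hypothetical true and estimated contents of one component.
measured = [1.0, 2.0, 3.0, 4.0, 5.0]
estimated = [1.1, 1.9, 3.2, 3.8, 5.0]

r2, rmse, rpd = evaluate(measured, estimated)
print(f"R2 = {r2:.3f}, RMSE = {rmse:.3f}, RPD = {rpd:.2f}")
```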
Data analysis
All experiments were performed in triplicate, with results reported as mean value ± standard deviation. Significance testing for differences was conducted using SPSS Statistics 21.0 (SPSS Inc., Shanghai, China), and a probability value of P ≤ 0.05 indicated that differences between means were statistically significant. Origin 2021 software (OriginLab Corporation, USA) was used to draw the correlation heatmaps. Python 3.9 was used to build the models.
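For readers who want to reproduce a correlation matrix of the kind shown in the heatmaps, a minimal Python sketch follows. The type of correlation coefficient is not stated in the text, so the Pearson coefficient is assumed here, and the color and moisture values are hypothetical.

```python
import math

def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two equal-length sequences with at least 2 values")
    x_mean, y_mean = sum(x) / n, sum(y) / n
    cov = sum((a - x_mean) * (b - y_mean) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - x_mean) ** 2 for a in x))
    sy = math.sqrt(sum((b - y_mean) ** 2 for b in y))
    if sx == 0 or sy == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / (sx * sy)

# Hypothetical a* values and moisture contents (%) at five curing time points.
a_star = [-8.2, -3.1, 2.4, 6.0, 9.5]
moisture = [78.0, 74.5, 61.0, 38.0, 17.5]

print(f"r(a*, moisture) = {pearson_r(a_star, moisture):.3f}")
```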
Results and discussion
Significance analysis
Variations of color values of tobacco leaves
The color values of the middle and upper tobacco leaves followed highly consistent trends during curing (Table 1). The L*, b* and C* values first increased and then gradually decreased. The a* and H* values increased significantly at first, after which the increase slowed. The H° value increased rapidly and then tended to be stable. The L*, b* and C* values increased significantly with curing time and then decreased significantly after 60–72 h. The a*, H* and H° values increased significantly during 0–48 h of curing.
There were some differences between the color values of the middle and upper tobacco leaves. The L* value of the upper leaves reached its maximum at 48 h, and that of the middle leaves at 72 h. The b* and C* values reached their maxima at 24–48 h of curing: after 24 h for the upper leaves and after 48 h for the middle leaves. During curing, the b* and C* values of the two leaf positions differed significantly at 48 h and 60 h, with those of the upper leaves significantly higher than those of the middle leaves. The color change of tobacco leaves during curing was divided into two stages. In the first stage, the color changed from yellowish green to light yellow (0–72 h), and in the second stage, it changed from light yellow to dark orange (72–132 h). This is consistent with the results reported by Meng et al.36. Considering the change in moisture content, in the early stage of curing the moisture content decreased slowly and the leaves retained a high moisture content (50–80%), which gave a visually brighter appearance, so the L* value reached its maximum. In the later stage of curing, the moisture content decreased rapidly, the yellowing of the tobacco leaves slowed down, and the color continued to deepen at an accelerating rate. Therefore, the a* value increased slightly and the b* value decreased6.
The changes of pigments content and polyphenol content in tobacco leaves
Overall, the trends in pigment and polyphenol contents of the middle and upper tobacco leaves during curing were generally consistent (Table 2). The pigment content decreased significantly at the beginning of curing and did not change significantly from 48 h to the end of curing. During curing, the pigments in the tobacco leaves were oxidized and decomposed under the action of lipoxygenase to form intermediate products such as violaxanthin, geraniol and ionone8. This may be the reason for the decrease in pigment content.
The polyphenol content increased significantly from the beginning to the end of curing. No significant change was observed in the contents of neochlorogenic acid, caffequinic acid, chlorogenic acid, rutin and kaempferol glycoside at 84–132 h of curing, and the total polyphenol content did not differ significantly from 84 h to 132 h. The changes in polyphenols were pronounced during curing, and the total polyphenols increased significantly owing to the cleavage and enzymatic decomposition of phenolic glycosides. Under the action of peroxidase and polyphenol oxidase, polyphenols are easily oxidized to light red to dark brown quinones and their polymers, so that the color of tobacco leaves changes from yellow to varying degrees of tan37.
Moisture content and conventional chemical components changes of tobacco leaves
The moisture content of the upper and middle leaves in the tobacco baking room differed significantly (Table 3). During curing, the moisture content decreased progressively, reaching 15–20% by the end of curing. Before 48 h of curing, the moisture content changed little. It decreased substantially from 48 h to 96 h and then tended to be stable. These results indicated that the water loss of tobacco leaves was slow in the early stage (0 h to 48 h), accelerated in the middle stage (48 h to 96 h), and stabilized in the later stage (96 h to 132 h). This is basically consistent with the results of Condorí et al.2, in which the stages were divided at 55 h and 117 h.
According to Table 3, the contents of nicotine and total nitrogen in the two leaf positions were basically stable during curing. The contents of reducing sugar and total sugar increased rapidly and then tended to be stable, whereas starch declined rapidly at first and then stabilized. The contents of reducing sugar, total sugar and starch tended to be stable after 60 h of curing. Comparing the two leaf positions, the contents of nicotine and total nitrogen were higher in the upper leaves than in the middle leaves, and the contents of reducing sugar and total sugar were higher in the middle leaves than in the upper leaves, while the changes in starch content were basically the same in both. The contents of nicotine, total nitrogen, reducing sugar and total sugar differed significantly between leaf positions during curing. The contents of nicotine and total nitrogen in the upper leaves were about 1% and 0.5% higher than those in the middle leaves, respectively, which may be related to differences in nitrogen content, water content and sunlight exposure29. The differences in reducing sugar and total sugar content between the two positions gradually increased from 0 to 10% and 15%, respectively, indicating that the reducing sugar and total sugar contents were strongly affected by the curing process; this may be related to the accumulation of internal macromolecules in the two leaf positions, the tightness of the tissue structure and the degree to which each is affected by curing32.
Free amino acid content changes of tobacco leaves
A total of 21 free amino acids were detected in the tobacco leaves (Tables 4, 5 and 6), comprising 18 protein amino acids and 3 non-protein amino acids. The 18 protein amino acids are categorized according to their side chains: (1) aromatic amino acids, including phenylalanine (Phe), tryptophan (Trp) and tyrosine (Tyr); (2) acidic amino acids, including aspartic acid (Asp) and glutamic acid (Glu); (3) basic amino acids, including lysine (Lys), histidine (His) and arginine (Arg); (4) aliphatic amino acids, including alanine (Ala), glycine (Gly), isoleucine (Ile), leucine (Leu) and valine (Val); (5) hydroxy amino acids, including threonine (Thr) and serine (Ser); (6) sulfur-containing amino acids: cystine (Cys); (7) amide amino acids: asparagine (Asn); and (8) imino acids: proline (Pro). The three non-protein amino acids are β-alanine (β-Ala), β-aminoisobutyric acid (β-AiBA) and γ-aminobutyric acid (γ-ABA).
On the whole, the content of most free amino acids increased first and then decreased during the curing process, which may be due to the degradation and transformation of macromolecular substances (proteins, etc.) in the tobacco leaves during the curing process3,11. The degradation and transformation of macromolecular substances (proteins, etc.) mainly occurred in the yellowing stage and the early color fixing stage. The content of free amino acids decreased to a certain extent during the curing process, mainly due to the non-enzymatic browning reaction between amino acids and reducing sugars, which not only increased the aroma of tobacco leaves, but also increased the proportion of orange leaves33.
As curing progressed, Phe, Trp, His and Asn increased significantly at first and then decreased gradually. Cys, Val, Pro and β-Ala gradually increased. Tyr and γ-ABA gradually declined, while the acidic amino acids, Lys, Thr and Ser first decreased significantly and then remained stable. The remaining amino acids showed no stable trend. During curing, the aromatic amino acids in the tobacco leaves degrade, which affects the quantity and quality of the tobacco's aroma. They also undergo non-enzymatic browning reactions, generating key intermediate products such as Amadori compounds, which subsequently lead to the formation of melanoidins and cause the tobacco leaves to brown to a certain extent38,39.
Correlation of color values and chemical components of tobacco leaves
As shown in Fig. 2, the L* value was significantly positively correlated with moisture, scopoletin, reducing sugar and total sugar contents, and significantly negatively correlated with β-carotene content. The a* value was significantly negatively correlated with moisture, lutein, β-carotene, scopoletin and starch contents, and significantly positively correlated with the other chemical components. This may be because the decrease in the wet-basis moisture content of the tobacco leaves affected the activity of polyphenol oxidase (PPO) and the content of malondialdehyde (MDA), which changed the polyphenol content40. The b* value was significantly positively correlated with moisture and reducing sugar contents. Because the Maillard reaction accelerates at higher temperatures, sugars and amino acids readily react to form brown compounds, resulting in a darker color25.
The L* value was significantly positively correlated with Trp (Fig. 3). The a* value was significantly negatively correlated with Asp, Thr and Ser, and significantly positively correlated with Pro and total free amino acids. The b* value was significantly positively correlated with Phe and Trp. Overall, the change in apparent color of tobacco leaves during curing was closely related to the content of chemical components of tobacco leaves.
Heat map of the correlation between color values and free amino acid contents of tobacco leaves during curing (P ≤ 0.05). Note: Phe: phenylalanine, Trp: tryptophan, Tyr: tyrosine, Asp: aspartic acid, Glu: glutamic acid, Lys: lysine, His: histidine, Arg: arginine, Ala: alanine, Gly: glycine, Ile: isoleucine, Leu: leucine, Val: valine, Thr: threonine, Ser: serine, Cys: cystine, Asn: asparagine, Pro: proline, β-Ala: β-alanine, β-AiBA: β-aminoisobutyric acid, γ-ABA: γ-aminobutyric acid.
Construction and validation of prediction model for chemical components of tobacco leaves during curing process
The PLSR, RR, SVM and RF algorithms were used to construct prediction models of the chemical components of tobacco leaves during flue-cured tobacco curing. Table 7 shows that the prediction models for moisture, lutein, β-carotene, total polyphenols, reducing sugar, total sugar, starch and some free amino acids constructed by the four modeling methods based on color values all performed well. Among them, the RF models had the highest accuracy for moisture, total polyphenols, reducing sugar, total sugar, starch and some free amino acids, and the RR models had the highest accuracy for lutein and β-carotene. The validation-set R2 of these indicators was high, and the RPD was generally higher than 2.0. This indicated that it was feasible to predict the content of some chemical components in tobacco leaves by color quantification. Zhu15 proposed a tobacco chemical component analysis method based on a neural network (TCCANN) that performed simultaneous quantitative analysis of multiple chemical components of tobacco using near-infrared (NIR) hyperspectral imagery. However, the proposed TCCANN could not determine the chemical components of tobacco with complete accuracy. In this study, a more accurate starch prediction model was constructed by combining color quantification with a machine learning algorithm. The reason may be that RF is a non-linear ensemble algorithm, which is suitable for modeling high-dimensional data samples41.
The scatterplots in Fig. 4 show the RR models for two chemical components (lutein and β-carotene), and those in Fig. 5 show the RF models for fourteen chemical components (moisture, total polyphenols, reducing sugar, total sugar, starch and some free amino acids). The differences between estimated and measured values are small, and most of the points are evenly and compactly distributed along the diagonal. The closer the points are to the diagonal, the more accurate the predictions are and the better the model fits. Clear linear relationships between the predicted and measured values of the sixteen chemical components are observed, which indicates that some chemical components can be predicted by color quantification.
The RF estimation models of moisture, reducing sugar, total sugar and starch content constructed from the Fujian Nanping tobacco samples had high accuracy (model validation set: R2 > 0.90, RPD > 2.0). On the one hand, the correlation analysis showed that the color values were significantly correlated with moisture, reducing sugar, total sugar and starch. On the other hand, the RF algorithm copes well with multicollinearity and can identify the color values that are strong predictors of moisture, reducing sugar, total sugar and starch content, thereby improving the accuracy of the model. Therefore, these four high-accuracy models were tested on samples from outside the modeling set (Qujing, Yunnan). Table 8 shows that the test-set R2 and RPD of the moisture content prediction model were 0.84 and 3.51, respectively. The test-set R2 and RPD of the reducing sugar content estimation model were 0.73 and 3.91, respectively. The test-set R2 and RPD of the total sugar content estimation model were 0.77 and 3.95, respectively. The test-set R2 and RPD of the starch content estimation model were 0.82 and 5.27, respectively. This showed that the RF algorithm had high accuracy and a stable fitting effect in estimating the contents of reducing sugar, total sugar and starch in tobacco leaves outside the modeling set during curing.
Figure 6 shows that the measured and estimated values of the test set for the higher-accuracy RF estimation models are evenly distributed near the 1:1 line. The performance of the estimation models constructed in Fujian is considered good, as indicated by the R2 and RPD values of the four chemical components in the test set of samples from Yunnan, which were external to the model. The models can thus make fairly accurate estimations of these four chemical components. The samples that were not accurately estimated can be attributed to regional differences in cultivation conditions and climate, among other factors12. This also indicates that there is room for improvement in the models' performance. Going forward, adjustments to model parameters, the addition of more relevant features, and the use of more advanced algorithms will be employed to address these issues.
Conclusions
Changes in the surface color of tobacco leaves are related to their internal chemical content. The correlation between color values and chemical components of tobacco leaves was analyzed, and the results showed that the a* value was significantly negatively correlated with moisture, lutein, β-carotene, scopoletin, starch, Asp, Thr and Ser. The b* value was significantly positively correlated with moisture content and reducing sugar content. The PLSR, RR, SVM and RF algorithms were used to establish prediction models of the chemical components of tobacco leaves during flue-cured tobacco curing. The RF algorithm performed best in predicting tobacco moisture, reducing sugar, total sugar and starch during curing, with validation-set R2 values above 0.90 and RPD values above 2.0. These results confirm that it is possible to monitor the chemical components of tobacco in real time during curing. Based on the color values of tobacco, machine learning algorithms can be used to establish prediction models of tobacco chemical components during curing that quickly and accurately predict the moisture, lutein, β-carotene, total polyphenols, reducing sugar, total sugar, starch and some free amino acids in tobacco, providing a theoretical basis and methodological reference for intelligent monitoring of tobacco curing status and for improving tobacco quality. Further research will focus on applying the prediction models to devices that monitor images of tobacco leaves, to enable rapid acquisition of the chemical components of tobacco leaves during curing.
Data availability
The data presented in this study are available on request from the corresponding author (guoweimin1984@sina.com).
Acknowledgements
This work was supported by the major science and technology program of CNTC (No. 110202101084 and No. 110202201051).
Author information
Contributions
Y.M. and Q.X.: Writing - original draft, Writing - review & editing. G.C., J.L. and S.Z.: Validation. Y.Z., A.W. and J.W.: Formal analysis. D.Y., X.C., Q.L. and Q.Z.: Investigation. J.L. and X.C.: Data curation. W.G. and Y.W.: Conceptualization, Methodology, Resources. All authors have read and agreed to the published version of the manuscript.
Ethics declarations
Competing interests
The authors declare no competing interests. Yang Meng, Qiang Xu, Yanling Zhang, Aiguo Wang, Jianwei Wang and Weimin Guo were employed by Zhengzhou Tobacco Research Institute of CNTC (Zhengzhou, China). Guangqing Chen, Jianjun Liu and Shuoye Zhou were employed by Henan Provincial Tobacco Company (Zhengzhou, China). Ding Yan and Xianjie Cai were employed by Shanghai Tobacco Company (Shanghai, China). Junying Li and Xuchu Chen were employed by Pingdingshan Branch of Henan Provincial Tobacco Company (Pingdingshan, China). Qiuying Li and Qiang Zeng were employed by Nanping Branch of Fujian Provincial Tobacco Company (Nanping, China). The remaining authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.
Ethics declarations
The authors declare that the collection of plant material (tobacco leaves) and the experimental studies complied with the relevant institutional, national and international guidelines and legislation. The authors confirm that all methods were carried out in accordance with the relevant guidelines described in the Methods section. The authors comply with the IUCN Policy Statement on Research Involving Species at Risk of Extinction and the Convention on the Trade in Endangered Species of Wild Fauna and Flora. The collection of tobacco leaf samples was licensed by the local tobacco production regulatory agency.
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License, which permits any non-commercial use, sharing, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if you modified the licensed material. You do not have permission under this licence to share adapted material derived from this article or parts of it. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by-nc-nd/4.0/.
About this article
Cite this article
Meng, Y., Xu, Q., Chen, G. et al. Regression prediction of tobacco chemical components during curing based on color quantification and machine learning. Sci Rep 14, 27080 (2024). https://doi.org/10.1038/s41598-024-78426-y
DOI: https://doi.org/10.1038/s41598-024-78426-y