Introduction

As a contributor to nearly every pathogenic process, inflammation is among the most common mechanisms underlying disease in living organisms. Inflammatory pathways are central to numerous conditions, from autoimmune diseases to infections and cancers1. One of the best-known and most widely available tests for measuring inflammation in humans is C-reactive protein (CRP)2. CRP is produced mainly by hepatocytes in response to pro-inflammatory cytokines, and low concentrations of this protein can be measured with a relatively recent assay, high-sensitivity CRP (hs-CRP). This marker has been shown to predict various diseases, such as cardiovascular diseases, diabetes, cancers, and autoimmune diseases, and it is an inseparable part of the blood profile tests for these diseases3,4. Besides the diseases that can alter serum hs-CRP, other serum biomarkers are thought to affect hs-CRP levels through different and sometimes unknown mechanisms5. Cardiovascular diseases (CVD), depression, anxiety, and diabetes, the major conditions in which hs-CRP levels can be altered, have been widely studied and are reported to initiate inflammatory responses, stimulate the production of inflammatory cytokines, and increase hypothalamic-pituitary axis activity6.

The decision tree (DT) classifier is one of the best-known methods for data classification. Researchers from various fields, including machine learning, pattern recognition, statistics, and medicine, have addressed the problem of growing DTs from available data7. A DT is a nonparametric model that is relatively easy to use and interpret compared with other models8. DTs are widely used in medicine and are considered a reliable method for outcome prediction.

For a sensitive inflammatory marker such as hs-CRP, a DT can help organize the wide range of variables that may affect its level, and the branches of the tree prioritize the variables that affect hs-CRP most strongly. Therefore, in this study, we aimed to use DT modeling to predict the association between the hematological and demographic factors of a large population and hs-CRP levels. These findings may help physicians and researchers better understand the factors that interfere with and alter hs-CRP, one of the major paraclinical findings in almost every disease and condition.

Method

Study population

The data were obtained from phase I of the Mashhad stroke and heart atherosclerotic disorder (MASHAD) study, a 10-year cohort conducted in northeastern Iran. A total of 9704 participants aged 35 to 65 years were recruited using a stratified cluster random sampling technique. All participants provided written informed consent, and the study protocol was approved by the Ethics Committee of Mashhad University of Medical Sciences. Participants and authors were blinded to the laboratory results. Further details of the study design and methodology have been described previously2,9,10. All methods were performed in accordance with the Declaration of Helsinki and relevant guidelines and regulations11.

Baseline examination

Participants provided blood samples, collected by venipuncture of an antecubital vein between 8 and 10 a.m. after a 14-h overnight fast. Following a standard protocol, the samples were drawn into 20 ml vacuum tubes with participants in a sitting position. Within 30–45 min of collection, the specimens were centrifuged at room temperature to separate serum and plasma into six 0.5 ml aliquots, which were then sent to the Bu Ali Research Institute in Mashhad. Because hs-CRP is particularly sensitive to pre-analytical variation, all samples were handled in the same way and processed promptly, and hs-CRP was measured at comparable times for all samples. Additional serum aliquots were stored at −80 °C for future analysis. Low-density lipoprotein cholesterol (LDL-C) was calculated from serum total cholesterol (TC), triglyceride (TG), and high-density lipoprotein cholesterol (HDL-C) concentrations using the Friedewald formula12, only when serum TG was lower than 400 mg/dL. Dyslipidemia was defined as TC ≥ 200 mg/dL (5.18 mmol/L), LDL-C ≥ 130 mg/dL (3.36 mmol/L), TG ≥ 150 mg/dL (1.69 mmol/L), or HDL-C < 40 mg/dL (1.03 mmol/L) in men and < 50 mg/dL (1.30 mmol/L) in women. Type 2 diabetes mellitus (T2DM) was defined as fasting blood glucose (FBG) ≥ 126 mg/dL or current treatment with oral hypoglycemic agents or insulin13.
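The lipid derivations above can be expressed compactly. The following Python sketch is illustrative only and is not the R/SPSS code used in this study: the function and argument names are hypothetical, all inputs are assumed to be in mg/dL, and the LDL-C criterion is simply skipped when the Friedewald estimate (LDL-C = TC − HDL-C − TG/5) is not applicable, that is, when TG ≥ 400 mg/dL.

```python
from typing import Optional


def friedewald_ldl(tc: float, hdl: float, tg: float) -> Optional[float]:
    """Friedewald estimate of LDL-C in mg/dL: TC - HDL-C - TG/5.

    Returns None when TG >= 400 mg/dL, where the formula was not applied.
    """
    if tg >= 400:
        return None
    return tc - hdl - tg / 5.0


def has_dyslipidemia(tc: float, hdl: float, tg: float, sex: str) -> bool:
    """Dyslipidemia per the cut-offs stated in the text (mg/dL).

    `sex` is 'male' or 'female' (hypothetical coding).
    """
    if sex not in ("male", "female"):
        raise ValueError("sex must be 'male' or 'female'")
    ldl = friedewald_ldl(tc, hdl, tg)
    hdl_cutoff = 40.0 if sex == "male" else 50.0
    return (
        tc >= 200
        or (ldl is not None and ldl >= 130)  # skipped if LDL-C is undefined
        or tg >= 150
        or hdl < hdl_cutoff
    )


if __name__ == "__main__":
    print(friedewald_ldl(220, 45, 150))              # 145.0
    print(has_dyslipidemia(180, 45, 100, "female"))  # True (HDL-C < 50)
    print(has_dyslipidemia(180, 45, 100, "male"))    # False
```

Note that a participant with TG ≥ 400 mg/dL is classified as dyslipidemic through the TG criterion anyway, so skipping the LDL-C criterion in that case does not change the classification.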

Hypertension was defined as systolic blood pressure (SBP) ≥ 140 mmHg and/or diastolic blood pressure (DBP) ≥ 90 mmHg, or current use of antihypertensive medication14.
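A minimal sketch of the hypertension definition above, together with the T2DM definition from the preceding paragraph. It assumes blood pressure in mmHg, FBG in mg/dL, and hypothetical boolean medication flags whose names are not taken from the study data.

```python
def has_hypertension(sbp: float, dbp: float, on_antihypertensive: bool) -> bool:
    """SBP >= 140 mmHg and/or DBP >= 90 mmHg, or antihypertensive treatment."""
    return sbp >= 140 or dbp >= 90 or on_antihypertensive


def has_t2dm(fbg: float, on_glucose_lowering: bool) -> bool:
    """FBG >= 126 mg/dL, or treatment with oral hypoglycemic agents or insulin."""
    return fbg >= 126 or on_glucose_lowering


if __name__ == "__main__":
    print(has_hypertension(138, 92, False))  # True (DBP criterion)
    print(has_hypertension(130, 85, False))  # False
    print(has_t2dm(110, True))               # True (treated)
```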

Psychometric assessments were conducted with the Beck Anxiety Inventory (BAI), whose total-score ranges correspond to levels of anxiety severity, and the Beck Depression Inventory-II (BDI-II), whose score ranges correspond to levels of depression severity15.
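The text does not list the score ranges used. Purely as an assumption for illustration, the sketch below uses the cut-offs commonly cited from the instrument manuals (BAI: 0–7 minimal, 8–15 mild, 16–25 moderate, 26–63 severe; BDI-II: 0–13 minimal, 14–19 mild, 20–28 moderate, 29–63 severe); the MASHAD analysis may have used different bands, and the analyses reported here treat both scores as continuous.

```python
# Assumed severity bands (inclusive upper bounds); not taken from the study.
BAI_BANDS = ((7, "minimal"), (15, "mild"), (25, "moderate"), (63, "severe"))
BDI_II_BANDS = ((13, "minimal"), (19, "mild"), (28, "moderate"), (63, "severe"))


def severity(score: int, bands) -> str:
    """Map a total questionnaire score to its (assumed) severity label."""
    if not 0 <= score <= bands[-1][0]:
        raise ValueError(f"score {score} is outside 0..{bands[-1][0]}")
    for upper, label in bands:
        if score <= upper:
            return label
    raise AssertionError("unreachable: the bands cover the full score range")


if __name__ == "__main__":
    print(severity(12, BAI_BANDS))     # mild
    print(severity(12, BDI_II_BANDS))  # minimal
```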

Statistical analysis

All data were analyzed using R Statistical Software (v4.1.2; R Core Team 2021) and IBM SPSS Statistics (version 27). Continuous variables are expressed as mean ± SD and categorical variables as frequency (%). P-values < 0.05 were considered statistically significant. Means or medians of subjects with hs-CRP < 3 mg/dL and hs-CRP \(\ge 3\) mg/dL were compared using the t-test for normally distributed continuous variables and the Mann–Whitney test for non-normal data. Similarly, subjects with hs-CRP \(\le 1\) mg/dL, 1 mg/dL < hs-CRP \(\le 3\) mg/dL, and hs-CRP > 3 mg/dL were compared using ANOVA for normally distributed continuous variables and the Kruskal–Wallis test for non-normal data. The chi-square test was used to investigate the association between categorical variables and the binary and three-category hs-CRP outcomes1.
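To make the group definitions and one of the non-parametric comparisons concrete, here is a Python sketch using only the standard library; it is not the R/SPSS code used in the study. It assumes hs-CRP values in the document's units, applies the binary cut-off at 3 and the three-level cut-offs at 1 and 3 (assigning a value of exactly 1 to the lowest group and exactly 3 to the middle group, an assumption where the boundaries are not fully specified), and computes a two-sided Mann–Whitney U test with the large-sample normal approximation and tie correction but no continuity correction. Packages such as R's `wilcox.test` may use exact p-values or a continuity correction, so p-values can differ slightly.

```python
import math
from collections import Counter
from typing import List, Sequence, Tuple


def crp_binary(crp: float) -> str:
    """Binary outcome: < 3 versus >= 3."""
    return ">=3" if crp >= 3 else "<3"


def crp_three_level(crp: float) -> str:
    """Three categories: <= 1, (1, 3], > 3."""
    if crp <= 1:
        return "<=1"
    if crp <= 3:
        return "1-3"
    return ">3"


def average_ranks(values: Sequence[float]) -> List[float]:
    """1-based ranks; tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def mann_whitney_u(x: Sequence[float], y: Sequence[float]) -> Tuple[float, float]:
    """Return (U for sample x, two-sided p) via the normal approximation."""
    n1, n2 = len(x), len(y)
    if n1 == 0 or n2 == 0:
        raise ValueError("both samples must be non-empty")
    pooled = list(x) + list(y)
    ranks = average_ranks(pooled)
    u1 = sum(ranks[:n1]) - n1 * (n1 + 1) / 2
    n = n1 + n2
    ties = sum(t ** 3 - t for t in Counter(pooled).values())
    variance = n1 * n2 / 12 * ((n + 1) - ties / (n * (n - 1)))
    if variance <= 0:
        raise ValueError("all values are tied; the test is undefined")
    z = (u1 - n1 * n2 / 2) / math.sqrt(variance)
    return u1, math.erfc(abs(z) / math.sqrt(2))


if __name__ == "__main__":
    print(crp_binary(3.0), crp_three_level(3.0))  # >=3 1-3
    print(mann_whitney_u([1, 2, 3], [4, 5, 6]))
```

The boundary value 3 illustrates why the two groupings must be defined explicitly: it falls in the upper binary group but in the middle three-level group.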

Multicollinearity between independent variables was assessed using the variance inflation factor (VIF) and correlation coefficients. A correlation coefficient higher than 0.7 was taken to indicate highly correlated variables and, hence, possible multicollinearity1.
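For reference, the VIF of predictor j is \(1/(1 - R_j^2)\), where \(R_j^2\) comes from regressing predictor j on all other predictors; equivalently, it is the j-th diagonal element of the inverse of the predictors' correlation matrix. The Python sketch below (standard library only, intended for a handful of predictors, with hypothetical column names) uses the inverse-correlation form and also flags pairs with |r| > 0.7, the threshold stated above.

```python
import math
from itertools import combinations
from typing import Dict, List, Sequence, Tuple


def pearson(a: Sequence[float], b: Sequence[float]) -> float:
    """Pearson correlation coefficient of two equal-length samples."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    var_a = sum((x - ma) ** 2 for x in a)
    var_b = sum((y - mb) ** 2 for y in b)
    return cov / math.sqrt(var_a * var_b)


def invert(matrix: List[List[float]]) -> List[List[float]]:
    """Gauss-Jordan inversion with partial pivoting (small matrices only)."""
    n = len(matrix)
    aug = [row[:] + [float(i == j) for j in range(n)] for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("singular matrix: perfect multicollinearity")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        scale = aug[col][col]
        aug[col] = [v / scale for v in aug[col]]
        for r in range(n):
            if r != col:
                factor = aug[r][col]
                aug[r] = [v - factor * w for v, w in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def vif(columns: Dict[str, Sequence[float]]) -> Dict[str, float]:
    """VIF_j = j-th diagonal element of the inverse correlation matrix."""
    names = list(columns)
    corr = [[pearson(columns[a], columns[b]) for b in names] for a in names]
    inv = invert(corr)
    return {name: inv[i][i] for i, name in enumerate(names)}


def high_correlations(columns: Dict[str, Sequence[float]],
                      threshold: float = 0.7) -> List[Tuple[str, str, float]]:
    """Pairs of predictors whose |r| exceeds the threshold."""
    flagged = []
    for a, b in combinations(columns, 2):
        r = pearson(columns[a], columns[b])
        if abs(r) > threshold:
            flagged.append((a, b, r))
    return flagged
```

With only two predictors correlated at r = 0.8, both VIFs equal 1/(1 − 0.64) ≈ 2.78, which shows that the |r| > 0.7 screen can flag a pair even when the VIF stays below commonly used cut-offs such as 5 or 10.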

Logistic regression and linear regression were used to compute odds ratios (ORs) and regression coefficients, respectively, with their 95% confidence intervals (CIs). All models included CVD, hypertension (HTN), SBP, anxiety score, depression score, FBG, and copper, adjusted for age, physical activity level (PAL), LDL, HDL, TG, history of CVD, and T2DM. Three models are presented: Model A included both sexes and was additionally adjusted for sex, whereas Models B and C were restricted to males and females, respectively. The outcome of the linear regression was ln(hs-CRP), and that of the logistic regression was hs-CRP \(\ge 3\) mg/dL versus hs-CRP < 3 mg/dL.
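As a reading aid, the standard Wald construction gives OR = exp(β) with 95% CI exp(β ± 1.96·SE), where β is a logistic-regression coefficient; for the linear model of ln(hs-CRP), a coefficient β corresponds to a multiplicative change of exp(β) in geometric-mean hs-CRP per unit increase in the predictor. The sketch below assumes Wald intervals (the software may use other interval types). The SE in the usage example is back-calculated from the CVD interval reported for Model C in the Results purely for illustration; it is not taken from the study output.

```python
import math
from typing import Tuple

Z_975 = 1.959963984540054  # 97.5th percentile of the standard normal


def odds_ratio_ci(beta: float, se: float, z: float = Z_975) -> Tuple[float, float, float]:
    """Wald OR and 95% CI from a logistic-regression coefficient and its SE."""
    return math.exp(beta), math.exp(beta - z * se), math.exp(beta + z * se)


def se_from_ci(lower: float, upper: float, z: float = Z_975) -> float:
    """Back-calculate the SE of log(OR) from a reported Wald 95% CI."""
    return (math.log(upper) - math.log(lower)) / (2 * z)


def percent_change_per_unit(beta: float) -> float:
    """For a model of ln(hs-CRP): % change in geometric-mean hs-CRP per unit of X."""
    return 100.0 * (math.exp(beta) - 1.0)


if __name__ == "__main__":
    se = se_from_ci(1.099, 1.677)  # illustrative back-calculation only
    print(odds_ratio_ci(math.log(1.357), se))  # close to the reported OR and CI
```

For a binary predictor such as CVD, "one unit" means having versus not having the condition, so OR = 1.357 corresponds to 35.7% higher odds of hs-CRP \(\ge 3\) mg/dL.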

DT model

We used a data mining approach, the decision tree (DT) methodology, to construct a predictive model of hs-CRP. A DT is a non-parametric method, tailored to the characteristics of the target variable, that builds a predictive model from predictor variables16,17,18,19. In this study, we used the chi-squared automatic interaction detection (CHAID) technique to grow the trees. CHAID can be used for prediction, classification, and identifying interactions between variables. Various algorithms can be used to construct a DT, such as CART, ID3, C4.5, and CHAID; they differ in the criterion used to select the most influential feature at each split, and CHAID does so through chi-square tests. For a categorical target, CHAID uses Pearson's chi-square statistic, which should not be confused with the Pearson correlation coefficient, the default correlation measure in many programming libraries (for instance, pandas in Python). The chi-square statistic used to determine significance is

$$\chi^{2} = \sum \frac{\left( y - y^{\prime} \right)^{2}}{y^{\prime}}$$

where y denotes the observed and y' the expected values (frequencies), and the sum runs over the cells of the contingency table. The order of successive splits indicates the relative importance of the predictor variables. To evaluate the performance of the DT, we computed its accuracy, precision, and sensitivity from the confusion matrix using SPSS software version 27.
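The core of CHAID's split selection can be sketched as follows: for each categorical (or pre-binned) predictor, build its contingency table with the outcome, compute the chi-square statistic defined above, and choose the predictor with the smallest p-value. This is a simplified Python illustration, not the SPSS implementation used here: it omits CHAID's merging of predictor categories, its Bonferroni adjustment, and its stopping rules, and the predictor names in the test are hypothetical. The p-value uses the exact closed-form chi-square upper tail for integer degrees of freedom.

```python
import math
from collections import Counter
from typing import Dict, Hashable, Sequence, Tuple


def chi2_sf(x: float, df: int) -> float:
    """Upper-tail probability of a chi-square variable with integer df."""
    if x <= 0:
        return 1.0
    if df % 2:
        q, k = math.erfc(math.sqrt(x / 2)), 1
    else:
        q, k = math.exp(-x / 2), 2
    while k < df:
        # Q(x; k + 2) = Q(x; k) + (x/2)^(k/2) * exp(-x/2) / Gamma(k/2 + 1)
        q += math.exp((k / 2) * math.log(x / 2) - x / 2 - math.lgamma(k / 2 + 1))
        k += 2
    return q


def pearson_chi2(groups: Sequence[Hashable], target: Sequence[Hashable]) -> Tuple[float, int]:
    """Chi-square = sum over cells of (observed - expected)^2 / expected."""
    observed = Counter(zip(groups, target))
    row_tot, col_tot, n = Counter(groups), Counter(target), len(target)
    stat = 0.0
    for r in row_tot:
        for c in col_tot:
            expected = row_tot[r] * col_tot[c] / n
            stat += (observed[(r, c)] - expected) ** 2 / expected
    return stat, (len(row_tot) - 1) * (len(col_tot) - 1)


def best_split(predictors: Dict[str, Sequence[Hashable]],
               target: Sequence[Hashable]) -> Tuple[str, Dict[str, float]]:
    """Pick the predictor whose categories give the smallest chi-square p-value."""
    p_values = {}
    for name, groups in predictors.items():
        stat, df = pearson_chi2(groups, target)
        p_values[name] = chi2_sf(stat, df) if df > 0 else 1.0
    return min(p_values, key=p_values.get), p_values
```

Applied recursively within each resulting subgroup, this selection produces the layered structure described for Figs. 1 to 6, with the root split being the most significant association.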

Results

Characteristics of the study population

Overall, 9704 individuals were eligible for analysis (57% female). The mean age of the participants was 48.87 ± 8.43 years in males and 47.55 ± 8.09 years in females. The baseline clinical characteristics of the participants are summarized in Table 1. Biochemical factors and clinical features were compared between men and women using the t-test, the Mann–Whitney U test for non-normal data, and the chi-square test for categorical data. All variables differed significantly between the two groups (p < 0.001) except FBG (p = 0.345), HTN (p = 0.114), and T2DM (p = 0.143).

Table 1 Baseline characteristics of male and female participants.

The baseline clinical characteristics of the participants in the three hs-CRP categories are summarized in Table 2. All variables differed significantly among the three groups (p < 0.001) except family history of CVD (p = 0.62).

Table 2 Baseline characteristics by tertile-divided hs-CRP.

Two data mining techniques were used to investigate the relationship of biochemical factors and confounding variables with the binary response (hs-CRP < 3 mg/dL vs hs-CRP \(\ge 3\) mg/dL) and with ln(hs-CRP). The main objective of this study was thus to predict hs-CRP using regression models and DT models (binary and tertile hs-CRP) and to determine the associated factors, especially biochemical markers. For the DT models, the dataset was randomly split into a training set (75%) and a test set (25%). The training set was used to develop the DT model, which was then validated on the test set, which had not been used during training.
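A random 75%/25% partition of the kind described above can be sketched as follows. The seed value is arbitrary and hypothetical (the study's seed, and whether SPSS stratified the split, are not reported); fixing a seed simply makes the split reproducible.

```python
import random
from typing import Iterable, List, Tuple, TypeVar

T = TypeVar("T")


def train_test_split(records: Iterable[T], test_fraction: float = 0.25,
                     seed: int = 2021) -> Tuple[List[T], List[T]]:
    """Randomly partition records into (train, test); seed is arbitrary."""
    if not 0 < test_fraction < 1:
        raise ValueError("test_fraction must be in (0, 1)")
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)  # local RNG, no global state touched
    n_test = round(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]


if __name__ == "__main__":
    train, test = train_test_split(range(9704))
    print(len(train), len(test))  # 7278 2426
```

With 9704 records, this yields 7278 training and 2426 test records; the sex-specific models would split their own subsets in the same way.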

The association between biochemical factors, clinical features, and hs-CRP using logistic regression (LR) and linear regression models

Table 3 shows the results of the linear regression on ln(hs-CRP) and the logistic regression on binary hs-CRP. CVD, HTN, SBP, anxiety score, depression score, and FBG (mg/dL) were included in the analysis, and the models were adjusted for sex, age, PAL, LDL, HDL, TG, history of CVD, and T2DM as confounding factors. The results are presented for Model A (both sexes), Model B (males), and Model C (females) (Table 3A). All biochemical factors and clinical features entered the adjusted models, and non-significant variables were then excluded until all biochemical factors and clinical features remaining in the model were significant.

Table 3 Regression with ln(hs-CRP) and with binary hs-CRP (hs-CRP < 3 vs hs-CRP ≥ 3) as the response variable.

Table 3A shows the ORs and their 95% confidence intervals (CIs) for hs-CRP \(\ge 3\) mg/dL in Models A, B, and C. In Model A, the anxiety score, depression score, and SBP were significant (p < 0.05), whereas CVD and HTN were not. The variables with the largest effects were the depression score in Model B (OR = 1.009, 95% CI = (1.005, 1.013)) and CVD in Model C (OR = 1.357, 95% CI = (1.099, 1.677)). In other words, in Model C the odds of hs-CRP \(\ge 3\) mg/dL were 1.357 times (35.7%) higher in participants with CVD than in those without CVD, whereas CVD was not significant in Models A and B. In Models A and B, a higher depression score was associated with hs-CRP \(\ge 3\) mg/dL (OR = 1.003, 95% CI = (1.000, 1.006) and OR = 1.009, 95% CI = (1.005, 1.013), respectively), but not in Model C. The anxiety score was significant in all models. Other biochemical factors and clinical features were excluded because of multicollinearity.

The models in Table 3B were constructed in the same way as those in Table 3A. In Model A, the anxiety score, depression score, SBP, and FBG were significant (p < 0.05), whereas SBP was not significant in Model B and the depression score was not significant in Model C. The variables with the largest effects were the anxiety score (OR = 1.009) in Models A and C and the depression score (OR = 1.024) in Model B.

The training and testing confusion-matrix results of the LR model for biochemical factors and clinical features in males, females, and both sexes combined are shown in Table 7. The LR model classified participants into two groups (hs-CRP < 3 mg/dL and hs-CRP \(\ge 3\) mg/dL). Training and testing specificity ranged from 94 to 99.8%, with the highest value in the male group. Training and testing precision ranged from 48 to 80%, with the highest value in the male test set. Accuracy ranged from 67.3 to 76.7%, again highest in the male test set. Overall, results were most favorable in the male subgroup, followed by both sexes combined and, lastly, the female subgroup. The area under the curve (AUC) and F1-score are also reported in Table 7. The highest AUC was obtained in the female training set (63.4%) and the highest F1-score in the test set for both sexes combined (12.5%). Generally, the F1-score of the LR model was lower than that of both DT models.
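For readers comparing the indices reported in Tables 5 and 7, the sketch below shows how they follow from a 2 × 2 confusion matrix, and how AUC can be computed as the probability that a randomly chosen case with hs-CRP \(\ge 3\) mg/dL receives a higher predicted score than a randomly chosen case with hs-CRP < 3 mg/dL (ties counted as one half). The counts in the test are hypothetical, not the study's; they show how a model with very high specificity but few predicted positives can still have a low F1-score, as seen for the LR model.

```python
from typing import Dict, Sequence


def binary_metrics(tp: int, fp: int, fn: int, tn: int) -> Dict[str, float]:
    """Standard indices from a 2 x 2 confusion matrix (NaN if undefined)."""
    def safe(num: float, den: float) -> float:
        return num / den if den else float("nan")

    sensitivity = safe(tp, tp + fn)   # recall for the hs-CRP >= 3 class
    specificity = safe(tn, tn + fp)
    precision = safe(tp, tp + fp)
    accuracy = safe(tp + tn, tp + fp + fn + tn)
    f1 = safe(2 * precision * sensitivity, precision + sensitivity)
    return {"sensitivity": sensitivity, "specificity": specificity,
            "precision": precision, "accuracy": accuracy, "f1": f1}


def auc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """Rank-based AUC; O(n_pos * n_neg), adequate for a sketch."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0 for p in pos for q in neg)
    return wins / (len(pos) * len(neg))
```

With tp = 10, fp = 5, fn = 190, and tn = 795, specificity is about 99% while F1 is only about 0.09, because F1 ignores true negatives and is dragged down by the low sensitivity.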

The association between biochemical factors, clinical features, and hs-CRP using DT models

Binary hs-CRP using DT models

Figure 1 shows the DT trained on biochemical factors and clinical features in males. The DT algorithm evaluated the risk factors for binary hs-CRP (hs-CRP < 3 mg/dL vs hs-CRP \(\ge 3\) mg/dL) and arranged them in three layers. In a DT, the first variable (the root) is the most important for classifying the data, and subsequent variables are less important. As shown in Fig. 1, the depression score was the most important predictor of hs-CRP, followed by FBG, cholesterol, and anxiety score. In the subgroup with 10 < depression score \(\le\) 22 and FBG > 97 mg/dL, 38.3% of participants had hs-CRP \(\ge 3\) mg/dL (the highest-risk node). Meanwhile, among those with a depression score \(\le\) 10 and cholesterol \(\le\) 217 mg/dL, 82.1% had hs-CRP < 3 mg/dL (the lowest-risk node). The detailed rules generated by the DT model for males are given in Table 4.
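To show how a rule in Table 4 relates to the node percentages quoted above, the sketch below encodes the two male subgroups from the text as predicates and computes, for any set of records, the node size and the share of members with hs-CRP \(\ge 3\). The field names (`depression`, `fbg`, `cholesterol`, `hs_crp`) and the records in the test are hypothetical toy values, not study data.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple

Record = Dict[str, float]


@dataclass(frozen=True)
class Rule:
    """A subgroup rule: a readable description plus a predicate."""
    description: str
    condition: Callable[[Record], bool]


# Two subgroups quoted in the text for the male binary tree (Fig. 1).
MALE_RULES: List[Rule] = [
    Rule("10 < depression <= 22 and FBG > 97",
         lambda r: 10 < r["depression"] <= 22 and r["fbg"] > 97),
    Rule("depression <= 10 and cholesterol <= 217",
         lambda r: r["depression"] <= 10 and r["cholesterol"] <= 217),
]


def node_summary(rule: Rule, records: List[Record],
                 threshold: float = 3.0) -> Tuple[int, float]:
    """Return (node size, share of node members with hs-CRP >= threshold)."""
    members = [r for r in records if rule.condition(r)]
    if not members:
        return 0, float("nan")
    high = sum(1 for r in members if r["hs_crp"] >= threshold)
    return len(members), high / len(members)
```

Applied to the training data, the first rule would return the 38.3% share reported for the highest-risk node, and one minus the share for the second rule would give the 82.1% reported for the lowest-risk node.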

Fig. 1 Decision tree for hs-CRP (binary hs-CRP) event in males.

Table 4 Detailed rules based on DT models with binary hs-CRP.

The DT trained on biochemical factors and clinical features in females is shown in Fig. 2. The DT algorithm evaluated the risk factors for hs-CRP < 3 mg/dL vs hs-CRP \(\ge 3\) mg/dL and arranged them in three layers. The root variable is the most important, and the variables at subsequent levels are progressively less important. As shown in Fig. 2, FBG was the most important predictor of hs-CRP, followed by anxiety score, cholesterol, TG, and depression score. In the subgroup with 97 < FBG \(\le\) 120 mg/dL, anxiety score \(\le\) 25, and cholesterol > 241 mg/dL, 62.5% of participants had hs-CRP \(\ge 3\) mg/dL (the highest-risk node), and in the subgroup with FBG > 120 mg/dL, TG > 120 mg/dL, and depression score > 25, 57.7% had hs-CRP \(\ge 3\) mg/dL. Meanwhile, among those with FBG \(\le\) 82 mg/dL, cholesterol \(\le\) 158 mg/dL, and, at the next split, FBG \(\le\) 76 mg/dL, 84% had hs-CRP < 3 mg/dL (the lowest-risk node). Within the branch with 82 < FBG \(\le\) 97 mg/dL and anxiety score \(\le\) 10, the risk of hs-CRP \(\ge 3\) mg/dL increased with cholesterol. Likewise, among participants with FBG > 120 mg/dL and TG > 120 mg/dL, the risk of hs-CRP \(\ge 3\) mg/dL increased with the depression score (12.6% of participants). The detailed rules generated by the DT model for females are given in Table 4.

Fig. 2 Decision tree for hs-CRP (binary hs-CRP) event in females.

Figure 3 shows the DT trained on biochemical factors and clinical features in both sexes combined. The DT algorithm evaluated the risk factors for binary hs-CRP (hs-CRP < 3 mg/dL vs hs-CRP \(\ge 3\) mg/dL) and arranged them in three layers, with the root variable being the most important for classifying the data and subsequent variables less important. As Fig. 3 shows, FBG had the greatest impact on the risk of hs-CRP \(\ge 3\) mg/dL, followed by cholesterol, anxiety score, sex, and TG. In the subgroup with FBG > 120 mg/dL, female sex, and TG > 91 mg/dL, 59.3% of participants had hs-CRP \(\ge 3\) mg/dL (the highest-risk node). According to the DT model, participants with FBG ≤ 79 mg/dL, cholesterol ≤ 159 mg/dL, and SBP ≤ 106.66 mmHg had a lower proportion of hs-CRP \(\ge 3\) mg/dL than those with higher TG and FBG levels (0.30 vs. 0.593). The specific rules generated by the DT model are given in Table 4. Overall, the anxiety score, FBG, depression score, and cholesterol were identified as the most important variables in the binary DT models. The confusion matrices of the binary DT models are presented in Table 5. The highest AUC was obtained in the female training set (67.8%) and the highest F1-score in the male training set (86.7%). Generally, the F1-scores of the DT models were more favorable than that of the LR model (Table 5).

Fig. 3 Decision tree for hs-CRP (binary hs-CRP) event in both sexes combined.

Table 5 Performance indices of the DT models with binary hsCRP.

Tertile hs-CRP using DT models

Figure 4 shows the DT trained on biochemical factors and clinical features in males. The DT algorithm evaluated the risk factors for the three hs-CRP categories (hs-CRP \(\le\) 1 mg/dL, 1 mg/dL < hs-CRP \(\le\) 3 mg/dL, and hs-CRP > 3 mg/dL) and arranged them in three layers, with the root variable being the most important for classifying the data and subsequent variables less important. As shown in Fig. 4, FBG was the most important predictor of hs-CRP, followed by depression score, cholesterol, and SBP. In the subgroup with FBG > 97 mg/dL and depression score \(\le\) 16, higher SBP was associated with higher hs-CRP levels (proportions of 0.25 vs. 0.26 for hs-CRP \(\le\) 1 mg/dL and 0.50 vs. 0.56 for 1 mg/dL < hs-CRP \(\le\) 3 mg/dL). Meanwhile, among those with FBG ≤ 82 mg/dL, cholesterol ≤ 193 mg/dL, and a positive history of CVD, 17% of subjects had hs-CRP > 3 mg/dL (the lowest-risk node) and 48.1% had hs-CRP \(\le\) 1 mg/dL. The detailed rules generated by the DT model for males are given in Table 6.

Fig. 4 Decision tree for hs-CRP (tertile hs-CRP) event in males.

Table 6 Detailed rules based on DT models with tertile divided hsCRP.

The DT trained on biochemical factors and clinical features in females is shown in Fig. 5. The DT algorithm evaluated the risk factors for the three hs-CRP categories (hs-CRP \(\le\) 1 mg/dL, 1 mg/dL < hs-CRP \(\le\) 3 mg/dL, and hs-CRP > 3 mg/dL) and arranged them in three layers, with the root variable being the most important and the variables at subsequent levels progressively less important. As shown in Fig. 5, FBG was the most important predictor of hs-CRP, followed by cholesterol, SBP, TG, and depression score. Among participants with FBG > 98 mg/dL and cholesterol > 241 mg/dL, 54.7% of those with a depression score \(\le\) 16 and 65.4% of those with a depression score > 16 had hs-CRP > 3 mg/dL (the highest-risk nodes). Meanwhile, among those with FBG > 98 mg/dL, 187 mg/dL < cholesterol \(\le\) 241 mg/dL, and no HTN, 6.3% of subjects had hs-CRP \(\le\) 1 mg/dL (the lowest proportion for this category). Within the branch with 82 < FBG \(\le\) 86 mg/dL and SBP > 128.667 mmHg, the risk of hs-CRP > 3 mg/dL increased with TG. The detailed rules generated by the DT model for females are given in Table 6.

Fig. 5 Decision tree for hs-CRP (tertile hs-CRP) event in females.

The DT trained on biochemical factors and clinical features in both sexes combined is shown in Fig. 6. The DT algorithm evaluated the risk factors for the three hs-CRP categories (hs-CRP \(\le\) 1 mg/dL, 1 mg/dL < hs-CRP \(\le\) 3 mg/dL, and hs-CRP > 3 mg/dL) and arranged them in three layers, with the root variable being the most important and the variables at subsequent levels progressively less important. As shown in Fig. 6, FBG was the most important predictor of hs-CRP, followed by cholesterol, sex, anxiety score, and depression score. In the subgroup with 79 < FBG \(\le\) 85 mg/dL and 145 < cholesterol \(\le\) 222 mg/dL, a higher anxiety score increased the proportions of participants with hs-CRP > 3 mg/dL and hs-CRP \(\le\) 1 mg/dL and decreased the proportion with 1 mg/dL < hs-CRP \(\le\) 3 mg/dL. Meanwhile, among those with 85 mg/dL < FBG \(\le\) 97 mg/dL, cholesterol > 241 mg/dL, and female sex, 61.1% of subjects had 1 mg/dL < hs-CRP \(\le\) 3 mg/dL (the highest proportion for this category). Within the branch with 85 < FBG \(\le\) 97 mg/dL and 145 < cholesterol \(\le\) 188 mg/dL, higher LDL increased the proportion with 1 mg/dL < hs-CRP \(\le\) 3 mg/dL but decreased the proportions in the other categories. The detailed rules generated by the DT model for both sexes combined are given in Table 6. The confusion matrices of the tertile DT models are presented in Table 7, which also reports their F1-scores and AUCs. The highest F1-score was obtained in the female training set (29.59%) and the highest AUC in the test set for both sexes combined (74.7%). Generally, the F1-scores of these models were less favorable than those of the binary DT models but better than that of the LR model.
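The text does not state how the F1-score was aggregated across the three hs-CRP categories. One common choice, assumed here purely for illustration, is the macro-average of per-class F1-scores computed one-versus-rest from a 3 × 3 confusion matrix, as in the following Python sketch (class labels are hypothetical strings):

```python
from typing import Dict, List, Sequence, Tuple


def multiclass_report(matrix: List[List[int]], labels: Sequence[str]
                      ) -> Tuple[float, float, Dict[str, Dict[str, float]]]:
    """matrix[i][j] = number of cases with true class i predicted as class j.

    Returns (accuracy, macro-averaged F1, per-class one-vs-rest indices).
    """
    k = len(labels)
    total = sum(map(sum, matrix))
    per_class: Dict[str, Dict[str, float]] = {}
    for i, label in enumerate(labels):
        tp = matrix[i][i]
        fp = sum(matrix[r][i] for r in range(k)) - tp  # predicted i, truly other
        fn = sum(matrix[i]) - tp                        # truly i, predicted other
        precision = tp / (tp + fp) if tp + fp else 0.0
        sensitivity = tp / (tp + fn) if tp + fn else 0.0
        f1 = (2 * precision * sensitivity / (precision + sensitivity)
              if precision + sensitivity else 0.0)
        per_class[label] = {"precision": precision,
                            "sensitivity": sensitivity, "f1": f1}
    accuracy = sum(matrix[i][i] for i in range(k)) / total
    macro_f1 = sum(c["f1"] for c in per_class.values()) / k
    return accuracy, macro_f1, per_class
```

Macro-averaging weights each category equally, so a model that rarely predicts the middle category (1 mg/dL < hs-CRP \(\le\) 3 mg/dL) is penalized even if overall accuracy is acceptable; a prevalence-weighted average would give different values.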

Fig. 6 Decision tree for hs-CRP (tertile hs-CRP) event in both sexes combined.

Table 7 Performance indices of the DT models with tertile hs-CRP.

Discussion

CRP is the best-known and most widely available serum marker of inflammation and inflammation-related processes in the human body. Hs-CRP, a marker of low-grade inflammation and a key marker in epidemiological and cardiovascular studies, can be altered by a wide range of variables, all of which in some way initiate an inflammatory response20. Similar studies have assessed hs-CRP in patients with chronic coronary syndrome, but to our knowledge, no similar study has assessed and predicted hs-CRP in a large community-based population regardless of acute coronary syndrome (ACS)21,22. We did not use ACS or any other disease as an inclusion criterion, which makes our findings more applicable to the general population. Additionally, ethnicity and culture play an important role in the findings of such studies, and previous studies involved populations of different ethnicity from ours21,22. Thus, this cohort study investigated the demographic and hematological features that might be related to hs-CRP levels. These findings can be further assessed in populations of different ethnicities, potentially informing guidelines and helping physicians better understand, manage, and treat related diseases and the role of hs-CRP in their prognosis and incidence.

On the other hand, the baseline characteristics stratified by hs-CRP tertile (Table 2) show that almost all of the characteristics were significantly associated with hs-CRP levels. For instance, age, HDL, LDL, triglycerides, anxiety score, depression score, HTN, T2DM, and even family history of T2DM were all significant. Family history of CVD and PAL were not associated with hs-CRP levels in the baseline characteristics. As previously established, HTN and T2DM can both induce some degree of inflammatory response in the human body through various mechanisms. Several studies of cell–cell interactions, from intracellular to extracellular mechanisms, have identified inflammatory pathways related to increased hs-CRP levels23,20,21,22,23,28. Some meta-analyses indicate that a higher hs-CRP level is directly associated with a higher risk of both T2DM and HTN29,26,31. Additionally, a family history of T2DM has been found to be associated with the occurrence of T2DM in the individual, indicating that genetics is one of the vital contributors to T2DM10,32.

In both the linear regression on ln(hs-CRP) and the logistic regression on binary hs-CRP, the results in males and females were broadly similar (Table 3). Studies using similar analyses have reported that LR models perform better than other models in predicting hs-CRP levels and their correlation with other factors33. In both models, FBG, depression score, and anxiety score were significantly related to hs-CRP in males, whereas SBP, CVD, FBG, and anxiety score were significant in females. We chose variables that are easy to assess and can be evaluated in regions without access to laboratory facilities. As previously reported, inflammatory markers, especially hs-CRP, can predict CVD in women. Previous studies have also shown that cardiovascular changes, such as increased blood pressure and cardiovascular events, have a stronger effect on hs-CRP levels in women than in men34. Consistent with previous studies, mean CRP levels are higher in females than in males, which might explain why the CRP threshold is reached, and the association appears significant, in women rather than in men35. Another thought-provoking finding is the alteration of hs-CRP levels by anxiety and depression. Several studies have argued this point, attributing it to both direct causes and indirect causes such as obesity and smoking36,37. The exact mechanism of the direct effect of anxiety on CRP levels is not clear, but studies suggest that endocrine and non-inflammatory dysfunction are involved38. Also, as expected from previous findings, FBG was related to hs-CRP, an inflammatory marker, with several proposed mechanisms5. In Model A, which included both sexes, all four variables discussed above (anxiety score, depression score, FBG, and SBP) were significantly related to hs-CRP in both the linear and the logistic regression.
Our findings in Table 3 are in line with previous findings and add consistent evidence in this area.

Turning to the rules of the binary DT models (Table 4), depression score and cholesterol were the determinative variables in the first node for males, whereas FBG and cholesterol played this role for females. For both sexes combined, FBG, SBP, and cholesterol were the three major variables in the first node, suggesting the importance of cholesterol in shaping the decision tree from the outset. Cholesterol is also a key variable in the first node in Table 6, which lists the rules of the DT models with tertile-divided hs-CRP: there, cholesterol appears together with history of CVD and FBG in males, and together with FBG in females and in both sexes combined.

Lastly, bringing all the data together, the DT models with binary and tertile hs-CRP show how well hs-CRP can be predicted from other variables. Overall, accuracy, sensitivity, and precision were acceptable, considering that hs-CRP is a highly sensitive marker and that most human conditions can significantly alter the level of this nonspecific inflammatory factor. In the DT models with binary hs-CRP, accuracy ranged from 71.4% to 76.7%, being highest for the male training model and lowest for the test model for both sexes combined. Sensitivity was mostly higher than precision, ranging from 98.7% in the male test model to 75.7% in the female training model. These results indicate that the DT model identifies true positives with few false negatives, but that some false positives lower its precision. This might be because hs-CRP can be altered by variables that were not included in this cohort study. In Table 7, the results are nearly the same, with the highest accuracy of 76.7% in the male training model and the lowest of 70.5% in the female test model. Here again, precision (highest of 77% in the male training model and lowest of 71.7% in the female test model) was generally lower than sensitivity (highest of 98.7% in the male test model and lowest of 75.7% in the female training model). By predicting hs-CRP levels from demographic data, clinicians can better evaluate patients in similar populations and assess the role of these variables in diseases strongly linked to hs-CRP levels. Factors such as depression (based on a questionnaire), cholesterol level, and FBG had a significant effect on hs-CRP prediction.
Another study indicated that the triglyceride-glucose index is significantly related to hs-CRP levels and could help physicians in the early prediction and management of ACS patients22. Similar data on hs-CRP levels and their association with coronary heart disease (CHD) also showed that DT models yield favorable predictive results. The presence of was strongly associated with hs-CRP in the DT model39.

Limitations

This study has some limitations that should be acknowledged. First, the DT and LR models used in this study were not externally validated in other datasets. Future studies can apply these findings and similar analyses to data from populations of various ethnicities to validate them. For the future of machine learning (ML) in disease prediction, it is crucial that models can be applied to as many populations as possible.

Second, follow-up data would benefit the study by tracking participants through diseases that may be related to hs-CRP levels and by assessing disease outcomes and prediction. Future studies with similar designs could include follow-up for better monitoring and evaluation of these results. Analyzing participants' morbidity and mortality data could substantially improve our understanding of the significance of hs-CRP.

Lastly, the study would benefit from additional models to improve the prediction of hs-CRP levels. We recommend that further studies address these issues to fill the gaps in these findings. Other ML approaches, especially neural networks, could be helpful for comparing data analyses and their results.

Conclusion

Considering the highly variable nature of hs-CRP as an inflammatory marker, the DT model predicted hs-CRP levels acceptably in this large cohort study. The results also indicated that anxiety and depression scores, FBG, systolic blood pressure, and history of cardiovascular diseases are among the main factors that can alter hs-CRP levels. Our results are in line with previous findings on the factors affecting hs-CRP levels, and our study provides comprehensive and detailed findings on hs-CRP and the variables affecting it.