Table 2 Predictive validity of models.

From: A correlation-based feature analysis of physical examination indicators can help predict the overall underlying health status using machine learning

 

| Condition | Training set (sample size) | Validation set (sample size) | F1-score (95% CI) | Specificity (95% CI) | Recall (sensitivity) (95% CI) | ROC (AUC) (95% CI) |
| --- | --- | --- | --- | --- | --- | --- |
| Cholecystolithiasis | 963 | 413 | 0.69 (0.66–0.71) | 0.73 (0.70–0.76) | 0.70 (0.67–0.72) | 0.77 (0.75–0.79) |
| Hypertension | 46,171 | 17,987 | 0.86 (0.86–0.86) | 0.82 (0.82–0.82) | 0.85 (0.85–0.86) | 0.92 (0.92–0.93) |
| Hypertension + diabetes | 8414 | 3065 | 0.92 (0.92–0.92) | 0.90 (0.90–0.90) | 0.92 (0.92–0.92) | 0.97 (0.97–0.98) |
| Hypertension + coronary heart disease | 2032 | 871 | 0.93 (0.92–0.93) | 0.90 (0.90–0.91) | 0.93 (0.92–0.93) | 0.97 (0.97–0.98) |
| Hypertension + diabetes + coronary heart disease | 909 | 389 | 0.96 (0.95–0.96) | 0.95 (0.94–0.95) | 0.95 (0.94–0.97) | 0.99 (0.98–0.99) |
| Hyperlipidemia | 1687 | 722 | 0.86 (0.85–0.86) | 0.82 (0.81–0.83) | 0.85 (0.84–0.86) | 0.94 (0.93–0.94) |
| Coronary heart disease | 1298 | 557 | 0.90 (0.89–0.92) | 0.87 (0.84–0.89) | 0.90 (0.88–0.91) | 0.96 (0.96–0.97) |
| Coronary heart disease + diabetes | 274 | 118 | 0.93 (0.93–0.94) | 0.91 (0.88–0.95) | 0.94 (0.92–0.95) | 0.98 (0.97–1.00) |
| Rhinallergosis | 152 | 66 | 0.71 (0.64–0.79) | 0.80 (0.73–0.86) | 0.70 (0.64–0.76) | 0.79 (0.73–0.84) |
| Hypothyroidism | 1627 | 698 | 0.77 (0.76–0.78) | 0.73 (0.71–0.74) | 0.76 (0.75–0.76) | 0.84 (0.83–0.86) |
| Hyperthyroidism | 751 | 322 | 0.72 (0.71–0.72) | 0.73 (0.71–0.75) | 0.73 (0.71–0.74) | 0.79 (0.77–0.81) |
| Cervical spondylopathy | 311 | 134 | 0.71 (0.66–0.75) | 0.73 (0.64–0.81) | 0.71 (0.69–0.73) | 0.78 (0.78–0.80) |
| Rheumatoid arthritis | 387 | 167 | 0.79 (0.78–0.81) | 0.76 (0.71–0.81) | 0.78 (0.77–0.79) | 0.86 (0.83–0.89) |
| Chronic rhinitis | 313 | 135 | 0.61 (0.58–0.64) | 0.57 (0.56–0.57) | 0.61 (0.58–0.63) | 0.66 (0.60–0.72) |
| Nephropathy | 564 | 242 | 0.73 (0.72–0.74) | 0.81 (0.79–0.84) | 0.76 (0.75–0.78) | 0.84 (0.82–0.85) |
| Diabetes | 11,545 | 4949 | 0.90 (0.90–0.90) | 0.90 (0.89–0.90) | 0.90 (0.90–0.90) | 0.96 (0.96–0.96) |
| Gout | 2095 | 898 | 0.88 (0.88–0.88) | 0.85 (0.84–0.87) | 0.86 (0.83–0.88) | 0.94 (0.94–0.94) |
| Parkinson's syndrome | 192 | 83 | 0.91 (0.90–0.91) | 0.87 (0.79–0.94) | 0.90 (0.89–0.91) | 0.97 (0.95–0.98) |
| Stomach trouble | 1269 | 545 | 0.68 (0.68–0.69) | 0.71 (0.70–0.73) | 0.70 (0.70–0.70) | 0.77 (0.75–0.78) |
| Chronic pharyngitis | 765 | 329 | 0.63 (0.62–0.65) | 0.66 (0.65–0.66) | 0.67 (0.65–0.68) | 0.72 (0.69–0.75) |
| Lumbar disc protrusion | 377 | 162 | 0.77 (0.72–0.81) | 0.70 (0.63–0.77) | 0.75 (0.70–0.79) | 0.85 (0.82–0.88) |
| Hepatitis B | 691 | 297 | 0.73 (0.70–0.77) | 0.79 (0.77–0.80) | 0.75 (0.72–0.77) | 0.83 (0.81–0.85) |
| Hypertension + other diseases | 2360 | 1012 | 0.86 (0.85–0.88) | 0.82 (0.81–0.83) | 0.86 (0.85–0.86) | 0.93 (0.93–0.94) |
| Coronary heart disease + other diseases | 98 | 43 | 0.88 (0.84–0.92) | 0.83 (0.76–0.90) | 0.86 (0.83–0.88) | 0.94 (0.91–0.97) |
| Diabetes + other diseases | 365 | 157 | 0.90 (0.87–0.94) | 0.89 (0.84–0.94) | 0.90 (0.86–0.94) | 0.96 (0.93–0.98) |
| Bronchial disease | 562 | 241 | 0.76 (0.70–0.83) | 0.80 (0.76–0.84) | 0.77 (0.70–0.83) | 0.83 (0.79–0.88) |
| Other disease conditions | 2720 | 1167 | 0.68 (0.67–0.70) | 0.69 (0.66–0.73) | 0.69 (0.67–0.71) | 0.75 (0.74–0.77) |
| Brain diseases | 251 | 108 | 0.86 (0.81–0.90) | 0.83 (0.75–0.90) | 0.87 (0.82–0.91) | 0.93 (0.91–0.95) |
| Hepatic adipose infiltration | 803 | 115 | 0.82 (0.78–0.87) | 0.75 (0.67–0.82) | 0.82 (0.77–0.87) | 0.92 (0.89–0.94) |
| Asthma | 1640 | 803 | 0.75 (0.74–0.76) | 0.77 (0.69–0.84) | 0.75 (0.74–0.76) | 0.88 (0.84–0.92) |
| Other cardiac diseases | 336 | 145 | 0.79 (0.78–0.81) | 0.80 (0.75–0.86) | 0.78 (0.76–0.80) | 0.88 (0.84–0.92) |
| Heart disease | 224 | 96 | 0.89 (0.87–0.90) | 0.91 (0.87–0.94) | 0.89 (0.86–0.92) | 0.94 (0.90–0.99) |
| Hepatopathy | 176 | 76 | 0.71 (0.65–0.76) | 0.78 (0.68–0.87) | 0.73 (0.68–0.78) | 0.80 (0.74–0.85) |
| Pregnant | 54 | 24 | 0.83 (0.76–0.90) | 0.85 (0.79–0.92) | 0.82 (0.77–0.86) | 0.91 (0.90–0.93) |
| Normal or non-normal condition | 91,028 | 39,012 | 0.83 (0.83–0.83) | 0.81 (0.81–0.81) | 0.84 (0.84–0.84) | 0.90 (0.90–0.90) |

  1. The training-set and validation-set sample sizes are those obtained after under-sampling and random splitting of the data. For the normal or non-normal condition model, all disease types were grouped into a single disease state, followed by under-sampling with a sample of healthy people and splitting of the data.
  2. ROC: receiver operating characteristic; AUC: area under the curve.
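
The procedure named in footnote 1 (under-sampling, then random splitting) and the threshold-based metrics reported in the table can be sketched as follows. This is a minimal illustration in plain Python, not the authors' code: the function names `undersample`, `random_split`, and `binary_metrics`, the balanced 1:1 class ratio, the 70/30 split fraction, and the fixed seed are all assumptions, since the table does not state them.

```python
import random


def undersample(samples, labels, seed=0):
    """Randomly drop items from the larger class until both classes match in size.

    Assumes binary labels (1 = condition present, 0 = absent) and a 1:1 target ratio.
    """
    rng = random.Random(seed)
    pos = [i for i, y in enumerate(labels) if y == 1]
    neg = [i for i, y in enumerate(labels) if y == 0]
    n = min(len(pos), len(neg))
    keep = rng.sample(pos, n) + rng.sample(neg, n)
    rng.shuffle(keep)
    return [samples[i] for i in keep], [labels[i] for i in keep]


def random_split(samples, labels, train_fraction=0.7, seed=0):
    """Shuffle indices and split into (training, validation) pairs of (samples, labels).

    The 0.7 default is an assumed split fraction, not a value taken from the table.
    """
    rng = random.Random(seed)
    idx = list(range(len(samples)))
    rng.shuffle(idx)
    cut = int(round(train_fraction * len(idx)))

    def pick(ids):
        return [samples[i] for i in ids], [labels[i] for i in ids]

    return pick(idx[:cut]), pick(idx[cut:])


def binary_metrics(y_true, y_pred):
    """Compute F1-score, specificity, and recall (sensitivity) from binary predictions."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    specificity = tn / (tn + fp) if (tn + fp) else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return {"f1": f1, "specificity": specificity, "recall": recall}
```

In this sketch, under-sampling runs before the split, matching the order given in footnote 1. The confidence intervals and the AUC column are not covered here, because the table does not say how they were computed.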