Table 2 Quantitative characteristics and stability of the identified multivariate linear mixture models tested on the full and undersampled datasets.

From: Multivariate linear mixture models for the prediction of febrile seizure risk and recurrence: a prospective case–control study

 

Model1

| Dataset size | 100% | 90% | 80% | 70% | 60% | 50% |
| --- | --- | --- | --- | --- | --- | --- |
| Model detection rate [%] | *66.0 | *20.8 | 11.2 | 6.4 | 3.2 | 2.2 |
| Total number of identified models | 6 | 61 | 90 | 121 | 142 | 160 |
| Height regression coefficient | 0.0049 ± 0.0002 | 0.0051 ± 0.0007 | 0.0052 ± 0.0008 | 0.0055 ± 0.0009 | 0.0059 ± 0.0011 | 0.0067 ± 0.0012 |
| Fe regression coefficient | −0.0715 ± 0.0005 | −0.0702 ± 0.0025 | −0.0687 ± 0.0035 | −0.0684 ± 0.0049 | −0.0664 ± 0.0053 | −0.0654 ± 0.0074 |
| Fer regression coefficient | 0.0025 ± 0.0001 | 0.0028 ± 0.0003 | 0.0031 ± 0.0004 | 0.0033 ± 0.0005 | 0.0037 ± 0.0007 | 0.0040 ± 0.0007 |
| UIBC regression coefficient | 0.0119 ± 0.0004 | 0.0129 ± 0.0010 | 0.0136 ± 0.0015 | 0.0145 ± 0.0017 | 0.0153 ± 0.0023 | 0.0165 ± 0.0028 |
| F-statistics | 38.41 ± 0.66 | 35.00 ± 2.43 | 31.54 ± 2.92 | 28.96 ± 3.31 | 25.45 ± 3.41 | 23.17 ± 3.83 |
| Root mean square error | 0.6239 ± 0.0024 | 0.6221 ± 0.0094 | 0.6206 ± 0.0128 | 0.6162 ± 0.0163 | 0.6144 ± 0.0197 | 0.6012 ± 0.0247 |
| Explained variance R² [%] | 46.04 ± 0.42 | 46.61 ± 1.62 | 46.75 ± 2.20 | 47.95 ± 2.75 | 48.61 ± 3.28 | 50.92 ± 4.04 |
| Pearson correlation (y1 vs yp1) | 0.643 ± 0.000 | 0.646 ± 0.012 | 0.648 ± 0.017 | 0.656 ± 0.021 | 0.663 ± 0.026 | 0.677 ± 0.031 |
| Non-seizure/seizure separating threshold | 0.5744 ± 0.0317 | 0.6853 ± 0.0793 | 0.7355 ± 0.1091 | 0.8442 ± 0.1372 | 0.9531 ± 0.1674 | 1.0655 ± 0.2197 |
| Training: sensitivity [%] | 95.49 ± 1.61 | 93.90 ± 4.98 | 95.51 ± 4.60 | 92.68 ± 5.92 | 91.39 ± 6.18 | 93.40 ± 6.53 |
| Training: specificity [%] | 69.43 ± 1.15 | 70.90 ± 4.15 | 68.95 ± 5.25 | 72.24 ± 6.39 | 73.91 ± 7.31 | 71.25 ± 8.07 |
| Testing: sensitivity [%] |  | 87.32 ± 15.00 | 89.65 ± 12.12 | 84.72 ± 12.09 | 80.26 ± 13.38 | 83.27 ± 12.29 |
| Testing: specificity [%] |  | 67.36 ± 18.25 | 65.29 ± 14.61 | 66.71 ± 10.79 | 67.45 ± 10.67 | 66.43 ± 9.29 |

Model2

| Dataset size | 100% | 90% | 80% | 70% | 60% | 50% |
| --- | --- | --- | --- | --- | --- | --- |
| Model detection rate [%] | *100.0 | *72.0 | *50.9 | *24.9 | 14.2 | 6.8 |
| Total number of identified models | 1 | 44 | 64 | 127 | 160 | 209 |
| Age regression coefficient | −0.0050 ± 0.0002 | −0.0052 ± 0.0007 | −0.0056 ± 0.0009 | −0.0060 ± 0.0010 | −0.0064 ± 0.0011 | −0.0072 ± 0.0014 |
| Height regression coefficient | 0.0036 ± 0.0002 | 0.0036 ± 0.0005 | 0.0038 ± 0.0006 | 0.0040 ± 0.0007 | 0.0043 ± 0.0008 | 0.0048 ± 0.0010 |
| satFe regression coefficient | −3.2236 ± 0.0455 | −3.1911 ± 0.2123 | −3.1108 ± 0.2796 | −3.0829 ± 0.3491 | −2.9630 ± 0.3964 | −2.8432 ± 0.4344 |
| UIBC regression coefficient | 0.0093 ± 0.0003 | 0.0094 ± 0.0011 | 0.0098 ± 0.0015 | 0.0100 ± 0.0016 | 0.0108 ± 0.0019 | 0.0113 ± 0.0020 |
| F-statistics | 28.82 ± 0.63 | 25.12 ± 2.25 | 22.85 ± 2.59 | 21.00 ± 3.00 | 19.20 ± 3.24 | 17.28 ± 3.37 |
| Root mean square error | 0.3620 ± 0.0019 | 0.3638 ± 0.0076 | 0.3642 ± 0.0097 | 0.3601 ± 0.0123 | 0.3558 ± 0.0150 | 0.3509 ± 0.0175 |
| Explained variance R² [%] | 47.38 ± 0.54 | 47.47 ± 2.10 | 48.01 ± 2.69 | 49.13 ± 3.41 | 51.02 ± 4.08 | 53.10 ± 4.60 |
| Pearson correlation (y2 vs yp2) | 0.660 ± 0.000 | 0.662 ± 0.015 | 0.667 ± 0.020 | 0.674 ± 0.026 | 0.691 ± 0.031 | 0.704 ± 0.034 |
| Non-seizure/seizure separating threshold | 0.3495 ± 0.0261 | 0.3657 ± 0.0899 | 0.3779 ± 0.1170 | 0.4135 ± 0.1332 | 0.4652 ± 0.1607 | 0.4906 ± 0.1730 |
| Training: sensitivity [%] | 83.53 ± 1.04 | 81.15 ± 4.19 | 83.30 ± 4.35 | 80.72 ± 5.91 | 82.28 ± 6.42 | 85.87 ± 6.79 |
| Training: specificity [%] | 82.89 ± 0.92 | 86.06 ± 4.14 | 84.81 ± 4.94 | 88.90 ± 5.16 | 89.72 ± 5.66 | 88.30 ± 7.02 |
| Testing: sensitivity [%] |  | 75.60 ± 18.66 | 75.50 ± 13.62 | 71.20 ± 12.15 | 70.69 ± 10.81 | 72.76 ± 10.10 |
| Testing: specificity [%] |  | 81.14 ± 15.07 | 78.56 ± 12.53 | 81.53 ± 9.97 | 79.77 ± 10.16 | 77.35 ± 11.19 |

Model3

| Dataset size | 100% | 90% | 80% | 70% | 60% | 50% |
| --- | --- | --- | --- | --- | --- | --- |
| Model detection rate [%] | *51.5 | 28.4 | 10.4 | 4.6 | 2.0 | 1.1 |
| Total number of identified models | 15 | 73 | 203 | 293 | 383 | 506 |
| Height regression coefficient | −0.0072 ± 0.0005 | −0.0070 ± 0.0007 | −0.0079 ± 0.0011 | −0.0080 ± 0.0012 | −0.0083 ± 0.0012 | −0.0088 ± 0.0012 |
| HGB regression coefficient | 0.0129 ± 0.0009 | 0.0136 ± 0.0013 | 0.0153 ± 0.0022 | 0.0158 ± 0.0024 | 0.0171 ± 0.0028 | 0.0179 ± 0.0028 |
| satFe regression coefficient | 6.1796 ± 0.5323 | 6.1236 ± 0.8455 | 6.8889 ± 1.3212 | 7.0798 ± 1.4197 | 7.8790 ± 1.7529 | 9.1360 ± 2.7797 |
| F-statistics | 8.24 ± 0.83 | 7.41 ± 1.30 | 8.79 ± 2.07 | 8.48 ± 2.31 | 8.60 ± 2.54 | 10.17 ± 3.67 |
| Root mean square error | 0.4182 ± 0.0055 | 0.4130 ± 0.0095 | 0.4068 ± 0.0148 | 0.3917 ± 0.0178 | 0.3799 ± 0.0218 | 0.3615 ± 0.0300 |
| Explained variance R² [%] | 26.04 ± 1.93 | 26.37 ± 3.35 | 32.33 ± 4.90 | 35.13 ± 5.75 | 40.08 ± 6.62 | 47.23 ± 8.58 |
| Pearson correlation (y3 vs yp3) | 0.441 ± 0.005 | 0.457 ± 0.033 | 0.495 ± 0.046 | 0.533 ± 0.053 | 0.577 ± 0.057 | 0.630 ± 0.067 |
| Non-recurrent/recurrent seizure separating threshold | 1.4001 ± 0.0999 | 1.4947 ± 0.1410 | 1.6849 ± 0.2406 | 1.7372 ± 0.2945 | 1.9210 ± 0.3366 | 2.0830 ± 0.3144 |
| Training: sensitivity [%] | 83.86 ± 7.67 | 86.15 ± 10.19 | 88.11 ± 11.57 | 91.45 ± 10.93 | 92.50 ± 8.92 | 92.03 ± 9.59 |
| Training: specificity [%] | 58.44 ± 6.69 | 60.50 ± 6.80 | 64.23 ± 9.26 | 66.54 ± 10.81 | 69.81 ± 10.25 | 76.00 ± 10.77 |
| Testing: sensitivity [%] |  | 73.33 ± 44.24 | 69.80 ± 30.16 | 74.70 ± 28.62 | 74.29 ± 26.08 | 69.81 ± 24.57 |
| Testing: specificity [%] |  | 45.18 ± 27.23 | 44.85 ± 19.98 | 42.13 ± 17.47 | 46.57 ± 14.52 | 49.61 ± 12.24 |

  1. All values were averaged over 5000 iterations with randomized initial conditions. Values are reported as mean ± standard deviation across the iterations. For most of the listed measures, the mean values are fairly stable, while the standard deviation increases as the dataset is undersampled more heavily.
  2. *A bold-highlighted “Model detection rate” indicates that the model with the listed regression coefficients was the one most often identified as the best model characterizing the data among the iterations.
  3. Adaptive synthetic sampling matched the number of female samples in the case groups to minimize the risk of imbalanced learning within each modeling iteration.
  4. The separating threshold was identified by maximizing the sum of sensitivity and specificity. The classification sensitivity and specificity were then tested on the training dataset itself and on the testing dataset (i.e., the samples excluded from training due to dataset undersampling).
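
The threshold selection described in note 4 can be illustrated with a minimal Python sketch. It is not the authors' code: the function names, the toy scores and labels, and the convention that a predicted value at or above the threshold counts as a case are all assumptions made for illustration. The sketch picks the threshold that maximizes sensitivity + specificity on the training samples and then evaluates that fixed threshold on held-out samples.

```python
from typing import Sequence, Tuple


def sensitivity_specificity(
    scores: Sequence[float], labels: Sequence[int], threshold: float
) -> Tuple[float, float]:
    """Return (sensitivity, specificity) for a given threshold.

    Assumption: a score >= threshold is classified as a case (label 1).
    Both classes must be present, otherwise a division by zero occurs.
    """
    tp = sum(1 for s, y in zip(scores, labels) if y == 1 and s >= threshold)
    fn = sum(1 for s, y in zip(scores, labels) if y == 1 and s < threshold)
    tn = sum(1 for s, y in zip(scores, labels) if y == 0 and s < threshold)
    fp = sum(1 for s, y in zip(scores, labels) if y == 0 and s >= threshold)
    return tp / (tp + fn), tn / (tn + fp)


def best_threshold(scores: Sequence[float], labels: Sequence[int]) -> float:
    """Return the observed score that maximizes sensitivity + specificity."""
    candidates = sorted(set(scores))
    return max(
        candidates,
        key=lambda t: sum(sensitivity_specificity(scores, labels, t)),
    )


# Hypothetical toy data: model outputs and case (1) / control (0) labels.
train_scores = [0.1, 0.2, 0.4, 0.5, 0.7, 0.9]
train_labels = [0, 0, 0, 1, 1, 1]
test_scores = [0.3, 0.6, 0.45, 0.8]   # samples left out by undersampling
test_labels = [0, 0, 1, 1]

threshold = best_threshold(train_scores, train_labels)
print("threshold:", threshold)
print("training:", sensitivity_specificity(train_scores, train_labels, threshold))
print("testing:", sensitivity_specificity(test_scores, test_labels, threshold))
```

Repeating this over many random undersamplings and summarizing each quantity as mean ± standard deviation would give figures of the kind reported in the table, under the assumptions stated above.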