Table 1 Comparison of the general features of the training and test sets.

From: Application of machine learning algorithm in predicting distant metastasis of T1 gastric cancer

| Variable | Training set (%), N = 1889 | Test set (%), N = 809 | P value | Validation set, N = 107 |
| --- | --- | --- | --- | --- |
| Age (years) | | | 0.7953 | |
| < 40 | 59 (3.12%) | 25 (3.09%) | | 8 |
| 40–60 | 405 (21.44%) | 178 (22.00%) | | 67 |
| 60–80 | 971 (51.41%) | 426 (52.66%) | | 31 |
| > 80 | 454 (24.03%) | 180 (22.25%) | | 1 |
| Sex | | | 0.2379 | |
| Male | 1050 (55.58%) | 429 (53.03%) | | 71 |
| Female | 839 (44.42%) | 380 (46.97%) | | 36 |
| T stage | | | 0.3458 | |
| T1a | 1054 (55.80%) | 468 (57.85%) | | 40 |
| T1b | 835 (44.20%) | 341 (42.15%) | | 67 |
| N stage | | | 0.3622 | |
| N0 | 1553 (82.21%) | 671 (82.94%) | | 81 |
| N1 | 231 (12.23%) | 96 (11.87%) | | 21 |
| N2 | 80 (4.24%) | 26 (3.21%) | | 4 |
| N3 | 25 (1.32%) | 16 (1.98%) | | 1 |
| M stage | | | 1 | |
| M0 | 1669 (88.35%) | 715 (88.38%) | | 93 |
| M1 | 220 (11.65%) | 94 (11.62%) | | 14 |
| Tumor size (cm) | | | 0.6843 | |
| < 2 | 653 (34.57%) | 276 (34.12%) | | 37 |
| 2–5 | 589 (31.18%) | 261 (32.26%) | | 64 |
| > 5 | 132 (6.99%) | 47 (5.81%) | | 6 |
| NA | 515 (27.26%) | 225 (27.81%) | | 0 |
| Differentiation | | | 0.081 | |
| Well | 233 (12.33%) | 80 (9.89%) | | 1 |
| Moderate | 523 (27.69%) | 230 (28.43%) | | 37 |
| Poorly | 853 (45.16%) | 356 (44.00%) | | 63 |
| Undifferentiated | 28 (1.48%) | 21 (2.60%) | | 6 |
| NA | 252 (13.34%) | 122 (15.08%) | | 0 |
| Primary site | | | 0.9185 | |
| Fundus | 92 (4.78%) | 35 (4.33%) | | 12 |
| Body | 303 (16.04%) | 130 (16.07%) | | 29 |
| Antrum | 713 (37.74%) | 306 (37.82%) | | 37 |
| Pylorus | 63 (3.34%) | 35 (4.33%) | | 4 |
| Lesser curve | 233 (12.33%) | 101 (12.48%) | | 12 |
| Greater curve | 99 (5.24%) | 46 (5.69%) | | 1 |
| Overlapping | 135 (7.15%) | 52 (6.43%) | | 12 |
| NOS | 251 (13.29%) | 104 (12.86%) | | 0 |
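
The table does not state which statistical test produced the P values. As a rough check, the sketch below assumes a Pearson chi-square test with Yates continuity correction on the training versus test counts, applied to the two binary variables (Sex and M stage). The function name `chi2_yates_2x2` is illustrative and not taken from the paper. Under this assumption the results should come out close to the reported 0.2379 for Sex and 1 for M stage.

```python
import math


def chi2_yates_2x2(a, b, c, d):
    """Chi-square test with Yates continuity correction for the 2x2 table [[a, b], [c, d]].

    Returns (statistic, p_value) for 1 degree of freedom.
    Assumption: this is only one plausible test; the source does not name its method.
    """
    n = a + b + c + d
    row1, row2 = a + b, c + d
    col1, col2 = a + c, b + d
    # Continuity correction, clamped at zero so it never exceeds the observed difference.
    diff = max(0.0, abs(a * d - b * c) - n / 2)
    stat = n * diff ** 2 / (row1 * row2 * col1 * col2)
    # With 1 degree of freedom, the chi-square survival function is erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(stat / 2))
    return stat, p_value


# Counts copied from the table: first row is the training set, second row the test set.
sex_stat, sex_p = chi2_yates_2x2(1050, 839, 429, 380)  # Male, Female
m_stat, m_p = chi2_yates_2x2(1669, 220, 715, 94)       # M0, M1

print(f"Sex:     chi2 = {sex_stat:.4f}, p = {sex_p:.4f}")
print(f"M stage: chi2 = {m_stat:.4f}, p = {m_p:.4f}")
```

Variables with more than two categories (for example Age or Primary site) would need a general r x c chi-square test, which this sketch does not cover.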