Table 1 General overview of the data used for this study.

From: Explainable machine learning can outperform Cox regression predictions and provide insights in breast cancer survival

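| Feature | Description | Mean (SD) | N (%) | Completeness [%] | Feature selection: Algorithm |
|---|---|---|---|---|---|
| Clinical | | | | | |
| age | Age at diagnosis (years) | 60.25 (13.96) | | 100 | (1) |
| ratly* | Ratio between positive and removed lymph nodes | 0.11 (0.22) | | 100 | (2) |
| rly | No. of removed lymph nodes | 8.13 (7.84) | | 98.7 | (3) |
| ptmm | Tumor size [mm] | 20.32 (13.79) | | 87.1 | (4) |
| pts | Pathological tumor stage | | | 100 | (5) |
| | I | | 15,412 (42.04) | | |
| | IIA | | 10,848 (29.59) | | |
| | IIB | | 4766 (13.00) | | |
| | IIIA | | 3145 (8.57) | | |
| | IIIB | | 782 (2.13) | | |
| | IIIC | | 1705 (4.65) | | |
| grd | Tumor grade | | | 89.8 | (6) |
| | 1 (well differentiated) | | 7563 (20.63) | | |
| | 2 (moderately differentiated) | | 17,926 (48.89) | | |
| | 3 (poorly differentiated) | | 11,169 (30.46) | | |
| mor | Tumor morphology | | | 100 | |
| | Ductal | | 29,473 (80.39) | | |
| | Lobular | | 4109 (11.20) | | |
| | Mixed | | 1464 (3.99) | | |
| | Other | | 1612 (4.40) | | |
| ply | No. of positive lymph nodes | 1.53 (3.53) | | 97.6 | |
| rec* | Receptor status | | | 100 | |
| | Triple+ | | 1983 (5.40) | | |
| | HR+ | | 28,170 (76.84) | | |
| | HR− | | 2048 (5.58) | | |
| | Triple− | | 4457 (12.16) | | |
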
  1. Features marked with an asterisk were engineered (see text for more details). The number in parentheses in the Algorithm column is the feature's rank as given by the feature selection. SD: standard deviation; HR+: hormone receptor positive; HR−: hormone receptor negative.
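
The table describes the engineered feature ratly only as the ratio between positive and removed lymph nodes (ply and rly). A minimal sketch of that calculation follows. The function name `lymph_node_ratio` and the handling of missing or zero counts are assumptions made for illustration; the table does not state how the study treated those cases.

```python
from typing import Optional


def lymph_node_ratio(positive: Optional[int], removed: Optional[int]) -> Optional[float]:
    """Ratio of positive to removed lymph nodes (ply / rly).

    Returns None when the ratio is undefined. This is an assumed
    convention for illustration, not the study's documented behaviour.
    """
    # A missing count on either side leaves the ratio undefined.
    if positive is None or removed is None:
        return None
    # No removed nodes: avoid division by zero.
    if removed <= 0:
        return None
    return positive / removed


# Hypothetical patient: 2 positive nodes out of 8 removed.
example = lymph_node_ratio(2, 8)
```

Returning `None` rather than 0 for undefined cases keeps "no nodes removed" distinct from "nodes removed, none positive", which would otherwise both appear as a ratio of zero.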