Table 1 Summary of the 7774 Mayo Clinic cancer pathology reports dataset

From: Synoptic reporting by summarizing cancer pathology reports using large language models

| Data Element | % Non-missing | No. of Patients (M) | No. of Patients (F) | Age in Years, Mean [Min–Max] | Breast (%) | Lung (%) | Pancreas (%) | Skin (%) | Digestive Organs (%) | Female Reproductive Organs (%) | Male Reproductive Organs (%) | Other (%) |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Protocol Biopsy | 25 | 1145 | 772 | 53.7 [19–85] | 1 | 0 | 0 | 18 | 1 | 0 | 2 | 76 |
| Procedure | 70 | 2736 | 2682 | 60.6 [19–97] | 13 | 6 | 3 | 2 | 13 | 8 | 16 | 38 |
| Specimen | 30 | 1096 | 1216 | 63.3 [21–97] | 0 | 14 | 7 | 1 | 29 | 19 | 1 | 29 |
| Primary Tumor | 59 | 2401 | 2224 | 61.6 [19–97] | 15 | 7 | 4 | 2 | 15 | 9 | 19 | 29 |
| Specimen Integrity | 17 | 530 | 801 | 63.3 [19–95] | 0 | 23 | 0 | 5 | 4 | 30 | 3 | 34 |
| Surgical Margins | 44 | 1514 | 1898 | 62.1 [19–97] | 20 | 10 | 5 | 3 | 20 | 7 | 2 | 33 |
| Laterality | 32 | 935 | 1545 | 58.0 [19–91] | 29 | 13 | 0 | 1 | 1 | 0 | 1 | 54 |
| Histologic Type | 68 | 2674 | 2614 | 60.5 [19–97] | 14 | 7 | 2 | 2 | 13 | 8 | 17 | 37 |
| Histologic Grade | 63 | 2525 | 2344 | 61.0 [19–97] | 13 | 7 | 3 | 1 | 14 | 9 | 17 | 36 |
| Mitotic Rate | 13 | 209 | 793 | 59.3 [19–95] | 59 | 1 | 4 | 7 | 3 | 0 | 0 | 26 |
| Pathologic Staging Descriptors | 30 | 1059 | 1295 | 60.9 [19–93] | 25 | 6 | 5 | 2 | 14 | 8 | 15 | 25 |
| Tumor Focality | 27 | 873 | 1249 | 60.1 [19–92] | 29 | 15 | 2 | 1 | 5 | 1 | 2 | 44 |
| Tumor Site | 43 | 1623 | 1698 | 60.5 [19–97] | 7 | 10 | 5 | 3 | 19 | 8 | 1 | 47 |
| Tumor Size | 49 | 1584 | 2195 | 61.5 [19–97] | 17 | 9 | 5 | 3 | 19 | 12 | 2 | 35 |
| Lymphovascular Invasion | 53 | 2226 | 1859 | 62.3 [19–97] | 15 | 8 | 4 | 3 | 16 | 8 | 20 | 25 |
| Regional Lymph Nodes | 59 | 2376 | 2222 | 61.6 [19–97] | 15 | 7 | 4 | 2 | 15 | 9 | 18 | 29 |
| Lymph Node Sampling | 30 | 917 | 1378 | 60.4 [19–93] | 31 | 0 | 0 | 1 | 1 | 18 | 35 | 13 |
| Number Examined | 44 | 2270 | 1115 | 61.7 [19–97] | 0 | 10 | 5 | 1 | 20 | 4 | 25 | 36 |
| Number Involved | 37 | 1957 | 924 | 62.1 [19–97] | 0 | 10 | 6 | 1 | 22 | 3 | 26 | 32 |
| Distant Metastasis | 50 | 1940 | 1914 | 61.4 [19–97] | 17 | 8 | 4 | 3 | 18 | 4 | 12 | 33 |
| Perineural Invasion | 16 | 764 | 462 | 63.1 [21–97] | 0 | 2 | 13 | 4 | 47 | 0 | 1 | 32 |
| Treatment Effect | 29 | 1505 | 784 | 62.3 [19–95] | 9 | 13 | 5 | 1 | 23 | 1 | 34 | 14 |

  1. We show the 22 most frequently reported data elements. Not every report contains every data element; the percent non-missing is the percentage of reports that contain the corresponding data element.
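
As a rough illustration of how a percent non-missing figure can be computed, the following Python sketch counts the reports in which a given data element is present and non-empty. The function name `percent_non_missing`, the dictionary-per-report layout, and the toy records are all assumptions made for this example; they are not taken from the paper or its dataset.

```python
from typing import Mapping, Optional, Sequence


def percent_non_missing(
    reports: Sequence[Mapping[str, Optional[str]]], element: str
) -> float:
    """Percentage of reports in which `element` is present and non-empty."""
    if not reports:
        return 0.0
    # A missing key, a None value, or an empty string all count as missing.
    present = sum(1 for report in reports if report.get(element))
    return 100.0 * present / len(reports)


# Toy records, invented for illustration only (not taken from the dataset).
toy_reports = [
    {"Procedure": "excision", "Tumor Size": "2.1 cm"},
    {"Procedure": "biopsy", "Tumor Size": None},
    {"Procedure": "resection"},
    {"Tumor Size": "0.8 cm"},
]

print(percent_non_missing(toy_reports, "Procedure"))   # 75.0
print(percent_non_missing(toy_reports, "Tumor Size"))  # 50.0
```

Treating empty strings and `None` as missing is a design choice of this sketch; a real pipeline might use a different definition of a missing element.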