Table 1 Corpus token statistics.

From: A distributable German clinical corpus containing cardiovascular clinical routine doctor’s letters

 

Statistic    CARDIO:DE    CARDIO:DE400    CARDIO:DE100
Total          993,143         805,617         187,526
Mean             1,986           2,014           1,875
Min                588             588             597
25%              1,064           1,082             992
Median           1,704           1,764           1,448
75%              2,638           2,647           2,562
Max              6,644           6,644           5,322

Note: Total token count and descriptive statistics of the token count per doctor’s letter for each CARDIO:DE split.
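
The reported figures can be cross-checked with a little arithmetic. The sketch below tests whether the two split totals add up to the full-corpus total and whether the reported means follow from the totals. The letter counts (500, 400, and 100) are an assumption inferred from the split names CARDIO:DE400 and CARDIO:DE100; the table itself does not state them.

```python
# Totals and means as reported in Table 1 (tokens).
totals = {
    "CARDIO:DE": 993_143,
    "CARDIO:DE400": 805_617,
    "CARDIO:DE100": 187_526,
}
reported_means = {
    "CARDIO:DE": 1_986,
    "CARDIO:DE400": 2_014,
    "CARDIO:DE100": 1_875,
}

# ASSUMPTION: number of letters per split, inferred from the split names.
# These counts are not given in the table.
n_letters = {
    "CARDIO:DE": 500,
    "CARDIO:DE400": 400,
    "CARDIO:DE100": 100,
}

# The two splits should together account for every token in the corpus.
splits_sum = totals["CARDIO:DE400"] + totals["CARDIO:DE100"]

# Mean tokens per letter, rounded to the nearest integer as in the table.
derived_means = {
    name: round(totals[name] / n_letters[name]) for name in totals
}

print("splits sum equals corpus total:", splits_sum == totals["CARDIO:DE"])
for name, mean in derived_means.items():
    print(f"{name}: derived mean {mean}, reported mean {reported_means[name]}")
```

Under the assumed letter counts, the derived means match the reported ones, which suggests the table is internally consistent. The quartiles, minimum, and maximum cannot be checked this way because they depend on the per-letter counts, which the table does not list.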