Table 1 Overview of the dataset, presenting the number of pages and lines for four historical books in both basic and diplomatic transcriptions. For the basic transcription, the table also includes the number of words and unique word classes. The total row provides a summary across all books.

From: Nuremberg Letterbooks: A Multi-Transcriptional Dataset of Early 15th Century Manuscripts for Document Analysis

 

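| | # of pages | # of lines (basic) | # of words (basic) | # of word classes (basic) | # of lines (diplomatic) |
|---|---|---|---|---|---|
| book 2 | 256 | 8426 | 73468 | 9071 | 7997 |
| book 3 | 548 | 16183 | 160297 | 15088 | 15471 |
| book 4 | 290 | 8439 | 84255 | 8761 | 8134 |
| book 5 | 617 | 17932 | 175402 | 15448 | 16720 |
| total | 1711 | 50980 | 493422 | 32707 | 48322 |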
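
The total row can be checked against the per-book values with a few lines of Python. This is only a sanity-check sketch: the numbers are copied from Table 1, and the variable names (`books`, `reported_total`, `columns`) are illustrative and not part of the dataset. The page, line, and word totals are plain sums. The word-class total is smaller than the sum of the per-book values, which is consistent with the caption's description of unique word classes, assuming that classes shared between books are counted once.

```python
# Per-book counts copied from Table 1 (names here are illustrative only).
columns = (
    "pages",
    "lines (basic)",
    "words (basic)",
    "word classes (basic)",
    "lines (diplomatic)",
)
books = {
    "book 2": (256, 8426, 73468, 9071, 7997),
    "book 3": (548, 16183, 160297, 15088, 15471),
    "book 4": (290, 8439, 84255, 8761, 8134),
    "book 5": (617, 17932, 175402, 15448, 16720),
}
reported_total = (1711, 50980, 493422, 32707, 48322)

# Sum each column across the four books.
sums = tuple(sum(col) for col in zip(*books.values()))

for name, summed, reported in zip(columns, sums, reported_total):
    status = "matches" if summed == reported else "differs"
    print(f"{name}: sum={summed}, reported={reported} ({status})")
```

Only the word-class column is reported as differing, since unique classes cannot be added across books without counting shared ones more than once.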