Table 1 Overview of eye-tracking datasets across different languages.
From: Hong Kong Corpus of Chinese Sentence and Passage Reading
Corpus names (abbreviations) | Language | Participants | Word tokens read by one participant | Accumulated word tokens¹ |
---|---|---|---|---|
Dundee Corpus | English L1 & French L1 | 10 native speakers each | Tokens: 56,216 (types: 9,776); newspaper texts / Tokens: 52,173 (types: 11,321); newspaper texts | 1,083,890 |
Potsdam Sentence Corpus (PSC) | German | 222 native speakers | Tokens: 1,138; Sentences: 144 | 252,636 |
Dutch Eye-Movements ONline Internet Corpus (DEMONIC) | Dutch | 55 native speakers | Tokens: 1746; Sentences: 224 | 96,030 |
Balanced Corpus of Contemporary Written Japanese (BCCWJ-EyeTrack) | Japanese | 24 native speakers | Bunsetsu²: 411 out of 1643; 20 newspaper texts | 9,864 |
Ghent Eye-Tracking Corpus (GECO) | Dutch L1 & English L2 | 19 unbalanced bilinguals | Tokens: 59,716 (types: 5,575); Gulliver’s Travels I | 1,134,604 |
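Ghent Eye-Tracking Corpus (GECO) | Dutch L1 & English L2 | 19 unbalanced bilinguals | Tokens: 54,364 (types: 5,012); Gulliver’s Travels II | 1,032,916 |
Ghent Eye-Tracking Corpus (GECO) | English | 14 monolinguals | Tokens: 54,364 (types: 5,012); Gulliver’s Travels | 761,096 |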
Provo Corpus | English | 84 native speakers | Tokens: 2,689 (types: 1,197); Passages: 55 | 145,206 |
Zurich Cognitive Language Processing Corpus (ZuCo) | English | 12 native adults | Tokens: 21,629; Sentences: 1107 | 259,548 |
Russian Sentence Corpus (RSC) | Russian | 96 Russian participants | Tokens: 1,362; Sentences: 144 | 196,128 |
Beijing Sentence Corpus (BSC) | Chinese | 60 native speakers | Tokens: 1,685; Sentences: 120 | 101,100 |
Multilingual Eye-Movement Corpus (MECO) | Dutch | 45 native speakers | Tokens: 2231; Sentences: 112 | 100,395 |
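Multilingual Eye-Movement Corpus (MECO) | English | 46 native speakers | Tokens: 1540; Sentences: 112 | 70,840 |
Multilingual Eye-Movement Corpus (MECO) | Estonian | 52 native speakers | Tokens: 2109; Sentences: 99 | 109,668 |
Multilingual Eye-Movement Corpus (MECO) | Finnish | 49 native speakers | Tokens: 1487; Sentences: 110 | 72,863 |
Multilingual Eye-Movement Corpus (MECO) | German | 45 native speakers | Tokens: 2027; Sentences: 115 | 91,215 |
Multilingual Eye-Movement Corpus (MECO) | Greek | 45 native speakers | Tokens: 2083; Sentences: 99 | 93,735 |
Multilingual Eye-Movement Corpus (MECO) | Hebrew | 47 native speakers | Tokens: 1950; Sentences: 121 | 91,650 |
Multilingual Eye-Movement Corpus (MECO) | Italian | 54 native speakers | Tokens: 2114; Sentences: 90 | 114,156 |
Multilingual Eye-Movement Corpus (MECO) | Korean | 32 native speakers | Tokens: 1796; Sentences: 101 | 57,472 |
Multilingual Eye-Movement Corpus (MECO) | Norwegian | 42 native speakers | Tokens: 2106; Sentences: 116 | 88,452 |
Multilingual Eye-Movement Corpus (MECO) | Russian | 46 native speakers | Tokens: 1894; Sentences: 107 | 87,124 |
Multilingual Eye-Movement Corpus (MECO) | Spanish | 48 native speakers | Tokens: 2412; Sentences: 98 | 115,776 |
Multilingual Eye-Movement Corpus (MECO) | Turkish | 29 native speakers | Tokens: 1697; Sentences: 104 | 49,213 |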
Ghent Eye-tracking COrpus of sentence reading for Chinese-English bilinguals (GECO-CN) | Chinese L1 & English L2 | 32 bilinguals | Tokens: 59,403 (types: 5053); Sentences: 5066; The Mysterious Affair at Styles (Chapters 1–7) | 1,900,896 |
Ghent Eye-tracking COrpus of sentence reading for Chinese-English bilinguals (GECO-CN) | Chinese L1 & English L2 | 32 bilinguals | Tokens: 56,841 (types: 5363); Sentences: 5242; The Mysterious Affair at Styles (Chapters 8–13) | 1,818,912 |
Copenhagen Corpus of eye tracking recordings from natural reading of Danish texts (CopCo) | Danish | 22 native speakers | Tokens: 34,897; Sentences: 1,832; speech manuscripts | 767,734 |
Chinese Eye-Movement Database (CEMD) | Simplified Chinese | 1,718 native speakers | Types: 8551; Sentences: 8015 | 1,339,960³ |
TURead | Turkish | 196 native speakers | Tokens: 2943 (types: 2185); 192 short texts, each composed of 1–3 sentences | 576,828 |