Table 3. Distribution of annotated entities across all datasets. n is the number of documents in each dataset; each cell gives the number of annotated instances of the entity in that split.
Entities | MUSCADET-InHouse (n = 1700): Train | MUSCADET-InHouse (n = 1700): Dev | MUSCADET-InHouse (n = 1700): Test | MUSCADET-Synthetic (n = 340): Test | UW-FrenchSDOH (n = 364): Test | InHouse Tuberculosis and ALS (n = 400): Test |
|---|---|---|---|---|---|---|
Living_Alone | 194 | 32 | 55 | 37 | 17 | 10 |
Living_WithOthers | 412 | 61 | 128 | 89 | 73 | 44 |
MaritalStatus_Single | 50 | 9 | 14 | 20 | 20 | 6 |
MaritalStatus_InRelationship | 629 | 72 | 184 | 132 | 127 | 72 |
MaritalStatus_Divorced | 70 | 13 | 14 | 17 | 18 | 8 |
MaritalStatus_Widowed | 69 | 8 | 17 | 18 | 9 | 3 |
Descendants_Yes | 845 | 101 | 226 | 154 | 80 | 64 |
Descendants_No | 98 | 13 | 25 | 33 | 5 | 6 |
Job | 828 | 109 | 247 | 216 | 95 | 58 |
Last_job | 751 | 100 | 217 | 204 | 85 | 49 |
Employment_Working | 348 | 53 | 92 | 114 | 56 | 15 |
Employment_Unemployed | 113 | 18 | 31 | 17 | 9 | 7 |
Employment_Student | 30 | 4 | 14 | 17 | 8 | 0 |
Employment_Pensioner | 305 | 45 | 91 | 48 | 34 | 23 |
Employment_Other | 82 | 3 | 32 | 21 | 16 | 8 |
Alcohol | 498 | 70 | 127 | 193 | 251 | 42 |
Tobacco | 627 | 94 | 164 | 219 | 263 | 60 |
Drug | 78 | 11 | 20 | 101 | 136 | 17 |
Housing_Yes | 682 | 86 | 203 | 122 | 48 | 137 |
Housing_No | 16 | 3 | 3 | 13 | 0 | 2 |
PhysicalActivity_Yes | 154 | 23 | 33 | 63 | 21 | 4 |
PhysicalActivity_No | 37 | 6 | 11 | 14 | 3 | 0 |
Income | 24 | 3 | 13 | 8 | 0 | 2 |
Education | 58 | 5 | 18 | 20 | 3 | 1 |
Ethnicity | 69 | 13 | 21 | 18 | 1 | 27 |