Table 1 The number of variables selected by information retrieval models for each risk factor category

From: Investigating causal networks of dementia using causal discovery and natural language processing models

| Target phrase^a | Number of selected phrases | Similarity score from word2vec, mean (range) | Similarity score from doc2vec, mean (range) |
| --- | --- | --- | --- |
| Alcohol | 17 | 0.78 (0.66, 1.0) | 0.49 (0.41, 0.58) |
| Anxiety | 14 | 0.7 (0.62, 0.85) | 0.55 (0.44, 0.61) |
| Arthritis | 5 | 0.68 (0.58, 0.76) | 0.56 (0.53, 0.59) |
| Atrial Fibrillation | 6 | 0.64 (0.54, 0.8) | 0.57 (0.53, 0.6) |
| Cancer | 10 | 0.76 (0.65, 0.81) | 0.54 (0.46, 0.61) |
| Carotid atherosclerosis | 4 | 0.72 (0.71, 0.74) | 0.51 (0.49, 0.54) |
| Cholesterol | 72 | 0.72 (0.1, 1.0) | 0.58 (0.52, 0.66) |
| Cognitive decline | 9 | 0.66 (0.6, 0.73) | 0.58 (0.46, 0.64) |
| Cognitive engagement | 2 | 0.69 (0.66, 0.73) | 0.59 (0.57, 0.62) |
| Dementia | 2 | 0.63 (0.62, 0.64) | 0.59 (0.59, 0.59) |
| Depression | 10 | 0.7 (0.58, 0.76) | 0.5 (0.44, 0.6) |
| Diabetes | 13 | 0.7 (0.61, 0.78) | 0.59 (0.58, 0.61) |
| Diet | 11 | 0.77 (0.71, 0.87) | 0.5 (0.42, 0.58) |
| Hearing loss | 5 | 0.81 (0.78, 0.84) | 0.46 (0.41, 0.5) |
| Hypertension | 4 | 0.79 (0.77, 0.82) | 0.61 (0.6, 0.61) |
| Inflammatory markers | 1 | 0.61 (0.61, 0.61) | 0.59 (0.59, 0.59) |
| Memory loss | 1 | 0.75 (0.75, 0.75) | 0.57 (0.57, 0.57) |
| Metabolic syndrome | 15 | 0.65 (0.62, 0.71) | 0.61 (0.59, 0.64) |
| Motor function | 8 | 0.64 (0.59, 0.74) | 0.6 (0.58, 0.61) |
| None^b | 58 | 0.64 (0.52, 0.76) | 0.59 (0.43, 0.66) |
| Peripheral artery disease | 8 | 0.66 (0.59, 0.74) | 0.6 (0.58, 0.63) |
| Pesticides | 1 | 0.77 (0.77, 0.77) | 0.47 (0.47, 0.47) |
| Physical activity | 12 | 0.77 (0.7, 0.82) | 0.46 (0.4, 0.5) |
| Renal disease | 14 | 0.71 (0.65, 0.77) | 0.56 (0.54, 0.6) |
| Sleep | 5 | 0.76 (0.67, 0.85) | 0.54 (0.49, 0.6) |
| Smoking | 19 | 0.78 (0.61, 0.9) | 0.46 (0.41, 0.62) |
| Social engagement | 2 | 0.64 (0.63, 0.64) | 0.6 (0.6, 0.61) |
| Stress | 4 | 0.62 (0.49, 0.69) | 0.59 (0.58, 0.6) |
| Stroke | 12 | 0.67 (0.59, 0.74) | 0.6 (0.56, 0.65) |

  1. ^a Target phrase: the risk factor category to which the selected variables were manually assigned.
  2. ^b None: the selected phrase cannot be assigned to any risk factor category.