Fig. 3: Criterion validity evidence of LLM personality measurements per domain. | Nature Machine Intelligence

From: A psychometric framework for evaluating and shaping personality traits in large language models

Each row depicts a personality domain paired with a theoretically related criterion test, with upwards arrows indicating an expected positive relationship and downwards arrows indicating an expected negative relationship. Rows 1 and 2: extraversion (EXT) with positive and negative affect, compared with human baselines (leftmost column) reported in previous research on personality and affect39. PA, positive affect; NA, negative affect. Rows 3–6: agreeableness (AGR) with subscales of trait aggression, measured by the BPAQ. PHYS, physical aggression; VRBL, verbal aggression; ANGR, anger; HSTL, hostility. Rows 7–9: conscientiousness (CON) with related human values of achievement (ACHV), conformity (CONF) and security (SCRT), measured by the PVQ-RR ACHV, CONF and SCRT subscales, respectively. Rows 10 and 11: neuroticism (NEU) with PA and NA, compared with human baselines39. Rows 12 and 13: openness (OPE) with creativity, measured by the creative self-efficacy (CSE) and creative personal identity (CPI) subscales of the SSCS. N = 22,500 total LLM observations. All LLM correlations > 0.09 are statistically significant at P < 0.0001 (two-sided P values computed using Student’s t-distribution; n = 1,250 per model, per domain).
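The significance statement in the caption rests on the standard test for a correlation coefficient: under the null hypothesis of zero correlation, t = r·sqrt((n − 2)/(1 − r²)) follows a Student's t-distribution with n − 2 degrees of freedom. The Python sketch below illustrates that calculation, assuming the reported values are Pearson correlations computed per model and per domain (the caption does not name the coefficient). The function names are hypothetical, and the t-distribution tail is obtained by Simpson's-rule integration of the density rather than from a statistics library, so the sketch is only meant to be accurate for P values down to roughly 1e-10.

```python
import math


def t_pdf(t: float, df: float) -> float:
    """Density of Student's t-distribution with `df` degrees of freedom."""
    # Log of the normalizing constant Gamma((df+1)/2) / (sqrt(df*pi) * Gamma(df/2)),
    # computed in log space so large df does not overflow.
    log_norm = (
        math.lgamma((df + 1) / 2)
        - math.lgamma(df / 2)
        - 0.5 * math.log(df * math.pi)
    )
    return math.exp(log_norm - (df + 1) / 2 * math.log1p(t * t / df))


def t_two_sided_p(t: float, df: float, intervals: int = 20_000) -> float:
    """Two-sided P value, P(|T| >= |t|), via Simpson's rule on [0, |t|]."""
    upper = abs(t)
    if upper == 0.0:
        return 1.0
    h = upper / intervals  # `intervals` must be even for Simpson's rule
    total = t_pdf(0.0, df) + t_pdf(upper, df)
    for i in range(1, intervals):
        # Simpson weights: 4 for odd interior points, 2 for even ones.
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    central_mass = 2 * total * h / 3  # P(-|t| <= T <= |t|), using symmetry
    # Subtracting from 1 loses precision for extremely small P values.
    return max(0.0, 1.0 - central_mass)


def correlation_p_value(r: float, n: int) -> float:
    """Two-sided P value for H0: rho = 0, given a correlation r from n pairs."""
    if not -1.0 < r < 1.0:
        raise ValueError("r must lie strictly between -1 and 1")
    df = n - 2
    t = r * math.sqrt(df / (1.0 - r * r))
    return t_two_sided_p(t, df)


# Example: P value for a correlation at the caption's 0.09 threshold, n = 1,250.
print(correlation_p_value(0.09, 1250))
```

In practice a library routine such as `scipy.stats.pearsonr`, which reports a two-sided P value by default, would normally be used instead; the hand-rolled integration here only keeps the sketch dependency-free.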