Extended Data Table 3 Summary of convergent and discriminant validity evidence across models, representing N = 22,500 total observations

From: A psychometric framework for evaluating and shaping personality traits in large language models

  1. LLM personality measurements demonstrate convergent validity when the average convergent correlation (rconv) between equivalent IPIP-NEO and BFI subscales is strong (≥ 0.60; marked in italics) or very strong (≥ 0.80; marked in boldface). Discriminant validity is evidenced when the average difference (Δ) between a model’s convergent (rconv) and respective discriminant (rdiscr) correlations between personality tests is at least moderate (avg. Δ ≥ 0.40; shown in boldface). All underlying convergent correlations of models with an average rconv ≥ 0.05 are statistically significant at p < .0001 (two-sided values computed using Student’s t-distribution; n = 1,250 per model). Supplementary Note A.5 contains further information.
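
The thresholds in this note can be read as a simple decision rule. The following is a minimal sketch, not the authors' analysis code: the function names are hypothetical, and the t-statistic uses the standard formula for testing a Pearson correlation against zero with n − 2 degrees of freedom, which is assumed here to match the test described above. The two-sided p-value would then be obtained from Student's t-distribution.

```python
import math


def classify_convergent(r_conv: float) -> str:
    """Label an average convergent correlation using the table's cut-offs."""
    if r_conv >= 0.80:
        return "very strong"   # shown in boldface in the table
    if r_conv >= 0.60:
        return "strong"        # shown in italics in the table
    return "below threshold"


def has_discriminant_evidence(avg_delta: float) -> bool:
    """True when the average difference (r_conv - r_discr) is at least moderate."""
    return avg_delta >= 0.40


def t_statistic(r: float, n: int) -> float:
    """t-statistic for a Pearson correlation r from n observations (df = n - 2).

    Assumption: the standard test of r against zero; requires |r| < 1.
    """
    return r * math.sqrt((n - 2) / (1.0 - r * r))


# Illustrative inputs only, not values taken from the table.
label = classify_convergent(0.85)
discriminant = has_discriminant_evidence(0.40)
t_value = t_statistic(0.5, 1250)
```

With n = 1,250 per model, even modest correlations yield large t-statistics, which is why significance alone is a weaker criterion here than the magnitude thresholds.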