Extended Data Fig. 3: Ridge plots comparing model abilities to concurrently shape LLM personality traits.
From: A psychometric framework for evaluating and shaping personality traits in large language models

Ridge plots showing the effectiveness of model variants in concurrently shaping LLM personality traits, measured by the separation between IPIP-NEO personality score distributions when models were prompted to be ‘extremely low’ (Level 1) vs. ‘extremely high’ (Level 9) on a particular trait. Each column of plots represents the observed scores on a specific domain subscale across all prompt sets. Each row depicts all domain score distributions for a specific model. Each model–domain subplot comprises two traces of score distributions. Red traces represent responses to prompt sets where the domain tested in the subscale (column) is set to ‘extremely low’ and each of the other four domains is set to the two extreme levels an equal number of times. Analogously, blue traces represent responses to prompt sets where the tested domain is set to ‘extremely high’ and each of the other four domains is again set to the two extremes an equal number of times. n = 1,600 simulated response profile scores per subplot.
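
The balanced design behind the red and blue traces can be illustrated with a minimal sketch. It assumes a full factorial over the five domains at the two extreme levels (32 combinations); the caption only states that the other four domains are balanced, so this is one design consistent with it rather than the authors' confirmed procedure. The domain abbreviations and the function names `prompt_sets` and `split_by_domain` are hypothetical.

```python
from itertools import product

# Hypothetical abbreviations for the five IPIP-NEO domains (assumed labels).
DOMAINS = ["EXT", "AGR", "CON", "NEU", "OPE"]
LOW, HIGH = 1, 9  # 'extremely low' (Level 1) and 'extremely high' (Level 9)


def prompt_sets():
    """Enumerate every combination of extreme levels across the five domains.

    Assumption: a full factorial design, which is one way to satisfy the
    balance described in the caption.
    """
    return [dict(zip(DOMAINS, levels))
            for levels in product((LOW, HIGH), repeat=len(DOMAINS))]


def split_by_domain(sets, domain):
    """Split prompt sets into the 'red' (domain low) and 'blue' (domain high) groups."""
    low = [s for s in sets if s[domain] == LOW]
    high = [s for s in sets if s[domain] == HIGH]
    return low, high


sets = prompt_sets()
low, high = split_by_domain(sets, "EXT")

# Within each group, count how often every other domain sits at the high extreme.
high_counts_in_low_group = {
    d: sum(s[d] == HIGH for s in low) for d in DOMAINS if d != "EXT"
}
print(len(sets), len(low), len(high))
print(high_counts_in_low_group)
```

Under this assumption, each trace pools 16 of the 32 combinations, and within either trace every non-target domain is high in exactly half of them, so a gap between the red and blue distributions reflects the targeted domain rather than the settings of the other four.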