Table 1 Results summary across experiments, parameters and tested models
From: A psychometric framework for evaluating and shaping personality traits in large language models

| Model | Variant | Reliability | Convergent validity ↑ | Discriminant validity ↑ | Criterion validity | Single-trait shaping | Multi-trait shaping | Downstream shaping | Overall |
|---|---|---|---|---|---|---|---|---|---|
| PaLM 62B | Base | −− | 0.05 | −0.24 | −− | NT | NT | NT | −− |
| Flan-PaLM 8B | IT | + | 0.69 | 0.23 | − | + | − | NT | − |
| Flan-PaLM 62B | IT | + | 0.87 | 0.41 | + | + | + | NT | + |
| Flan-PaLM 540B | IT | ++ | 0.90 | 0.51 | + | ++ | ++ | ++ | ++ |
| Flan-PaLMChilla 62B | CO, IT | +ᵃ | 0.87 | 0.48 | ++ | + | + | NT | + |
| Llama 2 7B | Base | −− | −0.01 | −0.03 | −− | NT | NT | NT | −− |
| Llama 2 13B | Base | −− | −0.01 | −0.05 | −− | NT | NT | NT | −− |
| Llama 2 70B | Base | −− | 0.00 | −0.02 | −− | NT | NT | NT | −− |
| Llama 2-Chat 7B | IT | + | 0.59 | 0.15 | − | − | − | NT | − |
| Llama 2-Chat 13B | IT | ++ | 0.82 | 0.29 | ++ | − | + | NT | + |
| Llama 2-Chat 70B | IT | ++ | 0.82 | 0.39 | ++ | + | + | ++ | + |
| Mistral 7B v0.1 | Base | −− | 0.03 | −0.01 | −− | NT | NT | NT | −− |
| Mistral 7B Instruct v0.1 | IT | − | 0.28 | 0.09 | + | −− | −− | NT | −− |
| Mixtral 8x7B v0.1 | MoE, Base | −− | 0.04 | 0.01 | −− | NT | NT | NT | −− |
| Mixtral 8x7B Instruct v0.1 | MoE, IT | ++ | 0.80 | 0.40 | ++ | − | + | ++ | + |
| GPT-3.5 Turbo | IT | ++ | 0.84 | 0.28 | ++ | − | − | NT | − |
| GPT-4o mini | MM, IT | ++ | 0.81 | 0.38 | ++ | + | + | NT | + |
| GPT-4o | MM, IT | ++ | 0.90 | 0.48 | ++ | ++ | ++ | ++ | ++ |

Ratings run from ++ (best) through + and − to −− (worst); NT, not tested. Variants: IT, instruction-tuned; CO, compute-optimal; MoE, mixture of experts; MM, multimodal.

| Prompt set parameters | Construct validity | Single-trait shaping | Multi-trait shaping | Downstream shaping |
|---|---|---|---|---|
| Personality profiles | 0 | 45 | 32 | 45 |
| Biographic descriptions | 50 | 50 | 50 | 50 |
| Item instructions | 5 | 1 | 1 | 0 |
| Items | 419 | 300 | 300 | 0 |
| Item postambles | 5 | 1 | 1 | 0 |
| Simulated response profiles | 1,250 | 2,250 | 1,600 | 2,250 |
| Responses per model | 523,750 | 675,000 | 480,000 | 56,250 |
| Section/Supplementary Note | | | | |
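
The prompt-set counts appear to combine multiplicatively. The sketch below reproduces the first three columns of prompt-set parameters under two inferred assumptions. First, each simulated response profile is one combination of personality profile, biographic description, item instruction and item postamble, with 0 personality profiles treated as a single unshaped prompt. Second, every profile answers every item once. These assumptions are read off the numbers and are not stated in the table. The column labels (`construct_validity`, `single_trait`, `multi_trait`) and the names `PARAMS`, `simulated_profiles` and `responses_per_model` are hypothetical. The downstream column uses no survey items, so its 56,250 responses per model cannot be derived from these parameters and is left out.

```python
# Sketch: how the Table 1 prompt-set counts appear to combine.
# ASSUMPTION: the multiplicative structure is inferred from the numbers;
# the table itself does not state it. Column labels are hypothetical.

PARAMS = {
    "construct_validity": dict(profiles=0, bios=50, instructions=5, items=419, postambles=5),
    "single_trait": dict(profiles=45, bios=50, instructions=1, items=300, postambles=1),
    "multi_trait": dict(profiles=32, bios=50, instructions=1, items=300, postambles=1),
}


def simulated_profiles(p: dict) -> int:
    """Distinct prompt contexts: product of all non-item factors.

    Zero personality profiles means no personality prompt is used, which is
    counted here as one unshaped profile (an assumption).
    """
    profiles = max(p["profiles"], 1)
    return profiles * p["bios"] * p["instructions"] * p["postambles"]


def responses_per_model(p: dict) -> int:
    """Assumes every simulated profile answers every item exactly once."""
    return simulated_profiles(p) * p["items"]


for name, p in PARAMS.items():
    print(f"{name}: {simulated_profiles(p):,} profiles, {responses_per_model(p):,} responses")
```

Under these assumptions the loop prints 1,250 / 523,750, 2,250 / 675,000 and 1,600 / 480,000, matching the table. For example, 1 × 50 × 5 × 5 = 1,250 and 1,250 × 419 = 523,750.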