Table 1 Baseline Irritability Scores by Model and Questionnaire
From: Assessing the impact of safety guardrails on large language models using irritability metrics
| Guardrail level | Model | BITe M | BITe SD | IRQ M | IRQ SD | CIS M | CIS SD |
|---|---|---|---|---|---|---|---|
| High | Claude | 1.8 | 0.15 | 0.87 | 0.07 | 0.91 | 0.06 |
| High | GPT-4o | 0.99 | 0.24 | 0.54 | 0.08 | 0.46 | 0.07 |
| Low | Grok-3-mini | 2.00 | 0.05 | 1.63 | 0.1 | 1.74 | 0.11 |
| Low | Nous | 0.89 | 0.38 | 0.77 | 0.16 | 0.57 | 0.13 |