Table 1 Baseline Irritability Scores by Model and Questionnaire

From: Assessing the impact of safety guardrails on large language models using irritability metrics

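| Guardrail level | Model       | BITe M | BITe SD | IRQ M | IRQ SD | CIS M | CIS SD |
|-----------------|-------------|--------|---------|-------|--------|-------|--------|
| High            | Claude      | 1.8    | 0.15    | 0.87  | 0.07   | 0.91  | 0.06   |
| High            | GPT-4o      | 0.99   | 0.24    | 0.54  | 0.08   | 0.46  | 0.07   |
| Low             | Grok-3-mini | 2.00   | 0.05    | 1.63  | 0.1    | 1.74  | 0.11   |
| Low             | Nous        | 0.89   | 0.38    | 0.77  | 0.16   | 0.57  | 0.13   |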