Fig. 2: Analysis of 58 jailbreak prompts. | Nature Machine Intelligence

From: Defending ChatGPT against jailbreak attack via self-reminders

We examine the attributes of the 58 jailbreak prompts alongside their average attack success rate (ASR, as a percentage) against ChatGPT. Performance is tested five times with the Azure ChatGPT API gpt-3.5-turbo-0301. a–h, Prompt count and average ASR: sorted by prompt length (a), categorized by whether the prompt sets up a virtual persona that is exempt from standard rules (b), categorized by whether it sets up a fictional scenario (c), categorized by whether it uses a warning tone (d), categorized by the presence of specific dialogue examples (e), categorized by whether it details constraints against generating ethics-related disclaimers and warnings in the output (f), categorized by whether it specifies dual response roles in the output (g) and categorized by whether it explicitly requires an associated disclaimer in the output (h). w/, with; w/o, without.