Table 1. Comparison of features between GPT-3.5 and GPT-4.

From: Evaluating prompt engineering on GPT-3.5’s performance in USMLE-style medical calculations and clinical scenarios generated by GPT-4

| Features | GPT-3 | GPT-4 |
| --- | --- | --- |
| Performance | Outperformed its predecessors | Scoring 40% higher on internal factual performance benchmark |
| Model size | 175 billion parameters | Estimated to have over 10 trillion parameters |
| Steerability | Capable of changing its behavior | Designed to be more steerable |
| Alignment | Not specifically designed for alignment improvement | Designed to improve model alignment, resulting in more truthful output |
| Image input | Can only use text input | Can use image inputs and text |
| Multilingual support | Supports multiple languages | Improved multilingual support compared to GPT-3 |
| Training data | Trained on a large dataset but not as diverse as GPT-4 | Trained on a larger dataset |