Table 1 Evolution of transformer-based large language models and their impact.
| Model | Year | Parameters | Key features | Impact |
|---|---|---|---|---|
| GPT-1 | 2018 | 117 million | Transformer architecture, unsupervised pretraining, fine-tuning for specific tasks | Showed that pretraining on large text corpora transfers well to downstream tasks |
| BERT | 2018 | 110 million (base) / 340 million (large) | Bidirectional transformers, pretraining with masked language modeling (see the fill-mask sketch after the table) | Revolutionized NLP benchmarks; led to wide adoption in classification and QA tasks |
| GPT-2 | 2019 | 1.5 billion | Larger model size, text generation, strong performance without task-specific training | Demonstrated high-quality text generation and raised concerns about potential misuse |
| GPT-3 | 2020 | 175 billion | Few-shot learning, general-purpose text generation across multiple tasks (see the prompting sketch after the table) | Popularized the use of large-scale language models for diverse AI applications |
| PaLM | 2022 | 540 billion | Dense decoder-only transformer, chain-of-thought prompting (see the prompting sketch after the table) | Achieved strong reasoning and arithmetic performance; precursor to PaLM-E and PaLM 2 |
| GPT-4 | 2023 | Undisclosed (estimates range from hundreds of billions to over a trillion) | Multimodal capabilities (text and images), improved reasoning | Enhanced accuracy, multimodal understanding, and problem-solving capabilities |
| Gemini 1 | 2023 | Undisclosed (estimated 100+ billion) | Multimodal model combining text and vision, developed by Google DeepMind | Competes with GPT-4, integrating language and vision for broader AI applications |
| Gemini 1.5 / 2 | 2024 | Undisclosed | Long-context multimodal processing (up to roughly 1 million tokens in 1.5 Pro), mixture-of-experts architecture | Pushed the boundaries of long-context reasoning, grounding, and multimodal understanding |
| LLaMA | 2023 | 7–65 billion | Open-access models optimized for performance on smaller compute | Democratized LLM research by enabling researchers to experiment without massive infrastructure |
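
To make BERT's pretraining objective concrete, the sketch below fills a masked token with a pretrained BERT model via the Hugging Face `transformers` library (assumed installed). This is an illustration of masked language modeling at inference time, not the original pretraining code.

```python
# Minimal fill-mask sketch using Hugging Face transformers
# (assumes `pip install transformers torch`).
from transformers import pipeline

# Load a pretrained BERT base model for the fill-mask task.
unmasker = pipeline("fill-mask", model="bert-base-uncased")

# BERT was pretrained to predict tokens hidden behind [MASK],
# using context from both the left and the right of the gap.
for prediction in unmasker("Paris is the [MASK] of France."):
    print(f"{prediction['token_str']}: {prediction['score']:.3f}")
```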
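
The few-shot behavior attributed to GPT-3 and the chain-of-thought prompting attributed to PaLM are prompting techniques rather than API features. The sketch below builds both prompt styles as plain strings; `complete()` is a hypothetical placeholder for whatever text-completion endpoint is in use, and the exemplars follow the well-known examples from the GPT-3 and chain-of-thought papers.

```python
# Sketch of few-shot (GPT-3) and chain-of-thought (PaLM) prompting.
# No model call is made; `complete()` is a hypothetical stand-in.

def complete(prompt: str) -> str:
    """Placeholder for a call to a text-completion model."""
    raise NotImplementedError("Swap in your model API of choice.")

# Few-shot: a handful of in-context examples, no gradient updates.
few_shot_prompt = (
    "Translate English to French.\n"
    "sea otter => loutre de mer\n"
    "peppermint => menthe poivrée\n"
    "cheese => "
)

# Chain-of-thought: the exemplar spells out intermediate reasoning
# steps, which the model then imitates on the new problem.
cot_prompt = (
    "Q: Roger has 5 tennis balls. He buys 2 cans of 3 balls each. "
    "How many balls does he have now?\n"
    "A: Roger started with 5 balls. 2 cans of 3 balls is 6 balls. "
    "5 + 6 = 11. The answer is 11.\n"
    "Q: The cafeteria had 23 apples. It used 20 and bought 6 more. "
    "How many apples does it have?\n"
    "A: "
)
```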