Table 1 Evolution of transformer-based large language models and their impact.

From: Sustainability in large language model supply chains – insights and recommendations using analysis of utility for affecting factors

| Model | Year | Parameters | Key features | Impact |
|---|---|---|---|---|
| GPT-1 | 2018 | 117 million | Transformer architecture, unsupervised pretraining, fine-tuning for specific tasks | Showed that pretraining on large text corpora transfers well to downstream tasks |
| BERT | 2018 | 110 million (base) / 340 million (large) | Bidirectional transformers, pretraining with masked language modeling | Revolutionized NLP benchmarks; led to wide adoption in classification and QA tasks |
| GPT-2 | 2019 | 1.5 billion | Larger model size, text generation, strong performance without task-specific training | Demonstrated high-quality text generation and raised concerns about potential misuse |
| GPT-3 | 2020 | 175 billion | Few-shot learning, general-purpose text generation across multiple tasks | Popularized the use of large-scale language models for diverse AI applications |
| PaLM | 2022 | 540 billion | Dense decoder-only transformer, chain-of-thought prompting | Achieved strong reasoning and arithmetic performance; precursor to PaLM-E and PaLM 2 |
| GPT-4 | 2023 | Estimated 100+ billion to trillions | Multimodal capabilities (text and images), improved reasoning | Enhanced accuracy, multimodal understanding, and problem-solving capabilities |
| Gemini 1 | 2023 | Estimated 100+ billion | Multimodal AI combining text and vision, inspired by DeepMind's advancements | Competes with GPT-4, focusing on integrating language and vision for broader AI applications |
| Gemini 1.5 / 2 | Late 2023 / early 2024 | Estimated trillions | Continued advancements in multimodal AI, reasoning, and grounding | Expected to push the boundaries of AI performance, particularly in complex reasoning and autonomy |
| LLaMA | 2023 | 7–65 billion | Open-access models optimized for performance on smaller compute budgets | Democratized LLM research by enabling researchers to experiment without massive infrastructure |
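The two pretraining objectives named in the table can be contrasted with a minimal sketch. The example below assumes the Hugging Face `transformers` package and the publicly hosted `bert-base-uncased` and `gpt2` checkpoints, which are not prescribed by the paper; the prompts are illustrative only.

```python
# Minimal sketch (assumed setup: `pip install transformers torch`).
from transformers import pipeline

# BERT-style masked language modeling: predict the hidden token using
# context on both sides (bidirectional attention).
fill_mask = pipeline("fill-mask", model="bert-base-uncased")
for pred in fill_mask("Large language models are pretrained on [MASK] text corpora."):
    print(pred["token_str"], round(pred["score"], 3))

# GPT-style causal generation: continue the prompt left to right,
# the objective behind GPT-1 through GPT-4.
generator = pipeline("text-generation", model="gpt2")
print(generator("Pretraining on large text corpora", max_new_tokens=20)[0]["generated_text"])
```

The contrast between the two objectives mirrors the impacts listed above: bidirectional masked-token prediction suited BERT to classification and QA benchmarks, while left-to-right generation underlies the few-shot prompting and open-ended text generation popularized by the GPT series.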