Table 3 Select Large Language Models (LLM)

From: Can AI help with the hardest thing: pro health behavior change

Product Acronym | Released | Company | Transformer Capabilities | Applications | LLM Training Parameters
BERT | 2018 | Google | bidirectional encoder representations from transformers | Encoder reads an entire text at once to learn contextual relationships; trained on masked-word and next-sentence prediction | 110–340 M
GPT-3 | 2020 | OpenAI | generative pre-trained transformer 3 | Decoder with a 2048-token context window of word fragments (tokens) used to predict the next token | 175 B
LaMDA | 2021 | Google | language model for dialogue applications | Transformer-based neural model trained on 2.8 T tokens; predicts the next token in a given context | 137 B
DALL-E | 2021 | OpenAI | transformer language model (focused version of GPT-3) | Decoder model trained on text caption-image pairs of up to 1280 tokens to generate images from text | 12 B
PaLM | 2022 | Google | pathways language model (trained on a 6144-chip TPU accelerator cluster) | Autoregressive decoder using parallel task computation; capable of logical inference and joke explanation | 540 B
ChatGPT | 2022 | OpenAI | general-purpose chatbot built on GPT-3.5 | Trained on 300 B tokens to emulate human writing; intuitive conversational interface across many topics | 175 B
Galactica (GAL) | 2022 | Meta AI | open-source LLM for scientific knowledge (5 model sizes) | Tokenizes scientific information from a curated corpus to write papers, solve equations, etc. | 250 M–120 B
GPT-4 | 2023 | OpenAI | multimodal; 60% less likely than ChatGPT to hallucinate | Accepts prompts composed of both images and text, returning textual responses | 175 B
Llama 3.3 | 2024 | Meta AI | multimodal, open-source model served in the cloud (business workflows) | Interprets charts, maps, and text within images; multilingual understanding (customer service, marketing) | 70 B
Qwen QwQ-32B | 2025 | Alibaba | mixture-of-experts (MoE) model with a 32K-token context window | Enterprise applications; mathematical reasoning and coding; computationally efficient | 32 B
GPT-4.5 | 2025 | OpenAI | advanced unsupervised learning; hierarchical token processing | High “EQ” for creative insight; follows user intent in problem solving, writing, etc. | 6 T (?)
DeepSeek-R1 | 2025 | DeepSeek | mixture-of-experts (MoE) sub-models activated by chain-of-thought inputs | Understands long-form content; rapidly performs complex math, finance, and coding tasks | 671 B

Definitions

Transformer: Attention mechanism that learns contextual relationships between words (and sub-words) in text; generalizes across domains and tasks (a minimal attention sketch follows these definitions).

Encoder: Reads an entire sequence of words at once; the input sequence of tokens is embedded into vectors to be processed by the neural network.

Decoder: Receives the encoder output and the prior timestep's decoder output; replaces text with tokens (masking) to train the predictive model for the task.

Bidirectional: Text-reading models that learn the context of a word from all of its surroundings, rather than from left-to-right or right-to-left directional reading.
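To make the Transformer, Encoder, Decoder, and Bidirectional definitions above concrete, the sketch below implements scaled dot-product attention in plain NumPy and contrasts bidirectional (encoder-style, BERT-like) attention with causal (decoder-style, GPT-like) attention over a toy token sequence. This is an illustrative sketch only; the array sizes, function names, and random inputs are assumptions chosen for demonstration and do not correspond to any model in Table 3.

```python
# Minimal sketch (not from the article): scaled dot-product attention, the core
# operation behind the transformer models listed in Table 3. Shapes and values
# are illustrative assumptions, not details of any specific model.
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def attention(q, k, v, causal=False):
    """Scaled dot-product attention over a token sequence.

    causal=False -> every token attends to all tokens (bidirectional,
                    encoder-style, as in BERT).
    causal=True  -> each token attends only to itself and earlier tokens
                    (decoder-style, as in GPT-family next-token prediction).
    """
    d = q.shape[-1]
    scores = q @ k.T / np.sqrt(d)                 # pairwise token-to-token relevance
    if causal:
        n = scores.shape[0]
        mask = np.triu(np.ones((n, n), dtype=bool), 1)
        scores = np.where(mask, -np.inf, scores)  # hide future tokens
    weights = softmax(scores, axis=-1)            # learned contextual relationships
    return weights @ v                            # context-mixed token representations

# Toy example: 4 tokens embedded in 8 dimensions (hypothetical sizes).
rng = np.random.default_rng(0)
x = rng.normal(size=(4, 8))
encoder_out = attention(x, x, x, causal=False)    # BERT-like bidirectional context
decoder_out = attention(x, x, x, causal=True)     # GPT-like left-to-right context
print(encoder_out.shape, decoder_out.shape)       # (4, 8) (4, 8)
```

The only difference between the two calls is the causal mask: the encoder-style call lets every token attend to its full surroundings (the Bidirectional definition), while the decoder-style call hides future tokens, which is what allows a decoder to be trained to predict the next token.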