Table 1 Specifications of the language models evaluated in this study
From: Multi-step retrieval and reasoning improves radiology question answering with large language models
Model name | Parameters (billion) | Category | Accessibility | Knowledge cutoff date | Developer | Context length (thousand tokens) |
|---|---|---|---|---|---|---|
Ministral-8B | 8 | IT | Open-source | October 2023 | Mistral AI | 128 |
Mistral Large | 123 | IT | Open-source | November 2024 | Mistral AI | 128 |
Llama3.3-8B | 8 | IT | Open-weights | March 2023 | Meta AI | 8 |
Llama3.3-70B | 70 | IT | Open-weights | December 2023 | Meta AI | 128 |
Llama3-Med42-8B | 8 | IT, clinically-aligned | Open-weights | August 2024 | M42 Health AI Team | 8 |
Llama3-Med42-70B | 70 | IT, clinically-aligned | Open-weights | August 2024 | M42 Health AI Team | 8 |
Llama4 Scout 16E | 17 | IT, 17B active parameters | Open-weights | August 2023 | Meta AI | 10,000 (10 M tokens) |
DeepSeek R1-70B | 70 | Reasoning | Open-source | January 2025 | DeepSeek | 128 |
DeepSeek-R1 | 671 | Reasoning | Open-source | January 2025 | DeepSeek | 128 |
DeepSeek-V3 | 671 | Mixture of experts | Open-source | July 2024 | DeepSeek | 128 |
Qwen 2.5-0.5B | 0.5 | IT | Open-source | September 2024 | Alibaba Cloud | 32 |
Qwen 2.5-3B | 3 | IT | Open-source | September 2024 | Alibaba Cloud | 32 |
Qwen 2.5-7B | 7 | IT | Open-source | September 2024 | Alibaba Cloud | 131 |
Qwen 2.5-14B | 14 | IT | Open-source | September 2024 | Alibaba Cloud | 131 |
Qwen 2.5-70B | 70 | IT | Open-source | September 2024 | Alibaba Cloud | 131 |
Qwen 3-8B | 8 | Reasoning, mixture of experts | Open-source | December 2024 | Alibaba Cloud | 32 |
Qwen 3-235B | 235 | Reasoning, mixture of experts | Open-source | July 2025 | Alibaba Cloud | 32 |
GPT-3.5-turbo | Undisclosed | IT | Proprietary | September 2021 | OpenAI | 16 |
GPT-4-turbo | Undisclosed | IT | Proprietary | December 2023 | OpenAI | 128 |
o3 | Undisclosed | Reasoning | Proprietary | June 2024 | OpenAI | 200 |
GPT-5 | Undisclosed | IT, reasoning | Proprietary | September 2024 | OpenAI | 128 |
MedGemma-4B-it | 4 | Gemma 3-based, multimodal, IT, clinical reasoning | Open-weights | July 2025 | Google DeepMind | 128 |
MedGemma-27B-text-it | 27 | Gemma 3-based, text only, IT, clinical reasoning | Open-weights | July 2025 | Google DeepMind | ≥ 128 |
Gemma-3-4B-it | 4 | IT | Open-weights | August 2024 | Google DeepMind | 128 |
Gemma-3-27B-it | 27 | IT | Open-weights | August 2024 | Google DeepMind | 128 |