Table 1 An overview of the considered LLMs and their properties

From: Evaluation and mitigation of the limitations of large language models in clinical decision-making

| Model | Base | Parameters | Training dataset | Downloadable |
| --- | --- | --- | --- | --- |
| Llama 2 Chat (ref. 32) | Llama 2 (ref. 32) | 70B | Public data [a] | Yes |
| OASST (ref. 33) | Llama 2 (ref. 32) | 70B | Public data [a], open-source data (https://huggingface.co/OpenAssistant/llama2-70b-oasst-sft-v10/) | Yes |
| WizardLM (ref. 34) | Llama 2 (ref. 32) | 70B | Public data [a], Evol-Instruct generated (ref. 34) | Yes |
| Clinical Camel (ref. 19) | Llama 2 (ref. 32) | 70B | Public data [a], ShareGPT (https://sharegpt.com/), PubMed articles (before 2021) (ref. 19), MedQA (ref. 13) | Yes |
| Meditron (ref. 35) | Llama 2 (ref. 32) | 70B | Public data [a], clinical guidelines (https://huggingface.co/datasets/epfl-llm/guidelines/), public PubMed abstracts (ref. 35), public PubMed papers (ref. 35), RedPajama (ref. 58) | Yes |
| Chat-GPT (ref. 59) | GPT3.5 (ref. 60) | ??? | User conversations [b], Common Crawl (ref. 61), WebText2 (ref. 62), Books1 (ref. 63), Books2 (ref. 63), Wikipedia | No |
| GPT-4 (ref. 64) | ??? | ??? | ??? | No |
| Med-PaLM (ref. 9) | Flan-PaLM (ref. 65) | 540B | Webpages [b], Wikipedia [b], social media [b], GitHub [b], news articles [b], books [b], 473 instruction fine-tuning datasets (ref. 65), HealthSearchQA (ref. 9), MedicationQA (ref. 66), LiveQA (ref. 67) | No |
| Med-PaLM 2 (ref. 8) | PaLM 2 (ref. 68) | 340B | Web documents [b], books [b], code [b], mathematics [b], conversational data [b], MedQA (ref. 13), HealthSearchQA (ref. 9), MedicationQA (ref. 66), LiveQA (ref. 67) | No |

Due to the data usage agreement of MIMIC-IV, only open-access models that can be downloaded can be used with the data; thus, only LLMs based on Llama 2 were used in this study. ??? indicates that no information has been made public.

[a] Meta defines ‘public data’ as a ‘mix of data from publicly available sources’.

[b] No further information provided.