Table 1. An overview of the considered LLMs and their properties
From: Evaluation and mitigation of the limitations of large language models in clinical decision-making
Model | Base | Parameters | Training dataset | Downloadable |
---|---|---|---|---|
Llama 2 Chat (ref. 32) | Llama 2 (ref. 32) | 70B | Public data^a | ✓ |
OASST (ref. 33) | Llama 2 (ref. 32) | 70B | Public data^a, open-source data (https://huggingface.co/OpenAssistant/llama2-70b-oasst-sft-v10/) | ✓ |
WizardLM (ref. 34) | Llama 2 (ref. 32) | 70B | Public data^a, Evol-Instruct generated (ref. 34) | ✓ |
Clinical Camel (ref. 19) | Llama 2 (ref. 32) | 70B | Public data^a, ShareGPT (https://sharegpt.com/), PubMed articles (before 2021) (ref. 19), MedQA (ref. 13) | ✓ |
Meditron (ref. 35) | Llama 2 (ref. 32) | 70B | Public data^a, clinical guidelines (https://huggingface.co/datasets/epfl-llm/guidelines/), public PubMed abstracts (ref. 35), public PubMed papers (ref. 35), RedPajama (ref. 58) | ✓ |
Chat-GPT (ref. 59) | GPT3.5 (ref. 60) | ??? | User conversations^b, Common Crawl (ref. 61), WebText2 (ref. 62), Books1 (ref. 63), Books2 (ref. 63), Wikipedia | ✗ |
GPT-4 (ref. 64) | ??? | ??? | ??? | ✗ |
Med-PaLM (ref. 9) | Flan-PaLM (ref. 65) | 540B | Webpages^b, Wikipedia^b, social media^b, GitHub^b, news articles^b, books^b, 473 instruction fine-tuning datasets (ref. 65), HealthSearchQA (ref. 9), MedicationQA (ref. 66), LiveQA (ref. 67) | ✗ |
Med-PaLM 2 (ref. 8) | PaLM 2 (ref. 68) | 340B | Web documents^b, books^b, code^b, mathematics^b, conversational data^b, MedQA (ref. 13), HealthSearchQA (ref. 9), MedicationQA (ref. 66), LiveQA (ref. 67) | ✗ |