Fig. 4: Comparison of the performance of Fu-LLM (finetune_qwen2_7b) and five other popular public LLMs (DeepSeek-v3 (2024_12_26), GPT-3.5-turbo (2025_01_25), GPT-4o (2024_11_20), claude 3.5-sonnet (2024_10_22) and gemini-2.0-pro (2025_02_05)) on the study dataset. | Nature Communications

From: A large language model for clinical outcome adjudication from telephone follow-up interviews: a secondary analysis of a multicenter randomized clinical trial
