Fig. 2: The expert-driven workflow.

This figure illustrates the structure and process of the expert-driven workflow, which optimizes LLM performance through human-guided prompt refinement. The workflow comprises a baseline evaluation, a systematic error analysis, and the application of the resulting prompts to Llama3.1, Llama3.2, and Med42 to assess differences in performance between the models. P0 initial prompt, XPn expert prompt number n, FPs false-positive cases, FNs false-negative cases.