Fig. 1

Flowchart of the overall study design. Two laboratory specialists compiled 64 questions on classic clinical issues, with a focus on autoimmune diseases, and submitted them to large language models (LLMs) for answering. Eight clinicians then evaluated the answers across multiple dimensions to assess the performance of the LLMs in the clinical context of autoimmune diseases. In parallel, the accuracy of answers to 30 report-interpretation questions was compared between four doctors and the LLMs.