Table 3. Task 3 results.

Model	Title	Abstract	Introduction	Methods	Results	Discussion	References
Round 1 – Feb 2025
ChatGPT o3-mini-high	A	P.A.	P.A.	I	I	I	I
Claude Sonnet 3.7 with Extended Thinking	P.A.	P.A.	P.A.	P.A.	P.A.	P.A.	P.A.
Google Gemini 2.0 Flash Thinking Experimental	P.A.	P.A.	P.A.	I	P.A.	P.A.	I
DeepSeek R1	P.A.	P.A.	I	I	I	I	I
Mistral Le Chat	A	P.A.	I	I	I	I	I
Round 2 – Apr 2025
ChatGPT o4-mini-high	I	P.A.	I	I	I	I	I
Claude Sonnet 3.7 with Extended Thinking	P.A.	P.A.	P.A.	P.A.	P.A.	P.A.	P.A.
Google Gemini 2.5 Pro Experimental	P.A.	P.A.	P.A.	I	P.A.	P.A.	I
DeepSeek R1	P.A.	P.A.	I	I	I	I	I
Mistral Le Chat	A	P.A.	I	I	I	I	I
Grok 3	A	P.A.	P.A.	I	I	I	I

Summary of the results of Task 3 – final manuscript production – in the two rounds of evaluation. A: appropriate; P.A.: partially appropriate; I: inappropriate.
