Table 3. Task 3 results.
| Model | Title | Abstract | Introduction | Methods | Results | Discussion | References |
|---|---|---|---|---|---|---|---|
| Round 1 – Feb 2025 | | | | | | | |
| ChatGPT o3-mini-high | A | P.A. | P.A. | I | I | I | I |
| Claude Sonnet 3.7 with Extended Thinking | P.A. | P.A. | P.A. | P.A. | P.A. | P.A. | P.A. |
| Google Gemini 2.0 Flash Thinking Experimental | P.A. | P.A. | P.A. | I | P.A. | P.A. | I |
| DeepSeek R1 | P.A. | P.A. | I | I | I | I | I |
| Mistral Le Chat | A | P.A. | I | I | I | I | I |
| Round 2 – Apr 2025 | | | | | | | |
| ChatGPT o4-mini-high | I | P.A. | I | I | I | I | I |
| Claude Sonnet 3.7 with Extended Thinking | P.A. | P.A. | P.A. | P.A. | P.A. | P.A. | P.A. |
| Google Gemini 2.5 Pro Experimental | P.A. | P.A. | P.A. | I | P.A. | P.A. | I |
| DeepSeek R1 | P.A. | P.A. | I | I | I | I | I |
| Mistral Le Chat | A | P.A. | I | I | I | I | I |
| Grok 3 | A | P.A. | P.A. | I | I | I | I |