Table 4 Overall performance of human participants and o1-preview on systematic thinking.
From: Comparative evaluation of OpenAI O1 and human performance in higher order cognition
| ST instrument | Dimension | Human score (mean ± SD) | o1-preview score (mean ± SD) | Z-score |
|---|---|---|---|---|
| The Village of Abeesee | Problem identification | 1.62 ± 0.64 | 2.50 ± 0.62 | 1.38 |
| | Information needs | 1.81 ± 0.52 | 2.90 ± 0.21 | 2.10 |
| | Stakeholder awareness | 1.23 ± 0.99 | 2.95 ± 0.16 | 1.74 |
| | Goals | 1.71 ± 0.62 | 2.90 ± 0.21 | 1.92 |
| | Unintended consequences | 1.38 ± 0.58 | 2.65 ± 0.24 | 2.19 |
| | Implementation challenges | 1.64 ± 0.57 | 2.70 ± 0.35 | 1.86 |
| | Alignment | 1.71 ± 1.00 | 2.35 ± 0.41 | 0.64 |
| The Lake Urmia Vignette (LUV) | Variables | 10.95 ± 4.00 | 19.70 ± 1.57 | 2.19 |
| | Causal links | 9.17 ± 3.97 | 23.30 ± 2.21 | 3.56 |
| | Feedback loops | 0.16 ± 0.45 | 3.10 ± 1.10 | 6.53 |
| | Total score | 20.08 ± 8.13 | 46.10 ± 4.12 | 3.20 |
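The caption does not define the Z-score column. The printed values appear consistent with z = (o1-preview mean − human mean) / human SD, that is, the distance of the o1-preview mean from the human mean measured in human standard deviations. This is an inference from the numbers, not a formula stated with the table. The sketch below recomputes each row under that assumption. The names `ROWS`, `z_score`, and `check_table` are hypothetical helpers introduced for illustration. Under this reading the o1-preview SDs do not enter the calculation.

```python
# Hypothetical helper: recompute Table 4's Z-score column, ASSUMING
# z = (o1-preview mean - human mean) / human SD. The caption does not state this
# formula; it is inferred from the printed numbers.

# (instrument, dimension, human_mean, human_sd, o1_mean, reported_z)
ROWS = [
    ("Village of Abeesee", "Problem identification", 1.62, 0.64, 2.50, 1.38),
    ("Village of Abeesee", "Information needs", 1.81, 0.52, 2.90, 2.10),
    ("Village of Abeesee", "Stakeholder awareness", 1.23, 0.99, 2.95, 1.74),
    ("Village of Abeesee", "Goals", 1.71, 0.62, 2.90, 1.92),
    ("Village of Abeesee", "Unintended consequences", 1.38, 0.58, 2.65, 2.19),
    ("Village of Abeesee", "Implementation challenges", 1.64, 0.57, 2.70, 1.86),
    ("Village of Abeesee", "Alignment", 1.71, 1.00, 2.35, 0.64),
    ("Lake Urmia Vignette", "Variables", 10.95, 4.00, 19.70, 2.19),
    ("Lake Urmia Vignette", "Causal links", 9.17, 3.97, 23.30, 3.56),
    ("Lake Urmia Vignette", "Feedback loops", 0.16, 0.45, 3.10, 6.53),
    ("Lake Urmia Vignette", "Total score", 20.08, 8.13, 46.10, 3.20),
]


def z_score(human_mean: float, human_sd: float, o1_mean: float) -> float:
    """Distance of the o1-preview mean from the human mean, in human SD units."""
    return (o1_mean - human_mean) / human_sd


def check_table(rows, tol: float = 0.01):
    """Return rows whose recomputed z differs from the reported value by more than tol.

    tol allows for the table's inputs being rounded to two decimals.
    """
    mismatches = []
    for instrument, dimension, h_mean, h_sd, o1_mean, reported in rows:
        z = z_score(h_mean, h_sd, o1_mean)
        if abs(z - reported) > tol:
            mismatches.append((instrument, dimension, z, reported))
    return mismatches


if __name__ == "__main__":
    for instrument, dimension, h_mean, h_sd, o1_mean, reported in ROWS:
        z = z_score(h_mean, h_sd, o1_mean)
        print(f"{instrument:20s} {dimension:26s} computed={z:5.2f} reported={reported:5.2f}")
    print("mismatches:", check_table(ROWS))
```

For the eleven rows as printed, every recomputed value agrees with the reported Z-score to within 0.01. This supports the assumed formula but does not confirm it. The tolerance is loose on purpose, because the means and SDs in the table are themselves rounded to two decimals.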