Fig. 1
From: LLMs outperform outsourced human coders on complex textual analysis

Overall performance across tasks and coding strategies. For T1, the figure shows the macro \(\text{F}_1\) score; for T2, the mean absolute error (MAE); and for T3, T4, and T5, the accuracy. For T2, lower values denote better performance (i.e., smaller errors in identifying the correct number of municipalities), while for the remaining tasks, higher values indicate better performance (i.e., closer alignment with the expert benchmark). The “All correct” panel shows, for each coding strategy, the proportion of news articles for which all tasks were completed correctly.
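
For readers who want to see how the four quantities in this caption are conventionally computed, the following is a minimal sketch using the standard textbook definitions. It is not the authors' evaluation code: the function names and the toy inputs are hypothetical, and the choice to average the per-class F1 over every label seen in either the benchmark or the predictions is an assumption.

```python
# Illustrative sketch only: standard definitions of the metrics named in the
# caption. Function names and example data are hypothetical, not the authors'.

def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 scores (higher is better)."""
    # Assumption: average over every label seen in either sequence.
    labels = sorted(set(y_true) | set(y_pred))
    scores = []
    for label in labels:
        pairs = list(zip(y_true, y_pred))
        tp = sum(t == label and p == label for t, p in pairs)
        fp = sum(t != label and p == label for t, p in pairs)
        fn = sum(t == label and p != label for t, p in pairs)
        denom = 2 * tp + fp + fn
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


def mean_absolute_error(y_true, y_pred):
    """Average absolute difference between coded and benchmark counts (lower is better)."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def accuracy(y_true, y_pred):
    """Share of items whose coded value matches the benchmark (higher is better)."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def all_correct_rate(per_article_correct):
    """Share of articles for which every task was coded correctly.

    per_article_correct: one row per article, one boolean per task.
    """
    return sum(all(row) for row in per_article_correct) / len(per_article_correct)


if __name__ == "__main__":
    # Toy data, invented purely for illustration.
    print(macro_f1(["a", "a", "b", "b"], ["a", "b", "b", "b"]))
    print(mean_absolute_error([2, 0, 3], [2, 1, 1]))
    print(accuracy([1, 0, 1, 1], [1, 1, 1, 1]))
    print(all_correct_rate([[True, True], [True, False], [True, True]]))
```

Because MAE is an error measure while macro \(\text{F}_1\) and accuracy are agreement measures, the T2 panel reads in the opposite direction from the others, as the caption notes.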