Introduction

The capacity to accumulate adaptive culture—known as cumulative cultural evolution (CCE)—is crucial to the advancement of the human species1,2,3,4. Because cultural evolution is faster than biological evolution, it has enabled humans to adapt to and dominate the earth’s ecosystems5,6,7. Critical to CCE is the ability to selectively engage in social learning, particularly by identifying and adopting effective, high-payoff solutions8,9,10,11. This capacity allows adaptive culture to accumulate over historical time, giving rise to complex cultural traits—such as stone tools12, agriculture13 and sophisticated communication systems14—that no single person could invent on their own.

Theoretical models highlight the importance of demography to CCE15: larger populations provide more access to high-performing solutions, greater variation, and more opportunities for solution recombination16. These benefits support the population size hypothesis—the idea that larger populations promote CCE. Yet, support for the population size hypothesis is mixed17. Here, we report two large-scale experiments that clarify the conditions under which large populations reliably enhance CCE. Experiment 1 addresses the mixed findings in the literature by testing the role of attention filtering—the ability to focus on a single high-performing demonstrator or solution—in facilitating payoff-biased social learning and thus enhancing CCE in large groups. Experiment 2 explores a complementary pathway to enhanced CCE in large populations: the use of external representations (e.g., diagrams and writing systems) that reduce cognitive load, support payoff-based comparisons, and facilitate the recombination of solutions.

Ethnographic studies link population growth to increases in cultural complexity, and population decline to cultural loss18,19,20. Complementing these findings, mathematical modeling suggests that larger populations can enhance the retention and adaptive evolution of culturally-acquired skills15,21. However, critics argue that the ethnographic record does not support these conclusions, challenge the assumptions underlying the mathematical models, and propose that environmental factors are the primary drivers of cultural complexity22,23,24,25. Experimental support for the population size hypothesis is also mixed. In these studies, participants are assigned to laboratory groups of different sizes and tasked with developing and improving a technology (e.g., an arrowhead, fishing net, or paper plane). Some studies find that larger groups are better able to maintain cultural complexity26 and improve artifact performance27, while others find no such benefit28 or even that larger groups impede technological adaptation29. A key factor that may explain these discrepancies is whether participants can engage in attention filtering—that is, selectively attend to adaptive, high-payoff solutions and avoid low-payoff solutions.

The most prominent theoretical model contends that large populations enhance CCE when people can selectively attend to a single successful demonstrator15. This aligns with the experimental findings: large populations enhance CCE when participants can attend to a single successful solution produced by a single demonstrator26,27, but not when they must evaluate multiple solutions from multiple demonstrators and engage in payoff-biased social learning without the aid of a filtering mechanism28,29. While payoff-biased social learning is generally adaptive, its effectiveness may be limited by cognitive constraints. These include the serial position effect—a tendency to remember the first and last items in a list better than those in the middle30—and the limited capacity of working memory, which can hold only about four chunks of information at a time31,32. Experiment 1 directly tests whether attention filtering moderates the effect of population size on CCE. We compare cumulative performance in small and large groups under conditions that either allow or restrict attention filtering. We predict that attention filtering will enhance CCE in large populations by reducing the cognitive load associated with payoff-based comparisons.

A process that is not included in Henrich’s theoretical model15 is recombination—combining the different solutions produced by different demonstrators to generate novel solutions—something that has been argued to be critical to CCE33,34. Recombination depends on the ability to compare and extract high-performing elements across multiple options, a process that also relies on payoff-biased social learning. However, this process may break down when cognitive overload limits individuals’ ability to evaluate and integrate multiple payoffs. This limitation can be mitigated by external representations—such as diagrams35,36 and written records37—which allow individuals to offload cognitive complexity and reduce internal memory demands. Furthermore, external representations can expand the effective population size (i.e., the interacting members of a population) by enabling access to the solutions produced by demonstrators from different geographic locations and prior generations—solutions from across space and time. The link between the adoption of the printing press and the number of famous scientists and artists born in each city supports the importance of external representations to culture38—similar arguments have been made for digital media39. External representations afford three key benefits: they help mitigate cognitive overload, facilitate payoff-based comparisons and attention filtering, and enable solution recombination. Experiment 2 tests the importance of external representations to CCE. We compare the cumulative performance of small and large groups with and without access to an external record of group members’ solutions. We predict that access to an external record will enhance CCE in large populations.

Two large-scale experiments are reported (combined N = 941). In each, participants were assigned to an Individual Learning condition or to a 3-Person or 6-Person laboratory group tasked with developing and improving a virtual arrowhead technology25,35. The Individual Learning condition provided a baseline against which to compare the Social Learning conditions36. Three-person groups were used because the number of observed solutions (two arrowheads plus their associated performance scores) likely falls within working memory capacity—approximately four chunks of information. Six-person groups were included because the number of solutions (five arrowheads plus their associated performance scores—approximately ten chunks of information) likely exceeds this limit. Hence, the 3- and 6-Person groups represent small and large populations, respectively.

Across 10 trials, participants repeatedly built virtual arrowheads on a computer and received performance feedback after each trial based on arrowhead shape. In the Social Learning conditions, participants could also view the arrowheads produced by the other members of their group at the end of each trial. Experiment 1 manipulated attention filtering. In the View 1 Model condition participants could selectively attend to a single arrowhead based on its task performance score, enabling attention filtering. In the View All Models condition, participants were required to attend to each group member’s arrowhead, presented in a random order. This design tested whether the ability to focus on a single high-payoff solution—thereby supporting payoff-biased social learning—would enhance CCE, particularly in larger groups where cognitive overload is more likely. Experiment 2 manipulated access to external representations. In both the View All Models and External Record conditions, participants were required to attend to all group members’ arrowheads, but only the latter condition provided access to a persistent external record of the solutions. This allowed us to test whether offloading information externally would mitigate cognitive overload and support payoff-based comparison and recombination. Together, the two experiments directly test how attention filtering and external representations shape cumulative culture in small and large populations.

Experiment 1

A key difference between previous experimental studies that support and do not support a population size effect is the opportunity for attention filtering—the ability to selectively attend to high-payoff solutions and avoid exposure to low-payoff solutions. When participants can choose to view a single high-performing solution, a benefit of large populations is observed26,27. When required to attend to all available solutions—regardless of quality—no benefit is observed28,29.

Experiment 1 directly tests whether attention filtering moderates the population size effect. In the View 1 Model condition, participants were restricted to viewing a single arrowhead based on task performance, allowing them to focus on the highest-payoff solution, which should in turn enable more accurate payoff-biased copying and improve task performance. In the View All Models condition, participants viewed each group member’s arrowhead in a random order, reducing their ability to focus on high-payoff solutions and thereby limiting payoff-biased copying. If attention filtering moderates the population size effect, then CCE should be enhanced in large populations (compared to small populations) in the View 1 Model condition but not in the View All Models condition.

Experiment 1 methods

The study received approval from the University of Western Australia Ethics Committee on 4 February 2020 (Project Identifier: 2019/RA/4/20/6004). Participants viewed an information sheet before giving informed consent to take part in the study. All methods were conducted in accordance with the National Statement on Ethical Conduct in Human Research (NHMRC/ARC/Universities Australia) and the Declaration of Helsinki.

Participants

A convenience sample of 287 undergraduate students from the University of Western Australia participated in exchange for partial course credit. Of these, 281 participants consented to their data being used for research purposes (reported here; 195 self-identified as women, 85 as men and 1 as non-binary). Participant ages ranged from 18 to 57 years old (M = 22.20, SD = 5.95). Participants were randomly assigned to groups/conditions (see Table 1).

Table 1 The number of groups and participants in each condition of experiment 1.

Procedure

Participants played a web-based game based on the virtual arrowhead task developed by Mesoudi40 and used by Derex et al.27. The goal was to produce virtual arrowheads that returned the highest possible scores. Participants could not communicate with each other during the game, and all instructions were given on their screens. The game lasted for ten trials, with each trial consisting of a Production phase and an Observation phase. After the final trial, participants were asked whether they consented for their data to be used for research purposes, and to provide their age and gender. They were then debriefed.

Production phase

During the Production phase, participants were given a 5 × 7 grid marked out by squares (see Fig. 1A). They started by setting the “width” parameter that determined the spacing between the squares and therefore the overall size of the grid (options ranged from 30 to 50, with 30 the default; as per Derex et al.27, this impacted participants’ scores via arrowhead size). Participants then clicked a “Draw Mode” button, which disabled the width input and enabled participants to draw lines connecting any two squares in the grid, by clicking on those squares. This drawing represented the arrowhead. A “Reset” button erased any lines and allowed participants to set the width parameter again. After 90 s had elapsed (indicated by a shrinking bar at the bottom of the screen), whatever the participant had drawn by that point was saved as their arrowhead for that trial, and they proceeded to the Observation phase. Participants were required to wait on an interstitial screen (“Please wait for other group members to finish…”) until all members of their group had finished the Production phase for that trial.

Fig. 1

Screenshots from the task: (A) the Production phase, (B) the menu of the Observation phase in a 6-Person group, (C) arrowhead playback during the Observation phase, and (D) the Production phase of Experiment 2’s External Record condition in a 6-Person group.

Arrowhead scoring

The arrowhead scoring system was based on Derex et al.27. Once an arrowhead was saved, a score was automatically calculated based on its size, symmetry and triangularity, the number of notches, and the regularity of the notches. Participants were not given any information about how the scores were calculated.
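Because the scoring rule was hidden from participants and its exact formula is not reported here, any concrete implementation is necessarily a guess. The sketch below illustrates how a score combining the five stated criteria might look; the function name, input scaling, and every weight are illustrative assumptions, not the task's actual formula.

```python
# Hypothetical sketch of an arrowhead scoring function in the spirit of
# Derex et al.'s system. Scores depended on size, symmetry, triangularity,
# the number of notches, and notch regularity; the weights below are
# illustrative assumptions only.

def score_arrowhead(size, symmetry, triangularity, n_notches, notch_regularity):
    """Combine the five stated shape criteria into a single payoff.

    All inputs are assumed to be normalised to [0, 1], except n_notches,
    which is an integer count.
    """
    score = (
        200 * size
        + 200 * symmetry
        + 200 * triangularity
        + 50 * min(n_notches, 4)      # assumed diminishing returns after 4 notches
        + 200 * notch_regularity
    )
    return round(score, 1)

print(score_arrowhead(0.8, 0.9, 0.7, 3, 0.6))
```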

Observation phase

The Observation phase varied across conditions. In the Individual Learning condition, participants were shown the score for the arrowhead they had just produced, then clicked “Continue” to proceed to the next trial. In the Social Learning conditions, the purpose of the Observation phase was to show participants the arrowheads just produced by the other members of their group. Participants were shown a menu page with the score for their own arrowhead, and a table with the scores returned by the other group members’ arrowheads (see Fig. 1B). For participants in the 3-Person groups, there were two other group members listed, and for participants in the 6-Person groups there were five. In the View 1 Model condition, participants chose a single arrowhead to view, by clicking a “View” button corresponding to one of their group members’ arrowheads in the table. In the View All Models condition, participants viewed every arrowhead, one at a time, in a predetermined random order. The random order was imposed on participants by only showing one “View” button at a time. When participants clicked a “View” button, they were taken to a screen similar to that from the Production phase, and the construction of the selected arrowhead was played back over 20 s, with an additional 5 s to show the completed arrowhead (see Fig. 1C). After the 25 s had elapsed (indicated by a shrinking bar at the bottom of the screen), participants were returned to the Observation menu page, where they could select the next arrowhead to view (if there were any remaining). They were unable to view the same arrowhead more than once. If they had viewed all the arrowheads for that trial (i.e., one in the View 1 Model condition, or else two or five depending on their group size), they clicked a “Continue” button to proceed to the next trial.

Calculating arrowhead similarity

To assess how closely participants copied each other’s arrowheads, we measured the similarity between the arrowheads produced in trials 2–10 and the arrowheads produced in the same groups on the previous trial (i.e., trial t compared to trial t − 1). To calculate the similarity between any two arrowheads, each arrowhead was redrawn as a 60 × 60px image (without drawing the squares, and with each line 1px wide), and the similarity score was computed as the number of pixels in which both images had a line, divided by the number of pixels in which either image (i.e., one or the other or both) had a line (Jaccard similarity). As a baseline, we calculated a “chance” similarity score by comparing each arrowhead (from trials 2–10) against a previous-trial arrowhead produced by a random participant from a different group.
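The similarity measure described above can be sketched directly. Assuming each rasterised arrowhead is represented as a set of (x, y) line-pixel coordinates (the rasterisation step itself is omitted here), the Jaccard index is:

```python
# Jaccard similarity between two rasterised arrowheads: the number of
# pixels where both images have a line, divided by the number of pixels
# where either image has a line.

def jaccard_similarity(pixels_a, pixels_b):
    """Jaccard index of two sets of line pixels: |A & B| / |A | B|."""
    a, b = set(pixels_a), set(pixels_b)
    union = a | b
    if not union:          # two empty drawings: define similarity as 0
        return 0.0
    return len(a & b) / len(union)

# Two toy "arrowheads" sharing part of one diagonal line:
head_1 = {(0, 0), (1, 1), (2, 2), (3, 3)}
head_2 = {(2, 2), (3, 3), (4, 4)}
print(jaccard_similarity(head_1, head_2))  # 2 shared / 5 total = 0.4
```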

Experiment 1 results

Unless otherwise stated, the data were analyzed using linear mixed effects models. The random effects structure included by-participant random intercepts nested within group. All analyses were performed and all figures created in R41. Statistical models were estimated using the lmer() function of lmerTest42. The analytic strategy is identical to that used in Fay et al.29.

Prerequisites for large populations to enhance cumulative culture

Large populations can enhance cumulative culture if they provide access to greater artifact variation and better-adapted, high-payoff artifacts. This was tested using linear regression and correlational analyses. Artifact variation (operationalised as the range in arrowhead performance scores at Trial 1) was higher in the larger 6-Person groups than in the 3-Person groups (p = 0.002; see Fig. 2A). The maximum arrowhead performance score was higher in the 6-Person groups than in the 3-Person groups (p = 0.051)—although this effect was marginal—and higher in the 3-Person groups than in the Individual Learning condition (p < 0.001; Fig. 2B). Artifact variation and the maximum artifact performance scores were positively correlated (r = 0.68, p < 0.001; Fig. 2C). The greater variation afforded by larger populations also gave rise to more poorly-adapted, low-payoff artifacts: the minimum arrowhead performance score was lower in the larger 6-Person groups than in the 3-Person groups (p = 0.056)—although this effect was marginal—and lower in the 3-Person groups than in the Individual Learning condition (p < 0.001; Fig. 2D). Artifact variation and the minimum artifact performance scores were negatively correlated (r = − 0.72, p < 0.001; Fig. 2E).

So, the greater variation afforded by larger populations was a double-edged sword: it provided greater access to better-adapted, high-payoff artifacts, but also to more poorly adapted, low-payoff artifacts. This highlights the likely value of attention filtering for exploiting the advantages and avoiding the disadvantages of large populations.

Fig. 2

The data reported are from trial 1, i.e., prior to social learning. This ensures the Individual Learning and Social Learning conditions are comparable. (A) The variation in arrowhead performance (maximum minus minimum performance scores) in the 3- and 6-Person Social Learning conditions. The coloured bars indicate the mean variation score for each condition and the dot points indicate the arrowhead variation score for each group. Error bars are the bootstrapped 95% CIs. Only one arrowhead was produced in the Individual Learning condition so there was no variation score to report. (B) The maximum arrowhead performance scores and (D) the minimum arrowhead performance scores in the Individual Learning and in the 3- and 6-Person Social Learning conditions. The coloured bars indicate the mean maximum/minimum arrowhead performance score for each condition, and the dot points indicate the maximum/minimum arrowhead performance score for each group. Error bars are the bootstrapped 95% CIs. The correlation between the variation in arrowhead performance and (C) the maximum arrowhead performance scores and (E) the minimum arrowhead performance scores in the 3- and 6-Person Social Learning conditions. The dark gray straight line is the linear model fit, and the light gray shaded area is the bootstrapped 95% CI.

Does social learning outperform individual learning, and does attention filtering enhance cumulative culture in large populations?

We first tested if social learning outperformed individual learning (tested separately for the 3- and 6-Person groups). We then compared arrowhead performance across the Social Learning conditions to test if attention filtering enhanced CCE in the large populations. The data were analysed across trials 2–10. Trial 1 was treated as a practice trial, allowing participants to familiarise themselves with the task interface. The same pattern of results is returned when trial 1 is included in the analysis.

In the Individual Learning and the 3-Person Social Learning conditions there was an effect of Trial (β = 19.66, t = 4.00, p < 0.001) and an effect of Condition. The effect of Trial indicates that arrowhead performance improved over trials in all conditions. The effect of Condition is due to the higher task performance in the Social Learning conditions compared to the Individual Learning condition (View All Models, β = 110.61, t = 2.33, p = 0.020; View 1 Model, β = 125.85, t = 2.80, p = 0.005). There was no statistical evidence of a performance difference between the 3-Person Social Learning conditions (p = 0.773). In the Individual Learning and the 6-Person Social Learning conditions there was an effect of Trial (β = 19.66, t = 3.73, p < 0.001), Condition and a Trial by Condition interaction. The effect of Condition is due to the higher task performance in the Social Learning conditions compared to the Individual Learning condition (ps < 0.002), and the higher task performance in the View 1 Model condition compared to the View All Models condition (β = 90.32, t = 2.09, p = 0.037). The Condition by Trial interaction is due to the improvement in task performance over trials in the Individual Learning (β = 19.66, t = 4.34, p < 0.001) and the View 1 Model condition (β = 29.78, t = 5.86, p < 0.001) and the absence of a performance improvement in the View All Models condition (p = 0.360).

Next, we tested the prediction that attention filtering will enhance CCE in large populations. As predicted, the larger 6-Person groups outperformed the 3-Person groups in the View 1 Model condition (β = 36.28, t = 2.09, p = 0.037). There was no statistical evidence of a population size effect in the View All Models condition (p = 0.499; see Fig. 3).

Fig. 3

Change in arrowhead performance scores over trials in the (A) Individual Learning condition and in the (B) Social Learning conditions (3-Person Groups in green and 6-Person groups in orange). The dot points indicate the overall mean arrowhead performance score for each condition at each trial and the vertical intersecting line indicates the 95% CI. Each thin line indicates the performance score for each participant in the Individual Learning condition or the mean performance score for each group in the Social Learning conditions. The thicker gray/green/orange straight line is the linear model fit, and the light gray/green/orange shaded area is the bootstrapped 95% CI (across trials 2–10). For ease of reference, we included the linear model fit for the Individual Learning condition (gray straight line) within the Social Learning conditions (panel B).

Why does attention filtering enhance cumulative culture in large populations?

We predict that it does so by making it easier for people to selectively attend to, and more accurately copy, high-payoff artifacts.

First, we examined which arrowheads participants chose to view in the View 1 Model condition, ranking selections from the highest- to the lowest-performing arrowheads. Participants overwhelmingly chose the highest-ranked arrowhead, consistent with strategic, payoff-biased social learning. This was true in the 3-Person groups (one sample t-test against chance level of 50%; M = 70.26%, t(37) = 6.06, p < 0.001) and in the 6-Person groups (one sample t-test against chance level of 20%; M = 68.93%, t(74) = 21.37, p < 0.001). In the 6-Person groups, the second most viewed arrowhead (rank 5) was selected at a rate that did not differ from chance (M = 17.73%, t(74) = −1.40, p = 0.165; see Fig. 4A).
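The one-sample t-tests above compare each participant's viewing percentages against the relevant chance level (50% in 3-Person groups, where two arrowheads are on offer; 20% in 6-Person groups, where five are). A minimal sketch of the computation, using made-up data rather than the study's:

```python
# One-sample t-test written out from first principles. Each participant
# contributes one value: the percentage of trials on which they chose
# the highest-ranked arrowhead. The sample below is illustrative only.
import math

def one_sample_t(values, mu):
    """Return (t statistic, degrees of freedom) for H0: mean == mu."""
    n = len(values)
    mean = sum(values) / n
    var = sum((v - mean) ** 2 for v in values) / (n - 1)  # sample variance
    t = (mean - mu) / math.sqrt(var / n)
    return t, n - 1

# Toy sample: five participants' top-choice percentages vs 20% chance.
t, df = one_sample_t([70.0, 65.0, 75.0, 60.0, 80.0], mu=20.0)
print(round(t, 2), df)  # 14.14 4
```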

Next, we tested if the strategic decision-making in the View 1 Model condition improved payoff-biased copying, explaining the enhanced CCE in the larger 6-Person groups. Copying was assessed by measuring arrowhead similarity (see Methods). This was tested across the 3-Person Social Learning conditions and then tested across the 6-Person Social Learning conditions. For the 3-Person groups there was an effect of Arrowhead Rank (β = −0.04, t = 8.13, p < 0.001), but no statistical evidence of a difference between the Social Learning conditions (p = 0.793). Consistent with payoff-biased copying, participants preferentially copied the more successful arrowheads from their group and were similarly able to do so in the different 3-Person Social Learning conditions. For the 6-Person groups there was an effect of Condition, Arrowhead Rank and a Condition by Arrowhead Rank interaction (all ps < 0.001). While participants in each condition preferentially copied the higher-payoff arrowheads from their group, payoff-biased copying was stronger in the View 1 Model condition (β = −0.032) compared to the View All Models social learning condition (β = −0.021) (ps < 0.001). Note also that payoff-biased copying led group members to converge on particular arrowhead designs, as reflected by the higher-than-chance similarity across designs regardless of rank (see Fig. 4B).

Fig. 4

(A) Choice to view the highest to lowest performing arrowhead (ranked from 1st to 5th) in the 3- and 6-Person groups of the View 1 Model condition (3-Person groups in green and 6-Person groups in orange). The dot points indicate the mean percentage of views for each participant (averaged across trials). The solid straight line is the linear model fit and the shaded area is the bootstrapped 95% CI. (B) Similarity to the highest to lowest performing arrowheads (ranked from 1st to 6th) in the 3- and 6-Person groups of the Social Learning conditions: View All Models and View 1 Model (3-Person groups in green and 6-Person groups in orange). The dot points indicate the arrowhead similarity score for each participant (averaged across trials). The solid straight line is the linear model fit, and the shaded area is the bootstrapped 95% CI. The gray horizontal dashed line indicates chance arrowhead similarity.

We have shown that attention filtering in the View 1 Model condition enhances CCE in larger populations and increases the accuracy of payoff-biased copying. To link these two findings in a two-step causal mechanism, we tested whether more accurate payoff-biased copying (similarity to the highest-ranked arrowhead) predicted higher arrowhead performance. This analysis pooled data across trials 2–10 and group sizes and included all the experimental conditions. We found a significant main effect of Payoff-biased Copying (β = 152.02, t = 2.30, p = 0.022) and an interaction between Payoff-biased Copying and Condition. In each condition, more accurate payoff-biased copying was associated with higher arrowhead performance. This association was substantially stronger in the View All Models (β = 791.26) and the View 1 Model (β = 674.37) conditions than in the Individual Learning condition (β = 138.44; see Fig. 5).

Fig. 5

The relationship between the accuracy of payoff-biased copying and arrowhead performance in the Individual Learning, View All Models and View 1 Model conditions. The dot points indicate the arrowhead performance scores for each participant. The dark gray straight line is the linear model fit, and the light gray shaded area is the bootstrapped 95% CI.

In summary, social learning and larger populations provided access to better-adapted, high-payoff arrowheads, but also to more poorly adapted, low-payoff arrowheads, highlighting the potential value of attention filtering to CCE in the larger populations. As predicted, attention filtering in the larger 6-Person groups gave rise to a population size benefit (View 1 Model condition), replicating Derex et al.27. Without an attention filtering mechanism, no benefit of population size was observed (View All Models condition), replicating Caldwell and Millen28 and Fay et al.29. These results resolve the mixed findings in the experimental literature by demonstrating the moderating role of attention filtering to enhanced CCE in large populations. The benefit of attention filtering is that it improves participants’ ability to selectively attend to and more accurately copy the highest-payoff solutions produced by a single successful demonstrator. These findings support Henrich’s theoretical model15.

Experiment 2

The extended mind thesis argues that human cognitive processes extend beyond the brain and body to incorporate external tools—such as pen and paper, books, diagrams, smartphones and computers—as integral parts of the cognitive system43 (see also Norman’s concept of cognitive artifacts44, which highlights how external tools can reshape and support cognitive processes). Experiment 2 tests if a cognitive aid—an external record of the arrowheads and their associated performance scores—can enhance CCE in large populations by reducing cognitive demands and supporting more effective social learning. As described earlier, external representations may support CCE in three key ways: (1) by extending working memory capacity, (2) by facilitating attention filtering—the ability to selectively attend to high-payoff solutions—and (3) by enabling solution recombination.

In Experiment 2, all participants viewed each arrowhead produced by the other members of their group. In the View All Models condition, arrowheads were presented sequentially, with no persistent record available when participants created their own arrowheads, mirroring the View All Models condition from Experiment 1. In contrast, the External Record condition also presented arrowheads sequentially, but saved each one, along with its performance score, to an on-screen visual record that remained accessible during subsequent design trials. This manipulation allowed participants to offload information to an external aid, potentially improving their ability to compare solutions, filter based on performance, and recombine elements. If an external record enhances CCE in large populations, then a population size effect should be observed in the External Record condition—but not in the View All Models condition.

Methods

The study received approval from the University of Western Australia Ethics Committee on 4 February 2020 (Project Identifier: 2019/RA/4/20/6004). Participants viewed an information sheet before giving informed consent to take part in the study. All methods were conducted in accordance with the National Statement on Ethical Conduct in Human Research (NHMRC/ARC/Universities Australia) and the Declaration of Helsinki.

Participants

A convenience sample of 654 undergraduate students from the University of Western Australia participated in exchange for partial course credit. Of these, 644 participants consented to their data being used for research purposes (reported here; 462 self-identified as women, 170 as men, 8 as non-binary, and 4 preferred not to provide a gender). Participant ages ranged from 17 to 57 years old (M = 22.47, SD = 6.25). Participants were randomly assigned to groups/conditions (see Table 2).

Table 2 The number of groups and participants in each condition of experiment 2.

Procedure

The same arrowhead task was used as in Experiment 1. The Individual Learning and View All Models conditions were the same as in Experiment 1. The External Record condition was the same as the View All Models condition (i.e., arrowheads were viewed in a predetermined random order), except that an “OBSERVED ARROWHEADS” box was added to the bottom of the screen. During the Observation phase of each trial, images of the arrowheads viewed by the participant were added to the box, along with their scores and width values. These arrowheads remained in the box during the subsequent Production phase (see Fig. 1D), providing the participant with a reference of the arrowheads they had just viewed. The box was then emptied for the next Observation phase.

Experiment 2 results

Experiment 2 followed the same analytic approach taken in Experiment 1.

Prerequisites for large populations to enhance cumulative culture

Experiment 2 returned the same pattern of results as Experiment 1. Artifact variation was higher in the 6-Person groups than in the 3-Person groups (see Fig. 6A). The maximum arrowhead performance scores were higher in the 6-Person groups than in the 3-Person groups, and higher in the 3-Person groups than in the Individual Learning condition (ps < 0.003; see Fig. 6B); artifact variation and the maximum artifact performance scores were positively correlated (r = 0.74; Fig. 6C). The minimum arrowhead performance scores were lower in the 6-Person groups than in the 3-Person groups, and lower in the 3-Person groups than in the Individual Learning condition (ps < 0.013; Fig. 6D); artifact variation and the minimum artifact performance scores were negatively correlated (r = − 0.60; Fig. 6E).

Again, the greater variation afforded by larger populations was a double-edged sword: it provided greater access to better-adapted high-payoff artifacts, but also to more poorly adapted, low-payoff artifacts.
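The variation and performance summaries above (Fig. 6A–E) are straightforward to compute from raw group scores. The sketch below, using made-up trial-1 scores rather than the actual data, shows how the variation score (maximum minus minimum) and its correlations with the group maxima and minima are obtained:

```python
import numpy as np

def group_summary(scores_by_group):
    """For each group's arrowhead performance scores, return the
    variation score (max minus min), the maximum, and the minimum."""
    variation, maxima, minima = [], [], []
    for scores in scores_by_group:
        scores = np.asarray(scores, dtype=float)
        variation.append(scores.max() - scores.min())
        maxima.append(scores.max())
        minima.append(scores.min())
    return np.array(variation), np.array(maxima), np.array(minima)

# Illustrative (made-up) trial-1 scores for three 3-Person groups
groups = [[520, 610, 455], [700, 690, 300], [480, 505, 495]]
var, mx, mn = group_summary(groups)

# Pearson correlations across groups, as plotted in Fig. 6C/E:
# variation is positively related to the maxima and negatively
# related to the minima.
r_max = np.corrcoef(var, mx)[0, 1]
r_min = np.corrcoef(var, mn)[0, 1]
```

This makes the double-edged-sword pattern concrete: the same spread that raises a group's best score also lowers its worst.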

Fig. 6

The data reported are from trial 1, i.e., prior to social learning. This ensured that the Individual Learning and Social Learning conditions were comparable. (A) The variation in arrowhead performance (maximum minus minimum performance scores) in the 3- and 6-Person Social Learning conditions. The coloured bars indicate the mean variation score for each condition, and the dot points indicate the arrowhead variation score for each group. Error bars are the bootstrapped 95% CIs. Only one arrowhead was produced in the Individual Learning condition, so there was no variation score to report. (B) The maximum arrowhead performance scores and (D) the minimum arrowhead performance scores in the Individual Learning and the 3- and 6-Person Social Learning conditions. The coloured bars indicate the mean maximum/minimum arrowhead performance score for each condition, and the dot points indicate the maximum/minimum arrowhead performance score for each group. Error bars are the bootstrapped 95% CIs. The correlation between the variation in arrowhead performance scores and (C) the maximum arrowhead performance scores and (E) the minimum arrowhead performance scores in the 3- and 6-Person Social Learning conditions. The dark gray straight line is the linear model fit, and the light gray shaded area is the bootstrapped 95% CI.

Does social learning outperform individual learning and does an external record enhance cumulative culture in large populations?

We first tested if social learning outperformed individual learning (tested separately for the 3- and 6-Person groups). We then compared performance across the Social Learning conditions to test if an external record enhanced CCE in the large populations.

In the Individual Learning and the 3-Person Social Learning conditions, there was an effect of Trial (β = 17.93, t = 4.07, p < 0.001) and Condition. The effect of Trial indicates that arrowhead performance improved over trials in all conditions. The effect of Condition is due to the higher task performance in the Social Learning conditions compared to the Individual Learning condition (View All Models, β = 178.21, t = 4.83, p < 0.001; External Record, β = 223.91, t = 6.01, p < 0.001). There was no statistical evidence of a performance difference between the 3-Person Social Learning conditions (p = 0.162). In the Individual Learning and the 6-Person Social Learning conditions, there was an effect of Trial (β = 12.82, t = 3.96, p < 0.001), Condition and a Trial by Condition interaction. The effect of Condition is due to the higher task performance in the Social Learning conditions compared to the Individual Learning condition (ps < 0.002), and the higher task performance in the External Record condition compared to the View All Models condition (β = 64.85, t = 2.91, p = 0.004). The Trial by Condition interaction is due to the improvement in task performance over trials in the External Record condition (β = 12.82, t = 3.86, p < 0.001) and the absence of a performance improvement in the View All Models condition (p = 0.421).
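The analyses above fit linear models with Trial, Condition, and their interaction as predictors. As a simplified illustration (ordinary least squares with a single 0/1 condition contrast and simulated scores, omitting the group-level structure of the actual models), the Trial by Condition design can be sketched as:

```python
import numpy as np

def fit_trial_by_condition(trial, cond, score):
    """OLS for score ~ trial * condition, with cond coded 0/1
    (e.g., View All Models = 0, External Record = 1).
    Returns [intercept, trial slope, condition effect, interaction]."""
    X = np.column_stack([np.ones_like(trial, dtype=float),
                         trial, cond, trial * cond])
    beta, *_ = np.linalg.lstsq(X, score, rcond=None)
    return beta

# Simulated scores with a known interaction: condition 1 improves
# faster over trials than condition 0 (8 extra points per trial).
trial = np.tile(np.arange(1, 11), 2).astype(float)
cond = np.repeat([0.0, 1.0], 10)
score = 100 + 5 * trial + 20 * cond + 8 * trial * cond
beta = fit_trial_by_condition(trial, cond, score)
```

The last coefficient is the interaction term: a positive value corresponds to the pattern reported above, where performance improved over trials in one condition but not the other.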

Next, we tested the prediction that an external record will enhance cumulative culture in large populations. As predicted, the larger 6-Person groups outperformed the 3-Person groups in the External Record condition (β = 18.78, t = 2.13, p = 0.034). There was no statistical evidence of a population size effect in the View All Models condition (p = 0.178; see Fig. 7).

Fig. 7

Change in arrowhead performance scores over trials in the (A) Individual Learning condition and in the (B) Social Learning conditions (3-Person groups in green and 6-Person groups in orange). The dot points indicate the overall mean arrowhead performance scores for each condition at each trial and the vertical intersecting line indicates the 95% CI. Each thin line indicates the performance score for each participant in the Individual Learning condition or the mean performance score for each group in the Social Learning conditions. The thicker gray/green/orange straight line is the linear model fit, and the light gray/green/orange shaded area is the bootstrapped 95% CI (across trials 2–10). For ease of reference, we included the linear model fit for the Individual Learning condition (gray straight line) within the Social Learning conditions (panel B).

Why does an external record enhance cumulative culture in large populations?

We predict that it does so by making it easier for people to accurately copy the high-payoff artifacts (potentially in addition to providing opportunities for recombination). This was tested across the 3-Person Social Learning conditions and then across the 6-Person Social Learning conditions (see Fig. 8).

For the 3-Person groups there was an effect of Arrowhead Rank (β = −0.05, t = −14.33, p < 0.001). Consistent with payoff-biased copying, participants preferentially copied the more successful arrowheads from their group and were similarly successful in doing so in both Social Learning conditions. For the 6-Person groups there was an effect of Arrowhead Rank, Condition and an Arrowhead Rank by Condition interaction (all ps < 0.001). While participants in each condition preferentially copied the higher-payoff arrowheads from their group, payoff-biased copying was stronger in the External Record condition (β = −0.026) compared to the View All Models condition (β = −0.019). As in Experiment 1, payoff-biased copying also led to within-group convergence (reflected in often higher-than-chance arrowhead similarity regardless of rank).
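The strength of payoff-biased copying in this analysis is captured by the slope of similarity regressed on arrowhead rank: a more negative slope means similarity falls away faster as payoff rank worsens. A minimal sketch, using made-up similarity values rather than the actual data:

```python
import numpy as np

def copying_slope(ranks, similarity):
    """Slope of arrowhead similarity on payoff rank (1 = best).
    A more negative slope indicates stronger payoff-biased copying."""
    slope, _intercept = np.polyfit(ranks, similarity, deg=1)
    return slope

ranks = np.arange(1, 7)
# Hypothetical mean similarity to the rank 1-6 arrowheads
view_all = np.array([0.62, 0.55, 0.52, 0.50, 0.48, 0.47])
external = np.array([0.70, 0.60, 0.54, 0.50, 0.47, 0.45])

slope_view_all = copying_slope(ranks, view_all)
slope_external = copying_slope(ranks, external)
# Both slopes are negative; the steeper (more negative) slope for
# `external` mirrors the stronger payoff bias reported for the
# External Record condition.
```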

Fig. 8

Similarity to the highest to lowest performing arrowheads (ranked 1st to 6th) in the 3- and 6-Person groups of the Social Learning conditions: View All Models and External Record (3-Person groups in green and 6-Person groups in orange). The dot points indicate the arrowhead similarity score for each participant (averaged across trials). The solid straight line is the linear model fit, and the shaded area is the bootstrapped 95% CI. The gray horizontal dashed line indicates chance arrowhead similarity.

As before, we tested whether more accurate payoff-biased copying (similarity to the highest-ranked arrowhead) predicted higher arrowhead performance. This analysis pooled data across trials 2–10 and group sizes and included all experimental conditions. We found a significant main effect of Payoff-biased Copying (β = 406.89, t = 5.17, p < 0.001), Condition and an interaction between Payoff-biased Copying and Condition. In each condition, more accurate payoff-biased copying was associated with higher arrowhead performance. This association was substantially stronger in the View All Models (β = 673.58) and the External Record (β = 560.02) conditions than in the Individual Learning condition (β = 340.82; see Fig. 9).

Fig. 9

The relationship between the accuracy of payoff-biased copying and arrowhead performance in the Individual Learning, View All Models and External Record conditions. The dot points indicate the arrowhead performance scores for each participant. The black straight line is the linear model fit, and the light gray shaded area is the bootstrapped 95% CI.

Discussion

The extent to which large populations enhance cumulative cultural evolution (CCE) is a matter of ongoing debate. In this paper, we report two large-scale experiments that identify the conditions under which large populations can enhance CCE. Experiment 1 demonstrates that attention filtering—the ability to selectively attend to a single high-payoff solution—enables large populations to enhance CCE. Experiment 2 identifies a novel route to enhanced CCE in large populations: the use of external representations that reduce cognitive load, facilitate attention filtering and enable solution recombination.

In Experiment 1, task performance was higher in each of the Social Learning conditions relative to the Individual Learning condition, underscoring the fundamental value of social learning. Importantly, a population size benefit emerged only in the attention filtering condition, where participants could focus on the arrowhead produced by a single successful group member. Attention filtering may commonly occur in situations where people have access to indirect indicators of information quality, such as scientific citation rates45,46 and social media follow counts47, and may serve to reduce cognitive overload and support payoff-biased social learning by focusing attention on high-payoff information. This aligns with broader evidence that constraining information flow—whether through partially connected networks48,49,50 or via dyadic interaction in language evolution studies51,52,53,54—can act as a form of attention filtering, helping large populations more effectively identify, copy, and build upon high-payoff innovations. These results support Henrich’s theoretical model15: previous experiments suggest that payoff-biased copying, which underpins the model’s population size effect, is disrupted by cognitive overload in larger populations; however, our results show that attention filtering enables people to selectively attend to and accurately copy high-payoff solutions in large groups. Therefore, Henrich’s model is plausible when attention filtering is possible, clarifying when larger populations facilitate CCE.

Experiment 2 explored whether an external record—a persistent on-screen display of previous arrowheads and their performance—could also enhance CCE. Human reliance on external representations is pervasive, from ancient cave art dating back 40,000 years55,56 to writing systems developed 5,000 years ago57, and extending to modern digital media. While research has primarily focused on individual58,59 and collective16 contributions to CCE, little attention has been given to how external technologies43—such as notebooks and smartphones—extend our internal cognitive capabilities and influence CCE (but see Hohol et al.60 for historical examples of how external technologies may have promoted cumulative culture). In Experiment 2, task performance was higher in the Social Learning conditions compared to the Individual Learning condition, replicating the findings from Experiment 1, and reinforcing the value of social learning. Importantly, a population size benefit emerged only in the External Record condition. Like attention filtering, the external record allowed participants to more accurately copy the high-payoff solutions produced by a single successful demonstrator. These findings highlight how external representations can scaffold cognitive processes—such as memory, attention, and evaluation—thereby enhancing CCE in larger groups.

In addition to enabling attention filtering, the External Record also created opportunities for solution recombination. Because we had no direct measure of recombination, we conducted an exploratory comparison between the View 1 Model condition (Experiment 1) and the External Record condition (Experiment 2) to examine whether recombination might have contributed to enhanced CCE. Payoff-biased copying was stronger in the 6-Person groups of the View 1 Model condition (β = −0.06) than in the External Record condition (β = −0.03), suggesting that participants in the External Record condition may have attempted recombination (i.e., copying more than just the highest-ranked arrowhead). However, there was no statistical evidence that this enhanced CCE: arrowhead performance improved more strongly over trials in the View 1 Model condition (β = 26.43) than in the External Record condition (β = 14.52). The enhanced CCE observed in the External Record condition of Experiment 2 may primarily reflect the effect of attention filtering rather than recombination, although this interpretation is tentative given the exploratory nature of the analysis and the lack of a direct measure of recombination.

Together, the experiments reported demonstrate that larger populations enhance CCE only under conditions that help people manage informational complexity: either by filtering who they learn from or by externalising the information they need to evaluate and copy. These findings are consistent with the broader research on group performance and decision-making61. As group size increases, coordination problems and cognitive overload can reduce group productivity relative to potential productivity62,63,64. For example, in verbal brainstorming groups, the average number of ideas generated per person declines as group size is increased65. Similarly, coordination problems and cognitive overload hinder casual conversation in groups larger than four members66,67, and cause communication to become more centralized in large decision-making groups68. Yet these limitations can be mitigated. For example, electronic brainstorming tools that allow ideas to be submitted and reviewed asynchronously in a shared workspace—similar to the Experiment 2 External Record condition—enable larger groups to generate more ideas than smaller ones69,70. Our findings suggest that similar mechanisms—selective attention and external records—can mitigate the downsides of large-group cognition in a cultural learning task.

A potential limitation of our experiments is the use of 6-person groups as proxies for ‘large’ populations. Ethnographic studies of hunter-gatherer societies—the primary context in which the relationship between demography and cumulative culture has been studied—typically involve population sizes greater than fifty71. This raises the concern that the findings from our 6-person groups may not generalize to larger populations in real-world contexts. We argue this does not undermine the generalisability of our findings for two reasons. First, the effective population size relevant to cultural transmission often reflects the size of decision-making groups, not total census size. In modern societies, decision-making often occurs in small-to-medium-sized groups, such as juries (6–15 members)72 and committees (4–21 members)73. This may be adaptive, with simulation studies showing that small-to-medium-sized groups outperform larger groups in realistic (complex) task environments74,75. Second, our goal was not to replicate real-world population structures, but to isolate the cognitive and informational conditions under which larger populations enhance CCE. Without attention filtering or an external record, our 6-person groups failed to leverage the benefits of population size. In larger real-world groups, the challenge of cognitive overload would likely be magnified, further increasing the importance of these mechanisms to help learners manage complexity.

Conclusion

Our findings demonstrate that large populations enhance CCE under specific conditions. Experiment 1 shows that attention filtering enables individuals to harness the adaptive potential of large groups by selectively attending to and more accurately copying high-payoff solutions. This helps resolve the mixed findings in the experimental literature17 but it does not fully account for variability in the ethnographic and archeological records23,24,25. We view Experiment 1 as a proof of concept, highlighting a key cognitive mechanism that can support CCE in larger populations. Experiment 2 identifies a complementary and novel route to enhanced CCE: the use of external representations that extend cognitive capabilities. These results align with historical evidence38, demonstrating the value of external tools—from cave art and writing systems to digital media—for supporting cultural accumulation and innovation in large populations.