Introduction

Cultural evolution of human language

There are approximately 7000 languages in the world today, each unique in its phonology, morphology, and grammar. Such diversity is also typical of the biological world. Organisms began as prokaryotes and, through ‘macro-evolution’ (major changes in species occurring over a long period) and ‘micro-evolution’ (small changes in diversity within species), diversified into the many species that exist today. In evolutionary linguistics, it is believed that modern languages diversified through a similar process (Dediu et al., 2013); that is, the language ability could have emerged in the human species through macro-evolution and then given rise to the diverse modern languages of the world through micro-evolution. The former is called ‘biological evolution’, and the latter is called ‘cultural evolution’. This paper focuses on the cultural evolution of language, especially grammar.

The study of cultural evolution has dealt with various subjects, including institutions, morality, and religion (Mesoudi, 2011). Unlike the evolution of biological species, most of these phenomena are not visible in traits. It is often said that ‘language does not fossilise’ (Hauser et al., 2002), and the cultural evolution of language has therefore been particularly difficult to study. However, progress in computational models, analytical methods, and laboratory experiments has advanced research on the cultural evolution of language (Blythe, 2012; Kirby et al., 2008, 2014). In addition, the appearance of big data and information technology has allowed us to quantify the dynamics of the cultural evolution of language (Lieberman et al., 2007), as has already been done for the evolution of biological species. Notably, Newberry et al. (2017) analysed the historical changes in the inflexions of verbs and the auxiliary verb DO in English, demonstrating that these changes are caused not only by natural selection but also by random drift (neutral evolution).

In this paper, we focus on a specific case of the cultural evolution of grammar—auxiliary verb selection in the evolution of the English perfect. Although this phenomenon was previously documented with the analysis of small corpora, we quantify it by using three large-scale data sources, providing insights into the evolutionary dynamics of English grammar.

Auxiliary verb selection and evolution of the perfect

It is widely recognised that, in most cases, the auxiliary verb BE was replaced by HAVE throughout the evolution of the English perfect. This phenomenon is referred to as ‘auxiliary selection’.

First, we review previous studies of the perfect in historical linguistics. In many languages, notably Indo-European languages, the auxiliary verb BE or HAVE is used to construct the perfect (Ackema and Sorace, 2017; Aranovich, 2007; Sorace, 2000). In such cases, certain restrictions govern which auxiliary verb is chosen. In general, HAVE is chosen for transitive verbs, while BE is chosen for intransitive verbs (see (1)).

(1)

a. Ria heeft de schuur geverfd. (Dutch)
   Ria has the shed painted.
   ‘Ria has painted the shed.’

b. Onze nieuwe piano is eindelijk gearriveerd. (Dutch)
   our new piano is finally arrived
   ‘Our new piano has finally arrived’

(Ackema and Sorace, 2017, p. 2)

However, in the case of intransitive verbs, BE is selected for the unaccusative verb, which assigns a theme role to an underlying object (see (2)), while HAVE is selected for the unergative verb which assigns an agent role to its subject (see (3)). This phenomenon is referred to as auxiliary selection.

(2)

a. Ma sœur est arrivée/*a arrivé en retard. (French)
   my sister is arrived/has arrived late

b. Der Zug ist/*hat spät angekommen. (German)
   the train is/has late arrived

(Sorace, 2004, p. 256)

(3)

a. Les ouvriers ont travaillé/*sont travaillés toute la nuit. (French)
   the workmen have worked/are worked whole the night

b. Kurt hat/*ist den ganzen Tag gearbeitet. (German)
   Kurt has/is the whole day worked

(ditto)

In English, however, there has been a substantial change in the auxiliary verb selection during the process of cultural evolution (Huber, 2019; author, 2011). In modern English, only HAVE is used as an auxiliary verb (see (4)), but the perfect use of the auxiliary verb BE was often seen (see (5)) from Old English through Late Modern English. It is recognised today that the number of such perfect variations using the verb BE decreased markedly in the 19th century (Rydén and Brorström, 1987).

(4)

a. John has/*is eaten pizza.

b. John has/*is worked for an hour.

c. John has/*is arrived

(McFadden, 2007, p. 675)

(5)

a. oþþæt wintra bið þusend urnen
   until winters(GEN) is thousand run
   ‘until a thousand years have passed’
   (Phoen 363; Denison, 1993, p. 359)

b. Whanne he escaped was
   (Chaucer, CT.Mk. VII.2735; ditto)

c. yet Benedicke was such another; and now is he become a man.
   (Shakespeare, Ado III.iv.86; ditto)

To analyse the frequency and structural changes of the HAVE and BE perfect, McFadden and Alexiadou (2010) used corpora of Old English through Early Modern English (the York-Toronto-Helsinki Parsed Corpus of Old English Prose (YCOE), the Penn-Helsinki Parsed Corpus of Middle English (PPCME2), and the Penn-Helsinki Parsed Corpus of Early Modern English (PPCEME)), and McFadden (2017) used a corpus spanning Early Modern to Modern English (PPCMBE). These corpora are relatively small: YCOE (1.5 M), PPCME2 (1.2 M), PPCEME (1.7 M), and PPCMBE (1.4 M). These analyses showed that the total frequency of the HAVE perfect increased in Late Middle English (1420–1500), whereas that of the BE perfect did not decrease until around 1700 and then decreased in the 19th century (1810–1861).

Previous studies succeeded in identifying some of the factors that influenced the increase of the HAVE perfect, which include linguistic factors such as past counterfactual modality, iterative, durative and atelic meanings, as well as telic eventualities and extralinguistic factors such as chronology and text type (author, 2011; McFadden, 2017; McFadden and Alexiadou, 2010; Rydén and Brorström, 1987). Here, we add quantitative evidence of directional forces for grammar evolution, providing detailed dynamics of each single verb (i.e., evolutionary speed and trajectory) based on computational analysis of large-scale data sources.

With reference to these studies, we aimed to assess whether this change was caused by natural selection or random drift using a large-scale dataset. We combined three large-scale data sources (as explained in the next section) to detect evolutionary forces from systematically selected verbs in the English perfect using a robust method. The results complement previous findings and provide insights into the cultural evolution of English grammar.

Data and methods

Data

Table 1 summarises the large-scale data sources used for this study. The first is Early English Books Online (EEBO), a historical corpus that contains printed matter related to Britain published between the 15th and 17th centuries (Lodge, 2017). EEBO contains texts spanning various genres, such as literature, history, philosophy, science, politics, law, and economy. Given a query, the EEBO database returns matched sentences with the relevant information, including the corresponding document IDs, pages, and years, from which we can compute the frequency of matched cases (counts).

Table 1 Three data sources used in this study.

The second is the Corpus of Historical American English (COHA), comprising American English texts in fiction, popular magazines, newspapers, and non-fiction (books) spanning the 19th to 21st centuries (Davies, 2010). COHA is one of the largest structured historical corpora of American English, and it supports queries that return matched sentences with their time stamps. As with EEBO, we can compute the frequency of matched cases (counts) from these results.

The third is N-gram data from the Google Books Ngram Viewer site (English, 2012 version; Google, 2012), which we use to incorporate data spanning the 18th to 21st centuries and thereby fill the period that EEBO and COHA do not cover. In this paper, we simply refer to this as Google Books data. Using web scraping, we searched the site and collected data for both British and American English publications between 1700 and 2000, as EEBO covers British English for 1473–1700 and COHA covers American English for 1810–2009.

It should be noted that the Google Books Ngram Viewer data we collected do not carry Part-of-Speech (POS) tags; given a simple query, the site returns only the relative frequency of words/phrases (not raw counts), without the matched source texts. Thus, we need to adjust the frequencies obtained from the Google Books N-gram search based on those from the two other corpora, a step we call ‘scaling’ and explain later.

Intransitive verbs

As the be+PP (past participle) construction is used not only in the perfect but also in the passive, it is difficult for computer programs to automatically distinguish between these in large-scale corpora (EEBO and COHA) and especially in big data without grammatical tags (Google Books). Therefore, in this study, we focused only on ‘intransitive verbs’ as targets to accurately detect the perfect construction of be+PP. This is because intransitive verbs are not used in the passive, in principle. Due to this restriction, we cannot cover diverse expressions of the perfect, but instead, we can accurately count the occurrences of the perfect in the datasets. To select intransitive verbs, we used the Longman Dictionary of Contemporary English (LDOCE) Online (Education), which is a standard dictionary of the English language.

Selection of target verbs

To quantify the cultural evolution of the English perfect, we systematically selected intransitive verbs that appeared in all three datasets and had a high frequency within them. In this study, we used two groups of target verbs: (A) verbs selected based on the frequency and (B) verbs that did not meet the conditions in (A) but were used in prior studies, as shown next.

Verb group selected based on frequency

The Corpus of Contemporary American English (COCA; 400M tokens between 1990 and 2019) provides a full list of the most frequent contemporary words (n = 60,000), and we used this list as a starting point.

From this list, we filtered all verbs (Step 1) and then filtered intransitive verbs using LDOCE Online mentioned in the section “Intransitive verbs” (Step 2). As shown in Table 2, limiting the analysis to intransitive verbs decreases the number of target verbs, but it is a necessary procedure. This is because most English verbs exhibit properties of both intransitive and transitive verbs (van Gelderen, 2018).

Table 2 Steps for the frequency-based verb selection.

Then, from the filtered intransitive verbs, we selected only those that appeared more than 200 times in each of EEBO, COHA, and Google Books (Step 3). This is the strictest condition for our quantification. We also examined a milder condition (30 times in each dataset), as described later. We set these values based on our pre-investigation results: (1) no target verbs remain when the threshold is larger than 200, so we set it as the upper bound; (2) when the threshold is below 30, the computed frequencies of less popular verbs around the threshold are unreliable, so we set it as the lower bound.

Finally, we selected only those intransitive verbs with a relative be+PP frequency of 0.5 or greater in our oldest corpus, EEBO. This eliminates intransitive verbs for which the have+PP form was already dominant in earlier English. This process aims to ensure that, for all the target verbs, the evolution of the perfect began in the period covered by EEBO, COHA, and Google Books.

After these steps, we obtained 13 intransitive verbs that were in the be+PP form in earlier English (EEBO) and sufficiently frequent in the three data sources (1473–2000).
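The last two filtering steps can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name, the nested dictionary layout, and the toy numbers are hypothetical and are not taken from the study's code or data, and the "0.5 or greater" criterion is read here as the share of be+PP among be+PP and have+PP matches in EEBO.

```python
def select_target_verbs(counts, min_count=200, min_be_share=0.5):
    """Return verbs that pass the frequency filter and the EEBO be+PP filter.

    counts maps verb -> {corpus name -> {"be": int, "have": int}}
    (a hypothetical layout chosen for this sketch).
    """
    selected = []
    for verb, by_corpus in counts.items():
        # Step 3: the verb must occur more than min_count times in every corpus.
        totals = [c["be"] + c["have"] for c in by_corpus.values()]
        if any(total <= min_count for total in totals):
            continue
        # Final step: be+PP must make up at least min_be_share of the EEBO matches.
        eebo = by_corpus["EEBO"]
        be_share = eebo["be"] / (eebo["be"] + eebo["have"])
        if be_share >= min_be_share:
            selected.append(verb)
    return sorted(selected)


# Made-up toy counts, for illustration only.
toy_counts = {
    "verb_a": {"EEBO": {"be": 300, "have": 100},
               "COHA": {"be": 20, "have": 900},
               "GoogleBooks": {"be": 150, "have": 700}},
    # have+PP already dominant in EEBO, so this verb is dropped.
    "verb_b": {"EEBO": {"be": 50, "have": 400},
               "COHA": {"be": 5, "have": 800},
               "GoogleBooks": {"be": 30, "have": 600}},
    # Too rare in EEBO, so this verb is dropped.
    "verb_c": {"EEBO": {"be": 90, "have": 30},
               "COHA": {"be": 10, "have": 500},
               "GoogleBooks": {"be": 40, "have": 400}},
}

print(select_target_verbs(toy_counts))
```

Passing `min_count=30` would correspond to the milder condition mentioned above.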

Verb group selected based on prior studies

In addition to Group A, we investigated additional example verbs. Among the intransitive verbs excluded by the frequency-based method used for Group A, those already considered in prior studies were examined separately from Group A. Among verbs listed in LDOCE Online with both intransitive and transitive usage, prior studies (author, 2011; Rydén and Brorström, 1987) indicated that six verbs were frequently used in the perfect with BE (see Group B in Table 3). Therefore, we analysed them separately from Group A and expected to find evolutionary trajectories different from those of Group A, which may give us additional insights into the cultural evolution of grammar.

Table 3 Target verbs.

Detection of evolutionary forces

Unidirectional selection, in which new forms replace older forms over generations, often (but not always) produces an S-shaped growth curve when viewed as a change in allele frequency in a population (Blythe and Croft, 2012), and such a curve is often accepted as evidence of a directional force favouring one variant over others. The historical changes in the selection of auxiliary verbs be/have+PP can also be expressed mathematically within the same framework. For detecting evolutionary forces, there are two established methods, the Frequency Increment Test (FIT) (Feder and Kryazhimskiy, 2014; Newberry et al., 2017) and a neural network-based classification (TSC) (Karsdorp et al., 2020). Both methods are based on the Wright–Fisher model (Ewens, 2012), which describes the drift dynamics of two competing types in a population of fixed size N, and use it as the null model.
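To make the null model concrete, the following is a minimal sketch of a two-variant Wright–Fisher simulation. The function name and parameters are illustrative assumptions, and the selection coefficient `s` uses one common textbook parameterisation, not necessarily the one used by the FIT or the neural TSC. With `s = 0` the process is pure drift; with `s > 0` the new variant is favoured and tends to follow an S-shaped trajectory.

```python
import random


def wright_fisher(n_generations, pop_size, p0, s=0.0, seed=None):
    """Simulate the frequency of one of two variants in a fixed-size population.

    Returns a list of frequencies of length n_generations + 1, starting at p0.
    Assumes s > -1.
    """
    rng = random.Random(seed)
    p = p0
    freqs = [p]
    for _ in range(n_generations):
        # Selection re-weights the focal variant by (1 + s); s = 0 leaves p unchanged.
        p_sel = (1 + s) * p / ((1 + s) * p + (1 - p))
        # Drift: each of the pop_size individuals independently copies the
        # focal variant with probability p_sel (binomial resampling).
        k = sum(rng.random() < p_sel for _ in range(pop_size))
        p = k / pop_size
        freqs.append(p)
    return freqs


drift_run = wright_fisher(200, 1000, 0.05, s=0.0, seed=1)
selection_run = wright_fisher(200, 1000, 0.05, s=0.05, seed=1)
```

Comparing many such drift and selection runs is the kind of simulated material on which a classifier can be trained to tell the two regimes apart.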

The FIT is a method for detecting selection in time series data and has been applied to population genetics experiments and historic DNA samples (Feder and Kryazhimskiy, 2014). The FIT rejects ‘random drift’ (the null hypothesis) when the distribution of normalised allele frequency increments has a mean that deviates significantly from zero, which suggests the possibility of evolutionary forces or ‘selection’. However, the FIT has several drawbacks as reported in the literature (Karjus et al., 2020): it can be sensitive to how the corpus/dataset is organised into temporal segments (i.e., binning), and it assumes the normality of the data, which real data often violate.
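As a rough illustration of the test statistic, the sketch below computes the normalised frequency increments and the corresponding one-sample t statistic, following the rescaling described by Feder and Kryazhimskiy (2014). The function name is hypothetical, and the p-value step is left out: it would compare the statistic against a Student t distribution with the returned degrees of freedom.

```python
import math
import statistics


def fit_t_statistic(freqs, times):
    """Return (t statistic, degrees of freedom) for the frequency increment test.

    freqs: relative frequencies of the new variant, one per time bin.
    times: the time assigned to each bin, strictly increasing.
    All frequencies except the last must lie strictly between 0 and 1.
    """
    increments = []
    for i in range(1, len(freqs)):
        v_prev = freqs[i - 1]
        if not 0.0 < v_prev < 1.0:
            raise ValueError("frequencies must lie strictly between 0 and 1")
        dt = times[i] - times[i - 1]
        # Rescale each increment so that, under pure drift, the increments are
        # approximately normal with mean zero and equal variance.
        increments.append((freqs[i] - v_prev) / math.sqrt(2 * v_prev * (1 - v_prev) * dt))
    mean = statistics.mean(increments)
    sd = statistics.stdev(increments)
    t_stat = mean / (sd / math.sqrt(len(increments)))
    return t_stat, len(increments) - 1
```

A steadily rising series gives a positive statistic and a steadily falling one a negative statistic; values near zero are compatible with drift.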

To resolve these problems, Karsdorp et al. (2020) proposed a deep neural network model in which the problem of detecting evolutionary forces is formulated as a binary classification task for a given time series. This is called the neural time series classification (TSC). The model was trained on time series of cultural change simulated by the Wright–Fisher model. They demonstrated that the neural TSC could resolve the problems associated with the FIT mentioned above: it is robust to the choice of binning method, and the normality assumption on frequency increments does not play a role in this model. At the same time, the neural TSC can consistently and accurately distinguish time series produced by random drift from time series subject to selection pressure.

Therefore, in this study, we used the neural TSC library (Karsdorp, 2020) to detect evolutionary forces underlying the transition from be+PP to have+PP. In addition, for reference, we also used the FIT in our datasets. However, as explained later (and in Supporting Material), our post-hoc power analysis shows that the data size is insufficient to apply the FIT.

Data processing

Figure 1 shows schematic illustrations of our data processing for detecting evolutionary forces. As explained in the section “Selection of target verbs”, we restricted our search queries to the basic form of be/have+PP in EEBO, COHA, and Google Books to prevent false positives (i.e., be+PP for the passive) and to accurately compute the frequencies of matches across time for each target verb. Since the spellings of (auxiliary) verbs have varied over history, we referred to the forms found in EEBO and compiled lists (Tables S1–S3 in Supplementary Material) to perform a comprehensive search of be/have and the past participle of each verb.

Fig. 1: Data processing.

A Construction of the frequency time series for a target verb using three data sources, in which the years represent the time range covered by each corpus in our analysis. B Binning method. The data size per bin should be comparable.

When merging the search results of be/have+PP from the three datasets, it should be kept in mind that, for EEBO and COHA, we can retrieve the patterns that match the be/have+PP construction, along with the year in which the sentence appeared (see Fig. 1A upper middle table). By aggregating these data, we can construct a frequency time series for a given target verb (raw counts of occurrences per year). As mentioned before, with the Google Books Ngram Viewer, we can only obtain the percentages of matches adjusted by year (relative frequencies of occurrences per year) without the matched patterns. Furthermore, as Google Books is more than three orders of magnitude larger than the other two corpora (see Table 1), we must properly adjust the search results by ‘scaling’.
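The aggregation of matched sentences into a per-year count series could look like the following sketch. The record format, pairs of a year and the matched auxiliary, is an assumption made for illustration; the actual search output of EEBO and COHA contains more fields.

```python
from collections import Counter


def yearly_counts(matches):
    """Aggregate matches into {year: (be_count, have_count)}, ordered by year.

    matches: iterable of (year, aux) pairs with aux either "be" or "have"
    (a simplified, hypothetical record format).
    """
    by_year = {}
    for year, aux in matches:
        by_year.setdefault(year, Counter())[aux] += 1
    # A Counter returns 0 for an auxiliary that was never seen in a given year.
    return {year: (c["be"], c["have"]) for year, c in sorted(by_year.items())}
```

For example, `yearly_counts([(1601, "be"), (1601, "have"), (1601, "be"), (1650, "have")])` gives two `be` matches and one `have` match for 1601, and a single `have` match for 1650.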

Google Books and COHA have an overlapping period between 1810 and 2000. We therefore took advantage of this overlap to scale the search results from Google Books so that the resulting frequencies roughly matched those from COHA. Specifically, for the years 1810–2000, we computed the average frequency of be/have+PP over all verbs in COHA (\(f_C\)) and the corresponding average frequency in Google Books (\(f_G\)) to estimate a scaling constant \(C = f_C/f_G\). The Google Books search results multiplied by C were used as data to complement the period not covered by EEBO and COHA (i.e., 1700–1810). More specifically, we reconstructed the Google Books search results in the same form as those from EEBO and COHA (like the one in Fig. 1A upper middle) based on the scaled frequency. No scaling was applied to EEBO as the corpus size was comparable to COHA, and there was no overlapping period between these corpora.
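The scaling step amounts to a single ratio of averages. The sketch below uses hypothetical function names and assumes that the two input sequences hold the COHA and Google Books frequencies for the overlapping years, pooled over all verbs; how exactly the averages were pooled is not specified in the text.

```python
def scaling_constant(coha_freqs, google_freqs):
    """Estimate C = f_C / f_G from frequencies in the overlapping period."""
    f_c = sum(coha_freqs) / len(coha_freqs)      # average COHA frequency
    f_g = sum(google_freqs) / len(google_freqs)  # average Google Books frequency
    return f_c / f_g


def scale_google(google_by_year, c):
    """Multiply each Google Books relative frequency by the scaling constant."""
    return {year: c * value for year, value in google_by_year.items()}
```

The scaled values can then be treated like the per-year counts obtained from EEBO and COHA when the three sources are merged.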

Figure 1B illustrates the binning for constructing relative frequency time series of be/have+PP, which used the same setting reported in Newberry et al. (2017), where the bin size was set to \(\log N\) (N being the total counts). Then, we split the data so that each bin had approximately the same data size, which is a necessary treatment because the neural TSC (and the FIT) assume that frequencies change within a population of fixed size (i.e., the Wright–Fisher model (Ewens, 2012) as the null model). The median of the years in each bin was used as the time for the bin.
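One way to implement this binning is sketched below. It rests on an assumption: the \(\log N\) setting is read here as the number of bins (rounded up, natural logarithm), with tokens then split into contiguous, roughly equal-sized bins. The function name and the token format are hypothetical.

```python
import math
import statistics


def bin_time_series(tokens):
    """Bin tokens into about log(N) equal-sized bins.

    tokens: non-empty list of (year, aux) pairs with aux either "be" or "have".
    Returns a list of (median_year, have_share) tuples, one per bin.
    """
    tokens = sorted(tokens)
    n = len(tokens)
    # Assumed reading of the log N setting: the number of bins, at least one.
    n_bins = max(1, math.ceil(math.log(n)))
    series = []
    for b in range(n_bins):
        # Contiguous slices whose sizes differ by at most one token.
        chunk = tokens[b * n // n_bins:(b + 1) * n // n_bins]
        years = [year for year, _ in chunk]
        have = sum(1 for _, aux in chunk if aux == "have")
        series.append((statistics.median(years), have / len(chunk)))
    return series
```

Each output pair gives the time assigned to a bin and the relative frequency of have+PP within it, which is the form of input expected by a time series test or classifier.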

The frequency time series of be/have+PP are shown in Figs. S1 and S2 in Supplementary Material. For all verbs in both Groups A and B, the trend of frequency increase during the period of overlap between Google Books and COHA (1810–2000) is consistent, and the frequency at the border between EEBO and Google Books is mostly consistent. This indicates that the Google Books results can properly complement the other two corpora through scaling and binning. Although we are aware of small gaps between 1700 and 1750 in the combined time series, these may not considerably affect the results of the neural TSC, because the frequency time series were smoothed by binning and the inflection point appears to lie after these gaps, at about 1800 according to the literature (McFadden, 2017; McFadden and Alexiadou, 2010).

Results

Figure 2 shows the historical changes in the frequency of be/have+PP constructions for the 13 verbs in Group A that were selected based on frequency. For all verbs except bound, there was a clear increase in frequency from be+PP to have+PP. While be+PP was dominant with most verbs before 1600, there was a sharp increase in the frequency of have+PP between 1750 and 1800. This result clearly shows that the form of have+PP became dominant during the cultural evolution of grammar.

Fig. 2: Historical frequency changes of be/have+PP for the 13 verbs in Group A.

There was a noticeable rise in frequency from be+PP to have+PP.

Furthermore, we performed the same analysis on the six intransitive verbs from Group B. These verbs were analysed in prior work but were excluded by the selection criteria of Group A. As shown in Fig. 3, evolutionary trends similar to those in Group A can be seen; that is, be+PP transitions to have+PP. We also note that the frequency increase begins earlier for some verbs (‘escape’) and later for others (‘descend’), suggesting individual differences between verbs.

Fig. 3: Historical frequency change of be/have+PP for the six verbs in Group B.

There was a noticeable rise in frequency from be+PP to have+PP.

Were the frequency changes of be/have+PP due to random drift or directional forces at play? The previous studies (McFadden, 2017; McFadden and Alexiadou, 2010) did not address this question quantitatively. To examine this, we applied the neural TSC to all the verbs in Groups A and B. Table 4 summarises the results, where a verb is deemed to be under ‘selection’ if the probability is above 0.5. Of the 19 verbs, 17 were classified as ‘selection’. The verbs classified as random drift were ‘bound’ from Group A and ‘go’ from Group B. This is visually confirmed by Figs. 2 and 3; these verbs did not exhibit an apparent increasing trend or a tipping point in the historical frequency changes of be/have+PP. It is likely that these verbs were not classified as ‘selection’ because the increase in have+PP was suppressed by the presence of the be+PP passive usage (e.g., ‘England is bounded on the north by Scotland.’, where ‘is bounded’ is passive) or the adjectival usage of the PP (e.g., ‘His money was gone and his health broken.’, where ‘gone’ can be regarded as an adjective). Note that ‘bound’ is categorised as an intransitive verb by LDOCE.

Table 4 Neural TSC result for Groups A and B.

For reference, we also conducted the FIT for our datasets. The result is shown in Table S4 in the Supplementary material. Our post-hoc power analysis revealed that the estimated power d < 0.8 for all verbs, indicating that the data size was insufficient for this statistical test (Cohen, 2013), even though we used the largest data sources. Therefore, we did not use the FIT results for discussion.

Discussion

Evolutionary forces in the cultural evolution of grammar

In modern English, the perfect takes the form of have+PP, and this grammatical rule applies to most verbs. However, in earlier English, there were verbs that formed the perfect with be+PP. As mentioned earlier, prior studies have addressed this phenomenon at the aggregated level; on average, the frequency of the HAVE perfect started increasing in Late Middle English (1420–1500), and that of the BE perfect started decreasing in the 19th century (1810–1861) (McFadden, 2017; McFadden and Alexiadou, 2010). Similar trends can be observed in Figs. 2 and 3.

In this study, we quantified evolutionary forces by applying a robust neural network-based classification (the neural TSC) to the large-scale data sources: EEBO, COHA, and Google Books with scaling. We found that most verbs in Groups A and B clearly exhibited an increase in the frequency of have+PP, among which 17 verbs were classified as ‘selection’ by the neural TSC. The two exceptional cases can be explained by the existence of the be+PP passive usage or the adjectival usage of the PP, which can suppress the increase in the frequency of have+PP and could cause the neural TSC to fail in classification. Given that most verbs in Groups A and B exhibited substantial frequency increases in the form of have+PP, it is unlikely that the English perfect evolved through random drift; selection might have played a major role in the evolution of grammar. Although previous studies (described in the section “Auxiliary verb selection and evolution of the perfect”) identified linguistic and extralinguistic factors accounting for the rise of HAVE and the corresponding decline of BE with intransitive verbs, our analysis allows us not only to reconfirm the tipping point of the BE to HAVE transition at an aggregated level but also to provide evolutionary speeds, patterns, and dynamics for each verb.

The role of random drift in language evolution was emphasised in a previous study by Newberry et al. (2017); our findings contrast with that emphasis.

We also repeated the same experiment using a different setting. When the threshold was lowered to expand the selection of target verbs to those appearing at least 30 times in each dataset, comparable results were obtained. Most of the target verbs (33/36) were classified as ‘selection’, and the exceptional verbs (‘meddle’ in addition to ‘bound’ and ‘go’) are all explainable. For example, consider the sentence, ‘This election was meddled with by the Russians.’ Here, ‘meddled’ is passive, although ‘meddle’ is classified as an intransitive verb by LDOCE Online. Thus, this result further supports the conclusion that natural selection was the major force for the transition from be+PP to have+PP.

Limitations

There are limitations and several potential issues to be considered in our study. First, our conclusion is based on the neural TSC that was trained on time series of cultural change simulated by the Wright–Fisher model (as the null model). Thus, if the frequency time series of a verb violated this assumption, the neural TSC could not detect evolutionary forces even if they existed, although the same applies to the other methods, including the FIT. Similar to Newberry et al. (2017), the combined datasets of EEBO, COHA, and Google Books include British and American English in a mixture, which might affect the raw counts of matched search results because the pace at which be/have+PP evolved could be different in British and American English, but the overall tendency holds as we computed the relative frequency of be/have+PP. In addition, there is a ‘bias’ in Google Books resulting from the increasing inclusion of scientific texts (Pechenick et al., 2015). Verbs that are commonly used in scientific contexts differ from those that are prevalent in other contexts, and as a result, the relative frequency of be+PP/have+PP may also vary by genre. Therefore, it is crucial for future research to acknowledge the significance of genre as a factor in the analysis.

Second, we showed that selection may have played a role in the evolution of the English perfect in many verbs. However, the linguistic reasons for the rise of have+PP and the fall of be+PP are excluded from our consideration, although linguistic factors, as well as extralinguistic ones, are relevant to the dissipation of be+PP, as explained in the previous studies. Moreover, it is known that throughout its evolution, the perfect has become capable of expressing not only ‘completion’ and ‘result’, but also ‘experience’ and ‘continuity’. We speculate that the development of such new functions might be related to the shift of auxiliary verbs from BE to HAVE. This may be because functional differentiation between BE and HAVE could be adaptive to avoid expressive confusion. That is, BE is often used for the passive rather than the perfect, and therefore be+PP has always been at risk of being mistaken for the passive. The evolution of the have+PP rule reduces such a risk, allowing for the expression of complex meanings. This is still at the stage of a hypothesis and needs further investigation. Going forward, we should conduct further studies on the relation between the evolution of the perfect and the functional developments within the categories of Aspect and Modality. Moreover, with the same analytical framework, languages other than English should be analysed to investigate the universal dynamics of linguistic diversity resulting from cultural evolution.

Summary

In summary, our study confirmed the findings of prior work (Huber, 2019; author, 2011; McFadden, 2017; McFadden and Alexiadou, 2010; Rydén and Brorström, 1987) using larger data sources and further provided evidence of the possibility of evolutionary forces at work in the transition from be+PP to have+PP for each verb. Whether the evolution of the perfect is due to selection or drift in other languages remains an important future research question. Our study provides detailed descriptions of processes helpful for conducting such attempts in multiple languages, which is essential to developing the theoretical framework of the cultural evolution of grammar.

Data accessibility

The data sources used in this study are available: EEBO (https://www.english-corpora.org/eebo/); COHA (https://www.english-corpora.org/coha/); Google Books (https://books.google.com/ngrams/). The code and process data are available at https://github.com/soramame0518/have_be_pp.