Introduction

Interlanguage is a unique linguistic system that learners develop in the process of second language (L2) acquisition, which reflects features from both their first language (L1) and the target language (Selinker, 1972). In the past two decades, Complex Dynamic System Theory (CDST), which stems from Complex System Sciences (Larsen-Freeman, 1997; Holland, 2006), has gained increasing attention in investigating the complexities and dynamics of L2 development, shedding new light on how human language structures emerge from dynamic language usages (Ellis, Larsen-Freeman (2009a); Chen & Wang, 2023).

Under the CDST framework, previous empirical studies have measured lexical and syntactic complexity indices such as vocabulary richness, mean lengths of language units, and fine-grained phrasal and clausal features (Wolfe-Quintero et al., 1998; Kyle & Crossley, 2018; Chen & Xu, 2019), revealing several critical properties of dynamic interlanguage systems such as non-linearity, phase transition, and self-organization (De Bot et al., 2007; Hiver et al., 2022). However, two major challenges remain: how to measure linguistic features more consistently (as highlighted by Deng et al., 2021) and how to explore interactions between language sub-systems in L2 development. For instance, Verspoor et al. (2012) demonstrated that word length and word frequency remained unchanged across five different proficiency levels, while the Guiraud index (root-Type Token Ratio, root-TTR) showed variations. Additionally, chunks have emerged as one of the most accurate predictors of proficiency level. Therefore, further examinations, including the exploration of new indices, are necessary across various second languages, proficiency levels, and language backgrounds.

Complex network sciences, which emerged at the end of the 20th century, may provide a possible solution. A network is a graph composed of nodes and links, where nodes represent the basic elements of a system, and links represent the interactions between them (Liu, 2008a). Real systems are usually complex network systems with non-trivial topological features that are not found in simple networks (e.g., random networks), such as small-worldness and scale-freeness (Watts & Strogatz, 1998; Barabási & Albert, 1999). Language is a real complex adaptive system (CAS) that can be modeled as a complex network (Cong & Liu, 2014). Linguistic networks at various levels can be seen as language sub-systems that have been constructed and explored by researchers, such as phonological networks, morphological networks, phrasal networks, syntactic networks, and semantic networks (Cong & Liu, 2014; Chen et al., 2018; Dattner et al., 2022).

The complex network model is intimately linked to the complex systems theory, which posits that a system is a group of interconnected and interdependent elements that behave in accordance with a set of rules to constitute a cohesive whole (Backlund, 2000). Unlike the traditional connectionism paradigm in cognitive sciences (Williams, 2006), the complex network approach (also known as Cognitive Network Science in cognitive sciences) can not only capture the connections between relevant properties of the language system and human cognition (Sizemore et al. 2018) but also provide practical tools and methods to process and visualize large-scale materials and data (Mehler et al., 2016). For example, the complex network approach has found extensive applications in diverse areas such as authorship attributions (Machicao et al., 2018), stylometry analyses (Quispe et al., 2021), and investigations into language complexity (Cong & Liu, 2014).

Moreover, in recent years, the complex network approach has gained significant popularity in the field of language development and acquisition, particularly in psycholinguistics (Castro & Siew, 2020; Ciaglia et al., 2023). In L2 studies, the network approach has been applied to analyze syntactic structures based on dependency treebanks, demonstrating its ability to differentiate between L2 learners of different proficiency levels. Recent data-driven studies have used network analyses to investigate overall syntactic development in L2 learners, providing new insights into applied linguistics research. For example, Jiang et al. (2019) applied a syntactic network approach and discovered that English L2 learning is characterized by a gradual approximation to the target language rather than a sudden emergence as in L1 development. In another study, Hao et al. (2021) found varying regularities in the development of a syntactic network between interlanguage and native language acquisition, highlighting the network parameters’ ability to differentiate interlanguage modalities and indicate L2 proficiency in L2 Chinese.

However, the syntactic network approach particularly relies on dependency-treebank resources, which require a lot of manual work to ensure parsing accuracy, making it difficult to use frequently to monitor learners’ interlanguage development (especially in the Chinese context). Additionally, previous studies have shown that the annotation scheme used can significantly affect the results of dependency measures (Yan & Liu, 2022).

Given that complex linguistic structures can emerge from the co-occurrence of linguistic units in actual language use (Bybee & Hopper, 2001; Cong & Liu, 2021), this paper adopts a simpler word co-occurrence network approach to measure L2 Chinese development. Word co-occurrences, or bi-gram words, are increasingly being recognized as an important feature for distinguishing L2 learners’ proficiencies under the phraseological complexity construct in recent years (Arnon & Christiansen, 2017; Garner et al., 2019; Vandeweerd et al., 2022). Moreover, unlike phraseological complexity indices that need a reference corpus, the word co-occurrence network approach does not require extra resources.

Previous studies consistently highlight the reliability of bigrams or word co-occurrences as predictors of L2 proficiency, with higher proficiency levels being associated with more established and conventionalized word associations, as evidenced by indices such as TTR-bigram (Arnon & Christiansen, 2017; Bestgen & Granger, 2014; Durrant & Schmitt, 2009; Kyle et al., 2015). Furthermore, the predominant storage of linguistic information in the form of multi-word expressions or chunks/n-grams has led researchers to extensively investigate their role in L2 research, emphasizing the significance of rapid chunking of language input for language proficiency levels (Verspoor et al., 2012; Ellis, 2014; Christiansen & Chater, 2016; Saito, 2020). Notably, Liu (2008b)’s study based on 20 languages revealed that approximately 50% of syntactic dependencies consist of adjacent words, specifically bigrams, further highlighting the relevance of bigrams or word co-occurrences in capturing syntagmatic associations. These associations play a critical role in linguistic pattern recognition and language learning within the usage-based perspective of language development (Ellis et al., 2016; Schmid, 2020).

In addition, language processing models based on spreading activation, a fundamental concept in cognitive psychology (Collins & Loftus, 1975), necessitate the representation of language as a network (Hudson, 2007). Consequently, language can be viewed as a cognitive network (Cong & Liu, 2014). Linguistic networks based on word associations have the capacity to model various levels of language system architecture (Liu & Cong, 2014). Word co-occurrence networks, constructed using bigrams, have proven valuable in capturing the intrinsic relationships among numerous words within a relatively short period, providing an advantageous approach to revising and extending theories of lexical acquisition (Meara, 2009; Chen, 2021).

To illustrate the advantages of the bigram-network approach, consider the following example, a dialog between a native speaker (NS) and a non-NS (NNS):

NS: Are you a cat person or a dog person?

NNS: I’m a dog person. I really like dogs.

Traditional indices based on word frequency treat the text as a collection of isolated words, disregarding the meaningful relationships between words in their specific order. For both the NS and NNS texts, the TTR value is 0.7778. However, by analyzing bigrams, which capture adjacent word pairs, we can uncover the syntagmatic relationships between words. In the given example, bigrams like “cat person” and “dog person” are deeply entrenched in the mental lexicon of NSs and hold significant importance in language communication and language learning. These bigrams can exemplify the alignment or priming effects observed between the NS and NNS, as evidenced by the NNS’s repetition of the bigram “dog person”. Such alignment effects are known to have significance in L2 learning (Wang & Wang, 2015), suggesting that the NNS is attuned to and influenced by the language patterns and choices of the NS. The presence of alignment or priming effects can play a crucial role in facilitating language acquisition and the development of linguistic proficiency in L2 learners (Chen, 2021).

Furthermore, the lexical network approach offers a more comprehensive understanding beyond the analysis of individual bigrams. In the aforementioned example, both the NS and the NNS produced texts with TTR-bigram values of 1, indicating no apparent differences at first glance. However, upon closer examination of their respective lexical networks, distinct patterns emerge, as evident from network parameters such as node degrees and network density. Notably, in the NS’s lexical network, the shortest path length between “cat” and “dog” is 2, suggesting that the lexical network provides insights into the practical usage patterns of words in contexts, going beyond mere sequential information in sentences.

Therefore, this study aims to investigate the effectiveness of the lexical network approach in differentiating L2 Chinese proficiency levels, comparing it with traditional lexical complexity indices. It is worth noting that research on the complexity of L2 Chinese based on CDST has only emerged in recent years, with relatively limited studies in this area. Hu (2021) found that traditional syntactic complexity measures based on clauses and T-units have limited explanatory power for writing performance, while collocation-based phrase-level complexity measures can effectively predict writing proficiency. Additionally, studies on traditional lexical complexity indices have demonstrated their relevance under specific proficiency levels, registers, or language backgrounds, including measures of lexical diversity (Uber index, Guiraud index) and lexical sophistication (average word length (AWL), low-frequency words, lexical profile) (Wu et al., 2019; Qi & Liao, 2019; Chen & Xu, 2019; Liao, 2021). Specifically, this study will carefully select lexical network indices to compare with those lexical-complexity indices (including TTR-bigram and lexical density (LD), refer to Section “Network parameters and lexical complexity indices”). This study aims to address the following three research questions:

RQ 1: Are L2 Chinese lexical networks complex networks?

RQ 2: Can lexical network parameters distinguish different proficiency levels in L2 Chinese?

RQ 3: Are lexical network parameters more effective than traditional linguistic complexity indices?

Materials and methods

The corpora profile

We use the Guangwai-Lancaster Chinese Learner Corpus (GLCLC), a 1.2 million-word balanced L2 learner corpus of Mandarin ChineseFootnote 1, as our data source. This corpus consists of spoken and written texts produced by 1492 Chinse L2 learners of three proficiency levels (i.e., beginner, intermediate, and advanced levels) from different native language backgrounds.

As shown in Table 1, we chose to study L2 learners from two different language-family backgrounds: Thai and Romance languages (including French and Spanish)Footnote 2. We also included native Chinese texts from the Lancaster Corpus of Mandarin ChineseFootnote 3 for comparison. Since text genres can influence the quantitative results, we chose narrative texts from the written sub-corpus, with topics on my life, my story, or a memory trip. To ensure comparability between different language proficiencies, each sub-corpus consists of about 10,000 Chinese characters. Notably, in the subsequent analyses, we designate Thai/Romance 1, 2, and 3 to represent beginner, intermediate, and advanced levels, respectively.

Table 1 Profile of the sub-corpora chosen from the Guangwai–Lancaster Chinese Learner Corpus.

Network parameters and lexical complexity indices

After selecting learner texts, we used NlPIR-ICTCLASFootnote 4 to segment words and annotate part of speech. The network parameters were calculated using PajekFootnote 5. To compare the classification effect of network parameters, we also calculated lexical complexity indices using CTAPFootnote 6. All network parameters and lexical complexity indices are briefly introduced in the following:

Network parameters

The network parameters that we are most interested in are those closely related to language proficiencies (see Jiang et al., 2019; Hao et al., 2021; Chen, 2021), including clustering coefficient (C), average path length (<l>), and clusters (cl), etc. For comparison, we also investigated some common network parameters, such as network size (n), number of edges (m), network density (ND), and network centrality (NC). Overall, we calculated a total of seven network parameters.

Figure 1 depicts two bigram networks constructed based on the dialog example provided in the Introduction section:

NS: Are you a cat person or a dog person?

NNS: I’m a dog person. I really like dogs.

Fig. 1: Examples of lexical networks.
figure 1

The left plot represents the lexical network of the native speaker, and the right plot represents the lexical network of the non-native speaker.

In Fig. 1, the left plot represents the bigram network constructed from the text produced by the NS, while the right plot represents the bigram network from the text produced by the NNS. It is important to note that all the words in the network have been lemmatized to ensure consistent representation and analysis. However, it should be mentioned that this lemmatization process is not applicable to Chinese words, as they do not undergo inflectional changes in the same way as many other languages.

Network size (n) refers to the number of nodes in the network, which represents the words (types) in this context. In Fig. 1, both the NS’s and NNS’s networks have seven nodes.

The number of edges (m) represents the connections between the nodes, specifically the number of distinct bigram types. In Fig. 1, the NS’s network has 8 edges, while the NNS’s network has seven edges.

ND is a measure ranging from 0 to 1 that indicates the proportion of realized connections out of all possible connections in the network. For an undirected network with n nodes and m edges, the density is calculated as 2m/(n * (n − 1)). It provides insights into the overall density of the network. In language networks, sparsity is commonly observed, indicating a low probability of connections between random words. In this case, the ND values for the NS and NNS networks are 0.3809 and 0.3333, respectively. It seems that the ND value decreases in L2 Chinese syntactic development (see Hao et al., 2021).

The centrality of a node provides insights into its significance within the network. In this study, we focus on the most commonly used centrality measure, degree centrality, which represents the number of connections a node has. In Fig. 1, the node “person” has degrees of 3 and 1 in the NS and NNS networks, respectively. The centrality of the entire network is referred to as network centralization (NC), and it is roughly calculated as (Kmax/n) − ND, where Kmax represents the largest degree in the network. Degree centralization specifically indicates the strength of hubs or central nodes within the network. In language networks, degree centralization provides information about the prominence of nodes with high degrees. A higher degree of centralization indicates stronger combinational abilities of these nodes, whereas a lower degree of centralization suggests weaker combinational abilities. The NC value shows a growing trend in L2 syntactic development (see Hao et al., 2021).

The clustering coefficient (C) is a measure of how likely it is that nodes in a network form clusters or groups. It quantifies the probability that two neighbors of a node are also connected. For a node i with ki neighbors, the maximum number of edges between these neighbors is ki(ki − 1)/2. If Ei is the actual number of edges between these neighbors, then the clustering coefficient Ci of node i is 2Ei/ki(ki − 1). The clustering coefficient C of the whole network is the average of Ci for all nodes. A study based on written Chinese lexical networks reveals that the C value decreases as the language develops (Chen et al., 2018), which is consistent with the findings in L2 syntactic development (see Hao et al., 2021).

The average path length <l> measures the mean of the shortest distances between any two nodes in the network. For a pair of nodes i and j, the distance Lij is the number of edges in the shortest path connecting them. In Fig. 1, both the NS and NNS networks have a distance of 2 between the nodes “a” and “person”. It is important to note that, in general, the average path length values are typically not greater than 6. For instance, Chen et al. (2018) revealed an average path length of 4.746 for modern Chinese. This suggests that, on average, it requires a relatively small number of steps or connections to navigate between words within the language network.

The average path length is a valuable measure for assessing the efficiency and interconnectedness of a network. Smaller average path lengths signify higher connectivity and accessibility between nodes, facilitating efficient information flow and communication within the network. However, it should be noted that as the language system becomes more complex, the <l> value seems to increase to some extent, as evidenced by Chen et al. (2018). This implies that in more intricate lexical networks, the average path length may be slightly larger, indicating a potentially increased number of steps needed to navigate between words or concepts.

If a network has a small average path length <l>(<l>–<lrnd>), and a large clustering coefficient (C), then we can call it a Small-World network. However, this definition is rather qualitative because it is difficult to decide whether the observed value is small or large. In general, it requires comparing the observed <l> and (C) to that of a stochastic network. The stochastic network model used in this paper is the ER model proposed by Erdos & Rényi (1960). Humphries & Gurney (2008) defined a quantitative parameter, the S-Small-worldness, for measuring the Small World feature. The formula is as follows:

$$S = \frac{{C/C_{rnd}}}{{\langle{l}\rangle/\langle{l}_{rnd}\rangle}}$$

In this formula, C denotes the network clustering coefficient, Crnd represents the clustering coefficient of the ER random network with the same number of nodes and the same average degree; <l> denotes the average minimum path, and <lrnd>represents the average path length of the ER random network. When the S is greater than 1, the network can be considered as a small-world network. Moreover, if the network is also a scale-free network (as revealed by the power-law distribution of the degrees of nodes), it can be called a complex network (Chen et al., 2018).

A cluster in a network refers to a group of nodes that are highly connected to each other within the group, forming a cohesive substructure. Clustering is the process of identifying and delineating these groups or communities within a network. The size of a cluster, represented by the cluster value (Cl), can indicate the extent of interconnectivity and cohesion within the linguistic network. In the context of lexical networks, a larger cluster value typically suggests a richer linguistic conceptual system. This is because a larger cluster implies that there are more tightly-knit groups of words or concepts that are interconnected and share strong associations within the network. These clusters represent cohesive semantic or syntactic domains, indicating a greater depth and complexity in the organization and structure of language.

Lexical complexity indices

The Uber index (U), proposed by Dugast (1978), is commonly used to measure lexical diversity or variation in L2 development studies. It is computed as follows:

$$U = \left( {\frac{{\left( {\rm{log}}\,{\rm{Tokens}} \right)^2}}{{{\rm{log}}\,{\rm{Tokens}}} - {{\rm{log}}\,{\rm{Types}}}}} \right)$$

The Guiraud index, another commonly used index to assess the lexical richness or diversity of a text (Guiraud, 1954), is calculated by dividing the number of unique words (types) in the text by the square root of the total number of words (tokens):

$${Guiraud} = \frac{\rm{Types}}{{\sqrt {\rm{Tokens}} }}$$

The Guiraud index takes into account the impact of text length on lexical diversity. By dividing the number of unique words by the square root of the total words, it provides a normalized measure that allows for fair comparisons across texts of different lengths. A higher Guiraud index indicates a greater lexical diversity, suggesting a wider range of vocabulary used in the text.

AWL is the number of syllables per word in a text. It is obtained by dividing the total number of syllables by the total number of words in the text. AWL can indicate the sophistication of a text.

LD measures the linguistic sophistication of a text based on the proportion of content words (lexical units) and function words (grammatical units). The formula for calculating the LD is to divide the number of lexical items by the total number of words in the text.

A hapax legomena (HL) is a word that appears only once in a text or corpus. HL refers to the count of such words in a text or corpus.

Long word percentage (LWP) is the ratio of words with three or more syllables to the total number of words in a text or corpus.

The type-token ratio of bigrams (TTR-Bigram), or simply the ratio of the number of unique bigrams to the total number of bigrams in a text or corpus, is used to measure the diversity or variation of bigrams within the text.

Statistical analysis

We used hierarchical clustering with SPSS 20 to test whether the indices described above can differentiate L2 learners’ language proficiency levels. The dendrograms based on Euclidean distance show the levels of similarity at which the clusters were formed.

We also computed the information gain of each index to assess how well a feature predicts language proficiency. Information gain is the amount of information obtained about the target variable (proficiency) when a feature (network parameter or lexical complexity index) is known. It measures the reduction in uncertainty about the target variable after splitting the data by the feature (see Witten & Frank, 2005). This reduction is quantified by comparing the entropy of the target variable before and after the split. The difference in entropy is the information gain.

Here is the formula for computing information gain (taking the index of AWL as an example):

$${{{\mathrm{IG}}}}\left( {{{\mathrm{X}}}} \right) = {{{\mathrm{H}}}}\left( {{{\mathrm{Y}}}} \right) - {{{\mathrm{H}}}}\left( {{{{\mathrm{Y}}}}\left| {{{\mathrm{X}}}} \right.} \right)$$

where IG(AWL) is the information gain of feature AWL, H(Y) is the entropy of the target variable (language proficiency), and H(Y | AWL) is the conditional entropy of the target variable given feature AWL (i.e., the entropy of language proficiency after splitting the data based on the values of the Average Word Length index).

The information gain tells us how much information about the target variable (language proficiency) is gained by knowing the value of the Average Word Length index. A higher information gain indicates that the Average Word Length index is more useful in predicting language proficiency.

Results

We report the results of sub-corpora classification based on the lexical network parameters and lexical complexity indices. Further analysis was also conducted using information gain measures of each predictive variable.

The Lexical network approach results

We constructed and measured the seven word-co-occurrence networks with Pajek. Figure 2 presents the lexical network plots, with weighted nodes based on degree centralization and colored node groups based on degree partitions.

Fig. 2: Lexical networks based on the learner corpora.
figure 2

(Thai/Romance 1, 2, and 3 correspond to beginner, intermediate, and advanced levels, respectively).

As depicted in Fig. 2, the lexical networks of all seven sub-corpora, similar to the syntactic networks of L2 learners (Jiang et al., 2019), exhibit characteristics of complex networks, which differ significantly from the tree-like networks observed in young children (less than 2 years old), as described by Corominas-Murtra et al. (2009). Additionally, it appears that higher-level networks tend to have fewer hub words. To verify this, we conducted a count of words with a degree of at least 100 for each network. The results confirm this observation to some extent, with Thai 1 to 3 containing 8, 7, and 7 such words and Romance 1 to 3 containing 9, 8, and 7 such words. In contrast, Native Chinese possess only 6 such words.

Furthermore, the S-Small-worldness values in Table 2 corroborate that all seven networks in Fig. 2 are small-world networks. As these networks also exhibit scale-free characteristics (as evidenced by the power law distributions of node degrees), they can be classified as complex networks. Specific network parameters are detailed in Table 2.

Table 2 Network parameter results.

The seven network parameters in Table 2 reveal some patterns regarding language proficiency level. As learners’ proficiency level increases, they produce more nodes, edges, and lexical community clusters (as evidenced by n, m, and Cl). At the same time, the global network expands (as indicated by average path length), and the local network becomes spare (as indicated by network density, network centralization, and clustering coefficient). It should be noted that corpus size can significantly influence these network parameter patterns. Since the corpus sizes are all around 10,000 Chinese characters in this study, we may infer that as L2 learners’ language proficiency increases, their vocabulary increases at the same time, then their bi-gram types also increase. However, despite the overall expansion of the network, the local network becomes sparse due to the limitations imposed by the corpus size, leading to fewer local connections between words. As the network grows, the immediate connections between words within local neighborhoods become insufficient to sustain high local network density and clustering, ultimately causing a decrease in these measures. This suggests that the network’s local structure becomes less interconnected and clustered as the language proficiency of L2 Chinese learners increases, possibly due to the limited size of the analyzed corpus.

The above analysis suggests that the lexical network parameters can be used to classify L2 Chinese learners of different proficiencies. Figure 3 shows the results of hierarchical clustering with standardized lexical parameter values.

Fig. 3: Hierarchical clustering results based on lexical network parameters.
figure 3

(Thai/Romance 1, 2, and 3 correspond to beginner, intermediate, and advanced levels, respectively).

Figure 3 shows that the clustering results are good, with each proficiency level clustered together and finally grouped with Native Chinese. To determine which parameter variable contributes most to the hierarchical clustering, we calculated the information gain of each variable. The results are shown in Table 3.

Table 3 Information gains of the network-based indices.

Based on the information gain values in Table 3, network size (n), number of edges (m), ND, and clustering coefficient (C) all contribute significantly to the clustering, performing better than NC and average path length (<l>). As for clusters (Cl), it contributes the least to the clustering.

The lexical complexity approach results

To compare the classification effect of network-based indices with traditional indices, we measured seven lexical complexity indices, including chunk-based indices, i.e., TTR-bigram, as displayed in Table 4.

Table 4 Values based on the Lexical complexity indices.

As indicated in Table 4, intermediate-advanced L2 learners demonstrate larger lexical diversity (as indicated by U and Guiraud indices) and bigram diversity (TTR-bigram), along with shorter word length (AWL), and an increased number of Hapax words but a decreased percentage of longer words (longer words percentage) compared to beginners. However, there are no significant changes in lexical density. Moreover, when compared to Native Chinese, intermediate and advanced L2 learners exhibit more similar linguistic features, diverging from beginners.

Based on the seven lexical complexity indices, we used the hierarchical clustering method again to classify the corpora. The clustering results are presented in Fig. 4.

Fig. 4: Hierarchical clustering results based on lexical complexity indices.
figure 4

(Thai/Romance 1, 2, and 3 correspond to beginner, intermediate, and advanced levels, respectively).

The clustering results in Fig. 4 show that only beginners and Native Chinese are well grouped, while intermediate and advanced learners are grouped based on language backgrounds but not language proficiency. This means that the seven lexical complexity indices work well for low-level and extremely high-level learners (Native Chinese) but not for distinguishing intermediate and advanced L2 learners.

To assess which lexical complexity index has stronger predictive power, we also calculated the information gain of each variable, as shown in Table 5.

Table 5 Information gains of the lexical complexity indices.

Based on the information gain values in Table 5, it is evident that lexical diversity (Guiraud) and bigram diversity (TTR-bigram) exhibit the strongest predictive power, while lexical sophistication (as indicated by AWL and LWP) shows a slightly weaker predictive ability. On the other hand, lexical diversity indices U, LD, and HL demonstrate the least predictive power among the analyzed linguistic features.

Discussions and conclusions

Language is a complex system with a probabilistic nature and complexity at the lexical, syntactic, and semantic levels (Cong & Liu, 2014). Traditional linguistic research methods often struggle to capture the intricacies of language systems. However, the complex network approach offers a potential solution. Previous applied-linguistic studies have focused primarily on specific semantic or syntactic structures in L2 complex systems (Ellis, Larsen‐Freeman (2009b); Ellis, 2014; Fogal, 2020). Researchers have found that the syntactic networks of L2 English learners exhibit small-world and scale-free properties similar to those of other complex networks (Jiang et al., 2019). Nevertheless, the lexical networks of L2 Chinese learners have received limited attention in this area of research.

This study explored word co-occurrence networks, or lexical networks, to measure L2 Chinese development, especially lexical development. We compared the clustering results based on lexical network parameters with traditional lexical complexity indices. The findings can be summarized as follows. Firstly, the L2 Chinese lexical networks are typical complex networks. Secondly, the lexical-network-based method can accurately classify the different proficiency levels, i.e., Beginner, Intermediate, and Advanced, as well as Native Chinese. Among the individual network parameters, network size (n), number of edges (m), ND, and clustering coefficient (C) have stronger predictive power than the others according to information gain values. Thirdly, the lexical-complexity-based method can distinguish beginners from intermediate-advanced learners, as well as native Chinese. However, it can hardly distinguish the intermediate from the advanced learners. Seen from the information gain values, lexical diversity indices Guiraud and bigram diversity indices TTR-bigram predict better than the others. Lastly, the classifications based on two different approaches show that the lexical-network-based method performs better than the lexical-complexity-based one. Moreover, it seems that the lexical-network-based method is not influenced by the learners’ native language backgrounds, while the other method may be affected at the intermediate and advanced levels.

Lexical complexity or richness is a multifaceted concept that encompasses lexical diversity, lexical sophistication, lexical density, and lexical errors (Read, 2000). Lexical diversity measures the variety of active vocabulary in a text, while lexical sophistication measures the proportion of relatively unusual or advanced words in a text. Our findings indicate that among the various aspects of lexical richness, the lexical diversity indices, as represented by Guiraud and TTR-bigram, exhibit greater effectiveness compared to the lexical sophistication indices (AWL and LWP) and the LD. These results suggest that measures of lexical diversity provide more valuable insights into the linguistic richness of the text or corpus under consideration. However, based on the lexical network classification results, we also propose that the co-occurrence relationships between words or word associations (as revealed by network density, clustering coefficient, and network cluster, etc.) are more informative than simple lexical frequency features.

Both the lexical-complexity-based approach and the lexical-network-based approach demonstrate that, besides isolated words, bigrams and word associations are crucial linguistic features for distinguishing language proficiency levels. Furthermore, while lexical networks are constructed based on bigrams, the network parameters encompass much deeper information beyond mere sequential combinations. This is evident in the analysis of word clustering and word connections, which provide valuable insights into the underlying structure and organization of the lexical network. Therefore, we argue that the framework of CDST should consider both lexical frequency features and lexical network features to better understand lexical richness and L2 development.

Recently, the multi-layer complex network approach has also risen in language cognition studies (Zaharchuk & Karuza, 2021), e.g., L1 development, L2 development, psycholinguistic relevant studies, etc. Based on these studies, we can see that the multi-layer complex network approach may offer a possible solution to L2 development studies in a more consistent and integrated way. Moreover, since the multi-layer network approach can simultaneously model the interactions between language users as well as between language units, the critical social structure effects in language learning and development (Duff, 2019) can be scrutinized. Taking the dialog between NS and NNS aforementioned for an example, a multi-level system can not only model the dynamic linguistic structures but also capture the interactions between the two speakers, probing into language acquisition mechanisms such as alignment effects (Wang & Wang, 2015; Chen, 2021). Furthermore, we can even analyze the role of generative AI (such as Chat-GPT) in facilitating language learning, extending the effect of data-driven learning.

In conclusion, the complex network approach provides a promising avenue for understanding the complexities of L2 development. Through data-driven analysis, this approach can reveal global and local patterns in the linguistic system of L2 learners, shedding light on the development of specific aspects of language use. Moreover, the emerging multi-layer networks approach can provide an even greater depth of analysis, offering a more comprehensive understanding of the complexities and dynamics of the multi-level language system. Future studies should continue to explore and refine these methods, as they hold great potential for advancing our understanding of L2 development.