Introduction

Task-based language teaching (TBLT), an important field in second language (L2) instruction, has been drawing attention from many researchers. Multiple factors involved in TBLT can affect learners’ L2 task performance, among which the pre-task planning (PTP) factor plays an indispensable role in the improvement of learners’ task performance operationalized as measures of complexity, accuracy, or fluency (CAF). In fact, given the importance of PTP in TBLT, many researchers (e.g., Robinson, 2011; Skehan, 1998; Skehan and Foster, 2001) have regarded PTP as a subcategory of task complexity and contended that manipulating it can promote learners’ language production in terms of CAF measures.

The necessity of exploring PTP-related themes in L2 learning can be demonstrated from both theoretical and pedagogical perspectives. Theoretically, researchers have suggested that L2 learners can benefit from PTP. For example, Kellogg (2008) pointed out that writing is a complex process during which learners consume a large amount of attentional resources to recursively deal with the online planning process, to generate sentences, and to review written texts. Kellogg contended that PTP can be helpful for learners to cope with such a process. Furthermore, Abrams and Byrd (2016) argued that PTP can make linguistic resources and ideas more accessible to learners when writing so that their writing quality is expected to be improved, and such an advantage of PTP can spread throughout different phases of the writing process. In contrast, some researchers have claimed that PTP may not always be beneficial. For instance, Ur (1996) posited that individual differences regarding how to approach a writing task exist in a way that some learners prefer to write immediately and plan during writing, while others prefer to do PTP prior to writing. Similarly, Ortega (1999) pointed out that writing tasks may play a role in determining whether or not PTP is beneficial to learner writing, and the positive effects of PTP emerge when learners are asked to do a challenging task.

Pedagogically, findings on the effects of PTP on learner writing in terms of CAF measures have been mixed: some studies reported that PTP had positive effects (e.g., Abdi Tabari, 2020), others that it had no impact (e.g., Johnson et al., 2012), and still others that it had negative effects (e.g., Ong and Zhang, 2010). Taken together, these conflicting views and research findings underscore the need to explore the relationship between PTP and learner writing. In addition, given the constant influence of socio-cultural theory (Vygotsky, 1978) on the L2 writing domain, researchers (e.g., Amiryousefi, 2017; Kang and Lee, 2019; Li and Zhang, 2022) have begun considering the role of collaboration in PTP and investigating whether such a planning condition could be superior to other conditions. The results of this strand of studies, as reviewed in the next section, were generally mixed. Moreover, to the best of my knowledge, there has been little research examining the effects of combining collaborative PTP with individual PTP on learner writing and whether such effects can be transferred to new pieces of writing. Thus, this study aimed to address this research gap.

Literature review

This section points out the significance of the study by reviewing studies related to pre-task planning and L2 speaking and writing, collaboration in L2 writing, and individual pre-task planning and collaborative pre-task planning. Then two research questions are raised based on the review.

Pre-task planning regarding L2 speaking and writing

According to Kellogg's (1996) overload hypothesis, the process of doing a task can be viewed as a linear sequence, and PTP may help learners complete the task because planning time can reduce the cognitive load they face when they start the task. Based on this hypothesis, many researchers have examined the effects of PTP on learners’ oral performance. Studies of fluency and complexity showed that PTP was conducive to the improvement of L2 learners’ oral performance (e.g., Foster and Skehan, 1996; Ortega, 1999; Yuan and Ellis, 2003); however, studies of accuracy revealed mixed findings, with some reporting positive effects of PTP (e.g., Foster and Skehan, 1996; Tavakoli and Skehan, 2005) and others reporting no significant effects (e.g., Yuan and Ellis, 2003).

In contrast, the effects of PTP on learners’ L2 writing quality have produced a more complicated picture. That is, some studies found that PTP had negative effects on both lexical complexity and fluency measures of learners’ writing (e.g., Ong and Zhang, 2010), while other studies reported positive effects on these two measures (e.g., Abdi Tabari, 2020). For the measure of syntactic complexity, most studies demonstrated positive effects of PTP (e.g., Abdi Tabari, 2021; Abdi Tabari and Wang, 2022; Ellis and Yuan, 2004; Ghavamnia et al., 2013), while some studies indicated no effects (e.g., Johnson, 2011; Rahimi and Zhang, 2018; Rahimpour and Safarie, 2011; Rostamian et al., 2018). The reasons for the mixed findings might be that, according to Ellis (2022), some options involved in the studies were different from each other, such as length of planning time, access to planning notes, use of writing tasks, proficiency level of learners, etc. As a type of individual planning, PTP may have some inherent limitations in improving learners’ writing quality as it does not take into consideration circumstances where learners need to collaborate with each other to optimize their existing linguistic resources so that they can be more likely to achieve better quality.

Collaboration in L2 writing

Since the concept of collaboration was introduced to the L2 writing domain, many researchers have explored its effects on writing quality. There are some potential advantages of collaborative writing that can be echoed by the principles of social constructivism and interaction hypothesis. According to social constructivism, which was proposed by Vygotsky (1978), collaborative writing may be beneficial to student writing in that it can provide students with opportunities to elicit scaffolding from social interaction. Such interaction allows students to co-construct knowledge beyond their individual abilities. Similarly, based on the interaction hypothesis (Long, 1983; Swain, 1985, 1993), collaborative writing may be helpful for student writing since meaning negotiation between students can make it more likely for them to produce writing output with comprehensible language and accurate grammar.

Most studies on collaborative writing were product-oriented with a one-shot design (i.e., learners under either collaborative writing conditions or individual writing conditions were asked to produce a single written text). The results of the one-shot design studies, which compared collaboratively written texts with individually written ones, indicated that the former were more accurate than the latter (e.g., Fernández Dobao, 2012; McDonough et al., 2018a; Wigglesworth and Storch, 2009), with no such positive effects observed in terms of fluency, complexity, or rubric scores. Moreover, some studies even found that individually written texts were better than collaboratively written texts in measures of fluency (Fernández Dobao, 2012) and complexity (McDonough et al., 2018b). In addition, another strand of studies examined whether individually written posttest texts differed between two groups of learners: an experimental group that received a collaborative writing intervention and a control group that wrote individually. The results showed that the collaborative writing intervention helped learners individually compose texts with higher rubric scores than the control condition did (e.g., Alammar, 2017; Hsu and Lo, 2018; Sagban, 2016).

Despite the potential merits of collaborative writing, such as improving learners’ writing accuracy and rubric scores, cultivating learners’ reflective thinking ability (Storch, 2012), and offering them opportunities to use newly acquired knowledge through scaffolded interaction and co-construction (Hirvela, 1999; Swain and Lapkin, 1998), there are drawbacks as well. For example, one issue with one-shot designs in L2 writing instruction is that it is difficult to ensure each learner under collaborative conditions contributes equally to a collaboratively written text and any positive effects generated from producing such a text can still be sustained in an individually written text. Additionally, learners usually need to spend much time producing a collaboratively written text, which is not realistic to implement in tertiary-level L2 writing classrooms where time is limited (Neumann and McDonough, 2015). During the collaborative review stage, learners can hardly benefit from peer interaction due to the domination of product-oriented characteristics of the stage (Storch, 2005). In this case, pre-task planning with collaboration between learners might be a suitable option for L2 writing learners as it seems to preserve the merits of collaborative writing while addressing its drawbacks.

Individual pre-task planning vs. collaborative pre-task planning

Based on the triadic componential framework proposed by Robinson (2005, 2007), ±planning time, a resource-dispersing variable in the category of task complexity, has been drawing much attention from both L2 speaking researchers (see Johnson and Abdi Tabari, 2022) and L2 writing researchers (see Abdi Tabari, 2022). In contrast, ±few participants, a participation structure variable in the category of task condition of the framework, has been receiving insufficient attention. Though both ±planning time and ±few participants can be related to pre-task planning, they differ from each other in the number of participants involved in the planning. The variable of ±few participants has been widely operationalized as individual pre-task planning (IP) and collaborative pre-task planning (CP). Several studies have examined whether the two planning conditions have different effects on English as a Foreign Language (EFL) learners’ writing quality (e.g., Ameri-Golestan and Nezakat-Alhossaini, 2017; Amiryousefi, 2017; Kang and Lee, 2019; Li and Zhang, 2021, 2022; McDonough et al., 2018b; McDonough and De Vleeschauwer, 2019), with mixed findings reported. Specifically, Ameri-Golestan and Nezakat-Alhossaini’s (2017) study demonstrated that there was no significant difference between students under the CP condition and students under the IP condition in writing quality on either the posttest or the delayed posttest. Amiryousefi (2017) investigated the effects of teacher-monitored CP, student-led CP, and IP on CAF measures of student writing. The results revealed that each of the three planning conditions had its own benefits, as each promoted different dimensions of student writing. Li and Zhang (2021) found that the CP condition, operationalized as small-group student talk, led to higher holistic writing quality and greater improvement in content, organization, vocabulary, and language use than the IP condition; however, this positive effect was not found in mechanics.

Similarly, Li and Zhang’s (2022) study demonstrated that students under the CP condition had higher overall quality of argumentation than students under the IP condition, while such an advantage of the CP condition was not found in some dimensions related to argumentation. McDonough et al.’s (2018b) study indicated that students under the CP condition could write more accurate texts that received higher ratings than students under the IP condition. In contrast, McDonough and De Vleeschauwer’s (2019) study showed that IP could help students get higher analytic ratings than CP, whereas CP could help students get higher accuracy than IP. Kang and Lee’s (2019) study revealed that CP was more useful than IP in improving fluency and syntactic complexity of student writing, but not in improving accuracy. Overall, based on interaction hypothesis (Long, 1983) and sociocultural theory (Vygotsky, 1978), CP may be more helpful than IP for the improvement of student writing as collaborative learning activities, such as CP, could help students perform better in tasks imposing high cognitive load than individual learning activities, such as IP (Paas and Sweller, 2012); however, this assumption is not proven in the small number of studies reviewed above since CP was found to be beneficial for only some dimensions of student writing, but not for others. Moreover, there were some studies showing that the possible advantage of CP over IP did not emerge even when complex tasks were involved (e.g., Kang and Lee, 2019). Taken together, the findings of the reviewed studies necessitate further research into the two planning conditions.

Multiple factors may contribute to the inconsistent and, at times, contradictory findings, such as participation structure (i.e., the number of students participating in collaborative planning as a group), student proficiency levels, writing tasks, etc. In addition, since the number of studies examining the effects of CP and IP is relatively small, L2 writing researchers and teachers should be cautious about the results of the studies. More importantly, based on existing research, there has been no study yet exploring the effects of a joint combination of CP and IP on students’ L2 writing quality. Indeed, as Ellis (2022) pointed out, exploring such combination effects can shed more light on the instructional practice of applying planning to L2 writing pedagogy. In addition, some studies (e.g., Kang and Lee, 2019; McDonough et al., 2018a; Xie and Zhu, 2023) examining the effects of CP and IP on student writing only focused on the writing task in which different planning conditions were implemented, without considering the transfer effects of such conditions on a new writing task. According to Long (2015), transfer effects of a task occur when learners’ performance on a certain task can be transferred to other tasks. The instructional value of practicing with a task should be called into question if the transferring effect of the task is not explored (James, 2008). Additionally, according to Kolb’s (1984) Experiential Learning Theory, learners need to deeply conceptualize the task they do and actively experiment with it by doing a new task to optimize what they have learned during the process. For the present study, assigning learners to do the pedagogic writing task may facilitate their understanding of the different planning conditions to make it more likely for them to conceptualize the conditions, and asking them to complete a new writing task may allow them to apply and experiment with what they have learned from the pedagogical task. 
Therefore, examining the transfer effect of tasks in this study holds practical significance. Taken together, the current study aimed to address the aforementioned research gap by investigating the following research questions:

  1. What are the effects of four types of planning conditions, namely CP, IP, IP and CP combined (IPCP), and no planning (NP), on Chinese EFL learners’ L2 writing in terms of complexity, accuracy, and fluency?

  2. Is there a difference between the four groups in terms of transfer effects from the pedagogic task to a new task?

Methods

Design

This study adopted a between-groups quasi-experimental design to address the research questions. This design was used because the main purpose of the study was to examine whether students under the four different planning conditions (i.e., CP, IP, IP and CP, and NP) performed differently in terms of the CAF dimensions of the pedagogic writing task and whether there was any learning transfer from the pedagogic task to a new writing task.

Participants

The participants of this study were 120 EFL freshman students enrolled in a College English course at a university located in southwest China. The students met for two sessions per week, with each session lasting 90 min. Based on the English teaching and learning curriculum approved by the university, two teachers holding master's degrees in English applied linguistics taught the students of the four classes using the same teaching materials, with each of them teaching two classes. The students majored in obstetrics, medical examination, health therapy, and stomatology. All students were between 18 and 21 years old and had learned English for almost 9 years. None of them had any experience of studying abroad or living in English-speaking countries.

To ensure all the students’ English proficiency was comparable at the beginning of the study, their scores on English tests in the National College Entrance Exam (NCEE) were collected. All students must take the NCEE in order to be enrolled in universities, and studies have shown that this test can be used to gauge students’ English proficiency (Cheng and Qi, 2006). Based on the scores, their English proficiency was at the B1 level of the Common European Framework of Reference for Languages. A one-way ANOVA was run to examine their English proficiency after collecting the scores of the test. When the study started, the students were preparing for College English Test-Band 4 (CET-4), which is a nationally standardized English test in China. Passing CET-4 can secure the students a university diploma and enhance their competitiveness in job-hunting after graduation. Meanwhile, as pointed out by Erkan and Saban (2011), the students often have difficulty in improving their writing quality when preparing for the CET-4. Thus, they expressed strong motivation to participate in the study in the hope that they could find effective ways of enhancing their writing quality.
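The comparability check described above can be sketched as follows. This is a minimal illustration of a one-way ANOVA F statistic computed by hand, not the authors' analysis script: the function name one_way_anova and the scores are hypothetical, and no p-value is computed. For four groups and 120 students, the degrees of freedom are 4 - 1 = 3 and 120 - 4 = 116, which is the F(3, 116) form reported in the Results.

```python
from statistics import mean


def one_way_anova(groups):
    """Return (F, df_between, df_within) for a list of score lists."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n_total

    # Between-group sum of squares: group size times the squared
    # deviation of each group mean from the grand mean.
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    # Within-group sum of squares: squared deviations from each group's mean.
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)

    df_between = k - 1
    df_within = n_total - k
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


# Hypothetical entrance-exam scores for four classes (NOT the study's data).
scores = {
    "CP": [101, 108, 97, 112, 104],
    "IP": [99, 105, 110, 96, 103],
    "IPCP": [102, 107, 98, 109, 100],
    "NP": [100, 111, 95, 106, 104],
}

f_stat, df_b, df_w = one_way_anova(list(scores.values()))
print(f"F({df_b}, {df_w}) = {f_stat:.2f}")
```

A small F value relative to the critical value for these degrees of freedom would suggest that the groups did not differ in proficiency at the outset; in practice, a statistics package would also report the p-value.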

Task

The genre of argumentative writing was used as the writing task in this study for two reasons. First, this type of genre is widely adopted in both domestic and international English proficiency tests for Chinese tertiary students (Huang and Zhang, 2020), so it is hoped that asking them to write with the genre could help arouse their learning interest and promote their learning motivation. Second, L2 students’ writing proficiency can be relatively accurately measured through the argumentative writing genre (Teng and Zhang, 2020). To avoid the influence of topic bias on student writing quality, the two writing topics (see Appendix 1) were selected from the database of CET-4 and were familiar to students since they were closely related to their daily lives. Students were offered 30 min to write at least 150 words on each topic. Since there were four groups of students in this study and all of them were under different planning conditions, they were provided different writing task instructions in Chinese prior to doing the pedagogic task based on which group they were in.

Procedure

This study was conducted in EFL freshmen’s regular classes. The students in all groups produced one written text in one session. This study lasted 2 weeks, with four sessions in each week. Thus, this study included eight sessions in total since it asked the students in each of the four groups to produce two texts. After securing the consent of the students and their teachers, the first researcher went into the teachers’ scheduled classes to collect the students’ written texts. The students in this study were divided into four groups according to four different planning conditions: CP, IP, IP and CP, and NP. Each student in the four groups was required to produce two written texts, with one for the pedagogic task and another for the new task. Students in four intact EFL classes were recruited to participate in this study, and each class was randomly assigned one of the four planning conditions. Following previous studies (e.g., Crookes, 1989; Foster and Skehan, 1996; Kang and Lee, 2019), no specific instructions were provided concerning the use of planning time. The students in this study self-selected their partners to collaboratively plan their texts so that the results of the study could be compared to other studies (e.g., Kang and Lee, 2019; McDonough et al., 2018b). Also, in line with previous studies (e.g., Ellis and Yuan, 2004; McDonough et al., 2018b; Ong and Zhang, 2010; Rostamian et al., 2018), the students were given ten minutes of planning time prior to writing.

Specifically, the students under the CP condition were asked to plan with their self-selected partners for ten minutes. During that time, they could discuss the writing topic and take notes, but the notes were taken away before writing to ensure a more precise assessment of their writing quality, since they might otherwise copy language directly from the notes. The discussions were only allowed within a pair. After the students finished their discussions, they were asked to separate and individually complete their texts on paper in 30 min with no access to dictionaries or other relevant materials. The students under the IP condition undertook a similar process to the students under the CP condition. The only difference was that they did not have a partner. The students under the combined IP and CP (IPCP) condition also had 10 min in total to plan, and the 10 min were evenly distributed between the two conditions, with 5 min of IP followed by 5 min of CP. When the students were doing IPCP in class, the teacher provided clear instructions to ensure that they strictly followed the time allocation and planning sequence of the two planning conditions. It should also be noted that the reason for sequencing the two conditions in this order was to maximize the exploration of the instructional value of the IP condition by excluding any possible intervening effects of the CP condition. After all, the students might be likely to use what they had learned under the CP condition to do IP if they were allowed to do CP first. The students under the NP condition were asked to start writing immediately after being presented with the writing topics, and they had 30 min to finish their writing. Moreover, to explore whether there were any transfer effects of the planning conditions to a new writing task, this study asked all the students to complete a second writing task (i.e., a new task) one week after they completed the first task (i.e., a pedagogic task), without any planning time provided.

Measures

In this study, CAF measures were adopted to analyze students’ writing quality. The CAF measures play a crucial role in capturing learners’ performance on a given task (Norris and Ortega, 2009; Skehan, 1998), and researchers in the L2 writing domain have used this triad of measures to gauge students’ writing quality. Because of its multi-faceted nature (Housen et al., 2012), complexity was divided into lexical and syntactic dimensions, each assessed with several metrics. Lexical complexity was gauged in terms of lexical sophistication and lexical diversity, with one metric for each. The first metric was the logarithmic word frequency (log frequency) of all words (LFAW) in the academic section of the Corpus of Contemporary American English (COCA) (Davies, 2008), which was used to reflect lexical sophistication; the second metric was all words hypergeometric distribution diversity (AW HD-D), which was used to reflect lexical diversity. COCA LFAW, calculated with the tool for the automatic analysis of lexical sophistication (TAALES) (Kyle et al., 2018), was chosen because large reference corpora enable researchers to compare the frequency of lexical items in student writing against their frequency in the corpora (Johnson, 2017). A lower lexical log frequency value indicates higher lexical sophistication (Yoon, 2018; Yoon and Polio, 2017). AW HD-D, calculated with the tool for the automatic analysis of lexical diversity (TAALED) (Kyle et al., 2021), was chosen because it minimizes the effect of essay length more effectively than many other diversity metrics (Zenker and Kyle, 2021).
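To make the two lexical metrics more concrete, the sketch below shows one common way such values can be computed. It is an illustration under stated assumptions, not a reproduction of TAALES or TAALED: the function names, the base-10 logarithm, the handling of words missing from the frequency list, the 42-token sample size, and the tiny frequency table are all assumptions made here for demonstration.

```python
from collections import Counter
from math import comb, log10


def mean_log_frequency(tokens, freq_table):
    """Mean log10 reference-corpus frequency of the tokens found in freq_table.

    Assumption: tokens absent from the table are skipped.
    """
    logs = [log10(freq_table[t]) for t in tokens if t in freq_table]
    if not logs:
        raise ValueError("no token was found in the frequency table")
    return sum(logs) / len(logs)


def hdd(tokens, sample_size=42):
    """Hypergeometric distribution diversity (HD-D) in its usual formulation.

    For each word type, take the probability that it occurs at least once
    in a random sample of `sample_size` tokens drawn without replacement,
    then sum these probabilities divided by the sample size.
    """
    n = len(tokens)
    if n < sample_size:
        raise ValueError("text is shorter than the sample size")
    total = 0.0
    for freq in Counter(tokens).values():
        # Probability that the sample contains no instance of this type.
        p_absent = comb(n - freq, sample_size) / comb(n, sample_size)
        total += (1.0 - p_absent) / sample_size
    return total


# Hypothetical frequency table (counts per million; NOT real COCA values).
toy_freq = {"the": 50000, "students": 300, "collaborate": 12}
print(mean_log_frequency(["the", "students", "collaborate"], toy_freq))

# Hypothetical 60-token text with a few repeated words.
toy_text = ("w%d" % (i % 45) for i in range(60))
print(hdd(list(toy_text)))
```

Because HD-D is based on a fixed-size sample, its value depends less on essay length than a raw type-token ratio does, which is the property the metric was selected for.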

This study used Lu’s (2010) L2 syntactic complexity analyzer (L2SCA) to analyze the syntactic complexity of student writing. In line with Norris and Ortega (2009), a total of four indices in the L2SCA were used in this study: (1) the mean length of T-units (MLT), where a T-unit is defined as an independent clause together with any dependent clause(s) attached; (2) dependent clauses per T-unit (DC/T); (3) coordinate phrases per clause (CP/C); and (4) complex nominals per clause (CN/C). These four indices were chosen because researchers (e.g., Housen et al., 2012; Johnson, 2017) have clarified the multiple components of syntactic complexity and have pointed out that the assessment of syntactic complexity should incorporate general, subordination, coordination, and phrasal complexity measures. Specifically, in this study, the general complexity measure corresponds to MLT, the subordination measure to DC/T, the coordination measure to CP/C, and the phrasal measure to CN/C.
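Each of the four indices is a simple ratio of structure counts, which the following sketch makes explicit. It assumes the counts have already been obtained from a parser such as the L2SCA; the class name StructureCounts, the function complexity_indices, and the example counts are hypothetical and are not taken from the study's data.

```python
from dataclasses import dataclass


@dataclass
class StructureCounts:
    """Counts of syntactic structures in one essay (hypothetical container)."""
    words: int
    t_units: int
    clauses: int
    dependent_clauses: int
    coordinate_phrases: int
    complex_nominals: int


def complexity_indices(c):
    """Compute the four syntactic complexity ratios from raw counts."""
    return {
        "MLT": c.words / c.t_units,               # general complexity
        "DC/T": c.dependent_clauses / c.t_units,  # subordination
        "CP/C": c.coordinate_phrases / c.clauses, # coordination
        "CN/C": c.complex_nominals / c.clauses,   # phrasal complexity
    }


# Hypothetical counts for a single essay.
essay = StructureCounts(
    words=168,
    t_units=12,
    clauses=18,
    dependent_clauses=6,
    coordinate_phrases=4,
    complex_nominals=21,
)

for name, value in complexity_indices(essay).items():
    print(f"{name}: {value:.2f}")
```

Note that MLT and DC/T use T-units as the denominator while CP/C and CN/C use clauses, so the four values are not directly comparable with one another.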

Following previous studies (e.g., Karim and Nassaji, 2020; Vasylets and Marín, 2021), this study used an error ratio to investigate students’ writing accuracy. The error ratio is calculated by dividing the number of errors in an essay by the total number of words and multiplying the result by 100. Following Vasylets and Marín’s (2021) study, this study took into account errors of grammar and vocabulary based on Standard English. Errors of capitalization, spelling, and punctuation were not counted (Wigglesworth and Storch, 2009). To ensure inter-rater reliability regarding error coding, the first and the second authors independently hand-coded 10% of the dataset. After obtaining a reliability coefficient above 0.85, the first author hand-coded the remaining data. Finally, fluency was assessed by the total number of words composed by the students within the 30-min time limit.
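The accuracy and fluency measures described above reduce to two short calculations, sketched here. The function names and the example figures are hypothetical, and counting words by splitting on whitespace is an assumption; the study does not specify how words were tokenized.

```python
def error_ratio(error_count, word_count):
    """Errors per 100 words: (errors / total words) * 100."""
    if word_count <= 0:
        raise ValueError("word_count must be positive")
    return error_count / word_count * 100


def fluency(text):
    """Total number of words, counted here by whitespace splitting (assumption)."""
    return len(text.split())


# Hypothetical essay: 9 coded grammar/vocabulary errors in 150 words.
print(round(error_ratio(9, 150), 2))
print(fluency("Planning together before writing can help learners."))
```

Expressing accuracy per 100 words keeps essays of different lengths comparable, and a lower value indicates more accurate writing.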

Results

This section consists of two parts, with the first part reporting the results of the first research question and the second part reporting the results of the second research question.

Research question 1

The first research question asked whether the different planning conditions (CP, IP, IPCP, and NP) had differential effects on the students’ writing quality operationalized as CAF. To answer this question, several one-way ANOVAs were conducted to test whether significant differences existed. Table 1 presents the results for CAF measures. In terms of syntactic complexity, significant differences were found among the planning conditions for MLT [F(3, 116) = 11.44, p < 0.001], DC/T [F(3, 116) = 5.56, p = 0.001], CP/C [F(3, 116) = 40.46, p < 0.001], and CN/C [F(3, 116) = 8.97, p < 0.001]. Tukey’s HSD tests were used to locate the significant differences between the planning conditions. With regard to MLT, the analysis revealed that the mean length of T-units composed by the students under CP (M = 14.10, SD = 1.83) was longer than that of the students under IP (M = 13.13, SD = 2.96), the students under IPCP (M = 12.64, SD = 1.95), and the students under NP (M = 11.67, SD = 2.15). For DC/T, the analysis demonstrated that the students under CP wrote more dependent clauses per T-unit (M = 0.53, SD = 0.14) than the students under IP (M = 0.35, SD = 0.20) and the students under NP (M = 0.39, SD = 0.19). Regarding CP/C, the analysis indicated that the students under CP produced more coordinate phrases per clause (M = 0.52, SD = 0.13) than the students under IP (M = 0.23, SD = 0.13), the students under IPCP (M = 0.23, SD = 0.11), and the students under NP (M = 0.20, SD = 0.14). For CN/C, the students under CP composed more complex nominals per clause (M = 1.46, SD = 0.28) than the students under IP (M = 1.18, SD = 0.32), the students under IPCP (M = 1.14, SD = 0.27), and the students under NP (M = 1.09, SD = 0.33).

Table 1 Descriptive statistics, results of ANOVA and post-hoc Tukey test for CAF measures of Task 1.

With respect to lexical complexity, accuracy, and fluency, one-way ANOVAs did not show any significant differences. Specifically, no significant differences were found in the LFAW [F(3, 116) = 0.54, p = 0.656] and AW HD-D [F(3, 116) = 2.47, p = 0.066] of lexical complexity. For LFAW and AW HD-D, the students under CP had a mean value of 2.96 (SD = 0.10) and a mean value of 0.78 (SD = 0.03). The students under IP had a mean value of 2.99 (SD = 0.10) and a mean value of 0.77 (SD = 0.04). The students under IPCP had a mean value of 2.97 (SD = 0.09) and a mean value of 0.78 (SD = 0.04). The students under NP had a mean value of 2.98 (SD = 0.09) and a mean value of 0.76 (SD = 0.03). Also, no significant differences were found in accuracy measured by error ratio [F(3, 116) = 2.34, p = 0.077] and fluency measured by total number of words [F(3, 116) = 1.53, p = 0.212]. For error ratio and total number of words, the students under CP had a mean value of 6.08 (SD = 3.00) and a mean value of 174.37 (SD = 30.09). The students under IP had a mean value of 5.67 (SD = 3.08) and a mean value of 172.47 (SD = 28.54). The students under IPCP had a mean value of 4.88 (SD = 2.50) and a mean value of 164.73 (SD = 21.91). The students under NP had a mean value of 6.72 (SD = 2.38) and a mean value of 162.03 (SD = 24.17).

Research question 2

The second research question explored whether there was any difference between the four groups in terms of transfer effects from the pedagogic writing task (task 1) to the new writing task (task 2). A series of one-way ANOVAs was performed to answer this question. Table 2 presents the results for CAF measures. Similar to the results of the first research question, significant differences were found among the planning conditions in all four indices of syntactic complexity: MLT [F(3, 116) = 12.33, p < 0.001], DC/T [F(3, 116) = 8.41, p < 0.001], CP/C [F(3, 116) = 6.10, p = 0.001], and CN/C [F(3, 116) = 12.52, p < 0.001]. Tukey’s HSD tests were conducted to locate the significant differences between the planning conditions. With respect to MLT, the analysis indicated that the mean length of T-units produced by the students under CP (M = 14.86, SD = 3.45) was longer than that of the students under IP (M = 12.13, SD = 1.88), the students under IPCP (M = 12.34, SD = 2.12), and the students under NP (M = 11.44, SD = 1.31). For DC/T, the analysis revealed that the students under CP wrote more dependent clauses per T-unit (M = 0.64, SD = 0.27) than the students under IP (M = 0.41, SD = 0.18), the students under IPCP (M = 0.43, SD = 0.18), and the students under NP (M = 0.43, SD = 0.20). For CP/C, the analysis showed that the students under CP produced more coordinate phrases per clause (M = 0.31, SD = 0.07) than the students under IP (M = 0.21, SD = 0.12), the students under IPCP (M = 0.21, SD = 0.12), and the students under NP (M = 0.22, SD = 0.11). For CN/C, the students under CP composed more complex nominals per clause (M = 0.94, SD = 0.25) than the students under IP (M = 0.67, SD = 0.14), the students under IPCP (M = 0.69, SD = 0.13), and the students under NP (M = 0.78, SD = 0.22).

Table 2 Descriptive statistics, results of ANOVA and post-hoc Tukey test for CAF measures of Task 2.

With regard to lexical complexity, accuracy, and fluency, one-way ANOVAs did not demonstrate any significant differences. Specifically, no significant differences were found in the LFAW [F(3, 116) = 2.27, p = 0.084] and AW HD-D [F(3, 116) = 2.36, p = 0.075] of lexical complexity. For LFAW and AW HD-D, the students under CP had a mean value of 2.98 (SD = 0.08) and a mean value of 0.80 (SD = 0.03). The students under IP had a mean value of 3.01 (SD = 0.09) and a mean value of 0.78 (SD = 0.03). The students under IPCP had a mean value of 2.98 (SD = 0.07) and a mean value of 0.80 (SD = 0.03). The students under NP had a mean value of 2.96 (SD = 0.09) and a mean value of 0.80 (SD = 0.03). Also, no significant differences were found in accuracy measured by error ratio [F(3, 116) = 1.74, p = 0.163] and fluency measured by total number of words [F(3, 116) = 0.543, p = 0.654]. For error ratio and total number of words, the students under CP had a mean value of 7.45 (SD = 3.17) and a mean value of 163.40 (SD = 30.36). The students under IP had a mean value of 6.84 (SD = 3.51) and a mean value of 166.23 (SD = 22.35). The students under IPCP had a mean value of 6.92 (SD = 2.64) and a mean value of 171.67 (SD = 25.39). The students under NP had a mean value of 5.66 (SD = 3.12) and a mean value of 166.17 (SD = 24.07).

Discussion

By examining the effects of four planning conditions on students’ writing quality, the study found that the students under the CP condition composed writings with higher syntactic complexity than the students under the IP condition for both the pedagogic task and the new task, which was in line with the findings of some previous studies (e.g., Kang and Lee, 2019). However, the findings were inconsistent with the studies of McDonough and De Vleeschauwer (2019) and McDonough et al. (2018a, 2018b), whose findings revealed no significant difference between CP and IP in syntactic complexity. The reason for the inconsistency might lie in the difference in text types that entailed different levels of task complexity. Specifically, in the studies conducted by McDonough and colleagues, the researchers required students to offer solutions to a social problem, which might be viewed as a less complex task. In contrast, the present study asked students to critically analyze a social phenomenon by presenting its benefits and challenges, which might be seen as a more complex task. According to Zhan et al. (2021), Chinese EFL students tend to produce writings of higher syntactic complexity when they do a more complex writing task. Furthermore, the inconsistent findings also suggest that different genres or text types might play a role in the effects of planning conditions. In addition to text types, another possible explanation for the significant advantage of CP in the syntactic complexity of student writing is the unstructured pre-writing task used in this study. According to some recent studies (e.g., Do, 2023; Ebadijalal and Moradkhani, 2023; Pospelova, 2021), students doing unstructured pre-writing tasks under the CP condition usually pay greater attention to writing content, which allows them to brainstorm ideas for their subsequent writing. With more ideas generated in this way, students are likely to use more subordination to express ideas in texts, leading to higher syntactic complexity (McDonough et al., 2018b).

For lexical complexity, the results of the study were not consistent with the results of Xie and Zhu’s (2023) study, in which IP led to higher lexical complexity than CP. The reason might be that in Xie and Zhu’s study students were required to complete a continuation task, which is a less cognitively complex task than the argumentative task in the present study. Although collaboration might make it more likely for students to share their linguistic knowledge to improve lexical complexity, this likelihood could be offset by students’ investment of attentional resources in transactional activities during CP (Kirschner et al., 2011), particularly when students are doing a less cognitively complex task, such as the one in Xie and Zhu’s study. Furthermore, students in Xie and Zhu’s study could refer to the reading material while they did the continuation task, and they were also encouraged to align linguistically with the material. It is therefore possible that students under the IP condition were more likely to focus on the lexical knowledge in the reading material to improve the lexical complexity of their writings than students under the CP condition, whose focus on lexical knowledge might have been distracted by pair discussions on other aspects of writing, such as content, organization, and task management (McDonough et al., 2018a). Taken together, these factors may explain why students under the IP condition in Xie and Zhu’s study composed the continuation task with higher lexical complexity than students under the CP condition.

In terms of writing accuracy, the findings of the current study were in line with the findings of McDonough et al. (2018b), Kang and Lee (2019), and Xie and Zhu (2023) in that CP did not lead to higher writing accuracy than other planning conditions, such as IP or NP; however, the findings of the current study differed from the findings of Li and Zhang (2021), McDonough et al. (2018a), and McDonough and De Vleeschauwer (2019), which reported that CP was helpful for students in improving their writing accuracy. Several explanations might account for the findings of the present study. First, according to McDonough et al. (2018b), the positive effects of collaboration on accuracy occur only when students collaboratively compose texts rather than when students collaboratively plan for texts prior to composing. In the current study, the students under different planning conditions composed their writings individually, which might offset any potential advantages of CP. For example, the students under the CP condition might have been in a better position to improve their writing accuracy through collaborative processing of linguistic issues; however, asking them to compose their writings individually after planning could have made it more difficult for them to enhance writing accuracy. Second, group size in the CP condition might be one reason that accounts for the different findings between the current study and Li and Zhang’s (2021) study. Specifically, in Li and Zhang’s study, six students were assigned to each group, a larger group size than in the present study, in which each group consisted of only two students. It is possible that a group of six students could gather more knowledge from their linguistic repertoire to improve writing accuracy than could a group of two students. Third, the type of pre-writing tasks used in the studies by McDonough et al. (2018a) and McDonough and De Vleeschauwer (2019) as compared to the present study may also contribute to the divergent findings in students’ writing accuracy. In the two studies conducted by McDonough and colleagues, the researchers adopted a structured pre-writing task in which students were provided with a task handout with guiding questions and explicit instructions to help them generate, select, and organize ideas. This type of task may alleviate students’ cognitive load while writing so that they can focus more on the improvement of their writing accuracy.

In contrast, the present study used an unstructured pre-writing task in which students were simply asked to plan the writing task with their partners by taking notes that would be taken away when they started writing. As McDonough et al. (2018a) claimed, a structured pre-writing task may contribute to students’ writing accuracy through exerting a relatively more direct impact on students’ subsequent writing. Similarly, Kim et al. (2022) found that a structured pre-writing task elicited more episodes from students during CP than did an unstructured pre-writing task, and these episodes were found to predict higher task scores assessed across categories such as content, organization, language, and mechanics. Fourth, the focus of student discussion during CP can also play a role in the different findings regarding writing accuracy. According to some studies investigating EFL students’ talk during CP (e.g., Li et al., 2020; Neumann and McDonough, 2015), students discussed writing content most frequently. The current study also found evidence of this pattern after examining the students’ notes taken during CP. Thus, it can be assumed that students under the CP condition in the current study focused their discussion more on the content of the subsequent writing task, with insufficient discussion of linguistic issues.

For fluency, the results of the study differed from those of Kang and Lee (2019) and Mohammadi et al. (2023). In Kang and Lee’s study, it was found that students under the CP condition composed writings with greater fluency than students under the IP condition. The reason for the divergent findings between Kang and Lee’s study and the current study may be that student participants in their study were 8th graders with limited English proficiency, while student participants in the current study were undergraduates. As posited by Kang and Lee, student participants in their study benefited more from CP with respect to fluency because their limited English proficiency increased the likelihood of collaboration on how to organize their ideas into words. They could, therefore, gain more in fluency from CP than the undergraduate students in the current study. In addition, the divergent findings between Mohammadi et al.’s (2023) study and the current study may be attributable mainly to differences in the design of the pre-writing task and the provision of planning time. In their study, Mohammadi et al. provided students under the CP condition with a worksheet for their planning, and the students were asked to list as many ideas as they could on the worksheet within 15 min of planning time. The study also did not specify whether the students were allowed to keep their worksheets during writing. It is possible that, with both the available worksheet and extended planning time, the students in that study could write more fluently. In contrast, the current study provided students with only 10 min to collaboratively plan their writing without any worksheets, which may explain why CP had no significant effect on fluency.

Conclusion

The purpose of this study was to explore the effects of four types of planning conditions (i.e., CP, IP, IPCP, and NP) on EFL learners’ L2 writing quality and to investigate whether planning conditions could influence learning transfer from the pedagogic task to the new task. The results indicated that the learners under the CP condition produced writings with higher syntactic complexity than the learners under the IP, IPCP, and NP conditions for both tasks. In this section, the limitations of the study are discussed, followed by the implications.

Limitations

This study is not without limitations. First, this study did not investigate whether individual differences such as motivation, self-efficacy, and language aptitude might influence the observed advantage of CP across both the pedagogic task and the new task. Future studies should consider using validated instruments (e.g., questionnaires, interviews) to delve into the potential mediating role of individual differences in the effects of planning conditions. Second, this study was only concerned with the planning implementation regarding participatory structure manipulated as individual and collaborative planning. Future studies could explore the relationships between other implementation options and students’ writing quality. Specifically, options worth exploring include time allocation to planning, focus of planning, access to planning notes, guidance for planning, and language use during planning (Ellis, 2022). For example, it might be interesting to explore the potentially different effects of guided and unguided planning on student writing and students’ perceptions of the two types of planning tasks. Third, this study did not investigate how students under different planning conditions actually used the planning time and how they perceived the roles of planning conditions in learning transfer from the pedagogic task to the new task. Future studies could address these questions through think-aloud protocols and self-report questionnaires, with the former being used to identify students’ cognitive, emotional, and motivational processes while planning and the latter being used to identify students’ perceptions of planning tasks (Panadero, 2023). Finally, the student participants in this study were Chinese EFL students with intermediate proficiency levels, so researchers and L2 writing teachers should be cautious about generalizing the results of this study to students with other proficiency levels or in different instructional contexts.

Implications

This study failed to observe advantages of CP in CAF measures other than syntactic complexity. Like any experimental study in the field, this study has the following implications. First, if students under the CP condition are asked to do an unstructured pre-writing task, teachers should be involved in the task to provide assistance when necessary. Also, the help should be tailored based on teachers’ observations or judgements of students’ individual differences. With teachers’ involvement and tailored assistance, it is expected that students can benefit more from such a task. Second, while implementing CP in L2 writing classes, teachers may want to use writing tasks or genres with which students can take advantage of CP. According to Tavakoli and Rezazadeh (2014), an argumentative writing task might impose more cognitive load on learners than a narrative writing task. This is particularly the case during the CP stage (Franken and Haslett, 2002), as an argumentative task can elicit high content interactivity that requires students under the CP condition to synthesize much discrete information (Sweller, 1994). In such cases, students are likely to feel concerned or even anxious when doing an argumentative task, which might, in turn, offset any potential advantages of CP. Third, pre-task planning, implemented individually or collaboratively, should be used with caution by writing teachers in EFL contexts. Students’ acceptance of and familiarity with planning should be thoroughly examined prior to applying it to writing instruction. In other words, students may not want to include such planning in their writing process if they are used to online planning. Those who are willing to incorporate it into their writing process may need more assistance from teachers, as they might lack experience with it.