Introduction

In the artificial intelligence era, natural language recognition, comprehension, and processing underpin numerous educational technologies, such as machine translation and automated writing evaluation (Huawei and Aryadoust 2023; Li 2023). Used predominantly in language education, automated writing evaluation tools are programs and platforms that score texts and provide feedback, mainly on grammar and vocabulary (Liu et al. 2021). They bring multifarious benefits, improving students’ learning experiences (Yao et al. 2021), enhancing teachers’ and students’ feedback literacy (Li 2023; Link et al. 2022), allowing prompt feedback, and reducing teachers’ workload (Wang et al. 2022). Students receive personalized recommendations from automated writing evaluation tools that enhance their language fluency, learning interests, and motivation (Guo et al. 2022). These tools complement traditional modes like peer assessment in foreign language education (Yao et al. 2021). Automated, teacher, and peer feedback differ in their focuses and influence language learning outcomes differently (Tian and Zhou 2020). Researchers are therefore keenly interested in elucidating the effectiveness of automated writing evaluation tools and encouraging their application in teaching practice. Such studies enrich the literature on assisting foreign language writing and teaching with automated writing evaluation tools (Li 2021).

Research on particular tools, such as Grammarly, Pigai, and Criterion, has advanced these topics more efficiently than research on the general concept of automated writing evaluation (Ding and Zou 2024). Yang et al. (2023), for example, investigated Pigai, a well-known artificial intelligence-empowered automated writing evaluation platform in China, and reported improvements in the grammatical accuracy of students’ English writing. In line with their methods, this study is motivated by the fact that Grammarly is a relatively under-investigated automated writing evaluation tool and that research interest is rising with the popularity of artificial intelligence-empowered learning tools (e.g., Chang et al. 2021; Fitria 2021). Founded in 2009, Grammarly has attracted over 70,000 business teams and 3000 educational institutions, collaborating with numerous stakeholders in the educational and business sectors; it is also embracing artificial intelligence technologies and empowering its automated writing evaluation functions with pioneering techniques (Grammarly 2024; Ding and Zou 2024). The platform has therefore become increasingly influential among automated writing evaluation tools.

Researchers have probed how Grammarly provides corrective feedback and how students perceive it (Koltovskaia 2020). However, few studies have thoroughly explored Grammarly as an automated writing evaluation tool in foreign language writing from the perspective of technology acceptance. Technology acceptance studies are significant in revealing the factors influencing educational technology use, providing suggestions for teachers, educational administrators, and researchers in the digital era (e.g., Zhai and Ma 2022; Lin and Yu 2024a). Among various predictors of technology acceptance, perceptual and systemic indicators are popular in recent studies (Ismatullaev and Kim 2024). For example, technological simplicity and systemic reliability significantly and positively predicted technology acceptance (Kikawa et al. 2022; Papakostas et al. 2023), and perceived risks and support significantly impacted intentions to use educational technologies (Al-Adwan et al. 2022; Lin and Yu 2023). However, various perceptual and systemic factors have yet to be specified and incorporated into acceptance and adoption research on automated writing evaluation tools. Hence, it is imperative to consider these aspects and elucidate the acceptance and use of specific automated writing evaluation tools.

This study adopts structural equation modeling techniques to explore the acceptance and use of automated writing evaluation, with Grammarly as an example. We incorporate trust in feedback, peer influence, interactivity, personal investment, willingness for e-learning, and instructional support into the traditional Unified Theory of Acceptance and Use of Technology (UTAUT) model. We intend to establish how these newly introduced factors explain technology acceptance in Grammarly utilization, testing the interrelationships among the variables through statistical approaches. We complement those relationships with learners’ subjective evaluations of the advantages and disadvantages of Grammarly. In what follows, we first review the related literature on automated writing evaluation and UTAUT and propose a model containing new constructs and traditional variables (Section “Literature review”). Research methods are described in Section “Methods”. In Section “Results”, path analysis is used to test the hypothesized relationships among these variables, complemented by a thematic analysis of the qualitative data. The extended Section “Discussion” sheds light on future language education assisted by automated writing evaluation tools and offers suggestions for improving the functionalities and user interface designs of such platforms. Section “Conclusion” summarizes the major findings, limitations, and implications for future research and teaching.

Literature review

Technology acceptance research based on automated writing evaluation tools

Technology acceptance research has been popular in various areas, including education, yet the existing literature has not fully explored higher education students’ adoption of automated writing evaluation tools despite their notable advantages. On 19 April 2024, we conducted a literature search of the Web of Science Core Collection using the following keywords in “Topic”: “automated writing evaluation” OR “automated writing scoring” OR “automated writing feedback”. The search returned 153 results, which were first analyzed through keyword visualization and clustering in VOSviewer, a popular bibliometric analysis program created by van Eck and Waltman (2010). Figure 1 displays all 562 keyword items extracted from the search results. By occurrence frequency, the ten most frequent keyword items were automated writing evaluation (AWE) (occurrences = 100), feedback (occurrences = 44), students (occurrences = 38), English (occurrences = 35), accuracy (occurrences = 33), writing (occurrences = 22), quality (occurrences = 22), teacher (occurrences = 17), written corrective feedback (occurrences = 17), and impact (occurrences = 16).

Fig. 1: A keyword item visualization map based on the existing literature related to automated writing evaluation.

The top ten keywords indicated that research on automated writing evaluation tools concentrated on their primary function of feedback. Such tools were designed to provide corrective feedback, mostly on English writing, and to improve writing quality by identifying grammatical errors, adjusting sentence structures, and suggesting diverse synonyms (Zhai and Ma 2023; Gao 2021). More studies took students’ perspectives in English learning contexts (e.g., Wei et al. 2023; Yang et al. 2023), while teacher involvement and perspectives (e.g., Link et al. 2022) were less investigated. Moreover, the top keywords highlighted one of the most crucial criteria in automated writing assessment, i.e., “accuracy,” which was applied to the quality of automated feedback and learner revisions (Kloppers 2023; Saricaoglu and Bilki 2021). Overall, when examining the impacts of automated writing evaluation tools, researchers emphasized the quality assessment and assurance of foreign language writing (Ding and Zou 2024; Geng et al. 2024).

Based on the connections extending from the top keyword items, we identified four research trajectories in the existing literature. Firstly, the tools were mostly incorporated into formative assessment to evaluate essay writing, represented by keywords like “formative assessment” and “formative feedback” (e.g., Foster 2019; Roscoe et al. 2017). Secondly, they were investigated along with “syntactic complexity,” “genre,” and lexical features (e.g., Fan 2023; Lim et al. 2023). Thirdly, they were studied primarily in second and foreign language learning contexts, reflected in “EFL writing,” “EFL learner,” and “L2 writing” (e.g., Geng et al. 2024; Nguyen 2023). Fourthly, researchers grounded their studies in technology acceptance to explore external factors associated with tool adoption, such as “computer self-efficacy,” “expectation,” “acceptance,” and “adoption” (Wei et al. 2023; Li 2021). However, studies on these tools grounded in technology acceptance were still scarce, and the factors influencing the adoption of automated writing evaluation tools needed more extensive investigation.

A closer systematic filtering of the literature and an extended search showed that existing studies on the technology acceptance of automated writing evaluation tools probed multifarious influencing factors. Individual, socio-environmental, educational, and systemic factors significantly explained students’ acceptance and adoption of automated writing evaluation tools based on the traditional Technology Acceptance Model, with substantial explanatory power (53.5%–75.6% in Zhai and Ma 2022). Consistently, Li et al. (2019) examined computer anxiety and self-efficacy, which enhanced the explanatory power of the traditional Technology Acceptance Model. In another article, Li (2021) added confirmation and satisfaction to extend the psychological mechanisms behind students’ acceptance and adoption of automated writing evaluation tools in China. Focusing on Grammarly use, Chang et al. (2021) confirmed that students assisted by the tool outperformed those receiving a traditional intervention. Perceived motivation and enjoyment also positively influenced students’ acceptance and adoption of automated writing evaluation tools (Nunes et al. 2022). Despite the popularity confirmed by numerous studies and reviews of such tools (Huawei and Aryadoust 2023; Wei et al. 2023; Huang et al. 2024), research on students’ perceptions has yet to sufficiently clarify the factors influencing acceptance and adoption.

Unified theory of acceptance and use of technology (UTAUT)

The Unified Theory of Acceptance and Use of Technology is one of the most prevalent technology acceptance models and has been widely extended in educational research. The model synthesized eight earlier technology acceptance models, enhancing the explanatory power for technology acceptance in various contexts (Venkatesh et al. 2003). Lin and Yu’s (2023) study extending the Technology Acceptance Model suggested that previous research on technology acceptance modeling followed two primary directions: (1) incorporating new constructs into established technology acceptance models and (2) extending technology acceptance models to new contexts, such as technologies, educational levels, and learning contents (e.g., Strzelecki 2024; Patil and Undale 2023; Budhathoki et al. 2024). Extensive research has enhanced the explanatory power of technology acceptance models with new constructs (Al-Ghaith 2015; Lin and Yu 2023). Meanwhile, introducing specific external factors into technology acceptance models has made contributions even at the cost of moderate explanatory power (e.g., Deng and Yu 2023; Wang et al. 2022; Lin and Yu 2024a). Researchers could thus delve into deeper and finer mechanisms of technology acceptance and innovatively extend these models across various areas.

Five critical variables in UTAUT were explored in this study to extend this model to higher education students’ Grammarly utilization. Definitions of these traditional variables could be adapted from previous literature (e.g., Chang 2012): Performance expectancy (PE) refers to the degree to which higher education students believe that using Grammarly helps them with foreign language writing; effort expectancy (EE) refers to higher education students’ perceived ease of using Grammarly to assist their foreign language writing; facilitating conditions (FC) refer to external conditions (such as institutional, technological, and relational ones) that allow higher education students to use Grammarly to assist their foreign language writing; behavioral intentions (BI) refer to higher education students’ intentions and motivations to use Grammarly to assist their foreign language writing; actual use behavior (AUB) refers to higher education students’ actual behavior of using Grammarly to assist their foreign language writing.

Traditional hypotheses in UTAUT have been tested in various educational technologies. Proposed by Venkatesh et al. (2003), UTAUT contained six predicting variables with four additional moderators. According to their evaluation, the unified model outperformed its component models, with the adjusted R2 value reaching 69% (Venkatesh et al. 2003). Al-Adwan et al. (2022) incorporated three new constructs into the classical model to explore students’ learning management system use, and the extended model yielded substantial explanatory power (R2 = 81.8%). Research on the traditional UTAUT model in new contexts could also clarify the acceptance and use of different educational technologies. Abbad’s (2021) research on e-learning system acceptance was grounded in UTAUT without specifying the moderating effects, recording explanatory power of 47.7%–59.9%. The model was adopted to identify a disconfirmation effect in higher education students’ acceptance of an early warning system, revealing the perceived challenges in applying such a technology (Raffaghelli et al. 2022). The extended UTAUT (UTAUT2) model was recently validated in explaining higher education students’ ChatGPT utilization, with substantial explanatory power (Strzelecki 2024). In contrast, research extending UTAUT remains scarce for automated writing evaluation tools like Grammarly. We adapted the following four hypotheses from the traditional model to our research context:

H1: Higher education students’ performance expectancy significantly predicts their behavioral intentions to use Grammarly.

H2: Higher education students’ effort expectancy significantly predicts their behavioral intentions to use Grammarly.

H3: Higher education students’ behavioral intentions to use Grammarly significantly predict their actual use behavior.

H4: Higher education students’ facilitating conditions for using Grammarly significantly predict their actual use behavior.

External variables extending the Unified Theory of Acceptance and Use of Technology

Trust in feedback (TF)

In previous studies on automated writing evaluation and feedback quality, trust in feedback refers to the degree to which learners believe automated feedback tools provide valuable comments and suggestions (e.g., Ranalli 2021). Adapted to our research context, this concept measures the extent to which higher education students believe that the suggestions provided by Grammarly are suitable and helpful for their foreign language writing tasks. Trust was derived from information system quality, which largely accounted for users’ perceived usefulness of and satisfaction with certain information technologies and systems (Kassim et al. 2012). A meta-analytical review disclosed previously established positive effects of trust on satisfaction, enjoyment, and continuance intentions (Mishra et al. 2023).

Trust in feedback has been investigated in many contexts. Vorm and Combs (2022) investigated intelligent systems to establish the role of user trust in the Technology Acceptance Model, and Dautzenberg and Voß (2022) focused on trust in automation. Previous researchers linked the trust perceived in human-computer interaction with the perceived usefulness of technologies (e.g., Mou et al. 2017). This relationship revealed how this human-computer interaction indicator was associated with the expected benefits of using specific technologies. However, the effect has yet to be extensively validated for automated writing evaluation tools. There were thus grounds to expect that learners’ trust in automated feedback would reflect how useful they considered Grammarly, which in turn would determine whether they were willing to adopt it in learning. To test this relationship in Grammarly use, this study proposed the following hypothesis:

H5: Higher education students’ trust in feedback provided by Grammarly significantly predicts their performance expectancy.

Peer influence (PINF)

Social influence refers to the degree to which people related to technology users think they should use certain technologies (Venkatesh and Morris 2000). Peer influence can be regarded as a sub-concept of social influence that has seldom been investigated separately in technology acceptance research (e.g., Yu et al. 2021). Yu and Yu (2019) specified social influence by its sources, incorporating superior and peer influence to extend the Technology Acceptance Model, and found that superior and peer influence significantly predicted four traditional technology acceptance variables. Trivedi et al. (2022) found that peer influence significantly moderated the impacts of performance, perceived usefulness, and self on behavioral intentions to use social media among higher education students. Peer influence also significantly moderated the impacts of perceived usefulness and ease of use on attitudes toward e-learning technology adoption in vocational education and training (Chatterjee et al. 2021).

In this study, the definition of peer influence was adapted from previous studies (Yu and Yu 2019) and considered the impact of higher education peers on students’ Grammarly use. The role of peer influence could be supported by recent research on technology acceptance, but it had not been investigated in the context of automated writing evaluation tools such as Grammarly. As peer collaboration and communication demonstrated their increasing significance and competition became fierce in educational contexts (Horta and Li 2023), it was imperative to explore how peer influence played a role in technology acceptance. Therefore, we tested its relationships with the traditional variables in UTAUT through the following hypotheses:

H6: Peer influences on higher education students significantly predict their performance expectancy of using Grammarly.

H7: Peer influences on higher education students significantly predict their effort expectancy of using Grammarly.

H8: Peer influences on higher education students significantly predict their facilitating conditions of using Grammarly.

Interactivity (INT)

Perceived interactivity highlights the extent to which users feel that they can influence the functionalities and outcomes of information systems (Chang et al. 2018). Adapted from this definition, interactivity is defined in this study as the degree to which Grammarly responds efficiently to users’ operations on the platform and users can control Grammarly through proper operations to employ the expected functions. Interactive learning environments and tools have gained solid theoretical foundations in experiential and simulation-based learning (Chang et al. 2023), while interactive features of educational technologies are still developing (Kolb 2014). Go et al. (2020) extended the traditional Technology Acceptance Model into an interactive one, suggesting a significant role of perceived interactivity in technology acceptance regarding consumption and commerce. More relevant to educational research, AL-Sayid and Kirkil (2023) delved into eight fine-grained human-technology interaction factors and established their non-linear relationships with the traditional Technology Acceptance Model. Interactive designs for educational technologies can facilitate learners’ use and enhance their engagement in learning (e.g., Kennewell et al. 2008). Additionally, interactive learning modes enhanced students’ performance and efficacy (Hwang and Chen 2023).

Interactive information systems can provide timely feedback for learners in response to their operations. In this way, learners may find it easier to fulfill the expected functions by controlling the platforms. Additionally, systemic, functional, and content features can be associated with the technological and environmental conditions required for specific tasks like writing evaluation, influencing students’ satisfaction with e-learning systems (Wu et al. 2010). Although there are rationales for the impact of interactivity on the perceived ease of using technologies, the relationship remains underexplored in empirical studies on automated writing evaluation tools. With a keen interest in interactive designs for educational technologies, we explored the role of interactivity through the following hypothesis:

H9: The interactivity of Grammarly significantly predicts higher education students’ effort expectancy of using it.

Personal investment (PSI)

Personal investment refers to the degree to which people are willing to put their resources (such as time and money) into particular tasks and items (Wang et al. 2022). This concept was adapted to our research context to examine how much students would devote to using Grammarly. It was derived from Personal Investment Theory, which was proposed to explain students’ activities in foreign language learning regarding motivation, goal achievement, and expectation (King et al. 2019). It is also associated with commitment to academic goals in Goal-setting Theory (Locke and Latham 2015). The concept has been studied in some empirical research from the perspective of technology acceptance and educational technologies. Wang et al. (2022) adopted personal investment as an external variable to extend the Technology Acceptance Model, suggesting that it significantly contributed to students’ perceived usefulness of and behavioral intentions to use a mobile learning application. Taking the investment perspective, researchers revealed that learners with different strategies in mobile-assisted language learning gained different returns on their learning investment (Li and Wang 2024). However, inconsistent findings suggested that personal investment might not predict learners’ continuance intentions in multimodal language learning (Huang et al. 2024).

Digital tools and online resources have become predominant forms of intellectual property nowadays. Willingness to pay for online and digital learning has demonstrated its importance (Berhan et al. 2023). It may determine which functions students can use in digital learning tools and how students’ learning can benefit from paid digital learning tools and resources. However, this concept has not been thoroughly investigated in technology acceptance research on automated writing evaluation tools. With artificial intelligence integrated into automated writing evaluation tools, the paid functions of digital resources and tools could widen differences in e-learning outcomes. Therefore, this study investigated the following hypothesis:

H10: Higher education students’ personal investment in Grammarly significantly predicts their effort expectancy of using it.

Willingness for e-learning (WEL)

Previous investigations have enriched understanding of the role of willingness in various contexts, such as willingness to communicate in digitalized learning environments or to adopt innovative learning technologies (Zadorozhnyy and Lee 2023; Deng and Yu 2023). According to previous definitions, willingness reflects attitudes and intentions toward specific tasks (e.g., Deng and Yu 2023). We explored willingness for e-learning, as in Patil and Undale (2023). Educational technologies have urged a transformation of learning toward digital environments; however, students show inconsistent attitudes toward digital learning tools and e-learning environments. Based on previous studies, willingness for e-learning was explored in our context, measuring higher education students’ voluntariness to adopt a series of tools to form a digitalized workflow for learning.

Numerous studies observed students’ proactive use of digital learning tools and its multifarious influencing factors, especially during the COVID-19 pandemic (Johnson et al. 2021; Yu and Yu 2023). In contrast, students might resist transforming their learning to new flows and media, even if these educational technologies and resources could greatly facilitate learning (Aboagye et al. 2021). Previous research also explored similar ideas in various concepts, for example, “personal innovativeness” in Deng and Yu’s (2023) investigation of TikTok use in higher education, despite slight differences between their concept and willingness for e-learning in this study. Digital learning readiness might account for the conditions of e-learning, which was investigated in association with technology adoption intentions (Rafiee and Abbasian-Naghneh 2021). In the rapid transformation to a technology-enhanced education era, willingness for e-learning from the perspective of technology acceptance research could allow researchers to dive into learners’ attitudes. The inconsistent findings in the existing literature and the scarce evidence in automated writing evaluation tool applications might confound further practice of learning and teaching. Therefore, we would test the following hypothesis:

H11: Higher education students’ willingness for e-learning significantly predicts their facilitating conditions for using Grammarly.

Instructional support (IS)

Instructional support measures the instruction and support accessible to students regarding specific tasks (Mai et al. 2024; Gedrimiene et al. 2024). In our context, we explored the support students perceive through instruction on how to use Grammarly successfully to assist their foreign language writing. This concept should be a component of facilitating conditions. Technological challenges dominantly prevent students from adopting digital learning tools, for example, in online learning contexts (Rasheed et al. 2020). Students and teachers may be troubled by similar difficulties in gaining or providing instruction on using educational technologies and by unwillingness to shift to digital education practices (Bond et al. 2018). Students may seek resources for technological instruction, and the available instructional support can help them solve problems when they use learning technologies. Perceived support has also been found to mediate the relationship between the application of the community of inquiry and students’ learning outcomes (Sun and Yang 2023).

Relevant instruction regarding students’ use of educational technologies may significantly reduce the challenges of and unwillingness toward using digital learning tools (e.g., Schworm and Gruber 2012). As education turns from instructor-led to student-centered modes, self-scaffolding functions in e-learning tools can better satisfy personalized learning needs (Costley and Lange 2023). Despite its critical role, instructional support remains underexplored in automated writing evaluation applications. We tested the positive impact of instructional support on facilitating conditions in Grammarly utilization through the following hypothesis:

H12: Instructional support significantly predicts higher education students’ facilitating conditions of using Grammarly.

Research aims, questions, and a postulated model

The keyword visualization and systematic review above revealed four primary research paths in the practice of automated writing evaluation tools. In line with technology acceptance research, this study intended to introduce innovative factors that could explain the acceptance and adoption of such tools. The shortage of research exploring the technology acceptance and adoption of Grammarly encouraged us to contribute to this topic by establishing the interrelationships among the proposed variables. Figure 2 demonstrates the interrelationships between six innovative external factors and five traditional concepts from UTAUT. Based on the above literature review, 12 research hypotheses were tested in this study. We aimed to explore how perceptual and systemic factors influenced higher education students’ attitudes toward Grammarly use and their actual use behavior. We answer two research questions (RQs) in this study:

Fig. 2: A proposed model extending UTAUT to explain higher education students’ acceptance and adoption of Grammarly. Note: Research hypotheses are numbered from H1 to H12 on the paths.

RQ 1 (for quantitative phase): How do perceptual and systemic factors influence higher education students’ acceptance and adoption of Grammarly?

RQ 2 (for qualitative phase): What are the primary advantages and disadvantages of Grammarly according to higher education students’ perceptions?

Methods

Survey instrument design

We designed a comprehensive questionnaire survey to test the hypotheses, elicit subjective opinions, and explore the research questions. The questionnaire consisted of three sections. The first section collected informed consent, age, gender, city, and educational stage. The second section measured variables, each with four or five statements. Eleven variables in this study were measured with 45 items. Specifically, the variables with the referred measurements were as follows: For the traditional variables within UTAUT, such as PE, EE, FC, BI, and AUB, we referred to previous validations and extensions of this model (e.g., Venkatesh et al. 2011; Abbad 2021; Lakhal and Khechine 2021; Hou and Yu 2023). For innovative variables that we incorporated into the traditional model, we adapted the measurements from other empirical studies, although they might not rely on UTAUT. Such variables and their references were as follows: IS (Lin and Yu 2023), TF, INT, PINF (Yu and Yu 2019), PSI (Wang et al. 2022), and WEL (Chang et al. 2017).

An additional attention-test item was placed in the middle of the questionnaire to examine whether participants were sufficiently attentive when responding. The regular items measuring each variable were based on previous literature. Some variables, however, had not been explored in technology acceptance research; for these, we demonstrated our innovative contribution by evaluating them and incorporating them into our proposed model. The measurements designed on the basis of the literature could be established through content and statistical validation, demonstrating sufficient validity and reliability. Appendix A of this article provides a complete list of the statements used to evaluate each variable. Participants rated each item on a five-point Likert scale, from one (strongly agree) to five (strongly disagree). The third section was an open-ended question serving as an indirect interview on the participants’ perspectives on using Grammarly.

Research procedure

After the questionnaire was designed, two trained experts in technology-enhanced education and structural equation modeling research reviewed and approved it. We adopted a popular online questionnaire survey platform in China (www.wjx.cn) to host the questionnaire, allowing us to distribute it easily via a hyperlink or a QR code. Convenience sampling was used for data collection. Convenience sampling refers to the practice whereby researchers invite participants from an available population close to hand (Obilor 2023). Although not completely random in nature, convenience sampling is widely adopted for large-sample structural equation modeling studies (e.g., Sun and Guo 2024) and is one of the most frequently adopted data collection methods in applied linguistics research (Dörnyei 2007).

This sampling method allowed us to distribute the questionnaire through the Internet, including social media and instant messaging applications. The questionnaire could thus reach as many participants as possible, and all potential participants who had related learning experiences and considered themselves eligible could contribute to the survey. Our data were primarily collected through WeChat groups, where we invited participants by sending information about the survey; participants could choose to participate and provide their data voluntarily without further intervention. The data sources also included other social networking communities and platforms popular in China, such as Tencent QQ, ResearchGate, and Xiaohongshu. The sampling approach followed previous methods for educational research using structural equation modeling, such as Hou and Yu (2023) and Deng and Yu (2023). Data quality was also tested with rigorous statistical approaches, as described in the following sections.

We distributed the questionnaire until the sample size reached a good level for further analysis. Previous studies widely suggested a sample of over 300 as good and over 500 as very good for structural equation modeling (Raza et al. 2020). Other researchers practiced the “ten times” principle, suggesting that the sample size should be more than ten times the number of measurement items, i.e., 450 in this study (Hair et al. 2014; Hidayat-ur-Rehman and Ibrahim 2023). We collected 548 responses from 11 April to 3 November 2023. Based on informed consent (two participants disagreed), attention test results (53 participants failed), and educational levels (six participants were identified as teachers or workers), we obtained 487 valid records for the statistical analysis. The valid responses formed an adequate sample size for our statistical analysis.

After data collection, the survey platform allowed us to export the data as a CSV file, which was first used to describe the demographic information of our participants and to test the validity and reliability of our measurement. After ensuring that the measurements were valid and reliable, we used SmartPLS 4.1.0.3 to test our proposed model and hypotheses with the collected data (Ringle et al. 2024). SmartPLS is a program based on partial least squares structural equation modeling (PLS-SEM). We used the “PLS-SEM algorithm” to calculate construct validity and reliability, discriminant validity, collinearity statistics measured by the variance inflation factor (VIF), and R2 values; the “Bootstrapping” function to calculate path coefficients and their statistical significance; and the “PLSpredict” function to evaluate the model’s predictive power measured by Q2 values. Reporting of these indices referenced existing publications such as Gao et al. (2024), and the statistics are explained in the following sections. The qualitative results from the open-ended question were analyzed by both authors through thematic classification and analysis of the advantages and disadvantages of Grammarly.

Statistical approaches

Eligible participant selection and normality test

After data collection, we first excluded participants who did not provide informed consent to the researchers since the survey questionnaire would end if the participants disagreed to participate. Focusing on higher education students’ acceptance and use of Grammarly, we also excluded participants identifying as “Teachers and workers”. According to the attention test, participants who did not choose “Agree” as required were considered not attentive when responding to the questionnaire, and their responses were excluded for more accurate estimates. These eligible responses should be tested for the normality of distribution, as Kline (2015) suggested that multivariate normality should be assumed for structural equation modeling with maximum likelihood estimation. Kline (2015) provided the guideline for the normality test: When the absolute value of skewness was smaller than 3.000 and that of kurtosis smaller than 10.000, the normality of the dataset should be acceptable; otherwise, the distribution would be extremely skewed. The normality of each item was evaluated with SPSS 26.0.
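For illustration, the item-level screening described above can be reproduced outside SPSS. The following sketch is a minimal, hypothetical Python example (the study itself used SPSS 26.0); the file name and the column-selection rule are assumptions rather than part of the original instrument.

```python
# Minimal sketch (not the SPSS procedure used in the study): screen each item
# against the |skewness| < 3.000 and |kurtosis| < 10.000 guideline attributed
# to Kline (2015). File name and column selection are hypothetical.
import pandas as pd
from scipy import stats

responses = pd.read_csv("survey_responses.csv")   # hypothetical export of valid records
item_cols = [c for c in responses.columns if c not in ("age", "gender", "city", "stage")]

report = pd.DataFrame({
    "skewness": responses[item_cols].apply(stats.skew),
    "kurtosis": responses[item_cols].apply(stats.kurtosis),   # excess kurtosis
})
report["acceptable"] = (report["skewness"].abs() < 3.0) & (report["kurtosis"].abs() < 10.0)
print(report)
```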

Validity and reliability tests

We tested the validity and reliability of the questionnaire. First, we calculated factor loadings using items for each variable with SPSS. Factor loadings were calculated with the principal component method when the fixed number of factors was set as 11. We adopted the varimax method for factor rotation. A factor loading larger than 0.500 could show that the item measured the corresponding factor satisfactorily (Pham et al. 2019). In contrast, items with factor loadings smaller than 0.500 would indicate unsatisfactory validity and reliability due to poor designs. We removed items with low factor loadings from the measurement model for subsequent statistical analyses to get better validity and reliability. Internal consistency (Cronbach’s α) could be considered excellent when α > 0.900, good when 0.800 ≤ α < 0.900, or acceptable when 0.700 ≤ α < 0.800 (Hutcheson and Sofroniou 1999).
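As an illustration of these two checks, the sketch below extracts 11 rotated factors and computes Cronbach’s α for one construct. It is a hypothetical Python analogue of the SPSS analysis, assuming the third-party factor_analyzer package and invented item names; it is not the procedure reported in this study.

```python
# Illustrative sketch of the factor-loading and internal-consistency checks;
# the study used SPSS, so this Python analogue is only an approximation.
import pandas as pd
from factor_analyzer import FactorAnalyzer   # assumed third-party package

items = pd.read_csv("item_responses.csv")    # hypothetical: one column per item

# Fixed number of factors with varimax rotation; "principal" approximates the
# principal component extraction used in SPSS.
fa = FactorAnalyzer(n_factors=11, rotation="varimax", method="principal")
fa.fit(items)
loadings = pd.DataFrame(fa.loadings_, index=items.columns)
weak_items = loadings[loadings.abs().max(axis=1) < 0.500].index.tolist()

def cronbach_alpha(block: pd.DataFrame) -> float:
    """Classical Cronbach's alpha for the items measuring one construct."""
    k = block.shape[1]
    item_variances = block.var(axis=0, ddof=1).sum()
    total_variance = block.sum(axis=1).var(ddof=1)
    return k / (k - 1) * (1 - item_variances / total_variance)

print(weak_items, cronbach_alpha(items[["PE1", "PE2", "PE3"]]))  # hypothetical item names
```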

We adopted a Kaiser-Meyer-Olkin (KMO) test and Bartlett’s test of sphericity. When the KMO value was greater than 0.750 and the p-value in Bartlett’s test was significant, the dataset would be suitable for further factor analysis and structural equation modeling (e.g., Zhang and Liu 2022). The average variance extracted (AVE) and composite reliability (CR) were also calculated. According to Fornell and Larcker’s criteria (1981), AVE > 0.500 and CR > 0.500 could demonstrate good validity and reliability of the questionnaire. The Fornell-Larcker method would be adopted to establish the discriminant validity based on AVE and CR values, testing whether the variables were statistically distinguishable (Fornell and Larcker 1981). Another approach was the Heterotrait-Monotrait ratio (HTMT) of correlations proposed by Henseler et al. (2014). For this method, a threshold was proposed by Kline (2015), suggesting that values smaller than 0.850 would demonstrate good discriminant validity.
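For reference, the definitions conventionally assumed for these indices (with λ_i the standardized loading of item i on its construct and k the number of items measuring that construct) can be written as follows; this is the standard formulation rather than a quotation from the cited sources:

```latex
\mathrm{AVE} = \frac{\sum_{i=1}^{k} \lambda_i^{2}}{k},
\qquad
\mathrm{CR} = \frac{\bigl(\sum_{i=1}^{k} \lambda_i\bigr)^{2}}
                   {\bigl(\sum_{i=1}^{k} \lambda_i\bigr)^{2} + \sum_{i=1}^{k}\bigl(1-\lambda_i^{2}\bigr)}
```

Under the Fornell-Larcker criterion, discriminant validity holds when the square root of each construct’s AVE exceeds that construct’s correlations with all other constructs.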

Common method bias is frequently mentioned in connection with various measurements and originates from the measurement designs of structural equation modeling studies (Kock 2015). Beyond the theoretical and mathematical significance of testing for such bias, a practical approach to identifying it in structural equation modeling research is to inspect variance inflation factor (VIF) values (Kock 2015). In other studies, researchers ruled out collinearity issues in their measurement models with this indicator, for which Hair et al. (2014) set a criterion of VIF < 5.000, or more ideally, VIF < 3.000 (Hair et al. 2019). According to Kock (2015), low VIF values from a full collinearity test indicate that the measurement model is not contaminated by common method bias. In addition, the correlation matrix of the measurement model can also be used, where correlation coefficients lower than 0.900 demonstrate the absence of common method bias (Lowry and Gaskin 2014).
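The following minimal sketch shows how such a collinearity check could be reproduced from construct scores; it is a hypothetical Python analogue (the file name and construct columns are assumptions), not the SmartPLS output reported later.

```python
# Hypothetical collinearity check in the spirit of Kock (2015): regress each
# construct score on the others and inspect the variance inflation factors.
import pandas as pd
import statsmodels.api as sm
from statsmodels.stats.outliers_influence import variance_inflation_factor

scores = pd.read_csv("construct_scores.csv")   # hypothetical: one column per construct
X = sm.add_constant(scores)

vifs = {col: variance_inflation_factor(X.values, i)
        for i, col in enumerate(X.columns) if col != "const"}
flagged = {c: round(v, 3) for c, v in vifs.items() if v >= 3.0}   # ideal cut-off (Hair et al. 2019)
print(vifs)
print("Constructs above the ideal threshold:", flagged)
```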

Explanatory power of the model and path analysis

The assured validity and reliability of the questionnaire would demonstrate good measurements of the proposed variables. We used the collected data to evaluate to what extent the exogenous variables could explain and predict the endogenous ones. A common statistic used for explanatory power is the R2 value. Ozili (2023) suggested three critical ranges reflecting different explanatory power in social sciences, i.e., 0 < R2 < 0.100 (too low), 0.100 ≤ R2 ≤ 0.500 (acceptable when most explanatory variables were statistically significant), and R2 > 0.500 (significant). Sarstedt and Mooi (2014, p.211) adopted a more general principle: “R2 values of 0.75, 0.50, or 0.25” should be considered substantial, moderate, or weak.
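The R2 statistic referred to here follows its standard definition for an endogenous construct, i.e., the share of variance in the observed scores y explained by the model’s predictions ŷ (stated here for completeness, not quoted from the sources above):

```latex
R^{2} \;=\; 1 \;-\; \frac{\sum_{i}\bigl(y_{i}-\hat{y}_{i}\bigr)^{2}}{\sum_{i}\bigl(y_{i}-\bar{y}\bigr)^{2}}
```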

Q2 was a model predictive power indicator, which could be calculated by the PLSpredict function in SmartPLS. Q2 values greater than zero allowed the researchers to compare the mean absolute error (MAE) or the root mean square error (RMSE) with the benchmarks calculated by the linear regression model (LM) (Hair et al. 2021). In PLS-SEM analysis, high predictive power requires all indicators to have lower RMSE (or MAE) values than the naive LM benchmark; medium predictive power is achieved when most indicators have smaller prediction errors than the LM; low predictive power is indicated if only a minority of indicators show lower PLS-SEM prediction errors than the LM; the lack of predictive power is evident if none of the indicators in PLS-SEM have lower prediction errors than the LM (Hair et al. 2021).
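A minimal sketch of this benchmark logic is shown below, assuming per-indicator predictions exported from SmartPLS’s PLSpredict and a naive linear-model benchmark fitted with scikit-learn; the file names, column prefixes, and ten-fold setup are assumptions for illustration only, not the procedure run inside SmartPLS for this study.

```python
# Hypothetical illustration of the PLSpredict comparison: for each outcome
# indicator, compare the PLS-SEM prediction error with a naive linear-model (LM)
# benchmark built from the exogenous indicators.
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_absolute_error, mean_squared_error
from sklearn.model_selection import cross_val_predict

data = pd.read_csv("indicator_data.csv")          # hypothetical raw indicator data
pls_pred = pd.read_csv("plspredict_export.csv")   # hypothetical SmartPLS export

exogenous = [c for c in data.columns if c.startswith(("TF", "PINF", "INT", "PSI", "WEL", "IS"))]
outcomes = [c for c in data.columns if c.startswith(("PE", "EE", "FC", "BI", "AUB"))]

for item in outcomes:
    lm_hat = cross_val_predict(LinearRegression(), data[exogenous], data[item], cv=10)
    lm_rmse = np.sqrt(mean_squared_error(data[item], lm_hat))
    pls_rmse = np.sqrt(mean_squared_error(data[item], pls_pred[item]))
    pls_mae = mean_absolute_error(data[item], pls_pred[item])
    verdict = "PLS-SEM better" if pls_rmse < lm_rmse else "LM benchmark better"
    print(f"{item}: RMSE(PLS)={pls_rmse:.3f} RMSE(LM)={lm_rmse:.3f} MAE(PLS)={pls_mae:.3f} -> {verdict}")
```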

Path analysis examined the effect of one variable on another, measured by a path coefficient (β) and an f2 value at the significance level of p < 0.050. A path demonstrates a positive effect when β > 0 and a negative one when β < 0. Effect sizes were determined by ranges of the absolute values of β: when the absolute value of β fell into 0–0.100, 0.100–0.300, 0.300–0.500, or over 0.500, the effect size of the path was considered weak, modest, moderate, or strong, respectively (Zhang and Liu 2022). Following Cohen (1988), f2 values of at least 0.350, 0.150, and 0.020 indicate large, medium, and small effect sizes, respectively, while values below 0.020 indicate a negligible effect.
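The f2 effect size reported for each path follows the conventional definition used in PLS-SEM reporting, comparing the endogenous construct’s R2 with and without the predictor in question (stated here for completeness, not quoted from the sources above):

```latex
f^{2} \;=\; \frac{R^{2}_{\text{included}} - R^{2}_{\text{excluded}}}{1 - R^{2}_{\text{included}}}
```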

Results

Demographic information and normality test of each item

Table 1 displays the demographic information of the 487 eligible responses, which constitute a good sample for structural equation modeling. The subsequent statistical analysis was based on these valid records. Due to our personal network and research areas, participants were predominantly female, which is common in arts, humanities, and social sciences majors in China. To reflect higher education students’ attitudes toward Grammarly utilization more comprehensively, the survey also invited students from natural science majors during data collection, such as engineering, mathematics, and medical science. The questionnaire was mainly answered by master’s students and undergraduates, with only a few doctoral students. Participants came from 28 provinces or regions of China and identified as regular Grammarly users, with the largest proportion from Beijing (19.10%). The low proportion from each province and region indicated the diverse sources and robust representativeness of our participants. Responses to each item were tested for normality of distribution. The skewness of all items fell within [–1.130, –0.128], and the kurtosis within [–0.684, 3.342]. The absolute values of skewness and kurtosis were smaller than 3.000 and 10.000, respectively, indicating no severe deviation from normality. The normality test results satisfied the preconditions for factor analysis.

Table 1 Participants’ demographic information of valid records for statistical analysis.

Validity and reliability of the survey questionnaire

The KMO value was 0.960 > 0.750, and Bartlett’s test was statistically significant (p < 0.001), satisfying the preconditions for factor analysis. According to the factor analysis results, we deleted INT4 and PE4 due to their low factor loadings (<0.500), after which the remaining items showed satisfactory loadings on the 11 variables, and the preconditions based on the KMO value and Bartlett’s test remained satisfied. Table 2 demonstrates that the factor loadings were all greater than 0.500. Examining each variable with its items, we found satisfactory internal consistency with Cronbach’s α > 0.500, composite reliability (CR) > 0.500, and average variance extracted (AVE) values > 0.500.

Table 2 Measurement model assessment.

The collinearity test indicated that the dataset was free from common method bias. The results showed that VIF < 3.000 held for most items, with several values acceptably close to 3.000 and all much smaller than 5.000. The results could also be interpreted according to Kock’s (2015) method of identifying common method bias in partial least squares structural equation modeling. VIF values lower than 5.000 suggested that the measurement model in this study was free from obvious common method bias. Discriminant validity was tested using two methods. First, the Heterotrait-Monotrait ratios of correlations were all smaller than 0.850. Table 3 demonstrates that the discriminant validity of this survey reached an acceptable level. The results supported that the variables measured with the survey questionnaire could be distinguished statistically.

Table 3 Discriminant validity test results using the Heterotrait-Monotrait ratio (HTMT) of correlations.

Then, the Fornell-Larcker method was also adopted to test discriminant validity. Table 4 places the square roots of the average variance extracted values in bold on the diagonal of the matrix, while inter-construct correlation coefficients (r) are displayed below the diagonal. The square roots were larger than the correlation coefficients in their corresponding rows and columns. As such, discriminant validity in this study was supported by the Fornell-Larcker method. The matrix also displays the correlations between variables, and strong correlations (r ≥ 0.500) were identified in 34 pairs. The three largest correlation coefficients were found between (1) peer influence and instructional support (r = 0.707), (2) willingness for e-learning and behavioral intention (r = 0.695), and (3) systemic interactivity and trust in feedback (r = 0.688). All correlation coefficients in Table 4 were smaller than 0.900, corroborating that the measurement model in our study was free from common method bias.

Table 4 Discriminant validity test results using the Fornell-Larcker method.

Path analysis and research hypothesis testing (RQ1)

The above sections demonstrated that validity and reliability were acceptable for this study. Table 5 presents the full results of the structural model estimates and the model’s explanatory and predictive power. According to the path analysis, we accepted all research hypotheses except H8: peer influence could not significantly predict facilitating conditions. According to the f2 values of the supported paths, a large effect size was identified for H3 (f2 = 0.415) and a small effect size for H11; the other paths demonstrated medium effect sizes. Measured by R2 values, the proposed model explained 19.6%–46.4% of the variance in the traditional UTAUT variables regarding higher education students’ use of Grammarly to assist their foreign language writing. The explanatory power for the latent variables was acceptably moderate. Table 6 shows that, measured by Q2, RMSE, and MAE values, the model had nearly medium predictive power for the outcome variables (i.e., PE, EE, FC, BI, and AUB), according to Hair et al.’s (2021) criteria. In short, the model had acceptably good explanatory and predictive power for higher education students’ Grammarly utilization.

Table 5 Results of structural model estimates.
Table 6 Predictive power estimates with PLSpredict.

Thematic analysis of qualitative results (RQ2)

Automated suggestions about grammar and vocabulary

The advantages and disadvantages identified in the qualitative phase primarily concerned three aspects. Most prominently, participants perceived benefits in the reduced workload for correcting grammatical mistakes. They affirmed that Grammarly greatly assisted them by identifying and correcting grammatical errors that they were unaware of. For example, “Non-native speakers may be insensitive to particular types of mistakes and grammatical rules. Grammarly addresses those issues effectively” (Participant 66). They found Grammarly an efficient writing tool for unifying punctuation use, which saved much time and effort, especially for academic purposes: “It helps me better comply with writing standards for academic purposes” (Participant 342).

However, they also highlighted that Grammarly sometimes provided inaccurate corrections for certain errors, such as articles and verb-preposition collocations. Other participants suggested that the advice for academic writing needed to be more content-specific. They argued, “The suggestions for different genres and purposes should be improved to distinguish my writing for formal, informal, academic, and general purposes. Although the options are provided, I don’t see many differences” (Participant 235). Furthermore, participants noted that Grammarly enhanced the diversity of vocabulary use and improved the accuracy of spelling and collocation: “I can use it to enrich my expressions. However, the synonyms provided might be rigid and inappropriate in the context. The suggestions need my additional efforts to determine which synonyms fit my sentences” (Participant 217). Another participant (105) suggested, “I need a dictionary and more exemplary sentences integrated into the platform to clarify the suggested usages.”

User interface

Participants found the user interface difficult to manipulate and unstable. For some participants, the system “refreshed multiple times and might break down when there were too many errors in the texts” (Participant 173). The configuration of the website “needed to be simplified” (Participants 69 and 137). Participants also indicated the need to use the platform on multiple devices, but they found the plug-ins provided for Microsoft Word and websites unstable, lowering their working effectiveness due to significant delays and slow responses. They considered the mobile version a considerate extension, but its user interface was inconvenient, with occasionally unstable and slow responses. Additionally, participants considered the cost of the premium version so high that it deterred their use.

Incorporation of artificial intelligence

Participants shared the belief that artificial intelligence chatbots like ChatGPT now had the potential to outperform Grammarly. They pointed out that Grammarly “lacked quick updates to its language model, failed to learn and adapt to users’ language use patterns, and struggled to recognize emerging popular words” (Participant 24). They also noticed that the synonyms suggested by Grammarly were not always appropriate for the given context. In contrast, large language models with generative functions allowed users to personalize their preferred revision suggestions. Artificial intelligence technologies could also provide suggestions on logical flow and coherence, while “Grammarly seemed not to provide such suggestions on the level of the entire text” (Participant 295). The participants called for further integration of artificial intelligence technologies into Grammarly.

Discussion

Extending traditional hypotheses in UTAUT to Grammarly

Paths adapted from the traditional UTAUT model (H1 to H4) are supported in explaining higher education students’ Grammarly utilization. Thus, the findings of this study provide solid evidence for extending UTAUT to a new context, in which a specific automated writing evaluation tool is used to assist foreign language writing. Considering the similar variables conceptualized in the traditional Technology Acceptance Model, the findings of this study are consistent with Zhai and Ma’s (2022) modeling results for automated writing evaluation among Chinese college students, although they focused more on how systemic characteristics influenced students’ acceptance and use of such tools through technological sophistication, a rather compound and complex concept.

We also find consistent results about technology acceptance and use with previous validation of the traditional UTAUT model in various educational technologies and contexts, for example, learning management systems (Al-Adwan et al. 2022). Performance expectancy demonstrates higher education students’ expectation of benefits and system usefulness when Grammarly assists foreign language writing; effort expectancy corresponds to their perceived ease of using this platform or program. These two dimensions capture the primary motivators for students to use Grammarly. Thus, the relationship between learning motivation and automated feedback tools in previous literature can be extended to the sources of motivation from students’ perspectives (e.g., Yao et al. 2021). Based on the close relationships among technology acceptance models, the validated traditional hypotheses of UTAUT can also support theories such as UTAUT-3, Technology Continuance Theory (TCT), and Value-based Adoption Model (VAM) (Farooq et al. 2017; Pasupuleti and Thiyyagura 2024; Kim et al. 2007).

The facilitating conditions examined in this study significantly predict students’ actual use behavior of Grammarly (H4), presumably because they allow students to avoid many challenges and difficulties when using Grammarly. This explanation can be supported by previously established challenges on students’ side in technological and learning aspects when automated writing evaluation tools are incorporated into practice (Deeva et al. 2021). Technical challenges have inspired researchers to explore technology readiness, for example, under the Theory of Planned Behavior (Sungur-Gül and Ateş 2021). The mediating effect of behavioral expectations has been found significant in the effects of facilitating conditions on actual use behavior (Venkatesh et al. 2008), supporting the above reasoning for H4.

Interestingly, H1, H2, and H4 demonstrate smaller path coefficients (i.e., effect sizes) than H3. This reflects the gap between behavioral intentions and the actual use of Grammarly. A possible reason is that students still have concerns about incorporating Grammarly into their learning practice, even when they understand the functionality and usefulness of the tool. This is a common dilemma found for other educational technologies in various application contexts, for example, in a study of automated writing evaluation tools used in primary education (Wang et al. 2020). Based on our qualitative data, the challenges may include the high cost of full functionality on the platform, some unsatisfactory revision suggestions, and students’ limited actual needs when foreign language writing tasks are currently infrequent in their learning. Additionally, the necessary skills and knowledge for digital learning are critical to the successful digitalization of higher education. To better prepare students for advancements in educational technologies, the Technological, Pedagogical, and Content Knowledge (TPACK) framework demonstrates its significance among teachers (Abubakir and Alshaboul 2023).

New constructs explaining Grammarly utilization

Most external factors explored in this study significantly predict higher education students’ acceptance and use of Grammarly based on UTAUT. Trust in feedback quality perceived by higher education students can be a dominant aspect of their performance expectancy in using Grammarly (H5), determining the benefits students can gain from it. In a traditional peer assessment context, Rotsaert et al. (2018) established the critical role of feedback quality in encouraging secondary education students to accept peer feedback. Considering the automated feedback provided by Grammarly, trust in feedback reflects students’ subjective evaluation and perception of this automated writing evaluation tool. The result is consistent with Zhai and Ma’s (2022) exploration of the factors contributing to students’ perceived usefulness. Many factors have been associated with feedback quality and students’ trust in automated evaluation and feedback on foreign language writing, such as accuracy and reliability (Zhai and Ma 2022). Some negative findings also serve as evidence, indicating that surface-level feedback from automated evaluation reduced students’ engagement and their incorporation of such tools into learning practice (Tian and Zhou 2020).

Peer influence contributes to performance and effort expectancy (H6 and H7), probably through recommendation and exemplification. Students may feel less challenged or at risk if their peers successfully use and recommend such learning tools. This explanation is grounded in peer learning theories, in which knowledge construction and skill acquisition are established as processes within socio-cultural interaction (Meschitti 2019). The results are consistent with Yu and Yu’s (2019) findings in mobile-assisted learning. However, peer influence may have inconsistent impacts on students’ acceptance and use of Grammarly. As the rejection of H8 shows, peer influence does not significantly predict facilitating conditions for using Grammarly. One possible reason is that excessive peer influence brings peer pressure that discourages students from easy and comfortable use of this tool. Alternative feedback of higher quality may also prevent students from using automated writing evaluation tools (Tian and Zhou 2020). This finding is consistent with Herting et al. (2023), who found an insignificant predictive effect of social influences on higher education students’ intentions to use PowerPoint. Comments on Grammarly were inconsistent in our study and depended on subjective evaluations and specific educational needs, presumably leading to the insignificant effect of peer influence.

Perceived systemic interactivity in higher education students' use of Grammarly and personal investment significantly predict their effort expectancy (H9 and H10). Zhai and Ma (2022) explored the technological characteristics of automated writing evaluation, suggesting that the technological sophistication of the system was significantly related to students' perceived ease of use; their systemic characteristics encompassed interactivity, flexibility, and a sense of control. By comparison, this study concentrates on the interactive design of Grammarly and finds results congruent with the contribution they established from a more general perspective. Personal investment is a significant positive predictor of effort expectancy. As this construct includes the investment of both money and time, paid premium functions may give students better user experiences, and more time invested promotes their proficiency in understanding and using automated writing feedback, hence stronger perceived usefulness and ease of use. Theoretically, this study specifies that, among limited personal resources, the investment of time and money can lead to easier utilization and stronger use intention, which points to the challenges and opportunities of wider applications. The established effect (H10) may become more stable if learners' concerns about the high cost of the premium version are addressed; alternatively, cheaper substitutes with better functionality may shift the influence of effort expectancy toward alternative tools.

Willingness for e-learning and instructional support significantly predict higher education students' facilitating conditions for using Grammarly (H11 and H12). Students' academic experiences and expectations contributed to their understanding of academic work, further encouraging them to exploit digital learning tools, as Lin and Yu (2023) established. The Expectation-Confirmation Model has been validated across educational technologies and learning contexts, suggesting that satisfying the expectations of using certain educational technologies leads to stronger perceived usefulness and intentions to use them (Obeid et al. 2024; Bhattacherjee 2001). Extending confirmation, e-learning experiences, and academic expectations, students' willingness for e-learning reflects how they perceive the benefits of Grammarly in assisting their writing based on their experiences and expectations (Fu et al. 2022). Instructional support helps students avoid technical challenges and ensures that they can use the platform successfully. The findings of this study establish that these two aspects largely explain students' facilitating conditions for using Grammarly.

Conclusion

Major findings

This structural equation modeling research explored external perceptual and systemic factors influencing higher education students' acceptance and use of Grammarly to assist foreign language writing, based on the traditional Unified Theory of Acceptance and Use of Technology. We found that the traditional hypotheses were successfully extended to explain the acceptance and use of Grammarly. Students' performance and effort expectancy significantly predicted their behavioral intentions to use Grammarly. Facilitating conditions and behavioral intentions significantly predicted the actual use of Grammarly. Additionally, external factors extended the traditional model. Students' trust in feedback and peer influence significantly predicted their performance expectancy. Peer influence, perceived interactivity, and personal investment significantly predicted their effort expectancy. Willingness for e-learning and instructional support significantly predicted the facilitating conditions of using Grammarly, while peer influence did not. Participants perceived dominant advantages in prompt and accurate feedback, whereas inaccuracy in particular circumstances and the high cost raised significant concerns. Participants called for further incorporating artificial intelligence into the platform to enhance the accuracy and quality of automated feedback.
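To make the structure of these relationships concrete, the sketch below shows how an extended UTAUT model of this kind could be specified and estimated with the Python package semopy. It is an illustration only, not the estimation pipeline used in this study: the construct abbreviations (PE, EE, FC, BI, and UB for the core UTAUT constructs; TF, PI, IA, PV, WE, and IS for trust in feedback, peer influence, interactivity, personal investment, willingness for e-learning, and instructional support), the indicator names (pe1-pe3 and so on), and the file survey_items.csv are hypothetical placeholders standing in for the actual survey items.

# Illustrative sketch only; not the estimation pipeline used in this study.
# Specifies an extended UTAUT structural model in semopy's lavaan-style syntax.
import pandas as pd
from semopy import Model, calc_stats

MODEL_DESC = """
# measurement model: latent constructs and their observed items (placeholders)
PE =~ pe1 + pe2 + pe3
EE =~ ee1 + ee2 + ee3
FC =~ fc1 + fc2 + fc3
BI =~ bi1 + bi2 + bi3
UB =~ ub1 + ub2 + ub3
TF =~ tf1 + tf2 + tf3
PI =~ pi1 + pi2 + pi3
IA =~ ia1 + ia2 + ia3
PV =~ pv1 + pv2 + pv3
WE =~ we1 + we2 + we3
IS =~ is1 + is2 + is3

# structural model: paths mirroring the hypotheses summarized above
BI ~ PE + EE
UB ~ FC + BI
PE ~ TF + PI
EE ~ PI + IA + PV
FC ~ PI + WE + IS
"""

survey = pd.read_csv("survey_items.csv")  # hypothetical item-level responses
model = Model(MODEL_DESC)
model.fit(survey)
print(model.inspect())    # path estimates, standard errors, p-values
print(calc_stats(model))  # fit indices such as CFI and RMSEA

Estimated in this way, each regression path corresponds to one of the hypothesized relationships, and the fit indices indicate whether the hypothesized structure is consistent with the data; comparing (standardized) path coefficients is what allows statements such as H1, H2, and H4 showing smaller effect sizes than H3.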

Limitations

We must acknowledge some limitations of this study. First, the outcome variables in the proposed model yielded only about 20%–50% explanatory power, suggesting that other variables not included in the research model may also have significant explanatory and predictive power. However, exploring innovative and specific variables in technology acceptance models inevitably comes at the cost of lower explanatory power. In social science research, modest explanatory power (R² of 10%–50%) can still be meaningful given the complex influencing mechanisms. This study can therefore contribute to the existing literature with statistically significant paths and established validity and reliability. Not all links between the proposed variables were tested in this study, owing to our theoretical and practical consideration of the existing literature, which also influenced the explanatory power. Future researchers may further examine the underexplored interrelationships among our proposed variables. Second, this study focused on higher education students, while the variables might be extended to other educational levels and technologies for a more comprehensive understanding. Third, the demographic information lacked some details, such as participants' affiliations and majors. However, the representativeness of the collected dataset can be justified by other aspects, such as our data collection sources and the identified provinces and regions of the participants. Future studies are encouraged to recruit more comprehensive samples that are balanced in gender, age, major, and region.
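For context on interpreting this limitation, the explanatory power reported for each outcome variable corresponds to the squared multiple correlation of an endogenous construct in structural equation modeling. In generic notation (not tied to any particular estimation software), it can be written as R²(η) = 1 − Var(ζ) / Var(η), where η is an endogenous construct (for example, behavioral intention) and ζ is its disturbance term; an R² of 0.30 would thus mean that the hypothesized predictors jointly account for roughly 30% of that construct's variance.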

Implications for future research

Our findings have both theoretical and practical implications. Theoretically, this study incorporates new external variables into UTAUT, including trust in feedback, peer influence, interactivity, personal investment, willingness for e-learning, and instructional support. These variables can reveal how specific perceptual and systemic factors influence higher education students' acceptance and adoption of Grammarly. This study also extends the traditional UTAUT to Grammarly utilization, addressing a gap in the existing literature. The results can enrich technology acceptance theories that are conceptually similar or related to UTAUT and offer evidence regarding their adaptability to emerging contexts. This study adds a specific example of automated writing evaluation tool utilization, which aligns with the implications of Ding and Zou's (2024) systematic review. It also clarifies the psychological mechanisms through which higher education students interact with automated writing evaluation tools. For example, the role of perceived interactivity established in this study further strengthens the significance of interactive learning environments (e.g., Chang et al. 2023).

This study sets an example of exploring innovative and detailed variables in technology acceptance research. Future researchers are encouraged to validate the specific benefits of automated writing evaluation tools within technology acceptance models across educational levels, writing tasks, and linguistic dimensions (such as vocabulary, grammatical correctness, and content organization) (e.g., McCarthy et al. 2022). Such approaches will allow technology acceptance research to further contribute to domain-specific language education research, congruent with a sustained tendency to apply objective approaches to language testing. The self-designed and adapted measurements validated in this study can provide references for extending our findings to language learners and assessments at various educational levels, presumably in association with other innovative variables. To generate innovative external predictors of technology acceptance and adoption, Yu et al. (2024) sequentially combined a grounded theory approach, fuzzy set qualitative comparative analysis, and structural equation modeling. Moreover, keyword and publication distribution analyses using bibliometric techniques can also inform educational technology researchers, for instance, in identifying innovative predictors and effectiveness indicators (Wang et al. 2024; Lin and Yu 2024b).

Practically, teachers need to guide students in using automated writing evaluation tools like Grammarly so that more students may benefit from them. With the popularity of computer-mediated collaborative learning in teaching writing skills, frameworks and theories capturing teachers' and students' feedback literacy should be explored in applications of automated writing evaluation (e.g., Saeed and Alharbi 2023). This study offers evidence regarding the quality of automated writing feedback by examining students' trust. Nevertheless, feedback literacy encompasses learners' openness to feedback, engagement with feedback, and enactment of feedback (Woitt et al. 2023). Other researchers have framed feedback literacy to include information-seeking, sense-making, feedback utilization, and emotional regulation in dealing with feedback (Dawson et al. 2023). This study therefore makes only a limited contribution to the literacy of automated feedback, especially in the present era of artificial intelligence-empowered writing assessment. Future studies may further explore feedback-seeking behavior with automated writing evaluation tools to enrich the conceptualizations and theories of feedback literacy. For example, investigations can enrich the understanding and pedagogical implications of using automated writing evaluation to assist peer assessment activities and enhance teachers' feedback literacy (e.g., Li 2023; Link et al. 2022).

Proper instruction may amplify the positive impact of peer modeling on students' acceptance and use of automated writing evaluation tools. The unstable impact of peer influence should also raise educators' awareness of the peer pressure students may feel when adopting educational technologies. A successful transition to digitalized learning and workflows requires proper guidance to strengthen students' willingness. Regarding specific approaches, such instruction can be provided by teachers, technology designers, service providers, and other resources. Given the complementary role of automated writing evaluation reported in some studies, future instructors may consider combining teacher, peer, and automated writing feedback to enhance student engagement and writing quality (e.g., Zhang and Hyland 2022). Future automated writing evaluation designs may attach significance to interactive features, accuracy, reliability, and instruction on use, to enhance students' acceptance and use of these tools at all educational levels.