Introduction

Generative artificial intelligence (GenAI) elevates machine translation (MT) into a new stage. Under the generative artificial intelligence translation (GenAIT) model, source texts can be translated and further revised by inputting prompts to an AI system like ChatGPT. Before the extensive application of GenAIT systems, Neural Machine Translation (NMT) is widely acknowledged for its notable performance in delivering translations with high accuracy and fluency (Klubička et al., 2017; Moorkens, 2018), especially in managing inversion in lexical level (Toral and Sánchez-Cartagena, 2017), improving sentence readability (Qin, 2018) and enhancing textual coherence (Zhang et al., 2020). In addition to its advantages, shortcomings of NMT have also been identified (Jiang and Niu, 2022). Nevertheless, to date, the linguistic features of GenAIT, and its (dis)advantages compared to human translation (HT) are still underexplored.

To fill this gap, a comparative study of GenAIT and HT has been conducted. In this study, ChatGPT 3.5 is employed to generate GenAIT and HTs are done by 19 MTI (Master of Translation and Interpreting) students in a university located in South China. The assumption underpinning this study is that GenAIT and HT present distinguished linguistic features and there is a possibility to integrate their respective strengths for higher quality translation. This empirical study focuses on the translation of scientific texts provided its significant role in promoting technological exchanges among countries, as highlighted by Cao (2000). Through its findings, the present study endeavours to enrich the understanding of GenAIT and facilitate effective collaboration between generative artificial intelligence and human translators.

Literature review

Linguistic features between MT and HT

GenAIT, the main concern of the present study, signifies an advanced phase in MT. Research in the comparison of MT-HT linguistic features can be mainly categorised into two types: one in the paradigm of translation studies and the other in the paradigm of linguistic analysis.

Research in the paradigm of translation studies has mainly focused on translation universals (Lapshinova-Koltunski, 2015; Krüger, 2020), translation quality (Ahrenberg, 2017; Muftah, 2022) and translationese (Loock, 2020). Drawing on translation universals (Baker, 1993), Lapshinova-Koltunski (2015) investigates the differences among HT, rule-based MT and statistical MT, concentrating on simplification, explicitation and convergence. Furthermore, Krüger (2020) delves into the explicitation in Neural Machine Translation (NMT), identifying that such translation shifts primarily occur through lexical insertion, lexical specification and relational specification. Regarding translation quality, BLEU and TER metrics are utilised in Ahrenberg (2017). Specifically, in terms of adequacy, Muftah (2022) finds no statistical difference between HT and MT from Google Translate and Babylon Translate. Regarding translationese, MT systems are effective in helping students to identify translation-specific language patterns (Loock, 2020).

Compared to the paradigm of translation studies, research in the paradigm of linguistic analysis has encompassed a broader range of topics, including lexical analysis (Klebanov and Flor, 2013; Vanmassenhove et al., 2019; Frankenberg-Garcia, 2022), morphologically complex words (Passban et al., 2018), cohesion (Wong and Kit, 2012; Niu et al., 2020), syntactic features (Sianipar and Sajarwa, 2021) and figurative language and style (Alowedi, Al-Ahdal, 2023). Many scholars are interested in lexical analysis. For instance, the lexical tightness (a measure of association within a text) of HT has been found greater than that of MT (Klebanov and Flor, 2013). However, when comparing MT with HT in English-French and English-Spanish language pairs, current MT systems fail to achieve the same level of lexical diversity as human translators do (Vanmassenhove et al., 2019). Moreover, through a corpus-driven keyword analysis, MT has problems like lexical inconsistency (Frankenberg-Garcia, 2022). With respect to morphological features, Passban et al. (2018) proposes a double-channel encoder and double-attentive decoder structure, enhancing the performance of NMT in translating morphologically complex words. Regarding cohesion, human translators tend to use more cohesive devices (Wong and Kit, 2012) and MT exhibits weaker coherence compared with HT (Niu et al., 2020). In terms of syntactic features, the use of passive voice, tense, sentence structure and voice conversion differs between MT and HT in Indonesian-English language pairs (Sianipar and Sajarwa, 2021). Concerning figurative language and style, AI-powered translation system is recognised as a powerful tool though it fails to properly translate the figure of speech and preserve the style of the source text (Alowedi, Al-Ahdal, 2023). This aligns with Mehawesh’s (2023) findings, which suggest that human translators are more effective in translating emotive words.

While previous studies in the field of MT-HT linguistic comparison have been extensive, there are notable gaps. Firstly, despite previous research on GenAIT-HT linguistic comparison among literary (Zagood et al., 2021) and legal texts (Moneus and Sahari, 2024), insufficient attention has been devoted to investigating linguistic differences between GenAIT and HT of scientific texts thus far. Secondly, primary emphasis within existing studies is mainly placed on Indo-European languages like French, English and Portuguese, with relatively few concerns on Chinese. With the rise of large language models (LLMs) like ChatGPT potentially offering alternatives to neural machine translation (Lionbridge.com, 2023), and given the vast number of Chinese speakers worldwide, further exploration of the linguistic attributes of GenAIT and HT within the context of Chinese-English scientific translation is imperative.

Scientific text translation

Scientific texts are informative, expository or referential, objectively ascertained and universally valid (Millan-Varela, 2013). As scientific texts offer insight into the development of specific fields (Dunham et al., 2020), studying translated scientific texts is indeed a meaningful way to facilitate the spread of knowledge.

Characteristics and translation standard of scientific texts should be clarified firstly. The scientific text, at lexical level, usually contains an array of proper nouns, abbreviations, tables, figures and so on. In syntactic dimension, the language of these texts is often complex, typified by extended sentence structures, a preference for the passive voice and a focus on noun phrases (Fu and Tang, 2012). Besides, the main function of scientific text is informative (Olohan, 2013). To fulfill the informative role of scientific text translation, maintaining accuracy and objectivity is paramount. However, securing these attributes in translations can pose a significant challenge for translators. For instance, an accurate and justifiable interpretation is a principal challenge for students in rendering English to Arabic (Abdulmajeed and Mohammed, 2023). Additionally, scientific texts often contain technical vocabulary, ordinary words with non-vernacular meanings, ellipsis, lengthy noun phrase and complex sentence (Fang, 2006).

Studies on the MT and HT of scientific texts span a diverse array of topics. For instance, Tehseen et al. (2018) proposes an effective scientific machine system which can work for any domain. Besides, from the cognitive perspective, grammar is the main cause of most problems in translating a piece of scientific text into English (Minchenkov, 2019). As for the translation quality, Fan et al. (2020) demonstrates that computer-aided methods can generate translated scientific texts with quality closely resembling that of professional translators. Moreover, six error types in MT are identified through a fine-grained manual evaluation based on materials extracted from biomedical journals, biomedical books and so on (Liu et al., 2021).

However, studies focusing on the linguistic characteristics of GenAIT and HT within the realm of scientific texts are still scant. A comprehensive analysis of the linguistic elements in GenAIT and HT of scientific content is imperative to identify their respective strengths and weaknesses. Moreover, such an examination can facilitate the realisation of an effective symbiosis between the two.

Method

Research questions

To address the aforementioned gaps and investigate practical strategies for sustaining a productive interplay between GenAIT and HT, we design an empirical study engaging 19 Master-of-Translation-and-Interpreting (MTI) students. All the participants had been admitted to the same MTI programme in 2022 and none of them had prior experience in the translation industry before this task, indicating they were all student translators devoid of professional translation experience. GenAIT is generated by ChatGPT 3.5, a representative GenAI platform. Different linguistic features in lexical and syntactic level are supposed to be explored. The exclusion of textual level parameters from this study lies in two main reasons. Firstly, the limit of time for student translators to finish their translation tasks leads to the relatively short length of the source texts (ST) designed, and thereby the short length of target texts (TT) correspondingly. Limited text length results in unremarkable textual features, and thus textual features are not investigated as a research dimension in the present study. In addition, previous studies on the comparison of GenAIT-HT linguistic features have mainly focused on lexical and syntactic levels of the translated texts. Adhering to these two levels, we hope that our findings can verify those from previous literature more accordingly. This study attempts to answer the following two questions: In the translated scientific texts,

  1. (1)

    What are the lexical and syntactic features presented in HT and GenAIT respectively?

  2. (2)

    Based on these features, how to sustain the effective interaction between GenAIT and HT in scientific text translation?

Research design

Firstly, scientific texts’ translations generated by ChatGPT 3.5 and done by student translators are respectively collected. Then, a two-level analytical framework containing lexical and syntactic levels is built. Further, instruments of Wordless 2.3.0, CorpusWordParser 3.0, CLAWS-5 and AntConc 4.1.2 are used to collect data. Lastly, analysis of quantitative data of the linguistic features and qualitative analysis are implemented.

Materials and participants

This study involves 19 participants who are MTI students in a normal university of Southern China. All participants have signed informed consent voluntarily. Subsequently, they are tasked with two translation assignments, one from English to Chinese, and the other from Chinese to English, without obligatory sequence imposed on the completion of these tasks. The present study only focuses on the translation from English to Chinese as it is traditionally deemed that translators maintain predominance in translating into their mother tongue (Newmark, 1981), and we tried to summarise features of HT from the language direction that translators are best at, which can avoid interference in the results due to their incompetence. Participants translate the text on computers without any online and offline auxiliary means, except for printed dictionaries. All participants have been told that their translations will be graded, thereby encouraging each participant to approach the task with diligence and finish it at their best.

The source texts chosen in this study are extracted from the teaching material of scientific and technological translation, published in authoritative publishing houses in China, ensuring the topic and difficulty of the ST are suitable for the participants. Before undertaking this translation task, none of the participants have used the aforementioned materials employed in this study, as it has not been incorporated into any of their coursework nor listed in the Recommended Books. Length of the text is identified in accordance with the workload-time ratio of the Level-2 of China Accreditation Test for Translators and Interpreters (CATTI), an authentic national translation proficiency test for which MTI students may target. CATTI Level-2 requires examinees to finish translating 500–600 English words into Chinese per hour. Given the time constraint of 1.5 h for the participants in finishing two translation tasks (one from English to Chinese and the other from Chinese to English), the English ST is designed to be 243 words in length. Thus, the 19 pieces of HT analysed in this study are English-to-Chinese translations completed by MTI students without the assistance of any electronic tools, and none of the HT texts have undergone post-editing.

The GenAIT collected in the present study is generated by ChatGPT 3.5. ChatGPT, as a large language model, requires prompts as guidance to function its translation ability (Jiao et al., 2023). The prompt utilised in our study is enlightened by Wang et al. (2023), which demonstrates the importance of context-aware prompts. In their study, the prompt “Translating this document from English to Chinese” is used to generate document translation. Consequently, in our study, the prompt “please translate the following passage into Chinese” has been employed. This prompt serves to inform ChatGPT that the ST is a passage and that the target language is supposed to be Chinese. The researchers input the prompt (operated on June 7th, 2023) into ChatGPT3.5 and paste the same English ST as that is translated by the 19 human translators, and thereupon the GenAIT is generated and collected. Additionally, this study includes a Reference Translated Text (RTT) for comparative analysis. This RTT is derived from the same source material as the original text (ST), ensuring consistency in content for a direct comparison.

Thus, the final corpus including ST, RTT, GenAIT and HT is collected. The general information is provided in Table 1 (The length is calculated by words in Microsoft Word 2019).

Table 1 General information of the final corpus.

Framework of data analysis

At lexical level, the following parameters are observed: tokens, types, standardised type/token ratio (STTR), terminology and part of speech (POS). Token is defined as the total count of words within a text, which essentially represents the frequency of word occurrences (Kang and Yu, 2011; Bieswanger and Becker, 2017). Type, on the other hand, refers to the unique words that remain after eliminating repeated occurrences, essentially denoting non-redundant tokens (Kang and Yu, 2011; Liang et al., 2010). The Type-Token Ratio (TTR) is computed by dividing the total number of words in a text by the count of unique word types (Csomay and Crawford, 2024). The Standardised Type-Token Ratio (STTR) serves as an index to quantify lexical diversity within corpora (Calzada Pérez, 2017). According to Liang et al. (2010), STTR calculation involves a two-step process: initially determining the type/token ratio for each segment of 1000 words (adapted to 100 words in this study due to the brevity of the texts), followed by averaging these ratios to ascertain the STTR. In the present study, the STTR is automatically calculated through Wordless (Ye, 2023). Terminology is selected as a parameter because “scientific texts usually contain a variety of proper nouns” (Fu and Tang, 2012, p.2). Additionally, part of speech (POS) fulfills dual roles in both lexical and syntactic analyses. Given that our study prioritises word classification over sentence function when discussing the POS allocation, we categorise “part of speech” as a lexical feature. Based on the observation of the ST, nouns (N.) and adjectives (Adj.) accounts for a relatively large proportion, thus, they deserve to be noticed. Besides, the translation of numerals (Num.) is essential when a scientific text contains statistics. Given that using conjunctions (Conj.), as a type of connectors, is a major approach to enhance logical relations among clauses in scientific English material (Fu and Tang, 2012), conjunctions are included in the present study. Therefore, N., Adj., Num. and Conj. are chosen as sub-parameters of POS.

At syntactic level, the count of sentences (CS) and average length of a sentence (here as sentence length in tokens, SLT) are typical parameters to analyse syntactic features (Calzada Pérez, 2017). Besides, passive voice (PV) and subordinate clause, frequently used in scientific texts, are also included in the metrics. PV allows readers “to skim quickly and see the main points by looking at the subject of each sentence” (Biber and Conrad, 2009, p.123). Two types of PV structures, “be + V_ed” and “V_ed + by” (without “be”), are chosen in the analytical framework, for the former is an explicit structure and the latter can be regarded an implicit one. They are distinguished by whether the be-verb, as a marker of PV, appears in the structure. In this study, the two examples highlighted are “is called” and “led by”. Subordinate clause is included in the framework as “English scientific texts often involve various relations, premises and designated occasions, and clauses are abundantly existed” (Fu and Tang, 2012, p.86). Those adjective phrases used as attributes could be recognised as an abbreviation of attributive clause so as to strengthen the compact of the sentence structure (Randolph et al., 1985). Object clause (OC) and attributive clause are particularly selected for two reasons. First, it has been identified that the clause type with the simple structure, “a nominal group + a verbal group + the other nominal group”, is frequently found in scientific text (Choi, 2013). This pattern is typical for OC. Second, the attributive clause accounts for a large proportion of long sentences in scientific English texts (Huang and Duan, 2021). Thus, an integrated two-level analytical framework is framed (Fig. 1.)

Fig. 1
figure 1

Framework of data analysis.

Tools and instruments

Four main tools are utilised for the extraction and analysis of the parameters listed in the analytical framework (Table 2). Wordless 2.3.0, developed by Lei Ye, is used to collect token, type, STTR, CS and SLT of the ST, RTT, GenAIT and HT. Wordless is embedded the function of detecting and coding various languages so that users neither need to manually set up the language coding of each file nor worry about the incompatibility due to unclear coding systems. CLAWS-5, a free online tool developed by Lancaster University, is used to tag POS of the ST, and CorpusWordParser 3.0 is used for word segmentation and tagging of Chinese text. AntConc 4.1.2 is utilised for the retrieval of POS. Thus, token, type, STTR, CS, SLT and POS are automatically collected while garbled codes or missing tags have been all corrected manually.

Table 2 Parameters and methods/tools applied.

Researchers manually compile other parameters, including terminology, PV and subordinate clauses. The selection of terminologies from the ST is predicated upon their occurrences and their intrinsic significance to the context of the text. Subsequently, the translations corresponding to the selected terminologies from the RTT, GenAIT and HT are meticulously extracted. In all, ten cases of terminologies, two cases of PV structures, one object clause and one attributive clause are collected for analysis.

Results

Lexical features

Tokens, types and STTR

As there are 19 human translators in total, the mean value of data in human translations, HT (mean) is adopted. Additionally, we present tokens, types, and STTR of all HT in order to provide a whole picture of HT. HT (mean) outnumbers GenAIT in tokens and types while the STTRs of GenAIT and HT (mean) are nearly the same (0.659 and 0.650). Specifically, all HTs, except HT05 (205) and HT15 (206), have more tokens than that of GenAIT. Besides, all HTs, except HT05 (133), have more types than that of GenAIT. However, around 68.42% of HTs have lower STTR than that of GenAIT (Table 3 and Figs. 2 and 3).

Fig. 2
figure 2

Tokens and types of the 19 pieces of HT.

Table 3 Tokens, types, and STTR descriptions (ST, RTT, GenAIT and HTs).
Fig. 3
figure 3

STTR of the 19 pieces of HT.

Table 3 reveals that tokens of the RTT are fewer than that of the ST, aligning with Baker’s (1993) postulated translation universal of simplification. The STTR is quantified through the type/token ratio, which endeavours to minimise the impact of functional words to the greatest extent. As such, STTR is employed as an indicator of lexical richness (Shi and Lei, 2021). Figure 3 shows that HT16 exhibits the minimum STTR value (0.583), signifying its potential inferiority in lexical richness when juxtaposed with that of GenAIT (0.66).

Translation of terminologies

GenAIT renders five out of ten terminologies (hard discs, magnetic fields, current, reading and writing data, femtosecond) into the same target expressions as RTT. In HTs, a superior congruency is observed: six out of ten terms (retrieval, hard discs, IT, magnetic fields, current, reading and writing data) are translated consistently with RTT, as delineated in Table 4.

Table 4 Ten selected terms and their translations in RTT, GenAIT and HT.

In general, HT exhibits advantages in translating abbreviated terms. There are 10 MTI students translating the term “IT” into “信息技术” (information technology) while GenAIT fails to do so. “IT” could be “互联网技术” (Internet technology), or “信息技术” (Information technology). Although there are overlaps between these two fields, “IT” in the ST means “information technology”, which justifies human translators outperforms GenAIT in dealing with abbreviated terms in this case.

However, GenAIT also performs its strength in the translation of “current”. The term “current” is translated into “现在” (right now) and “屏幕” (screen) in HT01 and HT03 respectively instead of the correct correspondence “电流” (current, related to electricity). Despite the term “current” encompassing all aforementioned meanings, the two MTI student translators fail to pick the correct one in the specific context yet ChatGPT 3.5 manages to do so.

POS

Table 5 illustrates that there are more nouns, adjectives, numerals, and conjunctions in HT (mean) than those of GenAIT. Further, except numerals, both GenAIT and HT present fewer counts of other three POS types than ST.

Table 5 POS allocation in different types of translations.

It also reveals that N. takes the largest proportion of all four POS types in ST (68.50%), RTT (76.03%), GenAIT (75.73%) and HT (mean) (73.95%). Furthermore, HT (mean) has more N. than GenAIT. For the quantities of Adj., Num. and Conj., the differences between GenAIT and HT (mean) are smaller. What’s more, the counts of Adj. in HT (mean) (10.68) and GenAIT (8) are within the scope of RTT (7) and ST (19). Num. sees the same feature. But for Conj., both GenAIT and HT (mean) presents fewer counts than ST and RTT.

Figure 4 shows that HT 16 and HT 02 contains 101 and 100 nouns respectively, ranked the first and second. In contrast, HT 15 has the fewest nouns (61). As for Num., the difference in all 19 pieces of HT is not statistically significant, ranges from five to eleven. 84.21% of HTs have 10 or more conjunctions. Additionally, 13 out of all the 19 pieces of HT contains over 10 conjunctions, exceeding the quantity of Conj. in GenAIT.

Fig. 4
figure 4

POS of all HTs.

Syntactic features

Count of sentences and sentence length in tokens

The SLTs of RTT, GenAIT and HT (mean) are shorter than that of ST while the CSs of the ST, RTT, GenAIT and HT (mean) are diversified. Mean value of the CS in HT is higher than that of GenAIT, demonstrating that there are more sentences on average in HT (Table 6).

Table 6 Descriptive statistics of CS and SLT.

Nevertheless, there are still some exceptions. For example, HT03 has the fewest CS (8) while the largest SLT (29.625), which may indicate that the student cannot segment long sentences. For instance, the translated sentence “但通过自旋电子读取和写入数据却受到了发展相对滞后的电磁传感器的阻碍”(HT03, corresponding ST: But reading and writing data through spintronics has been hampered by the relative slowness of magnetic sensors.) is advised to be divided into several short sentences in Chinese.

Passive Voice (PV)

Given that the proportion of PV in scientific writing might stabilise at about 30% (Ping Alvin, 2014), it deserves further explorations. This study focuses on two basic structures of PV: “be + V_ed”, an explicit one, and “V_ed + by” (without “be”), an implicit one.

As a fundamental PV structure, “be + V_ed” has been translated into different forms. RTT and HTs (except HT17) shift the PV into active voice (AV) in Chinese while GenAIT reserves the PV structure. “被” (the bei passive) is a widely used PV structure in Chinese. Nevertheless, rather than employing the bei passive construction, the OSV (Object-Subject-Verb) sequence is identified as the most similar counterpart to the English PV structure (Li, 2005), mainly because the bei passive is typically associated with adverse situations (Tsai, 1972; Li and Thompson, 1981). The example in Table 7 reflects that GenAIT should enhance its ability to recognise and properly translate English PV into Chinese. The student who does HT17 in Table 7 is also advised to pay more attention to the proper use of voice in Chinese.

Table 7 Extracts of “be + V_ed” translated texts.

Different from the explicit form of English PV “be + V_ed”, the structure of “V_ed + by” could be considered as an implicit one due to the lack of “be” as the sign. This example, if presented in its full form, could be “a team (which is) led by Jean-Yves Bigot”. In the example of Table 8, only HT07 shifts the PV into Chinese AV in the form of “you (have) structure” while GenAIT, like other HTs, seems to ignore the passive logical relation.

Table 8 Extracts of “V_ed+by” translated texts.

Subordinate clause

Due to the frequent use and large proportion of OC and attributive clause in scientific text (Randolph et al., 1985; Choi, 2013), this study selects one case for either type for analysis.

Object clause (OC)

In translating OC with “that” as the subordinating conjunction, RTT and most HTs (except HT01 and HT19) divide the complex sentence into two clauses with a comma while GenAIT does not (See Table 9). A plausible explanation might be that the source text lacks a comma, leading ChatGPT 3.5 to strictly adhere to the initial construction without alteration.

Table 9 Extracts of OC translated texts.

Attributive Clause (AC)

The integrated sentence selected from ST for analysis is “The research builds on achievements that earned the 2007 Nobel Physics Prize for Albert Fert of France and Peter Grünberg of Germany, who ushered in a revolution in miniaturised storage in the 1990s.” The restrictive clause is chosen for the comparative analysis (See Table 10).

Table 10 Extracts of AC (restrictive) translated texts.

The restrictive clause forms a cause-and-effect relationship with the indefinite relative clause. The structure in GenAIT is similar to that of RTT, applying “因此” (thus) as a sign to show the connection between two clauses. On the contrary, some human translators rigidly follow the original structure (like HT06, whose way of connecting the two clauses is simply adding a comma), lacking logical connectors to manifest the relationship between two clauses.

Discussion

Lexical features

Lexical features in HT

Most HTs present larger number of tokens and types while lower STTR than those of GenAIT. Besides, compared with ST, there are 12 and 8 HTs using fewer tokens and types respectively. There are multiple factors to consider. Firstly, it could be the translation universal of simplification that results in over half (12/19) of HTs having fewer tokens. However, as for the six HTs (Chinese) that outnumber ST (English) in terms of tokens, it could be justified by that the information quantity of 1.5 Chinese characters is equivalent to one English word (Sun et al., 1985). Secondly, the linguistic differences between Chinese and English could also play a role. As those functional words (like “the”, “of”) are more frequently used in English, it affects the number of types and the STTR. Thirdly, the familiarity of individuals with the characteristics of scientific texts may also influence their translation results. Despite all 19 students being in the same class, it is inevitable that there will be disparity among them concerning their exposure to this specific genre.

In relation to the accuracy of terminology translation, human translations (HTs) demonstrate a lower proficiency. This may be attributed to students’ occasional carelessness or anxiety, given that they are working on a time-constrained translation task.

For part of speech (POS) analysis, the occurrence of nouns in HTs surpasses that of adjectives, numerals, and conjunctions, aligning with the expected lexical density and informational focus of scientific texts. Additionally, the standard deviation (Sd.) of numerals is the least, underscoring numerals’ direct correlation with accuracy. The slight variation in numeral count across HTs may stem from differences in typing and segmentation practices. Furthermore, the relatively smaller Sd. of conjunctions suggests that the majority of students have effectively prioritised logical connectivity in their translations, adhering to findings that the incorporation of more conjunctions can enhance textual cohesion (Dalimunte, 2013).

Lexical features in GenAIT

Compared to HTs, GenAIT has fewer tokens and types, higher STTR, higher accuracy in terminology translation, lower word frequency of nouns, adjectives, numerals and conjunctions. These features are closely related to the working mechanism of ChatGPT. Trained on approximately 45 terabytes of text data at a multimillion-dollar cost (Barreto et al., 2023), GPT-3 sets the stage for its successor, ChatGPT 3.5. This subsequent model inherits GPT-3’s extensive dataset but expands its capabilities with enhanced conversational interfaces that closely mimic human-like interactions (Rane, 2023), potentially augmenting its processing flexibility. Consequently, ChatGPT 3.5 might demonstrate an adeptness in adopting the stylistic features of scientific texts, which typically favour a higher frequency of content-driven words over functional ones. This attribute, combined with the nuanced training in linguistic variability, may explain the incremented STTR observed in ChatGPT 3.5’s translations as compared to those produced by student translators. ChatGPT, to some extent, can be deemed as a unified multilingual machine translation model (Jiao et al., 2023). Although its architecture differs from the traditional encoder-decoder blueprint of NMT, GPT models, including ChatGPT 3.5, demonstrate promising translation capabilities (Hendy et al., 2023). This proficiency may be attributed to large language models’ ability to abstract high-level cognitive capabilities through effective prompting in human-like conversations (Kojima et al., 2023), consequently enhancing the quality of text generated by ChatGPT. Furthermore, as a higher STTR signifies higher lexical diversity, it can be argued that ChatGPT 3.5 yields translations exhibiting broader vocabulary variance compared with student translators in this study. This finding contrasts with Vanmassenhove et al. (2019) which shows a decrease in lexical diversity with MT. Nevertheless, it’s pertinent to note that the system utilised in our study is different from the MT assessed by Vanmassenhove et al. (2019).

As for the count of POS, except the frequency of nouns, counts of the other three POS in GenAIT are similar to those of ST. The reason for the higher frequency of nouns might be that ChatGPT 3.5 transforms many English verbs into Chinese nouns. For instance, the verb “output” is translated into “输出” (production).

Syntactic features

Syntactic features in HT

HTs exhibit a larger CS coupled with shorter average sentence lengths in comparison to GenAIT. Student translators’ translation training might explain this result. When Chinese students embark on English-Chinese translation tasks, on translating long English sentences in particular, they tend to split them into several short sentences. As one of common translation strategies, it has been consolidated through abundant training, and students tend to apply this strategy to practice.

Regarding the translation of PV, most MTI students transform PV into AV. Specifically, all student translators employe the AV structure in Chinese when translating the implicit PV structure “V_ed + by”. This might be associated with a prevalent use of the “有(you, have)-structure” in Chinese language contexts. However, whether to transform PV into AV is influenced by various reasons. For example, a study of English-German reveals that translators mainly use PV to translate AV but editors change most of PV back to AV (Bisiada, 2019).

The translation of OC and AC reflects student translators’ advantages in segmenting long sentences. OC “most frequently appear after verbs like say, believe, think, decide, propose… and the verbs that name some sort of mental activity” (Podoliuc, 2010, p.51). These complements are instrumental in constructing complex sentences. When encountering such complex constructions, student translators opt to divide them into two distinct parts using the connector “that”.

Syntactic features in GenAIT

The GenAIT has fewer sentences (9) than HTs’ average (11.68) but larger SLT (23.11) than that of HTs (20.71). To some degree, the more tokens in a sentence, the lower readability the sentence is. Although readability should be assessed from various perspectives, including syntax, morphology, cohesion, discourse structure and subject matter (François and Fairon, 2012; Denning et al., 2016; Arfé et al., 2018), calculating average sentence length or syllables per word is still a prominent strategy (Azpiazu and Pera, 2019). As there are more tokens in sentences of GenAIT than those of HTs on average, readers might need more efforts when reading the GenAIT.

In relation to translations of PV in GenAIT, the explicit PV structure “be + V_ed” is remained while the passive relation marked by the implicit “V_ed + by” seems to be ignored. The “be + V_ed” structure unambiguously signals that the action is executed by someone, thereby prompting ChatGPT to preserve the passive construction rather than converting the PV to AV. However, with more implicit structures, ChatGPT may struggle to detect the passive construction, leading to potential oversight.

Regarding subordinate clause, GenAIT maintains object complements as intact, whereas most human translators elect to divide them. ChatGPT 3.5 employs a connector to seamlessly join the restrictive attributive clause with the indefinite relative clause, thereby enhancing the clarity of the source text for readers. Similarly, translators of HT01, HT03 and HT19 do not split the OC either. Although NMT systems generally exhibit limitations in generating coherent logical information (Wang and Feng, 2021), the use of connectors by GenAIT demonstrates its partial competency in constructing logical connections. Notably, the ST lacks conjunctions such as “so” or “therefore” to bridge the restrictive attributive clause and the indefinite relative clause. In contrast, ChatGPT 3.5 introduces “因此” (thus) to forge this link.

The ways to sustain the effective GenAIT-HT interaction

Based on above findings, harnessing the lexical and syntactic advantages of both the GenAIT and HT could be an effective interaction between them to improve translation quality. GenAIT offers human translators a wide range of vocabulary and a wealth of background knowledge across various fields. Human translators can effectively leverage words with different parts of speech, allowing them to combine the strengths of both sides and enhance the lexical accuracy of the translated text. For instance, when human translators encounter engineering terminologies that are unfamiliar to them, they could refer to the knowledge provided by GenAI platform to achieve more precise translations. Conversely, when human translators identify inappropriate word usage in GenAIT, such as words belonging to improper parts of speech, they could make adjustments to enhance the accuracy of the translated text.

At syntactic level, the effective interaction of GenAIT and HT can enhance the fluency of translated texts. In this study, student translators excel in deconstructing long and complex sentences. They can segment a lengthy passage into manageable sections for sequential input into GenAI. This strategy facilitates the improvement of GenAI’s output by catering to its preference for processing shorter and simpler sentences. For example, in the case of translating a long poem written in archaic English, a task that remains a formidable challenge for current AI technologies, human translators can strategically divide it into segments based on its meaning and context. Concurrently, GenAI’s capabilities, like the automatic insertion of logical connectors in this research, can further assist human translators in augmenting the fluency of their target texts.

Implications

Drawing from the findings of this research, we propose several implications for translator training, language service providers, and future development of GenAI and human translation. In terms of translator training, educators are encouraged to guide students in effectively utilising GenAI systems to improve their translation quality. For instance, given that GenAI systems like ChatGPT in this study display a higher STTR which indicates richer lexical diversity, it is advisable for trainers to foster students’ proficiency in exploiting GenAI tools to diversify their lexical choices. Additionally, given ChatGPT’s superior accuracy of translating numerals in this research, it is suggested that student translators could employ GenAI systems to verify numerical translations and thus enhance both the efficiency and quality of their work. Nonetheless, translators must be vigilant in scrutinising the input material processed through GenAI platforms, especially long sentences, as GenAI is found to be less effective in dealing with long sentences.

For language service providers (LSPs), it is highly recommended to deepen their understanding and optimise the efficient utilisation of GenAI technologies. For example, in the present study, ChatGPT 3.5 demonstrates advanced lexical diversity and effective clause connection, which could substantially enhance efficiency within LSP environments. Leveraging these advantages of GenAI has the potential to result in providing more sophisticated and nuanced language services. Besides, in September 2023, Lionbridge, a leading language service provider, published its Machine Translation Report, highlighting two pivotal insights. Firstly, companies that successfully integrate AI will likely excel in the evolving digital landscape. Secondly, large language models like ChatGPT could emerge as a potential successor to neural NMT as a new standard in the field. Consequently, it is recommended that LSPs invest in the development of GenAI and in equipping their workforce with AI-enhanced skills to further boost productivity.

The future landscape for GenAI and human translators is supposed to be a harmonious coexistence with a bright outlook. It’s undeniable that GenAIT has shown notable advantages in certain aspects. Nonetheless, widespread consensus maintains that human translators are irreplaceable, as GenAI systems like ChatGPT have yet to adeptly capture uniquely human imaginative skills, such as creativity and emotional intelligence (Xu and Yang, 2022; Hu and Li, 2016). Moreover, this study also underscores GenAIT’s limitations, including difficulties with processing lengthy sentences. As such, GenAI should not be viewed as a threat but rather as a valuable tool that enhances our capabilities. To utilise GenAI more effectively, there’s a growing recognition of the need for “prompt engineering”, a skill that demands specialised knowledge and practice (Oppenlaender et al., 2023).

Conclusion

Major findings

For the first research question, HT and GenAIT present distinguished lexical and syntactic features. In lexical level, HT (mean) has larger number of tokens and types but lower STTR than those of GenAIT; GenAIT has higher accuracy in term translation, fewer nouns, adjectives, numerals and conjunctions. In syntactic level, HT (mean) has more sentences and smaller SLT on average. Most HTs (except HT17) transforms the explicit PV structure into AV in Chinese while GenAIT fails to do so. The implicit PV structure is shifted into AV in all HTs while ChatGPT 3.5 seems to ignore the passive relation. Though ChatGPT 3.5 fails to split the complex sentence into simple sentences, it manifests the cause-and-effect relation of the two attributive clauses by adding a connector “因此” (thus) as some student translators do. For the second research question, combining advantages of both sides is proposed to be a method of sustaining the effective interaction between GenAIT and HT.

Limitations and future directions

Different from the previous study on perceptions and attitudes (Liu et al., 2022), this study provided objective evidence for the complementarities between GenAIT and HT. Thus, this research represents a modest progress. Besides, based on findings, this study points out specific aspects where GenAIT and HT could benefit each other. Implications for translator training, language service provision and the development of GenAIT and HT are also demonstrated. However, some limitations must be noted. Although parameters in the analytical framework are all justified, there are many other dimensions deserved to be investigated. For example, if the material is extracted from a story-telling book, probably verbs should be added into the POS in the framework as verbs might frequently appear in such texts. In addition, this study ignores textual level due to it is not prominent in the selected material. Future researchers are supposed to explore this dimension and involve a larger number of participants. Besides, other language pairs should be researched to see if these findings are replicable in different language contexts. As for other text genres, legal text, literature and many other genres are also deserved to be explored for a more extensive application.