Introduction

Identifying systematic patterns that distinguish translated texts from non-translated ones has long been a central topic in corpus-based translation studies. This line of enquiry has been conceptually underpinned by the notion of ‘translation universals’ (Baker, 1993), offering valuable insights into the cognitive, linguistic and social dynamics that shape translated language. According to Baker’s (1993, pp. 243–246) definition, translation universals are features “which typically occur in translated text rather than original utterances and which are not the result of interference from specific linguistic systems”. To date, a number of such universal features have been proposed, supported or contested, including explicitation (Blum-Kulka, 1986; Kajzer-Wietrzny, 2015; Klaudy and Károly, 2005; Marco, 2012, 2018; Murtisari, 2016; Zhang et al., 2020; Zufferey and Cartoni, 2014), simplification (Kajzer-Wietrzny, 2015; Laviosa, 2002; Liu and Afzaal, 2021; Liu et al., 2023; Niu and Jiang, 2024; Xiao, 2010), normalisation (Delaere et al., 2012; Delaere and De Sutter, 2013; Lapshinova-Koltunski, 2015, 2022; Redelinghuys and Kruger, 2015), shining-through (Cappelle and Loock, 2013, 2017; Evert and Neumann, 2017; Lapshinova-Koltunski, 2022; Teich, 2003; Xiao, 2010) and unique items (Kujamäki, 2004; Rabadán et al., 2009; Tirkkonen-Condit, 2004).

Among these, explicitation is arguably the most widely investigated and consistently reported phenomenon (Chesterman, 2011). Blum-Kulka (1986) defined explicitation as the tendency to use explicit cohesive devices even when unnecessary, and Baker (1996, p. 180) contended that translated texts are inclined to “spell things out rather than leave them implicit”. As a result, translated texts may display higher levels of textual cohesion, making them statistically distinguishable from non-translated texts on the basis of features related to cohesion and coherence (Øverås, 1998). Despite sustained scholarly interest, several issues remain in the research on explicitation, especially in terms of scope and methodology.

First, most existing studies have concentrated on explicitation in human translation, while this tendency has been far less explored in machine translation, which has only begun to attract scholarly scrutiny in recent years. Lapshinova-Koltunski (2015) compared the occurrence of conjunctions and the proportions of pronominal phrases and general nouns across phrase-based and statistical machine translation and human translation, finding that human translation is characterised by weaker connectivity than machine translation. In contrast, Krüger (2020) focused on three types of explicitation shift, namely lexical insertion, lexical specification and relationship specification, in human and DeepL-translated texts, revealing more explicitation in human translation in all of these categories. Jiang and Niu (2022) analysed discourse coherence in Google- and DeepL-translated essays by comparing connectives, latent semantic analysis and situation-model measures, finding that both human and machine translations use more connectives than original texts, but that human translation makes use of more deep cohesion than machine translation. These studies collectively point to the distinctive character of machine translation as a translation variety, underscoring the need for more targeted and systematic exploration.

Second, research on explicitation has often overlooked register variation, limiting the generalisability of its conclusions. For instance, Marco (2018) analysed the occurrence of connectives in translated and non-translated Catalan literary texts, finding no significant differences in the overall frequency of connectives. Similarly, Song (2022) examined the use of connectives in the translated Chinese version of The Lord of the Rings, discovering that explicitation can be found in both texts under comparison. Zhang et al. (2020), on the other hand, investigated the frequency of personal pronouns in Chinese children’s literature translated from English, demonstrating that personal pronouns are used more frequently in translated texts than in original children’s books. Nevertheless, these studies focused on literary texts, and their results may not generalise to non-literary texts, making it difficult to determine whether the observed features are register-specific or translation-specific. As De Sutter and Lefer (2020, p. 6) and Evert and Neumann (2017, p. 50) argued, failing to account for register can obscure our understanding of translational phenomena. Register should therefore be treated as an integral factor shaping the linguistic characteristics of translations (Kruger, 2019).

Third, a large proportion of earlier research adopted a univariate approach, examining linguistic features in isolation. While such studies have yielded important insights, they often lead to fragmented or contradictory conclusions. For instance, comparing two indices of lexical density, Xiao (2010) found that translated Chinese differs significantly from non-translated Chinese in the ratio of content words to the total number of words, but not in the standard type/token ratio. The characteristics of translated texts, however, cannot be determined from the observation of a single feature. Rather, the tendencies in translated texts, whether simplification, explicitation, normalisation or others, are expressed through combinations of features, in a manner analogous to genetic information. Evert and Neumann (2017) likewise noted that the interactions of different factors have rarely been examined, and recommended that multivariate techniques be adopted to examine the systematic and structural properties of translated texts. It therefore remains unclear, on the basis of univariate analysis alone, how the linguistic properties of translations are influenced by spatial, temporal, technological, cognitive and many other factors (De Sutter and Lefer, 2020).

Against this backdrop, the present study aims to investigate how translation variety and register divergence jointly shape patterns of cohesion and coherence in English texts. To this end, three multivariate techniques, namely principal component analysis (PCA), flexible discriminant analysis (FDA) and permutational multivariate analysis of variance (PERMANOVA), are employed to capture the multidimensional nature of linguistic variation. The study is guided by the following three research questions: (i) Are translated and non-translated texts characterised by different patterns of cohesion and coherence? (ii) Are these characteristics consistently observable across different registers? (iii) Are these differences more strongly influenced by translation variety, register variation or their interaction?

By addressing these questions, the study makes several theoretical, methodological and practical contributions to the research on translation universals and machine translation. Theoretically, it advances our understanding of explicitation by situating it within both technological and functional contexts. Methodologically, it demonstrates the value of multivariate analysis in revealing latent textual structures that are often missed by univariate methods. Practically, it offers insights into how machine translation aligns with or diverges from human translation norms, thereby informing both translator training and machine translation system development. In an era where machine-generated texts increasingly permeate professional and everyday communication, understanding the cohesion and coherence profiles of these outputs is crucial. This study thus not only interrogates generalised claims about translation universals but also calls for more nuanced, context-sensitive approaches to characterising translated language.

Theoretical underpinnings

This section offers an overview of fundamental concepts in this work. Section “Cohesion and coherence” introduces the two concepts of cohesion and coherence, while section “The concept of explicitation revisited” critically examines explicitation as a frequently explored linguistic characteristic in research on translation universals. Section “The hypothesis of risk-aversion and algorithmic bias” discusses two theoretical models explaining the explicitation tendency.

Cohesion and coherence

Cohesion and coherence are two notions pertaining to the connectedness of a discourse, whether in spoken or written form, but they differ in certain respects (Bublitz, 2011, p. 37). Halliday and Hasan (1976, p. 4) defined cohesion as a semantic concept, referring to relations of meaning that link items within a text and create texture. That is, semantic relations between the current item and preceding or subsequent ones are realised through lexis or grammatical structures. In contrast, coherence is believed to be “a cognitive category that depends on the language user’s interpretation and is not an invariant property of discourse or text” (Bublitz, 2011, p. 38). In other words, coherence can be understood as a construct reflecting how well the receiver comprehends a text, and needs to be assessed by asking readers questions and evaluating how much information they obtain from the text (McNamara et al., 2014). Due to its subjective and often intangible nature, coherence remains an underexplored concept, marked by complexity and ambiguity (Sinclair, 1991, p. 102). The distinction, then, lies in the fact that while cohesion is a surface-level feature that can be observed and measured in the discourse itself, coherence is a mental construct that resides in the mind of the reader or listener (Carrell, 1982; Givon, 1995; Graesser et al., 2004).

Halliday and Hasan (1976) proposed a taxonomy of cohesive devices comprising five principal categories: reference, conjunction, substitution, ellipsis and lexical cohesion. More specifically, reference is often situational and relies on linguistic cues to help readers link propositions, clauses or sentences within their mental representation of the text (Halliday and Hasan, 1976; McNamara and Kintsch, 1996). For instance, in Example (1), the personal pronoun he refers to Mahmoud el Zaki, creating a clear referential tie between the two clauses. Similarly, in Example (2), it refers back to the song, maintaining textual continuity. Conjunctions can be categorised by the relationships they signal, including additive (and, furthermore, in addition, etc.), adversative/contrastive (however, but, in contrast, etc.), causal (because, so, therefore, etc.) and temporal (then, next, finally, etc.). By way of illustration, Example (2) shows the use of because to link two clauses and explain the rationale behind the song selection. Substitution involves replacing an element with another, often realised through noun phrases (e.g. one), verb phrases (e.g. did/do) and clauses (e.g. so). For instance, in Example (2), one substitutes for the song and do replaces the verb sing, while in Example (3), so replaces the clause that I would call that a whirlwind. Ellipsis, often regarded as zero substitution, entails the omission of an element that is recoverable from context. In Example (4), women is omitted after two more, and the reader is expected to infer it from the earlier text. Finally, lexical cohesion is based on the identity or semantic similarity of reference across items, realised through repetition, synonyms, superordinates, general terms or collocations. Example (5) illustrates this with repeated items such as television, reading and books, and semantically related words like event and activity.

    (1) His name was Mahmoud el Zaki and he was one of the Parquet’s rising stars. (FLOB, L09)

    (2) I was prepared to do this song because it is one that I like. (FLOB, E35)

    (3) Would you call that a whirlwind? I don’t think so. I think by this age I know what I want”. (FLOB, A10)

    (4) Smith, of Marion Road, Charlton, was originally charged with sex attacks on eight women and robbing two more, from 1988 to 1990. (FLOB, A13)

    (5) It is rare to find parents and educators actively promoting a television series (other than the specifically didactic ‘schools’ broadcasts) and treating it as a cultural event. This reflects a deeply rooted ambivalence about television as entertainment, which is directly linked to attitudes surrounding children’s reading. Watching television is inevitably regarded as an activity less worthwhile than reading, and for long has been accused of seducing children away from books. (FLOB, G40)

    (6) During the battle, it was Templars who directed the devastating arrow power that broke the Scottish spear schiltroms, and it was Templar Knights who led the final cavalry charge that destroyed Wallace’s army. (FLOB, N25)
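To make the counting behind such connective indices concrete, the following minimal R sketch (R being the language used for the analyses reported below) tallies a few single-word connectives per category and normalises them per 1,000 words, in the spirit of Coh-Metrix’s CNC* indices; the word lists and the helper connective_density are illustrative assumptions, not the tool’s implementation.

    # Illustrative sketch (not the Coh-Metrix implementation): normalised
    # frequency, per 1,000 words, of a few single-word connectives per
    # category. The word lists are deliberately tiny assumptions.
    connectives <- list(
      additive    = c("and", "furthermore", "moreover"),
      adversative = c("however", "but", "yet"),
      causal      = c("because", "so", "therefore"),
      temporal    = c("then", "next", "finally")
    )

    connective_density <- function(text) {
      tokens <- tolower(unlist(strsplit(text, "[^A-Za-z']+")))
      tokens <- tokens[nchar(tokens) > 0]
      sapply(connectives, function(set) 1000 * sum(tokens %in% set) / length(tokens))
    }

    connective_density("I was prepared to do this song because it is one that I like.")

Multi-word connectives (e.g. in addition) would require phrase matching, which is omitted here for brevity.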

McNamara et al. (2014, p. 63) further elaborated on lexical cohesion by identifying five principal forms of lexical referential overlap: nouns, pronouns, arguments, stems and content words. Noun overlap occurs when the same nouns are repeated across sentences, reinforcing topical continuity. Pronoun overlap involves the consistent use of pronouns with matching gender and number, ensuring referential clarity. Argument overlap encompasses two scenarios: the repetition of a noun in singular or plural form across sentences or the use of matching personal pronouns to maintain reference to the same entity. Stem overlap captures shared lemmas across varied grammatical forms, reflecting semantic continuity even when word forms differ. Finally, content word overlap measures the proportion of shared content words between sentence pairs, highlighting lexical consistency.
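As a rough illustration of these overlap measures, the sketch below computes a Jaccard-style content-word overlap between two sentences, loosely analogous to the content word overlap indices; the stop-word list and tokenisation are simplifying assumptions rather than the Coh-Metrix formula.

    # Rough sketch of a sentence-pair overlap measure in the spirit of the
    # content word overlap indices. The stop-word list, tokenisation and
    # Jaccard-style ratio are simplifying assumptions.
    stopwords <- c("the", "a", "an", "and", "but", "of", "to", "in", "is",
                   "was", "has", "have", "than", "away", "from", "it")

    content_words <- function(sentence) {
      tokens <- tolower(unlist(strsplit(sentence, "[^A-Za-z']+")))
      setdiff(tokens[nchar(tokens) > 0], stopwords)
    }

    # Proportion of shared content words between two adjacent sentences
    pair_overlap <- function(s1, s2) {
      w1 <- content_words(s1); w2 <- content_words(s2)
      length(intersect(w1, w2)) / length(union(w1, w2))
    }

    pair_overlap("Watching television is less worthwhile than reading.",
                 "Television has long seduced children away from books.")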

While cohesion can be systematically analysed through observable grammatical markers or lexical repetition, coherence is less readily accessible from the discourse surface. Nevertheless, computational tools such as Coh-Metrix (McNamara et al., 2014) offer approximate indicators of coherence, including semantic similarity and situation models. Coherence emerges at the semantic level and can be computationally modelled using techniques such as Latent Semantic Analysis (LSA) (Landauer and Dumais, 1997), Word2Vec (Mikolov et al., 2013) or Bidirectional Encoder Representations from Transformers (BERT) (Devlin et al., 2019). This study employs LSA via Coh-Metrix because it is an integral and validated component of the tool on which our coherence measurements are based. LSA operates on the assumption that words acquire meaning from the contexts in which they occur (McNamara et al., 2014, p. 66), capturing deeper semantic relationships even when words do not appear in close proximity. For instance, in Example (5), the word school is situated within a semantic field that includes parents, educators and children. Similarly, in Example (6), the word battle co-occurs with other war-related terms such as arrow, spear, cavalry, knights and army, reflecting a coherent semantic cluster.
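To give a flavour of how such similarity scores can be derived, the toy sketch below builds a binary term-by-sentence matrix from a three-sentence mini-corpus, reduces it by singular value decomposition and scores a sentence pair by cosine similarity, loosely analogous to LSASS1; the mini-corpus and the choice of k = 2 latent dimensions are illustrative assumptions.

    # Toy LSA sketch: a binary term-by-sentence matrix is reduced by SVD and
    # adjacent sentences are compared by cosine similarity.
    sentences <- c("the knights led the cavalry charge",
                   "arrows broke the spear formations",
                   "parents promote reading and books")
    terms <- sort(unique(unlist(strsplit(sentences, " "))))
    tdm <- sapply(sentences, function(s) as.numeric(terms %in% strsplit(s, " ")[[1]]))

    s <- svd(tdm)
    k <- 2                                          # keep the two strongest latent dimensions
    sent_vecs <- diag(s$d[1:k]) %*% t(s$v[, 1:k])   # sentences in the k-dim latent space

    cosine <- function(a, b) sum(a * b) / sqrt(sum(a^2) * sum(b^2))
    cosine(sent_vecs[, 1], sent_vecs[, 2])          # similarity of sentences 1 and 2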

Beyond the semantic dimension, readers also construct mental representations of the events, characters and settings described in a text. These representations are referred to as situation models (Dijk and Kintsch, 1983) or mental models (Johnson-Laird, 1989). Situation models are central to discourse comprehension, as they allow readers to mentally simulate the narrative world. Zwaan et al. (1995) identified three dimensions of situation models, namely temporality, spatiality and causality. McNamara et al. (2014) further distinguished intentionality from causality, contending that intentionality refers to the actions of animate agents as part of plans in pursuit of goals, whereas causality describes mechanisms that may or may not be driven by people’s goals. Coherence is maintained when any of these four dimensions remains continuous or logically connected; where discontinuity arises, additional cohesive devices become necessary to restore coherence. Example (6) demonstrates this through the use of causal verbs such as break and destroy, which establish a cause-and-effect relationship and support the narrative of triumph in warfare. These verbs reinforce coherence through causality and intentionality.

In summary, cohesion is an observable feature of discourse that can be measured through linguistic devices such as conjunctions, referential ties, and lexical overlap. In contrast, coherence is a higher-order construct reflecting the mental representation of meaning, which can only be approximated through measures like semantic similarity and situation modelling. Understanding both cohesion and coherence is essential for analysing how discourse is structured and understood by readers or listeners.

The concept of explicitation revisited

As discussed, the tendency for translated texts to exhibit increased cohesion compared to their non-translated counterparts has been widely described as explicitation. Blum-Kulka (1986) formally introduced this concept, arguing that translators often employ more cohesive devices in the target text than are strictly necessary for comprehension. However, the roots of the concept can be traced back to Vinay and Darbelnet (1958, p. 9), who described it as a translation procedure that “consists in introducing in the target language details that remain implicit in the source language, but become clear through the relevant context or situation”. Shuttleworth (1997, p. 55) described explicitation as a phenomenon in which target texts tend to express source text information more explicitly than the original, often through the addition of explanatory elements and enhanced communicative cues.

A significant development in explicitation studies was Klaudy’s (1998) typology, which classified explicitation into four types: obligatory, optional, pragmatic and translation-inherent. This categorisation is particularly important as it distinguishes explicitness that emerges in the translation process itself from that which arises from linguistic or cultural constraints. In her definitions, obligatory explicitation occurs due to structural differences in grammar or semantics between the source and target languages, while optional explicitation results from text-building strategies or stylistic preferences that differ across languages, taking the form of additional connectives or emphasisers (Klaudy, 1998). Whereas pragmatic explicitation arises when translators clarify culture-specific items in response to assumed cultural differences, translation-inherent explicitation is described as an inevitable consequence of all translational activity.

However, this typology has not gone unchallenged. Englund Dimitrova (2005) argued that the concept of translation-inherent explicitation is insufficiently clear and that pragmatic explicitation in fact belongs to optional explicitation. Becher (2010) likewise criticised the vagueness of the definition of explicitation and supported the asymmetry hypothesis proposed by Klaudy and Károly (2005, p. 14), which posits that translated language tends to show more explicitation than corresponding implicitation. Indeed, Kruger (2019) found that, based on the frequency of the optional complementiser that, the level of explicitness exceeds that of implicitness in English translated from Afrikaans.

Despite ongoing debates regarding its definition and classification, most scholars agree on the underlying premise that translated texts tend to contain more communicative cues to facilitate comprehension. This increased explicitness may manifest through denser cohesive ties or the explicit articulation of information that remains implicit in the source text. Consequently, explicitation has become a central concept in corpus-based translation studies, where it has been investigated using various linguistic indicators. For example, Konšalová (2007) examined morphosyntactic structures in Czech and German, discovering a strong tendency towards explicitation in both Czech and German translations. Jiménez-Crespo (2015) focused on verb use in two production stages of Spanish-English translation, finding that explicitation varies under different production conditions. Other indicators used to investigate explicitation include connectives (Jiang and Niu, 2022; Marco, 2018; Song, 2022; Xiao, 2010, 2015; Zufferey and Cartoni, 2014), personal pronouns (Xiao and Hu, 2015; Zhang et al., 2020), the optional complementiser that (De Sutter and Lefer, 2020; Kruger, 2019; Kruger and De Sutter, 2018; Olohan and Baker, 2000) and mean sentence length (Hu et al., 2019; Xiao and Hu, 2015).

Taken together, these studies underscore the robustness of explicitation as a translational phenomenon. They also highlight the importance of considering both linguistic and contextual factors in analysing the explicitness of translated texts.

The hypothesis of risk-aversion and algorithmic bias

To explain the tendency towards explicitation in human translation, Pym (2005, 2015, 2020) formulated the risk-aversion hypothesis, suggesting that translators tend to explicate information in order to reduce ambiguity and misunderstanding for the reader. Pym (2015, 2020) regards translation as a process of risk management in which translators face three types of risk: credibility risk, uncertainty risk and communicative risk. Credibility risk concerns social relationships, involving the danger of losing the trust of clients, end-users and other participants. Uncertainty risk concerns the cognitive processes at play when translators make linguistic decisions about how to render a given item. Communicative risk, as the name suggests, relates to the possibility of failing to achieve the expected communicative effects. These risks are correlated, meaning that linguistic uncertainties may further lead to miscommunication and even loss of trust from clients (Pym, 2020, p. 449).

This model resonates with recent developments in the cognitive approach to translation, particularly the 4EA (embodied, embedded, enacted, extended and affective) paradigm of cognition, also referred to as situated or social cognition (Halverson, 2015; Milošević and Risku, 2021; Muñoz Martín, 2016; Risku and Windhager, 2013; Robinson, 2020). The basic view of situated cognition is that human cognition is embodied, embedded, enacted, extended and affective; in other words, human cognitive processes interact with the environment, are mediated by the body, oriented to action and supported by artefacts (Rowlands, 2010, pp. 51–84). From this perspective, translators’ preference for explicit, risk-reducing choices can be seen not as an isolated cognitive tendency, but as an adaptive strategy shaped by the broader institutional and communicative context of the translation industry.

The risk-aversion hypothesis has been supported by several studies. For instance, Kruger (2019) examined several factors that may cause the omission of the complementiser that, finding that translators favour the explicit that in contexts that carry low communicative risk, such as fictional texts. Kruger and De Sutter (2018) likewise investigated situations where that is either explicit or implicit, revealing that translators tend to avoid omitting that even in registers where omission is more conventional (e.g. creative writing and reportage), opting instead for the most frequent and formal option. Delaere and De Sutter (2013) provided further evidence by investigating lexical choice in translated and non-translated Dutch, demonstrating that translators tend to favour safer, more mainstream linguistic options than original writers. They argued that a potential reason behind translators’ risk-averse behaviour is that they are more strongly influenced by their perception of the target audience. Overall, these studies reveal a consistent pattern: when faced with uncertainty, translators gravitate toward options that minimise risk, enhance clarity and conform to normative language use. If this interpretation holds, one would expect human-translated texts to display a higher frequency of cohesive devices than original texts, even in registers where lower cohesion is stylistically acceptable.

In parallel, tendencies observed in machine-translated texts are believed to be linked to algorithmic bias, whereby the linguistic patterns in the training data, often drawn from human translations, are not merely replicated but amplified by statistical or neural models (De Clercq et al., 2021; Jiang and Niu, 2022; Luo and Li, 2022; Niu and Jiang, 2024; Vanmassenhove et al., 2019, 2021). From this perspective, machine translation may inherit and intensify the explicitation patterns found in its human-produced training corpora. Consequently, it is reasonable to hypothesise that machine translation output might also exhibit strong tendencies toward explicitation, potentially even exceeding those found in human translations under certain conditions.

Research methodology

This section provides an exposition of the data and methodologies employed in the present study. The structure of the corpora is elaborated in section “Corpora design”, followed by an introduction to the measurement indices in section “Measurement”. In view of the research objectives and the distribution of the data, section “Data analysis” then presents three non-parametric multivariate analysis techniques.

Corpora design

In order to compare translated and original English, two balanced corpora, the Corpus of Chinese into English (COCE) (Li and Yang, 2017; Liu and Afzaal, 2021) and the Freiburg-LOB Corpus of British English (FLOB) (Hundt et al., 1999), were used in the present study. Furthermore, for the specific purpose of contrasting and investigating machine translationese, two neural machine translation corpora were generated by rendering the texts in the Lancaster Corpus of Mandarin Chinese (LCMC) (McEnery and Xiao, 2004) through two popular neural machine translation tools, DeepL and Google Translate. We name them the Lancaster Corpus of Mandarin Chinese Translated into English by DeepL (LCMCTD) and the Lancaster Corpus of Mandarin Chinese Translated into English by Google Translate (LCMCTG), respectively. In this way, the four corpora are comparable in terms of the number of texts, registers and types, and the three translation corpora share the same source language (see Table 1).

Table 1 Registers and genres in the four corpora.

Measurement

The present study takes a quantitative approach, using a set of linguistic features extracted with Coh-Metrix 3.0 (McNamara et al., 2014). Coh-Metrix was developed by McNamara et al. (2014) as a computational tool for measuring cohesion and coherence in discourse. This latest version provides 108 indices per text, covering descriptive statistics, text easability, referential cohesion, LSA, connectives, the situation model, syntactic pattern density, word information and readability, and these metrics can be applied to nearly any type of text or genre (Graesser and McNamara, 2011).

Given that the primary aim of this study is to examine how translation variety and register influence patterns of cohesion and coherence, our analysis focuses on a subset of Coh-Metrix indices most directly related to these two constructs. Specifically, we include features grouped into five core components: referential cohesion, personal pronouns, connectives, LSA and the situation model. These are summarised in Table 2. More information about the variables can be accessed in McNamara et al. (2014, pp. 247–251).

Table 2 Indices used in the present study from Coh-Metrix.

As outlined in section “Cohesion and coherence”, we classify referential cohesion, personal pronouns and connectives as indicators of cohesion, while LSA and the situation model are treated as proxies for coherence. Referential cohesion reflects the degree of overlap among nouns, pronouns and other content words that create continuity across sentences. Coh-Metrix calculates five types of coreference: noun overlap (CRFNO1 and CRFNOa), argument overlap (CRFAO1 and CRFAOa), stem overlap (CRFSO1 and CRFSOa), content word overlap (CRFWO1 and CRFWOa) and anaphor overlap (CRFANO1 and CRFANOa). The personal pronoun category includes normalised frequencies (per 1000 words) of first-person singular (WRDPRP1s), first-person plural (WRDPRP1p), second-person (WRDPRP2), third-person singular (WRDPRP3s) and third-person plural (WRDPRP3p) pronouns, which reflect inter-personal cohesion in discourse. Connectives, as cohesive ties between clauses or sentences, are quantified by type (per 1000 words), including causal (CNCCaus), logical (CNCLogic), adversative/contrastive (CNCADC), temporal (CNCTempx), additive (CNCAdd), positive (CNCPos) and negative (CNCNeg). Semantic similarity, measured via LSA, focuses on “semantic overlap between explicit words and words that are implicitly similar or related in meaning” (McNamara et al., 2014, p. 66). Coh-Metrix measures this component at the level of adjacent sentences (LSASS1) and paragraphs (LSASSp). A further measure of how much new versus given information each sentence carries (LSAGN) is also provided. According to McNamara et al. (2014, p. 66), sentence content is partitioned as given, partially given (based on various types of inferential availability) or new, and LSAGN serves as a proxy for how much given versus new information each sentence in a text contains, compared with the content of the prior text. Finally, the situation model represents deeper, coherence-related cognitive structures that track causality, intentionality and temporal continuity. The relevant indices include causal verbs (SMCAUSv), causal verbs and particles (SMCAUSvp), intentional verbs (SMINTEp), the ratio of causal particles to verbs (SMCAUSr), the ratio of intentional particles to verbs (SMINTEr), LSA verb overlap (SMCAUSlsa), WordNet verb overlap (SMCAUSwn) and tense and aspect repetition (SMTEMP).

Descriptive statistics of these indices across registers and translation varieties are presented in Tables 3 and 4, respectively. Because the Coh-Metrix indices differ in scale and units, we standardised all variables into z-scores prior to conducting multivariate analyses. This step ensured comparability across metrics and allowed for the identification of relative patterns in cohesion and coherence features across text types.

Table 3 Summary of descriptive statistics for cohesion and coherence measures across different registers (mean ± SD).
Table 4 Summary of descriptive statistics for cohesion and coherence measures across different translation varieties (mean ± SD).
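For transparency, the standardisation step can be sketched in R as follows; the file name and column layout are assumptions (one row per text, grouping variables named register and variety, and the Coh-Metrix indices in the remaining numeric columns).

    # Hedged sketch of the standardisation step under the assumptions above.
    data <- read.csv("cohmetrix_indices.csv")
    meta <- data[, c("register", "variety")]   # grouping variables
    indices <- data[, setdiff(names(data), c("register", "variety"))]

    z <- scale(indices)                        # centre and scale each index to z-scores
    round(colMeans(z), 10)                     # sanity check: column means ~ 0
    round(apply(z, 2, sd), 10)                 # sanity check: column sds = 1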

Data analysis

The present study triangulates exploratory and confirmatory analyses by combining PCA, an unsupervised machine learning technique, with FDA, a supervised one. Triangulation of this kind has been widely accepted and adopted in corpus-based translation studies. For example, Evert and Neumann (2017) employed PCA and linear discriminant analysis (LDA) to investigate the shining-through effect (Teich, 2003) in German-English translations. De Sutter et al. (2012) and Delaere and De Sutter (2017) combined profile-based correspondence analysis with logistic regression to investigate the influence of source languages and registers on onomasiological variants in Dutch translations. These studies serve as important methodological precedents and reinforce the validity of the combined approach adopted here.

In the current study, PCA was first employed to explore the underlying structure of the dataset without imposing any assumptions related to the grouping variables. As an unsupervised technique, PCA enables the visualisation of potential clustering among text samples—potentially reflecting register differences or translation varieties—based solely on their linguistic features. To evaluate whether the observed group separations were statistically significant, we subsequently conducted a PERMANOVA (Anderson, 2017), which provides a robust, non-parametric assessment of group differences based on distance matrices.
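A minimal sketch of this exploratory step, reusing the objects z and meta assumed in the previous sketch, might look as follows; the permutation count and seed are illustrative choices.

    # PCA on the z-scored indices, then PERMANOVA with vegan::adonis2()
    # on Euclidean distances.
    library(vegan)

    pca <- prcomp(z)                           # variables already standardised
    summary(pca)$importance[2, 1:3]            # proportion of variance, dimensions 1-3

    set.seed(42)                               # permutation test; fix seed for reproducibility
    adonis2(dist(z, method = "euclidean") ~ register * variety,
            data = meta, permutations = 999)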

While PCA offers an exploratory overview, FDA was subsequently implemented to confirm and quantify the discriminative power of linguistic features in classifying texts according to register and translation variety. This method was chosen over LDA due to the violation of multivariate normality, as indicated by the Henze-Zirkler test (HZ = 1.01, p < 0.001). Unlike LDA, which assumes a multivariate normal distribution, FDA applies non-parametric regression techniques, making it more suitable for the distributional properties of our dataset (Hastie et al., 1994; Mallet et al., 1996). Importantly, FDA not only classifies observations into predefined groups but also identifies the relative contribution of each predictor (i.e., linguistic feature) to the classification process. This enables the analysis to move beyond general group differences and pinpoint which aspects of cohesion and coherence most effectively differentiate registers and translation varieties. Through this, we are able to uncover meaningful patterns and provide interpretive depth to the observed variation. All analyses were performed in R version 4.2 (R Core Team, 2023).
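Under the same assumptions, the normality check and the FDA step could be sketched as below; the 70/30 split mirrors the proportions reported in sections “Register divergence” and “Translation variety”, while the object names remain illustrative.

    # Henze-Zirkler test via the MVN package, then FDA from the mda package
    # on a 70/30 train-test split.
    library(MVN)
    library(mda)

    mvn(indices, mvnTest = "hz")$multivariateNormality

    set.seed(42)
    train_idx <- sample(nrow(z), size = round(0.7 * nrow(z)))
    train <- data.frame(register = factor(meta$register[train_idx]), z[train_idx, ])
    test  <- data.frame(register = factor(meta$register[-train_idx]), z[-train_idx, ])

    fda_fit <- fda(register ~ ., data = train)           # flexible discriminant analysis
    mean(predict(fda_fit, test[, -1]) == test$register)  # hold-out accuracy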

Results and discussion

This section encompasses the presentation of the research findings and discussion of the results. In the section “General results”, a general analysis based on PCA and PERMANOVA was conducted to identify which factor exerts a greater influence. Subsequently, confirmatory analyses of register variation, translation variety and their interactions were performed in sections “Register divergence”, “Translation variety” and “Interactions of register divergence and translation variety”, respectively. Section “Discussion” expounds upon these findings, elucidating how they contribute to addressing the three research questions.

General results

This section presents the overall results of the PCA, followed by cross-validation using PERMANOVA to assess the statistical significance of observed differences. Figure 1 displays scatterplots for the three dimensions generated by PCA, with ellipses representing 95% confidence intervals and density curves plotted along each axis to visualise distributional tendencies. As PCA is an unsupervised technique that does not incorporate categorical variables, the dimensions extracted may reflect variation attributable to registers, translation varieties or other latent factors. To facilitate interpretation, we overlay labels for both register and translation variety on the PCA plots. Figure 1a presents the biplot of the first (x-axis) and second (y-axis) dimensions, which account for 34.9% and 12.2% of the total variance, respectively. The horizontal axis (dimension 1) appears to primarily reflect variation in register. Specifically, academic and fictional texts are situated at opposite ends of the axis: academic texts toward the left and fictional texts toward the right, while general and journalistic texts occupy intermediate positions. This suggests that dimension 1 captures a continuum of register-based discourse variation. By contrast, dimension 2 (shown on the vertical axis of Fig. 1a and the horizontal axis of Fig. 1b) does not reveal a clear pattern associated with register. However, dimension 3 (on the vertical axis of Fig. 1b and the horizontal axis of Fig. 1c), accounting for 9.3% of the variance, appears to differentiate literary from non-literary texts. Fictional texts cluster in the positive coordinate space, whereas journalistic, general and academic texts are predominantly located in the negative range.

Fig. 1: Biplots of PCA dimensions with overlaid density curves.

Panels (a–c) illustrate the intersections of the first three PCA dimensions, grouped by register, while panels (d–f) present the same intersections grouped by translation variety. Density curves indicate the distributional tendencies within each grouping.

Turning to translation variety, Fig. 1d (dimensions 1 and 2) does not show a strong separation between translated and non-translated texts. However, clearer distinctions emerge in Fig. 1e, f, where dimensions 2 and 3 are plotted along the horizontal axis. Along dimension 2, original English texts are concentrated between 0 and +5, while translated texts, both human and machine (DeepL and Google), are mainly positioned between −5 and 0. In Fig. 1f, the trend reverses along dimension 3, further suggesting that dimensions 2 and 3 jointly capture variation attributable to translation variety. Notably, within the translated group, there is substantial overlap between human translation and machine translations from DeepL and Google, while all translated varieties are slightly separated from non-translated texts. This suggests that dimension 3 may reflect a convergence of register- and translation-based variation. Although translation variety does influence cohesion and coherence patterns, it does so to a lesser extent than register, as indicated by the larger share of variance captured by dimension 1 compared with dimensions 2 and 3.

This interpretation is reinforced by the results of the PERMANOVA analysis, summarised in Table 5. Euclidean distance was used to measure dissimilarity among samples, and the Bonferroni method was applied to adjust p-values. We tested the main effects of register and translation variety, as well as their interaction. All effects were found to be statistically significant. However, their relative contributions to variance differ markedly: register explains 18% of the variance, translation variety accounts for 7% and their interaction contributes only 2%. These findings align with the PCA results and substantiate the conclusion that register variation is the dominant factor shaping cohesive and coherent patterns in the texts analysed. In contrast, translation variety, though significant, exerts a comparatively smaller effect. The minimal interaction effect further suggests that register and translation variety influence discourse features in largely additive rather than synergistic ways.

Table 5 Results of PERMANOVA.

Register divergence

To investigate how cohesion and coherence vary across registers and to uncover the characteristics of each register, we applied FDA and a post-hoc PERMANOVA test. This dual approach serves both to confirm whether meaningful variation exists across registers and to identify the specific linguistic features that contribute to such variation. Key discriminating variables were identified by examining their weights on the most informative FDA dimensions. In addition, Kruskal–Wallis tests and Dunn’s post-hoc comparisons were used to determine how these variables differ across the four registers.
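A minimal sketch of this univariate follow-up for a single index, again reusing the assumed objects indices and meta from the earlier sketches, is given below; the column name CRFSO1 follows the paper’s Coh-Metrix labels.

    # Kruskal-Wallis test plus Dunn's post-hoc comparisons via FSA::dunnTest().
    library(FSA)

    dat <- data.frame(CRFSO1 = indices$CRFSO1, register = factor(meta$register))

    kruskal.test(CRFSO1 ~ register, data = dat)
    dunnTest(CRFSO1 ~ register, data = dat, method = "bonferroni")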

Figure 2 presents the biplots of the three discriminants extracted through FDA. The model’s classification performance is acceptable, with an accuracy rate of 72% for the training data (70% of the total dataset) and 70% for the testing data (30%). This performance suggests that registers exhibit distinctive cohesive and coherent patterns that are reliably separable by statistical modelling.

Fig. 2: Biplot of FDA dimensions for registers with overlaid density curves.

Panel (a) illustrates the combination of dimensions 1 and 2, panel (b) shows the combination of dimensions 2 and 3, and panel (c) presents the combination of dimensions 3 and 1.

In Fig. 2a, the x-axis represents discriminant 1 (66.48% of the variance), and the y-axis represents discriminant 2 (23.39%). The density curves along each axis further illustrate distributional patterns. The four registers are clearly separated in this plot, closely mirroring the PCA results. Specifically, fictional texts cluster between −2 and 0 on the horizontal axis, academic texts are situated on the far right, and journalistic and general texts fall between them. Figure 2b shows that discriminant 2 distinguishes general texts from the others, with journalistic texts occupying a middle position. In Fig. 2c, discriminant 3 (10.14%) appears to differentiate news texts from the remaining registers. Overall, the dominant role of discriminant 1, which captures over half of the total variance, highlights the central importance of register in shaping cohesive and coherent patterns.

These differences are statistically validated by pairwise PERMANOVA results in Table 6, which confirm significant variation across all four registers. The F-values reflect the magnitude of dissimilarity between register pairs, with higher values indicating more substantial differences. For example, academic texts differ most strongly from fictional texts (F = 382.35, df = 1, p < 0.001). The difference between journalistic and fictional texts is also pronounced (F = 194.11, df = 1, p < 0.001). Even between journalistic and general texts, where visual overlap is observed in the central PCA and FDA plots, a significant difference is present (F = 35.63, df = 1, p < 0.001), confirming the presence of nuanced but meaningful register-specific variation.

Table 6 Pairwise comparison of registers.
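The pairwise comparisons reported in Table 6 can be approximated with a simple loop over register pairs, sketched below under the same assumptions as the earlier sketches; vegan has no built-in pairwise PERMANOVA, so each pair is tested separately and the p-values are Bonferroni-adjusted.

    # PERMANOVA for each pair of registers, with Bonferroni adjustment.
    library(vegan)

    set.seed(42)
    regs <- as.character(unique(meta$register))
    pairs <- combn(regs, 2, simplify = FALSE)

    pvals <- sapply(pairs, function(p) {
      keep <- meta$register %in% p
      fit <- adonis2(dist(z[keep, ]) ~ register, data = meta[keep, ])
      fit$`Pr(>F)`[1]
    })
    names(pvals) <- sapply(pairs, paste, collapse = " vs ")
    p.adjust(pvals, method = "bonferroni")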

To identify the linguistic features responsible for these differences, we examined the variable weights on the first FDA dimension. Figure 3 displays the five most important contributors: lemma overlap between two adjacent sentences (CRFSO1), third-person plural pronouns (WRDPRP3p), logical connectives (CNCLogic), the proportion of given versus new information measured by LSA (LSAGN) and the ratio of intentional particles to intentional verbs (SMINTEr). Among these, features related to coreference and personal pronouns appear to play a particularly central role in register differentiation, suggesting that referential cohesion is a key driver of register variation.

Fig. 3: Variable weights contributing to the FDA discriminant 1.

To explore how these variables behave across registers, we conducted Kruskal–Wallis tests, followed by Dunn’s post-hoc comparisons. The results, presented in Fig. 4, indicate significant register-based differences in all five variables. The χ2 values represent the test statistic used to compare distributions across groups, with higher values indicating greater divergence between groups. In Fig. 4a, academic texts show the highest levels of cohesion as measured by lemma overlap, significantly exceeding all other registers (χ2 = 871.61, df = 3, p < 0.001), indicating a stronger reliance on lexical repetition for cohesion. Interestingly, Fig. 4b reveals that fictional texts are characterised by a significantly higher frequency of third-person plural pronouns, whereas academic texts show the lowest usage (χ2 = 847.37, df = 3, p < 0.001), consistent with narrative storytelling versus impersonal academic discourse (Biber, 1988). In Fig. 4c, the use of logical connectives is again most prominent in academic texts (χ2 = 72.46, df = 3, p < 0.001), aligning with their expository function and argumentative structure. Figure 4d shows that academic texts also demonstrate the highest semantic similarity (LSAGN) among the four registers, while fictional texts display the lowest levels (χ2 = 422.15, df = 3, p < 0.001), reflecting their preference for creative linguistic variation over semantic repetition. With regard to intentionality markers (Fig. 4e), academic texts also lead, showing the highest intentional coherence (χ2 = 631.07, df = 3, p < 0.001), reinforcing their deliberate rhetorical organisation.

Fig. 4: Results of Kruskal–Wallis tests on five key variables for registers in discriminant 1.

Panels (a–e) display the associations between register variation and five representative variables: CRFSO1, WRDPRP3p, CNCLogic, LSAGN and SMINTEr, respectively. Asterisks indicate statistically significant differences between pairs, while “ns” denotes non-significant differences (*p < 0.05, **p < 0.01, ***p < 0.001).

In summary, academic texts are marked by high levels of referential cohesion, semantic similarity, logicality and intentionality, reflecting their formal and structured communicative purpose. In contrast, fictional texts tend to ensure narrative fluidity and character interaction through the use of third-person plural pronouns. Journalistic and general texts display more balanced cohesion and coherence profiles. In some features, such as lemma overlap between sentences and third-person plural pronouns, their patterns are statistically indistinguishable, indicating a degree of stylistic convergence between these two registers.

Translation variety

Following the same procedure outlined in section “Register divergence”, another FDA was conducted to examine the differences among translation varieties. The model yielded an accuracy rate of 77% on the training dataset and 75% on the testing dataset, indicating a generally reliable classification performance. Figure 5 presents the biplots of the three discriminant dimensions generated by the second FDA model. In Fig. 5a, the x-axis represents the first discriminant dimension (56.04% of the variance), while the y-axis corresponds to the second (36.06%). Human-translated texts primarily fall within the positive range of the x-axis, whereas the other varieties cluster on the negative side, suggesting a clear distinction between human translations and the other three varieties along this dimension. In Fig. 5b, the horizontal axis (dimension 2) differentiates Google translations from the remaining varieties, with non-translations and DeepL translations occupying a central position. Figure 5c plots the third discriminant (7.9%) along the x-axis and the first dimension along the y-axis. While dimension 3 captures a relatively smaller portion of the variance, it visibly separates translated from non-translated texts based on the density curves, echoing the findings from the PERMANOVA and PCA analyses. Specifically, non-translated texts tend to cluster between 0 and +2, while translated texts span from −2 to 1 on the third dimension. However, it should be noted that despite the overall high accuracy of the model, the dimension capturing translations versus non-translations does not account for a very large proportion of variance. This is because the FDA model captures multidimensional distinctions among the four translation varieties.

Fig. 5: Biplot of FDA dimensions for translation varieties with overlaid density curves.

Panel (a) illustrates the combination of dimensions 1 and 2, panel (b) shows the combination of dimensions 2 and 3, and panel (c) presents the combination of dimensions 3 and 1.

To statistically validate these observations, a pairwise PERMANOVA was conducted. The results, presented in Table 7, indicate significant differences among the four translation varieties. Much larger disparities are observed in comparisons involving non-translated texts: Original English vs. DeepL Translation (F = 73.95, df = 1, p < 0.001), Original English vs. Google Translation (F = 72.32, df = 1, p < 0.001) and Human Translation vs. Original English (F = 64.97, df = 1, p < 0.01). Interestingly, although the two machine-translated varieties, DeepL and Google translations, also differ significantly from each other (F = 9.12, df = 1, p < 0.001), the magnitude of difference is relatively small since they share the same source text. These results suggest that machine translations exhibit a distinct cohesive and coherent profile, which is neither fully aligned with human translations nor with original texts.

Table 7 Pairwise comparison of translation varieties.

To further interpret these differences, we examined the contribution of individual variables to the third discriminant dimension (Fig. 6). Although discriminant 3 explains only 7.9% of the variance, we chose this dimension for further analysis because the primary goal of this study is not to maximise classification accuracy between translation varieties, but rather to investigate general tendencies of cohesion and coherence in translated texts. This approach follows a precedent set by Evert and Neumann (2017), who highlighted the value of interpreting lower-variance dimensions when they reveal linguistically meaningful patterns. Nonetheless, we acknowledge this limitation in the discussion and urge cautious interpretation.

Fig. 6: Variable weights contributing to the FDA discriminant 3.

Variables that contribute most to distinguishing translated from non-translated texts are primarily related to referential cohesion and semantic similarity (as measured by LSA), with lesser contributions from the situation model, connectives and personal pronouns. As in section “Register divergence”, we focus on the most influential variable in each component for interpretive analysis. These are: the average number of shared lemmas between sentences (CRFSOa) for referential cohesion, first-person plural pronouns (WRDPRP1p) for personal pronouns, the frequency of negative connectives (CNCNeg) for connectives, semantic similarity between adjacent sentences (LSASS1) for LSA and the frequency of causal verbs indicating changes of state (SMCAUSv) for the situation model.

Figure 7 presents the results of Kruskal–Wallis tests on these five key variables, followed by Dunn’s test for pairwise comparisons. Overall, translated and non-translated texts exhibit systematically different patterns of cohesion and coherence. Specifically, Fig. 7a shows significant differences across the four varieties in lemma overlap between all sentences of a text (χ2 = 55.79, df = 3, p < 0.001). Human-translated texts exhibit more referential cohesion than non-translated texts, but both types of machine translation demonstrate even higher levels of stem overlap. In contrast, Fig. 7b shows no statistically significant differences in the use of first-person plural pronouns across the four varieties (χ2 = 7.27, df = 3, p = 0.064), indicating that this particular aspect of cohesion may be less sensitive to translation variety. Figure 7c shows that negative connectives are used more frequently in original texts than in translations (χ2 = 214.38, df = 3, p < 0.001), suggesting that non-translations may present more contrastive or argumentative discourse relations. Similarly, Fig. 7d highlights a significant difference in semantic similarity (χ2 = 62.01, df = 3, p < 0.001), with both human and machine translations showing greater local coherence than original texts. For verb-based cohesion (Fig. 7e), human translations employ significantly more causal verbs than both non-translations and machine translations (χ2 = 172.83, df = 3, p < 0.001), reinforcing their tendency toward greater explicit causality. This finding, however, contrasts with Jiang and Niu (2022), who reported that human translations tend to show higher semantic similarity (LSASS1) but lower usage of causal verbs (SMCAUSv) than original texts.

Fig. 7: Results of Kruskal–Wallis tests on five key variables for translation varieties in discriminant 3.

Panels (a–e) display the associations between translation variety and five representative variables: CRFSOa, WRDPRP1p, CNCNeg, LSASS1 and SMCAUSv, respectively. Asterisks indicate statistically significant differences between pairs, while “ns” denotes non-significant differences (*p < 0.05, **p < 0.01, ***p < 0.001).

In summary, translated texts by both humans and machines tend to exhibit enhanced cohesion relative to non-translated texts. However, the increased explicitness is not uniform across all cohesive and coherent metrics, and machine translations tend to overrepresent certain cohesive features. These results underscore the complex, multidimensional nature of translation-induced variation in textual cohesion and coherence.

Interactions of register divergence and translation variety

Sections “Register divergence” and “Translation variety” examined the main effects of register and translation variety. However, it remains unclear whether the linguistic characteristics exhibited by translated texts are consistently distributed across different registers. This section aims to address this gap by investigating whether the phenomenon of explicitation can be observed consistently across the four specific registers in both human and machine translations. To this end, a register-specific analysis was conducted using Kruskal–Wallis tests followed by Dunn’s post hoc comparisons. The results are summarised in Table 8. In the table, the symbols ‘>’ and ‘<’ denote whether the mean rank of the first translation variety is higher or lower than that of the second. An asterisk ‘*’ indicates a statistically significant difference, while ‘ns’ refers to non-significant results.

Table 8 Differences between translation varieties in key variables across registers.
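The register-specific procedure can be sketched as a loop over registers, shown below under the same assumptions as the earlier sketches, here using SMCAUSv as the example index.

    # Within each register, compare the four translation varieties on one
    # index with Kruskal-Wallis plus Dunn's test.
    library(FSA)

    dat <- data.frame(SMCAUSv = indices$SMCAUSv,
                      register = factor(meta$register),
                      variety  = factor(meta$variety))

    for (reg in levels(dat$register)) {
      sub <- droplevels(dat[dat$register == reg, ])
      cat("\n==", reg, "==\n")
      print(kruskal.test(SMCAUSv ~ variety, data = sub))
      print(dunnTest(SMCAUSv ~ variety, data = sub, method = "bonferroni"))
    }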

A closer analysis of journalistic texts reveals that, for three out of five variables, translated texts exhibit higher values than non-translated texts, indicating a tendency toward explicitation. However, negative connectives continue to be used significantly more frequently in original texts. In contrast, the pattern in general texts is somewhat more nuanced. Human and machine translations tend to employ first-person plural pronouns, causal verbs and semantic overlaps more frequently than non-translations, though not all of these differences are statistically significant. Interestingly, while overlaps of lemmas between sentences occur more often in non-translated texts than in human translations, machine translations do not differ significantly from originals in this regard, suggesting a closer alignment with native patterns in this dimension. A different pattern is also observed in academic texts. Human translations display explicitation primarily through increased use of causal verbs. Machine translations, by contrast, show a greater tendency toward increased cohesion and coherence in terms of semantic similarity and lemma overlaps, though the latter is not statistically significant. In fictional texts, a clearer tendency toward explicitation is observed in both referential cohesion and LSA measures. However, causal verbs, which often serve to clarify logical connections, are not consistently more frequent in human translations in this register, indicating that explicitation may operate selectively depending on discourse function and narrative style. Conversely, negative connectives remain consistently more frequent in non-translated texts across all registers, challenging the idea that original texts tend to favour implicit cohesion strategies. These results highlight the genre-specific effect on the tendency towards explicitation.

In summary, while explicitation is a prominent feature of translated texts, it is not uniformly distributed across registers. Instead, its presence appears to be register-sensitive and context-dependent. This conditional distribution suggests that explicitation should not be regarded as a universally consistent feature of translation, but rather as one that interacts dynamically with genre conventions and translation varieties.

Discussion

The aforementioned analysis suggests that the observed discrepancies in cohesion and coherence are predominantly attributed to register variation, while translation varieties and their interaction exert a comparatively minor and marginal influence. Specifically, substantial differences in cohesion and coherence are found across the four registers examined, with particularly marked contrasts between academic and fictional texts. Furthermore, both human and machine translations exhibit a general tendency toward explicitation, with machine translations displaying an even stronger inclination toward increased cohesion and coherence in certain metrics. However, this tendency is not uniformly evident across all registers or variables, but is instead context-sensitive. The following discussion interprets these findings in relation to the study’s three research questions.

The first research question addressed how translated texts, both human and machine, differ from non-translated texts in terms of cohesion and coherence. The analysis reveals a clear overall tendency toward explicitation in both translation varieties, supporting prior findings and theoretical expectations. In particular, machine-translated texts show significantly higher levels of referential cohesion and semantic similarity, which may reflect the influence of algorithmic bias—that is, the tendency of machine learning models to favour more frequent or prototypical linguistic choices from their training data (De Clercq et al., 2021; Jiang and Niu, 2022; Luo and Li, 2022; Vanmassenhove et al., 2019, 2021). This tendency can be understood as a form of amplification of translational features, whereby machine translation exaggerates norms or tendencies already present in human translation. However, explicitation in machine translation cannot be attributed solely to algorithmic bias. Other factors, such as training data, model architecture and algorithmic design choices, also play an important role (Jiang and Niu, 2022). For instance, Google Translate is typically based on recurrent neural networks (RNNs) and trained on broad digital resources, while DeepL relies on convolutional neural networks (CNNs) and draws on the Linguee bilingual corpus (Mouratidis et al., 2021; Ziganshina et al., 2021). Although both systems show explicitation tendencies, differences between them are also apparent: for example, Google Translate demonstrates a higher frequency of verb cohesion to express causality, whereas DeepL is more conservative in this regard.

The second research question asked whether these translational characteristics are consistently observable across different registers. While explicitation is not uniformly present across all variables or registers, a general trend toward increased cohesion and coherence is still discernible in both human and machine-translated texts. This trend, particularly in human translation, can be partly interpreted through a risk-averse lens. According to Pym (2015, 2020), human translators often adopt risk-reduction strategies, increasing textual cohesion to enhance comprehension and provide clearer communicative cues for the reader. This strategic choice is especially relevant when translators operate under pressure to produce accurate and culturally acceptable outputs. Even in fiction, a register typically characterised by lower cohesion, translated texts often exhibit higher cohesion and coherence scores than their original English counterparts. Nevertheless, there are also exceptions where human translators appear to engage in risk-taking behaviour, such as the reduced sentence-to-sentence lemma overlap observed in academic and general texts. As Pym (2015, 2020) suggests, the risks translators seek to mitigate are not confined to the text-translator relationship, but also include potential consequences involving readers, clients and other institutional stakeholders. Hu (2020), Sela-Sheffy (2005) and Robinson (2020) describe how translators validate and internalise norms through long-term interaction between their experience and the environment in which they are embedded, showing that translators may face sanctions, negative outcomes or penalties if they deviate from established norms. Hence, risk-taking behaviours tend to be avoided, and risk-averse options favoured, as this experience becomes entrenched. Nonetheless, the presence of implicitness in certain components of the translated texts also likely reflects source-language interference, or the shining-through effect (Teich, 2003), particularly given the tendency for Chinese source texts to exhibit implicit cohesion. This finding aligns with those of Xiao and Hu (2015) and Zhang et al. (2020).

The third research question examined whether register or translation variety has a greater impact on patterns of cohesion and coherence. The triangulated results from PCA, PERMANOVA and FDA confirm that register exerts a more substantial influence than translation variety. This finding aligns with prior studies that emphasise the dominant role of register in shaping textual features, including those in translated texts (Diwersy et al., 2014; Kruger and Van Rooy, 2018; Neumann, 2014). Additionally, the interaction between translation variety and register underscores the contextual dependence of translational tendencies. As Delaere and De Sutter (2017, p. 106) argue, the general characteristics associated with translated language should not be treated as universal or homogeneous; rather, they are modulated by register-specific constraints and communicative goals. Therefore, any investigation into the linguistic features of translated texts must account for the interplay between translation and register, rather than isolating translation effects in a vacuum.
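For readers who wish to see the shape of this triangulation, a hedged sketch follows: PCA for exploratory structure, PERMANOVA as a global permutation test of group separation, and Fisher discriminant analysis (FDA) for confirmatory classification. The feature matrix, labels and library choices (scikit-learn, scikit-bio) are illustrative assumptions and do not reproduce the study's actual data or tooling.

```python
# Sketch of the triangulated design on synthetic data: PCA (exploration),
# PERMANOVA (global permutation test) and Fisher discriminant analysis
# (confirmatory classification). All data here are randomly generated
# placeholders standing in for texts x Coh-Metrix indices.
import numpy as np
from sklearn.preprocessing import StandardScaler
from sklearn.decomposition import PCA
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.model_selection import cross_val_score
from scipy.spatial.distance import pdist, squareform
from skbio import DistanceMatrix
from skbio.stats.distance import permanova

rng = np.random.default_rng(0)
X = rng.normal(size=(120, 5))                  # 120 texts x 5 cohesion indices
register = rng.choice(["acad", "fict", "news", "gen"], size=120)

Xz = StandardScaler().fit_transform(X)

# 1. PCA: how much variance do the leading components capture?
pca = PCA(n_components=2).fit(Xz)
print("explained variance:", pca.explained_variance_ratio_)

# 2. PERMANOVA: permutation test on Euclidean distances between texts.
dm = DistanceMatrix(squareform(pdist(Xz)), ids=[str(i) for i in range(120)])
print(permanova(dm, grouping=register, permutations=999))

# 3. FDA: can register labels be recovered from the cohesion features?
lda = LinearDiscriminantAnalysis()
print("CV accuracy:", cross_val_score(lda, Xz, register, cv=5).mean())
```

In this design, a larger PERMANOVA effect and higher discriminant accuracy for register than for translation variety would support the conclusion that register is the dominant source of variation.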

In summary, this study demonstrates that while both human and machine translations exhibit a tendency toward explicitation, this is neither uniform across all linguistic variables nor independent of contextual factors. Register emerges as a more powerful contributor to variation in cohesion and coherence than translation variety, and translational choices are shaped by a complex constellation of cognitive, social and technological influences. These findings call for a context-sensitive and multifactorial approach to the study of translated language, particularly as machine translation continues to evolve and interact with human translation practices.

Conclusion

The current study investigated the cohesion and coherence features of translated texts, incorporating translation variety and register variation as explanatory factors. The data were extracted from five components of Coh-Metrix, namely referential cohesion, personal pronouns, connectives, LSA and situation model, and the results were based on the triangulation of exploratory and confirmatory multivariate techniques. Three main findings were reported: first, translated texts display a tendency towards explicitation at the general level, and human and machine-translated texts are each characterised by unique patterns in cohesion and coherence; second, these patterns are not consistently distributed across registers and are largely context-dependent; third, among the influencing factors, register variation exerts a greater impact on cohesion and coherence than translation variety.

This work is significant for its theoretical, methodological and practical implications. Theoretically, it underscores the importance of examining universal tendencies in translation, particularly explicitation, within their broader technological and functional contexts. Methodologically, the study adopts a triangulated analytical approach, combining PCA, PERMANOVA and FDA, to provide a robust and multidimensional picture of the cohesion and coherence features in translated versus non-translated texts. Practically, the findings call for a more nuanced consideration of machine translation in translation practices, especially regarding how its output diverges from that of human translators in terms of textual cohesion and coherence. These insights could inform future strategies for improving neural machine translation systems, post-editing processes and translator training.

Nevertheless, several issues remain worthy of further exploration. The current study focused on English translations from Chinese, where a degree of implicitation in connectives was observed, likely due to the shining-through effect, as Chinese tends to favour implicit cohesion. Future studies should investigate other language pairs to assess how source-language typology influences cohesive patterns in translation. Moreover, additional factors such as translator experience, stage of translation (draft vs. final version), intended target audience, directionality (L1–L2 vs. L2–L1 translation) and language status (e.g., dominant vs. minority languages) could provide a more comprehensive understanding of the variation in translated texts. Finally, our analysis also suggests instances of risk-taking behaviour among translators in certain registers, despite the general tendency toward risk aversion. This highlights the importance of further investigating the socio-cognitive and contextual factors that influence when and why translators might adopt risk-taking strategies, moving beyond norms and constraints to consider agency, motivation and communicative intent.