Introduction

The strengthening of global connections has led to a growing need for fast, accurate, and culturally appropriate translation, which in turn has significantly increased the demand for qualified translators and interpreters (Bai et al. 2018). As of 2024, approximately 300 higher education institutions in China had been approved by the Ministry of Education to offer undergraduate programs in translation. Assessing the translation ability of these students is an important component of the teaching quality assurance system (Lvy 2018).

A translation test typically refers to an assessment designed to evaluate an individual's ability to translate text from one language to another, serving as an indispensable link in teaching practice (Zhang 2020). Such assessments are crucial because well-structured evaluations with reasonable content are vital for enhancing translation competence and quality. In addition, a translation test can be viewed as a pedagogical activity. It is instructive for teaching because it gathers information about learners to detect individual strengths and weaknesses, thereby guiding them in the expected direction (Dickson et al. 2020).

However, translation testing has long been neglected in China due to the lack of systematic theoretical guidance and effective research methods (Zheng and Mu 2007). Currently, the majority of translation tests in China are components of broader foreign language proficiency tests (Lvy 2018). Apart from translating texts, language tests usually contain other parts such as reading, listening, and writing. The grades that students receive are therefore composite grades that combine many other skills. Moreover, even after receiving these grades, teachers still lack insight into students' specific translation ability, as most grades merely discriminate which students perform better than others (Lin et al. 2010). According to Lin et al. (2010), test grades should preferably supply teachers with analytical information about what learners can or cannot do and provide suggestions on what students need to improve.

Furthermore, there is a conspicuous lack of specialized tests designed explicitly for translation majors, especially within undergraduate translation education (Lvy 2018). The translation ability of undergraduates is often assessed through final exams and assignments, which lack standardization and fail to reflect true ability. Existing qualification exams partly assess translation skills, but their goals differ from those of teaching evaluation, offering limited feedback and guidance for students and teachers.

Due to the lack of an effective translation test to identify students' ability, translation teaching tends to be confined to the syllabus and textbook, without regard to students' actual situation (Wang 2020). However, in real teaching settings, nearly every class is a mixed-ability class, meaning that within a single classroom, students possess diverse backgrounds, varying levels of knowledge, distinct learning capacities, and unique characteristics (Tomlinson 2001; Dudley and Osváth 2016). Teachers should tailor their instruction to students' varying levels by supporting lower-proficiency learners and fostering growth for advanced students (Djurayeva 2021). Assessment is therefore crucial, as it enables students to evaluate and improve their performance and teachers to optimize teaching quality.

Despite scholars' efforts to research translation tests (Feng 2014; Chen 2010; Yang and Mu 2016), several challenges remain to be addressed to ensure that these tests are both effective and comprehensive. For example, the validity of translation tests is often questioned, highlighting the need to ensure that test scores are meaningful and results are applicable (Xi 2017). Additionally, current translation instruction often overlooks formative assessment, instead prioritizing summative assessment (Mu 2006). Furthermore, scoring methods tend to be subjective and rely on a single criterion (Wang and Wen 2009).

Therefore, this study intended to develop a translation test exclusively for translation undergraduates, with the aim of promoting assessment for learning. The development process systematically followed three phases: Analysis, Design and Development, and Evaluation, during which the test underwent two iterations to ensure its validity and reliability. It provides specialized scores that identify the translation ability of undergraduates majoring in translation, rather than composite grades, enabling teachers to implement targeted instruction and maximize each student's potential.

Objectives of the study

This study mainly answers the following research questions:

  1. RQ1: What are the key components of the translation test?

  2. RQ2: How would the translation test be designed and developed?

  3. RQ3: How would the effectiveness of the translation test be evaluated?

Literature review

This section reviews previous studies and relevant theories that form the foundation for the development of the translation test.

Previous related studies on translation test

Translation testing has been examined from various research perspectives. In terms of test development, some scholars have put forward research frameworks to guide the process. For instance, Xi (2017) provided a detailed description of the development process of a translation test, which included three stages: preparation, implementation, and follow-up. Colina (2003) listed the essential tasks that must be completed prior to developing a translation test, including defining the test purpose, describing translation ability, and establishing acceptability, comparability, clear scoring criteria, and effectiveness.

Some scholars have carried out empirical research on developing translation tests. Lvy (2018) designed a translation test to serve as a summative assessment for graduates specializing in translation, providing a final grade. Based on a translation corpus, Chen (2011) conducted the initial development of a translation test and formulated its scoring scales. Meanwhile, Brunette (2002) relied on both formative and summative assessment in her attempt to establish a terminology base for translation quality assessment. Other scholars have conducted theoretical and review-type studies on translation tests. However, most current research on translation qualification exams employs comparative analysis, contrasting domestic exams with those from other countries (Feng 2014).

The validity of translation tests has also been a key focus of research. For instance, Xi (2017) elaborated on the requirements proposed by existing validity theories for translation tests, while Yin (2017) examined the validity and reliability of test item construction, particularly focusing on how the selection of source texts affects test validity. Based on the findings, several suggestions were proposed to enhance validity through more effective text selection.

In addition, the types of translation tests have been discussed. Goff-Kfouri (2004) mentioned four different types of tests, namely placement tests, diagnostic tests, progress tests, and academic tests. Furthermore, translation tests can be classified into two major categories based on their purposes: certification tests and teaching tests (Campbell et al. 2003).

Nida’s equivalence theory (2003)

The theoretical foundation for the translation test in this study is grounded in Nida's Equivalence Theory. According to Nida and Taber (2003), translation equivalence can be categorized into two main types: formal equivalence and functional (or dynamic) equivalence. Functional equivalence seeks to produce a similar effect on the target language audience as the original did on its audience, emphasizing naturalness and clarity in the translated text. Nida further classified translation equivalence into five levels (word class, grammatical category, semantic class, discourse type, and cultural context), offering a framework for analyzing the equivalence between source and target texts.

The translation test designed in this study includes six text types with a mix of Chinese-to-English and English-to-Chinese tasks, and it aligns closely with Nida's Equivalence Theory in three respects: first, it covers various text types to assess translation ability comprehensively; second, it reflects the five levels of equivalence through a gradual progression in text difficulty; and finally, it emphasizes the importance of balancing cultural nuances (Table 1).

Table 1 How translation test is related to equivalence theory (2003).

Halliday’s functional linguistic theory (1985)

Halliday (1985) proposed that language has three metafunctions: the conceptual (ideational) function, the interpersonal function, and the textual function. The conceptual function helps people express their cognition and experience of the world. The interpersonal function shows how language is used to express attitudes, build relationships, and set the tone in communication, while the textual function organizes language into clear and connected texts through themes, information flow, and linking words. According to functional linguistics, all linguistic activities are presented through text, and different text types invoke different language functions (see Table 2). Based on these linguistic functions, Knapp and Watkins (2005) divided texts into narrating, explaining, arguing, instructing, and communicative texts. Liu and Wu (2019) classified texts into six categories: descriptive, narrative, expository, argumentative, directive, and communicative. This classification covers the most commonly used text types and aligns with the framework outlined in China's Standards of English Language Ability, establishing the basis for designing the translation test in this study.

Table 2 How translation test is related to functional linguistics theory (1985).

Vygotsky’s theory (1979)

The Zone of Proximal Development (ZPD), proposed by Vygotsky, emphasizes that the potential range of learners should be accurately identified and that their ability and growth should be stimulated through dynamic support (Vygotsky and John-Steiner 1979). The ZPD concerns a person's potential to learn. In any given grade, while some students have already surpassed expected levels, others are still working to meet the basic requirements, highlighting the importance of identifying not only what learners can do currently, but also what they can achieve with appropriate support. In addition, the ZPD provides a theoretical basis for personalized teaching: teachers can design activities suited to each student's current stage of development based on their ZPD, ensuring that the teaching content is both challenging and within the students' ability range. Informed by Vygotsky's concept of the ZPD, the translation test design aligns with the following concepts (see Table 3).

Table 3 How translation test is related to ZPD (1979).

Methodology

Research design

This study adopted the Design and Development Research (DDR) approach in developing the translation test. DDR establishes a robust framework for developmental research, systematically studying design, development, and evaluation processes to create an empirical basis for instructional and non-instructional products and tools (Richey and Klein 2014). The DDR approach was chosen for its comprehensive, systematic, and iterative character. It enables a structured analysis of needs, the design and development of the test, and the evaluation of its effectiveness across three phases: Analysis, Design and Development, and Evaluation. DDR involves an iterative process in which the initial design is tested, evaluated, and refined based on feedback and performance data, allowing the researcher to adapt the translation test to students' needs and the requirements of translation teaching.

Participants

Stratified sampling is a widely used probability sampling method in survey research, which involves dividing a population into distinct subgroups or strata based on characteristics relevant to the research objectives. In this study, stratified sampling was used to ensure balanced representation across academic levels. Participants were grouped by academic year—Year 2, Year 3 and Year 4—to include students from each relevant category. The target population consisted of 453 translation undergraduates enrolled in their second, third, and fourth years at a public university. Two groups of samples participated in the experiment. In the first stage, a pilot sample comprising 182 translation undergraduates was selected based on a sample size formula. In the second stage, a main study sample of 209 students was chosen.

Research instrument

The researcher aimed to develop a translation test comprising six texts to identify the translation ability of undergraduate students majoring in translation. Texts from three national exams—NAETI-4 (National Accreditation Examinations for Translators and Interpreters-4), NAETI-3 (National Accreditation Examinations for Translators and Interpreters-3) and TEM-8 (Test for English Majors-8)—spanning several past years were utilized in the construction of the translation test. Two sets of the test (Test 1 & Test 2) were developed, each containing six texts for translation: three from English to Chinese and three from Chinese to English.

In addition, an interview protocol was developed to gather feedback from participants following the completion of the main study. Six structured questions were formulated to explore the challenges participants faced when translating different types of texts with varying levels of difficulty and how they addressed these challenges. Participants were required to provide examples for each question to ensure comprehensive feedback.

Test development process

The DDR approach, comprising the Analysis, Design and Development, and Evaluation phases shown in Fig. 1, was used to construct the translation test. In Phase 1, the Undergraduate Program for Translation Studies (UPTS) and three translation exams were carefully analyzed to identify essential components for developing the test. Phase 2 involved formulating the test's objectives, structure, content, and marking scheme. Subsequently, the translation test, translation descriptors, and interview protocols were constructed. At the same time, a pilot study was conducted to check the validity, reliability, and time needed to answer the translation test. In Phase 3, the main study evaluated the effectiveness of the test by analyzing participant performance and feedback. Throughout this process, the translation test underwent two iterations. The first iteration involved expert review in Phase 2 to ensure the content validity of the test. The second iteration was implemented after the main study, when the descriptors were refined based on students' interview feedback.

Fig. 1
figure 1

Development process.

Analysis

In the analysis phase, the researcher began by examining the Undergraduate Program for Translation Studies (UPTS). This document is a foundational guideline for undergraduate translation education, clearly defining the cultivation objectives, curriculum system, and graduation requirements for undergraduates majoring in translation (see Table 4). A translation test must meet the requirements of the UPTS, as it is an important means of measuring whether students have achieved the cultivation objectives, mastered the curriculum content, and met the graduation requirements. The researcher conducted a detailed analysis of the UPTS from three aspects; Table 4 illustrates how a translation test can be designed to meet these requirements. The three national exams (NAETI-4, NAETI-3, and TEM-8) were also analyzed to provide references for designing the content of the translation test. The insights gained from this analysis informed adjustments in the subsequent design and development phase.

Table 4 Requirements of UPTS.

Design

This phase focuses on establishing the objectives, structures, content and marking scheme of the translation test that can accurately assess translation undergraduates’ translation ability, as depicted in Table 5.

Table 5 Tasks and outcomes of the design phase.

Development

The development phase focused on developing the translation test, conducting a pilot study, and creating the translation descriptors and interview protocols.

Development of the translation test

In this study, the researcher sourced materials from three well-recognized translation exams: NAETI-4, NAETI-3, and TEM-8. These exams increase in difficulty, with TEM-8 being the most advanced and available only to fourth-year translation majors. Descriptive and expository texts were selected from NAETI-4, narrative and persuasive texts from NAETI-3, and argumentative and literary texts from TEM-8. Throughout the process, no modifications were made to these texts in terms of length, content, or structure, because maintaining the integrity of the original texts is crucial for guaranteeing the authenticity of the translation materials.

The prototype translation test consists of three parts, each aligning with the students’ proficiency levels. Part A (descriptive and expository texts) is for Year 2 students due to its lower difficulty. Part B (narrative and persuasive texts) suits Year 3 students with moderate difficulty. Part C (argumentative and literary texts) targets Year 4 students owing to its higher difficulty. Ebel and Frisbie (1972) suggested that test item difficulty should follow a normal distribution: about 50% medium, with fewer easy and difficult items. This avoids floor or ceiling effects and ensures fairness for most test-takers (Bachman and Palmer 1996). Based on this, the proportion of the test score is 25% easy, 50% medium, and 25% difficult.

In summary, this translation test aims to identify the translation ability of translation undergraduates of Year 2, Year 3 and Year 4 (see Table 6). The test is meticulously structured, progressing from easier to more challenging parts, thereby being capable of comprehensively assessing their translation ability.

Table 6 Prototype translation test.

As recommended by Almanasreh et al. (2019), a content validity assessment was conducted subsequent to the test development. The researcher invited three content experts to evaluate the content validity of the translation test. Before participating, informed consent was obtained from all three experts. Two weeks before the meeting, they received the original versions of Test 1 and Test 2 by email. During the meeting, the experts were provided with a content-validity evaluation form to assess based on four key elements: text type, text length, text difficulty, and the marking scheme. The form featured a 5-point Likert scale, where 1 denoted the least suitable and 5 the most suitable. According to Setambah et al. (2017), the scores provided by the experts should be converted into percentage values to determine content validity using the following formula:

$$\frac{\text{Total Expert Score}}{\text{Total Maximum Score}}\times 100 = \text{Content Validity Level}$$

Following the evaluations, the researcher calculated the inter-rater agreement using the Percentage Calculation Method (PCM). A Content Validity Level (CVL) of 70% or higher is considered indicative of high content validity (Setambah et al. 2017). The researcher's calculations revealed that all items achieved values exceeding 70%, demonstrating positive ratings and consistent agreement among the three experts.
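As a minimal illustration of the PCM conversion, the percentage formula above can be sketched in Python. The expert ratings shown are hypothetical examples, not the study's data:

```python
def content_validity_level(expert_scores, max_per_item):
    """Convert raters' Likert scores for one test element into a
    Content Validity Level (CVL) percentage:
    CVL = total expert score / total maximum score * 100."""
    total = sum(expert_scores)
    maximum = len(expert_scores) * max_per_item
    return total / maximum * 100

# Hypothetical ratings from three experts on a 5-point Likert scale
# for one element (e.g. text difficulty).
cvl = content_validity_level([4, 5, 4], max_per_item=5)
print(round(cvl, 1))  # 86.7 -> above the 70% threshold for high validity
```

An element would be flagged for revision only if its CVL fell below the 70% threshold.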

The experts were also required to provide written comments on each text in the translation test. The researcher collated and analyzed their feedback. Based on the evaluation of Test 1, revisions were needed, particularly concerning text typology and difficulty. Considering both the experts' ratings and their detailed comments, the researcher decided to replace Text 2 with another expository text from the NAETI-4 exam, a direct response to the content experts' suggestions. The revised test was subsequently reviewed by the experts, who confirmed its appropriateness.

Piloting the translation test

A pilot study was conducted to examine the construct validity and reliability of the developed translation test. The researcher recruited 182 Year 2, Year 3, and Year 4 translation undergraduates from a public university as the pilot sample (98 for Test 1 and 84 for Test 2). Three experienced translation instructors from the same university were then invited to serve as raters. To ensure fairness and consistency, each rater independently evaluated the students' performance using the provided marking scheme. The researcher grouped the pilot-test scores into three parts, namely Part A (basic level), Part B (intermediate level), and Part C (advanced level), and keyed them into SPSS (version 20). The researcher then calculated the mean percentage scores of the Year 2, Year 3, and Year 4 respondents' translation performance on Parts A, B, and C respectively (see Table 7).

Table 7 Comparison of the mean (%) at educational levels and difficulty levels for pilot study for test 1.

The study found that respondents with higher educational levels performed better than those from the lower levels across Part A (basic), Part B (intermediate), and Part C (advanced) in both Test 1 and Test 2. The Mean (%) for Part A is higher than for Parts B and C because Part A is derived from NAETI-4, an exam suitable for Year 2 translation undergraduates, making it the easiest section. Part B, from NAETI-3, is intended for Year 3 students, resulting in a lower Mean (%). Part C, taken from TEM-8, is designed for Year 4 students, explaining its lowest Mean (%). These results demonstrate strong construct validity of the test.

Inter-rater reliability is a generic term for rater consistency; it refers to the extent to which raters can consistently distinguish different items on a measurement scale (Chaturvedi and Shweta 2015). Good inter-rater reliability means that different raters marking the same test arrive at roughly the same scores (Brown 2006). To check inter-rater reliability, the researcher engaged three independent raters to grade the translation test according to the marking scheme.

The intraclass correlation coefficient (ICC) was calculated using SPSS to assess the inter-rater reliability among the three raters. ICC quantifies the degree of agreement and consistency among raters for two or more numerical or quantitative variables (Bujang and Baharum 2017). Typically, an ICC value above 0.8 or 0.9 is considered indicative of good or excellent inter-rater reliability (McGraw and Wong 1996; Koo and Li 2016). For both Test 1 and Test 2, the ICC values calculated from the three raters exceeded 0.9, demonstrating excellent inter-rater consistency across both assessments.
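For readers unfamiliar with what SPSS computes here, the two-way random effects, average measures, absolute agreement form, ICC(2,k), can be derived from the ANOVA mean squares (McGraw and Wong 1996). The sketch below uses hypothetical ratings, not this study's data:

```python
def icc2k(scores):
    """ICC(2,k): two-way random effects, average measures, absolute
    agreement. `scores` is a list of rows (subjects) by columns (raters).
    Computed from ANOVA mean squares:
        ICC(2,k) = (MSR - MSE) / (MSR + (MSC - MSE) / n)
    """
    n = len(scores)          # number of subjects (translations rated)
    k = len(scores[0])       # number of raters
    grand = sum(sum(row) for row in scores) / (n * k)
    row_means = [sum(row) / k for row in scores]
    col_means = [sum(row[j] for row in scores) / n for j in range(k)]

    ss_rows = k * sum((m - grand) ** 2 for m in row_means)   # between subjects
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)   # between raters
    ss_total = sum((x - grand) ** 2 for row in scores for x in row)
    ss_err = ss_total - ss_rows - ss_cols                    # residual

    msr = ss_rows / (n - 1)
    msc = ss_cols / (k - 1)
    mse = ss_err / ((n - 1) * (k - 1))
    return (msr - mse) / (msr + (msc - mse) / n)

# Hypothetical scores from three raters for five student translations.
ratings = [[70, 72, 71], [55, 54, 56], [88, 90, 89], [62, 60, 63], [75, 77, 74]]
icc = icc2k(ratings)  # raters agree closely, so the ICC is near 1
```

The same computation applied to the study's rating matrices would yield the Average Measures ICC values reported above.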

The pilot study also determined the time needed to complete the translation test. The researcher recorded the completion times of all respondents and calculated the average time taken by each year group: Year 2 students took about 168 min, Year 3 students 160 min, and Year 4 students 153 min. Based on these findings, the duration allocated for completing the translation test was set at 160 min, approximately the average of the three groups' completion times.

Development of the translation descriptors

After administering the prototype test and obtaining the scores, respondents were categorized into several performance bands to determine their translation performance. The researcher utilized Z-scores to establish the range of cut scores between bands, thereby ensuring the accuracy and reliability of the classification. The Z-score, also known as the standard score, is a measure of how many standard deviations a score is above or below the mean (Abdi 2007).

The combined mean score of respondents in Test 1 was 62.8 with a standard deviation (SD) of 15.2, while in Test 2, the mean score was 62.4 with an SD of 13.9. The mean scores and relevant raw scores were then rounded to the nearest whole number. The scores for the various performance bands were then calculated based on Z-scores. Finally, the respondents were categorized into five bands based on the cut-scores. The researcher further calculated the average scores of Test 1 and Test 2 to establish the final cut scores for the translation test. Using the Mean and Standard Deviation (SD), the researcher developed rational terms to describe undergraduates’ translation ability (see Table 8). Additionally, the Mean and SD values were converted into percentages to determine the corresponding percentile ranks. The translation ability of undergraduates should also conform to this assumption: from low and very limited translation ability to high and professional translation ability, and there is a continuous normal distribution in the middle (Zhu 2015).
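The cut-score procedure can be sketched as follows. The Test 1 mean (62.8) and SD (15.2) are taken from the paragraph above, but the z-score band boundaries used here are illustrative assumptions, since the study's exact boundaries are given in Table 8:

```python
def band_cut_scores(mean, sd, z_bounds=(-1.5, -0.5, 0.5, 1.5)):
    """Convert z-score boundaries between performance bands into raw
    cut scores (raw = mean + z * SD), rounded to whole numbers.
    The z_bounds default is a hypothetical choice of four boundaries
    separating five bands."""
    return [round(mean + z * sd) for z in z_bounds]

# Test 1 pilot statistics: mean = 62.8, SD = 15.2.
cuts = band_cut_scores(62.8, 15.2)
print(cuts)  # [40, 55, 70, 86] -> four cut points separating Bands 1-5
```

Under a normal distribution, these symmetric boundaries place roughly 7% of respondents in each outer band, 24% in Bands 2 and 4, and 38% in Band 3, consistent with the assumption of a continuous normal distribution of ability.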

Table 8 Terms used for translation descriptors for the performance bands.

After deciding the terms used for the translation descriptors, the researcher calculated the respondents' performance on the different texts in the pilot study. Year 2, Year 3, and Year 4 respondents were classified into Bands 1-5 based on their test scores and the cut scores. The mean percentage scores for each text type within each band were then calculated (see Table 9). For example, in Test 1, the scores for descriptive texts from Band 3 respondents were summed and divided by the maximum possible total for descriptive texts in that group, reflecting Band 3 respondents' performance on descriptive texts.
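This band-by-text aggregation can be sketched as follows; the band labels, scores, and maximum score below are hypothetical examples, not the study's data:

```python
from collections import defaultdict

def mean_percent_by_band(records, max_score):
    """For one text type, sum each band's scores and divide by the
    maximum possible total for that band, expressed as a percentage.
    `records` is a list of (band, score) pairs, one per respondent;
    `max_score` is the maximum score per respondent for this text type."""
    totals = defaultdict(float)
    counts = defaultdict(int)
    for band, score in records:
        totals[band] += score
        counts[band] += 1
    return {band: totals[band] / (counts[band] * max_score) * 100
            for band in totals}

# Hypothetical descriptive-text scores (out of 20) with band labels.
records = [(3, 14), (3, 15), (3, 13), (5, 19), (5, 18)]
result = mean_percent_by_band(records, max_score=20)
print(result[3])  # 70.0 -> Band 3 respondents' mean% on descriptive texts
```

Repeating this for each of the six text types and five bands yields a table of mean% values like Table 9.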

Table 9 Average mean% of test 1 & test 2.

Together with the terms in Table 8, the researcher formed a set of initial descriptors for the translation test (see Table 10, which uses a descriptive text as an example).

Table 10 Initial translation descriptors (taking descriptive text as an example).

Developing interview protocol

In this section, the interview questions and their corresponding scoring rubrics were developed and validated to assess the respondents’ performance in the translation test. Initially, six questions were designed to elicit feedback from respondents regarding their experience translating different types of texts with varying difficulties. Three experts evaluated the questions using a content validity form with a Likert scale. The interview questions achieved a CVL of above 80%, confirming good content validity.

Based on feedback from the three content experts, the researcher made slight revisions to the question format. The updated questions were sent back to the experts, who responded positively. A pilot interview is a crucial and beneficial process in qualitative research as it can reveal necessary improvements for the main study (Aung et al. 2021). During the interviews, researchers must ensure that participants understand the questions while simultaneously obtaining relevant data to address the research objectives (Dikko 2016). Therefore, a pilot interview was conducted to evaluate the reliability of the interview questions intended for the actual study. The intraclass correlation coefficient (ICC) was calculated using SPSS to assess inter-rater reliability among the three raters. The result showed an Average Measure ICC of 0.886, indicating strong consistency.

Evaluation

The researcher conducted a main study with 106 students participating in Test 1 and 103 students participating in Test 2. Following the main study, interviews were conducted to collect students’ feedback and reflections on the testing experience. To evaluate the effectiveness of the translation test, both the quantitative results (test scores) and qualitative data (interview responses) were analyzed. On the quantitative side, participants’ test scores were analyzed using descriptive statistics, including the calculation of mean percentage scores for each text type under different bands. This helped reveal performance patterns and differences. On the qualitative side, interview data were examined through thematic analysis to explore the challenges, solutions and overall experiences with the translation test. Together, these two approaches provided deeper insights into the effectiveness of the test.

Results and discussions

Participants’ performance in translation test

After the main study was completed, the participants' performance on the translation test was analyzed. Figure 2 shows the mean percentage scores for translating the different types of text.

Fig. 2
figure 2

Mean percentage in translating different types of text (test 1 & test 2).

These results revealed a clear performance gradient across the bands. The mean% for every text type decreases gradually from Band 5 to Band 1, but the size of the drop varies by text type. For instance, literary texts show a sharp decline from Band 5 to Band 1, indicating greater difficulty and significant differences among students. In contrast, descriptive texts exhibit a slower decline, suggesting that most students can handle basic translation tasks. This pronounced performance gradient strongly supports the test's construct validity, as it effectively distinguishes levels of translation ability.

Additionally, the performance gap across text types narrowed as the bands increased. At higher bands, such as Band 5, the gap between simple and complex texts gradually diminishes: Band 5 students demonstrate consistent performance across all text types, indicating comprehensive translation ability. This confirms that the test aligns with the expectation that higher ability leads to more balanced performance, and that it can effectively track students' growth over time.

Thematic analysis of the participants’ feedback

Ten participants, two from each of the five bands, were interviewed about their translation experiences, focusing on the challenges they faced and the solutions they used when translating different types of texts (see Table 11).

Table 11 List of codes for the interviews.

Thematic analysis, a qualitative method for identifying and interpreting patterns in data (Wæraas 2022), was used in this study. Following Braun and Clarke's (2006) six-step framework (data familiarization, initial code generation, searching for themes, reviewing themes, defining and naming themes, and writing the report), the researcher began by reading the interview transcripts and taking notes. Initial codes were generated and refined into 28 different codes, which were grouped into 20 categories. From these, several key themes emerged from the participants' responses: Understanding of Source Text, Lexical Resource, Linguistic Differences between Chinese and English, and Translation Techniques. To ensure reliability, each theme was cross-checked against the data to confirm that it was clearly supported by participants' responses.

Problems of translating different types of texts

Theme one: understanding of source text

Codes attributed to this theme depicted the challenges students encountered primarily in English-to-Chinese translation. Due to unfamiliar words, complex sentences, or professional terms, students had difficulty understanding the source text (usually the English version) or misunderstood it, and therefore faced problems in the translation process.

“…When translating descriptive text, I encountered some new words which I had no idea of their meaning. At first, I would guess their meanings according to the context. If such familiar words were not too many, I think it’s Ok. But if they appeared frequently, indeed affected my general understanding of the article.…” (FOI-S1B2-25/6/2024)

“…Actually, some of the terms are kind of familiar, but I still feel like the translation doesn’t quite make sense. I may not be aware of the additional meanings of certain professional terms or nuances, such as polysemy. Some of the terms I may know, but when they are combined together, I don’t know the meaning…” (FOI-S2B2-25/6/2024)

“…The long and complex sentences were quite challenging for me in translating argumentative text. Although the vocabulary was not too difficult, the sentence structure was very complex which made me hard to understand the sentence meaning, maybe I misunderstood them. I did not fully grasp the article’s meaning…” (FOI-S1B4-25/6/2024)

To accurately and faithfully convey the writer’s intention, original thoughts, and opinions in the target language (TL), a translator must possess not only a high degree of linguistic sensitivity but also a solid understanding of the source language. Seresová and Breveníková (2019) emphasized the importance of a “thorough understanding of the text to be translated” for creating a quality target text, and Cragie and Pattison (2018) developed a framework to guide students in comprehending source texts more effectively, thereby enhancing their ability to handle translation assignments. In this research, students’ feedback revealed several factors that affected comprehension of the source text, namely unfamiliar words, professional terms, and complex sentences, among others. The degree to which students understood the source text directly affected the quality of their translations.

Theme two: lack of lexical resource

Codes attributed to this theme depicted the challenges students faced mainly in Chinese-to-English translation. Owing to limited lexical resources, students had problems translating four-character idioms, expressions with Chinese cultural characteristics, and vivid descriptions, as they did not know how to render them accurately and fluently.

“…I met many Chinese four-character expressions with cultural flavor. It’s easy to understand but hard to express. If you have very limited word resource, that would make your translation ordinary and plain.…” (FOI-S2B2-25/6/2024)

“…In terms of narrative text, translating vivid descriptions, such as physical appearances, expressions, actions, and scenery, was challenging due to time constraints and a lack of vocabulary…” (FOI-S2B1-25/6/2024)

“…the main issue was how to translate using more natural expressions. For example, in argumentative text, there was a parallel sentence. It’s easy to understand this sentence, but translating it with more idiomatic expressions tested one’s vocabulary resources…” (FOI-S2B4-25/6/2024)

Translation is a complex process that requires good English skills, especially vocabulary mastery (Ningrum and Dhewi 2023). Students’ translation accuracy is strongly influenced by the vocabulary they possess: the larger a student’s vocabulary repertoire, the more accurate their translation (Kulsum 2020). Students with substantial knowledge of vocabulary and grammar find it easier to translate texts. According to students’ feedback in this research, although they often understood the meanings of Chinese expressions such as four-character idioms with cultural connotations and vivid descriptions, their limited lexical knowledge hindered them from finding appropriate equivalents for accurate translation.

Theme three: linguistic differences between Chinese and English

Codes attributed to this theme depicted the challenges students faced in both English-to-Chinese and Chinese-to-English translation. Owing to the significant linguistic differences between Chinese and English, including ways of expressing ideas, language-specific features, sentence structure, and cultural nuances, students faced considerable difficulties in translation.

“…The main issue was the difference between English and Chinese in terms of narrative order, with English generally stating the conclusion first and Chinese narrating before presenting the conclusion…” (FOI-S2B5-25/6/2024)

“…when translating this literary text, if the unique characteristics and nuances of Chinese verbs are not considered and a literal translation into English is used, semantic or logical errors frequently occur…” (FOI-S1B3-25/6/2024)

The literature is rich concerning the linguistic differences between Chinese and English, which are typical of the Sino-Tibetan and Indo-European language families respectively (Gu 2005; Link 2012). Translators should master the differences between the two languages in order to overcome translation barriers (He 2011). Elhadary (2023) expressed a similar view, noting that recognizing the impact of linguistic disparities between two languages is crucial for producing accurate and meaningful translations. In the interviews for this research, students frequently discussed how these linguistic differences influenced their translation, highlighting specific aspects such as the unique characteristics and nuances of Chinese verbs and the different ways of expressing ideas and structuring sentences in Chinese and English. Students agreed that failing to adequately address these linguistic differences would compromise the quality of their translations.

Solutions to the problems in translation test

Theme four: translation techniques

Codes attributed to this theme revealed the translation skills students used when faced with problems during translation. Lower-band students (Band 1 and Band 2) tended to ignore or skip problems, resorting to simplification or omission. They also relied on literal translation, regarding it as a reliable method for ensuring accuracy. In contrast, higher-band students (Band 3 and Band 4) demonstrated superior performance and knew how to solve the problems they encountered appropriately. For instance, when faced with unfamiliar words or phrases, they were able to infer meanings from context fairly accurately; when dealing with complex phrases or sentence structures, they applied translation techniques skillfully to produce smooth and natural translations. The highest-band students displayed the greatest flexibility in using translation techniques, producing translations that were both accurate and fluent.

“…If I came across words I didn’t know, I would skip them because of time constraints…When I couldn’t translate a sentence, I simplified it into a basic sentence….” (FOI-S1B1-25/6/2024)

“…When I had trouble finding appropriate words for translation, I preferred literal translation, which can guarantee accuracy to some extent. I knew this may cause awkward and unnatural in translation, but it’s my best choice…” (FOI-S2B1-25/6/2024)

“…Overall it’s relatively easy. However, it’s easy to be constrained by the original expression, resulting in rigid translations. So I read through the entire text to grasp the main idea, and then appropriately added function words or adjusted the word order to make the text smoother…” (FOI-S1B3-25/6/2024)

“…I ensured that the translation was both faithful to the original meaning and natural in Chinese…I accurately discerned the original style and faithfully reproduced its style and content in terms of language, sentence structure, organization, and rhetoric…” (FOI-S1B5-25/6/2024)

A translation technique is an operational mechanism deployed by the translator in the course of actual translation (Tardzenyuy 2016). Because of the inherent differences between source and target languages, various translation techniques are utilized to achieve equivalence in translated works. Different students have different understandings of translation techniques (Lestiyanawati et al. 2014), and the ability to choose the correct technique is an indispensable skill. It is therefore essential for translation-major students to be aware of why a particular technique is used (Zainudin and Awal 2012). According to the responses elicited in the interviews, students utilized various translation techniques to address the challenges they encountered, such as simplification, literal translation, and omission. When faced with the same problem, students at different proficiency levels may tackle it with different methods.

Refining the descriptors based on respondents’ feedback

Assimilating the feedback into the descriptors was crucial, as it improved their quality, relevance, and effectiveness. In addition, integrating the respondents’ insights and perspectives made the descriptors clearer and easier to understand. After the thematic analysis, the researcher added a fourth sub-theme, “Translation Techniques”, to the descriptors. This sub-theme addressed solutions to the problems students encountered in the translation test, describing what students in each band can and cannot do when translating different types of texts, thereby enriching the descriptors (Table 12).

Table 12 Translation descriptors with participants’ feedback (taking descriptive text as an example).

Incorporating the feedback into the descriptors included the following two procedures:

  1. Add participants’ feedback on translating different types of texts under each band into the translation descriptors. For instance, a Band 5 participant mentioned that they could translate descriptive texts accurately and fluently. Consequently, the researcher incorporated the phrase “can translate descriptive texts accurately and fluently” into the Band 5 descriptors.

  2. Ensure that the updated descriptors are distinct and do not overlap significantly with existing ones. Additionally, ensure that the descriptors are clear, concise, and accurately reflect translation performance.

Limitations

The developed translation ability test has several key limitations that constrain its broader applicability and the generalizability of its findings. Primarily, its geographical and institutional specificity is a significant concern: the test was designed and validated exclusively for translation-major undergraduates at a public university in China. Furthermore, the sample size and scope are restricted, as all 182 participants were drawn from a single institution, a public second-tier university. Although China has over 300 universities offering translation programs, these vary greatly in status (public or private) and tier (first-, second-, or third-tier), leading to notable differences in student capabilities and educational experiences. The test’s focus on undergraduates also limits its utility, as its suitability for postgraduate students, professional translators, or students in other foreign language majors has not been established. While the test shows potential for broader use, its initial validation was limited to this context; its generalizability to other university types therefore remains constrained.

Conclusion

To accurately assess the translation ability of translation-major undergraduates at a public university in China, this study developed a translation test, verified its validity and reliability, and analyzed students’ performance and feedback. Initially, the existing Undergraduate Program for Translation Studies (UPTS) and three translation exams were examined, focusing on cultivation objectives, curriculum system, test structures, test content, and marking schemes. This analysis identified key components to incorporate into the new translation test. Subsequently, the translation test and its descriptors were developed. The results of the content evaluation and the pilot study demonstrated that the instrument displayed satisfactory levels of validity and reliability, suggesting its potential for effective implementation on a larger scale. In addition, thematic analysis of participants’ feedback helped refine the translation descriptors.

The translation test developed in this study effectively assesses the translation ability of undergraduates across different academic years, offering valuable support for teaching. It allows teachers to gain a comprehensive understanding of students’ translation ability, facilitating personalized instruction tailored to individual students’ needs and enhancing overall teaching quality. In addition, by covering multiple year levels, the test provides a useful tool for tracking students’ progress over time and evaluating the development of their translation skills throughout their undergraduate studies. Moreover, the translation test and descriptors provide students with clear insights into their strengths and weaknesses, thereby guiding them on how to improve. Finally, this test helps universities assess the quality of their translation programs by comparing student performance across cohorts. The results can guide curriculum improvement and enhance teaching practices.

Future research should replicate this study across different university types, including both private and public institutions at various levels. It should also examine how the test can support teaching, such as course design and material development. Additionally, further studies could explore how this test framework can be adapted for use in other language contexts and how it may be enhanced through integration with artificial intelligence tools.