Abstract
Large language models (LLMs) have shown remarkable abilities in understanding and generating human language, and these abilities are increasingly applied to translation. Literary translation, however, presents unique challenges, and the quality of literary translations generated by LLMs has yet to be explored and tested. This study therefore evaluates the quality of translations produced by several LLMs against a well-established human translation. The famous Chinese literary work Border Town by Shen Congwen was selected as the source text, and ChatGPT 4, ChatGPT 4o, WXYY 4.0 Turbo, and Gemini were used to produce the translations. Jeffrey Kinkley’s translation served as the human translation for comparison. The study employs Multidimensional Quality Metrics (MQM), which provides a detailed error typology, to evaluate translation quality. The error analysis focused on three key dimensions of translation quality: accuracy, fidelity, and cultural appropriateness. Five types of errors were identified: mistranslation, omission, over-translation, cultural mistranslation, and discourse-level errors. Mistranslation was the most frequent error type in all models. Omission occurred most often in Gemini, over-translation and cultural mistranslation appeared most often in ChatGPT 4, and discourse-level errors occurred most often in WXYY 4.0 Turbo. ChatGPT 4o appears to yield comparatively higher translation quality under the MQM framework. The findings suggest that literary translation by LLMs requires more targeted training and prompting strategies, as well as human post-editing, to improve its accuracy, fidelity, and cultural appropriateness.
Data availability
The materials analyzed in this study include excerpts from Shen Congwen’s Border Town, the large language model–generated translations, and the corresponding MQM-based error annotation data. Due to copyright restrictions on the original literary text, the source text excerpts cannot be publicly shared. The LLM-generated outputs, annotated evaluation data, coding manual, and other related files are available from the corresponding author upon reasonable request.
Author information
Contributions
WY wrote the main manuscript text, MXY prepared the tables, and both authors analyzed the translation errors. All authors reviewed and approved the final manuscript.
Ethics declarations
Competing interests
The authors declare no competing interests.
Ethical approval
This study analyzes literary texts and machine-generated translations and does not involve human participants, personal data, or animal subjects. Therefore, ethical approval was not required.
Informed consent
This article does not contain any studies with human participants performed by any of the authors; therefore, informed consent was not required.
Additional information
Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.
About this article
Cite this article
Yang, W., Yang, M. Evaluating literary translation by large language models: a multidimensional quality assessment of Shen Congwen’s Border Town. Humanit Soc Sci Commun (2026). https://doi.org/10.1057/s41599-026-06868-y


