LingualX64: a multilingual benchmark for evaluating symmetry and asymmetry in LLM translation

Huang, Yan; Liu, Wei; Wang, Jiayi; Zhu, Huidong

doi:10.1038/s41598-026-49738-y

Article
Open access
Published: 26 April 2026

Scientific Reports (2026)

We are providing an unedited version of this manuscript to give early access to its findings. Before final publication, the manuscript will undergo further editing. Please note there may be errors present which affect the content, and all legal disclaimers apply.

Abstract

Large language models (LLMs) have transformed natural language processing, including machine translation (MT), where they have achieved unprecedented performance. However, this progress masks underlying asymmetries in training data and model architecture that affect multilingual translation quality. This paper introduces LingualX64, a new dataset spanning 64 languages, designed to measure how strongly these asymmetries affect LLM translation performance, particularly under zero-shot conditions. LingualX64 is constructed to minimize overlap with existing LLM training corpora and to provide balanced coverage of diverse linguistic features, enabling a more robust assessment of cross-linguistic generalization. Our evaluation reveals significant performance disparities across languages, highlighting the effects of data scarcity and linguistic complexity on translation quality. These findings underscore the need for strategies that mitigate asymmetries in LLM training and model design in order to achieve more equitable and robust multilingual translation. LingualX64 provides a benchmark for researchers and developers working to address these challenges and to extend the benefits of LLMs to global communication.

Funding

This research was funded by the Henan Science and Technology Research Project, Zhengzhou, China (242102211060).

Author information

These authors contributed equally: Jiayi Wang and Huidong Zhu.

Authors and Affiliations

Zhengzhou University of Light Industry, Zhengzhou, 450000, China
Yan Huang, Wei Liu, Jiayi Wang & Huidong Zhu

Corresponding author

Correspondence to Huidong Zhu.

Ethics declarations

Competing interests

The authors declare no competing interests.

Additional information

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Appendices

Language

See Table 4.

Table 4 We list the ISO code, language name, language, alphabet and resource level for each language³⁹.

Score

See Tables 5, 6, 7 and 8.

Table 5 BLEU scores on xx→en.

Table 6 BLEU scores on xx→zh.

Table 7 COMET scores on xx→en.

Table 8 COMET scores on xx→zh.
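Tables 5 and 6 report BLEU for translation from each benchmark language into English (en) and Chinese (zh). The toolkit and settings the authors used are not stated in this section, so the following is only a minimal from-scratch sketch of corpus-level BLEU-4 that illustrates what the metric computes: clipped n-gram precisions for n = 1..4, combined by a geometric mean and scaled by a brevity penalty. It assumes one reference per segment, whitespace tokenization, uniform n-gram weights, and no smoothing; the function names `ngrams` and `corpus_bleu` are hypothetical. Published scores are normally produced with a standard implementation such as sacreBLEU, whose tokenizers (including a dedicated one for Chinese) will give different numbers.

```python
import math
from collections import Counter
from typing import List


def ngrams(tokens: List[str], n: int) -> Counter:
    """Count all n-grams of length n in a token list (hypothetical helper)."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def corpus_bleu(hypotheses: List[str], references: List[str], max_n: int = 4) -> float:
    """Corpus-level BLEU on a 0-100 scale, one reference per segment.

    Assumptions (not taken from the paper): whitespace tokenization,
    uniform n-gram weights, and no smoothing.
    """
    matches = [0] * max_n  # clipped n-gram matches, summed over the corpus
    totals = [0] * max_n   # hypothesis n-gram counts, summed over the corpus
    hyp_len = ref_len = 0
    for hyp, ref in zip(hypotheses, references):
        h, r = hyp.split(), ref.split()
        hyp_len += len(h)
        ref_len += len(r)
        for n in range(1, max_n + 1):
            h_counts, r_counts = ngrams(h, n), ngrams(r, n)
            # Clipping: a hypothesis n-gram is credited at most as often
            # as it occurs in the reference.
            matches[n - 1] += sum(min(c, r_counts[g]) for g, c in h_counts.items())
            totals[n - 1] += max(len(h) - n + 1, 0)
    # Without smoothing, any zero precision makes the geometric mean zero.
    if hyp_len == 0 or min(matches) == 0:
        return 0.0
    log_precision = sum(math.log(m / t) for m, t in zip(matches, totals)) / max_n
    # Brevity penalty: penalize output that is shorter than the references.
    bp = 1.0 if hyp_len > ref_len else math.exp(1 - ref_len / hyp_len)
    return 100 * bp * math.exp(log_precision)
```

For example, `corpus_bleu(["the cat sat on"], ["the cat sat on the mat"])` has perfect n-gram precision but is penalized for being shorter than the reference. For the xx→zh direction a whitespace split is not meaningful; one simple assumption is to score Chinese at the character level by passing `" ".join(text)` for both the hypothesis and the reference.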

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License, which permits any non-commercial use, sharing, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if you modified the licensed material. You do not have permission under this licence to share adapted material derived from this article or parts of it. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by-nc-nd/4.0/.

About this article

Cite this article

Huang, Y., Liu, W., Wang, J. et al. LingualX64: a multilingual benchmark for evaluating symmetry and asymmetry in LLM translation. Sci Rep (2026). https://doi.org/10.1038/s41598-026-49738-y

Received: 10 January 2026
Accepted: 16 April 2026
Published: 26 April 2026
DOI: https://doi.org/10.1038/s41598-026-49738-y