Table 3 Comparison of FLORES-101 devtest

From: Scaling neural machine translation to 200 languages

 

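| | eng_Latn-xx | xx-eng_Latn | xx-yy | Average |
| --- | --- | --- | --- | --- |
| 87 languages | | | | |
| M2M-100 | –/– | –/– | –/– | 13.6/– |
| Deepnet | –/– | –/– | –/– | 18.6/– |
| NLLB-200 | **35.4/52.1** | **42.4/62.1** | **25.2/43.2** | **25.5/43.5** |
| 101 languages | | | | |
| DeltaLM | 26.6/– | 33.2/– | 16.4/– | 16.7/– |
| NLLB-200 | **34.0/50.6** | **41.2/60.9** | **23.7/41.4** | **24.0/41.7** |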

  1. We evaluated on FLORES-101 for 10,000 directions. Each cell reports spBLEU/chrF++; a dash marks a score that is not available. Scores for DeltaLM are taken from the FLORES-101 leaderboard. M2M-100 and Deepnet averages cover only the 87 languages that overlap with FLORES-101, so NLLB-200 was also evaluated on this subset of languages. The highest score in each column and in each grouping of languages is shown in bold.