Machine Translation: Multilingual 1.6B Any-Any NMT Model - Model Overview

Description:

The Megatron Multilingual 1.6B Neural Machine Translation model translates text in any-to-any directions across the 37 supported languages, including non-English-centric translation (such as French to Chinese). The supported languages are: English (en), Czech (cs), Danish (da), German (de), Greek (el), European Spanish (es-ES), Latin American Spanish (es-US), Finnish (fi), French (fr), Hungarian (hu), Italian (it), Lithuanian (lt), Latvian (lv), Dutch (nl), Norwegian (no), Polish (pl), European Portuguese (pt-PT), Brazilian Portuguese (pt-BR), Romanian (ro), Russian (ru), Slovak (sk), Swedish (sv), Simplified Chinese (zh-CN), Traditional Chinese (zh-TW), Japanese (ja), Hindi (hi), Korean (ko), Estonian (et), Slovenian (sl), Bulgarian (bg), Ukrainian (uk), Croatian (hr), Arabic (ar), Vietnamese (vi), Turkish (tr), Indonesian (id), Thai (th). This model is ready for commercial use.
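
As a minimal sketch of how a client might check a requested translation direction before sending text to the model, the snippet below lists the 37 language codes from the description and validates a source/target pair. The names `SUPPORTED_LANGUAGES`, `is_supported_direction`, and `count_directions` are hypothetical helpers, not part of Riva. Case-insensitive matching and rejection of same-language pairs are assumptions of this sketch, and it does not model any pair-specific restrictions the deployed model may have.

```python
# Language codes copied from the supported-language list in the description.
SUPPORTED_LANGUAGES = (
    "en", "cs", "da", "de", "el", "es-ES", "es-US", "fi", "fr", "hu",
    "it", "lt", "lv", "nl", "no", "pl", "pt-PT", "pt-BR", "ro", "ru",
    "sk", "sv", "zh-CN", "zh-TW", "ja", "hi", "ko", "et", "sl", "bg",
    "uk", "hr", "ar", "vi", "tr", "id", "th",
)

# Lower-cased copy so lookups are case-insensitive (an assumption of this sketch).
_LOWERED = {code.lower() for code in SUPPORTED_LANGUAGES}


def is_supported_direction(source: str, target: str) -> bool:
    """Return True if both codes are listed and the two codes differ."""
    src, tgt = source.lower(), target.lower()
    return src in _LOWERED and tgt in _LOWERED and src != tgt


def count_directions() -> int:
    """Number of ordered (source, target) pairs with distinct languages."""
    n = len(SUPPORTED_LANGUAGES)
    return n * (n - 1)
```

With 37 languages, `count_directions()` gives 37 × 36 ordered pairs, which is the upper bound on distinct directions implied by "any to any".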

Model Architecture

Architecture Type: Transformer

Network Architecture: Megatron

The model is based on the Transformer architecture originally presented in the "Attention Is All You Need" paper [1]. In this instance, the model has 24 layers in the encoder and 24 layers in the decoder. It uses a SentencePiece tokenizer [2].

Input:

Input Type(s): Text String
Input Format(s): List

Other Properties Related to Input: No Pre-Processing Needed; No Tokenization required; 1024 Character Text String Limit (No non-textual characters)
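
Because each input string is limited to 1024 characters, longer text has to be split before it is placed in the input list. The following is a minimal sketch of one way to do that, breaking at spaces where possible. The helper name `split_for_translation` is hypothetical, and the sketch assumes the limit counts Unicode code points as Python's `len` does, which the model card does not specify.

```python
from typing import List

MAX_CHARS = 1024  # per-string limit stated in the input properties


def split_for_translation(text: str, max_chars: int = MAX_CHARS) -> List[str]:
    """Split text into pieces of at most max_chars characters.

    Breaks at the last space inside the window when there is one,
    otherwise falls back to a hard cut at max_chars.
    """
    chunks: List[str] = []
    remaining = text.strip()
    while len(remaining) > max_chars:
        # Look for a space at any index up to and including max_chars.
        cut = remaining.rfind(" ", 0, max_chars + 1)
        if cut <= 0:
            cut = max_chars  # no usable space: hard cut
        chunks.append(remaining[:cut].rstrip())
        remaining = remaining[cut:].lstrip()
    if remaining:
        chunks.append(remaining)
    return chunks
```

Splitting at spaces keeps words intact but can still separate a sentence from its context, so splitting at sentence boundaries may give better translations where the text allows it.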

Output:

Output Type(s): Text String
Output Format: List
Output Parameters: Selected Language
Other Properties Related to Output: Outputs are not tokenized or processed to hide sensitive input information

References:

[1] Vaswani, Ashish, et al. "Attention is all you need." arXiv preprint arXiv:1706.03762 (2017).
[2] https://github.com/google/sentencepiece
[3] https://en.wikipedia.org/wiki/BLEU
[4] https://github.com/mjpost/sacreBLEU
[5] NVIDIA NeMo Toolkit

Software Integration:

Runtime Engine(s): [Riva 2.18.0]

Supported Hardware Platform(s):

NVIDIA Ampere
NVIDIA Hopper
NVIDIA Jetson
NVIDIA Lovelace
NVIDIA Turing
NVIDIA Volta

Supported Operating System(s):

Linux
Linux 4 Tegra

Model Version(s):

rmir_nmt_megatron_1b_any_any:2.18.0

Training & Evaluation Dataset:

Data Collection Method by dataset: Human

Labeling Method by dataset: Automated

Performance of the models

Performance of the model in any-to-any directions on the Flores-101 dataset:

Languages	de	es-es	es-us	fr	ja	ru	zh-cn
de	-	24.50	24.10	39.30	27.30	26.10	33.30
es-es	22.10	-	-	30.30	23.50	20.20	29.80
es-us	22.10	-	-	30.30	23.50	20.20	29.80
fr	25.00	24.80	30.40	-	26.60	25.50	32.70
ja	16.90	16.40	18.10	23.70	-	15.20	28.90
ru	22.40	21.90	26.40	33.40	25.40	-	30.90
zh-cn	17.50	17.30	19.10	25.60	16.80	23.70	-
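
To work with the any-to-any scores programmatically, the table can be parsed into a nested dictionary. The sketch below copies the values from the table above and finds the highest and lowest reported cells. The names `RAW_TABLE` and `parse_scores` are hypothetical, and the keys are only called "row" and "column" because the model card does not state which axis is the source language and which is the target.

```python
from typing import Dict, List, Optional, Tuple

# Values copied from the any-to-any table; "-" marks a cell with no reported score.
RAW_TABLE = """
Languages de    es-es es-us fr    ja    ru    zh-cn
de        -     24.50 24.10 39.30 27.30 26.10 33.30
es-es     22.10 -     -     30.30 23.50 20.20 29.80
es-us     22.10 -     -     30.30 23.50 20.20 29.80
fr        25    24.80 30.40 -     26.60 25.50 32.70
ja        16.90 16.40 18.10 23.70 -     15.20 28.90
ru        22.40 21.90 26.40 33.40 25.40 -     30.90
zh-cn     17.50 17.30 19.10 25.60 16.80 23.70 -
"""


def parse_scores(raw: str) -> Dict[str, Dict[str, Optional[float]]]:
    """Parse a whitespace-separated score table into {row: {column: score}}."""
    rows = [line.split() for line in raw.strip().splitlines()]
    header = rows[0][1:]  # drop the "Languages" corner cell
    scores: Dict[str, Dict[str, Optional[float]]] = {}
    for row in rows[1:]:
        label, cells = row[0], row[1:]
        scores[label] = {
            col: (None if cell == "-" else float(cell))
            for col, cell in zip(header, cells)
        }
    return scores


SCORES = parse_scores(RAW_TABLE)

# Flatten to (row, column, score) triples, skipping unreported cells.
reported: List[Tuple[str, str, float]] = [
    (row, col, value)
    for row, cols in SCORES.items()
    for col, value in cols.items()
    if value is not None
]
best = max(reported, key=lambda item: item[2])
worst = min(reported, key=lambda item: item[2])
```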

Performance of the model in the any->en and en->any directions on the Flores-101 dataset:

Language	Eng -> Language	Language -> Eng
Czech	32.90	41.10
Danish	46.20	49.60
German	38.20	45.20
Greek	27.50	36.50
European Spanish	27.60	30.70
Latin American Spanish	26.80	30.70
Finnish	22.70	35.00
French	50.50	46.50
Hungarian	26.70	36.90
Italian	29.90	34.50
Lithuanian	27.50	35.10
Latvian	31.00	37.00
Dutch	26.70	32.60
Norwegian	34.00	44.80
Polish	20.80	30.30
European Portuguese	48.10	50.50
Brazilian Portuguese	49.80	50.50
Romanian	40.70	45.00
Russian	31.30	36.10
Slovak	35.00	40.60
Swedish	45.00	49.60
Simplified Chinese	39.50	28.50
Traditional Chinese	30.80	26.80
Japanese	32.50	26.70
Hindi	33.50	39.90
Korean	28.00	29.50
Estonian	27.30	38.90
Slovenian	30.70	36.20
Bulgarian	41.80	42.10
Ukrainian	30.70	40.20
Croatian	27.90	37.80
Arabic	28.00	40.60
Vietnamese	41.80	36.90
Turkish	29.50	38.80
Indonesian	47.20	44.90
Thai	30.90	28.10

Inference:

Engine: Triton

Test Hardware:

NVIDIA Volta V100
NVIDIA Turing T4
NVIDIA A100 GPU
NVIDIA A30 GPU
NVIDIA A10 GPU
NVIDIA H100 GPU
NVIDIA L4 GPU
NVIDIA L40 GPU
NVIDIA Jetson Orin
NVIDIA Jetson AGX Xavier
NVIDIA Jetson Xavier NX

Ethical Considerations:

NVIDIA believes Trustworthy AI is a shared responsibility and we have established policies and practices to enable development for a wide array of AI applications. When downloaded or used in accordance with our terms of service, developers should work with their supporting model team to ensure this model meets requirements for the relevant industry and use case and addresses unforeseen product misuse. For more detailed information on ethical considerations for this model, please see the Model Card++ Explainability, Bias, Safety & Security, and Privacy Subcards. Please report security vulnerabilities or NVIDIA AI Concerns here.

Publisher

NVIDIA

Latest Version: 2.18.0

Updated: March 7, 2025 UTC

Compressed Size: 2.96 GB

Labels

Translation