NGC | Catalog

riva_megatronnmt_any_en_1b

Description
Multilingual 1.1B Any2en model
Publisher
-
Latest Version
2.14.0
Modified
January 18, 2024
Size
2.03 GB

Translation: Megatron 1.1B

Model Overview

This model translates text from any of 32 supported source languages into English (any2en).

Model Architecture

The model is based on the Transformer "Big" architecture originally presented in the "Attention Is All You Need" paper [1]. In this particular instance, the model has 12 layers in the encoder and 2 layers in the decoder, and it uses a SentencePiece tokenizer [2].
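The encoder-heavy, decoder-light layout can be captured in a small configuration sketch. The layer counts come from this card; the hidden size, feed-forward size, and head count are illustrative assumptions taken from the Transformer "Big" recipe, not published values for this checkpoint:

```python
from dataclasses import dataclass

@dataclass
class MegatronNMTConfig:
    encoder_layers: int = 12   # from this model card
    decoder_layers: int = 2    # from this model card
    hidden_size: int = 1024    # assumed: Transformer "Big" width
    ffn_size: int = 4096       # assumed: 4x hidden size
    num_heads: int = 16        # assumed: Transformer "Big" head count

cfg = MegatronNMTConfig()
# The deep encoder carries most of the capacity; the shallow decoder
# keeps autoregressive generation fast at inference time.
print(cfg.encoder_layers, cfg.decoder_layers)
```

A shallow decoder is a common choice for production NMT, since decoding cost scales with decoder depth times output length.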

Training

These models were trained on a collection of publicly available datasets comprising millions of parallel sentences.

Tokenizer Construction

We used a SentencePiece BPE tokenizer [2], with a single vocabulary shared between the encoder and decoder.
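BPE learns its vocabulary by repeatedly merging the most frequent adjacent symbol pair; with a shared tokenizer, source- and target-language text is pooled before learning merges, so both sides use one vocabulary. A toy pure-Python sketch of the merge-learning loop (SentencePiece itself works on raw text and is considerably more sophisticated):

```python
from collections import Counter

def learn_bpe_merges(words, num_merges):
    """Learn byte-pair merges from a tiny word list (toy illustration)."""
    # Start with each word as a sequence of characters.
    vocab = Counter(tuple(w) for w in words)
    merges = []
    for _ in range(num_merges):
        # Count adjacent symbol pairs, weighted by word frequency.
        pairs = Counter()
        for symbols, freq in vocab.items():
            for a, b in zip(symbols, symbols[1:]):
                pairs[(a, b)] += freq
        if not pairs:
            break
        best = max(pairs, key=pairs.get)
        merges.append(best)
        # Apply the winning merge everywhere it occurs.
        new_vocab = Counter()
        for symbols, freq in vocab.items():
            merged, i = [], 0
            while i < len(symbols):
                if i < len(symbols) - 1 and (symbols[i], symbols[i + 1]) == best:
                    merged.append(symbols[i] + symbols[i + 1])
                    i += 2
                else:
                    merged.append(symbols[i])
                    i += 1
            new_vocab[tuple(merged)] += freq
        vocab = new_vocab
    return merges

# A shared tokenizer would pool words from all 32 source languages
# and English into one corpus like this before learning merges.
corpus = ["low", "lower", "lowest", "new", "newer"]
merges = learn_bpe_merges(corpus, 3)
```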

How to Use this Model

The Riva Quick Start Guide is recommended as the starting point for trying out Riva models. For more information on using this model with Riva Speech Services, refer to the Riva User Guide.

Input

The translate method of the NMT model accepts a list of de-tokenized (plain-text) strings in the source language.

Output

The translate method outputs a list of de-tokenized strings in the target language.
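A minimal client-side sketch of the list-in, list-out contract. The request-building helper and its field names loosely mirror the Riva NMT request but are illustrative assumptions, not the exact proto schema; the model name and server address are likewise assumptions:

```python
def build_translate_request(texts, model, source_language, target_language):
    """Assemble a translation request payload (field names are illustrative)."""
    if not isinstance(texts, list) or not all(isinstance(t, str) for t in texts):
        raise TypeError("translate expects a list of de-tokenized strings")
    return {
        "texts": texts,
        "model": model,
        "source_language": source_language,
        "target_language": target_language,
    }

request = build_translate_request(
    ["Hallo Welt!"],          # de-tokenized source strings
    "megatronnmt_any_en_1b",  # assumed deployed model name
    "de",                     # source language code
    "en",                     # target is always English for this model
)
# With the nvidia-riva-client package and a running Riva server, the call
# would look roughly like this (not executed here):
#   import riva.client
#   auth = riva.client.Auth(uri="localhost:50051")
#   nmt = riva.client.NeuralMachineTranslationClient(auth)
#   response = nmt.translate(**request)
# The response carries one translated string per input string.
```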

References

[1] Vaswani, Ashish, et al. "Attention is all you need." arXiv preprint arXiv:1706.03762 (2017).
[2] SentencePiece: https://github.com/google/sentencepiece
[3] BLEU: https://en.wikipedia.org/wiki/BLEU
[4] SacreBLEU: https://github.com/mjpost/sacreBLEU
[5] NVIDIA NeMo Toolkit

Suggested Reading

Refer to the Riva documentation for more information.

License

By downloading and using the models and resources packaged with Riva Conversational AI, you accept the terms of the Riva license.

Ethical AI

NVIDIA’s platforms and application frameworks enable developers to build a wide array of AI applications. Consider potential algorithmic bias when choosing or creating the models being deployed. Work with the model’s developer to ensure that it meets the requirements for the relevant industry and use case; that the necessary instruction and documentation are provided to understand error rates, confidence intervals, and results; and that the model is being used under the conditions and in the manner intended.