Transformer en-de checkpoint

Description

Transformer trained on wmt14_en_de_joined_dict. Trained in TF32 and saved in FP32.

Publisher

NVIDIA Deep Learning Examples

Latest Version

20.06.0_tf32

Modified

April 4, 2023

Size

2.62 GB

Model Overview

This implementation of the Transformer model architecture is based on the optimized implementation in the Fairseq NLP toolkit.

Model Architecture

The Transformer model uses the standard NMT encoder-decoder architecture. Unlike RNN-based NMT models, it uses no recurrent connections and operates on a fixed-size context window.

The encoder stack is made up of N identical layers. Each layer is composed of the following sublayers:

1. Self-attention layer
2. Feedforward network (two fully connected layers)

Like the encoder stack, the decoder stack is made up of N identical layers. Each layer is composed of the following sublayers:

1. Self-attention layer
2. Multi-headed attention layer combining encoder outputs with results from the previous self-attention layer
3. Feedforward network (two fully connected layers)
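The self-attention sublayer that appears in both stacks can be illustrated with a minimal sketch. This is not the code used to train this checkpoint: it is a single-head, pure-Python version in which queries, keys, and values are the input vectors themselves (the learned projection matrices are omitted as a simplifying assumption). The function names `softmax` and `self_attention` and the `causal` flag are hypothetical; the flag illustrates how a decoder can be prevented from attending to later positions.

```python
import math


def softmax(row):
    """Numerically stable softmax over a list of floats."""
    m = max(row)
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(x, causal=False):
    """Single-head scaled dot-product self-attention (illustrative only).

    x is a list of token vectors (seq_len x d). Assumption: queries, keys,
    and values are x itself; a real layer applies learned projections first.
    With causal=True, position i only attends to positions <= i.
    """
    d = len(x[0])
    out = []
    for i, q in enumerate(x):
        scores = []
        for j, k in enumerate(x):
            if causal and j > i:
                # Masked position: its softmax weight becomes exactly zero.
                scores.append(float("-inf"))
            else:
                dot = sum(a * b for a, b in zip(q, k))
                scores.append(dot / math.sqrt(d))
        weights = softmax(scores)
        # Each output vector is a weighted average of all value vectors.
        out.append([sum(w * v[c] for w, v in zip(weights, x)) for c in range(d)])
    return out


tokens = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
encoder_style = self_attention(tokens)               # every token sees every token
decoder_style = self_attention(tokens, causal=True)  # tokens see only the past
```

In the causal case the first token can only attend to itself, so its output equals its input; in the unmasked case each output mixes information from the whole sequence, which is what lets the model work without recurrent connections.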

The encoder uses self-attention to compute a representation of the input sequence. The decoder generates the output sequence one token at a time, taking the encoder output and the previously generated tokens as inputs. The model also applies embeddings to the input and output tokens and adds a constant (non-learned) positional encoding, which carries information about the position of each token.
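As an illustration of a constant positional encoding, the sketch below assumes the sinusoidal formulation from Attention Is All You Need, with sine on even dimensions and cosine on odd dimensions. The exact layout used by this checkpoint's implementation may differ, and the function names `sinusoidal_positional_encoding` and `add_positions` are hypothetical.

```python
import math


def sinusoidal_positional_encoding(seq_len, d_model):
    """Return a seq_len x d_model table of fixed positional encodings.

    Assumes the interleaved layout from the original paper:
    even dimension 2k -> sin(pos / 10000^(2k / d_model)),
    odd dimension 2k+1 -> cos(pos / 10000^(2k / d_model)).
    """
    table = []
    for pos in range(seq_len):
        row = []
        for i in range(d_model):
            angle = pos / (10000 ** (2 * (i // 2) / d_model))
            row.append(math.sin(angle) if i % 2 == 0 else math.cos(angle))
        table.append(row)
    return table


def add_positions(embeddings):
    """Add the positional encoding element-wise to token embeddings."""
    pe = sinusoidal_positional_encoding(len(embeddings), len(embeddings[0]))
    return [[e + p for e, p in zip(e_row, p_row)]
            for e_row, p_row in zip(embeddings, pe)]


# Two identical token embeddings become distinguishable once positions are added.
same_token_twice = [[0.5, 0.5, 0.5, 0.5], [0.5, 0.5, 0.5, 0.5]]
with_positions = add_positions(same_token_twice)
```

Because the table depends only on the position and the model dimension, it has no trainable parameters and can be computed once and reused.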

Figure 1. The architecture of a Transformer model.

A complete description of the Transformer architecture can be found in the paper Attention Is All You Need.

Training

This model was trained using the script available on NGC and in the GitHub repository.

Dataset

The following datasets were used to train this model:

WMT14 German-English - Dataset for machine translation.

Performance

Performance numbers for this model are available in NGC.

License

This model was trained using open-source software available in the Deep Learning Examples repository. For terms of use, please refer to the license of the script and of the datasets the model was derived from.