Transformer en-de checkpoint

Description
Transformer trained on the wmt14_en_de_joined_dict dataset. Trained in TF32 and saved in FP32.
Publisher
NVIDIA Deep Learning Examples
Latest Version
20.06.0_tf32
Modified
April 4, 2023
Size
2.62 GB

Model Overview

This implementation of the Transformer model architecture is based on the optimized implementation in the Fairseq NLP toolkit.
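
The published checkpoint can be inspected with standard PyTorch tooling. Below is a minimal sketch, assuming the file is a standard PyTorch checkpoint saved with torch.save and that, as in Fairseq-style checkpoints, the weights sit under a "model" key; the file name is illustrative, not the actual artifact name.

```python
import torch

# Load the downloaded checkpoint on CPU (file name is an example).
checkpoint = torch.load("checkpoint_tf32.pt", map_location="cpu")

# Fairseq-style checkpoints usually keep the weights under a "model" key;
# otherwise treat the loaded object itself as the state dict.
state_dict = checkpoint.get("model", checkpoint) if isinstance(checkpoint, dict) else checkpoint

# Parameters were saved in FP32, so tensors should report torch.float32.
for name, tensor in list(state_dict.items())[:5]:
    print(name, tuple(tensor.shape), tensor.dtype)
```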

Model Architecture

The Transformer model uses a standard NMT encoder-decoder architecture. Unlike other NMT models, it uses no recurrent connections and operates on a fixed-size context window. The encoder stack is made up of N identical layers, each composed of the following sublayers:

  1. Self-attention layer
  2. Feedforward network (two fully-connected layers)

Like the encoder stack, the decoder stack is made up of N identical layers, each composed of the following sublayers:

  1. Self-attention layer
  2. Multi-headed attention layer combining encoder outputs with the results of the previous self-attention layer
  3. Feedforward network (two fully-connected layers)
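
For illustration, the sublayer structure of a single encoder layer can be sketched in PyTorch as follows. This is a simplified sketch, not the optimized Fairseq/NVIDIA implementation; the hyperparameter values are examples.

```python
import torch
import torch.nn as nn

class EncoderLayer(nn.Module):
    """Illustrative sketch of one encoder layer: self-attention followed by a
    two-layer feed-forward network, each wrapped in a residual connection and
    layer normalization. Hyperparameters are example values only."""

    def __init__(self, d_model=1024, n_heads=16, d_ff=4096, dropout=0.1):
        super().__init__()
        self.self_attn = nn.MultiheadAttention(d_model, n_heads, dropout=dropout)
        self.ffn = nn.Sequential(
            nn.Linear(d_model, d_ff), nn.ReLU(), nn.Linear(d_ff, d_model)
        )
        self.norm1 = nn.LayerNorm(d_model)
        self.norm2 = nn.LayerNorm(d_model)
        self.dropout = nn.Dropout(dropout)

    def forward(self, x):
        # x: (seq_len, batch, d_model) -- default layout for nn.MultiheadAttention
        attn_out, _ = self.self_attn(x, x, x)
        x = self.norm1(x + self.dropout(attn_out))     # self-attention sublayer
        x = self.norm2(x + self.dropout(self.ffn(x)))  # feed-forward sublayer
        return x

# The encoder stack is N identical copies of this layer, e.g. N = 6:
encoder = nn.Sequential(*[EncoderLayer() for _ in range(6)])
```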

The encoder uses self-attention to compute a representation of the input sequence. The decoder generates the output sequence one token at a time, taking the encoder output and the previously generated tokens as inputs. The model also applies embeddings to the input and output tokens, and adds a constant positional encoding that carries information about the position of each token.
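
The positional encoding mentioned above can be computed with the standard sinusoidal formulation from the original paper. A minimal sketch (the d_model and sequence-length values are examples):

```python
import math
import torch

def positional_encoding(max_len: int, d_model: int) -> torch.Tensor:
    """Sinusoidal positional encoding: returns a (max_len, d_model) tensor
    that is added to the token embeddings."""
    position = torch.arange(max_len, dtype=torch.float32).unsqueeze(1)
    div_term = torch.exp(
        torch.arange(0, d_model, 2, dtype=torch.float32) * (-math.log(10000.0) / d_model)
    )
    pe = torch.zeros(max_len, d_model)
    pe[:, 0::2] = torch.sin(position * div_term)  # even dimensions
    pe[:, 1::2] = torch.cos(position * div_term)  # odd dimensions
    return pe

# Example: add positional information to a sequence of embedded tokens.
embeddings = torch.randn(10, 512)                # (seq_len, d_model)
embeddings = embeddings + positional_encoding(10, 512)
```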


Figure 1. The architecture of a Transformer model.

A complete description of the Transformer architecture can be found in the Attention Is All You Need paper.

Training

This model was trained using the training script available on NGC and in the GitHub repository.

Dataset

The following datasets were used to train this model:

  • WMT14 German-English - Dataset for machine translation.

Performance

Performance numbers for this model are available in NGC.

References

  • Original paper
  • NVIDIA model implementation in NGC
  • NVIDIA model implementation on GitHub

License

This model was trained using open-source software available in the Deep Learning Examples repository. For terms of use, please refer to the license of the script and of the datasets the model was derived from.