BERT TF checkpoint (Large, pretraining, AMP, LAMB)

NVIDIA Deep Learning Examples

BERT Large TensorFlow checkpoint pretrained using AMP (automatic mixed precision) and the LAMB optimizer.

Model Overview

BERT is a method of pre-training language representations which obtains state-of-the-art results on a wide array of NLP tasks.

Model Architecture

BERT's model architecture is a multi-layer bidirectional Transformer encoder. Based on the model size, we have the following two default configurations of BERT:

Model	Hidden layers	Hidden unit size	Attention heads	Feedforward filter size	Max sequence length	Parameters
BERT-Base	12 encoder	768	12	4 x 768	512	110M
BERT-Large	24 encoder	1024	16	4 x 1024	512	330M
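
The parameter column can be roughly reproduced from the other columns. The following is a minimal sketch, not the counting method used by the authors: the function name `bert_param_count`, the vocabulary size of 30522, and the two token-type embeddings are assumptions that do not appear in the table, and the pre-training output heads are left out. The number of attention heads does not change the count, because the heads only partition the hidden dimension.

```python
def bert_param_count(num_layers, hidden, max_seq_len=512,
                     vocab_size=30522, type_vocab_size=2):
    """Estimate the trainable parameters of a BERT encoder (no task head).

    vocab_size and type_vocab_size are assumed values, not taken from the table.
    """
    ffn = 4 * hidden          # feed-forward filter size, as listed in the table
    layer_norm = 2 * hidden   # one scale and one bias vector per LayerNorm

    # Word, position and token-type embeddings, followed by a LayerNorm.
    embeddings = (vocab_size + max_seq_len + type_vocab_size) * hidden + layer_norm

    # Query, key, value and output projections (weights + biases), then LayerNorm.
    attention = 4 * (hidden * hidden + hidden) + layer_norm

    # Two dense layers (hidden -> ffn -> hidden) with biases, then LayerNorm.
    feed_forward = (hidden * ffn + ffn) + (ffn * hidden + hidden) + layer_norm

    # Dense pooling layer applied to the first token's representation.
    pooler = hidden * hidden + hidden

    return embeddings + num_layers * (attention + feed_forward) + pooler


base = bert_param_count(num_layers=12, hidden=768)
large = bert_param_count(num_layers=24, hidden=1024)
print(f"BERT-Base:  {base / 1e6:.1f}M parameters")
print(f"BERT-Large: {large / 1e6:.1f}M parameters")
```

Under these assumptions the estimates land close to the rounded figures in the table; the remaining difference depends on the vocabulary size and on which heads are counted.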

BERT training consists of two steps: pre-training the language model in an unsupervised fashion on large amounts of unannotated text, and then fine-tuning this pre-trained model for various NLP tasks, such as question answering, sentence classification, or sentiment analysis. Fine-tuning typically adds an extra layer or two for the specific task and further trains the model using a task-specific annotated dataset, starting from the pre-trained backbone weights. The end-to-end process is depicted in the following image:

Figure 1: BERT Pipeline
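
To make the "extra layer or two" concrete, here is a minimal sketch of a classification head of the kind fine-tuning might add on top of the encoder. It is illustrative only and is not the code from the training scripts: `classification_head`, the two-class task, and the random stand-in for the encoder output are all hypothetical, and no actual training step is shown.

```python
import math
import random


def classification_head(pooled, weights, bias):
    """Map one pooled encoder vector to class probabilities (dense layer + softmax)."""
    logits = [sum(w * x for w, x in zip(row, pooled)) + b
              for row, b in zip(weights, bias)]
    peak = max(logits)                      # subtract the max for numerical stability
    exps = [math.exp(z - peak) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


random.seed(0)
hidden_size = 1024   # hidden unit size of BERT-Large
num_classes = 2      # hypothetical task, e.g. binary sentence classification

# Stand-in for the pre-trained encoder's pooled output for one input sequence.
pooled = [random.gauss(0.0, 1.0) for _ in range(hidden_size)]

# Newly initialized task-specific parameters; fine-tuning would train these
# together with the pre-trained backbone weights.
weights = [[random.gauss(0.0, 0.02) for _ in range(hidden_size)]
           for _ in range(num_classes)]
bias = [0.0] * num_classes

probs = classification_head(pooled, weights, bias)
print(probs)
```

The head adds only `hidden_size * num_classes + num_classes` new parameters, which is why most of the fine-tuned model's capacity comes from the pre-trained backbone.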

Training

This model was trained using the script available on NGC and in the GitHub repository.

Dataset

The following datasets were used to train this model:

Wikipedia - Dataset containing a 170GB+ Wikipedia dump.
Bookcorpus - Large-scale text corpus for unsupervised learning of sentence encoders/decoders.

Performance

Performance numbers for this model are available in NGC.

License

This model was trained using open-source software available in the Deep Learning Examples repository. For terms of use, please refer to the licenses of the script and of the datasets the model was derived from.

Publisher

NVIDIA Deep Learning Examples

Latest Version: 19.03.1_amp_optim-lamb

Updated: April 4, 2023 UTC

Compressed Size: 5.03 GB

Labels

NLP NLU