BERT TF checkpoint (Large, pretraining, AMP, LAMB)


Description: BERT Large TensorFlow checkpoint pretrained with automatic mixed precision (AMP) and the LAMB optimizer

Publisher: NVIDIA Deep Learning Examples

Latest Version: 19.03.1_amp_optim-lamb

Modified: April 4, 2023

Size: 5.03 GB

Model Overview

BERT is a method of pre-training language representations which obtains state-of-the-art results on a wide array of NLP tasks.

Model Architecture

BERT's model architecture is a multi-layer bidirectional Transformer encoder. Based on the model size, we have the following two default configurations of BERT:

| Model      | Hidden layers | Hidden unit size | Attention heads | Feed-forward filter size | Max sequence length | Parameters |
|------------|---------------|------------------|-----------------|--------------------------|---------------------|------------|
| BERT-Base  | 12 encoder    | 768              | 12              | 4 x 768                  | 512                 | 110M       |
| BERT-Large | 24 encoder    | 1024             | 16              | 4 x 1024                 | 512                 | 330M       |
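
As a rough illustration, the two default configurations above can be written out as plain Python dictionaries. The key names follow the common bert_config.json convention, but this is only a sketch; the exact configuration file shipped with this checkpoint may differ.

```python
# Sketch only: the two default BERT configurations from the table above.
BERT_BASE = {
    "num_hidden_layers": 12,       # Transformer encoder blocks
    "hidden_size": 768,            # hidden unit size
    "num_attention_heads": 12,
    "intermediate_size": 4 * 768,  # feed-forward filter size
    "max_position_embeddings": 512,
}

BERT_LARGE = {
    "num_hidden_layers": 24,
    "hidden_size": 1024,
    "num_attention_heads": 16,
    "intermediate_size": 4 * 1024,
    "max_position_embeddings": 512,
}
```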

BERT training consists of two steps: pre-training the language model in an unsupervised fashion on vast amounts of unannotated data, and then fine-tuning this pre-trained model for various NLP tasks, such as question answering, sentence classification, or sentiment analysis. Fine-tuning typically adds an extra layer or two for the specific task and further trains the model on a task-specific annotated dataset, starting from the pre-trained backbone weights. The end-to-end process is depicted in the following image:

Figure 1: BERT Pipeline
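
As a minimal sketch of the fine-tuning step described above, a small task-specific head can be stacked on top of the pre-trained encoder. The `bert_encoder` callable and all names below are placeholders for illustration, not the API of the scripts shipped with this checkpoint.

```python
import tensorflow as tf

def build_classifier(bert_encoder, num_classes, max_seq_len=512):
    """Attach a task-specific classification head to a pre-trained BERT backbone.

    `bert_encoder` is assumed to map token ids of shape [batch, max_seq_len]
    to a pooled representation of shape [batch, hidden_size].
    """
    input_ids = tf.keras.Input(shape=(max_seq_len,), dtype=tf.int32, name="input_ids")
    pooled = bert_encoder(input_ids)                                        # pre-trained backbone
    logits = tf.keras.layers.Dense(num_classes, name="task_head")(pooled)   # extra task layer
    return tf.keras.Model(inputs=input_ids, outputs=logits)
```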

Training

This model was trained using the scripts available on NGC and in the GitHub repository.
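
For illustration only, the sketch below shows one way to combine a LAMB optimizer with automatic mixed precision in TensorFlow 2.x, using the LAMB implementation from TensorFlow Addons. The training scripts referenced above ship their own AMP and LAMB setup, which may differ from this.

```python
# Sketch only: LAMB + automatic mixed precision (AMP) in TensorFlow 2.x,
# using the TensorFlow Addons LAMB implementation.
import tensorflow as tf
import tensorflow_addons as tfa

# Run most ops in float16 while keeping variables in float32.
tf.keras.mixed_precision.set_global_policy("mixed_float16")

optimizer = tfa.optimizers.LAMB(learning_rate=1e-4)
# Loss scaling keeps small float16 gradients from underflowing.
optimizer = tf.keras.mixed_precision.LossScaleOptimizer(optimizer)
```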

Dataset

The following datasets were used to train this model:

  • Wikipedia - a 170GB+ dump of Wikipedia articles.
  • BookCorpus - a large-scale text corpus for unsupervised learning of sentence encoders/decoders.

Performance

Performance numbers for this model are available in NGC.

License

This model was trained using the open-source software available in the Deep Learning Examples repository. For terms of use, please refer to the licenses of the scripts and of the datasets the model was derived from.