
Bert Large checkpoint (TensorFlow2, Pretraining AMP, LAMB)


Description

BERT Large TensorFlow2 checkpoint pre-trained using AMP and the LAMB optimizer

Publisher

NVIDIA Deep Learning Examples

Latest Version

21.02.0_amp_optim-lamb

Modified

April 4, 2023

Size

1.25 GB

Model Overview

BERT (Bidirectional Encoder Representations from Transformers) is a method of pre-training language representations that obtains state-of-the-art results on a wide array of NLP tasks.

Model Architecture

BERT's model architecture is a multi-layer bidirectional Transformer encoder. Based on model size, there are two default configurations of BERT:

| Model | Hidden layers | Hidden unit size | Attention heads | Feed-forward filter size | Max sequence length | Parameters |
|-------|---------------|------------------|-----------------|--------------------------|---------------------|------------|
| BERT-Base  | 12 encoder | 768  | 12 | 4 x 768  | 512 | 110M |
| BERT-Large | 24 encoder | 1024 | 16 | 4 x 1024 | 512 | 330M |
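
As a quick illustration, the BERT-Large row of the table can be expressed as a model configuration. The sketch below uses the Hugging Face BertConfig class purely for illustration; it is not the configuration format used by the NVIDIA pre-training scripts.

```python
from transformers import BertConfig

# BERT-Large hyperparameters from the table above, mapped onto a Hugging Face
# BertConfig. Illustrative only; the NVIDIA scripts use their own config file.
bert_large = BertConfig(
    num_hidden_layers=24,          # hidden (encoder) layers
    hidden_size=1024,              # hidden unit size
    num_attention_heads=16,        # attention heads
    intermediate_size=4 * 1024,    # feed-forward filter size
    max_position_embeddings=512,   # max sequence length
)
print(bert_large)
```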

BERT training consists of two steps: pre-training the language model in an unsupervised fashion on vast amounts of unannotated text, and then fine-tuning the pre-trained model for various NLP tasks, such as question answering, sentence classification, or sentiment analysis. Fine-tuning typically adds an extra layer or two for the specific task and further trains the model on a task-specific annotated dataset, starting from the pre-trained backbone weights. The end-to-end process is depicted in the following image:

Figure 1: BERT Pipeline
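
For a concrete sense of the fine-tuning step described above, the sketch below adds a single classification layer on top of a pre-trained BERT backbone. It assumes the Hugging Face TF2 implementation and a made-up two-class task; the NVIDIA Deep Learning Examples scripts implement fine-tuning differently.

```python
import tensorflow as tf
from transformers import TFBertModel

# Hypothetical fine-tuning setup: a pre-trained backbone plus one extra
# task-specific layer (here, a 2-class sentence classifier).
backbone = TFBertModel.from_pretrained("bert-large-uncased")

seq_len = 128
input_ids = tf.keras.Input(shape=(seq_len,), dtype=tf.int32, name="input_ids")
attention_mask = tf.keras.Input(shape=(seq_len,), dtype=tf.int32, name="attention_mask")

pooled = backbone(input_ids, attention_mask=attention_mask).pooler_output
logits = tf.keras.layers.Dense(2, name="classifier")(pooled)

model = tf.keras.Model(inputs=[input_ids, attention_mask], outputs=logits)
model.compile(
    optimizer=tf.keras.optimizers.Adam(learning_rate=2e-5),
    loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True),
    metrics=["accuracy"],
)
# model.fit(...) would then run on a task-specific annotated dataset.
```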

Training

This model was trained using the scripts available on NGC and in the GitHub repository.
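
The checkpoint name refers to automatic mixed precision (AMP) and the LAMB optimizer. The following is a minimal sketch of how those two pieces can be set up in TensorFlow 2, assuming the TensorFlow Addons LAMB implementation; it is illustrative only and not the NVIDIA pre-training script.

```python
import tensorflow as tf
import tensorflow_addons as tfa  # assumption: LAMB taken from TensorFlow Addons

# Enable AMP-style mixed precision: compute in float16, keep variables in float32.
tf.keras.mixed_precision.set_global_policy("mixed_float16")

# LAMB optimizer; the learning rate here is a placeholder, not the warmup/decay
# schedule used in the actual pre-training recipe.
optimizer = tfa.optimizers.LAMB(learning_rate=1e-3, weight_decay_rate=0.01)

# Loss scaling prevents small float16 gradients from underflowing.
optimizer = tf.keras.mixed_precision.LossScaleOptimizer(optimizer)
```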

Dataset

The following datasets were used to train this model:

  • Wikipedia - a dataset containing a 170GB+ Wikipedia dump.
  • BookCorpus - a large-scale text corpus for unsupervised learning of sentence encoders/decoders.

Performance

Performance numbers for this model are available on NGC.

License

This model was trained using open-source software available in the Deep Learning Examples repository. For terms of use, please refer to the license of the scripts and of the datasets the model was derived from.