Bert Large checkpoint (TensorFlow2, Pretraining AMP, LAMB)

Description

BERT Large TensorFlow2 checkpoint pretrained with AMP (automatic mixed precision) and the LAMB optimizer.

Publisher

NVIDIA Deep Learning Examples

Use Case

Language Modeling

Framework

TensorFlow2

Latest Version

21.02.0

Modified

October 29, 2021

Size

1.25 GB

Model Overview

BERT is a method of pre-training language representations which obtains state-of-the-art results on a wide array of NLP tasks.

Model Architecture

BERT's model architecture is a multi-layer bidirectional transformer encoder. Based on the model size, we have the following two default configurations of BERT:

Model      | Hidden layers | Hidden unit size | Attention heads | Feedforward filter size | Max sequence length | Parameters
BERT-Base  | 12 encoder    | 768              | 12              | 4 x 768                 | 512                 | 110M
BERT-Large | 24 encoder    | 1024             | 16              | 4 x 1024                | 512                 | 330M
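The parameter totals in the table can be roughly reproduced from the other columns. The sketch below is an estimate under stated assumptions, not the checkpoint's actual variable list: it assumes a 30,522-token WordPiece vocabulary and two segment types (neither is given on this page), counts the encoder plus a pooler layer, and ignores the pre-training heads. The function name `bert_encoder_params` is hypothetical.

```python
def bert_encoder_params(num_layers, hidden_size, vocab_size=30522,
                        max_positions=512, type_vocab_size=2):
    """Estimate trainable parameters of a BERT encoder plus pooler.

    vocab_size and type_vocab_size are assumptions; they are not stated
    in the model card. max_positions is the table's max sequence length.
    """
    ffn_size = 4 * hidden_size        # feedforward filter size (4 x hidden)
    layer_norm = 2 * hidden_size      # scale and shift vectors

    # Token, position and segment embeddings, followed by one LayerNorm.
    embeddings = (vocab_size + max_positions + type_vocab_size) * hidden_size
    embeddings += layer_norm

    # Query, key, value and output projections (weights + biases), then LayerNorm.
    attention = 4 * (hidden_size * hidden_size + hidden_size) + layer_norm

    # Two dense layers (hidden -> ffn -> hidden) with biases, then LayerNorm.
    feed_forward = (hidden_size * ffn_size + ffn_size) \
        + (ffn_size * hidden_size + hidden_size) + layer_norm

    # Dense pooler applied to the first token's final hidden state.
    pooler = hidden_size * hidden_size + hidden_size

    return embeddings + num_layers * (attention + feed_forward) + pooler


base_params = bert_encoder_params(num_layers=12, hidden_size=768)
large_params = bert_encoder_params(num_layers=24, hidden_size=1024)
print(f"BERT-Base:  {base_params / 1e6:.1f}M")
print(f"BERT-Large: {large_params / 1e6:.1f}M")
```

Under these assumptions the estimates come out close to the rounded figures in the table. The number of attention heads does not appear in the formula because the heads split the hidden dimension rather than adding to it.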

BERT training consists of two steps: pre-training the language model in an unsupervised fashion on large amounts of unannotated text, and then fine-tuning the pre-trained model for specific NLP tasks such as question answering, sentence classification, or sentiment analysis. Fine-tuning typically adds an extra layer or two for the specific task and further trains the model on a task-specific annotated dataset, starting from the pre-trained backbone weights. The end-to-end process is depicted in the following image:

Figure 1: BERT Pipeline
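To make the "extra layer or two" concrete, here is a minimal sketch of a classification head applied to one pooled encoder output. It uses only the Python standard library and does not load the checkpoint: the pooled vector is random stand-in data, the two-class task is hypothetical, and the names `init_head` and `classify` are illustrative rather than part of any NVIDIA script.

```python
import math
import random

HIDDEN_SIZE = 1024  # BERT-Large hidden unit size, from the table above
NUM_CLASSES = 2     # hypothetical binary task, e.g. sentiment analysis


def init_head(hidden_size, num_classes, seed=0):
    """Create a randomly initialised linear classification head."""
    rng = random.Random(seed)
    weights = [[rng.gauss(0.0, 0.02) for _ in range(hidden_size)]
               for _ in range(num_classes)]
    bias = [0.0] * num_classes
    return weights, bias


def classify(pooled, weights, bias):
    """Apply the linear head and a softmax to one pooled encoder output."""
    logits = [sum(w * x for w, x in zip(row, pooled)) + b
              for row, b in zip(weights, bias)]
    peak = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(z - peak) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


# Stand-in for the encoder's pooled output. A real pipeline would obtain this
# vector from the pre-trained backbone restored from the checkpoint.
rng = random.Random(1)
pooled_output = [rng.uniform(-1.0, 1.0) for _ in range(HIDDEN_SIZE)]

head_weights, head_bias = init_head(HIDDEN_SIZE, NUM_CLASSES)
probabilities = classify(pooled_output, head_weights, head_bias)
head_param_count = HIDDEN_SIZE * NUM_CLASSES + NUM_CLASSES
print(probabilities, head_param_count)
```

The head here adds only a few thousand parameters on top of the backbone, which is why fine-tuning can start from the pre-trained weights and needs a comparatively small annotated dataset.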

Training

This model was trained using the script available on NGC and in the GitHub repository.

Dataset

The following datasets were used to train this model:

  • Wikipedia - Dataset containing a 170GB+ Wikipedia dump.
  • BookCorpus - Large-scale text corpus for unsupervised learning of sentence encoders/decoders.

Performance

Performance numbers for this model are available on NGC.

License

This model was trained using open-source software available in the Deep Learning Examples repository. For terms of use, please refer to the licenses of the script and of the datasets the model was derived from.