BERT TF checkpoint (Large, pretraining, AMP, LAMB)


Description: BERT Large TensorFlow checkpoint pretrained with automatic mixed precision (AMP) and the LAMB optimizer

Publisher: NVIDIA Deep Learning Examples

Latest Version: 19.03.1_amp_optim-lamb

Modified: April 4, 2023

Size: 5.03 GB

Model Overview

BERT is a method of pre-training language representations which obtains state-of-the-art results on a wide array of NLP tasks.

Model Architecture

BERT's model architecture is a multi-layer bidirectional Transformer encoder. Based on the model size, we have the following two default configurations of BERT:

| Model      | Hidden layers | Hidden unit size | Attention heads | Feed-forward filter size | Max sequence length | Parameters |
|------------|---------------|------------------|-----------------|--------------------------|---------------------|------------|
| BERT-Base  | 12 encoder    | 768              | 12              | 4 x 768                  | 512                 | 110M       |
| BERT-Large | 24 encoder    | 1024             | 16              | 4 x 1024                 | 512                 | 330M       |
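
As a rough illustration, the two default configurations above can be written out as plain Python dictionaries. The key names follow the common bert_config.json convention, but this is only a sketch; the exact configuration file shipped with this checkpoint may differ.

```python
# Sketch only: the two default BERT configurations from the table above.
BERT_BASE = {
    "num_hidden_layers": 12,       # Transformer encoder blocks
    "hidden_size": 768,            # hidden unit size
    "num_attention_heads": 12,
    "intermediate_size": 4 * 768,  # feed-forward filter size
    "max_position_embeddings": 512,
}

BERT_LARGE = {
    "num_hidden_layers": 24,
    "hidden_size": 1024,
    "num_attention_heads": 16,
    "intermediate_size": 4 * 1024,
    "max_position_embeddings": 512,
}
```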

BERT training consists of two steps: pre-training the language model in an unsupervised fashion on vast amounts of unannotated data, and then fine-tuning this pre-trained model for various NLP tasks, such as question answering, sentence classification, or sentiment analysis. Fine-tuning typically adds an extra layer or two for the specific task and further trains the model on a task-specific annotated dataset, starting from the pre-trained backbone weights. The end-to-end process is depicted in the following image:

Figure 1: BERT Pipeline
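
As a minimal sketch of the fine-tuning step described above, a small task-specific head can be stacked on top of the pre-trained encoder. The `bert_encoder` callable and all names below are placeholders for illustration, not the API of the scripts shipped with this checkpoint.

```python
import tensorflow as tf

def build_classifier(bert_encoder, num_classes, max_seq_len=512):
    """Attach a task-specific classification head to a pre-trained BERT backbone.

    `bert_encoder` is assumed to map token ids of shape [batch, max_seq_len]
    to a pooled representation of shape [batch, hidden_size].
    """
    input_ids = tf.keras.Input(shape=(max_seq_len,), dtype=tf.int32, name="input_ids")
    pooled = bert_encoder(input_ids)                                        # pre-trained backbone
    logits = tf.keras.layers.Dense(num_classes, name="task_head")(pooled)   # extra task layer
    return tf.keras.Model(inputs=input_ids, outputs=logits)
```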

Training

This model was trained using the scripts available on NGC and in the GitHub repository.
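
For illustration only, the sketch below shows one way to combine a LAMB optimizer with automatic mixed precision in TensorFlow 2.x, using the LAMB implementation from TensorFlow Addons. The training scripts referenced above ship their own AMP and LAMB setup, which may differ from this.

```python
# Sketch only: LAMB + automatic mixed precision (AMP) in TensorFlow 2.x,
# using the TensorFlow Addons LAMB implementation.
import tensorflow as tf
import tensorflow_addons as tfa

# Run most ops in float16 while keeping variables in float32.
tf.keras.mixed_precision.set_global_policy("mixed_float16")

optimizer = tfa.optimizers.LAMB(learning_rate=1e-4)
# Loss scaling keeps small float16 gradients from underflowing.
optimizer = tf.keras.mixed_precision.LossScaleOptimizer(optimizer)
```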

Dataset

The following datasets were used to train this model:

  • Wikipedia - a 170GB+ dump of Wikipedia articles.
  • BookCorpus - a large-scale text corpus for unsupervised learning of sentence encoders/decoders.

Performance

Performance numbers for this model are available in NGC.

License

This model was trained using the open-source software available in the Deep Learning Examples repository. For terms of use, please refer to the licenses of the scripts and of the datasets the model was derived from.