Bert Large checkpoint (TensorFlow2, AMP, Squad1.1, seqLen384)
Description: BERT-Large TensorFlow 2 checkpoint fine-tuned on SQuAD 1.1 with sequence length 384
Publisher: NVIDIA Deep Learning Examples
Latest Version: 21.02.0
Modified: April 4, 2023
Size: 3.74 GB

Model Overview

BERT is a method of pre-training language representations which obtains state-of-the-art results on a wide array of NLP tasks.

Model Architecture

BERT's model architecture is a multi-layer bidirectional transformer encoder. Based on the model size, we have the following two default configurations of BERT:

| Model      | Hidden layers | Hidden unit size | Attention heads | Feed-forward filter size | Max sequence length | Parameters |
|------------|---------------|------------------|-----------------|--------------------------|---------------------|------------|
| BERT-Base  | 12 encoder    | 768              | 12              | 4 x 768                  | 512                 | 110M       |
| BERT-Large | 24 encoder    | 1024             | 16              | 4 x 1024                 | 512                 | 330M       |
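
Written out as plain Python dicts, the two default configurations look like the following (a sketch only; the repository's own configuration objects may use different key names):

```python
# The two default BERT configurations from the table above, as plain dicts.
# Key names follow common BERT config conventions and are illustrative.
bert_configs = {
    "BERT-Base": {
        "num_hidden_layers": 12,        # encoder layers
        "hidden_size": 768,
        "num_attention_heads": 12,
        "intermediate_size": 4 * 768,   # feed-forward filter size
        "max_position_embeddings": 512,
    },
    "BERT-Large": {
        "num_hidden_layers": 24,
        "hidden_size": 1024,
        "num_attention_heads": 16,
        "intermediate_size": 4 * 1024,
        "max_position_embeddings": 512,
    },
}
```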

BERT training consists of two steps: pre-training the language model in an unsupervised fashion on vast amounts of unannotated data, and then fine-tuning this pre-trained model for specific NLP tasks such as question answering, sentence classification, or sentiment analysis. Fine-tuning typically adds an extra layer or two for the specific task (a minimal sketch of such a head is shown after Figure 1) and further trains the model on a task-specific annotated dataset, starting from the pre-trained backbone weights. The end-to-end process is depicted in the following image:

Figure 1: BERT Pipeline
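
As an illustration of the fine-tuning step, the sketch below adds a SQuAD-style span-prediction head on top of encoder outputs. This is not the repository's actual modeling code: the encoder is stubbed with random activations standing in for the pre-trained BERT-Large body, and shapes follow this checkpoint (sequence length 384, hidden size 1024).

```python
# A minimal sketch of the span-prediction head that SQuAD fine-tuning adds
# on top of the encoder. The encoder itself is stubbed with random
# activations; in the real pipeline these would be the final hidden states
# of the pre-trained BERT-Large body restored from this checkpoint.
import tensorflow as tf

batch_size, seq_len, hidden = 2, 384, 1024   # seqLen=384, BERT-Large hidden size

# Stand-in for the encoder's final hidden states: [batch, seq_len, hidden].
sequence_output = tf.random.normal([batch_size, seq_len, hidden])

# The QA head is a single dense layer emitting two logits per token:
# "answer span starts here" and "answer span ends here".
qa_head = tf.keras.layers.Dense(2, name="squad_span_head")
logits = qa_head(sequence_output)                # [batch, seq_len, 2]
start_logits, end_logits = tf.unstack(logits, axis=-1)

# The predicted answer span is the argmax over token positions.
start = tf.argmax(start_logits, axis=-1)         # [batch]
end = tf.argmax(end_logits, axis=-1)             # [batch]
print(start.numpy(), end.numpy())
```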

Training

This model was trained using the scripts available on NGC and in the GitHub repository.
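
A hedged sketch of restoring a TF2 checkpoint such as this one follows. The extraction directory and the model object are placeholders: the real BERT model is built by the modeling code in the Deep Learning Examples repository, and its variable structure must match the checkpoint's.

```python
# Restoring a TF2 checkpoint with tf.train.Checkpoint (a sketch; paths and
# the model object are hypothetical placeholders).
import tensorflow as tf

ckpt_dir = "./bert_large_squad11_seq384"     # hypothetical extraction path
model = tf.keras.Model()                     # placeholder for the repo's BERT model

latest = tf.train.latest_checkpoint(ckpt_dir)
if latest is not None:
    # expect_partial() silences warnings about variables (e.g. optimizer
    # slots) present in the checkpoint but unused at inference time.
    tf.train.Checkpoint(model=model).restore(latest).expect_partial()
```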

Dataset

The following datasets were used to train this model:

  • SQuAD 1.1 + 2.0 - A reading-comprehension dataset consisting of questions posed by crowdworkers on a set of Wikipedia articles. The answer to every question is a segment of text (a span) from the corresponding reading passage; in v2.0, a question may also be unanswerable.
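
For reference, a minimal sketch of walking the SQuAD JSON layout described above (the filename is a hypothetical local copy of the official dev set):

```python
# Each SQuAD paragraph carries a context passage and a list of questions;
# in v1.1 every answer is an extractive span, while v2.0 adds an
# "is_impossible" flag for unanswerable questions.
import json

with open("dev-v1.1.json") as f:
    squad = json.load(f)

for article in squad["data"]:
    for paragraph in article["paragraphs"]:
        context = paragraph["context"]
        for qa in paragraph["qas"]:
            if qa.get("is_impossible", False):   # v2.0 only
                continue
            answer = qa["answers"][0]
            start = answer["answer_start"]
            # The answer text is, by construction, a span of the context.
            assert context[start:start + len(answer["text"])] == answer["text"]
```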

Performance

Performance numbers for this model are available on NGC.

References

  • Original paper
  • NVIDIA model implementation in NGC
  • NVIDIA model implementation on GitHub

License

This model was trained using open-source software available in the Deep Learning Examples repository. For terms of use, please refer to the license of the training script and of the datasets from which the model was derived.