BERT TF checkpoint (Large, QA, SQuAD 1.1, AMP, SeqLen=384)


BERT-Large TensorFlow checkpoint fine-tuned for question answering (QA) on SQuAD 1.1 using automatic mixed precision (AMP)
NVIDIA Deep Learning Examples
Latest version: April 4, 2023
Size: 3.75 GB

Model Overview

BERT is a method of pre-training language representations that obtains state-of-the-art results on a wide array of natural language processing (NLP) tasks.

Model Architecture

BERT's model architecture is a multi-layer bidirectional Transformer encoder. BERT has two default configurations, which differ in model size:

Model       Hidden layers  Hidden unit size  Attention heads  Feedforward filter size  Max sequence length  Parameters
BERT-Base   12 encoder     768               12               4 x 768                  512                  110M
BERT-Large  24 encoder     1024              16               4 x 1024                 512                  330M
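To show where the parameter counts in the table come from, here is a minimal sketch that estimates them from the configuration columns. It assumes values the card does not state: the standard 30,522-token WordPiece vocabulary, 2 segment (token type) embeddings, biases on every dense layer, two LayerNorms per encoder layer, and a pooler layer on top. The class and function names are illustrative only.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BertConfig:
    hidden_layers: int
    hidden_size: int
    attention_heads: int  # splits the hidden size; does not change the count
    # Assumptions (not stated in this card): standard BERT vocabulary and
    # segment vocabulary; 512 positions matches "Max sequence length".
    vocab_size: int = 30522
    max_position: int = 512
    type_vocab_size: int = 2

    @property
    def ffn_size(self) -> int:
        # "Feedforward filter size" column: 4 x hidden unit size.
        return 4 * self.hidden_size


def count_parameters(cfg: BertConfig) -> int:
    h, f = cfg.hidden_size, cfg.ffn_size
    # Token, position, and segment embeddings plus one LayerNorm (gamma, beta).
    embeddings = (cfg.vocab_size + cfg.max_position + cfg.type_vocab_size) * h + 2 * h
    # Q, K, V, and output projections: four h x h matrices with biases.
    attention = 4 * (h * h + h)
    # Feedforward block h -> f -> h, with biases.
    feedforward = (h * f + f) + (f * h + h)
    # Two LayerNorms per encoder layer.
    layer_norms = 2 * (2 * h)
    per_layer = attention + feedforward + layer_norms
    # Pooler: one dense h x h layer applied to the [CLS] token.
    pooler = h * h + h
    return embeddings + cfg.hidden_layers * per_layer + pooler


BASE = BertConfig(hidden_layers=12, hidden_size=768, attention_heads=12)
LARGE = BertConfig(hidden_layers=24, hidden_size=1024, attention_heads=16)

for name, cfg in [("BERT-Base", BASE), ("BERT-Large", LARGE)]:
    print(f"{name}: {count_parameters(cfg) / 1e6:.1f}M")
```

Under these assumptions the sketch prints about 109.5M for BERT-Base and 335.1M for BERT-Large; the table's 110M and 330M are rounded figures. Each encoder layer contributes roughly 12·H² weights, so the 24 layers of width 1024 dominate the size of this checkpoint.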

BERT training consists of two steps: pre-training the language model in an unsupervised fashion on vast amounts of unannotated data, and then fine-tuning this pre-trained model for various NLP tasks, such as question answering, sentence classification, or sentiment analysis. Fine-tuning typically adds one or two extra layers for the specific task and further trains the model on a task-specific annotated dataset, starting from the pre-trained backbone weights. The end-to-end process is depicted in the following image:

Figure 1: BERT Pipeline
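For SQuAD-style question answering, the extra task layer is commonly a single linear projection that turns each token's final hidden vector into a "start" and an "end" logit; the predicted answer is the span with the highest combined score. The following is a minimal pure-Python sketch of that head and of span selection, assuming this common setup. The function names and the 30-token `max_answer_len` default are hypothetical, and the sketch omits details a real pipeline needs, such as excluding question and special tokens from the candidate spans.

```python
from typing import List, Sequence, Tuple


def qa_head(hidden: Sequence[Sequence[float]],
            weights: Sequence[Sequence[float]],
            bias: Sequence[float]) -> Tuple[List[float], List[float]]:
    """Hypothetical task layer: project each token's hidden vector (length H)
    to a start logit and an end logit. `weights` is 2 x H, `bias` has length 2."""
    start_logits, end_logits = [], []
    for vec in hidden:
        start_logits.append(sum(w * x for w, x in zip(weights[0], vec)) + bias[0])
        end_logits.append(sum(w * x for w, x in zip(weights[1], vec)) + bias[1])
    return start_logits, end_logits


def best_span(start_logits: Sequence[float],
              end_logits: Sequence[float],
              max_answer_len: int = 30) -> Tuple[int, int]:
    """Return token indices (i, j) maximizing start_logits[i] + end_logits[j],
    subject to i <= j < i + max_answer_len."""
    best, best_score = (0, 0), float("-inf")
    for i, s in enumerate(start_logits):
        for j in range(i, min(i + max_answer_len, len(end_logits))):
            score = s + end_logits[j]
            if score > best_score:
                best, best_score = (i, j), score
    return best


# Toy example: 4 tokens with hidden size 2 (the real model uses H = 1024
# over a window of up to 384 tokens).
hidden = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.0, 0.0]]
weights = [[2.0, -1.0], [-1.0, 2.0]]
bias = [0.0, 0.0]

start, end = qa_head(hidden, weights, bias)
print(best_span(start, end))
```

In the toy example, token 0 has the highest start logit and token 1 the highest end logit, so the selected span is (0, 1). Searching only spans with i <= j keeps the prediction a contiguous segment of the passage.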


This model was trained using the script available on NGC and in the GitHub repository.


The following datasets were used to train this model:

  • SQuAD 1.1 + 2.0 - Reading comprehension dataset consisting of questions posed by crowdworkers on a set of Wikipedia articles, where the answer to every question is a segment of text (a span) from the corresponding reading passage; in SQuAD 2.0, a question may also be unanswerable.
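The "answer is a span of the passage" property can be made concrete with the SQuAD JSON layout (data → paragraphs → context/qas → answers with `text` and a character offset `answer_start`; SQuAD 2.0 adds an `is_impossible` flag). The sketch below walks a small, made-up record in that layout and recovers each answer's character span, assuming well-formed input. The record contents and the `iter_answer_spans` helper are hypothetical illustrations, not part of the training scripts.

```python
import json

# A made-up record in the SQuAD JSON layout; q2 is SQuAD 2.0-style unanswerable.
SAMPLE = json.loads("""
{"data": [{"title": "Example",
  "paragraphs": [{"context": "BERT was released by Google in 2018.",
    "qas": [
      {"id": "q1", "question": "Who released BERT?",
       "answers": [{"text": "Google", "answer_start": 21}]},
      {"id": "q2", "question": "When was BERT retired?",
       "answers": [], "is_impossible": true}
    ]}]}]}
""")


def iter_answer_spans(squad):
    """Yield (question_id, (start, end)) character spans, or None when unanswerable."""
    for article in squad["data"]:
        for para in article["paragraphs"]:
            context = para["context"]
            for qa in para["qas"]:
                # SQuAD 2.0 marks questions without an answer in the passage.
                if qa.get("is_impossible", False):
                    yield qa["id"], None
                    continue
                for ans in qa["answers"]:
                    start = ans["answer_start"]
                    end = start + len(ans["text"])
                    # Every answer must appear verbatim at the stated offset.
                    if context[start:end] != ans["text"]:
                        raise ValueError(f"answer for {qa['id']} is not a span of its context")
                    yield qa["id"], (start, end)


print(list(iter_answer_spans(SAMPLE)))
```

For this record the sketch yields a character span for `q1` ("Google" at offsets 21 to 27) and `None` for the unanswerable `q2`. During fine-tuning, such character spans are mapped to token positions, which become the start and end targets of the QA head.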


Performance numbers for this model are available on NGC.



This model was trained using open-source software available in the NVIDIA Deep Learning Examples repository. For terms of use, please refer to the licenses of the script and of the datasets the model was derived from.