
BERT PyTorch checkpoint (Dist-6L-768D, SQuAD1, AMP)


Description

PyTorch checkpoint of a distilled BERT model (6 layers, 768 hidden dimensions), distilled on the SQuAD v1.1 dataset using automatic mixed precision (AMP)

Publisher

NVIDIA Deep Learning Examples

Use Case

NLP

Framework

PyTorch

Latest Version

20.12.0

Modified

March 8, 2022

Size

253.45 MB

Model Overview

BERT is a method of pre-training language representations which obtains state-of-the-art results on a wide array of NLP tasks.

Model Architecture

The BERT model uses the same architecture as the encoder of the Transformer. Input sequences are projected into an embedding space before being fed into the encoder. Positional and segment encodings are added to the embeddings to preserve positional and sentence-membership information. The encoder itself is a stack of Transformer blocks, each consisting of a multi-head self-attention layer followed by a feed-forward network, with residual connections and layer normalization around both. The multi-head attention layer performs self-attention over multiple input representations in parallel.
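The structure described above can be sketched in PyTorch. This is a minimal illustration, not the checkpoint's actual implementation: the layer names and the use of `torch.nn.MultiheadAttention` are choices made here for clarity, and the default sizes match a 768-dimension BERT configuration.

```python
import torch
import torch.nn as nn

class BertEmbeddings(nn.Module):
    """Token + positional + segment embeddings, summed and layer-normalized."""
    def __init__(self, vocab=30522, hidden=768, max_len=512, segments=2):
        super().__init__()
        self.tok = nn.Embedding(vocab, hidden)
        self.pos = nn.Embedding(max_len, hidden)
        self.seg = nn.Embedding(segments, hidden)
        self.norm = nn.LayerNorm(hidden)

    def forward(self, ids, seg_ids):
        # One positional index per token, broadcast over the batch.
        pos_ids = torch.arange(ids.size(1), device=ids.device).unsqueeze(0)
        return self.norm(self.tok(ids) + self.pos(pos_ids) + self.seg(seg_ids))

class EncoderBlock(nn.Module):
    """One Transformer encoder block: multi-head self-attention, then a
    feed-forward network, each wrapped in a residual connection and layer norm."""
    def __init__(self, hidden=768, heads=12, ffn=3072, dropout=0.1):
        super().__init__()
        self.attn = nn.MultiheadAttention(hidden, heads, dropout=dropout,
                                          batch_first=True)
        self.norm1 = nn.LayerNorm(hidden)
        self.ffn = nn.Sequential(nn.Linear(hidden, ffn), nn.GELU(),
                                 nn.Linear(ffn, hidden))
        self.norm2 = nn.LayerNorm(hidden)
        self.drop = nn.Dropout(dropout)

    def forward(self, x, pad_mask=None):
        # Self-attention: query, key, and value are all the same sequence.
        a, _ = self.attn(x, x, x, key_padding_mask=pad_mask)
        x = self.norm1(x + self.drop(a))
        return self.norm2(x + self.drop(self.ffn(x)))
```

A full encoder stacks several such blocks (6 for this distilled variant) on top of the embedding layer.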

An illustration of the architecture taken from the Transformer paper is shown below.

[Figure: Transformer model architecture, from the Transformer paper]

Training

This model was trained using the scripts available on NGC and in the NVIDIA Deep Learning Examples GitHub repository.

Dataset

The following datasets were used to train this model:

  • Wikipedia - Dataset containing a 170GB+ Wikipedia dump.
  • Bookcorpus - Large-scale text corpus for unsupervised learning of sentence encoders/decoders.
  • SQuAD 1.1 + 2.0 - Reading comprehension datasets consisting of questions posed by crowdworkers on a set of Wikipedia articles; the answer to each question is a segment of text (a span) from the corresponding reading passage, or the question may be unanswerable.
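To make the span-based format concrete, here is a small sketch of how a SQuAD v1.1-style record locates its answer by character offset. The record contents are invented for illustration; only the field names (`context`, `qas`, `answers`, `answer_start`) follow the SQuAD layout.

```python
# A single SQuAD v1.1-style record: the answer is a span of the passage,
# located by its character offset in the context.
record = {
    "context": "BERT obtains state-of-the-art results on many NLP tasks.",
    "qas": [{
        "question": "What does BERT obtain?",
        "answers": [{"text": "state-of-the-art results", "answer_start": 13}],
    }],
}

def extract_span(context, answer):
    """Recover the answer text from the passage using its character offset."""
    start = answer["answer_start"]
    return context[start:start + len(answer["text"])]

ans = record["qas"][0]["answers"][0]
print(extract_span(record["context"], ans))  # state-of-the-art results
```

A question-answering model fine-tuned on this data predicts the start and end token positions of such a span; in SQuAD 2.0, it must additionally decide whether the passage contains an answer at all.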

Performance

Performance numbers for this model are available on NGC.

References

License

This model was trained using open-source software available in the NVIDIA Deep Learning Examples repository. For terms of use, please refer to the licenses of the training scripts and of the datasets the model was derived from.