Pretrained weights of the BERT model.

From this card, you can download a trained-model of BERT for PyTorch.

How to use

For a quick start:

Download this model

To download the most recently uploaded version, click the Download button in the top right corner of this page. You can also browse other versions of this pre-trained model in the Version history tab.

To preview the contents of a download, go to the File Browser tab and select your version.

Browse to the corresponding model-script

This model was trained using a script that is also available here on NGC and on GitHub. We do not recommend using this model without its corresponding model-script, which contains the definition of the model architecture, the preprocessing applied to the input data, and accuracy and performance results.

You can access the most recent BERT model-script via NGC or GitHub.

If the pre-trained model was trained with an older version of the script, you can find the corresponding repository link in the details of that model version in the Version details section below.

Build the proper NGC container and start an interactive session

Follow the steps described in the Quick Start Guide of the corresponding model-script. Note that you may want to skip some steps (e.g., training, since you have just downloaded an already trained model).

Run inference/evaluation

Within the container, and with the support of the model-script, you can evaluate your model and use it for inference. Refer to the inference sub-sections in the Quick Start Guide and Advanced tabs.
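When the checkpoint is loaded outside the model-script's own entry points (for example, to run inference from your own code or to resume training), the weights usually need to be unwrapped into a plain state dict first. The sketch below assumes, without confirmation from this card, that the file holds a dictionary whose `model` entry contains the weights and that parameter names may carry a `module.` prefix left by distributed training. `extract_state_dict` is a hypothetical helper, not part of the model-script.

```python
from collections.abc import Mapping
from typing import Any


def extract_state_dict(checkpoint: Mapping[str, Any], prefix: str = "module.") -> dict[str, Any]:
    """Return a flat {parameter name: tensor} mapping from a loaded checkpoint.

    Assumption (not confirmed by the model card): the weights may be nested
    under a "model" key next to other training state such as optimizer state.
    If that key is absent, the mapping itself is treated as the state dict.
    """
    state = checkpoint.get("model", checkpoint)
    if not isinstance(state, Mapping):
        raise TypeError("expected a mapping of parameter names to tensors")
    # Drop the wrapper prefix that (Distributed)DataParallel adds to parameter
    # names, so the keys match an unwrapped model. Requires Python 3.9+.
    return {name.removeprefix(prefix): value for name, value in state.items()}
```

In PyTorch this would typically be combined with `torch.load(path, map_location="cpu")` and `model.load_state_dict(...)`, where `path` points to the downloaded checkpoint file and `model` is built from the model-script's architecture definition.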

What can you do with a pre-trained model?

A few examples of what you can do with a pre-trained model are:

running inference/predictions using the model directly
resuming training from the downloaded checkpoint
building more efficient inference engines
transfer learning
training a student network
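For transfer learning, a common pattern is to copy only the pretrained weights whose names and shapes match the new task model, so task-specific layers (such as a new classifier) keep their fresh initialization. The helper below is a hypothetical sketch of that selection step; the parameter names in any example are illustrative and not taken from this checkpoint.

```python
from collections.abc import Mapping, Sequence


def transferable_names(
    pretrained_shapes: Mapping[str, Sequence[int]],
    target_shapes: Mapping[str, Sequence[int]],
) -> list[str]:
    """Return, in sorted order, the names present in both models with identical shapes.

    Parameters found only in the pretraining checkpoint (e.g. its pretraining
    heads) or whose shapes differ are left out, so the target model keeps
    its own initialization for them.
    """
    return sorted(
        name
        for name, shape in pretrained_shapes.items()
        if name in target_shapes and tuple(target_shapes[name]) == tuple(shape)
    )
```

With PyTorch, both shape maps can be built as `{k: tuple(v.shape) for k, v in sd.items()}` (once for the checkpoint's state dict and once for `model.state_dict()`), and the selected tensors loaded with `model.load_state_dict({k: sd[k] for k in names}, strict=False)`, which tolerates the parameters that were deliberately left out.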

Version details

The following is a list of models and corresponding versions of model-scripts which were used to train them:

version: 1
model-script: script v3 on NGC or this GitHub commit
architecture config: type=Large, purpose=pretraining
training config:
    iterations: 8601
    phase 1: bs_phase1: 64, LR_phase1: 0.006, warmup_proportion_phase1: 0.2843, iterations_phase1: 7038, global_batch_size_phase1: 65536
    phase 2: bs_phase2: 8, LR_phase2: 0.004, warmup_proportion_phase2: 0.128, iterations_phase2: 1563, global_batch_size_phase2: 32768
dataset: Wikipedia, BookCorpus
performance: training_loss: 1.38
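The training config fields for version 1 are related by simple arithmetic: the two phase iteration counts add up to the listed total, and each phase's global batch size is assumed here to equal the per-GPU batch (`bs_phaseN`) times the number of GPUs times the gradient-accumulation steps. The sketch below derives accumulation steps, warmup iterations and the number of training sequences per phase. The GPU count is not stated in this card, so `num_gpus=8` is a hypothetical value, and `Phase` is an illustrative helper, not part of the model-script.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Phase:
    """One pretraining phase, using the values listed for version 1."""

    per_gpu_batch: int        # bs_phaseN (assumed to be the per-GPU micro-batch)
    global_batch: int         # global_batch_size_phaseN
    iterations: int           # iterations_phaseN (optimizer steps)
    warmup_proportion: float  # warmup_proportion_phaseN

    def accumulation_steps(self, num_gpus: int) -> int:
        """Micro-batches each GPU accumulates per optimizer step (assumed relation)."""
        per_step = self.per_gpu_batch * num_gpus
        if self.global_batch % per_step:
            raise ValueError("global batch is not a multiple of per-GPU batch x GPUs")
        return self.global_batch // per_step

    def warmup_iterations(self) -> int:
        """Optimizer steps spent in learning-rate warmup, rounded down."""
        return int(self.warmup_proportion * self.iterations)

    def sequences_seen(self) -> int:
        """Total training sequences processed during the phase."""
        return self.global_batch * self.iterations


PHASE1 = Phase(per_gpu_batch=64, global_batch=65536, iterations=7038, warmup_proportion=0.2843)
PHASE2 = Phase(per_gpu_batch=8, global_batch=32768, iterations=1563, warmup_proportion=0.128)

# The two phases together account for the listed total of 8601 iterations.
assert PHASE1.iterations + PHASE2.iterations == 8601

if __name__ == "__main__":
    for label, phase in (("phase 1", PHASE1), ("phase 2", PHASE2)):
        print(label, phase.accumulation_steps(num_gpus=8), phase.warmup_iterations(), phase.sequences_seen())
```

Under the 8-GPU assumption this gives 128 accumulation steps in phase 1 and 512 in phase 2, and the warmup proportions correspond to 2000 and 200 warmup iterations respectively.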

Compatibility with other scripts

All available versions of this model were trained using the corresponding model-scripts, which are optimized for DGX systems. Using the model in other configurations may be possible but is not supported.

Glossary

"Model-script": a set of scripts containing the definition of the model architecture, the training methods and the preprocessing applied to the input data, as well as documentation covering usage, accuracy and performance results.

"Model": a shorthand for (pre)trained-model, also used interchangeably with model checkpoint and model weights. It is a saved state of all the internal parameters of the model.

"Pretrained-model": see "Model"

"Trained-model": see "Model"

"Model weights": see "Model"

"Checkpoint": see "Model"

BERT-Large (pre-training using LAMB optimizer) for PyTorch