
BERT-Large(pre-training using LAMB optimizer) for Pytorch



Pretrained weights for the BERT model.



October 30, 2021


4.38 GB


From this card, you can download a trained-model of BERT for PyTorch.

How to use

For a quick start:

Download this model

To download the most recently uploaded version, click the Download button in the top right of this page. You can also browse other versions of this pre-trained model in the Version history tab.

To preview the contents of the download, go to the File Browser tab and select your version.

Browse to the corresponding model-script

This model was trained using a script that is also available on NGC and on GitHub. We do not recommend using this model without its corresponding model-script, which contains the definition of the model architecture, the preprocessing applied to the input data, and accuracy and performance results.

You can access the most recent BERT model-script via NGC or GitHub.

If the pre-trained model was trained with an older version of the script, you can find the corresponding repo link in the details of the given model version in the Version details section below.

Build the proper NGC container and start an interactive session

Follow the steps described in the Quick Start Guide of the corresponding model-script. Note that you might want to skip some steps (e.g. training, as you have just downloaded an already trained model).

Run inference/evaluation

Within the container, and with the support of the model-script, you can evaluate the model and use it for inference. Refer to the subsections on inference in the Quick Start Guide and Advanced tabs.

What can you do with a pre-trained model?

A few examples of what you can do with a pre-trained model are:

  • running inference/predictions using the model directly
  • resuming training from the downloaded checkpoint
  • building more efficient inference engines
  • transfer learning
  • training a student network
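As an illustration of the last item, the sketch below shows one common way a student network can be trained against a teacher such as this model: a temperature-softened KL divergence between the teacher's and the student's output distributions. This is a minimal, framework-free sketch and not part of the model-script; the function names, the example logits, and the temperature value are hypothetical and chosen only for illustration.

```python
import math


def log_softmax(logits, temperature=1.0):
    """Numerically stable log-softmax of logits divided by a temperature."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)
    log_sum_exp = peak + math.log(sum(math.exp(x - peak) for x in scaled))
    return [x - log_sum_exp for x in scaled]


def distillation_loss(student_logits, teacher_logits, temperature=2.0):
    """KL(teacher || student) on temperature-softened distributions.

    The result is multiplied by temperature**2, a common convention that
    keeps gradient magnitudes comparable across temperatures.
    The default temperature is a hypothetical value, not a recommendation.
    """
    log_p_teacher = log_softmax(teacher_logits, temperature)
    log_p_student = log_softmax(student_logits, temperature)
    kl = sum(
        math.exp(lt) * (lt - ls)
        for lt, ls in zip(log_p_teacher, log_p_student)
    )
    return kl * temperature ** 2


# Hypothetical logits for a single 3-class prediction.
teacher = [3.0, 1.0, 0.0]
student = [0.5, 1.5, 2.5]
print(distillation_loss(student, teacher))
```

In a real setup the teacher logits would come from the downloaded checkpoint and the loss would be computed with the training framework's tensor operations; the arithmetic stays the same.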

Version details

The following is a list of models and corresponding versions of model-scripts which were used to train them:

Version 1
  • model-script: script v3 on NGC or this GitHub commit
  • architecture config: type=Large, purpose=pretraining
  • training config: iterations:8601, bs_phase1:64, LR_phase1:0.006, warmup_proportion_phase1:0.2843, iterations_phase1:7038, global_batch_size_phase1:65536, bs_phase2:8, LR_phase2:0.004, warmup_proportion_phase2:0.128, iterations_phase2:1563, global_batch_size_phase2:32768
  • dataset: Wikipedia, BookCorpus
  • performance: training_loss:1.38
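The numbers in the training config above can be cross-checked with a little arithmetic. The sketch below confirms that the two phase iteration counts add up to the total, and derives the approximate warmup length and the gradient accumulation steps per phase. It assumes that bs_phase1 and bs_phase2 are per-GPU micro-batch sizes and that training ran on 8 GPUs; both are assumptions for illustration, as the listing states neither. The helper names are hypothetical.

```python
# Values copied from the training config of version 1.
TOTAL_ITERATIONS = 8601
PHASES = {
    "phase1": {"iterations": 7038, "global_batch_size": 65536,
               "bs": 64, "warmup_proportion": 0.2843},
    "phase2": {"iterations": 1563, "global_batch_size": 32768,
               "bs": 8, "warmup_proportion": 0.128},
}


def accumulation_steps(global_batch_size, per_gpu_batch_size, num_gpus):
    """Micro-batches each GPU accumulates before one optimizer step.

    Assumes `per_gpu_batch_size` is the per-GPU micro-batch size.
    """
    per_step = per_gpu_batch_size * num_gpus
    if global_batch_size % per_step != 0:
        raise ValueError("global batch size is not divisible by bs * num_gpus")
    return global_batch_size // per_step


def warmup_iterations(iterations, warmup_proportion):
    """Approximate number of iterations spent in learning-rate warmup."""
    return iterations * warmup_proportion


NUM_GPUS = 8  # assumption: a single 8-GPU node; not stated in the listing

# The per-phase iteration counts should add up to the reported total.
assert sum(p["iterations"] for p in PHASES.values()) == TOTAL_ITERATIONS

for name, p in PHASES.items():
    steps = accumulation_steps(p["global_batch_size"], p["bs"], NUM_GPUS)
    warmup = warmup_iterations(p["iterations"], p["warmup_proportion"])
    samples = p["iterations"] * p["global_batch_size"]
    print(f"{name}: accumulation_steps={steps}, "
          f"warmup_iterations~{warmup:.0f}, sequences_seen={samples}")
```

With a different GPU count, only NUM_GPUS changes; the global batch size stays fixed and the accumulation steps scale accordingly.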

Compatibility with other scripts

All available versions of this model were trained using corresponding model-scripts optimized for DGX usage. Using the model in other configurations is possible but not supported.


"Model-script": a set of scripts containing the definition of the model architecture, training methods, and preprocessing applied to the input data, as well as documentation covering usage, accuracy, and performance results.

"Model": a shorthand for (pre)trained-model, also used interchangeably with model checkpoint and model weights. It is a saved state of all the internal parameters of the model.

"Pretrained-model": see "Model"

"Trained-model": see "Model"

"Model weights": see "Model"

"Checkpoint": see "Model"