From this card, you can download a trained BERT model for PyTorch.
For a quick start:
To download the most recently uploaded version, click the Download button in the top right of this page. You can also browse other versions of this pre-trained model in the Version history tab.
To preview the contents of the download, go to the File Browser tab and select your version.
This model was trained using a script that is also available on NGC and on GitHub. We do not recommend using this model without its corresponding model-script, which contains the definition of the model architecture, the preprocessing applied to the input data, and accuracy and performance results.
You can access the most recent BERT model-script via NGC or GitHub.
If the pre-trained model was trained with an older version of the script, you can find the corresponding repo link in the details of the given model version in the Version details section below.
To get started, follow the steps described in the Quick Start Guide of the corresponding model-script. Note that you might want to skip some steps (e.g. training, as you have just downloaded an already trained model).
Within the container, and with the support of the model-script, you can evaluate your model and use it for inference. Refer to the sub-sections on inference in the Quick Start Guide and Advanced tabs.
A few examples of what you can do with a pre-trained model are:
The following is a list of models and corresponding versions of model-scripts which were used to train them:
| version | model-script | architecture config | training config | dataset | performance |
|---|---|---|---|---|---|
| 1 | script v3 on NGC or this github commit | type=Large, purpose=pretraining | iterations:8601, bs_phase1:64, LR_phase1:0.006, warmup_proportion_phase1:0.2843, iterations_phase1:7038, global_batch_size_phase1:65536, bs_phase2:8, LR_phase2:0.004, warmup_proportion_phase2:0.128, iterations_phase2:1563, global_batch_size_phase2:32768 | Wikipedia, BookCorpus | training_loss:1.38 |
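As a rough sanity check, the training config of version 1 can be unpacked with a little arithmetic. The sketch below copies the values from the table and derives a few quantities from them. The interpretation is an assumption, not something this card states: it treats `warmup_proportion` as a fraction of the phase's iterations, and treats `bs` as a per-device, per-step batch that is scaled up to `global_batch_size` by some combination of devices and gradient accumulation. The helper name `summarize_phase` is hypothetical.

```python
# Values copied from the version 1 row of the table above.
config = {
    "iterations": 8601,
    "phase1": {"bs": 64, "lr": 0.006, "warmup_proportion": 0.2843,
               "iterations": 7038, "global_batch_size": 65536},
    "phase2": {"bs": 8, "lr": 0.004, "warmup_proportion": 0.128,
               "iterations": 1563, "global_batch_size": 32768},
}


def summarize_phase(phase):
    """Derive quantities implied by one phase's settings (assumed interpretation)."""
    # Assumption: warmup lasts warmup_proportion * iterations optimizer steps.
    warmup_steps = round(phase["warmup_proportion"] * phase["iterations"])
    # Assumption: global batch = bs * (devices * gradient accumulation steps),
    # so this ratio is the combined scale-up factor, not a device count.
    scale_factor = phase["global_batch_size"] // phase["bs"]
    # Number of training sequences consumed over the whole phase.
    sequences_seen = phase["global_batch_size"] * phase["iterations"]
    return {
        "warmup_steps": warmup_steps,
        "scale_factor": scale_factor,
        "sequences_seen": sequences_seen,
    }


# The two phases should account for every iteration listed in the table.
total = config["phase1"]["iterations"] + config["phase2"]["iterations"]
print("phase iterations sum to total:", total == config["iterations"])

for name in ("phase1", "phase2"):
    print(name, summarize_phase(config[name]))
```

Under these assumptions, the two phases add up to the listed 8601 iterations, and the per-step batch is scaled by a factor of 1024 in phase 1 and 4096 in phase 2 to reach the global batch size.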
All available versions of this model were trained using corresponding model-scripts optimized for DGX usage. Using the model in other configurations is possible but not supported.
"Model-script": a set of scripts containing the definition of the model architecture, training methods, and preprocessing applied to the input data, as well as documentation covering usage, accuracy, and performance results.
"Model": a shorthand for (pre)trained-model, also used interchangeably with model checkpoint and model weights. It is a saved state of all the internal parameters of the model.
"Pretrained-model": see "Model"
"Trained-model": see "Model"
"Model weights": see "Model"
"Checkpoint": see "Model"