BERT for TensorFlow

Description

BERT is a method of pre-training language representations which obtains state-of-the-art results on a wide array of NLP tasks.

Publisher: NVIDIA
Use Case: NLP
Framework: TensorFlow
Latest Version: 20.06.17
Modified: November 12, 2021
Compressed Size: 1.37 MB

To pretrain or fine tune your model for Question Answering using mixed precision with Tensor Cores or using FP32/TF32, perform the following steps using the default parameters of the BERT model.

  1. Clone the repository.
git clone https://github.com/NVIDIA/DeepLearningExamples
cd DeepLearningExamples/TensorFlow/LanguageModeling/BERT
  2. Build the BERT TensorFlow NGC container.
bash scripts/docker/build.sh
  3. Download and preprocess the dataset.

This repository provides scripts to download, verify, and extract the SQuAD and GLUE datasets and pretrained weights for fine tuning, as well as the Wikipedia and BookCorpus datasets for pre-training.

To download, verify, and extract the required datasets, run:

bash scripts/data_download.sh

The script launches a Docker container with the current directory mounted and downloads the datasets to a data/ folder on the host.

Note: For fine tuning only, the Wikipedia and BookCorpus download and preprocessing can be skipped by commenting out the corresponding steps in the data download script.

  • Download Wikipedia only for pretraining

The pretraining dataset is 170GB+ and takes 15+ hours to download. The BookCorpus server is frequently overloaded and also contains broken links, resulting in HTTP 403 and 503 errors. It is therefore recommended to skip the BookCorpus download by running:

bash scripts/data_download.sh wiki_only

  • Download Wikipedia and BookCorpus

Users are welcome to download BookCorpus from other sources to match our accuracy, or to retry our script until the required number of files has been downloaded, by running:

bash scripts/data_download.sh wiki_books

Note: Ensure the Wikipedia download completes. If the download breaks for any reason, remove the output file wikicorpus_en.xml.bz2 and start again; if a partially downloaded file exists, the script assumes the download succeeded, which causes the extraction to fail. Not using BookCorpus can potentially change final accuracy on a few downstream tasks.
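
If the Wikipedia download was interrupted, a minimal recovery sketch (assuming the data/ folder created by the download script sits in the current directory) is:

# Check free space first: the pretraining data needs well over 170 GB.
df -h .
# Locate and remove any partially downloaded Wikipedia dump, then restart the download.
find data -name 'wikicorpus_en.xml.bz2' -print -delete
bash scripts/data_download.sh wiki_only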

  4. Download the pretrained models from NGC.

We have uploaded checkpoints that have been fine tuned and pre-trained for various configurations to the NGC Model Registry. Our data download scripts download some of them by default, but you can also browse and download the relevant checkpoints directly from the NGC model catalog. Download them to data/download/nvidia_pretrained/ so that your scripts can access them easily.
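
As a sketch, a checkpoint can also be pulled with the NGC CLI, assuming it is installed and configured; the model name below is a placeholder to be replaced with one browsed from the catalog:

ngc registry model download-version <org/model_name:version> --dest data/download/nvidia_pretrained/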

  5. Start an interactive session in the NGC container to run training/inference.

After you build the container image and download the data, you can start an interactive CLI session as follows:

bash scripts/docker/launch.sh

By default, the launch.sh script assumes that the downloaded datasets are in the following locations (a quick sanity check is sketched after the list).

  • SQuAD v1.1 - data/download/squad/v1.1
  • SQuAD v2.0 - data/download/squad/v2.0
  • GLUE The Corpus of Linguistic Acceptability (CoLA) - data/download/CoLA
  • GLUE Microsoft Research Paraphrase Corpus (MRPC) - data/download/MRPC
  • GLUE The Multi-Genre NLI Corpus (MNLI) - data/download/MNLI
  • BERT Large - data/download/google_pretrained_weights/uncased_L-24_H-1024_A-16
  • BERT Base - data/download/google_pretrained_weights/uncased_L-12_H-768_A-12
  • BERT - data/download/google_pretrained_weights/
  • Wikipedia + BookCorpus TFRecords - data/tfrecords<config>/books_wiki_en_corpus
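
Before launching the container, an optional sanity check (a sketch based only on the paths listed above) is to confirm that the fine tuning directories exist:

for d in data/download/squad/v1.1 data/download/squad/v2.0 data/download/CoLA data/download/MRPC data/download/MNLI data/download/google_pretrained_weights; do
  [ -d "$d" ] || echo "missing: $d"
done
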
  6. Start pre-training.

BERT is designed to pre-train deep bidirectional language representations. The following scripts replicate the Wikipedia and BookCorpus pre-training from the LAMB paper. These scripts are general and can be used for pre-training language representations on any corpus of choice.

From within the container, you can use the following script to run pre-training using LAMB.

bash scripts/run_pretraining_lamb.sh <train_batch_size_phase1> <train_batch_size_phase2> <eval_batch_size> <learning_rate_phase1> <learning_rate_phase2> <precision> <use_xla> <num_gpus> <warmup_steps_phase1> <warmup_steps_phase2> <train_steps> <save_checkpoint_steps> <num_accumulation_steps_phase1> <num_accumulation_steps_phase2> <bert_model>

For BERT Large FP16 training with XLA using a DGX-1 V100 32GB, run:

bash scripts/run_pretraining_lamb.sh 64 8 8 7.5e-4 5e-4 fp16 true 8 2000 200 7820 100 128 512 large
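
The same DGX-1 V100 32GB example, rewritten with named variables purely to show how the positional arguments map onto the usage string above (a readability sketch, not a different configuration):

train_batch_size_phase1=64 train_batch_size_phase2=8 eval_batch_size=8
learning_rate_phase1=7.5e-4 learning_rate_phase2=5e-4 precision=fp16 use_xla=true num_gpus=8
warmup_steps_phase1=2000 warmup_steps_phase2=200 train_steps=7820 save_checkpoint_steps=100
num_accumulation_steps_phase1=128 num_accumulation_steps_phase2=512 bert_model=large
bash scripts/run_pretraining_lamb.sh $train_batch_size_phase1 $train_batch_size_phase2 $eval_batch_size \
  $learning_rate_phase1 $learning_rate_phase2 $precision $use_xla $num_gpus \
  $warmup_steps_phase1 $warmup_steps_phase2 $train_steps $save_checkpoint_steps \
  $num_accumulation_steps_phase1 $num_accumulation_steps_phase2 $bert_model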

This repository also contains a number of predefined configurations to run the LAMB pretraining on NVIDIA DGX-1, NVIDIA DGX-2H, or NVIDIA DGX A100 nodes in scripts/configs/pretrain_config.sh. For example, to use the default DGX A100 8-GPU config, run:

bash scripts/run_pretraining_lamb.sh $(source scripts/configs/pretrain_config.sh && dgxa100_8gpu_fp16)
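
The command substitution works because each entry in scripts/configs/pretrain_config.sh is expected to be a shell function that simply echoes the positional-argument string for run_pretraining_lamb.sh; a sketch of the mechanism (not the file's exact contents):

source scripts/configs/pretrain_config.sh   # load the predefined config functions
dgxa100_8gpu_fp16                           # prints the argument string that $(...) splices into the command above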

Alternatively, to run pre-training with Adam as in the original BERT paper from within the container, run:

bash scripts/run_pretraining_adam.sh <train_batch_size_per_gpu> <eval_batch_size> <learning_rate_per_gpu> <precision> <use_xla> <num_gpus> <warmup_steps> <train_steps> <save_checkpoint_steps>
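
An illustrative invocation with placeholder values (these are not tuned recommendations; consult the script and the repository documentation for real settings):

bash scripts/run_pretraining_adam.sh 16 8 1e-4 fp16 true 8 10000 1000000 5000
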
  7. Start fine tuning.

The above pretrained BERT representations can be fine tuned with just one additional output layer to produce a state-of-the-art Question Answering system. From within the container, you can use the following script to run fine tuning for SQuAD.

bash scripts/run_squad.sh <batch_size_per_gpu> <learning_rate_per_gpu> <precision> <use_xla> <num_gpus> <seq_length> <doc_stride> <bert_model> <squad_version> <checkpoint> <epochs>

For SQuAD 1.1 FP16 training with XLA using a DGX A100 40GB, run:

bash scripts/run_squad.sh 32 5e-6 fp16 true 8 384 128 large 1.1 data/download/google_pretrained_weights/uncased_L-24_H-1024_A-16/bert_model.ckpt 2.0
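
In this example the trailing 2.0 is <epochs> and 1.1 is <squad_version>. The same command with named variables, as an annotation of how the positional arguments line up (not a different configuration):

batch_size_per_gpu=32 learning_rate_per_gpu=5e-6 precision=fp16 use_xla=true num_gpus=8
seq_length=384 doc_stride=128 bert_model=large squad_version=1.1
checkpoint=data/download/google_pretrained_weights/uncased_L-24_H-1024_A-16/bert_model.ckpt
epochs=2.0
bash scripts/run_squad.sh $batch_size_per_gpu $learning_rate_per_gpu $precision $use_xla $num_gpus \
  $seq_length $doc_stride $bert_model $squad_version $checkpoint $epochs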

This repository contains a number of predefined configurations to run the SQuAD fine tuning on NVIDIA DGX-1, NVIDIA DGX-2H, or NVIDIA DGX A100 nodes in scripts/configs/squad_config.sh. For example, to use the default DGX A100 8-GPU config, run:

bash scripts/run_squad.sh $(source scripts/configs/squad_config.sh && dgxa100_8gpu_fp16) 1.1 data/download/google_pretrained_weights/uncased_L-24_H-1024_A-16/bert_model.ckpt 2.0

Alternatively, to run fine tuning on the GLUE benchmark, run:

bash scripts/run_glue.sh <task_name> <batch_size_per_gpu> <learning_rate_per_gpu> <precision> <use_xla> <num_gpus> <seq_length> <doc_stride> <bert_model> <epochs> <warmup_proportion> <checkpoint>

For MRPC FP16 training with XLA using a DGX A100 40GB, run:

bash scripts/run_glue.sh MRPC 16 3e-6 fp16 true 8 128 64 large 3 0.1

The GLUE tasks supported include CoLA, MRPC and MNLI.

  8. Start validation/evaluation.

The run_squad_inference.sh script runs inference on a checkpoint fine tuned for SQuAD and evaluates the predictions using exact match and F1 score.

bash scripts/run_squad_inference.sh <init_checkpoint> <batch_size> <precision> <use_xla> <seq_length> <doc_stride> <bert_model> <squad_version>

For SQuAD 2.0 FP16 inference with XLA on a DGX-1 V100 32GB, using the checkpoint at /results/model.ckpt, run:

bash scripts/run_squad_inference.sh /results/model.ckpt 8 fp16 true 384 128 large 2.0

For SQuAD 1.1 FP32 inference without XLA on a DGX A100 40GB, using the checkpoint at /results/model.ckpt, run:

bash scripts/run_squad_inference.sh /results/model.ckpt 8 fp32 false 384 128 large 1.1

Alternatively, to run inference on the GLUE benchmark, run:

bash scripts/run_glue_inference.sh <task_name> <init_checkpoint> <batch_size_per_gpu> <precision> <use_xla> <seq_length> <doc_stride> <bert_model>
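
An illustrative MRPC inference call mirroring the fine tuning example above (the checkpoint path is a placeholder for your fine tuned model):

bash scripts/run_glue_inference.sh MRPC /results/model.ckpt 16 fp16 true 128 64 large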