State-of-the-art NLP uses large transformer models like BERT to extract meaningful representations from text. These models are pre-trained on a massive corpus of text with an unsupervised objective: filling in randomly masked words. The pre-trained BERT model produces embeddings of the input text, which can then be used in downstream tasks like text classification, question answering, and named entity recognition.
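As a concrete illustration of that last step, the short sketch below loads the same bert-base-uncased checkpoint used later in this notebook and extracts the embeddings a downstream classifier would consume. It uses the Hugging Face transformers library purely as an illustration and is not part of the container's NeMo workflow:
# Illustration only: extract BERT embeddings with the Hugging Face transformers library.
# The notebook itself builds its classifier with NeMo on top of the same pre-trained weights.
import torch
from transformers import BertModel, BertTokenizer

tokenizer = BertTokenizer.from_pretrained('bert-base-uncased')
model = BertModel.from_pretrained('bert-base-uncased')
model.eval()

inputs = tokenizer('NeMo makes text classification easy.', return_tensors='pt')
with torch.no_grad():
    outputs = model(**inputs)

# outputs[0] holds per-token embeddings of shape (batch, sequence_length, hidden_size).
# A text classifier typically feeds the first ([CLS]) token embedding into a small classification head.
cls_embedding = outputs[0][:, 0, :]
print(cls_embedding.shape)  # torch.Size([1, 768])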
In this notebook, our task is text classification. We use NVIDIA Neural Modules (NeMo) to compose our text classification system. NeMo makes state-of-the-art natural language understanding accessible and fast for data scientists: it can automatically download pre-trained BERT models, train on a single GPU or multiple GPUs, and leverage powerful optimization techniques like automatic mixed precision (AMP).
In this notebook, you will learn the complete workflow for text classification with BERT.
Please see the NeMo tutorials and examples for more information on how to use BERT for natural language understanding.
Installation is straightforward: everything you need is included in the container, except for the dataset, which must be downloaded by following the instructions in the Jupyter notebook. Other text classification datasets can also be used, as long as they are preprocessed as described in the notebook.
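# Pull the container image from NGC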
docker pull nvcr.io/nvidia/nemo_bert_text_classification:20.07
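# Start the container with access to all GPUs; port 8888 serves the Jupyter notebook and
# container port 6006 (TensorBoard's default port) is published on host port 6007.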
docker run --gpus all -it --rm \
    --shm-size=8g \
    -p 8888:8888 \
    -p 6007:6006 \
    --ulimit memlock=-1 \
    --ulimit stack=67108864 \
    nvcr.io/nvidia/nemo_bert_text_classification:20.07
Once the container is running, Jupyter is available on host port 8888, with the text classification notebook open and ready to go.
The notebook goes through the entire text classification workflow step by step on a single GPU. To scale training to multiple GPUs, the same workflow is available as a Python training script, which is provided in the container and can be launched either from within the Jupyter notebook or from the command line, for example:
NUM_GPUS=4
PRETRAINED_MODEL_NAME='bert-base-uncased'
SPLIT_DATA_DIR='/data/nlp/SST-2/split'   # preprocessed train/eval split created in the notebook
NUM_EPOCHS=3
WORK_DIR='bert-base-uncased'             # output directory for checkpoints and logs
AMP_OPTIMIZATION_LEVEL='O1'              # AMP level ('O0' = pure FP32, 'O1' = mixed precision)
BATCH_SIZE=256
MAX_SEQ_LENGTH=64                        # maximum number of tokens per input example
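# Launch data-parallel training across $NUM_GPUS GPUs with PyTorch's distributed launcher;
# checkpoints and logs are written under $WORK_DIR.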
python -m torch.distributed.launch --nproc_per_node=$NUM_GPUS \
    text_classification_with_bert.py \
    --pretrained_model_name $PRETRAINED_MODEL_NAME \
    --data_dir $SPLIT_DATA_DIR \
    --train_file_prefix 'train' \
    --eval_file_prefix 'eval' \
    --use_cache \
    --batch_size $BATCH_SIZE \
    --max_seq_length $MAX_SEQ_LENGTH \
    --num_gpus $NUM_GPUS \
    --num_epochs $NUM_EPOCHS \
    --amp_opt_level $AMP_OPTIMIZATION_LEVEL \
    --work_dir $WORK_DIR