Text Classification with BERT and NeMo

This NeMo application trains text classification models on a single GPU or on multiple GPUs. Performance metrics are logged and visualized with TensorBoard. It also shows how to run inference with NeMo and how to visualize BERT embeddings before and after fine-tuning.
April 9, 2024
3.79 GB

State-of-the-art NLP uses large transformer models like BERT to extract meaningful representations from text. These models are pre-trained on a massive text corpus with an unsupervised objective: predicting words that have been randomly masked out of the input. The pre-trained BERT model produces embeddings of the input text that can then be used in downstream tasks such as text classification, question answering, and named entity recognition.
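To make the masking objective concrete, here is a minimal sketch in plain Python. It is a simplification and not BERT's actual implementation: real BERT pre-training operates on subword tokens and sometimes substitutes a random token or leaves the chosen token unchanged instead of always inserting `[MASK]`. The function name `mask_tokens` and its parameters are hypothetical, chosen for illustration only.

```python
import random

MASK_TOKEN = "[MASK]"


def mask_tokens(tokens, mask_prob=0.15, seed=0):
    """Replace a random subset of tokens with [MASK].

    Returns (masked_tokens, targets), where targets maps each masked
    position to the original token a model would be trained to predict.
    """
    rng = random.Random(seed)
    masked = list(tokens)
    targets = {}
    for i, tok in enumerate(tokens):
        # Each position is masked independently with probability mask_prob.
        if rng.random() < mask_prob:
            masked[i] = MASK_TOKEN
            targets[i] = tok
    return masked, targets


if __name__ == "__main__":
    sentence = "the movie was surprisingly good and well acted".split()
    masked, targets = mask_tokens(sentence, mask_prob=0.3, seed=1)
    print(masked)   # input the model would see
    print(targets)  # positions and words it would have to recover
```

During pre-training, the model sees only the masked sequence and is scored on how well it recovers the words in `targets`, which is what forces it to learn context-aware representations.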

In this notebook, the task is text classification. We use NVIDIA Neural Modules (NeMo) to compose the text classification system. NeMo makes state-of-the-art natural language understanding accessible and fast for data scientists: it can automatically download pre-trained BERT models, train on a single GPU or on multiple GPUs, and apply optimization techniques such as automatic mixed precision (AMP).

In this notebook you will learn the following workflow for text classification:

  • Data exploration and preprocessing
  • Build scalable pipelines for single-GPU and multi-GPU training
  • Log training and validation metrics, and checkpoint the fine-tuned BERT and classification models
  • Visualize training and validation metrics with TensorBoard
  • Build inference pipelines within NeMo for model validation
  • Classify text interactively after fine-tuning
  • Visualize BERT embeddings before and after fine-tuning
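As one illustration of the preprocessing step, the sketch below shuffles labeled examples and writes them out as `train.tsv` and `eval.tsv`, matching the `train` and `eval` file prefixes passed to the training script. The tab-separated layout, the `sentence`/`label` header row, the `.tsv` extension, and the helper names are all assumptions made for this example; the preprocessing instructions in the notebook take precedence.

```python
import csv
import random
import tempfile
from pathlib import Path


def split_examples(examples, eval_fraction=0.2, seed=0):
    """Shuffle (sentence, label) pairs and split them into train and eval lists."""
    rng = random.Random(seed)
    shuffled = list(examples)
    rng.shuffle(shuffled)
    n_eval = int(len(shuffled) * eval_fraction)
    return shuffled[n_eval:], shuffled[:n_eval]


def write_tsv(path, examples):
    """Write pairs as tab-separated rows under an assumed 'sentence<TAB>label' header."""
    with open(path, "w", newline="", encoding="utf-8") as f:
        writer = csv.writer(f, delimiter="\t")
        writer.writerow(["sentence", "label"])
        writer.writerows(examples)


if __name__ == "__main__":
    # Toy data for illustration only; labels are arbitrary integers.
    data = [
        ("great acting and a clever plot", 1),
        ("dull and far too long", 0),
        ("a delightful surprise", 1),
        ("I want my two hours back", 0),
    ]
    train, evaluation = split_examples(data, eval_fraction=0.25)
    # A temporary directory stands in for the split data directory.
    with tempfile.TemporaryDirectory() as split_data_dir:
        write_tsv(Path(split_data_dir) / "train.tsv", train)
        write_tsv(Path(split_data_dir) / "eval.tsv", evaluation)
        print(sorted(p.name for p in Path(split_data_dir).iterdir()))
```

Fixing the random seed keeps the split reproducible, so metrics from different training runs are measured on the same evaluation set.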

To Learn More

Please see the NeMo tutorials and examples for more information on how to use BERT for natural language understanding.

Installation and Getting Started

Everything needed is included in this container except the dataset, which must be downloaded by following the instructions in the Jupyter notebook. Other text classification datasets may be used as long as they are prepared according to the preprocessing instructions in the notebook.

  1. Download the container from NGC
docker pull
  2. Run the container
docker run --gpus all -it --rm \
        --shm-size=8g \
        -p 8888:8888 \
        -p 6007:6006 \
        --ulimit memlock=-1 \
        --ulimit stack=67108864 \
  3. In a browser, navigate to the IP address (or localhost) of the machine serving the container.


The text classification notebook will be open and ready to go.

Multi-GPU Training

The notebook walks through the entire text classification workflow step by step on a single GPU. To scale training to multiple GPUs, we use the same workflow in a Python script. The text classification training script is provided in the container and can be launched from within the Jupyter notebook or from the command line.


python -m torch.distributed.launch --nproc_per_node=$NUM_GPUS \ \
    --pretrained_model_name $PRETRAINED_MODEL_NAME \
    --data_dir $SPLIT_DATA_DIR \
    --train_file_prefix 'train' \
    --eval_file_prefix 'eval' \
    --use_cache \
    --batch_size $BATCH_SIZE \
    --max_seq_length $MAX_SEQ_LENGTH \
    --num_gpus $NUM_GPUS \
    --num_epochs $NUM_EPOCHS \
    --amp_opt_level $AMP_OPTIMIZATION_LEVEL \
    --work_dir $WORK_DIR
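When launching from the notebook, it can be convenient to assemble this command programmatically. The following is a minimal sketch, not part of the container: `build_launch_command` is a hypothetical helper, the script path is a placeholder that must be replaced with the training script shipped in the container, and every value shown (model name, batch size, directories, and so on) is an illustrative assumption rather than a recommended setting.

```python
import shlex


def build_launch_command(script, num_gpus, options, flags=()):
    """Return the argument list for a torch.distributed.launch invocation.

    script  : path to the training script (placeholder in the demo below)
    options : mapping of option name to value, emitted as '--name value'
    flags   : names of boolean switches, emitted as '--name'
    """
    cmd = [
        "python", "-m", "torch.distributed.launch",
        f"--nproc_per_node={num_gpus}",
        script,
    ]
    for name, value in options.items():
        cmd += [f"--{name}", str(value)]
    cmd += [f"--{flag}" for flag in flags]
    return cmd


if __name__ == "__main__":
    num_gpus = 2  # illustrative value
    cmd = build_launch_command(
        script="path/to/training_script.py",  # placeholder, not the real script name
        num_gpus=num_gpus,
        options={
            "pretrained_model_name": "bert-base-uncased",  # example value
            "data_dir": "split_data",                      # example value
            "train_file_prefix": "train",
            "eval_file_prefix": "eval",
            "batch_size": 64,                              # example value
            "max_seq_length": 64,                          # example value
            "num_gpus": num_gpus,
            "num_epochs": 3,                               # example value
            "amp_opt_level": "O1",                         # example value
            "work_dir": "work_dir",                        # example value
        },
        flags=["use_cache"],
    )
    # Print a copy-pasteable shell command; pass cmd to subprocess.run to execute it.
    print(shlex.join(cmd))
```

Building the command as a list rather than a single string avoids quoting problems when paths contain spaces, and the same list can be handed directly to `subprocess.run`.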

Getting Help & Support

If you have any questions or need help, please email