
ResNet v1.5 for TensorFlow


Description

Thanks to a modified architecture and weight initialization, this version of ResNet-50 achieves ~0.5% higher accuracy than the original ResNet-50 v1.

Publisher

NVIDIA

Use Case

Classification

Framework

TensorFlow

Latest Version

20.12.6

Modified

March 2, 2022

Compressed Size

2.67 MB

To train your model using mixed precision or TF32 with Tensor Cores, or using FP32, perform the following steps with the default parameters of the ResNet-50 v1.5 model on the ImageNet dataset. For specifics concerning training and inference, see the Advanced section.

  1. Clone the repository.

    git clone https://github.com/NVIDIA/DeepLearningExamples
    cd DeepLearningExamples/TensorFlow/Classification/ConvNets
    
  2. Download and preprocess the dataset. The ResNet-50 v1.5 script operates on ImageNet-1k, a widely used image classification dataset from the ILSVRC challenge.

  • Download the images
  • Extract the training and validation data:
    mkdir train && mv ILSVRC2012_img_train.tar train/ && cd train
    tar -xvf ILSVRC2012_img_train.tar && rm -f ILSVRC2012_img_train.tar
    find . -name "*.tar" | while read NAME ; do mkdir -p "${NAME%.tar}"; tar -xvf "${NAME}" -C "${NAME%.tar}"; rm -f "${NAME}"; done
    cd ..
    mkdir val && mv ILSVRC2012_img_val.tar val/ && cd val && tar -xvf ILSVRC2012_img_val.tar
    
  • Preprocess the dataset into TFRecord form using the provided script. Additional metadata from the author's repository might be required.
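
If the conversion script needs a class-label listing, one can be derived directly from the extracted directory layout, since each class lives in its own synset-named subdirectory of train/. This is a sketch only: the output filename synset_labels.txt is an assumption, and the exact metadata your conversion script expects may differ.

```shell
# Each extracted class is a subdirectory of train/ named by its
# WordNet synset ID (e.g. train/n01440764). Build a sorted list,
# one synset per line; for ImageNet-1k this yields 1000 lines.
ls train | sort > synset_labels.txt
```
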
  3. Build the ResNet-50 v1.5 TensorFlow NGC container.

    docker build . -t nvidia_rn50
    
  4. Start an interactive session in the NGC container to run training/inference. After you build the container image, you can start an interactive CLI session with:

    nvidia-docker run --rm -it -v <path to imagenet>:/data/tfrecords --ipc=host nvidia_rn50
    
  5. (Optional) Create index files to use DALI. To allow proper sharding in a multi-GPU environment, DALI has to create index files for the dataset. To create the index files, run the following inside the container:

    bash ./utils/dali_index.sh /data/tfrecords <index file store location>
    

    Index files can be created once and then reused. It is highly recommended to save them into a persistent location.
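
Since the index files depend only on the dataset, a simple guard makes the indexing step idempotent, so it can be left in a launch script and only does work on the first run. A minimal sketch, assuming the chosen index directory (here ./dali_index) is a persistent, host-mounted path:

```shell
# Reuse existing DALI index files; only (re)build them when the
# persistent index directory is empty.
INDEX_DIR=./dali_index   # assumption: mount this path from the host to persist it
mkdir -p "$INDEX_DIR"
if [ -z "$(ls -A "$INDEX_DIR")" ]; then
    bash ./utils/dali_index.sh /data/tfrecords "$INDEX_DIR"
fi
```
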

  6. Start training. To run training for a standard configuration (as described in Default configuration: DGX1V, DGX2V, single GPU, FP16, FP32, 50, 90, and 250 epochs), run one of the scripts in the resnet50v1.5/training directory. Ensure ImageNet is mounted in the /data/tfrecords directory.

For example, to train on DGX-1 for 90 epochs using AMP, run:

bash ./resnet50v1.5/training/DGX1_RN50_AMP_90E.sh /path/to/result /data

Additionally, features like DALI data preprocessing or TensorFlow XLA can be enabled with the following arguments when running those scripts:

bash ./resnet50v1.5/training/DGX1_RN50_AMP_90E.sh /path/to/result /data --xla --dali

  7. Start validation/evaluation. To evaluate the validation dataset located in /data/tfrecords, run main.py with --mode=evaluate. For example:

python main.py --mode=evaluate --data_dir=/data/tfrecords --batch_size <batch size> --model_dir <model location> --results_dir <output location> [--xla] [--amp]

The optional --xla and --amp flags control XLA and AMP during evaluation.