SSD v1.2 for TensorFlow1

NVIDIA Deep Learning Examples

Resource

NVIDIA Deep Learning Examples

SSD v1.2 for TensorFlow1

With a ResNet-50 backbone and a number of architectural modifications, this version provides better accuracy and performance.

The following sections provide greater details of the dataset, running training and inference, and the training results.

Scripts and sample code

Dockerfile: a container with the basic set of dependencies to run SSD

In the model/research/object_detection directory, the most important files are:

model_main.py: serves as the entry point to launch the training and inference
models/ssd_resnet_v1_fpn_feature_extractor.py: implementation of the model
metrics/coco_tools.py: implementation of mAP metric
utils/exp_utils.py: utility functions for running training and benchmarking

Parameters

The complete list of available parameters for the model/research/object_detection/model_main.py script contains:

./object_detection/model_main.py:
  --[no]allow_xla: Enable XLA compilation
    (default: 'false')
  --checkpoint_dir: Path to directory holding a checkpoint.  If `checkpoint_dir` is provided, this binary operates in
    eval-only mode, writing resulting metrics to `model_dir`.
  --eval_count: How many times the evaluation should be run
    (default: '1')
    (an integer)
  --[no]eval_training_data: If training data should be evaluated for this job. Note that one call only use this in eval-
    only mode, and `checkpoint_dir` must be supplied.
    (default: 'false')
  --hparams_overrides: Hyperparameter overrides, represented as a string containing comma-separated hparam_name=value
    pairs.
  --model_dir: Path to output model directory where event and checkpoint files will be written.
  --num_train_steps: Number of train steps.
    (an integer)
  --pipeline_config_path: Path to pipeline config file.
  --raport_file: Path to dlloger json
    (default: 'summary.json')
  --[no]run_once: If running in eval-only mode, whether to run just one round of eval vs running continuously (default).
    (default: 'false')
  --sample_1_of_n_eval_examples: Will sample one of every n eval input examples, where n is provided.
    (default: '1')
    (an integer)
  --sample_1_of_n_eval_on_train_examples: Will sample one of every n train input examples for evaluation, where n is
    provided. This is only used if `eval_training_data` is True.
    (default: '5')
    (an integer)

Command line options

The SSD model training is conducted by the script from the object_detection library, model_main.py. Our experiments were done with settings described in the examples directory. If you would like to get more details about available arguments, please run:

python object_detection/model_main.py --help

Getting the data

The SSD320 v1.2 model was trained on the COCO 2017 dataset. The val2017 validation set was used as a validation dataset. The download_data.sh script will preprocess the data to tfrecords format.

This repository contains the download_dataset.sh script which will automatically download and preprocess the training, validation and test datasets. By default, data will be downloaded to the /data/coco2017_tfrecords directory.

Training process

Training the SSD model is implemented in the object_detection/model_main.py script.

All training parameters are set in the config files. Because evaluation is relatively time consuming, it does not run every epoch. By default, evaluation is executed only once at the end of the training. The model is evaluated using pycocotools distributed with the COCO dataset. The number of evaluations can be changed using the eval_count parameter.

To run training with tensor cores, use ./examples/SSD320_FP16_{1,4,8}GPU.sh scripts. For more details see Enabling mixed precision section below.

Data preprocessing

Before we feed data to the model, both during training and inference, we perform:

Normalization
Encoding bounding boxes
Resize to 320x320

Data augmentation

During training we perform the following augmentation techniques:

Random crop
Random horizontal flip
Color jitter

Enabling mixed precision

Mixed precision training offers significant computational speedup by performing operations in half-precision format, while storing minimal information in single-precision to retain as much information as possible in critical parts of the network. Since the introduction of tensor cores in the Volta and Turing architectures, significant training speedups are experienced by switching to mixed precision -- up to 3x overall speedup on the most arithmetically intense model architectures. Using mixed precision training previously required two steps:

Porting the model to use the FP16 data type where appropriate.
Manually adding loss scaling to preserve small gradient values.

This can now be achieved using Automatic Mixed Precision (AMP) for TensorFlow to enable the full mixed precision methodology in your existing TensorFlow model code. AMP enables mixed precision training on Volta and Turing GPUs automatically. The TensorFlow framework code makes all necessary model changes internally.

In TF-AMP, the computational graph is optimized to use as few casts as necessary and maximize the use of FP16, and the loss scaling is automatically applied inside of supported optimizers. AMP can be configured to work with the existing tf.contrib loss scaling manager by disabling the AMP scaling with a single environment variable to perform only the automatic mixed-precision optimization. It accomplishes this by automatically rewriting all computation graphs with the necessary operations to enable mixed precision training and automatic loss scaling.

For information about:

How to train using mixed precision, see the Mixed Precision Training paper and Training With Mixed Precision documentation.
How to access and enable AMP for TensorFlow, see Using TF-AMP from the TensorFlow User Guide.
Techniques used for mixed precision training, see the Mixed-Precision Training of Deep Neural Networks blog.