With a ResNet-50 backbone and a number of architectural modifications, this version provides better accuracy and performance.
Despite the changes described in the previous section, the overall architecture, shown in the following diagram, remains unchanged.
Figure 1. The architecture of a Single Shot MultiBox Detector model. Image taken from the Single Shot MultiBox Detector paper.
The backbone is followed by 5 additional convolutional layers. In addition to the convolutional layers, we attached 6 detection heads:
- The first detection head is attached to the last conv4_x layer.
- The other five detection heads are attached to the corresponding 5 additional layers.
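The head layout described above can be sketched in plain Python. This is an illustrative sketch only, not the model's actual implementation. The layer names `extra_1` to `extra_5`, the feature map sizes, and the number of default boxes per location are assumptions taken from the common 300x300 SSD configuration; none of them are stated in this section.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DetectionHead:
    source: str          # layer whose feature map feeds this head
    feature_size: int    # side length of the (square) feature map, assumed
    boxes_per_cell: int  # default boxes predicted per location, assumed


def build_heads(num_extra_layers=5):
    """Attach one head to conv4_x and one to each additional layer."""
    # "extra_i" names are hypothetical labels for the additional conv layers.
    sources = ["conv4_x"] + [f"extra_{i}" for i in range(1, num_extra_layers + 1)]
    # Assumed values from the usual SSD300 setup, not from this document.
    feature_sizes = [38, 19, 10, 5, 3, 1]
    boxes_per_cell = [4, 6, 6, 6, 4, 4]
    return [
        DetectionHead(s, f, b)
        for s, f, b in zip(sources, feature_sizes, boxes_per_cell)
    ]


def total_default_boxes(heads):
    """Sum the default boxes produced across all detection heads."""
    return sum(h.feature_size ** 2 * h.boxes_per_cell for h in heads)


heads = build_heads()
for head in heads:
    print(head.source, head.feature_size, head.boxes_per_cell)
print(total_default_boxes(heads))  # 8732 under the assumed configuration
```

Under these assumptions, the six heads together predict 8732 default boxes per image, which matches the figure commonly quoted for SSD300.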
The following datasets were used to train this model:
- COCO 2017 - Dataset for large-scale object detection, segmentation and captioning.
Performance numbers for this model are available in NGC.