With a ResNet-50 backbone and a number of architectural modifications, this version provides better accuracy and performance.
Despite the changes described in the previous section, the overall architecture, as described in the following diagram, has not changed.
Figure 1. The architecture of a Single Shot MultiBox Detector model. Image has been taken from the Single Shot MultiBox Detector paper.
The backbone is followed by 5 additional convolutional layers. In addition to the convolutional layers, we attached 6 detection heads:
This model was trained using script available on NGC and in GitHub repo.
The following datasets were used to train this model:
Performance numbers for this model are available in NGC.