A natural progression from image classification would be classification and localization of the subject of the image. We can take this idea one step further and localize objects in a given image. Simply put, object detection refers to identifying which object(s) are there in an image.
Source: Joseph Redmon, Ali Farhadi, “YOLO9000:Better, Faster, Stronger”
In this section we will try to answer the following questions:
Object Detection is about not only detecting the presence and location of objects in images and videos, but also categorizing them into everyday objects. Oftentimes, there is a confusion between Image Classification and Object Detection. Simply put, the difference between them is the same as the difference between saying “This is a cat” and pointing to a cat and saying “There is the cat”.
To build autonomous systems, perception is the main challenge to be solved. Perception, in terms of autonomous systems refers to the ability of understanding the surroundings of the autonomous agent. This means that the agent needs to be able to figure out where and what objects are in its immediate vicinity.
Object detection can help keep humans away from toxic environments and hazardous situations. Challenges like garbage segregation, oil rig monitoring, nightly surveillance, cargo port maintenance and other high risk applications can be aided by robots/cameras which can detect objects. Essentially, any environment that requires visual inspection or analysis and is too dangerous for humans, object detection pipelines can be used to shield from any onsite hazard.
While this has been a topic of research since before Deep Learning became mainstream, the best performing models today use one or more Deep Neural Networks.
Many architectures have networks pretrained on a different, simpler task, like Image Classification. As one can imagine, the inputs to this task can be images or videos, and the outputs are usually a set of bounding box coordinates that enclose each of the detected objects, as well as a class label for each detected object. With advances in research and the use of GPUs, it is possible to have object detection in real time with really impressive accuracies!