Add to Favourites
To login click here

Object detection and recognition are essential tasks in computer vision, with applications in search and rescue, warehouse logistics, video surveillance, and more. In the last decade, deep-learning models have been adopted due to their superior performance compared to classical methods. These models employ convolutional neural networks (CNNs) and are subdivided into two types: two-shot detectors, which search with maximum accuracy but with a higher inference time, and one-shot detectors, which are oriented at a minimum inference time for real-time applications. Recently, Vision Transformers (ViTs) have also been applied to object detection and recognition tasks, making use of CNNs as a backbone for feature extraction.