Object detection is the task of detecting instances of objects of a certain class within an image. The state-of-the-art methods can be categorized into two main types: one-stage methods and two stage-methods. One-stage methods prioritize inference speed, and example models include YOLO, SSD and RetinaNet. Two-stage methods prioritize detection accuracy, and example models include Faster R-CNN, Mask R-CNN and Cascade R-CNN.
The most popular benchmark is the MSCOCO dataset. Models are typically evaluated according to a Mean Average Precision metric.
( Image credit: Detectron )
|TREND||DATASET||BEST METHOD||PAPER TITLE||PAPER||CODE||COMPARE|
Many commonly-used detection frameworks aim to handle the multi-scale object detection problem.
Our results suggest that architectures with unconstrained latent representations and full-image object masks such as ViMON and OP3 are able to learn more powerful representations in terms of object detection, segmentation and tracking than the explicitly parameterized spatial transformer based architecture of TBA and SCALOR.
Therefore, a suitable quantization algorithm is important when deploying a DNN into CACIM systems to obtain less accuracy loss.
Specifically, the input of the module will be resized into different scales on which position and semantic information will be processed, and then they will be rescaled back and combined with the module input.
This paper analyzes the serious false positive problem in OSOD and proposes a Focus on Classification One-Shot Object Detection (FOC OSOD) framework, which is improved in two important aspects: (1) classification cascade head with the fixed IoU threshold can enhance the robustness of classification by comparing multiple close regions; (2) classification region deformation on the query feature and the reference feature to obtain a more effective comparison region.
Despite achieving promising performance at par with anchor-based detectors, the existing anchor-free detectors such as FCOS or CenterNet predict objects based on standard Cartesian coordinates, which often yield poor quality keypoints.
Empirically, models trained with our method achieve improved conditional/unconditional image generation along with improved anomaly detection capabilities.
We show that in the context of object detection, training variance networks with negative log likelihood (NLL) can lead to high entropy predictive distributions regardless of the correctness of the output mean.