Object detection is the task of detecting instances of objects of a certain class within an image. The state-of-the-art methods can be categorized into two main types: one-stage methods and two stage-methods. One-stage methods prioritize inference speed, and example models include YOLO, SSD and RetinaNet. Two-stage methods prioritize detection accuracy, and example models include Faster R-CNN, Mask R-CNN and Cascade R-CNN.
The most popular benchmark is the MSCOCO dataset. Models are typically evaluated according to a Mean Average Precision metric.
( Image credit: Detectron )
|TREND||DATASET||BEST METHOD||PAPER TITLE||PAPER||CODE||COMPARE|
We present a general framework for capturing long-range interactions between an input and structured contextual information (e. g. a pixel surrounded by other pixels).
Ranked #24 on Image Classification on ImageNet
In this paper, we study what would happen when normalization layers are removed from the network, and show how to train deep neural networks without normalization layers and without performance degradation.
Attention modules have been demonstrated effective in strengthening the representation ability of a neural network via reweighting spatial or channel features or stacking both operations sequentially.
To address this problem, this paper proposes to learn the effective salient object detection model based on the manual annotation on a few training images only, thus dramatically alleviating human labor in training models.
However, recently the VOS community has deemed such a test time optimization and its impact on the test runtime as unfeasible.
This article presents a new dataset obtained from a real CCTV installed in a university and the generation of synthetic images, to which Faster R-CNN was applied using Feature Pyramid Network with ResNet-50 resulting in a weapon detection model able to be used in quasi-real-time CCTV (90 ms of inference time with an NVIDIA GeForce GTX-1080Ti card) improving the state of the art on weapon detection in a two stages training.
In this work, a single-stage keypoint-based network, named as FADNet, is presented to address the task of monocular 3D object detection.
In this paper, we propose a self-supervised framework to improve an object detector in unseen scenarios by moving an agent around in a 3D environment and aggregating multi-view RGB-D information.