Video object detection is the task of detecting objects in videos rather than in still images.
Recently, image-level flow warping has been proposed to propagate features across frames, aiming to achieve a better trade-off between accuracy and efficiency.
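The core operation behind flow-guided feature propagation is a bilinear warp of one frame's feature map along a per-pixel flow field. A minimal NumPy sketch (the function name and shapes are illustrative assumptions, not a specific paper's implementation):

```python
import numpy as np

def warp_features(feat, flow):
    """Warp a feature map (C, H, W) toward another frame using a
    per-pixel flow field (2, H, W) and bilinear sampling — a
    simplified sketch of flow-guided feature propagation."""
    C, H, W = feat.shape
    ys, xs = np.meshgrid(np.arange(H), np.arange(W), indexing="ij")
    # source coordinates: where each output pixel samples from
    sx = np.clip(xs + flow[0], 0, W - 1)
    sy = np.clip(ys + flow[1], 0, H - 1)
    x0, y0 = np.floor(sx).astype(int), np.floor(sy).astype(int)
    x1, y1 = np.minimum(x0 + 1, W - 1), np.minimum(y0 + 1, H - 1)
    wx, wy = sx - x0, sy - y0
    # bilinear interpolation of the four neighboring feature vectors
    out = ((1 - wy) * (1 - wx) * feat[:, y0, x0]
           + (1 - wy) * wx * feat[:, y0, x1]
           + wy * (1 - wx) * feat[:, y1, x0]
           + wy * wx * feat[:, y1, x1])
    return out
```

In practice this is done on deep backbone features (e.g. with a differentiable sampler such as `grid_sample` in PyTorch), so expensive per-frame feature extraction can be skipped on non-key frames.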
Average precision (AP) is a widely used metric to evaluate detection accuracy of image and video object detectors.
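AP is the area under the precision-recall curve obtained by ranking detections by confidence. A self-contained sketch of the computation for a single class (the all-point interpolation shown here is one common convention; benchmarks differ in details such as IoU matching thresholds):

```python
import numpy as np

def average_precision(scores, is_correct, num_gt):
    """Compute AP for one class from detection confidences,
    per-detection match flags, and the ground-truth count."""
    order = np.argsort(scores)[::-1]            # rank by confidence
    tp = np.asarray(is_correct, dtype=float)[order]
    fp = 1.0 - tp
    tp_cum, fp_cum = np.cumsum(tp), np.cumsum(fp)
    recall = tp_cum / num_gt
    precision = tp_cum / (tp_cum + fp_cum)
    # all-point interpolation: make precision monotonically non-increasing
    precision = np.maximum.accumulate(precision[::-1])[::-1]
    # integrate precision over recall increments
    return np.sum(np.diff(np.concatenate(([0.0], recall))) * precision)

# toy example: 4 detections, 4 ground-truth objects
ap = average_precision([0.9, 0.8, 0.7, 0.6], [1, 0, 1, 1], num_gt=4)
print(ap)  # 0.625
```

Mean AP (mAP) then averages this quantity over classes, and for video benchmarks it is typically reported over all frames.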
The latency reduction achieved by such hard attention mechanisms comes at the cost of degraded accuracy.
Instead of relying on optical flow, this paper proposes a novel module called Progressive Sparse Local Attention (PSLA), which establishes the spatial correspondence between features across frames in a local region with progressively sparser stride and uses the correspondence to propagate features.
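The "progressively sparser stride" idea means matching each location densely against nearby positions and more coarsely against distant ones. A minimal sketch of such a sampling pattern; the specific radii and strides below are illustrative assumptions, not PSLA's published configuration:

```python
def sparse_local_offsets(radii=(1, 2, 4), strides=(1, 2, 4)):
    """Generate (dy, dx) offsets for local correspondence matching:
    dense sampling near the center, progressively sparser sampling
    on rings farther out."""
    offsets = {(0, 0)}
    for r, s in zip(radii, strides):
        # sample the square ring of radius r with stride s
        for d in range(-r, r + 1, s):
            offsets |= {(-r, d), (r, d), (d, -r), (d, r)}
    return sorted(offsets)

# 25 comparison positions cover a 9x9 window instead of all 81
print(len(sparse_local_offsets()))
```

Attention weights are then computed only between each query feature and the key features at these offsets, keeping the correspondence local and cheap while still covering a reasonably large displacement range.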
In this work, we propose the first object guided external memory network for online video object detection.
In this paper, we introduce a new design to capture the interactions across the objects in spatio-temporal context.
In this work, we argue that aggregating features at the full-sequence level will lead to more discriminative and robust features for video object detection.
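A generic form of full-sequence aggregation weights every frame's feature by its similarity to the query frame and takes a softmax-weighted sum. The sketch below uses cosine similarity as the relation measure; the function and shapes are illustrative assumptions, not a specific paper's architecture:

```python
import numpy as np

def aggregate_sequence(feats, query_idx):
    """Enhance the feature of frame `query_idx` by attending over the
    per-frame features of the whole sequence (shape: T x D)."""
    normed = feats / np.linalg.norm(feats, axis=1, keepdims=True)
    sim = normed @ normed[query_idx]               # cosine similarity to query
    weights = np.exp(sim) / np.exp(sim).sum()      # softmax over all T frames
    return weights @ feats                         # aggregated feature, shape (D,)
```

Frames where the object is sharp and unoccluded tend to receive higher weights, so the aggregated feature is more robust to motion blur or occlusion in the query frame than the single-frame feature alone.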
Single-frame object detectors can sometimes perform well on videos, even without exploiting temporal context.