IDENTIFYING CONCEALED OBJECTS FROM VIDEOS

Concealed objects are often hard to identify in still images, because camouflaged objects exhibit patterns that blend seamlessly into the background. Since the concealed state is likely to break when the object moves, we propose a novel video concealed object detection (VCOD) framework, called SLT-Net. SLT-Net leverages both short-term dynamics and long-term temporal consistency to detect concealed objects in continuous video frames. Unlike previous methods, which often use homography or optical flow to represent motion explicitly, we build a dense correlation volume to capture motion between neighbouring frames implicitly. To enforce temporal consistency within a video sequence, we use a spatial-temporal transformer to jointly refine the short-term predictions. Extensive experiments on existing image and VCOD benchmarks demonstrate the architectural effectiveness of our approach. We further collect a large-scale VCOD dataset named MoCA-Mask, with hand-annotated pixel-level ground-truth masks, and construct a comprehensive VCOD benchmark that includes previous methods. Videos and code can be found at: https://anonymous.4open.science/r/long-short-vcod-C0AF/README.md.
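
To illustrate what a dense correlation volume between neighbouring frames can look like, here is a minimal NumPy sketch. It is not the authors' implementation: the function names `correlation_volume` and `best_match`, the scaled dot-product similarity, and the assumption that per-frame feature maps of shape (C, H, W) have already been extracted are all illustrative choices.

```python
import numpy as np

def correlation_volume(feat_a, feat_b):
    """All-pairs similarity between two feature maps of shape (C, H, W).

    Returns an (H, W, H, W) array whose entry [i, j, k, l] is the scaled
    dot product between pixel (i, j) of frame A and pixel (k, l) of frame B.
    Hypothetical helper for illustration only.
    """
    c, h, w = feat_a.shape
    a = feat_a.reshape(c, h * w)      # (C, H*W)
    b = feat_b.reshape(c, h * w)      # (C, H*W)
    corr = a.T @ b / np.sqrt(c)       # (H*W, H*W)
    return corr.reshape(h, w, h, w)

def best_match(corr):
    """For each pixel of frame A, the (row, col) in frame B with highest correlation."""
    h, w = corr.shape[:2]
    flat_idx = corr.reshape(h, w, -1).argmax(axis=-1)              # (H, W)
    return np.stack(np.unravel_index(flat_idx, (h, w)), axis=-1)   # (H, W, 2)

# Toy check: frame B is frame A shifted by (1, 2) pixels with wrap-around,
# so the strongest match for each pixel should sit at that offset.
rng = np.random.default_rng(0)
frame_a = rng.standard_normal((256, 6, 8))
frame_b = np.roll(frame_a, shift=(1, 2), axis=(1, 2))
matches = best_match(correlation_volume(frame_a, frame_b))
```

In this sketch the motion is never computed as an explicit flow field; it is encoded only in where the correlation volume peaks, which is the sense in which such a representation is implicit.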
