Techniques for manipulating images are advancing rapidly; while these are helpful for many useful tasks, they also pose a threat to society with their ability to create believable misinformation.
However, existing approaches which rely only on image-level class labels predominantly suffer from errors due to (a) partial segmentation of objects and (b) missing object predictions.
We present a novel framework, Spatial Pyramid Attention Network (SPAN), for detection and localization of multiple types of image manipulations.
Weakly supervised object detection aims at reducing the amount of supervision required to train detection models.
Ranked #1 on Weakly Supervised Object Detection on Charades
Performance on the five tasks of depth estimation, optical flow estimation, odometry, moving object segmentation and scene flow estimation shows that our approach outperforms other SoTA methods.
Then the whole scene is decomposed into moving foreground and static background by comparing the estimated optical flow and rigid flow derived from the depth and ego-motion.
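The comparison described above can be illustrated with a minimal sketch. It assumes both flows are given as H x W grids of (dx, dy) displacements in pixels, and it marks a pixel as moving when the two flows disagree by more than a fixed threshold. The function name `moving_mask`, the grid layout, and the threshold value are assumptions made for illustration; the snippet does not state how the paper itself performs the comparison.

```python
import math

def moving_mask(optical_flow, rigid_flow, threshold=1.0):
    """Return an H x W grid of booleans: True where a pixel looks like moving foreground.

    optical_flow, rigid_flow: H x W nested lists of (dx, dy) tuples, in pixels.
    threshold: assumed residual magnitude (pixels) above which the flows disagree.
    """
    mask = []
    for flow_row, rigid_row in zip(optical_flow, rigid_flow):
        mask_row = []
        for (fx, fy), (rx, ry) in zip(flow_row, rigid_row):
            # Rigid flow explains motion caused by the camera alone, so a large
            # residual suggests the pixel belongs to an independently moving object.
            residual = math.hypot(fx - rx, fy - ry)
            mask_row.append(residual > threshold)
        mask.append(mask_row)
    return mask

# Toy 2 x 2 example: only the bottom-right pixel deviates from the rigid flow.
optical = [[(1.0, 0.0), (1.0, 0.0)],
           [(1.0, 0.0), (4.0, 3.0)]]
rigid = [[(1.0, 0.0), (1.0, 0.0)],
         [(1.0, 0.0), (1.0, 0.0)]]
mask = moving_mask(optical, rigid)
```

A hard threshold is the simplest possible choice here; a soft mask derived from the residual would be an equally plausible alternative.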
The four types of information, i.e., 2D flow, camera pose, segment mask and depth maps, are integrated into a differentiable holistic 3D motion parser (HMP), where per-pixel 3D motion for rigid background and moving objects is recovered.
Especially on the KITTI dataset, where abundant unlabeled samples exist, our unsupervised method outperforms its counterpart trained with supervised learning.
Learning to reconstruct depth from a single image by watching unlabeled videos via a deep convolutional network (DCN) has attracted significant attention in recent years.
In this work, we address the problem of spatio-temporal action detection in temporally untrimmed videos.
RED takes multiple history representations as input and learns to anticipate a sequence of future representations.
For evaluation, we adopt the TaCoS dataset and build a new dataset for this task, called Charades-STA, on top of Charades by adding sentence temporal annotations.
CBR uses temporal coordinate regression to refine the temporal boundaries of the sliding windows.
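As a rough illustration of boundary refinement by coordinate regression, the sketch below shifts a sliding window's start and end by two predicted offsets and clamps the result to the video extent. The function `refine_window`, its parameters, and the use of seconds as the unit are hypothetical; in practice the offsets would come from a trained regressor, which is not shown, and the snippet does not specify the exact parameterization CBR uses.

```python
def refine_window(start, end, start_offset, end_offset, video_length=None):
    """Shift a temporal window by regressed offsets (all values in seconds).

    start_offset and end_offset stand in for the outputs of a regression model.
    video_length, if given, clamps the refined window to [0, video_length].
    """
    new_start = start + start_offset
    new_end = end + end_offset
    if video_length is not None:
        new_start = max(0.0, min(new_start, video_length))
        new_end = max(0.0, min(new_end, video_length))
    # Keep the window well-formed even if the offsets cross each other.
    if new_end < new_start:
        new_start, new_end = new_end, new_start
    return new_start, new_end

# A 10 s - 20 s window in a 21 s video, nudged earlier at the start and later at the end.
refined = refine_window(10.0, 20.0, -1.5, 2.0, video_length=21.0)
```

Here the refined end would be 22.0 s, so it is clamped to the 21.0 s video length.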
Ranked #13 on Temporal Action Localization on THUMOS’14
Temporal Action Proposal (TAP) generation is an important problem, as fast and accurate extraction of semantically important (e.g., human actions) segments from untrimmed videos is an important step for large-scale video analysis.
Ranked #8 on Action Recognition on THUMOS’14