Inspired by the observation that audiences have different visual preferences for foreground and background objects, we propose, for the first time, to use saliency masks when evaluating video frame interpolation.
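As an illustration of how a saliency mask could enter an evaluation metric, here is a minimal sketch of a saliency-weighted PSNR. The weighting scheme, the function name `saliency_weighted_psnr`, and the assumption that saliency values lie in [0, 1] are ours for illustration; the text above does not specify the exact metric.

```python
import math


def saliency_weighted_psnr(pred, target, saliency, max_value=255.0):
    """PSNR in which each pixel's squared error is weighted by its saliency.

    pred, target, saliency: equally sized 2D lists (rows of numbers).
    Saliency values are assumed (hypothetically) to lie in [0, 1].
    """
    weighted_error = 0.0
    total_weight = 0.0
    for pred_row, target_row, sal_row in zip(pred, target, saliency):
        for p, t, w in zip(pred_row, target_row, sal_row):
            weighted_error += w * (p - t) ** 2
            total_weight += w
    if total_weight == 0:
        raise ValueError("saliency mask has zero total weight")
    mse = weighted_error / total_weight
    if mse == 0:
        return math.inf  # no error inside the weighted region
    return 10.0 * math.log10(max_value ** 2 / mse)


# Toy 2x2 frame: the only error sits in the bottom-right pixel.
pred = [[10, 20], [30, 40]]
target = [[10, 20], [30, 50]]
foreground = [[1, 1], [1, 0]]  # hypothetical salient region
background = [[0, 0], [0, 1]]  # its complement

fg_psnr = saliency_weighted_psnr(pred, target, foreground)
bg_psnr = saliency_weighted_psnr(pred, target, background)
```

Evaluating the same frame with the mask and with its complement separates foreground quality from background quality, which a single whole-frame score cannot do.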
Physical-world adversarial attacks based on universal adversarial patches have been shown to mislead deep convolutional neural networks (CNNs), exposing the vulnerability of real-world visual classification systems built on CNNs.
To solve this problem, this work proposes an unsupervised foreground segmentation method based on semantic-apparent feature fusion (SAFF).
Weakly supervised semantic segmentation has been proposed to make the labeling process lightweight.
In this paper, we propose an iterative algorithm to learn such pairwise relations. It consists of two branches: a unary segmentation network, which learns the label probabilities for each pixel, and a pairwise affinity network, which learns an affinity matrix and refines the probability map generated by the unary network.
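One common way an affinity matrix can refine a probability map is to row-normalize it and use it to average each pixel's label distribution with those of related pixels. The sketch below assumes that scheme; the function name `refine_probabilities`, the `steps` parameter, and the toy values are hypothetical and not taken from the paper.

```python
def refine_probabilities(probs, affinity, steps=1):
    """Refine per-pixel label probabilities with a pairwise affinity matrix.

    probs: list of N label distributions, each a list of C floats.
    affinity: N x N list of non-negative pairwise affinities.
    steps: number of propagation rounds (hypothetical parameter).
    """
    # Row-normalize so each pixel takes a weighted average over its neighbors.
    transition = []
    for row in affinity:
        row_sum = sum(row)
        if row_sum == 0:
            raise ValueError("every pixel needs at least one non-zero affinity")
        transition.append([a / row_sum for a in row])

    n_labels = len(probs[0])
    for _ in range(steps):
        probs = [
            [
                sum(t_ij * probs[j][c] for j, t_ij in enumerate(t_row))
                for c in range(n_labels)
            ]
            for t_row in transition
        ]
    return probs


# Three pixels, two labels. Pixels 0 and 1 are mutually affine; pixel 2 is isolated.
unary = [[0.9, 0.1], [0.4, 0.6], [0.1, 0.9]]
affinity = [
    [1.0, 1.0, 0.0],
    [1.0, 1.0, 0.0],
    [0.0, 0.0, 1.0],
]
refined = refine_probabilities(unary, affinity)
```

Because the normalized rows sum to one, each refined row is still a valid probability distribution; here the uncertain pixel 1 is pulled toward the label of its confident neighbor, pixel 0.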
To solve this problem, we add a box regression module to the weakly supervised object detection network and propose a proposal scoring network (PSNet) to supervise it.
Ranked #12 on Weakly Supervised Object Detection on PASCAL VOC 2007
Pretraining reinforcement learning methods with demonstrations has been an important concept in the study of reinforcement learning, since existing reinforcement learning algorithms spend a large amount of computing power on online simulations.
Then in the top-down step, the refined object regions are used as supervision to train the segmentation network and to predict object masks.
We apply our method to two typical actor-critic reinforcement learning algorithms, DDPG and ACER, and demonstrate experimentally that our method not only outperforms the RL algorithms without pretraining, but is also more simulation-efficient.
We encode the sparse 3D point cloud with a compact multi-view representation.
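To make the idea of a compact view-based encoding concrete, the sketch below projects a point cloud onto a bird's-eye-view grid and stores the maximum height per cell. This shows only one possible view and one possible feature; the choice of view, the grid extents, the cell size, and the name `birds_eye_height_map` are assumptions for illustration, not details given in the text.

```python
import math


def birds_eye_height_map(points, x_range, y_range, cell_size):
    """Project 3D points onto a top-down grid, keeping the max height per cell.

    points: iterable of (x, y, z) tuples.
    x_range, y_range: (min, max) extents of the grid (hypothetical values).
    Returns a 2D list; cells with no points hold None.
    """
    x_min, x_max = x_range
    y_min, y_max = y_range
    n_rows = math.ceil((x_max - x_min) / cell_size)
    n_cols = math.ceil((y_max - y_min) / cell_size)
    grid = [[None] * n_cols for _ in range(n_rows)]

    for x, y, z in points:
        # Discard points that fall outside the region of interest.
        if not (x_min <= x < x_max and y_min <= y < y_max):
            continue
        # Clamp to guard against floating-point rounding at the upper edge.
        row = min(int((x - x_min) / cell_size), n_rows - 1)
        col = min(int((y - y_min) / cell_size), n_cols - 1)
        if grid[row][col] is None or z > grid[row][col]:
            grid[row][col] = z
    return grid


cloud = [
    (0.5, 0.5, 1.0),
    (0.7, 0.2, 2.0),   # same cell as the first point, but higher
    (1.5, 0.5, -0.3),
    (5.0, 5.0, 9.0),   # outside the grid, ignored
]
height_map = birds_eye_height_map(cloud, (0.0, 2.0), (0.0, 2.0), 1.0)
```

The resulting dense 2D grid can be fed to an ordinary image CNN, which is the usual motivation for replacing a sparse point set with view-based maps.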
In this paper, we propose a novel edge-preserving, multi-scale contextual neural network for salient object detection.
We then exploit a CNN on top of these proposals to perform object detection.
The focus of this paper is on proposal generation.
Ranked #8 on Vehicle Pose Estimation on KITTI Cars Hard
The goal of this paper is to generate high-quality 3D object proposals in the context of autonomous driving.
Ranked #10 on Vehicle Pose Estimation on KITTI Cars Hard
Based on the characteristics of the superpixel tightness distribution, we propose an effective method, multi-thresholding straddling expansion (MTSE), to reduce localization bias via fast diversification.