In this paper, we introduce a new channel pruning method to accelerate very deep convolutional neural networks. Given a trained CNN model, we propose an iterative two-step algorithm that effectively prunes each layer through LASSO-regression-based channel selection followed by least-squares reconstruction.
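The channel-selection step described above can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes each channel's contribution to the layer output has already been flattened into a vector (the hypothetical `contributions` list), solves the LASSO problem with plain coordinate descent, and omits the least-squares reconstruction step. The names `soft_threshold`, `lasso_channel_selection`, and `lam` are illustrative.

```python
def soft_threshold(value, lam):
    """Shrink value toward zero by lam (proximal step of the L1 penalty)."""
    if value > lam:
        return value - lam
    if value < -lam:
        return value + lam
    return 0.0


def lasso_channel_selection(contributions, target, lam, n_iters=50):
    """Coordinate descent for
        min_beta 0.5 * ||target - sum_c beta_c * X_c||^2 + lam * ||beta||_1

    contributions[c] is the (hypothetical) flattened output of channel c alone.
    Channels whose coefficient ends at exactly 0.0 are candidates for pruning.
    """
    n_channels = len(contributions)
    n_samples = len(target)
    beta = [0.0] * n_channels
    for _ in range(n_iters):
        for c in range(n_channels):
            # Residual with channel c's own contribution left out.
            residual = [
                target[i] - sum(beta[j] * contributions[j][i]
                                for j in range(n_channels) if j != c)
                for i in range(n_samples)
            ]
            rho = sum(contributions[c][i] * residual[i] for i in range(n_samples))
            norm_sq = sum(x * x for x in contributions[c])
            beta[c] = soft_threshold(rho, lam) / norm_sq if norm_sq > 0 else 0.0
    return beta


# Toy data: three channels with orthogonal contributions; channel 1 barely matters.
contributions = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
target = [3.0, 0.1, 2.0]
beta = lasso_channel_selection(contributions, target, lam=0.5)
kept = [c for c, b in enumerate(beta) if b != 0.0]
print(kept)
```

A larger `lam` drives more coefficients to zero and therefore prunes more channels. In a full pipeline, the remaining weights would then be refit by least squares to reduce the reconstruction error.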
State-of-the-art object detectors rely heavily on off-the-shelf networks pre-trained on large-scale classification datasets like ImageNet, which incurs learning bias due to the differences in both the loss functions and the category distributions between classification and detection tasks.
#6 best model for Object Detection on COCO
Our method develops a deep fully convolutional neural network that takes two input frames and estimates pairs of 1D kernels for all pixels simultaneously.
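To make the idea of a pair of 1D kernels concrete, the sketch below shows how one vertical and one horizontal kernel of length n stand in for an n×n 2D kernel (their outer product) while needing only 2n values. It assumes that an output pixel is the sum of the two input-frame patches, each filtered with its own kernel pair. That combination rule and the function names are assumptions made for illustration, and the network that would predict the kernels is not shown.

```python
def apply_separable_kernel(patch, k_vertical, k_horizontal):
    """Filter a square patch with the 2D kernel given by the outer product
    of k_vertical and k_horizontal, without materialising that 2D kernel."""
    total = 0.0
    for i, kv in enumerate(k_vertical):
        for j, kh in enumerate(k_horizontal):
            total += kv * kh * patch[i][j]
    return total


def interpolate_pixel(patch1, patch2, kernels1, kernels2):
    """Assumed combination rule: sum of both frames' filtered patches."""
    return (apply_separable_kernel(patch1, *kernels1)
            + apply_separable_kernel(patch2, *kernels2))


# Toy 3x3 patches centred on the same pixel in the two input frames.
patch1 = [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0], [7.0, 8.0, 9.0]]
patch2 = [[9.0, 8.0, 7.0], [6.0, 7.0, 4.0], [3.0, 2.0, 1.0]]

# Each pair picks the centre pixel with weight 0.5, i.e. a plain frame average.
kernels1 = ([0.0, 0.5, 0.0], [0.0, 1.0, 0.0])
kernels2 = ([0.0, 0.5, 0.0], [0.0, 1.0, 0.0])

pixel = interpolate_pixel(patch1, patch2, kernels1, kernels2)
n = 3
params_separable = 2 * n   # one vertical + one horizontal kernel
params_full = n * n        # an unconstrained 2D kernel
print(pixel, params_separable, params_full)
```

The saving grows with kernel size: for a length-n pair the per-pixel cost is 2n values instead of n², which is what makes estimating kernels for every pixel at once more tractable.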
The accuracy of detection suffers from degenerated object appearances in videos, e.g., motion blur, video defocus, rare poses, etc.
Detecting actions in untrimmed videos is an important yet challenging task.
We propose a technique for producing "visual explanations" for decisions from a large class of CNN-based models, making them more transparent.
Gatys et al. recently introduced a neural algorithm that renders a content image in the style of another image, achieving so-called style transfer.
We propose a weakly-supervised transfer learning method that uses mixed 2D and 3D labels in a unified deep neural network with a two-stage cascaded structure.
#4 best model for 3D Human Pose Estimation on Human3.6M
In this paper, we devise multiple variants of bottleneck building blocks in a residual learning framework by simulating $3\times3\times3$ convolutions with $1\times3\times3$ convolutional filters in the spatial domain (equivalent to a 2D CNN) plus $3\times1\times1$ convolutions that construct temporal connections between adjacent feature maps in time.
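A quick weight count shows why this factorization is attractive. The sketch below compares a full $3\times3\times3$ convolution with a $1\times3\times3$ spatial convolution followed by a $3\times1\times1$ temporal one. The channel width of 64, the serial ordering of the two convolutions, and the omission of bias terms are assumptions chosen for illustration rather than details taken from the paper.

```python
def conv3d_params(c_in, c_out, k_t, k_h, k_w):
    """Number of weights in a 3D convolution (bias terms ignored)."""
    return c_in * c_out * k_t * k_h * k_w


channels = 64  # assumed width, identical for input, intermediate and output

# Full spatio-temporal 3x3x3 convolution.
full = conv3d_params(channels, channels, 3, 3, 3)

# Factorised version: 1x3x3 spatial convolution, then 3x1x1 temporal convolution.
spatial = conv3d_params(channels, channels, 1, 3, 3)
temporal = conv3d_params(channels, channels, 3, 1, 1)
factored = spatial + temporal

print(full, factored, factored / full)
```

With equal channel widths the factorized pair uses 12/27 of the weights of the full kernel, regardless of the width chosen.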
Synthesizing high-quality images from text descriptions is a challenging problem in computer vision and has many practical applications.