# Channel Pruning for Accelerating Very Deep Neural Networks

In this paper, we introduce a new channel pruning method to accelerate very deep convolutional neural networks. Given a trained CNN model, we propose an iterative two-step algorithm that prunes each layer effectively, using LASSO-regression-based channel selection followed by least-squares reconstruction.

726
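The two steps named in the channel pruning abstract above can be sketched in NumPy for a single layer. This is a minimal illustration, not the authors' implementation: the function names `lasso_ista` and `prune_layer`, the `(N, c, k)` patch layout, the ISTA solver, and the `lam_ratio` heuristic for the penalty are all assumptions made for the example, and only one pass of the two steps is shown rather than the iterative schedule.

```python
import numpy as np

def lasso_ista(A, y, lam, n_iter=300):
    """Minimise 0.5*||y - A b||^2 + lam*||b||_1 by proximal gradient (ISTA)."""
    L = np.linalg.norm(A, 2) ** 2              # Lipschitz constant of the gradient
    b = np.zeros(A.shape[1])
    for _ in range(n_iter):
        z = b - A.T @ (A @ b - y) / L          # gradient step on the squared error
        b = np.sign(z) * np.maximum(np.abs(z) - lam / L, 0.0)  # soft threshold
    return b

def prune_layer(X, W, lam_ratio=0.01):
    """One selection + reconstruction pass (hypothetical helper).

    X: (N, c, k) sampled input patches, one k-vector per channel.
    W: (n, c, k) filters of the layer being pruned.
    """
    N, c, k = X.shape
    n = W.shape[0]
    Y = np.einsum("nck,mck->nm", X, W)         # original layer output, (N, n)
    Z = np.einsum("nck,mck->cnm", X, W)        # per-channel contributions, (c, N, n)
    A = Z.reshape(c, -1).T                     # one column per input channel
    lam = lam_ratio * np.max(np.abs(A.T @ Y.ravel()))  # assumed penalty heuristic
    beta = lasso_ista(A, Y.ravel(), lam)
    keep = np.flatnonzero(np.abs(beta) > 1e-8) # step 1: channels LASSO retains

    # Step 2: refit the remaining weights by least squares.
    X_kept = X[:, keep, :].reshape(N, -1)
    W_flat, *_ = np.linalg.lstsq(X_kept, Y, rcond=None)
    W_new = W_flat.T.reshape(n, keep.size, k)
    return keep, W_new

rng = np.random.default_rng(0)
X = rng.standard_normal((200, 6, 9))
W = rng.standard_normal((8, 6, 9))
W[:, [1, 4], :] = 0.0                          # two channels that contribute nothing
keep, W_new = prune_layer(X, W)
```

In this toy setup the two zeroed channels are dropped and the refitted weights reproduce the original output; on real feature maps the reconstruction is only approximate, and the penalty would be tuned to reach a target channel count.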

# DSOD: Learning Deeply Supervised Object Detectors from Scratch

State-of-the-art object detectors rely heavily on off-the-shelf networks pre-trained on large-scale classification datasets like ImageNet, which incurs learning bias due to the differences in both the loss functions and the category distributions between classification and detection tasks.

625

# Video Frame Interpolation via Adaptive Separable Convolution

Our method develops a deep fully convolutional neural network that takes two input frames and estimates pairs of 1D kernels for all pixels simultaneously.

548
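To make the "pairs of 1D kernels" idea in the abstract above concrete, the following sketch applies per-pixel separable kernels to a single frame. It is an illustration under stated assumptions: the function name `apply_separable_local`, the array layout, the edge padding, and the kernel size `n = 5` are hypothetical, and the kernels here are random rather than predicted by a network. Each pixel's vertical and horizontal kernels define an implicit `n × n` kernel through their outer product, so `2n` values are stored per pixel instead of `n²`.

```python
import numpy as np

def apply_separable_local(frame, kv, kh):
    """Filter `frame` with a different separable kernel at every pixel.

    frame: (H, W) grayscale image.
    kv, kh: (H, W, n) per-pixel vertical and horizontal 1D kernels, n odd.
    """
    n = kv.shape[-1]
    padded = np.pad(frame, n // 2, mode="edge")
    # patches[y, x] is the n x n neighbourhood centred on pixel (y, x).
    patches = np.lib.stride_tricks.sliding_window_view(padded, (n, n))
    # Weight each neighbourhood by the outer product kv[y, x] * kh[y, x]^T.
    return np.einsum("yxi,yxj,yxij->yx", kv, kh, patches)

rng = np.random.default_rng(0)
frame = rng.random((6, 7))
n = 5
kv = rng.random((6, 7, n))
kh = rng.random((6, 7, n))
out = apply_separable_local(frame, kv, kh)
```

For interpolation, the same operation would be applied to each of the two input frames with its own kernel pair and the results summed; that step is omitted here.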

# Flow-Guided Feature Aggregation for Video Object Detection

The accuracy of detection suffers from degenerated object appearances in videos, such as motion blur, video defocus, and rare poses.

500

# Temporal Action Detection with Structured Segment Networks

Detecting actions in untrimmed videos is an important yet challenging task.

436

# Grad-CAM: Visual Explanations from Deep Networks via Gradient-based Localization

We propose a technique for producing "visual explanations" for decisions from a large class of CNN-based models, making them more transparent.

409

# Arbitrary Style Transfer in Real-time with Adaptive Instance Normalization

Gatys et al. recently introduced a neural algorithm that renders a content image in the style of another image, achieving so-called style transfer.

384

# Towards 3D Human Pose Estimation in the Wild: a Weakly-supervised Approach

We propose a weakly-supervised transfer learning method that uses mixed 2D and 3D labels in a unified deep neural network with a two-stage cascaded structure.

349

# Learning Spatio-Temporal Representation with Pseudo-3D Residual Networks

In this paper, we devise multiple variants of bottleneck building blocks in a residual learning framework by simulating $3\times3\times3$ convolutions with $1\times3\times3$ convolutional filters in the spatial domain (equivalent to a 2D CNN) plus $3\times1\times1$ convolutions that construct temporal connections between adjacent feature maps in time.

283
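The saving from the decomposition described in the abstract above can be checked with simple arithmetic. The sketch below counts weights only (biases ignored) and assumes, for illustration, that the intermediate channel width equals the output width; the function names and the channel count of 64 are hypothetical.

```python
def conv3d_params(c_in, c_out, kt, kh, kw):
    """Number of weights in a 3D convolution with a kt x kh x kw kernel."""
    return c_in * c_out * kt * kh * kw

def pseudo3d_params(c_in, c_out):
    """1x3x3 spatial convolution followed by a 3x1x1 temporal convolution."""
    spatial = conv3d_params(c_in, c_out, 1, 3, 3)
    temporal = conv3d_params(c_out, c_out, 3, 1, 1)
    return spatial + temporal

full = conv3d_params(64, 64, 3, 3, 3)      # 110592 weights
decomposed = pseudo3d_params(64, 64)       # 49152 weights
```

With equal channel widths the decomposed pair uses 12 weights per input-output channel pair instead of 27, while stacking the two convolutions still covers a $3\times3\times3$ receptive field.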

# StackGAN: Text to Photo-realistic Image Synthesis with Stacked Generative Adversarial Networks

Synthesizing high-quality images from text descriptions is a challenging problem in computer vision and has many practical applications.

269