Search Results for author: Wanli Ouyang

Found 158 papers, 65 papers with code

TransVG++: End-to-End Visual Grounding with Language Conditioned Vision Transformer

1 code implementation 14 Jun 2022 Jiajun Deng, Zhengyuan Yang, Daqing Liu, Tianlang Chen, Wengang Zhou, Yanyong Zhang, Houqiang Li, Wanli Ouyang

For another, we devise Language Conditioned Vision Transformer that removes external fusion modules and reuses the uni-modal ViT for vision-language fusion at the intermediate layers.

Visual Grounding

Better Teacher Better Student: Dynamic Prior Knowledge for Knowledge Distillation

no code implementations 13 Jun 2022 Zengyu Qiu, Xinzhu Ma, Kunlin Yang, Chunya Liu, Jun Hou, Shuai Yi, Wanli Ouyang

More importantly, our DPK makes the performance of the student model positively correlated with that of the teacher model, which means that we can further boost the accuracy of students by applying larger teachers.

Knowledge Distillation object-detection +1

Domain Invariant Masked Autoencoders for Self-supervised Learning from Multi-domains

no code implementations 10 May 2022 Haiyang Yang, Meilin Chen, Yizhou Wang, Shixiang Tang, Feng Zhu, Lei Bai, Rui Zhao, Wanli Ouyang

While recent self-supervised learning methods have achieved good performances with evaluation set on the same domain as the training set, they will have an undesirable performance decrease when tested on a different domain.

Self-Supervised Learning

Unsupervised Learning of Accurate Siamese Tracking

1 code implementation CVPR 2022 Qiuhong Shen, Lei Qiao, Jinyang Guo, Peixia Li, Xin Li, Bo Li, Weitao Feng, Weihao Gan, Wei Wu, Wanli Ouyang

As unlimited self-supervision signals can be obtained by tracking a video along a cycle in time, we investigate evolving a Siamese tracker by tracking videos forward-backward.

Visual Object Tracking

SeCo: Separating Unknown Musical Visual Sounds with Consistency Guidance

no code implementations 25 Mar 2022 Xinchi Zhou, Dongzhan Zhou, Wanli Ouyang, Hang Zhou, Ziwei Liu, Di Hu

Recent years have witnessed the success of deep learning on the visual sound separation task.

DR.VIC: Decomposition and Reasoning for Video Individual Counting

1 code implementation CVPR 2022 Tao Han, Lei Bai, Junyu Gao, Qi Wang, Wanli Ouyang

Instead of relying on the Multiple Object Tracking (MOT) techniques, we propose to solve the problem by decomposing all pedestrians into the initial pedestrians who existed in the first frame and the new pedestrians with separate identities in each following frame.

Crowd Counting Density Estimation +2

Backbone is All Your Need: A Simplified Architecture for Visual Object Tracking

no code implementations 10 Mar 2022 BoYu Chen, Peixia Li, Lei Bai, Lei Qiao, Qiuhong Shen, Bo Li, Weihao Gan, Wei Wu, Wanli Ouyang

Exploiting a general-purpose neural architecture to replace hand-wired designs or inductive biases has recently drawn extensive interest.

Visual Object Tracking

Multi-class Token Transformer for Weakly Supervised Semantic Segmentation

1 code implementation CVPR 2022 Lian Xu, Wanli Ouyang, Mohammed Bennamoun, Farid Boussaid, Dan Xu

To this end, we propose a Multi-class Token Transformer, termed as MCTformer, which uses multiple class tokens to learn interactions between the class tokens and the patch tokens.

Object Localization Weakly-Supervised Semantic Segmentation

$β$-DARTS: Beta-Decay Regularization for Differentiable Architecture Search

1 code implementation 3 Mar 2022 Peng Ye, Baopu Li, Yikang Li, Tao Chen, Jiayuan Fan, Wanli Ouyang

Neural Architecture Search (NAS) has attracted increasingly more attention in recent years because of its capability to design deep neural networks automatically.

Neural Architecture Search

3D Object Detection from Images for Autonomous Driving: A Survey

1 code implementation 7 Feb 2022 Xinzhu Ma, Wanli Ouyang, Andrea Simonelli, Elisa Ricci

3D object detection from images, one of the fundamental and challenging problems in autonomous driving, has received increasing attention from both industry and academia in recent years.

3D Object Detection Autonomous Driving +1

Trajectory Forecasting from Detection with Uncertainty-Aware Motion Encoding

no code implementations 3 Feb 2022 Pu Zhang, Lei Bai, Jianru Xue, Jianwu Fang, Nanning Zheng, Wanli Ouyang

Trajectories obtained from object detection and tracking are inevitably noisy, which could cause serious forecasting errors to predictors built on ground truth trajectories.

object-detection Object Detection +1

Pseudo-Labeled Auto-Curriculum Learning for Semi-Supervised Keypoint Localization

no code implementations ICLR 2022 Can Wang, Sheng Jin, Yingda Guan, Wentao Liu, Chen Qian, Ping Luo, Wanli Ouyang

PL approaches apply pseudo-labels to unlabeled data, and then train the model with a combination of the labeled and pseudo-labeled data iteratively.

RePre: Improving Self-Supervised Vision Transformer with Reconstructive Pre-training

no code implementations 18 Jan 2022 Luya Wang, Feng Liang, Yangguang Li, Honggang Zhang, Wanli Ouyang, Jing Shao

Recently, self-supervised vision transformers have attracted unprecedented attention for their impressive representation learning ability.

Contrastive Learning Representation Learning

Accelerating Neural Network Optimization Through an Automated Control Theory Lens

no code implementations CVPR 2022 Jiahao Wang, Baoyuan Wu, Rui Su, Mingdeng Cao, Shuwei Shi, Wanli Ouyang, Yujiu Yang

We conduct experiments both from a control theory lens through a phase locus verification and from a network training lens on several models, including CNNs, Transformers, MLPs, and on benchmark datasets.

$β$-DARTS: Beta-Decay Regularization for Differentiable Architecture Search

1 code implementation CVPR 2022 Peng Ye, Baopu Li, Yikang Li, Tao Chen, Jiayuan Fan, Wanli Ouyang

Neural Architecture Search (NAS) has attracted increasingly more attention in recent years because of its capability to design deep neural networks automatically.

Neural Architecture Search

A Continuous Mapping For Augmentation Design

no code implementations NeurIPS 2021 Keyu Tian, Chen Lin, Ser Nam Lim, Wanli Ouyang, Puneet Dokania, Philip Torr

Automated data augmentation (ADA) techniques have played an important role in boosting the performance of deep models.

Data Augmentation

Unsupervised Representation Learning for 3D Point Cloud Data

no code implementations 13 Oct 2021 Jincen Jiang, Xuequan Lu, Wanli Ouyang, Meili Wang

Though a number of point cloud learning methods have been proposed to handle unordered points, most of them are supervised and require labels for training.

3D Object Classification Classification +2

Supervision Exists Everywhere: A Data Efficient Contrastive Language-Image Pre-training Paradigm

2 code implementations ICLR 2022 Yangguang Li, Feng Liang, Lichen Zhao, Yufeng Cui, Wanli Ouyang, Jing Shao, Fengwei Yu, Junjie Yan

Recently, large-scale Contrastive Language-Image Pre-training (CLIP) has attracted unprecedented attention for its impressive zero-shot recognition ability and excellent transferability to downstream tasks.

Zero-Shot Learning

Deep Instance Segmentation with Automotive Radar Detection Points

no code implementations 5 Oct 2021 Jianan Liu, Weiyi Xiong, Liping Bai, Yuxuan Xia, Tao Huang, Wanli Ouyang, Bing Zhu

Automotive radar provides reliable environmental perception in all-weather conditions with affordable cost, but it hardly supplies semantic and geometry information due to the sparsity of radar detection points.

Autonomous Driving Instance Segmentation +1

Towards Balanced Learning for Instance Recognition

no code implementations 23 Aug 2021 Jiangmiao Pang, Kai Chen, Qi Li, Zhihai Xu, Huajun Feng, Jianping Shi, Wanli Ouyang, Dahua Lin

In this work, we carefully revisit the standard training practice of detectors, and find that the detection performance is often limited by the imbalance during the training process, which generally consists in three levels - sample level, feature level, and objective level.

BN-NAS: Neural Architecture Search with Batch Normalization

1 code implementation ICCV 2021 BoYu Chen, Peixia Li, Baopu Li, Chen Lin, Chuming Li, Ming Sun, Junjie Yan, Wanli Ouyang

We present BN-NAS, neural architecture search with Batch Normalization (BN-NAS), to accelerate neural architecture search (NAS).

Neural Architecture Search

PSViT: Better Vision Transformer via Token Pooling and Attention Sharing

no code implementations 7 Aug 2021 BoYu Chen, Peixia Li, Baopu Li, Chuming Li, Lei Bai, Chen Lin, Ming Sun, Junjie Yan, Wanli Ouyang

Then, a compact set of the possible combinations for different token pooling and attention sharing mechanisms is constructed.

Geometry Uncertainty Projection Network for Monocular 3D Object Detection

1 code implementation ICCV 2021 Yan Lu, Xinzhu Ma, Lei Yang, Tianzhu Zhang, Yating Liu, Qi Chu, Junjie Yan, Wanli Ouyang

In this paper, we propose a Geometry Uncertainty Projection Network (GUP Net) to tackle the error amplification problem at both inference and training stages.

Depth Estimation Monocular 3D Object Detection +1

Leveraging Auxiliary Tasks with Affinity Learning for Weakly Supervised Semantic Segmentation

1 code implementation ICCV 2021 Lian Xu, Wanli Ouyang, Mohammed Bennamoun, Farid Boussaid, Ferdous Sohel, Dan Xu

Motivated by the significant inter-task correlation, we propose a novel weakly supervised multi-task framework termed as AuxSegNet, to leverage saliency detection and multi-label image classification as auxiliary tasks to improve the primary task of semantic segmentation using only image-level ground-truth labels.

Auxiliary Learning Multi-Label Image Classification +3

Mutual CRF-GNN for Few-Shot Learning

no code implementations CVPR 2021 Shixiang Tang, Dapeng Chen, Lei Bai, Kaijian Liu, Yixiao Ge, Wanli Ouyang

In this MCGN, the labels and features of support data are used by the CRF for inferring GNN affinities in a principled and probabilistic way.

Few-Shot Learning

Layerwise Optimization by Gradient Decomposition for Continual Learning

no code implementations CVPR 2021 Shixiang Tang, Dapeng Chen, Jinguo Zhu, Shijie Yu, Wanli Ouyang

The gradient for update should be close to the gradient of the new task, consistent with the gradients shared by all old tasks, and orthogonal to the space spanned by the gradients specific to the old tasks.

Continual Learning
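
The orthogonality condition in the snippet above can be illustrated with a small projection. This is only a sketch of one of the three stated conditions, assuming the task-specific gradients of the old tasks are already available as plain vectors; the function names are hypothetical and the paper's actual optimization procedure is not reproduced here.

```python
def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def orthonormal_basis(vectors, eps=1e-12):
    """Gram-Schmidt: orthonormal basis for the span of `vectors`."""
    basis = []
    for v in vectors:
        w = list(v)
        for b in basis:
            c = dot(w, b)
            w = [wi - c * bi for wi, bi in zip(w, b)]
        norm = dot(w, w) ** 0.5
        if norm > eps:  # skip vectors already inside the span
            basis.append([wi / norm for wi in w])
    return basis


def project_out(grad, specific_grads):
    """Remove from `grad` every component lying in the span of
    `specific_grads` (hypothetical old-task-specific gradients)."""
    g = list(grad)
    for b in orthonormal_basis(specific_grads):
        c = dot(g, b)
        g = [gi - c * bi for gi, bi in zip(g, b)]
    return g


new_task_grad = [1.0, 2.0, 3.0]
old_specific = [[1.0, 0.0, 0.0], [1.0, 1.0, 0.0]]
update = project_out(new_task_grad, old_specific)
```

The resulting update keeps only the part of the new-task gradient that does not interfere with the directions spanned by the old-task-specific gradients.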

Delving into Localization Errors for Monocular 3D Object Detection

1 code implementation CVPR 2021 Xinzhu Ma, Yinmin Zhang, Dan Xu, Dongzhan Zhou, Shuai Yi, Haojie Li, Wanli Ouyang

Estimating 3D bounding boxes from monocular images is an essential component in autonomous driving, while accurate 3D object detection from this kind of data is very challenging.

Autonomous Driving Monocular 3D Object Detection +1

Gradient Regularized Contrastive Learning for Continual Domain Adaptation

no code implementations 23 Mar 2021 Shixiang Tang, Peng Su, Dapeng Chen, Wanli Ouyang

To better understand this issue, we study the problem of continual domain adaptation, where the model is presented with a labelled source domain and a sequence of unlabelled target domains.

Contrastive Learning Domain Adaptation

Probabilistic Graph Attention Network with Conditional Kernels for Pixel-Wise Prediction

no code implementations 8 Jan 2021 Dan Xu, Xavier Alameda-Pineda, Wanli Ouyang, Elisa Ricci, Xiaogang Wang, Nicu Sebe

In contrast to previous works directly considering multi-scale feature maps obtained from the inner layers of a primary CNN architecture, and simply fusing the features with weighted averaging or concatenation, we propose a probabilistic graph attention network structure based on a novel Attention-Gated Conditional Random Fields (AG-CRFs) model for learning and fusing multi-scale representations in a principled manner.

BSDS500 Graph Attention +2

Inception Convolution with Efficient Dilation Search

1 code implementation CVPR 2021 Jie Liu, Chuming Li, Feng Liang, Chen Lin, Ming Sun, Junjie Yan, Wanli Ouyang, Dong Xu

To develop a practical method for learning complex inception convolution based on the data, a simple but effective search algorithm, referred to as efficient dilation optimization (EDO), is developed.

Human Detection Instance Segmentation +4

DETR for Crowd Pedestrian Detection

1 code implementation 12 Dec 2020 Matthieu Lin, Chuming Li, Xingyuan Bu, Ming Sun, Chen Lin, Junjie Yan, Wanli Ouyang, Zhidong Deng

Furthermore, the bipartite match of ED harms the training efficiency due to the large ground truth number in crowd scenes.

Pedestrian Detection

Full Matching on Low Resolution for Disparity Estimation

no code implementations 10 Dec 2020 Hong Zhang, Shenglun Chen, Zhihui Wang, Haojie Li, Wanli Ouyang

To this end, we first propose to decompose the full matching task into multiple stages of the cost aggregation module.

Disparity Estimation

Direct Depth Learning Network for Stereo Matching

no code implementations 10 Dec 2020 Hong Zhang, Haojie Li, Shenglun Chen, Tiantian Yan, Zhihui Wang, Guo Lu, Wanli Ouyang

To make the Adaptive-Grained Depth Refinement stage robust to the coarse depth and adaptive to the depth range of the points, the Granularity Uncertainty is introduced to the Adaptive-Grained Depth Refinement stage.

Autonomous Driving Depth Estimation +1

Temporal-Channel Transformer for 3D Lidar-Based Video Object Detection in Autonomous Driving

no code implementations 27 Nov 2020 Zhenxun Yuan, Xiao Song, Lei Bai, Wengang Zhou, Zhe Wang, Wanli Ouyang

As a special design of this transformer, the information encoded in the encoder is different from that in the decoder, i.e., the encoder encodes temporal-channel information of multiple frames while the decoder decodes the spatial-channel information for the current frame in a voxel-wise manner.

3D Object Detection Autonomous Driving +2

Adaptive Gradient Method with Resilience and Momentum

no code implementations 21 Oct 2020 Jie Liu, Chen Lin, Chuming Li, Lu Sheng, Ming Sun, Junjie Yan, Wanli Ouyang

Several variants of stochastic gradient descent (SGD) have been proposed to improve the learning effectiveness and efficiency when training deep neural networks, among which some recent influential attempts adaptively control the parameter-wise learning rate (e.g., Adam and RMSProp).

Category-specific Semantic Coherency Learning for Fine-grained Image Recognition

no code implementations 12 Oct 2020 Shijie Wang, Zhihui Wang, Haojie Li, Wanli Ouyang

Existing deep learning based weakly supervised fine-grained image recognition (WFGIR) methods usually pick out the discriminative regions from the high-level feature (HLF) maps directly.

Fine-Grained Image Recognition

Improving Auto-Augment via Augmentation-Wise Weight Sharing

1 code implementation NeurIPS 2020 Keyu Tian, Chen Lin, Ming Sun, Luping Zhou, Junjie Yan, Wanli Ouyang

On CIFAR-10, we achieve a top-1 error rate of 1.24%, which is currently the best performing single model without extra training data.

SAMOT: Switcher-Aware Multi-Object Tracking and Still Another MOT Measure

no code implementations 22 Sep 2020 Weitao Feng, Zhihao Hu, Baopu Li, Weihao Gan, Wei Wu, Wanli Ouyang

Besides, we propose a new MOT evaluation measure, Still Another IDF score (SAIDF), aiming to focus more on identity issues. This new measure may overcome some problems of the previous measures and provide a better insight for identity issues in MOT.

Multi-Object Tracking

Improving Deep Video Compression by Resolution-adaptive Flow Coding

no code implementations ECCV 2020 Zhihao Hu, Zhenghao Chen, Dong Xu, Guo Lu, Wanli Ouyang, Shuhang Gu

In this work, we propose a new framework called Resolution-adaptive Flow Coding (RaFC) to effectively compress the flow maps globally and locally, in which we use multi-resolution representations instead of single-resolution representations for both the input flow maps and the output motion features of the MV encoder.

Optical Flow Estimation Video Compression

Exploring the Hierarchy in Relation Labels for Scene Graph Generation

no code implementations 12 Sep 2020 Yi Zhou, Shuyang Sun, Chao Zhang, Yikang Li, Wanli Ouyang

By assigning each relationship a single label, current approaches formulate the relationship detection as a classification problem.

Graph Generation Scene Graph Generation

Rethinking Pseudo-LiDAR Representation

1 code implementation ECCV 2020 Xinzhu Ma, Shinan Liu, Zhiyi Xia, Hongwen Zhang, Xingyu Zeng, Wanli Ouyang

Based on this observation, we design an image based CNN detector named Patch-Net, which is more generalized and can be instantiated as pseudo-LiDAR based 3D detectors.

Differentiable Hierarchical Graph Grouping for Multi-Person Pose Estimation

no code implementations ECCV 2020 Sheng Jin, Wentao Liu, Enze Xie, Wenhai Wang, Chen Qian, Wanli Ouyang, Ping Luo

The modules of HGG can be trained end-to-end with the keypoint detection network and are able to supervise the grouping process in a hierarchical manner.

Graph Clustering Human Detection +1

Whole-Body Human Pose Estimation in the Wild

2 code implementations ECCV 2020 Sheng Jin, Lumin Xu, Jin Xu, Can Wang, Wentao Liu, Chen Qian, Wanli Ouyang, Ping Luo

This paper investigates the task of 2D human whole-body pose estimation, which aims to localize dense landmarks on the entire human body including face, hands, body, and feet.

Facial Landmark Detection Hand Pose Estimation +1

3D Human Mesh Regression with Dense Correspondence

1 code implementation CVPR 2020 Wang Zeng, Wanli Ouyang, Ping Luo, Wentao Liu, Xiaogang Wang

This paper proposes a model-free 3D human mesh estimation framework, named DecoMR, which explicitly establishes the dense correspondence between the mesh and the local image features in the UV space (i.e., a 2D space used for texture mapping of 3D mesh).

3D Human Pose Estimation 3D Human Reconstruction

Scope Head for Accurate Localization in Object Detection

no code implementations 11 May 2020 Geng Zhan, Dan Xu, Guo Lu, Wei Wu, Chunhua Shen, Wanli Ouyang

Existing anchor-based and anchor-free object detectors in multi-stage or one-stage pipelines have achieved very promising detection performance.

object-detection Object Detection

Content Adaptive and Error Propagation Aware Deep Video Compression

no code implementations ECCV 2020 Guo Lu, Chunlei Cai, Xiaoyun Zhang, Li Chen, Wanli Ouyang, Dong Xu, Zhiyong Gao

Therefore, the encoder is adaptive to different video contents and achieves better compression performance by reducing the domain gap between the training and testing datasets.

Video Compression

Channel Pruning Guided by Classification Loss and Feature Importance

no code implementations 15 Mar 2020 Jinyang Guo, Wanli Ouyang, Dong Xu

To this end, we propose a new strategy to suppress the influence of unimportant features (i.e., the features that will be removed at the next pruning stage).

Classification Feature Importance +1

Equalization Loss for Long-Tailed Object Recognition

1 code implementation CVPR 2020 Jingru Tan, Changbao Wang, Buyu Li, Quanquan Li, Wanli Ouyang, Changqing Yin, Junjie Yan

Based on it, we propose a simple but effective loss, named equalization loss, to tackle the problem of long-tailed rare categories by simply ignoring those gradients for rare categories.

object-detection Object Detection +1
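
The idea of ignoring gradients for rare categories can be sketched with a per-class sigmoid loss in which selected terms are simply dropped. This is a simplified illustration under stated assumptions, not the paper's formulation: the function name and the boolean `rare` mask are hypothetical, and the choice to drop only the negative-sample terms of rare classes is an assumption made for this sketch.

```python
import math


def masked_sigmoid_loss(logits, target, rare):
    """Per-class sigmoid cross-entropy for one sample.

    logits: list of class scores
    target: index of the ground-truth class
    rare:   list of booleans, True where a class is treated as rare (assumed input)

    Terms where a rare class acts as a negative are skipped, so they
    contribute no loss and therefore no gradient.
    """
    loss = 0.0
    for j, z in enumerate(logits):
        y = 1.0 if j == target else 0.0
        if rare[j] and y == 0.0:
            continue  # ignored term: no discouraging gradient for the rare class
        p = 1.0 / (1.0 + math.exp(-z))
        loss -= y * math.log(p) + (1.0 - y) * math.log(1.0 - p)
    return loss


# The score of the rare class (index 2) does not affect the loss
# when that class is not the ground truth.
rare_mask = [False, False, True]
loss_high = masked_sigmoid_loss([0.0, 0.0, 5.0], 0, rare_mask)
loss_low = masked_sigmoid_loss([0.0, 0.0, -5.0], 0, rare_mask)
```

Because the skipped terms never enter the sum, frequent-class samples cannot push down the scores of rare classes, while a rare class still receives its positive term when it is the ground truth.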

EcoNAS: Finding Proxies for Economical Neural Architecture Search

no code implementations CVPR 2020 Dongzhan Zhou, Xinchi Zhou, Wenwei Zhang, Chen Change Loy, Shuai Yi, Xuesen Zhang, Wanli Ouyang

While many methods have been proposed to improve the efficiency of NAS, the search process is still laborious because training and evaluating plausible architectures over a large search space is time-consuming.

Neural Architecture Search

Learning 3D Human Shape and Pose from Dense Body Parts

1 code implementation 31 Dec 2019 Hongwen Zhang, Jie Cao, Guo Lu, Wanli Ouyang, Zhenan Sun

Reconstructing 3D human shape and pose from monocular images is challenging despite the promising results achieved by the most recent learning-based methods.

3D human pose and shape estimation 3D Human Reconstruction +2

Computation Reallocation for Object Detection

no code implementations ICLR 2020 Feng Liang, Chen Lin, Ronghao Guo, Ming Sun, Wei Wu, Junjie Yan, Wanli Ouyang

However, the allocation pattern designed for classification is usually adopted directly for object detectors, which has proven to be sub-optimal.

Instance Segmentation Neural Architecture Search +3

A Shape Transformation-based Dataset Augmentation Framework for Pedestrian Detection

no code implementations 15 Dec 2019 Zhe Chen, Wanli Ouyang, Tongliang Liu, DaCheng Tao

Alternatively, to access much more natural-looking pedestrians, we propose to augment pedestrian detection datasets by transforming real pedestrians from the same dataset into different shapes.

Pedestrian Detection

TRB: A Novel Triplet Representation for Understanding 2D Human Body

2 code implementations ICCV 2019 Haodong Duan, Kwan-Yee Lin, Sheng Jin, Wentao Liu, Chen Qian, Wanli Ouyang

In this paper, we propose the Triplet Representation for Body (TRB) -- a compact 2D human body representation, with skeleton keypoints capturing human pose information and contour keypoints containing human shape information.

Conditional Image Generation

Improving One-shot NAS by Suppressing the Posterior Fading

no code implementations CVPR 2020 Xiang Li, Chen Lin, Chuming Li, Ming Sun, Wei Wu, Junjie Yan, Wanli Ouyang

In this paper, we analyse existing weight sharing one-shot NAS approaches from a Bayesian point of view and identify the posterior fading problem, which compromises the effectiveness of shared weights.

Neural Architecture Search object-detection +2

IntersectGAN: Learning Domain Intersection for Generating Images with Multiple Attributes

no code implementations 21 Sep 2019 Zehui Yao, Boyan Zhang, Zhiyong Wang, Wanli Ouyang, Dong Xu, Dagan Feng

For example, given two image domains $X_1$ and $X_2$ with certain attributes, the intersection $X_1 \cap X_2$ denotes a new domain where images possess the attributes from both $X_1$ and $X_2$ domains.

GradNet: Gradient-Guided Network for Visual Object Tracking

2 code implementations ICCV 2019 Peixia Li, Bo-Yu Chen, Wanli Ouyang, Dong Wang, Xiaoyun Yang, Huchuan Lu

In this work, we propose a novel gradient-guided network to exploit the discriminative information in gradients and update the template in the siamese network through feed-forward and backward operations.

Ranked #3 on Visual Object Tracking on OTB-2015 (Precision metric)

Template Matching Visual Object Tracking +1

Structured Modeling of Joint Deep Feature and Prediction Refinement for Salient Object Detection

1 code implementation ICCV 2019 Yingyue Xu, Dan Xu, Xiaopeng Hong, Wanli Ouyang, Rongrong Ji, Min Xu, Guoying Zhao

We formulate the CRF graphical model that involves message-passing of feature-feature, feature-prediction, and prediction-prediction, from the coarse scale to the finer scale, to update the features and the corresponding predictions.

object-detection RGB Salient Object Detection +1

Crowd Counting with Deep Structured Scale Integration Network

no code implementations ICCV 2019 Lingbo Liu, Zhilin Qiu, Guanbin Li, Shufan Liu, Wanli Ouyang, Liang Lin

Automatic estimation of the number of people in unconstrained crowded scenes is a challenging task and one major difficulty stems from the huge scale variation of people.

Crowd Counting Representation Learning

Improving Action Localization by Progressive Cross-stream Cooperation

no code implementations CVPR 2019 Rui Su, Wanli Ouyang, Luping Zhou, Dong Xu

Specifically, we first generate a larger set of region proposals by combining the latest region proposals from both streams, from which we can readily obtain a larger set of labelled training samples to help learn better action detection models.

Action Classification Action Detection +2

AM-LFS: AutoML for Loss Function Search

1 code implementation ICCV 2019 Chuming Li, Yuan Xin, Chen Lin, Minghao Guo, Wei Wu, Wanli Ouyang, Junjie Yan

The key contribution of this work is the design of search space which can guarantee the generalization and transferability on different vision tasks by including a bunch of existing prevailing loss functions in a unified formulation.


Contextualized Spatial-Temporal Network for Taxi Origin-Destination Demand Prediction

no code implementations 15 May 2019 Lingbo Liu, Zhilin Qiu, Guanbin Li, Qing Wang, Wanli Ouyang, Liang Lin

Finally, a GCC module is applied to model the correlation between all regions by computing a global correlation feature as a weighted sum of all regional features, with the weights being calculated as the similarity between the corresponding region pairs.
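
The weighted-sum computation described for the GCC module can be sketched as follows. This is a minimal illustration, assuming dot-product similarity and softmax normalization of the weights; neither choice is stated in the snippet, and the function name is hypothetical.

```python
import math


def global_correlation(features):
    """For each region, return a similarity-weighted sum of all regional features.

    features: list of equal-length feature vectors, one per region.
    Assumptions for this sketch: similarity is a dot product and the
    weights for each region are softmax-normalized over all regions.
    """
    out = []
    for f_i in features:
        sims = [sum(a * b for a, b in zip(f_i, f_j)) for f_j in features]
        m = max(sims)  # subtract the max for numerical stability
        exps = [math.exp(s - m) for s in sims]
        total = sum(exps)
        weights = [e / total for e in exps]
        dim = len(f_i)
        out.append([
            sum(w * f_j[d] for w, f_j in zip(weights, features))
            for d in range(dim)
        ])
    return out


regional = [[1.0, 0.0], [0.0, 1.0]]
global_feats = global_correlation(regional)
```

Each output vector mixes information from every region, with more similar regions contributing more to the mix.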

Libra R-CNN: Towards Balanced Learning for Object Detection

5 code implementations CVPR 2019 Jiangmiao Pang, Kai Chen, Jianping Shi, Huajun Feng, Wanli Ouyang, Dahua Lin

In this work, we carefully revisit the standard training practice of detectors, and find that the detection performance is often limited by the imbalance during the training process, which generally consists in three levels - sample level, feature level, and objective level.

object-detection Object Detection

Feature Intertwiner for Object Detection

2 code implementations ICLR 2019 Hongyang Li, Bo Dai, Shaoshuai Shi, Wanli Ouyang, Xiaogang Wang

We argue that the reliable set could guide the feature learning of the less reliable set during training - in spirit of student mimicking teacher behavior and thus pushing towards a more compact class centroid in the feature space.

object-detection Object Detection

Accurate Monocular Object Detection via Color-Embedded 3D Reconstruction for Autonomous Driving

no code implementations 27 Mar 2019 Xinzhu Ma, Zhihui Wang, Haojie Li, Peng-Bo Zhang, Xin Fan, Wanli Ouyang

To this end, we first leverage a stand-alone module to transform the input data from the 2D image plane to 3D point cloud space for a better input representation, then we perform the 3D detection using a PointNet backbone net to obtain objects' 3D locations, dimensions and orientations.

3D Reconstruction Autonomous Driving +2

SR-LSTM: State Refinement for LSTM towards Pedestrian Trajectory Prediction

1 code implementation CVPR 2019 Pu Zhang, Wanli Ouyang, Pengfei Zhang, Jianru Xue, Nanning Zheng

In order to address this issue, we propose a data-driven state refinement module for LSTM network (SR-LSTM), which activates the utilization of the current intention of neighbors, and jointly and iteratively refines the current states of all participants in the crowd through a message passing mechanism.

Pedestrian Trajectory Prediction Trajectory Prediction

Hybrid Task Cascade for Instance Segmentation

5 code implementations CVPR 2019 Kai Chen, Jiangmiao Pang, Jiaqi Wang, Yu Xiong, Xiaoxiao Li, Shuyang Sun, Wansen Feng, Ziwei Liu, Jianping Shi, Wanli Ouyang, Chen Change Loy, Dahua Lin

In exploring a more effective approach, we find that the key to a successful instance segmentation cascade is to fully leverage the reciprocal relationship between detection and segmentation.

Instance Segmentation object-detection +2

Multi-Object Tracking with Multiple Cues and Switcher-Aware Classification

no code implementations 18 Jan 2019 Weitao Feng, Zhihao Hu, Wei Wu, Junjie Yan, Wanli Ouyang

In this paper, we propose a unified Multi-Object Tracking (MOT) framework learning to make full use of long term and short term cues for handling complex cases in MOT scenes.

General Classification Multi-Object Tracking

FishNet: A Versatile Backbone for Image, Region, and Pixel Level Prediction

4 code implementations NeurIPS 2018 Shuyang Sun, Jiangmiao Pang, Jianping Shi, Shuai Yi, Wanli Ouyang

The basic principles in designing convolutional neural network (CNN) structures for predicting objects on different levels, e.g., image-level, region-level, and pixel-level, are diverging.

Image Classification

DVC: An End-to-end Deep Video Compression Framework

4 code implementations CVPR 2019 Guo Lu, Wanli Ouyang, Dong Xu, Xiaoyun Zhang, Chunlei Cai, Zhiyong Gao

Conventional video compression approaches use the predictive coding architecture and encode the corresponding motion information and residual information.

MS-SSIM Optical Flow Estimation +2

Deep Learning for Generic Object Detection: A Survey

no code implementations 6 Sep 2018 Li Liu, Wanli Ouyang, Xiaogang Wang, Paul Fieguth, Jie Chen, Xinwang Liu, Matti Pietikäinen

Object detection, one of the most fundamental and challenging problems in computer vision, seeks to locate object instances from a large number of predefined categories in natural images.

object-detection Object Proposal Generation

Dividing and Aggregating Network for Multi-view Action Recognition

no code implementations ECCV 2018 Dongang Wang, Wanli Ouyang, Wen Li, Dong Xu

We then train view-specific action classifiers based on the view-specific representation for each view and a view classifier based on the shared representation at lower layers.

Action Recognition

Deep Kalman Filtering Network for Video Compression Artifact Reduction

1 code implementation ECCV 2018 Guo Lu, Wanli Ouyang, Dong Xu, Xiaoyun Zhang, Zhiyong Gao, Ming-Ting Sun

In this paper, we model the video artifact reduction task as a Kalman filtering procedure and restore decoded frames through a deep Kalman filtering network.

Video Compression

Neural Network Encapsulation

2 code implementations ECCV 2018 Hongyang Li, Xiaoyang Guo, Bo Dai, Wanli Ouyang, Xiaogang Wang

Motivated by the routing to make higher capsule have agreement with lower capsule, we extend the mechanism as a compensation for the rapid loss of information in nearby layers.

Person Search via A Mask-Guided Two-Stream CNN Model

no code implementations ECCV 2018 Di Chen, Shanshan Zhang, Wanli Ouyang, Jian Yang, Ying Tai

In this work, we tackle the problem of person search, which is a challenging task consisting of pedestrian detection and person re-identification (re-ID).

Pedestrian Detection Person Re-Identification +1

Crowd Counting using Deep Recurrent Spatial-Aware Network

no code implementations 2 Jul 2018 Lingbo Liu, Hongjun Wang, Guanbin Li, Wanli Ouyang, Liang Lin

Crowd counting from unconstrained scene images is a crucial task in many real-world applications like urban surveillance and management, but it is greatly challenged by the camera's perspective that causes huge appearance variations in people's scales and rotations.

Crowd Counting

Collaborative and Adversarial Network for Unsupervised Domain Adaptation

1 code implementation CVPR 2018 Weichen Zhang, Wanli Ouyang, Wen Li, Dong Xu

In this paper, we propose a new unsupervised domain adaptation approach called Collaborative and Adversarial Network (CAN) through domain-collaborative and domain-adversarial training of neural networks.

Unsupervised Domain Adaptation

Mask-Guided Contrastive Attention Model for Person Re-Identification

1 code implementation CVPR 2018 Chunfeng Song, Yan Huang, Wanli Ouyang, Liang Wang

We may be the first one to successfully introduce the binary mask into person ReID task and the first one to propose region-level contrastive learning.

Contrastive Learning Person Re-Identification

Learnable Histogram: Statistical Context Features for Deep Neural Networks

no code implementations 25 Apr 2018 Zhe Wang, Hongsheng Li, Wanli Ouyang, Xiaogang Wang

Statistical features, such as histogram, Bag-of-Words (BoW) and Fisher Vector, were commonly used with hand-crafted features in conventional classification methods, but attract less attention since the popularity of deep learning methods.

General Classification object-detection +2

3D Human Pose Estimation in the Wild by Adversarial Learning

no code implementations CVPR 2018 Wei Yang, Wanli Ouyang, Xiaolong Wang, Jimmy Ren, Hongsheng Li, Xiaogang Wang

Instead of defining hard-coded rules to constrain the pose estimation results, we design a novel multi-source discriminator to distinguish the predicted 3D poses from the ground-truth, which helps to enforce the pose estimator to generate anthropometrically valid poses even with images in the wild.

Ranked #1 on Monocular 3D Human Pose Estimation on Human3.6M (Use Video Sequence metric)

Monocular 3D Human Pose Estimation

Style Aggregated Network for Facial Landmark Detection

1 code implementation CVPR 2018 Xuanyi Dong, Yan Yan, Wanli Ouyang, Yi Yang

In this work, we propose a style-aggregated approach to deal with the large intrinsic variance of image styles for facial landmark detection.

Ranked #1 on Facial Landmark Detection on AFLW-Front (Mean NME metric)

Face Alignment Facial Landmark Detection

Learning Deep Structured Multi-Scale Features using Attention-Gated CRFs for Contour Prediction

no code implementations NeurIPS 2017 Dan Xu, Wanli Ouyang, Xavier Alameda-Pineda, Elisa Ricci, Xiaogang Wang, Nicu Sebe

Recent works have shown that exploiting multi-scale representations deeply learned via convolutional neural networks (CNN) is of tremendous importance for accurate contour detection.

BSDS500 Contour Detection

Optical Flow Guided Feature: A Fast and Robust Motion Representation for Video Action Recognition

1 code implementation CVPR 2018 Shuyang Sun, Zhanghui Kuang, Wanli Ouyang, Lu Sheng, Wei Zhang

In this study, we introduce a novel compact motion representation for video action recognition, named Optical Flow guided Feature (OFF), which enables the network to distill temporal information through a fast and robust approach.

Action Recognition Action Recognition In Videos +2

Visual Question Generation as Dual Task of Visual Question Answering

no code implementations CVPR 2018 Yikang Li, Nan Duan, Bolei Zhou, Xiao Chu, Wanli Ouyang, Xiaogang Wang

Visual question answering (VQA) and visual question generation (VQG) have recently become two trending topics in computer vision, but they have been explored separately.

Question Answering Question Generation +2

Scene Graph Generation from Objects, Phrases and Region Captions

1 code implementation ICCV 2017 Yikang Li, Wanli Ouyang, Bolei Zhou, Kun Wang, Xiaogang Wang

Object detection, scene graph generation and region captioning, which are three scene understanding tasks at different semantic levels, are tied together: scene graphs are generated on top of objects detected in an image with their pairwise relationship predicted, while region captioning gives a language description of the objects, their attributes, relations, and other context information.

Graph Generation object-detection +3

Learning Deep Representations for Scene Labeling with Semantic Context Guided Supervision

no code implementations 8 Jun 2017 Zhe Wang, Hongsheng Li, Wanli Ouyang, Xiaogang Wang

The experiments show that our proposed method makes deep models learn more discriminative feature representations without increasing model size or complexity.

Scene Labeling

Quality Aware Network for Set to Set Recognition

1 code implementation CVPR 2017 Yu Liu, Junjie Yan, Wanli Ouyang

In this paper, the quality aware network (QAN) is proposed to confront this problem, where the quality of each sample can be automatically learned although such information is not explicitly provided in the training stage.

Face Verification Person Re-Identification

Learning Cross-Modal Deep Representations for Robust Pedestrian Detection

2 code implementations CVPR 2017 Dan Xu, Wanli Ouyang, Elisa Ricci, Xiaogang Wang, Nicu Sebe

Then, the learned feature representations are transferred to a second deep network, which receives as input an RGB image and outputs the detection results.

Pedestrian Detection

Multi-Context Attention for Human Pose Estimation

2 code implementations CVPR 2017 Xiao Chu, Wei Yang, Wanli Ouyang, Cheng Ma, Alan L. Yuille, Xiaogang Wang

We further combine the holistic attention model, which focuses on the global consistency of the full human body, and the body part attention model, which focuses on the detailed description for different body parts.

Pose Estimation

Learning Chained Deep Features and Classifiers for Cascade in Object Detection

1 code implementation 23 Feb 2017 Wanli Ouyang, Ku Wang, Xin Zhu, Xiaogang Wang

In this CC-Net, the cascaded classifier at a stage is aided by the classification scores in previous stages.

object-detection Object Detection +1

Object Detection in Videos with Tubelet Proposal Networks

no code implementations CVPR 2017 Kai Kang, Hongsheng Li, Tong Xiao, Wanli Ouyang, Junjie Yan, Xihui Liu, Xiaogang Wang

Object detection in videos has drawn increasing attention recently with the introduction of the large-scale ImageNet VID dataset.

object-detection Object Detection +1

Learning Spatial Regularization with Image-level Supervisions for Multi-label Image Classification

2 code implementations CVPR 2017 Feng Zhu, Hongsheng Li, Wanli Ouyang, Nenghai Yu, Xiaogang Wang

Analysis of the learned SRN model demonstrates that it can effectively capture both semantic and spatial relations of labels for improving classification performance.

Classification General Classification +2

Zoom Out-and-In Network with Recursive Training for Object Proposal

1 code implementation 19 Feb 2017 Hongyang Li, Yu Liu, Wanli Ouyang, Xiaogang Wang

In this paper, we propose a zoom-out-and-in network for generating object proposals.

Crafting GBD-Net for Object Detection

1 code implementation 8 Oct 2016 Xingyu Zeng, Wanli Ouyang, Junjie Yan, Hongsheng Li, Tong Xiao, Kun Wang, Yu Liu, Yucong Zhou, Bin Yang, Zhe Wang, Hui Zhou, Xiaogang Wang

The effectiveness of GBD-Net is shown through experiments on three object detection datasets, ImageNet, Pascal VOC2007 and Microsoft COCO.

object-detection Object Detection

End-To-End Learning of Deformable Mixture of Parts and Deep Convolutional Neural Networks for Human Pose Estimation

no code implementations CVPR 2016 Wei Yang, Wanli Ouyang, Hongsheng Li, Xiaogang Wang

In this paper, we propose a novel end-to-end framework for human pose estimation that combines DCNNs with the expressive deformable mixture of parts.

Pose Estimation

STCT: Sequentially Training Convolutional Networks for Visual Tracking

no code implementations CVPR 2016 Lijun Wang, Wanli Ouyang, Xiaogang Wang, Huchuan Lu

To further improve the robustness of each base learner, we propose to train the convolutional layers with random binary masks, which serve as a regularizer that forces each base learner to focus on different input features.

Visual Tracking

Learning Deep Feature Representations with Domain Guided Dropout for Person Re-identification

1 code implementation CVPR 2016 Tong Xiao, Hongsheng Li, Wanli Ouyang, Xiaogang Wang

Learning generic and robust feature representations with data from multiple domains for the same problem is of great value, especially for problems that have multiple datasets, none of which is large enough to provide abundant data variations.

Person Re-Identification

Object Detection from Video Tubelets with Convolutional Neural Networks

1 code implementation CVPR 2016 Kai Kang, Wanli Ouyang, Hongsheng Li, Xiaogang Wang

Deep Convolution Neural Networks (CNNs) have shown impressive performance in various vision tasks such as image classification, object detection and semantic segmentation.

Image Classification object-detection +3

Multi-Bias Non-linear Activation in Deep Neural Networks

no code implementations 3 Apr 2016 Hongyang Li, Wanli Ouyang, Xiaogang Wang

It provides great flexibility of selecting responses to different visual patterns in different magnitude ranges to form rich representations in higher layers.

Structured Feature Learning for Pose Estimation

no code implementations CVPR 2016 Xiao Chu, Wanli Ouyang, Hongsheng Li, Xiaogang Wang

In this paper, we propose a structured feature learning framework to reason the correlations among body joints at the feature level in human pose estimation.

Pose Estimation

Factors in Finetuning Deep Model for object detection

no code implementations 20 Jan 2016 Wanli Ouyang, Xiaogang Wang, Cong Zhang, Xiaokang Yang

Our analysis and empirical results show that classes with more samples have higher impact on the feature learning.

object-detection Object Detection

Window-Object Relationship Guided Representation Learning for Generic Object Detections

no code implementations 9 Dec 2015 Xingyu Zeng, Wanli Ouyang, Xiaogang Wang

We propose a representation learning pipeline to use the relationship as supervision for improving the learned representation in object detection.

object-detection Object Detection +1

Visual Tracking With Fully Convolutional Networks

no code implementations ICCV 2015 Lijun Wang, Wanli Ouyang, Xiaogang Wang, Huchuan Lu

Instead of treating a convolutional neural network (CNN) as a black-box feature extractor, we conduct an in-depth study of the properties of CNN features pre-trained offline on massive image data and the ImageNet classification task.

Object Tracking Visual Tracking

Learning Deep Representation With Large-Scale Attributes

no code implementations ICCV 2015 Wanli Ouyang, Hongyang Li, Xingyu Zeng, Xiaogang Wang

Experimental results show that the attributes are helpful in learning better features and improving the object detection accuracy by 2.6% in mAP on the ILSVRC 2014 object detection dataset and 2.4% in mAP on the PASCAL VOC 2007 object detection dataset.

object-detection Object Detection

Saliency Detection by Multi-Context Deep Learning

no code implementations CVPR 2015 Rui Zhao, Wanli Ouyang, Hongsheng Li, Xiaogang Wang

Low-level saliency cues or priors do not produce good enough saliency detection results, especially when the salient object appears against a low-contrast background with confusing visual appearance.

Image Classification object-detection +3

DeepID-Net: multi-stage and deformable deep convolutional neural networks for object detection

no code implementations 11 Sep 2014 Wanli Ouyang, Ping Luo, Xingyu Zeng, Shi Qiu, Yonglong Tian, Hongsheng Li, Shuo Yang, Zhe Wang, Yuanjun Xiong, Chen Qian, Zhenyao Zhu, Ruohui Wang, Chen-Change Loy, Xiaogang Wang, Xiaoou Tang

In the proposed new deep architecture, a new deformation constrained pooling (def-pooling) layer models the deformation of object parts with geometric constraint and penalty.

object-detection Object Detection

Multi-source Deep Learning for Human Pose Estimation

no code implementations CVPR 2014 Wanli Ouyang, Xiao Chu, Xiaogang Wang

Visual appearance score, appearance mixture type and deformation are three important information sources for human pose estimation.

Human Detection Pose Estimation

Learning Mid-level Filters for Person Re-identification

no code implementations CVPR 2014 Rui Zhao, Wanli Ouyang, Xiaogang Wang

In this paper, we propose a novel approach of learning mid-level filters from automatically discovered patch clusters for person re-identification.

Patch Matching Person Re-Identification

Single-Pedestrian Detection Aided by Multi-pedestrian Detection

no code implementations CVPR 2013 Wanli Ouyang, Xiaogang Wang

A probabilistic framework is proposed to model the relationship between the configurations estimated by single- and multi-pedestrian detectors, and to refine the single-pedestrian detection result with multi-pedestrian detection.

Pedestrian Detection

Unsupervised Salience Learning for Person Re-identification

no code implementations CVPR 2013 Rui Zhao, Wanli Ouyang, Xiaogang Wang

In this paper, we propose a novel perspective for person re-identification based on unsupervised salience learning.

Patch Matching Person Re-Identification

Modeling Mutual Visibility Relationship in Pedestrian Detection

no code implementations CVPR 2013 Wanli Ouyang, Xingyu Zeng, Xiaogang Wang

In this paper, we propose a mutual visibility deep model that jointly estimates the visibility statuses of overlapping pedestrians.

Pedestrian Detection
