Pyramid Scene Parsing Network

Face Identification, Face Recognition (+1 more)

Deep Learning Face Representation by Joint Identification-Verification

3 code implementations • NeurIPS 2014 • Yi Sun, Xiaogang Wang, Xiaoou Tang

The learned DeepID2 features can be well generalized to new identities unseen in the training data.

Deep Learning Face Representation from Predicting 10,000 Classes

4 code implementations • 1 Jan 2014 • Yi Sun, Xiaogang Wang, Xiaoou Tang

When learned as classifiers to recognize about 10,000 face identities in the training set and configured to keep reducing the number of neurons along the feature extraction hierarchy, these deep ConvNets gradually form compact identity-related features in the top layers with only a small number of hidden neurons.

Ranked #6 on Face Verification on Labeled Faces in the Wild

Face Identification, Face Verification

Context Encoding for Semantic Segmentation

12 code implementations • CVPR 2018 • Hang Zhang, Kristin Dana, Jianping Shi, Zhongyue Zhang, Xiaogang Wang, Ambrish Tyagi, Amit Agrawal

In this paper, we explore the impact of global contextual information in semantic segmentation by introducing the Context Encoding Module, which captures the semantic context of scenes and selectively highlights class-dependent feature maps.

Ranked #7 on Semantic Segmentation on PASCAL VOC 2012 test

Image Classification, Segmentation (+2 more)

ViPNAS: Efficient Video Pose Estimation via Neural Architecture Search

4 code implementations • CVPR 2021 • Lumin Xu, Yingda Guan, Sheng Jin, Wentao Liu, Chen Qian, Ping Luo, Wanli Ouyang, Xiaogang Wang

Human pose estimation has achieved significant progress in recent years.

Ranked #23 on Pose Estimation on COCO test-dev (using extra training data)

Neural Architecture Search, Pose Estimation

From Points to Parts: 3D Object Detection from Point Cloud with Part-aware and Part-aggregation Network

6 code implementations • 8 Jul 2019 • Shaoshuai Shi, Zhe Wang, Jianping Shi, Xiaogang Wang, Hongsheng Li

3D object detection from LiDAR point cloud is a challenging problem in 3D scene understanding and has many practical applications.

3D Object Detection, Object (+2 more)

PV-RCNN: Point-Voxel Feature Set Abstraction for 3D Object Detection

12 code implementations • CVPR 2020 • Shaoshuai Shi, Chaoxu Guo, Li Jiang, Zhe Wang, Jianping Shi, Xiaogang Wang, Hongsheng Li

We present a novel and high-performance 3D object detection framework, named PointVoxel-RCNN (PV-RCNN), for accurate 3D object detection from point clouds.

Ranked #1 on Birds Eye View Object Detection on KITTI Cyclists Easy

3D Object Detection, Object (+1 more)

PV-RCNN++: Point-Voxel Feature Set Abstraction With Local Vector Representation for 3D Object Detection

1 code implementation • 31 Jan 2021 • Shaoshuai Shi, Li Jiang, Jiajun Deng, Zhe Wang, Chaoxu Guo, Jianping Shi, Xiaogang Wang, Hongsheng Li

3D object detection is receiving increasing attention from both industry and academia thanks to its wide applications in various fields.

Ranked #2 on 3D Object Detection on KITTI Cars Easy val

InternImage: Exploring Large-Scale Vision Foundation Models with Deformable Convolutions

2 code implementations • CVPR 2023 • Wenhai Wang, Jifeng Dai, Zhe Chen, Zhenhang Huang, Zhiqi Li, Xizhou Zhu, Xiaowei Hu, Tong Lu, Lewei Lu, Hongsheng Li, Xiaogang Wang, Yu Qiao

Compared to the great progress of large-scale vision transformers (ViTs) in recent years, large-scale models based on convolutional neural networks (CNNs) are still at an early stage.

Ranked #1 on Instance Segmentation on COCO test-dev (AP50 metric, using extra training data)

Classification, Image Classification (+3 more)

Uni-Perceiver v2: A Generalist Model for Large-Scale Vision and Vision-Language Tasks

2 code implementations • CVPR 2023 • Hao Li, Jinguo Zhu, Xiaohu Jiang, Xizhou Zhu, Hongsheng Li, Chun Yuan, Xiaohua Wang, Yu Qiao, Xiaogang Wang, Wenhai Wang, Jifeng Dai

In this paper, we propose Uni-Perceiver v2, which is the first generalist model capable of handling major large-scale vision and vision-language tasks with competitive performance.

Language Modelling, Multi-Task Learning

DeepFashion2: A Versatile Benchmark for Detection, Pose Estimation, Segmentation and Re-Identification of Clothing Images

5 code implementations • CVPR 2019 • Yuying Ge, Ruimao Zhang, Lingyun Wu, Xiaogang Wang, Xiaoou Tang, Ping Luo

A strong baseline is proposed, called Match R-CNN, which builds upon Mask R-CNN to solve the above four tasks in an end-to-end manner.

Pose Estimation, Retrieval (+1 more)

StackGAN++: Realistic Image Synthesis with Stacked Generative Adversarial Networks

16 code implementations • 19 Oct 2017 • Han Zhang, Tao Xu, Hongsheng Li, Shaoting Zhang, Xiaogang Wang, Xiaolei Huang, Dimitris Metaxas

In this paper, we propose Stacked Generative Adversarial Networks (StackGAN) aiming at generating high-resolution photo-realistic images.

Ranked #5 on Text-to-Image Generation on Oxford 102 Flowers

Generative Adversarial Network, Text-to-Image Generation

StackGAN: Text to Photo-realistic Image Synthesis with Stacked Generative Adversarial Networks

21 code implementations • ICCV 2017 • Han Zhang, Tao Xu, Hongsheng Li, Shaoting Zhang, Xiaogang Wang, Xiaolei Huang, Dimitris Metaxas

Synthesizing high-quality images from text descriptions is a challenging problem in computer vision and has many practical applications.

Ranked #3 on Text-to-Image Generation on Oxford 102 Flowers (Inception score metric)

Text-to-Image Generation

PointRCNN: 3D Object Proposal Generation and Detection from Point Cloud

13 code implementations • CVPR 2019 • Shaoshuai Shi, Xiaogang Wang, Hongsheng Li

In this paper, we propose PointRCNN for 3D object detection from raw point cloud.

Ranked #2 on Object Detection on KITTI Cars Moderate

Object Detection, Object Proposal Generation (+1 more)

ReSSL: Relational Self-Supervised Learning with Weak Augmentation

2 code implementations • NeurIPS 2021 • Mingkai Zheng, Shan You, Fei Wang, Chen Qian, ChangShui Zhang, Xiaogang Wang, Chang Xu

Self-supervised Learning (SSL) including the mainstream contrastive learning has achieved great success in learning visual representations without data annotations.

Ranked #78 on Self-Supervised Image Classification on ImageNet

Contrastive Learning, Relation (+2 more)

FD-GAN: Pose-guided Feature Distilling GAN for Robust Person Re-identification

2 code implementations • NeurIPS 2018 • Yixiao Ge, Zhuowan Li, Haiyu Zhao, Guojun Yin, Shuai Yi, Xiaogang Wang, Hongsheng Li

Our proposed FD-GAN achieves state-of-the-art performance on three person reID datasets, which demonstrates the effectiveness and robust feature distilling capability of the proposed FD-GAN.

Ranked #3 on Person Re-Identification on CUHK03

Generative Adversarial Network, Person Re-Identification

Spatial As Deep: Spatial CNN for Traffic Scene Understanding

8 code implementations • 17 Dec 2017 • Xingang Pan, Jianping Shi, Ping Luo, Xiaogang Wang, Xiaoou Tang

Although CNNs have shown a strong capability to extract semantics from raw pixels, their capacity to capture spatial relationships of pixels across rows and columns of an image is not fully explored.

Ranked #50 on Lane Detection on CULane

Lane Detection, Scene Understanding

Pose-Controllable Talking Face Generation by Implicitly Modularized Audio-Visual Representation

1 code implementation • CVPR 2021 • Hang Zhou, Yasheng Sun, Wayne Wu, Chen Change Loy, Xiaogang Wang, Ziwei Liu

While speech content information can be defined by learning the intrinsic synchronization between audio-visual modalities, we identify that a pose code will be complementarily learned in a modulated convolution-based reconstruction framework.

Talking Face Generation

Talking Face Generation by Adversarially Disentangled Audio-Visual Representation

1 code implementation • 20 Jul 2018 • Hang Zhou, Yu Liu, Ziwei Liu, Ping Luo, Xiaogang Wang

Talking face generation aims to synthesize a sequence of face images that correspond to a clip of speech.

Lip Reading, Retrieval (+2 more)

Joint Detection and Identification Feature Learning for Person Search

2 code implementations • CVPR 2017 • Tong Xiao, Shuang Li, Bochao Wang, Liang Lin, Xiaogang Wang

Existing person re-identification benchmarks and methods mainly focus on matching cropped pedestrian images between queries and candidates.

Ranked #9 on Person Re-Identification on CUHK03

Pedestrian Detection, Person Re-Identification (+1 more)

ZoomNAS: Searching for Whole-body Human Pose Estimation in the Wild

1 code implementation • 23 Aug 2022 • Lumin Xu, Sheng Jin, Wentao Liu, Chen Qian, Wanli Ouyang, Ping Luo, Xiaogang Wang

We propose a single-network approach, termed ZoomNet, to take into account the hierarchical structure of the full human body and solve the scale variation of different body parts.

Ranked #2 on 2D Human Pose Estimation on COCO-WholeBody

2D Human Pose Estimation, Neural Architecture Search (+1 more)

Residual Attention Network for Image Classification

21 code implementations • CVPR 2017 • Fei Wang, Mengqing Jiang, Chen Qian, Shuo Yang, Cheng Li, Honggang Zhang, Xiaogang Wang, Xiaoou Tang

In this work, we propose "Residual Attention Network", a convolutional neural network using an attention mechanism that can be incorporated into state-of-the-art feed-forward network architectures in an end-to-end training fashion.

Ranked #635 on Image Classification on ImageNet

General Classification, Image Classification (+1 more)

Ghost in the Minecraft: Generally Capable Agents for Open-World Environments via Large Language Models with Text-based Knowledge and Memory

1 code implementation • 25 May 2023 • Xizhou Zhu, Yuntao Chen, Hao Tian, Chenxin Tao, Weijie Su, Chenyu Yang, Gao Huang, Bin Li, Lewei Lu, Xiaogang Wang, Yu Qiao, Zhaoxiang Zhang, Jifeng Dai

These agents, equipped with the logic and common sense capabilities of LLMs, can skillfully navigate complex, sparse-reward environments with text-based interactions.

Common Sense Reasoning, Navigate (+1 more)

3D Object Detection for Autonomous Driving: A Comprehensive Survey

1 code implementation • 19 Jun 2022 • Jiageng Mao, Shaoshuai Shi, Xiaogang Wang, Hongsheng Li

Autonomous driving, in recent years, has been receiving increasing attention for its potential to relieve drivers' burdens and improve the safety of driving.

3D Object Detection, Autonomous Driving (+1 more)

Rotate-and-Render: Unsupervised Photorealistic Face Rotation from Single-View Images

1 code implementation • CVPR 2020 • Hang Zhou, Jihao Liu, Ziwei Liu, Yu Liu, Xiaogang Wang

Though face rotation has achieved rapid progress in recent years, the lack of high-quality paired training data remains a great hurdle for existing methods.

3D Face Modelling, Data Augmentation (+1 more)

Multi-Context Attention for Human Pose Estimation

2 code implementations • CVPR 2017 • Xiao Chu, Wei Yang, Wanli Ouyang, Cheng Ma, Alan L. Yuille, Xiaogang Wang

We further combine the holistic attention model, which focuses on the global consistency of the full human body, and the body part attention model, which focuses on the detailed description for different body parts.

Ranked #8 on Pose Estimation on Leeds Sports Poses

General Classification, Instance Segmentation (+6 more)

1st Place Solutions for OpenImage2019 -- Object Detection and Instance Segmentation

2 code implementations • 17 Mar 2020 • Yu Liu, Guanglu Song, Yuhang Zang, Yan Gao, Enze Xie, Junjie Yan, Chen Change Loy, Xiaogang Wang

Given such good instance bounding boxes, we further design a simple instance-level semantic segmentation pipeline and achieve first place in the segmentation challenge.

Revisiting the Sibling Head in Object Detector

2 code implementations • CVPR 2020 • Guanglu Song, Yu Liu, Xiaogang Wang

The "shared head for classification and localization" (sibling head), first introduced in Fast R-CNN (Girshick, 2015), has dominated the object detection community for the past five years.

Ranked #67 on Object Detection on COCO test-dev

Disentanglement, General Classification (+4 more)

T-CNN: Tubelets with Convolutional Neural Networks for Object Detection from Videos

1 code implementation • 9 Apr 2016 • Kai Kang, Hongsheng Li, Junjie Yan, Xingyu Zeng, Bin Yang, Tong Xiao, Cong Zhang, Zhe Wang, Ruohui Wang, Xiaogang Wang, Wanli Ouyang

The temporal and contextual information of videos is not fully investigated and utilized.

Novel Object Detection, Object (+3 more)

Fashion Landmark Detection in the Wild

4 code implementations • 10 Aug 2016 • Ziwei Liu, Sijie Yan, Ping Luo, Xiaogang Wang, Xiaoou Tang

Fashion landmark is also compared to clothing bounding boxes and human joints in two applications, fashion attribute prediction and clothes retrieval, showing that fashion landmark is a more discriminative representation to understand fashion images.

Attribute, Pose Estimation (+1 more)

Group-wise Correlation Stereo Network

2 code implementations • CVPR 2019 • Xiaoyang Guo, Kai Yang, Wukui Yang, Xiaogang Wang, Hongsheng Li

Previous works built cost volumes with cross-correlation or concatenation of left and right features across all disparity levels, and then a 2D or 3D convolutional neural network is utilized to regress the disparity maps.

Autonomous Driving, Stereo Matching (+1 more)
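
The Group-wise Correlation Stereo Network entry describes cost volumes that compare left and right features at every disparity level. The NumPy sketch below illustrates the correlation-style volume and is not the authors' code. It rests on several assumptions: features of shape (C, H, W), a hypothetical helper `groupwise_correlation_volume`, and the convention that left column x matches right column x - d. Following the paper's title, it splits channels into groups and correlates each group separately. With `num_groups=1` it reduces to a plain full-channel correlation volume.

```python
import numpy as np


def groupwise_correlation_volume(left, right, max_disp, num_groups):
    """Build a (num_groups, max_disp, H, W) cost volume from (C, H, W) features.

    For each candidate disparity d, the left feature at column x is compared
    with the right feature at column x - d. Channels are split into
    num_groups groups and the mean inner product is taken within each group.
    Left columns with x < d have no partner and are left at zero.
    """
    c, h, w = left.shape
    if right.shape != left.shape:
        raise ValueError("left and right features must have the same shape")
    if c % num_groups != 0:
        raise ValueError("channel count must be divisible by num_groups")
    # (G, C/G, H, W): each channel group is reduced independently.
    lg = left.reshape(num_groups, c // num_groups, h, w)
    rg = right.reshape(num_groups, c // num_groups, h, w)
    volume = np.zeros((num_groups, max_disp, h, w), dtype=np.float64)
    for d in range(max_disp):
        if d == 0:
            volume[:, 0] = (lg * rg).mean(axis=1)
        else:
            # Pair left column x with right column x - d.
            volume[:, d, :, d:] = (lg[..., d:] * rg[..., :-d]).mean(axis=1)
    return volume


rng = np.random.default_rng(0)
left = rng.standard_normal((8, 4, 10))
right = np.roll(left, -2, axis=2)  # toy right view with a true disparity of 2
vol = groupwise_correlation_volume(left, right, max_disp=4, num_groups=2)
print(vol.shape)  # (2, 4, 4, 10)
```

In this sketch the group count controls a trade-off. More groups keep more matching information per disparity, while a single group compresses everything into one similarity value.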

Learning Cross-Modal Deep Representations for Robust Pedestrian Detection

2 code implementations • CVPR 2017 • Dan Xu, Wanli Ouyang, Elisa Ricci, Xiaogang Wang, Nicu Sebe

Then, the learned feature representations are transferred to a second deep network, which receives as input an RGB image and outputs the detection results.

Image Captioning, Image Classification (+6 more)

Uni-Perceiver: Pre-training Unified Architecture for Generic Perception for Zero-shot and Few-shot Tasks

1 code implementation • CVPR 2022 • Xizhou Zhu, Jinguo Zhu, Hao Li, Xiaoshi Wu, Xiaogang Wang, Hongsheng Li, Xiaohua Wang, Jifeng Dai

The model is pre-trained on several uni-modal and multi-modal tasks, and evaluated on a variety of downstream tasks, including novel tasks that did not appear in the pre-training stage.

Uni-Perceiver-MoE: Learning Sparse Generalist Models with Conditional MoEs

1 code implementation • 9 Jun 2022 • Jinguo Zhu, Xizhou Zhu, Wenhai Wang, Xiaohua Wang, Hongsheng Li, Xiaogang Wang, Jifeng Dai

To mitigate such interference, we introduce the Conditional Mixture-of-Experts (Conditional MoEs) to generalist models.

HydraPlus-Net: Attentive Deep Features for Pedestrian Analysis

2 code implementations • ICCV 2017 • Xihui Liu, Haiyu Zhao, Maoqing Tian, Lu Sheng, Jing Shao, Shuai Yi, Junjie Yan, Xiaogang Wang

Pedestrian analysis plays a vital role in intelligent video surveillance and is a key component for security-centric computer vision systems.

Ranked #2 on Pedestrian Attribute Recognition on RAP

Attribute, Pedestrian Attribute Recognition (+1 more)

Recurrent Scale Approximation for Object Detection in CNN

1 code implementation • ICCV 2017 • Yu Liu, Hongyang Li, Junjie Yan, Fangyin Wei, Xiaogang Wang, Xiaoou Tang

To further increase efficiency and accuracy, we (a) design a scale-forecast network to globally predict potential scales in the image, since there is no need to compute maps on all levels of the pyramid.

Ranked #3 on Face Detection on Annotated Faces in the Wild

Face Detection, Object (+2 more)

Learning Deep Feature Representations with Domain Guided Dropout for Person Re-identification

1 code implementation • CVPR 2016 • Tong Xiao, Hongsheng Li, Wanli Ouyang, Xiaogang Wang

Learning generic and robust feature representations with data from multiple domains for the same problem is of great value, especially for the problems that have multiple datasets but none of them are large enough to provide abundant data variations.

Graph Generation, Object Detection (+3 more)

Scene Graph Generation from Objects, Phrases and Region Captions

1 code implementation • ICCV 2017 • Yikang Li, Wanli Ouyang, Bolei Zhou, Kun Wang, Xiaogang Wang

Object detection, scene graph generation and region captioning, which are three scene understanding tasks at different semantic levels, are tied together: scene graphs are generated on top of objects detected in an image with their pairwise relationship predicted, while region captioning gives a language description of the objects, their attributes, relations, and other context information.

Ranked #2 on Object Detection on Visual Genome

Learning Feature Pyramids for Human Pose Estimation

4 code implementations • ICCV 2017 • Wei Yang, Shuang Li, Wanli Ouyang, Hongsheng Li, Xiaogang Wang

We investigate our method on two standard benchmarks for human pose estimation.

Ranked #6 on Pose Estimation on Leeds Sports Poses

Clustering, Graph Generation (+3 more)

Factorizable Net: An Efficient Subgraph-based Framework for Scene Graph Generation

1 code implementation • ECCV 2018 • Yikang Li, Wanli Ouyang, Bolei Zhou, Jianping Shi, Chao Zhang, Xiaogang Wang

Generating scene graphs to describe all the relations inside an image has attracted increasing interest in recent years.

Ranked #1 on Scene Graph Generation on VRD

AdaCos: Adaptively Scaling Cosine Logits for Effectively Learning Deep Face Representations

3 code implementations • CVPR 2019 • Xiao Zhang, Rui Zhao, Yu Qiao, Xiaogang Wang, Hongsheng Li

Our results show that training deep neural networks with the AdaCos loss is stable and able to achieve high face recognition accuracy.

Ranked #6 on Face Verification on MegaFace

Face Recognition, Face Verification

Pose for Everything: Towards Category-Agnostic Pose Estimation

1 code implementation • 21 Jul 2022 • Lumin Xu, Sheng Jin, Wang Zeng, Wentao Liu, Chen Qian, Wanli Ouyang, Ping Luo, Xiaogang Wang

In this paper, we introduce the task of Category-Agnostic Pose Estimation (CAPE), which aims to create a pose estimation model capable of detecting the pose of any class of object given only a few samples with keypoint definition.

Ranked #4 on 2D Pose Estimation on MP-100

Category-Agnostic Pose Estimation, Pose Estimation

Crafting GBD-Net for Object Detection

1 code implementation • 8 Oct 2016 • Xingyu Zeng, Wanli Ouyang, Junjie Yan, Hongsheng Li, Tong Xiao, Kun Wang, Yu Liu, Yucong Zhou, Bin Yang, Zhe Wang, Hui Zhou, Xiaogang Wang

The effectiveness of GBD-Net is shown through experiments on three object detection datasets: ImageNet, PASCAL VOC 2007, and Microsoft COCO.

Image Classification, Object (+4 more)

Object Detection from Video Tubelets with Convolutional Neural Networks

1 code implementation • CVPR 2016 • Kai Kang, Wanli Ouyang, Hongsheng Li, Xiaogang Wang

Deep convolutional neural networks (CNNs) have shown impressive performance in various vision tasks such as image classification, object detection and semantic segmentation.

Not All Tokens Are Equal: Human-centric Visual Analysis via Token Clustering Transformer

1 code implementation • CVPR 2022 • Wang Zeng, Sheng Jin, Wentao Liu, Chen Qian, Ping Luo, Wanli Ouyang, Xiaogang Wang

Vision transformers have achieved great successes in many computer vision tasks.

Ranked #4 on 2D Human Pose Estimation on COCO-WholeBody

2D Human Pose Estimation, 3D Human Pose Estimation (+1 more)

Avatar-Net: Multi-scale Zero-shot Style Transfer by Feature Decoration

3 code implementations • CVPR 2018 • Lu Sheng, Ziyi Lin, Jing Shao, Xiaogang Wang

Zero-shot artistic style transfer is an important image synthesis problem aiming at transferring arbitrary style into content images.

Image Generation, Image Reconstruction (+1 more)

KBNet: Kernel Basis Network for Image Restoration

1 code implementation • 6 Mar 2023 • Yi Zhang, Dasong Li, Xiaoyu Shi, Dailan He, Kangning Song, Xiaogang Wang, Hongwei Qin, Hongsheng Li

In this paper, we propose a kernel basis attention (KBA) module, which introduces learnable kernel bases to model representative image patterns for spatial information aggregation.

Ranked #1 on Color Image Denoising on McMaster sigma50

Color Image Denoising, Deblurring (+4 more)

Rethinking Feature Discrimination and Polymerization for Large-scale Recognition

1 code implementation • 2 Oct 2017 • Yu Liu, Hongyang Li, Xiaogang Wang

Feature matters.

Clustering, Metric Learning

Learning Deep Features via Congenerous Cosine Loss for Person Recognition

1 code implementation • 22 Feb 2017 • Yu Liu, Hongyang Li, Xiaogang Wang

Person recognition aims at recognizing the same identity across time and space with complicated scenes and similar appearance.

Person Recognition

3D Human Mesh Regression with Dense Correspondence

3 code implementations • CVPR 2020 • Wang Zeng, Wanli Ouyang, Ping Luo, Wentao Liu, Xiaogang Wang

This paper proposes a model-free 3D human mesh estimation framework, named DecoMR, which explicitly establishes the dense correspondence between the mesh and the local image features in the UV space (i.e., a 2D space used for texture mapping of a 3D mesh).

Ranked #1 on 3D Human Reconstruction on Surreal

3D Human Pose Estimation, 3D Human Reconstruction (+1 more)

End-to-End Object Detection with Adaptive Clustering Transformer

1 code implementation • 18 Nov 2020 • Minghang Zheng, Peng Gao, Renrui Zhang, Kunchang Li, Xiaogang Wang, Hongsheng Li, Hao Dong

In this paper, a novel variant of the transformer named Adaptive Clustering Transformer (ACT) is proposed to reduce the computation cost for high-resolution input.

Clustering, Object (+2 more)

Fast Convergence of DETR with Spatially Modulated Co-Attention

2 code implementations • 19 Jan 2021 • Peng Gao, Minghang Zheng, Xiaogang Wang, Jifeng Dai, Hongsheng Li

The recently proposed Detection Transformer (DETR) model successfully applies the Transformer to object detection and achieves performance comparable to two-stage object detection frameworks, such as Faster R-CNN.

Object Detection

Fast Convergence of DETR with Spatially Modulated Co-Attention

1 code implementation • ICCV 2021 • Peng Gao, Minghang Zheng, Xiaogang Wang, Jifeng Dai, Hongsheng Li

However, DETR suffers from its slow convergence.

Object Detection

Monocular Depth Estimation using Multi-Scale Continuous CRFs as Sequential Deep Networks

1 code implementation • 1 Mar 2018 • Dan Xu, Elisa Ricci, Wanli Ouyang, Xiaogang Wang, Nicu Sebe

Depth cues have proven very useful in various computer vision and robotic tasks.

Monocular Depth Estimation

Multi-Scale Continuous CRFs as Sequential Deep Networks for Monocular Depth Estimation

2 code implementations • CVPR 2017 • Dan Xu, Elisa Ricci, Wanli Ouyang, Xiaogang Wang, Nicu Sebe

This paper addresses the problem of depth estimation from a single still image.

Ranked #14 on Depth Estimation on NYU-Depth V2

Monocular Depth Estimation

Frozen CLIP Models are Efficient Video Learners

2 code implementations • 6 Aug 2022 • Ziyi Lin, Shijie Geng, Renrui Zhang, Peng Gao, Gerard de Melo, Xiaogang Wang, Jifeng Dai, Yu Qiao, Hongsheng Li

Video recognition has been dominated by the end-to-end learning paradigm -- first initializing a video recognition model with weights of a pretrained image model and then conducting end-to-end training on videos.

Ranked #26 on Action Classification on Kinetics-400 (using extra training data)

Action Classification, Video Recognition

Finding Task-Relevant Features for Few-Shot Learning by Category Traversal

1 code implementation • CVPR 2019 • Hongyang Li, David Eigen, Samuel Dodge, Matthew Zeiler, Xiaogang Wang

Few-shot learning is an important area of research.

Few-Shot Learning, Metric Learning

Person Search with Natural Language Description

1 code implementation • CVPR 2017 • Shuang Li, Tong Xiao, Hongsheng Li, Bolei Zhou, Dayu Yue, Xiaogang Wang

Searching for persons in large-scale image databases using natural language descriptions as queries has important applications in video surveillance.

Attribute, Person Search (+1 more)

Learning Spatial Regularization with Image-level Supervisions for Multi-label Image Classification

2 code implementations • CVPR 2017 • Feng Zhu, Hongsheng Li, Wanli Ouyang, Nenghai Yu, Xiaogang Wang

Analysis of the learned SRN model demonstrates that it can effectively capture both semantic and spatial relations of labels for improving classification performance.

Ranked #6 on Multi-Label Classification on NUS-WIDE

Classification, General Classification (+2 more)

Video Generation from Single Semantic Label Map

2 code implementations • CVPR 2019 • Junting Pan, Chengyu Wang, Xu Jia, Jing Shao, Lu Sheng, Junjie Yan, Xiaogang Wang

This paper proposes the novel task of video generation conditioned on a SINGLE semantic label map, which provides a good balance between flexibility and quality in the generation process.

Image Generation, Image to Video Generation (+1 more)

RNNPose: Recurrent 6-DoF Object Pose Refinement with Robust Correspondence Field Estimation and Pose Optimization

1 code implementation • CVPR 2022 • Yan Xu, Kwan-Yee Lin, Guofeng Zhang, Xiaogang Wang, Hongsheng Li

The correspondence field estimation and pose refinement are conducted alternately in each iteration to recover the object poses.

Ranked #1 on 6D Pose Estimation using RGB on LineMOD

6D Pose Estimation using RGB, Object

Learning to Predict Layout-to-image Conditional Convolutions for Semantic Image Synthesis

1 code implementation • NeurIPS 2019 • Xihui Liu, Guojun Yin, Jing Shao, Xiaogang Wang, Hongsheng Li

Semantic image synthesis aims at generating photorealistic images from semantic layouts.

Ranked #5 on Image-to-Image Translation on Cityscapes Labels-to-Photo

Image-to-Image Translation, Semantic Segmentation

CAMP: Cross-Modal Adaptive Message Passing for Text-Image Retrieval

1 code implementation • ICCV 2019 • Zihao Wang, Xihui Liu, Hongsheng Li, Lu Sheng, Junjie Yan, Xiaogang Wang, Jing Shao

Text-image cross-modal retrieval is a challenging task in the field of language and vision.

Ranked #9 on Image Retrieval on Flickr30K 1K test

Cross-Modal Retrieval, Image Retrieval (+1 more)

DriveMLM: Aligning Multi-Modal Large Language Models with Behavioral Planning States for Autonomous Driving

1 code implementation • 14 Dec 2023 • Wenhai Wang, Jiangwei Xie, Chuanyang Hu, Haoming Zou, Jianan Fan, Wenwen Tong, Yang Wen, Silei Wu, Hanming Deng, Zhiqi Li, Hao Tian, Lewei Lu, Xizhou Zhu, Xiaogang Wang, Yu Qiao, Jifeng Dai

In this work, we delve into the potential of large language models (LLMs) in autonomous driving (AD).

Autonomous Driving, Motion Planning

Spindle Net: Person Re-Identification With Human Body Region Guided Feature Decomposition and Fusion

1 code implementation • CVPR 2017 • Haiyu Zhao, Maoqing Tian, Shuyang Sun, Jing Shao, Junjie Yan, Shuai Yi, Xiaogang Wang, Xiaoou Tang

Person re-identification (ReID) is an important task in video surveillance and has various applications.

IDR: Self-Supervised Image Denoising via Iterative Data Refinement

1 code implementation • CVPR 2022 • Yi Zhang, Dasong Li, Ka Lung Law, Xiaogang Wang, Hongwei Qin, Hongsheng Li

To evaluate raw image denoising performance in real-world applications, we build a high-quality raw image dataset SenseNoise-500 that contains 500 real-life scenes.

Image Denoising

Shape2Motion: Joint Analysis of Motion Parts and Attributes from 3D Shapes

1 code implementation • CVPR 2019 • Xiaogang Wang, Bin Zhou, Yahao Shi, Xiaowu Chen, Qinping Zhao, Kai Xu

For the task of mobility analysis of 3D shapes, we propose joint analysis for simultaneous motion part segmentation and motion attribute estimation, taking a single 3D model as input.

Attribute, Segmentation

Feature Intertwiner for Object Detection

2 code implementations • ICLR 2019 • Hongyang Li, Bo Dai, Shaoshuai Shi, Wanli Ouyang, Xiaogang Wang

We argue that the reliable set could guide the feature learning of the less reliable set during training, in the spirit of a student mimicking a teacher's behavior, thus pushing towards a more compact class centroid in the feature space.

Ranked #134 on Object Detection on COCO test-dev

Rethinking Noise Synthesis and Modeling in Raw Denoising

1 code implementation • ICCV 2021 • Yi Zhang, Hongwei Qin, Xiaogang Wang, Hongsheng Li

However, real raw image noise comes from many noise sources and varies greatly among different sensors.

Ranked #2 on Image Denoising on SID SonyA7S2 x100

Image Denoising

End-to-End Deep Kronecker-Product Matching for Person Re-identification

1 code implementation • CVPR 2018 • Yantao Shen, Tong Xiao, Hongsheng Li, Shuai Yi, Xiaogang Wang

Person re-identification aims to robustly measure similarities between person images.

Person Re-Identification, Retrieval

Deep Group-shuffling Random Walk for Person Re-identification

1 code implementation • CVPR 2018 • Yantao Shen, Hongsheng Li, Tong Xiao, Shuai Yi, Dapeng Chen, Xiaogang Wang

Person re-identification aims at finding a person of interest in an image gallery by comparing the probe image of this person with all the gallery images.

FuseFormer: Fusing Fine-Grained Information in Transformers for Video Inpainting

1 code implementation • ICCV 2021 • Rui Liu, Hanming Deng, Yangyi Huang, Xiaoyu Shi, Lewei Lu, Wenxiu Sun, Xiaogang Wang, Jifeng Dai, Hongsheng Li

On the contrary, the soft composition operates by stitching different patches into a whole feature map where pixels in overlapping regions are summed up.

Ranked #3 on Video Inpainting on DAVIS

Seeing Beyond the Visible, Video Inpainting
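
The FuseFormer entry describes soft composition as stitching patches back into a whole feature map, with overlapping pixels summed. The minimal NumPy sketch below pairs that step with the overlapping split it inverts. It is not FuseFormer's implementation, which works on multi-channel token features inside a Transformer. The sketch assumes a single-channel H×W map, square k×k patches, and a stride s, and the helper names `soft_split` and `soft_composition` are hypothetical.

```python
import numpy as np


def soft_split(feature, k, s):
    """Cut an (H, W) map into overlapping k x k patches taken with stride s."""
    h, w = feature.shape
    patches = []
    for top in range(0, h - k + 1, s):
        for left in range(0, w - k + 1, s):
            patches.append(feature[top:top + k, left:left + k].copy())
    return np.stack(patches)


def soft_composition(patches, h, w, k, s):
    """Stitch patches back into an (H, W) map, summing pixels where they overlap."""
    out = np.zeros((h, w), dtype=patches.dtype)
    idx = 0
    # Visit patch positions in the same order soft_split produced them.
    for top in range(0, h - k + 1, s):
        for left in range(0, w - k + 1, s):
            out[top:top + k, left:left + k] += patches[idx]
            idx += 1
    return out


# On an all-ones map, each output pixel holds how many patches cover it:
# corner pixels are covered once, interior pixels four times.
ones = np.ones((4, 4))
counts = soft_composition(soft_split(ones, k=2, s=1), 4, 4, k=2, s=1)
print(counts)
```

When the stride equals the patch size, the patches do not overlap and the composition reproduces the input exactly. Overlapping strides are what make the summation matter, because neighboring patches then contribute to the same pixels.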

Convolutional neural networks with low-rank regularization

2 code implementations • 19 Nov 2015 • Cheng Tai, Tong Xiao, Yi Zhang, Xiaogang Wang, Weinan E

Recently, tensor decompositions have been used for speeding up CNNs.

Data Augmentation, Tensor Decomposition

Learning Monocular Depth by Distilling Cross-domain Stereo Networks

1 code implementation • ECCV 2018 • Xiaoyang Guo, Hongsheng Li, Shuai Yi, Jimmy Ren, Xiaogang Wang

Monocular depth estimation aims at estimating a pixelwise depth map for a single image, which has wide applications in scene understanding and autonomous driving.

Autonomous Driving, Monocular Depth Estimation (+3 more)

LIGA-Stereo: Learning LiDAR Geometry Aware Representations for Stereo-based 3D Detector

1 code implementation • ICCV 2021 • Xiaoyang Guo, Shaoshuai Shi, Xiaogang Wang, Hongsheng Li

Compared with the state-of-the-art stereo detector, our method improves the 3D detection performance for cars, pedestrians, and cyclists by 10.44%, 5.69%, and 5.97% mAP, respectively, on the official KITTI benchmark.

Ranked #2 on 3D Object Detection From Stereo Images on KITTI Cyclists Moderate

3D Object Detection From Stereo Images, Stereo Matching

Unconstrained Fashion Landmark Detection via Hierarchical Recurrent Transformer Networks

2 code implementations • 7 Aug 2017 • Sijie Yan, Ziwei Liu, Ping Luo, Shi Qiu, Xiaogang Wang, Xiaoou Tang

This work addresses unconstrained fashion landmark detection, where clothing bounding boxes are provided in neither training nor testing.

Paper
Code

A Simple Baseline for Video Restoration with Grouped Spatial-temporal Shift

1 code implementation • CVPR 2023 • Dasong Li, Xiaoyu Shi, Yi Zhang, Ka Chun Cheung, Simon See, Xiaogang Wang, Hongwei Qin, Hongsheng Li

In this study, we propose a simple yet effective framework for video restoration.

Ranked #1 on Deblurring on GoPro (using extra training data)

Deblurring Denoising +3

Paper
Code

Towards All-in-one Pre-training via Maximizing Multi-modal Mutual Information

1 code implementation • CVPR 2023 • Weijie Su, Xizhou Zhu, Chenxin Tao, Lewei Lu, Bin Li, Gao Huang, Yu Qiao, Xiaogang Wang, Jie Zhou, Jifeng Dai

It has been shown that combining multiple pre-training strategies and data from various modalities/sources can greatly boost the training of large-scale models.

Ranked #2 on Semantic Segmentation on ADE20K (using extra training data)

Image Classification Long-tailed Object Detection +3

Paper
Code

DivCo: Diverse Conditional Image Synthesis via Contrastive Generative Adversarial Network

1 code implementation • CVPR 2021 • Rui Liu, Yixiao Ge, Ching Lam Choi, Xiaogang Wang, Hongsheng Li

Conditional generative adversarial networks (cGANs) aim to synthesize diverse images given the input conditions and latent codes, but they usually suffer from mode collapse.

Contrastive Learning Generative Adversarial Network +1

Paper
Code

Object Detection in Videos with Tubelet Proposal Networks

1 code implementation • CVPR 2017 • Kai Kang, Hongsheng Li, Tong Xiao, Wanli Ouyang, Junjie Yan, Xihui Liu, Xiaogang Wang

Object detection in videos has drawn increasing attention recently with the introduction of the large-scale ImageNet VID dataset.

Object object-detection +2

Paper
Code

Demystify Transformers & Convolutions in Modern Image Deep Networks

1 code implementation • 10 Nov 2022 • Xiaowei Hu, Min Shi, Weiyun Wang, Sitong Wu, Linjie Xing, Wenhai Wang, Xizhou Zhu, Lewei Lu, Jie Zhou, Xiaogang Wang, Yu Qiao, Jifeng Dai

Our experiments on various tasks and an analysis of inductive bias show a significant performance boost due to advanced network-level and block-level designs, but performance differences persist among different spatial token mixers (STMs).

Image Deep Networks Spatial Token Mixer

Paper
Code

SSN: Learning Sparse Switchable Normalization via SparsestMax

1 code implementation • CVPR 2019 • Wenqi Shao, Tianjian Meng, Jingyu Li, Ruimao Zhang, Yudian Li, Xiaogang Wang, Ping Luo

Unlike $\ell_1$ and $\ell_0$ constraints that impose difficulties in optimization, we turn this constrained optimization problem into feed-forward computation by proposing SparsestMax, which is a sparse version of softmax.

Paper
Code
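To give a concrete sense of what a "sparse version of softmax" computes, the sketch below implements the well-known sparsemax projection (Martins and Astudillo, 2016). This is a swapped-in, simpler technique shown only for illustration. It is not SSN's SparsestMax. Like softmax, sparsemax returns a probability vector, but low-scoring entries become exactly zero.

```python
import numpy as np

def sparsemax(z):
    """Euclidean projection of scores z onto the probability simplex (sparsemax).
    Entries below the threshold tau are set exactly to zero."""
    z = np.asarray(z, dtype=float)
    z_sorted = np.sort(z)[::-1]                 # descending order
    k = np.arange(1, z.size + 1)
    cumsum = np.cumsum(z_sorted)
    support = 1 + k * z_sorted > cumsum         # True for a prefix of sorted entries
    k_z = k[support].max()                      # size of the support
    tau = (cumsum[k_z - 1] - 1) / k_z           # threshold so the output sums to 1
    return np.maximum(z - tau, 0.0)

# A dominant score yields a one-hot output; close scores keep several nonzeros.
sharp = sparsemax([2.0, 1.0, 0.1])
soft = sparsemax([0.5, 0.4, 0.1])
```

For `[2.0, 1.0, 0.1]` the threshold is 1, so only the first entry survives. For `[0.5, 0.4, 0.1]` the threshold is 0 and the input is returned unchanged, since it already lies on the simplex.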

Auto Seg-Loss: Searching Metric Surrogates for Semantic Segmentation

1 code implementation • ICLR 2021 • Hao Li, Chenxin Tao, Xizhou Zhu, Xiaogang Wang, Gao Huang, Jifeng Dai

In this paper, we propose to automate the design of metric-specific loss functions by searching differentiable surrogate losses for each metric.

Semantic Segmentation

Paper
Code

Point2Seq: Detecting 3D Objects as Sequences

1 code implementation • CVPR 2022 • Yujing Xue, Jiageng Mao, Minzhe Niu, Hang Xu, Michael Bi Mi, Wei Zhang, Xiaogang Wang, Xinchao Wang

We further propose a lightweight scene-to-sequence decoder that can auto-regressively generate words conditioned on features from a 3D scene as well as cues from the preceding words.

3D Object Detection Object +1

Paper
Code

Decoupled Spatial-Temporal Transformer for Video Inpainting

1 code implementation • 14 Apr 2021 • Rui Liu, Hanming Deng, Yangyi Huang, Xiaoyu Shi, Lewei Lu, Wenxiu Sun, Xiaogang Wang, Jifeng Dai, Hongsheng Li

The seamless combination of these two novel designs forms a better spatial-temporal attention scheme, and our proposed model outperforms state-of-the-art video inpainting approaches with significantly improved efficiency.

Video Inpainting

Paper
Code

Structured Domain Adaptation with Online Relation Regularization for Unsupervised Person Re-ID

4 code implementations • 14 Mar 2020 • Yixiao Ge, Feng Zhu, Dapeng Chen, Rui Zhao, Xiaogang Wang, Hongsheng Li

To tackle the challenges, we propose an end-to-end structured domain adaptation framework with an online relation-consistency regularization term.

Ranked #4 on Unsupervised Domain Adaptation on Market to MSMT

Pseudo Label Relation +3

Paper
Code

Deep Learning Face Attributes in the Wild

2 code implementations • ICCV 2015 • Ziwei Liu, Ping Luo, Xiaogang Wang, Xiaoou Tang

LNet is pre-trained on massive general object categories for face localization, while ANet is pre-trained on massive face identities for attribute prediction.

Ranked #6 on Facial Attribute Classification on LFWA

Attribute Facial Attribute Classification

Paper
Code

Open-Edit: Open-Domain Image Manipulation with Open-Vocabulary Instructions

1 code implementation • ECCV 2020 • Xihui Liu, Zhe Lin, Jianming Zhang, Handong Zhao, Quan Tran, Xiaogang Wang, Hongsheng Li

We propose a novel algorithm, named Open-Edit, which is the first attempt at open-domain image manipulation with open-vocabulary instructions.

Image Manipulation

Paper
Code

Learning Degradation Representations for Image Deblurring

1 code implementation • 10 Aug 2022 • Dasong Li, Yi Zhang, Ka Chun Cheung, Xiaogang Wang, Hongwei Qin, Hongsheng Li

With this integration, MSDI-Net can adaptively handle diverse and complicated blur patterns.

Ranked #13 on Image Deblurring on GoPro

Deblurring Image Deblurring +3

Paper
Code

Neural Network Encapsulation

2 code implementations • ECCV 2018 • Hongyang Li, Xiaoyang Guo, Bo Dai, Wanli Ouyang, Xiaogang Wang

Motivated by the routing mechanism, which makes higher-level capsules agree with lower-level capsules, we extend the mechanism to compensate for the rapid loss of information in nearby layers.

Paper
Code

PasteGAN: A Semi-Parametric Method to Generate Image from Scene Graph

1 code implementation • NeurIPS 2019 • Yikang Li, Tao Ma, Yeqi Bai, Nan Duan, Sining Wei, Xiaogang Wang

Therefore, to generate the images with preferred objects and rich interactions, we propose a semi-parametric method, PasteGAN, for generating the image from the scene graph and the image crops, where spatial arrangements of the objects and their pair-wise relationships are defined by the scene graph and the object appearances are determined by the given object crops.

Image Generation Object

Paper
Code

STAR: A Structure-Aware Lightweight Transformer for Real-Time Image Enhancement

1 code implementation • ICCV 2021 • Zhaoyang Zhang, Yitong Jiang, Jun Jiang, Xiaogang Wang, Ping Luo, Jinwei Gu

STAR is a general architecture that can be easily adapted to different image enhancement tasks.

Color Constancy Image Enhancement +3

Paper
Code

ViTAS: Vision Transformer Architecture Search

1 code implementation • 25 Jun 2021 • Xiu Su, Shan You, Jiyang Xie, Mingkai Zheng, Fei Wang, Chen Qian, ChangShui Zhang, Xiaogang Wang, Chang Xu

Vision transformers (ViTs) inherited the success of NLP but their structures have not been sufficiently investigated and optimized for visual tasks.

Inductive Bias Neural Architecture Search

Paper
Code

Robust Self-Supervised LiDAR Odometry via Representative Structure Discovery and 3D Inherent Error Modeling

1 code implementation • 27 Feb 2022 • Yan Xu, Junyi Lin, Jianping Shi, Guofeng Zhang, Xiaogang Wang, Hongsheng Li

Correct ego-motion estimation relies on understanding the correspondences between adjacent LiDAR scans.

Motion Estimation

Paper
Code

Co-attending Free-form Regions and Detections with Multi-modal Multiplicative Feature Embedding for Visual Question Answering

1 code implementation • 18 Nov 2017 • Pan Lu, Hongsheng Li, Wei Zhang, Jianyong Wang, Xiaogang Wang

Existing VQA methods mainly adopt the visual attention mechanism to associate the input question with corresponding image regions for effective question answering.

Ranked #2 on Visual Question Answering (VQA) on COCO Visual Question Answering (VQA) real images 1.0 open ended

Visual Question Answering

Paper
Code

Weakly Supervised Contrastive Learning

1 code implementation • ICCV 2021 • Mingkai Zheng, Fei Wang, Shan You, Chen Qian, ChangShui Zhang, Xiaogang Wang, Chang Xu

Specifically, our proposed framework is based on two projection heads, one of which performs the regular instance discrimination task.

Ranked #24 on Semi-Supervised Image Classification on ImageNet - 1% labeled data

Contrastive Learning Representation Learning +2

Paper
Code

Siamese Image Modeling for Self-Supervised Vision Representation Learning

2 code implementations • CVPR 2023 • Chenxin Tao, Xizhou Zhu, Weijie Su, Gao Huang, Bin Li, Jie Zhou, Yu Qiao, Xiaogang Wang, Jifeng Dai

Driven by this analysis, we propose Siamese Image Modeling (SiameseIM), which predicts the dense representations of an augmented view based on another masked view from the same image but with different augmentations.

Representation Learning Self-Supervised Learning +1

Paper
Code

Dynamic Token Normalization Improves Vision Transformers

1 code implementation • ICLR 2022 • Wenqi Shao, Yixiao Ge, Zhaoyang Zhang, Xuyuan Xu, Xiaogang Wang, Ying Shan, Ping Luo

It is difficult for Transformers to capture inductive bias such as the positional context in an image with LN.

Inductive Bias ListOps +2

Paper
Code

Graph Degree Linkage: Agglomerative Clustering on a Directed Graph

2 code implementations • 25 Aug 2012 • Wei Zhang, Xiaogang Wang, Deli Zhao, Xiaoou Tang

We explore the different roles of two fundamental concepts in graph theory, indegree and outdegree, in the context of clustering.

Ranked #1 on Image Clustering on Coil-20 (Accuracy metric)

Clustering Computational Efficiency +1

Paper
Code
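The indegree/outdegree distinction in the Graph Degree Linkage excerpt above can be made concrete on a directed k-nearest-neighbour graph. In such a graph every node has outdegree k, while indegree varies: points in dense regions are chosen as neighbours often, and outliers rarely. The minimal sketch below only computes the degrees. It does not reproduce GDL's degree-based cluster affinity, and the helper names are hypothetical.

```python
import numpy as np

def knn_digraph(points, k):
    """Adjacency A[i, j] = 1 if j is one of i's k nearest neighbours (edge i -> j)."""
    pts = np.asarray(points, dtype=float)
    d = np.linalg.norm(pts[:, None, :] - pts[None, :, :], axis=-1)
    np.fill_diagonal(d, np.inf)                 # exclude self-loops
    nbrs = np.argsort(d, axis=1)[:, :k]         # indices of the k closest points
    n = len(pts)
    adj = np.zeros((n, n), dtype=int)
    adj[np.arange(n)[:, None], nbrs] = 1
    return adj

def degrees(adj):
    """Return (indegree, outdegree): column sums and row sums of the adjacency."""
    return adj.sum(axis=0), adj.sum(axis=1)

# 1D points: three close together, one farther, one outlier at 10.
adj = knn_digraph([[0.0], [1.0], [1.5], [3.0], [10.0]], k=1)
indeg, outdeg = degrees(adj)
```

Here every outdegree is 1, while the outlier at 10 has indegree 0 and the two points in the dense middle each have indegree 2.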

Zoom Out-and-In Network with Map Attention Decision for Region Proposal and Object Detection

1 code implementation • 13 Sep 2017 • Hongyang Li, Yu Liu, Wanli Ouyang, Xiaogang Wang

A key observation is that it is difficult to classify anchors of different sizes with the same set of features.

Ranked #2 on Region Proposal on COCO test-dev

object-detection Object Detection +1

Paper
Code

Zoom Out-and-In Network with Recursive Training for Object Proposal

1 code implementation • 19 Feb 2017 • Hongyang Li, Yu Liu, Wanli Ouyang, Xiaogang Wang

In this paper, we propose a zoom-out-and-in network for generating object proposals.

Paper
Code

Learning Chained Deep Features and Classifiers for Cascade in Object Detection

1 code implementation • 23 Feb 2017 • Wanli Ouyang, Ku Wang, Xin Zhu, Xiaogang Wang

In this CC-Net, the cascaded classifier at a stage is aided by the classification scores in previous stages.

object-detection Object Detection +1

Paper
Code
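The chained-cascade idea in the CC-Net excerpt above (a stage's classifier aided by scores from previous stages) can be sketched generically. This minimal example assumes that each stage adds its own score to the running score from earlier stages, and that windows whose chained score drops below a per-stage threshold are rejected early. The additive chaining, the thresholds, and the name `chained_cascade` are illustrative assumptions, not CC-Net's exact formulation.

```python
import numpy as np

def chained_cascade(stage_scores, thresholds):
    """stage_scores: (num_stages, num_windows) raw per-stage classifier outputs.
    Assumed chaining: s_t = s_{t-1} + f_t(x). Windows whose chained score falls
    below thresholds[t] are rejected and not evaluated at later stages."""
    num_stages, num_windows = stage_scores.shape
    chained = np.zeros(num_windows)
    alive = np.ones(num_windows, dtype=bool)
    for t in range(num_stages):
        chained[alive] += stage_scores[t, alive]   # only surviving windows are scored
        alive &= chained >= thresholds[t]          # early rejection
    return chained, alive

scores = np.array([[1.0, -2.0, 0.5],
                   [1.0, 5.0, -1.0]])
chained, alive = chained_cascade(scores, [0.0, 0.0])
```

The second window is rejected after stage 1, so its large stage-2 score is never added. This early rejection is what makes cascades cheap.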

Cascaded Refinement Network for Point Cloud Completion

1 code implementation • CVPR 2020 • Xiaogang Wang, Marcelo H. Ang Jr, Gim Hee Lee

Point clouds are often sparse and incomplete.

Point Cloud Completion

Paper
Code

Cascaded Refinement Network for Point Cloud Completion with Self-supervision

1 code implementation • 17 Oct 2020 • Xiaogang Wang, Marcelo H Ang Jr, Gim Hee Lee

This is to mitigate the dependence of existing approaches on large amounts of ground truth training data that are often difficult to obtain in real-world applications.

3D Object Classification Point Cloud Completion

Paper
Code

Not All Models Are Equal: Predicting Model Transferability in a Self-challenging Fisher Space

1 code implementation • 7 Jul 2022 • Wenqi Shao, Xun Zhao, Yixiao Ge, Zhaoyang Zhang, Lei Yang, Xiaogang Wang, Ying Shan, Ping Luo

It is challenging because the ground-truth model ranking for each task can only be generated by fine-tuning the pre-trained models on the target dataset, which is brute-force and computationally expensive.

Ranked #2 on Transferability on classification benchmark

Transferability

Paper
Code

Channel Equilibrium Networks for Learning Deep Representation

1 code implementation • ICML 2020 • Wenqi Shao, Shitao Tang, Xingang Pan, Ping Tan, Xiaogang Wang, Ping Luo

Unlike prior work that simply removes the inhibited channels, we propose to "wake them up" during training by designing a novel neural building block, termed the Channel Equilibrium (CE) block, which enables channels in the same layer to contribute equally to the learned representation.

Paper
Code

Real-time Controllable Denoising for Image and Video

1 code implementation • CVPR 2023 • Zhaoyang Zhang, Yitong Jiang, Wenqi Shao, Xiaogang Wang, Ping Luo, Kaimo Lin, Jinwei Gu

Controllable image denoising aims to generate clean samples with human perceptual priors and balance sharpness and smoothness.

Image Denoising Video Denoising

Paper
Code

Voxel-based Network for Shape Completion by Leveraging Edge Generation

1 code implementation • ICCV 2021 • Xiaogang Wang, Marcelo H Ang Jr, Gim Hee Lee

Deep learning techniques have yielded significant improvements in point cloud completion, which aims to complete missing object shapes from partial inputs.

Point Cloud Completion

Paper
Code

Semantic Scene Completion via Integrating Instances and Scene in-the-Loop

1 code implementation • CVPR 2021 • Yingjie Cai, Xuesong Chen, Chao Zhang, Kwan-Yee Lin, Xiaogang Wang, Hongsheng Li

The key insight is that we decouple the instances from a coarsely completed semantic scene instead of a raw input image to guide the reconstruction of instances and the overall scene.

Ranked #1 on 3D Semantic Scene Completion on NYUv2

3D Semantic Scene Completion Scene Understanding

Paper
Code

Cached Transformers: Improving Transformers with Differentiable Memory Cache

1 code implementation • 20 Dec 2023 • Zhaoyang Zhang, Wenqi Shao, Yixiao Ge, Xiaogang Wang, Jinwei Gu, Ping Luo

This work introduces a new Transformer model called Cached Transformer, which uses Gated Recurrent Cached (GRC) attention to extend the self-attention mechanism with a differentiable memory cache of tokens.

Image Classification Instance Segmentation +6

Paper
Code

Unsupervised Cross-spectral Stereo Matching by Learning to Synthesize

1 code implementation • 4 Mar 2019 • Mingyang Liang, Xiaoyang Guo, Hongsheng Li, Xiaogang Wang, You Song

Unsupervised cross-spectral stereo matching aims at recovering disparity given cross-spectral image pairs without any supervision in the form of ground truth disparity or depth.

Image-to-Image Translation Stereo Matching +2

Paper
Code

CoNe: Contrast Your Neighbours for Supervised Image Classification

1 code implementation • 21 Aug 2023 • Mingkai Zheng, Shan You, Lang Huang, Xiu Su, Fei Wang, Chen Qian, Xiaogang Wang, Chang Xu

Moreover, to further boost performance, we propose "distributional consistency" as a more informative regularization that encourages similar instances to have similar probability distributions.

Classification Image Classification

Paper
Code

ADDP: Learning General Representations for Image Recognition and Generation with Alternating Denoising Diffusion Process

1 code implementation • 8 Jun 2023 • Changyao Tian, Chenxin Tao, Jifeng Dai, Hao Li, Ziheng Li, Lewei Lu, Xiaogang Wang, Hongsheng Li, Gao Huang, Xizhou Zhu

In each denoising step, our method first decodes pixels from previous VQ tokens, then generates new VQ tokens from the decoded pixels.

Denoising Representation Learning

Paper
Code

Point Cloud Completion by Learning Shape Priors

1 code implementation • 2 Aug 2020 • Xiaogang Wang, Marcelo H. Ang Jr, Gim Hee Lee

Then we learn a mapping that transfers the point features of the partial points to those of the complete points by optimizing feature alignment losses.

Generative Adversarial Network Point Cloud Completion

Paper
Code

Adaptive Momentum Coefficient for Neural Network Optimization

1 code implementation • 4 Jun 2020 • Zana Rashidi, Kasra Ahmadi K. A., Aijun An, Xiaogang Wang

We propose a novel and efficient momentum-based first-order algorithm for optimizing neural networks which uses an adaptive coefficient for the momentum term.

Paper
Code

Deep Continuous Conditional Random Fields with Asymmetric Inter-object Constraints for Online Multi-object Tracking

no code implementations • 4 Jun 2018 • Hui Zhou, Wanli Ouyang, Jian Cheng, Xiaogang Wang, Hongsheng Li

In addition, inter-object relations are mostly modeled in a symmetric way, which we argue is not an optimal setting.

Autonomous Driving Multi-Object Tracking +3

Paper
Add Code

Lehmer Transform and its Theoretical Properties

no code implementations • 13 May 2018 • Masoud Ataei, Shengyuan Chen, Xiaogang Wang

We propose a new class of transforms, which we call the Lehmer Transform, motivated by the Lehmer mean function.

EEG

Paper
Add Code
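The Lehmer mean that motivates the transform above is $L_p(x) = \sum_i x_i^p / \sum_i x_i^{p-1}$ for positive $x_i$. The short sketch below computes only this mean, not the Lehmer Transform itself. It shows how a single parameter $p$ moves between familiar summaries: $p = 1$ gives the arithmetic mean, $p = 0$ the harmonic mean, $p = 1/2$ the geometric mean of two values, and large $p$ approaches the maximum.

```python
import numpy as np

def lehmer_mean(x, p):
    """Lehmer mean L_p(x) = sum(x**p) / sum(x**(p - 1)); x must be positive."""
    x = np.asarray(x, dtype=float)
    return np.sum(x ** p) / np.sum(x ** (p - 1))

arith = lehmer_mean([1, 2, 3, 4], 1)     # arithmetic mean: 2.5
harm = lehmer_mean([1, 2, 3, 4], 0)      # harmonic mean
geo2 = lehmer_mean([2, 8], 0.5)          # geometric mean of two values: 4.0
near_max = lehmer_mean([1, 2, 3], 50)    # close to max(x) = 3
```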

PAD-Net: Multi-Tasks Guided Prediction-and-Distillation Network for Simultaneous Depth Estimation and Scene Parsing

no code implementations • CVPR 2018 • Dan Xu, Wanli Ouyang, Xiaogang Wang, Nicu Sebe

Depth estimation and scene parsing are two particularly important tasks in visual scene understanding.

Ranked #15 on Depth Estimation on NYU-Depth V2

Depth Estimation Multi-Task Learning +2

Paper
Add Code

Learnable Histogram: Statistical Context Features for Deep Neural Networks

no code implementations • 25 Apr 2018 • Zhe Wang, Hongsheng Li, Wanli Ouyang, Xiaogang Wang

Statistical features, such as histograms, Bag-of-Words (BoW), and Fisher Vectors, were commonly used with hand-crafted features in conventional classification methods, but they have attracted less attention since deep learning methods became popular.

General Classification object-detection +2

Paper
Add Code

3D Human Pose Estimation in the Wild by Adversarial Learning

no code implementations • CVPR 2018 • Wei Yang, Wanli Ouyang, Xiaolong Wang, Jimmy Ren, Hongsheng Li, Xiaogang Wang

Instead of defining hard-coded rules to constrain the pose estimation results, we design a novel multi-source discriminator to distinguish the predicted 3D poses from the ground truth, which forces the pose estimator to generate anthropometrically valid poses even for images in the wild.

Ranked #1 on Monocular 3D Human Pose Estimation on Human3.6M (Use Video Sequence metric)

Monocular 3D Human Pose Estimation valid

Paper
Add Code

Exploring Disentangled Feature Representation Beyond Face Identification

no code implementations • CVPR 2018 • Yu Liu, Fangyin Wei, Jing Shao, Lu Sheng, Junjie Yan, Xiaogang Wang

This paper proposes learning disentangled but complementary face features with minimal supervision by face identification.

Attribute Face Generation +1

Paper
Add Code

Diversity Regularized Spatiotemporal Attention for Video-based Person Re-identification

no code implementations • CVPR 2018 • Shuang Li, Slawomir Bak, Peter Carr, Xiaogang Wang

As a result, the network learns latent representations of the face, torso and other body parts using the best available image patches from the entire video sequence.

Video-Based Person Re-Identification

Paper
Add Code

Show, Tell and Discriminate: Image Captioning by Self-retrieval with Partially Labeled Data

no code implementations • ECCV 2018 • Xihui Liu, Hongsheng Li, Jing Shao, Dapeng Chen, Xiaogang Wang

The aim of image captioning is to automatically generate captions that describe image contents.

Image Captioning Retrieval

Paper
Add Code

Learning Deep Structured Multi-Scale Features using Attention-Gated CRFs for Contour Prediction

no code implementations • NeurIPS 2017 • Dan Xu, Wanli Ouyang, Xavier Alameda-Pineda, Elisa Ricci, Xiaogang Wang, Nicu Sebe

Recent works have shown that exploiting multi-scale representations deeply learned via convolutional neural networks (CNN) is of tremendous importance for accurate contour detection.

Contour Detection

Paper
Add Code

Spontaneous Symmetry Breaking in Neural Networks

no code implementations • 17 Oct 2017 • Ricky Fok, Aijun An, Xiaogang Wang

We propose a framework to understand the unprecedented performance and robustness of deep neural networks using field theory.

Paper
Add Code

Visual Question Generation as Dual Task of Visual Question Answering

no code implementations • CVPR 2018 • Yikang Li, Nan Duan, Bolei Zhou, Xiao Chu, Wanli Ouyang, Xiaogang Wang

Visual question answering (VQA) and visual question generation (VQG) are two trending topics in computer vision that have so far been explored separately.

Question Answering Question Generation +2

Paper
Add Code

Optimization assisted MCMC

no code implementations • 9 Sep 2017 • Ricky Fok, Aijun An, Xiaogang Wang

The global optimization method first reduces a high-dimensional search to a one-dimensional geodesic to find a starting point close to a local mode.

Paper
Add Code

Online Multi-Object Tracking Using CNN-based Single Object Tracker with Spatial-Temporal Attention Mechanism

no code implementations • ICCV 2017 • Qi Chu, Wanli Ouyang, Hongsheng Li, Xiaogang Wang, Bin Liu, Nenghai Yu

The visibility map of the target is learned and used for inferring the spatial attention map.

Computational Efficiency Multi-Object Tracking +2

Paper
Add Code

Learning Deep Neural Networks for Vehicle Re-ID with Visual-spatio-temporal Path Proposals

no code implementations • ICCV 2017 • Yantao Shen, Tong Xiao, Hongsheng Li, Shuai Yi, Xiaogang Wang

Vehicle re-identification is an important problem and has many applications in video surveillance and intelligent transportation.

Person Re-Identification Vehicle Re-Identification

Paper
Add Code

Identity-Aware Textual-Visual Matching with Latent Co-attention

no code implementations • ICCV 2017 • Shuang Li, Tong Xiao, Hongsheng Li, Wei Yang, Xiaogang Wang

The stage-2 CNN-LSTM network refines the matching results with a latent co-attention mechanism.

Sentence Text based Person Retrieval

Paper
Add Code

Zoom-in-Net: Deep Mining Lesions for Diabetic Retinopathy Detection

no code implementations • 14 Jun 2017 • Zhe Wang, Yanxin Yin, Jianping Shi, Wei Fang, Hongsheng Li, Xiaogang Wang

We propose a convolutional neural network based algorithm for simultaneously diagnosing diabetic retinopathy and highlighting suspicious regions.

Clustering Diabetic Retinopathy Detection

Paper
Add Code

Learning Deep Representations for Scene Labeling with Semantic Context Guided Supervision

no code implementations • 8 Jun 2017 • Zhe Wang, Hongsheng Li, Wanli Ouyang, Xiaogang Wang

The experiments show that our proposed method makes deep models learn more discriminative feature representations without increasing model size or complexity.

Scene Labeling

Paper
Add Code

ViP-CNN: Visual Phrase Guided Convolutional Neural Network

no code implementations • CVPR 2017 • Yikang Li, Wanli Ouyang, Xiaogang Wang, Xiao'ou Tang

In this paper, each visual relationship is considered as a phrase with three components.

Descriptive Image Captioning +4

Paper
Add Code

Progressively Diffused Networks for Semantic Image Segmentation

no code implementations • 20 Feb 2017 • Ruimao Zhang, Wei Yang, Zhanglin Peng, Xiaogang Wang, Liang Lin

This paper introduces Progressively Diffused Networks (PDNs) for unifying multi-scale context modeling with deep feature learning, by taking semantic image segmentation as an exemplar application.

Image Segmentation Segmentation +1

Paper
Add Code

Automatic Discoveries of Physical and Semantic Concepts via Association Priors of Neuron Groups

no code implementations • 30 Dec 2016 • Shuai Li, Kui Jia, Xiaogang Wang

The recent successful deep neural networks are largely trained in a supervised manner.

Paper
Add Code

CRF-CNN: Modeling Structured Information in Human Pose Estimation

no code implementations • NeurIPS 2016 • Xiao Chu, Wanli Ouyang, Hongsheng Li, Xiaogang Wang

In a classical neural network, there is no message passing between neurons in the same layer.

General Classification Pedestrian Detection +1

Paper
Add Code

LCrowdV: Generating Labeled Videos for Simulation-based Crowd Behavior Learning

no code implementations • 29 Jun 2016 • Ernest Cheung, Tsan Kwong Wong, Aniket Bera, Xiaogang Wang, Dinesh Manocha

We present a novel procedural framework to generate an arbitrary number of labeled crowd videos (LCrowdV).

Paper
Add Code

Factors in Finetuning Deep Model for object detection

no code implementations • 20 Jan 2016 • Wanli Ouyang, Xiaogang Wang, Cong Zhang, Xiaokang Yang

Our analysis and empirical results show that classes with more samples have higher impact on the feature learning.

Multi-Label Classification

Paper
Add Code

Multi-Bias Non-linear Activation in Deep Neural Networks

no code implementations • 3 Apr 2016 • Hongyang Li, Wanli Ouyang, Xiaogang Wang

It provides great flexibility of selecting responses to different visual patterns in different magnitude ranges to form rich representations in higher layers.

Paper
Add Code

Exemplar-AMMs: Recognizing Crowd Movements from Pedestrian Trajectories

no code implementations • 31 Mar 2016 • Wenxi Liu, Rynson W. H. Lau, Xiaogang Wang, Dinesh Manocha

Specifically, we propose an optimization framework that filters out the unknown noise in the crowd trajectories and measures their similarity to the exemplar-AMMs to produce a crowd motion feature.

Paper
Add Code

Structured Feature Learning for Pose Estimation

no code implementations • CVPR 2016 • Xiao Chu, Wanli Ouyang, Hongsheng Li, Xiaogang Wang

In this paper, we propose a structured feature learning framework to reason about the correlations among body joints at the feature level in human pose estimation.

Object object-detection +2

Paper
Add Code

Window-Object Relationship Guided Representation Learning for Generic Object Detections

no code implementations • 9 Dec 2015 • Xingyu Zeng, Wanli Ouyang, Xiaogang Wang

We propose a representation learning pipeline to use the relationship as supervision for improving the learned representation in object detection.

Paper
Add Code

Sparsifying Neural Network Connections for Face Recognition

no code implementations • CVPR 2016 • Yi Sun, Xiaogang Wang, Xiaoou Tang

This paper proposes to learn high-performance deep ConvNets with sparse neural connections, referred to as sparse ConvNets, for face recognition.

Paper
Add Code

DeepID-Net: Deformable Deep Convolutional Neural Networks for Object Detection

no code implementations • CVPR 2015 • Wanli Ouyang, Xiaogang Wang, Xingyu Zeng, Shi Qiu, Ping Luo, Yonglong Tian, Hongsheng Li, Shuo Yang, Zhe Wang, Chen-Change Loy, Xiaoou Tang

In this paper, we propose deformable deep convolutional neural networks for generic object detection.

Classification General Classification +5

Paper
Add Code

Highly Efficient Forward and Backward Propagation of Convolutional Neural Networks for Pixelwise Classification

no code implementations • 15 Dec 2014 • Hongsheng Li, Rui Zhao, Xiaogang Wang

The proposed algorithms eliminate all the redundant computation in convolution and pooling on images by introducing novel d-regularly sparse kernels.

Paper
Add Code
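A d-regularly sparse kernel, as named in the excerpt above, can be pictured as a dense kernel with d − 1 zeros inserted between neighbouring taps. The sketch below assumes exactly this interpretation. It builds such a kernel and checks one useful property: applying it to the full image gives, in a single pass, the same values as applying the dense kernel to each stride-d subsampled grid. The helper names are hypothetical, and the paper's full algorithm, which also handles pooling layers, is not reproduced.

```python
import numpy as np

def d_regularly_sparse(kernel, d):
    """Insert d - 1 zeros between neighbouring taps of a dense 2D kernel."""
    k_h, k_w = kernel.shape
    out = np.zeros(((k_h - 1) * d + 1, (k_w - 1) * d + 1), dtype=kernel.dtype)
    out[::d, ::d] = kernel
    return out

def correlate_valid(x, k):
    """Plain 'valid' 2D cross-correlation (no padding), written out for clarity."""
    kh, kw = k.shape
    h, w = x.shape
    out = np.empty((h - kh + 1, w - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(x[i:i + kh, j:j + kw] * k)
    return out

k = np.arange(1, 10, dtype=float).reshape(3, 3)
sparse_k = d_regularly_sparse(k, 2)            # 5 x 5 with 9 nonzero taps
x = np.arange(64, dtype=float).reshape(8, 8)
full = correlate_valid(x, sparse_k)            # one pass over the whole image
# full[::2, ::2] matches correlate_valid(x[::2, ::2], k), and similarly for
# the other stride-2 phases, so no computation is repeated across them.
```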

Person Re-identification by Saliency Learning

no code implementations • 5 Dec 2014 • Rui Zhao, Wanli Ouyang, Xiaogang Wang

Saliency matching is proposed based on patch matching.

Patch Matching Person Re-Identification

Paper
Add Code

Pedestrian Detection aided by Deep Learning Semantic Tasks

no code implementations • CVPR 2015 • Yonglong Tian, Ping Luo, Xiaogang Wang, Xiaoou Tang

Rather than expensively annotating scene attributes, we transfer attributes information from existing scene segmentation datasets to the pedestrian dataset, by proposing a novel deep model to learn high-level features from multiple tasks and multiple data sources.

Ranked #30 on Pedestrian Detection on Caltech

Pedestrian Detection Scene Segmentation

Paper
Add Code

Fully Convolutional Neural Networks for Crowd Segmentation

no code implementations • 17 Nov 2014 • Kai Kang, Xiaogang Wang

Based on FCNN, a multi-stage deep learning approach is proposed to integrate appearance and motion cues for crowd segmentation.

Image Segmentation Segmentation +2

Paper
Add Code

DeepID-Net: multi-stage and deformable deep convolutional neural networks for object detection

no code implementations • 11 Sep 2014 • Wanli Ouyang, Ping Luo, Xingyu Zeng, Shi Qiu, Yonglong Tian, Hongsheng Li, Shuo Yang, Zhe Wang, Yuanjun Xiong, Chen Qian, Zhenyao Zhu, Ruohui Wang, Chen-Change Loy, Xiaogang Wang, Xiaoou Tang

In the proposed new deep architecture, a new deformation constrained pooling (def-pooling) layer models the deformation of object parts with geometric constraint and penalty.

Paper
Add Code
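The deformation constrained pooling (def-pooling) described above scores a part by taking the best detection response near an anchor position after subtracting a penalty for how far the part moved. The minimal sketch below assumes a single anchor, a square search window, and a quadratic penalty $c_x\,dx^2 + c_y\,dy^2$. These are simplifying assumptions, and `def_pool` is a hypothetical name. The DeepID-Net layer applies learned deformation penalties across whole part-detection maps.

```python
import numpy as np

def def_pool(part_map, center, radius, c_x, c_y):
    """Return (score, (dy, dx)) maximising
    part_map[cy + dy, cx + dx] - c_x * dx**2 - c_y * dy**2
    over displacements within `radius` that stay inside the map."""
    cy, cx = center
    h, w = part_map.shape
    best, best_disp = -np.inf, (0, 0)
    for dy in range(-radius, radius + 1):
        for dx in range(-radius, radius + 1):
            y, x = cy + dy, cx + dx
            if 0 <= y < h and 0 <= x < w:
                score = part_map[y, x] - c_x * dx ** 2 - c_y * dy ** 2
                if score > best:
                    best, best_disp = score, (dy, dx)
    return best, best_disp

# A strong part response one pixel to the right of the anchor.
part_map = np.zeros((5, 5))
part_map[2, 3] = 5.0
cheap = def_pool(part_map, (2, 2), 1, 1.0, 1.0)    # small penalty: (4.0, (0, 1))
costly = def_pool(part_map, (2, 2), 1, 10.0, 1.0)  # large penalty: (0.0, (0, 0))
```

With a small penalty the pooled score follows the displaced part. With a large horizontal penalty, staying at the anchor wins, which is how the penalty encodes the geometric constraint.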

Deep Learning Multi-View Representation for Face Recognition

no code implementations • 26 Jun 2014 • Zhenyao Zhu, Ping Luo, Xiaogang Wang, Xiaoou Tang

Intriguingly, even without access to 3D data, humans can not only recognize face identity but also imagine face images of a person under different viewpoints given a single 2D image, making face perception in the brain robust to view changes.

Face Reconstruction Face Verification

Paper
Add Code

Recover Canonical-View Faces in the Wild with Deep Neural Networks

no code implementations • 14 Apr 2014 • Zhenyao Zhu, Ping Luo, Xiaogang Wang, Xiaoou Tang

Face images in the wild undergo large intra-personal variations, such as poses, illuminations, occlusions, and low resolutions, which cause great challenges to face-related applications.

Paper
Add Code

Zoom-Net: Mining Deep Feature Interactions for Visual Relationship Recognition

no code implementations • ECCV 2018 • Guojun Yin, Lu Sheng, Bin Liu, Nenghai Yu, Xiaogang Wang, Jing Shao, Chen Change Loy

We show that by encouraging deep message propagation and interactions between local object features and global predicate features, one can achieve compelling performance in recognizing complex relationships without using any linguistic priors.

Object

Paper
Add Code

SCAN: Self-and-Collaborative Attention Network for Video Person Re-identification

no code implementations • 16 Jul 2018 • Ruimao Zhang, Hongbin Sun, Jingyu Li, Yuying Ge, Liang Lin, Ping Luo, Xiaogang Wang

To address the above issues, we present a novel and practical deep architecture for video person re-identification termed Self-and-Collaborative Attention Network (SCAN).

Video-Based Person Re-Identification

Paper
Add Code

Person Re-identification with Deep Similarity-Guided Graph Neural Network

no code implementations • ECCV 2018 • Yantao Shen, Hongsheng Li, Shuai Yi, Dapeng Chen, Xiaogang Wang

However, most existing person re-identification models estimate the similarities of different probe-gallery image pairs independently, ignoring the relationship information between different probe-gallery pairs.

Ranked #2 on Person Re-Identification on CUHK03

Person Re-Identification Relation

Paper
Add Code

Improving Deep Visual Representation for Person Re-identification by Global and Local Image-language Association

no code implementations • ECCV 2018 • Dapeng Chen, Hongsheng Li, Xihui Liu, Yantao Shen, Zejian Yuan, Xiaogang Wang

Person re-identification is an important task that requires learning discriminative visual features for distinguishing different person identities.

Ranked #22 on Text based Person Retrieval on CUHK-PEDES

Person Re-Identification Text based Person Retrieval

Paper
Add Code

Question-Guided Hybrid Convolution for Visual Question Answering

no code implementations • ECCV 2018 • Peng Gao, Pan Lu, Hongsheng Li, Shuang Li, Yikang Li, Steven Hoi, Xiaogang Wang

Most state-of-the-art VQA methods fuse the high-level textual and visual features from the neural network and discard the visual spatial information when learning multi-modal features. To address these problems, question-guided kernels generated from the input question are designed to convolve with visual features, capturing the textual and visual relationship at an early stage.

Ranked #14 on Visual Question Answering (VQA) on CLEVR

Question Answering Visual Question Answering

Paper
Add Code

HMS-Net: Hierarchical Multi-scale Sparsity-invariant Network for Sparse Depth Completion

no code implementations • 27 Aug 2018 • Zixuan Huang, Junming Fan, Shenggan Cheng, Shuai Yi, Xiaogang Wang, Hongsheng Li

Dense depth cues are important and have wide applications in various computer vision tasks.

Ranked #10 on Depth Completion on KITTI Depth Completion

Autonomous Driving Depth Completion

Deep Learning for Generic Object Detection: A Survey

no code implementations • 6 Sep 2018 • Li Liu, Wanli Ouyang, Xiaogang Wang, Paul Fieguth, Jie Chen, Xinwang Liu, Matti Pietikäinen

Object detection, one of the most fundamental and challenging problems in computer vision, seeks to locate object instances from a large number of predefined categories in natural images.

Question Answering Visual Question Answering

Learning to Group and Label Fine-Grained Shape Components

no code implementations • 13 Sep 2018 • Xiaogang Wang, Bin Zhou, Haiyue Fang, Xiaowu Chen, Qinping Zhao, Kai Xu

We propose to generate part hypotheses from the components based on a hierarchical grouping strategy, and perform labeling on those part groups instead of directly on the components.

Segmentation

Dynamic Fusion with Intra- and Inter- Modality Attention Flow for Visual Question Answering

no code implementations • 13 Dec 2018 • Gao Peng, Zhengkai Jiang, Haoxuan You, Pan Lu, Steven Hoi, Xiaogang Wang, Hongsheng Li

The proposed method robustly captures high-level interactions between the language and vision domains, thus significantly improving visual question answering performance.

Multi-View Perceptron: a Deep Model for Learning Face Identity and View Representations

no code implementations • NeurIPS 2014 • Zhenyao Zhu, Ping Luo, Xiaogang Wang, Xiaoou Tang

Video-Based Person Re-Identification

Spatial Latent Dirichlet Allocation

no code implementations • NeurIPS 2007 • Xiaogang Wang, Eric Grimson

In this paper, we propose a topic model, Spatial Latent Dirichlet Allocation (SLDA), which better encodes the spatial structure among visual words that is essential for solving many vision problems.

Language Modelling

FaceID-GAN: Learning a Symmetry Three-Player GAN for Identity-Preserving Face Synthesis

no code implementations • CVPR 2018 • Yujun Shen, Ping Luo, Junjie Yan, Xiaogang Wang, Xiaoou Tang

Existing methods typically formulate GAN as a two-player game, where a discriminator distinguishes face images from the real and synthesized domains, while a generator reduces its discriminativeness by synthesizing a face of photo-realistic quality.

Face Generation

Video Person Re-Identification With Competitive Snippet-Similarity Aggregation and Co-Attentive Snippet Embedding

no code implementations • CVPR 2018 • Dapeng Chen, Hongsheng Li, Tong Xiao, Shuai Yi, Xiaogang Wang

The attention weights are obtained based on a query feature, which is learned from the whole probe snippet by an LSTM network, making the resulting embeddings less affected by noisy frames.

Ranked #4 on Person Re-Identification on PRID2011
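
The attention pooling described in this entry can be sketched as follows. This is a simplified, hypothetical version: the query vector is passed in directly instead of being produced by an LSTM over the probe snippet, and frame scores are plain dot products followed by a softmax. The function names are illustrative only.

```python
import math
from typing import List, Tuple

Vector = List[float]


def softmax(scores: Vector) -> Vector:
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attentive_snippet_embedding(frames: List[Vector], query: Vector) -> Tuple[Vector, Vector]:
    """Pool per-frame features into a single snippet embedding.

    Each frame is scored by its dot product with `query` (assumed given here),
    the softmax of the scores gives attention weights, and the embedding is
    the attention-weighted sum of the frame features.
    """
    scores = [sum(f * q for f, q in zip(frame, query)) for frame in frames]
    weights = softmax(scores)
    dim = len(frames[0])
    embedding = [sum(w * frame[d] for w, frame in zip(weights, frames)) for d in range(dim)]
    return embedding, weights
```

A frame whose features disagree with the query receives a near-zero weight and contributes little to the pooled embedding; this is the sense in which the attention makes the embedding less sensitive to noisy frames.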

Eliminating Background-Bias for Robust Person Re-Identification

no code implementations • CVPR 2018 • Maoqing Tian, Shuai Yi, Hongsheng Li, Shihua Li, Xuesen Zhang, Jianping Shi, Junjie Yan, Xiaogang Wang

State-of-the-art methods mainly rely on deep learning to learn visual features that describe person appearance.

Human Parsing Person Re-Identification

Group Consistent Similarity Learning via Deep CRF for Person Re-Identification

no code implementations • CVPR 2018 • Dapeng Chen, Dan Xu, Hongsheng Li, Nicu Sebe, Xiaogang Wang

Extensive experiments demonstrate the effectiveness of our model that combines DNN and CRF for learning robust multi-scale local similarities.

Spontaneous Symmetry Breaking in Deep Neural Networks

no code implementations • ICLR 2018 • Ricky Fok, Aijun An, Xiaogang Wang

In the layer-decoupling limit applicable to residual networks (He et al., 2015), we show, based on empirical results, that the remnant symmetries that survive the non-linear layers are spontaneously broken.

Decoupling the Layers in Residual Networks

no code implementations • ICLR 2018 • Ricky Fok, Aijun An, Zana Rashidi, Xiaogang Wang

We propose a Warped Residual Network (WarpNet) using a parallelizable warp operator for forward and backward propagation to distant layers that trains faster than the original residual neural network.

Measuring Crowd Collectiveness

no code implementations • CVPR 2013 • Bolei Zhou, Xiaoou Tang, Xiaogang Wang

Collective motions are common in crowd systems and have attracted a great deal of attention in a variety of multidisciplinary fields.

Modeling Mutual Visibility Relationship in Pedestrian Detection

no code implementations • CVPR 2013 • Wanli Ouyang, Xingyu Zeng, Xiaogang Wang

In this paper, we propose a mutual visibility deep model that jointly estimates the visibility statuses of overlapping pedestrians.

Single-Pedestrian Detection Aided by Multi-pedestrian Detection

no code implementations • CVPR 2013 • Wanli Ouyang, Xiaogang Wang

A probabilistic framework is proposed to model the relationship between the configurations estimated by single- and multi-pedestrian detectors, and to refine the single-pedestrian detection result with multi-pedestrian detection.

Patch Matching Person Re-Identification

Unsupervised Salience Learning for Person Re-identification

no code implementations • CVPR 2013 • Rui Zhao, Wanli Ouyang, Xiaogang Wang

In this paper, we propose a novel perspective for person re-identification based on unsupervised salience learning.

Locally Aligned Feature Transforms across Views

no code implementations • CVPR 2013 • Wei Li, Xiaogang Wang

In this paper, we propose a new approach for matching images observed in different camera views with complex cross-view transforms, and apply it to person re-identification.

Clustering Metric Learning +1

Deep Convolutional Network Cascade for Facial Point Detection

no code implementations • CVPR 2013 • Yi Sun, Xiaogang Wang, Xiaoou Tang

At each level, the outputs of multiple networks are fused for robust and accurate estimation.
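
A cascade that fuses several networks at each level can be sketched for a single facial point as below. Averaging is assumed as the fusion rule, the networks are stand-in callables (`initial_nets` and `refine_levels` are hypothetical names), and later levels are assumed to predict an offset around the current estimate. This illustrates the structure only, not the paper's trained networks.

```python
from typing import Callable, List, Tuple

Point = Tuple[float, float]
# Hypothetical refinement network: maps the current estimate to a predicted offset.
Net = Callable[[Point], Point]


def fuse(predictions: List[Point]) -> Point:
    """Fuse several networks' (x, y) outputs by averaging them."""
    n = len(predictions)
    return (sum(p[0] for p in predictions) / n, sum(p[1] for p in predictions) / n)


def cascade(initial_nets: List[Callable[[], Point]], refine_levels: List[List[Net]]) -> Point:
    # Level 1: several networks predict the point position; fuse their outputs.
    estimate = fuse([net() for net in initial_nets])
    for nets in refine_levels:
        # Later levels: each network predicts an offset around the current
        # estimate; the fused offset refines the estimate.
        dx, dy = fuse([net(estimate) for net in nets])
        estimate = (estimate[0] + dx, estimate[1] + dy)
    return estimate
```

For instance, two first-level predictions `(10, 20)` and `(12, 22)` fuse to `(11, 21)`; a refinement level whose networks predict offsets `(1, -1)` and `(3, 1)` then moves the estimate to `(13, 21)`.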

Learning Mid-level Filters for Person Re-identification

no code implementations • CVPR 2014 • Rui Zhao, Wanli Ouyang, Xiaogang Wang

In this paper, we propose a novel approach of learning mid-level filters from automatically discovered patch clusters for person re-identification.

Clustering Patch Matching +1

DeepReID: Deep Filter Pairing Neural Network for Person Re-Identification

no code implementations • CVPR 2014 • Wei Li, Rui Zhao, Tong Xiao, Xiaogang Wang

In this paper, we propose a novel filter pairing neural network (FPNN) to jointly handle misalignment, photometric and geometric transforms, occlusions and background clutter.

Switchable Deep Network for Pedestrian Detection

no code implementations • CVPR 2014 • Ping Luo, Yonglong Tian, Xiaogang Wang, Xiaoou Tang

In this paper, we propose a Switchable Deep Network (SDN) for pedestrian detection.

Face Identification Face Verification

Deep Learning Face Representation from Predicting 10,000 Classes

no code implementations • Conference 2014 • Yi Sun, Xiaogang Wang, Xiaoou Tang

L0 Regularized Stationary Time Estimation for Crowd Group Analysis

no code implementations • CVPR 2014 • Shuai Yi, Xiaogang Wang, Cewu Lu, Jiaya Jia

In this paper, we tackle stationary crowd analysis, which is as important as modeling mobile groups in crowd scenes and has many applications in surveillance.

Scene-Independent Group Profiling in Crowd

no code implementations • CVPR 2014 • Jing Shao, Chen Change Loy, Xiaogang Wang

Groups are the primary entities that make up a crowd.

Scene Understanding

Multi-source Deep Learning for Human Pose Estimation

no code implementations • CVPR 2014 • Wanli Ouyang, Xiao Chu, Xiaogang Wang

Visual appearance score, appearance mixture type and deformation are three important information sources for human pose estimation.

Human Detection Pose Estimation

Cross-Scene Crowd Counting via Deep Convolutional Neural Networks

no code implementations • CVPR 2015 • Cong Zhang, Hongsheng Li, Xiaogang Wang, Xiaokang Yang

To address this problem, we propose a deep convolutional neural network (CNN) for crowd counting that is trained alternately with two related learning objectives: crowd density and crowd count.

Ranked #15 on Crowd Counting on WorldExpo’10

Crowd Counting
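
Density-based crowd counting relies on the convention that a crowd density map sums to the number of people in the image. The sketch below shows that relation together with one possible form of each of the two objectives; the squared-error and absolute-error losses and the function names are assumptions for illustration, not the paper's exact formulation, and the alternating training schedule is not shown.

```python
from typing import List

DensityMap = List[List[float]]


def count_from_density(density: DensityMap) -> float:
    """The estimated count is the sum (discrete integral) of the density map."""
    return sum(sum(row) for row in density)


def density_loss(pred: DensityMap, target: DensityMap) -> float:
    """Mean squared per-pixel error between predicted and ground-truth density."""
    n = sum(len(row) for row in pred)
    return sum((p - t) ** 2 for pr, tr in zip(pred, target) for p, t in zip(pr, tr)) / n


def count_loss(pred_count: float, true_count: float) -> float:
    """Absolute error between predicted and ground-truth counts."""
    return abs(pred_count - true_count)
```

With `pred = [[0.5, 0.5], [1.0, 0.0]]` and `target = [[0.5, 0.5], [0.5, 0.5]]`, both maps sum to 2.0, so `count_loss` is 0.0 while `density_loss` is 0.125. The two objectives penalize different errors: here the count is correct but the spatial distribution is not.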

Saliency Detection by Multi-Context Deep Learning

no code implementations • CVPR 2015 • Rui Zhao, Wanli Ouyang, Hongsheng Li, Xiaogang Wang

Low-level saliency cues or priors do not produce good enough saliency detection results, especially when the salient object appears against a low-contrast background with confusing visual appearance.

Image Classification Object Detection +3

Learning From Massive Noisy Labeled Data for Image Classification

no code implementations • CVPR 2015 • Tong Xiao, Tian Xia, Yi Yang, Chang Huang, Xiaogang Wang

To demonstrate the effectiveness of our approach, we collect a large-scale real-world clothing classification dataset with both noisy and clean labels.

Classification General Classification +1

Understanding Pedestrian Behaviors From Stationary Crowd Groups

no code implementations • CVPR 2015 • Shuai Yi, Hongsheng Li, Xiaogang Wang

Pedestrian behavior modeling and analysis is important for crowd scene understanding and has various applications in video surveillance.

Event Detection Scene Understanding

Deeply Learned Attributes for Crowded Scene Understanding

no code implementations • CVPR 2015 • Jing Shao, Kai Kang, Chen Change Loy, Xiaogang Wang

We further measure user study performance on WWW and compare this with the proposed deep models.

Attribute Multi-Task Learning +1

Factors in Finetuning Deep Model for Object Detection With Long-Tail Distribution

no code implementations • CVPR 2016 • Wanli Ouyang, Xiaogang Wang, Cong Zhang, Xiaokang Yang

Our analysis and empirical results show that classes with more samples have higher impact on the feature learning.

DeepFashion: Powering Robust Clothes Recognition and Retrieval With Rich Annotations

no code implementations • CVPR 2016 • Ziwei Liu, Ping Luo, Shi Qiu, Xiaogang Wang, Xiaoou Tang

To demonstrate the advantages of DeepFashion, we propose a new deep model, namely FashionNet, which learns clothing features by jointly predicting clothing attributes and landmarks.

Retrieval

STCT: Sequentially Training Convolutional Networks for Visual Tracking

no code implementations • CVPR 2016 • Lijun Wang, Wanli Ouyang, Xiaogang Wang, Huchuan Lu

To further improve the robustness of each base learner, we propose to train the convolutional layers with random binary masks, which serve as a regularizer that forces each base learner to focus on different input features.

Visual Tracking
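
The random-binary-mask regularizer can be sketched as one fixed 0/1 mask per base learner. Applying the mask to input channels, the keep probability, and the names `make_binary_masks` and `apply_mask` are assumptions for illustration; the paper's exact masking scheme may differ.

```python
import random
from typing import List


def make_binary_masks(num_learners: int, num_channels: int, keep_prob: float, seed: int = 0) -> List[List[int]]:
    """Sample one random 0/1 mask over input channels for each base learner.

    A channel is kept (1) with probability `keep_prob`; seeding the generator
    makes the masks reproducible.
    """
    rng = random.Random(seed)
    return [
        [1 if rng.random() < keep_prob else 0 for _ in range(num_channels)]
        for _ in range(num_learners)
    ]


def apply_mask(features: List[float], mask: List[int]) -> List[float]:
    """Zero out the channels a learner's mask drops."""
    return [f * m for f, m in zip(features, mask)]
```

Because each learner receives a different mask, each one sees a different subset of input channels, which is the diversity effect this entry describes.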

End-To-End Learning of Deformable Mixture of Parts and Deep Convolutional Neural Networks for Human Pose Estimation

no code implementations • CVPR 2016 • Wei Yang, Wanli Ouyang, Hongsheng Li, Xiaogang Wang

In this paper, we propose a novel end-to-end framework for human pose estimation that combines DCNNs with the expressive deformable mixture of parts.