Search Results for author: Xiaogang Wang

Found 226 papers, 97 papers with code

Efficient Burst Raw Denoising with Variance Stabilization and Multi-frequency Denoising Network

no code implementations 10 May 2022 Dasong Li, Yi Zhang, Ka Lung Law, Xiaogang Wang, Hongwei Qin, Hongsheng Li

As for each sub-network, we propose an efficient multi-frequency denoising network to remove noise of different frequencies.

Denoising Frame

Learning a Structured Latent Space for Unsupervised Point Cloud Completion

no code implementations 29 Mar 2022 Yingjie Cai, Kwan-Yee Lin, Chao Zhang, Qiang Wang, Xiaogang Wang, Hongsheng Li

Specifically, we map a series of related partial point clouds into multiple complete shape and occlusion code pairs and fuse the codes to obtain their representations in the unified latent space.

Point Cloud Completion

Point2Seq: Detecting 3D Objects as Sequences

1 code implementation 25 Mar 2022 Yujing Xue, Jiageng Mao, Minzhe Niu, Hang Xu, Michael Bi Mi, Wei zhang, Xiaogang Wang, Xinchao Wang

We further propose a lightweight scene-to-sequence decoder that can auto-regressively generate words conditioned on features from a 3D scene as well as cues from the preceding words.

3D Object Detection

Relational Self-Supervised Learning

no code implementations 16 Mar 2022 Mingkai Zheng, Shan You, Fei Wang, Chen Qian, ChangShui Zhang, Xiaogang Wang, Chang Xu

Self-supervised Learning (SSL) including the mainstream contrastive learning has achieved great success in learning visual representations without data annotations.

Contrastive Learning Self-Supervised Learning

Robust Self-Supervised LiDAR Odometry via Representative Structure Discovery and 3D Inherent Error Modeling

1 code implementation 27 Feb 2022 Yan Xu, Junyi Lin, Jianping Shi, Guofeng Zhang, Xiaogang Wang, Hongsheng Li

The correct ego-motion estimation basically relies on the understanding of correspondences between adjacent LiDAR scans.

Motion Estimation

Learning Semantic Abstraction of Shape via 3D Region of Interest

no code implementations 13 Jan 2022 Haiyue Fang, Xiaogang Wang, Zheyuan Cai, Yahao Shi, Xun Sun, Shilin Wu, Bin Zhou

This is in contrast to current methods, which focus solely on either 3D shape abstraction or semantic analysis.

Dynamic Token Normalization Improves Vision Transformer

1 code implementation 5 Dec 2021 Wenqi Shao, Yixiao Ge, Zhaoyang Zhang, Xuyuan Xu, Xiaogang Wang, Ying Shan, Ping Luo

It is difficult for Transformers to capture inductive bias such as the positional context in an image with LN.

Object Detection

Uni-Perceiver: Pre-training Unified Architecture for Generic Perception for Zero-shot and Few-shot Tasks

no code implementations 2 Dec 2021 Xizhou Zhu, Jinguo Zhu, Hao Li, Xiaoshi Wu, Xiaogang Wang, Hongsheng Li, Xiaohua Wang, Jifeng Dai

The model is pre-trained on several uni-modal and multi-modal tasks, and evaluated on a variety of downstream tasks, including novel tasks that did not appear in the pre-training stage.

IDR: Self-Supervised Image Denoising via Iterative Data Refinement

1 code implementation 29 Nov 2021 Yi Zhang, Dasong Li, Ka Lung Law, Xiaogang Wang, Hongwei Qin, Hongsheng Li

To evaluate raw image denoising performance in real-world applications, we build a high-quality raw image dataset SenseNoise-500 that contains 500 real-life scenes.

Image Denoising

GreedyNASv2: Greedier Search with a Greedy Path Filter

no code implementations 24 Nov 2021 Tao Huang, Shan You, Fei Wang, Chen Qian, ChangShui Zhang, Xiaogang Wang, Chang Xu

In this paper, we leverage an explicit path filter to capture the characteristics of paths and directly filter those weak ones, so that the search can be thus implemented on the shrunk space more greedily and efficiently.

INTERN: A New Learning Paradigm Towards General Vision

no code implementations 16 Nov 2021 Jing Shao, Siyu Chen, Yangguang Li, Kun Wang, Zhenfei Yin, Yinan He, Jianing Teng, Qinghong Sun, Mengya Gao, Jihao Liu, Gengshi Huang, Guanglu Song, Yichao Wu, Yuming Huang, Fenggang Liu, Huan Peng, Shuo Qin, Chengyu Wang, Yujie Wang, Conghui He, Ding Liang, Yu Liu, Fengwei Yu, Junjie Yan, Dahua Lin, Xiaogang Wang, Yu Qiao

Enormous waves of technological innovations over the past several years, marked by the advances in AI technologies, are profoundly reshaping the industry and the society.

Rethinking Noise Synthesis and Modeling in Raw Denoising

1 code implementation ICCV 2021 Yi Zhang, Hongwei Qin, Xiaogang Wang, Hongsheng Li

However, the real raw image noise is contributed by many noise sources and varies greatly among different sensors.

Image Denoising

Dynamic Token Normalization improves Vision Transformers

no code implementations ICLR 2022 Wenqi Shao, Yixiao Ge, Zhaoyang Zhang, Xuyuan Xu, Xiaogang Wang, Ying Shan, Ping Luo

It is difficult for Transformers to capture inductive bias such as the positional context in an image with LN.

FuseFormer: Fusing Fine-Grained Information in Transformers for Video Inpainting

1 code implementation ICCV 2021 Rui Liu, Hanming Deng, Yangyi Huang, Xiaoyu Shi, Lewei Lu, Wenxiu Sun, Xiaogang Wang, Jifeng Dai, Hongsheng Li

On the contrary, the soft composition operates by stitching different patches into a whole feature map where pixels in overlapping regions are summed up.

Video Inpainting
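
The soft composition mentioned in the FuseFormer entry above (patches stitched into one feature map, with overlapping pixels summed) can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes single-channel 2D patches stored as nested lists, and the function name `soft_compose` and its parameters are hypothetical, chosen only for illustration.

```python
def soft_compose(patches, positions, height, width):
    """Stitch 2D patches into one map, summing values where patches overlap.

    patches   -- list of 2D lists (hypothetical single-channel patches)
    positions -- list of (top, left) offsets, one per patch
    """
    out = [[0.0] * width for _ in range(height)]
    for patch, (top, left) in zip(patches, positions):
        for i, row in enumerate(patch):
            for j, value in enumerate(row):
                # Accumulate instead of overwrite, so overlaps are summed.
                out[top + i][left + j] += value
    return out


# Two 2x2 patches of ones that overlap in the middle column of a 2x3 map.
patches = [[[1.0, 1.0], [1.0, 1.0]], [[1.0, 1.0], [1.0, 1.0]]]
positions = [(0, 0), (0, 1)]
result = soft_compose(patches, positions, height=2, width=3)
```

In this toy case the middle column receives contributions from both patches, while the outer columns receive one each.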

Voxel-based Network for Shape Completion by Leveraging Edge Generation

1 code implementation ICCV 2021 Xiaogang Wang, Marcelo H Ang Jr, Gim Hee Lee

Deep learning techniques have yielded significant improvements in point cloud completion, which aims to complete missing object shapes from partial inputs.

Point Cloud Completion

LIGA-Stereo: Learning LiDAR Geometry Aware Representations for Stereo-based 3D Detector

no code implementations ICCV 2021 Xiaoyang Guo, Shaoshuai Shi, Xiaogang Wang, Hongsheng Li

Compared with the state-of-the-art stereo detector, our method has improved the 3D detection performance of cars, pedestrians, and cyclists by 10.44%, 5.69%, and 5.97% mAP, respectively, on the official KITTI benchmark.

Stereo Matching

ViTAS: Vision Transformer Architecture Search

1 code implementation 25 Jun 2021 Xiu Su, Shan You, Jiyang Xie, Mingkai Zheng, Fei Wang, Chen Qian, ChangShui Zhang, Xiaogang Wang, Chang Xu

Vision transformers (ViTs) inherited the success of NLP but their structures have not been sufficiently investigated and optimized for visual tasks.

Neural Architecture Search

Scalable Transformers for Neural Machine Translation

no code implementations 4 Jun 2021 Peng Gao, Shijie Geng, Yu Qiao, Xiaogang Wang, Jifeng Dai, Hongsheng Li

In this paper, we propose novel Scalable Transformers, which naturally contain sub-Transformers of different scales and have shared parameters.

Machine Translation Translation

Pose-Controllable Talking Face Generation by Implicitly Modularized Audio-Visual Representation

1 code implementation CVPR 2021 Hang Zhou, Yasheng Sun, Wayne Wu, Chen Change Loy, Xiaogang Wang, Ziwei Liu

While speech content information can be defined by learning the intrinsic synchronization between audio-visual modalities, we identify that a pose code will be complementarily learned in a modulated convolution-based reconstruction framework.

Talking Face Generation

Decoupled Spatial-Temporal Transformer for Video Inpainting

1 code implementation 14 Apr 2021 Rui Liu, Hanming Deng, Yangyi Huang, Xiaoyu Shi, Lewei Lu, Wenxiu Sun, Xiaogang Wang, Jifeng Dai, Hongsheng Li

The seamless combination of these two novel designs forms a better spatial-temporal attention scheme, and our proposed model achieves better performance than state-of-the-art video inpainting approaches with significantly boosted efficiency.

Video Inpainting

Visually Informed Binaural Audio Generation without Binaural Audios

no code implementations CVPR 2021 Xudong Xu, Hang Zhou, Ziwei Liu, Bo Dai, Xiaogang Wang, Dahua Lin

Moreover, combined with binaural recordings, our method is able to further boost the performance of binaural audio generation under supervised settings.

Audio Generation

Semantic Scene Completion via Integrating Instances and Scene in-the-Loop

1 code implementation CVPR 2021 Yingjie Cai, Xuesong Chen, Chao Zhang, Kwan-Yee Lin, Xiaogang Wang, Hongsheng Li

The key insight is that we decouple the instances from a coarsely completed semantic scene instead of a raw input image to guide the reconstruction of instances and the overall scene.

Scene Understanding

Fixing the Teacher-Student Knowledge Discrepancy in Distillation

no code implementations 31 Mar 2021 Jiangfan Han, Mengya Gao, Yujie Wang, Quanquan Li, Hongsheng Li, Xiaogang Wang

To solve this problem, in this paper, we propose a novel student-dependent distillation method, knowledge consistent distillation, which makes teacher's knowledge more consistent with the student and provides the best suitable knowledge to different student networks for distillation.

Knowledge Distillation Object Detection

Learning Fine-Grained Segmentation of 3D Shapes without Part Labels

no code implementations CVPR 2021 Xiaogang Wang, Xun Sun, Xinyu Cao, Kai Xu, Bin Zhou

Learning-based 3D shape segmentation is usually formulated as a semantic labeling problem, assuming that all parts of training shapes are annotated with a given set of tags.

Deep Clustering

DivCo: Diverse Conditional Image Synthesis via Contrastive Generative Adversarial Network

1 code implementation CVPR 2021 Rui Liu, Yixiao Ge, Ching Lam Choi, Xiaogang Wang, Hongsheng Li

Conditional generative adversarial networks (cGANs) aim at synthesizing diverse images given the input conditions and latent codes, but unfortunately, they usually suffer from the issue of mode collapse.

Contrastive Learning Image Generation

Fast Convergence of DETR with Spatially Modulated Co-Attention

2 code implementations 19 Jan 2021 Peng Gao, Minghang Zheng, Xiaogang Wang, Jifeng Dai, Hongsheng Li

The recently proposed Detection Transformer (DETR) model successfully applies Transformer to object detection and achieves performance comparable to two-stage object detection frameworks, such as Faster-RCNN.

Object Detection

Probabilistic Graph Attention Network with Conditional Kernels for Pixel-Wise Prediction

no code implementations 8 Jan 2021 Dan Xu, Xavier Alameda-Pineda, Wanli Ouyang, Elisa Ricci, Xiaogang Wang, Nicu Sebe

In contrast to previous works directly considering multi-scale feature maps obtained from the inner layers of a primary CNN architecture, and simply fusing the features with weighted averaging or concatenation, we propose a probabilistic graph attention network structure based on a novel Attention-Gated Conditional Random Fields (AG-CRFs) model for learning and fusing multi-scale representations in a principled manner.

BSDS500 Graph Attention +2

Learning With Privileged Tasks

no code implementations ICCV 2021 Yuru Song, Zan Lou, Shan You, Erkun Yang, Fei Wang, Chen Qian, ChangShui Zhang, Xiaogang Wang

Concretely, we introduce a privileged parameter so that the optimization direction does not necessarily follow the gradient from the privileged tasks, but concentrates more on the target tasks.

Multi-Task Learning

A Holistically-Guided Decoder for Deep Representation Learning with Applications to Semantic Segmentation and Object Detection

no code implementations 18 Dec 2020 Jianbo Liu, Sijie Ren, Yuanjie Zheng, Xiaogang Wang, Hongsheng Li

With the proposed holistically-guided decoder, we implement the EfficientFCN architecture for semantic segmentation and HGD-FPN for object detection and instance segmentation.

Instance Segmentation Object Detection +2

End-to-End Object Detection with Adaptive Clustering Transformer

1 code implementation 18 Nov 2020 Minghang Zheng, Peng Gao, Renrui Zhang, Kunchang Li, Xiaogang Wang, Hongsheng Li, Hao Dong

In this paper, a novel variant of the transformer, named Adaptive Clustering Transformer (ACT), is proposed to reduce the computation cost for high-resolution input.

Object Detection

Cascaded Refinement Network for Point Cloud Completion with Self-supervision

1 code implementation 17 Oct 2020 Xiaogang Wang, Marcelo H Ang Jr, Gim Hee Lee

This is to mitigate the dependence of existing approaches on large amounts of ground truth training data that are often difficult to obtain in real-world applications.

3D Object Classification Point Cloud Completion

Auto Seg-Loss: Searching Metric Surrogates for Semantic Segmentation

1 code implementation ICLR 2021 Hao Li, Chenxin Tao, Xizhou Zhu, Xiaogang Wang, Gao Huang, Jifeng Dai

In this paper, we propose to automate the design of metric-specific loss functions by searching differentiable surrogate losses for each metric.

Semantic Segmentation

Deformable DETR: Deformable Transformers for End-to-End Object Detection

12 code implementations ICLR 2021 Xizhou Zhu, Weijie Su, Lewei Lu, Bin Li, Xiaogang Wang, Jifeng Dai

DETR has been recently proposed to eliminate the need for many hand-designed components in object detection while demonstrating good performance.

Object Detection

Open-Edit: Open-Domain Image Manipulation with Open-Vocabulary Instructions

1 code implementation ECCV 2020 Xihui Liu, Zhe Lin, Jianming Zhang, Handong Zhao, Quan Tran, Xiaogang Wang, Hongsheng Li

We propose a novel algorithm, named Open-Edit, which is the first attempt on open-domain image manipulation with open-vocabulary instructions.

Image Manipulation

Point Cloud Completion by Learning Shape Priors

1 code implementation 2 Aug 2020 Xiaogang Wang, Marcelo H. Ang Jr, Gim Hee Lee

Then we learn a mapping to transfer the point features from partial points to that of the complete points by optimizing feature alignment losses.

Point Cloud Completion

Gradient Regularized Contrastive Learning for Continual Domain Adaptation

no code implementations 25 Jul 2020 Peng Su, Shixiang Tang, Peng Gao, Di Qiu, Ni Zhao, Xiaogang Wang

At the core of our method, gradient regularization plays two key roles: (1) enforces the gradient of contrastive loss not to increase the supervised training loss on the source domain, which maintains the discriminative power of learned features; (2) regularizes the gradient update on the new domain not to increase the classification loss on the old target domains, which enables the model to adapt to an in-coming target domain while preserving the performance of previously observed domains.

Contrastive Learning Domain Adaptation

3D Human Mesh Regression with Dense Correspondence

1 code implementation CVPR 2020 Wang Zeng, Wanli Ouyang, Ping Luo, Wentao Liu, Xiaogang Wang

This paper proposes a model-free 3D human mesh estimation framework, named DecoMR, which explicitly establishes the dense correspondence between the mesh and the local image features in the UV space (i.e., a 2D space used for texture mapping of 3D mesh).

Adaptive Momentum Coefficient for Neural Network Optimization

1 code implementation 4 Jun 2020 Zana Rashidi, Kasra Ahmadi K. A., Aijun An, Xiaogang Wang

We propose a novel and efficient momentum-based first-order algorithm for optimizing neural networks which uses an adaptive coefficient for the momentum term.

Rotate-and-Render: Unsupervised Photorealistic Face Rotation from Single-View Images

1 code implementation CVPR 2020 Hang Zhou, Jihao Liu, Ziwei Liu, Yu Liu, Xiaogang Wang

Though face rotation has achieved rapid progress in recent years, the lack of high-quality paired training data remains a great hurdle for existing methods.

3D FACE MODELING Data Augmentation +1

1st Place Solutions for OpenImage2019 -- Object Detection and Instance Segmentation

2 code implementations 17 Mar 2020 Yu Liu, Guanglu Song, Yuhang Zang, Yan Gao, Enze Xie, Junjie Yan, Chen Change Loy, Xiaogang Wang

Given such good instance bounding box, we further design a simple instance-level semantic segmentation pipeline and achieve the 1st place on the segmentation challenge.

General Classification Instance Segmentation +2

KPNet: Towards Minimal Face Detector

no code implementations 17 Mar 2020 Guanglu Song, Yu Liu, Yuhang Zang, Xiaogang Wang, Biao Leng, Qingsheng Yuan

The small receptive field and capacity of minimal neural networks limit their performance when they are used as the backbone of detectors.

Face Detection

Revisiting the Sibling Head in Object Detector

2 code implementations CVPR 2020 Guanglu Song, Yu Liu, Xiaogang Wang

The "shared head for classification and localization" (sibling head), first named in Fast RCNN [girshick2015fast], has been leading the fashion of the object detection community in the past five years.

Disentanglement General Classification +1

Adapting Object Detectors with Conditional Domain Normalization

no code implementations ECCV 2020 Peng Su, Kun Wang, Xingyu Zeng, Shixiang Tang, Dapeng Chen, Di Qiu, Xiaogang Wang

Then this domain-vector is used to encode the features from another domain through a conditional normalization, resulting in different domains' features carrying the same domain attribute.

3D Object Detection Unsupervised Domain Adaptation

Channel Equilibrium Networks for Learning Deep Representation

1 code implementation ICML 2020 Wenqi Shao, Shitao Tang, Xingang Pan, Ping Tan, Xiaogang Wang, Ping Luo

Unlike prior arts that simply removed the inhibited channels, we propose to "wake them up" during training by designing a novel neural building block, termed Channel Equilibrium (CE) block, which enables channels at the same layer to contribute equally to the learned representation.

Single Image Dehazing Using Ranking Convolutional Neural Network

no code implementations 15 Jan 2020 Yafei Song, Jia Li, Xiaogang Wang, Xiaowu Chen

To obtain effective features for single image dehazing, this paper presents a novel Ranking Convolutional Neural Network (Ranking-CNN).

Image Dehazing Single Image Dehazing

PV-RCNN: Point-Voxel Feature Set Abstraction for 3D Object Detection

4 code implementations CVPR 2020 Shaoshuai Shi, Chaoxu Guo, Li Jiang, Zhe Wang, Jianping Shi, Xiaogang Wang, Hongsheng Li

We present a novel and high-performance 3D object detection framework, named PointVoxel-RCNN (PV-RCNN), for accurate 3D object detection from point clouds.

3D Object Detection

Search to Distill: Pearls are Everywhere but not the Eyes

no code implementations CVPR 2020 Yu Liu, Xuhui Jia, Mingxing Tan, Raviteja Vemulapalli, Yukun Zhu, Bradley Green, Xiaogang Wang

Standard Knowledge Distillation (KD) approaches distill the knowledge of a cumbersome teacher model into the parameters of a student model with a pre-defined architecture.

Ensemble Learning Face Recognition +3

Vision-Infused Deep Audio Inpainting

no code implementations ICCV 2019 Hang Zhou, Ziwei Liu, Xudong Xu, Ping Luo, Xiaogang Wang

Extensive experiments demonstrate that our framework is capable of inpainting realistic and varying audio segments with or without visual contexts.

Audio inpainting Image Inpainting

Channel Equilibrium Networks

no code implementations 25 Sep 2019 Wenqi Shao, Shitao Tang, Xingang Pan, Ping Tan, Xiaogang Wang, Ping Luo

However, over-sparse CNNs have many collapsed channels (i.e., many channels with undesired zero values), impeding their learning ability.

Differentiable Learning-to-Group Channels via Groupable Convolutional Neural Networks

no code implementations ICCV 2019 Zhaoyang Zhang, Jingyu Li, Wenqi Shao, Zhanglin Peng, Ruimao Zhang, Xiaogang Wang, Ping Luo

ResNeXt, still suffers from the sub-optimal performance due to manually defining the number of groups as a constant over all of the layers.

Once a MAN: Towards Multi-Target Attack via Learning Multi-Target Adversarial Network Once

no code implementations ICCV 2019 Jiangfan Han, Xiaoyi Dong, Ruimao Zhang, Dong-Dong Chen, Weiming Zhang, Nenghai Yu, Ping Luo, Xiaogang Wang

Recently, generation-based methods have received much attention since they directly use feed-forward networks to generate the adversarial samples, which avoid the time-consuming iterative attacking procedure in optimization-based and gradient-based methods.

Classification General Classification

Multi-modality Latent Interaction Network for Visual Question Answering

no code implementations ICCV 2019 Peng Gao, Haoxuan You, Zhanpeng Zhang, Xiaogang Wang, Hongsheng Li

The proposed module learns the cross-modality relationships between latent visual and language summarizations, which summarize visual regions and question into a small number of latent representations to avoid modeling uninformative individual region-word relations.

Language Modelling Question Answering +2

Deep Self-Learning From Noisy Labels

no code implementations ICCV 2019 Jiangfan Han, Ping Luo, Xiaogang Wang

Unlike previous works, which are constrained by many conditions that make them infeasible for real noisy cases, this work presents a novel deep self-learning framework to train a robust network on real noisy datasets without extra supervision.

Learning with noisy labels Self-Learning

From Points to Parts: 3D Object Detection from Point Cloud with Part-aware and Part-aggregation Network

3 code implementations 8 Jul 2019 Shaoshuai Shi, Zhe Wang, Jianping Shi, Xiaogang Wang, Hongsheng Li

3D object detection from LiDAR point cloud is a challenging problem in 3D scene understanding and has many practical applications.

3D Object Detection Scene Understanding

PasteGAN: A Semi-Parametric Method to Generate Image from Scene Graph

1 code implementation NeurIPS 2019 Yikang Li, Tao Ma, Yeqi Bai, Nan Duan, Sining Wei, Xiaogang Wang

Therefore, to generate the images with preferred objects and rich interactions, we propose a semi-parametric method, PasteGAN, for generating the image from the scene graph and the image crops, where spatial arrangements of the objects and their pair-wise relationships are defined by the scene graph and the object appearances are determined by the given object crops.

Image Generation

Disentangling Pose from Appearance in Monochrome Hand Images

no code implementations 16 Apr 2019 Yikang Li, Chris Twigg, Yuting Ye, Lingling Tao, Xiaogang Wang

Hand pose estimation from the monocular 2D image is challenging due to the variation in lighting, appearance, and background.

Disentanglement Hand Pose Estimation

Conditional Adversarial Generative Flow for Controllable Image Synthesis

no code implementations CVPR 2019 Rui Liu, Yu Liu, Xinyu Gong, Xiaogang Wang, Hongsheng Li

Flow-based generative models show great potential in image synthesis due to their reversible pipeline and exact log-likelihood target, yet they suffer from a weak ability for conditional image synthesis, especially for multi-label or unaware conditions.

Image Generation

Context and Attribute Grounded Dense Captioning

no code implementations CVPR 2019 Guojun Yin, Lu Sheng, Bin Liu, Nenghai Yu, Xiaogang Wang, Jing Shao

Dense captioning aims at simultaneously localizing semantic regions and describing these regions-of-interest (ROIs) with short phrases or sentences in natural language.

Feature Intertwiner for Object Detection

2 code implementations ICLR 2019 Hongyang Li, Bo Dai, Shaoshuai Shi, Wanli Ouyang, Xiaogang Wang

We argue that the reliable set could guide the feature learning of the less reliable set during training - in spirit of student mimicking teacher behavior and thus pushing towards a more compact class centroid in the feature space.

Object Detection

Weakly-Supervised Discovery of Geometry-Aware Representation for 3D Human Pose Estimation

no code implementations CVPR 2019 Xipeng Chen, Kwan-Yee Lin, Wentao Liu, Chen Qian, Xiaogang Wang, Liang Lin

Recent studies have shown remarkable advances in 3D human pose estimation from monocular images, with the help of large-scale in-door 3D datasets and sophisticated network architectures.

3D Human Pose Estimation

Video Generation from Single Semantic Label Map

2 code implementations CVPR 2019 Junting Pan, Chengyu Wang, Xu Jia, Jing Shao, Lu Sheng, Junjie Yan, Xiaogang Wang

This paper proposes the novel task of video generation conditioned on a SINGLE semantic label map, which provides a good balance between flexibility and quality in the generation process.

Frame Image Generation +2

Group-wise Correlation Stereo Network

1 code implementation CVPR 2019 Xiaoyang Guo, Kai Yang, Wukui Yang, Xiaogang Wang, Hongsheng Li

Previous works built cost volumes with cross-correlation or concatenation of left and right features across all disparity levels, and then a 2D or 3D convolutional neural network is utilized to regress the disparity maps.

Autonomous Driving Stereo Matching +1

Shape2Motion: Joint Analysis of Motion Parts and Attributes from 3D Shapes

1 code implementation CVPR 2019 Xiaogang Wang, Bin Zhou, Yahao Shi, Xiaowu Chen, Qinping Zhao, Kai Xu

For the task of mobility analysis of 3D shapes, we propose joint analysis for simultaneous motion part segmentation and motion attribute estimation, taking a single 3D model as input.

SSN: Learning Sparse Switchable Normalization via SparsestMax

1 code implementation CVPR 2019 Wenqi Shao, Tianjian Meng, Jingyu Li, Ruimao Zhang, Yudian Li, Xiaogang Wang, Ping Luo

Unlike $\ell_1$ and $\ell_0$ constraints that impose difficulties in optimization, we turn this constrained optimization problem into feed-forward computation by proposing SparsestMax, which is a sparse version of softmax.

Unsupervised Cross-spectral Stereo Matching by Learning to Synthesize

1 code implementation 4 Mar 2019 Mingyang Liang, Xiaoyang Guo, Hongsheng Li, Xiaogang Wang, You Song

Unsupervised cross-spectral stereo matching aims at recovering disparity given cross-spectral image pairs without any supervision in the form of ground truth disparity or depth.

Image-to-Image Translation Stereo Matching +2

Unsupervised Bi-directional Flow-based Video Generation from one Snapshot

no code implementations 3 Mar 2019 Lu Sheng, Junting Pan, Jiaming Guo, Jing Shao, Xiaogang Wang, Chen Change Loy

Imagining multiple consecutive frames given one single snapshot is challenging, since it is difficult to simultaneously predict diverse motions from a single image and faithfully generate novel frames without visual distortions.

Video Generation

Improving Referring Expression Grounding with Cross-modal Attention-guided Erasing

no code implementations CVPR 2019 Xihui Liu, ZiHao Wang, Jing Shao, Xiaogang Wang, Hongsheng Li

Referring expression grounding aims at locating certain objects or persons in an image with a referring expression, where the key challenge is to comprehend and align various types of information from visual and textual domain, such as visual attributes, location and interactions with surrounding regions.

Referring Expression

Dynamic Fusion with Intra- and Inter- Modality Attention Flow for Visual Question Answering

no code implementations 13 Dec 2018 Gao Peng, Zhengkai Jiang, Haoxuan You, Pan Lu, Steven Hoi, Xiaogang Wang, Hongsheng Li

It can robustly capture the high-level interactions between language and vision domains, and thus significantly improves the performance of visual question answering.

Question Answering Visual Question Answering +1

Gradient Harmonized Single-stage Detector

9 code implementations 13 Nov 2018 Buyu Li, Yu Liu, Xiaogang Wang

Despite the great success of two-stage detectors, the single-stage detector is still a more elegant and efficient approach, yet it suffers from two well-known disharmonies during training, i.e., the huge difference in quantity between positive and negative examples as well as between easy and hard examples.

General Classification Object Detection

FD-GAN: Pose-guided Feature Distilling GAN for Robust Person Re-identification

2 code implementations NeurIPS 2018 Yixiao Ge, Zhuowan Li, Haiyu Zhao, Guojun Yin, Shuai Yi, Xiaogang Wang, Hongsheng Li

Our proposed FD-GAN achieves state-of-the-art performance on three person reID datasets, which demonstrates the effectiveness and robust feature distilling capability of the proposed FD-GAN.

Person Re-Identification

Learning to Group and Label Fine-Grained Shape Components

no code implementations 13 Sep 2018 Xiaogang Wang, Bin Zhou, Haiyue Fang, Xiaowu Chen, Qinping Zhao, Kai Xu

We propose to generate part hypotheses from the components based on a hierarchical grouping strategy, and perform labeling on those part groups instead of directly on the components.

Deep Learning for Generic Object Detection: A Survey

no code implementations 6 Sep 2018 Li Liu, Wanli Ouyang, Xiaogang Wang, Paul Fieguth, Jie Chen, Xinwang Liu, Matti Pietikäinen

Object detection, one of the most fundamental and challenging problems in computer vision, seeks to locate object instances from a large number of predefined categories in natural images.

Object Proposal Generation

Learning Monocular Depth by Distilling Cross-domain Stereo Networks

1 code implementation ECCV 2018 Xiaoyang Guo, Hongsheng Li, Shuai Yi, Jimmy Ren, Xiaogang Wang

Monocular depth estimation aims at estimating a pixelwise depth map for a single image, which has wide applications in scene understanding and autonomous driving.

Autonomous Driving Monocular Depth Estimation +3

Neural Network Encapsulation

2 code implementations ECCV 2018 Hongyang Li, Xiaoyang Guo, Bo Dai, Wanli Ouyang, Xiaogang Wang

Motivated by the routing to make higher capsule have agreement with lower capsule, we extend the mechanism as a compensation for the rapid loss of information in nearby layers.

Question-Guided Hybrid Convolution for Visual Question Answering

no code implementations ECCV 2018 Peng Gao, Pan Lu, Hongsheng Li, Shuang Li, Yikang Li, Steven Hoi, Xiaogang Wang

Most state-of-the-art VQA methods fuse the high-level textual and visual features from the neural network and abandon the visual spatial information when learning multi-modal features. To address these problems, question-guided kernels generated from the input question are designed to convolve with visual features for capturing the textual and visual relationship in the early stage.

Question Answering Visual Question Answering +1

Deep Group-shuffling Random Walk for Person Re-identification

1 code implementation CVPR 2018 Yantao Shen, Hongsheng Li, Tong Xiao, Shuai Yi, Dapeng Chen, Xiaogang Wang

Person re-identification aims at finding a person of interest in an image gallery by comparing the probe image of this person with all the gallery images.

Person Re-Identification

Person Re-identification with Deep Similarity-Guided Graph Neural Network

no code implementations ECCV 2018 Yantao Shen, Hongsheng Li, Shuai Yi, Dapeng Chen, Xiaogang Wang

However, existing person re-identification models mostly estimate the similarities of different image pairs of probe and gallery images independently while ignoring the relationship information between different probe-gallery pairs.

Person Re-Identification

Talking Face Generation by Adversarially Disentangled Audio-Visual Representation

1 code implementation 20 Jul 2018 Hang Zhou, Yu Liu, Ziwei Liu, Ping Luo, Xiaogang Wang

Talking face generation aims to synthesize a sequence of face images that correspond to a clip of speech.

Lip Reading Talking Face Generation +1

SCAN: Self-and-Collaborative Attention Network for Video Person Re-identification

no code implementations 16 Jul 2018 Ruimao Zhang, Hongbin Sun, Jingyu Li, Yuying Ge, Liang Lin, Ping Luo, Xiaogang Wang

To address the above issues, we present a novel and practical deep architecture for video person re-identification termed Self-and-Collaborative Attention Network (SCAN).

Video-Based Person Re-Identification

Zoom-Net: Mining Deep Feature Interactions for Visual Relationship Recognition

no code implementations ECCV 2018 Guojun Yin, Lu Sheng, Bin Liu, Nenghai Yu, Xiaogang Wang, Jing Shao, Chen Change Loy

We show that by encouraging deep message propagation and interactions between local object features and global predicate features, one can achieve compelling performance in recognizing complex relationships without using any linguistic priors.

FaceID-GAN: Learning a Symmetry Three-Player GAN for Identity-Preserving Face Synthesis

no code implementations CVPR 2018 Yujun Shen, Ping Luo, Junjie Yan, Xiaogang Wang, Xiaoou Tang

Existing methods typically formulate GAN as a two-player game, where a discriminator distinguishes face images from the real and synthesized domains, while a generator reduces its discriminativeness by synthesizing a face of photo-realistic quality.

Face Generation

Group Consistent Similarity Learning via Deep CRF for Person Re-Identification

no code implementations CVPR 2018 Dapeng Chen, Dan Xu, Hongsheng Li, Nicu Sebe, Xiaogang Wang

Extensive experiments demonstrate the effectiveness of our model that combines DNN and CRF for learning robust multi-scale local similarities.

Person Re-Identification

Video Person Re-Identification With Competitive Snippet-Similarity Aggregation and Co-Attentive Snippet Embedding

no code implementations CVPR 2018 Dapeng Chen, Hongsheng Li, Tong Xiao, Shuai Yi, Xiaogang Wang

The attention weights are obtained based on a query feature, which is learned from the whole probe snippet by an LSTM network, making the resulting embeddings less affected by noisy frames.

Video-Based Person Re-Identification

Lehmer Transform and its Theoretical Properties

no code implementations 13 May 2018 Masoud Ataei, Shengyuan Chen, Xiaogang Wang

We propose a new class of transforms that we call the Lehmer Transform, which is motivated by the Lehmer mean function.


Avatar-Net: Multi-scale Zero-shot Style Transfer by Feature Decoration

3 code implementations CVPR 2018 Lu Sheng, Ziyi Lin, Jing Shao, Xiaogang Wang

Zero-shot artistic style transfer is an important image synthesis problem aiming at transferring arbitrary style into content images.

Image Generation Image Reconstruction +1

Learnable Histogram: Statistical Context Features for Deep Neural Networks

no code implementations 25 Apr 2018 Zhe Wang, Hongsheng Li, Wanli Ouyang, Xiaogang Wang

Statistical features, such as histograms, Bag-of-Words (BoW) and Fisher Vectors, were commonly used with hand-crafted features in conventional classification methods, but have attracted less attention since deep learning methods became popular.

General Classification Object Detection +1

Diversity Regularized Spatiotemporal Attention for Video-based Person Re-identification

no code implementations CVPR 2018 Shuang Li, Slawomir Bak, Peter Carr, Xiaogang Wang

As a result, the network learns latent representations of the face, torso and other body parts using the best available image patches from the entire video sequence.

Frame Video-Based Person Re-Identification

3D Human Pose Estimation in the Wild by Adversarial Learning

no code implementations CVPR 2018 Wei Yang, Wanli Ouyang, Xiaolong Wang, Jimmy Ren, Hongsheng Li, Xiaogang Wang

Instead of defining hard-coded rules to constrain the pose estimation results, we design a novel multi-source discriminator to distinguish the predicted 3D poses from the ground-truth, which helps to enforce the pose estimator to generate anthropometrically valid poses even with images in the wild.

Monocular 3D Human Pose Estimation

Context Encoding for Semantic Segmentation

12 code implementations CVPR 2018 Hang Zhang, Kristin Dana, Jianping Shi, Zhongyue Zhang, Xiaogang Wang, Ambrish Tyagi, Amit Agrawal

In this paper, we explore the impact of global contextual information in semantic segmentation by introducing the Context Encoding Module, which captures the semantic context of scenes and selectively highlights class-dependent feature maps.

Ranked #15 on Semantic Segmentation on PASCAL VOC 2012 test (using extra training data)

Image Classification Semantic Segmentation +1

Learning Deep Structured Multi-Scale Features using Attention-Gated CRFs for Contour Prediction

no code implementations NeurIPS 2017 Dan Xu, Wanli Ouyang, Xavier Alameda-Pineda, Elisa Ricci, Xiaogang Wang, Nicu Sebe

Recent works have shown that exploiting multi-scale representations deeply learned via convolutional neural networks (CNN) is of tremendous importance for accurate contour detection.

BSDS500 Contour Detection

Decoupling the Layers in Residual Networks

no code implementations ICLR 2018 Ricky Fok, Aijun An, Zana Rashidi, Xiaogang Wang

We propose a Warped Residual Network (WarpNet) using a parallelizable warp operator for forward and backward propagation to distant layers that trains faster than the original residual neural network.

Spontaneous Symmetry Breaking in Deep Neural Networks

no code implementations ICLR 2018 Ricky Fok, Aijun An, Xiaogang Wang

In the layer decoupling limit applicable to residual networks (He et al., 2015), we show that the remnant symmetries that survive the non-linear layers are spontaneously broken based on empirical results.

Spatial As Deep: Spatial CNN for Traffic Scene Understanding

7 code implementations 17 Dec 2017 Xingang Pan, Jianping Shi, Ping Luo, Xiaogang Wang, Xiaoou Tang

Although CNNs have shown a strong capability to extract semantics from raw pixels, their capacity to capture spatial relationships of pixels across rows and columns of an image is not fully explored.

Lane Detection Scene Understanding

StackGAN++: Realistic Image Synthesis with Stacked Generative Adversarial Networks

16 code implementations 19 Oct 2017 Han Zhang, Tao Xu, Hongsheng Li, Shaoting Zhang, Xiaogang Wang, Xiaolei Huang, Dimitris Metaxas

In this paper, we propose Stacked Generative Adversarial Networks (StackGAN) aiming at generating high-resolution photo-realistic images.

Text-to-Image Generation

Spontaneous Symmetry Breaking in Neural Networks

no code implementations 17 Oct 2017 Ricky Fok, Aijun An, Xiaogang Wang

We propose a framework to understand the unprecedented performance and robustness of deep neural networks using field theory.

Deep Dual Learning for Semantic Image Segmentation

no code implementations ICCV 2017 Ping Luo, Guangrun Wang, Liang Lin, Xiaogang Wang

The estimated label maps that capture accurate object classes and boundaries are used as ground truths in training to boost performance.

Semantic Segmentation

Visual Question Generation as Dual Task of Visual Question Answering

no code implementations CVPR 2018 Yikang Li, Nan Duan, Bolei Zhou, Xiao Chu, Wanli Ouyang, Xiaogang Wang

Visual question answering (VQA) and visual question generation (VQG) are two trending topics in computer vision that have so far been explored separately.

Question Answering Question Generation +2

Optimization assisted MCMC

no code implementations 9 Sep 2017 Ricky Fok, Aijun An, Xiaogang Wang

The global optimization method first reduces a high-dimensional search to a one-dimensional geodesic to find a starting point close to a local mode.

Unconstrained Fashion Landmark Detection via Hierarchical Recurrent Transformer Networks

2 code implementations 7 Aug 2017 Sijie Yan, Ziwei Liu, Ping Luo, Shi Qiu, Xiaogang Wang, Xiaoou Tang

This work addresses unconstrained fashion landmark detection, where clothing bounding boxes are provided in neither training nor testing.

Scene Graph Generation from Objects, Phrases and Region Captions

1 code implementation ICCV 2017 Yikang Li, Wanli Ouyang, Bolei Zhou, Kun Wang, Xiaogang Wang

Object detection, scene graph generation and region captioning, which are three scene understanding tasks at different semantic levels, are tied together: scene graphs are generated on top of objects detected in an image with their pairwise relationship predicted, while region captioning gives a language description of the objects, their attributes, relations, and other context information.

Graph Generation Object Detection +2

Recurrent Scale Approximation for Object Detection in CNN

1 code implementation ICCV 2017 Yu Liu, Hongyang Li, Junjie Yan, Fangyin Wei, Xiaogang Wang, Xiaoou Tang

To further increase efficiency and accuracy, we (a): design a scale-forecast network to globally predict potential scales in the image since there is no need to compute maps on all levels of the pyramid.

Face Detection Object Detection

Learning Object Interactions and Descriptions for Semantic Image Segmentation

no code implementations CVPR 2017 Guangrun Wang, Ping Luo, Liang Lin, Xiaogang Wang

This work significantly increases segmentation accuracy of CNNs by learning from an Image Descriptions in the Wild (IDW) dataset.

Image Captioning Semantic Segmentation

Zoom-in-Net: Deep Mining Lesions for Diabetic Retinopathy Detection

no code implementations 14 Jun 2017 Zhe Wang, Yanxin Yin, Jianping Shi, Wei Fang, Hongsheng Li, Xiaogang Wang

We propose a convolutional neural network-based algorithm for simultaneously diagnosing diabetic retinopathy and highlighting suspicious regions.

Diabetic Retinopathy Detection

Learning Deep Representations for Scene Labeling with Semantic Context Guided Supervision

no code implementations 8 Jun 2017 Zhe Wang, Hongsheng Li, Wanli Ouyang, Xiaogang Wang

The experiments show that our proposed method makes deep models learn more discriminative feature representations without increasing model size or complexity.

Scene Labeling

Residual Attention Network for Image Classification

14 code implementations CVPR 2017 Fei Wang, Mengqing Jiang, Chen Qian, Shuo Yang, Cheng Li, Honggang Zhang, Xiaogang Wang, Xiaoou Tang

In this work, we propose "Residual Attention Network", a convolutional neural network using an attention mechanism that can be incorporated into state-of-the-art feed-forward network architectures in an end-to-end training fashion.

Classification General Classification +2

Learning Cross-Modal Deep Representations for Robust Pedestrian Detection

2 code implementations CVPR 2017 Dan Xu, Wanli Ouyang, Elisa Ricci, Xiaogang Wang, Nicu Sebe

Then, the learned feature representations are transferred to a second deep network, which receives as input an RGB image and outputs the detection results.

Pedestrian Detection

Multi-Context Attention for Human Pose Estimation

2 code implementations CVPR 2017 Xiao Chu, Wei Yang, Wanli Ouyang, Cheng Ma, Alan L. Yuille, Xiaogang Wang

We further combine the holistic attention model, which focuses on the global consistency of the full human body, and the body part attention model, which focuses on the detailed description for different body parts.

Pose Estimation

Learning Chained Deep Features and Classifiers for Cascade in Object Detection

1 code implementation 23 Feb 2017 Wanli Ouyang, Ku Wang, Xin Zhu, Xiaogang Wang

In this CC-Net, the cascaded classifier at a stage is aided by the classification scores in previous stages.

Object Detection Region Proposal

Learning Deep Features via Congenerous Cosine Loss for Person Recognition

1 code implementation 22 Feb 2017 Yu Liu, Hongyang Li, Xiaogang Wang

Person recognition aims at recognizing the same identity across time and space with complicated scenes and similar appearance.

Person Recognition

Object Detection in Videos with Tubelet Proposal Networks

no code implementations CVPR 2017 Kai Kang, Hongsheng Li, Tong Xiao, Wanli Ouyang, Junjie Yan, Xihui Liu, Xiaogang Wang

Object detection in videos has drawn increasing attention recently with the introduction of the large-scale ImageNet VID dataset.

Object Detection Object Tracking

Learning Spatial Regularization with Image-level Supervisions for Multi-label Image Classification

2 code implementations CVPR 2017 Feng Zhu, Hongsheng Li, Wanli Ouyang, Nenghai Yu, Xiaogang Wang

Analysis of the learned SRN model demonstrates that it can effectively capture both semantic and spatial relations of labels for improving classification performance.

Classification General Classification +2

Progressively Diffused Networks for Semantic Image Segmentation

no code implementations 20 Feb 2017 Ruimao Zhang, Wei Yang, Zhanglin Peng, Xiaogang Wang, Liang Lin

This paper introduces Progressively Diffused Networks (PDNs) for unifying multi-scale context modeling with deep feature learning, by taking semantic image segmentation as an exemplar application.

Semantic Segmentation

Zoom Out-and-In Network with Recursive Training for Object Proposal

1 code implementation 19 Feb 2017 Hongyang Li, Yu Liu, Wanli Ouyang, Xiaogang Wang

In this paper, we propose a zoom-out-and-in network for generating object proposals.