no code implementations • NeurIPS 2007 • Xiaogang Wang, Eric Grimson
In this paper, we propose a topic model, Spatial Latent Dirichlet Allocation (SLDA), which better encodes the spatial structure among visual words that is essential for solving many vision problems.
2 code implementations • 25 Aug 2012 • Wei Zhang, Xiaogang Wang, Deli Zhao, Xiaoou Tang
We explore the different roles of two fundamental concepts in graph theory, indegree and outdegree, in the context of clustering.
Ranked #1 on Image Clustering on Coil-20 (Accuracy metric)
no code implementations • CVPR 2013 • Wei Li, Xiaogang Wang
In this paper, we propose a new approach for matching images observed in different camera views with complex cross-view transforms and apply it to person re-identification.
no code implementations • CVPR 2013 • Yi Sun, Xiaogang Wang, Xiaoou Tang
At each level, the outputs of multiple networks are fused for robust and accurate estimation.
no code implementations • CVPR 2013 • Wanli Ouyang, Xingyu Zeng, Xiaogang Wang
In this paper, we propose a mutual visibility deep model that jointly estimates the visibility statuses of overlapping pedestrians.
no code implementations • CVPR 2013 • Rui Zhao, Wanli Ouyang, Xiaogang Wang
In this paper, we propose a novel perspective for person re-identification based on unsupervised salience learning.
no code implementations • CVPR 2013 • Bolei Zhou, Xiaoou Tang, Xiaogang Wang
Collective motions are common in crowd systems and have attracted a great deal of attention in a variety of multidisciplinary fields.
no code implementations • CVPR 2013 • Wanli Ouyang, Xiaogang Wang
A probabilistic framework is proposed to model the relationship between the configurations estimated by single- and multi-pedestrian detectors, and to refine the single-pedestrian detection results with multi-pedestrian detection.
4 code implementations • 1 Jan 2014 • Yi Sun, Xiaogang Wang, Xiaoou Tang
When learned as classifiers to recognize about 10,000 face identities in the training set and configured to keep reducing the neuron numbers along the feature extraction hierarchy, these deep ConvNets gradually form compact identity-related features in the top layers with only a small number of hidden neurons.
Ranked #6 on Face Verification on Labeled Faces in the Wild
no code implementations • 14 Apr 2014 • Zhenyao Zhu, Ping Luo, Xiaogang Wang, Xiaoou Tang
Face images in the wild undergo large intra-personal variations, such as poses, illuminations, occlusions, and low resolutions, which cause great challenges to face-related applications.
no code implementations • CVPR 2014 • Ping Luo, Yonglong Tian, Xiaogang Wang, Xiaoou Tang
In this paper, we propose a Switchable Deep Network (SDN) for pedestrian detection.
no code implementations • CVPR 2014 • Rui Zhao, Wanli Ouyang, Xiaogang Wang
In this paper, we propose a novel approach of learning mid-level filters from automatically discovered patch clusters for person re-identification.
no code implementations • CVPR 2014 • Shuai Yi, Xiaogang Wang, Cewu Lu, Jiaya Jia
We tackle stationary crowd analysis in this paper, which is as important as modeling mobile groups in crowd scenes and has many applications in surveillance.
no code implementations • CVPR 2014 • Wanli Ouyang, Xiao Chu, Xiaogang Wang
Visual appearance score, appearance mixture type and deformation are three important information sources for human pose estimation.
no code implementations • CVPR 2014 • Jing Shao, Chen Change Loy, Xiaogang Wang
Groups are the primary entities that make up a crowd.
no code implementations • Conference 2014 • Yi Sun, Xiaogang Wang, Xiaoou Tang
When learned as classifiers to recognize about 10,000 face identities in the training set and configured to keep reducing the neuron numbers along the feature extraction hierarchy, these deep ConvNets gradually form compact identity-related features in the top layers with only a small number of hidden neurons.
no code implementations • CVPR 2014 • Wei Li, Rui Zhao, Tong Xiao, Xiaogang Wang
In this paper, we propose a novel filter pairing neural network (FPNN) to jointly handle misalignment, photometric and geometric transforms, occlusions and background clutter.
4 code implementations • NeurIPS 2014 • Yi Sun, Xiaogang Wang, Xiaoou Tang
The learned DeepID2 features can be well generalized to new identities unseen in the training data.
no code implementations • 26 Jun 2014 • Zhenyao Zhu, Ping Luo, Xiaogang Wang, Xiaoou Tang
Intriguingly, even without accessing 3D data, humans can not only recognize face identity but also imagine face images of a person under different viewpoints given a single 2D image, making face perception in the brain robust to view changes.
no code implementations • 11 Sep 2014 • Wanli Ouyang, Ping Luo, Xingyu Zeng, Shi Qiu, Yonglong Tian, Hongsheng Li, Shuo Yang, Zhe Wang, Yuanjun Xiong, Chen Qian, Zhenyao Zhu, Ruohui Wang, Chen-Change Loy, Xiaogang Wang, Xiaoou Tang
In the proposed new deep architecture, a new deformation constrained pooling (def-pooling) layer models the deformation of object parts with geometric constraint and penalty.
no code implementations • 17 Nov 2014 • Kai Kang, Xiaogang Wang
Based on FCNN, a multi-stage deep learning framework is proposed to integrate appearance and motion cues for crowd segmentation.
2 code implementations • ICCV 2015 • Ziwei Liu, Ping Luo, Xiaogang Wang, Xiaoou Tang
LNet is pre-trained by massive general object categories for face localization, while ANet is pre-trained by massive face identities for attribute prediction.
Ranked #6 on Facial Attribute Classification on LFWA
no code implementations • CVPR 2015 • Yonglong Tian, Ping Luo, Xiaogang Wang, Xiaoou Tang
Rather than expensively annotating scene attributes, we transfer attribute information from existing scene segmentation datasets to the pedestrian dataset, by proposing a novel deep model to learn high-level features from multiple tasks and multiple data sources.
Ranked #30 on Pedestrian Detection on Caltech
no code implementations • NeurIPS 2014 • Zhenyao Zhu, Ping Luo, Xiaogang Wang, Xiaoou Tang
Intriguingly, even without accessing 3D data, humans can not only recognize face identity but also imagine face images of a person under different viewpoints given a single 2D image, making face perception in the brain robust to view changes.
1 code implementation • CVPR 2015 • Yi Sun, Xiaogang Wang, Xiaoou Tang
(2) Its neurons in higher layers are highly selective to identities and identity-related attributes.
Ranked #1 on Face Verification on Oulu-CASIA
no code implementations • 5 Dec 2014 • Rui Zhao, Wanli Ouyang, Xiaogang Wang
(3) saliency matching is proposed based on patch matching.
no code implementations • 15 Dec 2014 • Hongsheng Li, Rui Zhao, Xiaogang Wang
The proposed algorithms eliminate all the redundant computation in convolution and pooling on images by introducing novel d-regularly sparse kernels.
no code implementations • CVPR 2015 • Wanli Ouyang, Xiaogang Wang, Xingyu Zeng, Shi Qiu, Ping Luo, Yonglong Tian, Hongsheng Li, Shuo Yang, Zhe Wang, Chen-Change Loy, Xiaoou Tang
In this paper, we propose deformable deep convolutional neural networks for generic object detection.
10 code implementations • 3 Feb 2015 • Yi Sun, Ding Liang, Xiaogang Wang, Xiaoou Tang
Very deep neural networks recently achieved great success on general object recognition because of their superb learning capacity.
no code implementations • CVPR 2015 • Jing Shao, Kai Kang, Chen Change Loy, Xiaogang Wang
We further measure user study performance on WWW and compare this with the proposed deep models.
no code implementations • CVPR 2015 • Shuai Yi, Hongsheng Li, Xiaogang Wang
Pedestrian behavior modeling and analysis is important for crowd scene understanding and has various applications in video surveillance.
no code implementations • CVPR 2015 • Tong Xiao, Tian Xia, Yi Yang, Chang Huang, Xiaogang Wang
To demonstrate the effectiveness of our approach, we collect a large-scale real-world clothing classification dataset with both noisy and clean labels.
no code implementations • CVPR 2015 • Rui Zhao, Wanli Ouyang, Hongsheng Li, Xiaogang Wang
Low-level saliency cues or priors do not produce good enough saliency detection results, especially when the salient object appears in a low-contrast background with confusing visual appearance.
no code implementations • CVPR 2015 • Cong Zhang, Hongsheng Li, Xiaogang Wang, Xiaokang Yang
To address this problem, we propose a deep convolutional neural network (CNN) for crowd counting, which is trained alternately with two related learning objectives, crowd density and crowd count.
Ranked #15 on Crowd Counting on WorldExpo’10
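The alternating-objective idea in this entry can be illustrated with a toy sketch (not the paper's CNN): a single shared parameter is fit by switching each step between a fine-grained per-cell density loss and a coarse global count loss. All values below are hypothetical.

```python
# Toy sketch of alternating between two related objectives:
# a per-cell density loss and a total-count loss, sharing one parameter.

def density_loss(w, cells, densities):
    # squared error of predicted per-cell density w * feature
    return sum((w * x - d) ** 2 for x, d in zip(cells, densities))

def count_loss(w, cells, total):
    # squared error of the summed prediction against the global count
    pred = sum(w * x for x in cells)
    return (pred - total) ** 2

def alternate_train(cells, densities, steps=200, lr=0.01):
    total = sum(densities)
    w = 0.0
    for step in range(steps):
        # pick whichever objective is active this step
        loss = density_loss if step % 2 == 0 else count_loss
        target = densities if step % 2 == 0 else total
        eps = 1e-6  # central-difference numeric gradient
        g = (loss(w + eps, cells, target) - loss(w - eps, cells, target)) / (2 * eps)
        w -= lr * g
    return w

cells = [1.0, 2.0, 3.0]
densities = [2.0, 4.0, 6.0]   # consistent with w = 2
w = alternate_train(cells, densities)
```

Because both objectives are consistent with the same underlying parameter, alternating updates converge to it; in the paper the two objectives similarly regularize each other.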
2 code implementations • 19 Nov 2015 • Cheng Tai, Tong Xiao, Yi Zhang, Xiaogang Wang, Weinan E
Recently, tensor decompositions have been used for speeding up CNNs.
no code implementations • ICCV 2015 • Dongqing Zou, Xiaowu Chen, Guangying Cao, Xiaogang Wang
We introduce a novel method of video matting via sparse and low-rank representation.
no code implementations • ICCV 2015 • Wanli Ouyang, Hongyang Li, Xingyu Zeng, Xiaogang Wang
Experimental results show that the attributes are helpful in learning better features and improving the object detection accuracy by 2.6% in mAP on the ILSVRC 2014 object detection dataset and 2.4% in mAP on the PASCAL VOC 2007 object detection dataset.
no code implementations • ICCV 2015 • Yonglong Tian, Ping Luo, Xiaogang Wang, Xiaoou Tang
Third, each part detector in DeepParts is a strong detector that can detect pedestrians by observing only a part of a proposal.
no code implementations • ICCV 2015 • Xiao Chu, Wanli Ouyang, Wei Yang, Xiaogang Wang
In this paper, we propose to predict immediacy for interacting persons from still images.
no code implementations • ICCV 2015 • Shuai Yi, Hongsheng Li, Xiaogang Wang
In this paper, we target the problem of estimating statistics of pedestrian travel time within a period from an entrance to a destination in a crowded scene.
no code implementations • ICCV 2015 • Lijun Wang, Wanli Ouyang, Xiaogang Wang, Huchuan Lu
Instead of treating the convolutional neural network (CNN) as a black-box feature extractor, we conduct an in-depth study of the properties of CNN features pre-trained offline on massive image data and the classification task on ImageNet.
no code implementations • CVPR 2016 • Yi Sun, Xiaogang Wang, Xiaoou Tang
This paper proposes to learn high-performance deep ConvNets with sparse neural connections, referred to as sparse ConvNets, for face recognition.
no code implementations • 9 Dec 2015 • Xingyu Zeng, Wanli Ouyang, Xiaogang Wang
We propose a representation learning pipeline to use the relationship as supervision for improving the learned representation in object detection.
no code implementations • 20 Jan 2016 • Wanli Ouyang, Xiaogang Wang, Cong Zhang, Xiaokang Yang
Our analysis and empirical results show that classes with more samples have a higher impact on feature learning.
no code implementations • CVPR 2016 • Xiao Chu, Wanli Ouyang, Hongsheng Li, Xiaogang Wang
In this paper, we propose a structured feature learning framework to reason the correlations among body joints at the feature level in human pose estimation.
no code implementations • 31 Mar 2016 • Wenxi Liu, Rynson W. H. Lau, Xiaogang Wang, Dinesh Manocha
Specifically, we propose an optimization framework that filters out the unknown noise in the crowd trajectories and measures their similarity to the exemplar-AMMs to produce a crowd motion feature.
no code implementations • 3 Apr 2016 • Hongyang Li, Wanli Ouyang, Xiaogang Wang
It provides great flexibility of selecting responses to different visual patterns in different magnitude ranges to form rich representations in higher layers.
2 code implementations • CVPR 2017 • Tong Xiao, Shuang Li, Bochao Wang, Liang Lin, Xiaogang Wang
Existing person re-identification benchmarks and methods mainly focus on matching cropped pedestrian images between queries and candidates.
Ranked #9 on Person Re-Identification on CUHK03
1 code implementation • 9 Apr 2016 • Kai Kang, Hongsheng Li, Junjie Yan, Xingyu Zeng, Bin Yang, Tong Xiao, Cong Zhang, Zhe Wang, Ruohui Wang, Xiaogang Wang, Wanli Ouyang
Temporal and contextual information of videos is not fully investigated and utilized.
1 code implementation • CVPR 2016 • Kai Kang, Wanli Ouyang, Hongsheng Li, Xiaogang Wang
Deep Convolutional Neural Networks (CNNs) have shown impressive performance in various vision tasks such as image classification, object detection and semantic segmentation.
1 code implementation • CVPR 2016 • Tong Xiao, Hongsheng Li, Wanli Ouyang, Xiaogang Wang
Learning generic and robust feature representations with data from multiple domains for the same problem is of great value, especially for problems with multiple datasets, none of which is large enough to provide abundant data variations.
no code implementations • CVPR 2016 • Lijun Wang, Wanli Ouyang, Xiaogang Wang, Huchuan Lu
To further improve the robustness of each base learner, we propose to train the convolutional layers with random binary masks, which serves as a regularization to enforce each base learner to focus on different input features.
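A minimal sketch of the random-binary-mask idea mentioned above (an assumption, not the paper's network): each base learner sees the input through its own fixed random binary mask, which pushes the learners to rely on different features, and their predictions are averaged.

```python
import random

random.seed(0)
NUM_FEATURES = 8

def make_mask(keep_prob=0.5):
    # fixed binary mask assigned to one base learner
    return [1 if random.random() < keep_prob else 0 for _ in range(NUM_FEATURES)]

def base_learner(features, mask, weights):
    # a masked linear scorer standing in for masked convolutional layers
    return sum(w * f * m for w, f, m in zip(weights, features, mask))

masks = [make_mask() for _ in range(4)]   # one mask per base learner
weights = [1.0] * NUM_FEATURES            # toy shared weights
x = [0.5] * NUM_FEATURES                  # toy input

scores = [base_learner(x, m, weights) for m in masks]
ensemble = sum(scores) / len(scores)      # average the diverse learners
```

Because each mask zeroes out a different random subset of features, no single learner can depend on all inputs, which is the regularization effect the sentence describes.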
no code implementations • CVPR 2016 • Ziwei Liu, Ping Luo, Shi Qiu, Xiaogang Wang, Xiaoou Tang
To demonstrate the advantages of DeepFashion, we propose a new deep model, namely FashionNet, which learns clothing features by jointly predicting clothing attributes and landmarks.
no code implementations • CVPR 2016 • Wei Yang, Wanli Ouyang, Hongsheng Li, Xiaogang Wang
In this paper, we propose a novel end-to-end framework for human pose estimation that combines DCNNs with the expressive deformable mixture of parts.
no code implementations • CVPR 2016 • Wanli Ouyang, Xiaogang Wang, Cong Zhang, Xiaokang Yang
Our analysis and empirical results show that classes with more samples have a higher impact on feature learning.
no code implementations • CVPR 2016 • Jing Shao, Chen-Change Loy, Kai Kang, Xiaogang Wang
Learning and capturing both appearance and dynamic representations are pivotal for crowd video understanding.
no code implementations • 29 Jun 2016 • Ernest Cheung, Tsan Kwong Wong, Aniket Bera, Xiaogang Wang, Dinesh Manocha
We present a novel procedural framework to generate an arbitrary number of labeled crowd videos (LCrowdV).
4 code implementations • 10 Aug 2016 • Ziwei Liu, Sijie Yan, Ping Luo, Xiaogang Wang, Xiaoou Tang
Fashion landmark is also compared to clothing bounding boxes and human joints in two applications, fashion attribute prediction and clothes retrieval, showing that fashion landmark is a more discriminative representation to understand fashion images.
1 code implementation • 8 Oct 2016 • Xingyu Zeng, Wanli Ouyang, Junjie Yan, Hongsheng Li, Tong Xiao, Kun Wang, Yu Liu, Yucong Zhou, Bin Yang, Zhe Wang, Hui Zhou, Xiaogang Wang
The effectiveness of GBD-Net is shown through experiments on three object detection datasets, ImageNet, Pascal VOC2007 and Microsoft COCO.
no code implementations • NeurIPS 2016 • Xiao Chu, Wanli Ouyang, Hongsheng Li, Xiaogang Wang
In a classical neural network, there is no message passing between neurons in the same layer.
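One round of intra-layer message passing can be sketched as follows (hypothetical lateral weights; the paper's actual formulation may differ): each unit adds weighted messages from its lateral neighbours before the nonlinearity.

```python
def relu(v):
    return max(0.0, v)

def lateral_pass(acts, lateral_w):
    # lateral_w[i][j]: message weight from unit j to unit i (same layer)
    out = []
    for i, a in enumerate(acts):
        msg = sum(lateral_w[i][j] * acts[j] for j in range(len(acts)) if j != i)
        out.append(relu(a + msg))
    return out

acts = [1.0, -0.5, 2.0]           # pre-activation values of one layer
W = [[0.0, 0.1, 0.2],
     [0.1, 0.0, 0.1],
     [0.2, 0.1, 0.0]]
updated = lateral_pass(acts, W)   # each unit now reflects its neighbours
```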
67 code implementations • CVPR 2017 • Hengshuang Zhao, Jianping Shi, Xiaojuan Qi, Xiaogang Wang, Jiaya Jia
Scene parsing is challenging for unrestricted open vocabulary and diverse scenes.
Ranked #4 on Video Semantic Segmentation on Cityscapes val
21 code implementations • ICCV 2017 • Han Zhang, Tao Xu, Hongsheng Li, Shaoting Zhang, Xiaogang Wang, Xiaolei Huang, Dimitris Metaxas
Synthesizing high-quality images from text descriptions is a challenging problem in computer vision and has many practical applications.
Ranked #3 on Text-to-Image Generation on Oxford 102 Flowers (Inception score metric)
no code implementations • 30 Dec 2016 • Shuai Li, Kui Jia, Xiaogang Wang
The recent successful deep neural networks are largely trained in a supervised manner.
1 code implementation • 19 Feb 2017 • Hongyang Li, Yu Liu, Wanli Ouyang, Xiaogang Wang
In this paper, we propose a zoom-out-and-in network for generating object proposals.
1 code implementation • CVPR 2017 • Shuang Li, Tong Xiao, Hongsheng Li, Bolei Zhou, Dayu Yue, Xiaogang Wang
Searching persons in large-scale image databases with the query of natural language description has important applications in video surveillance.
2 code implementations • CVPR 2017 • Feng Zhu, Hongsheng Li, Wanli Ouyang, Nenghai Yu, Xiaogang Wang
Analysis of the learned SRN model demonstrates that it can effectively capture both semantic and spatial relations of labels for improving classification performance.
Ranked #6 on Multi-Label Classification on NUS-WIDE
no code implementations • 20 Feb 2017 • Ruimao Zhang, Wei Yang, Zhanglin Peng, Xiaogang Wang, Liang Lin
This paper introduces Progressively Diffused Networks (PDNs) for unifying multi-scale context modeling with deep feature learning, by taking semantic image segmentation as an exemplar application.
1 code implementation • CVPR 2017 • Kai Kang, Hongsheng Li, Tong Xiao, Wanli Ouyang, Junjie Yan, Xihui Liu, Xiaogang Wang
Object detection in videos has drawn increasing attention recently with the introduction of the large-scale ImageNet VID dataset.
1 code implementation • 22 Feb 2017 • Yu Liu, Hongyang Li, Xiaogang Wang
Person recognition aims at recognizing the same identity across time and space with complicated scenes and similar appearance.
1 code implementation • 23 Feb 2017 • Wanli Ouyang, Ku Wang, Xin Zhu, Xiaogang Wang
In this CC-Net, the cascaded classifier at a stage is aided by the classification scores in previous stages.
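The sentence above can be sketched as a toy cascade (stage weights and thresholds are made up): each stage scores a candidate using its features plus the scores from earlier stages, and rejects early when the running score falls below a threshold.

```python
def cascade(features, stage_weights, thresholds):
    prev_scores = []
    for weights, thr in zip(stage_weights, thresholds):
        base = sum(w * f for w, f in zip(weights, features))
        score = base + 0.5 * sum(prev_scores)   # aided by earlier stages
        if score < thr:
            return False, score                 # early rejection
        prev_scores.append(score)
    return True, score

features = [0.8, 0.6]                       # toy candidate features
stage_weights = [[1.0, 0.5], [0.5, 1.0]]    # hypothetical per-stage weights
thresholds = [0.5, 1.0]
accepted, final = cascade(features, stage_weights, thresholds)
```

Early rejection is what makes cascades cheap: most negatives never reach the later, more expensive stages.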
no code implementations • CVPR 2017 • Yikang Li, Wanli Ouyang, Xiaogang Wang, Xiao'ou Tang
In this paper, each visual relationship is considered as a phrase with three components.
2 code implementations • CVPR 2017 • Xiao Chu, Wei Yang, Wanli Ouyang, Cheng Ma, Alan L. Yuille, Xiaogang Wang
We further combine the holistic attention model, which focuses on the global consistency of the full human body, and the body part attention model, which focuses on the detailed description for different body parts.
Ranked #8 on Pose Estimation on Leeds Sports Poses
2 code implementations • CVPR 2017 • Dan Xu, Elisa Ricci, Wanli Ouyang, Xiaogang Wang, Nicu Sebe
This paper addresses the problem of depth estimation from a single still image.
Ranked #14 on Depth Estimation on NYU-Depth V2
2 code implementations • CVPR 2017 • Dan Xu, Wanli Ouyang, Elisa Ricci, Xiaogang Wang, Nicu Sebe
Then, the learned feature representations are transferred to a second deep network, which receives as input an RGB image and outputs the detection results.
21 code implementations • CVPR 2017 • Fei Wang, Mengqing Jiang, Chen Qian, Shuo Yang, Cheng Li, Honggang Zhang, Xiaogang Wang, Xiaoou Tang
In this work, we propose "Residual Attention Network", a convolutional neural network using an attention mechanism that can be incorporated into state-of-the-art feed-forward network architectures in an end-to-end training fashion.
Ranked #638 on Image Classification on ImageNet
no code implementations • 8 Jun 2017 • Zhe Wang, Hongsheng Li, Wanli Ouyang, Xiaogang Wang
The experiments show that our proposed method makes deep models learn more discriminative feature representations without increasing model size or complexity.
no code implementations • 14 Jun 2017 • Zhe Wang, Yanxin Yin, Jianping Shi, Wei Fang, Hongsheng Li, Xiaogang Wang
We propose a convolutional neural network based algorithm for simultaneously diagnosing diabetic retinopathy and highlighting suspicious regions.
no code implementations • CVPR 2017 • Guangrun Wang, Ping Luo, Liang Lin, Xiaogang Wang
This work significantly increases the segmentation accuracy of CNNs by learning from an Image Descriptions in the Wild (IDW) dataset.
1 code implementation • CVPR 2017 • Haiyu Zhao, Maoqing Tian, Shuyang Sun, Jing Shao, Junjie Yan, Shuai Yi, Xiaogang Wang, Xiaoou Tang
Person re-identification (ReID) is an important task in video surveillance and has various applications.
1 code implementation • ICCV 2017 • Yu Liu, Hongyang Li, Junjie Yan, Fangyin Wei, Xiaogang Wang, Xiaoou Tang
To further increase efficiency and accuracy, we (a) design a scale-forecast network to globally predict potential scales in the image since there is no need to compute maps on all levels of the pyramid.
Ranked #3 on Face Detection on Annotated Faces in the Wild
1 code implementation • ICCV 2017 • Yikang Li, Wanli Ouyang, Bolei Zhou, Kun Wang, Xiaogang Wang
Object detection, scene graph generation and region captioning, which are three scene understanding tasks at different semantic levels, are tied together: scene graphs are generated on top of objects detected in an image with their pairwise relationship predicted, while region captioning gives a language description of the objects, their attributes, relations, and other context information.
Ranked #2 on Object Detection on Visual Genome
4 code implementations • ICCV 2017 • Wei Yang, Shuang Li, Wanli Ouyang, Hongsheng Li, Xiaogang Wang
We investigate our method on two standard benchmarks for human pose estimation.
Ranked #6 on Pose Estimation on Leeds Sports Poses
no code implementations • ICCV 2017 • Shuang Li, Tong Xiao, Hongsheng Li, Wei Yang, Xiaogang Wang
The stage-2 CNN-LSTM network refines the matching results with a latent co-attention mechanism.
2 code implementations • 7 Aug 2017 • Sijie Yan, Ziwei Liu, Ping Luo, Shi Qiu, Xiaogang Wang, Xiaoou Tang
This work addresses unconstrained fashion landmark detection, where clothing bounding boxes are not provided in either training or test.
no code implementations • ICCV 2017 • Qi Chu, Wanli Ouyang, Hongsheng Li, Xiaogang Wang, Bin Liu, Nenghai Yu
The visibility map of the target is learned and used for inferring the spatial attention map.
no code implementations • ICCV 2017 • Yantao Shen, Tong Xiao, Hongsheng Li, Shuai Yi, Xiaogang Wang
Vehicle re-identification is an important problem and has many applications in video surveillance and intelligent transportation.
no code implementations • 9 Sep 2017 • Ricky Fok, Aijun An, Xiaogang Wang
The global optimization method first reduces a high-dimensional search to a one-dimensional geodesic to find a starting point close to a local mode.
1 code implementation • 13 Sep 2017 • Hongyang Li, Yu Liu, Wanli Ouyang, Xiaogang Wang
A key observation is that it is difficult to classify anchors of different sizes with the same set of features.
Ranked #2 on Region Proposal on COCO test-dev
no code implementations • CVPR 2018 • Yikang Li, Nan Duan, Bolei Zhou, Xiao Chu, Wanli Ouyang, Xiaogang Wang
Recently, visual question answering (VQA) and visual question generation (VQG) have become two trending topics in computer vision, which have been explored separately.
2 code implementations • ICCV 2017 • Xihui Liu, Haiyu Zhao, Maoqing Tian, Lu Sheng, Jing Shao, Shuai Yi, Junjie Yan, Xiaogang Wang
Pedestrian analysis plays a vital role in intelligent video surveillance and is a key component for security-centric computer vision systems.
Ranked #2 on Pedestrian Attribute Recognition on RAP
no code implementations • ICCV 2017 • Wanli Ouyang, Kun Wang, Xin Zhu, Xiaogang Wang
In this CC-Net, there are many cascade stages.
no code implementations • ICCV 2017 • Zhongdao Wang, Luming Tang, Xihui Liu, Zhuliang Yao, Shuai Yi, Jing Shao, Junjie Yan, Shengjin Wang, Hongsheng Li, Xiaogang Wang
In our vehicle ReID framework, an orientation invariant feature embedding module and a spatial-temporal regularization module are proposed.
no code implementations • ICCV 2017 • Ping Luo, Guangrun Wang, Liang Lin, Xiaogang Wang
The estimated labelmaps that capture accurate object classes and boundaries are used as ground truths in training to boost performance.
1 code implementation • 2 Oct 2017 • Yu Liu, Hongyang Li, Xiaogang Wang
Feature matters.
no code implementations • 17 Oct 2017 • Ricky Fok, Aijun An, Xiaogang Wang
We propose a framework to understand the unprecedented performance and robustness of deep neural networks using field theory.
16 code implementations • 19 Oct 2017 • Han Zhang, Tao Xu, Hongsheng Li, Shaoting Zhang, Xiaogang Wang, Xiaolei Huang, Dimitris Metaxas
In this paper, we propose Stacked Generative Adversarial Networks (StackGAN) aiming at generating high-resolution photo-realistic images.
Ranked #5 on Text-to-Image Generation on Oxford 102 Flowers
1 code implementation • 18 Nov 2017 • Pan Lu, Hongsheng Li, Wei zhang, Jianyong Wang, Xiaogang Wang
Existing VQA methods mainly adopt the visual attention mechanism to associate the input question with corresponding image regions for effective question answering.
8 code implementations • 17 Dec 2017 • Xingang Pan, Jianping Shi, Ping Luo, Xiaogang Wang, Xiaoou Tang
Although CNNs have shown strong capability to extract semantics from raw pixels, their capacity to capture spatial relationships of pixels across rows and columns of an image is not fully explored.
Ranked #50 on Lane Detection on CULane
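A stripped-down sketch of the row-wise propagation this entry hints at (the actual method uses learned slice-by-slice convolutions; the fixed weight here is an assumption): sweep the featuremap top to bottom, letting each row receive a message from the row above, so information travels across rows.

```python
def relu(v):
    return max(0.0, v)

def downward_pass(rows, w=0.5):
    # rows: 2D featuremap as a list of rows; w: toy message weight
    out = [list(rows[0])]
    for r in rows[1:]:
        prev = out[-1]
        # each row is updated with a weighted message from the row above
        out.append([relu(v + w * p) for v, p in zip(r, prev)])
    return out

rows = [[1.0, 0.0],
        [0.0, 1.0],
        [0.0, 0.0]]
propagated = downward_pass(rows)
```

After the sweep, evidence from the first row has reached the last row even though no single local operation spans the whole column, which is useful for long thin structures such as lanes.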
no code implementations • ICLR 2018 • Ricky Fok, Aijun An, Xiaogang Wang
In the layer decoupling limit applicable to residual networks (He et al., 2015), we show that the remnant symmetries that survive the non-linear layers are spontaneously broken based on empirical results.
no code implementations • NeurIPS 2017 • Dan Xu, Wanli Ouyang, Xavier Alameda-Pineda, Elisa Ricci, Xiaogang Wang, Nicu Sebe
Recent works have shown that exploiting multi-scale representations deeply learned via convolutional neural networks (CNN) is of tremendous importance for accurate contour detection.
no code implementations • ICLR 2018 • Ricky Fok, Aijun An, Zana Rashidi, Xiaogang Wang
We propose a Warped Residual Network (WarpNet) using a parallelizable warp operator for forward and backward propagation to distant layers that trains faster than the original residual neural network.
1 code implementation • 1 Mar 2018 • Dan Xu, Elisa Ricci, Wanli Ouyang, Xiaogang Wang, Nicu Sebe
Depth cues have proved very useful in various computer vision and robotic tasks.
no code implementations • ECCV 2018 • Xihui Liu, Hongsheng Li, Jing Shao, Dapeng Chen, Xiaogang Wang
The aim of image captioning is to generate captions by machine to describe image contents.
12 code implementations • CVPR 2018 • Hang Zhang, Kristin Dana, Jianping Shi, Zhongyue Zhang, Xiaogang Wang, Ambrish Tyagi, Amit Agrawal
In this paper, we explore the impact of global contextual information in semantic segmentation by introducing the Context Encoding Module, which captures the semantic context of scenes and selectively highlights class-dependent featuremaps.
Ranked #7 on Semantic Segmentation on PASCAL VOC 2012 test
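A very reduced sketch of the channel-wise highlighting described above: summarize each channel into a context value, turn it into per-channel gates in (0, 1), and rescale the featuremaps. The actual Context Encoding Module learns a dictionary-based encoding; the simple average pooling and gate weights here are stand-ins.

```python
import math

def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))

def encode_and_gate(channels, gate_weights):
    # channels: list of 2D featuremaps, one per channel
    context = [sum(sum(row) for row in ch) / (len(ch) * len(ch[0]))
               for ch in channels]                       # global context
    gates = [sigmoid(w * c) for w, c in zip(gate_weights, context)]
    gated = [[[g * v for v in row] for row in ch]        # rescale channels
             for g, ch in zip(gates, channels)]
    return gated, gates

channels = [[[1.0, 3.0], [1.0, 3.0]],   # an active channel
            [[0.0, 0.0], [0.0, 0.0]]]   # an inactive channel
gated, gates = encode_and_gate(channels, [1.0, 1.0])
```

The channel with stronger global response gets a gate above 0.5 and is amplified relative to the inactive one, which is the "selective highlighting" the sentence refers to.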
no code implementations • CVPR 2018 • Wei Yang, Wanli Ouyang, Xiaolong Wang, Jimmy Ren, Hongsheng Li, Xiaogang Wang
Instead of defining hard-coded rules to constrain the pose estimation results, we design a novel multi-source discriminator to distinguish the predicted 3D poses from the ground-truth, which helps to enforce the pose estimator to generate anthropometrically valid poses even with images in the wild.
Ranked #1 on Monocular 3D Human Pose Estimation on Human3.6M (Use Video Sequence metric)
no code implementations • CVPR 2018 • Shuang Li, Slawomir Bak, Peter Carr, Xiaogang Wang
As a result, the network learns latent representations of the face, torso and other body parts using the best available image patches from the entire video sequence.
no code implementations • CVPR 2018 • Yu Liu, Fangyin Wei, Jing Shao, Lu Sheng, Junjie Yan, Xiaogang Wang
This paper proposes learning disentangled but complementary face features with minimal supervision by face identification.
no code implementations • 25 Apr 2018 • Zhe Wang, Hongsheng Li, Wanli Ouyang, Xiaogang Wang
Statistical features, such as histogram, Bag-of-Words (BoW) and Fisher Vector, were commonly used with hand-crafted features in conventional classification methods, but have attracted less attention since deep learning methods became popular.
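The Bag-of-Words statistic mentioned above, in miniature: assign each local descriptor to its nearest codeword and count assignments. Codebook and descriptors are toy values.

```python
def nearest(desc, codebook):
    # index of the codeword closest in squared Euclidean distance
    return min(range(len(codebook)),
               key=lambda k: sum((d - c) ** 2 for d, c in zip(desc, codebook[k])))

def bow_histogram(descriptors, codebook):
    hist = [0] * len(codebook)
    for desc in descriptors:
        hist[nearest(desc, codebook)] += 1
    # L1-normalize so images with different numbers of descriptors compare
    total = sum(hist)
    return [h / total for h in hist]

codebook = [(0.0, 0.0), (1.0, 1.0)]
descriptors = [(0.1, 0.0), (0.9, 1.1), (1.0, 0.8), (0.0, 0.2)]
hist = bow_histogram(descriptors, codebook)   # -> [0.5, 0.5]
```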
3 code implementations • CVPR 2018 • Lu Sheng, Ziyi Lin, Jing Shao, Xiaogang Wang
Zero-shot artistic style transfer is an important image synthesis problem aiming at transferring arbitrary style into content images.
no code implementations • CVPR 2018 • Dan Xu, Wanli Ouyang, Xiaogang Wang, Nicu Sebe
Depth estimation and scene parsing are two particularly important tasks in visual scene understanding.
Ranked #15 on Depth Estimation on NYU-Depth V2
no code implementations • 13 May 2018 • Masoud Ataei, Shengyuan Chen, Xiaogang Wang
We propose a new class of transforms, called the Lehmer Transform, which is motivated by the Lehmer mean function.
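For reference, the Lehmer mean that motivates the transform is L_p(x) = sum(x_i**p) / sum(x_i**(p-1)); p = 1 gives the arithmetic mean and p = 0 the harmonic mean (for positive inputs).

```python
def lehmer_mean(xs, p):
    # L_p(x) = sum of p-th powers over sum of (p-1)-th powers
    return sum(x ** p for x in xs) / sum(x ** (p - 1) for x in xs)

xs = [1.0, 2.0, 4.0]
arithmetic = lehmer_mean(xs, 1)   # (1 + 2 + 4) / 3
harmonic = lehmer_mean(xs, 0)     # 3 / (1 + 1/2 + 1/4)
```

Varying p sweeps the Lehmer mean continuously between these classical means, which is what makes it a flexible building block for a transform.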
no code implementations • CVPR 2018 • Maoqing Tian, Shuai Yi, Hongsheng Li, Shihua Li, Xuesen Zhang, Jianping Shi, Junjie Yan, Xiaogang Wang
State-of-the-art methods mainly utilize deep learning based approaches for learning visual features for describing person appearances.
no code implementations • CVPR 2018 • Dapeng Chen, Dan Xu, Hongsheng Li, Nicu Sebe, Xiaogang Wang
Extensive experiments demonstrate the effectiveness of our model that combines DNN and CRF for learning robust multi-scale local similarities.
no code implementations • CVPR 2018 • Yujun Shen, Ping Luo, Junjie Yan, Xiaogang Wang, Xiaoou Tang
Existing methods typically formulate GAN as a two-player game, where a discriminator distinguishes face images from the real and synthesized domains, while a generator reduces its discriminativeness by synthesizing a face of photo-realistic quality.
no code implementations • CVPR 2018 • Dapeng Chen, Hongsheng Li, Tong Xiao, Shuai Yi, Xiaogang Wang
The attention weights are obtained based on a query feature, which is learned from the whole probe snippet by an LSTM network, making the resulting embeddings less affected by noisy frames.
Ranked #4 on Person Re-Identification on PRID2011
no code implementations • 4 Jun 2018 • Hui Zhou, Wanli Ouyang, Jian Cheng, Xiaogang Wang, Hongsheng Li
In addition, inter-object relations are mostly modeled in a symmetric way, which we argue is not an optimal setting.
1 code implementation • ECCV 2018 • Yikang Li, Wanli Ouyang, Bolei Zhou, Jianping Shi, Chao Zhang, Xiaogang Wang
Generating scene graphs to describe all the relations inside an image has gained increasing interest in recent years.
Ranked #1 on Scene Graph Generation on VRD
no code implementations • ECCV 2018 • Guojun Yin, Lu Sheng, Bin Liu, Nenghai Yu, Xiaogang Wang, Jing Shao, Chen Change Loy
We show that by encouraging deep message propagation and interactions between local object features and global predicate features, one can achieve compelling performance in recognizing complex relationships without using any linguistic priors.
no code implementations • 16 Jul 2018 • Ruimao Zhang, Hongbin Sun, Jingyu Li, Yuying Ge, Liang Lin, Ping Luo, Xiaogang Wang
To address the above issues, we present a novel and practical deep architecture for video person re-identification termed Self-and-Collaborative Attention Network (SCAN).
1 code implementation • 20 Jul 2018 • Hang Zhou, Yu Liu, Ziwei Liu, Ping Luo, Xiaogang Wang
Talking face generation aims to synthesize a sequence of face images that correspond to a clip of speech.
no code implementations • ECCV 2018 • Yantao Shen, Hongsheng Li, Shuai Yi, Dapeng Chen, Xiaogang Wang
However, existing person re-identification models mostly estimate the similarities of different image pairs of probe and gallery images independently, ignoring the relationship information between different probe-gallery pairs.
Ranked #2 on Person Re-Identification on CUHK03
1 code implementation • CVPR 2018 • Yantao Shen, Tong Xiao, Hongsheng Li, Shuai Yi, Xiaogang Wang
Person re-identification aims to robustly measure similarities between person images.
1 code implementation • CVPR 2018 • Yantao Shen, Hongsheng Li, Tong Xiao, Shuai Yi, Dapeng Chen, Xiaogang Wang
Person re-identification aims at finding a person of interest in an image gallery by comparing the probe image of this person with all the gallery images.
no code implementations • ECCV 2018 • Dapeng Chen, Hongsheng Li, Xihui Liu, Yantao Shen, Zejian yuan, Xiaogang Wang
Person re-identification is an important task that requires learning discriminative visual features for distinguishing different person identities.
Ranked #22 on Text based Person Retrieval on CUHK-PEDES
no code implementations • ECCV 2018 • Peng Gao, Pan Lu, Hongsheng Li, Shuang Li, Yikang Li, Steven Hoi, Xiaogang Wang
Most state-of-the-art VQA methods fuse the high-level textual and visual features from the neural network and abandon the visual spatial information when learning multi-modal features. To address these problems, question-guided kernels generated from the input question are designed to convolve with visual features for capturing the textual and visual relationship in the early stage.
Ranked #14 on Visual Question Answering (VQA) on CLEVR
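As an illustration of the question-guided kernel idea above, the sketch below predicts a convolution kernel from a question embedding and applies it to a visual feature map. All shapes and the weight matrix `W_kernel` are hypothetical, not the paper's:

```python
import numpy as np

def question_guided_conv(visual, question_emb, W_kernel, ksize=3):
    """Dynamic convolution sketch: a kernel is generated from the question
    embedding via a linear map, then convolved with the visual features.
    visual: (C, H, W); question_emb: (D,); W_kernel: (C*ksize*ksize, D)."""
    C, H, W = visual.shape
    kernel = (W_kernel @ question_emb).reshape(C, ksize, ksize)
    pad = ksize // 2
    padded = np.pad(visual, ((0, 0), (pad, pad), (pad, pad)))
    out = np.zeros((H, W))
    for i in range(H):
        for j in range(W):
            # correlate the question-conditioned kernel with a local patch
            out[i, j] = np.sum(padded[:, i:i + ksize, j:j + ksize] * kernel)
    return out
```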
2 code implementations • ECCV 2018 • Hongyang Li, Xiaoyang Guo, Bo Dai, Wanli Ouyang, Xiaogang Wang
Motivated by routing, which makes a higher-level capsule agree with lower-level capsules, we extend the mechanism to compensate for the rapid loss of information in nearby layers.
1 code implementation • ECCV 2018 • Xiaoyang Guo, Hongsheng Li, Shuai Yi, Jimmy Ren, Xiaogang Wang
Monocular depth estimation aims at estimating a pixelwise depth map for a single image, which has wide applications in scene understanding and autonomous driving.
no code implementations • 27 Aug 2018 • Zixuan Huang, Junming Fan, Shenggan Cheng, Shuai Yi, Xiaogang Wang, Hongsheng Li
Dense depth cues are important and have wide applications in various computer vision tasks.
Ranked #10 on Depth Completion on KITTI Depth Completion
no code implementations • 6 Sep 2018 • Li Liu, Wanli Ouyang, Xiaogang Wang, Paul Fieguth, Jie Chen, Xinwang Liu, Matti Pietikäinen
Object detection, one of the most fundamental and challenging problems in computer vision, seeks to locate object instances from a large number of predefined categories in natural images.
no code implementations • 13 Sep 2018 • Xiaogang Wang, Bin Zhou, Haiyue Fang, Xiaowu Chen, Qinping Zhao, Kai Xu
We propose to generate part hypotheses from the components based on a hierarchical grouping strategy, and perform labeling on those part groups instead of directly on the components.
2 code implementations • NeurIPS 2018 • Yixiao Ge, Zhuowan Li, Haiyu Zhao, Guojun Yin, Shuai Yi, Xiaogang Wang, Hongsheng Li
Our proposed FD-GAN achieves state-of-the-art performance on three person reID datasets, which demonstrates the effectiveness and robust feature-distilling capability of the proposed FD-GAN.
Ranked #3 on Person Re-Identification on CUHK03
9 code implementations • 13 Nov 2018 • Buyu Li, Yu Liu, Xiaogang Wang
Despite the great success of two-stage detectors, single-stage detectors remain a more elegant and efficient approach, yet suffer from two well-known disharmonies during training, i.e., the huge difference in quantity between positive and negative examples as well as between easy and hard examples.
Ranked #165 on Object Detection on COCO test-dev
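One way to ease both imbalances is to down-weight examples whose gradient norms are over-represented in a batch; the following is a rough sketch of such density-based reweighting, not the paper's exact method:

```python
import numpy as np

def density_weights(grad_norms, bins=10):
    """Give each example a weight inversely proportional to how many
    examples fall into the same gradient-norm bin, so over-represented
    easy negatives are down-weighted and rare hard examples stand out.
    grad_norms are assumed to lie in [0, 1]."""
    edges = np.linspace(0.0, 1.0, bins + 1)
    idx = np.clip(np.digitize(grad_norms, edges) - 1, 0, bins - 1)
    counts = np.bincount(idx, minlength=bins)
    density = counts[idx].astype(float)       # bin population per example
    weights = len(grad_norms) / density       # inverse-density weighting
    return weights / weights.mean()           # normalize to mean 1
```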
13 code implementations • CVPR 2019 • Shaoshuai Shi, Xiaogang Wang, Hongsheng Li
In this paper, we propose PointRCNN for 3D object detection from raw point cloud.
Ranked #2 on Object Detection on KITTI Cars Moderate
no code implementations • 13 Dec 2018 • Gao Peng, Zhengkai Jiang, Haoxuan You, Pan Lu, Steven Hoi, Xiaogang Wang, Hongsheng Li
It can robustly capture the high-level interactions between language and vision domains, thus significantly improving the performance of visual question answering.
5 code implementations • CVPR 2019 • Yuying Ge, Ruimao Zhang, Lingyun Wu, Xiaogang Wang, Xiaoou Tang, Ping Luo
A strong baseline is proposed, called Match R-CNN, which builds upon Mask R-CNN to solve the above four tasks in an end-to-end manner.
no code implementations • CVPR 2019 • Xihui Liu, ZiHao Wang, Jing Shao, Xiaogang Wang, Hongsheng Li
Referring expression grounding aims at locating certain objects or persons in an image with a referring expression, where the key challenge is to comprehend and align various types of information from visual and textual domain, such as visual attributes, location and interactions with surrounding regions.
no code implementations • 3 Mar 2019 • Lu Sheng, Junting Pan, Jiaming Guo, Jing Shao, Xiaogang Wang, Chen Change Loy
Imagining multiple consecutive frames given one single snapshot is challenging, since it is difficult to simultaneously predict diverse motions from a single image and faithfully generate novel frames without visual distortions.
1 code implementation • 4 Mar 2019 • Mingyang Liang, Xiaoyang Guo, Hongsheng Li, Xiaogang Wang, You Song
Unsupervised cross-spectral stereo matching aims at recovering disparity given cross-spectral image pairs without any supervision in the form of ground truth disparity or depth.
1 code implementation • CVPR 2019 • Wenqi Shao, Tianjian Meng, Jingyu Li, Ruimao Zhang, Yudian Li, Xiaogang Wang, Ping Luo
Unlike $\ell_1$ and $\ell_0$ constraints that impose difficulties in optimization, we turn this constrained optimization problem into feed-forward computation by proposing SparsestMax, which is a sparse version of softmax.
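SparsestMax itself is not specified here; a closely related sparse version of softmax, sparsemax (Martins & Astudillo, 2016), can be sketched as a Euclidean projection onto the probability simplex:

```python
import numpy as np

def sparsemax(z):
    """Project logits z onto the probability simplex. Unlike softmax,
    the result can contain exact zeros while still summing to 1."""
    z_sorted = np.sort(z)[::-1]                  # descending order
    k = np.arange(1, len(z) + 1)
    cumsum = np.cumsum(z_sorted)
    support = z_sorted + 1.0 / k > cumsum / k    # which entries stay nonzero
    k_max = k[support].max()
    tau = (cumsum[k_max - 1] - 1.0) / k_max      # threshold
    return np.maximum(z - tau, 0.0)
```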
2 code implementations • CVPR 2019 • Xiaoyang Guo, Kai Yang, Wukui Yang, Xiaogang Wang, Hongsheng Li
Previous works built cost volumes with cross-correlation or concatenation of left and right features across all disparity levels, and then a 2D or 3D convolutional neural network is utilized to regress the disparity maps.
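The concatenation-style cost volume mentioned above can be sketched as follows (single sample, channels-first layout assumed):

```python
import numpy as np

def concat_cost_volume(left_feat, right_feat, max_disp):
    """Build a concatenation cost volume from left/right feature maps.
    left_feat, right_feat: (C, H, W). Returns (2C, max_disp, H, W):
    at disparity d, each left feature is paired with the right feature
    shifted d pixels to the left; out-of-range positions stay zero."""
    C, H, W = left_feat.shape
    volume = np.zeros((2 * C, max_disp, H, W), dtype=left_feat.dtype)
    for d in range(max_disp):
        volume[:C, d, :, d:] = left_feat[:, :, d:]
        volume[C:, d, :, d:] = right_feat[:, :, :W - d] if d else right_feat
    return volume
```

A 3D CNN would then regress disparities from this volume, as the entry describes.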
1 code implementation • CVPR 2019 • Xiaogang Wang, Bin Zhou, Yahao Shi, Xiaowu Chen, Qinping Zhao, Kai Xu
For the task of mobility analysis of 3D shapes, we propose joint analysis for simultaneous motion part segmentation and motion attribute estimation, taking a single 3D model as input.
2 code implementations • CVPR 2019 • Junting Pan, Chengyu Wang, Xu Jia, Jing Shao, Lu Sheng, Junjie Yan, Xiaogang Wang
This paper proposes the novel task of video generation conditioned on a SINGLE semantic label map, which provides a good balance between flexibility and quality in the generation process.
no code implementations • CVPR 2019 • Xipeng Chen, Kwan-Yee Lin, Wentao Liu, Chen Qian, Xiaogang Wang, Liang Lin
Recent studies have shown remarkable advances in 3D human pose estimation from monocular images, with the help of large-scale in-door 3D datasets and sophisticated network architectures.
no code implementations • CVPR 2019 • Buyu Li, Wanli Ouyang, Lu Sheng, Xingyu Zeng, Xiaogang Wang
We present an efficient 3D object detection framework based on a single RGB image in the scenario of autonomous driving.
Ranked #18 on Vehicle Pose Estimation on KITTI Cars Hard
2 code implementations • ICLR 2019 • Hongyang Li, Bo Dai, Shaoshuai Shi, Wanli Ouyang, Xiaogang Wang
We argue that the reliable set could guide the feature learning of the less reliable set during training, in the spirit of a student mimicking teacher behavior, thus pushing toward a more compact class centroid in the feature space.
Ranked #134 on Object Detection on COCO test-dev
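The reliable-set-guides-unreliable-set idea can be sketched as a centroid mimic loss: less reliable features are pulled toward the centroid of the reliable features of the same class. This is a generic illustration, not the paper's exact formulation:

```python
import numpy as np

def centroid_mimic_loss(reliable_feats, unreliable_feats):
    """Pull less-reliable features toward the centroid of the reliable
    set of the same class, compacting the class in feature space.
    reliable_feats: (N, D); unreliable_feats: (M, D)."""
    centroid = reliable_feats.mean(axis=0)    # "teacher" target
    return float(np.mean(np.sum((unreliable_feats - centroid) ** 2, axis=1)))
```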
no code implementations • CVPR 2019 • Guojun Yin, Lu Sheng, Bin Liu, Nenghai Yu, Xiaogang Wang, Jing Shao
Dense captioning aims at simultaneously localizing semantic regions and describing these regions-of-interest (ROIs) with short phrases or sentences in natural language.
Ranked #3 on Dense Captioning on Visual Genome
no code implementations • CVPR 2019 • Guojun Yin, Bin Liu, Lu Sheng, Nenghai Yu, Xiaogang Wang, Jing Shao
Synthesizing photo-realistic images from text descriptions is a challenging problem.
no code implementations • CVPR 2019 • Rui Liu, Yu Liu, Xinyu Gong, Xiaogang Wang, Hongsheng Li
Flow-based generative models show great potential in image synthesis due to their reversible pipeline and exact log-likelihood objective, yet they suffer from weak ability for conditional image synthesis, especially for multi-label or unaware conditions.
no code implementations • 16 Apr 2019 • Yikang Li, Chris Twigg, Yuting Ye, Lingling Tao, Xiaogang Wang
Hand pose estimation from the monocular 2D image is challenging due to the variation in lighting, appearance, and background.
3 code implementations • CVPR 2019 • Xiao Zhang, Rui Zhao, Yu Qiao, Xiaogang Wang, Hongsheng Li
Our results show that training deep neural networks with the AdaCos loss is stable and able to achieve high face recognition accuracy.
Ranked #6 on Face Verification on MegaFace
1 code implementation • NeurIPS 2019 • Yikang Li, Tao Ma, Yeqi Bai, Nan Duan, Sining Wei, Xiaogang Wang
Therefore, to generate the images with preferred objects and rich interactions, we propose a semi-parametric method, PasteGAN, for generating the image from the scene graph and the image crops, where spatial arrangements of the objects and their pair-wise relationships are defined by the scene graph and the object appearances are determined by the given object crops.
no code implementations • CVPR 2019 • Xiao Zhang, Rui Zhao, Junjie Yan, Mengya Gao, Yu Qiao, Xiaogang Wang, Hongsheng Li
Cosine-based softmax losses significantly improve the performance of deep face recognition networks.
1 code implementation • CVPR 2019 • Hongyang Li, David Eigen, Samuel Dodge, Matthew Zeiler, Xiaogang Wang
Few-shot learning is an important area of research.
6 code implementations • 8 Jul 2019 • Shaoshuai Shi, Zhe Wang, Jianping Shi, Xiaogang Wang, Hongsheng Li
3D object detection from LiDAR point cloud is a challenging problem in 3D scene understanding and has many practical applications.
no code implementations • ICCV 2019 • Jiangfan Han, Ping Luo, Xiaogang Wang
Unlike previous works constrained by many conditions, making them infeasible to real noisy cases, this work presents a novel deep self-learning framework to train a robust network on the real noisy datasets without extra supervision.
no code implementations • ICCV 2019 • Peng Gao, Haoxuan You, Zhanpeng Zhang, Xiaogang Wang, Hongsheng Li
The proposed module learns the cross-modality relationships between latent visual and language summarizations, which summarize visual regions and question into a small number of latent representations to avoid modeling uninformative individual region-word relations.
no code implementations • ICCV 2019 • Jiageng Mao, Xiaogang Wang, Hongsheng Li
Our InterpConv is shown to be permutation and sparsity invariant, and can directly handle irregular inputs.
Ranked #27 on 3D Part Segmentation on ShapeNet-Part
no code implementations • ICCV 2019 • Jiangfan Han, Xiaoyi Dong, Ruimao Zhang, Dong-Dong Chen, Weiming Zhang, Nenghai Yu, Ping Luo, Xiaogang Wang
Recently, generation-based methods have received much attention since they directly use feed-forward networks to generate the adversarial samples, which avoid the time-consuming iterative attacking procedure in optimization-based and gradient-based methods.
no code implementations • ICCV 2019 • Zhaoyang Zhang, Jingyu Li, Wenqi Shao, Zhanglin Peng, Ruimao Zhang, Xiaogang Wang, Ping Luo
ResNeXt still suffers from sub-optimal performance due to manually defining the number of groups as a constant over all of the layers.
1 code implementation • ICCV 2019 • Zihao Wang, Xihui Liu, Hongsheng Li, Lu Sheng, Junjie Yan, Xiaogang Wang, Jing Shao
Text-image cross-modal retrieval is a challenging task in the field of language and vision.
Ranked #9 on Image Retrieval on Flickr30K 1K test
no code implementations • 25 Sep 2019 • Wenqi Shao, Shitao Tang, Xingang Pan, Ping Tan, Xiaogang Wang, Ping Luo
However, over-sparse CNNs have many collapsed channels (i.e., many channels with undesired zero values), impeding their learning ability.
1 code implementation • NeurIPS 2019 • Xihui Liu, Guojun Yin, Jing Shao, Xiaogang Wang, Hongsheng Li
Semantic image synthesis aims at generating photorealistic images from semantic layouts.
no code implementations • ICCV 2019 • Hang Zhou, Ziwei Liu, Xudong Xu, Ping Luo, Xiaogang Wang
Extensive experiments demonstrate that our framework is capable of inpainting realistic and varying audio segments with or without visual contexts.
no code implementations • CVPR 2020 • Yu Liu, Xuhui Jia, Mingxing Tan, Raviteja Vemulapalli, Yukun Zhu, Bradley Green, Xiaogang Wang
Standard Knowledge Distillation (KD) approaches distill the knowledge of a cumbersome teacher model into the parameters of a student model with a pre-defined architecture.
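The standard KD setup referenced above (Hinton et al.) softens teacher and student logits with a temperature T and matches them with a KL divergence:

```python
import numpy as np

def softmax(z, axis=-1):
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def kd_loss(student_logits, teacher_logits, T=4.0):
    """Hinton-style distillation loss: KL(teacher_T || student_T) * T^2,
    averaged over the batch. The T^2 factor keeps gradient magnitudes
    comparable across temperatures."""
    p = softmax(teacher_logits / T)
    log_q = np.log(softmax(student_logits / T) + 1e-12)
    kl = np.sum(p * (np.log(p + 1e-12) - log_q), axis=-1)
    return float((T * T) * np.mean(kl))
```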
12 code implementations • CVPR 2020 • Shaoshuai Shi, Chaoxu Guo, Li Jiang, Zhe Wang, Jianping Shi, Xiaogang Wang, Hongsheng Li
We present a novel and high-performance 3D object detection framework, named PointVoxel-RCNN (PV-RCNN), for accurate 3D object detection from point clouds.
no code implementations • 15 Jan 2020 • Yafei Song, Jia Li, Xiaogang Wang, Xiaowu Chen
To obtain effective features for single image dehazing, this paper presents a novel Ranking Convolutional Neural Network (Ranking-CNN).
no code implementations • 5 Feb 2020 • Yingjie Cai, Buyu Li, Zeyu Jiao, Hongsheng Li, Xingyu Zeng, Xiaogang Wang
Monocular 3D object detection task aims to predict the 3D bounding boxes of objects based on monocular RGB images.
1 code implementation • ICML 2020 • Wenqi Shao, Shitao Tang, Xingang Pan, Ping Tan, Xiaogang Wang, Ping Luo
Unlike prior arts that simply removed the inhibited channels, we propose to "wake them up" during training by designing a novel neural building block, termed Channel Equilibrium (CE) block, which enables channels at the same layer to contribute equally to the learned representation.
4 code implementations • 14 Mar 2020 • Yixiao Ge, Feng Zhu, Dapeng Chen, Rui Zhao, Xiaogang Wang, Hongsheng Li
To tackle the challenges, we propose an end-to-end structured domain adaptation framework with an online relation-consistency regularization term.
Ranked #4 on Unsupervised Domain Adaptation on Market to MSMT
no code implementations • ECCV 2020 • Peng Su, Kun Wang, Xingyu Zeng, Shixiang Tang, Dapeng Chen, Di Qiu, Xiaogang Wang
Then this domain-vector is used to encode the features from another domain through a conditional normalization, resulting in different domains' features carrying the same domain attribute.
Ranked #1 on Unsupervised Domain Adaptation on SIM10K to BDD100K
2 code implementations • CVPR 2020 • Guanglu Song, Yu Liu, Xiaogang Wang
The "shared head for classification and localization" (sibling head), first introduced in Fast R-CNN (Girshick, 2015), has been leading the fashion of the object detection community in the past five years.
Ranked #67 on Object Detection on COCO test-dev
2 code implementations • 17 Mar 2020 • Yu Liu, Guanglu Song, Yuhang Zang, Yan Gao, Enze Xie, Junjie Yan, Chen Change Loy, Xiaogang Wang
Given such good instance bounding boxes, we further design a simple instance-level semantic segmentation pipeline and achieve the 1st place on the segmentation challenge.
no code implementations • 17 Mar 2020 • Guanglu Song, Yu Liu, Yuhang Zang, Xiaogang Wang, Biao Leng, Qingsheng Yuan
The small receptive field and capacity of minimal neural networks limit their performance when using them to be the backbone of detectors.
1 code implementation • CVPR 2020 • Hang Zhou, Jihao Liu, Ziwei Liu, Yu Liu, Xiaogang Wang
Though face rotation has achieved rapid progress in recent years, the lack of high-quality paired training data remains a great hurdle for existing methods.
1 code implementation • CVPR 2020 • Xiaogang Wang, Marcelo H. Ang Jr, Gim Hee Lee
Point clouds are often sparse and incomplete.
no code implementations • CVPR 2020 • Rui Liu, Chengxi Yang, Wenxiu Sun, Xiaogang Wang, Hongsheng Li
Large-scale synthetic datasets are beneficial to stereo matching but usually introduce known domain bias.
1 code implementation • 4 Jun 2020 • Zana Rashidi, Kasra Ahmadi K. A., Aijun An, Xiaogang Wang
We propose a novel and efficient momentum-based first-order algorithm for optimizing neural networks which uses an adaptive coefficient for the momentum term.
3 code implementations • CVPR 2020 • Wang Zeng, Wanli Ouyang, Ping Luo, Wentao Liu, Xiaogang Wang
This paper proposes a model-free 3D human mesh estimation framework, named DecoMR, which explicitly establishes the dense correspondence between the mesh and the local image features in the UV space (i.e., a 2D space used for texture mapping of a 3D mesh).
Ranked #1 on 3D Human Reconstruction on Surreal
no code implementations • NeurIPS 2020 • Xiaogang Wang, Yuelang Xu, Kai Xu, Andrea Tagliasacchi, Bin Zhou, Ali Mahdavi-Amiri, Hao Zhang
We introduce an end-to-end learnable technique to robustly identify feature edges in 3D point cloud data.
no code implementations • ECCV 2020 • Hang Zhou, Xudong Xu, Dahua Lin, Xiaogang Wang, Ziwei Liu
Stereophonic audio is an indispensable ingredient to enhance human auditory experience.
no code implementations • 25 Jul 2020 • Peng Su, Shixiang Tang, Peng Gao, Di Qiu, Ni Zhao, Xiaogang Wang
At the core of our method, gradient regularization plays two key roles: (1) it enforces the gradient of the contrastive loss not to increase the supervised training loss on the source domain, which maintains the discriminative power of the learned features; (2) it regularizes the gradient update on the new domain not to increase the classification loss on the old target domains, which enables the model to adapt to an incoming target domain while preserving the performance on previously observed domains.
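The gradient constraints described above are in the spirit of gradient projection; a minimal GEM-style sketch (not the paper's exact regularizer) removes the component of an update that conflicts with a reference loss:

```python
import numpy as np

def project_gradient(g, g_ref):
    """If update direction g would increase the reference loss (negative
    dot product with g_ref, the reference-loss gradient), project g onto
    the half-space where it no longer conflicts; otherwise leave it."""
    dot = g @ g_ref
    if dot >= 0:
        return g
    return g - (dot / (g_ref @ g_ref)) * g_ref
```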
1 code implementation • 2 Aug 2020 • Xiaogang Wang, Marcelo H. Ang Jr, Gim Hee Lee
Then we learn a mapping that transfers point features from partial point clouds to those of the complete point clouds by optimizing feature alignment losses.
1 code implementation • ECCV 2020 • Xihui Liu, Zhe Lin, Jianming Zhang, Handong Zhao, Quan Tran, Xiaogang Wang, Hongsheng Li
We propose a novel algorithm, named Open-Edit, which is the first attempt on open-domain image manipulation with open-vocabulary instructions.
17 code implementations • ICLR 2021 • Xizhou Zhu, Weijie Su, Lewei Lu, Bin Li, Xiaogang Wang, Jifeng Dai
DETR has been recently proposed to eliminate the need for many hand-designed components in object detection while demonstrating good performance.
Ranked #6 on 2D Object Detection on SARDet-100K
1 code implementation • ICLR 2021 • Hao Li, Chenxin Tao, Xizhou Zhu, Xiaogang Wang, Gao Huang, Jifeng Dai
In this paper, we propose to automate the design of metric-specific loss functions by searching differentiable surrogate losses for each metric.
1 code implementation • 17 Oct 2020 • Xiaogang Wang, Marcelo H Ang Jr, Gim Hee Lee
This is to mitigate the dependence of existing approaches on large amounts of ground truth training data that are often difficult to obtain in real-world applications.
1 code implementation • 18 Nov 2020 • Minghang Zheng, Peng Gao, Renrui Zhang, Kunchang Li, Xiaogang Wang, Hongsheng Li, Hao Dong
In this paper, a novel variant of transformer named Adaptive Clustering Transformer (ACT) has been proposed to reduce the computation cost for high-resolution input.
no code implementations • 18 Dec 2020 • Jianbo Liu, Sijie Ren, Yuanjie Zheng, Xiaogang Wang, Hongsheng Li
With the proposed holistically-guided decoder, we implement the EfficientFCN architecture for semantic segmentation and HGD-FPN for object detection and instance segmentation.
no code implementations • ICCV 2021 • Yuru Song, Zan Lou, Shan You, Erkun Yang, Fei Wang, Chen Qian, ChangShui Zhang, Xiaogang Wang
Concretely, we introduce a privileged parameter so that the optimization direction does not necessarily follow the gradient from the privileged tasks, but concentrates more on the target tasks.
1 code implementation • ICCV 2021 • Zhaoyang Zhang, Yitong Jiang, Jun Jiang, Xiaogang Wang, Ping Luo, Jinwei Gu
STAR is a general architecture that can be easily adapted to different image enhancement tasks.
no code implementations • 8 Jan 2021 • Dan Xu, Xavier Alameda-Pineda, Wanli Ouyang, Elisa Ricci, Xiaogang Wang, Nicu Sebe
In contrast to previous works directly considering multi-scale feature maps obtained from the inner layers of a primary CNN architecture, and simply fusing the features with weighted averaging or concatenation, we propose a probabilistic graph attention network structure based on a novel Attention-Gated Conditional Random Fields (AG-CRFs) model for learning and fusing multi-scale representations in a principled manner.
2 code implementations • 19 Jan 2021 • Peng Gao, Minghang Zheng, Xiaogang Wang, Jifeng Dai, Hongsheng Li
The recently proposed Detection Transformer (DETR) model successfully applies Transformer to object detection and achieves comparable performance with two-stage object detection frameworks, such as Faster R-CNN.
1 code implementation • 31 Jan 2021 • Shaoshuai Shi, Li Jiang, Jiajun Deng, Zhe Wang, Chaoxu Guo, Jianping Shi, Xiaogang Wang, Hongsheng Li
3D object detection is receiving increasing attention from both industry and academia thanks to its wide applications in various fields.
Ranked #2 on 3D Object Detection on KITTI Cars Easy val
1 code implementation • CVPR 2021 • Rui Liu, Yixiao Ge, Ching Lam Choi, Xiaogang Wang, Hongsheng Li
Conditional generative adversarial networks (cGANs) aim to synthesize diverse images given input conditions and latent codes, but unfortunately they usually suffer from mode collapse.
no code implementations • CVPR 2021 • Xiaogang Wang, Xun Sun, Xinyu Cao, Kai Xu, Bin Zhou
Learning-based 3D shape segmentation is usually formulated as a semantic labeling problem, assuming that all parts of training shapes are annotated with a given set of tags.
no code implementations • 31 Mar 2021 • Jiangfan Han, Mengya Gao, Yujie Wang, Quanquan Li, Hongsheng Li, Xiaogang Wang
To solve this problem, in this paper, we propose a novel student-dependent distillation method, knowledge consistent distillation, which makes teacher's knowledge more consistent with the student and provides the best suitable knowledge to different student networks for distillation.
1 code implementation • CVPR 2021 • Yingjie Cai, Xuesong Chen, Chao Zhang, Kwan-Yee Lin, Xiaogang Wang, Hongsheng Li
The key insight is that we decouple the instances from a coarsely completed semantic scene instead of a raw input image to guide the reconstruction of instances and the overall scene.
Ranked #1 on 3D Semantic Scene Completion on NYUv2
no code implementations • CVPR 2021 • Xudong Xu, Hang Zhou, Ziwei Liu, Bo Dai, Xiaogang Wang, Dahua Lin
Moreover, combined with binaural recordings, our method is able to further boost the performance of binaural audio generation under supervised settings.
1 code implementation • 14 Apr 2021 • Rui Liu, Hanming Deng, Yangyi Huang, Xiaoyu Shi, Lewei Lu, Wenxiu Sun, Xiaogang Wang, Jifeng Dai, Hongsheng Li
Seamless combination of these two novel designs forms a better spatial-temporal attention scheme and our proposed model achieves better performance than state-of-the-art video inpainting approaches with significant boosted efficiency.
1 code implementation • CVPR 2021 • Hang Zhou, Yasheng Sun, Wayne Wu, Chen Change Loy, Xiaogang Wang, Ziwei Liu
While speech content information can be defined by learning the intrinsic synchronization between audio-visual modalities, we identify that a pose code will be complementarily learned in a modulated convolution-based reconstruction framework.