1 code implementation • 16 Jul 2024 • Wang Zeng, Sheng Jin, Lumin Xu, Wentao Liu, Chen Qian, Wanli Ouyang, Ping Luo, Xiaogang Wang
Our dynamic tokens possess two crucial characteristics: (1) representing image regions with similar semantic meanings using the same vision token, even if those regions are not adjacent, and (2) concentrating on regions with valuable details and representing them using fine tokens.
no code implementations • 29 Jun 2024 • Xiaogang Wang, Liang Wang, Hongyu Wu, GuoQiang Xiao, Kai Xu
This framework consists of a primitive network and a constraint network, transforming the sketch analysis task into a set prediction problem to enhance the effective handling of primitives and constraints.
1 code implementation • CVPR 2024 • Yinglong Li, Hongyu Wu, Xiaogang Wang, Qingzhao Qin, Yijiao Zhao, Yong Wang, Aimin Hao
Our method can be used in medical prosthetic fabrication and the registration of deficient scanning data.
no code implementations • 28 May 2024 • Fu-Yun Wang, Zhaoyang Huang, Alexander William Bergman, Dazhong Shen, Peng Gao, Michael Lingelbach, Keqiang Sun, Weikang Bian, Guanglu Song, Yu Liu, Hongsheng Li, Xiaogang Wang
The consistency model (CM) has recently made significant progress in accelerating the generation of diffusion models.
1 code implementation • 20 Dec 2023 • Zhaoyang Zhang, Wenqi Shao, Yixiao Ge, Xiaogang Wang, Jinwei Gu, Ping Luo
This work introduces a new Transformer model called Cached Transformer, which uses Gated Recurrent Cached (GRC) attention to extend the self-attention mechanism with a differentiable memory cache of tokens.
1 code implementation • 14 Dec 2023 • Wenhai Wang, Jiangwei Xie, Chuanyang Hu, Haoming Zou, Jianan Fan, Wenwen Tong, Yang Wen, Silei Wu, Hanming Deng, Zhiqi Li, Hao Tian, Lewei Lu, Xizhou Zhu, Xiaogang Wang, Yu Qiao, Jifeng Dai
In this work, we delve into the potential of large language models (LLMs) in autonomous driving (AD).
no code implementations • CVPR 2024 • Hao Li, Xue Yang, Zhaokai Wang, Xizhou Zhu, Jie zhou, Yu Qiao, Xiaogang Wang, Hongsheng Li, Lewei Lu, Jifeng Dai
Many reinforcement learning environments (e.g., Minecraft) provide only sparse rewards that indicate task completion or failure with binary values.
no code implementations • CVPR 2024 • Zhongang Cai, Jianping Jiang, Zhongfei Qing, Xinying Guo, Mingyuan Zhang, Zhengyu Lin, Haiyi Mei, Chen Wei, Ruisi Wang, Wanqi Yin, Xiangyu Fan, Han Du, Liang Pan, Peng Gao, Zhitao Yang, Yang Gao, Jiaqi Li, Tianxiang Ren, Yukun Wei, Xiaogang Wang, Chen Change Loy, Lei Yang, Ziwei Liu
In this work, we present Digital Life Project, a framework utilizing language as the universal medium to build autonomous 3D characters, who are capable of engaging in social interactions and expressing with articulated body motions, thereby simulating life in a digital environment.
Ranked #2 on Motion Synthesis on InterHuman
1 code implementation • 21 Aug 2023 • Mingkai Zheng, Shan You, Lang Huang, Xiu Su, Fei Wang, Chen Qian, Xiaogang Wang, Chang Xu
Moreover, to further boost the performance, we propose "distributional consistency" as a more informative regularization to enable similar instances to have a similar probability distribution.
no code implementations • 3 Aug 2023 • Liang Wang, Xiaogang Wang
In engineering applications, lines, circles, arcs, and points are collectively referred to as primitives, and they play a crucial role in path planning, simulation analysis, and manufacturing.
no code implementations • 8 Jun 2023 • Zhaoyang Huang, Xiaoyu Shi, Chao Zhang, Qiang Wang, Yijin Li, Hongwei Qin, Jifeng Dai, Xiaogang Wang, Hongsheng Li
This paper introduces a novel transformer-based network architecture, FlowFormer, along with the Masked Cost Volume AutoEncoding (MCVA) for pretraining it to tackle the problem of optical flow estimation.
1 code implementation • 8 Jun 2023 • Changyao Tian, Chenxin Tao, Jifeng Dai, Hao Li, Ziheng Li, Lewei Lu, Xiaogang Wang, Hongsheng Li, Gao Huang, Xizhou Zhu
In each denoising step, our method first decodes pixels from previous VQ tokens, then generates new VQ tokens from the decoded pixels.
1 code implementation • 25 May 2023 • Xizhou Zhu, Yuntao Chen, Hao Tian, Chenxin Tao, Weijie Su, Chenyu Yang, Gao Huang, Bin Li, Lewei Lu, Xiaogang Wang, Yu Qiao, Zhaoxiang Zhang, Jifeng Dai
These agents, equipped with the logic and common sense capabilities of LLMs, can skillfully navigate complex, sparse-reward environments with text-based interactions.
1 code implementation • CVPR 2023 • Zhaoyang Zhang, Yitong Jiang, Wenqi Shao, Xiaogang Wang, Ping Luo, Kaimo Lin, Jinwei Gu
Controllable image denoising aims to generate clean samples with human perceptual priors and balance sharpness and smoothness.
1 code implementation • 6 Mar 2023 • Yi Zhang, Dasong Li, Xiaoyu Shi, Dailan He, Kangning Song, Xiaogang Wang, Hongwei Qin, Hongsheng Li
In this paper, we propose a kernel basis attention (KBA) module, which introduces learnable kernel bases to model representative image patterns for spatial information aggregation.
Ranked #1 on Color Image Denoising on McMaster sigma50
no code implementations • 12 Jan 2023 • Xiaogang Wang, Yuhang Cheng, Liang Wang, Jiangbo Lu, Kai Xu, GuoQiang Xiao
Among them, the differential Laplacian regularizer can effectively alleviate the unsmoothness of the implicit surface caused by deteriorating point cloud quality. Meanwhile, to reduce excessive smoothing at the edge regions of the implicit surface, we propose a dynamic edge extraction strategy for sampling near the sharp edges of the point cloud, which effectively prevents the Laplacian regularizer from smoothing all regions.
1 code implementation • CVPR 2023 • Weijie Su, Xizhou Zhu, Chenxin Tao, Lewei Lu, Bin Li, Gao Huang, Yu Qiao, Xiaogang Wang, Jie zhou, Jifeng Dai
It has been proved that combining multiple pre-training strategies and data from various modalities/sources can greatly boost the training of large-scale models.
Ranked #2 on Object Detection on COCO test-dev
2 code implementations • CVPR 2023 • Hao Li, Jinguo Zhu, Xiaohu Jiang, Xizhou Zhu, Hongsheng Li, Chun Yuan, Xiaohua Wang, Yu Qiao, Xiaogang Wang, Wenhai Wang, Jifeng Dai
In this paper, we propose Uni-Perceiver v2, which is the first generalist model capable of handling major large-scale vision and vision-language tasks with competitive performance.
3 code implementations • CVPR 2023 • Wenhai Wang, Jifeng Dai, Zhe Chen, Zhenhang Huang, Zhiqi Li, Xizhou Zhu, Xiaowei Hu, Tong Lu, Lewei Lu, Hongsheng Li, Xiaogang Wang, Yu Qiao
Compared to the great progress of large-scale vision transformers (ViTs) in recent years, large-scale models based on convolutional neural networks (CNNs) are still in an early state.
Ranked #1 on Instance Segmentation on COCO test-dev (AP50 metric, using extra training data)
1 code implementation • 10 Nov 2022 • Xiaowei Hu, Min Shi, Weiyun Wang, Sitong Wu, Linjie Xing, Wenhai Wang, Xizhou Zhu, Lewei Lu, Jie zhou, Xiaogang Wang, Yu Qiao, Jifeng Dai
Our experiments on various tasks and an analysis of inductive bias show a significant performance boost due to advanced network-level and block-level designs, but performance differences persist among different STMs.
no code implementations • 19 Sep 2022 • Zhe Wang, Hongsheng Li, Qinwei Zhang, Jing Yuan, Xiaogang Wang
Adaptively learning a distance metric from the undersampled training data can significantly improve the matching accuracy of the query fingerprints.
1 code implementation • 23 Aug 2022 • Lumin Xu, Sheng Jin, Wentao Liu, Chen Qian, Wanli Ouyang, Ping Luo, Xiaogang Wang
We propose a single-network approach, termed ZoomNet, to take into account the hierarchical structure of the full human body and solve the scale variation of different body parts.
Ranked #5 on 2D Human Pose Estimation on COCO-WholeBody
1 code implementation • 10 Aug 2022 • Dasong Li, Yi Zhang, Ka Chun Cheung, Xiaogang Wang, Hongwei Qin, Hongsheng Li
With the integration, MSDI-Net can handle various and complicated blurry patterns adaptively.
Ranked #20 on Image Deblurring on GoPro
2 code implementations • 6 Aug 2022 • Ziyi Lin, Shijie Geng, Renrui Zhang, Peng Gao, Gerard de Melo, Xiaogang Wang, Jifeng Dai, Yu Qiao, Hongsheng Li
Video recognition has been dominated by the end-to-end learning paradigm -- first initializing a video recognition model with weights of a pretrained image model and then conducting end-to-end training on videos.
Ranked #27 on Action Classification on Kinetics-400 (using extra training data)
no code implementations • 28 Jul 2022 • Hang Du, Rebecca Pillai Riddell, Xiaogang Wang
In this article, we present a new EEG signal classification framework by integrating the complex-valued and real-valued Convolutional Neural Network (CNN) with discrete Fourier transform (DFT).
1 code implementation • 21 Jul 2022 • Lumin Xu, Sheng Jin, Wang Zeng, Wentao Liu, Chen Qian, Wanli Ouyang, Ping Luo, Xiaogang Wang
In this paper, we introduce the task of Category-Agnostic Pose Estimation (CAPE), which aims to create a pose estimation model capable of detecting the pose of any class of object given only a few samples with keypoint definition.
Ranked #1 on Category-Agnostic Pose Estimation on MP100
1 code implementation • 7 Jul 2022 • Wenqi Shao, Xun Zhao, Yixiao Ge, Zhaoyang Zhang, Lei Yang, Xiaogang Wang, Ying Shan, Ping Luo
It is challenging because the ground-truth model ranking for each task can only be generated by fine-tuning the pre-trained models on the target dataset, which is brute-force and computationally expensive.
Ranked #2 on Transferability on classification benchmark
no code implementations • 25 Jun 2022 • Yechao Bai, Xiaogang Wang, Marcelo H. Ang Jr, Daniela Rus
The learning and aggregation of multi-scale features are essential in empowering neural networks to capture the fine-grained geometric details in the point cloud upsampling task.
1 code implementation • CVPR 2023 • Dasong Li, Xiaoyu Shi, Yi Zhang, Ka Chun Cheung, Simon See, Xiaogang Wang, Hongwei Qin, Hongsheng Li
In this study, we propose a simple yet effective framework for video restoration.
Ranked #2 on Deblurring on GoPro (using extra training data)
1 code implementation • 19 Jun 2022 • Jiageng Mao, Shaoshuai Shi, Xiaogang Wang, Hongsheng Li
Autonomous driving, in recent years, has been receiving increasing attention for its potential to relieve drivers' burdens and improve the safety of driving.
1 code implementation • 9 Jun 2022 • Jinguo Zhu, Xizhou Zhu, Wenhai Wang, Xiaohua Wang, Hongsheng Li, Xiaogang Wang, Jifeng Dai
To mitigate such interference, we introduce the Conditional Mixture-of-Experts (Conditional MoEs) to generalist models.
2 code implementations • CVPR 2023 • Chenxin Tao, Xizhou Zhu, Weijie Su, Gao Huang, Bin Li, Jie zhou, Yu Qiao, Xiaogang Wang, Jifeng Dai
Driven by these analyses, we propose Siamese Image Modeling (SiameseIM), which predicts the dense representations of an augmented view, based on another masked view from the same image but with different augmentations.
no code implementations • 10 May 2022 • Dasong Li, Yi Zhang, Ka Lung Law, Xiaogang Wang, Hongwei Qin, Hongsheng Li
As for each sub-network, we propose an efficient multi-frequency denoising network to remove noise of different frequencies.
1 code implementation • CVPR 2022 • Wang Zeng, Sheng Jin, Wentao Liu, Chen Qian, Ping Luo, Wanli Ouyang, Xiaogang Wang
Vision transformers have achieved great successes in many computer vision tasks.
Ranked #7 on 2D Human Pose Estimation on COCO-WholeBody
no code implementations • CVPR 2022 • Yingjie Cai, Kwan-Yee Lin, Chao Zhang, Qiang Wang, Xiaogang Wang, Hongsheng Li
Specifically, we map a series of related partial point clouds into multiple complete shape and occlusion code pairs and fuse the codes to obtain their representations in the unified latent space.
1 code implementation • CVPR 2022 • Yujing Xue, Jiageng Mao, Minzhe Niu, Hang Xu, Michael Bi Mi, Wei zhang, Xiaogang Wang, Xinchao Wang
We further propose a lightweight scene-to-sequence decoder that can auto-regressively generate words conditioned on features from a 3D scene as well as cues from the preceding words.
1 code implementation • CVPR 2022 • Yan Xu, Kwan-Yee Lin, Guofeng Zhang, Xiaogang Wang, Hongsheng Li
The correspondence field estimation and pose refinement are conducted alternately in each iteration to recover the object poses.
Ranked #1 on 6D Pose Estimation using RGB on LineMOD
1 code implementation • 16 Mar 2022 • Mingkai Zheng, Shan You, Fei Wang, Chen Qian, ChangShui Zhang, Xiaogang Wang, Chang Xu
Self-supervised Learning (SSL) including the mainstream contrastive learning has achieved great success in learning visual representations without data annotations.
Ranked #62 on Self-Supervised Image Classification on ImageNet
1 code implementation • 27 Feb 2022 • Yan Xu, Junyi Lin, Jianping Shi, Guofeng Zhang, Xiaogang Wang, Hongsheng Li
The correct ego-motion estimation basically relies on the understanding of correspondences between adjacent LiDAR scans.
no code implementations • 13 Jan 2022 • Haiyue Fang, Xiaogang Wang, Zheyuan Cai, Yahao Shi, Xun Sun, Shilin Wu, Bin Zhou
This is in contrast to current methods, which focus solely on either 3D shape abstraction or semantic analysis.
1 code implementation • ICLR 2022 • Wenqi Shao, Yixiao Ge, Zhaoyang Zhang, Xuyuan Xu, Xiaogang Wang, Ying Shan, Ping Luo
It is difficult for Transformers to capture inductive bias such as the positional context in an image with LN.
1 code implementation • CVPR 2022 • Xizhou Zhu, Jinguo Zhu, Hao Li, Xiaoshi Wu, Xiaogang Wang, Hongsheng Li, Xiaohua Wang, Jifeng Dai
The model is pre-trained on several uni-modal and multi-modal tasks, and evaluated on a variety of downstream tasks, including novel tasks that did not appear in the pre-training stage.
1 code implementation • CVPR 2022 • Yi Zhang, Dasong Li, Ka Lung Law, Xiaogang Wang, Hongwei Qin, Hongsheng Li
To evaluate raw image denoising performance in real-world applications, we build a high-quality raw image dataset SenseNoise-500 that contains 500 real-life scenes.
no code implementations • CVPR 2022 • Tao Huang, Shan You, Fei Wang, Chen Qian, ChangShui Zhang, Xiaogang Wang, Chang Xu
In this paper, we leverage an explicit path filter to capture the characteristics of paths and directly filter out the weak ones, so that the search can be implemented on the shrunk space more greedily and efficiently.
no code implementations • 16 Nov 2021 • Jing Shao, Siyu Chen, Yangguang Li, Kun Wang, Zhenfei Yin, Yinan He, Jianing Teng, Qinghong Sun, Mengya Gao, Jihao Liu, Gengshi Huang, Guanglu Song, Yichao Wu, Yuming Huang, Fenggang Liu, Huan Peng, Shuo Qin, Chengyu Wang, Yujie Wang, Conghui He, Ding Liang, Yu Liu, Fengwei Yu, Junjie Yan, Dahua Lin, Xiaogang Wang, Yu Qiao
Enormous waves of technological innovations over the past several years, marked by the advances in AI technologies, are profoundly reshaping the industry and the society.
1 code implementation • ICCV 2021 • Mingkai Zheng, Fei Wang, Shan You, Chen Qian, ChangShui Zhang, Xiaogang Wang, Chang Xu
Specifically, our proposed framework is based on two projection heads, one of which will perform the regular instance discrimination task.
1 code implementation • ICCV 2021 • Yi Zhang, Hongwei Qin, Xiaogang Wang, Hongsheng Li
However, the real raw image noise is contributed by many noise sources and varies greatly among different sensors.
Ranked #2 on Image Denoising on SID SonyA7S2 x100
1 code implementation • ICCV 2021 • Rui Liu, Hanming Deng, Yangyi Huang, Xiaoyu Shi, Lewei Lu, Wenxiu Sun, Xiaogang Wang, Jifeng Dai, Hongsheng Li
On the contrary, the soft composition operates by stitching different patches into a whole feature map where pixels in overlapping regions are summed up.
Ranked #3 on Video Inpainting on DAVIS
1 code implementation • ICCV 2021 • Xiaogang Wang, Marcelo H Ang Jr, Gim Hee Lee
Deep learning techniques have yielded significant improvements in point cloud completion, which aims to complete missing object shapes from partial inputs.
1 code implementation • ICCV 2021 • Xiaoyang Guo, Shaoshuai Shi, Xiaogang Wang, Hongsheng Li
Compared with the state-of-the-art stereo detector, our method has improved the 3D detection performance of cars, pedestrians, and cyclists by 10.44%, 5.69%, and 5.97% mAP, respectively, on the official KITTI benchmark.
1 code implementation • ICCV 2021 • Peng Gao, Minghang Zheng, Xiaogang Wang, Jifeng Dai, Hongsheng Li
However, DETR suffers from its slow convergence.
2 code implementations • NeurIPS 2021 • Mingkai Zheng, Shan You, Fei Wang, Chen Qian, ChangShui Zhang, Xiaogang Wang, Chang Xu
Self-supervised Learning (SSL) including the mainstream contrastive learning has achieved great success in learning visual representations without data annotations.
Ranked #80 on Self-Supervised Image Classification on ImageNet
1 code implementation • 25 Jun 2021 • Xiu Su, Shan You, Jiyang Xie, Mingkai Zheng, Fei Wang, Chen Qian, ChangShui Zhang, Xiaogang Wang, Chang Xu
Vision transformers (ViTs) inherited the success of NLP but their structures have not been sufficiently investigated and optimized for visual tasks.
no code implementations • 4 Jun 2021 • Peng Gao, Shijie Geng, Yu Qiao, Xiaogang Wang, Jifeng Dai, Hongsheng Li
In this paper, we propose novel Scalable Transformers, which naturally contain sub-Transformers of different scales and have shared parameters.
4 code implementations • CVPR 2021 • Lumin Xu, Yingda Guan, Sheng Jin, Wentao Liu, Chen Qian, Ping Luo, Wanli Ouyang, Xiaogang Wang
Human pose estimation has achieved significant progress in recent years.
Ranked #23 on Pose Estimation on COCO test-dev
no code implementations • 27 Apr 2021 • Yixiao Ge, Xiao Zhang, Ching Lam Choi, Ka Chun Cheung, Peipei Zhao, Feng Zhu, Xiaogang Wang, Rui Zhao, Hongsheng Li
In this way, our BAKE framework achieves online knowledge ensembling across multiple samples with only a single network.
1 code implementation • CVPR 2021 • Hang Zhou, Yasheng Sun, Wayne Wu, Chen Change Loy, Xiaogang Wang, Ziwei Liu
While speech content information can be defined by learning the intrinsic synchronization between audio-visual modalities, we identify that a pose code will be complementarily learned in a modulated convolution-based reconstruction framework.
1 code implementation • 14 Apr 2021 • Rui Liu, Hanming Deng, Yangyi Huang, Xiaoyu Shi, Lewei Lu, Wenxiu Sun, Xiaogang Wang, Jifeng Dai, Hongsheng Li
The seamless combination of these two novel designs forms a better spatial-temporal attention scheme, and our proposed model achieves better performance than state-of-the-art video inpainting approaches with significantly boosted efficiency.
no code implementations • CVPR 2021 • Xudong Xu, Hang Zhou, Ziwei Liu, Bo Dai, Xiaogang Wang, Dahua Lin
Moreover, combined with binaural recordings, our method is able to further boost the performance of binaural audio generation under supervised settings.
1 code implementation • CVPR 2021 • Yingjie Cai, Xuesong Chen, Chao Zhang, Kwan-Yee Lin, Xiaogang Wang, Hongsheng Li
The key insight is that we decouple the instances from a coarsely completed semantic scene instead of a raw input image to guide the reconstruction of instances and the overall scene.
Ranked #2 on 3D Semantic Scene Completion on NYUv2
no code implementations • 31 Mar 2021 • Jiangfan Han, Mengya Gao, Yujie Wang, Quanquan Li, Hongsheng Li, Xiaogang Wang
To solve this problem, in this paper, we propose a novel student-dependent distillation method, knowledge consistent distillation, which makes the teacher's knowledge more consistent with the student and provides the most suitable knowledge to different student networks for distillation.
no code implementations • CVPR 2021 • Xiaogang Wang, Xun Sun, Xinyu Cao, Kai Xu, Bin Zhou
Learning-based 3D shape segmentation is usually formulated as a semantic labeling problem, assuming that all parts of training shapes are annotated with a given set of tags.
1 code implementation • CVPR 2021 • Rui Liu, Yixiao Ge, Ching Lam Choi, Xiaogang Wang, Hongsheng Li
Conditional generative adversarial networks (cGANs) aim at synthesizing diverse images given the input conditions and latent codes, but unfortunately, they usually suffer from the issue of mode collapse.
1 code implementation • 31 Jan 2021 • Shaoshuai Shi, Li Jiang, Jiajun Deng, Zhe Wang, Chaoxu Guo, Jianping Shi, Xiaogang Wang, Hongsheng Li
3D object detection is receiving increasing attention from both industry and academia thanks to its wide applications in various fields.
Ranked #2 on 3D Object Detection on KITTI Cars Easy val
2 code implementations • 19 Jan 2021 • Peng Gao, Minghang Zheng, Xiaogang Wang, Jifeng Dai, Hongsheng Li
The recently proposed Detection Transformer (DETR) model successfully applies Transformers to object detection and achieves performance comparable to two-stage object detection frameworks, such as Faster-RCNN.
no code implementations • 8 Jan 2021 • Dan Xu, Xavier Alameda-Pineda, Wanli Ouyang, Elisa Ricci, Xiaogang Wang, Nicu Sebe
In contrast to previous works directly considering multi-scale feature maps obtained from the inner layers of a primary CNN architecture, and simply fusing the features with weighted averaging or concatenation, we propose a probabilistic graph attention network structure based on a novel Attention-Gated Conditional Random Fields (AG-CRFs) model for learning and fusing multi-scale representations in a principled manner.
1 code implementation • ICCV 2021 • Zhaoyang Zhang, Yitong Jiang, Jun Jiang, Xiaogang Wang, Ping Luo, Jinwei Gu
STAR is a general architecture that can be easily adapted to different image enhancement tasks.
no code implementations • ICCV 2021 • Yuru Song, Zan Lou, Shan You, Erkun Yang, Fei Wang, Chen Qian, ChangShui Zhang, Xiaogang Wang
Concretely, we introduce a privileged parameter so that the optimization direction does not necessarily follow the gradient from the privileged tasks, but concentrates more on the target tasks.
no code implementations • 18 Dec 2020 • Jianbo Liu, Sijie Ren, Yuanjie Zheng, Xiaogang Wang, Hongsheng Li
With the proposed holistically-guided decoder, we implement the EfficientFCN architecture for semantic segmentation and HGD-FPN for object detection and instance segmentation.
1 code implementation • 18 Nov 2020 • Minghang Zheng, Peng Gao, Renrui Zhang, Kunchang Li, Xiaogang Wang, Hongsheng Li, Hao Dong
In this paper, a novel variant of transformer named Adaptive Clustering Transformer (ACT) has been proposed to reduce the computation cost for high-resolution input.
1 code implementation • 17 Oct 2020 • Xiaogang Wang, Marcelo H Ang Jr, Gim Hee Lee
This is to mitigate the dependence of existing approaches on large amounts of ground truth training data that are often difficult to obtain in real-world applications.
1 code implementation • ICLR 2021 • Hao Li, Chenxin Tao, Xizhou Zhu, Xiaogang Wang, Gao Huang, Jifeng Dai
In this paper, we propose to automate the design of metric-specific loss functions by searching differentiable surrogate losses for each metric.
18 code implementations • ICLR 2021 • Xizhou Zhu, Weijie Su, Lewei Lu, Bin Li, Xiaogang Wang, Jifeng Dai
DETR has been recently proposed to eliminate the need for many hand-designed components in object detection while demonstrating good performance.
Ranked #8 on 2D Object Detection on SARDet-100K
1 code implementation • ECCV 2020 • Xihui Liu, Zhe Lin, Jianming Zhang, Handong Zhao, Quan Tran, Xiaogang Wang, Hongsheng Li
We propose a novel algorithm, named Open-Edit, which is the first attempt on open-domain image manipulation with open-vocabulary instructions.
1 code implementation • 2 Aug 2020 • Xiaogang Wang, Marcelo H. Ang Jr, Gim Hee Lee
Then we learn a mapping to transfer the point features from partial points to that of the complete points by optimizing feature alignment losses.
no code implementations • 25 Jul 2020 • Peng Su, Shixiang Tang, Peng Gao, Di Qiu, Ni Zhao, Xiaogang Wang
At the core of our method, gradient regularization plays two key roles: (1) it enforces that the gradient of the contrastive loss does not increase the supervised training loss on the source domain, which maintains the discriminative power of the learned features; and (2) it regularizes the gradient update on the new domain so that it does not increase the classification loss on the old target domains, which enables the model to adapt to an incoming target domain while preserving the performance on previously observed domains.
no code implementations • ECCV 2020 • Hang Zhou, Xudong Xu, Dahua Lin, Xiaogang Wang, Ziwei Liu
Stereophonic audio is an indispensable ingredient to enhance human auditory experience.
no code implementations • NeurIPS 2020 • Xiaogang Wang, Yuelang Xu, Kai Xu, Andrea Tagliasacchi, Bin Zhou, Ali Mahdavi-Amiri, Hao Zhang
We introduce an end-to-end learnable technique to robustly identify feature edges in 3D point cloud data.
3 code implementations • CVPR 2020 • Wang Zeng, Wanli Ouyang, Ping Luo, Wentao Liu, Xiaogang Wang
This paper proposes a model-free 3D human mesh estimation framework, named DecoMR, which explicitly establishes the dense correspondence between the mesh and the local image features in the UV space (i.e., a 2D space used for texture mapping of 3D mesh).
Ranked #1 on 3D Human Reconstruction on Surreal
1 code implementation • 4 Jun 2020 • Zana Rashidi, Kasra Ahmadi K. A., Aijun An, Xiaogang Wang
We propose a novel and efficient momentum-based first-order algorithm for optimizing neural networks which uses an adaptive coefficient for the momentum term.
1 code implementation • CVPR 2020 • Rui Liu, Chengxi Yang, Wenxiu Sun, Xiaogang Wang, Hongsheng Li
Large-scale synthetic datasets are beneficial to stereo matching but usually introduce known domain bias.
1 code implementation • CVPR 2020 • Xiaogang Wang, Marcelo H. Ang Jr, Gim Hee Lee
Point clouds are often sparse and incomplete.
1 code implementation • CVPR 2020 • Hang Zhou, Jihao Liu, Ziwei Liu, Yu Liu, Xiaogang Wang
Though face rotation has achieved rapid progress in recent years, the lack of high-quality paired training data remains a great hurdle for existing methods.
2 code implementations • 17 Mar 2020 • Yu Liu, Guanglu Song, Yuhang Zang, Yan Gao, Enze Xie, Junjie Yan, Chen Change Loy, Xiaogang Wang
Given such good instance bounding boxes, we further design a simple instance-level semantic segmentation pipeline and achieve 1st place in the segmentation challenge.
2 code implementations • CVPR 2020 • Guanglu Song, Yu Liu, Xiaogang Wang
The "shared head for classification and localization" (sibling head), first named in Fast RCNN (Girshick, 2015), has been leading the fashion of the object detection community in the past five years.
Ranked #79 on Object Detection on COCO test-dev
no code implementations • 17 Mar 2020 • Guanglu Song, Yu Liu, Yuhang Zang, Xiaogang Wang, Biao Leng, Qingsheng Yuan
The small receptive field and capacity of minimal neural networks limit their performance when they are used as the backbone of detectors.
no code implementations • ECCV 2020 • Peng Su, Kun Wang, Xingyu Zeng, Shixiang Tang, Dapeng Chen, Di Qiu, Xiaogang Wang
Then this domain-vector is used to encode the features from another domain through a conditional normalization, resulting in different domains' features carrying the same domain attribute.
Ranked #1 on Unsupervised Domain Adaptation on SIM10K to BDD100K
3 code implementations • 14 Mar 2020 • Yixiao Ge, Feng Zhu, Dapeng Chen, Rui Zhao, Xiaogang Wang, Hongsheng Li
To tackle the challenges, we propose an end-to-end structured domain adaptation framework with an online relation-consistency regularization term.
Ranked #5 on Unsupervised Domain Adaptation on Market to MSMT
1 code implementation • ICML 2020 • Wenqi Shao, Shitao Tang, Xingang Pan, Ping Tan, Xiaogang Wang, Ping Luo
Unlike prior arts that simply removed the inhibited channels, we propose to "wake them up" during training by designing a novel neural building block, termed Channel Equilibrium (CE) block, which enables channels at the same layer to contribute equally to the learned representation.
no code implementations • 5 Feb 2020 • Yingjie Cai, Buyu Li, Zeyu Jiao, Hongsheng Li, Xingyu Zeng, Xiaogang Wang
The monocular 3D object detection task aims to predict the 3D bounding boxes of objects based on monocular RGB images.
no code implementations • 15 Jan 2020 • Yafei Song, Jia Li, Xiaogang Wang, Xiaowu Chen
To obtain effective features for single image dehazing, this paper presents a novel Ranking Convolutional Neural Network (Ranking-CNN).
12 code implementations • CVPR 2020 • Shaoshuai Shi, Chaoxu Guo, Li Jiang, Zhe Wang, Jianping Shi, Xiaogang Wang, Hongsheng Li
We present a novel and high-performance 3D object detection framework, named PointVoxel-RCNN (PV-RCNN), for accurate 3D object detection from point clouds.
no code implementations • CVPR 2020 • Yu Liu, Xuhui Jia, Mingxing Tan, Raviteja Vemulapalli, Yukun Zhu, Bradley Green, Xiaogang Wang
Standard Knowledge Distillation (KD) approaches distill the knowledge of a cumbersome teacher model into the parameters of a student model with a pre-defined architecture.
no code implementations • ICCV 2019 • Hang Zhou, Ziwei Liu, Xudong Xu, Ping Luo, Xiaogang Wang
Extensive experiments demonstrate that our framework is capable of inpainting realistic and varying audio segments with or without visual contexts.
1 code implementation • NeurIPS 2019 • Xihui Liu, Guojun Yin, Jing Shao, Xiaogang Wang, Hongsheng Li
Semantic image synthesis aims at generating photorealistic images from semantic layouts.
no code implementations • 25 Sep 2019 • Wenqi Shao, Shitao Tang, Xingang Pan, Ping Tan, Xiaogang Wang, Ping Luo
However, over-sparse CNNs have many collapsed channels (i.e., many channels with undesired zero values), impeding their learning ability.
1 code implementation • ICCV 2019 • Zihao Wang, Xihui Liu, Hongsheng Li, Lu Sheng, Junjie Yan, Xiaogang Wang, Jing Shao
Text-image cross-modal retrieval is a challenging task in the field of language and vision.
Ranked #9 on Image Retrieval on Flickr30K 1K test
no code implementations • ICCV 2019 • Zhaoyang Zhang, Jingyu Li, Wenqi Shao, Zhanglin Peng, Ruimao Zhang, Xiaogang Wang, Ping Luo
ResNeXt, still suffers from the sub-optimal performance due to manually defining the number of groups as a constant over all of the layers.
no code implementations • ICCV 2019 • Jiangfan Han, Xiaoyi Dong, Ruimao Zhang, Dong-Dong Chen, Weiming Zhang, Nenghai Yu, Ping Luo, Xiaogang Wang
Recently, generation-based methods have received much attention since they directly use feed-forward networks to generate the adversarial samples, which avoid the time-consuming iterative attacking procedure in optimization-based and gradient-based methods.
no code implementations • ICCV 2019 • Jiageng Mao, Xiaogang Wang, Hongsheng Li
Our InterpConv is shown to be permutation and sparsity invariant, and can directly handle irregular inputs.
Ranked #28 on 3D Part Segmentation on ShapeNet-Part
no code implementations • ICCV 2019 • Peng Gao, Haoxuan You, Zhanpeng Zhang, Xiaogang Wang, Hongsheng Li
The proposed module learns the cross-modality relationships between latent visual and language summarizations, which summarize visual regions and question into a small number of latent representations to avoid modeling uninformative individual region-word relations.
no code implementations • ICCV 2019 • Jiangfan Han, Ping Luo, Xiaogang Wang
Unlike previous works, which are constrained by many conditions that make them infeasible for real noisy cases, this work presents a novel deep self-learning framework to train a robust network on real noisy datasets without extra supervision.
6 code implementations • 8 Jul 2019 • Shaoshuai Shi, Zhe Wang, Jianping Shi, Xiaogang Wang, Hongsheng Li
3D object detection from LiDAR point cloud is a challenging problem in 3D scene understanding and has many practical applications.
1 code implementation • CVPR 2019 • Hongyang Li, David Eigen, Samuel Dodge, Matthew Zeiler, Xiaogang Wang
Few-shot learning is an important area of research.
no code implementations • CVPR 2019 • Xiao Zhang, Rui Zhao, Junjie Yan, Mengya Gao, Yu Qiao, Xiaogang Wang, Hongsheng Li
Cosine-based softmax losses significantly improve the performance of deep face recognition networks.
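As a rough illustration of what a cosine-based softmax loss computes, the sketch below applies cross-entropy to scaled cosine similarities between a feature and per-class weight vectors. It is a generic formulation, not the method of this paper; the function name and the `scale` and `margin` defaults are assumptions chosen only for illustration.

```python
import math


def cosine_softmax_loss(feature, class_weights, label, scale=30.0, margin=0.0):
    """Cross-entropy over scaled cosine logits for a single sample.

    feature: list of floats (one embedding).
    class_weights: list of weight vectors, one per class.
    scale, margin: hypothetical hyperparameter values, not from the source.
    """
    def norm(v):
        return math.sqrt(sum(x * x for x in v))

    f_norm = norm(feature)
    logits = []
    for j, w in enumerate(class_weights):
        # Cosine similarity ignores the magnitudes of feature and weight.
        cos = sum(a * b for a, b in zip(feature, w)) / (f_norm * norm(w))
        if j == label:
            # Optional additive cosine margin on the target class.
            cos -= margin
        logits.append(scale * cos)

    # Numerically stable log-sum-exp.
    m = max(logits)
    log_sum = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum - logits[label]


weights = [[1.0, 0.0], [0.0, 1.0]]
print(cosine_softmax_loss([1.0, 0.0], weights, label=0))
print(cosine_softmax_loss([1.0, 0.0], weights, label=1))
```

Because only the angle between feature and class weight enters the logits, rescaling the feature leaves the loss unchanged; the scale factor controls how sharply the softmax separates classes.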
1 code implementation • NeurIPS 2019 • Yikang Li, Tao Ma, Yeqi Bai, Nan Duan, Sining Wei, Xiaogang Wang
Therefore, to generate the images with preferred objects and rich interactions, we propose a semi-parametric method, PasteGAN, for generating the image from the scene graph and the image crops, where spatial arrangements of the objects and their pair-wise relationships are defined by the scene graph and the object appearances are determined by the given object crops.
5 code implementations • CVPR 2019 • Xiao Zhang, Rui Zhao, Yu Qiao, Xiaogang Wang, Hongsheng Li
Our results show that training deep neural networks with the AdaCos loss is stable and able to achieve high face recognition accuracy.
Ranked #6 on Face Verification on MegaFace
no code implementations • 16 Apr 2019 • Yikang Li, Chris Twigg, Yuting Ye, Lingling Tao, Xiaogang Wang
Hand pose estimation from the monocular 2D image is challenging due to the variation in lighting, appearance, and background.
no code implementations • CVPR 2019 • Rui Liu, Yu Liu, Xinyu Gong, Xiaogang Wang, Hongsheng Li
Flow-based generative models show great potential in image synthesis due to their reversible pipeline and exact log-likelihood target, yet they suffer from a weak ability for conditional image synthesis, especially for multi-label or unaware conditions.
no code implementations • CVPR 2019 • Guojun Yin, Bin Liu, Lu Sheng, Nenghai Yu, Xiaogang Wang, Jing Shao
Synthesizing photo-realistic images from text descriptions is a challenging problem.
no code implementations • CVPR 2019 • Guojun Yin, Lu Sheng, Bin Liu, Nenghai Yu, Xiaogang Wang, Jing Shao
Dense captioning aims at simultaneously localizing semantic regions and describing these regions-of-interest (ROIs) with short phrases or sentences in natural language.
Ranked #3 on Dense Captioning on Visual Genome
2 code implementations • ICLR 2019 • Hongyang Li, Bo Dai, Shaoshuai Shi, Wanli Ouyang, Xiaogang Wang
We argue that the reliable set could guide the feature learning of the less reliable set during training, in the spirit of a student mimicking a teacher's behavior, thus pushing towards a more compact class centroid in the feature space.
Ranked #145 on Object Detection on COCO test-dev
no code implementations • CVPR 2019 • Buyu Li, Wanli Ouyang, Lu Sheng, Xingyu Zeng, Xiaogang Wang
We present an efficient 3D object detection framework based on a single RGB image in the scenario of autonomous driving.
Ranked #18 on Vehicle Pose Estimation on KITTI Cars Hard
no code implementations • CVPR 2019 • Xipeng Chen, Kwan-Yee Lin, Wentao Liu, Chen Qian, Xiaogang Wang, Liang Lin
Recent studies have shown remarkable advances in 3D human pose estimation from monocular images, with the help of large-scale indoor 3D datasets and sophisticated network architectures.
2 code implementations • CVPR 2019 • Junting Pan, Chengyu Wang, Xu Jia, Jing Shao, Lu Sheng, Junjie Yan, Xiaogang Wang
This paper proposes the novel task of video generation conditioned on a SINGLE semantic label map, which provides a good balance between flexibility and quality in the generation process.
2 code implementations • CVPR 2019 • Xiaoyang Guo, Kai Yang, Wukui Yang, Xiaogang Wang, Hongsheng Li
Previous works built cost volumes with cross-correlation or concatenation of left and right features across all disparity levels, and then a 2D or 3D convolutional neural network is utilized to regress the disparity maps.
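The two cost-volume constructions mentioned above can be sketched on a single scanline as follows. This is a simplified pure-Python illustration, not code from any of the listed papers; the function names, zero padding for out-of-range pixels, and averaging of the correlation over channels are assumptions.

```python
def correlation_cost_volume(left, right, max_disp):
    """cost[d][x] = mean over channels of left[x] * right[x - d].

    left, right: lists of per-pixel feature vectors along one scanline.
    Entries where x - d falls outside the image are set to 0.
    """
    channels = len(left[0])
    volume = []
    for d in range(max_disp):
        row = []
        for x in range(len(left)):
            if x - d < 0:
                row.append(0.0)
            else:
                dot = sum(l * r for l, r in zip(left[x], right[x - d]))
                row.append(dot / channels)
        volume.append(row)
    return volume


def concatenation_cost_volume(left, right, max_disp):
    """cost[d][x] = left[x] concatenated with right[x - d] (2 * channels values)."""
    channels = len(left[0])
    volume = []
    for d in range(max_disp):
        row = []
        for x in range(len(left)):
            shifted = right[x - d] if x - d >= 0 else [0.0] * channels
            row.append(list(left[x]) + list(shifted))
        volume.append(row)
    return volume


# Toy scanline: the right features equal the left features shifted by one pixel.
left = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
right = [[0.0, 1.0], [1.0, 1.0], [0.0, 0.0]]
corr = correlation_cost_volume(left, right, max_disp=2)
concat = concatenation_cost_volume(left, right, max_disp=2)
print(corr)
print(concat[1][1])
```

Correlation yields one similarity score per pixel and disparity, while concatenation keeps the full feature vectors and leaves the similarity measure to be learned by the subsequent network.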
1 code implementation • CVPR 2019 • Xiaogang Wang, Bin Zhou, Yahao Shi, Xiaowu Chen, Qinping Zhao, Kai Xu
For the task of mobility analysis of 3D shapes, we propose joint analysis for simultaneous motion part segmentation and motion attribute estimation, taking a single 3D model as input.
1 code implementation • CVPR 2019 • Wenqi Shao, Tianjian Meng, Jingyu Li, Ruimao Zhang, Yudian Li, Xiaogang Wang, Ping Luo
Unlike $\ell_1$ and $\ell_0$ constraints that impose difficulties in optimization, we turn this constrained optimization problem into feed-forward computation by proposing SparsestMax, which is a sparse version of softmax.
1 code implementation • 4 Mar 2019 • Mingyang Liang, Xiaoyang Guo, Hongsheng Li, Xiaogang Wang, You Song
Unsupervised cross-spectral stereo matching aims at recovering disparity given cross-spectral image pairs without any supervision in the form of ground truth disparity or depth.
no code implementations • CVPR 2019 • Xihui Liu, ZiHao Wang, Jing Shao, Xiaogang Wang, Hongsheng Li
Referring expression grounding aims at locating certain objects or persons in an image with a referring expression, where the key challenge is to comprehend and align various types of information from the visual and textual domains, such as visual attributes, location and interactions with surrounding regions.
no code implementations • 3 Mar 2019 • Lu Sheng, Junting Pan, Jiaming Guo, Jing Shao, Xiaogang Wang, Chen Change Loy
Imagining multiple consecutive frames given one single snapshot is challenging, since it is difficult to simultaneously predict diverse motions from a single image and faithfully generate novel frames without visual distortions.
5 code implementations • CVPR 2019 • Yuying Ge, Ruimao Zhang, Lingyun Wu, Xiaogang Wang, Xiaoou Tang, Ping Luo
A strong baseline is proposed, called Match R-CNN, which builds upon Mask R-CNN to solve the above four tasks in an end-to-end manner.
no code implementations • 13 Dec 2018 • Gao Peng, Zhengkai Jiang, Haoxuan You, Pan Lu, Steven Hoi, Xiaogang Wang, Hongsheng Li
It can robustly capture the high-level interactions between language and vision domains, thus significantly improving the performance of visual question answering.
13 code implementations • CVPR 2019 • Shaoshuai Shi, Xiaogang Wang, Hongsheng Li
In this paper, we propose PointRCNN for 3D object detection from raw point cloud.
Ranked #2 on Object Detection on KITTI Cars Moderate
9 code implementations • 13 Nov 2018 • Buyu Li, Yu Liu, Xiaogang Wang
Despite the great success of two-stage detectors, the single-stage detector is still a more elegant and efficient approach, yet it suffers from two well-known disharmonies during training, i.e., the huge difference in quantity between positive and negative examples as well as between easy and hard examples.
Ranked #176 on Object Detection on COCO test-dev
2 code implementations • NeurIPS 2018 • Yixiao Ge, Zhuowan Li, Haiyu Zhao, Guojun Yin, Shuai Yi, Xiaogang Wang, Hongsheng Li
Our proposed FD-GAN achieves state-of-the-art performance on three person reID datasets, which demonstrates the effectiveness and robust feature-distilling capability of the proposed FD-GAN.
Ranked #4 on Person Re-Identification on CUHK03
no code implementations • 13 Sep 2018 • Xiaogang Wang, Bin Zhou, Haiyue Fang, Xiaowu Chen, Qinping Zhao, Kai Xu
We propose to generate part hypotheses from the components based on a hierarchical grouping strategy, and perform labeling on those part groups instead of directly on the components.
no code implementations • 6 Sep 2018 • Li Liu, Wanli Ouyang, Xiaogang Wang, Paul Fieguth, Jie Chen, Xinwang Liu, Matti Pietikäinen
Object detection, one of the most fundamental and challenging problems in computer vision, seeks to locate object instances from a large number of predefined categories in natural images.
no code implementations • 27 Aug 2018 • Zixuan Huang, Junming Fan, Shenggan Cheng, Shuai Yi, Xiaogang Wang, Hongsheng Li
Dense depth cues are important and have wide applications in various computer vision tasks.
Ranked #10 on Depth Completion on KITTI Depth Completion
1 code implementation • ECCV 2018 • Xiaoyang Guo, Hongsheng Li, Shuai Yi, Jimmy Ren, Xiaogang Wang
Monocular depth estimation aims at estimating a pixelwise depth map for a single image, which has wide applications in scene understanding and autonomous driving.
2 code implementations • ECCV 2018 • Hongyang Li, Xiaoyang Guo, Bo Dai, Wanli Ouyang, Xiaogang Wang
Motivated by the routing mechanism that makes higher capsules agree with lower capsules, we extend the mechanism to compensate for the rapid loss of information in nearby layers.
no code implementations • ECCV 2018 • Peng Gao, Pan Lu, Hongsheng Li, Shuang Li, Yikang Li, Steven Hoi, Xiaogang Wang
Most state-of-the-art VQA methods fuse the high-level textual and visual features from the neural network and abandon the visual spatial information when learning multi-modal features. To address these problems, question-guided kernels generated from the input question are designed to convolve with visual features to capture the textual and visual relationship at an early stage.
Ranked #14 on Visual Question Answering (VQA) on CLEVR
no code implementations • ECCV 2018 • Dapeng Chen, Hongsheng Li, Xihui Liu, Yantao Shen, Zejian Yuan, Xiaogang Wang
Person re-identification is an important task that requires learning discriminative visual features for distinguishing different person identities.
Ranked #24 on Text based Person Retrieval on CUHK-PEDES
1 code implementation • CVPR 2018 • Yantao Shen, Tong Xiao, Hongsheng Li, Shuai Yi, Xiaogang Wang
Person re-identification aims to robustly measure similarities between person images.
1 code implementation • CVPR 2018 • Yantao Shen, Hongsheng Li, Tong Xiao, Shuai Yi, Dapeng Chen, Xiaogang Wang
Person re-identification aims at finding a person of interest in an image gallery by comparing the probe image of this person with all the gallery images.
no code implementations • ECCV 2018 • Yantao Shen, Hongsheng Li, Shuai Yi, Dapeng Chen, Xiaogang Wang
However, existing person re-identification models mostly estimate the similarities of different probe-gallery image pairs independently while ignoring the relationship information between different probe-gallery pairs.
Ranked #2 on Person Re-Identification on CUHK03
1 code implementation • 20 Jul 2018 • Hang Zhou, Yu Liu, Ziwei Liu, Ping Luo, Xiaogang Wang
Talking face generation aims to synthesize a sequence of face images that correspond to a clip of speech.
no code implementations • 16 Jul 2018 • Ruimao Zhang, Hongbin Sun, Jingyu Li, Yuying Ge, Liang Lin, Ping Luo, Xiaogang Wang
To address the above issues, we present a novel and practical deep architecture for video person re-identification termed Self-and-Collaborative Attention Network (SCAN).
no code implementations • ECCV 2018 • Guojun Yin, Lu Sheng, Bin Liu, Nenghai Yu, Xiaogang Wang, Jing Shao, Chen Change Loy
We show that by encouraging deep message propagation and interactions between local object features and global predicate features, one can achieve compelling performance in recognizing complex relationships without using any linguistic priors.
1 code implementation • ECCV 2018 • Yikang Li, Wanli Ouyang, Bolei Zhou, Jianping Shi, Chao Zhang, Xiaogang Wang
Generating a scene graph to describe all the relations inside an image has gained increasing interest in recent years.
Ranked #1 on Scene Graph Generation on VRD
no code implementations • 4 Jun 2018 • Hui Zhou, Wanli Ouyang, Jian Cheng, Xiaogang Wang, Hongsheng Li
In addition, inter-object relations are mostly modeled in a symmetric way, which we argue is not an optimal setting.
no code implementations • CVPR 2018 • Maoqing Tian, Shuai Yi, Hongsheng Li, Shihua Li, Xuesen Zhang, Jianping Shi, Junjie Yan, Xiaogang Wang
State-of-the-art methods mainly utilize deep learning based approaches for learning visual features for describing person appearances.
no code implementations • CVPR 2018 • Dapeng Chen, Hongsheng Li, Tong Xiao, Shuai Yi, Xiaogang Wang
The attention weights are obtained based on a query feature, which is learned from the whole probe snippet by an LSTM network, making the resulting embeddings less affected by noisy frames.
Ranked #4 on Person Re-Identification on PRID2011
no code implementations • CVPR 2018 • Yujun Shen, Ping Luo, Junjie Yan, Xiaogang Wang, Xiaoou Tang
Existing methods typically formulate GAN as a two-player game, where a discriminator distinguishes face images from the real and synthesized domains, while a generator reduces its discriminativeness by synthesizing a face of photo-realistic quality.
no code implementations • CVPR 2018 • Dapeng Chen, Dan Xu, Hongsheng Li, Nicu Sebe, Xiaogang Wang
Extensive experiments demonstrate the effectiveness of our model that combines DNN and CRF for learning robust multi-scale local similarities.
no code implementations • 13 May 2018 • Masoud Ataei, Shengyuan Chen, Xiaogang Wang
We propose a new class of transforms that we call the Lehmer Transform, which is motivated by the Lehmer mean function.
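For context, the Lehmer mean of positive numbers $x_1, \dots, x_n$ with parameter $p$ is $L_p(x) = \sum_i x_i^p / \sum_i x_i^{p-1}$. The sketch below computes only this standard mean; it does not implement the transform proposed in the paper, and the function name is an assumption.

```python
def lehmer_mean(values, p):
    """Lehmer mean L_p(x) = sum(x_i**p) / sum(x_i**(p - 1)) for positive x_i."""
    if not values or any(v <= 0 for v in values):
        raise ValueError("values must be a non-empty sequence of positive numbers")
    numerator = sum(v ** p for v in values)
    denominator = sum(v ** (p - 1) for v in values)
    return numerator / denominator


data = [1.0, 2.0, 4.0]
print(lehmer_mean(data, 0))  # harmonic mean of the data
print(lehmer_mean(data, 1))  # arithmetic mean of the data
print(lehmer_mean(data, 2))  # contraharmonic mean of the data
```

Special cases recover familiar means: $p = 0$ gives the harmonic mean, $p = 1$ the arithmetic mean, and $p = 2$ the contraharmonic mean.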
no code implementations • CVPR 2018 • Dan Xu, Wanli Ouyang, Xiaogang Wang, Nicu Sebe
Depth estimation and scene parsing are two particularly important tasks in visual scene understanding.
Ranked #15 on Depth Estimation on NYU-Depth V2
3 code implementations • CVPR 2018 • Lu Sheng, Ziyi Lin, Jing Shao, Xiaogang Wang
Zero-shot artistic style transfer is an important image synthesis problem aiming at transferring arbitrary style into content images.
no code implementations • 25 Apr 2018 • Zhe Wang, Hongsheng Li, Wanli Ouyang, Xiaogang Wang
Statistical features, such as histograms, Bag-of-Words (BoW) and Fisher Vectors, were commonly used with hand-crafted features in conventional classification methods, but have attracted less attention since deep learning methods became popular.
no code implementations • CVPR 2018 • Yu Liu, Fangyin Wei, Jing Shao, Lu Sheng, Junjie Yan, Xiaogang Wang
This paper proposes learning disentangled but complementary face features with minimal supervision by face identification.
no code implementations • CVPR 2018 • Shuang Li, Slawomir Bak, Peter Carr, Xiaogang Wang
As a result, the network learns latent representations of the face, torso and other body parts using the best available image patches from the entire video sequence.