1 code implementation • 17 May 2022 • Zhe Chen, Yuchen Duan, Wenhai Wang, Junjun He, Tong Lu, Jifeng Dai, Yu Qiao
When fine-tuning on downstream tasks, a modality-specific adapter is used to introduce the prior information of the data and tasks into the model, making it suitable for these tasks.
Ranked #1 on Semantic Segmentation on ADE20K val
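The adapter idea can be illustrated with a minimal sketch: a small bottleneck module added alongside a frozen backbone and blended back through a residual connection. This is a generic illustration only, not ViT-Adapter's spatial-prior module; the dimensions and names below are assumptions.

```python
import torch
import torch.nn as nn

class BottleneckAdapter(nn.Module):
    """Generic adapter: a low-rank bottleneck added residually to frozen features.
    (Illustrative sketch -- not the ViT-Adapter architecture from the paper.)"""
    def __init__(self, dim=768, bottleneck=64):
        super().__init__()
        self.down = nn.Linear(dim, bottleneck)
        self.act = nn.GELU()
        self.up = nn.Linear(bottleneck, dim)
        nn.init.zeros_(self.up.weight)   # start as identity so pretrained features are preserved
        nn.init.zeros_(self.up.bias)

    def forward(self, x):                # x: (batch, tokens, dim)
        return x + self.up(self.act(self.down(x)))

tokens = torch.randn(2, 197, 768)        # hypothetical ViT token features
print(BottleneckAdapter()(tokens).shape) # torch.Size([2, 197, 768])
```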
1 code implementation • 8 May 2022 • Peng Gao, Teli Ma, Hongsheng Li, Ziyi Lin, Jifeng Dai, Yu Qiao
Masked auto-encoding for feature pretraining and multi-scale hybrid convolution-transformer architectures can further unleash the potential of ViT, leading to state-of-the-art performance on image classification, detection and semantic segmentation.
1 code implementation • 31 Mar 2022 • Zhiqi Li, Wenhai Wang, Hongyang Li, Enze Xie, Chonghao Sima, Tong Lu, Qiao Yu, Jifeng Dai
In a nutshell, BEVFormer exploits both spatial and temporal information by interacting with spatial and temporal space through predefined grid-shaped BEV queries.
Ranked #84 on 3D Object Detection on nuScenes
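A minimal sketch of the grid-shaped BEV query idea: a learnable grid of query embeddings cross-attends to flattened multi-camera image features and can additionally attend to the previous frame's BEV for temporal information. The shapes and the use of plain multi-head attention below are assumptions; the paper uses spatial/temporal deformable attention.

```python
import torch
import torch.nn as nn

class TinyBEVQueryLayer(nn.Module):
    """Illustrative grid-shaped BEV queries attending to camera features (sketch only)."""
    def __init__(self, bev_h=50, bev_w=50, dim=256, heads=8):
        super().__init__()
        self.bev_queries = nn.Parameter(torch.randn(bev_h * bev_w, dim))
        self.temporal_attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.spatial_attn = nn.MultiheadAttention(dim, heads, batch_first=True)

    def forward(self, cam_feats, prev_bev=None):    # cam_feats: (B, N_tokens, dim)
        b = cam_feats.shape[0]
        q = self.bev_queries.unsqueeze(0).expand(b, -1, -1)
        if prev_bev is not None:                    # temporal information
            q, _ = self.temporal_attn(q, prev_bev, prev_bev)
        bev, _ = self.spatial_attn(q, cam_feats, cam_feats)   # spatial information
        return bev                                  # (B, bev_h*bev_w, dim)

cams = torch.randn(1, 6 * 100, 256)                 # hypothetical 6-camera feature tokens
print(TinyBEVQueryLayer()(cams).shape)              # torch.Size([1, 2500, 256])
```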
1 code implementation • 30 Mar 2022 • Zhaoyang Huang, Xiaoyu Shi, Chao Zhang, Qiang Wang, Ka Chun Cheung, Hongwei Qin, Jifeng Dai, Hongsheng Li
We introduce Optical Flow TransFormer (FlowFormer), a transformer-based neural network architecture for learning optical flow.
Ranked #1 on Optical Flow Estimation on Sintel-clean
1 code implementation • NeurIPS 2021 • Chenxin Tao, Zizhang Li, Xizhou Zhu, Gao Huang, Yong Liu, Jifeng Dai
In this paper, we propose Parameterized AP Loss, where parameterized functions are introduced to substitute the non-differentiable components in the AP calculation.
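The core idea, replacing the non-differentiable step comparisons inside the AP computation with differentiable surrogates, can be sketched as follows. Here a temperature-scaled sigmoid stands in for the Heaviside comparison; the paper searches over a richer family of parameterized functions, so this is only an assumed illustration.

```python
import torch

def soft_ap_loss(scores, labels, tau=0.5):
    """Differentiable AP surrogate (sketch): the Heaviside step H(s_j - s_i) in the
    rank/precision counts is replaced by a sigmoid with temperature tau."""
    pos = scores[labels == 1]                               # positive detection scores
    diff_all = scores.unsqueeze(0) - pos.unsqueeze(1)       # (P, N): s_j - s_i
    diff_pos = pos.unsqueeze(0) - pos.unsqueeze(1)          # (P, P)
    rank_all = 1.0 + torch.sigmoid(diff_all / tau).sum(dim=1)   # soft rank among all
    rank_pos = 1.0 + torch.sigmoid(diff_pos / tau).sum(dim=1)   # soft rank among positives
    precision = rank_pos / rank_all
    return 1.0 - precision.mean()                           # 1 - (soft) average precision

scores = torch.randn(10, requires_grad=True)
labels = torch.tensor([1, 0, 0, 1, 0, 1, 0, 0, 0, 1])
loss = soft_ap_loss(scores, labels)
loss.backward()                                             # gradients flow to the scores
```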
no code implementations • 9 Dec 2021 • Chenxin Tao, Honghui Wang, Xizhou Zhu, Jiahua Dong, Shiji Song, Gao Huang, Jifeng Dai
These methods appear quite different in their designed loss functions, which stem from various motivations.
no code implementations • 2 Dec 2021 • Xizhou Zhu, Jinguo Zhu, Hao Li, Xiaoshi Wu, Xiaogang Wang, Hongsheng Li, Xiaohua Wang, Jifeng Dai
The model is pre-trained on several uni-modal and multi-modal tasks, and evaluated on a variety of downstream tasks, including novel tasks that did not appear in the pre-training stage.
1 code implementation • 26 Nov 2021 • Changyao Tian, Wenhai Wang, Xizhou Zhu, Xiaogang Wang, Jifeng Dai, Yu Qiao
Deep learning-based models encounter challenges when processing long-tailed data in the real world.
Ranked #1 on Long-tail Learning on ImageNet-LT (using extra training data)
1 code implementation • 6 Nov 2021 • Renrui Zhang, Rongyao Fang, Wei zhang, Peng Gao, Kunchang Li, Jifeng Dai, Yu Qiao, Hongsheng Li
To further enhance CLIP's few-shot capability, CLIP-Adapter proposed fine-tuning a lightweight residual feature adapter, which significantly improves performance on few-shot classification.
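The residual feature adapter mentioned here is easy to sketch: a small MLP transforms the frozen CLIP image feature and is blended back with the original feature via a residual ratio. The dimensions and ratio value below are assumptions.

```python
import torch
import torch.nn as nn

class ResidualFeatureAdapter(nn.Module):
    """Lightweight residual adapter on top of frozen CLIP features (sketch).
    blended = alpha * MLP(feat) + (1 - alpha) * feat"""
    def __init__(self, dim=512, hidden=128, alpha=0.2):
        super().__init__()
        self.alpha = alpha
        self.mlp = nn.Sequential(
            nn.Linear(dim, hidden), nn.ReLU(inplace=True),
            nn.Linear(hidden, dim), nn.ReLU(inplace=True),
        )

    def forward(self, feat):                       # feat: (batch, dim), frozen CLIP output
        return self.alpha * self.mlp(feat) + (1.0 - self.alpha) * feat

clip_feat = torch.randn(4, 512)                    # hypothetical CLIP image features
print(ResidualFeatureAdapter()(clip_feat).shape)   # torch.Size([4, 512])
```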
1 code implementation • ICCV 2021 • Rui Liu, Hanming Deng, Yangyi Huang, Xiaoyu Shi, Lewei Lu, Wenxiu Sun, Xiaogang Wang, Jifeng Dai, Hongsheng Li
On the contrary, the soft composition operates by stitching different patches into a whole feature map where pixels in overlapping regions are summed up.
Ranked #2 on Video Inpainting on DAVIS
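Soft composition, stitching overlapping patches back into a full feature map with overlapping pixels summed, is exactly what `torch.nn.functional.fold` computes, so it can be sketched in a few lines. The patch and stride sizes below are arbitrary assumptions.

```python
import torch
import torch.nn.functional as F

# Soft split: extract overlapping patches from a feature map.
feat = torch.randn(1, 16, 64, 64)                           # (B, C, H, W)
patches = F.unfold(feat, kernel_size=8, stride=4)           # (B, C*8*8, num_patches)

# ... patches would be processed (e.g. by a transformer) here ...

# Soft composition: stitch patches back; pixels in overlapping regions are summed.
recomposed = F.fold(patches, output_size=(64, 64), kernel_size=8, stride=4)

# Each location accumulates contributions from every patch covering it
# (often normalized by the overlap count afterwards).
overlap = F.fold(torch.ones_like(patches), output_size=(64, 64), kernel_size=8, stride=4)
print(recomposed.shape, overlap.max().item())               # torch.Size([1, 16, 64, 64]) 4.0
```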
1 code implementation • ICCV 2021 • Zhuoming Liu, Hao Ding, Huaping Zhong, Weijia Li, Jifeng Dai, Conghui He
To obtain the Influence of an unlabeled sample in the active learning scenario, we design the Untrained Unlabeled sample Influence Calculation (UUIC) to estimate the unlabeled sample's expected gradient, with which we calculate its Influence.
1 code implementation • ICCV 2021 • Peng Gao, Minghang Zheng, Xiaogang Wang, Jifeng Dai, Hongsheng Li
However, DETR suffers from its slow convergence.
1 code implementation • 2 Jul 2021 • Haiyang Wang, Wenguan Wang, Xizhou Zhu, Jifeng Dai, LiWei Wang
As a fundamental problem for Artificial Intelligence, multi-agent system (MAS) is making rapid progress, mainly driven by multi-agent reinforcement learning (MARL) techniques.
no code implementations • 4 Jun 2021 • Peng Gao, Shijie Geng, Yu Qiao, Xiaogang Wang, Jifeng Dai, Hongsheng Li
In this paper, we propose novel Scalable Transformers, which naturally contain sub-Transformers of different scales with shared parameters.
1 code implementation • 14 Apr 2021 • Rui Liu, Hanming Deng, Yangyi Huang, Xiaoyu Shi, Lewei Lu, Wenxiu Sun, Xiaogang Wang, Jifeng Dai, Hongsheng Li
Seamless combination of these two novel designs forms a better spatial-temporal attention scheme, and our proposed model achieves better performance than state-of-the-art video inpainting approaches with significantly boosted efficiency.
no code implementations • 25 Mar 2021 • Hao Li, Tianwen Fu, Jifeng Dai, Hongsheng Li, Gao Huang, Xizhou Zhu
However, the automatic design of loss functions for generic tasks with various evaluation metrics remains under-investigated.
5 code implementations • ICCV 2021 • Wenguan Wang, Tianfei Zhou, Fisher Yu, Jifeng Dai, Ender Konukoglu, Luc van Gool
Inspired by the recent advance in unsupervised contrastive representation learning, we propose a pixel-wise contrastive framework for semantic segmentation in the fully supervised setting.
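A minimal sketch of a pixel-wise contrastive (InfoNCE) term for fully supervised segmentation: sampled pixel embeddings sharing a ground-truth class are treated as positives, all others as negatives. The paper's sampling strategy, memory bank, and hard-example mining are omitted; the names and hyperparameters here are assumptions.

```python
import torch
import torch.nn.functional as F

def pixel_contrastive_loss(embeddings, labels, temperature=0.1):
    """Supervised InfoNCE over sampled pixel embeddings (sketch).
    embeddings: (N, D) pixel embeddings, labels: (N,) ground-truth class ids."""
    emb = F.normalize(embeddings, dim=1)
    sim = emb @ emb.t() / temperature                        # (N, N) similarities
    eye = torch.eye(len(labels), dtype=torch.bool)
    pos_mask = (labels.unsqueeze(0) == labels.unsqueeze(1)) & ~eye
    logits = sim.masked_fill(eye, -1e9)                      # exclude self-similarity
    log_prob = logits - torch.logsumexp(logits, dim=1, keepdim=True)
    per_pixel = -(log_prob.masked_fill(~pos_mask, 0.0)).sum(1) / pos_mask.sum(1).clamp(min=1)
    return per_pixel[pos_mask.any(dim=1)].mean()             # pixels with at least one positive

pix = torch.randn(64, 128, requires_grad=True)               # 64 sampled pixels, 128-d embeddings
lbl = torch.randint(0, 5, (64,))                             # their ground-truth classes
pixel_contrastive_loss(pix, lbl).backward()
```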
2 code implementations • 19 Jan 2021 • Peng Gao, Minghang Zheng, Xiaogang Wang, Jifeng Dai, Hongsheng Li
The recently proposed Detection Transformer (DETR) model successfully applies the Transformer to object detection and achieves performance comparable to two-stage object detection frameworks such as Faster-RCNN.
no code implementations • CVPR 2021 • Hao Tian, Yuntao Chen, Jifeng Dai, Zhaoxiang Zhang, Xizhou Zhu
We further identify another major issue, seldom noticed by the community, that the long-tailed and open-ended (sub-)category distribution should be accommodated.
1 code implementation • ICLR 2021 • Hao Li, Chenxin Tao, Xizhou Zhu, Xiaogang Wang, Gao Huang, Jifeng Dai
In this paper, we propose to automate the design of metric-specific loss functions by searching differentiable surrogate losses for each metric.
12 code implementations • ICLR 2021 • Xizhou Zhu, Weijie Su, Lewei Lu, Bin Li, Xiaogang Wang, Jifeng Dai
DETR has been recently proposed to eliminate the need for many hand-designed components in object detection while demonstrating good performance.
Ranked #43 on Object Detection on COCO test-dev
no code implementations • 3 Sep 2020 • Jingru Tan, Gang Zhang, Hanming Deng, Changbao Wang, Lewei Lu, Quanquan Li, Jifeng Dai
This article introduces the solutions of the team lvisTraveler for LVIS Challenge 2020.
Ranked #1 on Instance Segmentation on LVIS v1.0 test-dev
2 code implementations • ECCV 2020 • Guolei Sun, Wenguan Wang, Jifeng Dai, Luc van Gool
Moreover, our approach ranked 1st place in the Weakly-Supervised Semantic Segmentation Track of CVPR2020 Learning from Imperfect Data Challenge.
1 code implementation • CVPR 2020 • Le Yang, Yizeng Han, Xi Chen, Shiji Song, Jifeng Dai, Gao Huang
Adaptive inference is an effective mechanism to achieve a dynamic tradeoff between accuracy and computational cost in deep networks.
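Adaptive inference can be sketched with a confidence-thresholded early exit: cheap classifiers attached to intermediate stages let easy inputs stop early, while harder ones continue through further (in this paper, higher-resolution) stages. The sketch below is a generic early-exit loop, not the paper's resolution-adaptive network; all names and the threshold are assumptions.

```python
import torch
import torch.nn as nn

class EarlyExitNet(nn.Module):
    """Generic adaptive-inference sketch: exit as soon as a stage is confident."""
    def __init__(self, dim=64, num_classes=10, num_stages=3, threshold=0.9):
        super().__init__()
        self.threshold = threshold
        self.stages = nn.ModuleList(
            [nn.Sequential(nn.Linear(dim, dim), nn.ReLU()) for _ in range(num_stages)])
        self.exits = nn.ModuleList(
            [nn.Linear(dim, num_classes) for _ in range(num_stages)])

    @torch.no_grad()
    def forward(self, x):                               # x: (dim,) a single example
        for stage, exit_head in zip(self.stages, self.exits):
            x = stage(x)
            probs = exit_head(x).softmax(dim=-1)
            if probs.max() >= self.threshold:           # confident enough: stop computing
                return probs
        return probs                                    # deepest exit as fallback

print(EarlyExitNet()(torch.randn(64)).shape)            # torch.Size([10])
```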
1 code implementation • CVPR 2020 • Wenguan Wang, Hailong Zhu, Jifeng Dai, Yanwei Pang, Jianbing Shen, Ling Shao
As human bodies are inherently hierarchically structured, how to model human structures is the central theme of this task.
2 code implementations • ICLR 2020 • Hang Gao, Xizhou Zhu, Steve Lin, Jifeng Dai
This is typically done by augmenting static operators with learned free-form sampling grids in the image space, dynamically tuned to the data and task for adapting the receptive field.
Ranked #153 on Object Detection on COCO test-dev
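The "learned free-form sampling grid" idea can be sketched with `grid_sample`: a small conv predicts per-location 2-D offsets that perturb a regular grid before the feature map is resampled and fed to a static operator. This is a simplified bilinear-sampling stand-in, not the paper's exact deformable operators.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class FreeFormSampling(nn.Module):
    """Augment a static operator with a learned, data-dependent sampling grid (sketch)."""
    def __init__(self, channels=32):
        super().__init__()
        self.offset_head = nn.Conv2d(channels, 2, kernel_size=3, padding=1)
        nn.init.zeros_(self.offset_head.weight)            # start from the regular grid
        nn.init.zeros_(self.offset_head.bias)
        self.conv = nn.Conv2d(channels, channels, kernel_size=3, padding=1)

    def forward(self, x):                                   # x: (B, C, H, W)
        b, _, h, w = x.shape
        ys, xs = torch.meshgrid(torch.linspace(-1, 1, h), torch.linspace(-1, 1, w),
                                indexing="ij")
        base = torch.stack((xs, ys), dim=-1).expand(b, h, w, 2)    # regular grid
        offsets = self.offset_head(x).permute(0, 2, 3, 1)          # learned free-form offsets
        sampled = F.grid_sample(x, base + offsets, align_corners=True)
        return self.conv(sampled)                                   # static op on adapted field

print(FreeFormSampling()(torch.randn(2, 32, 16, 16)).shape)         # torch.Size([2, 32, 16, 16])
```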
3 code implementations • ICLR 2020 • Weijie Su, Xizhou Zhu, Yue Cao, Bin Li, Lewei Lu, Furu Wei, Jifeng Dai
We introduce a new pre-trainable generic representation for visual-linguistic tasks, called Visual-Linguistic BERT (VL-BERT for short).
Ranked #1 on Visual Question Answering on VCR (Q-A) dev
144 code implementations • 17 Jun 2019 • Kai Chen, Jiaqi Wang, Jiangmiao Pang, Yuhang Cao, Yu Xiong, Xiaoxiao Li, Shuyang Sun, Wansen Feng, Ziwei Liu, Jiarui Xu, Zheng Zhang, Dazhi Cheng, Chenchen Zhu, Tianheng Cheng, Qijie Zhao, Buyu Li, Xin Lu, Rui Zhu, Yue Wu, Jifeng Dai, Jingdong Wang, Jianping Shi, Wanli Ouyang, Chen Change Loy, Dahua Lin
In this paper, we introduce the various features of this toolbox.
1 code implementation • ICCV 2019 • Xizhou Zhu, Dazhi Cheng, Zheng Zhang, Stephen Lin, Jifeng Dai
Attention mechanisms have become a popular component in deep neural networks, yet there has been little examination of how different influencing factors and methods for computing attention from these factors affect performance.
19 code implementations • CVPR 2019 • Xizhou Zhu, Han Hu, Stephen Lin, Jifeng Dai
The superior performance of Deformable Convolutional Networks arises from its ability to adapt to the geometric variations of objects.
Ranked #96 on Object Detection on COCO minival
no code implementations • 27 Nov 2018 • Zheng Zhang, Dazhi Cheng, Xizhou Zhu, Stephen Lin, Jifeng Dai
Accurate detection and tracking of objects is vital for effective video understanding.
Ranked #7 on Video Object Detection on ImageNet VID
3 code implementations • 16 Apr 2018 • Xizhou Zhu, Jifeng Dai, Xingchi Zhu, Yichen Wei, Lu Yuan
In this paper, we present a lightweight network architecture for video object detection on mobiles.
no code implementations • ECCV 2018 • Jiayuan Gu, Han Hu, Li-Wei Wang, Yichen Wei, Jifeng Dai
While most steps in the modern object detection methods are learnable, the region feature extraction step remains largely hand-crafted, featured by RoI pooling methods.
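The hand-crafted region feature extraction step referenced here is typically RoI pooling/align; a minimal usage sketch with `torchvision.ops.roi_align` follows (box coordinates, feature stride, and output size are arbitrary assumptions):

```python
import torch
from torchvision.ops import roi_align

feature_map = torch.randn(1, 256, 50, 50)             # (B, C, H, W) backbone features
# Boxes in image coordinates as (batch_index, x1, y1, x2, y2).
rois = torch.tensor([[0., 10., 10., 200., 300.],
                     [0., 50., 40., 400., 350.]])

# Hand-crafted region feature extraction: bilinear RoI Align onto a 7x7 grid.
# spatial_scale maps image coordinates to the assumed stride-16 feature map.
region_feats = roi_align(feature_map, rois, output_size=(7, 7),
                         spatial_scale=1.0 / 16, sampling_ratio=2)
print(region_feats.shape)                              # torch.Size([2, 256, 7, 7])
```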
6 code implementations • CVPR 2018 • Han Hu, Jiayuan Gu, Zheng Zhang, Jifeng Dai, Yichen Wei
Although it is well believed for years that modeling relations between objects would help object recognition, there has not been evidence that the idea is working in the deep learning era.
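The object relation idea, letting each object's feature attend to every other object's feature, reduces in its simplest form to attention over a set of proposal features. The sketch below uses plain multi-head attention and omits the paper's geometric (bounding-box) weighting; sizes are assumptions.

```python
import torch
import torch.nn as nn

class SimpleObjectRelation(nn.Module):
    """Appearance-only relation module (sketch): each object feature is refined
    by attending to all other object features in the same image."""
    def __init__(self, dim=1024, heads=16):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)

    def forward(self, obj_feats):                      # (B, num_objects, dim)
        relation, _ = self.attn(obj_feats, obj_feats, obj_feats)
        return obj_feats + relation                    # residual refinement

rois = torch.randn(1, 300, 1024)                       # hypothetical 300 proposal features
print(SimpleObjectRelation()(rois).shape)              # torch.Size([1, 300, 1024])
```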
no code implementations • CVPR 2018 • Xizhou Zhu, Jifeng Dai, Lu Yuan, Yichen Wei
There has been significant progress in image object detection in recent years.
2 code implementations • ICCV 2017 • Xizhou Zhu, Yujie Wang, Jifeng Dai, Lu Yuan, Yichen Wei
The accuracy of detection suffers from degenerated object appearances in videos, e.g., motion blur, video defocus, rare poses, etc.
Ranked #11 on Video Object Detection on ImageNet VID
37 code implementations • ICCV 2017 • Jifeng Dai, Haozhi Qi, Yuwen Xiong, Yi Li, Guodong Zhang, Han Hu, Yichen Wei
Convolutional neural networks (CNNs) are inherently limited in modeling geometric transformations due to the fixed geometric structures in their building modules.
Ranked #181 on Object Detection on COCO test-dev
3 code implementations • CVPR 2017 • Xizhou Zhu, Yuwen Xiong, Jifeng Dai, Lu Yuan, Yichen Wei
Yet, it is non-trivial to transfer the state-of-the-art image recognition networks to videos as per-frame evaluation is too slow and unaffordable.
Ranked #8 on Video Semantic Segmentation on Cityscapes val
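The underlying trick, running the expensive network only on sparse key frames and warping their features to other frames along an estimated flow field, can be sketched with `grid_sample`. The flow below is random and stands in for the output of a lightweight flow network; names are assumptions.

```python
import torch
import torch.nn.functional as F

def warp_features(key_feat, flow):
    """Warp key-frame features to the current frame along a flow field (sketch).
    key_feat: (B, C, H, W); flow: (B, 2, H, W) in pixels, current frame -> key frame."""
    b, _, h, w = key_feat.shape
    ys, xs = torch.meshgrid(torch.arange(h, dtype=torch.float32),
                            torch.arange(w, dtype=torch.float32), indexing="ij")
    grid_x = (xs + flow[:, 0]) / (w - 1) * 2 - 1        # normalize to [-1, 1]
    grid_y = (ys + flow[:, 1]) / (h - 1) * 2 - 1
    grid = torch.stack((grid_x, grid_y), dim=-1)        # (B, H, W, 2)
    return F.grid_sample(key_feat, grid, align_corners=True)

key_features = torch.randn(1, 256, 32, 32)              # computed once on the key frame
flow = torch.randn(1, 2, 32, 32) * 2                    # stand-in for a flow-network output
print(warp_features(key_features, flow).shape)           # torch.Size([1, 256, 32, 32])
```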
3 code implementations • CVPR 2017 • Yi Li, Haozhi Qi, Jifeng Dai, Xiangyang Ji, Yichen Wei
It inherits all the merits of FCNs for semantic segmentation and instance mask proposal.
Ranked #59 on Instance Segmentation on COCO test-dev
45 code implementations • NeurIPS 2016 • Jifeng Dai, Yi Li, Kaiming He, Jian Sun
In contrast to previous region-based detectors such as Fast/Faster R-CNN that apply a costly per-region subnetwork hundreds of times, our region-based detector is fully convolutional with almost all computation shared on the entire image.
Ranked #4 on Real-Time Object Detection on PASCAL VOC 2007
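The fully convolutional, shared-computation design hinges on position-sensitive score maps that are pooled per region; `torchvision.ops.ps_roi_pool` implements this pooling, so a minimal usage sketch looks like the following (class count, grid size, feature stride, and box values are assumptions):

```python
import torch
from torchvision.ops import ps_roi_pool

k, num_classes = 7, 21                                  # 7x7 position grid, 20 classes + bg
# Position-sensitive score maps: one full-image map per (class, grid-cell) pair,
# computed once and shared by every region.
score_maps = torch.randn(1, k * k * num_classes, 38, 50)

rois = torch.tensor([[0., 32., 32., 320., 256.],        # (batch_index, x1, y1, x2, y2)
                     [0., 100., 60., 400., 300.]])

# Per-region work is pooling only: each of the k*k bins reads from its own score map.
pooled = ps_roi_pool(score_maps, rois, output_size=k, spatial_scale=1.0 / 16)
cls_scores = pooled.mean(dim=(2, 3))                    # vote over the k*k bins
print(pooled.shape, cls_scores.shape)                   # (2, 21, 7, 7) (2, 21)
```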
no code implementations • CVPR 2016 • Di Lin, Jifeng Dai, Jiaya Jia, Kaiming He, Jian Sun
Large-scale data is of crucial importance for learning semantic segmentation models, but annotating per-pixel masks is a tedious and inefficient procedure.
no code implementations • 29 Mar 2016 • Jifeng Dai, Kaiming He, Yi Li, Shaoqing Ren, Jian Sun
In contrast to the previous FCN that generates one score map, our FCN is designed to compute a small set of instance-sensitive score maps, each of which is the outcome of a pixel-wise classifier of a relative position to instances.
2 code implementations • CVPR 2016 • Jifeng Dai, Kaiming He, Jian Sun
We develop an algorithm for the nontrivial end-to-end training of this causal, cascaded structure.
Ranked #3 on Multi-Human Parsing on PASCAL-Part
no code implementations • ICCV 2015 • Jifeng Dai, Kaiming He, Jian Sun
Recent leading approaches to semantic segmentation rely on deep convolutional networks trained with human-annotated, pixel-level segmentation masks.
Ranked #48 on Semantic Segmentation on PASCAL Context
no code implementations • 19 Dec 2014 • Jifeng Dai, Yang Lu, Ying-Nian Wu
(2) We propose a generative gradient for pre-training CNNs by a non-parametric importance sampling scheme, which is fundamentally different from the commonly used discriminative gradient, and yet has the same computational architecture and cost as the latter.
1 code implementation • CVPR 2015 • Jifeng Dai, Kaiming He, Jian Sun
The current leading approaches for semantic segmentation exploit shape information by extracting CNN features from masked image regions.
Ranked #52 on Semantic Segmentation on PASCAL Context
no code implementations • CVPR 2014 • Jifeng Dai, Yi Hong, Wenze Hu, Song-Chun Zhu, Ying Nian Wu
Given a set of unannotated training images, a dictionary of such hierarchical templates is learned so that each training image can be represented by a small number of templates that are spatially translated, rotated and scaled versions of the templates in the learned dictionary.