Search Results for author: Jifeng Dai

Found 47 papers, 33 papers with code

Vision Transformer Adapter for Dense Predictions

1 code implementation17 May 2022 Zhe Chen, Yuchen Duan, Wenhai Wang, Junjun He, Tong Lu, Jifeng Dai, Yu Qiao

When fine-tuning on downstream tasks, a modality-specific adapter is used to introduce the data and tasks' prior information into the model, making it suitable for these tasks.

Instance Segmentation Object Detection +1

ConvMAE: Masked Convolution Meets Masked Autoencoders

1 code implementation8 May 2022 Peng Gao, Teli Ma, Hongsheng Li, Ziyi Lin, Jifeng Dai, Yu Qiao

Masked auto-encoding for feature pretraining and multi-scale hybrid convolution-transformer architectures can further unleash the potentials of ViT, leading to state-of-the-art performances on image classification, detection and semantic segmentation.

Image Classification Object Detection +1

BEVFormer: Learning Bird's-Eye-View Representation from Multi-Camera Images via Spatiotemporal Transformers

1 code implementation31 Mar 2022 Zhiqi Li, Wenhai Wang, Hongyang Li, Enze Xie, Chonghao Sima, Tong Lu, Qiao Yu, Jifeng Dai

In a nutshell, BEVFormer exploits both spatial and temporal information by interacting with spatial and temporal space through predefined grid-shaped BEV queries.

3D Object Detection Autonomous Driving

FlowFormer: A Transformer Architecture for Optical Flow

1 code implementation30 Mar 2022 Zhaoyang Huang, Xiaoyu Shi, Chao Zhang, Qiang Wang, Ka Chun Cheung, Hongwei Qin, Jifeng Dai, Hongsheng Li

We introduce Optical Flow TransFormer (FlowFormer), a transformer-based neural network architecture for learning optical flow.

Optical Flow Estimation

Searching Parameterized AP Loss for Object Detection

1 code implementation NeurIPS 2021 Chenxin Tao, Zizhang Li, Xizhou Zhu, Gao Huang, Yong liu, Jifeng Dai

In this paper, we propose Parameterized AP Loss, where parameterized functions are introduced to substitute the non-differentiable components in the AP calculation.

Object Detection

Uni-Perceiver: Pre-training Unified Architecture for Generic Perception for Zero-shot and Few-shot Tasks

no code implementations2 Dec 2021 Xizhou Zhu, Jinguo Zhu, Hao Li, Xiaoshi Wu, Xiaogang Wang, Hongsheng Li, Xiaohua Wang, Jifeng Dai

The model is pre-trained on several uni-modal and multi-modal tasks, and evaluated on a variety of downstream tasks, including novel tasks that did not appear in the pre-training stage.

Tip-Adapter: Training-free CLIP-Adapter for Better Vision-Language Modeling

1 code implementation6 Nov 2021 Renrui Zhang, Rongyao Fang, Wei zhang, Peng Gao, Kunchang Li, Jifeng Dai, Yu Qiao, Hongsheng Li

To further enhance CLIP's few-shot capability, CLIP-Adapter proposed to fine-tune a lightweight residual feature adapter and significantly improves the performance for few-shot classification.

Language Modelling Transfer Learning

FuseFormer: Fusing Fine-Grained Information in Transformers for Video Inpainting

1 code implementation ICCV 2021 Rui Liu, Hanming Deng, Yangyi Huang, Xiaoyu Shi, Lewei Lu, Wenxiu Sun, Xiaogang Wang, Jifeng Dai, Hongsheng Li

On the contrary, the soft composition operates by stitching different patches into a whole feature map where pixels in overlapping regions are summed up.

Video Inpainting

Influence Selection for Active Learning

1 code implementation ICCV 2021 Zhuoming Liu, Hao Ding, Huaping Zhong, Weijia Li, Jifeng Dai, Conghui He

To obtain the Influence of the unlabeled sample in the active learning scenario, we design the Untrained Unlabeled sample Influence Calculation(UUIC) to estimate the unlabeled sample's expected gradient with which we calculate its Influence.

Active Learning

Collaborative Visual Navigation

1 code implementation2 Jul 2021 Haiyang Wang, Wenguan Wang, Xizhou Zhu, Jifeng Dai, LiWei Wang

As a fundamental problem for Artificial Intelligence, multi-agent system (MAS) is making rapid progress, mainly driven by multi-agent reinforcement learning (MARL) techniques.

Multi-agent Reinforcement Learning Visual Navigation

Scalable Transformers for Neural Machine Translation

no code implementations4 Jun 2021 Peng Gao, Shijie Geng, Yu Qiao, Xiaogang Wang, Jifeng Dai, Hongsheng Li

In this paper, we propose a novel Scalable Transformers, which naturally contains sub-Transformers of different scales and have shared parameters.

Machine Translation Translation

Decoupled Spatial-Temporal Transformer for Video Inpainting

1 code implementation14 Apr 2021 Rui Liu, Hanming Deng, Yangyi Huang, Xiaoyu Shi, Lewei Lu, Wenxiu Sun, Xiaogang Wang, Jifeng Dai, Hongsheng Li

Seamless combination of these two novel designs forms a better spatial-temporal attention scheme and our proposed model achieves better performance than state-of-the-art video inpainting approaches with significant boosted efficiency.

Video Inpainting

AutoLoss-Zero: Searching Loss Functions from Scratch for Generic Tasks

no code implementations25 Mar 2021 Hao Li, Tianwen Fu, Jifeng Dai, Hongsheng Li, Gao Huang, Xizhou Zhu

However, the automatic design of loss functions for generic tasks with various evaluation metrics remains under-investigated.

Exploring Cross-Image Pixel Contrast for Semantic Segmentation

5 code implementations ICCV 2021 Wenguan Wang, Tianfei Zhou, Fisher Yu, Jifeng Dai, Ender Konukoglu, Luc van Gool

Inspired by the recent advance in unsupervised contrastive representation learning, we propose a pixel-wise contrastive framework for semantic segmentation in the fully supervised setting.

Metric Learning Optical Character Recognition +2

Fast Convergence of DETR with Spatially Modulated Co-Attention

2 code implementations19 Jan 2021 Peng Gao, Minghang Zheng, Xiaogang Wang, Jifeng Dai, Hongsheng Li

The recently proposed Detection Transformer (DETR) model successfully applies Transformer to objects detection and achieves comparable performance with two-stage object detection frameworks, such as Faster-RCNN.

Object Detection

Unsupervised Object Detection with LiDAR Clues

no code implementations CVPR 2021 Hao Tian, Yuntao Chen, Jifeng Dai, Zhaoxiang Zhang, Xizhou Zhu

We further identify another major issue, seldom noticed by the community, that the long-tailed and open-ended (sub-)category distribution should be accommodated.

Object Detection

Auto Seg-Loss: Searching Metric Surrogates for Semantic Segmentation

1 code implementation ICLR 2021 Hao Li, Chenxin Tao, Xizhou Zhu, Xiaogang Wang, Gao Huang, Jifeng Dai

In this paper, we propose to automate the design of metric-specific loss functions by searching differentiable surrogate losses for each metric.

Semantic Segmentation

Deformable DETR: Deformable Transformers for End-to-End Object Detection

12 code implementations ICLR 2021 Xizhou Zhu, Weijie Su, Lewei Lu, Bin Li, Xiaogang Wang, Jifeng Dai

DETR has been recently proposed to eliminate the need for many hand-designed components in object detection while demonstrating good performance.

Object Detection

Resolution Adaptive Networks for Efficient Inference

1 code implementation CVPR 2020 Le Yang, Yizeng Han, Xi Chen, Shiji Song, Jifeng Dai, Gao Huang

Adaptive inference is an effective mechanism to achieve a dynamic tradeoff between accuracy and computational cost in deep networks.

Hierarchical Human Parsing with Typed Part-Relation Reasoning

1 code implementation CVPR 2020 Wenguan Wang, Hailong Zhu, Jifeng Dai, Yanwei Pang, Jianbing Shen, Ling Shao

As human bodies are underlying hierarchically structured, how to model human structures is the central theme in this task.

Human Parsing

Deformable Kernels: Adapting Effective Receptive Fields for Object Deformation

2 code implementations ICLR 2020 Hang Gao, Xizhou Zhu, Steve Lin, Jifeng Dai

This is typically done by augmenting static operators with learned free-form sampling grids in the image space, dynamically tuned to the data and task for adapting the receptive field.

Image Classification Object Detection

An Empirical Study of Spatial Attention Mechanisms in Deep Networks

1 code implementation ICCV 2019 Xizhou Zhu, Dazhi Cheng, Zheng Zhang, Stephen Lin, Jifeng Dai

Attention mechanisms have become a popular component in deep neural networks, yet there has been little examination of how different influencing factors and methods for computing attention from these factors affect performance.

Deformable ConvNets v2: More Deformable, Better Results

19 code implementations CVPR 2019 Xizhou Zhu, Han Hu, Stephen Lin, Jifeng Dai

The superior performance of Deformable Convolutional Networks arises from its ability to adapt to the geometric variations of objects.

Instance Segmentation Object Detection +1

Towards High Performance Video Object Detection for Mobiles

3 code implementations16 Apr 2018 Xizhou Zhu, Jifeng Dai, Xingchi Zhu, Yichen Wei, Lu Yuan

In this paper, we present a light weight network architecture for video object detection on mobiles.

Frame Video Object Detection

Learning Region Features for Object Detection

no code implementations ECCV 2018 Jiayuan Gu, Han Hu, Li-Wei Wang, Yichen Wei, Jifeng Dai

While most steps in the modern object detection methods are learnable, the region feature extraction step remains largely hand-crafted, featured by RoI pooling methods.

Object Detection

Relation Networks for Object Detection

6 code implementations CVPR 2018 Han Hu, Jiayuan Gu, Zheng Zhang, Jifeng Dai, Yichen Wei

Although it is well believed for years that modeling relations between objects would help object recognition, there has not been evidence that the idea is working in the deep learning era.

Object Detection Object Recognition

Flow-Guided Feature Aggregation for Video Object Detection

2 code implementations ICCV 2017 Xizhou Zhu, Yujie Wang, Jifeng Dai, Lu Yuan, Yichen Wei

The accuracy of detection suffers from degenerated object appearances in videos, e. g., motion blur, video defocus, rare poses, etc.

Frame Video Object Detection +1

Deformable Convolutional Networks

37 code implementations ICCV 2017 Jifeng Dai, Haozhi Qi, Yuwen Xiong, Yi Li, Guodong Zhang, Han Hu, Yichen Wei

Convolutional neural networks (CNNs) are inherently limited to model geometric transformations due to the fixed geometric structures in its building modules.

Object Detection Semantic Segmentation

Deep Feature Flow for Video Recognition

3 code implementations CVPR 2017 Xizhou Zhu, Yuwen Xiong, Jifeng Dai, Lu Yuan, Yichen Wei

Yet, it is non-trivial to transfer the state-of-the-art image recognition networks to videos as per-frame evaluation is too slow and unaffordable.

Frame Video Recognition +1

R-FCN: Object Detection via Region-based Fully Convolutional Networks

45 code implementations NeurIPS 2016 Jifeng Dai, Yi Li, Kaiming He, Jian Sun

In contrast to previous region-based detectors such as Fast/Faster R-CNN that apply a costly per-region subnetwork hundreds of times, our region-based detector is fully convolutional with almost all computation shared on the entire image.

Real-Time Object Detection Translation

ScribbleSup: Scribble-Supervised Convolutional Networks for Semantic Segmentation

no code implementations CVPR 2016 Di Lin, Jifeng Dai, Jiaya Jia, Kaiming He, Jian Sun

Large-scale data is of crucial importance for learning semantic segmentation models, but annotating per-pixel masks is a tedious and inefficient procedure.

Semantic Segmentation

Instance-sensitive Fully Convolutional Networks

no code implementations29 Mar 2016 Jifeng Dai, Kaiming He, Yi Li, Shaoqing Ren, Jian Sun

In contrast to the previous FCN that generates one score map, our FCN is designed to compute a small set of instance-sensitive score maps, each of which is the outcome of a pixel-wise classifier of a relative position to instances.

Semantic Segmentation

BoxSup: Exploiting Bounding Boxes to Supervise Convolutional Networks for Semantic Segmentation

no code implementations ICCV 2015 Jifeng Dai, Kaiming He, Jian Sun

Recent leading approaches to semantic segmentation rely on deep convolutional networks trained with human-annotated, pixel-level segmentation masks.

Semantic Segmentation

Generative Modeling of Convolutional Neural Networks

no code implementations19 Dec 2014 Jifeng Dai, Yang Lu, Ying-Nian Wu

(2) We propose a generative gradient for pre-training CNNs by a non-parametric importance sampling scheme, which is fundamentally different from the commonly used discriminative gradient, and yet has the same computational architecture and cost as the latter.

Convolutional Feature Masking for Joint Object and Stuff Segmentation

1 code implementation CVPR 2015 Jifeng Dai, Kaiming He, Jian Sun

The current leading approaches for semantic segmentation exploit shape information by extracting CNN features from masked image regions.

Semantic Segmentation

Unsupervised Learning of Dictionaries of Hierarchical Compositional Models

no code implementations CVPR 2014 Jifeng Dai, Yi Hong, Wenze Hu, Song-Chun Zhu, Ying Nian Wu

Given a set of unannotated training images, a dictionary of such hierarchical templates are learned so that each training image can be represented by a small number of templates that are spatially translated, rotated and scaled versions of the templates in the learned dictionary.

Domain Adaptation Template Matching

Cannot find the paper you are looking for? You can Submit a new open access paper.