Search Results for author: Zizheng Pan

Found 6 papers, 5 papers with code

Pruning Self-attentions into Convolutional Layers in Single Path

1 code implementation • 23 Nov 2021 • Haoyu He, Jing Liu, Zizheng Pan, Jianfei Cai, Jing Zhang, Dacheng Tao, Bohan Zhuang

Vision Transformers (ViTs) have achieved impressive performance on various computer vision tasks.

Neural Architecture Search

Mesa: A Memory-saving Training Framework for Transformers

1 code implementation • 22 Nov 2021 • Zizheng Pan, Peng Chen, Haoyu He, Jing Liu, Jianfei Cai, Bohan Zhuang

Specifically, Mesa uses exact activations during the forward pass while storing a low-precision version of the activations to reduce memory consumption during training.
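The memory-saving idea can be sketched with simple 8-bit activation quantization. This is an illustrative NumPy toy, not Mesa's actual implementation (the function names and per-tensor asymmetric scheme are assumptions): the full-precision activation is used for the forward computation, while only a uint8 copy plus its scale/offset is kept for the backward pass, cutting storage roughly 4x versus float32.

```python
import numpy as np

def quantize_uint8(x):
    """Per-tensor asymmetric quantization of float32 activations to uint8."""
    lo, hi = float(x.min()), float(x.max())
    scale = (hi - lo) / 255.0 or 1.0  # guard against a constant tensor
    q = np.round((x - lo) / scale).astype(np.uint8)
    return q, scale, lo

def dequantize_uint8(q, scale, lo):
    """Recover an approximate float32 activation for the backward pass."""
    return q.astype(np.float32) * scale + lo

# Forward pass would use `act` exactly; only `q` (plus two scalars) is stored.
act = np.random.RandomState(0).randn(64, 128).astype(np.float32)
q, scale, lo = quantize_uint8(act)
recovered = dequantize_uint8(q, scale, lo)  # used when gradients are needed
```

The stored tensor is a quarter of the original size, and the reconstruction error is bounded by half the quantization step.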

Less is More: Pay Less Attention in Vision Transformers

1 code implementation • 29 May 2021 • Zizheng Pan, Bohan Zhuang, Haoyu He, Jing Liu, Jianfei Cai

Transformers have become one of the dominant architectures in deep learning, particularly as a powerful alternative to convolutional neural networks (CNNs) in computer vision.

Image Classification • Instance Segmentation +2

The Road to Know-Where: An Object-and-Room Informed Sequential BERT for Indoor Vision-Language Navigation

1 code implementation • ICCV 2021 • Yuankai Qi, Zizheng Pan, Yicong Hong, Ming-Hsuan Yang, Anton van den Hengel, Qi Wu

Vision-and-Language Navigation (VLN) requires an agent to find a path to a remote location on the basis of natural-language instructions and a set of photo-realistic panoramas.

Vision and Language Navigation • Vision-Language Navigation

Scalable Vision Transformers with Hierarchical Pooling

1 code implementation • ICCV 2021 • Zizheng Pan, Bohan Zhuang, Jing Liu, Haoyu He, Jianfei Cai

However, current ViT models typically maintain a full-length patch sequence during inference, which is redundant and lacks hierarchical representation.

Image Classification
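The hierarchical-pooling idea can be sketched as progressively downsampling the token sequence between transformer stages. This NumPy toy is illustrative only (the kernel size, max pooling, and stage count are assumptions, not the paper's exact configuration): shortening the sequence shrinks the quadratic cost of self-attention at later stages.

```python
import numpy as np

def pool_tokens(tokens, kernel=2):
    """1D max pooling over the sequence dimension: (N, D) -> (N // kernel, D)."""
    n, d = tokens.shape
    n_out = n // kernel
    return tokens[:n_out * kernel].reshape(n_out, kernel, d).max(axis=1)

# A 14x14 patch grid gives 196 tokens; pool between three stages.
seq = np.random.RandomState(1).randn(196, 64).astype(np.float32)
for stage in range(3):
    seq = pool_tokens(seq)  # sequence length: 196 -> 98 -> 49 -> 24
```

Since self-attention scales with the square of the sequence length, halving the tokens at each stage roughly quarters that stage's attention cost.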

Object-and-Action Aware Model for Visual Language Navigation

no code implementations • ECCV 2020 • Yuankai Qi, Zizheng Pan, Shengping Zhang, Anton van den Hengel, Qi Wu

The first is object descriptions (e.g., 'table', 'door'), each serving as a cue for the agent to determine its next action by finding the item visible in the environment; the second is action specifications (e.g., 'go straight', 'turn left'), which allow the robot to directly predict its next movements without relying on visual perception.

Vision and Language Navigation
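The object/action split described above can be illustrated with a toy instruction parser. This is a keyword-matching sketch only (the vocabularies and function name are assumptions; the paper's model learns this distinction rather than using hand-written lists):

```python
import re

# Toy vocabularies (assumptions for illustration, not taken from the paper).
ACTION_PHRASES = ["go straight", "turn left", "turn right", "walk forward", "stop"]
OBJECT_WORDS = ["table", "door", "stairs", "sofa", "kitchen"]

def split_instruction(instruction):
    """Separate a navigation instruction into action cues and object cues."""
    text = instruction.lower()
    actions = [p for p in ACTION_PHRASES if p in text]
    objects = [w for w in OBJECT_WORDS if re.search(r"\b%s\b" % w, text)]
    return actions, objects

actions, objects = split_instruction(
    "Turn left at the table and go straight to the door."
)
```

Action cues can drive movement prediction directly, while object cues must be grounded against what the agent currently sees.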
