Search Results for author: Renrui Zhang

Found 10 papers, 7 papers with code

POS-BERT: Point Cloud One-Stage BERT Pre-Training

1 code implementation • 3 Apr 2022 • Kexue Fu, Peng Gao, Shaolei Liu, Renrui Zhang, Yu Qiao, Manning Wang

We propose to use a dynamically updated momentum encoder as the tokenizer, which is continuously updated and outputs a dynamic supervision signal as training proceeds.

Contrastive Learning • Language Modelling +3
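
A minimal PyTorch sketch of the momentum-encoder-as-tokenizer idea described above: the tokenizer is an exponential-moving-average (EMA) copy of the student encoder, so the supervision signal it produces keeps evolving during training. The module names, momentum value, and training-step outline are illustrative assumptions, not details taken from the paper.

```python
import torch
import torch.nn as nn

@torch.no_grad()
def momentum_update(student: nn.Module, tokenizer: nn.Module, m: float = 0.999):
    """EMA update: the tokenizer slowly tracks the student encoder."""
    for p_s, p_t in zip(student.parameters(), tokenizer.parameters()):
        p_t.data.mul_(m).add_(p_s.data, alpha=1.0 - m)

# Illustrative training step: the tokenizer's output is the dynamic
# supervision target, so it changes as training proceeds but gets no gradient.
#
#   with torch.no_grad():
#       targets = tokenizer(point_patches)            # dynamic "tokens"
#   loss = criterion(student(masked_point_patches), targets)
#   loss.backward(); optimizer.step()
#   momentum_update(student, tokenizer)
```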

MonoDETR: Depth-aware Transformer for Monocular 3D Object Detection

1 code implementation • 24 Mar 2022 • Renrui Zhang, Han Qiu, Tai Wang, Xuanzhuo Xu, Ziyu Guo, Yu Qiao, Peng Gao, Hongsheng Li

In this paper, we introduce a simple framework for Monocular DEtection with depth-aware TRansformer, named MonoDETR.

Autonomous Driving • Monocular 3D Object Detection

Distillation with Contrast is All You Need for Self-Supervised Point Cloud Representation Learning

no code implementations • 9 Feb 2022 • Kexue Fu, Peng Gao, Renrui Zhang, Hongsheng Li, Yu Qiao, Manning Wang

In particular, we develop a variant of ViT for 3D point cloud feature extraction, which achieves results comparable to existing backbones when combined with our framework. Visualization of the attention maps shows that our model understands the point cloud by combining global shape information with multiple kinds of local structural information, which is consistent with the motivation of our representation learning method.

Contrastive Learning • Knowledge Distillation +1

VT-CLIP: Enhancing Vision-Language Models with Visual-guided Texts

no code implementations • 4 Dec 2021 • Renrui Zhang, Longtian Qiu, Wei zhang, Ziyao Zeng

Contrastive Vision-Language Pre-training (CLIP) has drawn increasing attention recently for its transferable visual representation learning.

Language Modelling • Representation Learning +1

PointCLIP: Point Cloud Understanding by CLIP

2 code implementations • 4 Dec 2021 • Renrui Zhang, Ziyu Guo, Wei zhang, Kunchang Li, Xupeng Miao, Bin Cui, Yu Qiao, Peng Gao, Hongsheng Li

On top of that, we design an inter-view adapter to better extract the global feature and adaptively fuse the few-shot knowledge learned from 3D into CLIP pre-trained in 2D.

Few-Shot Learning • Transfer Learning
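
A hypothetical sketch of the inter-view adapter described above: per-view CLIP features are fused into a global feature and blended back into each view with a residual ratio. The class name, layer sizes, and blend ratio are assumptions for illustration, not the paper's exact design.

```python
import torch
import torch.nn as nn

class InterViewAdapter(nn.Module):
    """Fuse per-view CLIP features into a global feature, then blend it
    back into each view with a residual ratio (illustrative sketch)."""
    def __init__(self, num_views: int, dim: int = 512, alpha: float = 0.5):
        super().__init__()
        self.num_views, self.dim, self.alpha = num_views, dim, alpha
        self.fuse = nn.Sequential(
            nn.Linear(num_views * dim, dim), nn.ReLU(inplace=True),
            nn.Linear(dim, num_views * dim),
        )

    def forward(self, view_feats: torch.Tensor) -> torch.Tensor:
        # view_feats: (batch, num_views, dim) from the frozen CLIP visual encoder
        b = view_feats.shape[0]
        fused = self.fuse(view_feats.reshape(b, -1)).reshape(b, self.num_views, self.dim)
        # Residual blend of the adapted global knowledge with the original views
        return self.alpha * fused + (1 - self.alpha) * view_feats
```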

Tip-Adapter: Training-free CLIP-Adapter for Better Vision-Language Modeling

1 code implementation • 6 Nov 2021 • Renrui Zhang, Rongyao Fang, Wei zhang, Peng Gao, Kunchang Li, Jifeng Dai, Yu Qiao, Hongsheng Li

To further enhance CLIP's few-shot capability, CLIP-Adapter proposed fine-tuning a lightweight residual feature adapter, which significantly improves performance on few-shot classification.

Language Modelling • Transfer Learning
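
The residual feature adapter referenced in this snippet can be sketched as a small bottleneck MLP whose output is blended with the frozen CLIP feature. The layer sizes and residual ratio below are assumptions, not values from the paper.

```python
import torch
import torch.nn as nn

class ResidualFeatureAdapter(nn.Module):
    """Lightweight adapter on top of frozen CLIP features (illustrative sketch)."""
    def __init__(self, dim: int = 512, reduction: int = 4, ratio: float = 0.2):
        super().__init__()
        self.ratio = ratio
        self.fc = nn.Sequential(
            nn.Linear(dim, dim // reduction), nn.ReLU(inplace=True),
            nn.Linear(dim // reduction, dim), nn.ReLU(inplace=True),
        )

    def forward(self, clip_feat: torch.Tensor) -> torch.Tensor:
        # Blend the adapted feature with the original (frozen) CLIP feature
        return self.ratio * self.fc(clip_feat) + (1 - self.ratio) * clip_feat
```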

CLIP-Adapter: Better Vision-Language Models with Feature Adapters

1 code implementation • 9 Oct 2021 • Peng Gao, Shijie Geng, Renrui Zhang, Teli Ma, Rongyao Fang, Yongfeng Zhang, Hongsheng Li, Yu Qiao

Large-scale contrastive vision-language pre-training has shown significant progress in visual representation learning.

Representation Learning

Dual-stream Network for Visual Recognition

no code implementations • NeurIPS 2021 • Mingyuan Mao, Renrui Zhang, Honghui Zheng, Peng Gao, Teli Ma, Yan Peng, Errui Ding, Baochang Zhang, Shumin Han

Transformers with remarkable global representation capacities achieve competitive results for visual tasks, but fail to consider high-level local pattern information in input images.

Image Classification • Instance Segmentation +2

End-to-End Object Detection with Adaptive Clustering Transformer

1 code implementation • 18 Nov 2020 • Minghang Zheng, Peng Gao, Renrui Zhang, Kunchang Li, Xiaogang Wang, Hongsheng Li, Hao Dong

In this paper, we propose a novel transformer variant, the Adaptive Clustering Transformer (ACT), to reduce the computation cost for high-resolution inputs.

Object Detection
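
The computation saving in ACT comes from attending with a reduced set of query prototypes rather than every query. Below is a rough sketch of that idea, using plain k-means as a stand-in for the paper's adaptive clustering scheme; the function name, cluster count, and iteration count are assumptions.

```python
import torch

def clustered_attention(q, k, v, num_clusters: int = 32, iters: int = 3):
    """Attend with cluster prototypes of the queries, then broadcast the
    result back to every query (illustrative; not the paper's exact scheme)."""
    n_q, d = q.shape
    num_clusters = min(num_clusters, n_q)
    # Cluster the queries (simple k-means here; ACT uses an adaptive scheme)
    centers = q[torch.randperm(n_q)[:num_clusters]].clone()
    for _ in range(iters):
        assign = torch.cdist(q, centers).argmin(dim=1)          # (n_q,)
        for c in range(num_clusters):
            members = q[assign == c]
            if members.numel():
                centers[c] = members.mean(dim=0)
    # Attention cost drops from O(n_q * n_k) to O(num_clusters * n_k)
    attn = torch.softmax(centers @ k.t() / d ** 0.5, dim=-1)    # (C, n_k)
    proto_out = attn @ v                                         # (C, d)
    return proto_out[assign]                                     # (n_q, d)
```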
