Search Results for author: Zhuliang Yao

Found 10 papers, 5 papers with code

iCLIP: Bridging Image Classification and Contrastive Language-Image Pre-Training for Visual Recognition

no code implementations CVPR 2023 Yixuan Wei, Yue Cao, Zheng Zhang, Houwen Peng, Zhuliang Yao, Zhenda Xie, Han Hu, Baining Guo

This paper presents iCLIP, a method that effectively combines two prevalent visual recognition approaches: image classification and contrastive language-image pre-training.

Classification Image Classification +2

iCAR: Bridging Image Classification and Image-text Alignment for Visual Recognition

no code implementations22 Apr 2022 Yixuan Wei, Yue Cao, Zheng Zhang, Zhuliang Yao, Zhenda Xie, Han Hu, Baining Guo

Second, we convert the image classification problem from learning parametric category classifier weights to learning a text encoder that serves as a meta network for generating category classifier weights.
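The idea of generating classifier weights from a text encoder rather than learning them as free parameters can be sketched as follows. This is a minimal numpy illustration, not the paper's model: the `text_encoder` stand-in, the dimensions, and the category names are all hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
num_classes, embed_dim = 5, 16

# Hypothetical stand-in for a learned text encoder: it maps each
# category name to an embedding that serves directly as that
# category's classifier weight vector (the "meta network" idea).
def text_encoder(category_names):
    return rng.standard_normal((len(category_names), embed_dim))

categories = ["cat", "dog", "bird", "car", "tree"]
W = text_encoder(categories)                # (num_classes, embed_dim)

image_features = rng.standard_normal((4, embed_dim))  # a batch of 4 images

# Classification logits: the text-derived weights replace a fixed
# parametric weight matrix, so new categories only need new text.
logits = image_features @ W.T               # (4, num_classes)
predictions = logits.argmax(axis=1)
```

Because the weights come from text, extending the label set amounts to encoding more category names rather than retraining a classifier head.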

Action Recognition Classification +7

SimMIM: A Simple Framework for Masked Image Modeling

4 code implementations CVPR 2022 Zhenda Xie, Zheng Zhang, Yue Cao, Yutong Lin, Jianmin Bao, Zhuliang Yao, Qi Dai, Han Hu

We also leverage this approach to facilitate the training of a 3B model (SwinV2-G), which achieves state-of-the-art results on four representative vision benchmarks using $40\times$ less data than in previous practice.
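The core SimMIM recipe, random patch masking followed by raw-pixel regression with an L1 loss on the masked patches, can be sketched as follows. This is a toy numpy sketch: the identity `predict` function is a placeholder for a real encoder plus lightweight prediction head, and the patch sizes are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy image as a grid of flattened patches: 16 patches of 12 values each.
patches = rng.standard_normal((16, 12))

# Random masking: hide a fraction of patches.
mask_ratio = 0.5
num_masked = int(len(patches) * mask_ratio)
masked_idx = rng.choice(len(patches), size=num_masked, replace=False)

corrupted = patches.copy()
corrupted[masked_idx] = 0.0  # replace masked patches with a mask token

# Placeholder predictor (identity here, just to show the loss wiring).
def predict(x):
    return x

pred = predict(corrupted)

# L1 reconstruction loss computed only on the masked patches: the model
# regresses raw pixel values of the hidden content.
l1_loss = np.abs(pred[masked_idx] - patches[masked_idx]).mean()
```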

Representation Learning Self-Supervised Image Classification +1

Swin Transformer V2: Scaling Up Capacity and Resolution

19 code implementations CVPR 2022 Ze Liu, Han Hu, Yutong Lin, Zhuliang Yao, Zhenda Xie, Yixuan Wei, Jia Ning, Yue Cao, Zheng Zhang, Li Dong, Furu Wei, Baining Guo

Three main techniques are proposed: 1) a residual post-norm method combined with cosine attention to improve training stability; 2) a log-spaced continuous position bias method to effectively transfer models pre-trained on low-resolution images to downstream tasks with high-resolution inputs; 3) a self-supervised pre-training method, SimMIM, to reduce the need for vast amounts of labeled images.
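The log-spaced coordinate mapping behind technique 2) can be sketched as below. This is a hedged numpy sketch of the $\mathrm{sign}(x)\cdot\log_2(1+|x|)$ transform; normalizing by the maximum position is an assumption of this sketch, and the small MLP that maps these coordinates to bias values is omitted.

```python
import numpy as np

def log_spaced_coords(relative_positions, max_position):
    # Map relative coordinates to a log-spaced range:
    # sign(x) * log2(1 + |x|), normalized so the extremes land at +/-1.
    x = np.asarray(relative_positions, dtype=float)
    return np.sign(x) * np.log2(1 + np.abs(x)) / np.log2(1 + max_position)

# Relative positions for an 8x8 training window ...
small = log_spaced_coords(np.arange(-7, 8), max_position=7)
# ... and for a larger 16x16 test-time window:
large = log_spaced_coords(np.arange(-15, 16), max_position=15)

# Both ranges stay within [-1, 1], so a bias network fit on the small
# window extrapolates far less than it would with linear spacing.
```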

Ranked #4 on Image Classification on ImageNet V2 (using extra training data)

Action Classification Image Classification +3

Leveraging Batch Normalization for Vision Transformers

no code implementations ICCVW 2021 Zhuliang Yao, Yue Cao, Yutong Lin, Ze Liu, Zheng Zhang, Han Hu

Transformer-based vision architectures have attracted great attention because of their strong performance over convolutional neural networks (CNNs).

Disentangled Non-Local Neural Networks

5 code implementations ECCV 2020 Minghao Yin, Zhuliang Yao, Yue Cao, Xiu Li, Zheng Zhang, Stephen Lin, Han Hu

This paper first studies the non-local block in depth, where we find that its attention computation can be split into two terms, a whitened pairwise term accounting for the relationship between two pixels and a unary term representing the saliency of every pixel.
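The split described above can be checked numerically: the whitened pairwise and unary terms reproduce standard dot-product attention up to a per-row constant that the softmax cancels. This is a toy numpy sketch of the decomposition, not the full disentangled block, and all dimensions are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)
n, d = 6, 8                        # n pixels, d channels
q = rng.standard_normal((n, d))    # queries
k = rng.standard_normal((n, d))    # keys

mu_q, mu_k = q.mean(axis=0), k.mean(axis=0)

# Whitened pairwise term: the relationship between two pixels, with the
# means removed from both queries and keys.
pairwise = (q - mu_q) @ (k - mu_k).T       # (n, n)

# Unary term: per-pixel saliency of each key, shared across all queries.
unary = mu_q @ k.T                          # broadcasts to every row

# The remaining pieces of q @ k.T are constant within each row, so the
# softmax over keys gives identical attention weights.
logits = pairwise + unary
attn = np.exp(logits) / np.exp(logits).sum(axis=1, keepdims=True)
```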

Ranked #20 on Semantic Segmentation on Cityscapes test (using extra training data)

Action Recognition object-detection +2

Cross-Iteration Batch Normalization

2 code implementations CVPR 2021 Zhuliang Yao, Yue Cao, Shuxin Zheng, Gao Huang, Stephen Lin

We thus compensate for the network weight changes via a proposed technique based on Taylor polynomials, so that the statistics can be accurately estimated and batch normalization can be effectively applied.
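The Taylor-polynomial compensation can be illustrated on a toy linear layer, where a first-order expansion recovers the current-iteration statistic exactly. This is a heavily simplified sketch: the actual method compensates batch normalization's mean and variance across several recent iterations, not a single scalar parameter.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy layer: activations are y = theta * x, so the batch mean of y is
# mu(theta) = theta * mean(x) and d mu / d theta = mean(x).
x_prev = rng.standard_normal(4)     # tiny batch from a previous iteration
theta_prev = 1.0

mu_prev = theta_prev * x_prev.mean()
dmu_dtheta = x_prev.mean()          # exact gradient for this toy layer

# The network weights change between iterations:
theta_now = 1.3

# First-order Taylor compensation of the stale statistic, so statistics
# from past iterations remain usable under the current weights.
mu_compensated = mu_prev + dmu_dtheta * (theta_now - theta_prev)

# For this linear toy layer the compensation is exact:
mu_true = theta_now * x_prev.mean()
```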

Image Classification object-detection +1

Balanced Sparsity for Efficient DNN Inference on GPU

no code implementations1 Nov 2018 Zhuliang Yao, Shijie Cao, Wencong Xiao, Chen Zhang, Lanshun Nie

However, it requires hardware customization to speed up practical inference.
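The balanced sparsity pattern can be sketched as pruning each weight row in equal-sized blocks, keeping the same number of survivors in every block so that parallel GPU thread groups receive identical workloads. This is a minimal numpy sketch; the block count and keep ratio are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)

def balanced_prune(W, num_blocks, keep_per_block):
    # Split each row into equal-length blocks and keep only the
    # keep_per_block largest-magnitude weights in every block.
    rows, cols = W.shape
    assert cols % num_blocks == 0
    blocks = W.reshape(rows, num_blocks, cols // num_blocks)
    order = np.argsort(np.abs(blocks), axis=2)      # ascending by |w|
    mask = np.ones_like(blocks, dtype=bool)
    # Zero out everything except the top keep_per_block per block.
    np.put_along_axis(mask, order[:, :, :-keep_per_block], False, axis=2)
    return (blocks * mask).reshape(rows, cols)

W = rng.standard_normal((2, 8))
W_sparse = balanced_prune(W, num_blocks=2, keep_per_block=2)
# Every block of 4 weights now holds exactly 2 nonzeros.
```

Uniform per-block nonzero counts are what make the pattern friendly to standard GPUs, avoiding the load imbalance of unstructured sparsity without custom hardware.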
