Search Results for author: Jianyuan Guo

Found 36 papers, 28 papers with code

GeminiFusion: Efficient Pixel-wise Multimodal Fusion for Vision Transformer

1 code implementation · 3 Jun 2024 · Ding Jia, Jianyuan Guo, Kai Han, Han Wu, Chao Zhang, Chang Xu, Xinghao Chen

Cross-modal transformers have demonstrated superiority in various vision tasks by effectively integrating different modalities.

Ranked #1 on Semantic Segmentation on NYU Depth v2 (using extra training data)

3D Object Detection, Image-to-Image Translation +2

SAM-DiffSR: Structure-Modulated Diffusion Model for Image Super-Resolution

1 code implementation · 27 Feb 2024 · Chengcheng Wang, Zhiwei Hao, Yehui Tang, Jianyuan Guo, Yujie Yang, Kai Han, Yunhe Wang

In this paper, we propose the SAM-DiffSR model, which can utilize the fine-grained structure information from SAM in the process of sampling noise to improve the image quality without additional computational cost during inference.

Image Super-Resolution

Data-efficient Large Vision Models through Sequential Autoregression

1 code implementation · 7 Feb 2024 · Jianyuan Guo, Zhiwei Hao, Chengcheng Wang, Yehui Tang, Han Wu, Han Hu, Kai Han, Chang Xu

Training general-purpose vision models on purely sequential visual data, eschewing linguistic inputs, has heralded a new frontier in visual understanding.

Vision Superalignment: Weak-to-Strong Generalization for Vision Foundation Models

1 code implementation · 6 Feb 2024 · Jianyuan Guo, Hanting Chen, Chengcheng Wang, Kai Han, Chang Xu, Yunhe Wang

Recent advances in large language models have sparked interest in their extraordinary, near-superhuman capabilities, leading researchers to explore methods for evaluating and optimizing these abilities, a direction known as superalignment.

Few-Shot Learning, Knowledge Distillation +1

A Survey on Transformer Compression

no code implementations · 5 Feb 2024 · Yehui Tang, Yunhe Wang, Jianyuan Guo, Zhijun Tu, Kai Han, Hailin Hu, DaCheng Tao

Model compression methods reduce the memory and computational cost of Transformers, a necessary step for deploying large language and vision models on practical devices.

Knowledge Distillation, Model Compression +1

PanGu-π: Enhancing Language Model Architectures via Nonlinearity Compensation

no code implementations · 27 Dec 2023 · Yunhe Wang, Hanting Chen, Yehui Tang, Tianyu Guo, Kai Han, Ying Nie, Xutao Wang, Hailin Hu, Zheyuan Bai, Yun Wang, Fangcheng Liu, Zhicheng Liu, Jianyuan Guo, Sinan Zeng, Yinchen Zhang, Qinghua Xu, Qun Liu, Jun Yao, Chao Xu, DaCheng Tao

We then demonstrate through carefully designed ablations that the proposed approach is highly effective at enhancing model nonlinearity; we therefore present a new efficient model architecture for building modern language models, namely PanGu-π.

Language Modelling

One-for-All: Bridge the Gap Between Heterogeneous Architectures in Knowledge Distillation

1 code implementation · NeurIPS 2023 · Zhiwei Hao, Jianyuan Guo, Kai Han, Yehui Tang, Han Hu, Yunhe Wang, Chang Xu

To tackle the challenge in distilling heterogeneous models, we propose a simple yet effective one-for-all KD framework called OFA-KD, which significantly improves the distillation performance between heterogeneous architectures.

Knowledge Distillation

ParameterNet: Parameters Are All You Need

no code implementations · 26 Jun 2023 · Kai Han, Yunhe Wang, Jianyuan Guo, Enhua Wu

In the language domain, LLaMA-1B enhanced with ParameterNet achieves 2% higher accuracy than vanilla LLaMA.

VanillaKD: Revisit the Power of Vanilla Knowledge Distillation from Small Scale to Large Scale

1 code implementation · 25 May 2023 · Zhiwei Hao, Jianyuan Guo, Kai Han, Han Hu, Chang Xu, Yunhe Wang

The tremendous success of large models trained on extensive datasets demonstrates that scale is a key ingredient in achieving superior results.

Data Augmentation, Knowledge Distillation

VanillaNet: the Power of Minimalism in Deep Learning

4 code implementations · NeurIPS 2023 · Hanting Chen, Yunhe Wang, Jianyuan Guo, DaCheng Tao

In this study, we introduce VanillaNet, a neural network architecture that embraces elegance in design.


Masked Image Modeling with Local Multi-Scale Reconstruction

1 code implementation · CVPR 2023 · Haoqing Wang, Yehui Tang, Yunhe Wang, Jianyuan Guo, Zhi-Hong Deng, Kai Han

The lower layers are not explicitly guided and the interaction among their patches is only used for calculating new activations.

Representation Learning

FastMIM: Expediting Masked Image Modeling Pre-training for Vision

1 code implementation · 13 Dec 2022 · Jianyuan Guo, Kai Han, Han Wu, Yehui Tang, Yunhe Wang, Chang Xu

This paper presents FastMIM, a simple and generic framework for expediting masked image modeling with the following two steps: (i) pre-training vision backbones with low-resolution input images; and (ii) reconstructing Histograms of Oriented Gradients (HOG) feature instead of original RGB values of the input images.
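The two FastMIM steps can be illustrated with a minimal NumPy sketch that builds one training pair: a low-resolution view of the image (produced here by average pooling, an assumption) and a simplified HOG target, i.e. per-cell histograms of unsigned gradient orientation weighted by gradient magnitude, without the block normalization of full HOG. The helper names `downsample` and `hog_cells`, the 8-pixel cells, and the 9 bins are hypothetical choices for illustration, not FastMIM's published configuration; masking and the encoder/decoder are omitted.

```python
import numpy as np

def downsample(img: np.ndarray, factor: int) -> np.ndarray:
    """Average-pool a (H, W) grayscale image by an integer factor (assumed downsampling scheme)."""
    h, w = img.shape
    h, w = h - h % factor, w - w % factor  # crop so the size divides evenly
    return img[:h, :w].reshape(h // factor, factor, w // factor, factor).mean(axis=(1, 3))

def hog_cells(img: np.ndarray, cell: int = 8, bins: int = 9) -> np.ndarray:
    """Simplified HOG: per-cell orientation histograms weighted by gradient magnitude.

    Returns shape (H // cell, W // cell, bins). Block normalization is omitted.
    """
    gy, gx = np.gradient(img.astype(np.float64))  # derivatives along rows (y) and columns (x)
    mag = np.hypot(gx, gy)
    ang = np.rad2deg(np.arctan2(gy, gx)) % 180.0  # unsigned orientation in [0, 180)
    bin_idx = np.minimum((ang / (180.0 / bins)).astype(int), bins - 1)
    hc, wc = img.shape[0] // cell, img.shape[1] // cell
    hist = np.zeros((hc, wc, bins))
    for i in range(hc):
        for j in range(wc):
            b = bin_idx[i * cell:(i + 1) * cell, j * cell:(j + 1) * cell].ravel()
            m = mag[i * cell:(i + 1) * cell, j * cell:(j + 1) * cell].ravel()
            hist[i, j] = np.bincount(b, weights=m, minlength=bins)
    return hist

# Pre-train on a low-resolution view; the regression target is its HOG, not raw pixels.
rng = np.random.default_rng(0)
image = rng.random((224, 224))
low_res = downsample(image, 2)       # 112 x 112 input
target = hog_cells(low_res, cell=8)  # 14 x 14 cells, 9 bins each
print(low_res.shape, target.shape)   # (112, 112) (14, 14, 9)
```

With a fixed patch size, a 112 × 112 input yields a quarter as many patches as a 224 × 224 one, which is the intuition behind the savings from low-resolution pre-training.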

GhostNetV2: Enhance Cheap Operation with Long-Range Attention

11 code implementations · 23 Nov 2022 · Yehui Tang, Kai Han, Jianyuan Guo, Chang Xu, Chao Xu, Yunhe Wang

The convolutional operation can only capture local information in a window region, which prevents performance from being further improved.

Hierarchical Relational Learning for Few-Shot Knowledge Graph Completion

no code implementations · 2 Sep 2022 · Han Wu, Jie Yin, Bala Rajaratnam, Jianyuan Guo

By jointly capturing three levels of relational information (entity-level, triplet-level and context-level), HiRe can effectively learn and refine the meta representation of few-shot relations, and consequently generalize well to unseen relations.

Relational Reasoning

Vision GNN: An Image is Worth Graph of Nodes

9 code implementations · 1 Jun 2022 · Kai Han, Yunhe Wang, Jianyuan Guo, Yehui Tang, Enhua Wu

In this paper, we propose to represent the image as a graph structure and introduce a new Vision GNN (ViG) architecture to extract graph-level features for visual tasks.

Image Classification, Object Detection
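The ViG idea of treating an image as a graph can be sketched in a few lines of NumPy: split the image into patches, treat each flattened patch as a node, connect every node to its k nearest neighbours in feature space, and update each node from its neighbours. The update shown follows the max-relative style of graph convolution (concatenate a node with the element-wise maximum of neighbour-minus-node differences); that choice, the raw-pixel node features (instead of a learned stem), and the helper names are simplifying assumptions, not the ViG reference code.

```python
import numpy as np

def image_to_nodes(img: np.ndarray, patch: int) -> np.ndarray:
    """Split an (H, W, C) image into non-overlapping patches; each flattened patch is one node."""
    h, w, c = img.shape
    x = img.reshape(h // patch, patch, w // patch, patch, c).transpose(0, 2, 1, 3, 4)
    return x.reshape(-1, patch * patch * c)  # (num_nodes, feature_dim)

def knn_graph(nodes: np.ndarray, k: int) -> np.ndarray:
    """Return (N, k) indices of each node's k nearest neighbours (Euclidean, self excluded)."""
    sq = (nodes ** 2).sum(axis=1)
    dist = sq[:, None] + sq[None, :] - 2.0 * nodes @ nodes.T  # pairwise squared distances
    np.fill_diagonal(dist, np.inf)  # a node is not its own neighbour
    return np.argsort(dist, axis=1)[:, :k]

def max_relative_aggregate(nodes: np.ndarray, nbrs: np.ndarray) -> np.ndarray:
    """Concatenate each node with the element-wise max of (neighbour - node) differences."""
    rel = nodes[nbrs] - nodes[:, None, :]  # (N, k, D)
    return np.concatenate([nodes, rel.max(axis=1)], axis=1)  # (N, 2D)

rng = np.random.default_rng(0)
img = rng.random((32, 32, 3))
nodes = image_to_nodes(img, patch=8)  # 16 nodes with 192-dim features
nbrs = knn_graph(nodes, k=4)
out = max_relative_aggregate(nodes, nbrs)
print(nodes.shape, nbrs.shape, out.shape)  # (16, 192) (16, 4) (16, 384)
```

Because neighbours are chosen by feature similarity rather than grid position, two distant patches with similar content can be connected, which a fixed convolution window cannot do.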

Brain-inspired Multilayer Perceptron with Spiking Neurons

4 code implementations · CVPR 2022 · Wenshuo Li, Hanting Chen, Jianyuan Guo, Ziyang Zhang, Yunhe Wang

However, due to the simplicity of their structures, the performance highly depends on the local feature communication mechanism.

Inductive Bias

GhostNets on Heterogeneous Devices via Cheap Operations

8 code implementations · 10 Jan 2022 · Kai Han, Yunhe Wang, Chang Xu, Jianyuan Guo, Chunjing Xu, Enhua Wu, Qi Tian

The proposed C-Ghost module can be taken as a plug-and-play component to upgrade existing convolutional neural networks.

PyramidTNT: Improved Transformer-in-Transformer Baselines with Pyramid Architecture

1 code implementation · 4 Jan 2022 · Kai Han, Jianyuan Guo, Yehui Tang, Yunhe Wang

We hope this new baseline will be helpful for further research on, and applications of, vision transformers.

An Image Patch is a Wave: Phase-Aware Vision MLP

8 code implementations · CVPR 2022 · Yehui Tang, Kai Han, Jianyuan Guo, Chang Xu, Yanxi Li, Chao Xu, Yunhe Wang

To dynamically aggregate tokens, we propose to represent each token as a wave function with two parts, amplitude and phase.

Image Classification, Object Detection +2
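Representing a token with amplitude |z| and phase θ as the complex value |z|·e^{iθ} = |z|cos θ + i|z|sin θ means that how tokens combine depends on their relative phases as well as their amplitudes. The NumPy sketch below assumes the amplitudes, phases, and mixing weights are simply given; in Wave-MLP they are produced by learned layers, and the helper names here are hypothetical. It shows the real-valued unfolding into cosine and sine parts and how two equal-amplitude tokens reinforce or cancel depending on phase.

```python
import numpy as np

def to_wave(amplitude: np.ndarray, phase: np.ndarray) -> np.ndarray:
    """Represent tokens as complex waves: amplitude * exp(i * phase)."""
    return amplitude * np.exp(1j * phase)

def aggregate(amplitude: np.ndarray, phase: np.ndarray, weights: np.ndarray) -> np.ndarray:
    """Mix N tokens with a real (out, N) weight matrix.

    Uses the unfolding amplitude*cos(phase) + i*amplitude*sin(phase), so the real and
    imaginary parts are computed with ordinary real-valued matrix products.
    """
    real = weights @ (amplitude * np.cos(phase))
    imag = weights @ (amplitude * np.sin(phase))
    return real + 1j * imag

amp = np.array([[1.0], [1.0]])  # two tokens, one channel each, equal amplitude
w = np.ones((1, 2))             # placeholder weights: the output token sums both inputs

in_phase = aggregate(amp, np.array([[0.0], [0.0]]), w)    # constructive interference
opposite = aggregate(amp, np.array([[0.0], [np.pi]]), w)  # destructive interference
print(abs(in_phase[0, 0]), abs(opposite[0, 0]))
print(np.allclose(in_phase, w @ to_wave(amp, np.zeros((2, 1)))))  # True
```

With fixed weights, the output magnitude still varies with the input phases, which is what makes the aggregation depend on the tokens themselves.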

Hire-MLP: Vision MLP via Hierarchical Rearrangement

10 code implementations · CVPR 2022 · Jianyuan Guo, Yehui Tang, Kai Han, Xinghao Chen, Han Wu, Chao Xu, Chang Xu, Yunhe Wang

Previous vision MLPs such as MLP-Mixer and ResMLP accept linearly flattened image patches as input, which makes them inflexible to different input sizes and makes it hard for them to capture spatial information.

Image Classification, Object Detection +2

CMT: Convolutional Neural Networks Meet Vision Transformers

14 code implementations · CVPR 2022 · Jianyuan Guo, Kai Han, Han Wu, Yehui Tang, Xinghao Chen, Yunhe Wang, Chang Xu

Vision transformers have been successfully applied to image recognition tasks due to their ability to capture long-range dependencies within an image.

Positive-Unlabeled Data Purification in the Wild for Object Detection

no code implementations · CVPR 2021 · Jianyuan Guo, Kai Han, Han Wu, Chao Zhang, Xinghao Chen, Chunjing Xu, Chang Xu, Yunhe Wang

In this paper, we present a positive-unlabeled learning based scheme to expand training data by purifying valuable images from massive unlabeled ones, where the original training data are viewed as positive data and the unlabeled images in the wild are unlabeled data.

Knowledge Distillation, Object Detection +1

Patch Slimming for Efficient Vision Transformers

no code implementations · CVPR 2022 · Yehui Tang, Kai Han, Yunhe Wang, Chang Xu, Jianyuan Guo, Chao Xu, DaCheng Tao

We first identify the effective patches in the last layer and then use them to guide the patch selection process of previous layers.

Efficient ViTs
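The top-down selection in the Patch Slimming snippet can be sketched as follows, assuming each layer already has an importance score for every patch and a fixed budget of patches to keep. The scores are random placeholders and `top_down_patch_selection` is a hypothetical helper; the paper's own importance measure and budgets are not reproduced here. The property the sketch enforces is that any patch kept in a deeper layer is also kept in every earlier layer.

```python
import numpy as np

def top_down_patch_selection(scores: np.ndarray, budgets: list[int]) -> list[np.ndarray]:
    """Choose which patches each layer keeps, starting from the last layer.

    scores:  (L, N) importance of each of N patches at each of L layers (placeholder values).
    budgets: patches to keep per layer, non-increasing with depth (hypothetical settings).
    Returns L boolean masks; every layer keeps all patches kept by the layer after it.
    """
    num_layers, num_patches = scores.shape
    if len(budgets) != num_layers or any(a < b for a, b in zip(budgets, budgets[1:])):
        raise ValueError("need one budget per layer, non-increasing with depth")
    masks = [np.zeros(num_patches, dtype=bool) for _ in range(num_layers)]
    # Effective patches of the last layer: its top-scoring patches.
    masks[-1][np.argsort(-scores[-1])[: budgets[-1]]] = True
    for l in range(num_layers - 2, -1, -1):
        keep = masks[l + 1].copy()  # patches needed later must also be kept here
        extra = budgets[l] - int(keep.sum())
        candidates = [p for p in np.argsort(-scores[l]) if not keep[p]]
        # Fill the remaining budget with this layer's highest-scoring unused patches.
        keep[np.asarray(candidates[:extra], dtype=np.intp)] = True
        masks[l] = keep
    return masks

rng = np.random.default_rng(0)
scores = rng.random((4, 16))  # 4 layers, 16 patches, random placeholder scores
masks = top_down_patch_selection(scores, budgets=[12, 8, 6, 4])
print([int(m.sum()) for m in masks])  # [12, 8, 6, 4]
```

Working backward from the last layer guarantees that no earlier layer discards a patch that a later layer still relies on.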

Distilling Object Detectors via Decoupled Features

1 code implementation · CVPR 2021 · Jianyuan Guo, Kai Han, Yunhe Wang, Han Wu, Xinghao Chen, Chunjing Xu, Chang Xu

To this end, we present a novel distillation algorithm via decoupled features (DeFeat) for learning a better student detector.

Image Classification, Knowledge Distillation +3

Transformer in Transformer

12 code implementations · NeurIPS 2021 · Kai Han, An Xiao, Enhua Wu, Jianyuan Guo, Chunjing Xu, Yunhe Wang

In this paper, we point out that the attention inside these local patches is also essential for building high-performance visual transformers, and we explore a new architecture, namely Transformer iN Transformer (TNT).

Fine-Grained Image Classification, Sentence
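The idea of attention inside local patches from the TNT snippet can be sketched with NumPy: split the image into patches ("visual sentences"), split each patch into smaller sub-patches ("visual words"), and run self-attention only among the words of the same patch. This is a minimal sketch under stated simplifications: the attention has no learned query, key, or value projections, and the outer transformer that attends across patches (plus the step that feeds word features back into patch embeddings) is omitted. The patch and sub-patch sizes and the helper names are hypothetical.

```python
import numpy as np

def softmax(x: np.ndarray, axis: int = -1) -> np.ndarray:
    x = x - x.max(axis=axis, keepdims=True)  # subtract the max for numerical stability
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def image_to_words(img: np.ndarray, patch: int, sub: int) -> np.ndarray:
    """(H, W, C) image -> (num_patches, words_per_patch, word_dim).

    Each patch is split into sub x sub sub-patches ("words").
    """
    h, w, c = img.shape
    n = patch // sub  # words along each side of a patch
    x = img.reshape(h // patch, n, sub, w // patch, n, sub, c)
    x = x.transpose(0, 3, 1, 4, 2, 5, 6)  # (patch_row, patch_col, word_row, word_col, sub, sub, c)
    return x.reshape((h // patch) * (w // patch), n * n, sub * sub * c)

def inner_attention(words: np.ndarray):
    """Parameter-free self-attention among the words of each patch only."""
    d = words.shape[-1]
    attn = softmax(words @ words.transpose(0, 2, 1) / np.sqrt(d))  # (P, M, M): one map per patch
    return attn @ words, attn

rng = np.random.default_rng(0)
img = rng.random((32, 32, 3))
words = image_to_words(img, patch=16, sub=4)  # 4 patches, 16 words each, 48-dim words
out, attn = inner_attention(words)
print(words.shape, attn.shape, out.shape)  # (4, 16, 48) (4, 16, 16) (4, 16, 48)
```

Since each attention map is computed per patch, a word never attends to words in other patches; cross-patch interaction is left to the outer transformer.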

A Survey on Visual Transformer

no code implementations · 23 Dec 2020 · Kai Han, Yunhe Wang, Hanting Chen, Xinghao Chen, Jianyuan Guo, Zhenhua Liu, Yehui Tang, An Xiao, Chunjing Xu, Yixing Xu, Zhaohui Yang, Yiman Zhang, DaCheng Tao

Transformer, first applied to the field of natural language processing, is a type of deep neural network mainly based on the self-attention mechanism.

Image Classification, Inductive Bias

HourNAS: Extremely Fast Neural Architecture Search Through an Hourglass Lens

6 code implementations · CVPR 2021 · Zhaohui Yang, Yunhe Wang, Xinghao Chen, Jianyuan Guo, Wei Zhang, Chao Xu, Chunjing Xu, DaCheng Tao, Chang Xu

To achieve extremely fast NAS while preserving high accuracy, we propose to identify the vital blocks and make them the priority in the architecture search.

Neural Architecture Search

Hit-Detector: Hierarchical Trinity Architecture Search for Object Detection

1 code implementation · CVPR 2020 · Jianyuan Guo, Kai Han, Yunhe Wang, Chao Zhang, Zhaohui Yang, Han Wu, Xinghao Chen, Chang Xu

To this end, we propose a hierarchical trinity search framework to simultaneously discover efficient architectures for all components (i.e., backbone, neck, and head) of an object detector in an end-to-end manner.

Image Classification, Neural Architecture Search +3

GhostNet: More Features from Cheap Operations

32 code implementations · CVPR 2020 · Kai Han, Yunhe Wang, Qi Tian, Jianyuan Guo, Chunjing Xu, Chang Xu

Deploying convolutional neural networks (CNNs) on embedded devices is difficult due to the limited memory and computation resources.

Image Classification

Beyond Human Parts: Dual Part-Aligned Representations for Person Re-Identification

1 code implementation · ICCV 2019 · Jianyuan Guo, Yuhui Yuan, Lang Huang, Chao Zhang, Jinge Yao, Kai Han

On the other hand, there still exist many useful contextual cues that do not fall into the scope of predefined human parts or attributes.

Human Parsing, Person Re-Identification

Attribute-Aware Attention Model for Fine-grained Representation Learning

1 code implementation · 2 Jan 2019 · Kai Han, Jianyuan Guo, Chao Zhang, Mingjian Zhu

Based on the considerations above, we propose a novel Attribute-Aware Attention Model (A³M), which can learn local attribute representation and global category representation simultaneously in an end-to-end manner.

Attribute, Fine-Grained Image Classification +4

OCNet: Object Context Network for Scene Parsing

8 code implementations · 4 Sep 2018 · Yuhui Yuan, Lang Huang, Jianyuan Guo, Chao Zhang, Xilin Chen, Jingdong Wang

To capture richer context information, we further combine our interlaced sparse self-attention scheme with conventional multi-scale context schemes, including pyramid pooling (Zhao et al., 2017) and atrous spatial pyramid pooling (Chen et al., 2018).

Object Relation +2
