Search Results for author: Geng Yuan

Found 44 papers, 13 papers with code

EfficientFormer: Vision Transformers at MobileNet Speed

10 code implementations · 2 Jun 2022 · Yanyu Li, Geng Yuan, Yang Wen, Ju Hu, Georgios Evangelidis, Sergey Tulyakov, Yanzhi Wang, Jian Ren

Our work proves that properly designed transformers can reach extremely low latency on mobile devices while maintaining high performance.

YOLObile: Real-Time Object Detection on Mobile Devices via Compression-Compilation Co-Design

3 code implementations · 12 Sep 2020 · Yuxuan Cai, Hongjia Li, Geng Yuan, Wei Niu, Yanyu Li, Xulong Tang, Bin Ren, Yanzhi Wang

In this work, we propose the YOLObile framework, which enables real-time object detection on mobile devices via compression-compilation co-design.

Computational Efficiency · Object +2

Teachers Do More Than Teach: Compressing Image-to-Image Models

1 code implementation · CVPR 2021 · Qing Jin, Jian Ren, Oliver J. Woodford, Jiazhuo Wang, Geng Yuan, Yanzhi Wang, Sergey Tulyakov

In this work, we aim to address these issues by introducing a teacher network that provides a search space in which efficient network architectures can be found, in addition to performing knowledge distillation.

Knowledge Distillation
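
A generic soft-target distillation loss in PyTorch is sketched below for reference; this is illustrative only (temperature T and weight alpha are assumed values), not the paper's image-to-image compression pipeline.

```python
# Generic knowledge-distillation loss sketch (classification-style; not the
# paper's image-to-image setup). T and alpha are assumed hyperparameters.
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, targets, T=4.0, alpha=0.7):
    # Soft targets: KL divergence between temperature-softened distributions.
    soft = F.kl_div(
        F.log_softmax(student_logits / T, dim=-1),
        F.softmax(teacher_logits / T, dim=-1),
        reduction="batchmean",
    ) * (T * T)
    # Hard targets: standard cross-entropy against ground-truth labels.
    hard = F.cross_entropy(student_logits, targets)
    return alpha * soft + (1.0 - alpha) * hard

s = torch.randn(8, 10)          # student logits
t = torch.randn(8, 10)          # teacher logits
y = torch.randint(0, 10, (8,))  # labels
print(distillation_loss(s, t, y).item())
```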

Sanity Checks for Lottery Tickets: Does Your Winning Ticket Really Win the Jackpot?

2 code implementations · NeurIPS 2021 · Xiaolong Ma, Geng Yuan, Xuan Shen, Tianlong Chen, Xuxi Chen, Xiaohan Chen, Ning Liu, Minghai Qin, Sijia Liu, Zhangyang Wang, Yanzhi Wang

Based on our analysis, we summarize a guideline for parameter settings with regard to specific architecture characteristics, which we hope will catalyze research progress on the lottery ticket hypothesis.

SPViT: Enabling Faster Vision Transformers via Soft Token Pruning

1 code implementation · 27 Dec 2021 · Zhenglun Kong, Peiyan Dong, Xiaolong Ma, Xin Meng, Mengshu Sun, Wei Niu, Xuan Shen, Geng Yuan, Bin Ren, Minghai Qin, Hao Tang, Yanzhi Wang

Moreover, our framework can guarantee that the identified model meets the resource specifications of mobile devices and FPGAs, and even achieves real-time execution of DeiT-T on mobile platforms.

Efficient ViTs · Model Compression

SparCL: Sparse Continual Learning on the Edge

1 code implementation · 20 Sep 2022 · Zifeng Wang, Zheng Zhan, Yifan Gong, Geng Yuan, Wei Niu, Tong Jian, Bin Ren, Stratis Ioannidis, Yanzhi Wang, Jennifer Dy

SparCL achieves both training acceleration and accuracy preservation through the synergy of three aspects: weight sparsity, data efficiency, and gradient sparsity.

Continual Learning
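
A minimal sketch of combining weight sparsity and gradient sparsity in one training step is given below; it illustrates the general idea only and is not SparCL's actual algorithm (the mask dictionary and top-k ratio are assumptions, and the data-efficiency component is omitted).

```python
# Minimal sketch of weight sparsity + gradient sparsity in a training step
# (illustrative only; not SparCL's actual algorithm).
import torch
import torch.nn.functional as F

def sparse_step(model, batch, masks, optimizer, grad_keep_ratio=0.1):
    x, y = batch
    optimizer.zero_grad()
    loss = F.cross_entropy(model(x), y)
    loss.backward()
    for name, p in model.named_parameters():
        if p.grad is None:
            continue
        if name in masks:                      # weight sparsity: pruned weights
            p.grad.mul_(masks[name])           # receive no updates
        k = max(1, int(grad_keep_ratio * p.grad.numel()))
        thresh = p.grad.abs().flatten().kthvalue(p.grad.numel() - k + 1).values
        p.grad.mul_((p.grad.abs() >= thresh).float())   # gradient sparsity: keep top-k
    optimizer.step()
    return loss.item()

model = torch.nn.Linear(20, 5)
masks = {"weight": (torch.rand(5, 20) > 0.5).float()}   # assumed 50% weight mask
opt = torch.optim.SGD(model.parameters(), lr=0.1)
batch = (torch.randn(16, 20), torch.randint(0, 5, (16,)))
print(sparse_step(model, batch, masks, opt))
```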

MEST: Accurate and Fast Memory-Economic Sparse Training Framework on the Edge

1 code implementation · NeurIPS 2021 · Geng Yuan, Xiaolong Ma, Wei Niu, Zhengang Li, Zhenglun Kong, Ning Liu, Yifan Gong, Zheng Zhan, Chaoyang He, Qing Jin, Siyue Wang, Minghai Qin, Bin Ren, Yanzhi Wang, Sijia Liu, Xue Lin

Systematic evaluations of accuracy, training speed, and memory footprint are conducted, in which the proposed MEST framework consistently outperforms representative SOTA works.

Peeling the Onion: Hierarchical Reduction of Data Redundancy for Efficient Vision Transformer Training

1 code implementation · 19 Nov 2022 · Zhenglun Kong, Haoyu Ma, Geng Yuan, Mengshu Sun, Yanyue Xie, Peiyan Dong, Xin Meng, Xuan Shen, Hao Tang, Minghai Qin, Tianlong Chen, Xiaolong Ma, Xiaohui Xie, Zhangyang Wang, Yanzhi Wang

Vision transformers (ViTs) have recently obtained success in many applications, but their intensive computation and heavy memory usage at both training and inference time limit their generalization.

Pruning-as-Search: Efficient Neural Architecture Search via Channel Pruning and Structural Reparameterization

1 code implementation · 2 Jun 2022 · Yanyu Li, Pu Zhao, Geng Yuan, Xue Lin, Yanzhi Wang, Xin Chen

By combining the structural reparameterization and PaS, we successfully searched out a new family of VGG-like and lightweight networks, which enable the flexibility of arbitrary width with respect to each layer instead of each stage.

Instance Segmentation · Network Pruning +2
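
As a rough illustration of searching per-layer widths through channel pruning, the sketch below uses BatchNorm scaling factors under an L1 penalty as per-channel importance scores; this is a network-slimming-style stand-in, not the paper's exact PaS algorithm.

```python
# Generic channel-pruning-as-width-search sketch (not the exact PaS algorithm).
import torch
import torch.nn as nn

class ScaledConvBlock(nn.Module):
    """Conv + BN + ReLU; BN's gamma acts as a per-channel importance score."""
    def __init__(self, c_in, c_out):
        super().__init__()
        self.conv = nn.Conv2d(c_in, c_out, 3, padding=1, bias=False)
        self.bn = nn.BatchNorm2d(c_out)

    def forward(self, x):
        return torch.relu(self.bn(self.conv(x)))

def l1_on_bn(model, lam=1e-4):
    # Sparsity-inducing penalty on BN scaling factors (added to the task loss).
    return lam * sum(m.weight.abs().sum()
                     for m in model.modules() if isinstance(m, nn.BatchNorm2d))

def surviving_channels(block, threshold=1e-2):
    # Channels whose learned scale stays above the threshold are kept,
    # which effectively "searches" the width of this layer.
    return (block.bn.weight.abs() > threshold).sum().item()

block = ScaledConvBlock(3, 64)
print(surviving_channels(block))  # 64 before any sparsity-regularized training
```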

Layer Freezing & Data Sieving: Missing Pieces of a Generic Framework for Sparse Training

1 code implementation · 22 Sep 2022 · Geng Yuan, Yanyu Li, Sheng Li, Zhenglun Kong, Sergey Tulyakov, Xulong Tang, Yanzhi Wang, Jian Ren

Therefore, we analyze the feasibility of using the layer freezing technique in sparse training and find that it has the potential to save considerable training costs.
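
A minimal PyTorch sketch of the layer-freezing idea is shown below; the freezing point is purely illustrative.

```python
# Minimal layer-freezing sketch: stop computing gradients for early layers
# once they are considered converged (the schedule here is purely illustrative).
import torch.nn as nn

def freeze_prefix(model: nn.Sequential, num_frozen: int):
    for i, layer in enumerate(model):
        requires_grad = i >= num_frozen
        for p in layer.parameters():
            p.requires_grad = requires_grad

model = nn.Sequential(
    nn.Linear(32, 64), nn.ReLU(),
    nn.Linear(64, 64), nn.ReLU(),
    nn.Linear(64, 10),
)

# e.g., freeze the first two modules after some epochs; frozen layers then
# skip both gradient computation and weight updates, saving training cost.
freeze_prefix(model, num_frozen=2)
print([p.requires_grad for p in model.parameters()])
```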

Self-Ensemble Protection: Training Checkpoints Are Good Data Protectors

1 code implementation · 22 Nov 2022 · Sizhe Chen, Geng Yuan, Xinwen Cheng, Yifan Gong, Minghai Qin, Yanzhi Wang, Xiaolin Huang

In this paper, we uncover them by model checkpoints' gradients, forming the proposed self-ensemble protection (SEP), which is very effective because (1) learning on examples ignored during normal training tends to yield DNNs ignoring normal examples; (2) checkpoints' cross-model gradients are close to orthogonal, meaning that they are as diverse as DNNs with different architectures.
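
Observation (2) can be probed directly by comparing the input gradients of the same example under different checkpoints; in the sketch below, two randomly initialized linear models stand in for saved checkpoints, so it is illustrative only and not the paper's implementation.

```python
# Sketch: measure how similar the input gradients of one example are across
# two checkpoints (small cosine similarity ~ near-orthogonal gradients).
import torch
import torch.nn.functional as F

def input_gradient(model, x, y):
    x = x.clone().requires_grad_(True)
    loss = F.cross_entropy(model(x), y)
    loss.backward()
    return x.grad.detach().flatten()

torch.manual_seed(0)
ckpt_a = torch.nn.Linear(100, 10)   # stand-ins for two saved checkpoints
ckpt_b = torch.nn.Linear(100, 10)
x = torch.randn(1, 100)
y = torch.tensor([3])

g_a = input_gradient(ckpt_a, x, y)
g_b = input_gradient(ckpt_b, x, y)
print(F.cosine_similarity(g_a, g_b, dim=0).item())
```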

Data Level Lottery Ticket Hypothesis for Vision Transformers

1 code implementation · 2 Nov 2022 · Xuan Shen, Zhenglun Kong, Minghai Qin, Peiyan Dong, Geng Yuan, Xin Meng, Hao Tang, Xiaolong Ma, Yanzhi Wang

That is, there exists a subset of input image patches such that a ViT can be trained from scratch by using only this subset of patches and achieve similar accuracy to the ViTs trained by using all image patches.

Analogical Similarity · Informativeness
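
The sketch below illustrates the patch-subset idea in isolation: split images into non-overlapping patches, score them, and keep only a fixed fraction. The variance-based score is a placeholder assumption, not the paper's selection criterion, and no ViT training is shown.

```python
# Sketch: keep only a fixed subset of image patches and discard the rest.
# The variance-based score below is a placeholder, not the paper's criterion.
import torch

def select_patches(images, patch=16, keep_ratio=0.5):
    b, c, h, w = images.shape
    # Split each image into non-overlapping patches: (B, N, C*patch*patch).
    patches = images.unfold(2, patch, patch).unfold(3, patch, patch)
    patches = patches.contiguous().view(b, c, -1, patch, patch)
    patches = patches.permute(0, 2, 1, 3, 4).flatten(2)
    # Score patches (placeholder: pixel variance) and keep the top fraction.
    scores = patches.var(dim=-1)
    k = max(1, int(keep_ratio * patches.shape[1]))
    idx = scores.topk(k, dim=1).indices
    return torch.gather(patches, 1, idx.unsqueeze(-1).expand(-1, -1, patches.shape[-1]))

imgs = torch.randn(2, 3, 224, 224)
subset = select_patches(imgs)
print(subset.shape)  # (2, 98, 768) with 16x16 patches and keep_ratio=0.5
```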

MTS-LOF: Medical Time-Series Representation Learning via Occlusion-Invariant Features

1 code implementation · 19 Oct 2023 · Huayu Li, Ana S. Carreon-Rascon, Xiwen Chen, Geng Yuan, Ao Li

MTS-LOF leverages the strengths of contrastive learning and Masked Autoencoder (MAE) methods, offering a unique approach to representation learning for medical time series data.

Contrastive Learning · Representation Learning +2
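
A toy sketch of MAE-style occlusion on a multichannel time series is given below; it covers segment masking only, and the encoder, decoder, and contrastive objective of MTS-LOF are omitted.

```python
# Toy sketch of MAE-style occlusion for a multichannel time series:
# randomly mask contiguous segments, which an encoder/decoder would then
# be trained to reconstruct (model itself omitted; not MTS-LOF's code).
import torch

def mask_segments(series, segment=25, mask_ratio=0.5):
    b, c, t = series.shape
    n_seg = t // segment
    n_masked = int(mask_ratio * n_seg)
    mask = torch.zeros(b, t, dtype=torch.bool)
    for i in range(b):
        for s in torch.randperm(n_seg)[:n_masked].tolist():
            mask[i, s * segment:(s + 1) * segment] = True
    # Zero out the occluded time steps across all channels.
    masked = series.masked_fill(mask.unsqueeze(1), 0.0)
    return masked, mask

x = torch.randn(4, 8, 1000)          # batch of 8-channel signals
x_masked, mask = mask_segments(x)
print(mask.float().mean().item())    # ~0.5 of the time steps are occluded
```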

Structured Weight Matrices-Based Hardware Accelerators in Deep Neural Networks: FPGAs and ASICs

no code implementations · 28 Mar 2018 · Caiwen Ding, Ao Ren, Geng Yuan, Xiaolong Ma, Jiayu Li, Ning Liu, Bo Yuan, Yanzhi Wang

For FPGA implementations of deep convolutional neural networks (DCNNs), we achieve at least 152X and 72X improvements in performance and energy efficiency, respectively, using the SWM-based framework, compared with the IBM TrueNorth processor baseline under the same accuracy constraints on the MNIST, SVHN, and CIFAR-10 datasets.

An Area and Energy Efficient Design of Domain-Wall Memory-Based Deep Convolutional Neural Networks using Stochastic Computing

no code implementations · 3 Feb 2018 · Xiaolong Ma, Yi-Peng Zhang, Geng Yuan, Ao Ren, Zhe Li, Jie Han, Jingtong Hu, Yanzhi Wang

However, these works neglect memory design optimization for weight storage, which inevitably results in large hardware costs.

ResNet Can Be Pruned 60x: Introducing Network Purification and Unused Path Removal (P-RM) after Weight Pruning

no code implementations · 30 Apr 2019 · Xiaolong Ma, Geng Yuan, Sheng Lin, Zhengang Li, Hao Sun, Yanzhi Wang

State-of-the-art DNN structures involve heavy computation and large memory storage demands, which pose an intensive challenge on DNN framework resources.

Toward Extremely Low Bit and Lossless Accuracy in DNNs with Progressive ADMM

no code implementations · 2 May 2019 · Sheng Lin, Xiaolong Ma, Shaokai Ye, Geng Yuan, Kaisheng Ma, Yanzhi Wang

Weight quantization is one of the most important model compression techniques for Deep Neural Networks (DNNs).

Model Compression · Quantization
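
For context, a plain symmetric uniform quantizer for low-bit weights is sketched below; it shows the quantizer only, not the paper's progressive ADMM training procedure.

```python
# Minimal symmetric uniform quantizer sketch (the quantizer only; the
# paper's progressive ADMM training loop is not reproduced here).
import torch

def quantize_weights(w, num_bits=3):
    qmax = 2 ** (num_bits - 1) - 1             # e.g. 3 positive levels for 3 bits
    scale = w.abs().max() / qmax
    w_q = torch.clamp(torch.round(w / scale), -qmax, qmax)
    return w_q * scale, scale                  # dequantized weights and step size

w = torch.randn(64, 64)
w_hat, s = quantize_weights(w, num_bits=3)
print(torch.unique(torch.round(w_hat / s)).numel())  # at most 2*qmax + 1 levels
```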

Non-Structured DNN Weight Pruning -- Is It Beneficial in Any Platform?

no code implementations · 3 Jul 2019 · Xiaolong Ma, Sheng Lin, Shaokai Ye, Zhezhi He, Linfeng Zhang, Geng Yuan, Sia Huat Tan, Zhengang Li, Deliang Fan, Xuehai Qian, Xue Lin, Kaisheng Ma, Yanzhi Wang

Based on the proposed comparison framework, under the same accuracy and quantization settings, the results show that non-structured pruning is not competitive in terms of either storage or computation efficiency.

Model Compression · Quantization

Tiny but Accurate: A Pruned, Quantized and Optimized Memristor Crossbar Framework for Ultra Efficient DNN Implementation

no code implementations · 27 Aug 2019 · Xiaolong Ma, Geng Yuan, Sheng Lin, Caiwen Ding, Fuxun Yu, Tao Liu, Wujie Wen, Xiang Chen, Yanzhi Wang

To mitigate the challenges, the memristor crossbar array has emerged as an intrinsically suitable matrix computation and low-power acceleration framework for DNN applications.

Model Compression · Quantization

An Ultra-Efficient Memristor-Based DNN Framework with Structured Weight Pruning and Quantization Using ADMM

no code implementations · 29 Aug 2019 · Geng Yuan, Xiaolong Ma, Caiwen Ding, Sheng Lin, Tianyun Zhang, Zeinab S. Jalali, Yilong Zhao, Li Jiang, Sucheta Soundarajan, Yanzhi Wang

Memristor-based weight pruning and weight quantization have been separately investigated and proven effective in reducing area and power consumption compared to the original DNN model.

Quantization

A SOT-MRAM-based Processing-In-Memory Engine for Highly Compressed DNN Implementation

no code implementations · 24 Nov 2019 · Geng Yuan, Xiaolong Ma, Sheng Lin, Zhengang Li, Caiwen Ding

Thus, the footprint and power consumption of SOT-MRAM PIM can be reduced while the overall system throughput is increased, making our proposed ADMM-based SOT-MRAM PIM more energy efficient and suitable for embedded systems or IoT devices.

Model Compression · Quantization

SS-Auto: A Single-Shot, Automatic Structured Weight Pruning Framework of DNNs with Ultra-High Efficiency

no code implementations · 23 Jan 2020 · Zhengang Li, Yifan Gong, Xiaolong Ma, Sijia Liu, Mengshu Sun, Zheng Zhan, Zhenglun Kong, Geng Yuan, Yanzhi Wang

Structured weight pruning is a representative DNN model compression technique for hardware efficiency and inference acceleration.

Model Compression

Real-Time Execution of Large-scale Language Models on Mobile

no code implementations · 15 Sep 2020 · Wei Niu, Zhenglun Kong, Geng Yuan, Weiwen Jiang, Jiexiong Guan, Caiwen Ding, Pu Zhao, Sijia Liu, Bin Ren, Yanzhi Wang

Our framework can guarantee that the identified model meets both the resource and real-time specifications of mobile devices, thus achieving real-time execution of large transformer-based models such as BERT variants.

Edge-computing

ClickTrain: Efficient and Accurate End-to-End Deep Learning Training via Fine-Grained Architecture-Preserving Pruning

no code implementations · 20 Nov 2020 · Chengming Zhang, Geng Yuan, Wei Niu, Jiannan Tian, Sian Jin, Donglin Zhuang, Zhe Jiang, Yanzhi Wang, Bin Ren, Shuaiwen Leon Song, Dingwen Tao

Moreover, compared with the state-of-the-art pruning-during-training approach, ClickTrain provides significant improvements in both accuracy and compression ratio on the tested CNN models and datasets under a similarly limited training time.

Lottery Ticket Preserves Weight Correlation: Is It Desirable or Not?

no code implementations · 19 Feb 2021 · Ning Liu, Geng Yuan, Zhengping Che, Xuan Shen, Xiaolong Ma, Qing Jin, Jian Ren, Jian Tang, Sijia Liu, Yanzhi Wang

In deep model compression, the recent finding "Lottery Ticket Hypothesis" (LTH) (Frankle & Carbin, 2018) pointed out that there could exist a winning ticket (i.e., a properly pruned sub-network together with its original weight initialization) that can achieve performance competitive with the original dense network.

Model Compression
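
The winning-ticket notion quoted above amounts to: train the dense network, prune by weight magnitude, and rewind the surviving weights to the original initialization. A compact single-round sketch (the no-op train_fn is a placeholder):

```python
# Compact, single-round lottery-ticket sketch: train a dense model, build a
# magnitude mask, then rewind surviving weights to their original init.
import copy
import torch
import torch.nn as nn

def find_ticket(model, train_fn, prune_ratio=0.8):
    init_state = copy.deepcopy(model.state_dict())   # original initialization
    train_fn(model)                                   # train the dense network
    masks = {}
    for name, p in model.named_parameters():
        if p.dim() > 1:                               # prune weight matrices only
            k = int(prune_ratio * p.numel())
            thresh = p.abs().flatten().kthvalue(k).values if k > 0 else 0.0
            masks[name] = (p.abs() > thresh).float()
    model.load_state_dict(init_state)                 # rewind to init
    with torch.no_grad():
        for name, p in model.named_parameters():
            if name in masks:
                p.mul_(masks[name])                   # the "winning ticket"
    return model, masks

net = nn.Sequential(nn.Linear(20, 50), nn.ReLU(), nn.Linear(50, 2))
net, masks = find_ticket(net, train_fn=lambda m: None)  # no-op training here
print({n: int(m.sum()) for n, m in masks.items()})
```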

A Compression-Compilation Framework for On-mobile Real-time BERT Applications

no code implementations · 30 May 2021 · Wei Niu, Zhenglun Kong, Geng Yuan, Weiwen Jiang, Jiexiong Guan, Caiwen Ding, Pu Zhao, Sijia Liu, Bin Ren, Yanzhi Wang

In this paper, we propose a compression-compilation co-design framework that can guarantee that the identified model meets both the resource and real-time specifications of mobile devices.

Question Answering · Text Generation

FORMS: Fine-grained Polarized ReRAM-based In-situ Computation for Mixed-signal DNN Accelerator

no code implementations · 16 Jun 2021 · Geng Yuan, Payman Behnam, Zhengang Li, Ali Shafiee, Sheng Lin, Xiaolong Ma, Hang Liu, Xuehai Qian, Mahdi Nazm Bojnordi, Yanzhi Wang, Caiwen Ding

With weights stored in the ReRAM crossbar cells as conductance, when the input vector is applied to word lines, the matrix-vector multiplication results can be generated as the current in bit lines.
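
The analog computation described above is a matrix-vector product: with crossbar conductances G and word-line voltages V, the bit-line currents are I = G·V. A toy numeric illustration with arbitrary values:

```python
# Toy illustration of analog matrix-vector multiplication in a crossbar:
# bit-line currents are the product of the conductance matrix and the
# word-line voltage vector (I = G @ V). Values are arbitrary.
import numpy as np

G = np.array([[1.0, 0.5, 0.0],      # conductances (arbitrary units), one row
              [0.2, 0.8, 0.3]])     # per bit line, one column per word line
V = np.array([0.1, 0.2, 0.3])       # word-line input voltages

I = G @ V                            # currents summed on each bit line
print(I)                             # currents: 0.2 and 0.27
```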

Towards Fast and Accurate Multi-Person Pose Estimation on Mobile Devices

no code implementations · 6 Jun 2021 · Xuan Shen, Geng Yuan, Wei Niu, Xiaolong Ma, Jiexiong Guan, Zhengang Li, Bin Ren, Yanzhi Wang

The rapid development of autonomous driving, abnormal behavior detection, and behavior recognition makes an increasing demand for multi-person pose estimation-based applications, especially on mobile platforms.

Autonomous Driving · Multi-Person Pose Estimation

Achieving on-Mobile Real-Time Super-Resolution with Neural Architecture and Pruning Search

no code implementations · ICCV 2021 · Zheng Zhan, Yifan Gong, Pu Zhao, Geng Yuan, Wei Niu, Yushu Wu, Tianyun Zhang, Malith Jayaweera, David Kaeli, Bin Ren, Xue Lin, Yanzhi Wang

Though recent years have witnessed remarkable progress in single image super-resolution (SISR) tasks with the prosperous development of deep neural networks (DNNs), the deep learning methods are confronted with the computation and memory consumption issues in practice, especially for resource-limited platforms such as mobile devices.

Image Super-Resolution · Neural Architecture Search +1

Automatic Mapping of the Best-Suited DNN Pruning Schemes for Real-Time Mobile Acceleration

no code implementations · 22 Nov 2021 · Yifan Gong, Geng Yuan, Zheng Zhan, Wei Niu, Zhengang Li, Pu Zhao, Yuxuan Cai, Sijia Liu, Bin Ren, Xue Lin, Xulong Tang, Yanzhi Wang

Weight pruning is an effective model compression technique to tackle the challenges of achieving real-time deep neural network (DNN) inference on mobile devices.

Model Compression

Real-Time Portrait Stylization on the Edge

no code implementations · 2 Jun 2022 · Yanyu Li, Xuan Shen, Geng Yuan, Jiexiong Guan, Wei Niu, Hao Tang, Bin Ren, Yanzhi Wang

In this work, we demonstrate real-time portrait stylization, specifically translating self-portraits into cartoon or anime style on mobile devices.

Auto-ViT-Acc: An FPGA-Aware Automatic Acceleration Framework for Vision Transformer with Mixed-Scheme Quantization

no code implementations · 10 Aug 2022 · Zhengang Li, Mengshu Sun, Alec Lu, Haoyu Ma, Geng Yuan, Yanyue Xie, Hao Tang, Yanyu Li, Miriam Leeser, Zhangyang Wang, Xue Lin, Zhenman Fang

Compared with state-of-the-art ViT quantization work (algorithmic approach only without hardware acceleration), our quantization achieves 0.47% to 1.36% higher Top-1 accuracy under the same bit-width.

Quantization

Towards Artificial General Intelligence (AGI) in the Internet of Things (IoT): Opportunities and Challenges

no code implementations · 14 Sep 2023 · Fei Dou, Jin Ye, Geng Yuan, Qin Lu, Wei Niu, Haijian Sun, Le Guan, Guoyu Lu, Gengchen Mai, Ninghao Liu, Jin Lu, Zhengliang Liu, Zihao Wu, Chenjiao Tan, Shaochen Xu, Xianqiao Wang, Guoming Li, Lilong Chai, Sheng Li, Jin Sun, Hongyue Sun, Yunli Shao, Changying Li, Tianming Liu, WenZhan Song

Artificial General Intelligence (AGI), possessing the capacity to comprehend, learn, and execute tasks with human cognitive abilities, engenders significant anticipation and intrigue across scientific, commercial, and societal arenas.

Decision Making

SupeRBNN: Randomized Binary Neural Network Using Adiabatic Superconductor Josephson Devices

no code implementations · 21 Sep 2023 · Zhengang Li, Geng Yuan, Tomoharu Yamauchi, Zabihi Masoud, Yanyue Xie, Peiyan Dong, Xulong Tang, Nobuyuki Yoshikawa, Devesh Tiwari, Yanzhi Wang, Olivia Chen

Specifically, we investigate the randomized behavior of the AQFP devices and analyze the impact of crossbar size on current attenuation, subsequently formulating the current amplitude into the values suitable for use in BNN computation.

Zero-Space Cost Fault Tolerance for Transformer-based Language Models on ReRAM

no code implementations · 22 Jan 2024 · Bingbing Li, Geng Yuan, Zigeng Wang, Shaoyi Huang, Hongwu Peng, Payman Behnam, Wujie Wen, Hang Liu, Caiwen Ding

Resistive Random Access Memory (ReRAM) has emerged as a promising platform for deep neural networks (DNNs) due to its support for parallel in-situ matrix-vector multiplication.

SmartFRZ: An Efficient Training Framework using Attention-Based Layer Freezing

no code implementations · 30 Jan 2024 · Sheng Li, Geng Yuan, Yue Dai, Youtao Zhang, Yanzhi Wang, Xulong Tang

Therefore, a generic and smart layer-freezing method that can automatically perform "in-situation" layer freezing for different networks during training is still lacking.

EdgeOL: Efficient in-situ Online Learning on Edge Devices

no code implementations · 30 Jan 2024 · Sheng Li, Geng Yuan, Yawen Wu, Yue Dai, Chao Wu, Alex K. Jones, Jingtong Hu, Yanzhi Wang, Xulong Tang

Emerging applications, such as robot-assisted eldercare and object recognition, generally employ deep neural networks (DNNs) and naturally require: i) handling streaming-in inference requests and ii) adapting to possible deployment scenario changes.

Object Recognition

Efficient Pruning of Large Language Model with Adaptive Estimation Fusion

no code implementations · 16 Mar 2024 · Jun Liu, Chao Wu, Changdi Yang, Hao Tang, Haoye Dong, Zhenglun Kong, Geng Yuan, Wei Niu, Dong Huang, Yanzhi Wang

Large language models (LLMs) have become crucial for many generative downstream tasks, leading to an inevitable trend and significant challenge to deploy them efficiently on resource-constrained devices.

Language Modelling · Large Language Model
