Search Results for author: Xiaolong Ma

Found 45 papers, 11 papers with code

Rethinking Graph Lottery Tickets: Graph Sparsity Matters

no code implementations 3 May 2023 Bo Hui, Da Yan, Xiaolong Ma, Wei-Shinn Ku

Therefore, we propose two techniques to improve GNN performance when the graph sparsity is high.

Towards High-Quality and Efficient Video Super-Resolution via Spatial-Temporal Data Overfitting

1 code implementation CVPR 2023 Gen Li, Jie Ji, Minghai Qin, Wei Niu, Bin Ren, Fatemeh Afghah, Linke Guo, Xiaolong Ma

To reconcile such, we propose a novel method for high-quality and efficient video resolution upscaling tasks, which leverages the spatial-temporal information to accurately divide video into chunks, thus keeping the number of chunks as well as the model size to minimum.

Video Super-Resolution

Peeling the Onion: Hierarchical Reduction of Data Redundancy for Efficient Vision Transformer Training

1 code implementation 19 Nov 2022 Zhenglun Kong, Haoyu Ma, Geng Yuan, Mengshu Sun, Yanyue Xie, Peiyan Dong, Xin Meng, Xuan Shen, Hao Tang, Minghai Qin, Tianlong Chen, Xiaolong Ma, Xiaohui Xie, Zhangyang Wang, Yanzhi Wang

Vision transformers (ViTs) have recently obtained success in many applications, but their intensive computation and heavy memory usage at both training and inference time limit their generalization.

Data Level Lottery Ticket Hypothesis for Vision Transformers

1 code implementation 2 Nov 2022 Xuan Shen, Zhenglun Kong, Minghai Qin, Peiyan Dong, Geng Yuan, Xin Meng, Hao Tang, Xiaolong Ma, Yanzhi Wang

That is, there exists a subset of input image patches such that a ViT can be trained from scratch by using only this subset of patches and achieve similar accuracy to the ViTs trained by using all image patches.

Analogical Similarity Informativeness

CHEX: CHannel EXploration for CNN Model Compression

1 code implementation CVPR 2022 Zejiang Hou, Minghai Qin, Fei Sun, Xiaolong Ma, Kun Yuan, Yi Xu, Yen-Kuang Chen, Rong Jin, Yuan Xie, Sun-Yuan Kung

However, conventional pruning methods have limitations in that: they are restricted to pruning process only, and they require a fully pre-trained large model.

Image Classification Instance Segmentation +4

Coarsening the Granularity: Towards Structurally Sparse Lottery Tickets

1 code implementation 9 Feb 2022 Tianlong Chen, Xuxi Chen, Xiaolong Ma, Yanzhi Wang, Zhangyang Wang

The lottery ticket hypothesis (LTH) has shown that dense models contain highly sparse subnetworks (i.e., winning tickets) that can be trained in isolation to match full accuracy.

VAQF: Fully Automatic Software-Hardware Co-Design Framework for Low-Bit Vision Transformer

no code implementations 17 Jan 2022 Mengshu Sun, Haoyu Ma, Guoliang Kang, Yifan Jiang, Tianlong Chen, Xiaolong Ma, Zhangyang Wang, Yanzhi Wang

To the best of our knowledge, this is the first time quantization has been incorporated into ViT acceleration on FPGAs with the help of a fully automatic framework to guide the quantization strategy on the software side and the accelerator implementations on the hardware side given the target frame rate.


SPViT: Enabling Faster Vision Transformers via Soft Token Pruning

1 code implementation 27 Dec 2021 Zhenglun Kong, Peiyan Dong, Xiaolong Ma, Xin Meng, Mengshu Sun, Wei Niu, Xuan Shen, Geng Yuan, Bin Ren, Minghai Qin, Hao Tang, Yanzhi Wang

Moreover, our framework can guarantee the identified model to meet resource specifications of mobile devices and FPGA, and even achieve the real-time execution of DeiT-T on mobile platforms.

Image Classification Model Compression

Load-balanced Gather-scatter Patterns for Sparse Deep Neural Networks

no code implementations 20 Dec 2021 Fei Sun, Minghai Qin, Tianyun Zhang, Xiaolong Ma, Haoran Li, Junwen Luo, Zihao Zhao, Yen-Kuang Chen, Yuan Xie

Our experiments show that GS patterns consistently make better trade-offs between accuracy and computation efficiency compared to conventional structured sparse patterns.

Machine Translation speech-recognition +1

MEST: Accurate and Fast Memory-Economic Sparse Training Framework on the Edge

1 code implementation NeurIPS 2021 Geng Yuan, Xiaolong Ma, Wei Niu, Zhengang Li, Zhenglun Kong, Ning Liu, Yifan Gong, Zheng Zhan, Chaoyang He, Qing Jin, Siyue Wang, Minghai Qin, Bin Ren, Yanzhi Wang, Sijia Liu, Xue Lin

Systematic evaluations of accuracy, training speed, and memory footprint are conducted, where the proposed MEST framework consistently outperforms representative SOTA works.

HFSP: A Hardware-friendly Soft Pruning Framework for Vision Transformers

no code implementations 29 Sep 2021 Zhenglun Kong, Peiyan Dong, Xiaolong Ma, Xin Meng, Mengshu Sun, Wei Niu, Bin Ren, Minghai Qin, Hao Tang, Yanzhi Wang

Recently, Vision Transformer (ViT) has continuously established new milestones in the computer vision field, while the high computation and memory cost makes its propagation in industrial production difficult.

Image Classification Model Compression

Lottery Tickets can have Structural Sparsity

no code implementations 29 Sep 2021 Tianlong Chen, Xuxi Chen, Xiaolong Ma, Yanzhi Wang, Zhangyang Wang

The lottery ticket hypothesis (LTH) has shown that dense models contain highly sparse subnetworks (i.e., $\textit{winning tickets}$) that can be trained in isolation to match full accuracy.

GRIM: A General, Real-Time Deep Learning Inference Framework for Mobile Devices based on Fine-Grained Structured Weight Sparsity

no code implementations 25 Aug 2021 Wei Niu, Zhengang Li, Xiaolong Ma, Peiyan Dong, Gang Zhou, Xuehai Qian, Xue Lin, Yanzhi Wang, Bin Ren

It necessitates the sparse model inference via weight pruning, i.e., DNN weight sparsity, and it is desirable to design a new DNN weight sparsity scheme that can facilitate real-time inference on mobile devices while preserving a high sparse model accuracy.

Code Generation Compiler Optimization

A Topological-Framework to Improve Analysis of Machine Learning Model Performance

no code implementations 9 Jul 2021 Henry Kvinge, Colby Wight, Sarah Akers, Scott Howland, Woongjo Choi, Xiaolong Ma, Luke Gosink, Elizabeth Jurrus, Keerti Kappagantula, Tegan H. Emerson

As both machine learning models and the datasets on which they are evaluated have grown in size and complexity, the practice of using a few summary statistics to understand model performance has become increasingly problematic.

BIG-bench Machine Learning

Sanity Checks for Lottery Tickets: Does Your Winning Ticket Really Win the Jackpot?

2 code implementations NeurIPS 2021 Xiaolong Ma, Geng Yuan, Xuan Shen, Tianlong Chen, Xuxi Chen, Xiaohan Chen, Ning Liu, Minghai Qin, Sijia Liu, Zhangyang Wang, Yanzhi Wang

Based on our analysis, we summarize a guideline for parameter settings with regard to specific architecture characteristics, which we hope will catalyze research progress on the lottery ticket hypothesis.

Effective Model Sparsification by Scheduled Grow-and-Prune Methods

1 code implementation ICLR 2022 Xiaolong Ma, Minghai Qin, Fei Sun, Zejiang Hou, Kun Yuan, Yi Xu, Yanzhi Wang, Yen-Kuang Chen, Rong Jin, Yuan Xie

It addresses the shortcomings of the previous works by repeatedly growing a subset of layers to dense and then pruning them back to sparse after some training.

Image Classification

FORMS: Fine-grained Polarized ReRAM-based In-situ Computation for Mixed-signal DNN Accelerator

no code implementations 16 Jun 2021 Geng Yuan, Payman Behnam, Zhengang Li, Ali Shafiee, Sheng Lin, Xiaolong Ma, Hang Liu, Xuehai Qian, Mahdi Nazm Bojnordi, Yanzhi Wang, Caiwen Ding

With weights stored in the ReRAM crossbar cells as conductance, when the input vector is applied to word lines, the matrix-vector multiplication results can be generated as the current in bit lines.

Towards Fast and Accurate Multi-Person Pose Estimation on Mobile Devices

no code implementations 6 Jun 2021 Xuan Shen, Geng Yuan, Wei Niu, Xiaolong Ma, Jiexiong Guan, Zhengang Li, Bin Ren, Yanzhi Wang

The rapid development of autonomous driving, abnormal behavior detection, and behavior recognition makes an increasing demand for multi-person pose estimation-based applications, especially on mobile platforms.

Autonomous Driving Multi-Person Pose Estimation

Lottery Ticket Preserves Weight Correlation: Is It Desirable or Not?

no code implementations 19 Feb 2021 Ning Liu, Geng Yuan, Zhengping Che, Xuan Shen, Xiaolong Ma, Qing Jin, Jian Ren, Jian Tang, Sijia Liu, Yanzhi Wang

In deep model compression, the recent finding "Lottery Ticket Hypothesis" (LTH) (Frankle & Carbin, 2018) pointed out that there could exist a winning ticket (i.e., a properly pruned sub-network together with original weight initialization) that can achieve performance competitive with the original dense network.

Model Compression

A Unified DNN Weight Compression Framework Using Reweighted Optimization Methods

no code implementations 12 Apr 2020 Tianyun Zhang, Xiaolong Ma, Zheng Zhan, Shanglin Zhou, Minghai Qin, Fei Sun, Yen-Kuang Chen, Caiwen Ding, Makan Fardad, Yanzhi Wang

To address the large model size and intensive computation requirement of deep neural networks (DNNs), weight pruning techniques have been proposed and generally fall into two categories, i.e., static regularization-based pruning and dynamic regularization-based pruning.

A Privacy-Preserving-Oriented DNN Pruning and Mobile Acceleration Framework

no code implementations 13 Mar 2020 Yifan Gong, Zheng Zhan, Zhengang Li, Wei Niu, Xiaolong Ma, Wenhao Wang, Bin Ren, Caiwen Ding, Xue Lin, Xiao-Lin Xu, Yanzhi Wang

Weight pruning of deep neural networks (DNNs) has been proposed to satisfy the limited storage and computing capability of mobile edge devices.

Model Compression Privacy Preserving

SS-Auto: A Single-Shot, Automatic Structured Weight Pruning Framework of DNNs with Ultra-High Efficiency

no code implementations 23 Jan 2020 Zhengang Li, Yifan Gong, Xiaolong Ma, Sijia Liu, Mengshu Sun, Zheng Zhan, Zhenglun Kong, Geng Yuan, Yanzhi Wang

Structured weight pruning is a representative model compression technique of DNNs for hardware efficiency and inference accelerations.

Model Compression

An Image Enhancing Pattern-based Sparsity for Real-time Inference on Mobile Devices

no code implementations ECCV 2020 Xiaolong Ma, Wei Niu, Tianyun Zhang, Sijia Liu, Sheng Lin, Hongjia Li, Xiang Chen, Jian Tang, Kaisheng Ma, Bin Ren, Yanzhi Wang

Weight pruning has been widely acknowledged as a straightforward and effective method to eliminate redundancy in Deep Neural Networks (DNN), thereby achieving acceleration on various platforms.

Code Generation Compiler Optimization

PatDNN: Achieving Real-Time DNN Execution on Mobile Devices with Pattern-based Weight Pruning

no code implementations 1 Jan 2020 Wei Niu, Xiaolong Ma, Sheng Lin, Shihao Wang, Xuehai Qian, Xue Lin, Yanzhi Wang, Bin Ren

Weight pruning of DNNs is proposed, but existing schemes represent two extremes in the design space: non-structured pruning is fine-grained, accurate, but not hardware friendly; structured pruning is coarse-grained, hardware-efficient, but with higher accuracy loss.

Code Generation Model Compression

A SOT-MRAM-based Processing-In-Memory Engine for Highly Compressed DNN Implementation

no code implementations 24 Nov 2019 Geng Yuan, Xiaolong Ma, Sheng Lin, Zhengang Li, Caiwen Ding

Thus, the footprint and power consumption of SOT-MRAM PIM can be reduced while the overall system throughput increases, making our proposed ADMM-based SOT-MRAM PIM more energy efficient and suitable for embedded systems or IoT devices.

Model Compression Quantization

PCONV: The Missing but Desirable Sparsity in DNN Weight Pruning for Real-time Execution on Mobile Devices

no code implementations 6 Sep 2019 Xiaolong Ma, Fu-Ming Guo, Wei Niu, Xue Lin, Jian Tang, Kaisheng Ma, Bin Ren, Yanzhi Wang

Model compression techniques on Deep Neural Network (DNN) have been widely acknowledged as an effective way to achieve acceleration on a variety of platforms, and DNN weight pruning is a straightforward and effective method.

Model Compression

An Ultra-Efficient Memristor-Based DNN Framework with Structured Weight Pruning and Quantization Using ADMM

no code implementations 29 Aug 2019 Geng Yuan, Xiaolong Ma, Caiwen Ding, Sheng Lin, Tianyun Zhang, Zeinab S. Jalali, Yilong Zhao, Li Jiang, Sucheta Soundarajan, Yanzhi Wang

Memristor-based weight pruning and weight quantization have been separately investigated and proven effective in reducing area and power consumption compared to the original DNN model.


Tiny but Accurate: A Pruned, Quantized and Optimized Memristor Crossbar Framework for Ultra Efficient DNN Implementation

no code implementations 27 Aug 2019 Xiaolong Ma, Geng Yuan, Sheng Lin, Caiwen Ding, Fuxun Yu, Tao Liu, Wujie Wen, Xiang Chen, Yanzhi Wang

To mitigate the challenges, the memristor crossbar array has emerged as an intrinsically suitable matrix computation and low-power acceleration framework for DNN applications.

Model Compression Quantization

AutoCompress: An Automatic DNN Structured Pruning Framework for Ultra-High Compression Rates

no code implementations 6 Jul 2019 Ning Liu, Xiaolong Ma, Zhiyuan Xu, Yanzhi Wang, Jian Tang, Jieping Ye

This work proposes AutoCompress, an automatic structured pruning framework with the following key performance improvements: (i) effectively incorporate the combination of structured pruning schemes in the automatic process; (ii) adopt the state-of-art ADMM-based structured weight pruning as the core algorithm, and propose an innovative additional purification step for further weight reduction without accuracy loss; and (iii) develop effective heuristic search method enhanced by experience-based guided search, replacing the prior deep reinforcement learning technique which has underlying incompatibility with the target pruning problem.

Model Compression

Non-Structured DNN Weight Pruning -- Is It Beneficial in Any Platform?

no code implementations 3 Jul 2019 Xiaolong Ma, Sheng Lin, Shaokai Ye, Zhezhi He, Linfeng Zhang, Geng Yuan, Sia Huat Tan, Zhengang Li, Deliang Fan, Xuehai Qian, Xue Lin, Kaisheng Ma, Yanzhi Wang

Based on the proposed comparison framework, with the same accuracy and quantization, the results show that non-structured pruning is not competitive in terms of both storage and computation efficiency.

Model Compression Quantization

26ms Inference Time for ResNet-50: Towards Real-Time Execution of all DNNs on Smartphone

no code implementations 2 May 2019 Wei Niu, Xiaolong Ma, Yanzhi Wang, Bin Ren

With the rapid emergence of a spectrum of high-end mobile devices, many applications that required desktop-level computation capability formerly can now run on these devices without any problem.

Model Compression

Toward Extremely Low Bit and Lossless Accuracy in DNNs with Progressive ADMM

no code implementations 2 May 2019 Sheng Lin, Xiaolong Ma, Shaokai Ye, Geng Yuan, Kaisheng Ma, Yanzhi Wang

Weight quantization is one of the most important techniques for model compression of Deep Neural Networks (DNNs).

Model Compression Quantization

ResNet Can Be Pruned 60x: Introducing Network Purification and Unused Path Removal (P-RM) after Weight Pruning

no code implementations 30 Apr 2019 Xiaolong Ma, Geng Yuan, Sheng Lin, Zhengang Li, Hao Sun, Yanzhi Wang

The state-of-art DNN structures involve high computation and great demand for memory storage which pose intensive challenge on DNN framework resources.

Progressive DNN Compression: A Key to Achieve Ultra-High Weight Pruning and Quantization Rates using ADMM

2 code implementations 23 Mar 2019 Shaokai Ye, Xiaoyu Feng, Tianyun Zhang, Xiaolong Ma, Sheng Lin, Zhengang Li, Kaidi Xu, Wujie Wen, Sijia Liu, Jian Tang, Makan Fardad, Xue Lin, Yongpan Liu, Yanzhi Wang

A recent work developed a systematic framework of DNN weight pruning using the advanced optimization technique ADMM (Alternating Direction Methods of Multipliers), achieving one of the state-of-the-art results in weight pruning.

Model Compression Quantization

StructADMM: A Systematic, High-Efficiency Framework of Structured Weight Pruning for DNNs

1 code implementation 29 Jul 2018 Tianyun Zhang, Shaokai Ye, Kaiqi Zhang, Xiaolong Ma, Ning Liu, Linfeng Zhang, Jian Tang, Kaisheng Ma, Xue Lin, Makan Fardad, Yanzhi Wang

Without loss of accuracy on the AlexNet model, we achieve 2.58X and 3.65X average measured speedup on two GPUs, clearly outperforming the prior work.

Model Compression

Structured Weight Matrices-Based Hardware Accelerators in Deep Neural Networks: FPGAs and ASICs

no code implementations 28 Mar 2018 Caiwen Ding, Ao Ren, Geng Yuan, Xiaolong Ma, Jiayu Li, Ning Liu, Bo Yuan, Yanzhi Wang

For FPGA implementations on deep convolutional neural networks (DCNNs), we achieve at least 152X and 72X improvement in performance and energy efficiency, respectively using the SWM-based framework, compared with the baseline of IBM TrueNorth processor under same accuracy constraints using the data set of MNIST, SVHN, and CIFAR-10.

C3PO: Database and Benchmark for Early-stage Malicious Activity Detection in 3D Printing

no code implementations 20 Mar 2018 Zhe Li, Xiaolong Ma, Hongjia Li, Qiyuan An, Aditya Singh Rathore, Qinru Qiu, Wenyao Xu, Yanzhi Wang

It is of vital importance to enable 3D printers to identify the objects to be printed, so that the manufacturing procedure of an illegal weapon can be terminated at the early stage.

Action Detection Activity Detection +1

Image Dataset for Visual Objects Classification in 3D Printing

no code implementations 15 Feb 2018 Hongjia Li, Xiaolong Ma, Aditya Singh Rathore, Zhe Li, Qiyuan An, Chen Song, Wenyao Xu, Yanzhi Wang

The rapid development in additive manufacturing (AM), also known as 3D printing, has brought about potential risk and security issues along with significant benefits.

Classification General Classification

An Area and Energy Efficient Design of Domain-Wall Memory-Based Deep Convolutional Neural Networks using Stochastic Computing

no code implementations 3 Feb 2018 Xiaolong Ma, Yi-Peng Zhang, Geng Yuan, Ao Ren, Zhe Li, Jie Han, Jingtong Hu, Yanzhi Wang

However, in these works, the memory design optimization is neglected for weight storage, which will inevitably result in large hardware cost.

Person Re-Identification by Unsupervised Video Matching

no code implementations 25 Nov 2016 Xiaolong Ma, Xiatian Zhu, Shaogang Gong, Xudong Xie, Jianming Hu, Kin-Man Lam, Yisheng Zhong

Crucially, this model does not require pairwise labelled training data (i.e., it is unsupervised) and is therefore readily scalable to large-scale camera networks of arbitrary camera pairs without the need for exhaustive data annotation for every camera pair.

Benchmarking Dynamic Time Warping +2
