Search Results for author: Liangzhen Lai

Found 20 papers, 6 papers with code

Folding Attention: Memory and Power Optimization for On-Device Transformer-based Streaming Speech Recognition

no code implementations · 14 Sep 2023 · Yang Li, Liangzhen Lai, Yuan Shangguan, Forrest N. Iandola, Zhaoheng Ni, Ernie Chang, Yangyang Shi, Vikas Chandra

The bottleneck instead lies in the linear projection layers of multi-head attention and feedforward networks, which constitute a substantial portion of the model size and contribute significantly to computation, memory, and power usage.

Speech Recognition
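To make the size argument concrete, the sketch below counts the weights in one plain transformer encoder layer. It is an illustration under assumed sizes (`d_model=512`, `d_ff=2048`, biases ignored), not the streaming ASR model or the folding technique from the paper; it only shows why the Q/K/V/output projections and the two feedforward matrices dominate the parameter count.

```python
def transformer_layer_params(d_model: int, d_ff: int) -> dict[str, int]:
    """Weight counts (biases ignored) for one standard transformer encoder layer."""
    # Q, K, V and output projections: four d_model x d_model matrices.
    attn_proj = 4 * d_model * d_model
    # Feedforward network: d_model x d_ff followed by d_ff x d_model.
    ffn = 2 * d_model * d_ff
    # Two LayerNorms, each with a scale and a shift vector of length d_model.
    layer_norm = 2 * 2 * d_model
    return {"attn_proj": attn_proj, "ffn": ffn, "layer_norm": layer_norm}


# Illustrative sizes only; not the configuration used in the paper.
counts = transformer_layer_params(d_model=512, d_ff=2048)
total = sum(counts.values())
linear_share = (counts["attn_proj"] + counts["ffn"]) / total
print(counts, f"linear projections: {linear_share:.2%} of weights")
```

The attention score and softmax computations carry no weights at all, so under these assumptions essentially every stored parameter sits in a linear projection.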

GPU-based Private Information Retrieval for On-Device Machine Learning Inference

1 code implementation · 26 Jan 2023 · Maximilian Lam, Jeff Johnson, Wenjie Xiong, Kiwan Maeng, Udit Gupta, Yang Li, Liangzhen Lai, Ilias Leontiadis, Minsoo Rhu, Hsien-Hsin S. Lee, Vijay Janapa Reddi, Gu-Yeon Wei, David Brooks, G. Edward Suh

For various on-device ML applications such as recommendation and language modeling, our system on a single V100 GPU can serve up to 100,000 queries per second (a >100x throughput improvement over a CPU-based baseline) while maintaining model accuracy.

Information Retrieval, Language Modelling
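For readers new to private information retrieval (PIR), the sketch below shows the classic two-server XOR scheme from the PIR literature. It is a textbook illustration assuming two non-colluding servers, not the GPU system described in the paper. Each server sees only a uniformly random bit vector, so neither learns which record was requested, yet XOR-ing the two answers recovers it. Note that every query touches every record, which is the kind of bulk, data-parallel work a GPU can accelerate.

```python
import secrets


def make_query(db_size: int, index: int) -> tuple[list[int], list[int]]:
    """Split a one-hot selection of `index` into two random XOR shares."""
    share_a = [secrets.randbits(1) for _ in range(db_size)]
    share_b = list(share_a)
    share_b[index] ^= 1  # the two shares differ only at the requested index
    return share_a, share_b


def answer(db: list[int], share: list[int]) -> int:
    """Server side: XOR together every record whose share bit is set."""
    result = 0
    for record, bit in zip(db, share):
        if bit:
            result ^= record
    return result


db = [0x11, 0x22, 0x33, 0x44, 0x55]
qa, qb = make_query(len(db), index=3)
record = answer(db, qa) ^ answer(db, qb)  # records selected by both shares cancel out
print(hex(record))  # 0x44
```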

DREAM: A Dynamic Scheduler for Dynamic Real-time Multi-model ML Workloads

no code implementations · 7 Dec 2022 · Seah Kim, Hyoukjun Kwon, Jinook Song, Jihyuck Jo, Yu-Hsin Chen, Liangzhen Lai, Vikas Chandra

Such dynamic behaviors introduce new challenges for the system software of an ML system since, unlike with traditional ML workloads, the overall system load is not completely predictable.

Scheduling

Low-Rank+Sparse Tensor Compression for Neural Networks

no code implementations · 2 Nov 2021 · Cole Hawkins, Haichuan Yang, Meng Li, Liangzhen Lai, Vikas Chandra

Low-rank tensor compression has been proposed as a promising approach to reduce the memory and compute requirements of neural networks for their deployment on edge devices.

Tensor Decomposition
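The low-rank plus sparse idea can be illustrated in the matrix case: approximate a weight matrix as a low-rank part plus a handful of sparse corrections. The sketch below is a simple one-shot heuristic (truncated SVD, then keep the largest residual entries), assuming NumPy is available; it is not the paper's tensor-format method or training procedure, and `low_rank_plus_sparse` and `stored_values` are hypothetical helper names.

```python
import numpy as np


def low_rank_plus_sparse(w: np.ndarray, rank: int, nnz: int) -> tuple[np.ndarray, np.ndarray]:
    """Approximate w as L + S, where L is the rank-`rank` truncated SVD of w and
    S keeps the `nnz` largest-magnitude entries of the residual w - L."""
    u, s, vt = np.linalg.svd(w, full_matrices=False)
    low_rank = (u[:, :rank] * s[:rank]) @ vt[:rank, :]
    residual = w - low_rank
    sparse = np.zeros_like(w)
    if nnz > 0:
        # Flat indices of the nnz residual entries with the largest magnitude.
        top = np.argpartition(np.abs(residual).ravel(), -nnz)[-nnz:]
        rows, cols = np.unravel_index(top, w.shape)
        sparse[rows, cols] = residual[rows, cols]
    return low_rank, sparse


def stored_values(shape: tuple[int, int], rank: int, nnz: int) -> int:
    """Values kept by the factorized form: two rank-r factors plus the sparse
    nonzeros (index storage for S is not counted)."""
    m, n = shape
    return rank * (m + n) + nnz


rng = np.random.default_rng(0)
# Synthetic matrix: exactly rank 4, plus up to 20 large "outlier" entries.
w = rng.standard_normal((64, 4)) @ rng.standard_normal((4, 64))
w[rng.integers(0, 64, 20), rng.integers(0, 64, 20)] += 10.0

L, S = low_rank_plus_sparse(w, rank=4, nnz=20)
err_low_rank = np.linalg.norm(w - L) / np.linalg.norm(w)
err_combined = np.linalg.norm(w - L - S) / np.linalg.norm(w)
print(f"rank-4 only: {err_low_rank:.3f}, rank-4 + 20 sparse: {err_combined:.3f}")
print(stored_values(w.shape, rank=4, nnz=20), "values vs", w.size, "dense")
```

A pure low-rank fit has to spend rank on the outliers; letting a few sparse entries absorb them is what makes the combined form more accurate at a similar storage budget.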

Multi-Scale High-Resolution Vision Transformer for Semantic Segmentation

1 code implementation · CVPR 2022 · Jiaqi Gu, Hyoukjun Kwon, Dilin Wang, Wei Ye, Meng Li, Yu-Hsin Chen, Liangzhen Lai, Vikas Chandra, David Z. Pan

Therefore, we propose HRViT, which enhances ViTs to learn semantically-rich and spatially-precise multi-scale representations by integrating high-resolution multi-branch architectures with ViTs.

Image Classification, Representation Learning

Improving Efficiency in Neural Network Accelerator Using Operands Hamming Distance optimization

no code implementations · 13 Feb 2020 · Meng Li, Yilei Li, Pierce Chuang, Liangzhen Lai, Vikas Chandra

Neural network accelerators are a key enabler for on-device AI inference, for which energy efficiency is an important metric.
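Dynamic power on an operand bus grows roughly with switching activity, i.e. the Hamming distance between consecutive operands. The sketch below counts those bit toggles for a stream of 8-bit operands and applies a hypothetical greedy reordering that issues the nearest remaining operand next. It is an illustration of the metric only, not the optimization proposed in the paper; in a real MAC array, each weight would have to be reordered together with its paired activation so the accumulated sum is unchanged.

```python
def hamming(a: int, b: int) -> int:
    """Number of differing bits between two 8-bit operands."""
    return bin((a ^ b) & 0xFF).count("1")


def total_toggles(seq: list[int]) -> int:
    """Total bit flips on an 8-bit operand bus when seq is fed in order."""
    return sum(hamming(x, y) for x, y in zip(seq, seq[1:]))


def greedy_reorder(seq: list[int]) -> list[int]:
    """Hypothetical heuristic: repeatedly issue the remaining operand with the
    smallest Hamming distance to the one issued last."""
    remaining = list(seq)
    order = [remaining.pop(0)]
    while remaining:
        nxt = min(remaining, key=lambda v: hamming(order[-1], v))
        remaining.remove(nxt)
        order.append(nxt)
    return order


weights = [0b00000000, 0b11111111, 0b00000001, 0b11111110]
reordered = greedy_reorder(weights)
print(total_toggles(weights), "->", total_toggles(reordered))  # 23 -> 9
```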

Co-Exploration of Neural Architectures and Heterogeneous ASIC Accelerator Designs Targeting Multiple Tasks

no code implementations · 10 Feb 2020 · Lei Yang, Zheyu Yan, Meng Li, Hyoukjun Kwon, Liangzhen Lai, Tushar Krishna, Vikas Chandra, Weiwen Jiang, Yiyu Shi

Neural Architecture Search (NAS) has demonstrated its power on various AI acceleration platforms such as Field Programmable Gate Arrays (FPGAs) and Graphics Processing Units (GPUs).

Neural Architecture Search

Heterogeneous Dataflow Accelerators for Multi-DNN Workloads

no code implementations · 13 Sep 2019 · Hyoukjun Kwon, Liangzhen Lai, Tushar Krishna, Vikas Chandra

The results suggest that HDAs are an alternative class of Pareto-optimal accelerators to RDAs, with a particular strength in energy efficiency, and can be a better choice than RDAs depending on the use case.

Distributed, Parallel, and Cluster Computing

Rethinking Machine Learning Development and Deployment for Edge Devices

no code implementations · 20 Jun 2018 · Liangzhen Lai, Naveen Suda

Machine learning (ML), especially deep learning, is made possible by the availability of big data, enormous compute power and, often overlooked, development tools or frameworks.

BIG-bench Machine Learning

Federated Learning with Non-IID Data

2 code implementations · 2 Jun 2018 · Yue Zhao, Meng Li, Liangzhen Lai, Naveen Suda, Damon Civin, Vikas Chandra

Experiments show that accuracy can be increased by 30% for the CIFAR-10 dataset with only 5% globally shared data.

Federated Learning
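The shared-data idea can be sketched on top of standard federated averaging (FedAvg): every client trains on its own skewed data plus the same small globally shared subset, and the server averages the returned parameters weighted by sample count. In the sketch below, `with_shared_data` and `fedavg` are hypothetical helpers, the parameter vectors are stand-ins for the result of local training, and counting shared samples in each client's weight is an assumption rather than a detail taken from the paper.

```python
def with_shared_data(client_datasets: list[list], shared: list) -> list[list]:
    """Give every client its local samples plus the same globally shared subset."""
    return [local + shared for local in client_datasets]


def fedavg(client_weights: list[list[float]], client_sizes: list[int]) -> list[float]:
    """Average client parameter vectors, weighting each client by its sample count."""
    total = sum(client_sizes)
    dim = len(client_weights[0])
    return [
        sum(w[i] * n for w, n in zip(client_weights, client_sizes)) / total
        for i in range(dim)
    ]


# Two clients with disjoint labels (an extreme non-IID split), plus a shared pool.
clients = with_shared_data([[("x1", 0), ("x2", 0)], [("x3", 1)]], shared=[("s1", 0), ("s2", 1)])
sizes = [len(d) for d in clients]  # [4, 3]
# Stand-in parameter vectors, as if each client had finished a round of local training.
global_w = fedavg([[1.0, 2.0], [3.0, 4.0]], sizes)
print(global_w)
```

After augmentation, each client sees at least some examples of every class in the shared pool, which is what reduces the divergence between local models trained on non-IID data.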

CMSIS-NN: Efficient Neural Network Kernels for Arm Cortex-M CPUs

1 code implementation · 19 Jan 2018 · Liangzhen Lai, Naveen Suda, Vikas Chandra

Deep Neural Networks are becoming increasingly popular in always-on IoT edge devices performing data analytics right at the source, reducing latency as well as energy consumption for data communication.

Efficient Neural Network
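CMSIS-NN's original q7 kernels work in 8-bit fixed point: 8-bit inputs and weights, 32-bit accumulation, a bias aligned with a left shift (`bias_shift`), and a rounded right shift (`out_shift`) followed by saturation back to 8 bits. The plain-Python reference below mirrors that arithmetic for a fully connected layer to show the number flow. It is a sketch, not the optimized SIMD kernel, and the function names are hypothetical.

```python
def ssat8(x: int) -> int:
    """Saturate to the signed 8-bit range [-128, 127]."""
    return max(-128, min(127, x))


def fully_connected_q7(vec: list[int], weights: list[list[int]], bias: list[int],
                       bias_shift: int, out_shift: int) -> list[int]:
    """q7 fully connected layer: 32-bit accumulate, shifted bias, rounding,
    arithmetic right shift, then saturation to 8 bits."""
    rounding = (1 << (out_shift - 1)) if out_shift > 0 else 0
    out = []
    for row, b in zip(weights, bias):
        acc = (b << bias_shift) + rounding  # align bias with the product scale
        for x, w in zip(vec, row):
            acc += x * w                    # 8-bit x 8-bit products, wide accumulator
        out.append(ssat8(acc >> out_shift)) # Python >> on ints is an arithmetic shift
    return out


out = fully_connected_q7([64, -32], [[127, 10], [-128, 127]], bias=[1, -1],
                         bias_shift=6, out_shift=7)
print(out)  # [62, -96]
```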

Not All Ops Are Created Equal!

no code implementations · 12 Jan 2018 · Liangzhen Lai, Naveen Suda, Vikas Chandra

Efficient and compact neural network models are essential for enabling deployment on mobile and embedded devices.
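One way to see why operation count alone does not predict speed is arithmetic intensity: how many multiply-accumulates (MACs) a layer performs per byte it moves. The sketch below compares a standard 3x3 convolution with a depthwise 3x3 convolution under simplifying assumptions (8-bit data, stride 1, same padding, each tensor moved exactly once). It is an illustrative model, not the measurement methodology of the paper; layers with low intensity tend to be memory-bound and achieve fewer operations per second.

```python
def conv_intensity(h: int, w: int, c_in: int, c_out: int, k: int,
                   depthwise: bool = False) -> tuple[int, float]:
    """Return (MACs, MACs per byte moved) for a stride-1, same-padded conv layer,
    assuming 8-bit data and that each weight and activation is moved exactly once."""
    if depthwise:
        macs = h * w * c_in * k * k
        weight_bytes = c_in * k * k
        out_channels = c_in
    else:
        macs = h * w * c_in * c_out * k * k
        weight_bytes = c_in * c_out * k * k
        out_channels = c_out
    act_bytes = h * w * c_in + h * w * out_channels  # input + output feature maps
    return macs, macs / (weight_bytes + act_bytes)


std_macs, std_int = conv_intensity(32, 32, 64, 64, 3)
dw_macs, dw_int = conv_intensity(32, 32, 64, 64, 3, depthwise=True)
print(f"standard 3x3: {std_macs} MACs, {std_int:.1f} MACs/byte")
print(f"depthwise 3x3: {dw_macs} MACs, {dw_int:.1f} MACs/byte")
```

Under these assumptions the depthwise layer needs 64x fewer MACs but also does roughly 50x less work per byte, so its runtime shrinks far less than its op count suggests.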

Bit Fusion: Bit-Level Dynamically Composable Architecture for Accelerating Deep Neural Networks

no code implementations · 5 Dec 2017 · Hardik Sharma, Jongse Park, Naveen Suda, Liangzhen Lai, Benson Chau, Joon Kyung Kim, Vikas Chandra, Hadi Esmaeilzadeh

Compared to Stripes, BitFusion provides a 2.6x speedup and 3.9x energy reduction at the 45 nm node when BitFusion area and frequency are set to those of Stripes.
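Bit Fusion builds wider multipliers out of small 2-bit units (the paper calls them BitBricks), so a low-precision multiply occupies fewer units and more multiplies can run in parallel. The arithmetic behind that composition is shown below: an unsigned product assembled from 2-bit x 2-bit partial products, shifts, and adds. This sketch covers unsigned operands only and models the math, not the hardware; the function names are hypothetical.

```python
def split_2bit(x: int, bits: int) -> list[int]:
    """Split an unsigned `bits`-wide value into 2-bit chunks, least significant first."""
    if bits % 2 or not 0 <= x < (1 << bits):
        raise ValueError("x must be an unsigned value of even bit width `bits`")
    return [(x >> s) & 0b11 for s in range(0, bits, 2)]


def fused_multiply(a: int, b: int, a_bits: int = 8, b_bits: int = 8) -> tuple[int, int]:
    """Compute unsigned a*b using only 2-bit x 2-bit products, shifts, and adds.
    Returns (product, number of 2-bit multipliers used)."""
    product, bricks = 0, 0
    for i, ca in enumerate(split_2bit(a, a_bits)):
        for j, cb in enumerate(split_2bit(b, b_bits)):
            product += (ca * cb) << (2 * (i + j))  # partial product at weight 4**(i+j)
            bricks += 1
    return product, bricks


print(fused_multiply(200, 123))     # (24600, 16): 8-bit x 8-bit uses 16 units
print(fused_multiply(13, 3, 4, 2))  # (39, 2): 4-bit x 2-bit uses only 2 units
```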

Hello Edge: Keyword Spotting on Microcontrollers

18 code implementations · 20 Nov 2017 · Yundong Zhang, Naveen Suda, Liangzhen Lai, Vikas Chandra

We train various neural network architectures for keyword spotting from the published literature to compare their accuracy and memory/compute requirements.

Keyword Spotting

PrivyNet: A Flexible Framework for Privacy-Preserving Deep Neural Network Training

no code implementations · ICLR 2018 · Meng Li, Liangzhen Lai, Naveen Suda, Vikas Chandra, David Z. Pan

Massive amounts of data reside on users' local platforms, which usually cannot support deep neural network (DNN) training due to computation and storage resource constraints.

General Classification, Image Classification

Deep Convolutional Neural Network Inference with Floating-point Weights and Fixed-point Activations

no code implementations · 8 Mar 2017 · Liangzhen Lai, Naveen Suda, Vikas Chandra

To alleviate these problems to some extent, prior research has used low-precision fixed-point numbers to represent the CNN weights and activations.
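The mixed format in the title can be illustrated with a single dot product: weights stay in floating point while activations are quantized to signed fixed point with a chosen number of fractional bits. The sketch below uses plain Python floats for the weights, which is an assumption; any reduced-precision float format the paper may use is not modeled. Rounding uses Python's `round` (ties to even), and `to_fixed` and `dot_float_weights_fixed_acts` are hypothetical names.

```python
def to_fixed(x: float, frac_bits: int, total_bits: int = 8) -> int:
    """Quantize x to a signed fixed-point integer with `frac_bits` fractional bits,
    rounding to nearest and saturating to the `total_bits` range."""
    q = round(x * (1 << frac_bits))
    lo, hi = -(1 << (total_bits - 1)), (1 << (total_bits - 1)) - 1
    return max(lo, min(hi, q))


def dot_float_weights_fixed_acts(weights: list[float], acts: list[float], frac_bits: int) -> float:
    """Keep weights in float, quantize activations, then undo the fixed-point scale."""
    q_acts = [to_fixed(a, frac_bits) for a in acts]
    return sum(w * q for w, q in zip(weights, q_acts)) / (1 << frac_bits)


weights = [0.5, -1.25]
acts = [1.3, 0.7]
approx = dot_float_weights_fixed_acts(weights, acts, frac_bits=4)  # acts -> 21/16, 11/16
exact = sum(w * a for w, a in zip(weights, acts))
print(approx, exact)
```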
