Search Results for author: Yilong Zhao

Found 4 papers, 1 paper with code

Atom: Low-bit Quantization for Efficient and Accurate LLM Serving

1 code implementation • 29 Oct 2023 • Yilong Zhao, Chien-Yu Lin, Kan Zhu, Zihao Ye, Lequn Chen, Size Zheng, Luis Ceze, Arvind Krishnamurthy, Tianqi Chen, Baris Kasikci

To maximize LLMs' serving throughput, we introduce Atom, a low-bit quantization method that achieves high throughput improvements with negligible accuracy loss.
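
As a rough illustration of the general idea only (not Atom's actual serving kernels, which handle activations and fuse quantization into the inference pipeline), a group-wise symmetric 4-bit weight quantizer might look like the sketch below; the function names, group size, and bit width are assumptions chosen for illustration.

# Minimal, generic sketch of group-wise symmetric 4-bit weight quantization.
# This is NOT Atom's implementation; it only illustrates the low-bit idea.
import numpy as np

def quantize_4bit(weights, group_size=128):
    """Quantize a 1-D weight vector to signed 4-bit codes with per-group scales."""
    w = weights.reshape(-1, group_size)
    # Signed 4-bit range is [-8, 7]; guard against all-zero groups.
    scales = np.maximum(np.abs(w).max(axis=1, keepdims=True), 1e-8) / 7.0
    q = np.clip(np.round(w / scales), -8, 7).astype(np.int8)
    return q, scales

def dequantize_4bit(q, scales):
    """Recover approximate full-precision weights from codes and scales."""
    return (q.astype(np.float32) * scales).reshape(-1)

# Usage: the round-trip error stays small relative to the weight magnitude.
w = np.random.randn(1024).astype(np.float32)
q, s = quantize_4bit(w)
w_hat = dequantize_4bit(q, s)
print(np.abs(w - w_hat).max())
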

Quantization • Sentiment Analysis

Neural-PIM: Efficient Processing-In-Memory with Neural Approximation of Peripherals

no code implementations • 30 Jan 2022 • Weidong Cao, Yilong Zhao, Adith Boloor, Yinhe Han, Xuan Zhang, Li Jiang

This paper presents a new PIM architecture to efficiently accelerate deep learning tasks by minimizing the required A/D conversions with analog accumulation and neural approximated peripheral circuits.

Quantization

An Ultra-Efficient Memristor-Based DNN Framework with Structured Weight Pruning and Quantization Using ADMM

no code implementations • 29 Aug 2019 • Geng Yuan, Xiaolong Ma, Caiwen Ding, Sheng Lin, Tianyun Zhang, Zeinab S. Jalali, Yilong Zhao, Li Jiang, Sucheta Soundarajan, Yanzhi Wang

Memristor-based weight pruning and weight quantization have been separately investigated and proven effective in reducing area and power consumption compared to the original DNN model.
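
For intuition only, the sketch below pairs structured (column-wise) magnitude pruning with uniform weight quantization; the paper's joint ADMM-based optimization is considerably more involved, and the function names, keep ratio, and bit width here are illustrative assumptions.

# Minimal sketch: structured column pruning followed by uniform quantization.
# This swaps in simple magnitude-based pruning in place of the paper's ADMM solver.
import numpy as np

def prune_columns(w, keep_ratio=0.5):
    """Zero out whole columns with the smallest L2 norms (structured sparsity)."""
    norms = np.linalg.norm(w, axis=0)
    k = int(w.shape[1] * keep_ratio)
    keep = np.argsort(norms)[-k:]
    mask = np.zeros(w.shape[1], dtype=bool)
    mask[keep] = True
    return w * mask

def quantize_uniform(w, bits=4):
    """Map weights onto 2**(bits-1)-1 evenly spaced symmetric levels (per-tensor)."""
    levels = 2 ** (bits - 1) - 1
    scale = np.abs(w).max() / levels
    return np.round(w / scale) * scale

# Usage: keep 25% of columns, then quantize the surviving weights to 4 bits.
w = np.random.randn(64, 64).astype(np.float32)
w_compressed = quantize_uniform(prune_columns(w, keep_ratio=0.25))
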

Quantization
