1 code implementation • 10 Feb 2024 • Keisuke Kamahori, Yile Gu, Kan Zhu, Baris Kasikci
Large Language Models (LLMs) based on the Mixture-of-Experts (MoE) architecture show promising performance on various tasks.
1 code implementation • 29 Oct 2023 • Yilong Zhao, Chien-Yu Lin, Kan Zhu, Zihao Ye, Lequn Chen, Size Zheng, Luis Ceze, Arvind Krishnamurthy, Tianqi Chen, Baris Kasikci
To maximize LLMs' serving throughput, we introduce Atom, a low-bit quantization method that achieves high throughput improvements with negligible accuracy loss.
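The snippet above only names the idea; the abstract gives no implementation details, but low-bit weight quantization generally works by scaling each row of a weight matrix into a small integer range and storing the per-row scale. A minimal symmetric int4-style sketch (function names and the per-row granularity are my assumptions, not Atom's actual scheme, which uses more sophisticated mixed-precision techniques):

```python
import numpy as np

def quantize_int4(w):
    """Symmetric per-row quantization sketch (NOT Atom's actual method):
    scale each row so its max magnitude maps to 7, then round to
    integers in [-7, 7]."""
    scale = np.abs(w).max(axis=1, keepdims=True) / 7.0
    scale[scale == 0] = 1.0  # avoid division by zero for all-zero rows
    q = np.clip(np.round(w / scale), -7, 7).astype(np.int8)
    return q, scale

def dequantize(q, scale):
    """Recover an approximation of the original weights."""
    return q.astype(np.float32) * scale

rng = np.random.default_rng(0)
w = rng.standard_normal((4, 8)).astype(np.float32)
q, s = quantize_int4(w)
w_hat = dequantize(q, s)
# Rounding error is bounded by half a quantization step per element.
err = np.abs(w - w_hat).max()
```

The throughput benefit comes from storing and moving 4-bit integers instead of 16- or 32-bit floats; the accuracy cost is the bounded rounding error shown above.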
no code implementations • 1 Jan 2016 • Jerry Luo, Kayla Shapiro, Hao-Jun Michael Shi, Qi Yang, Kan Zhu
Motivated by non-negative matrix factorization, we reformulate our problem as a Frobenius norm minimization problem, solve it with the Alternating Direction Method of Multipliers (ADMM), and develop an algorithm, FroMax.
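The snippet does not spell out FroMax's formulation, but the generic ADMM template it invokes is standard: split the variable, alternate a least-squares update with a projection, and update the dual. A minimal sketch on a related non-negative least-squares subproblem (the function name and problem instance are illustrative assumptions, not the paper's algorithm):

```python
import numpy as np

def admm_nnls(A, b, rho=1.0, iters=200):
    """ADMM sketch for min_x 0.5*||A x - b||^2 s.t. x >= 0,
    using the splitting x = z with the non-negativity constraint on z."""
    m, n = A.shape
    AtA = A.T @ A
    Atb = A.T @ b
    # Factor (A^T A + rho*I) once; it is reused every iteration.
    L = np.linalg.cholesky(AtA + rho * np.eye(n))
    x = np.zeros(n)
    z = np.zeros(n)
    u = np.zeros(n)
    for _ in range(iters):
        rhs = Atb + rho * (z - u)
        x = np.linalg.solve(L.T, np.linalg.solve(L, rhs))  # least-squares step
        z = np.maximum(0.0, x + u)                          # project onto x >= 0
        u = u + x - z                                       # dual update
    return z
```

On `A = I` the solution reduces to clipping `b` at zero, which makes the alternating structure easy to verify by hand.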