Search Results for author: Kan Zhu

Found 3 papers, 2 papers with code

Fiddler: CPU-GPU Orchestration for Fast Inference of Mixture-of-Experts Models

1 code implementation10 Feb 2024 Keisuke Kamahori, Yile Gu, Kan Zhu, Baris Kasikci

Large Language Models (LLMs) based on Mixture-of-Experts (MoE) architecture are showing promising performance on various tasks.

Atom: Low-bit Quantization for Efficient and Accurate LLM Serving

1 code implementation29 Oct 2023 Yilong Zhao, Chien-Yu Lin, Kan Zhu, Zihao Ye, Lequn Chen, Size Zheng, Luis Ceze, Arvind Krishnamurthy, Tianqi Chen, Baris Kasikci

To maximize LLMs' serving throughput, we introduce Atom, a low-bit quantization method that achieves high throughput improvements with negligible accuracy loss.

Quantization Sentiment Analysis

Practical Algorithms for Learning Near-Isometric Linear Embeddings

no code implementations1 Jan 2016 Jerry Luo, Kayla Shapiro, Hao-Jun Michael Shi, Qi Yang, Kan Zhu

Motivated by non-negative matrix factorization, we reformulate our problem into a Frobenius norm minimization problem, which is solved by the Alternating Direction Method of Multipliers (ADMM) and develop an algorithm, FroMax.

Dimensionality Reduction

Cannot find the paper you are looking for? You can Submit a new open access paper.