Search Results for author: Vladimir Malinovskii

Found 3 papers, 3 papers with code

Cache Me If You Must: Adaptive Key-Value Quantization for Large Language Models

1 code implementation • 31 Jan 2025 • Alina Shutova, Vladimir Malinovskii, Vage Egiazarian, Denis Kuznedelev, Denis Mazur, Nikita Surkov, Ivan Ermakov, Dan Alistarh

Efficient real-world deployments of large language models (LLMs) rely on Key-Value (KV) caching for processing and generating long outputs, reducing the need for repetitive computation.

Quantization
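The idea behind KV-cache quantization can be illustrated with a minimal sketch: store the cached keys/values in a low-bit integer format with a per-channel scale, and dequantize on read. This is a generic round-to-nearest scheme for illustration only, not the adaptive method proposed in the paper; the function names and the 8-bit setting are assumptions.

```python
import numpy as np

def quantize_kv(x: np.ndarray, bits: int = 8):
    """Uniform per-channel quantization of a KV-cache tensor.

    Generic round-to-nearest illustration (NOT the paper's adaptive
    scheme). `x` has shape (tokens, head_dim); each channel (last
    axis) gets its own scale.
    """
    qmax = 2 ** (bits - 1) - 1
    scale = np.abs(x).max(axis=0, keepdims=True) / qmax
    scale = np.where(scale == 0, 1.0, scale)  # guard empty/zero channels
    q = np.clip(np.round(x / scale), -qmax - 1, qmax).astype(np.int8)
    return q, scale

def dequantize_kv(q: np.ndarray, scale: np.ndarray) -> np.ndarray:
    """Recover an approximate float tensor from its int8 form."""
    return q.astype(np.float32) * scale

# Toy KV block: 16 cached tokens, head dimension 64.
keys = np.random.default_rng(0).standard_normal((16, 64)).astype(np.float32)
q, s = quantize_kv(keys)
recon = dequantize_kv(q, s)
err = np.abs(keys - recon).max()  # bounded by half a quantization step
```

Storing `q` instead of `keys` cuts the cache to a quarter of its fp32 size, at the cost of a reconstruction error bounded by half a quantization step per channel.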

PV-Tuning: Beyond Straight-Through Estimation for Extreme LLM Compression

1 code implementation • 23 May 2024 • Vladimir Malinovskii, Denis Mazur, Ivan Ilin, Denis Kuznedelev, Konstantin Burlachenko, Kai Yi, Dan Alistarh, Peter Richtarik

In this work, we question the use of STE for extreme LLM compression, showing that it can be sub-optimal, and perform a systematic study of quantization-aware fine-tuning strategies for LLMs.

Quantization
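The straight-through estimator (STE) that this paper questions can be shown in a small sketch: the forward pass uses the quantized weight, while the backward pass pretends quantization is the identity and applies the gradient to the full-precision latent weight. This is a textbook STE illustration under assumed toy settings (scalar linear model, uniform 0.1 grid), not the PV-Tuning method itself.

```python
import numpy as np

def quantize(w: float, step: float = 0.1) -> float:
    # Round-to-nearest on a uniform grid; non-differentiable in w.
    return float(np.round(w / step) * step)

# Toy regression y = 0.37 * x, fit with a quantized weight via STE.
rng = np.random.default_rng(0)
x = rng.standard_normal(256)
y = 0.37 * x

w = 0.0   # full-precision latent weight
lr = 0.1
for _ in range(100):
    w_q = quantize(w)                 # forward pass uses quantized weight
    grad = np.mean(2 * (w_q * x - y) * x)
    # STE: treat quantize() as identity in the backward pass, so the
    # gradient w.r.t. w_q is applied straight through to the latent w.
    w -= lr * grad
```

Because the gradient is computed at `w_q` but applied to `w`, the latent weight oscillates around the decision boundary between grid points near the target 0.37, which is one source of the sub-optimality the paper studies.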
