Search Results for author: Minsik Cho

Found 25 papers, 0 papers with code

LazyLLM: Dynamic Token Pruning for Efficient Long Context LLM Inference

no code implementations 19 Jul 2024 Qichen Fu, Minsik Cho, Thomas Merth, Sachin Mehta, Mohammad Rastegari, Mahyar Najibi

The inference of transformer-based large language models consists of two sequential stages: 1) a prefilling stage to compute the KV cache of prompts and generate the first token, and 2) a decoding stage to generate subsequent tokens.
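As a toy illustration of the dynamic token pruning idea (a sketch, not the paper's implementation), the snippet below drops low-scoring prompt tokens from a KV cache; the `prune_tokens` name, array shapes, `keep_ratio` parameter, and the use of attention mass as the importance score are all illustrative assumptions.

```python
import numpy as np

def prune_tokens(kv_cache: np.ndarray, attn_scores: np.ndarray, keep_ratio: float = 0.5):
    """Keep only the prompt tokens with the highest importance scores.

    kv_cache:    (seq_len, d) cached states for the prompt tokens
    attn_scores: (seq_len,) importance of each prompt token, e.g. the
                 attention mass the current query assigns to it
    """
    k = max(1, int(len(attn_scores) * keep_ratio))
    keep = np.sort(np.argsort(attn_scores)[-k:])  # top-k, in original order
    return kv_cache[keep], keep

rng = np.random.default_rng(0)
cache = rng.standard_normal((8, 4))
scores = np.array([0.30, 0.01, 0.20, 0.02, 0.25, 0.03, 0.15, 0.04])
pruned, kept = prune_tokens(cache, scores)
print(kept)  # [0 2 4 6]
```

Returning the kept indices matters because a token pruned at one step may become relevant later, so pruning decisions can be revisited at each decoding step rather than made once.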

Question Answering

LLM in a flash: Efficient Large Language Model Inference with Limited Memory

no code implementations 12 Dec 2023 Keivan Alizadeh, Iman Mirzadeh, Dmitry Belenko, Karen Khatamifard, Minsik Cho, Carlo C Del Mundo, Mohammad Rastegari, Mehrdad Farajtabar

These methods collectively enable running models up to twice the size of the available DRAM, with a 4-5x and 20-25x increase in inference speed compared to naive loading approaches in CPU and GPU, respectively.
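To make the flash-to-DRAM idea concrete, here is a toy sliding-window cache in the spirit of windowed parameter loading (a sketch under assumptions; the `WindowedCache` class and its interface are invented for illustration): only parameters not activated by recent tokens must be fetched from slow storage.

```python
class WindowedCache:
    """Toy sliding-window cache: parameter rows activated by the last
    `window` tokens stay in fast memory (DRAM); anything else must be
    fetched from slow storage (flash)."""

    def __init__(self, window: int):
        self.window = window
        self.recent = []  # per-token sets of active neuron indices

    def access(self, active: set) -> int:
        """Record one token's active neurons; return the number of
        rows that had to be fetched from slow storage."""
        cached = set().union(*self.recent) if self.recent else set()
        misses = len(active - cached)
        self.recent.append(active)
        if len(self.recent) > self.window:
            self.recent.pop(0)
        return misses
```

Aggregating the activations of recent tokens keeps most hot rows resident, so only the difference set triggers a slow fetch.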

Language Modelling Large Language Model +1

Streaming Anchor Loss: Augmenting Supervision with Temporal Significance

no code implementations 9 Oct 2023 Utkarsh Oggy Sarawgi, John Berkowitz, Vineet Garg, Arnav Kundu, Minsik Cho, Sai Srujana Buddi, Saurabh Adya, Ahmed Tewfik

Streaming neural network models for fast frame-wise responses to various speech and sensory signals are widely adopted on resource-constrained platforms.

Do Compressed LLMs Forget Knowledge? An Experimental Study with Practical Implications

no code implementations 2 Oct 2023 Duc N. M Hoang, Minsik Cho, Thomas Merth, Mohammad Rastegari, Zhangyang Wang

We start by proposing two conjectures on the nature of the damage. One is that certain knowledge is forgotten (or erased) after LLM compression, so the compressed model must (re)learn from data with additional parameters; the other presumes that the knowledge is internally displaced, so merely "inference re-direction" with input-side augmentation, such as prompting, suffices to recover the knowledge-related performance.

eDKM: An Efficient and Accurate Train-time Weight Clustering for Large Language Models

no code implementations 2 Sep 2023 Minsik Cho, Keivan A. Vahid, Qichen Fu, Saurabh Adya, Carlo C Del Mundo, Mohammad Rastegari, Devang Naik, Peter Zatloukal

Since large language models (LLMs) have demonstrated high-quality performance on many complex language tasks, there is great interest in bringing these LLMs to mobile devices for faster responses and better privacy protection.

Clustering Quantization

Flexible Keyword Spotting based on Homogeneous Audio-Text Embedding

no code implementations 12 Aug 2023 Kumari Nishu, Minsik Cho, Paul Dixon, Devang Naik

Spotting user-defined/flexible keywords represented in text typically relies on an expensive text encoder for joint analysis with an audio encoder in an embedding space, which can suffer from heterogeneous modality representations (i.e., a large mismatch) and increased complexity.

Keyword Spotting

Matching Latent Encoding for Audio-Text based Keyword Spotting

no code implementations 8 Jun 2023 Kumari Nishu, Minsik Cho, Devang Naik

Using audio and text embeddings jointly for Keyword Spotting (KWS) has shown high-quality results, but the key challenge of how to semantically align two embeddings for multi-word keywords of different sequence lengths remains largely unsolved.
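A minimal sketch of matching audio and text in a shared latent space (assuming fixed-size embeddings and a cosine score; the `detect_keyword` helper and its threshold are illustrative, not the paper's method, which additionally aligns embeddings across different sequence lengths):

```python
import numpy as np

def cosine(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine similarity between two embedding vectors."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def detect_keyword(audio_emb, keyword_embs, threshold=0.7):
    """Return the best-matching keyword, or None if nothing clears
    the similarity threshold."""
    sims = {kw: cosine(audio_emb, emb) for kw, emb in keyword_embs.items()}
    best = max(sims, key=sims.get)
    return best if sims[best] >= threshold else None
```

The threshold trades off false accepts against false rejects; in practice it would be tuned on held-out keyword data.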

Keyword Spotting

R2 Loss: Range Restriction Loss for Model Compression and Quantization

no code implementations 14 Mar 2023 Arnav Kundu, Chungkuk Yoo, Srijan Mishra, Minsik Cho, Saurabh Adya

To overcome this challenge, we focus on outliers in the weights of a pre-trained model, which disrupt effective lower-bit quantization and compression.
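A hedged sketch of a range-restriction-style penalty (the exact R2 Loss formulation differs; the `limit` parameter and quadratic form here are assumptions): weights inside [-limit, limit] incur no cost, while outliers are pulled back toward the range that low-bit quantization can represent well.

```python
import numpy as np

def range_restriction_penalty(weights: np.ndarray, limit: float) -> float:
    """Quadratic penalty on weight magnitude beyond +/-limit; weights
    already inside the range contribute nothing."""
    excess = np.maximum(np.abs(weights) - limit, 0.0)
    return float(np.sum(excess ** 2))

w = np.array([0.1, -0.2, 1.5, -2.0])
print(range_restriction_penalty(w, limit=1.0))  # 0.5**2 + 1.0**2 = 1.25
```

Added to the task loss during training, such a term shrinks outliers gradually instead of clipping them after the fact.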

Classification Model Compression +2

DKM: Differentiable K-Means Clustering Layer for Neural Network Compression

no code implementations ICLR 2022 Minsik Cho, Keivan A. Vahid, Saurabh Adya, Mohammad Rastegari

For MobileNet-v1, which is a challenging DNN to compress, DKM delivers 63.9% top-1 ImageNet1k accuracy with a 0.72 MB model size (a 22.4x model compression factor).
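The core DKM idea, soft (differentiable) assignment of weights to cluster centers, can be sketched as follows (a toy 1-D version with an assumed temperature-scaled softmax; the paper's attention-based formulation is more general):

```python
import numpy as np

def dkm_step(weights: np.ndarray, centroids: np.ndarray, temperature: float = 0.05):
    """One soft k-means iteration over a flat weight vector."""
    # distance from every weight to every centroid
    d = np.abs(weights[:, None] - centroids[None, :])
    # softmax over centroids: lower temperature -> harder assignment
    a = np.exp(-d / temperature)
    a /= a.sum(axis=1, keepdims=True)
    # centroids move to the assignment-weighted mean of the weights
    new_centroids = (a * weights[:, None]).sum(axis=0) / a.sum(axis=0)
    return a, new_centroids
```

Because the assignment is a softmax rather than a hard argmin, gradients flow through both the weights and the centroids, so clustering can be optimized jointly with the task loss.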

Clustering Neural Network Compression

Exploring Avenues Beyond Revised DSD Functionals: I. range separation, with xDSD as a special case

no code implementations 9 Feb 2021 Golokesh Santra, Minsik Cho, Jan M. L. Martin

We have explored the use of range separation as a possible avenue for further improvement on our revDSD minimally empirical double hybrid functionals.

Chemical Physics

NASTransfer: Analyzing Architecture Transferability in Large Scale Neural Architecture Search

no code implementations 23 Jun 2020 Rameswar Panda, Michele Merler, Mayoore Jaiswal, Hui Wu, Kandan Ramakrishnan, Ulrich Finkler, Chun-Fu Chen, Minsik Cho, David Kung, Rogerio Feris, Bishwaranjan Bhattacharjee

The typical way of conducting large scale NAS is to search for an architectural building block on a small dataset (either using a proxy set from the large dataset or a completely different small scale dataset) and then transfer the block to a larger dataset.

Neural Architecture Search

SNOW: Subscribing to Knowledge via Channel Pooling for Transfer & Lifelong Learning of Convolutional Neural Networks

no code implementations ICLR 2020 Chungkuk Yoo, Bumsoo Kang, Minsik Cho

SNOW is an efficient learning method to improve training/serving throughput as well as accuracy for transfer and lifelong learning of convolutional neural networks based on knowledge subscription.

Image Classification

SimEx: Express Prediction of Inter-dataset Similarity by a Fleet of Autoencoders

no code implementations 14 Jan 2020 Inseok Hwang, Jinho Lee, Frank Liu, Minsik Cho

Our intuition is that the more similar the unknown data samples are to the known data an autoencoder was trained on, the better the chance that the autoencoder applies its trained knowledge and reconstructs outputs closer to the originals.
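That intuition can be sketched with a rank-k linear autoencoder (equivalently, PCA) standing in for the fleet of trained autoencoders; the `fit_linear_autoencoder` helper and the MSE proxy below are illustrative assumptions, not the paper's architecture:

```python
import numpy as np

def fit_linear_autoencoder(X: np.ndarray, k: int = 2):
    """Fit a rank-k linear autoencoder (PCA) to known data."""
    mu = X.mean(axis=0)
    _, _, Vt = np.linalg.svd(X - mu, full_matrices=False)
    return mu, Vt[:k]  # mean and encoder/decoder basis

def reconstruction_error(X: np.ndarray, mu: np.ndarray, P: np.ndarray) -> float:
    """Mean squared reconstruction error: low error suggests X resembles
    the data the autoencoder was trained on."""
    Z = (X - mu) @ P.T       # encode
    X_hat = Z @ P + mu       # decode
    return float(np.mean((X - X_hat) ** 2))
```

Data drawn from the same subspace as the training set reconstructs almost perfectly, while data with components outside it incurs a visibly larger error, which is exactly the similarity signal being exploited.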

Data Augmentation

MUTE: Data-Similarity Driven Multi-hot Target Encoding for Neural Network Design

no code implementations 15 Oct 2019 Mayoore S. Jaiswal, Bumsoo Kang, Jinho Lee, Minsik Cho

Target encoding is an effective technique to deliver better performance for conventional machine learning methods, and recently, for deep neural networks as well.

General Classification Image Classification

Deep Learning for Multi-Messenger Astrophysics: A Gateway for Discovery in the Big Data Era

no code implementations 1 Feb 2019 Gabrielle Allen, Igor Andreoni, Etienne Bachelet, G. Bruce Berriman, Federica B. Bianco, Rahul Biswas, Matias Carrasco Kind, Kyle Chard, Minsik Cho, Philip S. Cowperthwaite, Zachariah B. Etienne, Daniel George, Tom Gibbs, Matthew Graham, William Gropp, Anushri Gupta, Roland Haas, E. A. Huerta, Elise Jennings, Daniel S. Katz, Asad Khan, Volodymyr Kindratenko, William T. C. Kramer, Xin Liu, Ashish Mahabal, Kenton McHenry, J. M. Miller, M. S. Neubauer, Steve Oberlin, Alexander R. Olivas Jr, Shawn Rosofsky, Milton Ruiz, Aaron Saxton, Bernard Schutz, Alex Schwing, Ed Seidel, Stuart L. Shapiro, Hongyu Shen, Yue Shen, Brigitta M. Sipőcz, Lunan Sun, John Towns, Antonios Tsokaros, Wei Wei, Jack Wells, Timothy J. Williams, JinJun Xiong, Zhizhen Zhao

We discuss key aspects to realize this endeavor, namely (i) the design and exploitation of scalable and computationally efficient AI algorithms for Multi-Messenger Astrophysics; (ii) cyberinfrastructure requirements to numerically simulate astrophysical sources, and to process and interpret Multi-Messenger Astrophysics data; (iii) management of gravitational wave detections and triggers to enable electromagnetic and astro-particle follow-ups; (iv) a vision to harness future developments of machine and deep learning and cyberinfrastructure resources to cope with the scale of discovery in the Big Data Era; (v) and the need to build a community that brings domain experts together with data scientists on equal footing to maximize and accelerate discovery in the nascent field of Multi-Messenger Astrophysics.

Astronomy Management

Data-parallel distributed training of very large models beyond GPU capacity

no code implementations 29 Nov 2018 Samuel Matzek, Max Grossman, Minsik Cho, Anar Yusifov, Bryant Nelson, Amit Juneja

GPUs have limited memory, which makes it difficult to train wide and/or deep models whose training process would otherwise run out of memory.

A Unified Approximation Framework for Compressing and Accelerating Deep Neural Networks

no code implementations 26 Jul 2018 Yuzhe Ma, Ran Chen, Wei Li, Fanhua Shang, Wenjian Yu, Minsik Cho, Bei Yu

To address this issue, various approximation techniques have been investigated, which seek a lightweight network with little performance degradation in exchange for a smaller model size or faster inference.

General Classification Image Classification +1

PowerAI DDL
no code implementations 7 Aug 2017 Minsik Cho, Ulrich Finkler, Sameer Kumar, David Kung, Vaibhav Saxena, Dheeraj Sreedhar

We train ResNet-101 on ImageNet-22K with 64 IBM Power8 S822LC servers (256 GPUs) in about 7 hours, reaching 33.8% validation accuracy.

MEC: Memory-efficient Convolution for Deep Neural Network

no code implementations ICML 2017 Minsik Cho, Daniel Brand

However, all these indirect methods have high memory overhead, which degrades performance and offers a poor trade-off between performance and memory consumption.
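For context, the memory overhead of the classic im2col lowering (one of the indirect methods in question) can be estimated as follows; the helper below is an illustrative back-of-the-envelope, not MEC itself:

```python
def im2col_overhead(h: int, w: int, c: int, k: int, stride: int = 1) -> float:
    """Memory blow-up factor of im2col lowering for a k x k convolution:
    every input element is copied into up to k*k patch rows."""
    out_h = (h - k) // stride + 1
    out_w = (w - k) // stride + 1
    input_elems = h * w * c
    lowered_elems = out_h * out_w * k * k * c  # (out_h*out_w) x (k*k*c) matrix
    return lowered_elems / input_elems

print(round(im2col_overhead(32, 32, 3, 3), 2))  # 7.91
```

The nearly k*k-fold duplication is what memory-efficient lowering schemes aim to avoid.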
