Search Results for author: Avner May

Found 12 papers, 6 papers with code

Minions: Cost-efficient Collaboration Between On-device and Cloud Language Models

no code implementations 21 Feb 2025 Avanika Narayan, Dan Biderman, Sabri Eyuboglu, Avner May, Scott Linderman, James Zou, Christopher Ré

MinionS reduces costs by 5.7x on average while recovering 97.9% of the performance of the remote model alone.
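
Below is a minimal, hypothetical sketch of the on-device/cloud division of labor the abstract describes: the small local model reads the long document in chunks and the expensive cloud model only sees a short digest. The callables local_model and remote_model, the prompts, and the chunking scheme are illustrative assumptions, not the paper's actual MinionS protocol or API.

```python
# Hypothetical local/remote split: the on-device model does the heavy reading,
# the cloud model only answers from a short digest. Not the paper's protocol.

def answer_with_local_help(question, document, local_model, remote_model,
                           chunk_size=2000):
    # Split the long document into chunks the small model can handle cheaply.
    chunks = [document[i:i + chunk_size]
              for i in range(0, len(document), chunk_size)]

    # On-device model extracts only what is relevant to the question.
    notes = []
    for chunk in chunks:
        note = local_model(f"Extract facts relevant to {question!r}:\n{chunk}")
        if note.strip():
            notes.append(note)

    # The expensive remote model answers from the digest, not the full
    # document; this is where the cost savings would come from.
    digest = "\n".join(notes)
    return remote_model(f"Using these notes, answer {question!r}:\n{digest}")
```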

The Mamba in the Llama: Distilling and Accelerating Hybrid Models

2 code implementations 27 Aug 2024 Junxiong Wang, Daniele Paliotta, Avner May, Alexander M. Rush, Tri Dao

The resulting hybrid model, which incorporates a quarter of the attention layers, achieves performance comparable to the original Transformer in chat benchmarks and outperforms open-source hybrid Mamba models trained from scratch with trillions of tokens in both chat benchmarks and general benchmarks.

Language Modeling, Language Modelling +1
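
As a rough illustration of the "quarter of the attention layers" design, the sketch below spaces a fixed fraction of attention layers evenly through an otherwise-Mamba stack. The hybrid_schedule helper is a made-up name; the paper's actual method also initializes the Mamba layers from the Transformer's linear projections before distillation, which this sketch omits.

```python
# Illustrative schedule keeping ~25% of layers as attention, evenly spaced.
# The real method additionally distills the Transformer into the Mamba layers.

def hybrid_schedule(n_layers, attention_fraction=0.25):
    n_attn = max(1, round(n_layers * attention_fraction))
    stride = n_layers / n_attn
    attn_indices = {round(i * stride) for i in range(n_attn)}
    return ["attention" if i in attn_indices else "mamba"
            for i in range(n_layers)]

print(hybrid_schedule(32))  # 8 attention layers among 32, evenly spaced
```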

SpecExec: Massively Parallel Speculative Decoding for Interactive LLM Inference on Consumer Devices

1 code implementation 4 Jun 2024 Ruslan Svirschevski, Avner May, Zhuoming Chen, Beidi Chen, Zhihao Jia, Max Ryabinin

We propose SpecExec (Speculative Execution), a simple parallel decoding method that can generate up to 20 tokens per target model iteration for popular LLM families.

Text Generation
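
For context, here is a hedged sketch of the accept/verify principle behind speculative decoding in its simplest greedy, single-chain form. SpecExec itself builds a large tree of draft continuations and works with the target model's probabilities, so this is only the core idea; draft and target are hypothetical callables returning per-position logits.

```python
import torch

def speculate_step(prefix, draft, target, k=8):
    # Draft model greedily extends the prefix by k tokens (cheap, sequential).
    proposed = prefix.clone()
    for _ in range(k):
        next_tok = draft(proposed)[-1].argmax()
        proposed = torch.cat([proposed, next_tok.view(1)])

    # Target model scores all positions in ONE forward pass;
    # target_preds[i] is the target's choice for position i + 1.
    target_preds = target(proposed).argmax(dim=-1)

    # Accept the longest run of draft tokens the target agrees with.
    n = prefix.shape[0]
    accepted = n
    for i in range(k):
        if proposed[n + i] != target_preds[n + i - 1]:
            break
        accepted = n + i + 1

    # Even on full rejection the target's own prediction yields one new
    # token, so each expensive target call makes progress.
    return torch.cat([proposed[:accepted], target_preds[accepted - 1].view(1)])
```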

Sequoia: Scalable, Robust, and Hardware-aware Speculative Decoding

1 code implementation 19 Feb 2024 Zhuoming Chen, Avner May, Ruslan Svirschevski, Yuhsun Huang, Max Ryabinin, Zhihao Jia, Beidi Chen

This paper introduces Sequoia, a scalable, robust, and hardware-aware algorithm for speculative decoding.

Audio-visual fine-tuning of audio-only ASR models

no code implementations 14 Dec 2023 Avner May, Dmitriy Serdyuk, Ankit Parag Shah, Otavio Braga, Olivier Siohan

Audio-visual automatic speech recognition (AV-ASR) models are very effective at reducing word error rates on noisy speech, but require large amounts of transcribed AV training data.

Automatic Speech Recognition, Self-Supervised Learning +2

Contextual Embeddings: When Are They Worth It?

no code implementations ACL 2020 Simran Arora, Avner May, Jian Zhang, Christopher Ré

We study the settings in which deep contextual embeddings (e.g., BERT) give large performance improvements relative to classic pretrained embeddings (e.g., GloVe) and to an even simpler baseline, random word embeddings, focusing on the impact of training set size and the linguistic properties of the task.

Word Embeddings
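
For concreteness, this is all the random-word-embedding baseline amounts to: each word gets a fixed random vector that is never updated, and downstream tasks train only the model on top. The toy vocabulary, dimension, and initialization scale below are illustrative choices, not the paper's exact setup.

```python
import numpy as np

rng = np.random.default_rng(0)
vocab = {"the": 0, "cat": 1, "sat": 2}  # toy vocabulary (illustrative)
dim = 300                               # a typical GloVe dimension

# Frozen random vectors: no pretraining signal and no context-dependence.
random_embeddings = rng.normal(0.0, 1.0 / np.sqrt(dim), size=(len(vocab), dim))

def embed(tokens):
    # Context-independent lookup, unlike BERT; untrained, unlike GloVe.
    return random_embeddings[[vocab[t] for t in tokens]]

print(embed(["the", "cat"]).shape)  # (2, 300)
```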

Understanding the Downstream Instability of Word Embeddings

1 code implementation 29 Feb 2020 Megan Leszczynski, Avner May, Jian Zhang, Sen Wu, Christopher R. Aberger, Christopher Ré

To theoretically explain this tradeoff, we introduce a new measure of embedding instability, the eigenspace instability measure, which we prove bounds the disagreement in downstream predictions introduced by the change in word embeddings.

Word Embeddings
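
The eigenspace instability measure itself is defined spectrally in the paper; as a hedged illustration of the quantity it provably bounds, the sketch below retrains the same downstream model on two versions of an embedding matrix and measures how often its predictions flip. The helper name and the choice of logistic regression are assumptions for illustration only.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def downstream_disagreement(X_old, X_new, y):
    """Fraction of predictions that flip after an embedding update.
    X_old / X_new featurize the SAME examples with the two embedding
    versions; y are the downstream labels."""
    m_old = LogisticRegression(max_iter=1000).fit(X_old, y)
    m_new = LogisticRegression(max_iter=1000).fit(X_new, y)
    return float(np.mean(m_old.predict(X_old) != m_new.predict(X_new)))
```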

On the Downstream Performance of Compressed Word Embeddings

1 code implementation NeurIPS 2019 Avner May, Jian Zhang, Tri Dao, Christopher Ré

Finally, we show that using the eigenspace overlap score as a selection criterion among compressed embeddings from a representative set lets us efficiently identify the better-performing embedding, with up to $2\times$ lower selection error rates than the next best measure of compression quality, while avoiding the cost of training a model for each task of interest.

Generalization Bounds, Quantization +1
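
A short sketch of the eigenspace overlap score as I read the paper's definition: the normalized Frobenius overlap between the left singular subspaces of the original and compressed embedding matrices. Treat the normalization details below as a best-effort reconstruction rather than the authoritative formula.

```python
import numpy as np

def eigenspace_overlap(X, X_tilde):
    # Orthonormal bases for the column spaces of the two embedding matrices.
    U, _, _ = np.linalg.svd(X, full_matrices=False)
    U_t, _, _ = np.linalg.svd(X_tilde, full_matrices=False)
    # Normalized Frobenius overlap of the subspaces: 1 if they coincide,
    # near 0 if they are orthogonal.
    return np.linalg.norm(U.T @ U_t, "fro") ** 2 / max(U.shape[1], U_t.shape[1])
```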

Low-Precision Random Fourier Features for Memory-Constrained Kernel Approximation

1 code implementation 31 Oct 2018 Jian Zhang, Avner May, Tri Dao, Christopher Ré

We investigate how to train kernel approximation methods that generalize well under a memory budget.

Quantization
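
To make the memory/precision tradeoff concrete, the sketch below computes standard random Fourier features for a Gaussian kernel and then uniformly quantizes each feature to a small number of bits, so more features fit in a fixed memory budget. The paper's quantization and training schemes may differ from this simple uniform rounding; this is illustrative only.

```python
import numpy as np

def low_precision_rff(X, n_features, gamma, bits, seed=0):
    """Random Fourier features for k(x, y) = exp(-gamma * ||x - y||^2),
    uniformly quantized to `bits` bits per feature."""
    rng = np.random.default_rng(seed)
    W = rng.normal(0.0, np.sqrt(2.0 * gamma), size=(X.shape[1], n_features))
    b = rng.uniform(0.0, 2.0 * np.pi, size=n_features)
    s = np.sqrt(2.0 / n_features)
    Z = s * np.cos(X @ W + b)            # features lie in [-s, s]

    # Uniform quantization to 2**bits levels over the feature range.
    levels = 2 ** bits - 1
    return np.round((Z + s) / (2 * s) * levels) / levels * (2 * s) - s
```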

Kernel Approximation Methods for Speech Recognition

no code implementations 13 Jan 2017 Avner May, Alireza Bagheri Garakani, Zhiyun Lu, Dong Guo, Kuan Liu, Aurélien Bellet, Linxi Fan, Michael Collins, Daniel Hsu, Brian Kingsbury, Michael Picheny, Fei Sha

First, in order to reduce the number of random features required by kernel models, we propose a simple but effective method for feature selection.

Feature Selection, Speech Recognition +1
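
The snippet mentions a feature-selection method for reducing the number of random features but gives no details. Below is one generic way such selection is often done: oversample candidate random features, train a cheap linear model, and keep the top-k features by weight norm. This is an assumption-laden stand-in for illustration, not necessarily the paper's criterion.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def select_random_features(Z, y, k):
    """Z: (n_samples, n_candidates) random-feature matrix. Train a cheap
    linear model and keep the k features with the largest weight norm."""
    model = LogisticRegression(max_iter=1000).fit(Z, y)
    scores = np.linalg.norm(model.coef_, axis=0)  # per-feature weight norm
    return np.argsort(scores)[-k:]                # column indices to keep
```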
