no code implementations • 21 Feb 2025 • Avanika Narayan, Dan Biderman, Sabri Eyuboglu, Avner May, Scott Linderman, James Zou, Christopher Ré
MinionS reduces costs by 5.7x on average while recovering 97.9% of the performance of the remote model alone.
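A minimal sketch of the local/remote decomposition behind these numbers: a cheap on-device model reads the long context in chunks and the expensive remote model only sees the extracted notes. The function names and prompt strings here are hypothetical stand-ins, not the paper's API.

```python
from typing import Callable

def minions_answer(question: str, document: str,
                   local_model: Callable[[str], str],
                   remote_model: Callable[[str], str],
                   chunk_size: int = 2000) -> str:
    # 1. Split the long context into chunks the small local model can handle.
    chunks = [document[i:i + chunk_size]
              for i in range(0, len(document), chunk_size)]

    # 2. The cheap local model extracts candidate evidence from each chunk,
    #    so the long context never reaches the expensive remote model.
    notes = [local_model(f"Extract facts relevant to: {question}\n\n{c}")
             for c in chunks]

    # 3. The remote model sees only the short notes and synthesizes an answer.
    summary = "\n".join(n for n in notes if n.strip())
    return remote_model(f"Question: {question}\nEvidence:\n{summary}\n"
                        "Answer concisely.")
```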
2 code implementations • 27 Aug 2024 • Junxiong Wang, Daniele Paliotta, Avner May, Alexander M. Rush, Tri Dao
The resulting hybrid model, which incorporates a quarter of the attention layers, matches the original Transformer on chat benchmarks and outperforms open-source hybrid Mamba models trained from scratch on trillions of tokens, on both chat and general benchmarks.
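An illustrative-only layer stack for a hybrid that keeps attention in one of every four layers (the ratio stated above). `SSMBlock` is a hypothetical stand-in for a Mamba-style linear-time mixer, not the paper's implementation.

```python
import torch.nn as nn

class SSMBlock(nn.Module):
    """Placeholder for a Mamba-style state-space block (illustrative only)."""
    def __init__(self, d_model: int):
        super().__init__()
        self.proj = nn.Linear(d_model, d_model)  # stand-in for SSM dynamics
    def forward(self, x):
        return x + self.proj(x)

def build_hybrid(d_model: int = 1024, n_layers: int = 32, attn_every: int = 4):
    layers = []
    for i in range(n_layers):
        if i % attn_every == 0:
            # Retained attention layer (in the paper's distillation setup,
            # these would be initialized from the Transformer teacher).
            layers.append(nn.MultiheadAttention(d_model, num_heads=8,
                                                batch_first=True))
        else:
            layers.append(SSMBlock(d_model))
    return nn.ModuleList(layers)
```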
1 code implementation • 4 Jun 2024 • Ruslan Svirschevski, Avner May, Zhuoming Chen, Beidi Chen, Zhihao Jia, Max Ryabinin
We propose SpecExec (Speculative Execution), a simple parallel decoding method that can generate up to 20 tokens per target model iteration for popular LLM families.
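For context, here is the core draft-and-verify step of generic speculative decoding (greedy-match variant), assuming `target` and `draft` are callables mapping token-id tensors to logits. SpecExec itself builds much larger draft *trees* to reach 20 tokens per target iteration; this sketch shows only the accept/verify idea, not the paper's method.

```python
import torch

def speculative_decode_step(target, draft, prefix: torch.Tensor, k: int = 8):
    # Draft model proposes k tokens autoregressively (cheap).
    proposal = prefix.clone()
    for _ in range(k):
        logits = draft(proposal)[:, -1]
        proposal = torch.cat([proposal, logits.argmax(-1, keepdim=True)], dim=1)

    # Target model scores all proposed positions in ONE parallel pass.
    target_logits = target(proposal)[:, prefix.shape[1] - 1:-1]
    target_tokens = target_logits.argmax(-1)

    # Accept the longest prefix on which draft and target agree.
    drafted = proposal[:, prefix.shape[1]:]
    agree = (drafted == target_tokens).long().cumprod(dim=1)
    n_accept = int(agree.sum())

    # Always gain at least one token: the target's own next prediction.
    return torch.cat([prefix, drafted[:, :n_accept],
                      target_tokens[:, n_accept:n_accept + 1]], dim=1)
```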
1 code implementation • 19 Feb 2024 • Zhuoming Chen, Avner May, Ruslan Svirschevski, Yuhsun Huang, Max Ryabinin, Zhihao Jia, Beidi Chen
This paper introduces Sequoia, a scalable, robust, and hardware-aware algorithm for speculative decoding.
no code implementations • 14 Dec 2023 • Avner May, Dmitriy Serdyuk, Ankit Parag Shah, Otavio Braga, Olivier Siohan
Audio-visual automatic speech recognition (AV-ASR) models are very effective at reducing word error rates on noisy speech, but require large amounts of transcribed AV training data.
no code implementations • ACL 2020 • Simran Arora, Avner May, Jian Zhang, Christopher Ré
We study the settings for which deep contextual embeddings (e.g., BERT) give large improvements in performance relative to classic pretrained embeddings (e.g., GloVe) and an even simpler baseline, random word embeddings, focusing on the impact of the training set size and the linguistic properties of the task.
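The random-embedding baseline is easy to reproduce: each word gets a fixed random vector that is never trained, and only the downstream model learns. A minimal sketch (vocabulary and dimension are illustrative choices):

```python
import numpy as np

rng = np.random.default_rng(0)
vocab = {"the": 0, "movie": 1, "was": 2, "great": 3}
d = 300  # same dimensionality as GloVe, for a fair comparison
E = rng.normal(scale=1.0 / np.sqrt(d), size=(len(vocab), d))  # frozen

def featurize(tokens):
    # Average the (random) word vectors; a classifier is trained on top.
    ids = [vocab[t] for t in tokens if t in vocab]
    return E[ids].mean(axis=0)

x = featurize(["the", "movie", "was", "great"])
```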
1 code implementation • 29 Feb 2020 • Megan Leszczynski, Avner May, Jian Zhang, Sen Wu, Christopher R. Aberger, Christopher Ré
To theoretically explain this tradeoff, we introduce a new measure of embedding instability, the eigenspace instability measure, which we prove bounds the disagreement in downstream predictions introduced by the change in word embeddings.
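The quantity this measure bounds is concrete: the fraction of downstream predictions that flip when the embeddings change. A sketch of that downstream-disagreement quantity (not the measure itself), where `train_linear` is a hypothetical training routine rather than the paper's code:

```python
import numpy as np

def prediction_disagreement(E_old, E_new, X_ids, y, train_linear):
    # Train the same downstream model on features from each embedding version.
    f_old = E_old[X_ids].mean(axis=1)
    f_new = E_new[X_ids].mean(axis=1)
    w_old = train_linear(f_old, y)
    w_new = train_linear(f_new, y)
    # Fraction of examples whose predicted label changes across versions.
    return float(np.mean(np.sign(f_old @ w_old) != np.sign(f_new @ w_new)))
```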
1 code implementation • NeurIPS 2019 • Avner May, Jian Zhang, Tri Dao, Christopher Ré
Finally, we show that using the eigenspace overlap score to select among compressed embeddings drawn from a representative set efficiently identifies the better-performing embedding, with up to $2\times$ lower selection error rates than the next best measure of compression quality, while avoiding the cost of training a model for each task of interest.
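A sketch of the eigenspace overlap score between an embedding matrix and its compressed version, computed from the left singular vectors of each. The normalization by max(d, d~) follows my reading of the paper; treat it as an assumption to verify against the original.

```python
import numpy as np

def eigenspace_overlap(X, X_tilde):
    """Overlap between the column spaces of X (n x d) and X_tilde (n x d~);
    equals 1 when the spanned eigenspaces coincide."""
    U, _, _ = np.linalg.svd(X, full_matrices=False)
    U_t, _, _ = np.linalg.svd(X_tilde, full_matrices=False)
    return np.linalg.norm(U.T @ U_t, "fro") ** 2 / max(U.shape[1], U_t.shape[1])
```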
1 code implementation • 31 Oct 2018 • Jian Zhang, Avner May, Tri Dao, Christopher Ré
We investigate how to train kernel approximation methods that generalize well under a memory budget.
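As background, standard random Fourier features approximate the Gaussian (RBF) kernel $k(x, y) = \exp(-\|x - y\|^2 / 2\sigma^2)$ with a $D$-dimensional feature map, and memory scales with $D$, which is the knob a budget constrains. The paper additionally studies low-precision variants; this sketch is full precision only.

```python
import numpy as np

def rff_map(X, D: int = 512, sigma: float = 1.0, seed: int = 0):
    # W ~ N(0, sigma^-2 I), b ~ U[0, 2*pi]: z(x)^T z(y) ~= k(x, y).
    rng = np.random.default_rng(seed)
    W = rng.normal(scale=1.0 / sigma, size=(X.shape[1], D))
    b = rng.uniform(0.0, 2 * np.pi, size=D)
    return np.sqrt(2.0 / D) * np.cos(X @ W + b)

X = np.random.default_rng(1).normal(size=(5, 3))
Z = rff_map(X)
approx_kernel = Z @ Z.T  # approximates the exact 5 x 5 kernel matrix
```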
no code implementations • 13 Jan 2017 • Avner May, Alireza Bagheri Garakani, Zhiyun Lu, Dong Guo, Kuan Liu, Aurélien Bellet, Linxi Fan, Michael Collins, Daniel Hsu, Brian Kingsbury, Michael Picheny, Fei Sha
First, in order to reduce the number of random features required by kernel models, we propose a simple but effective method for feature selection.
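A generic illustration of the idea of selecting a subset of random features (the paper's actual selection criterion is not reproduced here): draw more features than needed, score each, and keep the top k. The correlation-based score below is a deliberately simple stand-in.

```python
import numpy as np

def select_features(Z, y, k: int):
    # Score each random feature by |correlation with the labels|
    # (illustrative criterion only), then keep the k best columns.
    Zc = Z - Z.mean(axis=0)
    yc = y - y.mean()
    scores = np.abs(Zc.T @ yc) / (Z.std(axis=0) * y.std() * len(y) + 1e-12)
    keep = np.argsort(scores)[-k:]
    return Z[:, keep], keep
```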
no code implementations • 18 Mar 2016 • Zhiyun Lu, Dong Guo, Alireza Bagheri Garakani, Kuan Liu, Avner May, Aurélien Bellet, Linxi Fan, Michael Collins, Brian Kingsbury, Michael Picheny, Fei Sha
We study large-scale kernel methods for acoustic modeling and compare them to DNNs on performance metrics related to both acoustic modeling and recognition.
no code implementations • 14 Nov 2014 • Zhiyun Lu, Avner May, Kuan Liu, Alireza Bagheri Garakani, Dong Guo, Aurélien Bellet, Linxi Fan, Michael Collins, Brian Kingsbury, Michael Picheny, Fei Sha
The computational complexity of kernel methods has often been a major barrier for applying them to large-scale learning problems.