Search Results for author: Felix Yu

Found 24 papers, 3 papers with code

SpecTr: Fast Speculative Decoding via Optimal Transport

no code implementations NeurIPS 2023 Ziteng Sun, Ananda Theertha Suresh, Jae Hun Ro, Ahmad Beirami, Himanshu Jain, Felix Yu

We show that the optimal draft selection algorithm (transport plan) can be computed via linear programming, whose best-known runtime is exponential in $k$.

Language Modelling Large Language Model

Large Language Models with Controllable Working Memory

no code implementations9 Nov 2022 Daliang Li, Ankit Singh Rawat, Manzil Zaheer, Xin Wang, Michal Lukasik, Andreas Veit, Felix Yu, Sanjiv Kumar

By contrast, when the context is irrelevant to the task, the model should ignore it and fall back on its internal knowledge.

counterfactual World Knowledge

The Lazy Neuron Phenomenon: On Emergence of Activation Sparsity in Transformers

no code implementations12 Oct 2022 Zonglin Li, Chong You, Srinadh Bhojanapalli, Daliang Li, Ankit Singh Rawat, Sashank J. Reddi, Ke Ye, Felix Chern, Felix Yu, Ruiqi Guo, Sanjiv Kumar

This paper studies the curious phenomenon for machine learning models with Transformer architectures that their activation maps are sparse.

FedDM: Iterative Distribution Matching for Communication-Efficient Federated Learning

1 code implementation CVPR 2023 Yuanhao Xiong, Ruochen Wang, Minhao Cheng, Felix Yu, Cho-Jui Hsieh

Federated learning~(FL) has recently attracted increasing attention from academia and industry, with the ultimate goal of achieving collaborative training under privacy and communication constraints.

Federated Learning Image Classification

Correlated quantization for distributed mean estimation and optimization

no code implementations9 Mar 2022 Ananda Theertha Suresh, Ziteng Sun, Jae Hun Ro, Felix Yu

We show that applying the proposed protocol as sub-routine in distributed optimization algorithms leads to better convergence rates.

Distributed Optimization Quantization

InFillmore: Frame-Guided Language Generation with Bidirectional Context

no code implementations Joint Conference on Lexical and Computational Semantics 2021 Jiefu Ou, Nathaniel Weir, Anton Belyy, Felix Yu, Benjamin Van Durme

We propose a structured extension to bidirectional-context conditional language generation, or "infilling," inspired by Frame Semantic theory (Fillmore, 1976).

Text Infilling

Modifying Memories in Transformer Models

no code implementations1 Dec 2020 Chen Zhu, Ankit Singh Rawat, Manzil Zaheer, Srinadh Bhojanapalli, Daliang Li, Felix Yu, Sanjiv Kumar

In this paper, we propose a new task of \emph{explicitly modifying specific factual knowledge in Transformer models while ensuring the model performance does not degrade on the unmodified facts}.


Semantic Label Smoothing for Sequence to Sequence Problems

no code implementations EMNLP 2020 Michal Lukasik, Himanshu Jain, Aditya Krishna Menon, Seungyeon Kim, Srinadh Bhojanapalli, Felix Yu, Sanjiv Kumar

Label smoothing has been shown to be an effective regularization strategy in classification, that prevents overfitting and helps in label de-noising.

Machine Translation Translation

Learning discrete distributions: user vs item-level privacy

no code implementations NeurIPS 2020 Yuhan Liu, Ananda Theertha Suresh, Felix Yu, Sanjiv Kumar, Michael Riley

If each user has $m$ samples, we show that straightforward applications of Laplace or Gaussian mechanisms require the number of users to be $\mathcal{O}(k/(m\alpha^2) + k/\epsilon\alpha)$ to achieve an $\ell_1$ distance of $\alpha$ between the true and estimated distributions, with the privacy-induced penalty $k/\epsilon\alpha$ independent of the number of samples per user $m$.

Federated Learning

Self-supervised Learning for Large-scale Item Recommendations

1 code implementation25 Jul 2020 Tiansheng Yao, Xinyang Yi, Derek Zhiyuan Cheng, Felix Yu, Ting Chen, Aditya Menon, Lichan Hong, Ed H. Chi, Steve Tjoa, Jieqi Kang, Evan Ettinger

Our online results also verify our hypothesis that our framework indeed improves model performance even more on slices that lack supervision.

Data Augmentation Natural Language Understanding +3

Doubly-stochastic mining for heterogeneous retrieval

no code implementations23 Apr 2020 Ankit Singh Rawat, Aditya Krishna Menon, Andreas Veit, Felix Yu, Sashank J. Reddi, Sanjiv Kumar

Modern retrieval problems are characterised by training sets with potentially billions of labels, and heterogeneous data distributions across subpopulations (e. g., users of a retrieval system may be from different countries), each of which poses a challenge.

Retrieval Stochastic Optimization

Take the Scenic Route: Improving Generalization in Vision-and-Language Navigation

no code implementations31 Mar 2020 Felix Yu, Zhiwei Deng, Karthik Narasimhan, Olga Russakovsky

In the Vision-and-Language Navigation (VLN) task, an agent with egocentric vision navigates to a destination given natural language instructions.

Vision and Language Navigation

Sampled Softmax with Random Fourier Features

no code implementations NeurIPS 2019 Ankit Singh Rawat, Jiecao Chen, Felix Yu, Ananda Theertha Suresh, Sanjiv Kumar

For the settings where a large number of classes are involved, a common method to speed up training is to sample a subset of classes and utilize an estimate of the loss gradient based on these classes, known as the sampled softmax method.

Hyperparameter Optimization in Black-box Image Processing using Differentiable Proxies

1 code implementation SIGGRAPH 2019 2019 Ethan Tseng, Felix Yu, Yuting Yang, Fahim Mannan, Karl St. Arnaud, Derek Nowrouzezahrai, Jean-François Lalonde, Felix Heide

We present a fully automatic system to optimize the parameters of black-box hardware and software image processing pipelines according to any arbitrary (i. e., application-specific) metric.

Hyperparameter Optimization Image Denoising +3

Stochastic Negative Mining for Learning with Large Output Spaces

no code implementations16 Oct 2018 Sashank J. Reddi, Satyen Kale, Felix Yu, Dan Holtmann-Rice, Jiecao Chen, Sanjiv Kumar

Furthermore, we identify a particularly intuitive class of loss functions in the aforementioned family and show that they are amenable to practical implementation in the large output space setting (i. e. computation is possible without evaluating scores of all labels) by developing a technique called Stochastic Negative Mining.


Loss Decomposition for Fast Learning in Large Output Spaces

no code implementations ICML 2018 Ian En-Hsu Yen, Satyen Kale, Felix Yu, Daniel Holtmann-Rice, Sanjiv Kumar, Pradeep Ravikumar

For problems with large output spaces, evaluation of the loss function and its gradient are expensive, typically taking linear time in the size of the output space.

Word Embeddings

Lattice Rescoring Strategies for Long Short Term Memory Language Models in Speech Recognition

no code implementations15 Nov 2017 Shankar Kumar, Michael Nirschl, Daniel Holtmann-Rice, Hank Liao, Ananda Theertha Suresh, Felix Yu

Recurrent neural network (RNN) language models (LMs) and Long Short Term Memory (LSTM) LMs, a variant of RNN LMs, have been shown to outperform traditional N-gram LMs on speech recognition tasks.

speech-recognition Speech Recognition

Cannot find the paper you are looking for? You can Submit a new open access paper.