no code implementations • 7 Mar 2024 • Michal Lukasik, Harikrishna Narasimhan, Aditya Krishna Menon, Felix Yu, Sanjiv Kumar
Large language models (LLMs) have demonstrated strong results on a range of NLP tasks.
no code implementations • 15 Dec 2023 • Renat Aksitov, Sobhan Miryoosefi, Zonglin Li, Daliang Li, Sheila Babayan, Kavya Kopparapu, Zachary Fisher, Ruiqi Guo, Sushant Prakash, Pranesh Srinivasan, Manzil Zaheer, Felix Yu, Sanjiv Kumar
Answering complex natural language questions often necessitates multi-step reasoning and integrating external information.
Ranked #1 on Question Answering on Bamboogle
no code implementations • NeurIPS 2023 • Ziteng Sun, Ananda Theertha Suresh, Jae Hun Ro, Ahmad Beirami, Himanshu Jain, Felix Yu
We show that the optimal draft selection algorithm (transport plan) can be computed via linear programming, whose best-known runtime is exponential in $k$.
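As a toy illustration of the transport-plan view for a single draft token (not the paper's algorithm, which handles $k$ drafts), the optimal coupling between a draft distribution p and a target distribution q can be computed with an off-the-shelf LP solver; vocabulary size and distributions below are illustrative:

```python
# Minimal sketch (assumption: toy vocabulary, single draft token, k = 1).
# Variables: a coupling pi[i, j] between draft distribution p and target q;
# maximizing the diagonal mass sum_i pi[i, i] gives the best acceptance rate.
import numpy as np
from scipy.optimize import linprog

p = np.array([0.6, 0.3, 0.1])   # draft model's token distribution
q = np.array([0.3, 0.5, 0.2])   # target model's token distribution
V = len(p)

c = np.zeros(V * V)
c[np.arange(V) * V + np.arange(V)] = -1.0   # minimize -(diagonal mass)

# Marginal constraints: rows of pi sum to p, columns sum to q.
A_eq = np.zeros((2 * V, V * V))
for i in range(V):
    A_eq[i, i * V:(i + 1) * V] = 1.0        # row i sums to p[i]
    A_eq[V + i, i::V] = 1.0                 # column i sums to q[i]
b_eq = np.concatenate([p, q])

res = linprog(c, A_eq=A_eq, b_eq=b_eq, bounds=(0, None))
print("max acceptance probability:", -res.fun)  # equals sum_i min(p[i], q[i])
```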
no code implementations • 9 Nov 2022 • Daliang Li, Ankit Singh Rawat, Manzil Zaheer, Xin Wang, Michal Lukasik, Andreas Veit, Felix Yu, Sanjiv Kumar
By contrast, when the context is irrelevant to the task, the model should ignore it and fall back on its internal knowledge.
no code implementations • 1 Nov 2022 • Yihan Wang, Si Si, Daliang Li, Michal Lukasik, Felix Yu, Cho-Jui Hsieh, Inderjit S Dhillon, Sanjiv Kumar
Pretrained large language models (LLMs) are general-purpose problem solvers applicable to a diverse set of tasks with prompts.
no code implementations • 12 Oct 2022 • Zonglin Li, Chong You, Srinadh Bhojanapalli, Daliang Li, Ankit Singh Rawat, Sashank J. Reddi, Ke Ye, Felix Chern, Felix Yu, Ruiqi Guo, Sanjiv Kumar
This paper studies a curious phenomenon of machine learning models with Transformer architectures: their activation maps are sparse.
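A minimal way to observe this kind of sparsity (a generic probe, not the paper's experimental setup) is to count the fraction of exactly-zero entries after the ReLU in a feed-forward block; at random initialization roughly half the entries are zero, and the phenomenon studied here is that trained Transformers are much sparser than this baseline:

```python
# Generic sketch: measure post-ReLU activation sparsity of a random MLP block.
# (Illustrative only; the paper measures this inside trained Transformers.)
import numpy as np

rng = np.random.default_rng(0)
d_model, d_ff, n_tokens = 256, 1024, 128

x = rng.standard_normal((n_tokens, d_model))
W1 = rng.standard_normal((d_model, d_ff)) / np.sqrt(d_model)

h = np.maximum(x @ W1, 0.0)                 # ReLU activations of the MLP
sparsity = (h == 0.0).mean()                # fraction of exactly-zero entries
print(f"fraction of zero activations: {sparsity:.2%}")  # ~50% at random init
```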
1 code implementation • CVPR 2023 • Yuanhao Xiong, Ruochen Wang, Minhao Cheng, Felix Yu, Cho-Jui Hsieh
Federated learning (FL) has recently attracted increasing attention from academia and industry, with the ultimate goal of achieving collaborative training under privacy and communication constraints.
no code implementations • 9 Mar 2022 • Ananda Theertha Suresh, Ziteng Sun, Jae Hun Ro, Felix Yu
We show that applying the proposed protocol as a subroutine in distributed optimization algorithms leads to better convergence rates.
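For context, the standard unbiased baseline in this space is independent stochastic rounding, sketched below (this is the generic scheme, not the protocol proposed in the paper): each client sends one bit per coordinate, yet the server's average stays unbiased.

```python
# Hedged sketch of communication-limited mean estimation via independent
# one-bit stochastic rounding (the generic baseline, not this paper's scheme).
import numpy as np

rng = np.random.default_rng(0)
lo, hi = -1.0, 1.0                       # known bounds on client values
clients = rng.uniform(lo, hi, size=(1000, 8))

# Stochastic rounding: send 1 with prob (x - lo)/(hi - lo), else 0.
prob = (clients - lo) / (hi - lo)
bits = (rng.random(clients.shape) < prob).astype(float)

estimate = lo + (hi - lo) * bits.mean(axis=0)   # unbiased mean estimate
print(np.abs(estimate - clients.mean(axis=0)).max())
```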
no code implementations • 28 Oct 2021 • Wittawat Jitkrittum, Michal Lukasik, Ananda Theertha Suresh, Felix Yu, Gang Wang
In this paper, we study training and inference of neural networks under the MPC setup.
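As background, the basic building block of MPC is secret sharing; a toy additive-sharing sketch (illustrative only, far simpler than the protocols studied in work like this):

```python
# Toy additive secret sharing: a value is split into random shares that
# reveal nothing individually but sum to the secret, and addition can be
# done share-wise by the parties.
import numpy as np

rng = np.random.default_rng(0)
P = 2**31 - 1                                    # arithmetic modulo a prime

def share(x, n_parties=3):
    shares = rng.integers(0, P, size=n_parties - 1)
    return np.append(shares, (x - shares.sum()) % P)

a, b = 1234, 5678
shares_sum = (share(a) + share(b)) % P           # parties add locally
print((shares_sum.sum() % P) == (a + b) % P)     # reconstructs a + b
```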
no code implementations • Joint Conference on Lexical and Computational Semantics 2021 • Jiefu Ou, Nathaniel Weir, Anton Belyy, Felix Yu, Benjamin Van Durme
We propose a structured extension to bidirectional-context conditional language generation, or "infilling," inspired by Frame Semantic theory (Fillmore, 1976).
no code implementations • 1 Dec 2020 • Chen Zhu, Ankit Singh Rawat, Manzil Zaheer, Srinadh Bhojanapalli, Daliang Li, Felix Yu, Sanjiv Kumar
In this paper, we propose a new task of explicitly modifying specific factual knowledge in Transformer models while ensuring that model performance does not degrade on the unmodified facts.
no code implementations • EMNLP 2020 • Michal Lukasik, Himanshu Jain, Aditya Krishna Menon, Seungyeon Kim, Srinadh Bhojanapalli, Felix Yu, Sanjiv Kumar
Label smoothing has been shown to be an effective regularization strategy in classification that prevents overfitting and helps with label de-noising.
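For reference, the standard label-smoothing recipe mixes the one-hot target with the uniform distribution before taking the cross-entropy; a minimal sketch (one common convention among several):

```python
# Minimal label-smoothing sketch: blend the one-hot target with uniform mass
# eps, then take cross-entropy against the log-softmax of the logits.
import numpy as np

def smoothed_cross_entropy(logits, label, eps=0.1):
    z = logits - logits.max()                   # numerical stability
    log_probs = z - np.log(np.exp(z).sum())
    k = logits.shape[-1]
    target = np.full(k, eps / k)                # uniform mass eps
    target[label] += 1.0 - eps                  # remaining mass on true label
    return -(target * log_probs).sum()

print(smoothed_cross_entropy(np.array([2.0, 0.5, -1.0]), label=0))
```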
no code implementations • NeurIPS 2020 • Yuhan Liu, Ananda Theertha Suresh, Felix Yu, Sanjiv Kumar, Michael Riley
If each user has $m$ samples, we show that straightforward applications of Laplace or Gaussian mechanisms require the number of users to be $\mathcal{O}(k/(m\alpha^2) + k/(\epsilon\alpha))$ to achieve an $\ell_1$ distance of $\alpha$ between the true and estimated distributions, with the privacy-induced penalty $k/(\epsilon\alpha)$ independent of the number of samples per user $m$.
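The "straightforward application" the bound refers to looks roughly like the sketch below (a naive baseline, not the paper's improved estimator); the sensitivity constant and all parameters are illustrative:

```python
# Naive user-level Laplace baseline: aggregate counts over all users, then
# add Laplace noise scaled to the user-level L1 sensitivity (2m here, since
# one user can move all m of their samples). Illustrative only.
import numpy as np

rng = np.random.default_rng(0)
k, n_users, m, eps = 10, 5000, 4, 1.0
true_dist = rng.dirichlet(np.ones(k))

samples = rng.choice(k, size=(n_users, m), p=true_dist)
counts = np.bincount(samples.ravel(), minlength=k).astype(float)

noisy = counts + rng.laplace(scale=2 * m / eps, size=k)
est = np.clip(noisy, 0, None)
est /= est.sum()                                  # project back to the simplex

print("l1 error:", np.abs(est - true_dist).sum())
```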
1 code implementation • 25 Jul 2020 • Tiansheng Yao, Xinyang Yi, Derek Zhiyuan Cheng, Felix Yu, Ting Chen, Aditya Menon, Lichan Hong, Ed H. Chi, Steve Tjoa, Jieqi Kang, Evan Ettinger
Our online results also verify our hypothesis that our framework indeed improves model performance even more on slices that lack supervision.
no code implementations • 23 Apr 2020 • Ankit Singh Rawat, Aditya Krishna Menon, Andreas Veit, Felix Yu, Sashank J. Reddi, Sanjiv Kumar
Modern retrieval problems are characterised by training sets with potentially billions of labels, and heterogeneous data distributions across subpopulations (e.g., users of a retrieval system may be from different countries), each of which poses a challenge.
no code implementations • 31 Mar 2020 • Felix Yu, Zhiwei Deng, Karthik Narasimhan, Olga Russakovsky
In the Vision-and-Language Navigation (VLN) task, an agent with egocentric vision navigates to a destination given natural language instructions.
no code implementations • NeurIPS 2019 • Ankit Singh Rawat, Jiecao Chen, Felix Yu, Ananda Theertha Suresh, Sanjiv Kumar
In settings where a large number of classes are involved, a common method to speed up training is to sample a subset of classes and use an estimate of the loss gradient based on these classes, known as the sampled softmax method.
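A minimal sketch of sampled softmax with a uniform sampler (illustrative; the model and names are stand-ins, and the paper's contribution concerns the bias of such estimates):

```python
# Sampled softmax sketch: estimate the cross-entropy from the true class plus
# a small set of sampled negatives, with each logit corrected by log q(class).
# With uniform q the correction is a constant shift; it matters for
# non-uniform samplers. Collisions of negatives with the true class are
# ignored for brevity.
import numpy as np

rng = np.random.default_rng(0)
n_classes, n_sampled = 100_000, 64
logits_fn = lambda idx: rng.standard_normal(len(idx))  # stand-in for a model

true_class = 42
neg = rng.choice(n_classes, size=n_sampled, replace=False)
idx = np.concatenate(([true_class], neg))

adj = logits_fn(idx) - np.log(1.0 / n_classes)   # logit minus log q(class)
adj -= adj.max()                                 # numerical stability
loss = -adj[0] + np.log(np.exp(adj).sum())       # sampled softmax CE
print(loss)
```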
1 code implementation • SIGGRAPH 2019 2019 • Ethan Tseng, Felix Yu, Yuting Yang, Fahim Mannan, Karl St. Arnaud, Derek Nowrouzezahrai, Jean-François Lalonde, Felix Heide
We present a fully automatic system to optimize the parameters of black-box hardware and software image processing pipelines according to an arbitrary (i.e., application-specific) metric.
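To make the setting concrete, here is a deliberately naive black-box search over pipeline parameters (plain random search with a made-up metric; the paper's system is far more sophisticated):

```python
# Naive black-box parameter search: treat the pipeline as an opaque function
# from parameters to a user-chosen metric. The metric below is hypothetical.
import numpy as np

rng = np.random.default_rng(0)

def pipeline_metric(params):                 # stand-in for a real pipeline
    return -((params - 0.3) ** 2).sum()      # higher is better

best_p, best_m = None, -np.inf
for _ in range(500):
    p = rng.uniform(0, 1, size=4)            # 4 illustrative parameters
    m = pipeline_metric(p)
    if m > best_m:
        best_p, best_m = p, m
print(best_p, best_m)
```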
no code implementations • 16 Oct 2018 • Sashank J. Reddi, Satyen Kale, Felix Yu, Dan Holtmann-Rice, Jiecao Chen, Sanjiv Kumar
Furthermore, we identify a particularly intuitive class of loss functions in the aforementioned family and show that they are amenable to practical implementation in the large output space setting (i.e., computation is possible without evaluating scores of all labels) by developing a technique called Stochastic Negative Mining.
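A hedged sketch of the core idea as we read it: sample a batch of negative labels, score them, and build the loss from only the hardest sampled negatives. Names are illustrative, and a hinge-style loss stands in for the paper's general family:

```python
# Stochastic-negative-mining-style sketch: uniformly sample negatives, keep
# the top-k hardest (highest-scoring), and compute a hinge-style loss on them.
import numpy as np

rng = np.random.default_rng(0)
n_labels, n_sampled, k = 1_000_000, 512, 5

score = lambda idx: rng.standard_normal(len(idx))   # stand-in for a model
pos_score = 1.5                                     # score of the true label

neg = rng.choice(n_labels, size=n_sampled, replace=False)
hardest = np.sort(score(neg))[-k:]                  # top-k sampled negatives

loss = np.maximum(0.0, 1.0 + hardest - pos_score).mean()
print(loss)
```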
no code implementations • ICML 2018 • Ian En-Hsu Yen, Satyen Kale, Felix Yu, Daniel Holtmann-Rice, Sanjiv Kumar, Pradeep Ravikumar
For problems with large output spaces, evaluating the loss function and its gradient is expensive, typically taking time linear in the size of the output space.
no code implementations • NeurIPS 2018 • Naman Agarwal, Ananda Theertha Suresh, Felix Yu, Sanjiv Kumar, H. Brendan McMahan
Distributed stochastic gradient descent is an important subroutine in distributed learning.
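One classic recipe for cutting communication in this subroutine is to quantize gradients to a small integer grid and add discrete noise before sending; the sketch below shows a simplified version (not the paper's exact mechanism or privacy accounting):

```python
# Simplified quantize-then-randomize sketch for distributed SGD: workers map
# gradients onto a small integer grid with stochastic rounding and add
# Binomial noise; the server subtracts the known noise mean and rescales.
import numpy as np

rng = np.random.default_rng(0)
n_workers, dim, levels, n_flips = 8, 100, 16, 32

grads = rng.uniform(-1, 1, size=(n_workers, dim))

# Stochastic quantization onto {0, ..., levels-1} over the range [-1, 1].
scaled = (grads + 1) / 2 * (levels - 1)
low = np.floor(scaled)
q = low + (rng.random(grads.shape) < (scaled - low))

noisy = q + rng.binomial(n_flips, 0.5, size=q.shape)    # discrete noise
decoded = (noisy - n_flips / 2) / (levels - 1) * 2 - 1  # unbias and rescale
print(np.abs(decoded.mean(axis=0) - grads.mean(axis=0)).mean())
```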
no code implementations • NeurIPS 2017 • Xiang Wu, Ruiqi Guo, Ananda Theertha Suresh, Sanjiv Kumar, Daniel N. Holtmann-Rice, David Simcha, Felix Yu
We propose a multiscale quantization approach for fast similarity search on large, high-dimensional datasets.
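For background, plain product quantization, which multiscale approaches build on, can be sketched in a few lines (illustrative parameters; the paper's method additionally accounts for variation in vector norms):

```python
# Product-quantization sketch: split vectors into subspaces, run k-means in
# each subspace, and answer queries by summing per-subspace table lookups.
import numpy as np
from scipy.cluster.vq import kmeans2

rng = np.random.default_rng(0)
n, d, n_sub, n_codes = 2000, 32, 4, 16
sub_d = d // n_sub

data = rng.standard_normal((n, d))
codebooks, codes = [], []
for s in range(n_sub):
    block = data[:, s * sub_d:(s + 1) * sub_d]
    cb, assign = kmeans2(block, n_codes, minit='points', seed=0)
    codebooks.append(cb)
    codes.append(assign)
codes = np.stack(codes, axis=1)                 # (n, n_sub) compact codes

# Query: build per-subspace squared-distance tables, then sum table lookups.
query = rng.standard_normal(d)
tables = [((codebooks[s] - query[s * sub_d:(s + 1) * sub_d]) ** 2).sum(1)
          for s in range(n_sub)]
dists = sum(tables[s][codes[:, s]] for s in range(n_sub))
print("approx nearest neighbor:", dists.argmin())
```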
no code implementations • 15 Nov 2017 • Shankar Kumar, Michael Nirschl, Daniel Holtmann-Rice, Hank Liao, Ananda Theertha Suresh, Felix Yu
Recurrent neural network (RNN) language models (LMs) and Long Short-Term Memory (LSTM) LMs, a variant of RNN LMs, have been shown to outperform traditional n-gram LMs on speech recognition tasks.