no code implementations • 29 Jan 2024 • Guru Guruganesh, Yoav Kolumbus, Jon Schneider, Inbal Talgam-Cohen, Emmanouil-Vasileios Vlatakis-Gkaragkounis, Joshua R. Wang, S. Matthew Weinberg
We initiate the study of repeated contracts with a learning agent, focusing on agents who achieve no-regret outcomes.
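A minimal sketch of the kind of no-regret dynamic this abstract refers to, under an assumed toy model (a linear contract and illustrative costs/probabilities; this is not the paper's construction):

```python
import numpy as np

# Assumed toy model: the principal posts a linear contract paying the agent
# a share alpha of output, and the agent picks actions via multiplicative
# weights (Hedge), a standard no-regret algorithm. All numbers illustrative.

costs = np.array([0.0, 0.2, 0.4])          # agent's cost for each action
success_prob = np.array([0.1, 0.5, 0.9])   # P(high output | action)
alpha, T, eta = 0.3, 5000, 0.1             # contract share, rounds, step size

weights = np.ones(len(costs))
for t in range(T):
    # Expected utility of each action under the posted contract:
    utility = alpha * success_prob - costs
    # Full-information Hedge update; the play distribution is weights / sum.
    weights *= np.exp(eta * utility)

p = weights / weights.sum()
print("agent's limiting action distribution:", p.round(3))
# With alpha = 0.3 the utilities are [0.03, -0.05, -0.13], so play
# concentrates on action 0: the agent best-responds to the contract.
```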
no code implementations • 6 Oct 2023 • Shanda Li, Chong You, Guru Guruganesh, Joshua Ainslie, Santiago Ontanon, Manzil Zaheer, Sumit Sanghai, Yiming Yang, Sanjiv Kumar, Srinadh Bhojanapalli
Preventing the performance decay of Transformers on inputs longer than those used for training has been an important challenge in extending the context length of these models.
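One concrete failure mode behind this decay, as a brief illustration (an assumption for exposition; the relative-position methods this line of work studies are not implemented here): a learned absolute position embedding table has no entries for positions beyond the training length.

```python
import numpy as np

# Toy illustration only: a table of learned absolute position embeddings
# cannot index positions it never saw during training.

max_train_len, d = 512, 64
pos_table = np.random.randn(max_train_len, d)  # one learned row per position

def embed_positions(seq_len: int) -> np.ndarray:
    if seq_len > max_train_len:
        raise IndexError(f"no embeddings for positions >= {max_train_len}")
    return pos_table[:seq_len]

print(embed_positions(512).shape)   # (512, 64): within the training length
try:
    embed_positions(1024)           # beyond the training length
except IndexError as e:
    print("length extrapolation fails:", e)
```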
no code implementations • 5 Oct 2022 • Mingda Qiao, Guru Guruganesh, Ankit Singh Rawat, Avinava Dubey, Manzil Zaheer
Regev and Vijayaraghavan (2017) showed that with $\Delta = \Omega(\sqrt{\log k})$ separation, the means can be learned using $\mathrm{poly}(k, d)$ samples, whereas super-polynomially many samples are required if $\Delta = o(\sqrt{\log k})$ and $d = \Omega(\log k)$.
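A small illustrative experiment in the easy regime (all parameters are placeholders, not from the paper): sample a uniform mixture of $k$ spherical unit-variance Gaussians with pairwise $\Delta$-separated means and recover the means with Lloyd's k-means, which succeeds readily when $\Delta$ is well above $\sqrt{\log k}$.

```python
import numpy as np

# Illustrative experiment, not the paper's algorithm or parameters.
rng = np.random.default_rng(0)
k, d, n, Delta = 4, 16, 4000, 4.0

means = (Delta / np.sqrt(2)) * np.eye(d)[:k]   # pairwise distance exactly Delta
labels = rng.integers(k, size=n)
X = means[labels] + rng.standard_normal((n, d))

centers = X[rng.choice(n, size=k, replace=False)]   # random initialization
for _ in range(50):                                 # Lloyd's iterations
    assign = ((X[:, None, :] - centers[None]) ** 2).sum(-1).argmin(1)
    centers = np.stack([X[assign == j].mean(0) if (assign == j).any()
                        else centers[j] for j in range(k)])

err = np.linalg.norm(centers[:, None] - means[None], axis=-1).min(1)
print("distance from each center to its nearest true mean:", err.round(3))
```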
no code implementations • NeurIPS 2021 • Guru Guruganesh, Allen Liu, Jon Schneider, Joshua Wang
We consider the problem of multi-class classification, where a stream of adversarially chosen queries arrives and each query must be assigned a label online.
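A classical baseline for this setting (not the paper's algorithm): the Halving scheme with plurality voting over a finite hypothesis class. In the realizable case, each mistake eliminates at least a $1/L$ fraction of surviving hypotheses ($L$ = number of labels), bounding the mistakes by $\log_{L/(L-1)} |H|$. Sizes below are illustrative.

```python
import numpy as np

# Halving with plurality voting over a finite hypothesis class (baseline).
rng = np.random.default_rng(1)
num_h, num_x, L, T = 256, 50, 3, 200
H = rng.integers(L, size=(num_h, num_x))   # hypotheses as lookup tables
truth = H[7]                               # realizable: labels come from some h* in H

alive = np.ones(num_h, dtype=bool)
mistakes = 0
for t in range(T):
    x = rng.integers(num_x)                # (a real adversary picks x; random here)
    votes = np.bincount(H[alive, x], minlength=L)
    pred = votes.argmax()                  # predict the plurality label
    y = truth[x]
    mistakes += int(pred != y)
    alive &= (H[:, x] == y)                # drop hypotheses inconsistent with y

print(f"mistakes: {mistakes}, surviving hypotheses: {alive.sum()}")
```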
no code implementations • 7 Sep 2021 • Ashwinkumar Badanidiyuru, Zhe Feng, Guru Guruganesh
For binary feedback, when the noise distribution $\mathcal{F}$ is known, we propose a bidding algorithm based on maximum likelihood estimation (MLE) that achieves regret of at most $\widetilde{O}(\sqrt{\log(d) T})$.
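A minimal sketch of the MLE idea in a scalar simplification (the paper's setting is $d$-dimensional; the normal noise and all values below are assumptions): the competing bid is $\theta + \varepsilon_t$ with $\varepsilon_t \sim \mathcal{F}$ known, and only the win indicator $w_t = \mathbf{1}\{b_t > \theta + \varepsilon_t\}$ is observed.

```python
import numpy as np
from scipy.stats import norm

# Scalar toy version: estimate theta from binary win/loss feedback by
# maximizing sum_t w_t*log F(b_t - theta) + (1-w_t)*log(1 - F(b_t - theta)),
# with F the standard normal CDF (an assumed choice of noise).

rng = np.random.default_rng(2)
theta_true, T = 2.0, 2000
bids = rng.uniform(0.0, 4.0, size=T)                 # exploratory bids
wins = (bids > theta_true + rng.standard_normal(T)).astype(float)

grid = np.linspace(0.0, 4.0, 401)                    # grid-search MLE
ll = np.array([
    (wins * norm.logcdf(bids - th) + (1 - wins) * norm.logsf(bids - th)).sum()
    for th in grid
])
print("theta_hat =", grid[ll.argmax()], "(true:", theta_true, ")")
```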
no code implementations • NeurIPS 2021 • Sreenivas Gollapudi, Guru Guruganesh, Kostas Kollias, Pasin Manurangsi, Renato Paes Leme, Jon Schneider
We design algorithms for this problem that achieve regret bounds of $O(d\log T)$ and $\exp(O(d \log d))$.
2 code implementations • 22 Oct 2020 • Nicholas Monath, Avinava Dubey, Guru Guruganesh, Manzil Zaheer, Amr Ahmed, Andrew McCallum, Gokhan Mergen, Marc Najork, Mert Terzihan, Bryon Tjanaka, YuAn Wang, Yuchen Wu
The applicability of agglomerative clustering for inferring both hierarchical and flat clusterings is limited by its scalability.
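A short illustration of that bottleneck (not the paper's method): standard agglomerative clustering materializes all $O(n^2)$ pairwise distances, which is what restricts it to modest $n$.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

# Standard agglomerative clustering via SciPy; the linkage step works over
# all O(n^2) pairwise distances, so memory and time grow quadratically in n.
rng = np.random.default_rng(3)
n, d = 2000, 8
X = rng.standard_normal((n, d))

Z = linkage(X, method="average")               # builds the full hierarchy
flat = fcluster(Z, t=5, criterion="maxclust")  # cut the tree into 5 flat clusters
print("cluster sizes:", np.bincount(flat)[1:])
```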
14 code implementations • NeurIPS 2020 • Manzil Zaheer, Guru Guruganesh, Avinava Dubey, Joshua Ainslie, Chris Alberti, Santiago Ontanon, Philip Pham, Anirudh Ravula, Qifan Wang, Li Yang, Amr Ahmed
To remedy this, we propose BigBird, a sparse attention mechanism that reduces this quadratic dependency to linear.
Ranked #1 on Text Classification on Arxiv HEP-TH citation graph
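A minimal sketch of a BigBird-style sparse attention mask, combining the three components the paper describes: a sliding window, a few random connections, and global tokens. Sizes are illustrative, and the real model uses blocked computation for efficiency.

```python
import numpy as np

# Build a sparse attention mask: window + random + global (illustrative sizes).
rng = np.random.default_rng(4)
n, w, r, g = 64, 3, 2, 2          # seq len, window radius, random links, globals

mask = np.zeros((n, n), dtype=bool)
for i in range(n):
    lo, hi = max(0, i - w), min(n, i + w + 1)
    mask[i, lo:hi] = True                              # local sliding window
    mask[i, rng.choice(n, r, replace=False)] = True    # random attention
mask[:g, :] = True                                     # global tokens attend everywhere
mask[:, :g] = True                                     # and are attended by everyone

# Each query attends to O(w + r + g) keys, so cost is linear in n,
# versus the dense n x n pattern of full attention.
print("avg keys per query:", mask.sum(1).mean(), "of", n)
```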