no code implementations • 12 Feb 2025 • Wittawat Jitkrittum, Harikrishna Narasimhan, Ankit Singh Rawat, Jeevesh Juneja, Zifeng Wang, Chen-Yu Lee, Pradeep Shenoy, Rina Panigrahy, Aditya Krishna Menon, Sanjiv Kumar
Large language models' significant advances in capabilities are accompanied by significant increases in inference costs.
no code implementations • 26 Jan 2025 • Dylan Cutler, Arun Kandoor, Nishanth Dikkala, Nikunj Saunshi, Xin Wang, Rina Panigrahy
Instead, we stagger the execution and only allow dependencies on token representations up to time step $i-1$.
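A minimal sketch of the dependency pattern this describes, assuming a strictly causal attention mask (the function names below are illustrative, not the paper's implementation):

```python
import numpy as np

def standard_causal_mask(T: int) -> np.ndarray:
    """Token i may depend on tokens 0..i (includes itself)."""
    return np.tril(np.ones((T, T), dtype=bool))

def staggered_mask(T: int) -> np.ndarray:
    """Token i may depend only on tokens 0..i-1, matching the staggered
    execution described above (strictly lower-triangular mask)."""
    return np.tril(np.ones((T, T), dtype=bool), k=-1)

if __name__ == "__main__":
    print(standard_causal_mask(4).astype(int))
    print(staggered_mask(4).astype(int))
```

Shifting the dependency back by one step is what allows the computation for token $i$ to start before the final representation of token $i$ is available.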
no code implementations • 6 Nov 2024 • Guan Zhe Hong, Nishanth Dikkala, Enming Luo, Cyrus Rashtchian, Xin Wang, Rina Panigrahy
We are able to identify certain "planning" and "reasoning" mechanisms in the network that necessitate cooperation between the attention blocks to implement the desired logic.
1 code implementation • 16 Sep 2024 • Kulin Shah, Nishanth Dikkala, Xin Wang, Rina Panigrahy
We observe that Transformer models trained on this synthetic task can indeed learn to solve Sudokus (our model solves $94.21\%$ of the puzzles fully correctly) when trained on a logical sequence of steps taken by a solver.
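A hedged sketch of one way such a solver trace might be serialized for next-token training (the token layout and helper names below are illustrative assumptions, not the paper's exact format):

```python
def encode_step(row: int, col: int, value: int) -> list[int]:
    """Encode one solver decision as three tokens: row, column, and digit.
    Rows/columns are 0-8 and digits 1-9; the offsets keep the vocabularies disjoint."""
    return [row, 9 + col, 18 + value]  # hypothetical token layout

def encode_trace(steps: list[tuple[int, int, int]]) -> list[int]:
    """Flatten an ordered solver trace into one token sequence for
    next-token-prediction training."""
    tokens = []
    for row, col, value in steps:
        tokens.extend(encode_step(row, col, value))
    return tokens

# Example: two solver steps -> a 6-token training sequence.
print(encode_trace([(0, 2, 5), (4, 4, 9)]))
```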
no code implementations • 18 Oct 2023 • Yuanzhi Li, Raghu Meka, Rina Panigrahy, Kulin Shah
Deep networks typically learn concepts via classifiers, which involves setting up a model and training it via gradient descent to fit the concept-labeled data.
no code implementations • 31 Jan 2023 • Cenk Baykal, Dylan J Cutler, Nishanth Dikkala, Nikhil Ghosh, Rina Panigrahy, Xin Wang
One way of introducing sparsity into deep networks is by attaching an external table of parameters that is sparsely looked up at different layers of the network.
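A minimal sketch of this general idea, assuming a single random-hyperplane hash per lookup (the table size, hash, and function names below are illustrative, not the paper's construction):

```python
import numpy as np

rng = np.random.default_rng(0)

DIM, N_BITS = 64, 10                        # model width, hash bits
N_BUCKETS = 2 ** N_BITS                     # 1024 rows in the external table
table = rng.normal(size=(N_BUCKETS, DIM))   # the external parameter table
planes = rng.normal(size=(N_BITS, DIM))     # random hyperplanes defining the hash

def sparse_table_lookup(h: np.ndarray) -> np.ndarray:
    """Hash the hidden state h to one bucket of the external table and add
    that row back to h; only 1 of the N_BUCKETS rows is read, so the table
    adds parameters without a matching increase in per-token compute."""
    bits = (planes @ h > 0).astype(np.int64)        # sign pattern of h under the hash
    bucket = int(bits @ (2 ** np.arange(N_BITS)))   # interpret the bit pattern as an index
    return h + table[bucket]

h = rng.normal(size=DIM)
print(sparse_table_lookup(h).shape)  # (64,)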
no code implementations • 8 Aug 2022 • Cenk Baykal, Nishanth Dikkala, Rina Panigrahy, Cyrus Rashtchian, Xin Wang
After representing LSH-based sparse networks with our model, we prove that sparse networks can match the approximation power of dense networks on Lipschitz functions.
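As a toy, hedged illustration of why bucket-sparse predictors can approximate Lipschitz functions (a fixed grid hash stands in for LSH here; the paper's construction and bounds are different):

```python
import numpy as np

rng = np.random.default_rng(1)

def lipschitz_target(x):                      # a simple Lipschitz toy target on [0, 1]^2
    return np.sin(x[:, 0]) + np.abs(x[:, 1] - 0.5)

GRID = 32                                     # buckets per dimension (finer grid -> smaller error)

def bucket(x):
    """Hash each point to a grid cell; this stands in for an LSH bucket."""
    cells = np.minimum((x * GRID).astype(int), GRID - 1)
    return cells[:, 0] * GRID + cells[:, 1]

# "Train": store the mean target value per bucket (one active parameter per input).
x_train = rng.uniform(size=(20000, 2))
y_train = lipschitz_target(x_train)
b = bucket(x_train)
values = np.zeros(GRID * GRID)
counts = np.bincount(b, minlength=GRID * GRID)
np.add.at(values, b, y_train)
values /= np.maximum(counts, 1)

# Evaluate: each prediction reads a single bucket, yet the error shrinks as GRID grows,
# since a Lipschitz function varies little within a small cell.
x_test = rng.uniform(size=(5000, 2))
err = np.abs(values[bucket(x_test)] - lipschitz_target(x_test)).mean()
print(f"mean absolute error with {GRID}x{GRID} buckets: {err:.3f}")
```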
no code implementations • 13 Apr 2022 • Shaojin Ding, Weiran Wang, Ding Zhao, Tara N. Sainath, Yanzhang He, Robert David, Rami Botros, Xin Wang, Rina Panigrahy, Qiao Liang, Dongseong Hwang, Ian McGraw, Rohit Prabhavalkar, Trevor Strohman
In this paper, we propose a dynamic cascaded encoder Automatic Speech Recognition (ASR) model, which unifies models for different deployment scenarios.
no code implementations • 29 Sep 2021 • Rina Panigrahy, Brendan Juba, Zihao Deng, Xin Wang, Zee Fryer
We propose a modular architecture for lifelong learning of hierarchically structured tasks.
no code implementations • ICLR 2021 • Atish Agarwala, Abhimanyu Das, Brendan Juba, Rina Panigrahy, Vatsal Sharan, Xin Wang, Qiuyi Zhang
Can deep learning solve multiple tasks simultaneously, even when they are unrelated and very different?
1 code implementation • 11 Mar 2021 • Nishanth Dikkala, Gal Kaplun, Rina Panigrahy
We provide theoretical and empirical evidence that neural representations can be viewed as LSH-like functions that map each input to an embedding that is a function of solely the informative $\gamma$ and invariant to $\theta$, effectively recovering the manifold identifier $\gamma$.
no code implementations • 15 May 2020 • Atish Agarwala, Abhimanyu Das, Rina Panigrahy, Qiuyi Zhang
We present experimental evidence that the many-body gravitational force function is easier to learn with ReLU networks as compared to networks with exponential activations.
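As a toy, hedged sketch of the learning problem only (the body count, sampling ranges, architecture, and training budget below are illustrative assumptions, not the paper's experimental setup), one can fit a small ReLU network to the net gravitational force on a test particle:

```python
import numpy as np
from sklearn.neural_network import MLPRegressor

rng = np.random.default_rng(0)
N_BODIES, N_SAMPLES, G = 4, 5000, 1.0        # small system; units are arbitrary

def net_force(positions, masses):
    """Net gravitational force on a unit-mass test particle at the origin."""
    dist = np.linalg.norm(positions, axis=1, keepdims=True)
    return (G * masses[:, None] * positions / dist**3).sum(axis=0)

# Features: flattened body positions and masses; targets: the 3 force components.
# Coordinates are kept away from zero so the force stays bounded.
X, y = [], []
for _ in range(N_SAMPLES):
    pos = rng.uniform(0.5, 2.0, size=(N_BODIES, 3)) * rng.choice([-1.0, 1.0], size=(N_BODIES, 3))
    mass = rng.uniform(0.5, 2.0, size=N_BODIES)
    X.append(np.concatenate([pos.ravel(), mass]))
    y.append(net_force(pos, mass))
X, y = np.array(X), np.array(y)

# A plain ReLU MLP; swapping activation="relu" for another choice lets one
# compare activations on the same target.
model = MLPRegressor(hidden_layer_sizes=(128, 128), activation="relu",
                     max_iter=300, random_state=0)
model.fit(X[:4500], y[:4500])
print("held-out R^2:", round(model.score(X[4500:], y[4500:]), 3))
```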
no code implementations • 3 Oct 2019 • Rina Panigrahy
How we store information in our minds has long been an intriguing open question.
no code implementations • 29 May 2019 • Badih Ghazi, Rina Panigrahy, Joshua R. Wang
The sketch summarizes essential information about the inputs and outputs of the network and can be used to quickly identify key components and summary statistics of the inputs.
no code implementations • 8 Apr 2019 • Abhimanyu Das, Sreenivas Gollapudi, Ravi Kumar, Rina Panigrahy
In this paper we study the learnability of deep random networks from both theoretical and practical points of view.
no code implementations • ICLR 2019 • Surbhi Goel, Rina Panigrahy
Giving provable guarantees for learning neural networks is a core challenge of machine learning theory.
no code implementations • ICML 2017 • Flavio Chierichetti, Sreenivas Gollapudi, Ravi Kumar, Silvio Lattanzi, Rina Panigrahy, David P. Woodruff
We consider the problem of approximating a given matrix by a low-rank matrix so as to minimize the entrywise $\ell_p$-approximation error, for any $p \geq 1$; the case $p = 2$ is the classical SVD problem.
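As a hedged illustration of the objective only (the paper's algorithms for general $p$ are not reproduced here; the matrix, rank, and outlier pattern below are arbitrary choices), the snippet measures the entrywise $\ell_p$ error of the truncated SVD, which is optimal for $p = 2$ but generally not for other $p$:

```python
import numpy as np

rng = np.random.default_rng(0)

def entrywise_lp_error(A, B, p):
    """The objective studied here: (sum_ij |A_ij - B_ij|^p) ** (1/p)."""
    return (np.abs(A - B) ** p).sum() ** (1.0 / p)

def rank_k_svd(A, k):
    """Truncated SVD: the exact minimizer of the p = 2 (Frobenius) objective."""
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    return (U[:, :k] * s[:k]) @ Vt[:k]

# A low-rank matrix corrupted by a few large outliers, a setting where the
# l_1 objective is more natural than the l_2 one.
A = rng.normal(size=(50, 5)) @ rng.normal(size=(5, 50))
A[rng.integers(0, 50, 20), rng.integers(0, 50, 20)] += 100.0

B = rank_k_svd(A, k=5)
for p in (1, 2):
    print(f"l_{p} error of the rank-5 SVD: {entrywise_lp_error(A, B, p):.1f}")
# For p != 2 the SVD is generally suboptimal; the paper gives algorithms for general p >= 1.
```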
no code implementations • 1 Feb 2017 • Rina Panigrahy, Sushant Sachdeva, Qiuyi Zhang
Iterating this argument, we show that gradient descent can be used to learn the entire network one node at a time.
no code implementations • 13 Nov 2013 • Behnam Neyshabur, Rina Panigrahy
We investigate the problem of factorizing a matrix into several sparse matrices and propose an algorithm for this under randomness and sparsity assumptions.
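As a hedged sketch of the problem setup only (the dimensions, sparsity level, and sign pattern below are arbitrary; the paper's recovery algorithm is not shown), one can generate an instance satisfying the randomness and sparsity assumptions:

```python
import numpy as np

rng = np.random.default_rng(0)

def random_sparse(n, density):
    """A random n x n matrix whose entries are nonzero with probability `density`."""
    mask = rng.random((n, n)) < density
    return mask * rng.choice([-1.0, 1.0], size=(n, n))

n, density = 200, 0.02
A1, A2 = random_sparse(n, density), random_sparse(n, density)
Y = A1 @ A2                      # the observed matrix; the factors are hidden

for name, M in [("A1", A1), ("A2", A2), ("Y = A1 A2", Y)]:
    print(f"{name}: fraction of nonzero entries = {(M != 0).mean():.3f}")
# The factorization task: given only Y, recover sparse A1 and A2 (up to the usual ambiguities).
```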
no code implementations • 7 May 2013 • Alexandr Andoni, Rina Panigrahy
To obtain our main result, we show that the optimal payoff functions have to satisfy the Hermite differential equation, and hence are given by the solutions to this equation.
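For reference (a standard fact, stated here in the probabilists' normalization, which may differ from the one used in the paper), the Hermite differential equation is $y''(x) - x\,y'(x) + \lambda\,y(x) = 0$; for $\lambda = n$ a nonnegative integer, its polynomial solutions are the Hermite polynomials $He_n(x)$.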
no code implementations • 29 Apr 2013 • Rina Panigrahy, Preyas Popat
In this paper, we give a randomized algorithm that, in an amortized sense, achieves a regret of $O(\sqrt{x})$ for any interval when the sequence is partitioned into intervals arbitrarily.
no code implementations • 29 Apr 2013 • Rina Panigrahy, Preyas Popat
In this work we study how "fractal-like" processes arise in a prediction game where an adversary is generating a sequence of bits and an algorithm is trying to predict them.
no code implementations • NeurIPS 2011 • Michael Kapralov, Rina Panigrahy
Moreover, for {\em any window of size $n$} the regret of our algorithm to any expert never exceeds $O(\sqrt{n(\log N+\log T)})$, where $N$ is the number of experts and $T$ is the time horizon, while maintaining the essentially zero loss property.
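For context, here is a hedged sketch of the standard multiplicative-weights (Hedge) baseline over $N$ experts, whose regret against the best fixed expert over $T$ rounds is $O(\sqrt{T \log N})$; the paper's algorithm additionally controls the regret over every window of size $n$, which plain Hedge does not:

```python
import numpy as np

rng = np.random.default_rng(0)

def hedge(losses: np.ndarray, eta: float) -> float:
    """Standard multiplicative-weights learner.

    losses: (T, N) array of per-round expert losses in [0, 1].
    Returns the learner's regret against the best fixed expert.
    """
    T, N = losses.shape
    w = np.ones(N)
    learner_loss = 0.0
    for t in range(T):
        p = w / w.sum()                       # play the weighted mixture of experts
        learner_loss += p @ losses[t]
        w *= np.exp(-eta * losses[t])         # exponentially down-weight poor experts
    return learner_loss - losses.sum(axis=0).min()

T, N = 10000, 16
losses = rng.random((T, N))
eta = np.sqrt(np.log(N) / T)                  # the usual tuning for an O(sqrt(T log N)) bound
print("regret of plain Hedge:", round(hedge(losses, eta), 1))
```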