no code implementations • 8 Mar 2024 • Naman Agarwal, Pranjal Awasthi, Satyen Kale, Eric Zhao
Stacking, a heuristic technique for training deep residual networks by progressively increasing the number of layers and initializing new layers by copying parameters from older layers, has proven quite successful in improving the efficiency of training deep neural networks.
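A minimal sketch of the stacking idea described above, using a toy residual network represented as a list of parameter dictionaries; the layer structure, widths, and growth schedule here are illustrative stand-ins, not the paper's setup:

```python
# Minimal sketch of progressive stacking: grow a residual network by
# appending copies of the already-trained layers, so new layers are
# initialized from the parameters of older ones.
import copy
import numpy as np

def make_layer(width, rng):
    # toy residual block: x -> x + W2 @ relu(W1 @ x)
    return {"W1": 0.01 * rng.normal(size=(width, width)),
            "W2": 0.01 * rng.normal(size=(width, width))}

def forward(layers, x):
    for layer in layers:
        x = x + layer["W2"] @ np.maximum(layer["W1"] @ x, 0.0)
    return x

def stack(layers):
    # double the depth; new layers copy the parameters of existing ones
    return layers + [copy.deepcopy(layer) for layer in layers]

rng = np.random.default_rng(0)
layers = [make_layer(8, rng) for _ in range(2)]
# ... train the shallow 2-layer model here ...
layers = stack(layers)   # now 4 layers, upper half initialized from lower half
print(len(layers), forward(layers, np.ones(8)))
```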
no code implementations • 11 Feb 2024 • Rudrajit Das, Naman Agarwal, Sujay Sanghavi, Inderjit S. Dhillon
Specifically, for a $d$-dimensional quadratic with a diagonal Hessian having condition number $\kappa$, we show that the effective condition number-like quantity controlling the iteration complexity of Adam without momentum is $\mathcal{O}(\min(d, \kappa))$.
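A small illustrative sketch of this setting only: Adam with no momentum (beta_1 = 0) run on a diagonal quadratic, with the condition number kappa and the quantity min(d, kappa) printed for reference. The step sizes and iteration counts below are arbitrary choices, not from the paper:

```python
# Illustrative sketch: Adam without momentum on a d-dimensional quadratic
# f(x) = 0.5 * sum_i h_i * x_i^2 with a diagonal Hessian.
import numpy as np

d = 10
h = np.logspace(0, 3, d)              # diagonal Hessian entries
kappa = h.max() / h.min()             # condition number
print("kappa =", kappa, "  min(d, kappa) =", min(d, kappa))

x, v = np.ones(d), np.zeros(d)
beta2, eps = 0.999, 1e-8
for t in range(1, 5001):
    g = h * x                         # gradient of the quadratic
    v = beta2 * v + (1 - beta2) * g**2
    v_hat = v / (1 - beta2**t)        # bias-corrected second moment
    x = x - (0.1 / np.sqrt(t)) * g / (np.sqrt(v_hat) + eps)
print("final f(x) =", 0.5 * np.sum(h * x**2))
```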
no code implementations • 15 Dec 2023 • Naman Agarwal, Satyen Kale, Karan Singh, Abhradeep Guha Thakurta
We study the task of $(\epsilon, \delta)$-differentially private online convex optimization (OCO).
2 code implementations • 11 Dec 2023 • Naman Agarwal, Daniel Suo, Xinyi Chen, Elad Hazan
This paper studies sequence modeling for prediction tasks with long range dependencies.
no code implementations • 23 Sep 2023 • Ankit Jha, Debabrata Pal, Mainak Singha, Naman Agarwal, Biplab Banerjee
Even though joint training of audio-visual modalities improves classification performance in a low-data regime, it has yet to be thoroughly investigated in the remote sensing (RS) domain.
3 code implementations • 12 Jun 2023 • George E. Dahl, Frank Schneider, Zachary Nado, Naman Agarwal, Chandramouli Shama Sastry, Philipp Hennig, Sourabh Medapati, Runa Eschenhagen, Priya Kasimbeg, Daniel Suo, Juhan Bae, Justin Gilmer, Abel L. Peirson, Bilal Khan, Rohan Anil, Mike Rabbat, Shankar Krishnan, Daniel Snider, Ehsan Amid, Kongtao Chen, Chris J. Maddison, Rakshith Vasudev, Michal Badura, Ankush Garg, Peter Mattson
In order to address these challenges, we introduce a new, competitive, time-to-result benchmark using multiple workloads running on fixed hardware, the AlgoPerf: Training Algorithms benchmark.
no code implementations • 12 Dec 2022 • Naman Agarwal, Brian Bullins, Karan Singh
We study the sample complexity of reducing reinforcement learning to a sequence of empirical risk minimization problems over the policy space.
no code implementations • 21 Nov 2022 • Gautam Goel, Naman Agarwal, Karan Singh, Elad Hazan
We consider the fundamental problem of online control of a linear dynamical system from two different viewpoints: regret minimization and competitive analysis.
no code implementations • 11 Oct 2022 • Naman Agarwal, Prateek Jain, Suhas Kowshik, Dheeraj Nagaraj, Praneeth Netrapalli
In this work, we consider the problem of collaborative multi-user reinforcement learning.
no code implementations • 29 Jul 2022 • Jeremy M. Cohen, Behrooz Ghorbani, Shankar Krishnan, Naman Agarwal, Sourabh Medapati, Michal Badura, Daniel Suo, David Cardoze, Zachary Nado, George E. Dahl, Justin Gilmer
Very little is known about the training dynamics of adaptive gradient methods like Adam in deep learning.
no code implementations • 6 Feb 2022 • Julian Zimmert, Naman Agarwal, Satyen Kale
This algorithm, called SCHRODINGER'S BISONS, is the first efficient algorithm with polylogarithmic regret for this more general problem.
no code implementations • 19 Nov 2021 • Daniel Suo, Cyril Zhang, Paula Gradu, Udaya Ghai, Xinyi Chen, Edgar Minasyan, Naman Agarwal, Karan Singh, Julienne LaChance, Tom Zajdel, Manuel Schottdorf, Daniel Cohen, Elad Hazan
Mechanical ventilation is one of the most widely used therapies in the ICU.
no code implementations • ICLR 2022 • Naman Agarwal, Syomantak Chaudhuri, Prateek Jain, Dheeraj Nagaraj, Praneeth Netrapalli
The starting point of our work is the observation that in practice, Q-learning is used with two important modifications: (i) training simultaneously with two networks, an online network and a target network (online target learning, or OTL), and (ii) experience replay (ER) (Mnih et al., 2015).
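A minimal tabular sketch of those two modifications, with OTL as a periodically synced target table and ER as a uniformly sampled replay buffer; the toy chain environment and hyperparameters are illustrative only, not the paper's experimental setup:

```python
# Tabular sketch of (i) online/target tables (OTL) and (ii) experience replay (ER).
import random
import numpy as np

n_states, n_actions = 5, 2
def step(s, a):
    s_next = min(s + 1, n_states - 1) if a == 1 else max(s - 1, 0)
    return s_next, float(s_next == n_states - 1)   # reward 1 at the right end

q_online = np.zeros((n_states, n_actions))
q_target = q_online.copy()
replay, gamma, lr, sync_every = [], 0.9, 0.1, 50

s = 0
for t in range(5000):
    a = random.randrange(n_actions) if random.random() < 0.2 else int(q_online[s].argmax())
    s_next, r = step(s, a)
    replay.append((s, a, r, s_next))               # ER: store the transition
    s = s_next
    for (si, ai, ri, sni) in random.sample(replay, min(32, len(replay))):
        target = ri + gamma * q_target[sni].max()  # OTL: bootstrap from the target table
        q_online[si, ai] += lr * (target - q_online[si, ai])
    if t % sync_every == 0:
        q_target = q_online.copy()                 # OTL: periodic sync
print(q_online.round(2))
```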
1 code implementation • NeurIPS 2021 • Naman Agarwal, Peter Kairouz, Ziyu Liu
We introduce the multi-dimensional Skellam mechanism, a discrete differential privacy mechanism based on the difference of two independent Poisson random variables.
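A minimal sketch of drawing Skellam noise as the difference of two independent Poisson draws and adding it to an integer-valued aggregate; the noise parameter below is arbitrary, and the paper's privacy calibration is omitted:

```python
# Skellam noise: difference of two independent Poisson(mu) variables,
# added coordinate-wise to an integer-valued query result.
import numpy as np

rng = np.random.default_rng(0)

def skellam_noise(mu, size):
    return rng.poisson(mu, size) - rng.poisson(mu, size)

true_sum = np.array([120, 57, 301])                 # e.g., an integer histogram
noisy_sum = true_sum + skellam_noise(mu=50.0, size=true_sum.shape)
print(noisy_sum)
```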
no code implementations • 6 Oct 2021 • Naman Agarwal, Satyen Kale, Julian Zimmert
Previous work (Foster et al., 2018) has highlighted the importance of improper predictors for achieving "fast rates" in the online multiclass logistic regression problem without suffering exponentially from secondary problem parameters, such as the norm of the predictors in the comparison class.
no code implementations • 29 Sep 2021 • Naman Agarwal, Rohan Anil, Elad Hazan, Tomer Koren, Cyril Zhang
In the empirical science of training large neural networks, the learning rate schedule is a notoriously challenging-to-tune hyperparameter, which can depend on all other properties (architecture, optimizer, batch size, dataset, regularization, ...) of the problem.
no code implementations • 1 Mar 2021 • Naman Agarwal, Surbhi Goel, Cyril Zhang
In practical applications of iterative first-order optimization, the learning rate schedule remains notoriously difficult to understand and expensive to tune.
no code implementations • 26 Feb 2021 • Naman Agarwal, Elad Hazan, Anirudha Majumdar, Karan Singh
We consider the setting of iterative learning control, or model-based policy learning in the presence of uncertain, time-varying dynamics.
1 code implementation • 19 Feb 2021 • Paula Gradu, John Hallman, Daniel Suo, Alex Yu, Naman Agarwal, Udaya Ghai, Karan Singh, Cyril Zhang, Anirudha Majumdar, Elad Hazan
We present an open-source library of natively differentiable physics and robotics environments, accompanied by gradient-based control methods and a benchmarking suite.
2 code implementations • 12 Feb 2021 • Daniel Suo, Naman Agarwal, Wenhan Xia, Xinyi Chen, Udaya Ghai, Alexander Yu, Paula Gradu, Karan Singh, Cyril Zhang, Edgar Minasyan, Julienne LaChance, Tom Zajdel, Manuel Schottdorf, Daniel Cohen, Elad Hazan
We consider the problem of controlling an invasive mechanical ventilator for pressure-controlled ventilation: a controller must let air in and out of a sedated patient's lungs according to a trajectory of airway pressures specified by a clinician.
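A sketch of the control task only: tracking a clinician-style square airway-pressure waveform with a basic PID loop acting on a toy first-order lung model. Neither the PID controller nor this lung model is the paper's method; both are illustrative stand-ins:

```python
# Track a square PIP/PEEP pressure trajectory with a simple PID controller.
import numpy as np

dt, T = 0.03, 200
target = np.where((np.arange(T) % 100) < 50, 35.0, 5.0)   # desired pressure trajectory

p, integ, prev_err = 5.0, 0.0, 0.0
kp, ki, kd = 2.0, 1.0, 0.05
for t in range(T):
    err = target[t] - p
    integ += err * dt
    u = np.clip(kp * err + ki * integ + kd * (err - prev_err) / dt, 0.0, 100.0)
    prev_err = err
    p += dt * (u - 0.6 * p)          # toy lung: inflow raises pressure, leak lowers it
print(round(p, 2), "target:", target[-1])
```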
no code implementations • NeurIPS 2020 • Naman Agarwal, Rohan Anil, Tomer Koren, Kunal Talwar, Cyril Zhang
State-of-the-art optimization is steadily shifting towards massively parallel pipelines with extremely large batch sizes.
1 code implementation • 26 Feb 2020 • Naman Agarwal, Rohan Anil, Elad Hazan, Tomer Koren, Cyril Zhang
We investigate several confounding factors in the evaluation of optimization algorithms for deep learning.
no code implementations • 4 Feb 2020 • Naman Agarwal, Pranjal Awasthi, Satyen Kale
We study the role of depth in training randomly initialized overparameterized neural networks.
no code implementations • ICLR 2020 • Naman Agarwal, Rohan Anil, Elad Hazan, Tomer Koren, Cyril Zhang
A commonplace belief in the machine learning community is that using adaptive gradient methods hurts generalization.
no code implementations • NeurIPS 2019 • Naman Agarwal, Elad Hazan, Karan Singh
We study optimal regret bounds for control in linear dynamical systems under adversarially changing strongly convex cost functions, given the knowledge of transition dynamics.
no code implementations • ICML 2020 • Naman Agarwal, Nataly Brukhim, Elad Hazan, Zhou Lu
We study the question of how to aggregate controllers for dynamical systems in order to improve their performance.
no code implementations • 23 Feb 2019 • Naman Agarwal, Brian Bullins, Elad Hazan, Sham M. Kakade, Karan Singh
We study the control of a linear dynamical system with adversarial disturbances (as opposed to statistical noise).
no code implementations • ICLR 2020 • Xinyi Chen, Naman Agarwal, Elad Hazan, Cyril Zhang, Yi Zhang
State-of-the-art models are now trained with billions of parameters, reaching hardware limits in terms of memory consumption.
no code implementations • 17 Oct 2018 • Naman Agarwal, Alon Gonen, Elad Hazan
We consider online learning in an adversarial, non-convex setting under the assumption that the learner has access to an offline optimization oracle.
no code implementations • ICLR 2019 • Naman Agarwal, Brian Bullins, Xinyi Chen, Elad Hazan, Karan Singh, Cyril Zhang, Yi Zhang
Due to the large number of parameters of machine learning problems, full-matrix preconditioning methods are prohibitively expensive.
no code implementations • NeurIPS 2018 • Naman Agarwal, Ananda Theertha Suresh, Felix Yu, Sanjiv Kumar, H. Brendan McMahan
Distributed stochastic gradient descent is an important subroutine in distributed learning.
no code implementations • 21 May 2018 • Naman Agarwal, Alon Gonen
We derive optimal statistical and computational complexity bounds for exp-concave stochastic minimization in terms of the effective dimension.
no code implementations • 22 Nov 2017 • Naman Agarwal, Sham Kakade, Rahul Kidambi, Yin Tat Lee, Praneeth Netrapalli, Aaron Sidford
Given a matrix $\mathbf{A}\in\mathbb{R}^{n\times d}$ and a vector $b \in\mathbb{R}^{d}$, we show how to compute an $\epsilon$-approximate solution to the regression problem $ \min_{x\in\mathbb{R}^{d}}\frac{1}{2} \|\mathbf{A} x - b\|_{2}^{2} $ in time $ \tilde{O} ((n+\sqrt{d\cdot\kappa_{\text{sum}}})\cdot s\cdot\log\epsilon^{-1}) $ where $\kappa_{\text{sum}}=\mathrm{tr}\left(\mathbf{A}^{\top}\mathbf{A}\right)/\lambda_{\min}(\mathbf{A}^{T}\mathbf{A})$ and $s$ is the maximum number of non-zero entries in a row of $\mathbf{A}$.
no code implementations • 27 Oct 2017 • Naman Agarwal, Elad Hazan
State-of-the-art methods in convex and non-convex optimization employ higher-order derivative information, either implicitly or explicitly.
no code implementations • ICML 2017 • Naman Agarwal, Karan Singh
We design differentially private algorithms for the problem of online linear optimization in the full information and bandit settings with optimal $\tilde{O}(\sqrt{T})$ regret bounds.
1 code implementation • 3 Nov 2016 • Naman Agarwal, Zeyuan Allen-Zhu, Brian Bullins, Elad Hazan, Tengyu Ma
We design a non-convex second-order optimization algorithm that is guaranteed to return an approximate local minimum in time which scales linearly in the underlying dimension and the number of training examples.
4 code implementations • 12 Feb 2016 • Naman Agarwal, Brian Bullins, Elad Hazan
First-order stochastic methods are the state-of-the-art in large-scale machine learning optimization owing to efficient per-iteration complexity.
no code implementations • 8 Jul 2015 • Naman Agarwal, Afonso S. Bandeira, Konstantinos Koiliaris, Alexandra Kolla
We consider the problem of identifying underlying community-like structures in graphs.