no code implementations • NeurIPS 2023 • Jincheng Mei, Bo Dai, Alekh Agarwal, Mohammad Ghavamzadeh, Csaba Szepesvari, Dale Schuurmans
We prove that, for finite-arm bandits with linear function approximation, the global convergence of policy gradient (PG) methods depends on interrelated properties of the policy update and the representation.
no code implementations • 10 Mar 2025 • Dhawal Gupta, Adam Fisch, Christoph Dann, Alekh Agarwal
This work tackles the problem of overoptimization in reinforcement learning from human feedback (RLHF), a prevalent technique for aligning models with human preferences.
no code implementations • 21 Feb 2025 • Lior Belenki, Alekh Agarwal, Tianze Shi, Kristina Toutanova
We propose a method to optimize language model pre-training data mixtures through efficient approximation of the cross-entropy loss corresponding to each candidate mixture via a Mixture of Data Experts (MDE).
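The idea behind the MDE approximation can be illustrated with a toy sketch (an assumption-laden simplification, not the paper's exact estimator): ensemble per-domain experts' held-out token probabilities under a candidate mixture's weights, then score the resulting cross-entropy to rank mixtures cheaply.

```python
import math

def mde_cross_entropy(weights, expert_token_probs):
    """Approximate the cross-entropy of a candidate data mixture by
    ensembling per-domain expert models (a toy sketch of the MDE idea).

    weights[k]            -- candidate mixture weight of domain k (sums to 1)
    expert_token_probs[k] -- expert k's probability for each held-out token
    """
    n_tokens = len(expert_token_probs[0])
    total = 0.0
    for t in range(n_tokens):
        # Ensemble probability of token t under the candidate mixture.
        p = sum(w * probs[t] for w, probs in zip(weights, expert_token_probs))
        total -= math.log(p)
    return total / n_tokens

# Two hypothetical domain experts scored on three held-out tokens.
experts = [[0.5, 0.4, 0.1], [0.1, 0.2, 0.6]]
ce_a = mde_cross_entropy([0.8, 0.2], experts)
ce_b = mde_cross_entropy([0.2, 0.8], experts)
```

Comparing `ce_a` and `ce_b` ranks the two candidate mixtures without training a model per mixture.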
no code implementations • 11 Feb 2025 • Jincheng Mei, Bo Dai, Alekh Agarwal, Sharan Vaswani, Anant Raj, Csaba Szepesvari, Dale Schuurmans
The proofs are based on novel findings about action sampling rates and the relationship between cumulative progress and noise, and extend the current understanding of how simple stochastic gradient methods behave in bandit settings.
no code implementations • 8 Feb 2025 • Alekh Agarwal, Christoph Dann, Teodor V. Marinov
Offline algorithms for Reinforcement Learning from Human Preferences (RLHF), which use only a fixed dataset of sampled responses given an input, and preference feedback among these responses, have gained increasing prominence in the literature on aligning language models.
no code implementations • 4 Feb 2025 • Chenlu Ye, Yujia Jin, Alekh Agarwal, Tong Zhang
When the variance of the reward at each round is known, we use a variance-weighted regression approach and establish a regret bound that depends only on the cumulative reward variance and logarithmically on the reward range $R$ as well as the number of rounds $T$.
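A minimal one-dimensional sketch of the variance-weighted idea (the actual method operates over vector features with regularization, so treat this purely as illustration): scale each observation's squared error by its inverse variance, so low-noise rounds dominate the fit.

```python
def variance_weighted_fit(xs, ys, variances):
    """Inverse-variance-weighted least squares for a scalar model y ~ w * x.

    Each round's squared error is scaled by 1/variance, so rounds with low
    known reward variance pull the estimate hardest.
    """
    num = sum(x * y / v for x, y, v in zip(xs, ys, variances))
    den = sum(x * x / v for x, v in zip(xs, variances))
    return num / den

# A noisy outlier with large known variance barely moves the estimate.
w = variance_weighted_fit([1.0, 1.0], [2.0, 10.0], [1.0, 100.0])
```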
no code implementations • 18 Nov 2024 • Navodita Sharma, Vishnu Vinod, Abhradeep Thakurta, Alekh Agarwal, Borja Balle, Christoph Dann, Aravindan Raghuveer
The offline reinforcement learning (RL) problem aims to learn an optimal policy from historical data collected by one or more behavioural policies (experts) interacting with an environment.
no code implementations • 10 Oct 2024 • Amrith Setlur, Chirag Nagpal, Adam Fisch, Xinyang Geng, Jacob Eisenstein, Rishabh Agarwal, Alekh Agarwal, Jonathan Berant, Aviral Kumar
Our key insight is that, to be effective, the process reward for a step should measure progress: a change in the likelihood of producing a correct response in the future, before and after taking the step, corresponding to the notion of step-level advantages in RL.
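That notion of progress can be made concrete with a tiny sketch. The success-probability estimates would in practice come from a learned value model or rollouts; here they are just given numbers.

```python
def progress_rewards(success_probs):
    """Process reward for step i = change in the estimated probability of
    eventually producing a correct answer, before vs. after the step.

    success_probs[0] is the estimate for the empty prefix; success_probs[i]
    is the estimate after step i.  A positive reward means the step helped.
    """
    return [after - before
            for before, after in zip(success_probs, success_probs[1:])]

# Step 2 hurt the solution attempt; steps 1 and 3 made progress.
rewards = progress_rewards([0.2, 0.5, 0.4, 0.9])
```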
no code implementations • 22 Jul 2024 • Kaiwen Wang, Rahul Kidambi, Ryan Sullivan, Alekh Agarwal, Christoph Dann, Andrea Michi, Marco Gelmi, Yunxuan Li, Raghav Gupta, Avinava Dubey, Alexandre Ramé, Johan Ferret, Geoffrey Cideron, Le Hou, Hongkun Yu, Amr Ahmed, Aranyak Mehta, Léonard Hussenot, Olivier Bachem, Edouard Leurent
Reward-based finetuning is crucial for aligning language policies with intended behaviors (e.g., creativity and safety).
no code implementations • 29 May 2024 • Adam Fisch, Jacob Eisenstein, Vicky Zayats, Alekh Agarwal, Ahmad Beirami, Chirag Nagpal, Pete Shaw, Jonathan Berant
Moreover, to account for uncertainty in the reward model we are distilling from, we optimize against a family of reward models that, as a whole, is likely to include at least one reasonable proxy for the preference distribution.
no code implementations • 28 Mar 2024 • Teodor V. Marinov, Alekh Agarwal, Mircea Trofin
This work studies a Reinforcement Learning (RL) problem in which we are given a set of trajectories collected with $K$ baseline policies.
no code implementations • 27 Feb 2024 • Jincheng Mei, Zixin Zhong, Bo Dai, Alekh Agarwal, Csaba Szepesvari, Dale Schuurmans
We show that the \emph{stochastic gradient} bandit algorithm converges to a \emph{globally optimal} policy at an $O(1/t)$ rate, even with a \emph{constant} step size.
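A runnable sketch of this setting, with made-up arm means and Gaussian reward noise: REINFORCE-style stochastic gradient ascent on a softmax policy with a constant step size.

```python
import math
import random

def softmax(logits):
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    s = sum(exps)
    return [e / s for e in exps]

def stochastic_gradient_bandit(means, steps=10000, eta=0.1, seed=0):
    """Softmax-parameterized bandit updated by stochastic policy gradients
    with a constant step size (a sketch of the analyzed setting; the arm
    means and noise scale are assumptions for illustration)."""
    rng = random.Random(seed)
    theta = [0.0] * len(means)
    for _ in range(steps):
        probs = softmax(theta)
        a = rng.choices(range(len(means)), weights=probs)[0]
        r = means[a] + rng.gauss(0.0, 0.1)  # noisy reward for the pulled arm
        for i in range(len(theta)):
            # REINFORCE gradient: r * d/d theta_i log pi(a) = r * (1[i=a] - p_i)
            grad = r * ((1.0 if i == a else 0.0) - probs[i])
            theta[i] += eta * grad
    return softmax(theta)

probs = stochastic_gradient_bandit([0.2, 0.9, 0.5])
```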
no code implementations • 11 Feb 2024 • Kaiwen Wang, Owen Oertell, Alekh Agarwal, Nathan Kallus, Wen Sun
Second-order bounds are instance-dependent bounds that scale with the variance of return, which we prove are tighter than the previously known small-loss bounds of distributional RL.
no code implementations • 8 Jan 2024 • Gokul Swamy, Christoph Dann, Rahul Kidambi, Zhiwei Steven Wu, Alekh Agarwal
Our approach is maximalist in that it provably handles non-Markovian, intransitive, and stochastic preferences while being robust to the compounding errors that plague offline approaches to sequential prediction.
no code implementations • 3 Jan 2024 • Ahmad Beirami, Alekh Agarwal, Jonathan Berant, Alexander D'Amour, Jacob Eisenstein, Chirag Nagpal, Ananda Theertha Suresh
A simple and effective method for the inference-time alignment of generative models is the best-of-$n$ policy, where $n$ samples are drawn from a reference policy, ranked based on a reward function, and the highest ranking one is selected.
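The best-of-$n$ policy itself is a few lines of code; the toy sampler and reward below are stand-ins for a reference language model policy and a learned reward model.

```python
import random

def best_of_n(sample_fn, reward_fn, n):
    """Draw n candidate responses from the reference policy and return the
    one the reward function ranks highest."""
    candidates = [sample_fn() for _ in range(n)]
    return max(candidates, key=reward_fn)

rng = random.Random(0)
# Stand-in policy: short random strings; stand-in reward: prefer longer ones.
sample = lambda: "".join(rng.choice("ab") for _ in range(rng.randint(1, 8)))
best = best_of_n(sample, len, n=16)
```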
1 code implementation • 14 Dec 2023 • Jacob Eisenstein, Chirag Nagpal, Alekh Agarwal, Ahmad Beirami, Alex D'Amour, DJ Dvijotham, Adam Fisch, Katherine Heller, Stephen Pfohl, Deepak Ramachandran, Peter Shaw, Jonathan Berant
However, even pretrain reward ensembles do not eliminate reward hacking: we show several qualitative reward hacking phenomena that are not mitigated by ensembling because all reward models in the ensemble exhibit similar error patterns.
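A sketch of reward-ensemble aggregation (the scores below are hypothetical): even the conservative minimum cannot catch an error mode that every ensemble member shares.

```python
def aggregate_reward(scores, mode="min"):
    """Combine an ensemble of reward models' scores for one response.

    'min' is conservative: a response must please every ensemble member.
    """
    if mode == "min":
        return min(scores)
    if mode == "mean":
        return sum(scores) / len(scores)
    raise ValueError(f"unknown mode: {mode}")

# If every member over-rewards the same artifact (say, excessive length),
# the minimum is still inflated -- ensembling misses the shared error.
shared_error_scores = [0.9, 0.92, 0.95]
conservative = aggregate_reward(shared_error_scores, mode="min")
```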
no code implementations • 16 Nov 2023 • Wang Zhu, Alekh Agarwal, Mandar Joshi, Robin Jia, Jesse Thomason, Kristina Toutanova
Pre-processing tools, such as optical character recognition (OCR), can map document image inputs to textual tokens, then large language models (LLMs) can reason over text.
no code implementations • 26 May 2023 • Jacob Abernethy, Alekh Agarwal, Teodor V. Marinov, Manfred K. Warmuth
We study the phenomenon of \textit{in-context learning} (ICL) exhibited by large language models, where they can adapt to a new learning task, given a handful of labeled examples, without any explicit parameter optimization.
1 code implementation • 17 Mar 2023 • Alekh Agarwal, H. Brendan McMahan, Zheng Xu
As the adoption of federated learning increases for learning from sensitive data local to user devices, it is natural to ask if the learning can be done using implicit signals generated as users interact with the applications of interest, rather than requiring access to explicit labels which can be difficult to acquire in many tasks.
no code implementations • 7 Feb 2023 • Alekh Agarwal, Claudio Gentile, Teodor V. Marinov
We study contextual bandit (CB) problems, where the user can sometimes respond with the best action in a given context.
no code implementations • 31 Jan 2023 • Jonathan N. Lee, Alekh Agarwal, Christoph Dann, Tong Zhang
POMDPs capture a broad class of decision making problems, but hardness results suggest that learning is intractable even in simple settings due to the inherent partial observability.
no code implementations • 12 Dec 2022 • Alekh Agarwal, Yujia Jin, Tong Zhang
We study time-inhomogeneous episodic reinforcement learning (RL) under general function approximation and sparse rewards.
no code implementations • 21 Jun 2022 • Jinglin Chen, Aditya Modi, Akshay Krishnamurthy, Nan Jiang, Alekh Agarwal
We study reward-free reinforcement learning (RL) under general non-linear function approximation, and establish sample efficiency and hardness results under various standard structural assumptions.
no code implementations • 15 Jun 2022 • Alekh Agarwal, Tong Zhang
We propose a general framework to design posterior sampling methods for model-based RL.
1 code implementation • 29 May 2022 • Alekh Agarwal, Yuda Song, Wen Sun, Kaiwen Wang, Mengdi Wang, Xuezhou Zhang
We study the problem of representational transfer in RL, where an agent first pretrains in a number of source tasks to discover a shared representation, which is subsequently used to learn a good policy in a \emph{target task}.
no code implementations • 15 Mar 2022 • Alekh Agarwal, Tong Zhang
Provably sample-efficient Reinforcement Learning (RL) with rich observations and function approximation has witnessed tremendous recent progress, particularly when the underlying function approximators are linear.
no code implementations • 11 Feb 2022 • Alekh Agarwal, Tong Zhang
We instead propose an alternative method called Minimax Regret Optimization (MRO), and show that under suitable conditions this method achieves uniformly low regret across all test distributions.
3 code implementations • 5 Feb 2022 • Ching-An Cheng, Tengyang Xie, Nan Jiang, Alekh Agarwal
We propose Adversarially Trained Actor Critic (ATAC), a new model-free algorithm for offline reinforcement learning (RL) under insufficient data coverage, based on the concept of relative pessimism.
1 code implementation • 31 Jan 2022 • Xuezhou Zhang, Yuda Song, Masatoshi Uehara, Mengdi Wang, Alekh Agarwal, Wen Sun
We present BRIEE (Block-structured Representation learning with Interleaved Explore Exploit), an algorithm for efficient reinforcement learning in Markov Decision Processes with block-structured dynamics (i.e., Block MDPs), where rich observations are generated from a set of unknown latent states.
no code implementations • ICLR 2022 • Yonathan Efroni, Dipendra Misra, Akshay Krishnamurthy, Alekh Agarwal, John Langford
We initiate the formal study of latent state discovery in the presence of such exogenous noise sources by proposing a new model, the Exogenous Block MDP (EX-BMDP), for rich observation RL.
no code implementations • NeurIPS 2021 • Tengyang Xie, Ching-An Cheng, Nan Jiang, Paul Mineiro, Alekh Agarwal
The use of pessimism when reasoning about datasets lacking exhaustive exploration has recently gained prominence in offline reinforcement learning.
no code implementations • 24 Mar 2021 • Andrea Zanette, Ching-An Cheng, Alekh Agarwal
Policy optimization methods are popular reinforcement learning algorithms, because their incremental and on-policy nature makes them more stable than their value-based counterparts.
1 code implementation • 22 Mar 2021 • Fei Feng, Wotao Yin, Alekh Agarwal, Lin F. Yang
Policy optimization methods remain a powerful workhorse in empirical Reinforcement Learning (RL), with a focus on neural policies that can easily reason over complex and continuous state and/or action spaces.
no code implementations • 19 Mar 2021 • Juan C. Perdomo, Max Simchowitz, Alekh Agarwal, Peter Bartlett
We study the problem of adaptive control of the linear quadratic regulator for systems in very high, or even infinite dimension.
no code implementations • 14 Feb 2021 • Aditya Modi, Jinglin Chen, Akshay Krishnamurthy, Nan Jiang, Alekh Agarwal
In this work, we present the first model-free representation learning algorithms for low rank MDPs.
1 code implementation • NeurIPS 2020 • Yao Liu, Adith Swaminathan, Alekh Agarwal, Emma Brunskill
Doing batch RL in a way that yields a reliable new policy in large domains is challenging: a new decision policy may visit states and actions outside the support of the batch data, and function approximation and optimization with limited samples can further increase the potential of learning policies with overly optimistic estimates of their future performance.
1 code implementation • NeurIPS 2020 • Alekh Agarwal, Mikael Henaff, Sham Kakade, Wen Sun
Direct policy gradient methods for reinforcement learning are a successful approach for a variety of reasons: they are model free, they directly optimize the performance metric of interest, and they allow for richly parameterized policies.
no code implementations • NeurIPS 2020 • Ching-An Cheng, Andrey Kolobov, Alekh Agarwal
In this paper, we propose the state-wise maximum of the oracle policies' values as a natural baseline to resolve conflicting advice from multiple oracles.
1 code implementation • NeurIPS 2020 • Matteo Turchetta, Andrey Kolobov, Shital Shah, Andreas Krause, Alekh Agarwal
In safety-critical applications, autonomous agents may need to learn in an environment where mistakes can be very costly.
no code implementations • 19 Jun 2020 • Ziming Li, Julia Kiseleva, Alekh Agarwal, Maarten de Rijke, Ryen W. White
Effective optimization is essential for real-world interactive systems to provide a satisfactory user experience in response to changing user behavior.
no code implementations • 18 Jun 2020 • Dilip Arumugam, Debadeepta Dey, Alekh Agarwal, Asli Celikyilmaz, Elnaz Nouri, Bill Dolan
While state-of-the-art results for adversarial imitation-learning algorithms are encouraging, recent works exploring the imitation learning from observation (ILO) setting, where trajectories \textit{only} contain expert observations, have not met with the same success.
no code implementations • NeurIPS 2020 • Alekh Agarwal, Sham Kakade, Akshay Krishnamurthy, Wen Sun
In order to deal with the curse of dimensionality in reinforcement learning (RL), it is common practice to make parametric assumptions where values or policies are functions of some low dimensional feature space.
no code implementations • 28 Mar 2020 • Alekh Agarwal, John Langford, Chen-Yu Wei
We study a new form of federated learning where the clients train personalized local models and make predictions jointly with the server-side shared model.
no code implementations • 4 Mar 2020 • Chen-Yu Wei, Haipeng Luo, Alekh Agarwal
We initiate the study of learning in contextual bandits with the help of loss predictors.
no code implementations • 1 Aug 2019 • Alekh Agarwal, Sham M. Kakade, Jason D. Lee, Gaurav Mahajan
Policy gradient methods are among the most effective methods in challenging reinforcement learning problems with large state and/or action spaces.
2 code implementations • NeurIPS 2019 • Aditya Grover, Jiaming Song, Alekh Agarwal, Kenneth Tran, Ashish Kapoor, Eric Horvitz, Stefano Ermon
A standard technique to correct this bias is importance sampling, where samples from the model are weighted by the likelihood ratio under model and true distributions.
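The likelihood-ratio correction is easy to state in code; the densities here are toy discrete pmfs, an assumption for the sketch.

```python
def importance_weighted_mean(samples, model_pmf, true_pmf, f):
    """Estimate E_true[f(x)] from samples drawn under the model by weighting
    each sample with the likelihood ratio true_pmf(x) / model_pmf(x)."""
    total = sum(true_pmf(x) / model_pmf(x) * f(x) for x in samples)
    return total / len(samples)

# The model samples coin flips uniformly, but the true distribution
# has P(1) = 0.8; the weights correct the mismatch.
model_pmf = lambda x: 0.5
true_pmf = lambda x: 0.8 if x == 1 else 0.2
est = importance_weighted_mean([0, 1], model_pmf, true_pmf, float)
# With one sample of each outcome the estimate is exactly E_true[x] = 0.8.
```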
no code implementations • 10 Jun 2019 • Alekh Agarwal, Sham Kakade, Lin F. Yang
In this work, we study the effectiveness of the most natural plug-in approach to model-based planning: we build the maximum likelihood estimate of the transition model in the MDP from observations and then find an optimal policy in this empirical MDP.
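The plug-in (certainty-equivalence) approach can be sketched in a few lines: count-based estimation of the transition model followed by value iteration in the estimated MDP. The known reward table is an assumption made to keep the sketch small.

```python
from collections import defaultdict

def plugin_plan(transitions, rewards, n_states, n_actions, gamma=0.9, iters=200):
    """Certainty-equivalence planning: estimate P(s'|s,a) by empirical
    frequencies over logged (s, a, s') transitions, then run value iteration
    in the estimated MDP.  rewards[s][a] is assumed known for this sketch."""
    counts = defaultdict(lambda: defaultdict(int))
    for s, a, s2 in transitions:
        counts[(s, a)][s2] += 1

    def p_hat(s, a):
        total = sum(counts[(s, a)].values())
        return {s2: c / total for s2, c in counts[(s, a)].items()}

    V = [0.0] * n_states
    for _ in range(iters):
        V = [max(rewards[s][a]
                 + gamma * sum(p * V[s2] for s2, p in p_hat(s, a).items())
                 for a in range(n_actions))
             for s in range(n_states)]
    return V

# Two states, two actions: action 1 in state 0 reaches the absorbing
# rewarding state 1, so V = [gamma/(1-gamma), 1/(1-gamma)] = [9, 10].
data = [(0, 0, 0), (0, 1, 1), (1, 0, 1), (1, 1, 1)]
V = plugin_plan(data, rewards=[[0.0, 0.0], [1.0, 1.0]], n_states=2, n_actions=2)
```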
6 code implementations • ICLR 2020 • Jordan T. Ash, Chicheng Zhang, Akshay Krishnamurthy, John Langford, Alekh Agarwal
We design a new algorithm for batch active learning with deep neural network models.
4 code implementations • 30 May 2019 • Alekh Agarwal, Miroslav Dudík, Zhiwei Steven Wu
Our schemes only require access to standard risk minimization algorithms (such as standard classification or least-squares regression) while providing theoretical guarantees on the optimality and fairness of the obtained solutions.
no code implementations • 12 May 2019 • Aditya Modi, Debadeepta Dey, Alekh Agarwal, Adith Swaminathan, Besmira Nushi, Sean Andrist, Eric Horvitz
We address the opportunity to maximize the utility of an overall computing system by employing reinforcement learning to guide the configuration of the set of interacting modules that comprise the system.
no code implementations • 17 Apr 2019 • Yao Liu, Adith Swaminathan, Alekh Agarwal, Emma Brunskill
We study the problem of off-policy policy optimization in Markov decision processes, and develop a novel off-policy policy gradient method.
no code implementations • ICLR Workshop DeepGenStruct 2019 • Aditya Grover, Jiaming Song, Ashish Kapoor, Kenneth Tran, Alekh Agarwal, Eric Horvitz, Stefano Ermon
A standard technique to correct this bias is to importance-weight samples from the model by the likelihood ratio under the model and true distributions.
1 code implementation • 25 Jan 2019 • Simon S. Du, Akshay Krishnamurthy, Nan Jiang, Alekh Agarwal, Miroslav Dudík, John Langford
We study the exploration problem in episodic MDPs with rich observations generated from a small number of latent states.
1 code implementation • 2 Jan 2019 • Chicheng Zhang, Alekh Agarwal, Hal Daumé III, John Langford, Sahand N. Negahban
We investigate the feasibility of learning from a mix of both fully-labeled supervised data and contextual bandit data.
no code implementations • 21 Nov 2018 • Wen Sun, Nan Jiang, Akshay Krishnamurthy, Alekh Agarwal, John Langford
We study the sample complexity of model-based reinforcement learning (henceforth RL) in general contextual decision processes that require strategic exploration to find a near-optimal policy.
3 code implementations • ICML 2018 • Alekh Agarwal, Alina Beygelzimer, Miroslav Dudík, John Langford, Hanna Wallach
We present a systematic approach for achieving fairness in a binary classification setting.
no code implementations • ICML 2018 • Dylan J. Foster, Alekh Agarwal, Miroslav Dudík, Haipeng Luo, Robert E. Schapire
A major challenge in contextual bandits is to design general-purpose algorithms that are both practically useful and theoretically well-founded.
no code implementations • NeurIPS 2018 • Christoph Dann, Nan Jiang, Akshay Krishnamurthy, Alekh Agarwal, John Langford, Robert E. Schapire
We study the computational tractability of PAC reinforcement learning with rich observations.
no code implementations • ICML 2018 • Hoang M. Le, Nan Jiang, Alekh Agarwal, Miroslav Dudík, Yisong Yue, Hal Daumé III
We study how to effectively leverage expert feedback to learn sequential decision-making policies.
no code implementations • 17 Feb 2018 • Ziming Li, Julia Kiseleva, Alekh Agarwal, Maarten de Rijke
Effective optimization is essential for interactive systems to provide a satisfactory user experience.
1 code implementation • 12 Feb 2018 • Alberto Bietti, Alekh Agarwal, John Langford
Contextual bandit algorithms are essential for solving many real-world interactive machine learning problems.
no code implementations • 5 Aug 2017 • Haipeng Luo, Chen-Yu Wei, Alekh Agarwal, John Langford
In this work, we develop several efficient contextual bandit algorithms for non-stationary environments by equipping existing methods for i.i.d.
no code implementations • ICML 2017 • Akshay Krishnamurthy, Alekh Agarwal, Tzu-Kuo Huang, Hal Daume III, John Langford
We design an active learning algorithm for cost-sensitive multiclass classification: problems where different errors have different costs.
1 code implementation • 19 Dec 2016 • Alekh Agarwal, Haipeng Luo, Behnam Neyshabur, Robert E. Schapire
We study the problem of combining multiple bandit algorithms (that is, online learning algorithms with partial feedback) with the goal of creating a master algorithm that performs almost as well as the best base algorithm if it were to be run on its own.
2 code implementations • ICML 2017 • Yu-Xiang Wang, Alekh Agarwal, Miroslav Dudik
We study the off-policy evaluation problem---estimating the value of a target policy using data collected by another policy---under the contextual bandit model.
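The classical inverse-propensity-scoring baseline for this problem (not the paper's improved estimator) fits in a few lines; the logged data below is a toy stand-in.

```python
def ips_value(logs, target_prob):
    """Inverse propensity scoring estimate of a target policy's value from
    logged bandit data.  Each log entry is (context, action, reward,
    logging_prob), where logging_prob is the logging policy's probability
    of the action it actually played."""
    return sum(r * target_prob(x, a) / p for x, a, r, p in logs) / len(logs)

# Uniform logging over two actions; the target policy always plays action 0.
logs = [("ctx", 0, 1.0, 0.5), ("ctx", 1, 0.0, 0.5)]
target = lambda x, a: 1.0 if a == 0 else 0.0
v = ips_value(logs, target)  # unbiased estimate of the target's value
```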
no code implementations • ICML 2017 • Nan Jiang, Akshay Krishnamurthy, Alekh Agarwal, John Langford, Robert E. Schapire
Our first contribution is a complexity measure, the Bellman rank, that we show enables tractable learning of near-optimal behavior in these processes and is naturally small for many well-studied reinforcement learning settings.
no code implementations • 13 Jun 2016 • Alekh Agarwal, Sarah Bird, Markus Cozowicz, Luong Hoang, John Langford, Stephen Lee, Jiaji Li, Dan Melamed, Gal Oshri, Oswaldo Ribas, Siddhartha Sen, Alex Slivkins
The Decision Service enables all aspects of contextual bandit learning using four system abstractions which connect together in a loop: explore (the decision space), log, learn, and deploy.
1 code implementation • NeurIPS 2017 • Adith Swaminathan, Akshay Krishnamurthy, Alekh Agarwal, Miroslav Dudík, John Langford, Damien Jose, Imed Zitouni
This paper studies the evaluation of policies that recommend an ordered set of items (e.g., a ranking) based on some context---a common scenario in web search, ads, and recommendation.
1 code implementation • 14 Mar 2016 • David Abel, Alekh Agarwal, Fernando Diaz, Akshay Krishnamurthy, Robert E. Schapire
We address both of these challenges with two complementary techniques: First, we develop a gradient-boosting style, non-parametric function approximator for learning on $Q$-function residuals.
no code implementations • NeurIPS 2016 • Akshay Krishnamurthy, Alekh Agarwal, John Langford
We prove that the algorithm learns near optimal behavior after a number of episodes that is polynomial in all relevant parameters, logarithmic in the number of policies, and independent of the size of the observation space.
no code implementations • NeurIPS 2016 • Haipeng Luo, Alekh Agarwal, Nicolo Cesa-Bianchi, John Langford
We propose Sketched Online Newton (SON), an online second order learning algorithm that enjoys substantially improved regret guarantees for ill-conditioned data.
no code implementations • NeurIPS 2015 • Vasilis Syrgkanis, Alekh Agarwal, Haipeng Luo, Robert E. Schapire
We show that natural classes of regularized learning algorithms with a form of recency bias achieve faster convergence rates to approximate efficiency and to coarse correlated equilibria in multiplayer normal form games.
no code implementations • NeurIPS 2015 • Tzu-Kuo Huang, Alekh Agarwal, Daniel J. Hsu, John Langford, Robert E. Schapire
We develop a new active learning algorithm for the streaming setting satisfying three important properties: 1) It provably works for any classifier representation and classification problem including those with severe noise.
1 code implementation • NeurIPS 2016 • Akshay Krishnamurthy, Alekh Agarwal, Miroslav Dudik
We study an online decision making problem where on each round a learner chooses a list of items based on some side information, receives a scalar feedback value for each individual item, and a reward that is linearly related to this feedback.
no code implementations • 8 Feb 2015 • Kai-Wei Chang, Akshay Krishnamurthy, Alekh Agarwal, Hal Daumé III, John Langford
Methods for learning to search for structured prediction typically imitate a reference policy, with existing theoretical guarantees demonstrating low regret compared to that reference.
no code implementations • NeurIPS 2014 • Alekh Agarwal, Alina Beygelzimer, Daniel J. Hsu, John Langford, Matus J. Telgarsky
Can we effectively learn a nonlinear representation in time comparable to linear learning?
no code implementations • 2 Oct 2014 • Alekh Agarwal, Leon Bottou
This paper presents a lower bound for optimizing a finite sum of $n$ functions, where each function is $L$-smooth and the sum is $\mu$-strongly convex.
1 code implementation • 4 Feb 2014 • Alekh Agarwal, Daniel Hsu, Satyen Kale, John Langford, Lihong Li, Robert E. Schapire
We present a new algorithm for the contextual bandit learning problem, where the learner repeatedly takes one of $K$ actions in response to the observed context, and observes the reward only for that chosen action.
no code implementations • 30 Oct 2013 • Alekh Agarwal, Animashree Anandkumar, Prateek Jain, Praneeth Netrapalli
Alternating minimization is a popular heuristic for sparse coding, where the dictionary and the coefficients are estimated in alternate steps, keeping the other fixed.
no code implementations • 30 Oct 2013 • Alekh Agarwal, Leon Bottou, Miroslav Dudik, John Langford
We leverage the same observation to build a generic strategy for parallelizing learning algorithms.
no code implementations • 7 Oct 2013 • Alekh Agarwal, Sham M. Kakade, Nikos Karampatziakis, Le Song, Gregory Valiant
This work provides simple algorithms for multi-class (and multi-label) prediction in settings where both the number of examples n and the data dimension d are relatively large.
no code implementations • 8 Sep 2013 • Alekh Agarwal, Animashree Anandkumar, Praneeth Netrapalli
We consider the problem of learning overcomplete dictionaries in the context of sparse coding, where each sample selects a sparse subset of dictionary elements.
no code implementations • NeurIPS 2012 • Alekh Agarwal, Sahand Negahban, Martin J. Wainwright
We develop and analyze stochastic optimization algorithms for problems in which the expected loss is strongly convex, and the optimum is (approximately) sparse.
no code implementations • NeurIPS 2011 • Alekh Agarwal, John C. Duchi
We analyze the convergence of gradient-based optimization algorithms whose updates depend on delayed stochastic gradient information.
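A minimal sketch of the update model analyzed: each step applies a gradient computed a fixed number of iterations earlier, as happens with asynchronous distributed workers. The scalar objective and fixed delay are simplifications; the paper treats stochastic gradients and general convex problems.

```python
from collections import deque

def delayed_gradient_descent(grad, x0, eta, delay, steps):
    """Gradient descent where each update uses a gradient computed `delay`
    iterations earlier (delay=0 recovers ordinary gradient descent)."""
    x, pending = x0, deque()
    for _ in range(steps):
        pending.append(grad(x))           # a worker starts computing here ...
        if len(pending) > delay:
            x -= eta * pending.popleft()  # ... and its stale result lands late
    return x

# Minimizing (x - 3)^2 with gradients delayed by two steps still converges
# for a small enough step size.
x_star = delayed_gradient_descent(lambda x: 2 * (x - 3), 0.0, 0.1, 2, 300)
```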
2 code implementations • 19 Oct 2011 • Alekh Agarwal, Olivier Chapelle, Miroslav Dudik, John Langford
We present a system and a set of techniques for learning linear predictors with convex losses on terascale datasets, with trillions of features (the number of features here refers to the number of non-zero entries in the data matrix).
1 code implementation • NeurIPS 2011 • Alekh Agarwal, Dean P. Foster, Daniel Hsu, Sham M. Kakade, Alexander Rakhlin
This paper addresses the problem of minimizing a convex, Lipschitz function $f$ over a convex, compact set $\mathcal{X}$ under a stochastic bandit feedback model.
no code implementations • NeurIPS 2010 • Alekh Agarwal, Sahand Negahban, Martin J. Wainwright
Many statistical $M$-estimators are based on convex optimization problems formed by the weighted sum of a loss function with a norm-based regularizer.
no code implementations • NeurIPS 2010 • Alekh Agarwal, Martin J. Wainwright, John C. Duchi
The goal of decentralized optimization over a network is to optimize a global objective formed by a sum of local (possibly nonsmooth) convex functions using only local computation and communication.
no code implementations • NeurIPS 2009 • Alekh Agarwal, Martin J. Wainwright, Peter L. Bartlett, Pradeep K. Ravikumar
The extensive use of convex optimization in machine learning and statistics makes such an understanding critical for characterizing the fundamental computational limits of learning and estimation.