no code implementations • 16 Nov 2023 • Wang Zhu, Alekh Agarwal, Mandar Joshi, Robin Jia, Jesse Thomason, Kristina Toutanova
Understanding visually situated language requires recognizing text and visual elements, and interpreting complex layouts.
no code implementations • 26 May 2023 • Jacob Abernethy, Alekh Agarwal, Teodor V. Marinov, Manfred K. Warmuth
We study the phenomenon of in-context learning (ICL) exhibited by large language models, where they can adapt to a new learning task, given a handful of labeled examples, without any explicit parameter optimization.
1 code implementation • 17 Mar 2023 • Alekh Agarwal, H. Brendan McMahan, Zheng Xu
As the adoption of federated learning increases for learning from sensitive data local to user devices, it is natural to ask if the learning can be done using implicit signals generated as users interact with the applications of interest, rather than requiring access to explicit labels which can be difficult to acquire in many tasks.
no code implementations • 7 Feb 2023 • Alekh Agarwal, Claudio Gentile, Teodor V. Marinov
We study contextual bandit (CB) problems, where the user can sometimes respond with the best action in a given context.
no code implementations • 31 Jan 2023 • Jonathan N. Lee, Alekh Agarwal, Christoph Dann, Tong Zhang
POMDPs capture a broad class of decision making problems, but hardness results suggest that learning is intractable even in simple settings due to the inherent partial observability.
no code implementations • 12 Dec 2022 • Alekh Agarwal, Yujia Jin, Tong Zhang
We study time-inhomogeneous episodic reinforcement learning (RL) under general function approximation and sparse rewards.
no code implementations • 21 Jun 2022 • Jinglin Chen, Aditya Modi, Akshay Krishnamurthy, Nan Jiang, Alekh Agarwal
We study reward-free reinforcement learning (RL) under general non-linear function approximation, and establish sample efficiency and hardness results under various standard structural assumptions.
no code implementations • 15 Jun 2022 • Alekh Agarwal, Tong Zhang
We propose a general framework to design posterior sampling methods for model-based RL.
1 code implementation • 29 May 2022 • Alekh Agarwal, Yuda Song, Wen Sun, Kaiwen Wang, Mengdi Wang, Xuezhou Zhang
We study the problem of representational transfer in RL, where an agent first pretrains in a number of source tasks to discover a shared representation, which is subsequently used to learn a good policy in a target task.
no code implementations • 15 Mar 2022 • Alekh Agarwal, Tong Zhang
Provably sample-efficient Reinforcement Learning (RL) with rich observations and function approximation has witnessed tremendous recent progress, particularly when the underlying function approximators are linear.
no code implementations • 11 Feb 2022 • Alekh Agarwal, Tong Zhang
We instead propose an alternative method called Minimax Regret Optimization (MRO), and show that under suitable conditions this method achieves uniformly low regret across all test distributions.
3 code implementations • 5 Feb 2022 • Ching-An Cheng, Tengyang Xie, Nan Jiang, Alekh Agarwal
We propose Adversarially Trained Actor Critic (ATAC), a new model-free algorithm for offline reinforcement learning (RL) under insufficient data coverage, based on the concept of relative pessimism.
1 code implementation • 31 Jan 2022 • Xuezhou Zhang, Yuda Song, Masatoshi Uehara, Mengdi Wang, Alekh Agarwal, Wen Sun
We present BRIEE (Block-structured Representation learning with Interleaved Explore Exploit), an algorithm for efficient reinforcement learning in Markov Decision Processes with block-structured dynamics (i.e., Block MDPs), where rich observations are generated from a set of unknown latent states.
no code implementations • ICLR 2022 • 17 Oct 2021 • Yonathan Efroni, Dipendra Misra, Akshay Krishnamurthy, Alekh Agarwal, John Langford
We initiate the formal study of latent state discovery in the presence of such exogenous noise sources by proposing a new model, the Exogenous Block MDP (EX-BMDP), for rich observation RL.
no code implementations • NeurIPS 2021 • Tengyang Xie, Ching-An Cheng, Nan Jiang, Paul Mineiro, Alekh Agarwal
The use of pessimism, when reasoning about datasets lacking exhaustive exploration, has recently gained prominence in offline reinforcement learning.
no code implementations • 24 Mar 2021 • Andrea Zanette, Ching-An Cheng, Alekh Agarwal
Policy optimization methods are popular reinforcement learning algorithms because their incremental and on-policy nature makes them more stable than their value-based counterparts.
1 code implementation • 22 Mar 2021 • Fei Feng, Wotao Yin, Alekh Agarwal, Lin F. Yang
Policy optimization methods remain a powerful workhorse in empirical Reinforcement Learning (RL), with a focus on neural policies that can easily reason over complex and continuous state and/or action spaces.
no code implementations • 19 Mar 2021 • Juan C. Perdomo, Max Simchowitz, Alekh Agarwal, Peter Bartlett
We study the problem of adaptive control of the linear quadratic regulator for systems in very high, or even infinite dimension.
no code implementations • 14 Feb 2021 • Aditya Modi, Jinglin Chen, Akshay Krishnamurthy, Nan Jiang, Alekh Agarwal
In this work, we present the first model-free representation learning algorithms for low rank MDPs.
1 code implementation • NeurIPS 2020 • 16 Jul 2020 • Yao Liu, Adith Swaminathan, Alekh Agarwal, Emma Brunskill
Doing batch RL in a way that yields a reliable new policy in large domains is challenging: a new decision policy may visit states and actions outside the support of the batch data, and function approximation and optimization with limited samples can further increase the potential of learning policies with overly optimistic estimates of their future performance.
1 code implementation • NeurIPS 2020 • Alekh Agarwal, Mikael Henaff, Sham Kakade, Wen Sun
Direct policy gradient methods for reinforcement learning are a successful approach for a variety of reasons: they are model free, they directly optimize the performance metric of interest, and they allow for richly parameterized policies.
no code implementations • NeurIPS 2020 • Ching-An Cheng, Andrey Kolobov, Alekh Agarwal
In this paper, we propose the state-wise maximum of the oracle policies' values as a natural baseline to resolve conflicting advice from multiple oracles.
1 code implementation • NeurIPS 2020 • Matteo Turchetta, Andrey Kolobov, Shital Shah, Andreas Krause, Alekh Agarwal
In safety-critical applications, autonomous agents may need to learn in an environment where mistakes can be very costly.
no code implementations • 19 Jun 2020 • Ziming Li, Julia Kiseleva, Alekh Agarwal, Maarten de Rijke, Ryen W. White
Effective optimization is essential for real-world interactive systems to provide a satisfactory user experience in response to changing user behavior.
no code implementations • 18 Jun 2020 • Dilip Arumugam, Debadeepta Dey, Alekh Agarwal, Asli Celikyilmaz, Elnaz Nouri, Bill Dolan
While recent state-of-the-art results for adversarial imitation-learning algorithms are encouraging, works exploring the imitation learning from observation (ILO) setting, where trajectories only contain expert observations, have not been met with the same success.
no code implementations • NeurIPS 2020 • Alekh Agarwal, Sham Kakade, Akshay Krishnamurthy, Wen Sun
In order to deal with the curse of dimensionality in reinforcement learning (RL), it is common practice to make parametric assumptions where values or policies are functions of some low dimensional feature space.
no code implementations • 28 Mar 2020 • Alekh Agarwal, John Langford, Chen-Yu Wei
We study a new form of federated learning where the clients train personalized local models and make predictions jointly with the server-side shared model.
no code implementations • 4 Mar 2020 • Chen-Yu Wei, Haipeng Luo, Alekh Agarwal
We initiate the study of learning in contextual bandits with the help of loss predictors.
no code implementations • 1 Aug 2019 • Alekh Agarwal, Sham M. Kakade, Jason D. Lee, Gaurav Mahajan
Policy gradient methods are among the most effective methods in challenging reinforcement learning problems with large state and/or action spaces.
2 code implementations • NeurIPS 2019 • Aditya Grover, Jiaming Song, Alekh Agarwal, Kenneth Tran, Ashish Kapoor, Eric Horvitz, Stefano Ermon
A standard technique to correct this bias is importance sampling, where samples from the model are weighted by the likelihood ratio under model and true distributions.
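The weighting scheme described in this snippet can be illustrated with a minimal, self-contained sketch. The toy uniform distributions below are illustrative assumptions, not the paper's classifier-based density-ratio estimator: samples drawn from a model distribution q are reweighted by p(x)/q(x), so their weighted average estimates an expectation under the true distribution p.

```python
import random

def importance_weighted_mean(f, p_pdf, q_pdf, q_sampler, n=100_000, seed=0):
    """Estimate E_p[f(X)] using samples from q, weighted by p(x)/q(x)."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n):
        x = q_sampler(rng)
        total += f(x) * p_pdf(x) / q_pdf(x)
    return total / n

# Toy setup: true distribution p = Uniform(0, 1), model q = Uniform(0, 2).
p_pdf = lambda x: 1.0 if 0.0 <= x <= 1.0 else 0.0
q_pdf = lambda x: 0.5                          # density of Uniform(0, 2)
q_sampler = lambda rng: rng.uniform(0.0, 2.0)

# E_p[X] = 0.5; the importance-weighted estimate should land close to it.
est = importance_weighted_mean(lambda x: x, p_pdf, q_pdf, q_sampler)
```

Samples falling outside p's support get weight zero, so only the overlap region contributes, at the cost of higher variance when p and q differ substantially.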
no code implementations • 10 Jun 2019 • Alekh Agarwal, Sham Kakade, Lin F. Yang
In this work, we study the effectiveness of the most natural plug-in approach to model-based planning: we build the maximum likelihood estimate of the transition model in the MDP from observations and then find an optimal policy in this empirical MDP.
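The plug-in approach in the entry above can be sketched in a toy tabular setting (the two-state chain and data format below are illustrative assumptions): estimate transition probabilities and mean rewards by maximum likelihood from observed tuples, then run value iteration on the resulting empirical MDP.

```python
from collections import defaultdict

def empirical_mdp(transitions):
    """Maximum-likelihood transition and mean-reward estimates from
    observed (s, a, r, s_next) tuples."""
    counts = defaultdict(lambda: defaultdict(int))
    rewards = defaultdict(list)
    for s, a, r, s_next in transitions:
        counts[(s, a)][s_next] += 1
        rewards[(s, a)].append(r)
    P = {sa: {s2: c / sum(nxt.values()) for s2, c in nxt.items()}
         for sa, nxt in counts.items()}
    R = {sa: sum(rs) / len(rs) for sa, rs in rewards.items()}
    return P, R

def value_iteration(P, R, n_states, n_actions, gamma=0.9, iters=200):
    """Solve the empirical MDP; unseen (s, a) pairs contribute value 0."""
    V = [0.0] * n_states
    for _ in range(iters):
        V = [max(R.get((s, a), 0.0)
                 + gamma * sum(p * V[s2] for s2, p in P.get((s, a), {}).items())
                 for a in range(n_actions))
             for s in range(n_states)]
    return V

# Toy 2-state chain: action 1 in state 0 yields reward 1 and moves to
# state 1, which is absorbing with reward 0 under both actions.
data = [(0, 1, 1.0, 1), (0, 0, 0.0, 0), (1, 0, 0.0, 1), (1, 1, 0.0, 1)] * 10
P, R = empirical_mdp(data)
V = value_iteration(P, R, n_states=2, n_actions=2)
```

Here the optimal value of state 0 is 1.0 (take action 1 once, then collect nothing in the absorbing state), which the plug-in solution recovers from the data.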
4 code implementations • ICLR 2020 • Jordan T. Ash, Chicheng Zhang, Akshay Krishnamurthy, John Langford, Alekh Agarwal
We design a new algorithm for batch active learning with deep neural network models.
4 code implementations • 30 May 2019 • Alekh Agarwal, Miroslav Dudík, Zhiwei Steven Wu
Our schemes only require access to standard risk minimization algorithms (such as standard classification or least-squares regression) while providing theoretical guarantees on the optimality and fairness of the obtained solutions.
no code implementations • 12 May 2019 • Aditya Modi, Debadeepta Dey, Alekh Agarwal, Adith Swaminathan, Besmira Nushi, Sean Andrist, Eric Horvitz
We address the opportunity to maximize the utility of an overall computing system by employing reinforcement learning to guide the configuration of the set of interacting modules that comprise the system.
no code implementations • 17 Apr 2019 • Yao Liu, Adith Swaminathan, Alekh Agarwal, Emma Brunskill
We study the problem of off-policy policy optimization in Markov decision processes, and develop a novel off-policy policy gradient method.
no code implementations • ICLR Workshop DeepGenStruct 2019 • Aditya Grover, Jiaming Song, Ashish Kapoor, Kenneth Tran, Alekh Agarwal, Eric Horvitz, Stefano Ermon
A standard technique to correct this bias is to importance-weight samples from the model by the likelihood ratio under the model and true distributions.
1 code implementation • 25 Jan 2019 • Simon S. Du, Akshay Krishnamurthy, Nan Jiang, Alekh Agarwal, Miroslav Dudík, John Langford
We study the exploration problem in episodic MDPs with rich observations generated from a small number of latent states.
1 code implementation • 2 Jan 2019 • Chicheng Zhang, Alekh Agarwal, Hal Daumé III, John Langford, Sahand N. Negahban
We investigate the feasibility of learning from a mix of both fully-labeled supervised data and contextual bandit data.
no code implementations • 21 Nov 2018 • Wen Sun, Nan Jiang, Akshay Krishnamurthy, Alekh Agarwal, John Langford
We study the sample complexity of model-based reinforcement learning (henceforth RL) in general contextual decision processes that require strategic exploration to find a near-optimal policy.
3 code implementations • ICML 2018 • Alekh Agarwal, Alina Beygelzimer, Miroslav Dudík, John Langford, Hanna Wallach
We present a systematic approach for achieving fairness in a binary classification setting.
no code implementations • ICML 2018 • Dylan J. Foster, Alekh Agarwal, Miroslav Dudík, Haipeng Luo, Robert E. Schapire
A major challenge in contextual bandits is to design general-purpose algorithms that are both practically useful and theoretically well-founded.
no code implementations • NeurIPS 2018 • Christoph Dann, Nan Jiang, Akshay Krishnamurthy, Alekh Agarwal, John Langford, Robert E. Schapire
We study the computational tractability of PAC reinforcement learning with rich observations.
no code implementations • ICML 2018 • Hoang M. Le, Nan Jiang, Alekh Agarwal, Miroslav Dudík, Yisong Yue, Hal Daumé III
We study how to effectively leverage expert feedback to learn sequential decision-making policies.
no code implementations • 17 Feb 2018 • Ziming Li, Julia Kiseleva, Alekh Agarwal, Maarten de Rijke
Effective optimization is essential for interactive systems to provide a satisfactory user experience.
1 code implementation • 12 Feb 2018 • Alberto Bietti, Alekh Agarwal, John Langford
Contextual bandit algorithms are essential for solving many real-world interactive machine learning problems.
no code implementations • 5 Aug 2017 • Haipeng Luo, Chen-Yu Wei, Alekh Agarwal, John Langford
In this work, we develop several efficient contextual bandit algorithms for non-stationary environments by equipping existing methods for the i.i.d. setting.
no code implementations • ICML 2017 • Akshay Krishnamurthy, Alekh Agarwal, Tzu-Kuo Huang, Hal Daume III, John Langford
We design an active learning algorithm for cost-sensitive multiclass classification: problems where different errors have different costs.
1 code implementation • 19 Dec 2016 • Alekh Agarwal, Haipeng Luo, Behnam Neyshabur, Robert E. Schapire
We study the problem of combining multiple bandit algorithms (that is, online learning algorithms with partial feedback) with the goal of creating a master algorithm that performs almost as well as the best base algorithm if it were to be run on its own.
2 code implementations • ICML 2017 • Yu-Xiang Wang, Alekh Agarwal, Miroslav Dudik
We study the off-policy evaluation problem (estimating the value of a target policy using data collected by another policy) under the contextual bandit model.
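The classical baseline for this evaluation problem is inverse propensity scoring (IPS), which the paper refines with more sophisticated estimators. A minimal sketch follows; the log-tuple format and `target_policy` signature are assumptions for illustration.

```python
def ips_value(logs, target_policy):
    """Inverse propensity scoring estimate of a target policy's value.

    `logs` holds (context, action, reward, logging_prob) tuples, where
    logging_prob is the probability the logging policy assigned to the
    logged action. `target_policy(context, action)` returns the target
    policy's probability of that action in that context.
    """
    total = 0.0
    for context, action, reward, logging_prob in logs:
        total += reward * target_policy(context, action) / logging_prob
    return total / len(logs)

# Toy check: two actions, uniform logging policy (prob 0.5 each),
# target policy that deterministically picks action 1.
logs = [
    ("c", 0, 0.0, 0.5),
    ("c", 1, 1.0, 0.5),
    ("c", 1, 1.0, 0.5),
    ("c", 0, 0.0, 0.5),
]
target = lambda context, action: 1.0 if action == 1 else 0.0
v = ips_value(logs, target)
```

IPS is unbiased when logging probabilities are known and positive on the target's actions, but its variance grows with the mismatch between the two policies, which motivates the refined estimators studied in the paper.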
no code implementations • ICML 2017 • Nan Jiang, Akshay Krishnamurthy, Alekh Agarwal, John Langford, Robert E. Schapire
Our first contribution is a complexity measure, the Bellman rank, that we show enables tractable learning of near-optimal behavior in these processes and is naturally small for many well-studied reinforcement learning settings.
no code implementations • 13 Jun 2016 • Alekh Agarwal, Sarah Bird, Markus Cozowicz, Luong Hoang, John Langford, Stephen Lee, Jiaji Li, Dan Melamed, Gal Oshri, Oswaldo Ribas, Siddhartha Sen, Alex Slivkins
The Decision Service enables all aspects of contextual bandit learning using four system abstractions which connect together in a loop: explore (the decision space), log, learn, and deploy.
1 code implementation • NeurIPS 2017 • Adith Swaminathan, Akshay Krishnamurthy, Alekh Agarwal, Miroslav Dudík, John Langford, Damien Jose, Imed Zitouni
This paper studies the evaluation of policies that recommend an ordered set of items (e.g., a ranking) based on some context, a common scenario in web search, ads, and recommendation.
1 code implementation • 14 Mar 2016 • David Abel, Alekh Agarwal, Fernando Diaz, Akshay Krishnamurthy, Robert E. Schapire
We address both of these challenges with two complementary techniques: First, we develop a gradient-boosting style, non-parametric function approximator for learning on $Q$-function residuals.
no code implementations • NeurIPS 2016 • Akshay Krishnamurthy, Alekh Agarwal, John Langford
We prove that the algorithm learns near optimal behavior after a number of episodes that is polynomial in all relevant parameters, logarithmic in the number of policies, and independent of the size of the observation space.
no code implementations • NeurIPS 2016 • Haipeng Luo, Alekh Agarwal, Nicolo Cesa-Bianchi, John Langford
We propose Sketched Online Newton (SON), an online second order learning algorithm that enjoys substantially improved regret guarantees for ill-conditioned data.
no code implementations • NeurIPS 2015 • Vasilis Syrgkanis, Alekh Agarwal, Haipeng Luo, Robert E. Schapire
We show that natural classes of regularized learning algorithms with a form of recency bias achieve faster convergence rates to approximate efficiency and to coarse correlated equilibria in multiplayer normal form games.
no code implementations • NeurIPS 2015 • Tzu-Kuo Huang, Alekh Agarwal, Daniel J. Hsu, John Langford, Robert E. Schapire
We develop a new active learning algorithm for the streaming setting satisfying three important properties: 1) It provably works for any classifier representation and classification problem including those with severe noise.
1 code implementation • NeurIPS 2016 • Akshay Krishnamurthy, Alekh Agarwal, Miroslav Dudik
We study an online decision making problem where on each round a learner chooses a list of items based on some side information, receives a scalar feedback value for each individual item, and a reward that is linearly related to this feedback.
no code implementations • 8 Feb 2015 • Kai-Wei Chang, Akshay Krishnamurthy, Alekh Agarwal, Hal Daumé III, John Langford
Methods for learning to search for structured prediction typically imitate a reference policy, with existing theoretical guarantees demonstrating low regret compared to that reference.
no code implementations • NeurIPS 2014 • 2 Oct 2014 • Alekh Agarwal, Alina Beygelzimer, Daniel J. Hsu, John Langford, Matus J. Telgarsky
Can we effectively learn a nonlinear representation in time comparable to linear learning?
no code implementations • 2 Oct 2014 • Alekh Agarwal, Leon Bottou
This paper presents a lower bound for optimizing a finite sum of $n$ functions, where each function is $L$-smooth and the sum is $\mu$-strongly convex.
1 code implementation • 4 Feb 2014 • Alekh Agarwal, Daniel Hsu, Satyen Kale, John Langford, Lihong Li, Robert E. Schapire
We present a new algorithm for the contextual bandit learning problem, where the learner repeatedly takes one of $K$ actions in response to the observed context, and observes the reward only for that chosen action.
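The interaction protocol in this entry (choose one of K actions per context, observe the reward only for the chosen action) can be illustrated with a simple epsilon-greedy baseline. This is not the paper's oracle-based algorithm; the stream and reward interfaces are toy assumptions that just exhibit bandit feedback.

```python
import random

def epsilon_greedy_bandit(stream, K, epsilon=0.1, seed=0):
    """Epsilon-greedy bandit loop (context ignored in this toy version).

    `stream` yields (context, reward_fn) pairs; reward_fn(a) reveals the
    reward only for the chosen action a, as in bandit feedback.
    """
    rng = random.Random(seed)
    counts = [0] * K
    means = [0.0] * K          # running mean reward per action
    total_reward = 0.0
    for context, reward_fn in stream:
        if rng.random() < epsilon:
            a = rng.randrange(K)                       # explore uniformly
        else:
            a = max(range(K), key=lambda k: means[k])  # exploit best so far
        r = reward_fn(a)       # only the chosen action's reward is observed
        counts[a] += 1
        means[a] += (r - means[a]) / counts[a]
        total_reward += r
    return means, total_reward

# Toy run with K = 2: action 1 always pays 1.0, action 0 pays 0.0.
stream = [(None, lambda a: float(a == 1)) for _ in range(2000)]
means, total = epsilon_greedy_bandit(stream, K=2)
```

Because the learner never sees the reward of the action it did not take, exploration (the epsilon branch) is what lets it discover that action 1 dominates; after that, exploitation collects near-optimal reward.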
no code implementations • 30 Oct 2013 • Alekh Agarwal, Animashree Anandkumar, Prateek Jain, Praneeth Netrapalli
Alternating minimization is a popular heuristic for sparse coding, where the dictionary and the coefficients are estimated in alternate steps, keeping the other fixed.
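The alternating structure described in this entry can be sketched in a toy 1-sparse setting (each sample uses a single unit-norm atom; the data and initialization below are illustrative assumptions): a coding step picks the best atom per sample with the dictionary fixed, and a dictionary step re-estimates atoms with the codes fixed.

```python
import math

def normalize(v):
    n = math.sqrt(sum(x * x for x in v)) or 1.0
    return [x / n for x in v]

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def coding_step(X, D):
    """Dictionary fixed: assign each sample its best single atom
    (largest |correlation|) and the least-squares coefficient."""
    codes = []
    for x in X:
        j = max(range(len(D)), key=lambda k: abs(dot(x, D[k])))
        codes.append((j, dot(x, D[j])))  # atoms are unit-norm
    return codes

def dictionary_step(X, codes, D):
    """Codes fixed: re-estimate each atom as the normalized
    coefficient-weighted sum of its assigned samples."""
    new_D = []
    for j, atom in enumerate(D):
        acc = [0.0] * len(atom)
        for x, (k, c) in zip(X, codes):
            if k == j:
                acc = [a + c * xi for a, xi in zip(acc, x)]
        new_D.append(normalize(acc) if any(acc) else atom)
    return new_D

# Toy data: points near the two coordinate axes of R^2; the atoms should
# rotate from a poor diagonal initialization toward the axes.
X = [[1.0, 0.05], [0.9, -0.03], [0.02, 1.1], [-0.01, 0.95]]
D = [normalize([1.0, 1.0]), normalize([1.0, -1.0])]
for _ in range(10):
    codes = coding_step(X, D)
    D = dictionary_step(X, codes, D)
```

Each step only decreases the reconstruction error given the other variable, which is the heuristic the paper analyzes; in this toy run the two atoms align with the coordinate axes that generated the data.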
no code implementations • 30 Oct 2013 • Alekh Agarwal, Leon Bottou, Miroslav Dudik, John Langford
We leverage the same observation to build a generic strategy for parallelizing learning algorithms.
no code implementations • 7 Oct 2013 • Alekh Agarwal, Sham M. Kakade, Nikos Karampatziakis, Le Song, Gregory Valiant
This work provides simple algorithms for multi-class (and multi-label) prediction in settings where both the number of examples n and the data dimension d are relatively large.
no code implementations • 8 Sep 2013 • Alekh Agarwal, Animashree Anandkumar, Praneeth Netrapalli
We consider the problem of learning overcomplete dictionaries in the context of sparse coding, where each sample selects a sparse subset of dictionary elements.
no code implementations • NeurIPS 2012 • Alekh Agarwal, Sahand Negahban, Martin J. Wainwright
We develop and analyze stochastic optimization algorithms for problems in which the expected loss is strongly convex, and the optimum is (approximately) sparse.
no code implementations • NeurIPS 2011 • Alekh Agarwal, John C. Duchi
We analyze the convergence of gradient-based optimization algorithms whose updates depend on delayed stochastic gradient information.
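The delayed-update setting in this entry is easy to simulate: each step applies a gradient that was computed at the parameters from several steps earlier, as happens with stragglers in asynchronous distributed training. A minimal sketch on a toy quadratic (the step size and delay are illustrative choices, not values from the paper):

```python
def delayed_sgd(grad, x0, lr, delay, steps):
    """Gradient descent where each update applies a gradient computed
    `delay` steps earlier (at the iterate from step t - delay)."""
    history = [x0]
    x = x0
    for t in range(steps):
        stale = history[max(0, t - delay)]  # possibly out-of-date iterate
        x = x - lr * grad(stale)
        history.append(x)
    return x

# Toy objective f(x) = x^2, gradient 2x.  With a small enough step size
# the iterates still converge to the optimum at 0, just more slowly and
# with oscillation as the delay grows.
x_final = delayed_sgd(lambda x: 2.0 * x, x0=1.0, lr=0.05, delay=5, steps=200)
```

Raising `lr` or `delay` too far destabilizes the recursion, which matches the qualitative message that achievable step sizes shrink as delays grow.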
no code implementations • NeurIPS 2011 • Alekh Agarwal, Dean P. Foster, Daniel J. Hsu, Sham M. Kakade, Alexander Rakhlin
This paper addresses the problem of minimizing a convex, Lipschitz function $f$ over a convex, compact set $X$ under a stochastic bandit feedback model.
2 code implementations • 19 Oct 2011 • Alekh Agarwal, Olivier Chapelle, Miroslav Dudik, John Langford
We present a system and a set of techniques for learning linear predictors with convex losses on terascale datasets with trillions of features (the number of features here refers to the number of non-zero entries in the data matrix).
no code implementations • NeurIPS 2010 • Alekh Agarwal, Sahand Negahban, Martin J. Wainwright
Many statistical $M$-estimators are based on convex optimization problems formed by the weighted sum of a loss function with a norm-based regularizer.
no code implementations • NeurIPS 2010 • 12 May 2010 • Alekh Agarwal, Martin J. Wainwright, John C. Duchi
The goal of decentralized optimization over a network is to optimize a global objective formed by a sum of local (possibly nonsmooth) convex functions using only local computation and communication.
no code implementations • NeurIPS 2009 • Alekh Agarwal, Martin J. Wainwright, Peter L. Bartlett, Pradeep K. Ravikumar
The extensive use of convex optimization in machine learning and statistics makes it critical to understand the fundamental computational limits of learning and estimation.