Search Results for author: Alekh Agarwal

Found 71 papers, 21 papers with code

On the Statistical Efficiency of Reward-Free Exploration in Non-Linear RL

no code implementations 21 Jun 2022 Jinglin Chen, Aditya Modi, Akshay Krishnamurthy, Nan Jiang, Alekh Agarwal

We study reward-free reinforcement learning (RL) under general non-linear function approximation, and establish sample efficiency and hardness results under various standard structural assumptions.

reinforcement-learning

Model-based RL with Optimistic Posterior Sampling: Structural Conditions and Sample Complexity

no code implementations 15 Jun 2022 Alekh Agarwal, Tong Zhang

We propose a general framework to design posterior sampling methods for model-based RL.

Provable Benefits of Representational Transfer in Reinforcement Learning

1 code implementation 29 May 2022 Alekh Agarwal, Yuda Song, Wen Sun, Kaiwen Wang, Mengdi Wang, Xuezhou Zhang

We study the problem of representational transfer in RL, where an agent first pretrains in a number of source tasks to discover a shared representation, which is subsequently used to learn a good policy in a target task.

reinforcement-learning Representation Learning

Non-Linear Reinforcement Learning in Large Action Spaces: Structural Conditions and Sample-efficiency of Posterior Sampling

no code implementations 15 Mar 2022 Alekh Agarwal, Tong Zhang

Provably sample-efficient Reinforcement Learning (RL) with rich observations and function approximation has witnessed tremendous recent progress, particularly when the underlying function approximators are linear.

Minimax Regret Optimization for Robust Machine Learning under Distribution Shift

no code implementations 11 Feb 2022 Alekh Agarwal, Tong Zhang

We instead propose an alternative method called Minimax Regret Optimization (MRO), and show that under suitable conditions this method achieves uniformly low regret across all test distributions.

Learning Theory

Adversarially Trained Actor Critic for Offline Reinforcement Learning

1 code implementation 5 Feb 2022 Ching-An Cheng, Tengyang Xie, Nan Jiang, Alekh Agarwal

We propose Adversarially Trained Actor Critic (ATAC), a new model-free algorithm for offline reinforcement learning under insufficient data coverage, based on a two-player Stackelberg game framing of offline RL: A policy actor competes against an adversarially trained value critic, who finds data-consistent scenarios where the actor is inferior to the data-collection behavior policy.

Continuous Control Offline RL +1

Efficient Reinforcement Learning in Block MDPs: A Model-free Representation Learning Approach

1 code implementation 31 Jan 2022 Xuezhou Zhang, Yuda Song, Masatoshi Uehara, Mengdi Wang, Alekh Agarwal, Wen Sun

We present BRIEE (Block-structured Representation learning with Interleaved Explore Exploit), an algorithm for efficient reinforcement learning in Markov Decision Processes with block-structured dynamics (i.e., Block MDPs), where rich observations are generated from a set of unknown latent states.

reinforcement-learning Representation Learning

Provable RL with Exogenous Distractors via Multistep Inverse Dynamics

no code implementations 17 Oct 2021 Yonathan Efroni, Dipendra Misra, Akshay Krishnamurthy, Alekh Agarwal, John Langford

We initiate the formal study of latent state discovery in the presence of such exogenous noise sources by proposing a new model, the Exogenous Block MDP (EX-BMDP), for rich observation RL.

Representation Learning

Provably Filtering Exogenous Distractors using Multistep Inverse Dynamics

no code implementations ICLR 2022 Yonathan Efroni, Dipendra Misra, Akshay Krishnamurthy, Alekh Agarwal, John Langford

We initiate the formal study of latent state discovery in the presence of such exogenous noise sources by proposing a new model, the Exogenous Block MDP (EX-BMDP), for rich observation RL.

Representation Learning

Bellman-consistent Pessimism for Offline Reinforcement Learning

no code implementations NeurIPS 2021 Tengyang Xie, Ching-An Cheng, Nan Jiang, Paul Mineiro, Alekh Agarwal

The use of pessimism when reasoning about datasets lacking exhaustive exploration has recently gained prominence in offline reinforcement learning.

reinforcement-learning

Cautiously Optimistic Policy Optimization and Exploration with Linear Function Approximation

no code implementations 24 Mar 2021 Andrea Zanette, Ching-An Cheng, Alekh Agarwal

Policy optimization methods are popular reinforcement learning algorithms, because their incremental and on-policy nature makes them more stable than the value-based counterparts.

reinforcement-learning

Provably Correct Optimization and Exploration with Non-linear Policies

1 code implementation 22 Mar 2021 Fei Feng, Wotao Yin, Alekh Agarwal, Lin F. Yang

Policy optimization methods remain a powerful workhorse in empirical Reinforcement Learning (RL), with a focus on neural policies that can easily reason over complex and continuous state and/or action spaces.

Towards a Dimension-Free Understanding of Adaptive Linear Control

no code implementations 19 Mar 2021 Juan C. Perdomo, Max Simchowitz, Alekh Agarwal, Peter Bartlett

We study the problem of adaptive control of the linear quadratic regulator for systems in very high, or even infinite dimension.

Provably Good Batch Off-Policy Reinforcement Learning Without Great Exploration

no code implementations NeurIPS 2020 Yao Liu, Adith Swaminathan, Alekh Agarwal, Emma Brunskill

Doing batch RL in a way that yields a reliable new policy in large domains is challenging: a new decision policy may visit states and actions outside the support of the batch data, and function approximation and optimization with limited samples can further increase the potential of learning policies with overly optimistic estimates of their future performance.

reinforcement-learning

PC-PG: Policy Cover Directed Exploration for Provable Policy Gradient Learning

1 code implementation NeurIPS 2020 Alekh Agarwal, Mikael Henaff, Sham Kakade, Wen Sun

Direct policy gradient methods for reinforcement learning are a successful approach for a variety of reasons: they are model free, they directly optimize the performance metric of interest, and they allow for richly parameterized policies.

Policy Gradient Methods Q-Learning

Provably Good Batch Reinforcement Learning Without Great Exploration

1 code implementation 16 Jul 2020 Yao Liu, Adith Swaminathan, Alekh Agarwal, Emma Brunskill

Doing batch RL in a way that yields a reliable new policy in large domains is challenging: a new decision policy may visit states and actions outside the support of the batch data, and function approximation and optimization with limited samples can further increase the potential of learning policies with overly optimistic estimates of their future performance.

reinforcement-learning

Policy Improvement via Imitation of Multiple Oracles

no code implementations NeurIPS 2020 Ching-An Cheng, Andrey Kolobov, Alekh Agarwal

In this paper, we propose the state-wise maximum of the oracle policies' values as a natural baseline to resolve conflicting advice from multiple oracles.

Imitation Learning online learning

Optimizing Interactive Systems via Data-Driven Objectives

no code implementations 19 Jun 2020 Ziming Li, Julia Kiseleva, Alekh Agarwal, Maarten de Rijke, Ryen W. White

Effective optimization is essential for real-world interactive systems to provide a satisfactory user experience in response to changing user behavior.

Reparameterized Variational Divergence Minimization for Stable Imitation

1 code implementation 18 Jun 2020 Dilip Arumugam, Debadeepta Dey, Alekh Agarwal, Asli Celikyilmaz, Elnaz Nouri, Bill Dolan

While recent state-of-the-art results for adversarial imitation-learning algorithms are encouraging, recent works exploring the imitation learning from observation (ILO) setting, where trajectories only contain expert observations, have not been met with the same success.

Continuous Control Imitation Learning +1

FLAMBE: Structural Complexity and Representation Learning of Low Rank MDPs

no code implementations NeurIPS 2020 Alekh Agarwal, Sham Kakade, Akshay Krishnamurthy, Wen Sun

In order to deal with the curse of dimensionality in reinforcement learning (RL), it is common practice to make parametric assumptions where values or policies are functions of some low dimensional feature space.

reinforcement-learning Representation Learning

Federated Residual Learning

no code implementations 28 Mar 2020 Alekh Agarwal, John Langford, Chen-Yu Wei

We study a new form of federated learning where the clients train personalized local models and make predictions jointly with the server-side shared model.

Federated Learning

Taking a hint: How to leverage loss predictors in contextual bandits?

no code implementations 4 Mar 2020 Chen-Yu Wei, Haipeng Luo, Alekh Agarwal

We initiate the study of learning in contextual bandits with the help of loss predictors.

Multi-Armed Bandits

On the Theory of Policy Gradient Methods: Optimality, Approximation, and Distribution Shift

no code implementations 1 Aug 2019 Alekh Agarwal, Sham M. Kakade, Jason D. Lee, Gaurav Mahajan

Policy gradient methods are among the most effective methods in challenging reinforcement learning problems with large state and/or action spaces.

Policy Gradient Methods

Bias Correction of Learned Generative Models using Likelihood-Free Importance Weighting

2 code implementations NeurIPS 2019 Aditya Grover, Jiaming Song, Alekh Agarwal, Kenneth Tran, Ashish Kapoor, Eric Horvitz, Stefano Ermon

A standard technique to correct this bias is importance sampling, where samples from the model are weighted by the likelihood ratio under model and true distributions.

Data Augmentation

Model-Based Reinforcement Learning with a Generative Model is Minimax Optimal

no code implementations 10 Jun 2019 Alekh Agarwal, Sham Kakade, Lin F. Yang

In this work, we study the effectiveness of the most natural plug-in approach to model-based planning: we build the maximum likelihood estimate of the transition model in the MDP from observations and then find an optimal policy in this empirical MDP.

Model-based Reinforcement Learning reinforcement-learning

Fair Regression: Quantitative Definitions and Reduction-based Algorithms

3 code implementations 30 May 2019 Alekh Agarwal, Miroslav Dudík, Zhiwei Steven Wu

Our schemes only require access to standard risk minimization algorithms (such as standard classification or least-squares regression) while providing theoretical guarantees on the optimality and fairness of the obtained solutions.

Fairness

Metareasoning in Modular Software Systems: On-the-Fly Configuration using Reinforcement Learning with Rich Contextual Representations

no code implementations 12 May 2019 Aditya Modi, Debadeepta Dey, Alekh Agarwal, Adith Swaminathan, Besmira Nushi, Sean Andrist, Eric Horvitz

We address the opportunity to maximize the utility of an overall computing system by employing reinforcement learning to guide the configuration of the set of interacting modules that comprise the system.

Decision Making reinforcement-learning

Off-Policy Policy Gradient with State Distribution Correction

no code implementations 17 Apr 2019 Yao Liu, Adith Swaminathan, Alekh Agarwal, Emma Brunskill

We study the problem of off-policy policy optimization in Markov decision processes, and develop a novel off-policy policy gradient method.

Bias Correction of Learned Generative Models via Likelihood-free Importance Weighting

no code implementations ICLR Workshop DeepGenStruct 2019 Aditya Grover, Jiaming Song, Ashish Kapoor, Kenneth Tran, Alekh Agarwal, Eric Horvitz, Stefano Ermon

A standard technique to correct this bias is by importance weighting samples from the model by the likelihood ratio under the model and true distributions.

Data Augmentation

Provably efficient RL with Rich Observations via Latent State Decoding

1 code implementation 25 Jan 2019 Simon S. Du, Akshay Krishnamurthy, Nan Jiang, Alekh Agarwal, Miroslav Dudík, John Langford

We study the exploration problem in episodic MDPs with rich observations generated from a small number of latent states.

Q-Learning

Warm-starting Contextual Bandits: Robustly Combining Supervised and Bandit Feedback

1 code implementation 2 Jan 2019 Chicheng Zhang, Alekh Agarwal, Hal Daumé III, John Langford, Sahand N. Negahban

We investigate the feasibility of learning from a mix of both fully-labeled supervised data and contextual bandit data.

Multi-Armed Bandits

Model-based RL in Contextual Decision Processes: PAC bounds and Exponential Improvements over Model-free Approaches

no code implementations 21 Nov 2018 Wen Sun, Nan Jiang, Akshay Krishnamurthy, Alekh Agarwal, John Langford

We study the sample complexity of model-based reinforcement learning (henceforth RL) in general contextual decision processes that require strategic exploration to find a near-optimal policy.

Model-based Reinforcement Learning

Practical Contextual Bandits with Regression Oracles

no code implementations ICML 2018 Dylan J. Foster, Alekh Agarwal, Miroslav Dudík, Haipeng Luo, Robert E. Schapire

A major challenge in contextual bandits is to design general-purpose algorithms that are both practically useful and theoretically well-founded.

General Classification Multi-Armed Bandits

Learning Data-Driven Objectives to Optimize Interactive Systems

no code implementations 17 Feb 2018 Ziming Li, Julia Kiseleva, Alekh Agarwal, Maarten de Rijke

Effective optimization is essential for interactive systems to provide a satisfactory user experience.

A Contextual Bandit Bake-off

1 code implementation 12 Feb 2018 Alberto Bietti, Alekh Agarwal, John Langford

Contextual bandit algorithms are essential for solving many real-world interactive machine learning problems.

Efficient Contextual Bandits in Non-stationary Worlds

no code implementations 5 Aug 2017 Haipeng Luo, Chen-Yu Wei, Alekh Agarwal, John Langford

In this work, we develop several efficient contextual bandit algorithms for non-stationary environments by equipping existing methods for i.i.d.

Multi-Armed Bandits

Active Learning for Cost-Sensitive Classification

no code implementations ICML 2017 Akshay Krishnamurthy, Alekh Agarwal, Tzu-Kuo Huang, Hal Daumé III, John Langford

We design an active learning algorithm for cost-sensitive multiclass classification: problems where different errors have different costs.

Active Learning Classification +1

Corralling a Band of Bandit Algorithms

no code implementations 19 Dec 2016 Alekh Agarwal, Haipeng Luo, Behnam Neyshabur, Robert E. Schapire

We study the problem of combining multiple bandit algorithms (that is, online learning algorithms with partial feedback) with the goal of creating a master algorithm that performs almost as well as the best base algorithm if it were to be run on its own.

Multi-Armed Bandits online learning

Optimal and Adaptive Off-policy Evaluation in Contextual Bandits

2 code implementations ICML 2017 Yu-Xiang Wang, Alekh Agarwal, Miroslav Dudík

We study the off-policy evaluation problem---estimating the value of a target policy using data collected by another policy---under the contextual bandit model.

Multi-Armed Bandits

Contextual Decision Processes with Low Bellman Rank are PAC-Learnable

no code implementations ICML 2017 Nan Jiang, Akshay Krishnamurthy, Alekh Agarwal, John Langford, Robert E. Schapire

Our first contribution is a complexity measure, the Bellman rank, that we show enables tractable learning of near-optimal behavior in these processes and is naturally small for many well-studied reinforcement learning settings.

Efficient Exploration reinforcement-learning

Making Contextual Decisions with Low Technical Debt

no code implementations 13 Jun 2016 Alekh Agarwal, Sarah Bird, Markus Cozowicz, Luong Hoang, John Langford, Stephen Lee, Jiaji Li, Dan Melamed, Gal Oshri, Oswaldo Ribas, Siddhartha Sen, Alex Slivkins

The Decision Service enables all aspects of contextual bandit learning using four system abstractions which connect together in a loop: explore (the decision space), log, learn, and deploy.

Multi-Armed Bandits online learning

Off-policy evaluation for slate recommendation

1 code implementation NeurIPS 2017 Adith Swaminathan, Akshay Krishnamurthy, Alekh Agarwal, Miroslav Dudík, John Langford, Damien Jose, Imed Zitouni

This paper studies the evaluation of policies that recommend an ordered set of items (e.g., a ranking) based on some context---a common scenario in web search, ads, and recommendation.

Learning-To-Rank

Exploratory Gradient Boosting for Reinforcement Learning in Complex Domains

1 code implementation 14 Mar 2016 David Abel, Alekh Agarwal, Fernando Diaz, Akshay Krishnamurthy, Robert E. Schapire

We address both of these challenges with two complementary techniques: First, we develop a gradient-boosting style, non-parametric function approximator for learning on $Q$-function residuals.

reinforcement-learning

PAC Reinforcement Learning with Rich Observations

no code implementations NeurIPS 2016 Akshay Krishnamurthy, Alekh Agarwal, John Langford

We prove that the algorithm learns near optimal behavior after a number of episodes that is polynomial in all relevant parameters, logarithmic in the number of policies, and independent of the size of the observation space.

Decision Making Multi-Armed Bandits +1

Efficient Second Order Online Learning by Sketching

no code implementations NeurIPS 2016 Haipeng Luo, Alekh Agarwal, Nicolo Cesa-Bianchi, John Langford

We propose Sketched Online Newton (SON), an online second order learning algorithm that enjoys substantially improved regret guarantees for ill-conditioned data.

online learning

Fast Convergence of Regularized Learning in Games

no code implementations NeurIPS 2015 Vasilis Syrgkanis, Alekh Agarwal, Haipeng Luo, Robert E. Schapire

We show that natural classes of regularized learning algorithms with a form of recency bias achieve faster convergence rates to approximate efficiency and to coarse correlated equilibria in multiplayer normal form games.

Efficient and Parsimonious Agnostic Active Learning

no code implementations NeurIPS 2015 Tzu-Kuo Huang, Alekh Agarwal, Daniel J. Hsu, John Langford, Robert E. Schapire

We develop a new active learning algorithm for the streaming setting satisfying three important properties: 1) It provably works for any classifier representation and classification problem including those with severe noise.

Active Learning General Classification

Contextual Semibandits via Supervised Learning Oracles

1 code implementation NeurIPS 2016 Akshay Krishnamurthy, Alekh Agarwal, Miroslav Dudík

We study an online decision making problem where on each round a learner chooses a list of items based on some side information, receives a scalar feedback value for each individual item, and a reward that is linearly related to this feedback.

Decision Making Learning-To-Rank

Learning to Search Better Than Your Teacher

no code implementations 8 Feb 2015 Kai-Wei Chang, Akshay Krishnamurthy, Alekh Agarwal, Hal Daumé III, John Langford

Methods for learning to search for structured prediction typically imitate a reference policy, with existing theoretical guarantees demonstrating low regret compared to that reference.

Multi-Armed Bandits Structured Prediction

A Lower Bound for the Optimization of Finite Sums

no code implementations 2 Oct 2014 Alekh Agarwal, Leon Bottou

This paper presents a lower bound for optimizing a finite sum of $n$ functions, where each function is $L$-smooth and the sum is $\mu$-strongly convex.

Scalable Nonlinear Learning with Adaptive Polynomial Expansions

no code implementations 2 Oct 2014 Alekh Agarwal, Alina Beygelzimer, Daniel Hsu, John Langford, Matus Telgarsky

Can we effectively learn a nonlinear representation in time comparable to linear learning?

Taming the Monster: A Fast and Simple Algorithm for Contextual Bandits

1 code implementation 4 Feb 2014 Alekh Agarwal, Daniel Hsu, Satyen Kale, John Langford, Lihong Li, Robert E. Schapire

We present a new algorithm for the contextual bandit learning problem, where the learner repeatedly takes one of $K$ actions in response to the observed context, and observes the reward only for that chosen action.

General Classification Multi-Armed Bandits

Para-active learning

no code implementations 30 Oct 2013 Alekh Agarwal, Leon Bottou, Miroslav Dudík, John Langford

We leverage the same observation to build a generic strategy for parallelizing learning algorithms.

Active Learning

Learning Sparsely Used Overcomplete Dictionaries via Alternating Minimization

no code implementations 30 Oct 2013 Alekh Agarwal, Animashree Anandkumar, Prateek Jain, Praneeth Netrapalli

Alternating minimization is a popular heuristic for sparse coding, where the dictionary and the coefficients are estimated in alternate steps, keeping the other fixed.

Least Squares Revisited: Scalable Approaches for Multi-class Prediction

no code implementations 7 Oct 2013 Alekh Agarwal, Sham M. Kakade, Nikos Karampatziakis, Le Song, Gregory Valiant

This work provides simple algorithms for multi-class (and multi-label) prediction in settings where both the number of examples n and the data dimension d are relatively large.

A Clustering Approach to Learn Sparsely-Used Overcomplete Dictionaries

no code implementations 8 Sep 2013 Alekh Agarwal, Animashree Anandkumar, Praneeth Netrapalli

We consider the problem of learning overcomplete dictionaries in the context of sparse coding, where each sample selects a sparse subset of dictionary elements.

Stochastic optimization and sparse statistical recovery: Optimal algorithms for high dimensions

no code implementations NeurIPS 2012 Alekh Agarwal, Sahand Negahban, Martin J. Wainwright

We develop and analyze stochastic optimization algorithms for problems in which the expected loss is strongly convex, and the optimum is (approximately) sparse.

Stochastic Optimization

Stochastic convex optimization with bandit feedback

no code implementations NeurIPS 2011 Alekh Agarwal, Dean P. Foster, Daniel J. Hsu, Sham M. Kakade, Alexander Rakhlin

This paper addresses the problem of minimizing a convex, Lipschitz function $f$ over a convex, compact set $X$ under a stochastic bandit feedback model.

Distributed Delayed Stochastic Optimization

no code implementations NeurIPS 2011 Alekh Agarwal, John C. Duchi

We analyze the convergence of gradient-based optimization algorithms whose updates depend on delayed stochastic gradient information.

Distributed Optimization

A Reliable Effective Terascale Linear Learning System

2 code implementations 19 Oct 2011 Alekh Agarwal, Olivier Chapelle, Miroslav Dudík, John Langford

We present a system and a set of techniques for learning linear predictors with convex losses on terascale datasets, with trillions of features (the number of features here refers to the number of non-zero entries in the data matrix).

Fast global convergence rates of gradient methods for high-dimensional statistical recovery

no code implementations NeurIPS 2010 Alekh Agarwal, Sahand Negahban, Martin J. Wainwright

Many statistical $M$-estimators are based on convex optimization problems formed by the weighted sum of a loss function with a norm-based regularizer.

Distributed Dual Averaging In Networks

no code implementations NeurIPS 2010 Alekh Agarwal, Martin J. Wainwright, John C. Duchi

The goal of decentralized optimization over a network is to optimize a global objective formed by a sum of local (possibly nonsmooth) convex functions using only local computation and communication.

Dual Averaging for Distributed Optimization: Convergence Analysis and Network Scaling

no code implementations 12 May 2010 John Duchi, Alekh Agarwal, Martin Wainwright

The goal of decentralized optimization over a network is to optimize a global objective formed by a sum of local (possibly nonsmooth) convex functions using only local computation and communication.

Distributed Optimization

Information-theoretic lower bounds on the oracle complexity of convex optimization

no code implementations NeurIPS 2009 Alekh Agarwal, Martin J. Wainwright, Peter L. Bartlett, Pradeep K. Ravikumar

The extensive use of convex optimization in machine learning and statistics makes such an understanding critical to understand fundamental computational limits of learning and estimation.
