no code implementations • 14 Apr 2024 • Dipendra Misra, Aldo Pacchiano, Robert E. Schapire
We study interactive learning in a setting where the agent has to generate a response (e.g., an action or trajectory) given a context and an instruction.
no code implementations • 27 May 2022 • Yao Liu, Dipendra Misra, Miro Dudík, Robert E. Schapire
We study reinforcement learning (RL) in settings where observations are high-dimensional, but where an RL agent has access to abstract knowledge about the structure of the state space, as is the case, for example, when a robot is tasked to go to a specific room in a building using observations from its own camera, while having access to the floor plan.
no code implementations • 6 May 2022 • Miroslav Dudík, Robert E. Schapire, Matus Telgarsky
Not all convex functions on $\mathbb{R}^n$ have finite minimizers; some can only be minimized by a sequence as it heads to infinity.
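A standard one-dimensional example (mine, not the paper's) illustrates the phenomenon: the function below is convex and bounded below, yet attains no minimum, so every minimizing sequence must head to infinity.

```latex
f(x) = e^{-x}, \qquad \inf_{x \in \mathbb{R}} f(x) = 0, \qquad
f(x_n) \to 0 \ \text{ if and only if } \ x_n \to +\infty.
```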
no code implementations • NeurIPS 2021 • Nataly Brukhim, Elad Hazan, Shay Moran, Indraneel Mukherjee, Robert E. Schapire
Here, we focus on an especially natural formulation in which the weak hypotheses are assumed to belong to an "easy-to-learn" base class, and the weak learner is an agnostic PAC learner for that class with respect to the standard classification loss.
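For orientation, here is a schematic boosting-by-reweighting loop in the AdaBoost style; it only illustrates the weak-learner interface, and the paper's agnostic booster differs. The `weak_learner` callable and its signature are assumptions, not from the paper.

```python
import numpy as np

def boost(weak_learner, X, y, rounds=50):
    """Schematic boosting loop. Labels y are in {-1, +1};
    weak_learner(X, y, w) returns a callable hypothesis h with h(X) in {-1, +1}."""
    n = len(y)
    w = np.full(n, 1.0 / n)                  # distribution over examples
    hyps, coeffs = [], []
    for _ in range(rounds):
        h = weak_learner(X, y, w)            # e.g., an agnostic PAC learner for the base class
        pred = h(X)
        err = float(np.clip(np.sum(w * (pred != y)), 1e-12, 1 - 1e-12))
        a = 0.5 * np.log((1 - err) / err)    # weight of this weak hypothesis
        w = w * np.exp(-a * y * pred)        # upweight mistakes, downweight correct predictions
        w /= w.sum()
        hyps.append(h)
        coeffs.append(a)
    return lambda Xq: np.sign(sum(a_ * h_(Xq) for a_, h_ in zip(coeffs, hyps)))
```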
no code implementations • NeurIPS 2021 • Max Simchowitz, Christopher Tosh, Akshay Krishnamurthy, Daniel Hsu, Thodoris Lykouris, Miroslav Dudík, Robert E. Schapire
We prove that the expected reward accrued by Thompson sampling (TS) with a misspecified prior differs by at most $\tilde{\mathcal{O}}(H^2 \epsilon)$ from TS with a well-specified prior, where $\epsilon$ is the total-variation distance between priors and $H$ is the learning horizon.
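To make the object of the analysis concrete (this is the standard algorithm being studied, not the paper's proof technique), here is a minimal Beta-Bernoulli Thompson-sampling loop where `prior_alpha`/`prior_beta` encode a prior that may be misspecified:

```python
import numpy as np

def thompson_sampling(true_means, prior_alpha, prior_beta, horizon, rng=None):
    """Beta-Bernoulli Thompson sampling; the prior parameters may be misspecified."""
    rng = rng or np.random.default_rng(0)
    alpha = np.array(prior_alpha, dtype=float)   # prior successes per arm
    beta = np.array(prior_beta, dtype=float)     # prior failures per arm
    total_reward = 0.0
    for _ in range(horizon):
        theta = rng.beta(alpha, beta)            # sample a mean for each arm from the posterior
        arm = int(np.argmax(theta))              # play the arm that looks best under the sample
        reward = rng.binomial(1, true_means[arm])
        alpha[arm] += reward                     # conjugate posterior update
        beta[arm] += 1 - reward
        total_reward += reward
    return total_reward
```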
no code implementations • 19 Jun 2020 • Ziwei Ji, Miroslav Dudík, Robert E. Schapire, Matus Telgarsky
Recent work across many machine learning disciplines has highlighted that standard descent methods, even without explicit regularization, do not merely minimize the training error, but also exhibit an implicit bias.
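A canonical instance of such implicit bias, stated here from the broader literature rather than from this paper: on linearly separable data, gradient descent on the unregularized logistic loss diverges in norm but converges in direction to the maximum-margin separator,

```latex
\frac{w_t}{\lVert w_t \rVert} \;\longrightarrow\; \frac{\hat w}{\lVert \hat w \rVert},
\qquad
\hat w = \operatorname*{arg\,min}_{w} \lVert w \rVert_2^2
\quad \text{s.t.} \quad y_i\, w^{\top} x_i \ge 1 \ \text{ for all } i.
```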
no code implementations • ICML 2018 • Dylan J. Foster, Alekh Agarwal, Miroslav Dudík, Haipeng Luo, Robert E. Schapire
A major challenge in contextual bandits is to design general-purpose algorithms that are both practically useful and theoretically well-founded.
no code implementations • NeurIPS 2018 • Christoph Dann, Nan Jiang, Akshay Krishnamurthy, Alekh Agarwal, John Langford, Robert E. Schapire
We study the computational tractability of PAC reinforcement learning with rich observations.
1 code implementation • 19 Dec 2016 • Alekh Agarwal, Haipeng Luo, Behnam Neyshabur, Robert E. Schapire
We study the problem of combining multiple bandit algorithms (that is, online learning algorithms with partial feedback) with the goal of creating a master algorithm that performs almost as well as the best base algorithm if it were to be run on its own.
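As a toy illustration of the setup (a simplified EXP3-style stand-in, not the paper's Corral algorithm, which uses a more careful mirror-descent master), the `base_algs` objects and their `act`/`update` interface are assumptions:

```python
import numpy as np

def run_master(base_algs, get_loss, horizon, lr=0.1, rng=None):
    """Toy master over bandit base algorithms: sample a base algorithm,
    play its action, and feed back an importance-weighted loss."""
    rng = rng or np.random.default_rng(0)
    m = len(base_algs)
    weights = np.ones(m)
    for t in range(horizon):
        p = weights / weights.sum()
        i = rng.choice(m, p=p)             # choose a base algorithm
        action = base_algs[i].act()        # it proposes an action
        loss = get_loss(action, t)         # bandit feedback for that action only
        est = loss / p[i]                  # importance-weighted loss estimate
        weights[i] *= np.exp(-lr * est)    # exponential-weights update on the master
        base_algs[i].update(action, loss)  # only the chosen base learner updates
```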
no code implementations • 5 Nov 2016 • Miroslav Dudík, Nika Haghtalab, Haipeng Luo, Robert E. Schapire, Vasilis Syrgkanis, Jennifer Wortman Vaughan
We consider the design of computationally efficient online learning algorithms in an adversarial setting in which the learner has access to an offline optimization oracle.
no code implementations • ICML 2017 • Nan Jiang, Akshay Krishnamurthy, Alekh Agarwal, John Langford, Robert E. Schapire
Our first contribution is a complexity measure, the Bellman rank, that we show enables tractable learning of near-optimal behavior in these processes and is naturally small for many well-studied reinforcement learning settings.
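Informally, and paraphrasing the paper's definition from memory rather than quoting it: the average Bellman error of a candidate value function $f$ at level $h$, under a roll-in by the greedy policy of another candidate $f'$, is

```latex
\mathcal{E}_h(f', f) \;=\;
\mathbb{E}\Big[\, f(x_h, a_h) - r_h - f\big(x_{h+1}, \pi_f(x_{h+1})\big)
\;\Big|\; a_{1:h-1} \sim \pi_{f'},\ a_h \sim \pi_f \,\Big],
```

and the Bellman rank is the largest, over levels $h$, of the rank of the matrix with entries $\mathcal{E}_h(f', f)$ indexed by pairs of candidates.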
no code implementations • NeurIPS 2016 • Vasilis Syrgkanis, Haipeng Luo, Akshay Krishnamurthy, Robert E. Schapire
We give an oracle-based algorithm for the adversarial contextual bandit problem, where either contexts are drawn i.i.d.
1 code implementation • 14 Mar 2016 • David Abel, Alekh Agarwal, Fernando Diaz, Akshay Krishnamurthy, Robert E. Schapire
We address both of these challenges with two complementary techniques: First, we develop a gradient-boosting style, non-parametric function approximator for learning on $Q$-function residuals.
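A minimal sketch of the general idea, assuming scikit-learn is available; the `transitions` format, hyperparameters, and exact target construction are my assumptions and may differ from the paper's implementation. Each boosting stage fits a shallow regression tree to the Bellman residuals of the current ensemble:

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

def fit_q_by_residual_boosting(transitions, actions, n_stages=20, gamma=0.99, lr=0.5):
    """Gradient-boosting-style Q-function approximator (illustrative sketch).
    `transitions` is a list of (state, action, reward, next_state) tuples,
    with states as 1-D feature arrays and a small discrete action set."""
    sa = np.array([np.append(s, a) for s, a, _, _ in transitions])
    rewards = np.array([r for _, _, r, _ in transitions])
    trees = []

    def q(inputs):
        if not trees:
            return np.zeros(len(inputs))
        return lr * sum(t.predict(inputs) for t in trees)

    for _ in range(n_stages):
        # Bellman backup under the current ensemble: max over next actions.
        next_vals = np.max(np.stack([
            q(np.array([np.append(s2, a) for _, _, _, s2 in transitions]))
            for a in actions]), axis=0)
        residuals = rewards + gamma * next_vals - q(sa)
        trees.append(DecisionTreeRegressor(max_depth=3).fit(sa, residuals))
    return q
```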
no code implementations • 16 Feb 2016 • Jordan T. Ash, Robert E. Schapire, Barbara E. Engelhardt
Domain adaptation addresses the problem created when training data is generated by a so-called source distribution, but test data is generated by a significantly different target distribution.
no code implementations • 8 Feb 2016 • Vasilis Syrgkanis, Akshay Krishnamurthy, Robert E. Schapire
We provide the first oracle efficient sublinear regret algorithms for adversarial versions of the contextual bandit problem.
no code implementations • NeurIPS 2015 • Vasilis Syrgkanis, Alekh Agarwal, Haipeng Luo, Robert E. Schapire
We show that natural classes of regularized learning algorithms with a form of recency bias achieve faster convergence rates to approximate efficiency and to coarse correlated equilibria in multiplayer normal form games.
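The simplest example of such a recency-biased algorithm is optimistic Hedge, which counts the most recent loss vector twice, as a prediction of the next one. A minimal sketch (my own, with an arbitrary fixed learning rate rather than anything tuned as in the paper):

```python
import numpy as np

def optimistic_hedge(loss_rows, lr=0.1):
    """Optimistic Hedge over n experts. `loss_rows` is a (T, n) array of
    per-round, per-expert losses in [0, 1]; returns the play distributions."""
    T, n = loss_rows.shape
    cum = np.zeros(n)    # cumulative losses
    last = np.zeros(n)   # most recent loss vector, used as the optimistic guess
    plays = []
    for t in range(T):
        logits = -lr * (cum + last)      # recency bias: the last loss enters twice
        w = np.exp(logits - logits.max())
        plays.append(w / w.sum())
        cum += loss_rows[t]
        last = loss_rows[t]
    return np.array(plays)
```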
no code implementations • NeurIPS 2015 • Tzu-Kuo Huang, Alekh Agarwal, Daniel J. Hsu, John Langford, Robert E. Schapire
We develop a new active learning algorithm for the streaming setting that satisfies three important properties: 1) It provably works for any classifier representation and classification problem including those with severe noise.
no code implementations • 23 Feb 2015 • Miroslav Dudík, Katja Hofmann, Robert E. Schapire, Aleksandrs Slivkins, Masrour Zoghi
The first of these algorithms achieves particularly low regret, even when data is adversarial, although its time and space requirements are linear in the size of the policy space.
no code implementations • 20 Feb 2015 • Haipeng Luo, Robert E. Schapire
We study the classic online learning problem of predicting with expert advice, and propose a truly parameter-free and adaptive algorithm that achieves several objectives simultaneously without using any prior information.
no code implementations • NeurIPS 2014 • Haipeng Luo, Robert E. Schapire
Different online learning settings (Hedge, multi-armed bandit problems, and online convex optimization) are studied by converting them into various kinds of drifting games.
1 code implementation • 4 Feb 2014 • Alekh Agarwal, Daniel Hsu, Satyen Kale, John Langford, Lihong Li, Robert E. Schapire
We present a new algorithm for the contextual bandit learning problem, where the learner repeatedly takes one of $K$ actions in response to the observed context, and observes the reward only for that chosen action.
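The core difficulty is that only the chosen action's reward is observed. A standard device for this, shown below as a sketch (the estimator alone, not the paper's algorithm, which combines such estimates with an optimization oracle; the `logged` format and `target_policy` interface are assumptions):

```python
def ips_estimate(logged, target_policy):
    """Inverse-propensity-score estimate of a target policy's value from bandit logs.
    `logged` is a list of (context, action, reward, prob) tuples, where `prob` is
    the probability the logging policy assigned to the chosen action;
    `target_policy(context)` returns a length-K probability vector over actions."""
    total = 0.0
    for x, a, r, p in logged:
        total += target_policy(x)[a] * r / p    # reweight the observed reward
    return total / len(logged)
```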
no code implementations • 31 Jul 2013 • Haipeng Luo, Robert E. Schapire
We apply a minimax analysis, beginning with the fixed-horizon case, and then moving on to two unknown-horizon settings: one that assumes the horizon is chosen randomly according to some known distribution, and one that allows the adversary full control over the horizon.
no code implementations • NeurIPS 2010 • Umar Syed, Robert E. Schapire
We provide new theoretical results for apprenticeship learning, a variant of reinforcement learning in which the true reward function is unknown, and the goal is to perform well relative to an observed expert.
no code implementations • NeurIPS 2010 • Satyen Kale, Lev Reyzin, Robert E. Schapire
We consider bandit problems, motivated by applications in online advertising and news story selection, in which the learner must repeatedly select a slate, that is, a subset of size $s$ from $K$ possible actions, and then receives rewards for just the selected actions.
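To fix ideas, here is an illustrative epsilon-greedy slate baseline (a naive stand-in, not the paper's bandit slate algorithms, which give no-regret guarantees against adversarial rewards):

```python
import numpy as np

def greedy_slate(estimates, s, eps=0.1, rng=None):
    """Pick a slate of s actions out of K = len(estimates): with probability
    eps choose a uniformly random slate for exploration, otherwise take the
    s actions with the highest estimated rewards."""
    rng = rng or np.random.default_rng(0)
    K = len(estimates)
    if rng.random() < eps:
        return rng.choice(K, size=s, replace=False)   # explore
    return np.argsort(estimates)[-s:]                 # exploit: top-s estimates
```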
11 code implementations • 28 Feb 2010 • Lihong Li, Wei Chu, John Langford, Robert E. Schapire
In this work, we model personalized recommendation of news articles as a contextual bandit problem, a principled approach in which a learning algorithm sequentially selects articles to serve users based on contextual information about the users and articles, while simultaneously adapting its article-selection strategy based on user-click feedback to maximize total user clicks.
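A minimal sketch in the spirit of the disjoint LinUCB algorithm the paper proposes: each article keeps a ridge-regression estimate of its click-through rate as a function of user/article features, plus an upper-confidence bonus. Variable names and defaults here are mine.

```python
import numpy as np

class LinUCBArm:
    """Per-article model: ridge regression plus an upper-confidence bonus."""
    def __init__(self, d, alpha=1.0):
        self.A = np.eye(d)        # d x d design matrix, ridge-regularized
        self.b = np.zeros(d)      # accumulated reward-weighted features
        self.alpha = alpha        # width of the confidence bonus

    def ucb(self, x):
        A_inv = np.linalg.inv(self.A)
        theta = A_inv @ self.b                          # ridge-regression weights
        return theta @ x + self.alpha * np.sqrt(x @ A_inv @ x)

    def update(self, x, reward):
        self.A += np.outer(x, x)
        self.b += reward * x

def choose_article(arms, x):
    """Serve the article whose upper confidence bound on expected clicks is highest."""
    return max(range(len(arms)), key=lambda a: arms[a].ucb(x))
```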
1 code implementation • 14 Dec 2007 • Tamara Broderick, Miroslav Dudik, Gasper Tkacik, Robert E. Schapire, William Bialek
Recent work has shown that probabilistic models based on pairwise interactions (in the simplest case, the Ising model) provide surprisingly accurate descriptions of experiments on real biological networks ranging from neurons to genes.
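For concreteness, the pairwise model in question assigns each binary configuration a Boltzmann weight. A brute-force sketch for small systems (standard textbook construction, not the paper's large-scale fitting procedure):

```python
import itertools
import numpy as np

def ising_distribution(h, J):
    """Exact distribution of a small pairwise (Ising) model with fields h and
    symmetric, zero-diagonal couplings J:
        P(s) proportional to exp( sum_i h_i s_i + sum_{i<j} J_ij s_i s_j ),
    with s_i in {-1, +1}. Enumerates all 2^n states, so only feasible for small n."""
    n = len(h)
    states = np.array(list(itertools.product([-1, 1], repeat=n)), dtype=float)
    # 0.5 * s^T J s equals the sum over i < j for symmetric zero-diagonal J.
    log_w = states @ h + 0.5 * np.einsum('ki,ij,kj->k', states, J, states)
    w = np.exp(log_w - log_w.max())          # subtract the max for numerical stability
    return states, w / w.sum()
```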