Search Results for author: Robert E. Schapire

Found 26 papers, 5 papers with code

Provable Interactive Learning with Hindsight Instruction Feedback

no code implementations14 Apr 2024 Dipendra Misra, Aldo Pacchiano, Robert E. Schapire

We study interactive learning in a setting where the agent has to generate a response (e.g., an action or trajectory) given a context and an instruction.

Provably Sample-Efficient RL with Side Information about Latent Dynamics

no code implementations27 May 2022 Yao Liu, Dipendra Misra, Miro Dudík, Robert E. Schapire

We study reinforcement learning (RL) in settings where observations are high-dimensional, but where an RL agent has access to abstract knowledge about the structure of the state space, as is the case, for example, when a robot is tasked to go to a specific room in a building using observations from its own camera, while having access to the floor plan.

Reinforcement Learning (RL) +1

Convex Analysis at Infinity: An Introduction to Astral Space

no code implementations6 May 2022 Miroslav Dudík, Robert E. Schapire, Matus Telgarsky

Not all convex functions on $\mathbb{R}^n$ have finite minimizers; some can only be minimized by a sequence as it heads to infinity.

Multiclass Boosting and the Cost of Weak Learning

no code implementations NeurIPS 2021 Nataly Brukhim, Elad Hazan, Shay Moran, Indraneel Mukherjee, Robert E. Schapire

Here, we focus on an especially natural formulation in which the weak hypotheses are assumed to belong to an "easy-to-learn" base class, and the weak learner is an agnostic PAC learner for that class with respect to the standard classification loss.
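For context, the classical binary precursor of this setting is AdaBoost, which repeatedly reweights examples so a weak learner focuses on past mistakes and then combines the weak hypotheses by weighted vote. The sketch below is illustrative only (the paper's multiclass, agnostic setting is more general); `stump_learner` plays the role of the "easy-to-learn" base class on 1-d data, and all names are hypothetical.

```python
import math

def adaboost(weak_learn, X, y, rounds):
    """Binary AdaBoost. Labels in {-1, +1}; weak_learn(X, y, weights)
    returns a hypothesis h with h(x) in {-1, +1}."""
    n = len(X)
    w = [1.0 / n] * n
    ensemble = []
    for _ in range(rounds):
        h = weak_learn(X, y, w)
        err = sum(wi for wi, xi, yi in zip(w, X, y) if h(xi) != yi)
        err = min(max(err, 1e-10), 1 - 1e-10)  # guard the log below
        alpha = 0.5 * math.log((1 - err) / err)
        ensemble.append((alpha, h))
        # Upweight mistakes, downweight correct predictions, renormalize.
        w = [wi * math.exp(-alpha * yi * h(xi)) for wi, xi, yi in zip(w, X, y)]
        z = sum(w)
        w = [wi / z for wi in w]
    return lambda x: 1 if sum(a * h(x) for a, h in ensemble) >= 0 else -1

def stump_learner(X, y, w):
    """Weighted-error-minimizing decision stump on 1-d inputs."""
    best = None
    for t in set(X):
        for s in (1, -1):
            err = sum(wi for wi, xi, yi in zip(w, X, y)
                      if (s if xi <= t else -s) != yi)
            if best is None or err < best[0]:
                best = (err, t, s)
    _, t, s = best
    return lambda x, t=t, s=s: s if x <= t else -s

# No single stump fits this interval pattern, but three boosted stumps do.
clf = adaboost(stump_learner, [0, 1, 2, 3, 4, 5], [1, 1, -1, -1, 1, 1], rounds=3)
```

Each boosting round here consumes the weak learner as a black box, which is the sense in which the cost of weak learning can be studied separately from the aggregation scheme.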

Bayesian decision-making under misspecified priors with applications to meta-learning

no code implementations NeurIPS 2021 Max Simchowitz, Christopher Tosh, Akshay Krishnamurthy, Daniel Hsu, Thodoris Lykouris, Miroslav Dudík, Robert E. Schapire

We prove that the expected reward accrued by Thompson sampling (TS) with a misspecified prior differs by at most $\tilde{\mathcal{O}}(H^2 \epsilon)$ from TS with a well specified prior, where $\epsilon$ is the total-variation distance between priors and $H$ is the learning horizon.

Decision Making, Meta-Learning +1
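To make the object of study concrete: in the simplest Beta-Bernoulli case, Thompson sampling draws a mean estimate for each arm from its posterior and plays the argmax, so a misspecified prior only distorts early exploration before the posterior concentrates. The sketch below is a minimal illustration of that mechanism, not the paper's analysis; the function name and the toy bandit are assumptions.

```python
import random

def thompson_sampling(true_means, prior_alpha, prior_beta, horizon, rng):
    """Thompson sampling on a Bernoulli bandit with a Beta prior per arm."""
    k = len(true_means)
    alpha = list(prior_alpha)
    beta = list(prior_beta)
    total_reward = 0
    for _ in range(horizon):
        # Sample a mean estimate per arm from its posterior, play the argmax.
        samples = [rng.betavariate(alpha[i], beta[i]) for i in range(k)]
        arm = max(range(k), key=lambda i: samples[i])
        reward = 1 if rng.random() < true_means[arm] else 0
        total_reward += reward
        # Conjugate Beta-Bernoulli posterior update.
        alpha[arm] += reward
        beta[arm] += 1 - reward
    return total_reward

rng = random.Random(0)
# A prior that wrongly favors the worse arm 0; data still washes it out.
misspecified = thompson_sampling([0.3, 0.7], [8, 1], [1, 1], 2000, rng)
```

The paper's $\tilde{\mathcal{O}}(H^2 \epsilon)$ bound quantifies this robustness for general priors in terms of the total-variation gap $\epsilon$ and horizon $H$.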

Gradient descent follows the regularization path for general losses

no code implementations19 Jun 2020 Ziwei Ji, Miroslav Dudík, Robert E. Schapire, Matus Telgarsky

Recent work across many machine learning disciplines has highlighted that standard descent methods, even without explicit regularization, do not merely minimize the training error, but also exhibit an implicit bias.

Practical Contextual Bandits with Regression Oracles

no code implementations ICML 2018 Dylan J. Foster, Alekh Agarwal, Miroslav Dudík, Haipeng Luo, Robert E. Schapire

A major challenge in contextual bandits is to design general-purpose algorithms that are both practically useful and theoretically well-founded.

General Classification, Multi-Armed Bandits +1

Corralling a Band of Bandit Algorithms

1 code implementation19 Dec 2016 Alekh Agarwal, Haipeng Luo, Behnam Neyshabur, Robert E. Schapire

We study the problem of combining multiple bandit algorithms (that is, online learning algorithms with partial feedback) with the goal of creating a master algorithm that performs almost as well as the best base algorithm if it were to be run on its own.

Multi-Armed Bandits

Oracle-Efficient Online Learning and Auction Design

no code implementations5 Nov 2016 Miroslav Dudík, Nika Haghtalab, Haipeng Luo, Robert E. Schapire, Vasilis Syrgkanis, Jennifer Wortman Vaughan

We consider the design of computationally efficient online learning algorithms in an adversarial setting in which the learner has access to an offline optimization oracle.

Contextual Decision Processes with Low Bellman Rank are PAC-Learnable

no code implementations ICML 2017 Nan Jiang, Akshay Krishnamurthy, Alekh Agarwal, John Langford, Robert E. Schapire

Our first contribution is a complexity measure, the Bellman rank, that we show enables tractable learning of near-optimal behavior in these processes and is naturally small for many well-studied reinforcement learning settings.

Efficient Exploration, Reinforcement Learning +2

Exploratory Gradient Boosting for Reinforcement Learning in Complex Domains

1 code implementation14 Mar 2016 David Abel, Alekh Agarwal, Fernando Diaz, Akshay Krishnamurthy, Robert E. Schapire

We address both of these challenges with two complementary techniques: First, we develop a gradient-boosting style, non-parametric function approximator for learning on $Q$-function residuals.

Reinforcement Learning +1

Unsupervised Domain Adaptation Using Approximate Label Matching

no code implementations16 Feb 2016 Jordan T. Ash, Robert E. Schapire, Barbara E. Engelhardt

Domain adaptation addresses the problem created when training data is generated by a so-called source distribution, but test data is generated by a significantly different target distribution.

Unsupervised Domain Adaptation

Efficient Algorithms for Adversarial Contextual Learning

no code implementations8 Feb 2016 Vasilis Syrgkanis, Akshay Krishnamurthy, Robert E. Schapire

We provide the first oracle efficient sublinear regret algorithms for adversarial versions of the contextual bandit problem.

Combinatorial Optimization

Fast Convergence of Regularized Learning in Games

no code implementations NeurIPS 2015 Vasilis Syrgkanis, Alekh Agarwal, Haipeng Luo, Robert E. Schapire

We show that natural classes of regularized learning algorithms with a form of recency bias achieve faster convergence rates to approximate efficiency and to coarse correlated equilibria in multiplayer normal form games.
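One standard instance of such recency bias is "optimistic" Hedge, which forms its weights from the cumulative loss plus a repeat of the most recent loss vector, effectively predicting that losses change slowly between rounds. The sketch below illustrates only that update rule, under that assumption; it is not the paper's full family of algorithms or its convergence analysis.

```python
import math

def optimistic_hedge(loss_rounds, eta):
    """Hedge with recency bias: the latest loss vector is counted twice
    when forming the weights over experts."""
    n = len(loss_rounds[0])
    cum = [0.0] * n
    last = [0.0] * n
    plays = []
    for losses in loss_rounds:
        # Optimistic score: cumulative loss plus a repeat of the last loss.
        scores = [cum[i] + last[i] for i in range(n)]
        m = min(scores)  # shift for numerical stability
        w = [math.exp(-eta * (s - m)) for s in scores]
        z = sum(w)
        plays.append([wi / z for wi in w])
        for i in range(n):
            cum[i] += losses[i]
        last = list(losses)
    return plays

# Two experts; expert 1 is consistently better, so its weight grows each round.
dist = optimistic_hedge([[1.0, 0.0]] * 5, eta=0.5)
```

When all players in a normal-form game run such updates, the predictable loss sequences reinforce each other, which is the intuition behind the faster convergence rates.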

Efficient and Parsimonious Agnostic Active Learning

no code implementations NeurIPS 2015 Tzu-Kuo Huang, Alekh Agarwal, Daniel J. Hsu, John Langford, Robert E. Schapire

We develop a new active learning algorithm for the streaming setting satisfying three important properties: 1) It provably works for any classifier representation and classification problem including those with severe noise.

Active Learning, General Classification

Contextual Dueling Bandits

no code implementations23 Feb 2015 Miroslav Dudík, Katja Hofmann, Robert E. Schapire, Aleksandrs Slivkins, Masrour Zoghi

The first of these algorithms achieves particularly low regret, even when data is adversarial, although its time and space requirements are linear in the size of the policy space.

Achieving All with No Parameters: Adaptive NormalHedge

no code implementations20 Feb 2015 Haipeng Luo, Robert E. Schapire

We study the classic online learning problem of predicting with expert advice, and propose a truly parameter-free and adaptive algorithm that achieves several objectives simultaneously without using any prior information.

A Drifting-Games Analysis for Online Learning and Applications to Boosting

no code implementations NeurIPS 2014 Haipeng Luo, Robert E. Schapire

Different online learning settings (Hedge, multi-armed bandit problems and online convex optimization) are studied by converting into various kinds of drifting games.

Taming the Monster: A Fast and Simple Algorithm for Contextual Bandits

1 code implementation4 Feb 2014 Alekh Agarwal, Daniel Hsu, Satyen Kale, John Langford, Lihong Li, Robert E. Schapire

We present a new algorithm for the contextual bandit learning problem, where the learner repeatedly takes one of $K$ actions in response to the observed context, and observes the reward only for that chosen action.

General Classification, Multi-Armed Bandits
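The interaction protocol described above can be sketched with a simple baseline: explore uniformly with small probability, otherwise follow the empirically best policy, and keep unbiased inverse-propensity-scored (IPS) value estimates for a finite policy class. This is a minimal illustration of the protocol and the IPS estimator only; the paper's actual algorithm is oracle-based and far more sample-efficient, and all names and the toy policies here are assumptions.

```python
import random

def epsilon_greedy_contextual(policies, num_actions, contexts, reward_fn,
                              epsilon, rng):
    """Contextual bandit loop over a finite policy class with IPS estimates."""
    ips = [0.0] * len(policies)  # unbiased value estimate per policy
    total = 0.0
    for x in contexts:
        best = max(range(len(policies)), key=lambda i: ips[i])
        greedy_action = policies[best](x)
        action = rng.randrange(num_actions) if rng.random() < epsilon \
            else greedy_action
        # Probability with which `action` was chosen, for the IPS correction.
        prob = epsilon / num_actions + (1 - epsilon) * (action == greedy_action)
        reward = reward_fn(x, action)  # reward observed only for this action
        total += reward
        for i, pi in enumerate(policies):
            if pi(x) == action:
                ips[i] += reward / prob
    return total

rng = random.Random(1)
policies = [lambda x: 0, lambda x: x % 2]  # two toy policies over 2 actions
contexts = list(range(500))
# The action matching the context's parity pays 1, otherwise 0.
reward_fn = lambda x, a: 1.0 if a == x % 2 else 0.0
payoff = epsilon_greedy_contextual(policies, 2, contexts, reward_fn, 0.1, rng)
```

Because the learner only sees the reward of the chosen action, the propensity correction is what keeps every policy's estimate unbiased despite partial feedback.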

Towards Minimax Online Learning with Unknown Time Horizon

no code implementations31 Jul 2013 Haipeng Luo, Robert E. Schapire

We apply a minimax analysis, beginning with the fixed horizon case, and then moving on to two unknown-horizon settings, one that assumes the horizon is chosen randomly according to some known distribution, and the other which allows the adversary full control over the horizon.

A Reduction from Apprenticeship Learning to Classification

no code implementations NeurIPS 2010 Umar Syed, Robert E. Schapire

We provide new theoretical results for apprenticeship learning, a variant of reinforcement learning in which the true reward function is unknown, and the goal is to perform well relative to an observed expert.

Classification, General Classification +1

Non-Stochastic Bandit Slate Problems

no code implementations NeurIPS 2010 Satyen Kale, Lev Reyzin, Robert E. Schapire

We consider bandit problems, motivated by applications in online advertising and news story selection, in which the learner must repeatedly select a slate, that is, a subset of size s from K possible actions, and then receives rewards for just the selected actions.

A Contextual-Bandit Approach to Personalized News Article Recommendation

11 code implementations28 Feb 2010 Lihong Li, Wei Chu, John Langford, Robert E. Schapire

In this work, we model personalized recommendation of news articles as a contextual bandit problem, a principled approach in which a learning algorithm sequentially selects articles to serve users based on contextual information about the users and articles, while simultaneously adapting its article-selection strategy based on user-click feedback to maximize total user clicks.

Collaborative Filtering, Learning Theory
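This is the paper that introduced LinUCB: each article (arm) gets a per-arm ridge-regression estimate of expected click-through from context features, plus an upper-confidence bonus that shrinks as that arm accumulates data. The sketch below shows the disjoint variant of that idea under simplifying assumptions; the function names and the toy simulation are illustrative, not the paper's evaluation setup.

```python
import numpy as np

def linucb_choose(A, b, features, alpha):
    """Disjoint LinUCB arm selection. A[a] is arm a's d x d design matrix,
    b[a] its response vector, features[a] its d-dimensional context vector."""
    scores = []
    for a in range(len(A)):
        A_inv = np.linalg.inv(A[a])
        theta = A_inv @ b[a]                     # ridge estimate for arm a
        x = features[a]
        bonus = alpha * np.sqrt(x @ A_inv @ x)   # confidence width
        scores.append(theta @ x + bonus)
    return int(np.argmax(scores))

def linucb_update(A, b, arm, x, reward):
    """Rank-one update after observing `reward` for the chosen arm."""
    A[arm] += np.outer(x, x)
    b[arm] += reward * x

# Toy simulation: arm 1's feature direction pays 1.0, arm 0's pays 0.2.
d, k = 2, 2
A = [np.eye(d) for _ in range(k)]
b = [np.zeros(d) for _ in range(k)]
feats = [np.array([1.0, 0.0]), np.array([0.0, 1.0])]
picks = []
for _ in range(50):
    arm = linucb_choose(A, b, feats, alpha=0.5)
    picks.append(arm)
    linucb_update(A, b, arm, feats[arm], 1.0 if arm == 1 else 0.2)
```

The confidence bonus is what drives exploration here: an under-sampled article keeps a wide interval, so it can win the argmax even with a mediocre point estimate.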

Faster solutions of the inverse pairwise Ising problem

1 code implementation14 Dec 2007 Tamara Broderick, Miroslav Dudík, Gašper Tkačik, Robert E. Schapire, William Bialek

Recent work has shown that probabilistic models based on pairwise interactions-in the simplest case, the Ising model-provide surprisingly accurate descriptions of experiments on real biological networks ranging from neurons to genes.
