Search Results for author: John Langford

Found 62 papers, 17 papers with code

Provable RL with Exogenous Distractors via Multistep Inverse Dynamics

no code implementations17 Oct 2021 Yonathan Efroni, Dipendra Misra, Akshay Krishnamurthy, Alekh Agarwal, John Langford

We initiate the formal study of latent state discovery in the presence of such exogenous noise sources by proposing a new model, the Exogenous Block MDP (EX-BMDP), for rich observation RL.

Representation Learning

ChaCha for Online AutoML

1 code implementation9 Jun 2021 Qingyun Wu, Chi Wang, John Langford, Paul Mineiro, Marco Rossi

We propose the ChaCha (Champion-Challengers) algorithm for making an online choice of hyperparameters in online learning settings.


Interaction-Grounded Learning

no code implementations9 Jun 2021 Tengyang Xie, John Langford, Paul Mineiro, Ida Momennejad

We propose Interaction-Grounded Learning for this novel setting, in which a learner's goal is to interact with the environment with no grounding or explicit reward to optimize its policies.

Provable Rich Observation Reinforcement Learning with Combinatorial Latent States

no code implementations ICLR 2021 Dipendra Misra, Qinghua Liu, Chi Jin, John Langford

We propose a novel setting for reinforcement learning that combines two common real-world difficulties: presence of observations (such as camera images) and factored states (such as location of objects).

Contrastive Learning

Learning the Linear Quadratic Regulator from Nonlinear Observations

no code implementations NeurIPS 2020 Zakaria Mhammedi, Dylan J. Foster, Max Simchowitz, Dipendra Misra, Wen Sun, Akshay Krishnamurthy, Alexander Rakhlin, John Langford

We introduce a new algorithm, RichID, which learns a near-optimal policy for the RichLQR with sample complexity scaling only with the dimension of the latent state space and the capacity of the decoder function class.

Continuous Control

Better Parameter-free Stochastic Optimization with ODE Updates for Coin-Betting

no code implementations12 Jun 2020 Keyi Chen, John Langford, Francesco Orabona

Parameter-free stochastic gradient descent (PFSGD) algorithms do not require setting learning rates while achieving optimal theoretical performance.

Stochastic Optimization

Federated Residual Learning

no code implementations28 Mar 2020 Alekh Agarwal, John Langford, Chen-Yu Wei

We study a new form of federated learning where the clients train personalized local models and make predictions jointly with the server-side shared model.

Federated Learning

Kinematic State Abstraction and Provably Efficient Rich-Observation Reinforcement Learning

no code implementations ICML 2020 Dipendra Misra, Mikael Henaff, Akshay Krishnamurthy, John Langford

We present an algorithm, HOMER, for exploration and reinforcement learning in rich observation environments that are summarizable by an unknown latent state space.

Representation Learning

Empirical Likelihood for Contextual Bandits

1 code implementation NeurIPS 2020 Nikos Karampatziakis, John Langford, Paul Mineiro

We propose an estimator and confidence interval for computing the value of a policy from off-policy data in the contextual bandit setting.

Multi-Armed Bandits

Efficient Forward Architecture Search

2 code implementations NeurIPS 2019 Hanzhang Hu, John Langford, Rich Caruana, Saurajit Mukherjee, Eric Horvitz, Debadeepta Dey

We propose a neural architecture search (NAS) algorithm, Petridish, to iteratively add shortcut connections to existing network layers.

Feature Selection Neural Architecture Search

Provably efficient RL with Rich Observations via Latent State Decoding

1 code implementation25 Jan 2019 Simon S. Du, Akshay Krishnamurthy, Nan Jiang, Alekh Agarwal, Miroslav Dudík, John Langford

We study the exploration problem in episodic MDPs with rich observations generated from a small number of latent states.


Warm-starting Contextual Bandits: Robustly Combining Supervised and Bandit Feedback

1 code implementation2 Jan 2019 Chicheng Zhang, Alekh Agarwal, Hal Daumé III, John Langford, Sahand N. Negahban

We investigate the feasibility of learning from a mix of both fully-labeled supervised data and contextual bandit data.

Multi-Armed Bandits

Model-based RL in Contextual Decision Processes: PAC bounds and Exponential Improvements over Model-free Approaches

no code implementations21 Nov 2018 Wen Sun, Nan Jiang, Akshay Krishnamurthy, Alekh Agarwal, John Langford

We study the sample complexity of model-based reinforcement learning (henceforth RL) in general contextual decision processes that require strategic exploration to find a near-optimal policy.

Model-based Reinforcement Learning

Contextual Memory Trees

no code implementations17 Jul 2018 Wen Sun, Alina Beygelzimer, Hal Daumé III, John Langford, Paul Mineiro

We design and study a Contextual Memory Tree (CMT), a learning memory controller that inserts new memories into an experience store of unbounded size.

Classification General Classification +2

A Contextual Bandit Bake-off

1 code implementation12 Feb 2018 Alberto Bietti, Alekh Agarwal, John Langford

Contextual bandit algorithms are essential for solving many real-world interactive machine learning problems.

Residual Loss Prediction: Reinforcement Learning With No Incremental Feedback

1 code implementation ICLR 2018 Hal Daumé III, John Langford, Amr Sharaf

We consider reinforcement learning and bandit structured prediction problems with very sparse loss feedback: only at the end of an episode.

Multi-Armed Bandits Structured Prediction

Efficient Contextual Bandits in Non-stationary Worlds

no code implementations5 Aug 2017 Haipeng Luo, Chen-Yu Wei, Alekh Agarwal, John Langford

In this work, we develop several efficient contextual bandit algorithms for non-stationary environments by equipping existing methods for i. i. d.

Multi-Armed Bandits

Learning Deep ResNet Blocks Sequentially using Boosting Theory

no code implementations ICML 2018 Furong Huang, Jordan Ash, John Langford, Robert Schapire

We prove that the training error decays exponentially with the depth $T$ if the \emph{weak module classifiers} that we train perform slightly better than some weak baseline.

Mapping Instructions and Visual Observations to Actions with Reinforcement Learning

1 code implementation EMNLP 2017 Dipendra Misra, John Langford, Yoav Artzi

We propose to directly map raw visual observations and text input to actions for instruction execution.

Active Learning for Cost-Sensitive Classification

no code implementations ICML 2017 Akshay Krishnamurthy, Alekh Agarwal, Tzu-Kuo Huang, Hal Daume III, John Langford

We design an active learning algorithm for cost-sensitive multiclass classification: problems where different errors have different costs.

Active Learning Classification +1

Contextual Decision Processes with Low Bellman Rank are PAC-Learnable

no code implementations ICML 2017 Nan Jiang, Akshay Krishnamurthy, Alekh Agarwal, John Langford, Robert E. Schapire

Our first contribution is a complexity measure, the Bellman rank, that we show enables tractable learning of near-optimal behavior in these processes and is naturally small for many well-studied reinforcement learning settings.

Efficient Exploration

Logarithmic Time One-Against-Some

no code implementations ICML 2017 Hal Daume III, Nikos Karampatziakis, John Langford, Paul Mineiro

Compared to previous approaches, we obtain substantially better statistical performance for two reasons: First, we prove a tighter and more complete boosting theorem, and second we translate the results more directly into an algorithm.

Classification General Classification

Making Contextual Decisions with Low Technical Debt

no code implementations13 Jun 2016 Alekh Agarwal, Sarah Bird, Markus Cozowicz, Luong Hoang, John Langford, Stephen Lee, Jiaji Li, Dan Melamed, Gal Oshri, Oswaldo Ribas, Siddhartha Sen, Alex Slivkins

The Decision Service enables all aspects of contextual bandit learning using four system abstractions which connect together in a loop: explore (the decision space), log, learn, and deploy.

Multi-Armed Bandits

Off-policy evaluation for slate recommendation

1 code implementation NeurIPS 2017 Adith Swaminathan, Akshay Krishnamurthy, Alekh Agarwal, Miroslav Dudík, John Langford, Damien Jose, Imed Zitouni

This paper studies the evaluation of policies that recommend an ordered set of items (e. g., a ranking) based on some context---a common scenario in web search, ads, and recommendation.


Search Improves Label for Active Learning

no code implementations NeurIPS 2016 Alina Beygelzimer, Daniel Hsu, John Langford, Chicheng Zhang

We investigate active learning with access to two distinct oracles: Label (which is standard) and Search (which is not).

Active Learning

PAC Reinforcement Learning with Rich Observations

no code implementations NeurIPS 2016 Akshay Krishnamurthy, Alekh Agarwal, John Langford

We prove that the algorithm learns near optimal behavior after a number of episodes that is polynomial in all relevant parameters, logarithmic in the number of policies, and independent of the size of the observation space.

Decision Making Multi-Armed Bandits

Efficient Second Order Online Learning by Sketching

no code implementations NeurIPS 2016 Haipeng Luo, Alekh Agarwal, Nicolo Cesa-Bianchi, John Langford

We propose Sketched Online Newton (SON), an online second order learning algorithm that enjoys substantially improved regret guarantees for ill-conditioned data.

Efficient and Parsimonious Agnostic Active Learning

no code implementations NeurIPS 2015 Tzu-Kuo Huang, Alekh Agarwal, Daniel J. Hsu, John Langford, Robert E. Schapire

We develop a new active learning algorithm for the streaming setting satisfying three important properties: 1) It provably works for any classifier representation and classification problem including those with severe noise.

Active Learning General Classification

Learning to Search for Dependencies

no code implementations18 Mar 2015 Kai-Wei Chang, He He, Hal Daumé III, John Langford

We demonstrate that a dependency parser can be built using a credit assignment compiler which removes the burden of worrying about low-level machine learning details from the parser implementation.

Doubly Robust Policy Evaluation and Optimization

no code implementations10 Mar 2015 Miroslav Dudík, Dumitru Erhan, John Langford, Lihong Li

As such, we expect the doubly robust approach to become common practice in policy evaluation and optimization.

Decision Making Multi-Armed Bandits

Learning Reductions that Really Work

no code implementations9 Feb 2015 Alina Beygelzimer, Hal Daumé III, John Langford, Paul Mineiro

We provide a summary of the mathematical and computational techniques that have enabled learning reductions to effectively address a wide class of problems, and show that this approach to solving machine learning problems can be broadly useful.

Learning to Search Better Than Your Teacher

no code implementations8 Feb 2015 Kai-Wei Chang, Akshay Krishnamurthy, Alekh Agarwal, Hal Daumé III, John Langford

Methods for learning to search for structured prediction typically imitate a reference policy, with existing theoretical guarantees demonstrating low regret compared to that reference.

Multi-Armed Bandits Structured Prediction

Scalable Nonlinear Learning with Adaptive Polynomial Expansions

no code implementations2 Oct 2014 Alekh Agarwal, Alina Beygelzimer, Daniel Hsu, John Langford, Matus Telgarsky

Can we effectively learn a nonlinear representation in time comparable to linear learning?

Conditional Probability Tree Estimation Analysis and Algorithms

no code implementations9 Aug 2014 Alina Beygelzimer, John Langford, Yuri Lifshits, Gregory Sorkin, Alexander L. Strehl

We consider the problem of estimating the conditional probability of a label in time O(log n), where n is the number of possible labels.

Normalized Online Learning

no code implementations9 Aug 2014 Stephane Ross, Paul Mineiro, John Langford

We introduce online learning algorithms which are independent of feature scales, proving regret bounds dependent on the ratio of scales existent in the data rather than the absolute scale.

A Credit Assignment Compiler for Joint Prediction

no code implementations NeurIPS 2016 Kai-Wei Chang, He He, Hal Daumé III, John Langford, Stephane Ross

Many machine learning applications involve jointly predicting multiple mutually dependent output variables.

Logarithmic Time Online Multiclass prediction

no code implementations NeurIPS 2015 Anna Choromanska, John Langford

We develop top-down tree construction approaches for constructing logarithmic depth trees.

Resourceful Contextual Bandits

no code implementations27 Feb 2014 Ashwinkumar Badanidiyuru, John Langford, Aleksandrs Slivkins

We study contextual bandits with ancillary constraints on resources, which are common in real-world applications such as choosing ads or dynamic pricing of items.

Multi-Armed Bandits

Taming the Monster: A Fast and Simple Algorithm for Contextual Bandits

1 code implementation4 Feb 2014 Alekh Agarwal, Daniel Hsu, Satyen Kale, John Langford, Lihong Li, Robert E. Schapire

We present a new algorithm for the contextual bandit learning problem, where the learner repeatedly takes one of $K$ actions in response to the observed context, and observes the reward only for that chosen action.

General Classification Multi-Armed Bandits

Efficient Online Bootstrapping for Large Scale Learning

no code implementations18 Dec 2013 Zhen Qin, Vaclav Petricek, Nikos Karampatziakis, Lihong Li, John Langford

Bootstrapping is a useful technique for estimating the uncertainty of a predictor, for example, confidence intervals for prediction.

Para-active learning

no code implementations30 Oct 2013 Alekh Agarwal, Leon Bottou, Miroslav Dudik, John Langford

We leverage the same observation to build a generic strategy for parallelizing learning algorithms.

Active Learning

Normalized Online Learning

1 code implementation28 May 2013 Stephane Ross, Paul Mineiro, John Langford

We introduce online learning algorithms which are independent of feature scales, proving regret bounds dependent on the ratio of scales existent in the data rather than the absolute scale.

A Reliable Effective Terascale Linear Learning System

2 code implementations19 Oct 2011 Alekh Agarwal, Olivier Chapelle, Miroslav Dudik, John Langford

We present a system and a set of techniques for learning linear predictors with convex losses on terascale datasets, with trillions of features, {The number of features here refers to the number of non-zero entries in the data matrix.}

Doubly Robust Policy Evaluation and Learning

1 code implementation23 Mar 2011 Miroslav Dudik, John Langford, Lihong Li

The key challenge is that the past data typically does not faithfully represent proportions of actions taken by a new policy.

Decision Making Multi-Armed Bandits

Unbiased Offline Evaluation of Contextual-bandit-based News Article Recommendation Algorithms

4 code implementations31 Mar 2010 Lihong Li, Wei Chu, John Langford, Xuanhui Wang

\emph{Offline} evaluation of the effectiveness of new algorithms in these applications is critical for protecting online user experiences but very challenging due to their "partial-label" nature.

News Recommendation Recommendation Systems

A Contextual-Bandit Approach to Personalized News Article Recommendation

9 code implementations28 Feb 2010 Lihong Li, Wei Chu, John Langford, Robert E. Schapire

In this work, we model personalized recommendation of news articles as a contextual bandit problem, a principled approach in which a learning algorithm sequentially selects articles to serve users based on contextual information about the users and articles, while simultaneously adapting its article-selection strategy based on user-click feedback to maximize total user clicks.

Collaborative Filtering Learning Theory

Learning from Logged Implicit Exploration Data

no code implementations NeurIPS 2010 Alex Strehl, John Langford, Sham Kakade, Lihong Li

We provide a sound and consistent foundation for the use of \emph{nonrandom} exploration data in "contextual bandit" or "partially labeled" settings where only the value of a chosen action is learned.

Slow Learners are Fast

no code implementations NeurIPS 2009 Martin Zinkevich, John Langford, Alex J. Smola

Online learning algorithms have impressive convergence properties when it comes to risk minimization and convex games on very large problems.

Search-based Structured Prediction

no code implementations4 Jul 2009 Hal Daumé III, John Langford, Daniel Marcu

We present Searn, an algorithm for integrating search and learning to solve complex structured prediction problems such as those that occur in natural language, speech, computational biology, and vision.

Classification General Classification +1

Sparse Online Learning via Truncated Gradient

no code implementations NeurIPS 2008 John Langford, Lihong Li, Tong Zhang

We propose a general method called truncated gradient to induce sparsity in the weights of online-learning algorithms with convex loss.

Cannot find the paper you are looking for? You can Submit a new open access paper.