Search Results for author: John Langford

Found 77 papers, 25 papers with code

Towards Principled Representation Learning from Videos for Reinforcement Learning

no code implementations • 20 Mar 2024 • Dipendra Misra, Akanksha Saran, Tengyang Xie, Alex Lamb, John Langford

We study two types of settings: one where there is i.i.d. noise in the observation, and a more challenging setting where exogenous noise is also present, that is, non-i.i.d. noise that is temporally correlated, such as the motion of people or cars in the background.

Contrastive Learning reinforcement-learning +1

Position Paper: Agent AI Towards a Holistic Intelligence

no code implementations • 28 Feb 2024 • Qiuyuan Huang, Naoki Wake, Bidipta Sarkar, Zane Durante, Ran Gong, Rohan Taori, Yusuke Noda, Demetri Terzopoulos, Noboru Kuno, Ade Famoti, Ashley Llorens, John Langford, Hoi Vo, Li Fei-Fei, Katsu Ikeuchi, Jianfeng Gao

Recent advancements in large foundation models have remarkably enhanced our understanding of sensory information in open-world environments.

Position

Premier-TACO is a Few-Shot Policy Learner: Pretraining Multitask Representation via Temporal Action-Driven Contrastive Loss

1 code implementation • 9 Feb 2024 • Ruijie Zheng, Yongyuan Liang, Xiyao Wang, Shuang Ma, Hal Daumé III, Huazhe Xu, John Langford, Praveen Palanisamy, Kalyan Shankar Basu, Furong Huang

We present Premier-TACO, a multitask feature representation learning approach designed to improve few-shot policy learning efficiency in sequential decision-making tasks.

Computational Efficiency Continuous Control +4

PcLast: Discovering Plannable Continuous Latent States

no code implementations • 6 Nov 2023 • Anurag Koul, Shivakanth Sujit, Shaoru Chen, Ben Evans, Lili Wu, Byron Xu, Rajan Chari, Riashat Islam, Raihan Seraj, Yonathan Efroni, Lekan Molu, Miro Dudik, John Langford, Alex Lamb

Goal-conditioned planning benefits from learned low-dimensional representations of rich, high-dimensional observations.

Streaming Active Learning with Deep Neural Networks

2 code implementations • 5 Mar 2023 • Akanksha Saran, Safoora Yousefi, Akshay Krishnamurthy, John Langford, Jordan T. Ash

Active learning is perhaps most naturally posed as an online learning problem.

Active Learning

Towards Data-Driven Offline Simulations for Online Reinforcement Learning

1 code implementation • 14 Nov 2022 • Shengpu Tang, Felipe Vieira Frujeri, Dipendra Misra, Alex Lamb, John Langford, Paul Mineiro, Sebastian Kochman

Modern decision-making systems, from robots to web recommendation engines, are expected to adapt: to user preferences, changing circumstances or even new tasks.

Decision Making reinforcement-learning +1

Agent-Controller Representations: Principled Offline RL with Rich Exogenous Information

1 code implementation • 31 Oct 2022 • Riashat Islam, Manan Tomar, Alex Lamb, Yonathan Efroni, Hongyu Zang, Aniket Didolkar, Dipendra Misra, Xin Li, Harm van Seijen, Remi Tachet des Combes, John Langford

We find that contemporary representation learning techniques can fail on datasets where the noise is a complex and time dependent process, which is prevalent in practical applications.

Offline RL Reinforcement Learning (RL) +1

Eigen Memory Trees

1 code implementation • 25 Oct 2022 • Mark Rucker, Jordan T. Ash, John Langford, Paul Mineiro, Ida Momennejad

This work introduces the Eigen Memory Tree (EMT), a novel online memory model for sequential learning scenarios.

Guaranteed Discovery of Control-Endogenous Latent States with Multi-Step Inverse Models

no code implementations • 17 Jul 2022 • Alex Lamb, Riashat Islam, Yonathan Efroni, Aniket Didolkar, Dipendra Misra, Dylan Foster, Lekan Molu, Rajan Chari, Akshay Krishnamurthy, John Langford

In many sequential decision-making tasks, the agent is not able to model the full complexity of the world, which consists of multitudes of relevant and irrelevant information.

Decision Making

Contextual Bandits with Large Action Spaces: Made Practical

1 code implementation • 12 Jul 2022 • Yinglun Zhu, Dylan J. Foster, John Langford, Paul Mineiro

Focusing on the contextual bandit problem, recent progress provides provably efficient algorithms with strong empirical performance when the number of possible alternatives ("actions") is small, but guarantees for decision making in large, continuous action spaces have remained elusive, leading to a significant gap between theory and practice.

Decision Making Multi-Armed Bandits

Interaction-Grounded Learning with Action-inclusive Feedback

no code implementations • 16 Jun 2022 • Tengyang Xie, Akanksha Saran, Dylan J. Foster, Lekan Molu, Ida Momennejad, Nan Jiang, Paul Mineiro, John Langford

Consider the problem setting of Interaction-Grounded Learning (IGL), in which a learner's goal is to optimally interact with the environment with no explicit reward to ground its policies.

Brain Computer Interface

Sample-Efficient Reinforcement Learning in the Presence of Exogenous Information

no code implementations • 9 Jun 2022 • Yonathan Efroni, Dylan J. Foster, Dipendra Misra, Akshay Krishnamurthy, John Langford

In real-world reinforcement learning applications the learner's observation space is ubiquitously high-dimensional with both relevant and irrelevant information about the task at hand.

reinforcement-learning Reinforcement Learning (RL)

Personalization Improves Privacy-Accuracy Tradeoffs in Federated Learning

1 code implementation • 10 Feb 2022 • Alberto Bietti, Chen-Yu Wei, Miroslav Dudík, John Langford, Zhiwei Steven Wu

Large-scale machine learning systems often involve data distributed across a collection of users.

Personalized Federated Learning Stochastic Optimization

Provable RL with Exogenous Distractors via Multistep Inverse Dynamics

no code implementations • 17 Oct 2021 • Yonathan Efroni, Dipendra Misra, Akshay Krishnamurthy, Alekh Agarwal, John Langford

We initiate the formal study of latent state discovery in the presence of such exogenous noise sources by proposing a new model, the Exogenous Block MDP (EX-BMDP), for rich observation RL.

Reinforcement Learning (RL) Representation Learning

Provably Filtering Exogenous Distractors using Multistep Inverse Dynamics

no code implementations • ICLR 2022 • Yonathan Efroni, Dipendra Misra, Akshay Krishnamurthy, Alekh Agarwal, John Langford

We initiate the formal study of latent state discovery in the presence of such exogenous noise sources by proposing a new model, the Exogenous Block MDP (EX-BMDP), for rich observation RL.

Reinforcement Learning (RL) Representation Learning

Interaction-Grounded Learning

no code implementations • 9 Jun 2021 • Tengyang Xie, John Langford, Paul Mineiro, Ida Momennejad

We propose Interaction-Grounded Learning for this novel setting, in which a learner's goal is to interact with the environment with no grounding or explicit reward to optimize its policies.

ChaCha for Online AutoML

1 code implementation • 9 Jun 2021 • Qingyun Wu, Chi Wang, John Langford, Paul Mineiro, Marco Rossi

We propose the ChaCha (Champion-Challengers) algorithm for making an online choice of hyperparameters in online learning settings.

AutoML Scheduling

Provable Rich Observation Reinforcement Learning with Combinatorial Latent States

no code implementations • ICLR 2021 • Dipendra Misra, Qinghua Liu, Chi Jin, John Langford

We propose a novel setting for reinforcement learning that combines two common real-world difficulties: presence of observations (such as camera images) and factored states (such as location of objects).

Contrastive Learning reinforcement-learning +1

Resonance: Replacing Software Constants with Context-Aware Models in Real-time Communication

no code implementations • 23 Nov 2020 • Jayant Gupchup, Ashkan Aazami, Yaran Fan, Senja Filipi, Tom Finley, Scott Inglis, Marcus Asteborg, Luke Caroll, Rajan Chari, Markus Cozowicz, Vishak Gopal, Vinod Prakash, Sasikanth Bendapudi, Jack Gerrits, Eric Lau, Huazhou Liu, Marco Rossi, Dima Slobodianyk, Dmitri Birjukov, Matty Cooper, Nilesh Javar, Dmitriy Perednya, Sriram Srinivasan, John Langford, Ross Cutler, Johannes Gehrke

Large software systems tune hundreds of 'constants' to optimize their runtime performance.

Friction Multi-Armed Bandits

Learning the Linear Quadratic Regulator from Nonlinear Observations

no code implementations • NeurIPS 2020 • Zakaria Mhammedi, Dylan J. Foster, Max Simchowitz, Dipendra Misra, Wen Sun, Akshay Krishnamurthy, Alexander Rakhlin, John Langford

We introduce a new algorithm, RichID, which learns a near-optimal policy for the RichLQR with sample complexity scaling only with the dimension of the latent state space and the capacity of the decoder function class.

Continuous Control Decoder

Better Parameter-free Stochastic Optimization with ODE Updates for Coin-Betting

no code implementations • 12 Jun 2020 • Keyi Chen, John Langford, Francesco Orabona

Parameter-free stochastic gradient descent (PFSGD) algorithms do not require setting learning rates while achieving optimal theoretical performance.

Stochastic Optimization

Efficient Contextual Bandits with Continuous Actions

1 code implementation • NeurIPS 2020 • Maryam Majzoubi, Chicheng Zhang, Rajan Chari, Akshay Krishnamurthy, John Langford, Aleksandrs Slivkins

We create a computationally tractable algorithm for contextual bandits with continuous actions having unknown structure.

Multi-Armed Bandits

Federated Residual Learning

no code implementations • 28 Mar 2020 • Alekh Agarwal, John Langford, Chen-Yu Wei

We study a new form of federated learning where the clients train personalized local models and make predictions jointly with the server-side shared model.

Federated Learning

Kinematic State Abstraction and Provably Efficient Rich-Observation Reinforcement Learning

no code implementations • ICML 2020 • Dipendra Misra, Mikael Henaff, Akshay Krishnamurthy, John Langford

We present an algorithm, HOMER, for exploration and reinforcement learning in rich observation environments that are summarizable by an unknown latent state space.

reinforcement-learning Reinforcement Learning (RL) +1

Deep Batch Active Learning by Diverse, Uncertain Gradient Lower Bounds

5 code implementations • ICLR 2020 • Jordan T. Ash, Chicheng Zhang, Akshay Krishnamurthy, John Langford, Alekh Agarwal

We design a new algorithm for batch active learning with deep neural network models.

Active Learning

Empirical Likelihood for Contextual Bandits

1 code implementation • NeurIPS 2020 • Nikos Karampatziakis, John Langford, Paul Mineiro

We propose an estimator and confidence interval for computing the value of a policy from off-policy data in the contextual bandit setting.

Multi-Armed Bandits

Efficient Forward Architecture Search

2 code implementations • NeurIPS 2019 • Hanzhang Hu, John Langford, Rich Caruana, Saurajit Mukherjee, Eric Horvitz, Debadeepta Dey

We propose a neural architecture search (NAS) algorithm, Petridish, to iteratively add shortcut connections to existing network layers.

feature selection Neural Architecture Search +1

Contextual Bandits with Continuous Actions: Smoothing, Zooming, and Adapting

no code implementations • 5 Feb 2019 • Akshay Krishnamurthy, John Langford, Aleksandrs Slivkins, Chicheng Zhang

We study contextual bandit learning with an abstract policy class and continuous action space.

Multi-Armed Bandits

Provably efficient RL with Rich Observations via Latent State Decoding

1 code implementation • 25 Jan 2019 • Simon S. Du, Akshay Krishnamurthy, Nan Jiang, Alekh Agarwal, Miroslav Dudík, John Langford

We study the exploration problem in episodic MDPs with rich observations generated from a small number of latent states.

Clustering Q-Learning +1

Warm-starting Contextual Bandits: Robustly Combining Supervised and Bandit Feedback

1 code implementation • 2 Jan 2019 • Chicheng Zhang, Alekh Agarwal, Hal Daumé III, John Langford, Sahand N. Negahban

We investigate the feasibility of learning from a mix of both fully-labeled supervised data and contextual bandit data.

Multi-Armed Bandits

Model-based RL in Contextual Decision Processes: PAC bounds and Exponential Improvements over Model-free Approaches

no code implementations • 21 Nov 2018 • Wen Sun, Nan Jiang, Akshay Krishnamurthy, Alekh Agarwal, John Langford

We study the sample complexity of model-based reinforcement learning (henceforth RL) in general contextual decision processes that require strategic exploration to find a near-optimal policy.

Model-based Reinforcement Learning

Contextual Memory Trees

no code implementations • 17 Jul 2018 • Wen Sun, Alina Beygelzimer, Hal Daumé III, John Langford, Paul Mineiro

We design and study a Contextual Memory Tree (CMT), a learning memory controller that inserts new memories into an experience store of unbounded size.

General Classification Image Captioning +2

A Reductions Approach to Fair Classification

3 code implementations • ICML 2018 • Alekh Agarwal, Alina Beygelzimer, Miroslav Dudík, John Langford, Hanna Wallach

We present a systematic approach for achieving fairness in a binary classification setting.

Binary Classification Classification +2

On Oracle-Efficient PAC RL with Rich Observations

no code implementations • NeurIPS 2018 • Christoph Dann, Nan Jiang, Akshay Krishnamurthy, Alekh Agarwal, John Langford, Robert E. Schapire

We study the computational tractability of PAC reinforcement learning with rich observations.

reinforcement-learning Reinforcement Learning (RL)

A Contextual Bandit Bake-off

1 code implementation • 12 Feb 2018 • Alberto Bietti, Alekh Agarwal, John Langford

Contextual bandit algorithms are essential for solving many real-world interactive machine learning problems.

Residual Loss Prediction: Reinforcement Learning With No Incremental Feedback

1 code implementation • ICLR 2018 • Hal Daumé III, John Langford, Amr Sharaf

We consider reinforcement learning and bandit structured prediction problems with very sparse loss feedback: only at the end of an episode.

Multi-Armed Bandits reinforcement-learning +2

Efficient Contextual Bandits in Non-stationary Worlds

no code implementations • 5 Aug 2017 • Haipeng Luo, Chen-Yu Wei, Alekh Agarwal, John Langford

In this work, we develop several efficient contextual bandit algorithms for non-stationary environments by equipping existing methods for i.i.d.

Multi-Armed Bandits

Learning Deep ResNet Blocks Sequentially using Boosting Theory

no code implementations • ICML 2018 • Furong Huang, Jordan Ash, John Langford, Robert Schapire

We prove that the training error decays exponentially with the depth $T$ if the weak module classifiers that we train perform slightly better than some weak baseline.

Mapping Instructions and Visual Observations to Actions with Reinforcement Learning

1 code implementation • EMNLP 2017 • Dipendra Misra, John Langford, Yoav Artzi

We propose to directly map raw visual observations and text input to actions for instruction execution.

reinforcement-learning Reinforcement Learning (RL)

Active Learning for Cost-Sensitive Classification

no code implementations • ICML 2017 • Akshay Krishnamurthy, Alekh Agarwal, Tzu-Kuo Huang, Hal Daumé III, John Langford

We design an active learning algorithm for cost-sensitive multiclass classification: problems where different errors have different costs.

Active Learning Classification +2

Contextual Decision Processes with Low Bellman Rank are PAC-Learnable

no code implementations • ICML 2017 • Nan Jiang, Akshay Krishnamurthy, Alekh Agarwal, John Langford, Robert E. Schapire

Our first contribution is a complexity measure, the Bellman rank, that we show enables tractable learning of near-optimal behavior in these processes and is naturally small for many well-studied reinforcement learning settings.

Efficient Exploration reinforcement-learning +1

Logarithmic Time One-Against-Some

no code implementations • ICML 2017 • Hal Daumé III, Nikos Karampatziakis, John Langford, Paul Mineiro

Compared to previous approaches, we obtain substantially better statistical performance for two reasons: First, we prove a tighter and more complete boosting theorem, and second we translate the results more directly into an algorithm.

Binary Classification Classification +1

Making Contextual Decisions with Low Technical Debt

no code implementations • 13 Jun 2016 • Alekh Agarwal, Sarah Bird, Markus Cozowicz, Luong Hoang, John Langford, Stephen Lee, Jiaji Li, Dan Melamed, Gal Oshri, Oswaldo Ribas, Siddhartha Sen, Alex Slivkins

The Decision Service enables all aspects of contextual bandit learning using four system abstractions which connect together in a loop: explore (the decision space), log, learn, and deploy.

Multi-Armed Bandits

Off-policy evaluation for slate recommendation

1 code implementation • NeurIPS 2017 • Adith Swaminathan, Akshay Krishnamurthy, Alekh Agarwal, Miroslav Dudík, John Langford, Damien Jose, Imed Zitouni

This paper studies the evaluation of policies that recommend an ordered set of items (e.g., a ranking) based on some context, a common scenario in web search, ads, and recommendation.

Learning-To-Rank Off-policy evaluation

Search Improves Label for Active Learning

no code implementations • NeurIPS 2016 • Alina Beygelzimer, Daniel Hsu, John Langford, Chicheng Zhang

We investigate active learning with access to two distinct oracles: Label (which is standard) and Search (which is not).

Active Learning

PAC Reinforcement Learning with Rich Observations

no code implementations • NeurIPS 2016 • Akshay Krishnamurthy, Alekh Agarwal, John Langford

We prove that the algorithm learns near optimal behavior after a number of episodes that is polynomial in all relevant parameters, logarithmic in the number of policies, and independent of the size of the observation space.

Decision Making Multi-Armed Bandits +2

Efficient Second Order Online Learning by Sketching

no code implementations • NeurIPS 2016 • Haipeng Luo, Alekh Agarwal, Nicolo Cesa-Bianchi, John Langford

We propose Sketched Online Newton (SON), an online second order learning algorithm that enjoys substantially improved regret guarantees for ill-conditioned data.

Efficient and Parsimonious Agnostic Active Learning

no code implementations • NeurIPS 2015 • Tzu-Kuo Huang, Alekh Agarwal, Daniel J. Hsu, John Langford, Robert E. Schapire

We develop a new active learning algorithm for the streaming setting satisfying three important properties: 1) It provably works for any classifier representation and classification problem including those with severe noise.

Active Learning General Classification

Hands-on Learning to Search for Structured Prediction

no code implementations • HLT 2015 • Hal Daumé III, John Langford, Kai-Wei Chang, He He, Sudha Rao

Decision Making Dependency Parsing +2

Learning to Search for Dependencies

no code implementations • 18 Mar 2015 • Kai-Wei Chang, He He, Hal Daumé III, John Langford

We demonstrate that a dependency parser can be built using a credit assignment compiler which removes the burden of worrying about low-level machine learning details from the parser implementation.

BIG-bench Machine Learning

Doubly Robust Policy Evaluation and Optimization

no code implementations • 10 Mar 2015 • Miroslav Dudík, Dumitru Erhan, John Langford, Lihong Li

As such, we expect the doubly robust approach to become common practice in policy evaluation and optimization.

Decision Making Multi-Armed Bandits

Learning Reductions that Really Work

no code implementations • 9 Feb 2015 • Alina Beygelzimer, Hal Daumé III, John Langford, Paul Mineiro

We provide a summary of the mathematical and computational techniques that have enabled learning reductions to effectively address a wide class of problems, and show that this approach to solving machine learning problems can be broadly useful.

BIG-bench Machine Learning

Learning to Search Better Than Your Teacher

no code implementations • 8 Feb 2015 • Kai-Wei Chang, Akshay Krishnamurthy, Alekh Agarwal, Hal Daumé III, John Langford

Methods for learning to search for structured prediction typically imitate a reference policy, with existing theoretical guarantees demonstrating low regret compared to that reference.

Multi-Armed Bandits Structured Prediction

Scalable Non-linear Learning with Adaptive Polynomial Expansions

no code implementations • NeurIPS 2014 • Alekh Agarwal, Alina Beygelzimer, Daniel J. Hsu, John Langford, Matus J. Telgarsky

Can we effectively learn a nonlinear representation in time comparable to linear learning?

Computational Efficiency

Scalable Nonlinear Learning with Adaptive Polynomial Expansions

no code implementations • 2 Oct 2014 • Alekh Agarwal, Alina Beygelzimer, Daniel Hsu, John Langford, Matus Telgarsky

Can we effectively learn a nonlinear representation in time comparable to linear learning?

Computational Efficiency

Normalized Online Learning

no code implementations • 9 Aug 2014 • Stephane Ross, Paul Mineiro, John Langford

We introduce online learning algorithms which are independent of feature scales, proving regret bounds dependent on the ratio of scales existent in the data rather than the absolute scale.

Conditional Probability Tree Estimation Analysis and Algorithms

no code implementations • 9 Aug 2014 • Alina Beygelzimer, John Langford, Yuri Lifshits, Gregory Sorkin, Alexander L. Strehl

We consider the problem of estimating the conditional probability of a label in time O(log n), where n is the number of possible labels.

regression

A Credit Assignment Compiler for Joint Prediction

no code implementations • NeurIPS 2016 • Kai-Wei Chang, He He, Hal Daumé III, John Langford, Stephane Ross

Many machine learning applications involve jointly predicting multiple mutually dependent output variables.

Logarithmic Time Online Multiclass prediction

no code implementations • NeurIPS 2015 • Anna Choromanska, John Langford

We develop top-down tree construction approaches for constructing logarithmic depth trees.

Resourceful Contextual Bandits

no code implementations • 27 Feb 2014 • Ashwinkumar Badanidiyuru, John Langford, Aleksandrs Slivkins

We study contextual bandits with ancillary constraints on resources, which are common in real-world applications such as choosing ads or dynamic pricing of items.

Multi-Armed Bandits

Taming the Monster: A Fast and Simple Algorithm for Contextual Bandits

1 code implementation • 4 Feb 2014 • Alekh Agarwal, Daniel Hsu, Satyen Kale, John Langford, Lihong Li, Robert E. Schapire

We present a new algorithm for the contextual bandit learning problem, where the learner repeatedly takes one of $K$ actions in response to the observed context, and observes the reward only for that chosen action.

General Classification Multi-Armed Bandits

Efficient Online Bootstrapping for Large Scale Learning

no code implementations • 18 Dec 2013 • Zhen Qin, Vaclav Petricek, Nikos Karampatziakis, Lihong Li, John Langford

Bootstrapping is a useful technique for estimating the uncertainty of a predictor, for example, confidence intervals for prediction.

Para-active learning

no code implementations • 30 Oct 2013 • Alekh Agarwal, Léon Bottou, Miroslav Dudík, John Langford

We leverage the same observation to build a generic strategy for parallelizing learning algorithms.

Active Learning

Normalized Online Learning

1 code implementation • 28 May 2013 • Stephane Ross, Paul Mineiro, John Langford

We introduce online learning algorithms which are independent of feature scales, proving regret bounds dependent on the ratio of scales existent in the data rather than the absolute scale.

A Reliable Effective Terascale Linear Learning System

2 code implementations • 19 Oct 2011 • Alekh Agarwal, Olivier Chapelle, Miroslav Dudík, John Langford

We present a system and a set of techniques for learning linear predictors with convex losses on terascale datasets, with trillions of features (the number of features here refers to the number of non-zero entries in the data matrix).

Doubly Robust Policy Evaluation and Learning

1 code implementation • 23 Mar 2011 • Miroslav Dudík, John Langford, Lihong Li

The key challenge is that the past data typically does not faithfully represent proportions of actions taken by a new policy.

Decision Making Multi-Armed Bandits

Agnostic Active Learning Without Constraints

no code implementations • NeurIPS 2010 • Alina Beygelzimer, Daniel J. Hsu, John Langford, Tong Zhang

We present and analyze an agnostic active learning algorithm that works without keeping a version space.

Active Learning General Classification

Unbiased Offline Evaluation of Contextual-bandit-based News Article Recommendation Algorithms

4 code implementations • 31 Mar 2010 • Lihong Li, Wei Chu, John Langford, Xuanhui Wang

Offline evaluation of the effectiveness of new algorithms in these applications is critical for protecting online user experiences but very challenging due to their "partial-label" nature.

News Recommendation Recommendation Systems

A Contextual-Bandit Approach to Personalized News Article Recommendation

11 code implementations • 28 Feb 2010 • Lihong Li, Wei Chu, John Langford, Robert E. Schapire

In this work, we model personalized recommendation of news articles as a contextual bandit problem, a principled approach in which a learning algorithm sequentially selects articles to serve users based on contextual information about the users and articles, while simultaneously adapting its article-selection strategy based on user-click feedback to maximize total user clicks.

Collaborative Filtering Learning Theory

Learning from Logged Implicit Exploration Data

no code implementations • NeurIPS 2010 • Alex Strehl, John Langford, Sham Kakade, Lihong Li

We provide a sound and consistent foundation for the use of nonrandom exploration data in "contextual bandit" or "partially labeled" settings where only the value of a chosen action is learned.

Slow Learners are Fast

no code implementations • NeurIPS 2009 • Martin Zinkevich, John Langford, Alex J. Smola

Online learning algorithms have impressive convergence properties when it comes to risk minimization and convex games on very large problems.

Search-based Structured Prediction

no code implementations • 4 Jul 2009 • Hal Daumé III, John Langford, Daniel Marcu

We present Searn, an algorithm for integrating search and learning to solve complex structured prediction problems such as those that occur in natural language, speech, computational biology, and vision.

General Classification Structured Prediction