1 code implementation • 5 Mar 2023 • Akanksha Saran, Safoora Yousefi, Akshay Krishnamurthy, John Langford, Jordan T. Ash
Active learning is perhaps most naturally posed as an online learning problem.
1 code implementation • 14 Nov 2022 • Shengpu Tang, Felipe Vieira Frujeri, Dipendra Misra, Alex Lamb, John Langford, Paul Mineiro, Sebastian Kochman
Modern decision-making systems, from robots to web recommendation engines, are expected to adapt: to user preferences, changing circumstances, or even new tasks.
no code implementations • 31 Oct 2022 • Riashat Islam, Manan Tomar, Alex Lamb, Yonathan Efroni, Hongyu Zang, Aniket Didolkar, Dipendra Misra, Xin Li, Harm van Seijen, Remi Tachet des Combes, John Langford
We find that contemporary representation learning techniques can fail on datasets where the noise is a complex and time-dependent process, which is prevalent in practical applications.
1 code implementation • 25 Oct 2022 • Mark Rucker, Jordan T. Ash, John Langford, Paul Mineiro, Ida Momennejad
This work introduces the Eigen Memory Tree (EMT), a novel online memory model for sequential learning scenarios.
no code implementations • 17 Jul 2022 • Alex Lamb, Riashat Islam, Yonathan Efroni, Aniket Didolkar, Dipendra Misra, Dylan Foster, Lekan Molu, Rajan Chari, Akshay Krishnamurthy, John Langford
In many sequential decision-making tasks, the agent is not able to model the full complexity of the world, which consists of multitudes of relevant and irrelevant information.
1 code implementation • 12 Jul 2022 • Yinglun Zhu, Dylan J. Foster, John Langford, Paul Mineiro
In the contextual bandit problem, recent progress provides provably efficient algorithms with strong empirical performance when the number of possible alternatives ("actions") is small, but guarantees for decision making in large, continuous action spaces have remained elusive, leading to a significant gap between theory and practice.
no code implementations • 16 Jun 2022 • Tengyang Xie, Akanksha Saran, Dylan J. Foster, Lekan Molu, Ida Momennejad, Nan Jiang, Paul Mineiro, John Langford
Consider the problem setting of Interaction-Grounded Learning (IGL), in which a learner's goal is to optimally interact with the environment with no explicit reward to ground its policies.
no code implementations • 9 Jun 2022 • Yonathan Efroni, Dylan J. Foster, Dipendra Misra, Akshay Krishnamurthy, John Langford
In real-world reinforcement learning applications, the learner's observation space is typically high-dimensional, containing both relevant and irrelevant information about the task at hand.
1 code implementation • 10 Feb 2022 • Alberto Bietti, Chen-Yu Wei, Miroslav Dudík, John Langford, Zhiwei Steven Wu
Large-scale machine learning systems often involve data distributed across a collection of users.
no code implementations • 17 Oct 2021 • Yonathan Efroni, Dipendra Misra, Akshay Krishnamurthy, Alekh Agarwal, John Langford
We initiate the formal study of latent state discovery in the presence of such exogenous noise sources by proposing a new model, the Exogenous Block MDP (EX-BMDP), for rich observation RL.
1 code implementation • 9 Jun 2021 • Qingyun Wu, Chi Wang, John Langford, Paul Mineiro, Marco Rossi
We propose the ChaCha (Champion-Challengers) algorithm for making an online choice of hyperparameters in online learning settings.
no code implementations • 9 Jun 2021 • Tengyang Xie, John Langford, Paul Mineiro, Ida Momennejad
We propose Interaction-Grounded Learning for this novel setting, in which a learner's goal is to interact with the environment with no grounding or explicit reward to optimize its policies.
no code implementations • ICLR 2021 • Dipendra Misra, Qinghua Liu, Chi Jin, John Langford
We propose a novel setting for reinforcement learning that combines two common real-world difficulties: presence of observations (such as camera images) and factored states (such as location of objects).
no code implementations • 23 Nov 2020 • Jayant Gupchup, Ashkan Aazami, Yaran Fan, Senja Filipi, Tom Finley, Scott Inglis, Marcus Asteborg, Luke Caroll, Rajan Chari, Markus Cozowicz, Vishak Gopal, Vinod Prakash, Sasikanth Bendapudi, Jack Gerrits, Eric Lau, Huazhou Liu, Marco Rossi, Dima Slobodianyk, Dmitri Birjukov, Matty Cooper, Nilesh Javar, Dmitriy Perednya, Sriram Srinivasan, John Langford, Ross Cutler, Johannes Gehrke
Large software systems tune hundreds of 'constants' to optimize their runtime performance.
no code implementations • NeurIPS 2020 • Zakaria Mhammedi, Dylan J. Foster, Max Simchowitz, Dipendra Misra, Wen Sun, Akshay Krishnamurthy, Alexander Rakhlin, John Langford
We introduce a new algorithm, RichID, which learns a near-optimal policy for the RichLQR with sample complexity scaling only with the dimension of the latent state space and the capacity of the decoder function class.
no code implementations • 12 Jun 2020 • Keyi Chen, John Langford, Francesco Orabona
Parameter-free stochastic gradient descent (PFSGD) algorithms do not require setting learning rates while achieving optimal theoretical performance.
1 code implementation • NeurIPS 2020 • Maryam Majzoubi, Chicheng Zhang, Rajan Chari, Akshay Krishnamurthy, John Langford, Aleksandrs Slivkins
We create a computationally tractable algorithm for contextual bandits with continuous actions having unknown structure.
no code implementations • 28 Mar 2020 • Alekh Agarwal, John Langford, Chen-Yu Wei
We study a new form of federated learning where the clients train personalized local models and make predictions jointly with the server-side shared model.
no code implementations • ICML 2020 • Dipendra Misra, Mikael Henaff, Akshay Krishnamurthy, John Langford
We present an algorithm, HOMER, for exploration and reinforcement learning in rich observation environments that are summarizable by an unknown latent state space.
4 code implementations • ICLR 2020 • Jordan T. Ash, Chicheng Zhang, Akshay Krishnamurthy, John Langford, Alekh Agarwal
We design a new algorithm for batch active learning with deep neural network models.
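The algorithm, BADGE, scores unlabeled points by the gradient their predicted label would induce on the network's final layer and selects a diverse, high-magnitude batch via k-means++ seeding. A minimal sketch of that selection step, assuming softmax probabilities and penultimate-layer features are already computed (function and variable names here are illustrative):

```python
import numpy as np
from sklearn.cluster import kmeans_plusplus

def badge_select(probs, features, batch_size, seed=0):
    """Pick a batch via k-means++ seeding on hypothetical-label gradient embeddings.

    probs:    (n, num_classes) softmax outputs on the unlabeled pool
    features: (n, d) penultimate-layer activations
    """
    yhat = probs.argmax(axis=1)                  # hypothetical (predicted) labels
    residual = probs.copy()
    residual[np.arange(len(yhat)), yhat] -= 1.0  # p - onehot(yhat)
    # Cross-entropy gradient w.r.t. the last linear layer is the outer
    # product of the residual and the features, flattened per example.
    emb = (residual[:, :, None] * features[:, None, :]).reshape(len(yhat), -1)
    _, indices = kmeans_plusplus(emb, n_clusters=batch_size, random_state=seed)
    return indices
```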
1 code implementation • NeurIPS 2020 • Nikos Karampatziakis, John Langford, Paul Mineiro
We propose an estimator and confidence interval for computing the value of a policy from off-policy data in the contextual bandit setting.
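For orientation, the classical inverse propensity scoring (IPS) baseline for this task, with a normal-approximation interval, is sketched below; this is the standard estimator the paper improves on, not the paper's own construction:

```python
import numpy as np

def ips_value(rewards, logged_probs, target_matches):
    """IPS off-policy value estimate with a 95% normal-approximation CI.

    rewards:        observed rewards for the logged actions
    logged_probs:   logging policy's probability of each logged action
    target_matches: 1 if the target policy picks the logged action, else 0
    """
    values = (target_matches / logged_probs) * rewards
    est = values.mean()
    half = 1.96 * values.std(ddof=1) / np.sqrt(len(values))
    return est, (est - half, est + half)
```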
2 code implementations • NeurIPS 2019 • Hanzhang Hu, John Langford, Rich Caruana, Saurajit Mukherjee, Eric Horvitz, Debadeepta Dey
We propose a neural architecture search (NAS) algorithm, Petridish, to iteratively add shortcut connections to existing network layers.
no code implementations • 5 Feb 2019 • Akshay Krishnamurthy, John Langford, Aleksandrs Slivkins, Chicheng Zhang
We study contextual bandit learning with an abstract policy class and continuous action space.
1 code implementation • 25 Jan 2019 • Simon S. Du, Akshay Krishnamurthy, Nan Jiang, Alekh Agarwal, Miroslav Dudík, John Langford
We study the exploration problem in episodic MDPs with rich observations generated from a small number of latent states.
1 code implementation • 2 Jan 2019 • Chicheng Zhang, Alekh Agarwal, Hal Daumé III, John Langford, Sahand N. Negahban
We investigate the feasibility of learning from a mix of both fully-labeled supervised data and contextual bandit data.
no code implementations • 21 Nov 2018 • Wen Sun, Nan Jiang, Akshay Krishnamurthy, Alekh Agarwal, John Langford
We study the sample complexity of model-based reinforcement learning (henceforth RL) in general contextual decision processes that require strategic exploration to find a near-optimal policy.
no code implementations • 17 Jul 2018 • Wen Sun, Alina Beygelzimer, Hal Daumé III, John Langford, Paul Mineiro
We design and study a Contextual Memory Tree (CMT), a learning memory controller that inserts new memories into an experience store of unbounded size.
3 code implementations • ICML 2018 • Alekh Agarwal, Alina Beygelzimer, Miroslav Dudík, John Langford, Hanna Wallach
We present a systematic approach for achieving fairness in a binary classification setting.
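This reductions approach is available in the open-source fairlearn library; a usage sketch on synthetic data, assuming fairlearn's reductions API (exact signatures may vary by version):

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from fairlearn.reductions import ExponentiatedGradient, DemographicParity

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 4))
group = rng.integers(0, 2, size=500)   # binary sensitive attribute
y = (X[:, 0] + 0.5 * group + rng.normal(scale=0.5, size=500) > 0).astype(int)

# The exponentiated-gradient reduction solves the constrained problem as a
# sequence of reweighted (cost-sensitive) classification problems.
mitigator = ExponentiatedGradient(LogisticRegression(),
                                  constraints=DemographicParity())
mitigator.fit(X, y, sensitive_features=group)
print(mitigator.predict(X)[:10])
```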
no code implementations • NeurIPS 2018 • Christoph Dann, Nan Jiang, Akshay Krishnamurthy, Alekh Agarwal, John Langford, Robert E. Schapire
We study the computational tractability of PAC reinforcement learning with rich observations.
1 code implementation • 12 Feb 2018 • Alberto Bietti, Alekh Agarwal, John Langford
Contextual bandit algorithms are essential for solving many real-world interactive machine learning problems.
1 code implementation • ICLR 2018 • Hal Daumé III, John Langford, Amr Sharaf
We consider reinforcement learning and bandit structured prediction problems with very sparse loss feedback: only at the end of an episode.
no code implementations • 5 Aug 2017 • Haipeng Luo, Chen-Yu Wei, Alekh Agarwal, John Langford
In this work, we develop several efficient contextual bandit algorithms for non-stationary environments by equipping existing methods for i.i.d. problems with sophisticated statistical tests so as to dynamically adapt to a change in distribution.
no code implementations • ICML 2018 • Furong Huang, Jordan Ash, John Langford, Robert Schapire
We prove that the training error decays exponentially with the depth $T$ if the "weak module classifiers" that we train perform slightly better than some weak baseline.
1 code implementation • EMNLP 2017 • Dipendra Misra, John Langford, Yoav Artzi
We propose to directly map raw visual observations and text input to actions for instruction execution.
no code implementations • ICML 2017 • Akshay Krishnamurthy, Alekh Agarwal, Tzu-Kuo Huang, Hal Daumé III, John Langford
We design an active learning algorithm for cost-sensitive multiclass classification: problems where different errors have different costs.
no code implementations • ICML 2017 • Nan Jiang, Akshay Krishnamurthy, Alekh Agarwal, John Langford, Robert E. Schapire
Our first contribution is a complexity measure, the Bellman rank, that we show enables tractable learning of near-optimal behavior in these processes and is naturally small for many well-studied reinforcement learning settings.
no code implementations • ICML 2017 • Hal Daumé III, Nikos Karampatziakis, John Langford, Paul Mineiro
Compared to previous approaches, we obtain substantially better statistical performance for two reasons: first, we prove a tighter and more complete boosting theorem, and second, we translate the results more directly into an algorithm.
no code implementations • 13 Jun 2016 • Alekh Agarwal, Sarah Bird, Markus Cozowicz, Luong Hoang, John Langford, Stephen Lee, Jiaji Li, Dan Melamed, Gal Oshri, Oswaldo Ribas, Siddhartha Sen, Alex Slivkins
The Decision Service enables all aspects of contextual bandit learning using four system abstractions which connect in a loop: explore (the decision space), log, learn, and deploy.
1 code implementation • NeurIPS 2017 • Adith Swaminathan, Akshay Krishnamurthy, Alekh Agarwal, Miroslav Dudík, John Langford, Damien Jose, Imed Zitouni
This paper studies the evaluation of policies that recommend an ordered set of items (e.g., a ranking) based on some context, a common scenario in web search, ads, and recommendation.
no code implementations • NeurIPS 2016 • Alina Beygelzimer, Daniel Hsu, John Langford, Chicheng Zhang
We investigate active learning with access to two distinct oracles: Label (which is standard) and Search (which is not).
no code implementations • NeurIPS 2016 • Akshay Krishnamurthy, Alekh Agarwal, John Langford
We prove that the algorithm learns near-optimal behavior after a number of episodes that is polynomial in all relevant parameters, logarithmic in the number of policies, and independent of the size of the observation space.
no code implementations • NeurIPS 2016 • Haipeng Luo, Alekh Agarwal, Nicolo Cesa-Bianchi, John Langford
We propose Sketched Online Newton (SON), an online second order learning algorithm that enjoys substantially improved regret guarantees for ill-conditioned data.
no code implementations • NeurIPS 2015 • Tzu-Kuo Huang, Alekh Agarwal, Daniel J. Hsu, John Langford, Robert E. Schapire
We develop a new active learning algorithm for the streaming setting satisfying three important properties: 1) It provably works for any classifier representation and classification problem, including those with severe noise.
no code implementations • 18 Mar 2015 • Kai-Wei Chang, He He, Hal Daumé III, John Langford
We demonstrate that a dependency parser can be built using a credit assignment compiler which removes the burden of worrying about low-level machine learning details from the parser implementation.
no code implementations • 10 Mar 2015 • Miroslav Dudík, Dumitru Erhan, John Langford, Lihong Li
We expect the doubly robust approach to become common practice in policy evaluation and optimization.
no code implementations • 9 Feb 2015 • Alina Beygelzimer, Hal Daumé III, John Langford, Paul Mineiro
We provide a summary of the mathematical and computational techniques that have enabled learning reductions to effectively address a wide class of problems, and show that this approach to solving machine learning problems can be broadly useful.
no code implementations • 8 Feb 2015 • Kai-Wei Chang, Akshay Krishnamurthy, Alekh Agarwal, Hal Daumé III, John Langford
Methods for learning to search for structured prediction typically imitate a reference policy, with existing theoretical guarantees demonstrating low regret compared to that reference.
no code implementations • NeurIPS 2014 • Alekh Agarwal, Alina Beygelzimer, Daniel J. Hsu, John Langford, Matus J. Telgarsky
Can we effectively learn a nonlinear representation in time comparable to linear learning?
no code implementations • 9 Aug 2014 • Alina Beygelzimer, John Langford, Yuri Lifshits, Gregory Sorkin, Alexander L. Strehl
We consider the problem of estimating the conditional probability of a label in time O(log n), where n is the number of possible labels.
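A standard way to realize such O(log n) inference is a balanced binary tree of conditional models over the label set, multiplying branch probabilities along the root-to-leaf path; a sketch of the descent, assuming each internal node exposes a predict_proba(x) returning P(go right | x):

```python
def tree_label_prob(x, node_models, n_labels):
    """Greedy O(log n) descent through a balanced binary tree over labels."""
    lo, hi, prob, node = 0, n_labels, 1.0, 0
    while hi - lo > 1:
        p_right = node_models[node].predict_proba(x)
        go_right = p_right >= 0.5
        prob *= p_right if go_right else 1.0 - p_right
        mid = (lo + hi) // 2
        lo, hi = (mid, hi) if go_right else (lo, mid)
        node = 2 * node + (2 if go_right else 1)   # heap-style child indices
    return lo, prob   # reached label and its estimated conditional probability
```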
no code implementations • NeurIPS 2016 • Kai-Wei Chang, He He, Hal Daumé III, John Langford, Stephane Ross
Many machine learning applications involve jointly predicting multiple mutually dependent output variables.
no code implementations • NeurIPS 2015 • Anna Choromanska, John Langford
We develop top-down approaches for constructing logarithmic-depth trees.
no code implementations • 27 Feb 2014 • Ashwinkumar Badanidiyuru, John Langford, Aleksandrs Slivkins
We study contextual bandits with ancillary constraints on resources, which are common in real-world applications such as choosing ads or dynamic pricing of items.
1 code implementation • 4 Feb 2014 • Alekh Agarwal, Daniel Hsu, Satyen Kale, John Langford, Lihong Li, Robert E. Schapire
We present a new algorithm for the contextual bandit learning problem, where the learner repeatedly takes one of $K$ actions in response to the observed context, and observes the reward only for that chosen action.
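As a point of reference for the setting (not the paper's oracle-efficient algorithm), a minimal epsilon-greedy loop with per-action ridge regression:

```python
import numpy as np

rng = np.random.default_rng(0)
K, d, T, eps = 5, 10, 2000, 0.1
theta = rng.normal(size=(K, d))                 # hidden reward weights
A = np.stack([np.eye(d) for _ in range(K)])     # per-action X^T X + I
b = np.zeros((K, d))                            # per-action X^T r

for t in range(T):
    x = rng.normal(size=d)                      # observe context
    if rng.random() < eps:
        a = int(rng.integers(K))                # explore uniformly
    else:
        est = [np.linalg.solve(A[k], b[k]) @ x for k in range(K)]
        a = int(np.argmax(est))                 # exploit current estimates
    r = theta[a] @ x + rng.normal(scale=0.1)    # reward only for chosen action
    A[a] += np.outer(x, x)                      # update only that action's model
    b[a] += r * x
```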
no code implementations • 18 Dec 2013 • Zhen Qin, Vaclav Petricek, Nikos Karampatziakis, Lihong Li, John Langford
Bootstrapping is a useful technique for estimating the uncertainty of a predictor, for example, confidence intervals for prediction.
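A common online form of this idea gives each of B replicate learners an independent Poisson(1) weight per example, so all replicates train in a single pass; a minimal sketch under that assumption (the paper's construction may differ in detail):

```python
import numpy as np

rng = np.random.default_rng(0)
B, d, lr = 32, 5, 0.1
W = np.zeros((B, d))                 # B bootstrap replicates of a linear model

def bootstrap_update(x, y):
    """Each replicate sees the example with an independent Poisson(1) weight."""
    for i in range(B):
        k = rng.poisson(1.0)
        if k:
            W[i] -= lr * k * (W[i] @ x - y) * x   # weighted squared-loss step

def prediction_interval(x):
    preds = W @ x
    return np.percentile(preds, [2.5, 97.5])      # bootstrap 95% interval
```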
no code implementations • 30 Oct 2013 • Alekh Agarwal, Léon Bottou, Miroslav Dudík, John Langford
We leverage the same observation to build a generic strategy for parallelizing learning algorithms.
1 code implementation • 28 May 2013 • Stephane Ross, Paul Mineiro, John Langford
We introduce online learning algorithms which are independent of feature scales, proving regret bounds dependent on the ratio of scales existent in the data rather than the absolute scale.
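A simplified version of the idea: track the largest magnitude seen for each feature and divide that coordinate's update by its squared scale, which makes the learner invariant to rescaling any feature (a sketch; the full algorithm adds further corrections when scales grow):

```python
import numpy as np

def normalized_sgd_step(w, s, x, y, lr=0.5):
    """One scale-normalized SGD step for squared loss.

    s holds the per-feature maximum absolute value observed so far.
    """
    s[:] = np.maximum(s, np.abs(x))
    grad = (w @ x - y) * x
    nz = s > 0
    w[nz] -= lr * grad[nz] / s[nz] ** 2   # rescaling x_i by c leaves w_i * x_i invariant
    return w, s
```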
2 code implementations • 19 Oct 2011 • Alekh Agarwal, Olivier Chapelle, Miroslav Dudík, John Langford
We present a system and a set of techniques for learning linear predictors with convex losses on terascale datasets with trillions of features (the number of features here refers to the number of non-zero entries in the data matrix).
1 code implementation • 23 Mar 2011 • Miroslav Dudík, John Langford, Lihong Li
The key challenge is that the past data typically does not faithfully represent proportions of actions taken by a new policy.
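The proposed doubly robust estimator combines a learned reward model with an importance-weighted correction, and stays accurate if either piece is accurate; a per-example sketch with illustrative names:

```python
def dr_estimate(x, logged_action, logged_reward, logged_prob,
                target_action, reward_model):
    """Doubly robust value estimate for one logged interaction."""
    direct = reward_model(x, target_action)        # model-based term
    correction = 0.0
    if target_action == logged_action:             # importance-weighted residual
        correction = (logged_reward - reward_model(x, logged_action)) / logged_prob
    return direct + correction
```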
no code implementations • NeurIPS 2010 • Alina Beygelzimer, Daniel J. Hsu, John Langford, Tong Zhang
We present and analyze an agnostic active learning algorithm that works without keeping a version space.
4 code implementations • 31 Mar 2010 • Lihong Li, Wei Chu, John Langford, Xuanhui Wang
Offline evaluation of the effectiveness of new algorithms in these applications is critical for protecting online user experiences but very challenging due to their "partial-label" nature.
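The replay methodology proposed here streams uniformly-random logged events and reveals a reward to the candidate algorithm only when its chosen action matches the logged one; a sketch with an assumed algorithm interface:

```python
def replay_evaluate(logged_events, algo):
    """Offline evaluation against uniformly-random logged data.

    logged_events: iterable of (context, action, reward) triples whose
                   actions were chosen uniformly at random when logged.
    algo:          object exposing choose(context) and update(context, action, reward).
    """
    total, matched = 0.0, 0
    for context, action, reward in logged_events:
        if algo.choose(context) == action:   # keep only matching events
            algo.update(context, action, reward)
            total += reward
            matched += 1
    return total / max(matched, 1)           # average reward on retained events
```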
11 code implementations • 28 Feb 2010 • Lihong Li, Wei Chu, John Langford, Robert E. Schapire
In this work, we model personalized recommendation of news articles as a contextual bandit problem, a principled approach in which a learning algorithm sequentially selects articles to serve users based on contextual information about the users and articles, while simultaneously adapting its article-selection strategy based on user-click feedback to maximize total user clicks.
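The disjoint LinUCB algorithm introduced in this work maintains a ridge-regression estimate per article and selects the article with the highest upper confidence bound; a compact sketch:

```python
import numpy as np

class LinUCB:
    """Disjoint LinUCB: one ridge model per arm, UCB action selection."""
    def __init__(self, n_arms, d, alpha=1.0):
        self.alpha = alpha
        self.A = np.stack([np.eye(d) for _ in range(n_arms)])  # X^T X + I per arm
        self.b = np.zeros((n_arms, d))                         # X^T r per arm

    def choose(self, x):
        scores = []
        for A_a, b_a in zip(self.A, self.b):
            A_inv = np.linalg.inv(A_a)
            theta = A_inv @ b_a
            scores.append(theta @ x + self.alpha * np.sqrt(x @ A_inv @ x))
        return int(np.argmax(scores))

    def update(self, a, x, r):
        self.A[a] += np.outer(x, x)
        self.b[a] += r * x
```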
no code implementations • NeurIPS 2010 • Alex Strehl, John Langford, Sham Kakade, Lihong Li
We provide a sound and consistent foundation for the use of nonrandom exploration data in "contextual bandit" or "partially labeled" settings where only the value of a chosen action is learned.
no code implementations • NeurIPS 2009 • Martin Zinkevich, John Langford, Alex J. Smola
Online learning algorithms have impressive convergence properties when it comes to risk minimization and convex games on very large problems.
no code implementations • 4 Jul 2009 • Hal Daumé III, John Langford, Daniel Marcu
We present Searn, an algorithm for integrating search and learning to solve complex structured prediction problems such as those that occur in natural language, speech, computational biology, and vision.
no code implementations • 21 Dec 2008 • Alina Beygelzimer, John Langford
We show that the Offset Tree is an optimal reduction to binary classification.
no code implementations • NeurIPS 2008 • John Langford, Lihong Li, Tong Zhang
We propose a general method called truncated gradient to induce sparsity in the weights of online-learning algorithms with convex loss.
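Truncated gradient shrinks small weights toward zero by a "gravity" amount after gradient steps, inducing sparsity gradually rather than by hard rounding; a coordinate-wise sketch of the update:

```python
import numpy as np

def truncate(w, gravity, theta):
    """Pull weights with |w| <= theta toward zero by `gravity`; leave the rest."""
    shrunk = np.sign(w) * np.maximum(np.abs(w) - gravity, 0.0)
    return np.where(np.abs(w) <= theta, shrunk, w)

def truncated_gradient_step(w, grad, lr, gravity, theta):
    # Standard gradient step followed by the sparsity-inducing truncation.
    return truncate(w - lr * grad, lr * gravity, theta)
```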
no code implementations • NeurIPS 2008 • Sharad Goel, John Langford, Alexander L. Strehl
We tackle the computational problem of query-conditioned search.
no code implementations • NeurIPS 2007 • John Langford, Tong Zhang
We present Epoch-Greedy, an algorithm for multi-armed bandits with observable side information.
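Epoch-Greedy interleaves single uniform-exploration steps with exploitation epochs that use a policy trained on all exploration data so far; a minimal sketch with illustrative interfaces (the paper derives the epoch-length schedule from the exploration/exploitation trade-off):

```python
import numpy as np

def epoch_greedy(get_context, get_reward, learn_policy, K, n_epochs, seed=0):
    """get_context() -> context; get_reward(x, a) -> reward;
    learn_policy(samples) -> callable policy fitted on exploration samples."""
    rng = np.random.default_rng(seed)
    samples = []
    for epoch in range(1, n_epochs + 1):
        x = get_context()
        a = int(rng.integers(K))                 # one uniform exploration step
        samples.append((x, a, get_reward(x, a)))
        policy = learn_policy(samples)
        for _ in range(epoch):                   # illustrative epoch length
            x = get_context()
            get_reward(x, policy(x))             # exploit the learned policy
```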