Search Results for author: Lihong Li

Found 75 papers, 18 papers with code

Explicit Recall for Efficient Exploration

no code implementations • ICLR 2019 • Honghua Dong, Jiayuan Mao, Xinyue Cui, Lihong Li

In this paper, we advocate the use of explicit memory for efficient exploration in reinforcement learning.

Decision Making Efficient Exploration +2

MESOB: Balancing Equilibria & Social Optimality

no code implementations • 16 Jul 2023 • Xin Guo, Lihong Li, Sareh Nabi, Rabih Salhab, Junzi Zhang

Motivated by bid recommendation in online ad auctions, this paper considers a general class of multi-level and multi-agent games, with two major characteristics: one is a large number of anonymous agents, and the other is the intricate interplay between competition and cooperation.

Offline Policy Optimization in RL with Variance Regularization

no code implementations • 29 Dec 2022 • Riashat Islam, Samarth Sinha, Homanga Bharadhwaj, Samin Yeasar Arnob, Zhuoran Yang, Animesh Garg, Zhaoran Wang, Lihong Li, Doina Precup

Learning policies from fixed offline datasets is a key challenge to scale up reinforcement learning (RL) algorithms towards practical applications.

Continuous Control Offline RL +1

A Reinforcement Learning Approach to Estimating Long-term Treatment Effects

no code implementations • 14 Oct 2022 • Ziyang Tang, Yiheng Duan, Stephanie Zhang, Lihong Li

Randomized experiments (a. k. a.

reinforcement-learning Reinforcement Learning (RL)

Understanding Domain Randomization for Sim-to-real Transfer

no code implementations • ICLR 2022 • Xiaoyu Chen, Jiachen Hu, Chi Jin, Lihong Li, LiWei Wang

Reinforcement learning encounters many challenges when applied directly in the real world.

Autonomous Driving Friction

A Map of Bandits for E-commerce

no code implementations • 1 Jul 2021 • Yi Liu, Lihong Li

The rich body of Bandit literature not only offers a diverse toolbox of algorithms, but also makes it hard for a practitioner to find the right solution to solve the problem at hand.

Navigate

On the Optimality of Batch Policy Optimization Algorithms

no code implementations • 6 Apr 2021 • Chenjun Xiao, Yifan Wu, Tor Lattimore, Bo Dai, Jincheng Mei, Lihong Li, Csaba Szepesvari, Dale Schuurmans

First, we introduce a class of confidence-adjusted index algorithms that unifies optimistic and pessimistic principles in a common framework, which enables a general analysis.

Value prediction

Near-optimal Representation Learning for Linear Bandits and Linear RL

no code implementations • 8 Feb 2021 • Jiachen Hu, Xiaoyu Chen, Chi Jin, Lihong Li, LiWei Wang

This paper studies representation learning for multi-task linear bandits and multi-task episodic RL with linear value function approximation.

Representation Learning

Offline Policy Optimization with Variance Regularization

no code implementations • 1 Jan 2021 • Riashat Islam, Samarth Sinha, Homanga Bharadhwaj, Samin Yeasar Arnob, Zhuoran Yang, Zhaoran Wang, Animesh Garg, Lihong Li, Doina Precup

Learning policies from fixed offline datasets is a key challenge to scale up reinforcement learning (RL) algorithms towards practical applications.

Continuous Control Offline RL +1

Escaping the Gravitational Pull of Softmax

no code implementations • NeurIPS 2020 • Jincheng Mei, Chenjun Xiao, Bo Dai, Lihong Li, Csaba Szepesvari, Dale Schuurmans

Both findings are based on an analysis of convergence rates using the Non-uniform Łojasiewicz (NŁ) inequalities.

CoinDICE: Off-Policy Confidence Interval Estimation

no code implementations • NeurIPS 2020 • Bo Dai, Ofir Nachum, Yinlam Chow, Lihong Li, Csaba Szepesvári, Dale Schuurmans

We study high-confidence behavior-agnostic off-policy evaluation in reinforcement learning, where the goal is to estimate a confidence interval on a target policy's value, given only access to a static experience dataset collected by unknown behavior policies.

Off-policy evaluation valid

Neural Thompson Sampling

2 code implementations • ICLR 2021 • Weitong Zhang, Dongruo Zhou, Lihong Li, Quanquan Gu

Thompson Sampling (TS) is one of the most effective algorithms for solving contextual multi-armed bandit problems.

Multi-Armed Bandits Thompson Sampling

Efficient Reinforcement Learning in Factored MDPs with Application to Constrained RL

no code implementations • ICLR 2021 • Xiaoyu Chen, Jiachen Hu, Lihong Li, Li-Wei Wang

The regret of FMDP-BF is shown to be exponentially smaller than that of optimal algorithms designed for non-factored MDPs, and improves on the best previous result for FMDPs (Osband and Van Roy, 2014) by a factor of $\sqrt{H|\mathcal{S}_i|}$, where $|\mathcal{S}_i|$ is the cardinality of the factored state subspace and $H$ is the planning horizon.

reinforcement-learning Reinforcement Learning (RL)

Off-policy Evaluation in Infinite-Horizon Reinforcement Learning with Latent Confounders

no code implementations • 27 Jul 2020 • Andrew Bennett, Nathan Kallus, Lihong Li, Ali Mousavi

We study an OPE problem in an infinite-horizon, ergodic Markov decision process with unobserved confounders, where states and actions can act as proxies for the unobserved confounders.

Off-policy evaluation reinforcement-learning

Off-Policy Evaluation via the Regularized Lagrangian

no code implementations • NeurIPS 2020 • Mengjiao Yang, Ofir Nachum, Bo Dai, Lihong Li, Dale Schuurmans

The recently proposed distribution correction estimation (DICE) family of estimators has advanced the state of the art in off-policy evaluation from behavior-agnostic data.

Off-policy evaluation

Black-box Off-policy Estimation for Infinite-Horizon Reinforcement Learning

no code implementations • ICLR 2020 • Ali Mousavi, Lihong Li, Qiang Liu, Denny Zhou

Off-policy estimation for long-horizon problems is important in many real-life applications such as healthcare and robotics, where high-fidelity simulators may not be available and on-policy evaluation is expensive or impossible.

reinforcement-learning Reinforcement Learning (RL)

Batch Stationary Distribution Estimation

1 code implementation • ICML 2020 • Junfeng Wen, Bo Dai, Lihong Li, Dale Schuurmans

We consider the problem of approximating the stationary distribution of an ergodic Markov chain given a set of sampled transitions.

Off-policy evaluation

GenDICE: Generalized Offline Estimation of Stationary Values

1 code implementation • ICLR 2020 • Ruiyi Zhang, Bo Dai, Lihong Li, Dale Schuurmans

An important problem that arises in reinforcement learning and Monte Carlo methods is estimating quantities defined by the stationary distribution of a Markov chain.

Data Efficient Training for Reinforcement Learning with Adaptive Behavior Policy Sharing

no code implementations • 12 Feb 2020 • Ge Liu, Rui Wu, Heng-Tze Cheng, Jing Wang, Jayden Ooi, Lihong Li, Ang Li, Wai Lok Sibon Li, Craig Boutilier, Ed Chi

Deep Reinforcement Learning (RL) has proven powerful for decision making in simulated environments.

Atari Games Decision Making +3

AlgaeDICE: Policy Gradient from Arbitrary Experience

no code implementations • 4 Dec 2019 • Ofir Nachum, Bo Dai, Ilya Kostrikov, Yin-Lam Chow, Lihong Li, Dale Schuurmans

In many real-world applications of reinforcement learning (RL), interactions with the environment are limited due to cost or feasibility.

Reinforcement Learning (RL)

Neural Contextual Bandits with UCB-based Exploration

4 code implementations • ICML 2020 • Dongruo Zhou, Lihong Li, Quanquan Gu

To the best of our knowledge, it is the first neural network-based contextual bandit algorithm with a near-optimal regret guarantee.

Efficient Exploration Multi-Armed Bandits

Doubly Robust Bias Reduction in Infinite Horizon Off-Policy Estimation

no code implementations • ICLR 2020 • Ziyang Tang, Yihao Feng, Lihong Li, Dengyong Zhou, Qiang Liu

Our method is doubly robust in that the bias vanishes when either the density ratio or the value function estimation is perfect.

Density Ratio Estimation Off-policy evaluation

NeuralUCB: Contextual Bandits with Neural Network-Based Exploration

no code implementations • 25 Sep 2019 • Dongruo Zhou, Lihong Li, Quanquan Gu

To the best of our knowledge, our algorithm is the first neural network-based contextual bandit algorithm with near-optimal regret guarantee.

Efficient Exploration Multi-Armed Bandits

Randomized Exploration in Generalized Linear Bandits

no code implementations • 21 Jun 2019 • Branislav Kveton, Manzil Zaheer, Csaba Szepesvari, Lihong Li, Mohammad Ghavamzadeh, Craig Boutilier

The first, GLM-TSL, samples a generalized linear model (GLM) from the Laplace approximation to the posterior distribution.

DualDICE: Behavior-Agnostic Estimation of Discounted Stationary Distribution Corrections

2 code implementations • NeurIPS 2019 • Ofir Nachum, Yin-Lam Chow, Bo Dai, Lihong Li

In contrast to previous approaches, our algorithm is agnostic to knowledge of the behavior policy (or policies) used to generate the dataset.

A Kernel Loss for Solving the Bellman Equation

1 code implementation • NeurIPS 2019 • Yihao Feng, Lihong Li, Qiang Liu

Value function learning plays a central role in many state-of-the-art reinforcement-learning algorithms.

Q-Learning Reinforcement Learning (RL)

Neural Logic Machines

2 code implementations • ICLR 2019 • Honghua Dong, Jiayuan Mao, Tian Lin, Chong Wang, Lihong Li, Denny Zhou

We propose the Neural Logic Machine (NLM), a neural-symbolic architecture for both inductive learning and logic reasoning.

Decision Making Inductive logic programming +1

Policy Certificates: Towards Accountable Reinforcement Learning

no code implementations • 7 Nov 2018 • Christoph Dann, Lihong Li, Wei Wei, Emma Brunskill

The performance of a reinforcement learning algorithm can vary drastically during learning because of exploration.

reinforcement-learning Reinforcement Learning (RL)

Adversarial Attacks on Stochastic Bandits

no code implementations • NeurIPS 2018 • Kwang-Sung Jun, Lihong Li, Yuzhe Ma, Xiaojin Zhu

We study adversarial attacks that manipulate the reward signals to control the actions chosen by a stochastic multi-armed bandit algorithm.

Breaking the Curse of Horizon: Infinite-Horizon Off-Policy Estimation

2 code implementations • NeurIPS 2018 • Qiang Liu, Lihong Li, Ziyang Tang, Dengyong Zhou

We consider the off-policy estimation problem of estimating the expected reward of a target policy using samples collected by a different behavior policy.

Neural Approaches to Conversational AI

no code implementations • ACL 2018 • Jianfeng Gao, Michel Galley, Lihong Li

The present paper surveys neural approaches to conversational AI that have been developed in the last few years.

Question Answering

Data Poisoning Attacks in Contextual Bandits

no code implementations • 17 Aug 2018 • Yuzhe Ma, Kwang-Sung Jun, Lihong Li, Xiaojin Zhu

We provide a general attack framework based on convex optimization and show that by slightly manipulating rewards in the data, an attacker can force the bandit algorithm to pull a target arm for a target contextual vector.

Data Poisoning Multi-Armed Bandits +2

Scalable Bilinear Pi Learning Using State and Action Features

no code implementations • ICML 2018 • Yi-Chen Chen, Lihong Li, Mengdi Wang

In this work, we study a primal-dual formulation of the ALP, and develop a scalable, model-free algorithm called bilinear $\pi$ learning for reinforcement learning when a sampling oracle is provided.

Scalable Bilinear $π$ Learning Using State and Action Features

no code implementations • 27 Apr 2018 • Yi-Chen Chen, Lihong Li, Mengdi Wang

In this work, we study a primal-dual formulation of the ALP, and develop a scalable, model-free algorithm called bilinear $\pi$ learning for reinforcement learning when a sampling oracle is provided.

Subgoal Discovery for Hierarchical Dialogue Policy Learning

no code implementations • EMNLP 2018 • Da Tang, Xiujun Li, Jianfeng Gao, Chong Wang, Lihong Li, Tony Jebara

Experiments with simulated and real users show that our approach performs competitively against a state-of-the-art method that requires human-defined subgoals.

Hierarchical Reinforcement Learning

Avoiding Catastrophic States with Intrinsic Fear

no code implementations • ICLR 2018 • Zachary C. Lipton, Kamyar Azizzadenesheli, Abhishek Kumar, Lihong Li, Jianfeng Gao, Li Deng

Many practical reinforcement learning problems contain catastrophic states that the optimal policy visits infrequently or never.

Atari Games General Classification +3

Now I Remember! Episodic Memory For Reinforcement Learning

no code implementations • ICLR 2018 • Ricky Loynd, Matthew Hausknecht, Lihong Li, Li Deng

Humans rely on episodic memory constantly, in remembering the name of someone they met 10 minutes ago, the plot of a movie as it unfolds, or where they parked the car.

reinforcement-learning Reinforcement Learning (RL)

SBEED: Convergent Reinforcement Learning with Nonlinear Function Approximation

no code implementations • ICML 2018 • Bo Dai, Albert Shaw, Lihong Li, Lin Xiao, Niao He, Zhen Liu, Jianshu Chen, Le Song

When function approximation is used, solving the Bellman optimality equation with stability guarantees has remained a major open problem in reinforcement learning for decades.

Q-Learning reinforcement-learning +1

Boosting the Actor with Dual Critic

no code implementations • ICLR 2018 • Bo Dai, Albert Shaw, Niao He, Lihong Li, Le Song

This paper proposes a new actor-critic-style algorithm called Dual Actor-Critic or Dual-AC.

Q-LDA: Uncovering Latent Patterns in Text-based Sequential Decision Processes

no code implementations • NeurIPS 2017 • Jianshu Chen, Chong Wang, Lin Xiao, Ji He, Lihong Li, Li Deng

In sequential decision making, it is often important and useful for end users to understand the underlying patterns or causes that lead to the corresponding decisions.

Decision Making Q-Learning +2

BBQ-Networks: Efficient Exploration in Deep Reinforcement Learning for Task-Oriented Dialogue Systems

no code implementations • 15 Nov 2017 • Zachary Lipton, Xiujun Li, Jianfeng Gao, Lihong Li, Faisal Ahmed, Li Deng

We present a new algorithm that significantly improves the efficiency of exploration for deep Q-learning agents in dialogue systems.

Efficient Exploration Q-Learning +4

Composite Task-Completion Dialogue Policy Learning via Hierarchical Deep Reinforcement Learning

no code implementations • EMNLP 2017 • Baolin Peng, Xiujun Li, Lihong Li, Jianfeng Gao, Asli Celikyilmaz, Sungjin Lee, Kam-Fai Wong

Building a dialogue agent to fulfill complex tasks, such as travel planning, is challenging because the agent has to learn to collectively complete multiple subtasks.

reinforcement-learning Reinforcement Learning (RL) +1

Investigation of Language Understanding Impact for Reinforcement Learning Based Dialogue Systems

no code implementations • 21 Mar 2017 • Xiujun Li, Yun-Nung Chen, Lihong Li, Jianfeng Gao, Asli Celikyilmaz

Language understanding is a key component in a spoken dialogue system.

reinforcement-learning Reinforcement Learning (RL)

End-to-End Task-Completion Neural Dialogue Systems

13 code implementations • IJCNLP 2017 • Xiujun Li, Yun-Nung Chen, Lihong Li, Jianfeng Gao, Asli Celikyilmaz

One of the major drawbacks of modularized task-completion dialogue systems is that each module is trained individually, which presents several challenges.

Chatbot

Scaffolding Networks: Incremental Learning and Teaching Through Questioning

no code implementations • 28 Feb 2017 • Asli Celikyilmaz, Li Deng, Lihong Li, Chong Wang

We introduce a new paradigm of learning for reasoning, understanding, and prediction, as well as the scaffolding network to implement this paradigm.

Incremental Learning Sentence

Provably Optimal Algorithms for Generalized Linear Contextual Bandits

no code implementations • ICML 2017 • Lihong Li, Yu Lu, Dengyong Zhou

Contextual bandits are widely used in Internet services from news recommendation to advertising, and to Web search.

Multi-Armed Bandits News Recommendation

Stochastic Variance Reduction Methods for Policy Evaluation

no code implementations • ICML 2017 • Simon S. Du, Jianshu Chen, Lihong Li, Lin Xiao, Dengyong Zhou

Policy evaluation is a crucial step in many reinforcement-learning procedures, which estimates a value function that predicts states' long-term value under a given policy.

Reinforcement Learning (RL)

A User Simulator for Task-Completion Dialogues

10 code implementations • 17 Dec 2016 • Xiujun Li, Zachary C. Lipton, Bhuwan Dhingra, Lihong Li, Jianfeng Gao, Yun-Nung Chen

Then, one can train reinforcement learning agents in an online fashion as they interact with the simulator.

reinforcement-learning Reinforcement Learning (RL) +2

Active Learning with Oracle Epiphany

no code implementations • NeurIPS 2016 • Tzu-Kuo Huang, Lihong Li, Ara Vartanian, Saleema Amershi, Jerry Zhu

We present a theoretical analysis of active learning with more realistic interactions with human oracles.

Active Learning

Neuro-Symbolic Program Synthesis

no code implementations • 6 Nov 2016 • Emilio Parisotto, Abdel-rahman Mohamed, Rishabh Singh, Lihong Li, Dengyong Zhou, Pushmeet Kohli

While achieving impressive results, these approaches have a number of important limitations: (a) they are computationally expensive and hard to train, (b) a model has to be trained for each task (program) separately, and (c) it is hard to interpret or verify the correctness of the learnt mapping (as it is defined by a neural network).

Program induction Program Synthesis

Combating Reinforcement Learning's Sisyphean Curse with Intrinsic Fear

no code implementations • 3 Nov 2016 • Zachary C. Lipton, Kamyar Azizzadenesheli, Abhishek Kumar, Lihong Li, Jianfeng Gao, Li Deng

We introduce intrinsic fear (IF), a learned reward shaping that guards DRL agents against periodic catastrophes.

Atari Games General Classification +3

Towards End-to-End Reinforcement Learning of Dialogue Agents for Information Access

1 code implementation • ACL 2017 • Bhuwan Dhingra, Lihong Li, Xiujun Li, Jianfeng Gao, Yun-Nung Chen, Faisal Ahmed, Li Deng

In this paper, we address this limitation by replacing symbolic queries with an induced "soft" posterior distribution over the KB that indicates which entities the user is interested in.

reinforcement-learning Reinforcement Learning (RL) +2

BBQ-Networks: Efficient Exploration in Deep Reinforcement Learning for Task-Oriented Dialogue Systems

no code implementations • 17 Aug 2016 • Zachary C. Lipton, Xiujun Li, Jianfeng Gao, Lihong Li, Faisal Ahmed, Li Deng

We present a new algorithm that significantly improves the efficiency of exploration for deep Q-learning agents in dialogue systems.

Efficient Exploration Q-Learning +4

Deep Reinforcement Learning with a Combinatorial Action Space for Predicting Popular Reddit Threads

1 code implementation • EMNLP 2016 • Ji He, Mari Ostendorf, Xiaodong He, Jianshu Chen, Jianfeng Gao, Lihong Li, Li Deng

We introduce an online popularity prediction and tracking task as a benchmark task for reinforcement learning with a combinatorial, natural language action space.

reinforcement-learning Reinforcement Learning (RL)

Deep Reinforcement Learning with a Natural Language Action Space

3 code implementations • ACL 2016 • Ji He, Jianshu Chen, Xiaodong He, Jianfeng Gao, Lihong Li, Li Deng, Mari Ostendorf

This paper introduces a novel architecture for reinforcement learning with deep neural networks designed to handle state and action spaces characterized by natural language, as found in text-based games.

Q-Learning reinforcement-learning +2

Doubly Robust Off-policy Value Evaluation for Reinforcement Learning

2 code implementations • 11 Nov 2015 • Nan Jiang, Lihong Li

We study the problem of off-policy value evaluation in reinforcement learning (RL), where one aims to estimate the value of a new policy based on data collected by a different policy.

Decision Making reinforcement-learning +1

Recurrent Reinforcement Learning: A Hybrid Approach

no code implementations • 10 Sep 2015 • Xiujun Li, Lihong Li, Jianfeng Gao, Xiaodong He, Jianshu Chen, Li Deng, Ji He

Successful applications of reinforcement learning in real-world problems often require dealing with partially observable states.

reinforcement-learning Reinforcement Learning (RL)

The Online Coupon-Collector Problem and Its Application to Lifelong Reinforcement Learning

no code implementations • 10 Jun 2015 • Emma Brunskill, Lihong Li

Transferring knowledge across a sequence of related tasks is an important challenge in reinforcement learning (RL).

reinforcement-learning Reinforcement Learning (RL)

An efficient algorithm for contextual bandits with knapsacks, and an extension to concave objectives

no code implementations • 10 Jun 2015 • Shipra Agrawal, Nikhil R. Devanur, Lihong Li

This problem was introduced by Badanidiyuru et al. (2014), who gave a computationally inefficient algorithm with near-optimal regret bounds for it.

Multi-Armed Bandits Open-Ended Question Answering

On the Prior Sensitivity of Thompson Sampling

no code implementations • 10 Jun 2015 • Che-Yu Liu, Lihong Li

The empirically successful Thompson Sampling algorithm for stochastic bandits has drawn much interest in understanding its theoretical properties.

Thompson Sampling

Evaluation of Explore-Exploit Policies in Multi-result Ranking Systems

no code implementations • 28 Apr 2015 • Dragomir Yankov, Pavel Berkhin, Lihong Li

An offline framework is thus necessary to decide which policy to apply in a production environment, and how to apply it, to ensure a positive outcome.

News Recommendation Thompson Sampling

Doubly Robust Policy Evaluation and Optimization

no code implementations • 10 Mar 2015 • Miroslav Dudík, Dumitru Erhan, John Langford, Lihong Li

As such, we expect the doubly robust approach to become common practice in policy evaluation and optimization.

Decision Making Multi-Armed Bandits

On Minimax Optimal Offline Policy Evaluation

no code implementations • 12 Sep 2014 • Lihong Li, Remi Munos, Csaba Szepesvari

This paper studies the off-policy evaluation problem, where one aims to estimate the value of a target policy based on a sample of observations collected by another policy.

Multi-Armed Bandits Off-policy evaluation

Counterfactual Estimation and Optimization of Click Metrics for Search Engines

no code implementations • 7 Mar 2014 • Lihong Li, Shunbao Chen, Jim Kleban, Ankur Gupta

Optimizing an interactive system against a predefined online metric is particularly challenging, when the metric is computed from user feedback such as clicks and payments.

Causal Inference counterfactual +1

Taming the Monster: A Fast and Simple Algorithm for Contextual Bandits

1 code implementation • 4 Feb 2014 • Alekh Agarwal, Daniel Hsu, Satyen Kale, John Langford, Lihong Li, Robert E. Schapire

We present a new algorithm for the contextual bandit learning problem, where the learner repeatedly takes one of $K$ actions in response to the observed context, and observes the reward only for that chosen action.

General Classification Multi-Armed Bandits

Efficient Online Bootstrapping for Large Scale Learning

no code implementations • 18 Dec 2013 • Zhen Qin, Vaclav Petricek, Nikos Karampatziakis, Lihong Li, John Langford

Bootstrapping is a useful technique for estimating the uncertainty of a predictor, for example, confidence intervals for prediction.

Generalized Thompson Sampling for Contextual Bandits

no code implementations • 27 Oct 2013 • Lihong Li

Similar to most expert-learning algorithms, Generalized Thompson Sampling uses a loss function to adjust the experts' weights.

Multi-Armed Bandits Thompson Sampling

Sample Complexity of Multi-task Reinforcement Learning

no code implementations • 26 Sep 2013 • Emma Brunskill, Lihong Li

Transferring knowledge across a sequence of reinforcement-learning tasks is challenging, and has a number of important applications.

reinforcement-learning Reinforcement Learning (RL)

An Empirical Evaluation of Thompson Sampling

no code implementations • NeurIPS 2011 • Olivier Chapelle, Lihong Li

Thompson sampling is one of the oldest heuristics to address the exploration/exploitation trade-off, but it is surprisingly not very popular in the literature.

Multi-Armed Bandits Thompson Sampling

Doubly Robust Policy Evaluation and Learning

1 code implementation • 23 Mar 2011 • Miroslav Dudik, John Langford, Lihong Li

The key challenge is that the past data typically does not faithfully represent proportions of actions taken by a new policy.

Decision Making Multi-Armed Bandits

Parallelized Stochastic Gradient Descent

no code implementations • NeurIPS 2010 • Martin Zinkevich, Markus Weimer, Lihong Li, Alex J. Smola

With the increase in available data, parallel machine learning has become an increasingly pressing problem.

Unbiased Offline Evaluation of Contextual-bandit-based News Article Recommendation Algorithms

4 code implementations • 31 Mar 2010 • Lihong Li, Wei Chu, John Langford, Xuanhui Wang

Offline evaluation of the effectiveness of new algorithms in these applications is critical for protecting online user experiences but very challenging due to their "partial-label" nature.

News Recommendation Recommendation Systems

A Contextual-Bandit Approach to Personalized News Article Recommendation

11 code implementations • 28 Feb 2010 • Lihong Li, Wei Chu, John Langford, Robert E. Schapire

In this work, we model personalized recommendation of news articles as a contextual bandit problem, a principled approach in which a learning algorithm sequentially selects articles to serve users based on contextual information about the users and articles, while simultaneously adapting its article-selection strategy based on user-click feedback to maximize total user clicks.

Collaborative Filtering Learning Theory

Learning from Logged Implicit Exploration Data

no code implementations • NeurIPS 2010 • Alex Strehl, John Langford, Sham Kakade, Lihong Li

We provide a sound and consistent foundation for the use of nonrandom exploration data in "contextual bandit" or "partially labeled" settings where only the value of a chosen action is learned.

Sparse Online Learning via Truncated Gradient

no code implementations • NeurIPS 2008 • John Langford, Lihong Li, Tong Zhang

We propose a general method called truncated gradient to induce sparsity in the weights of online-learning algorithms with convex loss.
