no code implementations • ICLR 2019 • Honghua Dong, Jiayuan Mao, Xinyue Cui, Lihong Li

In this paper, we advocate the use of explicit memory for efficient exploration in reinforcement learning.

no code implementations • 16 Jul 2023 • Xin Guo, Lihong Li, Sareh Nabi, Rabih Salhab, Junzi Zhang

Motivated by bid recommendation in online ad auctions, this paper considers a general class of multi-level and multi-agent games, with two major characteristics: one is a large number of anonymous agents, and the other is the intricate interplay between competition and cooperation.

no code implementations • 29 Dec 2022 • Riashat Islam, Samarth Sinha, Homanga Bharadhwaj, Samin Yeasar Arnob, Zhuoran Yang, Animesh Garg, Zhaoran Wang, Lihong Li, Doina Precup

Learning policies from fixed offline datasets is a key challenge to scale up reinforcement learning (RL) algorithms towards practical applications.

no code implementations • 14 Oct 2022 • Ziyang Tang, Yiheng Duan, Stephanie Zhang, Lihong Li

Randomized experiments (a. k. a.

no code implementations • ICLR 2022 • Xiaoyu Chen, Jiachen Hu, Chi Jin, Lihong Li, LiWei Wang

Reinforcement learning encounters many challenges when applied directly in the real world.

no code implementations • 1 Jul 2021 • Yi Liu, Lihong Li

The rich body of Bandit literature not only offers a diverse toolbox of algorithms, but also makes it hard for a practitioner to find the right solution to solve the problem at hand.

no code implementations • 6 Apr 2021 • Chenjun Xiao, Yifan Wu, Tor Lattimore, Bo Dai, Jincheng Mei, Lihong Li, Csaba Szepesvari, Dale Schuurmans

First, we introduce a class of confidence-adjusted index algorithms that unifies optimistic and pessimistic principles in a common framework, which enables a general analysis.

no code implementations • 8 Feb 2021 • Jiachen Hu, Xiaoyu Chen, Chi Jin, Lihong Li, LiWei Wang

This paper studies representation learning for multi-task linear bandits and multi-task episodic RL with linear value function approximation.

no code implementations • 1 Jan 2021 • Riashat Islam, Samarth Sinha, Homanga Bharadhwaj, Samin Yeasar Arnob, Zhuoran Yang, Zhaoran Wang, Animesh Garg, Lihong Li, Doina Precup

Learning policies from fixed offline datasets is a key challenge to scale up reinforcement learning (RL) algorithms towards practical applications.

no code implementations • NeurIPS 2020 • Jincheng Mei, Chenjun Xiao, Bo Dai, Lihong Li, Csaba Szepesvari, Dale Schuurmans

Both findings are based on an analysis of convergence rates using the Non-uniform \L{}ojasiewicz (N\L{}) inequalities.

no code implementations • NeurIPS 2020 • Bo Dai, Ofir Nachum, Yinlam Chow, Lihong Li, Csaba Szepesvári, Dale Schuurmans

We study high-confidence behavior-agnostic off-policy evaluation in reinforcement learning, where the goal is to estimate a confidence interval on a target policy's value, given only access to a static experience dataset collected by unknown behavior policies.

2 code implementations • ICLR 2021 • Weitong Zhang, Dongruo Zhou, Lihong Li, Quanquan Gu

Thompson Sampling (TS) is one of the most effective algorithms for solving contextual multi-armed bandit problems.

no code implementations • ICLR 2021 • Xiaoyu Chen, Jiachen Hu, Lihong Li, Li-Wei Wang

The regret of FMDP-BF is shown to be exponentially smaller than that of optimal algorithms designed for non-factored MDPs, and improves on the best previous result for FMDPs~\citep{osband2014near} by a factored of $\sqrt{H|\mathcal{S}_i|}$, where $|\mathcal{S}_i|$ is the cardinality of the factored state subspace and $H$ is the planning horizon.

no code implementations • 27 Jul 2020 • Andrew Bennett, Nathan Kallus, Lihong Li, Ali Mousavi

We study an OPE problem in an infinite-horizon, ergodic Markov decision process with unobserved confounders, where states and actions can act as proxies for the unobserved confounders.

no code implementations • NeurIPS 2020 • Mengjiao Yang, Ofir Nachum, Bo Dai, Lihong Li, Dale Schuurmans

The recently proposed distribution correction estimation (DICE) family of estimators has advanced the state of the art in off-policy evaluation from behavior-agnostic data.

no code implementations • ICLR 2020 • Ali Mousavi, Lihong Li, Qiang Liu, Denny Zhou

Off-policy estimation for long-horizon problems is important in many real-life applications such as healthcare and robotics, where high-fidelity simulators may not be available and on-policy evaluation is expensive or impossible.

1 code implementation • ICML 2020 • Junfeng Wen, Bo Dai, Lihong Li, Dale Schuurmans

We consider the problem of approximating the stationary distribution of an ergodic Markov chain given a set of sampled transitions.

1 code implementation • ICLR 2020 • Ruiyi Zhang, Bo Dai, Lihong Li, Dale Schuurmans

An important problem that arises in reinforcement learning and Monte Carlo methods is estimating quantities defined by the stationary distribution of a Markov chain.

no code implementations • 12 Feb 2020 • Ge Liu, Rui Wu, Heng-Tze Cheng, Jing Wang, Jayden Ooi, Lihong Li, Ang Li, Wai Lok Sibon Li, Craig Boutilier, Ed Chi

Deep Reinforcement Learning (RL) is proven powerful for decision making in simulated environments.

no code implementations • 4 Dec 2019 • Ofir Nachum, Bo Dai, Ilya Kostrikov, Yin-Lam Chow, Lihong Li, Dale Schuurmans

In many real-world applications of reinforcement learning (RL), interactions with the environment are limited due to cost or feasibility.

4 code implementations • ICML 2020 • Dongruo Zhou, Lihong Li, Quanquan Gu

To the best of our knowledge, it is the first neural network-based contextual bandit algorithm with a near-optimal regret guarantee.

no code implementations • ICLR 2020 • Ziyang Tang, Yihao Feng, Lihong Li, Dengyong Zhou, Qiang Liu

Our method is doubly robust in that the bias vanishes when either the density ratio or the value function estimation is perfect.

no code implementations • 25 Sep 2019 • Dongruo Zhou, Lihong Li, Quanquan Gu

To the best of our knowledge, our algorithm is the first neural network-based contextual bandit algorithm with near-optimal regret guarantee.

no code implementations • 21 Jun 2019 • Branislav Kveton, Manzil Zaheer, Csaba Szepesvari, Lihong Li, Mohammad Ghavamzadeh, Craig Boutilier

The first, GLM-TSL, samples a generalized linear model (GLM) from the Laplace approximation to the posterior distribution.

2 code implementations • NeurIPS 2019 • Ofir Nachum, Yin-Lam Chow, Bo Dai, Lihong Li

In contrast to previous approaches, our algorithm is agnostic to knowledge of the behavior policy (or policies) used to generate the dataset.

1 code implementation • NeurIPS 2019 • Yihao Feng, Lihong Li, Qiang Liu

Value function learning plays a central role in many state-of-the-art reinforcement-learning algorithms.

2 code implementations • ICLR 2019 • Honghua Dong, Jiayuan Mao, Tian Lin, Chong Wang, Lihong Li, Denny Zhou

We propose the Neural Logic Machine (NLM), a neural-symbolic architecture for both inductive learning and logic reasoning.

no code implementations • 7 Nov 2018 • Christoph Dann, Lihong Li, Wei Wei, Emma Brunskill

The performance of a reinforcement learning algorithm can vary drastically during learning because of exploration.

no code implementations • NeurIPS 2018 • Kwang-Sung Jun, Lihong Li, Yuzhe ma, Xiaojin Zhu

We study adversarial attacks that manipulate the reward signals to control the actions chosen by a stochastic multi-armed bandit algorithm.

2 code implementations • NeurIPS 2018 • Qiang Liu, Lihong Li, Ziyang Tang, Dengyong Zhou

We consider the off-policy estimation problem of estimating the expected reward of a target policy using samples collected by a different behavior policy.

no code implementations • ACL 2018 • Jianfeng Gao, Michel Galley, Lihong Li

The present paper surveys neural approaches to conversational AI that have been developed in the last few years.

no code implementations • 17 Aug 2018 • Yuzhe Ma, Kwang-Sung Jun, Lihong Li, Xiaojin Zhu

We provide a general attack framework based on convex optimization and show that by slightly manipulating rewards in the data, an attacker can force the bandit algorithm to pull a target arm for a target contextual vector.

no code implementations • ICML 2018 • Yi-Chen Chen, Lihong Li, Mengdi Wang

In this work, we study a primal-dual formulation of the ALP, and develop a scalable, model-free algorithm called bilinear $\pi$ learning for reinforcement learning when a sampling oracle is provided.

no code implementations • 27 Apr 2018 • Yi-Chen Chen, Lihong Li, Mengdi Wang

In this work, we study a primal-dual formulation of the ALP, and develop a scalable, model-free algorithm called bilinear $\pi$ learning for reinforcement learning when a sampling oracle is provided.

no code implementations • EMNLP 2018 • Da Tang, Xiujun Li, Jianfeng Gao, Chong Wang, Lihong Li, Tony Jebara

Experiments with simulated and real users show that our approach performs competitively against a state-of-the-art method that requires human-defined subgoals.

no code implementations • ICLR 2018 • Zachary C. Lipton, Kamyar Azizzadenesheli, Abhishek Kumar, Lihong Li, Jianfeng Gao, Li Deng

Many practical reinforcement learning problems contain catastrophic states that the optimal policy visits infrequently or never.

no code implementations • ICLR 2018 • Ricky Loynd, Matthew Hausknecht, Lihong Li, Li Deng

Humans rely on episodic memory constantly, in remembering the name of someone they met 10 minutes ago, the plot of a movie as it unfolds, or where they parked the car.

no code implementations • ICLR 2018 • Bo Dai, Albert Shaw, Niao He, Lihong Li, Le Song

This paper proposes a new actor-critic-style algorithm called Dual Actor-Critic or Dual-AC.

no code implementations • ICML 2018 • Bo Dai, Albert Shaw, Lihong Li, Lin Xiao, Niao He, Zhen Liu, Jianshu Chen, Le Song

When function approximation is used, solving the Bellman optimality equation with stability guarantees has remained a major open problem in reinforcement learning for decades.

no code implementations • NeurIPS 2017 • Jianshu Chen, Chong Wang, Lin Xiao, Ji He, Lihong Li, Li Deng

In sequential decision making, it is often important and useful for end users to understand the underlying patterns or causes that lead to the corresponding decisions.

no code implementations • 15 Nov 2017 • Zachary Lipton, Xiujun Li, Jianfeng Gao, Lihong Li, Faisal Ahmed, Li Deng

We present a new algorithm that significantly improves the efficiency of exploration for deep Q-learning agents in dialogue systems.

no code implementations • EMNLP 2017 • Baolin Peng, Xiujun Li, Lihong Li, Jianfeng Gao, Asli Celikyilmaz, Sungjin Lee, Kam-Fai Wong

Building a dialogue agent to fulfill complex tasks, such as travel planning, is challenging because the agent has to learn to collectively complete multiple subtasks.

no code implementations • 21 Mar 2017 • Xiujun Li, Yun-Nung Chen, Lihong Li, Jianfeng Gao, Asli Celikyilmaz

Language understanding is a key component in a spoken dialogue system.

13 code implementations • IJCNLP 2017 • Xiujun Li, Yun-Nung Chen, Lihong Li, Jianfeng Gao, Asli Celikyilmaz

One of the major drawbacks of modularized task-completion dialogue systems is that each module is trained individually, which presents several challenges.

no code implementations • 28 Feb 2017 • Asli Celikyilmaz, Li Deng, Lihong Li, Chong Wang

We introduce a new paradigm of learning for reasoning, understanding, and prediction, as well as the scaffolding network to implement this paradigm.

no code implementations • ICML 2017 • Lihong Li, Yu Lu, Dengyong Zhou

Contextual bandits are widely used in Internet services from news recommendation to advertising, and to Web search.

no code implementations • ICML 2017 • Simon S. Du, Jianshu Chen, Lihong Li, Lin Xiao, Dengyong Zhou

Policy evaluation is a crucial step in many reinforcement-learning procedures, which estimates a value function that predicts states' long-term value under a given policy.

10 code implementations • 17 Dec 2016 • Xiujun Li, Zachary C. Lipton, Bhuwan Dhingra, Lihong Li, Jianfeng Gao, Yun-Nung Chen

Then, one can train reinforcement learning agents in an online fashion as they interact with the simulator.

no code implementations • NeurIPS 2016 • Tzu-Kuo Huang, Lihong Li, Ara Vartanian, Saleema Amershi, Jerry Zhu

We present a theoretical analysis of active learning with more realistic interactions with human oracles.

no code implementations • 6 Nov 2016 • Emilio Parisotto, Abdel-rahman Mohamed, Rishabh Singh, Lihong Li, Dengyong Zhou, Pushmeet Kohli

While achieving impressive results, these approaches have a number of important limitations: (a) they are computationally expensive and hard to train, (b) a model has to be trained for each task (program) separately, and (c) it is hard to interpret or verify the correctness of the learnt mapping (as it is defined by a neural network).

no code implementations • 3 Nov 2016 • Zachary C. Lipton, Kamyar Azizzadenesheli, Abhishek Kumar, Lihong Li, Jianfeng Gao, Li Deng

We introduce intrinsic fear (IF), a learned reward shaping that guards DRL agents against periodic catastrophes.

1 code implementation • ACL 2017 • Bhuwan Dhingra, Lihong Li, Xiujun Li, Jianfeng Gao, Yun-Nung Chen, Faisal Ahmed, Li Deng

In this paper, we address this limitation by replacing symbolic queries with an induced "soft" posterior distribution over the KB that indicates which entities the user is interested in.

no code implementations • 17 Aug 2016 • Zachary C. Lipton, Xiujun Li, Jianfeng Gao, Lihong Li, Faisal Ahmed, Li Deng

We present a new algorithm that significantly improves the efficiency of exploration for deep Q-learning agents in dialogue systems.

1 code implementation • EMNLP 2016 • Ji He, Mari Ostendorf, Xiaodong He, Jianshu Chen, Jianfeng Gao, Lihong Li, Li Deng

We introduce an online popularity prediction and tracking task as a benchmark task for reinforcement learning with a combinatorial, natural language action space.

3 code implementations • ACL 2016 • Ji He, Jianshu Chen, Xiaodong He, Jianfeng Gao, Lihong Li, Li Deng, Mari Ostendorf

This paper introduces a novel architecture for reinforcement learning with deep neural networks designed to handle state and action spaces characterized by natural language, as found in text-based games.

2 code implementations • 11 Nov 2015 • Nan Jiang, Lihong Li

We study the problem of off-policy value evaluation in reinforcement learning (RL), where one aims to estimate the value of a new policy based on data collected by a different policy.

no code implementations • 10 Sep 2015 • Xiujun Li, Lihong Li, Jianfeng Gao, Xiaodong He, Jianshu Chen, Li Deng, Ji He

Successful applications of reinforcement learning in real-world problems often require dealing with partially observable states.

no code implementations • 10 Jun 2015 • Emma Brunskill, Lihong Li

Transferring knowledge across a sequence of related tasks is an important challenge in reinforcement learning (RL).

no code implementations • 10 Jun 2015 • Shipra Agrawal, Nikhil R. Devanur, Lihong Li

This problem was introduced by Badanidiyuru et al. (2014), who gave a computationally inefficient algorithm with near-optimal regret bounds for it.

no code implementations • 10 Jun 2015 • Che-Yu Liu, Lihong Li

The empirically successful Thompson Sampling algorithm for stochastic bandits has drawn much interest in understanding its theoretical properties.

no code implementations • 28 Apr 2015 • Dragomir Yankov, Pavel Berkhin, Lihong Li

An offline framework is thus necessary to let us decide what policy and how we should apply in a production environment to ensure positive outcome.

no code implementations • 10 Mar 2015 • Miroslav Dudík, Dumitru Erhan, John Langford, Lihong Li

As such, we expect the doubly robust approach to become common practice in policy evaluation and optimization.

no code implementations • 12 Sep 2014 • Lihong Li, Remi Munos, Csaba Szepesvari

This paper studies the off-policy evaluation problem, where one aims to estimate the value of a target policy based on a sample of observations collected by another policy.

no code implementations • 7 Mar 2014 • Lihong Li, Shunbao Chen, Jim Kleban, Ankur Gupta

Optimizing an interactive system against a predefined online metric is particularly challenging, when the metric is computed from user feedback such as clicks and payments.

1 code implementation • 4 Feb 2014 • Alekh Agarwal, Daniel Hsu, Satyen Kale, John Langford, Lihong Li, Robert E. Schapire

We present a new algorithm for the contextual bandit learning problem, where the learner repeatedly takes one of $K$ actions in response to the observed context, and observes the reward only for that chosen action.

no code implementations • 18 Dec 2013 • Zhen Qin, Vaclav Petricek, Nikos Karampatziakis, Lihong Li, John Langford

Bootstrapping is a useful technique for estimating the uncertainty of a predictor, for example, confidence intervals for prediction.

no code implementations • 27 Oct 2013 • Lihong Li

Similar to most expert-learning algorithms, Generalized Thompson Sampling uses a loss function to adjust the experts' weights.

no code implementations • 26 Sep 2013 • Emma Brunskill, Lihong Li

Transferring knowledge across a sequence of reinforcement-learning tasks is challenging, and has a number of important applications.

no code implementations • NeurIPS 2011 • Olivier Chapelle, Lihong Li

Thompson sampling is one of oldest heuristic to address the exploration / exploitation trade-off, but it is surprisingly not very popular in the literature.

1 code implementation • 23 Mar 2011 • Miroslav Dudik, John Langford, Lihong Li

The key challenge is that the past data typically does not faithfully represent proportions of actions taken by a new policy.

no code implementations • NeurIPS 2010 • Martin Zinkevich, Markus Weimer, Lihong Li, Alex J. Smola

With the increase in available data parallel machine learning has become an increasingly pressing problem.

4 code implementations • 31 Mar 2010 • Lihong Li, Wei Chu, John Langford, Xuanhui Wang

\emph{Offline} evaluation of the effectiveness of new algorithms in these applications is critical for protecting online user experiences but very challenging due to their "partial-label" nature.

11 code implementations • 28 Feb 2010 • Lihong Li, Wei Chu, John Langford, Robert E. Schapire

In this work, we model personalized recommendation of news articles as a contextual bandit problem, a principled approach in which a learning algorithm sequentially selects articles to serve users based on contextual information about the users and articles, while simultaneously adapting its article-selection strategy based on user-click feedback to maximize total user clicks.

no code implementations • NeurIPS 2010 • Alex Strehl, John Langford, Sham Kakade, Lihong Li

We provide a sound and consistent foundation for the use of \emph{nonrandom} exploration data in "contextual bandit" or "partially labeled" settings where only the value of a chosen action is learned.

no code implementations • NeurIPS 2008 • John Langford, Lihong Li, Tong Zhang

We propose a general method called truncated gradient to induce sparsity in the weights of online-learning algorithms with convex loss.

Cannot find the paper you are looking for? You can
Submit a new open access paper.

Contact us on:
hello@paperswithcode.com
.
Papers With Code is a free resource with all data licensed under CC-BY-SA.