Search Results for author: Benjamin Van Roy

Found 78 papers, 12 papers with code

A Tutorial on Thompson Sampling

2 code implementations • 7 Jul 2017 • Daniel Russo, Benjamin Van Roy, Abbas Kazerouni, Ian Osband, Zheng Wen

Thompson sampling is an algorithm for online decision problems where actions are taken sequentially in a manner that must balance between exploiting what is known to maximize immediate performance and investing to accumulate new information that may improve future performance.
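
As a minimal illustration of this balance (a sketch only, not code from the tutorial's implementations), here is Thompson sampling for a Bernoulli bandit with Beta(1, 1) priors:

```python
import numpy as np

def thompson_sampling_bernoulli(true_probs, horizon=1000, seed=0):
    """Thompson sampling for a Bernoulli bandit with Beta(1, 1) priors."""
    rng = np.random.default_rng(seed)
    k = len(true_probs)
    alpha, beta = np.ones(k), np.ones(k)  # Beta posterior parameters
    total_reward = 0.0
    for _ in range(horizon):
        theta = rng.beta(alpha, beta)            # sample plausible mean rewards
        a = int(np.argmax(theta))                # exploit the sampled beliefs
        r = float(rng.random() < true_probs[a])  # observe a Bernoulli reward
        alpha[a] += r                            # update the posterior
        beta[a] += 1.0 - r
        total_reward += r
    return total_reward

print(thompson_sampling_bernoulli([0.1, 0.5, 0.7]))
```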

Active Learning Product Recommendation +1

Epistemic Neural Networks

1 code implementation • NeurIPS 2023 • Ian Osband, Zheng Wen, Seyed Mohammad Asghari, Vikranth Dwaracherla, Morteza Ibrahimi, Xiuyuan Lu, Benjamin Van Roy

We introduce the epinet: an architecture that can supplement any conventional neural network, including large pretrained models, and can be trained with modest incremental computation to estimate uncertainty.
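
A rough PyTorch sketch of the idea: a small MLP that consumes detached base-network features together with an epistemic index z and produces an additive correction to the base logits. Layer sizes, the index distribution, and the wiring are illustrative assumptions, not the paper's exact architecture:

```python
import torch
import torch.nn as nn

class Epinet(nn.Module):
    """Sketch of an epinet: a small additive network conditioned on an
    epistemic index z. Sizes and wiring are illustrative assumptions."""

    def __init__(self, feature_dim, num_classes, index_dim=8, hidden=50):
        super().__init__()
        self.num_classes = num_classes
        self.index_dim = index_dim
        self.mlp = nn.Sequential(
            nn.Linear(feature_dim + index_dim, hidden),
            nn.ReLU(),
            nn.Linear(hidden, num_classes * index_dim),
        )

    def forward(self, features, z):
        # Detach features so the epinet trains with modest incremental compute,
        # without backpropagating into the (possibly large) base model.
        batch = features.shape[0]
        x = torch.cat([features.detach(), z.unsqueeze(0).expand(batch, -1)], dim=-1)
        out = self.mlp(x).view(batch, self.num_classes, self.index_dim)
        return out @ z  # (batch, num_classes) additive correction to base logits

# Varying z (e.g. z = torch.randn(8)) varies the combined prediction
# base_logits + epinet(base_features, z), expressing epistemic uncertainty.
```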

Fine-Tuning Language Models via Epistemic Neural Networks

1 code implementation • 3 Nov 2022 • Ian Osband, Seyed Mohammad Asghari, Benjamin Van Roy, Nat McAleese, John Aslanides, Geoffrey Irving

Language models often pre-train on large unsupervised text corpora, then fine-tune on additional task-specific data.

Active Learning Language Modelling

Approximate Thompson Sampling via Epistemic Neural Networks

1 code implementation • 18 Feb 2023 • Ian Osband, Zheng Wen, Seyed Mohammad Asghari, Vikranth Dwaracherla, Morteza Ibrahimi, Xiuyuan Lu, Benjamin Van Roy

Further, we demonstrate that the epinet -- a small additive network that estimates uncertainty -- matches the performance of large ensembles at orders of magnitude lower computational cost.

Thompson Sampling

Posterior Sampling for Reinforcement Learning Without Episodes

1 code implementation • 9 Aug 2016 • Ian Osband, Benjamin Van Roy

We review similar results for optimistic algorithms in infinite-horizon problems (Jaksch et al. 2010; Bartlett and Tewari 2009; Abbasi-Yadkori and Szepesvari 2011), with particular attention to dynamic episode growth.

reinforcement-learning Reinforcement Learning (RL)

Generalization and Exploration via Randomized Value Functions

1 code implementation • 4 Feb 2014 • Ian Osband, Benjamin Van Roy, Zheng Wen

We propose randomized least-squares value iteration (RLSVI) -- a new reinforcement learning algorithm designed to explore and generalize efficiently via linearly parameterized value functions.
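
One backward stage of the perturbed regression at the heart of RLSVI can be sketched as follows; the noise scalings and prior-perturbation scheme here are illustrative assumptions rather than the paper's exact choices:

```python
import numpy as np

def rlsvi_stage(phi, targets, sigma=1.0, lam=1.0, rng=None):
    """One backward stage of randomized least-squares value iteration (sketch).

    phi: (n, d) state-action features; targets: rewards plus next-stage values.
    Gaussian perturbations of the targets and the prior turn ridge regression
    into a posterior-style sample of the value-function weights.
    """
    rng = rng or np.random.default_rng()
    n, d = phi.shape
    noisy = targets + rng.normal(0.0, sigma, size=n)   # randomize regression targets
    A = phi.T @ phi / sigma**2 + lam * np.eye(d)
    b = phi.T @ noisy / sigma**2 + np.sqrt(lam) * rng.normal(size=d)  # randomize prior
    return np.linalg.solve(A, b)  # sampled weights drive exploration
```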

Efficient Exploration reinforcement-learning +1

Langevin DQN

2 code implementations • 17 Feb 2020 • Vikranth Dwaracherla, Benjamin Van Roy

Algorithms that tackle deep exploration -- an important challenge in reinforcement learning -- have relied on epistemic uncertainty representation through ensembles or other hypermodels, exploration bonuses, or visitation count distributions.
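
Langevin DQN instead injects noise into the learning updates themselves. A minimal sketch of an SGLD-style step, with the noise scaling as an assumption:

```python
import torch

def langevin_step(params, loss, lr=1e-3, temperature=1.0):
    """One Langevin-style update: a gradient step plus injected Gaussian noise,
    so the iterates wander over plausible value functions instead of collapsing
    to a point estimate. Noise scaling follows the usual SGLD convention."""
    grads = torch.autograd.grad(loss, params)
    with torch.no_grad():
        for p, g in zip(params, grads):
            p -= lr * g
            p += torch.randn_like(p) * (2.0 * lr * temperature) ** 0.5
```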

Computational Efficiency Open-Ended Question Answering +2

Deep Exploration via Randomized Value Functions

no code implementations • 22 Mar 2017 • Ian Osband, Benjamin Van Roy, Daniel Russo, Zheng Wen

We study the use of randomized value functions to guide deep exploration in reinforcement learning.

Efficient Exploration reinforcement-learning +1

An Information-Theoretic Analysis for Thompson Sampling with Many Actions

no code implementations • NeurIPS 2018 • Shi Dong, Benjamin Van Roy

We also offer a bound for the logistic bandit that dramatically improves on the best previously available, though this bound depends on an information-theoretic statistic that we have only been able to quantify via computation.

Thompson Sampling

Scalable Coordinated Exploration in Concurrent Reinforcement Learning

1 code implementation • NeurIPS 2018 • Maria Dimakopoulou, Ian Osband, Benjamin Van Roy

We consider a team of reinforcement learning agents that concurrently operate in a common environment, and we develop an approach to efficient coordinated exploration that is suitable for problems of practical scale.

reinforcement-learning Reinforcement Learning (RL)

Satisficing in Time-Sensitive Bandit Learning

no code implementations • 7 Mar 2018 • Daniel Russo, Benjamin Van Roy

Much of the recent literature on bandit learning focuses on algorithms that aim to converge on an optimal action.

Thompson Sampling

Gaussian-Dirichlet Posterior Dominance in Sequential Learning

no code implementations • 14 Feb 2017 • Ian Osband, Benjamin Van Roy

We consider the problem of sequential learning from categorical observations bounded in [0, 1].

Ensemble Sampling

no code implementations • NeurIPS 2017 • Xiuyuan Lu, Benjamin Van Roy

Thompson sampling has emerged as an effective heuristic for a broad range of online decision problems.
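
A minimal sketch for an independent-arm Gaussian bandit, where M perturbed models stand in for posterior samples; the prior and perturbation scales are illustrative assumptions:

```python
import numpy as np

class EnsembleSampler:
    """Ensemble sampling sketch: M perturbed models stand in for a posterior."""

    def __init__(self, num_models, num_actions, prior_scale=1.0, seed=0):
        self.rng = np.random.default_rng(seed)
        # Each model starts from an independent draw from the prior.
        self.means = self.rng.normal(0.0, prior_scale, size=(num_models, num_actions))
        self.counts = np.ones((num_models, num_actions))

    def act(self):
        m = self.rng.integers(len(self.means))   # sample a model uniformly...
        return int(np.argmax(self.means[m]))     # ...and act greedily under it

    def update(self, action, reward, noise_scale=1.0):
        # Each model fits its own noise-perturbed copy of the observation,
        # which keeps the ensemble spread out like posterior samples.
        for m in range(len(self.means)):
            perturbed = reward + self.rng.normal(0.0, noise_scale)
            self.counts[m, action] += 1
            self.means[m, action] += (perturbed - self.means[m, action]) / self.counts[m, action]
```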

Thompson Sampling

Learning to Price with Reference Effects

no code implementations • 29 Aug 2017 • Abbas Kazerouni, Benjamin Van Roy

As a firm varies the price of a product, consumers exhibit reference effects, making purchase decisions based not only on the prevailing price but also the product's price history.

Thompson Sampling

Learning to Optimize via Information-Directed Sampling

no code implementations • NeurIPS 2014 • Daniel Russo, Benjamin Van Roy

We propose information-directed sampling -- a new approach to online optimization problems in which a decision-maker must balance between exploration and exploitation while learning from partial feedback.
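
The paper minimizes the information ratio over randomized actions; a deterministic simplification, given per-action estimates of expected regret and expected information gain, looks like this:

```python
import numpy as np

def ids_action(expected_regret, information_gain, eps=1e-12):
    """Pick the action minimizing the information ratio:
    squared expected regret divided by expected information gain."""
    ratio = expected_regret**2 / np.maximum(information_gain, eps)
    return int(np.argmin(ratio))
```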

On Optimistic versus Randomized Exploration in Reinforcement Learning

no code implementations • 13 Jun 2017 • Ian Osband, Benjamin Van Roy

We discuss the relative merits of optimistic and randomized approaches to exploration in reinforcement learning.

Computational Efficiency reinforcement-learning +1

Why is Posterior Sampling Better than Optimism for Reinforcement Learning?

no code implementations • ICML 2017 • Ian Osband, Benjamin Van Roy

Computational results demonstrate that posterior sampling for reinforcement learning (PSRL) dramatically outperforms algorithms driven by optimism, such as UCRL2.
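
PSRL's loop is simple to sketch for a tabular MDP: sample a model from the posterior, solve it, and act greedily for an episode. Here env_step is an assumed interface, and rewards use empirical means for brevity where the full algorithm samples them as well:

```python
import numpy as np

def psrl_episode(counts, reward_sums, horizon, env_step, rng):
    """One episode of posterior sampling for RL (sketch).

    counts[s, a, s2]: Dirichlet transition counts; reward_sums[s, a]: summed
    rewards. env_step(s, a) -> (next_state, reward) is an assumed interface.
    """
    num_states, num_actions, _ = counts.shape
    # Sample a transition model from the Dirichlet posterior.
    P = np.apply_along_axis(rng.dirichlet, 2, counts + 1.0)
    R = reward_sums / np.maximum(counts.sum(axis=2), 1.0)
    # Solve the sampled MDP by backward induction.
    Q = np.zeros((horizon + 1, num_states, num_actions))
    for h in range(horizon - 1, -1, -1):
        Q[h] = R + P @ Q[h + 1].max(axis=1)
    # Act greedily under the sampled MDP; update the posterior as we go.
    s = 0
    for h in range(horizon):
        a = int(np.argmax(Q[h, s]))
        s2, r = env_step(s, a)
        counts[s, a, s2] += 1
        reward_sums[s, a] += r
        s = s2
```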

reinforcement-learning Reinforcement Learning (RL)

Time-Sensitive Bandit Learning and Satisficing Thompson Sampling

no code implementations • 28 Apr 2017 • Daniel Russo, David Tse, Benjamin Van Roy

We propose satisficing Thompson sampling -- a variation of Thompson sampling -- and establish a strong discounted regret bound for this new algorithm.

Thompson Sampling

Conservative Contextual Linear Bandits

no code implementations • NeurIPS 2017 • Abbas Kazerouni, Mohammad Ghavamzadeh, Yasin Abbasi-Yadkori, Benjamin Van Roy

We prove an upper bound on the regret of CLUCB and show that it decomposes into two terms: 1) an upper bound on the regret of the standard linear UCB algorithm, which grows with the time horizon, and 2) a constant term, independent of the time horizon, that accounts for the loss of being conservative in order to satisfy the safety constraint.

Decision Making Marketing

On Lower Bounds for Regret in Reinforcement Learning

no code implementations • 9 Aug 2016 • Ian Osband, Benjamin Van Roy

This is a brief technical note to clarify the state of lower bounds on regret for reinforcement learning.

reinforcement-learning Reinforcement Learning (RL)

Efficient Reinforcement Learning in Deterministic Systems with Value Function Generalization

no code implementations • 18 Jul 2013 • Zheng Wen, Benjamin Van Roy

We consider the problem of reinforcement learning over episodes of a finite-horizon deterministic system and as a solution propose optimistic constraint propagation (OCP), an algorithm designed to synthesize efficient exploration and value function generalization.

Efficient Exploration reinforcement-learning +1

Bootstrapped Thompson Sampling and Deep Exploration

no code implementations • 1 Jul 2015 • Ian Osband, Benjamin Van Roy

This technical note presents a new approach to carrying out the kind of exploration achieved by Thompson sampling, but without explicitly maintaining or sampling from posterior distributions.

reinforcement-learning Reinforcement Learning (RL) +1

An Information-Theoretic Analysis of Thompson Sampling

no code implementations • 21 Mar 2014 • Daniel Russo, Benjamin Van Roy

We provide an information-theoretic analysis of Thompson sampling that applies across a broad range of online optimization problems in which a decision-maker must learn from partial feedback.

Thompson Sampling

Near-optimal Reinforcement Learning in Factored MDPs

no code implementations • NeurIPS 2014 • Ian Osband, Benjamin Van Roy

Any reinforcement learning algorithm that applies to all Markov decision processes (MDPs) will suffer $\Omega(\sqrt{SAT})$ regret on some MDP, where $T$ is the elapsed time and $S$ and $A$ are the cardinalities of the state and action spaces.

reinforcement-learning Reinforcement Learning (RL)

Learning to Optimize Via Posterior Sampling

no code implementations • 11 Jan 2013 • Daniel Russo, Benjamin Van Roy

This paper considers the use of a simple posterior sampling algorithm to balance between exploration and exploitation when learning to optimize actions such as in multi-armed bandit problems.

Thompson Sampling

(More) Efficient Reinforcement Learning via Posterior Sampling

no code implementations • NeurIPS 2013 • Ian Osband, Daniel Russo, Benjamin Van Roy

This bound is one of the first for an algorithm not based on optimism, and close to the state of the art for any reinforcement learning algorithm.

Efficient Exploration reinforcement-learning +1

Eluder Dimension and the Sample Complexity of Optimistic Exploration

no code implementations • NeurIPS 2013 • Daniel Russo, Benjamin Van Roy

This paper considers the sample complexity of the multi-armed bandit with dependencies among the arms.

Thompson Sampling

Efficient Exploration and Value Function Generalization in Deterministic Systems

no code implementations • NeurIPS 2013 • Zheng Wen, Benjamin Van Roy

We consider the problem of reinforcement learning over episodes of a finite-horizon deterministic system and as a solution propose optimistic constraint propagation (OCP), an algorithm designed to synthesize efficient exploration and value function generalization.

Efficient Exploration reinforcement-learning +1

On the Performance of Thompson Sampling on Logistic Bandits

no code implementations • 12 May 2019 • Shi Dong, Tengyu Ma, Benjamin Van Roy

Specifically, we establish that, when the set of feasible actions is identical to the set of possible coefficient vectors, the Bayesian regret of Thompson sampling is $\tilde{O}(d\sqrt{T})$.

Thompson Sampling

Comments on the Du-Kakade-Wang-Yang Lower Bounds

no code implementations • 18 Nov 2019 • Benjamin Van Roy, Shi Dong

Du, Kakade, Wang, and Yang recently established intriguing lower bounds on sample complexity, which suggest that reinforcement learning with a misspecified representation is intractable.

reinforcement-learning Reinforcement Learning (RL)

Provably Efficient Reinforcement Learning with Aggregated States

no code implementations • 13 Dec 2019 • Shi Dong, Benjamin Van Roy, Zhengyuan Zhou

We establish that an optimistic variant of Q-learning applied to a fixed-horizon episodic Markov decision process with an aggregated state representation incurs regret $\tilde{\mathcal{O}}(\sqrt{H^5 M K} + \epsilon HK)$, where $H$ is the horizon, $M$ is the number of aggregate states, $K$ is the number of episodes, and $\epsilon$ is the largest difference between any pair of optimal state-action values associated with a common aggregate state.
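
A sketch of the flavor of algorithm analyzed: optimistic Q-learning over aggregate states, with the step-size schedule and bonus constants as illustrative assumptions rather than the paper's exact choices:

```python
import numpy as np

def optimistic_q_update(Q, n, m, a, r, v_next, H, c=1.0):
    """Optimistic Q-learning over aggregate states (sketch).

    m is the aggregate-state index; Q and n are tables over (aggregate, action).
    """
    n[m, a] += 1
    alpha = (H + 1) / (H + n[m, a])         # step size decaying with visits
    bonus = c * np.sqrt(H**3 / n[m, a])     # optimism bonus shrinking with visits
    Q[m, a] += alpha * (r + v_next + bonus - Q[m, a])
```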

Q-Learning reinforcement-learning +1

Adaptive Execution: Exploration and Learning of Price Impact

no code implementations • 26 Jul 2012 • Beomsoo Park, Benjamin Van Roy

The trader must learn coefficients of a price impact model while trading.

Trading and Market Microstructure

Randomized Value Functions via Posterior State-Abstraction Sampling

no code implementations • 5 Oct 2020 • Dilip Arumugam, Benjamin Van Roy

State abstraction has been an essential tool for dramatically improving the sample efficiency of reinforcement-learning algorithms.

On Efficiency in Hierarchical Reinforcement Learning

no code implementations • NeurIPS 2020 • Zheng Wen, Doina Precup, Morteza Ibrahimi, Andre Barreto, Benjamin Van Roy, Satinder Singh

Hierarchical Reinforcement Learning (HRL) approaches promise to provide more efficient solutions to sequential decision making problems, both in terms of statistical as well as computational efficiency.

Computational Efficiency Decision Making +4

Deciding What to Learn: A Rate-Distortion Approach

no code implementations • 15 Jan 2021 • Dilip Arumugam, Benjamin Van Roy

Agents that learn to select optimal actions represent a prominent focus of the sequential decision-making literature.

Decision Making Thompson Sampling

Simple Agent, Complex Environment: Efficient Reinforcement Learning with Agent States

no code implementations • 10 Feb 2021 • Shi Dong, Benjamin Van Roy, Zhengyuan Zhou

The time it takes to approach asymptotic performance is polynomial in the complexity of the agent's state representation and the time required to evaluate the best policy that the agent can represent.

Q-Learning reinforcement-learning +2

A Bit Better? Quantifying Information for Bandit Learning

no code implementations • 18 Feb 2021 • Adithya M. Devraj, Benjamin Van Roy, Kuang Xu

The information ratio offers an approach to assessing the efficacy with which an agent balances between exploration and exploitation.

Reinforcement Learning, Bit by Bit

no code implementations • 6 Mar 2021 • Xiuyuan Lu, Benjamin Van Roy, Vikranth Dwaracherla, Morteza Ibrahimi, Ian Osband, Zheng Wen

To illustrate concepts, we design simple agents that build on them and present computational results that highlight data efficiency.

reinforcement-learning Reinforcement Learning (RL)

Deep Exploration for Recommendation Systems

no code implementations • 26 Sep 2021 • Zheqing Zhu, Benjamin Van Roy

Where past work has aimed to learn from subsequent behavior, there has been a lack of effective methods for probing to elicit informative delayed feedback.

Recommendation Systems Thompson Sampling

The Value of Information When Deciding What to Learn

no code implementations • NeurIPS 2021 • Dilip Arumugam, Benjamin Van Roy

All sequential decision-making agents explore so as to acquire knowledge about a particular target.

Decision Making

Gaussian Imagination in Bandit Learning

no code implementations • 6 Jan 2022 • Yueyang Liu, Adithya M. Devraj, Benjamin Van Roy, Kuang Xu

We study the performance of an agent that attains a bounded information ratio with respect to a bandit environment with a Gaussian prior distribution and a Gaussian likelihood function when applied instead to a Bernoulli bandit.

An Information-Theoretic Framework for Supervised Learning

no code implementations • 1 Mar 2022 • Hong Jun Jeon, Yifan Zhu, Benjamin Van Roy

For a particular prior distribution on weights, we establish sample complexity bounds that are simultaneously width independent and linear in depth.

An Analysis of Ensemble Sampling

no code implementations • 2 Mar 2022 • Chao Qin, Zheng Wen, Xiuyuan Lu, Benjamin Van Roy

Ensemble sampling serves as a practical approximation to Thompson sampling when maintaining an exact posterior distribution over model parameters is computationally intractable.

Thompson Sampling

Non-Stationary Bandit Learning via Predictive Sampling

no code implementations • 4 May 2022 • Yueyang Liu, Xu Kuang, Benjamin Van Roy

We attribute such failures to the fact that, when exploring, the algorithm does not differentiate actions based on how quickly the information acquired loses its usefulness due to non-stationarity.

Attribute Thompson Sampling

Deciding What to Model: Value-Equivalent Sampling for Reinforcement Learning

no code implementations • 4 Jun 2022 • Dilip Arumugam, Benjamin Van Roy

To address this problem, we introduce an algorithm that, using rate-distortion theory, iteratively computes an approximately-value-equivalent, lossy compression of the environment which an agent may feasibly target in lieu of the true model.

Decision Making Model-based Reinforcement Learning +2

Between Rate-Distortion Theory & Value Equivalence in Model-Based Reinforcement Learning

no code implementations • 4 Jun 2022 • Dilip Arumugam, Benjamin Van Roy

The quintessential model-based reinforcement-learning agent iteratively refines its estimates or prior beliefs about the true underlying model of the environment.

Decision Making Model-based Reinforcement Learning +2

Ensembles for Uncertainty Estimation: Benefits of Prior Functions and Bootstrapping

no code implementations • 8 Jun 2022 • Vikranth Dwaracherla, Zheng Wen, Ian Osband, Xiuyuan Lu, Seyed Mohammad Asghari, Benjamin Van Roy

In machine learning, an agent needs to estimate uncertainty to efficiently explore and adapt and to make effective decisions.

Robustness of Epinets against Distributional Shifts

no code implementations • 1 Jul 2022 • Xiuyuan Lu, Ian Osband, Seyed Mohammad Asghari, Sven Gowal, Vikranth Dwaracherla, Zheng Wen, Benjamin Van Roy

However, these improvements are relatively small compared to the outstanding issues in distributionally-robust deep learning.

Is Stochastic Gradient Descent Near Optimal?

no code implementations • 18 Sep 2022 • Yifan Zhu, Hong Jun Jeon, Benjamin Van Roy

However, existing computational theory suggests that, even for single-hidden-layer teacher networks, to attain small error for all such teacher networks, the computation required to achieve this sample complexity is intractable.

On Rate-Distortion Theory in Capacity-Limited Cognition & Reinforcement Learning

no code implementations • 30 Oct 2022 • Dilip Arumugam, Mark K. Ho, Noah D. Goodman, Benjamin Van Roy

Throughout the cognitive-science literature, there is widespread agreement that decision-making agents operating in the real world do so under limited information-processing capabilities and without access to unbounded cognitive or computational resources.

Decision Making reinforcement-learning +1

Posterior Sampling for Continuing Environments

no code implementations • 29 Nov 2022 • Wanqiao Xu, Shi Dong, Benjamin Van Roy

We develop an extension of posterior sampling for reinforcement learning (PSRL) that is suited for a continuing agent-environment interface and integrates naturally into agent designs that scale to complex environments.

An Information-Theoretic Analysis of Compute-Optimal Neural Scaling Laws

no code implementations • 2 Dec 2022 • Hong Jun Jeon, Benjamin Van Roy

For a particular learning model inspired by Barron (1993), we establish an upper bound on the minimal information-theoretically achievable expected error as a function of model and data set sizes.

Language Modelling

Inclusive Artificial Intelligence

no code implementations • 24 Dec 2022 • Dilip Arumugam, Shi Dong, Benjamin Van Roy

Prevailing methods for assessing and comparing generative AIs incentivize responses that serve a hypothetical representative individual.

Leveraging Demonstrations to Improve Online Learning: Quality Matters

no code implementations • 7 Feb 2023 • Botao Hao, Rahul Jain, Tor Lattimore, Benjamin Van Roy, Zheng Wen

This offers insight into how pretraining can greatly improve online performance and how the degree of improvement increases with the expert's competence level.

Thompson Sampling

A Definition of Non-Stationary Bandits

no code implementations • 23 Feb 2023 • Yueyang Liu, Xu Kuang, Benjamin Van Roy

Despite the subject of non-stationary bandit learning having attracted much recent attention, we have yet to identify a formal definition of non-stationarity that can consistently distinguish non-stationary bandits from stationary ones.

Bayesian Reinforcement Learning with Limited Cognitive Load

no code implementations • 5 May 2023 • Dilip Arumugam, Mark K. Ho, Noah D. Goodman, Benjamin Van Roy

All biological and artificial agents must learn and make decisions given limits on their ability to process information.

Decision Making reinforcement-learning

Shattering the Agent-Environment Interface for Fine-Tuning Inclusive Language Models

no code implementations • 19 May 2023 • Wanqiao Xu, Shi Dong, Dilip Arumugam, Benjamin Van Roy

In this work, we adopt a novel perspective wherein a pre-trained language model is itself simultaneously a policy, reward function, and transition function.

Efficient Exploration Language Modelling +2

Scalable Neural Contextual Bandit for Recommender Systems

no code implementations • 26 Jun 2023 • Zheqing Zhu, Benjamin Van Roy

In two distinct large-scale experiments with real-world tasks, ENR significantly boosts click-through rates and user ratings by at least 9% and 6% respectively compared to state-of-the-art neural contextual bandit algorithms.

Recommendation Systems Thompson Sampling

Continual Learning as Computationally Constrained Reinforcement Learning

no code implementations • 10 Jul 2023 • Saurabh Kumar, Henrik Marklund, Ashish Rao, Yifan Zhu, Hong Jun Jeon, Yueyang Liu, Benjamin Van Roy

The design of such agents, which remains a long-standing challenge of artificial intelligence, is addressed by the subject of continual learning.

Continual Learning reinforcement-learning

A Definition of Continual Reinforcement Learning

no code implementations • NeurIPS 2023 • David Abel, André Barreto, Benjamin Van Roy, Doina Precup, Hado van Hasselt, Satinder Singh

Using this new language, we define a continual learning agent as one that can be understood as carrying out an implicit search process indefinitely, and continual reinforcement learning as the setting in which the best agents are all continual learning agents.

Continual Learning reinforcement-learning

On the Convergence of Bounded Agents

no code implementations • 20 Jul 2023 • David Abel, André Barreto, Hado van Hasselt, Benjamin Van Roy, Doina Precup, Satinder Singh

Standard models of the reinforcement learning problem give rise to a straightforward definition of convergence: An agent converges when its behavior or performance in each environment state stops changing.

reinforcement-learning

Maintaining Plasticity in Continual Learning via Regenerative Regularization

no code implementations • 23 Aug 2023 • Saurabh Kumar, Henrik Marklund, Benjamin Van Roy

In this paper, we propose L2 Init, a simple approach for maintaining plasticity by incorporating in the loss function L2 regularization toward initial parameters.
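
The mechanism is a one-line change to the training loss, regularizing toward the initial parameters rather than toward zero. A minimal sketch:

```python
import torch

def l2_init_loss(task_loss, params, init_params, reg_strength=1e-3):
    """L2 Init sketch: penalize drift from the *initial* parameters, not from
    zero, so the network retains the plasticity of its initialization."""
    drift = sum(((p - p0) ** 2).sum() for p, p0 in zip(params, init_params))
    return task_loss + reg_strength * drift
```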

Continual Learning L2 Regularization

Non-Stationary Contextual Bandit Learning via Neural Predictive Ensemble Sampling

no code implementations • 11 Oct 2023 • Zheqing Zhu, Yueyang Liu, Xu Kuang, Benjamin Van Roy

Real-world applications of contextual bandits often exhibit non-stationarity due to seasonality, serendipity, and evolving social trends.

Multi-Armed Bandits

RLHF and IIA: Perverse Incentives

no code implementations • 2 Dec 2023 • Wanqiao Xu, Shi Dong, Xiuyuan Lu, Grace Lam, Zheng Wen, Benjamin Van Roy

Existing algorithms for reinforcement learning from human feedback (RLHF) can incentivize responses at odds with preferences because they are based on models that assume independence of irrelevant alternatives (IIA).

reinforcement-learning

Adaptive Crowdsourcing Via Self-Supervised Learning

no code implementations • 24 Jan 2024 • Anmol Kagrecha, Henrik Marklund, Benjamin Van Roy, Hong Jun Jeon, Richard Zeckhauser

Common crowdsourcing systems average estimates of a latent quantity of interest provided by many crowdworkers to produce a group estimate.

Self-Supervised Learning

An Information-Theoretic Analysis of In-Context Learning

no code implementations • 28 Jan 2024 • Hong Jun Jeon, Jason D. Lee, Qi Lei, Benjamin Van Roy

Previous theoretical results pertaining to meta-learning on sequences build on contrived assumptions and are somewhat convoluted.

In-Context Learning Meta-Learning

Efficient Exploration for LLMs

no code implementations • 1 Feb 2024 • Vikranth Dwaracherla, Seyed Mohammad Asghari, Botao Hao, Benjamin Van Roy

We present evidence of substantial benefit from efficient exploration in gathering human feedback to improve large language models.

Efficient Exploration Thompson Sampling
