Search Results for author: Daniel Russo

Found 31 papers, 6 papers with code

A Tutorial on Thompson Sampling

2 code implementations 7 Jul 2017 Daniel Russo, Benjamin Van Roy, Abbas Kazerouni, Ian Osband, Zheng Wen

Thompson sampling is an algorithm for online decision problems where actions are taken sequentially in a manner that must balance between exploiting what is known to maximize immediate performance and investing to accumulate new information that may improve future performance.

Active Learning, Product Recommendation, +1
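To make the idea concrete, here is a minimal sketch of Thompson sampling for a Bernoulli bandit with Beta priors; the arm means, horizon, and prior below are invented for illustration and are not taken from the tutorial:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical true success probabilities, used only to simulate feedback.
true_means = np.array([0.45, 0.55, 0.60])
K = len(true_means)

# Beta(1, 1) priors: alpha counts successes, beta counts failures.
alpha = np.ones(K)
beta = np.ones(K)

T = 5000
for t in range(T):
    # Sample one plausible mean per arm from the current posterior ...
    theta = rng.beta(alpha, beta)
    # ... and act greedily with respect to the sample.
    a = int(np.argmax(theta))
    reward = rng.random() < true_means[a]
    # Conjugate posterior update.
    alpha[a] += reward
    beta[a] += 1 - reward

print("posterior means:", alpha / (alpha + beta))
```

Acting greedily with respect to a posterior sample, rather than the posterior mean, is what produces the exploit/explore balance described in the abstract.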

Impatient Bandits: Optimizing Recommendations for the Long-Term Without Delay

1 code implementation 19 Jul 2023 Thomas M. McDonald, Lucas Maystre, Mounia Lalmas, Daniel Russo, Kamil Ciosek

In this context, we study a content exploration task, which we formalize as a multi-armed bandit problem with delayed rewards.

Recommendation Systems

Benchmarking the Generation of Fact Checking Explanations

1 code implementation 29 Aug 2023 Daniel Russo, Serra Sinem Tekiroglu, Marco Guerini

Results show that, in justification production, summarization benefits from the claim information and, in particular, that a claim-driven extractive step improves abstractive summarization performance.

Abstractive Text Summarization, Benchmarking, +3

Countering Misinformation via Emotional Response Generation

1 code implementation 17 Nov 2023 Daniel Russo, Shane Peter Kaszefski-Yaschuk, Jacopo Staiano, Marco Guerini

The proliferation of misinformation on social media platforms (SMPs) poses a significant danger to public health, social cohesion and ultimately democracy.

Misinformation, Response Generation

Simple Bayesian Algorithms for Best Arm Identification

no code implementations 26 Feb 2016 Daniel Russo

This paper proposes three simple and intuitive Bayesian algorithms for adaptively allocating measurement effort, and formalizes a sense in which these seemingly naive rules are the best possible.

Thompson Sampling
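The paper studies simple top-two sampling rules for allocating measurements. As a hedged sketch, the following implements a top-two Thompson-sampling-style allocation for Bernoulli arms: the posterior-sampled leader is measured with probability `tune`, and otherwise a resampled challenger is measured. The constants, and the retry cap, are illustrative choices of mine, not values from the paper:

```python
import numpy as np

rng = np.random.default_rng(1)

true_means = np.array([0.30, 0.50, 0.55])   # hypothetical, used only to simulate rewards
K = len(true_means)
alpha, beta = np.ones(K), np.ones(K)
tune = 0.5                                   # probability of measuring the sampled leader
budget = 3000

for t in range(budget):
    leader = int(np.argmax(rng.beta(alpha, beta)))
    arm = leader
    if rng.random() > tune:
        # Re-sample until another arm is the posterior argmax (capped for practicality)
        # and measure that challenger instead of the leader.
        for _ in range(100):
            challenger = int(np.argmax(rng.beta(alpha, beta)))
            if challenger != leader:
                arm = challenger
                break
    r = rng.random() < true_means[arm]
    alpha[arm] += r
    beta[arm] += 1 - r

print("recommended arm:", int(np.argmax(alpha / (alpha + beta))))
print("measurements per arm:", (alpha + beta - 2).astype(int))
```

Forcing roughly half of the measurements onto a challenger is what keeps the rule from starving the arms that are needed to certify the best one.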

A Finite Time Analysis of Temporal Difference Learning With Linear Function Approximation

no code implementations 6 Jun 2018 Jalaj Bhandari, Daniel Russo, Raghav Singal

Temporal difference learning (TD) is a simple iterative algorithm used to estimate the value function corresponding to a given policy in a Markov decision process.

Q-Learning
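A compact sketch of the algorithm being analyzed, TD(0) with a fixed linear feature map, on a small synthetic Markov chain; the chain, features, and step-size schedule below are invented for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)

S, d, gamma = 6, 3, 0.9
P = rng.dirichlet(np.ones(S), size=S)        # chain induced by the evaluated policy (made up)
r = rng.uniform(0.0, 1.0, size=S)            # expected one-step rewards
Phi = rng.normal(size=(S, d))                # fixed feature map

theta = np.zeros(d)
s = 0
for t in range(1, 100_001):
    s_next = rng.choice(S, p=P[s])
    reward = r[s] + 0.1 * rng.normal()       # noisy observed reward
    # TD(0): step along the feature vector of the current state, scaled by the TD error.
    td_error = reward + gamma * Phi[s_next] @ theta - Phi[s] @ theta
    theta += td_error * Phi[s] / np.sqrt(t)  # decaying step size
    s = s_next

v_exact = np.linalg.solve(np.eye(S) - gamma * P, r)   # exact values, for comparison
print("TD estimate:", np.round(Phi @ theta, 3))
print("exact value:", np.round(v_exact, 3))
```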

Deep Exploration via Randomized Value Functions

no code implementations 22 Mar 2017 Ian Osband, Benjamin Van Roy, Daniel Russo, Zheng Wen

We study the use of randomized value functions to guide deep exploration in reinforcement learning.

Efficient Exploration, reinforcement-learning, +1

Satisficing in Time-Sensitive Bandit Learning

no code implementations 7 Mar 2018 Daniel Russo, Benjamin Van Roy

Much of the recent literature on bandit learning focuses on algorithms that aim to converge on an optimal action.

Thompson Sampling

Learning to Optimize via Information-Directed Sampling

no code implementations NeurIPS 2014 Daniel Russo, Benjamin Van Roy

We propose information-directed sampling -- a new approach to online optimization problems in which a decision-maker must balance between exploration and exploitation while learning from partial feedback.
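As a rough illustration only: below is a sample-based, variance-style simplification of information-directed sampling for Bernoulli arms, which scores each arm by squared estimated regret divided by a proxy for information gain and plays the minimizer. The algorithm in the paper optimizes over action distributions and uses mutual information, so treat this as a sketch under my own simplifying assumptions, with made-up constants:

```python
import numpy as np

rng = np.random.default_rng(0)

true_means = np.array([0.40, 0.50, 0.60])    # hypothetical, used only to simulate rewards
K = len(true_means)
alpha, beta = np.ones(K), np.ones(K)
M, T = 1000, 2000                            # posterior samples per round, horizon

for t in range(T):
    theta = rng.beta(alpha, beta, size=(M, K))        # joint posterior samples
    best = theta.argmax(axis=1)                       # sampled identity of the optimal arm
    mean_all = theta.mean(axis=0)
    rho_star = theta[np.arange(M), best].mean()       # estimate of E[max_a theta_a]
    delta = rho_star - mean_all                       # estimated per-arm regret

    # Information proxy: variance of each arm's conditional posterior mean
    # across the possible identities of the optimal arm.
    v = np.zeros(K)
    for a_star in range(K):
        mask = best == a_star
        if mask.any():
            v += mask.mean() * (theta[mask].mean(axis=0) - mean_all) ** 2

    a = int(np.argmin(delta ** 2 / np.maximum(v, 1e-12)))   # deterministic IDS-style choice
    reward = rng.random() < true_means[a]
    alpha[a] += reward
    beta[a] += 1 - reward

print("measurements per arm:", (alpha + beta - 2).astype(int))
```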

Improving the Expected Improvement Algorithm

no code implementations NeurIPS 2017 Chao Qin, Diego Klabjan, Daniel Russo

To overcome this shortcoming, we introduce a simple modification of the expected improvement algorithm.

Bayesian Optimization
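For orientation, this is the classical expected-improvement allocation that the paper starts from, for Gaussian observations with a conjugate normal model. The paper's contribution is a modification of this rule, which is not reproduced here; the means, noise level, and prior below are placeholders:

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(0)

true_means = np.array([0.0, 0.3, 0.5])    # hypothetical arm means
noise_sd = 1.0
K = len(true_means)

mu = np.zeros(K)                          # posterior means
prec = np.full(K, 1e-6)                   # posterior precisions (near-flat prior)

def expected_improvement(m, s, incumbent):
    # E[(theta - incumbent)^+] under a N(m, s^2) posterior.
    z = (m - incumbent) / s
    return s * (z * norm.cdf(z) + norm.pdf(z))

for t in range(2000):
    sd = 1.0 / np.sqrt(prec)
    a = int(np.argmax(expected_improvement(mu, sd, mu.max())))
    y = true_means[a] + noise_sd * rng.normal()
    # Conjugate normal update with known observation noise.
    prec[a] += 1.0 / noise_sd ** 2
    mu[a] += (y - mu[a]) / (noise_sd ** 2 * prec[a])

print("posterior means:", np.round(mu, 3))
print("measurements per arm:", np.round((prec - 1e-6) * noise_sd ** 2).astype(int))
```

Running this makes the shortcoming visible: the plain rule concentrates almost all measurements on the leading arm, which is the behavior the paper's modification addresses.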

Time-Sensitive Bandit Learning and Satisficing Thompson Sampling

no code implementations 28 Apr 2017 Daniel Russo, David Tse, Benjamin Van Roy

We propose satisficing Thompson sampling -- a variation of Thompson sampling -- and establish a strong discounted regret bound for this new algorithm.

Thompson Sampling

How much does your data exploration overfit? Controlling bias via information usage

no code implementations 16 Nov 2015 Daniel Russo, James Zou

But while any data exploration renders standard statistical theory invalid, experience suggests that different types of exploratory analysis can lead to disparate levels of bias, and the degree of bias also depends on the particulars of the data set.

Clustering

An Information-Theoretic Analysis of Thompson Sampling

no code implementations 21 Mar 2014 Daniel Russo, Benjamin Van Roy

We provide an information-theoretic analysis of Thompson sampling that applies across a broad range of online optimization problems in which a decision-maker must learn from partial feedback.

Thompson Sampling
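Roughly, and up to notation, the headline result ties the cumulative regret of Thompson sampling to the entropy of the prior over the optimal action $A^*$ through a uniform bound $\bar{\Gamma}$ on the information ratio; see the paper for the precise statement:

```latex
\mathbb{E}\big[\mathrm{Regret}(T)\big] \;\le\; \sqrt{\bar{\Gamma}\, H(A^{*})\, T}
```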

Learning to Optimize Via Posterior Sampling

no code implementations 11 Jan 2013 Daniel Russo, Benjamin Van Roy

This paper considers the use of a simple posterior sampling algorithm to balance between exploration and exploitation when learning to optimize actions such as in multi-armed bandit problems.

Thompson Sampling

(More) Efficient Reinforcement Learning via Posterior Sampling

no code implementations NeurIPS 2013 Ian Osband, Daniel Russo, Benjamin Van Roy

This bound is one of the first for an algorithm not based on optimism, and close to the state of the art for any reinforcement learning algorithm.

Efficient Exploration, reinforcement-learning, +1
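A hedged sketch of the posterior sampling loop (often called PSRL) on a toy tabular MDP: at the start of each episode a transition model is drawn from a Dirichlet posterior, an optimal policy for that sampled model is computed by finite-horizon value iteration, and the data it generates update the posterior. Rewards are treated as known and all sizes are invented to keep the sketch short; this is not the exact setup or analysis of the paper:

```python
import numpy as np

rng = np.random.default_rng(0)

S, A, H = 5, 2, 10                                 # states, actions, horizon (illustrative)
P_true = rng.dirichlet(np.ones(S), size=(S, A))    # hypothetical environment, to generate data
R = rng.uniform(0.0, 1.0, size=(S, A))             # rewards assumed known, for brevity

counts = np.ones((S, A, S))                        # Dirichlet(1,...,1) posterior over transitions

def plan(P):
    """Finite-horizon value iteration; returns a greedy policy for each step."""
    V = np.zeros(S)
    policy = np.zeros((H, S), dtype=int)
    for h in reversed(range(H)):
        Q = R + P @ V                              # Q[s, a] = R[s, a] + sum_s' P[s, a, s'] V[s']
        policy[h] = Q.argmax(axis=1)
        V = Q.max(axis=1)
    return policy

for episode in range(500):
    # Sample one plausible transition model from the posterior and act optimally in it.
    P_sample = np.empty((S, A, S))
    for s in range(S):
        for a in range(A):
            P_sample[s, a] = rng.dirichlet(counts[s, a])
    policy = plan(P_sample)

    s = 0
    for h in range(H):
        a = policy[h, s]
        s_next = rng.choice(S, p=P_true[s, a])
        counts[s, a, s_next] += 1                  # posterior update from the observed transition
        s = s_next

P_mean = counts / counts.sum(axis=2, keepdims=True)
print("first-step greedy policy under the posterior mean model:", plan(P_mean)[0])
```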

Eluder Dimension and the Sample Complexity of Optimistic Exploration

no code implementations NeurIPS 2013 Daniel Russo, Benjamin Van Roy

This paper considers the sample complexity of the multi-armed bandit with dependencies among the arms.

Thompson Sampling

A Note on the Equivalence of Upper Confidence Bounds and Gittins Indices for Patient Agents

no code implementations 9 Apr 2019 Daniel Russo

This note gives a short, self-contained proof of a sharp connection between Gittins indices and Bayesian upper confidence bound algorithms.

Global Optimality Guarantees For Policy Gradient Methods

no code implementations 5 Jun 2019 Jalaj Bhandari, Daniel Russo

Policy gradient methods apply to complex, poorly understood control problems by performing stochastic gradient descent over a parameterized class of policies.

Policy Gradient Methods
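For readers who want the object of study in code: a bare-bones Monte Carlo policy gradient (REINFORCE) with a tabular softmax policy on an invented finite-horizon MDP. This is the generic method whose optimization landscape the paper analyzes, with all numbers made up:

```python
import numpy as np

rng = np.random.default_rng(0)

S, A, H = 3, 2, 8                            # tiny illustrative MDP
P = rng.dirichlet(np.ones(S), size=(S, A))   # hypothetical transition kernel
R = rng.uniform(0.0, 1.0, size=(S, A))       # hypothetical rewards

theta = np.zeros((S, A))                     # softmax ("tabular") policy parameters

def pi(s):
    z = np.exp(theta[s] - theta[s].max())
    return z / z.sum()

lr = 0.05
for it in range(5000):
    # Roll out one episode under the current policy.
    s, traj = 0, []
    for h in range(H):
        a = rng.choice(A, p=pi(s))
        traj.append((s, a, R[s, a]))
        s = rng.choice(S, p=P[s, a])
    # REINFORCE: increase the log-probability of each taken action,
    # weighted by the return-to-go from that step.
    grad = np.zeros_like(theta)
    ret = 0.0
    for s, a, r in reversed(traj):
        ret += r
        g = -pi(s)
        g[a] += 1.0                          # gradient of log softmax w.r.t. theta[s, :]
        grad[s] += ret * g
    theta += lr * grad                       # stochastic gradient ascent step

print("learned action probabilities per state:")
print(np.round(np.vstack([pi(s) for s in range(S)]), 3))
```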

Worst-Case Regret Bounds for Exploration via Randomized Value Functions

no code implementations NeurIPS 2019 Daniel Russo

This paper studies a recent proposal to use randomized value functions to drive exploration in reinforcement learning.

Efficient Exploration, reinforcement-learning, +1

SQuAP-Ont: an Ontology of Software Quality Relational Factors from Financial Systems

no code implementations 4 Sep 2019 Paolo Ciancarini, Andrea Giovanni Nuzzolese, Valentina Presutti, Daniel Russo

The SQuAP model (Software Quality, Architecture, Process) describes twenty-eight main factors that impact software quality in banking systems, and each factor is described as a relation among some characteristics from the three ISO standards.

On Linear Convergence of Policy Gradient Methods for Finite MDPs

no code implementations 21 Jul 2020 Jalaj Bhandari, Daniel Russo

We revisit the finite time analysis of policy gradient methods in one of the simplest settings: finite state and action MDPs with a policy class consisting of all stochastic policies and with exact gradient evaluations.

Policy Gradient Methods

Approximation Benefits of Policy Gradient Methods with Aggregated States

no code implementations 22 Jul 2020 Daniel Russo

With the same representation, both approximate policy iteration and approximate value iteration can produce policies whose per-period regret scales as $\epsilon/(1-\gamma)$, where $\gamma$ is a discount factor.

Policy Gradient Methods

Learning to Stop with Surprisingly Few Samples

no code implementations 19 Feb 2021 Daniel Russo, Assaf Zeevi, Tianyi Zhang

We consider a discounted infinite horizon optimal stopping problem.

Adaptive Experimentation in the Presence of Exogenous Nonstationary Variation

no code implementations 18 Feb 2022 Chao Qin, Daniel Russo

We investigate experiments that are designed to select a treatment arm for population deployment.

Thompson Sampling

On the Statistical Benefits of Temporal Difference Learning

no code implementations 30 Jan 2023 David Cheikhi, Daniel Russo

Given a dataset on actions and resulting long-term rewards, a direct estimation approach fits value functions that minimize prediction error on the training data.

An Information-Theoretic Analysis of Nonstationary Bandit Learning

no code implementations 9 Feb 2023 Seungki Min, Daniel Russo

In nonstationary bandit learning problems, the decision-maker must continually gather information and adapt their action selection as the latent state of the environment evolves.

Neural Inventory Control in Networks via Hindsight Differentiable Policy Optimization

1 code implementation 20 Jun 2023 Matias Alvo, Daniel Russo, Yash Kanoria

The first is Hindsight Differentiable Policy Optimization (HDPO), which performs stochastic gradient descent to optimize policy performance while avoiding the need to repeatedly deploy randomized policies in the environment, as is common with generic policy gradient methods.

Management, Reinforcement Learning (RL), +1
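The paper's setting involves inventory networks and neural policies, but the core mechanism, differentiating cost along a simulated (hindsight) sample path instead of using score-function policy gradients, can be shown on a toy single-item newsvendor where only a scalar order-up-to level is optimized; every number below is an illustrative placeholder:

```python
import numpy as np

rng = np.random.default_rng(0)

h_cost, b_cost = 1.0, 4.0        # holding and backorder costs (illustrative)
demand_mean = 20.0
S_level = 5.0                    # order-up-to level being optimized
lr = 0.5

for it in range(1, 2001):
    d = rng.poisson(demand_mean, size=64)            # batch of simulated demand paths
    # Per-path cost: h*(S - d)^+ + b*(d - S)^+.
    # Differentiating each realized path in S gives h where S > d and -b where S < d
    # (demand is discrete here, so ties are ignored; acceptable for a sketch).
    grad = np.where(S_level > d, h_cost, -b_cost).mean()
    S_level -= lr * grad / np.sqrt(it)               # stochastic gradient step on the pathwise gradient

# The newsvendor optimum is the b/(b+h) quantile of demand; compare against it.
target = np.quantile(rng.poisson(demand_mean, 100_000), b_cost / (b_cost + h_cost))
print("learned order-up-to level:", round(float(S_level), 2), "| critical-fractile quantile:", target)
```

The sample-path gradient here has far lower variance than a REINFORCE-style estimator of the same quantity, which is the point of optimizing through the simulator rather than deploying randomized policies.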

Optimizing Adaptive Experiments: A Unified Approach to Regret Minimization and Best-Arm Identification

no code implementations 16 Feb 2024 Chao Qin, Daniel Russo

Practitioners conducting adaptive experiments often encounter two competing priorities: reducing the cost of experimentation by effectively assigning treatments during the experiment itself, and gathering information swiftly to conclude the experiment and implement a treatment across the population.

Thompson Sampling

On the Limited Representational Power of Value Functions and its Links to Statistical (In)Efficiency

no code implementations 11 Mar 2024 David Cheikhi, Daniel Russo

In several of the settings considered, there is no information loss and value-based methods are as statistically efficient as model-based ones.
