Search Results for author: Daniel Russo

Found 31 papers, 6 papers with code

A Tutorial on Thompson Sampling

2 code implementations 7 Jul 2017 Daniel Russo, Benjamin Van Roy, Abbas Kazerouni, Ian Osband, Zheng Wen

Thompson sampling is an algorithm for online decision problems where actions are taken sequentially in a manner that must balance between exploiting what is known to maximize immediate performance and investing to accumulate new information that may improve future performance.

Active Learning, Product Recommendation, +1
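To make the idea concrete, here is a minimal sketch of Thompson sampling for a Bernoulli bandit with Beta priors; the arm means, horizon, and prior below are invented for illustration and are not taken from the tutorial:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical true success probabilities, used only to simulate feedback.
true_means = np.array([0.45, 0.55, 0.60])
K = len(true_means)

# Beta(1, 1) priors: alpha counts successes, beta counts failures.
alpha = np.ones(K)
beta = np.ones(K)

T = 5000
for t in range(T):
    # Sample one plausible mean per arm from the current posterior ...
    theta = rng.beta(alpha, beta)
    # ... and act greedily with respect to the sample.
    a = int(np.argmax(theta))
    reward = rng.random() < true_means[a]
    # Conjugate posterior update.
    alpha[a] += reward
    beta[a] += 1 - reward

print("posterior means:", alpha / (alpha + beta))
```

Acting greedily with respect to a posterior sample, rather than the posterior mean, is what produces the exploit/explore balance described in the abstract.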

Impatient Bandits: Optimizing Recommendations for the Long-Term Without Delay

1 code implementation 19 Jul 2023 Thomas M. McDonald, Lucas Maystre, Mounia Lalmas, Daniel Russo, Kamil Ciosek

In this context, we study a content exploration task, which we formalize as a multi-armed bandit problem with delayed rewards.

Recommendation Systems

Benchmarking the Generation of Fact Checking Explanations

1 code implementation 29 Aug 2023 Daniel Russo, Serra Sinem Tekiroglu, Marco Guerini

Results show that, in justification production, summarization benefits from the claim information and, in particular, that a claim-driven extractive step improves abstractive summarization performance.

Abstractive Text Summarization, Benchmarking, +3

Countering Misinformation via Emotional Response Generation

1 code implementation 17 Nov 2023 Daniel Russo, Shane Peter Kaszefski-Yaschuk, Jacopo Staiano, Marco Guerini

The proliferation of misinformation on social media platforms (SMPs) poses a significant danger to public health, social cohesion and ultimately democracy.

Misinformation, Response Generation

Simple Bayesian Algorithms for Best Arm Identification

no code implementations 26 Feb 2016 Daniel Russo

This paper proposes three simple and intuitive Bayesian algorithms for adaptively allocating measurement effort, and formalizes a sense in which these seemingly naive rules are the best possible.

Thompson Sampling
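The paper studies simple top-two sampling rules for allocating measurements. As a hedged sketch, the following implements a top-two Thompson-sampling-style allocation for Bernoulli arms: the posterior-sampled leader is measured with probability `tune`, and otherwise a resampled challenger is measured. The constants, and the retry cap, are illustrative choices of mine, not values from the paper:

```python
import numpy as np

rng = np.random.default_rng(1)

true_means = np.array([0.30, 0.50, 0.55])   # hypothetical, used only to simulate rewards
K = len(true_means)
alpha, beta = np.ones(K), np.ones(K)
tune = 0.5                                   # probability of measuring the sampled leader
budget = 3000

for t in range(budget):
    leader = int(np.argmax(rng.beta(alpha, beta)))
    arm = leader
    if rng.random() > tune:
        # Re-sample until another arm is the posterior argmax (capped for practicality)
        # and measure that challenger instead of the leader.
        for _ in range(100):
            challenger = int(np.argmax(rng.beta(alpha, beta)))
            if challenger != leader:
                arm = challenger
                break
    r = rng.random() < true_means[arm]
    alpha[arm] += r
    beta[arm] += 1 - r

print("recommended arm:", int(np.argmax(alpha / (alpha + beta))))
print("measurements per arm:", (alpha + beta - 2).astype(int))
```

Forcing roughly half of the measurements onto a challenger is what keeps the rule from starving the arms that are needed to certify the best one.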

A Finite Time Analysis of Temporal Difference Learning With Linear Function Approximation

no code implementations 6 Jun 2018 Jalaj Bhandari, Daniel Russo, Raghav Singal

Temporal difference learning (TD) is a simple iterative algorithm used to estimate the value function corresponding to a given policy in a Markov decision process.

Q-Learning
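A compact sketch of the algorithm being analyzed, TD(0) with a fixed linear feature map, on a small synthetic Markov chain; the chain, features, and step-size schedule below are invented for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)

S, d, gamma = 6, 3, 0.9
P = rng.dirichlet(np.ones(S), size=S)        # chain induced by the evaluated policy (made up)
r = rng.uniform(0.0, 1.0, size=S)            # expected one-step rewards
Phi = rng.normal(size=(S, d))                # fixed feature map

theta = np.zeros(d)
s = 0
for t in range(1, 100_001):
    s_next = rng.choice(S, p=P[s])
    reward = r[s] + 0.1 * rng.normal()       # noisy observed reward
    # TD(0): step along the feature vector of the current state, scaled by the TD error.
    td_error = reward + gamma * Phi[s_next] @ theta - Phi[s] @ theta
    theta += td_error * Phi[s] / np.sqrt(t)  # decaying step size
    s = s_next

v_exact = np.linalg.solve(np.eye(S) - gamma * P, r)   # exact values, for comparison
print("TD estimate:", np.round(Phi @ theta, 3))
print("exact value:", np.round(v_exact, 3))
```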

Deep Exploration via Randomized Value Functions

no code implementations 22 Mar 2017 Ian Osband, Benjamin Van Roy, Daniel Russo, Zheng Wen

We study the use of randomized value functions to guide deep exploration in reinforcement learning.

Efficient Exploration, reinforcement-learning, +1

Satisficing in Time-Sensitive Bandit Learning

no code implementations 7 Mar 2018 Daniel Russo, Benjamin Van Roy

Much of the recent literature on bandit learning focuses on algorithms that aim to converge on an optimal action.

Thompson Sampling

Learning to Optimize via Information-Directed Sampling

no code implementations NeurIPS 2014 Daniel Russo, Benjamin Van Roy

We propose information-directed sampling -- a new approach to online optimization problems in which a decision-maker must balance between exploration and exploitation while learning from partial feedback.
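As a rough illustration only: below is a sample-based, variance-style simplification of information-directed sampling for Bernoulli arms, which scores each arm by squared estimated regret divided by a proxy for information gain and plays the minimizer. The algorithm in the paper optimizes over action distributions and uses mutual information, so treat this as a sketch under my own simplifying assumptions, with made-up constants:

```python
import numpy as np

rng = np.random.default_rng(0)

true_means = np.array([0.40, 0.50, 0.60])    # hypothetical, used only to simulate rewards
K = len(true_means)
alpha, beta = np.ones(K), np.ones(K)
M, T = 1000, 2000                            # posterior samples per round, horizon

for t in range(T):
    theta = rng.beta(alpha, beta, size=(M, K))        # joint posterior samples
    best = theta.argmax(axis=1)                       # sampled identity of the optimal arm
    mean_all = theta.mean(axis=0)
    rho_star = theta[np.arange(M), best].mean()       # estimate of E[max_a theta_a]
    delta = rho_star - mean_all                       # estimated per-arm regret

    # Information proxy: variance of each arm's conditional posterior mean
    # across the possible identities of the optimal arm.
    v = np.zeros(K)
    for a_star in range(K):
        mask = best == a_star
        if mask.any():
            v += mask.mean() * (theta[mask].mean(axis=0) - mean_all) ** 2

    a = int(np.argmin(delta ** 2 / np.maximum(v, 1e-12)))   # deterministic IDS-style choice
    reward = rng.random() < true_means[a]
    alpha[a] += reward
    beta[a] += 1 - reward

print("measurements per arm:", (alpha + beta - 2).astype(int))
```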

Improving the Expected Improvement Algorithm

no code implementations NeurIPS 2017 Chao Qin, Diego Klabjan, Daniel Russo

To overcome this shortcoming, we introduce a simple modification of the expected improvement algorithm.

Bayesian Optimization
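For orientation, this is the classical expected-improvement allocation that the paper starts from, for Gaussian observations with a conjugate normal model. The paper's contribution is a modification of this rule, which is not reproduced here; the means, noise level, and prior below are placeholders:

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(0)

true_means = np.array([0.0, 0.3, 0.5])    # hypothetical arm means
noise_sd = 1.0
K = len(true_means)

mu = np.zeros(K)                          # posterior means
prec = np.full(K, 1e-6)                   # posterior precisions (near-flat prior)

def expected_improvement(m, s, incumbent):
    # E[(theta - incumbent)^+] under a N(m, s^2) posterior.
    z = (m - incumbent) / s
    return s * (z * norm.cdf(z) + norm.pdf(z))

for t in range(2000):
    sd = 1.0 / np.sqrt(prec)
    a = int(np.argmax(expected_improvement(mu, sd, mu.max())))
    y = true_means[a] + noise_sd * rng.normal()
    # Conjugate normal update with known observation noise.
    prec[a] += 1.0 / noise_sd ** 2
    mu[a] += (y - mu[a]) / (noise_sd ** 2 * prec[a])

print("posterior means:", np.round(mu, 3))
print("measurements per arm:", np.round((prec - 1e-6) * noise_sd ** 2).astype(int))
```

Running this makes the shortcoming visible: the plain rule concentrates almost all measurements on the leading arm, which is the behavior the paper's modification addresses.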

Time-Sensitive Bandit Learning and Satisficing Thompson Sampling

no code implementations 28 Apr 2017 Daniel Russo, David Tse, Benjamin Van Roy

We propose satisficing Thompson sampling -- a variation of Thompson sampling -- and establish a strong discounted regret bound for this new algorithm.

Thompson Sampling

How much does your data exploration overfit? Controlling bias via information usage

no code implementations 16 Nov 2015 Daniel Russo, James Zou

But while any data exploration renders standard statistical theory invalid, experience suggests that different types of exploratory analysis can lead to disparate levels of bias, and the degree of bias also depends on the particulars of the data set.

Clustering

An Information-Theoretic Analysis of Thompson Sampling

no code implementations 21 Mar 2014 Daniel Russo, Benjamin Van Roy

We provide an information-theoretic analysis of Thompson sampling that applies across a broad range of online optimization problems in which a decision-maker must learn from partial feedback.

Thompson Sampling
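Roughly, and up to notation, the headline result ties the cumulative regret of Thompson sampling to the entropy of the prior over the optimal action $A^*$ through a uniform bound $\bar{\Gamma}$ on the information ratio; see the paper for the precise statement:

```latex
\mathbb{E}\big[\mathrm{Regret}(T)\big] \;\le\; \sqrt{\bar{\Gamma}\, H(A^{*})\, T}
```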

Learning to Optimize Via Posterior Sampling

no code implementations 11 Jan 2013 Daniel Russo, Benjamin Van Roy

This paper considers the use of a simple posterior sampling algorithm to balance between exploration and exploitation when learning to optimize actions such as in multi-armed bandit problems.

Thompson Sampling

(More) Efficient Reinforcement Learning via Posterior Sampling

no code implementations NeurIPS 2013 Ian Osband, Daniel Russo, Benjamin Van Roy

This bound is one of the first for an algorithm not based on optimism, and close to the state of the art for any reinforcement learning algorithm.

Efficient Exploration, reinforcement-learning, +1
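A hedged sketch of the posterior sampling loop (often called PSRL) on a toy tabular MDP: at the start of each episode a transition model is drawn from a Dirichlet posterior, an optimal policy for that sampled model is computed by finite-horizon value iteration, and the data it generates update the posterior. Rewards are treated as known and all sizes are invented to keep the sketch short; this is not the exact setup or analysis of the paper:

```python
import numpy as np

rng = np.random.default_rng(0)

S, A, H = 5, 2, 10                                 # states, actions, horizon (illustrative)
P_true = rng.dirichlet(np.ones(S), size=(S, A))    # hypothetical environment, to generate data
R = rng.uniform(0.0, 1.0, size=(S, A))             # rewards assumed known, for brevity

counts = np.ones((S, A, S))                        # Dirichlet(1,...,1) posterior over transitions

def plan(P):
    """Finite-horizon value iteration; returns a greedy policy for each step."""
    V = np.zeros(S)
    policy = np.zeros((H, S), dtype=int)
    for h in reversed(range(H)):
        Q = R + P @ V                              # Q[s, a] = R[s, a] + sum_s' P[s, a, s'] V[s']
        policy[h] = Q.argmax(axis=1)
        V = Q.max(axis=1)
    return policy

for episode in range(500):
    # Sample one plausible transition model from the posterior and act optimally in it.
    P_sample = np.empty((S, A, S))
    for s in range(S):
        for a in range(A):
            P_sample[s, a] = rng.dirichlet(counts[s, a])
    policy = plan(P_sample)

    s = 0
    for h in range(H):
        a = policy[h, s]
        s_next = rng.choice(S, p=P_true[s, a])
        counts[s, a, s_next] += 1                  # posterior update from the observed transition
        s = s_next

P_mean = counts / counts.sum(axis=2, keepdims=True)
print("first-step greedy policy under the posterior mean model:", plan(P_mean)[0])
```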

Eluder Dimension and the Sample Complexity of Optimistic Exploration

no code implementations NeurIPS 2013 Daniel Russo, Benjamin Van Roy

This paper considers the sample complexity of the multi-armed bandit with dependencies among the arms.

Thompson Sampling

A Note on the Equivalence of Upper Confidence Bounds and Gittins Indices for Patient Agents

no code implementations 9 Apr 2019 Daniel Russo

This note gives a short, self-contained proof of a sharp connection between Gittins indices and Bayesian upper confidence bound algorithms.

Global Optimality Guarantees For Policy Gradient Methods

no code implementations 5 Jun 2019 Jalaj Bhandari, Daniel Russo

Policy gradient methods apply to complex, poorly understood control problems by performing stochastic gradient descent over a parameterized class of policies.

Policy Gradient Methods
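For readers who want the object of study in code: a bare-bones Monte Carlo policy gradient (REINFORCE) with a tabular softmax policy on an invented finite-horizon MDP. This is the generic method whose optimization landscape the paper analyzes, with all numbers made up:

```python
import numpy as np

rng = np.random.default_rng(0)

S, A, H = 3, 2, 8                            # tiny illustrative MDP
P = rng.dirichlet(np.ones(S), size=(S, A))   # hypothetical transition kernel
R = rng.uniform(0.0, 1.0, size=(S, A))       # hypothetical rewards

theta = np.zeros((S, A))                     # softmax ("tabular") policy parameters

def pi(s):
    z = np.exp(theta[s] - theta[s].max())
    return z / z.sum()

lr = 0.05
for it in range(5000):
    # Roll out one episode under the current policy.
    s, traj = 0, []
    for h in range(H):
        a = rng.choice(A, p=pi(s))
        traj.append((s, a, R[s, a]))
        s = rng.choice(S, p=P[s, a])
    # REINFORCE: increase the log-probability of each taken action,
    # weighted by the return-to-go from that step.
    grad = np.zeros_like(theta)
    ret = 0.0
    for s, a, r in reversed(traj):
        ret += r
        g = -pi(s)
        g[a] += 1.0                          # gradient of log softmax w.r.t. theta[s, :]
        grad[s] += ret * g
    theta += lr * grad                       # stochastic gradient ascent step

print("learned action probabilities per state:")
print(np.round(np.vstack([pi(s) for s in range(S)]), 3))
```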

Worst-Case Regret Bounds for Exploration via Randomized Value Functions

no code implementations NeurIPS 2019 Daniel Russo

This paper studies a recent proposal to use randomized value functions to drive exploration in reinforcement learning.

Efficient Exploration, reinforcement-learning, +1

SQuAP-Ont: an Ontology of Software Quality Relational Factors from Financial Systems

no code implementations 4 Sep 2019 Paolo Ciancarini, Andrea Giovanni Nuzzolese, Valentina Presutti, Daniel Russo

The SQuAP model (Software Quality, Architecture, Process) describes twenty-eight main factors that impact software quality in banking systems, and each factor is described as a relation among some characteristics from the three ISO standards.

On Linear Convergence of Policy Gradient Methods for Finite MDPs

no code implementations 21 Jul 2020 Jalaj Bhandari, Daniel Russo

We revisit the finite time analysis of policy gradient methods in one of the simplest settings: finite state and action MDPs with a policy class consisting of all stochastic policies and with exact gradient evaluations.

Policy Gradient Methods

Approximation Benefits of Policy Gradient Methods with Aggregated States

no code implementations 22 Jul 2020 Daniel Russo

With the same representation, both approximate policy iteration and approximate value iteration can produce policies whose per-period regret scales as $\epsilon/(1-\gamma)$, where $\gamma$ is a discount factor.

Policy Gradient Methods

Learning to Stop with Surprisingly Few Samples

no code implementations 19 Feb 2021 Daniel Russo, Assaf Zeevi, Tianyi Zhang

We consider a discounted infinite horizon optimal stopping problem.

Adaptive Experimentation in the Presence of Exogenous Nonstationary Variation

no code implementations 18 Feb 2022 Chao Qin, Daniel Russo

We investigate experiments that are designed to select a treatment arm for population deployment.

Thompson Sampling

On the Statistical Benefits of Temporal Difference Learning

no code implementations 30 Jan 2023 David Cheikhi, Daniel Russo

Given a dataset on actions and resulting long-term rewards, a direct estimation approach fits value functions that minimize prediction error on the training data.

An Information-Theoretic Analysis of Nonstationary Bandit Learning

no code implementations 9 Feb 2023 Seungki Min, Daniel Russo

In nonstationary bandit learning problems, the decision-maker must continually gather information and adapt their action selection as the latent state of the environment evolves.

Neural Inventory Control in Networks via Hindsight Differentiable Policy Optimization

1 code implementation 20 Jun 2023 Matias Alvo, Daniel Russo, Yash Kanoria

The first is Hindsight Differentiable Policy Optimization (HDPO), which performs stochastic gradient descent to optimize policy performance while avoiding the need to repeatedly deploy randomized policies in the environment, as is common with generic policy gradient methods.

Management, Reinforcement Learning (RL), +1
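The paper's setting involves inventory networks and neural policies, but the core mechanism, differentiating cost along a simulated (hindsight) sample path instead of using score-function policy gradients, can be shown on a toy single-item newsvendor where only a scalar order-up-to level is optimized; every number below is an illustrative placeholder:

```python
import numpy as np

rng = np.random.default_rng(0)

h_cost, b_cost = 1.0, 4.0        # holding and backorder costs (illustrative)
demand_mean = 20.0
S_level = 5.0                    # order-up-to level being optimized
lr = 0.5

for it in range(1, 2001):
    d = rng.poisson(demand_mean, size=64)            # batch of simulated demand paths
    # Per-path cost: h*(S - d)^+ + b*(d - S)^+.
    # Differentiating each realized path in S gives h where S > d and -b where S < d
    # (demand is discrete here, so ties are ignored; acceptable for a sketch).
    grad = np.where(S_level > d, h_cost, -b_cost).mean()
    S_level -= lr * grad / np.sqrt(it)               # stochastic gradient step on the pathwise gradient

# The newsvendor optimum is the b/(b+h) quantile of demand; compare against it.
target = np.quantile(rng.poisson(demand_mean, 100_000), b_cost / (b_cost + h_cost))
print("learned order-up-to level:", round(float(S_level), 2), "| critical-fractile quantile:", target)
```

The sample-path gradient here has far lower variance than a REINFORCE-style estimator of the same quantity, which is the point of optimizing through the simulator rather than deploying randomized policies.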

Optimizing Adaptive Experiments: A Unified Approach to Regret Minimization and Best-Arm Identification

no code implementations 16 Feb 2024 Chao Qin, Daniel Russo

Practitioners conducting adaptive experiments often encounter two competing priorities: reducing the cost of experimentation by effectively assigning treatments during the experiment itself, and gathering information swiftly to conclude the experiment and implement a treatment across the population.

Thompson Sampling

On the Limited Representational Power of Value Functions and its Links to Statistical (In)Efficiency

no code implementations 11 Mar 2024 David Cheikhi, Daniel Russo

In several of the settings considered, there is no information loss and value-based methods are as statistically efficient as model-based ones.
