2 code implementations • 7 Jul 2017 • Daniel Russo, Benjamin Van Roy, Abbas Kazerouni, Ian Osband, Zheng Wen
Thompson sampling is an algorithm for online decision problems where actions are taken sequentially in a manner that must balance between exploiting what is known to maximize immediate performance and investing to accumulate new information that may improve future performance.
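The balance the abstract describes can be made concrete with a minimal Beta-Bernoulli sketch (the arm means, horizon, and uniform Beta(1, 1) priors below are illustrative choices, not taken from the paper): each period the algorithm samples a mean from every arm's posterior, plays the arm with the highest sample, and updates that arm's posterior with the observed reward.

```python
import random

def thompson_sampling(true_means, horizon, seed=0):
    """Beta-Bernoulli Thompson sampling: sample from each arm's posterior,
    play the argmax, and update the played arm with the observed reward."""
    rng = random.Random(seed)
    n_arms = len(true_means)
    successes = [1] * n_arms  # Beta(1, 1) uniform priors
    failures = [1] * n_arms
    pulls = [0] * n_arms
    for _ in range(horizon):
        # Exploration and exploitation are balanced implicitly:
        # uncertain arms produce occasional high posterior draws.
        samples = [rng.betavariate(successes[a], failures[a])
                   for a in range(n_arms)]
        arm = max(range(n_arms), key=lambda a: samples[a])
        reward = 1 if rng.random() < true_means[arm] else 0
        successes[arm] += reward
        failures[arm] += 1 - reward
        pulls[arm] += 1
    return pulls

pulls = thompson_sampling([0.2, 0.8], horizon=500)
```

As the posteriors concentrate, draws from the inferior arm win less often, so exploration tapers off automatically.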
1 code implementation • 7 Oct 2020 • Paul Ralph, Nauman bin Ali, Sebastian Baltes, Domenico Bianculli, Jessica Diaz, Yvonne Dittrich, Neil Ernst, Michael Felderer, Robert Feldt, Antonio Filieri, Breno Bernard Nicolau de França, Carlo Alberto Furia, Greg Gay, Nicolas Gold, Daniel Graziotin, Pinjia He, Rashina Hoda, Natalia Juristo, Barbara Kitchenham, Valentina Lenarduzzi, Jorge Martínez, Jorge Melegati, Daniel Mendez, Tim Menzies, Jefferson Molleri, Dietmar Pfahl, Romain Robbes, Daniel Russo, Nyyti Saarimäki, Federica Sarro, Janet Siegmund, Diomidis Spinellis, Miroslaw Staron, Klaas Stol, Margaret-Anne Storey, Davide Taibi, Damian Tamburri, Marco Torchiano, Christoph Treude, Burak Turhan, XiaoFeng Wang, Sira Vegas
Empirical Standards are natural-language models of a scientific community's expectations for a specific kind of study (e.g., a questionnaire survey).
Software Engineering General Literature
1 code implementation • 19 Jul 2023 • Thomas M. McDonald, Lucas Maystre, Mounia Lalmas, Daniel Russo, Kamil Ciosek
In this context, we study a content exploration task, which we formalize as a multi-armed bandit problem with delayed rewards.
1 code implementation • 29 Aug 2023 • Daniel Russo, Serra Sinem Tekiroglu, Marco Guerini
Results show that, in justification production, summarization benefits from the claim information; in particular, a claim-driven extractive step improves abstractive summarization performance.
1 code implementation • 17 Nov 2023 • Daniel Russo, Shane Peter Kaszefski-Yaschuk, Jacopo Staiano, Marco Guerini
The proliferation of misinformation on social media platforms (SMPs) poses a significant danger to public health, social cohesion and ultimately democracy.
no code implementations • 26 Feb 2016 • Daniel Russo
This paper proposes three simple and intuitive Bayesian algorithms for adaptively allocating measurement effort, and formalizes a sense in which these seemingly naive rules are the best possible.
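One of the paper's "top-two" rules can be sketched roughly as follows (a hedged sketch, not the paper's exact specification: the arm means, budget, tuning parameter `beta`, and re-sampling cap are illustrative assumptions): sample a candidate best arm from the posterior, and with some fixed probability re-sample until a different arm wins, measuring that challenger instead.

```python
import random

def top_two_sampling(true_means, budget, beta=0.5, seed=0):
    """A top-two-style Bayesian rule for allocating measurement effort:
    draw a posterior-sampled leader; with probability 1 - beta, re-sample
    until a different (challenger) arm wins, and measure that arm."""
    rng = random.Random(seed)
    n = len(true_means)
    s = [1] * n  # Beta(1, 1) priors
    f = [1] * n
    for _ in range(budget):
        draw = [rng.betavariate(s[a], f[a]) for a in range(n)]
        leader = max(range(n), key=lambda a: draw[a])
        arm = leader
        if rng.random() > beta:
            for _ in range(100):  # cap re-sampling to avoid rare long loops
                draw = [rng.betavariate(s[a], f[a]) for a in range(n)]
                arm = max(range(n), key=lambda a: draw[a])
                if arm != leader:
                    break
        r = 1 if rng.random() < true_means[arm] else 0
        s[arm] += r
        f[arm] += 1 - r
    # Recommend the arm with the highest posterior mean.
    return max(range(n), key=lambda a: s[a] / (s[a] + f[a]))

best = top_two_sampling([0.3, 0.7, 0.5], budget=400)
```

Forcing measurements of a challenger keeps the allocation from collapsing onto a single arm too early, which is what makes such rules effective for best-arm identification rather than regret minimization.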
no code implementations • 6 Jun 2018 • Jalaj Bhandari, Daniel Russo, Raghav Singal
Temporal difference learning (TD) is a simple iterative algorithm used to estimate the value function corresponding to a given policy in a Markov decision process.
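The TD(0) update is simple enough to show in full on a toy example (the chain, rewards, step size, and discount below are illustrative assumptions): each state's value estimate is nudged toward the bootstrapped target `r + gamma * V(s')`.

```python
def td0_chain(gamma=0.9, alpha=0.1, sweeps=500):
    """TD(0) on a deterministic 3-state chain 0 -> 1 -> 2 -> terminal,
    with rewards (1, 2, 3) on the transitions. Repeated bootstrapped
    updates V(s) += alpha * (r + gamma * V(s') - V(s)) recover the
    true value function."""
    rewards = [1.0, 2.0, 3.0]
    V = [0.0, 0.0, 0.0, 0.0]  # V[3] is the terminal state, fixed at 0
    for _ in range(sweeps):
        for s in range(3):
            target = rewards[s] + gamma * V[s + 1]
            V[s] += alpha * (target - V[s])
    return V[:3]

values = td0_chain()
```

On this deterministic chain the true values are V(2) = 3, V(1) = 2 + 0.9 * 3 = 4.7, and V(0) = 1 + 0.9 * 4.7 = 5.23, and the iterates contract toward them geometrically.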
no code implementations • 22 Mar 2017 • Ian Osband, Benjamin Van Roy, Daniel Russo, Zheng Wen
We study the use of randomized value functions to guide deep exploration in reinforcement learning.
no code implementations • 7 Mar 2018 • Daniel Russo, Benjamin Van Roy
Much of the recent literature on bandit learning focuses on algorithms that aim to converge on an optimal action.
no code implementations • NeurIPS 2014 • Daniel Russo, Benjamin Van Roy
We propose information-directed sampling -- a new approach to online optimization problems in which a decision-maker must balance between exploration and exploitation while learning from partial feedback.
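The core quantity in information-directed sampling is the information ratio, which the following sketch computes exactly on a finite hypothesis space (the two-hypothesis, two-arm example and uniform prior are illustrative assumptions; the paper treats far more general settings): each arm is scored by its squared expected regret divided by the mutual information its observation reveals about the unknown hypothesis.

```python
import math

def entropy(p):
    """Entropy of a Bernoulli(p) observation, in nats."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -p * math.log(p) - (1 - p) * math.log(1 - p)

def ids_ratios(prior, arm_means):
    """For each arm a, compute the expected single-period regret Delta(a)
    and the mutual information g(a) between the unknown hypothesis and
    the arm's Bernoulli observation; IDS plays to minimize the
    information ratio Delta(a)^2 / g(a)."""
    n_arms = len(arm_means[0])
    ratios = []
    for a in range(n_arms):
        # Expected regret of arm a, averaged over the prior.
        delta = sum(p * (max(mu) - mu[a]) for p, mu in zip(prior, arm_means))
        # Mutual information I(hypothesis; observation from arm a).
        p_bar = sum(p * mu[a] for p, mu in zip(prior, arm_means))
        info = p_bar and entropy(p_bar) - sum(
            p * entropy(mu[a]) for p, mu in zip(prior, arm_means))
        ratios.append(delta ** 2 / info if info > 0 else float("inf"))
    return ratios

# Two hypotheses about which of two arms is best, uniform prior.
ratios = ids_ratios([0.5, 0.5], [(0.8, 0.2), (0.2, 0.8)])
```

By symmetry, both arms here carry the same regret (0.3) and the same information gain, so their ratios coincide; in asymmetric problems IDS will deliberately pay regret for arms that are unusually informative.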
no code implementations • NeurIPS 2017 • Chao Qin, Diego Klabjan, Daniel Russo
To overcome this shortcoming, we introduce a simple modification of the expected improvement algorithm.
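For context, the classical expected-improvement score that the paper modifies can be written in closed form for a Gaussian posterior (this sketch shows only the baseline acquisition rule, not the paper's modification; the function name is illustrative):

```python
import math

def normal_pdf(z):
    return math.exp(-0.5 * z * z) / math.sqrt(2 * math.pi)

def normal_cdf(z):
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def expected_improvement(mu, sigma, best):
    """Classical expected improvement E[max(theta - best, 0)] when the
    belief over theta is N(mu, sigma^2):
    EI = (mu - best) * Phi(z) + sigma * phi(z), with z = (mu - best)/sigma."""
    if sigma <= 0.0:
        return max(mu - best, 0.0)
    z = (mu - best) / sigma
    return (mu - best) * normal_cdf(z) + sigma * normal_pdf(z)
```

The score rewards both a high posterior mean and high posterior uncertainty, which is exactly the exploitation/exploration trade-off the modified algorithm reshapes.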
no code implementations • 28 Apr 2017 • Daniel Russo, David Tse, Benjamin Van Roy
We propose satisficing Thompson sampling -- a variation of Thompson sampling -- and establish a strong discounted regret bound for this new algorithm.
no code implementations • 16 Nov 2015 • Daniel Russo, James Zou
But while any data exploration renders standard statistical theory invalid, experience suggests that different types of exploratory analysis can lead to disparate levels of bias, and the degree of bias also depends on the particulars of the data set.
no code implementations • 21 Mar 2014 • Daniel Russo, Benjamin Van Roy
We provide an information-theoretic analysis of Thompson sampling that applies across a broad range of online optimization problems in which a decision-maker must learn from partial feedback.
no code implementations • 11 Jan 2013 • Daniel Russo, Benjamin Van Roy
This paper considers the use of a simple posterior sampling algorithm to balance between exploration and exploitation when learning to optimize actions such as in multi-armed bandit problems.
no code implementations • NeurIPS 2013 • Ian Osband, Daniel Russo, Benjamin Van Roy
This bound is one of the first for an algorithm not based on optimism, and close to the state of the art for any reinforcement learning algorithm.
no code implementations • NeurIPS 2013 • Daniel Russo, Benjamin Van Roy
This paper considers the sample complexity of the multi-armed bandit with dependencies among the arms.
no code implementations • 9 Apr 2019 • Daniel Russo
This note gives a short, self-contained, proof of a sharp connection between Gittins indices and Bayesian upper confidence bound algorithms.
no code implementations • 5 Jun 2019 • Jalaj Bhandari, Daniel Russo
Policy gradient methods apply to complex, poorly understood control problems by performing stochastic gradient descent over a parameterized class of policies.
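The mechanics can be illustrated on the simplest possible instance, a one-step problem with a softmax policy and exact gradients (the rewards, step size, and iteration count below are illustrative assumptions): the objective is J(theta) = sum_a pi_theta(a) r(a), whose gradient for a softmax parameterization is pi(a) * (r(a) - J).

```python
import math

def softmax(theta):
    m = max(theta)
    exps = [math.exp(t - m) for t in theta]
    z = sum(exps)
    return [e / z for e in exps]

def policy_gradient_bandit(rewards, steps=200, lr=0.5):
    """Exact policy-gradient ascent for a one-step decision problem with
    a softmax policy over actions: J(theta) = sum_a pi(a) * r(a), and
    dJ/dtheta_a = pi(a) * (r(a) - J)."""
    theta = [0.0] * len(rewards)
    for _ in range(steps):
        pi = softmax(theta)
        J = sum(p * r for p, r in zip(pi, rewards))
        theta = [t + lr * p * (r - J) for t, p, r in zip(theta, pi, rewards)]
    return softmax(theta)

pi = policy_gradient_bandit([1.0, 0.0])
```

Ascent steadily shifts probability mass onto the better action; in practice the exact gradient is replaced by a stochastic estimate from sampled trajectories.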
no code implementations • NeurIPS 2019 • Daniel Russo
This paper studies a recent proposal to use randomized value functions to drive exploration in reinforcement learning.
no code implementations • 4 Sep 2019 • Paolo Ciancarini, Andrea Giovanni Nuzzolese, Valentina Presutti, Daniel Russo
The SQuAP model (Software Quality, Architecture, Process) describes twenty-eight main factors that impact software quality in banking systems; each factor is described as a relation among characteristics from the three ISO standards.
no code implementations • 21 Jul 2020 • Jalaj Bhandari, Daniel Russo
We revisit the finite-time analysis of policy gradient methods in one of the simplest settings: finite state and action MDPs with a policy class consisting of all stochastic policies and with exact gradient evaluations.
no code implementations • 22 Jul 2020 • Daniel Russo
With the same representation, both approximate policy iteration and approximate value iteration can produce policies whose per-period regret scales as $\epsilon/(1-\gamma)$, where $\gamma$ is a discount factor.
no code implementations • 19 Feb 2021 • Daniel Russo, Assaf Zeevi, Tianyi Zhang
We consider a discounted infinite horizon optimal stopping problem.
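A minimal instance of such a problem can be solved by value iteration (the uniform-offer model, discount factor, and grid size below are illustrative assumptions, not the paper's setting): each period an offer arrives, and the value function satisfies V(s) = max(stop payoff, discounted continuation value).

```python
def stopping_values(offers=10, gamma=0.9, iters=500):
    """Value iteration for a discounted infinite-horizon stopping problem:
    each period an offer s in {0, ..., offers-1} arrives uniformly at
    random; stopping yields s, continuing yields the discounted value of
    a fresh draw. Bellman equation: V(s) = max(s, gamma * mean(V))."""
    V = [0.0] * offers
    for _ in range(iters):
        cont = gamma * sum(V) / offers
        V = [max(float(s), cont) for s in range(offers)]
    return V

V = stopping_values()
```

The fixed point is a threshold policy: every state below the continuation value shares the same value (accept nothing, keep drawing), while high offers are accepted immediately. Here the continuation value solves c = 0.9 * (6c + 30) / 10, giving c = 2.7 / 0.46, roughly 5.87.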
no code implementations • 18 Feb 2022 • Chao Qin, Daniel Russo
We investigate experiments that are designed to select a treatment arm for population deployment.
no code implementations • 30 Jan 2023 • David Cheikhi, Daniel Russo
Given a dataset on actions and resulting long-term rewards, a direct estimation approach fits value functions that minimize prediction error on the training data.
no code implementations • 7 Feb 2023 • Lucas Maystre, Daniel Russo, Yu Zhao
We study the problem of optimizing a recommender system for outcomes that occur over several weeks or months.
no code implementations • 9 Feb 2023 • Seungki Min, Daniel Russo
In nonstationary bandit learning problems, the decision-maker must continually gather information and adapt their action selection as the latent state of the environment evolves.
1 code implementation • 20 Jun 2023 • Matias Alvo, Daniel Russo, Yash Kanoria
The first is Hindsight Differentiable Policy Optimization (HDPO), which performs stochastic gradient descent to optimize policy performance while avoiding the need to repeatedly deploy randomized policies in the environment, as is common with generic policy gradient methods.
no code implementations • 16 Feb 2024 • Chao Qin, Daniel Russo
Practitioners conducting adaptive experiments often encounter two competing priorities: reducing the cost of experimentation by effectively assigning treatments during the experiment itself, and gathering information swiftly to conclude the experiment and implement a treatment across the population.
no code implementations • 11 Mar 2024 • David Cheikhi, Daniel Russo
In several settings, there is no information loss and value-based methods are as statistically efficient as model-based ones.