Search Results for author: Daniel Russo

Found 21 papers, 2 papers with code

Learning to Stop with Surprisingly Few Samples

no code implementations 19 Feb 2021 Daniel Russo, Assaf Zeevi, Tianyi Zhang

We consider a discounted infinite horizon optimal stopping problem.

Approximation Benefits of Policy Gradient Methods with Aggregated States

no code implementations 22 Jul 2020 Daniel Russo

Folklore suggests that policy gradient can be more robust to misspecification than its relative, approximate policy iteration.

Policy Gradient Methods

A Note on the Linear Convergence of Policy Gradient Methods

no code implementations 21 Jul 2020 Jalaj Bhandari, Daniel Russo

This note instead takes a policy iteration perspective and highlights that many versions of policy gradient succeed with extremely large stepsizes and attain a linear rate of convergence.

Policy Gradient Methods

SQuAP-Ont: an Ontology of Software Quality Relational Factors from Financial Systems

no code implementations 4 Sep 2019 Paolo Ciancarini, Andrea Giovanni Nuzzolese, Valentina Presutti, Daniel Russo

The SQuAP model (Software Quality, Architecture, Process) describes twenty-eight main factors that impact software quality in banking systems; each factor is described as a relation among characteristics from the three ISO standards.

Worst-Case Regret Bounds for Exploration via Randomized Value Functions

no code implementations NeurIPS 2019 Daniel Russo

This paper studies a recent proposal to use randomized value functions to drive exploration in reinforcement learning.

Efficient Exploration

Global Optimality Guarantees For Policy Gradient Methods

no code implementations 5 Jun 2019 Jalaj Bhandari, Daniel Russo

Policy gradient methods apply to complex, poorly understood control problems by performing stochastic gradient descent over a parameterized class of policies.

Policy Gradient Methods
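
The abstract above describes the generic mechanism: stochastic gradient ascent over a parameterized policy class. Purely as an illustration (none of these details come from the paper), here is a minimal REINFORCE-style sketch for a softmax policy on a hypothetical three-armed Bernoulli bandit:

```python
import numpy as np

rng = np.random.default_rng(0)
true_means = np.array([0.2, 0.5, 0.8])   # hypothetical Bernoulli arm means
theta = np.zeros(3)                      # softmax policy parameters
step_size = 0.5

def softmax(x):
    z = np.exp(x - x.max())
    return z / z.sum()

for t in range(2000):
    probs = softmax(theta)
    a = rng.choice(3, p=probs)
    r = rng.binomial(1, true_means[a])   # sampled reward
    grad_log = -probs                    # gradient of log pi(a | theta) for a softmax policy
    grad_log[a] += 1.0
    theta += step_size * r * grad_log    # stochastic gradient ascent on expected reward

print("learned action probabilities:", softmax(theta).round(3))
```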

A Note on the Equivalence of Upper Confidence Bounds and Gittins Indices for Patient Agents

no code implementations 9 Apr 2019 Daniel Russo

This note gives a short, self-contained, proof of a sharp connection between Gittins indices and Bayesian upper confidence bound algorithms.
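
The note itself is a proof connecting two index rules; purely for orientation, one standard Bayesian upper confidence bound rule plays the arm with the largest posterior quantile. A minimal Beta-Bernoulli sketch, where the quantile schedule, priors, and arm means are illustrative assumptions rather than details from the note:

```python
import numpy as np
from scipy.stats import beta as beta_dist

rng = np.random.default_rng(0)
true_means = np.array([0.4, 0.6])     # hypothetical Bernoulli arm means
s, f = np.ones(2), np.ones(2)         # Beta(1, 1) posteriors

for t in range(1, 2001):
    level = 1.0 - 1.0 / t             # quantile level grows as data accumulate
    ucb = beta_dist.ppf(level, s, f)  # per-arm posterior upper confidence bound
    a = int(ucb.argmax())
    r = rng.binomial(1, true_means[a])
    s[a] += r
    f[a] += 1 - r

print("pull counts:", (s + f - 2).astype(int))
```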

A Finite Time Analysis of Temporal Difference Learning With Linear Function Approximation

no code implementations 6 Jun 2018 Jalaj Bhandari, Daniel Russo, Raghav Singal

Temporal difference learning (TD) is a simple iterative algorithm used to estimate the value function corresponding to a given policy in a Markov decision process.

Q-Learning
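
For orientation, a minimal sketch of TD(0) with linear function approximation on a made-up Markov chain; the chain, features, discount factor, and step size are assumptions for the example, not the setting analyzed in the paper:

```python
import numpy as np

rng = np.random.default_rng(0)
n_states, gamma, alpha = 5, 0.9, 0.05
P = np.full((n_states, n_states), 1.0 / n_states)  # transitions under the fixed policy
rewards = np.linspace(0.0, 1.0, n_states)          # reward depends only on the state
features = np.eye(n_states)                        # one-hot features (tabular special case)
w = np.zeros(n_states)                             # weight vector for the value estimate

s = 0
for t in range(50_000):
    s_next = rng.choice(n_states, p=P[s])
    r = rewards[s]
    # TD(0): move w toward the bootstrapped target r + gamma * v(s').
    td_error = r + gamma * features[s_next] @ w - features[s] @ w
    w += alpha * td_error * features[s]
    s = s_next

print("estimated state values:", (features @ w).round(3))
```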

Satisficing in Time-Sensitive Bandit Learning

no code implementations 7 Mar 2018 Daniel Russo, Benjamin Van Roy

Much of the recent literature on bandit learning focuses on algorithms that aim to converge on an optimal action.

A Tutorial on Thompson Sampling

3 code implementations 7 Jul 2017 Daniel Russo, Benjamin Van Roy, Abbas Kazerouni, Ian Osband, Zheng Wen

Thompson sampling is an algorithm for online decision problems where actions are taken sequentially in a manner that must balance between exploiting what is known to maximize immediate performance and investing to accumulate new information that may improve future performance.

Active Learning, Product Recommendation
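
As a concrete illustration of the exploit-versus-learn trade-off described above, a minimal Beta-Bernoulli Thompson sampling sketch; the arm means, horizon, and uniform priors are assumptions for the example, not content from the tutorial:

```python
import numpy as np

rng = np.random.default_rng(0)
true_means = np.array([0.3, 0.5, 0.7])     # hypothetical Bernoulli arm means
successes = np.ones(3)                     # Beta(1, 1) priors on each arm
failures = np.ones(3)

for t in range(5000):
    theta = rng.beta(successes, failures)  # sample a plausible mean for each arm
    a = int(theta.argmax())                # act greedily with respect to the sample
    reward = rng.binomial(1, true_means[a])
    successes[a] += reward                 # conjugate posterior update
    failures[a] += 1 - reward

print("posterior means:", (successes / (successes + failures)).round(3))
```

Sampling from the posterior, rather than acting on its mean, is what keeps uncertain arms being tried occasionally.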

Improving the Expected Improvement Algorithm

no code implementations NeurIPS 2017 Chao Qin, Diego Klabjan, Daniel Russo

To overcome this shortcoming, we introduce a simple modification of the expected improvement algorithm.
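
For context, the classical expected improvement rule that the paper modifies scores each arm by how much its posterior could exceed the current best posterior mean. A minimal sketch under independent Gaussian posteriors (the posterior means and standard deviations below are placeholder numbers):

```python
import numpy as np
from scipy.stats import norm

def expected_improvement(post_mean, post_std):
    # Expected improvement of each arm over the current best posterior mean.
    best = post_mean.max()
    z = (post_mean - best) / post_std
    return (post_mean - best) * norm.cdf(z) + post_std * norm.pdf(z)

# Placeholder posterior summaries for three arms.
mu = np.array([0.40, 0.55, 0.50])
sigma = np.array([0.10, 0.05, 0.20])
ei = expected_improvement(mu, sigma)
print(ei.round(4), "-> measure arm", int(ei.argmax()))
```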

Time-Sensitive Bandit Learning and Satisficing Thompson Sampling

no code implementations 28 Apr 2017 Daniel Russo, David Tse, Benjamin Van Roy

We propose satisficing Thompson sampling -- a variation of Thompson sampling -- and establish a strong discounted regret bound for this new algorithm.

Deep Exploration via Randomized Value Functions

no code implementations 22 Mar 2017 Ian Osband, Benjamin Van Roy, Daniel Russo, Zheng Wen

We study the use of randomized value functions to guide deep exploration in reinforcement learning.

Efficient Exploration

Simple Bayesian Algorithms for Best Arm Identification

no code implementations 26 Feb 2016 Daniel Russo

This paper proposes three simple and intuitive Bayesian algorithms for adaptively allocating measurement effort, and formalizes a sense in which these seemingly naive rules are the best possible.
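
One way to make "adaptively allocating measurement effort" concrete is a top-two sampling rule of the kind studied in this line of work: draw a leader from the posterior, then with some probability measure a challenger instead. A minimal Beta-Bernoulli sketch (the arm means, horizon, tuning parameter beta, and resampling cap are assumptions for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)
true_means = np.array([0.3, 0.5, 0.6])      # hypothetical Bernoulli arm means
s, f = np.ones(3), np.ones(3)               # Beta(1, 1) posteriors
beta = 0.5                                  # probability of measuring the sampled leader

for t in range(3000):
    leader = int(rng.beta(s, f).argmax())   # arm that looks best under one posterior draw
    arm = leader
    if rng.random() > beta:
        # Otherwise measure a challenger: resample until a different arm looks best
        # (capped so the sketch stays fast once the posterior concentrates).
        for _ in range(100):
            theta = rng.beta(s, f)
            arm = int(theta.argmax())
            if arm != leader:
                break
        else:
            theta[leader] = -np.inf         # fall back to the runner-up of the last draw
            arm = int(theta.argmax())
    r = rng.binomial(1, true_means[arm])
    s[arm] += r
    f[arm] += 1 - r

print("best-arm guess:", int((s / (s + f)).argmax()))
```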

How much does your data exploration overfit? Controlling bias via information usage

no code implementations 16 Nov 2015 Daniel Russo, James Zou

But while any data exploration renders standard statistical theory invalid, experience suggests that different types of exploratory analysis can lead to disparate levels of bias, and the degree of bias also depends on the particulars of the data set.
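
As a toy illustration of the point (not an experiment from the paper): reporting the best-looking estimate out of several explored hypotheses already induces a positive bias, and the bias grows with how aggressively one explores. All numbers below are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(0)
n_trials, n_samples = 2000, 50   # repetitions of the "study", observations per hypothesis

for k in (1, 5, 50):             # number of candidate hypotheses explored
    # Every hypothesis has true effect 0; we report the largest sample mean we find.
    means = rng.normal(0.0, 1.0, size=(n_trials, k, n_samples)).mean(axis=2)
    selected = means.max(axis=1)
    print(f"explored {k:2d} hypotheses -> average reported effect {selected.mean():+.3f}")
```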

Learning to Optimize via Information-Directed Sampling

no code implementations NeurIPS 2014 Daniel Russo, Benjamin Van Roy

We propose information-directed sampling -- a new approach to online optimization problems in which a decision-maker must balance between exploration and exploitation while learning from partial feedback.
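
The one-line summary hides the mechanism: information-directed sampling scores actions by the ratio of squared expected one-step regret to the information the resulting observation reveals about the identity of the optimal action. Below is a simplified sketch for a Beta-Bernoulli bandit that restricts attention to deterministic action choices and estimates both quantities from posterior samples; the full algorithm optimizes over randomized actions, and every number here is an illustrative assumption.

```python
import numpy as np

rng = np.random.default_rng(0)
M = 4000   # posterior samples used to estimate regret and information gain

def ids_action(successes, failures):
    K = len(successes)
    theta = rng.beta(successes, failures, size=(M, K))       # posterior draws for each arm
    best = theta.argmax(axis=1)                              # sampled identity of the optimal arm
    p_star = np.bincount(best, minlength=K) / M              # estimate of P(A* = a)
    regret = theta.max(axis=1).mean() - theta.mean(axis=0)   # expected one-step regret per arm

    info = np.zeros(K)
    for a in range(K):
        p = theta[:, a].mean()                               # marginal P(Y_a = 1)
        for star in range(K):
            if p_star[star] == 0:
                continue
            q = theta[best == star, a].mean()                # P(Y_a = 1 | A* = star)
            # Mutual information I(A*; Y_a) as an average Bernoulli KL divergence.
            for q_y, p_y in ((q, p), (1 - q, 1 - p)):
                if q_y > 0:
                    info[a] += p_star[star] * q_y * np.log(q_y / p_y)
    info = np.maximum(info, 1e-12)                           # guard against division by zero
    return int(np.argmin(regret ** 2 / info))                # minimize the information ratio

# One simulated run on a hypothetical three-armed Bernoulli bandit.
true_means = np.array([0.3, 0.5, 0.7])
s, f = np.ones(3), np.ones(3)
for t in range(500):
    a = ids_action(s, f)
    r = rng.binomial(1, true_means[a])
    s[a] += r
    f[a] += 1 - r
print("pull counts:", (s + f - 2).astype(int))
```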

An Information-Theoretic Analysis of Thompson Sampling

no code implementations 21 Mar 2014 Daniel Russo, Benjamin Van Roy

We provide an information-theoretic analysis of Thompson sampling that applies across a broad range of online optimization problems in which a decision-maker must learn from partial feedback.
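
For readers scanning the listing, the headline result takes roughly the following form: a uniform bound on the algorithm's "information ratio" converts directly into a Bayesian regret bound. Stated as a hedged paraphrase (the symbols and exact constants should be checked against the paper):

```latex
% If the information ratio of Thompson sampling is at most \bar{\Gamma} in every period,
% then the expected (Bayesian) regret over T periods satisfies
\mathbb{E}\!\left[\mathrm{Regret}(T)\right] \;\le\; \sqrt{\bar{\Gamma}\, H(\alpha^{*})\, T},
% where H(\alpha^{*}) is the entropy of the prior distribution of the optimal action.
```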

Eluder Dimension and the Sample Complexity of Optimistic Exploration

no code implementations NeurIPS 2013 Daniel Russo, Benjamin Van Roy

This paper considers the sample complexity of the multi-armed bandit with dependencies among the arms.

(More) Efficient Reinforcement Learning via Posterior Sampling

no code implementations NeurIPS 2013 Ian Osband, Daniel Russo, Benjamin Van Roy

This bound is one of the first for an algorithm not based on optimism and is close to the state of the art for any reinforcement learning algorithm.

Efficient Exploration

Learning to Optimize Via Posterior Sampling

no code implementations 11 Jan 2013 Daniel Russo, Benjamin Van Roy

This paper considers the use of a simple posterior sampling algorithm to balance between exploration and exploitation when learning to optimize actions such as in multi-armed bandit problems.
