Search Results for author: Michael N. Katehakis

Found 10 papers, 0 papers with code

Optimal Activation of Halting Multi-Armed Bandit Models

no code implementations · 20 Apr 2023 · Wesley Cowan, Michael N. Katehakis, Sheldon M. Ross

We study new types of dynamic allocation problems, the Halting Bandit models.

Multi-Armed Bandits

Accelerating the Computation of UCB and Related Indices for Reinforcement Learning

no code implementations · 28 Sep 2019 · Wesley Cowan, Michael N. Katehakis, Daniel Pirutinsky

In this paper we derive an efficient method for computing the indices associated with an asymptotically optimal upper confidence bound algorithm (MDP-UCB) of Burnetas and Katehakis (1997) that only requires solving a system of two non-linear equations with two unknowns, irrespective of the cardinality of the state space of the Markovian decision process (MDP).

Reinforcement Learning (RL)
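The snippet does not reproduce the index equations. As a hedged sketch only, assume an index of the Burnetas and Katehakis (1997) type whose inner step maximizes the expected next-state value q·v over transition vectors q whose Kullback-Leibler divergence from the estimate p̂ is at most δ (for example δ = log n / T after T visits to the state-action pair; this choice is an assumption here). If p̂ has full support, the Lagrange conditions give q_i = λ p̂_i / (μ − v_i) with μ > max_i v_i; eliminating λ through Σ q_i = 1 leaves a single equation in μ that is decreasing, so bisection solves it. The function names, the full-support assumption and the bisection solver are choices of this sketch, not the paper's method.

```python
import math


def kl_divergence(p, q):
    """I(p, q) = sum_i p_i log(p_i / q_i), summed over entries with p_i > 0."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def kl_ucb_transition_index(p_hat, v, delta, tol=1e-12, max_iter=200):
    """Sketch: maximize q . v over distributions q with I(p_hat, q) <= delta.

    Assumes every entry of p_hat is positive and delta > 0. Stationarity gives
    q_i = lam * p_i / (mu - v_i) with mu > max(v); eliminating lam via
    sum(q) = 1 leaves one decreasing equation g(mu) = delta in mu.
    """
    if delta <= 0:
        raise ValueError("delta must be positive")
    v_max = max(v)
    span = v_max - min(v)

    def g(mu):
        # KL(p_hat, q(mu)) once lam has been eliminated.
        s = sum(pi / (mu - vi) for pi, vi in zip(p_hat, v))
        return sum(pi * math.log(mu - vi) for pi, vi in zip(p_hat, v)) + math.log(s)

    # g -> +inf as mu -> v_max from above and g -> 0 as mu -> inf.
    lo = v_max + 1e-12 * max(1.0, abs(v_max), span)
    hi = v_max + max(1.0, span)
    while g(hi) > delta:  # widen the bracket until q(hi) is feasible
        hi = v_max + 2.0 * (hi - v_max)
    for _ in range(max_iter):
        mid = 0.5 * (lo + hi)
        if g(mid) > delta:
            lo = mid
        else:
            hi = mid
        if hi - lo < tol:
            break
    mu = hi  # g(hi) <= delta, so the resulting q satisfies the constraint
    weights = [pi / (mu - vi) for pi, vi in zip(p_hat, v)]
    total = sum(weights)
    q = [w / total for w in weights]
    return sum(qi * vi for qi, vi in zip(q, v)), q
```

For two states with p̂ = (0.5, 0.5) and v = (0, 1), the result should coincide with the Bernoulli KL upper confidence bound for an empirical mean of 0.5, which makes a quick sanity check.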

Reinforcement Learning: a Comparison of UCB Versus Alternative Adaptive Policies

no code implementations · 13 Sep 2019 · Wesley Cowan, Michael N. Katehakis, Daniel Pirutinsky

In this paper we consider the basic version of Reinforcement Learning (RL) that involves computing optimal data-driven (adaptive) policies for a Markovian decision process with unknown transition probabilities.

Reinforcement Learning (RL)

Optimal Data Driven Resource Allocation under Multi-Armed Bandit Observations

no code implementations · 30 Nov 2018 · Apostolos N. Burnetas, Odysseas Kanavetas, Michael N. Katehakis

This paper introduces the first asymptotically optimal strategy for a multi-armed bandit (MAB) model under side constraints.

Inventory Control Involving Unknown Demand of Discrete Nonperishable Items - Analysis of a Newsvendor-based Policy

no code implementations · 22 Oct 2015 · Michael N. Katehakis, Jian Yang, Tingting Zhou

Inventory control with unknown demand distribution is considered, with emphasis placed on the case involving discrete nonperishable items.
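The snippet does not spell out the policy, so the following is only a hedged illustration of the textbook newsvendor critical-fractile rule applied to the empirical distribution of observed integer demands and used as an order-up-to level, with leftover nonperishable stock carried over. The cost names b (per-unit shortage) and h (per-unit holding), the i.i.d. demand assumption and both function names are hypothetical, and this is not claimed to be the policy analyzed in the paper.

```python
from typing import Sequence


def empirical_newsvendor_level(demands: Sequence[int], b: float, h: float) -> int:
    """Smallest integer s whose empirical CDF value F_n(s) is at least b / (b + h).

    demands: past integer demand observations (assumed i.i.d. across periods).
    b: per-unit shortage cost; h: per-unit holding cost (hypothetical names).
    """
    if not demands:
        raise ValueError("need at least one demand observation")
    if b <= 0 or h <= 0:
        raise ValueError("costs must be positive")
    ratio = b / (b + h)  # critical fractile
    ordered = sorted(demands)
    n = len(ordered)
    # F_n only jumps at observed values, so the answer is an order statistic:
    # the k-th smallest observation for the smallest k with k / n >= ratio.
    for k in range(1, n + 1):
        if k / n >= ratio:
            return ordered[k - 1]
    return ordered[-1]  # not reached, because ratio < 1 and k = n gives 1


def order_quantity(on_hand: int, level: int) -> int:
    """Order up to `level`; nonperishable stock left over from earlier periods counts."""
    return max(0, level - on_hand)
```

With demands 3, 1, 4, 1, 5, 9, 2, 6 and b = 3, h = 1 the critical fractile is 0.75, so the level is the sixth smallest observation, 5.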

Asymptotically Optimal Sequential Experimentation Under Generalized Ranking

no code implementations · 7 Oct 2015 · Wesley Cowan, Michael N. Katehakis

We consider the classical problem of a controller activating (or sampling) sequentially from a finite number of $N \geq 2$ populations, specified by unknown distributions.

Asymptotically Optimal Multi-Armed Bandit Policies under a Cost Constraint

no code implementations · 9 Sep 2015 · Apostolos N. Burnetas, Odysseas Kanavetas, Michael N. Katehakis

We then construct a class of f-UF policies and provide conditions under which they are asymptotically optimal within the class of f-UF policies, attaining the asymptotic lower bound.

Asymptotic Behavior of Minimal-Exploration Allocation Policies: Almost Sure, Arbitrarily Slow Growing Regret

no code implementations · 12 May 2015 · Wesley Cowan, Michael N. Katehakis

The purpose of this paper is to provide further understanding of the structure of the sequential allocation ("stochastic multi-armed bandit", or MAB) problem by establishing probability-one finite-horizon bounds and convergence rates for the sample (or "pseudo") regret associated with two simple classes of allocation policies $\pi$.
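The snippet does not restate how the sample regret is defined. Under the common pseudo-regret definition, assumed here, the regret after n steps is R_n = Σ_i (μ* − μ_i) T_i(n), where μ* is the largest mean and T_i(n) counts how often population i was sampled; the path R_1, R_2, … is the random quantity whose almost-sure growth the paper studies. A minimal sketch with hypothetical helper names:

```python
from itertools import accumulate
from typing import List, Sequence


def pseudo_regret_path(means: Sequence[float], pulls: Sequence[int]) -> List[float]:
    """Cumulative pseudo-regret after each step: running sum of mu* - mu_arm."""
    best = max(means)
    return list(accumulate(best - means[arm] for arm in pulls))


def pseudo_regret_from_counts(means: Sequence[float], counts: Sequence[int]) -> float:
    """Equivalent form sum_i (mu* - mu_i) * T_i(n), with T_i(n) the pulls of arm i."""
    best = max(means)
    return sum((best - m) * t for m, t in zip(means, counts))
```

Both functions give the same final value; the path form is what one would record when checking how fast regret grows along a single run.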

An Asymptotically Optimal Policy for Uniform Bandits of Unknown Support

no code implementations · 8 May 2015 · Wesley Cowan, Michael N. Katehakis

The objective is to have a policy $\pi$ for deciding, based on available data, which of the $N$ populations to sample from at any time $n = 1, 2, \ldots$ so as to maximize the expected sum of outcomes of $n$ samples, or equivalently to minimize the regret due to lack of information about the parameters $\{ a_i \}$ and $\{ b_i \}$.
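Assuming, as the title suggests, that population i is Uniform(a_i, b_i), its mean is (a_i + b_i)/2 and the equivalence above reads n μ* − E[Σ outcomes] = Σ_i (μ* − μ_i) E[T_i(n)]. The sketch below shows that bookkeeping together with the maximum-likelihood estimate of an unknown support, which is the sample minimum and maximum. It is not the paper's policy or index; the function names are hypothetical.

```python
from typing import Sequence, Tuple


def uniform_mean(a: float, b: float) -> float:
    """Mean of Uniform(a, b)."""
    return 0.5 * (a + b)


def mle_support(samples: Sequence[float]) -> Tuple[float, float]:
    """Maximum-likelihood estimate of (a, b) from Uniform(a, b) samples: (min, max)."""
    if not samples:
        raise ValueError("need at least one sample")
    return min(samples), max(samples)


def expected_regret(supports: Sequence[Tuple[float, float]],
                    expected_counts: Sequence[float]) -> float:
    """n * mu* - E[sum of outcomes] = sum_i (mu* - mu_i) * E[T_i(n)]."""
    means = [uniform_mean(a, b) for a, b in supports]
    best = max(means)
    return sum((best - m) * c for m, c in zip(means, expected_counts))
```

For supports (0, 1) and (0, 2), sampling the worse population 10 times out of 100 costs (1.0 − 0.5) × 10 = 5 in expectation.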

Normal Bandits of Unknown Means and Variances: Asymptotic Optimality, Finite Horizon Regret Bounds, and a Solution to an Open Problem

no code implementations · 22 Apr 2015 · Wesley Cowan, Junya Honda, Michael N. Katehakis

Consider the problem of sampling sequentially from a finite number of $N \geq 2$ populations, specified by random variables $X^i_k$, $i = 1, \ldots, N$, and $k = 1, 2, \ldots$, where $X^i_k$ denotes the outcome from population $i$ the $k$-th time it is sampled.
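Any policy for normal populations with unknown means and variances has to keep per-population estimates built from $X^i_1, \ldots, X^i_k$. As a hedged sketch of that bookkeeping only, the class below maintains the running sample mean and variance with Welford's online update; the class name is hypothetical, and since the snippet does not say whether the paper uses the maximum-likelihood or the unbiased variance, both are offered through `ddof`.

```python
import math
from dataclasses import dataclass


@dataclass
class PopulationStats:
    """Running mean and variance of X^i_1, ..., X^i_k via Welford's update."""
    count: int = 0
    mean: float = 0.0
    _m2: float = 0.0  # running sum of squared deviations from the current mean

    def update(self, x: float) -> None:
        """Record the next outcome X^i_{k+1} of this population."""
        self.count += 1
        delta = x - self.mean
        self.mean += delta / self.count
        self._m2 += delta * (x - self.mean)

    def variance(self, ddof: int = 0) -> float:
        """ddof=0 divides by k (MLE); ddof=1 divides by k - 1 (unbiased)."""
        if self.count - ddof <= 0:
            return math.nan
        return self._m2 / (self.count - ddof)
```

One `PopulationStats` per population i = 1, …, N suffices; after the k-th sample from population i its `count` equals k, and the update never stores the raw outcomes.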
