Search Results for author: Maurits Kaptein

Found 11 papers, 6 papers with code

Provably Efficient Exploration in Constrained Reinforcement Learning: Posterior Sampling Is All You Need

no code implementations • 27 Sep 2023 • Danil Provodin, Pratik Gajane, Mykola Pechenizkiy, Maurits Kaptein

We present a new algorithm based on posterior sampling for learning in constrained Markov decision processes (CMDPs) in the infinite-horizon undiscounted setting.

Efficient Exploration

The Impact of Batch Learning in Stochastic Linear Bandits

1 code implementation • 14 Feb 2022 • Danil Provodin, Pratik Gajane, Mykola Pechenizkiy, Maurits Kaptein

Our main theoretical results show that the impact of batch learning is a multiplicative factor of batch size relative to the regret of online behavior.
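The batch-learning effect described above can be illustrated with a toy simulation. The sketch below is a plain epsilon-greedy Bernoulli bandit, not the linear-bandit algorithm the paper analyzes; the point is only that withholding feedback until batch boundaries slows learning, consistent with the paper's finding that regret scales multiplicatively with batch size.

```python
import random

def batched_bandit(probs, horizon, batch_size, eps=0.1, seed=0):
    """Toy batched epsilon-greedy bandit: rewards are observed immediately,
    but arm estimates are refreshed only at batch boundaries, so within a
    batch the policy acts on stale statistics.  Illustrative sketch only,
    not the algorithm analyzed in the paper."""
    rng = random.Random(seed)
    k = len(probs)
    counts, sums = [0] * k, [0.0] * k
    pending = []  # observations waiting for the next batch update
    total = 0.0
    for _ in range(horizon):
        if rng.random() < eps:
            arm = rng.randrange(k)
        else:
            # unexplored arms get +inf so they are tried at least once
            arm = max(range(k),
                      key=lambda a: sums[a] / counts[a] if counts[a] else float("inf"))
        r = 1.0 if rng.random() < probs[arm] else 0.0
        total += r
        pending.append((arm, r))
        if len(pending) >= batch_size:  # batch boundary: fold in feedback
            for a, rew in pending:
                counts[a] += 1
                sums[a] += rew
            pending.clear()
    return total
```

Running this with `batch_size=1` recovers fully online behavior; larger batches delay every estimate update by up to one batch, which is the mechanism behind the multiplicative regret inflation.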

Exploring Offline Policy Evaluation for the Continuous-Armed Bandit Problem

no code implementations • 21 Aug 2019 • Jules Kruijswijk, Petri Parvinen, Maurits Kaptein

We propose and evaluate an extension of the existing method such that it can be used to evaluate CAB policies.

Decision Making

Continuous-Time Birth-Death MCMC for Bayesian Regression Tree Models

1 code implementation • 19 Apr 2019 • Reza Mohammadi, Matthew Pratola, Maurits Kaptein

In a Bayesian framework for regression trees, Markov Chain Monte Carlo (MCMC) search algorithms are required to generate samples of tree models according to their posterior probabilities.

regression

contextual: Evaluating Contextual Multi-Armed Bandit Problems in R

no code implementations • 6 Nov 2018 • Robin van Emden, Maurits Kaptein

Over the past decade, contextual bandit algorithms have been gaining in popularity due to their effectiveness and flexibility in solving sequential decision problems, from online advertising and finance to clinical trial design and personalized medicine.

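A minimal picture of the contextual bandit problem the `contextual` R package evaluates: the sketch below runs Beta-Bernoulli Thompson sampling with an independent posterior per (context, arm) pair, assuming discrete contexts. This is an illustration of the problem class, not the package's own (R) implementation.

```python
import random

def contextual_ts(contexts, probs, horizon, seed=0):
    """Contextual bandit sketch: Beta-Bernoulli Thompson sampling with one
    independent posterior per (context, arm) pair.  `probs[c][a]` is the
    true success probability of arm a in context c (assumed known only to
    the simulator).  Illustrative only."""
    rng = random.Random(seed)
    n_ctx = len(probs)   # number of discrete contexts
    k = len(probs[0])    # number of arms
    alpha = [[1] * k for _ in range(n_ctx)]  # Beta prior: successes + 1
    beta = [[1] * k for _ in range(n_ctx)]   # Beta prior: failures + 1
    total = 0
    for t in range(horizon):
        c = contexts[t % len(contexts)]      # observe the current context
        draws = [rng.betavariate(alpha[c][a], beta[c][a]) for a in range(k)]
        arm = max(range(k), key=draws.__getitem__)
        r = 1 if rng.random() < probs[c][arm] else 0
        total += r
        alpha[c][arm] += r                   # conjugate posterior update
        beta[c][arm] += 1 - r
    return total
```

Because each context keeps its own posteriors, the policy can learn that the best arm differs across contexts, which is exactly what distinguishes the contextual setting from the plain multi-armed bandit.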

StreamingBandit: Experimenting with Bandit Policies

2 code implementations • 22 Feb 2016 • Jules Kruijswijk, Robin van Emden, Petri Parvinen, Maurits Kaptein

A large number of statistical decision problems in the social sciences and beyond can be framed as a (contextual) multi-armed bandit problem.

Human-Computer Interaction • Computers and Society

Lock in Feedback in Sequential Experiments

no code implementations • 2 Feb 2015 • Maurits Kaptein, Davide Iannuzzi

We often encounter situations in which an experimenter wants to find, by sequential experimentation, $x_{max} = \arg\max_{x} f(x)$, where $f(x)$ is a (possibly unknown) function of a well controllable variable $x$.
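The lock-in idea behind this paper can be sketched as gradient ascent driven by a lock-in-amplifier-style measurement: oscillate $x$ around a working point, multiply the noisy observations of $f$ by the reference oscillation, and integrate over a period; the integral is proportional to $f'(x)$. All parameter names and defaults below (`amp`, `gamma`, the cycle counts) are illustrative choices, not values from the paper.

```python
import math
import random

def lock_in_feedback(f, x0, amp=0.5, omega=1.0, gamma=0.2,
                     cycles=200, steps_per_cycle=32, noise=0.1, seed=0):
    """Lock-in feedback sketch for maximizing f by sequential experimentation.
    Probe at x + amp*cos(omega*t), multiply the noisy response by the
    reference cos(omega*t), and integrate over one period: for smooth f the
    integral is proportional to f'(x), so adding it performs gradient
    ascent toward argmax f.  Hedged illustration, not the paper's exact
    procedure."""
    rng = random.Random(seed)
    x = x0
    dt = 2 * math.pi / (omega * steps_per_cycle)   # one full period per cycle
    for _ in range(cycles):
        integral = 0.0
        for i in range(steps_per_cycle):
            t = i * dt
            y = f(x + amp * math.cos(omega * t)) + rng.gauss(0.0, noise)
            integral += y * math.cos(omega * t) * dt   # lock-in demodulation
        x += gamma * integral   # integral ~ pi*amp*f'(x)/omega: gradient step
    return x
```

The demodulation step is what gives the method its noise robustness: zero-mean noise that is uncorrelated with the reference oscillation averages out of the integral.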

Thompson sampling with the online bootstrap

no code implementations • 15 Oct 2014 • Dean Eckles, Maurits Kaptein

Subsequently, we detail why BTS using the online bootstrap is more scalable than regular Thompson sampling, and we show through simulation that BTS is more robust to a misspecified error distribution.

Thompson Sampling
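A rough sketch of the bootstrap Thompson sampling (BTS) idea: keep several online-bootstrap replicates of each arm's mean-reward estimate, and to act, draw one replicate uniformly at random and play its best arm. The "double or nothing" weighting below is one common online bootstrap; the details of Eckles and Kaptein's scheme, priors, and replicate counts may differ from this illustration.

```python
import random

def bts(probs, horizon, replicates=20, seed=0):
    """Bootstrap Thompson sampling sketch for Bernoulli arms.  Randomness
    over replicates substitutes for sampling from a posterior: each reward
    updates every replicate with an independent "double or nothing" weight
    (0 or 2), so replicates disagree in a bootstrap-like way.  Illustrative
    only; not the authors' exact algorithm."""
    rng = random.Random(seed)
    k = len(probs)
    counts = [[1.0] * k for _ in range(replicates)]  # prior pseudo-count
    sums = [[0.5] * k for _ in range(replicates)]    # prior mean of 0.5
    total = 0.0
    for _ in range(horizon):
        j = rng.randrange(replicates)   # draw one replicate uniformly
        arm = max(range(k), key=lambda a: sums[j][a] / counts[j][a])
        r = 1.0 if rng.random() < probs[arm] else 0.0
        total += r
        for jj in range(replicates):    # online double-or-nothing update
            w = 2.0 if rng.random() < 0.5 else 0.0
            counts[jj][arm] += w
            sums[jj][arm] += w * r
    return total
```

Each update touches only a fixed number of running sums, which is why this style of bootstrap scales to streaming data where refitting a full posterior would not.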
