no code implementations • 20 Jun 2019 • Aristide Tossou, Christos Dimitrakakis, Debabrota Basu
We derive the first polynomial-time Bayesian algorithm, BUCRL, that achieves, up to logarithmic factors, a regret (i.e., the difference between the accumulated rewards of the optimal policy and our algorithm) of the optimal order $\tilde{\mathcal{O}}(\sqrt{DSAT})$.
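The scaling of the stated bound can be sketched numerically. Here $D$ is the MDP diameter, $S$ the number of states, $A$ the number of actions, and $T$ the horizon; the leading constant `c` and the parameter values below are illustrative assumptions, and logarithmic factors are dropped.

```python
import math

def bucrl_regret_bound(D, S, A, T, c=1.0):
    """Evaluate the sqrt(D*S*A*T) rate from the paper's bound.

    c is an unspecified leading constant (an assumption here);
    logarithmic factors hidden by the tilde are omitted.
    """
    return c * math.sqrt(D * S * A * T)

# The bound is sublinear in T, so per-step regret vanishes as T grows:
b_small = bucrl_regret_bound(D=4, S=10, A=2, T=10_000)
b_large = bucrl_regret_bound(D=4, S=10, A=2, T=1_000_000)
```

Because the rate is $\sqrt{T}$, the average regret per step, `b / T`, shrinks toward zero as the horizon increases.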
no code implementations • 4 Jun 2019 • Aristide Tossou, Christos Dimitrakakis, Jaroslaw Rzepecki, Katja Hofmann
We study two-player general sum repeated finite games where the rewards of each player are generated from an unknown distribution.
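The setting can be illustrated with a naive baseline: each player treats its own actions as bandit arms and runs epsilon-greedy on empirical mean rewards, ignoring the opponent's strategy. The 2x2 payoff matrices and all parameters below are illustrative assumptions, not the paper's game or algorithm.

```python
import random

def eps_greedy_repeated_game(T=5000, eps=0.1, seed=0):
    """Naive baseline for a repeated 2x2 game with stochastic rewards.

    Each player plays epsilon-greedy over its own two actions using
    empirical means; the mean-reward matrices are hypothetical.
    """
    rng = random.Random(seed)
    mean1 = [[0.7, 0.2], [0.4, 0.5]]  # row player's mean reward for (a1, a2)
    mean2 = [[0.3, 0.6], [0.8, 0.1]]  # column player's mean reward
    sums = [[0.0, 0.0], [0.0, 0.0]]   # running reward sums per player/action
    counts = [[1, 1], [1, 1]]         # start at 1 to avoid division by zero
    for _ in range(T):
        acts = []
        for p in range(2):
            if rng.random() < eps:
                acts.append(rng.randrange(2))
            else:
                m = [sums[p][a] / counts[p][a] for a in range(2)]
                acts.append(0 if m[0] >= m[1] else 1)
        a1, a2 = acts
        # Bernoulli rewards drawn from means unknown to the players
        r = [float(rng.random() < mean1[a1][a2]),
             float(rng.random() < mean2[a1][a2])]
        for p in range(2):
            sums[p][acts[p]] += r[p]
            counts[p][acts[p]] += 1
    return counts  # play counts per player and action
```

This baseline ignores the strategic coupling between the players, which is exactly the difficulty the general-sum repeated-game setting introduces.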
no code implementations • 29 May 2019 • Debabrota Basu, Christos Dimitrakakis, Aristide Tossou
We derive and contrast lower bounds on the regret of bandit algorithms satisfying these definitions.
no code implementations • 27 May 2019 • Aristide Tossou, Debabrota Basu, Christos Dimitrakakis
We study model-based reinforcement learning in an unknown finite communicating Markov decision process.
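The model-based recipe — estimate the transition model from experience, then plan on the learned model — can be sketched on a toy MDP. The 2-state, 2-action dynamics below are illustrative assumptions, and this sketch omits the optimism/posterior-sampling machinery that the paper's algorithm relies on.

```python
import random

def estimate_and_plan(T=20_000, gamma=0.9, seed=0):
    """Model-based sketch on a hypothetical 2-state, 2-action MDP.

    Collect transitions under a uniformly random policy, build an
    empirical model, then run value iteration on it. Rewards are
    observed noiselessly for simplicity.
    """
    rng = random.Random(seed)
    S, A = 2, 2
    P = [[0.9, 0.2], [0.3, 0.7]]  # true prob. of moving to state 1
    R = [[0.0, 0.1], [1.0, 0.5]]  # true mean rewards
    n1 = [[0] * A for _ in range(S)]    # counts of transitions into state 1
    nvis = [[0] * A for _ in range(S)]  # visit counts per (state, action)
    rsum = [[0.0] * A for _ in range(S)]
    s = 0
    for _ in range(T):
        a = rng.randrange(A)
        s2 = 1 if rng.random() < P[s][a] else 0
        n1[s][a] += s2
        rsum[s][a] += R[s][a]
        nvis[s][a] += 1
        s = s2
    # Empirical model from counts
    phat = [[n1[s][a] / max(nvis[s][a], 1) for a in range(A)] for s in range(S)]
    rhat = [[rsum[s][a] / max(nvis[s][a], 1) for a in range(A)] for s in range(S)]
    # Value iteration on the learned model
    V = [0.0, 0.0]
    for _ in range(500):
        V = [max(rhat[s][a] + gamma * (phat[s][a] * V[1]
                                       + (1 - phat[s][a]) * V[0])
                 for a in range(A)) for s in range(S)]
    return V, phat
```

In a communicating MDP every state is reachable from every other under some policy, which is what makes uniform exploration like this eventually cover the whole model.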
no code implementations • 30 Jul 2017 • Philip Ekman, Sebastian Bellevik, Christos Dimitrakakis, Aristide Tossou
One such problem involves matching a set of workers to a set of tasks.
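Worker-task matching is an instance of the assignment problem. A minimal exhaustive solver is sketched below; the cost matrix in the usage note is an illustrative assumption, and a real system would use the Hungarian algorithm ($O(n^3)$) rather than $O(n!)$ enumeration.

```python
from itertools import permutations

def min_cost_matching(cost):
    """Exhaustive minimum-cost matching of n workers to n tasks.

    cost[i][j] is worker i's cost on task j. Only suitable for small n;
    shown here purely to make the matching objective concrete.
    """
    n = len(cost)
    best_perm, best_cost = None, float("inf")
    for perm in permutations(range(n)):
        c = sum(cost[i][perm[i]] for i in range(n))
        if c < best_cost:
            best_cost, best_perm = c, perm
    return best_perm, best_cost
```

For example, `min_cost_matching([[4, 1, 3], [2, 0, 5], [3, 2, 2]])` returns `((1, 0, 2), 5)`: worker 0 takes task 1, worker 1 takes task 0, worker 2 takes task 2, at total cost 5.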
no code implementations • 27 Nov 2015 • Aristide Tossou, Christos Dimitrakakis
This is a significant improvement over previous results, which only achieve poly-log regret $O(\epsilon^{-2} \log^{2} T)$, because of our use of a novel interval-based mechanism.
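The interval idea can be sketched schematically: instead of adding privacy noise to a released statistic at every step, release it only at doubling interval boundaries, so Laplace noise is injected $O(\log T)$ times rather than $T$ times. This is an illustrative sketch of that general principle, not the paper's mechanism or its privacy accounting.

```python
import math
import random

def doubling_intervals(T):
    """Boundaries 1, 2, 4, ... <= T at which a statistic is released."""
    points, p = [], 1
    while p <= T:
        points.append(p)
        p *= 2
    return points

def laplace(scale, rng):
    """Sample Laplace(0, scale) noise via the inverse CDF."""
    u = rng.random() - 0.5
    return -scale * math.copysign(1.0, u) * math.log(1.0 - 2.0 * abs(u))

def private_running_mean(rewards, eps=1.0, seed=0):
    """Release a Laplace-noised running mean only at interval boundaries.

    The noise scale 1/eps and the per-release accounting are simplified
    assumptions; a real DP analysis must track sensitivity and composition.
    """
    rng = random.Random(seed)
    releases, total = [], 0.0
    boundaries = set(doubling_intervals(len(rewards)))
    for t, r in enumerate(rewards, start=1):
        total += r
        if t in boundaries:
            releases.append((total + laplace(1.0 / eps, rng)) / t)
    return releases
```

Fewer releases means less total noise has to be absorbed, which is the intuition behind interval-based mechanisms beating per-step noising.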
no code implementations • 9 Aug 2014 • Aristide Tossou, Christos Dimitrakakis
To do so, we extend previous probabilistic approaches for inverse reinforcement learning in known MDPs to the case of unknown dynamics or opponents.
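A common probabilistic model in this line of work scores demonstrations with a Boltzmann (softmax) policy, $P(a \mid s) \propto \exp(\beta\, Q(s, a))$. The sketch below computes that likelihood for candidate Q-values in a known model; the Q-values, demonstrations, and temperature are illustrative assumptions, and the paper's extension to unknown dynamics or opponents is not covered here.

```python
import math

def boltzmann_policy(q_values, beta=2.0):
    """Softmax action distribution P(a|s) proportional to exp(beta * Q(s, a))."""
    mx = max(q_values)  # subtract the max for numerical stability
    exps = [math.exp(beta * (q - mx)) for q in q_values]
    z = sum(exps)
    return [e / z for e in exps]

def demo_log_likelihood(demos, Q, beta=2.0):
    """Log-likelihood of demonstrated (state, action) pairs.

    Q[s][a] would come from solving the MDP under a candidate reward;
    here it is supplied directly as a hypothetical input.
    """
    ll = 0.0
    for s, a in demos:
        ll += math.log(boltzmann_policy(Q[s], beta)[a])
    return ll
```

Inverse RL then searches for reward parameters whose induced Q-values make the observed demonstrations most likely.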