no code implementations • 24 Jun 2018 • Aristide C. Y. Tossou, Christos Dimitrakakis
This compares favorably to the previous result for Thompson Sampling in the literature (Mishra & Thakurta, 2015), which adds a term of $\mathcal{O}(\frac{K \ln^3 T}{\epsilon^2})$ to the regret in order to achieve the same privacy level.
no code implementations • 16 Jan 2017 • Aristide C. Y. Tossou, Christos Dimitrakakis, Devdatt Dubhashi
We present a novel extension of Thompson Sampling for stochastic sequential decision problems with graph feedback, even when the graph structure itself is unknown and/or changing.
no code implementations • 16 Jan 2017 • Aristide C. Y. Tossou, Christos Dimitrakakis
This allows us to reach $\mathcal{O}(\sqrt{\ln T})$-DP, with a regret of $\mathcal{O}(T^{2/3})$ that holds against an adaptive adversary, an improvement over the best known bound of $\mathcal{O}(T^{3/4})$.
no code implementations • 14 Jul 2013 • Aristide C. Y. Tossou, Christos Dimitrakakis
To do so, we extend previous probabilistic approaches for inverse reinforcement learning in known MDPs to the case of unknown dynamics or opponents.