no code implementations • 4 Feb 2025 • Victor Villin, Thomas Kleine Buening, Christos Dimitrakakis
We propose a minimax-Bayes approach to Ad Hoc Teamwork (AHT) that optimizes policies against an adversarial prior over partners, explicitly accounting for uncertainty about partners at time of deployment.
1 code implementation • 6 Jan 2025 • Andreas Athanasopoulos, Anne-Marie George, Christos Dimitrakakis
We consider a learning problem for the stable marriage model under unknown preferences for the left side of the market.
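As background for the stable marriage setting, here is a minimal sketch of the classical Gale-Shapley deferred-acceptance algorithm, which assumes preferences are fully known (the paper's contribution is precisely the case where the left side's preferences must be learned; all names below are illustrative):

```python
def gale_shapley(proposer_prefs, reviewer_prefs):
    """Deferred acceptance: proposers propose in preference order;
    each reviewer tentatively holds the best offer received so far."""
    # rank[r][p] = position of proposer p in reviewer r's list (lower is better)
    rank = {r: {p: i for i, p in enumerate(prefs)}
            for r, prefs in reviewer_prefs.items()}
    free = list(proposer_prefs)          # proposers not yet matched
    next_choice = {p: 0 for p in free}   # index into each proposer's list
    match = {}                           # reviewer -> proposer
    while free:
        p = free.pop()
        r = proposer_prefs[p][next_choice[p]]
        next_choice[p] += 1
        if r not in match:
            match[r] = p
        elif rank[r][p] < rank[r][match[r]]:
            free.append(match[r])        # displaced proposer re-enters
            match[r] = p
        else:
            free.append(p)               # rejected; tries next reviewer
    return {p: r for r, p in match.items()}
```

With unknown left-side preferences, a learner would have to estimate the `proposer_prefs` lists from feedback before (or while) running such a matching procedure.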
no code implementations • 30 Dec 2024 • Emilio Jorge, Christos Dimitrakakis, Debabrota Basu
First, we show that Posterior Sampling-based RL (PSRL) yields sublinear regret if the data distributions satisfy LSI under some mild additional assumptions.
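For context, a minimal sketch of the generic PSRL loop on a tabular MDP with known rewards and Dirichlet posteriors over transitions (an illustrative textbook version, not the paper's analysis):

```python
import numpy as np

def psrl(true_P, R, n_episodes=200, horizon=10, gamma=0.95, seed=0):
    """Posterior Sampling for RL: each episode, sample an MDP from the
    posterior, act optimally in the sample, and update transition counts.
    R[s, a] is known; transitions have a Dirichlet(1, ..., 1) prior."""
    rng = np.random.default_rng(seed)
    S, A = R.shape
    counts = np.ones((S, A, S))          # Dirichlet posterior parameters
    s = 0
    for _ in range(n_episodes):
        # 1. sample a plausible transition model from the posterior
        P = np.array([[rng.dirichlet(counts[si, a]) for a in range(A)]
                      for si in range(S)])
        # 2. solve the sampled MDP with value iteration
        V = np.zeros(S)
        for _ in range(100):
            V = (R + gamma * P @ V).max(axis=1)
        pi = (R + gamma * P @ V).argmax(axis=1)
        # 3. act with the sampled MDP's optimal policy; update the posterior
        for _ in range(horizon):
            a = pi[s]
            s2 = rng.choice(S, p=true_P[s, a])
            counts[s, a, s2] += 1
            s = s2
    return counts
```

The paper's regret analysis concerns when this style of algorithm provably achieves sublinear regret under distributional assumptions such as LSI.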
no code implementations • 1 Jun 2024 • Thomas Kleine Buening, Aadirupa Saha, Christos Dimitrakakis, Haifeng Xu
Motivated by the phenomenon of strategic agents gaming a recommender system to maximize the number of times they are recommended to users, we study a strategic variant of the linear contextual bandit problem, where the arms can strategically misreport privately observed contexts to the learner.
1 code implementation • 18 Dec 2023 • Anne-Marie George, Christos Dimitrakakis
Furthermore, if all agents' preferences are strict rankings over the alternatives, we provide means to prune confidence intervals and thereby guide a more efficient elicitation.
no code implementations • 27 Nov 2023 • Thomas Kleine Buening, Aadirupa Saha, Christos Dimitrakakis, Haifeng Xu
We study a strategic variant of the multi-armed bandit problem, which we coin the strategic click-bandit.
1 code implementation • 21 Feb 2023 • Thomas Kleine Buening, Christos Dimitrakakis, Hannes Eriksson, Divya Grover, Emilio Jorge
While the Bayesian decision-theoretic framework offers an elegant solution to the problem of decision making under uncertainty, one question is how to appropriately select the prior distribution.
no code implementations • 18 Feb 2023 • Hannes Eriksson, Debabrota Basu, Tommy Tram, Mina Alibeigi, Christos Dimitrakakis
Then, we propose a generic two-stage algorithm, MLEMTRL, to address the MTRL problem in discrete and continuous settings.
1 code implementation • 26 Oct 2022 • Thomas Kleine Buening, Victor Villin, Christos Dimitrakakis
Even with abundant data, current inverse reinforcement learning methods that focus on learning from a single environment can fail to handle slight changes in the environment dynamics.
no code implementations • 18 Mar 2022 • Hannes Eriksson, Debabrota Basu, Mina Alibeigi, Christos Dimitrakakis
In existing literature, the risk in stochastic games has been studied in terms of the inherent uncertainty evoked by the variability of transitions and actions.
no code implementations • 8 Nov 2021 • Thomas Kleine Buening, Anne-Marie George, Christos Dimitrakakis
How should the first agent act in order to learn the joint reward function as quickly as possible, while keeping the joint policy as close to optimal as possible?
no code implementations • 23 Apr 2021 • Hannes Eriksson, Christos Dimitrakakis, Lars Carlsson
We study the problem of performing automated experiment design for drug screening through Bayesian inference and optimisation.
1 code implementation • 15 Apr 2021 • Divya Grover, Christos Dimitrakakis
We instead propose an adaptive belief discretization scheme, and give its associated planning error.
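To make the belief-discretization idea concrete, here is a hedged sketch of the two ingredients: the exact POMDP belief update, and a simple projection of a belief onto a fixed-resolution simplex grid (a uniform grid for illustration only; the paper's scheme chooses the resolution adaptively):

```python
import numpy as np

def belief_update(b, a, o, T, O):
    """Exact POMDP belief update: b'(s') ∝ O[a, s', o] * sum_s T[a, s, s'] b(s)."""
    pred = b @ T[a]                      # predicted next-state distribution
    post = O[a, :, o] * pred             # weight by observation likelihood
    return post / post.sum()

def discretize(b, k):
    """Project a belief onto the resolution-k simplex grid (illustrative):
    round each coordinate to a multiple of 1/k while preserving total mass."""
    scaled = np.floor(b * k).astype(int)
    rem = k - scaled.sum()               # mass still to distribute
    frac = b * k - scaled
    scaled[np.argsort(-frac)[:rem]] += 1 # give it to largest fractional parts
    return scaled / k
```

Planning on the discretized beliefs trades accuracy for tractability; the associated planning error is what the paper quantifies.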
no code implementations • 23 Feb 2021 • Thomas Kleine Buening, Meirav Segal, Debabrota Basu, Christos Dimitrakakis, Anne-Marie George
Typically, merit is defined with respect to some intrinsic measure of worth.
no code implementations • 22 Feb 2021 • Hannes Eriksson, Debabrota Basu, Mina Alibeigi, Christos Dimitrakakis
In this paper, we consider risk-sensitive sequential decision-making in Reinforcement Learning (RL).
no code implementations • NeurIPS Workshop ICBINB 2020 • Hannes Eriksson, Emilio Jorge, Christos Dimitrakakis, Debabrota Basu, Divya Grover
Bayesian reinforcement learning (BRL) offers a decision-theoretic solution for reinforcement learning.
no code implementations • 20 Jun 2019 • Aristide Tossou, Christos Dimitrakakis, Debabrota Basu
We derive the first polynomial-time Bayesian algorithm, BUCRL, that achieves, up to logarithmic factors, a regret (i.e. the difference between the accumulated rewards of the optimal policy and our algorithm) of the optimal order $\tilde{\mathcal{O}}(\sqrt{DSAT})$.
no code implementations • 14 Jun 2019 • Hannes Eriksson, Christos Dimitrakakis
The risk-averse behavior is then compared with the behavior of the optimal risk-neutral policy in environments with epistemic risk.
no code implementations • 4 Jun 2019 • Aristide Tossou, Christos Dimitrakakis, Jaroslaw Rzepecki, Katja Hofmann
We study two-player general sum repeated finite games where the rewards of each player are generated from an unknown distribution.
no code implementations • 29 May 2019 • Debabrota Basu, Christos Dimitrakakis, Aristide Tossou
We derive and contrast lower bounds on the regret of bandit algorithms satisfying these definitions.
no code implementations • 27 May 2019 • Aristide Tossou, Debabrota Basu, Christos Dimitrakakis
We study model-based reinforcement learning in an unknown finite communicating Markov decision process.
no code implementations • 6 Apr 2019 • Nikolaos Tziortziotis, Christos Dimitrakakis, Michalis Vazirgiannis
We introduce Bayesian least-squares policy iteration (BLSPI), an off-policy, model-free, policy iteration algorithm that uses the Bayesian least-squares temporal-difference (BLSTD) learning algorithm to evaluate policies.
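As a rough illustration of the Bayesian flavor of value estimation, here is a simplified cousin of BLSTD: Bayesian linear regression on the Bellman residual $r \approx (\phi - \gamma\phi')^\top w$, yielding a Gaussian posterior over value-function weights. This is a hedged sketch under that simplification, not the authors' exact BLSTD estimator:

```python
import numpy as np

def bayes_bellman_residual(phi, phi_next, r, gamma=0.95,
                           prior_prec=1.0, noise_prec=1.0):
    """Gaussian posterior over value weights w from Bellman-residual
    regression: model r ~ N((phi - gamma*phi_next) @ w, 1/noise_prec)
    with prior w ~ N(0, (1/prior_prec) I)."""
    X = phi - gamma * phi_next           # residual "design matrix"
    cov = np.linalg.inv(prior_prec * np.eye(X.shape[1])
                        + noise_prec * X.T @ X)
    mean = noise_prec * cov @ X.T @ r
    return mean, cov
```

A policy-iteration scheme in this spirit would evaluate each policy by such a posterior and exploit the uncertainty (the covariance) for exploration.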
1 code implementation • 7 Feb 2019 • Divya Grover, Debabrota Basu, Christos Dimitrakakis
We address the problem of Bayesian reinforcement learning using efficient model-based online planning.
no code implementations • 24 Jun 2018 • Aristide C. Y. Tossou, Christos Dimitrakakis
This compares favorably to the previous result for Thompson Sampling in the literature (Mishra & Thakurta, 2015), which adds a term of $\mathcal{O}(\frac{K \ln^3 T}{\epsilon^2})$ to the regret in order to achieve the same privacy level.
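The private algorithms discussed here build on standard Thompson Sampling; for reference, a minimal non-private Bernoulli version with Beta(1, 1) priors (the baseline that differentially private variants then perturb):

```python
import numpy as np

def thompson_bernoulli(means, T=2000, seed=1):
    """Standard Thompson Sampling for Bernoulli bandits: sample a mean
    estimate per arm from its Beta posterior, play the argmax, update."""
    rng = np.random.default_rng(seed)
    k = len(means)
    a, b = np.ones(k), np.ones(k)        # Beta posterior parameters
    pulls = np.zeros(k, dtype=int)
    for _ in range(T):
        arm = rng.beta(a, b).argmax()    # posterior sampling step
        reward = rng.random() < means[arm]
        a[arm] += reward
        b[arm] += 1 - reward
        pulls[arm] += 1
    return pulls
```

A private variant would noise or batch the posterior updates, which is where the extra regret terms compared above come from.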
no code implementations • NeurIPS 2017 • Christos Dimitrakakis, David C. Parkes, Goran Radanovic, Paul Tylkin
We consider a two-player sequential game in which agents have the same reward function but may disagree on the transition probabilities of an underlying Markovian model of the world.
no code implementations • 30 Jul 2017 • Philip Ekman, Sebastian Bellevik, Christos Dimitrakakis, Aristide Tossou
One specific such problem involves matching a set of workers to a set of tasks.
no code implementations • 6 Jul 2017 • Yang Liu, Goran Radanovic, Christos Dimitrakakis, Debmalya Mandal, David C. Parkes
In addition, we define the "fairness regret", which corresponds to the degree to which an algorithm is not calibrated, where perfect calibration requires that the probability of selecting an arm is equal to the probability with which the arm has the best quality realization.
no code implementations • 31 May 2017 • Christos Dimitrakakis, Yang Liu, David Parkes, Goran Radanovic
We consider the problem of how decision making can be fair when the underlying probabilistic model of the world is not known with certainty.
no code implementations • 16 Jan 2017 • Aristide C. Y. Tossou, Christos Dimitrakakis
This allows us to reach $\mathcal{O}(\sqrt{\ln T})$-DP, with a regret of $\mathcal{O}(T^{2/3})$ that holds against an adaptive adversary, an improvement over the previous best known bound of $\mathcal{O}(T^{3/4})$.
no code implementations • 16 Jan 2017 • Aristide C. Y. Tossou, Christos Dimitrakakis, Devdatt Dubhashi
We present a novel extension of Thompson Sampling for stochastic sequential decision problems with graph feedback, even when the graph structure itself is unknown and/or changing.
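A hedged sketch of the graph-feedback idea with a known, fixed graph (the simplest case; the paper's contribution is handling unknown and changing graphs): playing an arm also reveals the rewards of its neighbors, so their posteriors update for free.

```python
import numpy as np

def ts_graph_feedback(means, adj, T=1000, seed=0):
    """Thompson Sampling with graph feedback: playing arm i reveals the
    reward of i and of every j with adj[i][j] == 1; all revealed arms'
    Beta posteriors are updated. Known fixed graph, Bernoulli rewards."""
    rng = np.random.default_rng(seed)
    k = len(means)
    a, b = np.ones(k), np.ones(k)
    for _ in range(T):
        arm = rng.beta(a, b).argmax()
        for j in range(k):
            if j == arm or adj[arm][j]:          # observed arms this round
                rwd = rng.random() < means[j]
                a[j] += rwd
                b[j] += 1 - rwd
    return a, b
```

The denser the feedback graph, the faster all posteriors concentrate, which is what drives the improved regret in this setting.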
no code implementations • 22 Dec 2015 • Zuhe Zhang, Benjamin Rubinstein, Christos Dimitrakakis
We study how to communicate findings of Bayesian inference to third parties, while preserving the strong guarantee of differential privacy.
no code implementations • 27 Nov 2015 • Aristide Tossou, Christos Dimitrakakis
This is a significant improvement over previous results, which only achieve poly-logarithmic regret of $O(\epsilon^{-2} \log^{2} T)$, and it is due to our use of a novel interval-based mechanism.
no code implementations • 10 Dec 2014 • Emmanouil G. Androulakis, Christos Dimitrakakis
Bayesian methods suffer from the problem of how to specify prior beliefs.
no code implementations • 9 Aug 2014 • Aristide Tossou, Christos Dimitrakakis
To do so, we extend previous probabilistic approaches for inverse reinforcement learning in known MDPs to the case of unknown dynamics or opponents.
no code implementations • 14 Jul 2013 • Aristide C. Y. Tossou, Christos Dimitrakakis
To do so, we extend previous probabilistic approaches for inverse reinforcement learning in known MDPs to the case of unknown dynamics or opponents.
no code implementations • 5 Jun 2013 • Christos Dimitrakakis, Blaine Nelson, Zuhe Zhang, Aikaterini Mitrokotsa, Benjamin Rubinstein
All our general results hold for arbitrary database metrics, including those for the common definition of differential privacy.
no code implementations • 8 May 2013 • Nikolaos Tziortziotis, Christos Dimitrakakis, Konstantinos Blekas
This paper proposes an online tree-based Bayesian approach for reinforcement learning.
no code implementations • 27 Mar 2013 • Christos Dimitrakakis, Nikolaos Tziortziotis
This paper introduces a simple, general framework for likelihood-free Bayesian reinforcement learning, through Approximate Bayesian Computation (ABC).
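The core of the ABC approach can be sketched as rejection sampling: draw environment parameters from the prior, simulate trajectories, and keep parameters whose summary statistic lands close to the observed one. A minimal, hedged sketch with mean return as the summary statistic (the choice of statistic and threshold is the user's, and this is not the paper's specific instantiation):

```python
import numpy as np

def abc_posterior(observed_returns, simulate, prior_sample,
                  n=2000, eps=0.05, seed=0):
    """ABC rejection sampling: accept a prior draw theta whenever the mean
    return of simulate(theta) is within eps of the observed mean return.
    `simulate` and `prior_sample` are user-supplied callables."""
    rng = np.random.default_rng(seed)
    target = np.mean(observed_returns)
    accepted = []
    for _ in range(n):
        theta = prior_sample(rng)
        if abs(np.mean(simulate(theta, rng)) - target) <= eps:
            accepted.append(theta)       # theta is plausible given the data
    return np.array(accepted)
```

This only requires the ability to simulate the environment, not to evaluate its likelihood, which is what makes the framework likelihood-free.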
no code implementations • 4 Mar 2013 • Florent Garcin, Christos Dimitrakakis, Boi Faltings
The profusion of online news articles makes it difficult to find interesting articles, a problem that can be assuaged by using a recommender system to bring the most relevant news stories to readers.