no code implementations • ICML 2020 • Yuxuan Xie, Jilles Dibangoye, Olivier Buffet
Optimally solving decentralized partially observable Markov decision processes under either full or no information sharing has received significant attention in recent years.
no code implementations • 17 Apr 2024 • Salomé Lepers, Sophie Lemonnier, Vincent Thomas, Olivier Buffet
This paper looks at predictability problems, i.e., problems wherein an agent must choose its strategy so as to optimize the predictions that an external observer could make.
no code implementations • 5 Feb 2024 • Johan Peralez, Aurélien Delage, Olivier Buffet, Jilles S. Dibangoye
A recent theory shows that a multi-player decentralized partially observable Markov decision process can be transformed into an equivalent single-player game, enabling the application of Bellman's principle of optimality to solve that game by breaking it down into single-stage subgames.
no code implementations • 19 May 2023 • Yang You, Vincent Thomas, Francis Colas, Olivier Buffet
Decentralized partially observable Markov decision processes (Dec-POMDPs) formalize the problem of designing individual controllers for a group of collaborative agents under stochastic dynamics and partial observability.
no code implementations • 27 Feb 2023 • Yang You, Vincent Thomas, Francis Colas, Rachid Alami, Olivier Buffet
Based on this, we propose two contributions: 1) an approach to automatically generate an uncertain human behavior (a policy) for each given objective function while accounting for possible robot behaviors; and 2) a robot planning algorithm that is robust to the above-mentioned uncertainties and relies on solving a partially observable Markov decision process (POMDP) obtained by reasoning on a distribution over human behaviors.
no code implementations • 26 Oct 2022 • Aurélien Delage, Olivier Buffet, Jilles S. Dibangoye, Abdallah Saffidine
State-of-the-art methods for solving 2-player zero-sum imperfect information games rely on linear programming or regret minimization, though not on dynamic programming (DP) or heuristic search (HS), while the latter are often at the core of state-of-the-art solvers for other sequential decision-making problems.
no code implementations • 25 Oct 2021 • Aurélien Delage, Olivier Buffet, Jilles Dibangoye
Dynamic programming and heuristic search are at the core of state-of-the-art solvers for sequential decision-making problems.
no code implementations • 17 Sep 2021 • Yang You, Vincent Thomas, Francis Colas, Olivier Buffet
This paper looks at solving collaborative planning problems formalized as Decentralized POMDPs (Dec-POMDPs) by searching for Nash equilibria, i.e., situations where each agent's policy is a best response to the other agents' (fixed) policies.
no code implementations • 21 Mar 2021 • Vincent Thomas, Gérémy Hutin, Olivier Buffet
In this article, we discuss how to solve information-gathering problems expressed as ρ-POMDPs, an extension of Partially Observable Markov Decision Processes (POMDPs) whose reward ρ depends on the belief state.
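As a toy illustration of a belief-dependent reward, the sketch below uses negative Shannon entropy as ρ(b), so that more concentrated (informative) beliefs earn a higher reward. The function name and belief representation are illustrative assumptions, not taken from the paper.

```python
import math

def neg_entropy_reward(belief):
    """Hypothetical belief-dependent reward rho(b): negative Shannon entropy.

    `belief` is a list of state probabilities summing to 1.
    Higher values correspond to more concentrated beliefs.
    """
    return sum(p * math.log(p) for p in belief if p > 0.0)

# Example: a uniform belief over 4 states vs. a near-certain belief.
uniform = [0.25, 0.25, 0.25, 0.25]
peaked = [0.97, 0.01, 0.01, 0.01]
```

Since ρ here rewards information rather than state outcomes, maximizing cumulative ρ drives the agent toward actions that reduce its own uncertainty.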
no code implementations • 29 Jun 2020 • Olivier Buffet, Jilles Dibangoye, Aurélien Delage, Abdallah Saffidine, Vincent Thomas
Many non-trivial sequential decision-making problems are efficiently solved by relying on Bellman's optimality principle, i.e., exploiting the fact that sub-problems are nested recursively within the original problem.
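Bellman's optimality principle is most familiar from value iteration on fully observable MDPs. The minimal sketch below (data layout and names are illustrative assumptions, not from the paper) solves a tiny MDP by repeatedly applying the Bellman backup until convergence:

```python
def value_iteration(states, actions, T, R, gamma=0.95, eps=1e-6):
    """Minimal value iteration exploiting Bellman's optimality principle.

    T[s][a] is a list of (prob, next_state) pairs; R[s][a] is an
    immediate reward. Returns the optimal state-value function.
    """
    V = {s: 0.0 for s in states}
    while True:
        delta = 0.0
        for s in states:
            # Bellman backup: best one-step reward plus discounted future value.
            best = max(R[s][a] + gamma * sum(p * V[s2] for p, s2 in T[s][a])
                       for a in actions)
            delta = max(delta, abs(best - V[s]))
            V[s] = best
        if delta < eps:
            return V

# Illustrative two-state MDP: 'go' moves s0 -> s1 for reward 1; s1 is absorbing.
states = ['s0', 's1']
actions = ['stay', 'go']
T = {'s0': {'stay': [(1.0, 's0')], 'go': [(1.0, 's1')]},
     's1': {'stay': [(1.0, 's1')], 'go': [(1.0, 's1')]}}
R = {'s0': {'stay': 0.0, 'go': 1.0},
     's1': {'stay': 0.0, 'go': 0.0}}
V = value_iteration(states, actions, T, R, gamma=0.5)
```

The recursion works precisely because the optimal value of a state depends only on the optimal values of its successor states, which is the nesting of sub-problems the abstract refers to.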
no code implementations • 29 May 2020 • Olivier Buffet, Olivier Pietquin, Paul Weng
Reinforcement learning (RL) is a general framework for adaptive control, which has proven to be efficient in many domains, e.g., board games, video games or autonomous vehicles.
no code implementations • NeurIPS 2018 • Mathieu Fehr, Olivier Buffet, Vincent Thomas, Jilles Dibangoye
In this paper, we focus on POMDPs and ρ-POMDPs with λ_ρ-Lipschitz reward functions, and demonstrate that, for finite horizons, the optimal value function is Lipschitz-continuous.
no code implementations • ICML 2018 • Jilles Dibangoye, Olivier Buffet
We address a long-standing open problem of reinforcement learning in decentralized partially observable Markov decision processes.
no code implementations • 31 Jul 2013 • Carlos Sarraute, Olivier Buffet, Joerg Hoffmann
Penetration Testing is a methodology for assessing network security, by generating and executing possible hacking attacks.
no code implementations • 30 Jul 2013 • Carlos Sarraute, Olivier Buffet, Joerg Hoffmann
Penetration Testing is a methodology for assessing network security, by generating and executing possible hacking attacks.
no code implementations • 19 Jun 2013 • Carlos Sarraute, Olivier Buffet, Joerg Hoffmann
Penetration Testing is a methodology for assessing network security, by generating and executing possible attacks.
no code implementations • NeurIPS 2010 • Mauricio Araya, Olivier Buffet, Vincent Thomas, François Charpillet
Partially Observable Markov Decision Processes (POMDPs) model sequential decision-making problems under uncertainty and partial observability.
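In a POMDP the agent never observes the state directly; instead it maintains a belief, updated by Bayes' rule after each action and observation. A minimal sketch of that update follows; the nested-dict layout for the transition and observation models is an assumption made for the example, not an API from the paper.

```python
def belief_update(belief, action, observation, T, O):
    """Bayesian belief update: b'(s') ~ O(o | s', a) * sum_s T(s' | s, a) b(s).

    `belief` maps states to probabilities; T[a][s][s2] and O[a][s2][o]
    are nested dicts of probabilities (illustrative layout).
    """
    new_b = {}
    successors = {s2 for s in belief for s2 in T[action][s]}
    for s2 in successors:
        # Predicted probability of reaching s2, then weighted by the
        # likelihood of the received observation.
        pred = sum(belief[s] * T[action][s].get(s2, 0.0) for s in belief)
        new_b[s2] = O[action][s2].get(observation, 0.0) * pred
    norm = sum(new_b.values())
    if norm == 0.0:
        raise ValueError("observation has zero probability under this belief")
    return {s2: p / norm for s2, p in new_b.items()}

# Illustrative 2-state example: listening leaves the state unchanged,
# and the observation is correct with probability 0.85.
T = {'listen': {'L': {'L': 1.0}, 'R': {'R': 1.0}}}
O = {'listen': {'L': {'hearL': 0.85, 'hearR': 0.15},
                'R': {'hearL': 0.15, 'hearR': 0.85}}}
b1 = belief_update({'L': 0.5, 'R': 0.5}, 'listen', 'hearL', T, O)
```

Starting from a uniform belief, one correct-leaning observation shifts the belief to 0.85 on the matching state, which is exactly the sufficient statistic POMDP planners reason over.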