no code implementations • 10 Jul 2020 • Ronan Fruit, Matteo Pirotta, Alessandro Lazaric
We consider the problem of exploration-exploitation in communicating Markov Decision Processes.
no code implementations • 30 Jan 2020 • Jian Qian, Ronan Fruit, Matteo Pirotta, Alessandro Lazaric
We investigate concentration inequalities for Dirichlet and Multinomial random variables.
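One well-known inequality of this kind (not necessarily the one the paper refines) is the Weissman et al. L1 deviation bound for multinomial samples. A minimal sketch that checks it empirically, with illustrative parameters chosen here:

```python
import numpy as np

rng = np.random.default_rng(0)
p = np.array([0.5, 0.3, 0.2])      # true multinomial parameters (illustrative)
n, trials, eps = 500, 2000, 0.1

# Empirical frequency of large L1 deviations between the MLE p_hat and p.
deviations = 0
for _ in range(trials):
    counts = rng.multinomial(n, p)
    p_hat = counts / n
    if np.abs(p_hat - p).sum() >= eps:
        deviations += 1
empirical = deviations / trials

# Weissman et al. L1 bound: P(||p_hat - p||_1 >= eps) <= (2^k - 2) exp(-n eps^2 / 2)
k = len(p)
bound = (2 ** k - 2) * np.exp(-n * eps ** 2 / 2)
print(empirical, "<=", bound)
```

The empirical deviation frequency stays below the theoretical bound, though the bound is loose; tightening such inequalities is precisely what makes them useful for reinforcement-learning confidence sets.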
1 code implementation • NeurIPS 2019 • Jian Qian, Ronan Fruit, Matteo Pirotta, Alessandro Lazaric
The exploration bonus is an effective approach to manage the exploration-exploitation trade-off in Markov Decision Processes (MDPs).
no code implementations • NeurIPS 2019 • Ronald Ortner, Matteo Pirotta, Alessandro Lazaric, Ronan Fruit, Odalric-Ambrym Maillard
We consider the problem of online reinforcement learning when several state representations (mapping histories to a discrete state space) are available to the learning agent.
no code implementations • 11 Dec 2018 • Jian Qian, Ronan Fruit, Matteo Pirotta, Alessandro Lazaric
We introduce and analyse two algorithms for exploration-exploitation in discrete and continuous Markov Decision Processes (MDPs) based on exploration bonuses.
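The general idea of an exploration bonus can be sketched with a generic count-based bonus in tabular Q-learning on a toy chain MDP. This is an assumed simplification for illustration, not the algorithms analysed in the paper:

```python
import numpy as np

# Toy deterministic chain MDP: states 0..4, actions 0 (left) / 1 (right),
# reward 1 whenever the agent is at the last state.
n_states, n_actions, goal = 5, 2, 4

def step(s, a):
    s2 = min(s + 1, goal) if a == 1 else max(s - 1, 0)
    return s2, float(s2 == goal)

Q = np.zeros((n_states, n_actions))
N = np.ones((n_states, n_actions))   # visit counts (start at 1 to avoid div by 0)
gamma, alpha, beta = 0.95, 0.5, 1.0  # beta scales the exploration bonus

for episode in range(200):
    s = 0
    for _ in range(20):
        # Act greedily w.r.t. Q plus a count-based bonus beta / sqrt(N):
        # rarely tried actions look optimistic, which drives exploration.
        a = int(np.argmax(Q[s] + beta / np.sqrt(N[s])))
        s2, r = step(s, a)
        N[s, a] += 1
        Q[s, a] += alpha * (r + gamma * Q[s2].max() - Q[s, a])
        s = s2

print(np.argmax(Q, axis=1))  # learned greedy policy per state
```

After training, the greedy policy moves right toward the rewarding state; the bonus only shapes action selection, so the learned Q-values themselves remain estimates of the true returns.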
1 code implementation • NeurIPS 2018 • Ronan Fruit, Matteo Pirotta, Alessandro Lazaric
While designing the state space of an MDP, it is common to include states that are transient or not reachable by any policy (e.g., in mountain car, the product space of speed and position contains configurations that are not physically reachable).
1 code implementation • ICML 2018 • Ronan Fruit, Matteo Pirotta, Alessandro Lazaric, Ronald Ortner
We introduce SCAL, an algorithm designed to perform efficient exploration-exploitation in any unknown weakly-communicating Markov decision process (MDP) for which an upper bound $c$ on the span of the optimal bias function is known.
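The core use of the span bound can be sketched as value iteration with span truncation: after each Bellman backup, values are clipped so their span never exceeds $c$. This is an assumed simplification of the truncation idea, not SCAL itself (the MDP, rewards, and $c$ below are illustrative):

```python
import numpy as np

# Tiny 2-action MDP with known dynamics (illustrative numbers).
P = np.array([  # P[a, s, s'] transition probabilities
    [[0.9, 0.1, 0.0], [0.1, 0.8, 0.1], [0.0, 0.1, 0.9]],
    [[0.2, 0.8, 0.0], [0.0, 0.2, 0.8], [0.8, 0.0, 0.2]],
])
R = np.array([[0.0, 0.1, 1.0], [0.1, 0.2, 0.5]])  # R[a, s] expected rewards
c = 2.0  # assumed upper bound on the span of the optimal bias function

v = np.zeros(3)
for _ in range(200):
    v = np.max(R + P @ v, axis=0)  # average-reward Bellman backup
    v -= v.min()                   # normalize (backup is shift-invariant)
    v = np.minimum(v, c)           # truncate so that max(v) - min(v) <= c
print(v, v.max() - v.min())
```

Constraining the span keeps the optimism of the resulting policies in check, which is what lets the regret scale with $c$ rather than the MDP's diameter.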
no code implementations • NeurIPS 2017 • Ronan Fruit, Matteo Pirotta, Alessandro Lazaric, Emma Brunskill
The option framework integrates temporal abstraction into the reinforcement learning model through the introduction of macro-actions (i.e., options).
no code implementations • 25 Mar 2017 • Ronan Fruit, Alessandro Lazaric
While a large body of empirical results shows that temporally-extended actions and options may significantly affect the learning performance of an agent, the theoretical understanding of how and when options can be beneficial in online reinforcement learning is relatively limited.