no code implementations • 15 Dec 2023 • Milad Kazemi, Mateo Perez, Fabio Somenzi, Sadegh Soudjani, Ashutosh Trivedi, Alvaro Velasquez
We present a modular approach to reinforcement learning (RL) in environments consisting of simpler components evolving in parallel.
no code implementations • 14 Dec 2023 • Ernst Moritz Hahn, Mateo Perez, Sven Schewe, Fabio Somenzi, Ashutosh Trivedi, Dominik Wojtczak
Regular decision processes (RDPs) are a subclass of non-Markovian decision processes where the transition and reward functions are guarded by some regular property of the past (a lookback).
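To make the "lookback" concrete, here is a minimal sketch (a toy example, not taken from the paper) of a reward guarded by a regular property of the past: a small hand-written DFA tracks the observation history, and reward is issued only while the DFA accepts.

```python
# Toy illustration of an RDP-style guarded reward (all names are invented):
# a DFA over observations remembers whether a 'b' has been seen after an 'a',
# and the reward for the current step fires only when the DFA accepts.

class LookbackGuard:
    """DFA accepting histories that contain a 'b' occurring after an 'a'."""
    def __init__(self):
        self.state = 0  # 0: nothing seen, 1: 'a' seen, 2: 'a' then 'b' seen

    def step(self, obs):
        if self.state == 0 and obs == "a":
            self.state = 1
        elif self.state == 1 and obs == "b":
            self.state = 2
        return self.state == 2  # accepting?

guard = LookbackGuard()
for obs in ["c", "a", "b", "c"]:
    accepted = guard.step(obs)
    reward = 1.0 if accepted else 0.0  # reward guarded by the regular lookback
    print(obs, reward)
```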
no code implementations • 18 Oct 2023 • Mateo Perez, Fabio Somenzi, Ashutosh Trivedi
Linear temporal logic (LTL) and omega-regular objectives -- a superset of LTL -- have seen recent use as a way to express non-Markovian objectives in reinforcement learning.
no code implementations • 14 Aug 2023 • Ernst Moritz Hahn, Mateo Perez, Sven Schewe, Fabio Somenzi, Ashutosh Trivedi, Dominik Wojtczak
Reinforcement learning (RL) is a powerful approach for training agents to perform tasks, but designing an appropriate reward mechanism is critical to its success.
no code implementations • 26 May 2023 • Rajeev Alur, Osbert Bastani, Kishor Jothimurugan, Mateo Perez, Fabio Somenzi, Ashutosh Trivedi
The difficulty of manually specifying reward functions has led to an interest in using linear temporal logic (LTL) to express objectives for reinforcement learning (RL).
no code implementations • 6 Aug 2022 • Abolfazl Lavaei, Mateo Perez, Milad Kazemi, Fabio Somenzi, Sadegh Soudjani, Ashutosh Trivedi, Majid Zamani
A key contribution is to leverage the convergence results for adversarial RL (minimax Q-learning) on finite stochastic arenas to provide control strategies maximizing the probability of satisfaction over the network of continuous-space systems.
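As a rough illustration of the learning component, the sketch below runs tabular minimax Q-learning on a tiny hand-made turn-based stochastic arena, where the controller maximises and the adversary minimises the discounted return. The arena, its dynamics, and all hyperparameters are invented for illustration and are not the paper's construction for networks of continuous-space systems.

```python
# Minimal minimax Q-learning sketch on a toy turn-based stochastic arena.
# States are owned either by the maximiser ("max") or the minimiser ("min");
# the Bellman backup values each successor state according to its owner.

import random

owner = {0: "max", 1: "min", 2: "max"}
actions = {0: ["stay", "go"], 1: ["block", "pass"], 2: ["stay"]}

def step(s, a):
    """Toy dynamics: the maximiser's 'go' succeeds with probability 0.9;
    state 2 is absorbing and entered with reward 1 if the adversary passes."""
    if s == 0:
        return (1, 0.0) if a == "go" and random.random() < 0.9 else (0, 0.0)
    if s == 1:
        return (2, 1.0) if a == "pass" else (0, 0.0)
    return (2, 0.0)

Q = {(s, a): 0.0 for s in owner for a in actions[s]}
gamma, alpha, eps = 0.95, 0.1, 0.2

def value(s):
    vals = [Q[(s, a)] for a in actions[s]]
    return max(vals) if owner[s] == "max" else min(vals)

for episode in range(2000):
    s = 0
    for _ in range(20):
        if random.random() < eps:
            a = random.choice(actions[s])
        else:
            pick = max if owner[s] == "max" else min
            a = pick(actions[s], key=lambda x: Q[(s, x)])
        s2, r = step(s, a)
        # minimax Bellman backup: the successor is valued by its owner
        Q[(s, a)] += alpha * (r + gamma * value(s2) - Q[(s, a)])
        s = s2

print({k: round(v, 2) for k, v in Q.items()})
```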
no code implementations • 23 Jun 2022 • Ernst Moritz Hahn, Mateo Perez, Sven Schewe, Fabio Somenzi, Ashutosh Trivedi, Dominik Wojtczak
Recursion is the fundamental paradigm for finitely describing potentially infinite objects.
no code implementations • 6 May 2022 • Ernst Moritz Hahn, Mateo Perez, Sven Schewe, Fabio Somenzi, Ashutosh Trivedi, Dominik Wojtczak
The surprising answer is that we have to pay significantly less when we instead expand the good-for-MDP property to alternating automata: like the nondeterministic GFM automata obtained from deterministic Rabin automata, the alternating good-for-MDP automata we produce from deterministic Streett automata are bi-linear in the size of the deterministic automaton and its index, and can therefore be exponentially more succinct than minimal nondeterministic Büchi automata.
no code implementations • 16 Jun 2021 • Ernst Moritz Hahn, Mateo Perez, Sven Schewe, Fabio Somenzi, Ashutosh Trivedi, Dominik Wojtczak
Reinforcement learning synthesizes controllers without prior knowledge of the system.
no code implementations • 12 Jun 2021 • Ernst Moritz Hahn, Mateo Perez, Sven Schewe, Fabio Somenzi, Ashutosh Trivedi, Dominik Wojtczak
We study reinforcement learning for the optimal control of Branching Markov Decision Processes (BMDPs), a natural extension of (multitype) Branching Markov Chains (BMCs).
no code implementations • 2 Mar 2020 • Abolfazl Lavaei, Fabio Somenzi, Sadegh Soudjani, Ashutosh Trivedi, Majid Zamani
A key contribution of the paper is to leverage the classical convergence results for reinforcement learning on finite MDPs and to provide control strategies that maximize the probability of satisfaction over unknown, continuous-space MDPs, together with probabilistic closeness guarantees.
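The following sketch illustrates the general idea under strong simplifying assumptions: standard tabular Q-learning on a toy finite abstraction of a one-dimensional system, where a reward of 1 on reaching the goal cell makes the learned value roughly approximate the probability of satisfying a reachability objective. The gridding, dynamics, and hyperparameters are invented for illustration and are not the paper's abstraction or guarantees.

```python
# Tabular Q-learning on a toy finite abstraction of a 1-D continuous system:
# the interval is gridded into N cells, the rightmost cell is the goal, and
# the undiscounted-ish (gamma close to 1) value approximates the probability
# of eventually reaching the goal from each cell.

import random

N = 10                      # grid cells abstracting the continuous interval
GOAL = N - 1                # rightmost cell models the goal region
ACTIONS = [-1, +1]          # move one cell left or right

def step(cell, a):
    """Noisy abstract dynamics: the intended move succeeds with prob. 0.8."""
    move = a if random.random() < 0.8 else -a
    nxt = min(max(cell + move, 0), N - 1)
    return nxt, (1.0 if nxt == GOAL else 0.0), nxt == GOAL

Q = [[0.0, 0.0] for _ in range(N)]
alpha, gamma, eps = 0.1, 0.99, 0.1

for episode in range(5000):
    s = random.randrange(N - 1)
    for _ in range(200):                      # cap episode length
        if random.random() < eps:
            i = random.randrange(2)
        else:
            i = max((0, 1), key=lambda j: Q[s][j])
        s2, r, done = step(s, ACTIONS[i])
        target = r if done else r + gamma * max(Q[s2])
        Q[s][i] += alpha * (target - Q[s][i])
        s = s2
        if done:
            break

print([round(max(q), 2) for q in Q])  # rough satisfaction probability per cell
```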