no code implementations • 7 Apr 2024 • Zetong Xuan, Alper Kamil Bozkurt, Miroslav Pajic, Yu Wang
In a widely adopted surrogate reward approach, two discount factors are used to ensure that the expected return approximates the satisfaction probability of the LTL objective.
no code implementations • 26 Mar 2021 • Alper Kamil Bozkurt, Yu Wang, Miroslav Pajic
We study the problem of learning safe control policies that are also effective, i.e., that maximize both the probability of satisfying a linear temporal logic (LTL) specification of a task and the discounted reward capturing the (classic) control performance.
no code implementations • 8 Feb 2021 • Alper Kamil Bozkurt, Yu Wang, Michael M. Zavlanos, Miroslav Pajic
By deriving distinct rewards and discount factors from the acceptance condition of the DPA, we reduce the maximization of the worst-case probability of satisfying the LTL specification to the maximization of a discounted reward objective in the product game; this enables the use of model-free RL algorithms to learn an optimal controller strategy.
2 code implementations • 16 Sep 2019 • Alper Kamil Bozkurt, Yu Wang, Michael M. Zavlanos, Miroslav Pajic
We present a reinforcement learning (RL) framework to synthesize a control policy from a given linear temporal logic (LTL) specification in an unknown stochastic environment that can be modeled as a Markov Decision Process (MDP).
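As a rough illustration of the two-discount-factor surrogate reward mentioned in the entries above, the sketch below computes the return of a finite path prefix. It assumes one common formulation that is not spelled out in this listing: accepting states yield a reward of `1 - gamma_b` and are discounted by `gamma_b`, while all other states yield zero reward and are discounted by `gamma`. The function name `surrogate_return` and the default values are hypothetical, chosen only for the example.

```python
def surrogate_return(accepting_flags, gamma=0.99, gamma_b=0.9):
    """Return of a finite path prefix under a two-discount-factor scheme.

    accepting_flags[t] is True when the state visited at step t is accepting.
    Assumed scheme (not taken from the listing): accepting states give reward
    1 - gamma_b and discount gamma_b; other states give reward 0 and discount gamma.
    """
    total = 0.0
    discount = 1.0  # product of the discount factors of all earlier states
    for is_accepting in accepting_flags:
        if is_accepting:
            total += discount * (1.0 - gamma_b)
            discount *= gamma_b
        else:
            discount *= gamma
    return total


# A path that stays in accepting states accumulates a return close to 1,
# while a path that never visits one accumulates exactly 0.
always_accepting = surrogate_return([True] * 200)
never_accepting = surrogate_return([False] * 200)
```

Under this assumed scheme, the return of a path that visits accepting states forever tends to 1, and a path that eventually stops visiting them earns a return that shrinks as `gamma` approaches 1, which is the intuition behind using the expected return as a proxy for the satisfaction probability.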