no code implementations • 3 Apr 2024 • Siyi Wang, Zifan Wang, Xinlei Yi, Michael M. Zavlanos, Karl H. Johansson, Sandra Hirche
Considering non-stationary environments in online optimization enables a decision-maker to adapt effectively to changes and improve its performance over time.
no code implementations • 5 Feb 2024 • Hans Riess, Manolis Veveakis, Michael M. Zavlanos
The path signature, having enjoyed recent success in the machine learning community, is a theoretically-driven method for engineering features from irregular paths.
no code implementations • 15 Sep 2023 • Yi Shen, Pan Xu, Michael M. Zavlanos
To overcome these limitations, we propose a novel DRO approach that employs the Wasserstein distance instead.
no code implementations • 23 Mar 2023 • Zifan Wang, Yulong Gao, Siyi Wang, Michael M. Zavlanos, Alessandro Abate, Karl H. Johansson
Distributional reinforcement learning (DRL) enhances the understanding of the effects of the randomness in the environment by letting agents learn the distribution of a random return, rather than its expected value as in standard RL.
no code implementations • 9 Sep 2022 • Yi Shen, Jessilyn Dunn, Michael M. Zavlanos
In this paper, we consider a risk-averse multi-armed bandit (MAB) problem where the goal is to learn a policy that minimizes the risk of low expected return, as opposed to maximizing the expected return itself, which is the objective in the usual approach to risk-neutral MAB.
no code implementations • 6 Sep 2022 • Zifan Wang, Yi Shen, Zachary I. Bell, Scott Nivison, Michael M. Zavlanos, Karl H. Johansson
Specifically, the agents use the conditional value at risk (CVaR) as a risk measure and rely on bandit feedback in the form of the cost values of the selected actions at every episode to estimate their CVaR values and update their actions.
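As a rough illustration of the risk measure mentioned above, the CVaR of a cost can be estimated from samples by averaging the worst tail of the observed costs. The sketch below is a generic empirical estimator, not the estimator used in the paper; the function name `empirical_cvar` and the convention that `alpha` is the confidence level (so the worst `1 - alpha` fraction is averaged) are assumptions made for this example.

```python
import math


def empirical_cvar(costs, alpha):
    """Estimate CVaR at level alpha from cost samples (hypothetical helper).

    Assumed convention: larger costs are worse, and CVaR is the mean of the
    worst ceil((1 - alpha) * n) samples.
    """
    if not costs:
        raise ValueError("costs must be non-empty")
    if not 0.0 <= alpha < 1.0:
        raise ValueError("alpha must lie in [0, 1)")
    n = len(costs)
    # Number of tail samples to average; always keep at least one.
    k = max(1, math.ceil((1.0 - alpha) * n))
    tail = sorted(costs, reverse=True)[:k]
    return sum(tail) / k
```

With `alpha = 0` the estimate reduces to the sample mean, which corresponds to the risk-neutral case; as `alpha` approaches 1 it approaches the largest observed cost.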
no code implementations • 16 Mar 2022 • Zifan Wang, Yi Shen, Michael M. Zavlanos
To address this challenge, we propose a new online risk-averse learning algorithm that relies on one-point zeroth-order estimation of the CVaR gradients computed using CVaR values that are estimated by appropriately sampling the cost functions.
no code implementations • 22 Jun 2021 • Panagiotis Vlantis, Leila J. Bridgeman, Michael M. Zavlanos
As a result, our method can learn a safe vector field for the closed-loop system and, at the same time, provide worst-case bounds on safety violation over the whole configuration space, defined by the overlap between the over-approximation of the forward reachable set of the closed-loop system and the set of unsafe states.
no code implementations • 7 Jun 2021 • Chenyu Liu, Yan Zhang, Yi Shen, Michael M. Zavlanos
We assume that this context is not accessible to a learner agent who can only observe the expert data.
no code implementations • 8 Mar 2021 • Shiqi Sun, Yan Zhang, Xusheng Luo, Panagiotis Vlantis, Miroslav Pajic, Michael M. Zavlanos
Using this abstraction, we propose a method to compute tight bounds on the safety probabilities of nodes in this graph, despite possible over-approximations of the transition probabilities between these nodes.
no code implementations • 8 Feb 2021 • Alper Kamil Bozkurt, Yu Wang, Michael M. Zavlanos, Miroslav Pajic
By deriving distinct rewards and discount factors from the acceptance condition of the DPA, we reduce the maximization of the worst-case probability of satisfying the LTL specification into the maximization of a discounted reward objective in the product game; this enables the use of model-free RL algorithms to learn an optimal controller strategy.
no code implementations • 14 Jan 2021 • Xusheng Luo, Michael M. Zavlanos
To obtain a scalable solution to this complex temporal logic task allocation problem, we propose a hierarchical approach that first allocates specific robots to tasks using the information about the tasks contained in the Nondeterministic Büchi Automaton (NBA) that captures the LTL specification, and then designs low-level executable plans for the robots that respect the high-level assignment.
Robotics
no code implementations • 8 Dec 2020 • Reza Khodayi-mehr, Matthew W. Urban, Michael M. Zavlanos, Wilkins Aquino
Currently, commercial methods for SWE rely on directional filtering, based on prior knowledge of the wave propagation direction, to remove complicated wave patterns formed due to reflection and refraction.
no code implementations • 14 Oct 2020 • Yan Zhang, Yi Zhou, Kaiyi Ji, Michael M. Zavlanos
As a result, our regret bounds are much tighter compared to existing regret bounds for ZO with conventional one-point feedback, which suggests that ZO with residual feedback can better track the optimizer of online optimization problems.
no code implementations • 18 Jun 2020 • Yan Zhang, Yi Zhou, Kaiyi Ji, Michael M. Zavlanos
When optimizing a deterministic Lipschitz function, we show that the query complexity of ZO with the proposed one-point residual feedback matches that of ZO with the existing two-point schemes.
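A minimal sketch of the idea behind one-point residual feedback: each iteration queries the function once at a randomly perturbed point and forms a gradient estimate from the difference between this value and the value queried at the previous iteration. The Gaussian perturbation directions, the scaling by `1 / delta`, and the names `residual_feedback_gradient` and `zo_descent` are assumptions for illustration; the paper's exact estimator, step sizes, and analysis may differ.

```python
import random


def residual_feedback_gradient(f_curr, f_prev, u, delta):
    """Gradient estimate from two consecutive one-point queries.

    f_curr: f(x_t + delta * u_t), the single query made at this step.
    f_prev: f(x_{t-1} + delta * u_{t-1}), reused from the previous step.
    u:      perturbation direction u_t used at this step.
    """
    scale = (f_curr - f_prev) / delta
    return [scale * ui for ui in u]


def zo_descent(f, x0, steps=200, delta=0.1, lr=0.01, seed=0):
    """Zeroth-order descent with one function query per iteration (sketch)."""
    rng = random.Random(seed)
    x = list(x0)
    f_prev = None
    for _ in range(steps):
        # Assumed sampling scheme: standard Gaussian direction.
        u = [rng.gauss(0.0, 1.0) for _ in x]
        f_curr = f([xi + delta * ui for xi, ui in zip(x, u)])
        if f_prev is not None:
            g = residual_feedback_gradient(f_curr, f_prev, u, delta)
            x = [xi - lr * gi for xi, gi in zip(x, g)]
        # The first iteration only records a query; no update is possible yet.
        f_prev = f_curr
    return x
```

Unlike a two-point scheme, the loop never evaluates `f` twice at the same iteration, which is what makes the estimator usable when each function can only be queried once.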
no code implementations • 18 Jun 2020 • Yan Zhang, Michael M. Zavlanos
The advantage of the proposed zeroth-order policy optimization method is that it allows the agents to compute the local policy gradients needed to update their local policy functions using local estimates of the global accumulated rewards that depend on partial state and action information only and can be obtained using consensus.
Multi-agent Reinforcement Learning, reinforcement-learning
no code implementations • 9 Mar 2020 • Yan Zhang, Michael M. Zavlanos
Then, the goal is to transfer this experience, excluding the underlying contextual information, to a learner agent that does not have access to the environmental context, so that it can learn a control policy using fewer samples.
1 code implementation • L4DC 2020 • Reza Khodayi-mehr, Michael M. Zavlanos
In this paper we propose a new model-based unsupervised learning method, called VarNet, for the solution of partial differential equations (PDEs) using deep neural networks (NNs).
no code implementations • 12 Nov 2019 • Yan Zhang, Robert J. Ravier, Michael M. Zavlanos, Vahid Tarokh
In this paper, we consider the problem of distributed online convex optimization, where a network of local agents aims to jointly optimize a convex function over a period of multiple time steps.
2 code implementations • 16 Sep 2019 • Alper Kamil Bozkurt, Yu Wang, Michael M. Zavlanos, Miroslav Pajic
We present a reinforcement learning (RL) framework to synthesize a control policy from a given linear temporal logic (LTL) specification in an unknown stochastic environment that can be modeled as a Markov Decision Process (MDP).
no code implementations • 21 Mar 2019 • Yan Zhang, Michael M. Zavlanos
In this paper, we propose a distributed off-policy actor-critic method to solve multi-agent reinforcement learning problems.
Multi-agent Reinforcement Learning, reinforcement-learning
no code implementations • 11 Dec 2018 • Reza Khodayi-mehr, Michael M. Zavlanos
Unlike passive cloaking methods that use metamaterials to steer the mass flux, our method is the first to use mobile robots to actively control the concentration levels and create safe zones independent of environmental conditions.
no code implementations • 10 Dec 2018 • Reza Khodayi-mehr, Michael M. Zavlanos
We propose a physics-based method to learn environmental fields (EFs) using a mobile robot.