no code implementations • 31 May 2024 • Shangding Gu, Laixi Shi, Yuhao Ding, Alois Knoll, Costas Spanos, Adam Wierman, Ming Jin
Safe reinforcement learning (RL) is crucial for deploying RL agents in real-world applications, as it aims to maximize long-term rewards while satisfying safety constraints.
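For context, the safe-RL objective described in this entry is commonly formalized as a constrained MDP (CMDP); a standard statement of that objective, in generic notation not taken from the paper:

```latex
% CMDP: maximize expected return subject to a safety budget b
\max_{\pi} \; \mathbb{E}_{\pi}\Big[\sum_{t=0}^{\infty} \gamma^{t}\, r(s_t, a_t)\Big]
\quad \text{s.t.} \quad
\mathbb{E}_{\pi}\Big[\sum_{t=0}^{\infty} \gamma^{t}\, c(s_t, a_t)\Big] \le b
```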
1 code implementation • 26 May 2024 • Shangding Gu, Bilgehan Sel, Yuhao Ding, Lu Wang, Qingwei Lin, Alois Knoll, Ming Jin
In numerous reinforcement learning (RL) problems involving safety-critical systems, a key challenge lies in balancing multiple objectives while simultaneously meeting all stringent safety constraints.
Tasks: Multi-Objective Reinforcement Learning, Reinforcement Learning (+1)
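To make the balancing act in this entry concrete, here is a minimal, generic sketch that combines weighted multi-objective returns with a Lagrangian safety penalty; the function name and interface are hypothetical, not the paper's method.

```python
import numpy as np

def scalarized_lagrangian(returns, weights, cost, budget, lam):
    """One generic way to combine multiple objectives with a safety
    constraint: weighted-sum scalarization of the reward objectives
    plus a Lagrangian penalty on constraint violation.

    returns: per-objective expected returns, shape (num_objectives,)
    weights: preference weights, shape (num_objectives,)
    cost:    expected cumulative safety cost (scalar)
    budget:  allowed cost budget b (scalar)
    lam:     Lagrange multiplier (>= 0)
    """
    reward_term = np.dot(weights, returns)   # scalarized multi-objective reward
    penalty = lam * max(cost - budget, 0.0)  # penalize only violations
    return reward_term - penalty

# Example: two objectives, mild constraint violation
print(scalarized_lagrangian(np.array([1.0, 0.5]),
                            np.array([0.7, 0.3]),
                            cost=1.2, budget=1.0, lam=2.0))
```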
no code implementations • 26 May 2024 • Vanshaj Khattar, Yuhao Ding, Bilgehan Sel, Javad Lavaei, Ming Jin
Meta-reinforcement learning has been widely used as a learning-to-learn framework for solving unseen tasks with limited experience.
3 code implementations • 2 May 2024 • Shangding Gu, Bilgehan Sel, Yuhao Ding, Lu Wang, Qingwei Lin, Ming Jin, Alois Knoll
Ensuring the safety of Reinforcement Learning (RL) is crucial for its deployment in real-world applications.
no code implementations • 24 Feb 2023 • Jiajun Zhou, Jiajun Wu, Yizhao Gao, Yuhao Ding, Chaofan Tao, Boyu Li, Fengbin Tu, Kwang-Ting Cheng, Hayden Kwok-Hay So, Ngai Wong
To accelerate the inference of deep neural networks (DNNs), quantization to low-bitwidth number formats is an active area of research.
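As a concrete illustration of low-bitwidth quantization, here is a generic symmetric uniform scheme (a sketch; not necessarily the scheme studied in this paper):

```python
import numpy as np

def quantize_symmetric(x, num_bits=8):
    """Minimal symmetric uniform quantization: map float values to
    low-bitwidth integers and back. Illustrative only; real DNN
    quantizers add per-channel scales, calibration, etc."""
    qmax = 2 ** (num_bits - 1) - 1                     # e.g. 127 for int8
    scale = max(np.abs(x).max() / qmax, 1e-12)         # one scale per tensor
    q = np.clip(np.round(x / scale), -qmax, qmax).astype(np.int8)
    return q, scale

def dequantize(q, scale):
    return q.astype(np.float32) * scale

w = np.random.randn(4, 4).astype(np.float32)
q, s = quantize_symmetric(w, num_bits=8)
print(np.abs(w - dequantize(q, s)).max())              # quantization error
```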
no code implementations • 15 Feb 2023 • Donghao Ying, Yuhao Ding, Alec Koppel, Javad Lavaei
The objective is to find a localized policy that maximizes the average of the team's local utility functions without requiring full observability of every agent in the team.
Tasks: Multi-agent Reinforcement Learning, Reinforcement Learning (+2)
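A minimal sketch of the objective this entry describes, with illustrative names: each agent's policy sees only its local observation, and the team score averages the local utilities.

```python
import numpy as np

def team_objective(local_utilities):
    """The team score is the average of the agents' local utility
    values, each obtained under a localized policy. (Names and
    structure are illustrative, not taken from the paper.)"""
    return float(np.mean(local_utilities))

def localized_policy(local_obs, theta_i):
    """Each agent acts on its own observation only (no full observability).
    theta_i: (num_actions, obs_dim) parameters for agent i."""
    logits = theta_i @ local_obs
    p = np.exp(logits - logits.max())
    return p / p.sum()
```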
no code implementations • 19 Nov 2022 • Yuhao Ding, Ming Jin, Javad Lavaei
We study risk-sensitive reinforcement learning (RL) based on an entropic risk measure in episodic non-stationary Markov decision processes (MDPs).
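The entropic risk measure has a simple closed form, $\frac{1}{\beta}\log \mathbb{E}[e^{\beta X}]$; a small numerical sketch of how it penalizes variance (the sampling setup and the value of $\beta$ are illustrative):

```python
import numpy as np

def entropic_risk(returns, beta):
    """Entropic risk measure: (1/beta) * log E[exp(beta * X)].
    beta < 0 is risk-averse, beta > 0 risk-seeking; beta -> 0
    recovers the ordinary expectation."""
    return np.log(np.mean(np.exp(beta * np.asarray(returns)))) / beta

samples = np.random.normal(loc=1.0, scale=2.0, size=100_000)
print(entropic_risk(samples, beta=-0.5))  # below the mean: variance is penalized
print(samples.mean())
```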
1 code implementation • 22 May 2022 • Donghao Ying, Mengzi Amy Guo, Hyunin Lee, Yuhao Ding, Javad Lavaei, Zuo-Jun Max Shen
In the exact setting, we prove an $O(T^{-1/3})$ convergence rate for both the average optimality gap and constraint violation, which further improves to $O(T^{-1/2})$ under strong concavity of the objective in the occupancy measure.
no code implementations • 28 Jan 2022 • Yuhao Ding, Javad Lavaei
We consider primal-dual-based reinforcement learning (RL) in episodic constrained Markov decision processes (CMDPs) with non-stationary objectives and constraints, which plays a central role in ensuring the safety of RL in time-varying environments.
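A generic primal-dual iteration for a CMDP follows the template below; this is a sketch of the standard scheme, not necessarily the paper's exact algorithm, and the argument names are illustrative.

```python
def primal_dual_step(theta, lam, grad_reward, grad_cost, cost_value,
                     budget, lr_theta=0.1, lr_lam=0.1):
    """One generic primal-dual iteration for a CMDP:
      primal: gradient ascent on the Lagrangian  V_r(theta) - lam * V_c(theta)
      dual:   projected gradient ascent on the multiplier lam >= 0
    grad_reward/grad_cost are (sampled) policy gradients of the reward
    and cost value functions at theta; cost_value is the current
    constraint value V_c(theta)."""
    theta = theta + lr_theta * (grad_reward - lam * grad_cost)
    lam = max(0.0, lam + lr_lam * (cost_value - budget))  # projection to lam >= 0
    return theta, lam
```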
no code implementations • 19 Oct 2021 • Yuhao Ding, Junzi Zhang, Hyunin Lee, Javad Lavaei
Our results are the first global convergence and sample complexity guarantees for the stochastic entropy-regularized vanilla PG method.
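For intuition, a minimal REINFORCE-style estimator for the entropy-regularized objective with a tabular softmax policy (an illustrative sketch, not the paper's exact estimator; `trajectory` is assumed to be a tuple of state, action, and reward lists):

```python
import numpy as np

def softmax(logits):
    z = np.exp(logits - logits.max())
    return z / z.sum()

def entropy_regularized_pg_grad(theta, trajectory, tau, gamma=0.99):
    """Vanilla (score-function) policy gradient for the entropy-regularized
    objective: rewards are augmented with -tau * log pi(a|s), then the
    standard REINFORCE estimator is applied. theta: (num_states, num_actions)."""
    states, actions, rewards = trajectory
    pis = [softmax(theta[s]) for s in states]
    # entropy-augmented rewards
    aug = [r - tau * np.log(pi[a]) for r, a, pi in zip(rewards, actions, pis)]
    # reward-to-go returns
    G, rtg = 0.0, []
    for r in reversed(aug):
        G = r + gamma * G
        rtg.append(G)
    rtg.reverse()
    grad = np.zeros_like(theta)
    for s, a, pi, g in zip(states, actions, pis, rtg):
        glog = -pi.copy()
        glog[a] += 1.0                 # grad of log softmax: e_a - pi
        grad[s] += glog * g
    return grad
```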
no code implementations • 19 Oct 2021 • Yuhao Ding, Junzi Zhang, Javad Lavaei
For generic Fisher-non-degenerate policy parametrizations, we provide the first single-loop, finite-batch PG algorithm achieving $\tilde{O}(\epsilon^{-3})$ global optimality sample complexity.
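Single-loop, finite-batch guarantees of this kind are often obtained with a STORM-style momentum gradient estimator; the sketch below shows that generic update rule, and whether it matches the paper's exact algorithm is an assumption on my part.

```python
def momentum_pg_update(theta, d_prev, grad_new, grad_old_at_new_sample,
                       alpha, lr):
    """STORM-style momentum gradient estimator:
      d_t = g(theta_t) + (1 - alpha) * (d_{t-1} - g(theta_{t-1}))
    where both gradient estimates are computed on the same fresh batch,
    so no large batches or inner loops are needed."""
    d = grad_new + (1.0 - alpha) * (d_prev - grad_old_at_new_sample)
    theta = theta + lr * d          # ascent on the expected return
    return theta, d
```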
no code implementations • 17 Oct 2021 • Donghao Ying, Yuhao Ding, Javad Lavaei
We study entropy-regularized constrained Markov decision processes (CMDPs) under the soft-max parameterization, in which an agent aims to maximize the entropy-regularized value function while satisfying constraints on the expected total utility.
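A minimal sketch of the soft-max parameterization and the entropy-regularized objective it induces (bandit-style, single-step, for illustration only; the paper's setting is a full CMDP):

```python
import numpy as np

def softmax_policy(theta):
    """Soft-max parameterization over a finite state-action space:
    pi(a|s) proportional to exp(theta[s, a])."""
    z = np.exp(theta - theta.max(axis=1, keepdims=True))
    return z / z.sum(axis=1, keepdims=True)

def entropy_regularized_value(pi, rewards, tau):
    """Per-state expected reward plus tau times policy entropy.
    rewards: (num_states, num_actions) array."""
    entropy = -(pi * np.log(pi)).sum(axis=1)
    return (pi * rewards).sum(axis=1) + tau * entropy
```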
no code implementations • 25 Aug 2021 • Yuhao Ding, Yik-Cheung Tam
In multi-domain task-oriented dialog systems, user utterances and system responses may mention multiple named entities and attribute values.