no code implementations • 27 Aug 2024 • Shervin Khalafi, Dongsheng Ding, Alejandro Ribeiro
Diffusion models have attained prominence for their ability to learn the probability distribution of a given dataset via a diffusion process, enabling the generation of new data points with high fidelity.
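For readers unfamiliar with the mechanics, here is a minimal sketch of the forward-noising and reverse-denoising steps of a DDPM-style diffusion model; the schedule, shapes, and function names are illustrative assumptions, not this paper's method.

```python
# Minimal DDPM-style sketch (illustration only, not the paper's method).
# Assumptions: a linear beta schedule and an externally supplied noise predictor.
import numpy as np

T = 1000
betas = np.linspace(1e-4, 0.02, T)       # noise schedule beta_t
alphas = 1.0 - betas
alpha_bars = np.cumprod(alphas)          # cumulative product \bar{alpha}_t

def q_sample(x0, t, rng):
    """Forward process: corrupt clean data x0 to noise level t."""
    eps = rng.standard_normal(x0.shape)
    return np.sqrt(alpha_bars[t]) * x0 + np.sqrt(1.0 - alpha_bars[t]) * eps

def p_sample_step(x_t, t, eps_hat, rng):
    """One reverse (denoising) step, given predicted noise eps_hat."""
    coef = betas[t] / np.sqrt(1.0 - alpha_bars[t])
    mean = (x_t - coef * eps_hat) / np.sqrt(alphas[t])
    if t == 0:
        return mean
    return mean + np.sqrt(betas[t]) * rng.standard_normal(x_t.shape)
```

For example, `q_sample(x0, 500, np.random.default_rng(0))` corrupts `x0` roughly halfway through the schedule.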
no code implementations • 19 Aug 2024 • Sergio Rozada, Dongsheng Ding, Antonio G. Marques, Alejandro Ribeiro
Designing deterministic policy gradient methods in continuous state and action spaces is particularly challenging: state-action pairs cannot be enumerated and the policies are deterministic, which hinders the application of existing policy gradient methods for constrained MDPs.
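As a hedged illustration of the setting, one natural baseline is a DDPG-style deterministic actor update augmented with a scalar Lagrange multiplier for the utility constraint; everything below (network sizes, the dual step, the budget) is an assumption, not the paper's algorithm.

```python
# Hypothetical sketch: deterministic actor update with a Lagrange multiplier
# for a utility constraint E[utility] >= budget (not the paper's algorithm).
import torch

actor = torch.nn.Sequential(torch.nn.Linear(8, 64), torch.nn.Tanh(),
                            torch.nn.Linear(64, 2))          # a = pi(s)
reward_critic = torch.nn.Sequential(torch.nn.Linear(10, 64), torch.nn.Tanh(),
                                    torch.nn.Linear(64, 1))  # Q_r(s, a)
utility_critic = torch.nn.Sequential(torch.nn.Linear(10, 64), torch.nn.Tanh(),
                                     torch.nn.Linear(64, 1))  # Q_u(s, a)
lam = torch.tensor(0.0)                                       # dual variable
opt = torch.optim.Adam(actor.parameters(), lr=3e-4)
budget, eta_dual = 0.5, 1e-2

def update(states):
    global lam
    actions = actor(states)
    sa = torch.cat([states, actions], dim=-1)
    # Primal step: ascend the Lagrangian Q_r + lam * Q_u along the
    # deterministic policy gradient (through the actor's actions).
    loss = -(reward_critic(sa) + lam * utility_critic(sa)).mean()
    opt.zero_grad(); loss.backward(); opt.step()
    # Dual step: projected descent on the constraint slack.
    with torch.no_grad():
        gap = utility_critic(sa).mean() - budget
        lam = torch.clamp(lam - eta_dual * gap, min=0.0)
```

The primal step ascends the Lagrangian through the actor; the dual step shrinks the multiplier when the estimated utility exceeds the budget.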
no code implementations • 29 May 2024 • Xinmeng Huang, Shuo Li, Edgar Dobriban, Osbert Bastani, Hamed Hassani, Dongsheng Ding
The growing safety concerns surrounding Large Language Models (LLMs) raise an urgent need to align them with diverse human preferences to simultaneously enhance their helpfulness and safety.
no code implementations • 28 Dec 2023 • Dongsheng Ding, Zhengyan Huan, Alejandro Ribeiro
It is challenging to identify appropriate constraint specifications because of the ill-defined trade-off between reward maximization and constraint satisfaction, a trade-off that is ubiquitous in constrained decision-making.
no code implementations • NeurIPS 2023 • Dongsheng Ding, Chen-Yu Wei, Kaiqing Zhang, Alejandro Ribeiro
To fill this gap, we employ the Lagrangian method to cast a constrained MDP into a constrained saddle-point problem in which max/min players correspond to primal/dual variables, respectively, and develop two single-time-scale policy-based primal-dual algorithms with non-asymptotic convergence of their policy iterates to an optimal constrained policy.
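In commonly used notation (assumed here rather than quoted from the paper), the saddle-point reformulation and a single-time-scale primal-dual update with one shared stepsize $\eta$ take the form:

```latex
\max_{\pi}\,\min_{\lambda \ge 0}\; L(\pi,\lambda) = V_r(\pi) + \lambda\bigl(V_u(\pi) - b\bigr),
\qquad
\begin{aligned}
\theta_{t+1} &= \theta_t + \eta\,\nabla_{\theta} L(\pi_{\theta_t},\lambda_t),\\
\lambda_{t+1} &= \Pi_{[0,\Lambda]}\bigl[\lambda_t - \eta\,\bigl(V_u(\pi_{\theta_t}) - b\bigr)\bigr],
\end{aligned}
```

where using the same $\eta$ for both players is what makes the scheme single-time-scale.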
no code implementations • 31 May 2023 • Dongsheng Ding, Xiaohan Wei, Zhuoran Yang, Zhaoran Wang, Mihailo R. Jovanović
We examine online safe multi-agent reinforcement learning using constrained Markov games in which agents compete by maximizing their expected total rewards under a constraint on expected total utilities.
no code implementations • 6 Jun 2022 • Dongsheng Ding, Kaiqing Zhang, Jiali Duan, Tamer Başar, Mihailo R. Jovanović
We study sequential decision-making problems aimed at maximizing the expected total reward while satisfying a constraint on the expected total utility.
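In the usual notation (assumed here), this is the constrained MDP problem

```latex
\max_{\pi}\; \mathbb{E}_{\pi}\Bigl[\sum_{t=0}^{\infty} \gamma^{t}\, r(s_t,a_t)\Bigr]
\quad\text{subject to}\quad
\mathbb{E}_{\pi}\Bigl[\sum_{t=0}^{\infty} \gamma^{t}\, u(s_t,a_t)\Bigr] \ge b.
```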
no code implementations • 8 Feb 2022 • Dongsheng Ding, Chen-Yu Wei, Kaiqing Zhang, Mihailo R. Jovanović
When there is no uncertainty in the gradient evaluation, we show that our algorithm finds an $\epsilon$-Nash equilibrium with $O(1/\epsilon^2)$ iteration complexity which does not explicitly depend on the state space size.
no code implementations • NeurIPS 2020 • Dongsheng Ding, Kaiqing Zhang, Tamer Başar, Mihailo R. Jovanović
To the best of our knowledge, our work is the first to establish non-asymptotic convergence guarantees of policy-based primal-dual methods for solving infinite-horizon discounted CMDPs.
no code implementations • 1 Mar 2020 • Dongsheng Ding, Xiaohan Wei, Zhuoran Yang, Zhaoran Wang, Mihailo R. Jovanović
To this end, we present an Optimistic Primal-Dual Proximal Policy OPtimization (OPDOP) algorithm where the value function is estimated by combining least-squares policy evaluation with an additional bonus term for safe exploration.
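To make the two ingredients concrete, here is a schematic of least-squares policy evaluation with a UCB-style optimism bonus; the regularization, bonus form, and names are assumptions rather than OPDOP's exact construction.

```python
# Schematic: regularized LSTD(0) plus a UCB-style bonus (assumed form,
# not OPDOP's exact design).
import numpy as np

def lspe_with_bonus(Phi, rewards, Phi_next, gamma=0.99, lam=1.0, beta=1.0):
    """Phi, Phi_next: (n, d) feature matrices for visited and next (s, a)."""
    n, d = Phi.shape
    A = Phi.T @ (Phi - gamma * Phi_next) + lam * np.eye(d)
    w = np.linalg.solve(A, Phi.T @ rewards)        # LSTD weight vector
    Lambda_inv = np.linalg.inv(Phi.T @ Phi + lam * np.eye(d))
    def q_value(phi):
        bonus = beta * np.sqrt(phi @ Lambda_inv @ phi)  # optimism bonus
        return phi @ w + bonus
    return q_value
```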
no code implementations • 2 Oct 2019 • Dongsheng Ding, Mihailo R. Jovanović
For a class of nonsmooth composite optimization problems with linear equality constraints, we utilize a Lyapunov-based approach to establish the global exponential stability of the primal-dual gradient flow dynamics based on the proximal augmented Lagrangian.
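Generically, and in assumed notation, such dynamics run gradient descent on the primal variable and gradient ascent on the dual variable of the proximal augmented Lagrangian $L_\mu$, whose Moreau-envelope construction makes it continuously differentiable, so the flow below is well defined:

```latex
\dot{x} = -\nabla_{x} L_{\mu}(x,y), \qquad \dot{y} = \nabla_{y} L_{\mu}(x,y).
```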
no code implementations • 7 Aug 2019 • Dongsheng Ding, Xiaohan Wei, Zhuoran Yang, Zhaoran Wang, Mihailo R. Jovanović
We study the policy evaluation problem in multi-agent reinforcement learning, in which a group of agents with jointly observed states but private local actions and rewards collaborates to learn the value function of a given policy via local computation and communication over a connected undirected network.
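A toy sketch of one synchronous round of consensus-based TD(0) in this general spirit follows; the doubly stochastic mixing matrix W and the shared linear features are illustrative assumptions, not the paper's algorithm.

```python
# Toy sketch: distributed TD(0) with consensus averaging over a network
# (mixing matrix W and linear features are illustrative assumptions).
import numpy as np

def consensus_td_step(W, thetas, phi, phi_next, local_rewards,
                      gamma=0.99, alpha=0.1):
    """One round: each agent runs a local TD(0) update using its private
    reward on the jointly observed state, then averages with neighbors.

    W: (N, N) doubly stochastic mixing matrix; thetas: (N, d) parameters;
    phi, phi_next: (d,) features; local_rewards: (N,) private rewards."""
    deltas = local_rewards + gamma * (thetas @ phi_next) - thetas @ phi
    local_updates = thetas + alpha * deltas[:, None] * phi[None, :]
    return W @ local_updates      # consensus step mixes neighbors' parameters
```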