no code implementations • 17 Oct 2024 • Yufeng Yang, Erin Tripp, Yifan Sun, Shaofeng Zou, Yi Zhou
Recent studies have shown that many nonconvex machine learning problems meet a so-called generalized-smooth condition that extends beyond traditional smooth nonconvex optimization.
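For reference, a commonly used form of the generalized-smooth (also called $(L_0, L_1)$-smooth) condition for twice-differentiable $f$ is

$$\|\nabla^2 f(x)\| \le L_0 + L_1 \|\nabla f(x)\| \quad \text{for all } x,$$

which reduces to standard $L_0$-smoothness when $L_1 = 0$; work in this line may adopt relaxed, first-order variants of this condition.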
no code implementations • 24 Jun 2024 • Yudan Wang, Shaofeng Zou, Yue Wang
We develop algorithms for uncertainty sets defined by total variation, chi-square divergence, and KL divergence, and provide finite-sample analyses for all three cases.
no code implementations • 3 Jun 2024 • Yudan Wang, Yue Wang, Yi Zhou, Shaofeng Zou
Specifically, existing studies show that AC converges to an $\epsilon+\varepsilon_{\text{critic}}$ neighborhood of stationary points with the best known sample complexity of $\mathcal{O}(\epsilon^{-2})$ (up to a log factor), and NAC converges to an $\epsilon+\varepsilon_{\text{critic}}+\sqrt{\varepsilon_{\text{actor}}}$ neighborhood of the global optimum with the best known sample complexity of $\mathcal{O}(\epsilon^{-3})$, where $\varepsilon_{\text{critic}}$ is the approximation error of the critic and $\varepsilon_{\text{actor}}$ is the approximation error induced by the insufficient expressive power of the parameterized policy class.
no code implementations • 29 May 2024 • Qi Zhang, Peiyao Xiao, Shaofeng Zou, Kaiyi Ji
We provide a comprehensive convergence analysis of these algorithms and show that they converge to an $\epsilon$-accurate Pareto stationary point with a guaranteed $\epsilon$-level average CA distance (i.e., the gap between the updating direction and the CA direction) over all iterations, where $\mathcal{O}(\epsilon^{-2})$ and $\mathcal{O}(\epsilon^{-4})$ samples in total are needed for the deterministic and stochastic settings, respectively.
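As a rough illustration of the conflict-avoidant (CA) direction mentioned above, the sketch below computes, for two tasks, the minimum-norm element of the convex hull of the task gradients (an MGDA-style choice of CA direction); this is a minimal sketch under that assumption and does not reproduce the paper's algorithms or their stochastic gradient estimators.

import numpy as np

def ca_direction_two_tasks(g1, g2):
    """Minimum-norm convex combination of two task gradients.

    Solves min_{a in [0, 1]} || a * g1 + (1 - a) * g2 ||^2 in closed form.
    The resulting direction has a nonnegative inner product with both gradients,
    so a small step along it does not increase either task's loss (to first order).
    """
    diff = g1 - g2
    denom = float(np.dot(diff, diff))
    if denom == 0.0:                      # identical gradients: no conflict
        return g1.copy()
    a = np.clip(np.dot(g2 - g1, g2) / denom, 0.0, 1.0)
    return a * g1 + (1.0 - a) * g2

# Example: two conflicting gradients
g1, g2 = np.array([1.0, 0.0]), np.array([-0.5, 1.0])
d = ca_direction_two_tasks(g1, g2)        # both np.dot(d, g1) and np.dot(d, g2) are >= 0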
no code implementations • 25 May 2024 • Yudan Wang, Peiyao Xiao, Hao Ban, Kaiyi Ji, Shaofeng Zou
However, these methods often suffer from the issue of \textit{gradient conflict}, in which tasks with larger gradients dominate the update direction, resulting in performance degradation on the other tasks.
no code implementations • 2 May 2024 • Zhongchang Sun, Sihong He, Fei Miao, Shaofeng Zou
Existing approaches to constrained reinforcement learning (RL) may obtain a well-performing policy in the training environment.
no code implementations • 1 Apr 2024 • Qi Zhang, Yi Zhou, Shaofeng Zou
Specifically, to address the challenges arising from the dependence among the adaptive update, the unbounded gradient estimate, and the Lipschitz constant, we demonstrate that the first-order term in the descent lemma converges and that its denominator is upper bounded by a function of the gradient norm.
no code implementations • 1 Apr 2024 • Qi Zhang, Yi Zhou, Ashley Prater-Bennette, Lixin Shen, Shaofeng Zou
We prove that our algorithm finds an $\epsilon$-stationary point with a computational complexity of $\mathcal O(\epsilon^{-3k_*-5})$, where $k_*$ is the parameter of the Cressie-Read divergence.
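For reference, the Cressie-Read family indexed by $k$ (the $k_*$ above) is commonly written, in the $f$-divergence form used in distributionally robust optimization, as

$$f_k(t) = \frac{t^{k} - k t + k - 1}{k(k - 1)}, \qquad D_{f_k}(Q \| P) = \mathbb{E}_P\!\left[ f_k\!\left( \frac{dQ}{dP} \right) \right],$$

so that $k = 2$ recovers (half of) the chi-square divergence; the paper's normalization may differ slightly.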
no code implementations • 5 Feb 2024 • Junze Deng, Yuan Cheng, Shaofeng Zou, Yingbin Liang
Our result for the second model is the first known result for this type of function approximation model.
no code implementations • 13 Oct 2023 • Zhongchang Sun, Shaofeng Zou
The data-driven setting where the disturbance signal parameters are unknown is further investigated, and an online and computationally efficient gradient ascent CuSum algorithm is designed.
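For orientation, the classical CuSum recursion that such detectors build on is sketched below, assuming known pre- and post-change densities; the gradient ascent CuSum described above additionally updates an estimate of the unknown disturbance signal parameters online, which is not shown here.

import numpy as np

def cusum(samples, log_lik_ratio, threshold):
    """Classical CuSum stopping rule.

    log_lik_ratio(x) = log(q(x) / p(x)), with p the pre-change and q the post-change density.
    Returns the first (1-indexed) time the statistic crosses the threshold, or None.
    """
    w = 0.0
    for t, x in enumerate(samples, start=1):
        w = max(w + log_lik_ratio(x), 0.0)      # CuSum recursion
        if w >= threshold:
            return t
    return None

# Example: mean shift from N(0, 1) to N(1, 1) at time 200
rng = np.random.default_rng(0)
data = np.concatenate([rng.normal(0.0, 1.0, 200), rng.normal(1.0, 1.0, 200)])
print(cusum(data, lambda x: x - 0.5, threshold=10.0))   # x - 0.5 is the Gaussian log-likelihood ratio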
1 code implementation • 30 Jul 2023 • Sihong He, Songyang Han, Sanbao Su, Shuo Han, Shaofeng Zou, Fei Miao
Then we propose a robust multi-agent Q-learning (RMAQ) algorithm to find such an equilibrium, with convergence guarantees.
no code implementations • 22 May 2023 • Yue Wang, JinJun Xiong, Shaofeng Zou
We show that an improved sample complexity of $\mathcal{O}(SC^{\pi^*}\epsilon^{-2}(1-\gamma)^{-3})$ can be obtained, which asymptotically matches the minimax lower bound for offline reinforcement learning and is thus asymptotically minimax optimal.
no code implementations • 17 May 2023 • Yue Wang, Alvaro Velasquez, George Atia, Ashley Prater-Bennette, Shaofeng Zou
Robust Markov decision processes (MDPs) address the challenge of model uncertainty by optimizing the worst-case performance over an uncertainty set of MDPs.
no code implementations • 2 Jan 2023 • Yue Wang, Alvaro Velasquez, George Atia, Ashley Prater-Bennette, Shaofeng Zou
We derive the robust Bellman equation for robust average-reward MDPs, prove that the optimal policy can be derived from its solution, and further design a robust relative value iteration algorithm that provably finds its solution, or equivalently, the optimal robust policy.
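A minimal sketch of a robust relative value iteration of this flavor is given below; `worst_case_value` is a hypothetical oracle returning $\min_{p \in \mathcal{P}(s,a)} \sum_{s'} p(s') v(s')$ for the chosen uncertainty set, and the sketch is only meant to convey the structure, not the paper's algorithm or its convergence guarantees.

import numpy as np

def robust_relative_value_iteration(n_states, n_actions, reward, worst_case_value,
                                    ref_state=0, num_iters=1000, tol=1e-8):
    """Robust relative value iteration (sketch) for average-reward robust MDPs.

    reward[s, a]              : immediate reward
    worst_case_value(s, a, v) : worst-case expected next-state value over P(s, a)
    Returns an estimate of the robust average reward and the relative value function.
    """
    v = np.zeros(n_states)
    gain = 0.0
    for _ in range(num_iters):
        q = np.array([[reward[s, a] + worst_case_value(s, a, v) for a in range(n_actions)]
                      for s in range(n_states)])
        v_new = q.max(axis=1)
        gain, v_new = v_new[ref_state], v_new - v_new[ref_state]   # subtract reference value
        diff = np.max(np.abs(v_new - v))
        v = v_new
        if diff < tol:
            break
    return gain, v

# With a singleton uncertainty set, worst_case_value(s, a, v) = P[s, a] @ v
# and the sketch reduces to standard relative value iteration.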
1 code implementation • 6 Dec 2022 • Songyang Han, Sanbao Su, Sihong He, Shuo Han, Haizhao Yang, Shaofeng Zou, Fei Miao
Various methods for Multi-Agent Reinforcement Learning (MARL) have been developed with the assumption that agents' policies are based on accurate state information.
no code implementations • 21 Oct 2022 • Qi Zhang, Zhongchang Sun, Luis C. Herrera, Shaofeng Zou
The worst-case average detection delay (WADD) is at most of the order of the logarithm of the average run length (ARL).
no code implementations • 17 Sep 2022 • Sihong He, Yue Wang, Shuo Han, Shaofeng Zou, Fei Miao
In this work, we design a robust and constrained multi-agent reinforcement learning (MARL) framework with state transition kernel uncertainty for EV AMoD systems.
no code implementations • 14 Sep 2022 • Yue Wang, Fei Miao, Shaofeng Zou
We then investigate a concrete example of $\delta$-contamination uncertainty set, design an online and model-free algorithm and theoretically characterize its sample complexity.
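To make the $\delta$-contamination case concrete, one model-free robust Q-learning update of this type is sketched below, using the closed form of the inner minimization over $(1-\delta)P + \delta Q$, namely $(1-\delta)\,\mathbb{E}_P[V] + \delta \min_{s} V(s)$; this is a minimal sketch of the structure and not a faithful reproduction of the paper's algorithm or its step-size schedule.

import numpy as np

def robust_q_update(Q, s, a, r, s_next, alpha, gamma, delta):
    """One robust Q-learning step under a delta-contamination uncertainty set.

    The usual bootstrap target max_a' Q(s_next, a') is replaced by its worst case
    (1 - delta) * max_a' Q(s_next, a') + delta * min_s'' max_a' Q(s'', a').
    """
    v = Q.max(axis=1)                      # greedy value of every state
    target = r + gamma * ((1.0 - delta) * v[s_next] + delta * v.min())
    Q[s, a] += alpha * (target - Q[s, a])
    return Q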
no code implementations • 6 Sep 2022 • Yue Wang, Yi Zhou, Shaofeng Zou
Our finite-time error bounds match those of stochastic gradient descent algorithms for general smooth nonconvex optimization problems, despite the additional challenge posed by the two time-scale updates.
no code implementations • 4 Sep 2022 • Zhongchang Sun, Shaofeng Zou
The goal of the fusion center is to detect the anomaly with minimal detection delay subject to false alarm constraints.
no code implementations • 13 Jun 2022 • Tengyu Xu, Yue Wang, Shaofeng Zou, Yingbin Liang
The remarkable success of reinforcement learning (RL) heavily relies on observing the reward of every visited state-action pair.
no code implementations • 15 May 2022 • Yue Wang, Shaofeng Zou
We further develop a smoothed robust policy gradient method and show that to achieve an $\epsilon$-global optimum, the complexity is $\mathcal O(\epsilon^{-3})$.
no code implementations • 23 Mar 2022 • Zhongchang Sun, Shaofeng Zou
For the Bayesian setting, where the goal is to minimize the worst-case error probability, an optimal test is first obtained when the alphabet is finite.
no code implementations • 26 Feb 2022 • Zhongchang Sun, Shaofeng Zou, Ruizhi Zhang, Qunwei Li
The problem of quickest change detection (QCD) in anonymous heterogeneous sensor networks is studied.
no code implementations • 20 Oct 2021 • Tianjiao Li, Ziwei Guan, Shaofeng Zou, Tengyu Xu, Yingbin Liang, Guanghui Lan
Despite the challenge of the nonconcave objective subject to nonconcave constraints, the proposed approach is shown to converge to the global optimum with a complexity of $\tilde{\mathcal O}(1/\epsilon)$ in terms of the optimality gap and the constraint violation, which improves the complexity of the existing primal-dual approach by a factor of $\mathcal O(1/\epsilon)$ \citep{ding2020natural, paternain2019constrained}.
no code implementations • NeurIPS 2021 • Yue Wang, Shaofeng Zou
In this paper, we focus on model-free robust RL, where the uncertainty set is defined to be centered at a misspecified MDP that is unknown and generates a single sample trajectory sequentially.
no code implementations • 8 Sep 2021 • Ziyi Chen, Yi Zhou, Rongrong Chen, Shaofeng Zou
Actor-critic (AC) algorithms have been widely adopted in decentralized multi-agent systems to learn the optimal joint control policy.
no code implementations • NeurIPS 2021 • Yue Wang, Shaofeng Zou, Yi Zhou
Temporal-difference learning with gradient correction (TDC) is a two time-scale algorithm for policy evaluation in reinforcement learning.
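For reference, the standard TDC updates with linear function approximation (feature map $\phi$, slow step size $\alpha_t$ for the main weights $\theta$, fast step size $\beta_t$ for the auxiliary weights $w$) take the form

$$\delta_t = r_t + \gamma\, \theta_t^\top \phi(s_{t+1}) - \theta_t^\top \phi(s_t),$$
$$\theta_{t+1} = \theta_t + \alpha_t \big( \delta_t\, \phi(s_t) - \gamma\, \phi(s_{t+1})\, \phi(s_t)^\top w_t \big), \qquad w_{t+1} = w_t + \beta_t \big( \delta_t - \phi(s_t)^\top w_t \big)\, \phi(s_t),$$

where off-policy variants additionally weight the updates by an importance sampling ratio; the contribution here is the finite-time analysis rather than the update rule itself.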
no code implementations • ICLR 2021 • Shaocong Ma, Ziyi Chen, Yi Zhou, Shaofeng Zou
Greedy-GQ is a value-based reinforcement learning (RL) algorithm for optimal control.
no code implementations • 7 Dec 2020 • Qunwei Li, Shaofeng Zou, Wenliang Zhong
Two types of GNNs are investigated, depending on whether labels are attached to nodes or graphs.
no code implementations • NeurIPS 2020 • Shaocong Ma, Yi Zhou, Shaofeng Zou
In the Markovian setting, our algorithm achieves the state-of-the-art sample complexity of $\mathcal{O}(\epsilon^{-1} \log \epsilon^{-1})$, which is near-optimal.
no code implementations • 20 May 2020 • Yue Wang, Shaofeng Zou
Greedy-GQ is an off-policy two time-scale algorithm for optimal control in reinforcement learning.
1 code implementation • Conference 2019 • Shaofeng Zou, Mingzhu Long, Xuyang Wang, Xiang Xie, Guolin Li, Zhihua Wang
The number of iterations is reduced by about 36% by using transfer learning in our DIP process.
no code implementations • NeurIPS 2019 • Tengyu Xu, Shaofeng Zou, Yingbin Liang
Gradient-based temporal difference (GTD) algorithms are widely used in off-policy learning scenarios.
no code implementations • NeurIPS 2019 • Shaofeng Zou, Tengyu Xu, Yingbin Liang
For this fitted SARSA algorithm, we also provide its finite-sample analysis.
1 code implementation • 27 Jan 2019 • Yuheng Bu, Weihao Gao, Shaofeng Zou, Venugopal V. Veeravalli
We show that model compression can improve the population risk of a pre-trained model, by studying the tradeoff between the decrease in the generalization error and the increase in the empirical risk with model compression.
no code implementations • 15 Jan 2019 • Yuheng Bu, Shaofeng Zou, Venugopal V. Veeravalli
The bound is derived under more general conditions on the loss function than in existing studies; nevertheless, it provides a tighter characterization of the generalization error.
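As a point of reference for bounds of this kind, a representative individual-sample mutual information bound states that, for a loss that is $\sigma$-sub-Gaussian, the expected generalization error satisfies

$$\big| \mathbb{E}[\mathrm{gen}(W, S)] \big| \;\le\; \frac{1}{n} \sum_{i=1}^{n} \sqrt{2\sigma^{2}\, I(W; Z_i)},$$

where $W$ denotes the learned weights and $Z_1, \dots, Z_n$ the training samples; this is given only to indicate the general shape of such bounds and is not necessarily the exact statement proved in the paper.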
no code implementations • 21 Jan 2017 • Yuheng Bu, Shaofeng Zou, Venugopal V. Veeravalli
A sequence is considered outlying if the observations therein are generated by a distribution different from those generating the observations in the majority of the sequences.
no code implementations • 5 Apr 2016 • Shaofeng Zou, Yingbin Liang, H. Vincent Poor
Sufficient conditions on the minimum and maximum sizes of candidate anomalous intervals are characterized to guarantee that the proposed test is consistent.
no code implementations • 25 Apr 2014 • Shaofeng Zou, Yingbin Liang, H. Vincent Poor, Xinghua Shi
samples drawn from a distribution p, whereas each anomalous sequence contains m i.i.d.
no code implementations • 1 Apr 2014 • Shaofeng Zou, Yingbin Liang, H. Vincent Poor
If an anomalous interval does not exist, then all nodes receive samples generated by p. It is assumed that the distributions p and q are arbitrary and unknown.