no code implementations • 9 Mar 2024 • Min Cheng, Ruida Zhou, P. R. Kumar, Chao Tian
We prove that both algorithms based on independent policy gradient and independent natural policy gradient converge globally to a Nash equilibrium for the average reward criterion.
1 code implementation • 19 Feb 2024 • Kaan Ozkara, Bruce Huang, Ruida Zhou, Suhas Diggavi
Though there has been a plethora of algorithms proposed for personalized supervised learning, discovering the structure of local data through personalized unsupervised learning is less explored.
no code implementations • 4 Feb 2024 • Yuning You, Ruida Zhou, Yang Shen
Accurate modeling of system dynamics holds intriguing potential in broad scientific fields including cytodynamics and fluid mechanics.
no code implementations • 4 Jan 2024 • Qiang Zhang, Ruida Zhou, Yang Shen, Tie Liu
This paper considers the problem of offline optimization, where the objective function is unknown except for a collection of "offline" data examples.
no code implementations • 26 Dec 2023 • Chengshuai Shi, Ruida Zhou, Kun Yang, Cong Shen
Federated learning (FL) has demonstrated great potential in revolutionizing distributed machine learning, and tremendous efforts have been made to extend it beyond the original focus on supervised learning.
1 code implementation • NeurIPS 2023 • Ruida Zhou, Tao Liu, Min Cheng, Dileep Kalathil, P. R. Kumar, Chao Tian
We study robust reinforcement learning (RL) with the goal of determining a well-performing policy that is robust against model mismatch between the training simulator and the testing environment.
no code implementations • 1 May 2023 • Ruida Zhou, Chao Tian, Tie Liu
We provide a new information-theoretic generalization error bound that is exactly tight (i.e., matching even the constant) for the canonical quadratic Gaussian (location) problem.
2 code implementations • 10 Jun 2022 • Ruida Zhou, Tao Liu, Dileep Kalathil, P. R. Kumar, Chao Tian
We study policy optimization for Markov decision processes (MDPs) with multiple reward value functions, which are to be jointly optimized according to given criteria such as proportional fairness (smooth concave scalarization), hard constraints (constrained MDP), and max-min trade-off.
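The three criteria named above can be written in standard multi-objective MDP notation (a sketch; the value functions $V_i(\pi)$ and thresholds $c_i$ below are generic symbols, not taken from the paper):

$$
\max_{\pi} \sum_{i} \log V_i(\pi) \ \ \text{(proportional fairness)}, \qquad
\max_{\pi} V_0(\pi) \ \text{s.t.}\ V_i(\pi) \ge c_i \ \ \text{(constrained MDP)}, \qquad
\max_{\pi} \min_{i} V_i(\pi) \ \ \text{(max-min)}.
$$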
no code implementations • 11 Apr 2022 • Ruida Zhou, Chao Tian
We study the effect of reward variance heterogeneity in the approximate top-$m$ arm identification setting.
no code implementations • 11 Apr 2022 • Wenjing Chen, Ruida Zhou, Chao Tian, Cong Shen
In the special case of $m=2$, i.e., pairwise comparison, the resultant bound is tighter than that given by Shah et al., leading to a reduced gap between the error probability upper and lower bounds.
no code implementations • 31 Oct 2021 • Tao Liu, Ruida Zhou, Dileep Kalathil, P. R. Kumar, Chao Tian
We propose a new algorithm called policy mirror descent-primal dual (PMD-PD) algorithm that can provably achieve a faster $\mathcal{O}(\log(T)/T)$ convergence rate for both the optimality gap and the constraint violation.
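As a generic illustration of the primal-dual idea underlying such algorithms (this is not the PMD-PD algorithm itself; the toy objective, constraint, and step sizes are assumptions made for the sketch):

```python
def primal_dual(steps=20000, eta_x=0.01, eta_lam=0.01):
    """Primal-dual gradient sketch on a toy constrained problem:
    maximize f(x) = -(x-2)^2  subject to  g(x) = x - 1 <= 0.
    The Lagrangian is L(x, lam) = f(x) - lam * g(x)."""
    x, lam = 0.0, 0.0
    for _ in range(steps):
        grad_x = -2.0 * (x - 2.0) - lam        # d/dx [f(x) - lam * g(x)]
        x += eta_x * grad_x                    # primal gradient ascent
        lam = max(0.0, lam + eta_lam * (x - 1.0))  # dual ascent, projected to lam >= 0
    return x, lam
```

For this toy problem the iterates approach the saddle point (x = 1, lam = 2), where the constraint is active; PMD-PD replaces the plain primal gradient step with a mirror-descent policy update to obtain its faster rate.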
1 code implementation • 27 Sep 2021 • Tao Liu, P. R. Kumar, Ruida Zhou, Xi Liu
Motivated by the problem of learning with small sample sizes, this paper shows how to incorporate into support-vector machines (SVMs) those properties that have made convolutional neural networks (CNNs) successful.
no code implementations • NeurIPS 2021 • Tao Liu, Ruida Zhou, Dileep Kalathil, P. R. Kumar, Chao Tian
We show that when a strictly safe policy is known, then one can confine the system to zero constraint violation with arbitrarily high probability while keeping the reward regret of order $\tilde{\mathcal{O}}(\sqrt{K})$.
no code implementations • 17 Dec 2020 • Ruida Zhou, Chao Tian, Tie Liu
We propose a new information-theoretic bound on generalization error based on a combination of the error decomposition technique of Bu et al. and the conditional mutual information (CMI) construction of Steinke and Zakynthinou.
no code implementations • 25 Sep 2019 • Tao Guo, Ruida Zhou, Chao Tian
We further characterize the optimal tradeoff between the minimum amount of common randomness and the total leakage.
no code implementations • 23 Jan 2019 • Chao Gan, Jing Yang, Ruida Zhou, Cong Shen
We aim to show that when the user preferences are sufficiently diverse and each arm can be optimal for certain users, the O(log T) regret incurred by exploring the sub-optimal arms under the standard stochastic MAB setting can be reduced to a constant.
no code implementations • 22 May 2018 • Ruida Zhou, Chao Gan, Jing Yang, Cong Shen
For the online setting, we propose a Cost-aware Cascading Upper Confidence Bound (CC-UCB) algorithm, and show that the cumulative regret scales as O(log T).
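For context, a minimal sketch of the standard UCB1 index that algorithms of this family build on (this is plain UCB1 on Bernoulli arms, not the cost-aware cascading variant; the arm means and horizon are assumptions for the sketch):

```python
import math
import random

def ucb1(arm_means, horizon=5000, seed=0):
    """Run UCB1: pull the arm maximizing empirical mean + sqrt(2 ln t / pulls)."""
    rng = random.Random(seed)
    k = len(arm_means)
    counts = [0] * k      # number of pulls per arm
    sums = [0.0] * k      # cumulative reward per arm
    for t in range(1, horizon + 1):
        if t <= k:
            arm = t - 1   # initialization: pull each arm once
        else:
            arm = max(range(k),
                      key=lambda i: sums[i] / counts[i]
                      + math.sqrt(2.0 * math.log(t) / counts[i]))
        reward = 1.0 if rng.random() < arm_means[arm] else 0.0
        counts[arm] += 1
        sums[arm] += reward
    return counts
```

The logarithmic confidence radius is what drives the O(log T) regret: each suboptimal arm is pulled only O(log T / gap^2) times.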
no code implementations • 11 Apr 2018 • Chao Gan, Ruida Zhou, Jing Yang, Cong Shen
Our objective is to understand how the costs and reward of the actions would affect the optimal behavior of the user in both offline and online settings, and design the corresponding opportunistic spectrum access strategies to maximize the expected cumulative net reward (i.e., reward-minus-cost).
no code implementations • 22 Feb 2018 • Zhiyang Wang, Ruida Zhou, Cong Shen
We consider a variant of the classic multi-armed bandit problem where the expected reward of each arm is a function of an unknown parameter.
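A minimal sketch of how such parameter structure can be exploited (the model functions, noise level, and exploration schedule below are illustrative assumptions, not the paper's algorithm): since all arm means are functions of one shared parameter, observations from any arm inform the estimate used to rank every arm.

```python
import random

def structured_greedy(theta_true, funcs, horizon=3000, seed=1):
    """Each arm i has expected reward funcs[i](theta) for a shared unknown theta.
    Fit theta by grid search on squared error against empirical arm means,
    then play greedily, with forced exploration on a sparse schedule."""
    rng = random.Random(seed)
    k = len(funcs)
    data = [[] for _ in range(k)]              # observed rewards per arm
    grid = [j / 100.0 for j in range(101)]     # candidate theta values in [0, 1]
    for t in range(1, horizon + 1):
        if t <= 10 * k or t % 50 == 0:
            arm = t % k                        # forced exploration round
        else:
            def err(th):                       # model fit error at candidate theta
                return sum((sum(d) / len(d) - f(th)) ** 2
                           for d, f in zip(data, funcs) if d)
            theta_hat = min(grid, key=err)
            arm = max(range(k), key=lambda i: funcs[i](theta_hat))
        reward = funcs[arm](theta_true) + rng.gauss(0.0, 0.1)
        data[arm].append(reward)
    return [len(d) for d in data]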