Search Results for author: Ruida Zhou

Found 19 papers, 4 papers with code

Provable Policy Gradient Methods for Average-Reward Markov Potential Games

no code implementations · 9 Mar 2024 · Min Cheng, Ruida Zhou, P. R. Kumar, Chao Tian

We prove that both algorithms based on independent policy gradient and independent natural policy gradient converge globally to a Nash equilibrium for the average reward criterion.

Policy Gradient Methods

Hierarchical Bayes Approach to Personalized Federated Unsupervised Learning

1 code implementation · 19 Feb 2024 · Kaan Ozkara, Bruce Huang, Ruida Zhou, Suhas Diggavi

Though there has been a plethora of algorithms proposed for personalized supervised learning, discovering the structure of local data through personalized unsupervised learning is less explored.

Dimensionality Reduction Federated Learning +1

Correlational Lagrangian Schrödinger Bridge: Learning Dynamics with Population-Level Regularization

no code implementations · 4 Feb 2024 · Yuning You, Ruida Zhou, Yang Shen

Accurate modeling of system dynamics holds intriguing potential in broad scientific fields including cytodynamics and fluid mechanics.

From Function to Distribution Modeling: A PAC-Generative Approach to Offline Optimization

no code implementations · 4 Jan 2024 · Qiang Zhang, Ruida Zhou, Yang Shen, Tie Liu

This paper considers the problem of offline optimization, where the objective function is unknown except for a collection of "offline" data examples.

Harnessing the Power of Federated Learning in Federated Contextual Bandits

no code implementations · 26 Dec 2023 · Chengshuai Shi, Ruida Zhou, Kun Yang, Cong Shen

Federated learning (FL) has demonstrated great potential in revolutionizing distributed machine learning, and tremendous efforts have been made to extend it beyond the original focus on supervised learning.

Decision Making Federated Learning +1

Natural Actor-Critic for Robust Reinforcement Learning with Function Approximation

1 code implementation · NeurIPS 2023 · Ruida Zhou, Tao Liu, Min Cheng, Dileep Kalathil, P. R. Kumar, Chao Tian

We study robust reinforcement learning (RL) with the goal of determining a well-performing policy that is robust against model mismatch between the training simulator and the testing environment.

Reinforcement Learning (RL)

Exactly Tight Information-Theoretic Generalization Error Bound for the Quadratic Gaussian Problem

no code implementations · 1 May 2023 · Ruida Zhou, Chao Tian, Tie Liu

We provide a new information-theoretic generalization error bound that is exactly tight (i.e., matching even the constant) for the canonical quadratic Gaussian (location) problem.

Anchor-Changing Regularized Natural Policy Gradient for Multi-Objective Reinforcement Learning

2 code implementations · 10 Jun 2022 · Ruida Zhou, Tao Liu, Dileep Kalathil, P. R. Kumar, Chao Tian

We study policy optimization for Markov decision processes (MDPs) with multiple reward value functions, which are to be jointly optimized according to given criteria such as proportional fairness (smooth concave scalarization), hard constraints (constrained MDP), and max-min trade-off.

Fairness Multi-Objective Reinforcement Learning +1

Approximate Top-$m$ Arm Identification with Heterogeneous Reward Variances

no code implementations · 11 Apr 2022 · Ruida Zhou, Chao Tian

We study the effect of reward variance heterogeneity in the approximate top-$m$ arm identification setting.
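For intuition only, the sketch below allocates a fixed sampling budget across arms in proportion to their (assumed known) variances and returns the arms with the largest empirical means. The allocation rule, function names, and the assumption of known variances are illustrative; this is not the algorithm analyzed in the paper.

```python
import random

def approx_top_m(sample_arm, variances, m, budget, seed=0):
    # Variance-proportional allocation: noisier arms get more samples,
    # so every empirical mean reaches a comparable accuracy.
    rng = random.Random(seed)
    total_var = sum(variances)
    means = []
    for i, v in enumerate(variances):
        n_i = max(1, round(budget * v / total_var))
        draws = [sample_arm(i, rng) for _ in range(n_i)]
        means.append(sum(draws) / len(draws))
    # Return the indices of the m largest empirical means.
    return sorted(range(len(variances)), key=lambda i: -means[i])[:m]
```

With heterogeneous variances, this simple rule already beats uniform allocation, since a low-variance arm needs far fewer samples to be ranked confidently.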

On Top-$k$ Selection from $m$-wise Partial Rankings via Borda Counting

no code implementations · 11 Apr 2022 · Wenjing Chen, Ruida Zhou, Chao Tian, Cong Shen

In the special case of $m=2$, i.e., pairwise comparison, the resultant bound is tighter than that given by Shah et al., leading to a reduced gap between the error probability upper and lower bounds.
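For intuition, Borda counting in the pairwise ($m=2$) case simply ranks items by the number of comparisons they win. The snippet below is a minimal sketch of that scoring rule with hypothetical names; it is not the estimator or the bound analyzed in the paper.

```python
def borda_top_k(comparisons, n_items, k):
    # Borda score of an item = number of pairwise comparisons it wins.
    scores = [0] * n_items
    for winner, _loser in comparisons:
        scores[winner] += 1
    # Keep the k items with the highest scores.
    return sorted(range(n_items), key=lambda i: -scores[i])[:k]
```

For general $m$-wise partial rankings, the same idea applies with each ranked list contributing a win for every item it places ahead of another.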

Policy Optimization for Constrained MDPs with Provable Fast Global Convergence

no code implementations · 31 Oct 2021 · Tao Liu, Ruida Zhou, Dileep Kalathil, P. R. Kumar, Chao Tian

We propose a new algorithm called policy mirror descent-primal dual (PMD-PD) algorithm that can provably achieve a faster $\mathcal{O}(\log(T)/T)$ convergence rate for both the optimality gap and the constraint violation.
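As a schematic analogue on a one-state problem (not the paper's MDP algorithm), a primal-dual mirror-descent method pairs a multiplicative-weights step on the Lagrangian with a projected gradient step on the multiplier. The entropy regularization, step sizes, and function names below are illustrative assumptions added to make the toy iteration converge.

```python
import math

def primal_dual_simplex(r, c, b, T=4000, eta=0.1, tau=0.1):
    # Toy problem: maximize sum(r[i]*p[i]) over the probability simplex
    # subject to sum(c[i]*p[i]) >= b.
    n = len(r)
    p = [1.0 / n] * n
    lam = 0.0
    for _ in range(T):
        # Primal: entropy-regularized multiplicative-weights step
        # on the Lagrangian r + lam * c.
        w = [p[i] ** (1 - eta * tau) * math.exp(eta * (r[i] + lam * c[i]))
             for i in range(n)]
        z = sum(w)
        p = [wi / z for wi in w]
        # Dual: gradient ascent on the constraint violation, projected to lam >= 0.
        lam = max(0.0, lam + eta * (b - sum(c[i] * p[i] for i in range(n))))
    return p, lam
```

On the toy instance r = [1, 0], c = [0, 1], b = 0.5, the iterates settle near the saddle point p = [0.5, 0.5] with multiplier lam ≈ 1, the pattern a fast-converging primal-dual method exhibits at a much smaller scale than the MDP setting of the paper.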

Learning from Few Samples: Transformation-Invariant SVMs with Composition and Locality at Multiple Scales

1 code implementation · 27 Sep 2021 · Tao Liu, P. R. Kumar, Ruida Zhou, Xi Liu

Motivated by the problem of learning with small sample sizes, this paper shows how to incorporate into support-vector machines (SVMs) those properties that have made convolutional neural networks (CNNs) successful.

Learning Policies with Zero or Bounded Constraint Violation for Constrained MDPs

no code implementations · NeurIPS 2021 · Tao Liu, Ruida Zhou, Dileep Kalathil, P. R. Kumar, Chao Tian

We show that when a strictly safe policy is known, then one can confine the system to zero constraint violation with arbitrarily high probability while keeping the reward regret of order $\tilde{\mathcal{O}}(\sqrt{K})$.

Safe Exploration

Individually Conditional Individual Mutual Information Bound on Generalization Error

no code implementations · 17 Dec 2020 · Ruida Zhou, Chao Tian, Tie Liu

We propose a new information-theoretic bound on generalization error based on a combination of the error decomposition technique of Bu et al. and the conditional mutual information (CMI) construction of Steinke and Zakynthinou.

LEMMA

On the Information Leakage in Private Information Retrieval Systems

no code implementations · 25 Sep 2019 · Tao Guo, Ruida Zhou, Chao Tian

We further characterize the optimal tradeoff between the minimum amount of common randomness and the total leakage.

Information Retrieval Retrieval

Online Learning with Diverse User Preferences

no code implementations · 23 Jan 2019 · Chao Gan, Jing Yang, Ruida Zhou, Cong Shen

We aim to show that when the user preferences are sufficiently diverse and each arm can be optimal for certain users, the O(log T) regret incurred by exploring the sub-optimal arms under the standard stochastic MAB setting can be reduced to a constant.

Cost-aware Cascading Bandits

no code implementations · 22 May 2018 · Ruida Zhou, Chao Gan, Jing Yang, Cong Shen

For the online setting, we propose a Cost-aware Cascading Upper Confidence Bound (CC-UCB) algorithm, and show that the cumulative regret scales in O(log T).
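To convey the flavor of a cost-aware UCB rule (a toy sketch, not the paper's CC-UCB), the snippet below pulls an arm only while its upper confidence bound on the reward still exceeds its known cost. The names, the confidence radius, and the Bernoulli reward model are all assumptions for illustration.

```python
import math
import random

def cost_aware_ucb(means, costs, horizon, seed=0):
    # Each round, pull every arm whose reward UCB exceeds its cost;
    # an arm whose mean is below its cost is eventually abandoned,
    # which is what caps the regret at O(log T).
    rng = random.Random(seed)
    n = len(means)
    pulls, sums = [0] * n, [0.0] * n
    for t in range(1, horizon + 1):
        for i in range(n):
            if pulls[i] == 0:
                ucb = float("inf")  # force one exploratory pull per arm
            else:
                ucb = sums[i] / pulls[i] + math.sqrt(1.5 * math.log(t) / pulls[i])
            if ucb >= costs[i]:
                sums[i] += 1.0 if rng.random() < means[i] else 0.0
                pulls[i] += 1
    return pulls
```

Running it with one cost-effective arm (mean above cost) and one wasteful arm (mean below cost) shows the wasteful arm being pulled only logarithmically often.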

Cost-Aware Learning and Optimization for Opportunistic Spectrum Access

no code implementations · 11 Apr 2018 · Chao Gan, Ruida Zhou, Jing Yang, Cong Shen

Our objective is to understand how the costs and reward of the actions would affect the optimal behavior of the user in both offline and online settings, and design the corresponding opportunistic spectrum access strategies to maximize the expected cumulative net reward (i.e., reward minus cost).

Regional Multi-Armed Bandits

no code implementations · 22 Feb 2018 · Zhiyang Wang, Ruida Zhou, Cong Shen

We consider a variant of the classic multi-armed bandit problem where the expected reward of each arm is a function of an unknown parameter.

Multi-Armed Bandits
