3 code implementations • 29 Aug 2018 • Zhipeng Liang, Hao Chen, Junhao Zhu, Kangkang Jiang, Yan-ran Li
In this paper, we implement three state-of-art continuous reinforcement learning algorithms, Deep Deterministic Policy Gradient (DDPG), Proximal Policy Optimization (PPO) and Policy Gradient (PG)in portfolio management.
1 code implementation • NeurIPS 2021 • Yuxuan Han, Zhipeng Liang, Yang Wang, Jiheng Zhang
In this paper, we design LDP algorithms for stochastic generalized linear bandits to achieve the same regret bound as in non-privacy settings.
1 code implementation • 19 Sep 2022 • Zongbo Han, Zhipeng Liang, Fan Yang, Liu Liu, Lanqing Li, Yatao Bian, Peilin Zhao, Bingzhe Wu, Changqing Zhang, Jianhua Yao
Importance reweighting is a normal way to handle the subpopulation shift issue by imposing constant or adaptive sampling weights on each sample in the training dataset.
1 code implementation • 16 Jun 2022 • Yuxuan Han, Zhicong Liang, Zhipeng Liang, Yang Wang, Yuan YAO, Jiheng Zhang
To address such a challenge as the online convex optimization with privacy protection, we propose a private variant of online Frank-Wolfe algorithm with recursive gradients for variance reduction to update and reveal the parameters upon each data.
no code implementations • 16 Apr 2022 • Bingzhe Wu, Zhipeng Liang, Yuxuan Han, Yatao Bian, Peilin Zhao, Junzhou Huang
In this paper, we propose a general framework to solve the above two challenges simultaneously.
no code implementations • 14 Sep 2022 • Xiaoteng Ma, Zhipeng Liang, Jose Blanchet, Mingwen Liu, Li Xia, Jiheng Zhang, Qianchuan Zhao, Zhengyuan Zhou
Among the reasons hindering reinforcement learning (RL) applications to real-world problems, two factors are critical: limited data and the mismatch between the testing environment (real environment in which the policy is deployed) and the training environment (e. g., a simulator).
no code implementations • 20 Oct 2022 • Zeyu Cao, Zhipeng Liang, Shu Zhang, Hangyu Li, Ouyang Wen, Yu Rong, Peilin Zhao, Bingzhe Wu
In this paper, we investigate a novel problem of building contextual bandits in the vertical federated setting, i. e., contextual information is vertically distributed over different departments.
no code implementations • 27 Jan 2023 • Zhipeng Liang, Xiaoteng Ma, Jose Blanchet, Jiheng Zhang, Zhengyuan Zhou
As a framework for sequential decision-making, Reinforcement Learning (RL) has been regarded as an essential component leading to Artificial General Intelligence (AGI).
no code implementations • 9 Apr 2023 • Zongbo Han, Zhipeng Liang, Fan Yang, Liu Liu, Lanqing Li, Yatao Bian, Peilin Zhao, QinGhua Hu, Bingzhe Wu, Changqing Zhang, Jianhua Yao
Subpopulation shift exists widely in many real-world applications, which refers to the training and test distributions that contain the same subpopulation groups but with different subpopulation proportions.