no code implementations • 18 Mar 2024 • Junyi Fan, Yuxuan Han, Jialin Zeng, Jian-Feng Cai, Yang Wang, Yang Xiang, Jiheng Zhang
Up to a logarithmic dependence on the size of the state space, Lin-Confident-FTRL learns an $\epsilon$-CCE with a provably optimal accuracy bound $O(\epsilon^{-2})$ and gets rid of the linear dependence on the action space, while scaling polynomially with the relevant problem parameters (such as the number of agents and the time horizon).
no code implementations • 29 Aug 2023 • Xueping Gong, Jiheng Zhang
In this paper, we investigate the stochastic contextual bandit with a general function space and graph feedback.
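The graph-feedback setting can be illustrated with a minimal sketch (the graph, arm means, and helper function below are our own illustrative assumptions, not the paper's algorithm): pulling an arm reveals noisy rewards for every arm in its neighborhood of the feedback graph, not just the chosen arm.

```python
import numpy as np

def graph_feedback_round(means, graph, a, rng):
    """One round of a bandit with graph feedback.

    Pulling arm `a` reveals a noisy reward for every arm in the
    out-neighborhood graph[a]; returns {arm: observed reward}.
    """
    return {b: means[b] + rng.normal(0.0, 0.1) for b in graph[a]}

rng = np.random.default_rng(0)
means = np.array([0.2, 0.5, 0.9])
# Self-loops included so the chosen arm's own reward is always observed.
graph = {0: [0, 1], 1: [0, 1, 2], 2: [1, 2]}
obs = graph_feedback_round(means, graph, a=1, rng=rng)
```

With a complete graph this recovers full-information feedback; with only self-loops it degenerates to the standard bandit setting.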
no code implementations • 7 Aug 2023 • Xueping Gong, Jiheng Zhang
We then show how causal bounds can be applied to improve classical bandit algorithms, and how they affect the regret with respect to the size of the action set and the function space.
1 code implementation • 10 Feb 2023 • Qing Zhang, Xiaoying Zhang, Yang Liu, Hongning Wang, Min Gao, Jiheng Zhang, Ruocheng Guo
Confounding bias arises due to the presence of unmeasured variables (e.g., the socio-economic status of a user) that can affect both a user's exposure and feedback.
no code implementations • 27 Jan 2023 • Zhipeng Liang, Xiaoteng Ma, Jose Blanchet, Jiheng Zhang, Zhengyuan Zhou
As a framework for sequential decision-making, Reinforcement Learning (RL) has been regarded as an essential component leading to Artificial General Intelligence (AGI).
1 code implementation • 21 Oct 2022 • Yuxuan Han, Jialin Zeng, Yang Wang, Yang Xiang, Jiheng Zhang
We study the stochastic contextual bandit with knapsacks (CBwK) problem, where each action, taken upon a context, not only yields a random reward but also incurs a random resource consumption in vector form.
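A minimal simulation makes the CBwK interaction concrete (the linear reward/cost parameters and the random placeholder policy below are our own assumptions for illustration, not the paper's method): each round consumes a vector of resources, and play stops once any budget would be exceeded.

```python
import numpy as np

rng = np.random.default_rng(0)
d, K, m = 3, 4, 2                       # context dim, actions, resources
theta_r = rng.normal(size=(K, d))       # assumed linear reward parameters
theta_c = rng.uniform(size=(m, K, d))   # assumed linear cost parameters
budget = np.full(m, 5.0)                # one budget per resource

def step(context, action):
    """Return (scalar reward, length-m resource consumption vector)."""
    reward = float(theta_r[action] @ context)
    cost = theta_c[:, action, :] @ context
    return reward, cost

total_reward, spent = 0.0, np.zeros(m)
for t in range(100):
    x = rng.uniform(size=d)
    a = int(rng.integers(K))            # placeholder policy, not the paper's
    r, c = step(x, a)
    if np.any(spent + c > budget):
        break                           # some resource would be exhausted
    total_reward += r
    spent += c
```

The key departure from the plain contextual bandit is the stopping rule: the learner must trade off immediate reward against the vector of remaining budgets.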
no code implementations • 14 Sep 2022 • Xiaoteng Ma, Zhipeng Liang, Jose Blanchet, Mingwen Liu, Li Xia, Jiheng Zhang, Qianchuan Zhao, Zhengyuan Zhou
Among the reasons hindering reinforcement learning (RL) applications to real-world problems, two factors are critical: limited data and the mismatch between the testing environment (the real environment in which the policy is deployed) and the training environment (e.g., a simulator).
no code implementations • 7 Sep 2022 • Xueping Gong, Jiheng Zhang
The contextual bandit problem is a theoretically justified framework with wide applications in various fields.
1 code implementation • 16 Jun 2022 • Yuxuan Han, Zhicong Liang, Zhipeng Liang, Yang Wang, Yuan Yao, Jiheng Zhang
To address the challenge of online convex optimization with privacy protection, we propose a private variant of the online Frank-Wolfe algorithm that uses recursive gradients for variance reduction, updating and revealing the parameters upon each new data point.
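The mechanism can be sketched as follows, under our own simplifying assumptions (quadratic per-round losses, an L1-ball feasible set, and Gaussian noise with an uncalibrated scale); the paper's actual algorithm and privacy accounting differ in detail. A STORM-style recursive gradient estimate is perturbed before it drives a Frank-Wolfe step, so only noisy quantities are revealed.

```python
import numpy as np

rng = np.random.default_rng(0)
d, T, sigma, rho = 5, 50, 0.1, 0.5
targets = rng.uniform(-1, 1, size=(T, d))

def grad(t, x):
    # illustrative per-round loss f_t(x) = 0.5 * ||x - z_t||^2
    return x - targets[t]

x = np.zeros(d)
x_prev, d_est = x.copy(), np.zeros(d)
for t in range(T):
    # recursive (STORM-style) gradient estimate for variance reduction
    d_est = grad(t, x) + (1 - rho) * (d_est - grad(t, x_prev))
    noisy = d_est + rng.normal(0.0, sigma, size=d)  # privacy noise
    # Frank-Wolfe linear minimization over the unit L1 ball:
    # the minimizer is a signed coordinate vertex
    i = int(np.argmax(np.abs(noisy)))
    v = np.zeros(d)
    v[i] = -np.sign(noisy[i])
    eta = 2.0 / (t + 2)
    x_prev = x.copy()
    x = (1 - eta) * x + eta * v
```

Because each iterate is a convex combination of points in the L1 ball, the revealed parameters always remain feasible, and the variance-reduced estimate keeps the noise needed per round small.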
1 code implementation • NeurIPS 2021 • Yuxuan Han, Zhipeng Liang, Yang Wang, Jiheng Zhang
In this paper, we design LDP algorithms for stochastic generalized linear bandits that achieve the same regret bound as in non-private settings.
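The local-privacy idea behind such algorithms can be sketched with a linear special case (the noise scale, sample size, and least-squares aggregation below are our own illustrative choices, not the paper's calibrated mechanism): each user adds noise to their sufficient statistics on-device, and the server only ever sees privatized data.

```python
import numpy as np

rng = np.random.default_rng(0)
d, n, sigma = 3, 2000, 0.5
theta = np.array([0.5, -0.3, 0.8])   # unknown reward parameter

A = np.zeros((d, d))
b = np.zeros(d)
for _ in range(n):
    x = rng.normal(size=d)
    r = x @ theta + rng.normal(0.0, 0.1)
    # user-side privatization: Gaussian noise is added to the
    # statistics before they leave the device
    A += np.outer(x, x) + rng.normal(0.0, sigma, size=(d, d))
    b += r * x + rng.normal(0.0, sigma, size=d)

# server-side estimate from the aggregated noisy statistics
theta_hat = np.linalg.solve(A + 1e-6 * np.eye(d), b)
```

Because the per-user noise averages out across many rounds, the server's estimate still concentrates around the true parameter, which is the intuition for matching the non-private regret rate.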