no code implementations • 7 Mar 2024 • Xiangxin Zhou, Liang Wang, Yichi Zhou
Nevertheless, when applying policy gradients to SDEs, since the policy gradient is estimated on a finite set of trajectories, it can be ill-defined, and the policy behavior in data-scarce regions may be uncontrolled.
2 code implementations • 6 Dec 2023 • Claudio Zeni, Robert Pinsler, Daniel Zügner, Andrew Fowler, Matthew Horton, Xiang Fu, Sasha Shysheya, Jonathan Crabbé, Lixin Sun, Jake Smith, Bichlien Nguyen, Hannes Schulz, Sarah Lewis, Chin-wei Huang, Ziheng Lu, Yichi Zhou, Han Yang, Hongxia Hao, Jielan Li, Ryota Tomioka, Tian Xie
We further introduce adapter modules to enable fine-tuning towards any given property constraints with a labeled dataset.
no code implementations • 16 Jun 2022 • Fang Kong, Yichi Zhou, Shuai Li
With a general feedback graph, the observation of an arm may not be available when this arm is pulled, which makes the exploration more expensive and the algorithms more challenging to perform optimally in both environments.
no code implementations • 29 Sep 2021 • Yichi Zhou, Shihong Song, Huishuai Zhang, Jun Zhu, Wei Chen, Tie-Yan Liu
In contextual bandits, one major challenge is to develop theoretically solid and empirically efficient algorithms for general function classes.
no code implementations • 29 Jun 2021 • Yichi Zhou, Shihong Song, Huishuai Zhang, Jun Zhu, Wei Chen, Tie-Yan Liu
However, it is in general unknown how to derive efficient and effective EE trade-off methods for non-linear complex tasks, such as contextual bandit with deep neural network as the reward function.
no code implementations • ICLR 2020 • Yichi Zhou, Tongzheng Ren, Jialian Li, Dong Yan, Jun Zhu
In this paper, we present Lazy-CFR, a CFR algorithm that adopts a lazy update strategy to avoid traversing the whole game tree in each round.
no code implementations • ICLR 2020 • Yichi Zhou, Jialian Li, Jun Zhu
Posterior sampling for reinforcement learning (PSRL) is a useful framework for making decisions in an unknown environment.
no code implementations • ICLR 2019 • Yichi Zhou, Jun Zhu
We provide insights into the relationship between $A^*$ sampling and probability matching by analyzing a nontrivial special case in which the state space is partitioned into two subsets.
no code implementations • 10 Oct 2018 • Yichi Zhou, Tongzheng Ren, Jialian Li, Dong Yan, Jun Zhu
In this paper, we present a novel technique, lazy update, which can avoid traversing the whole game tree in CFR, as well as a novel analysis on the regret of CFR with lazy update.
no code implementations • 19 Jul 2018 • Chi Hong, Amirmasoud Ghiassi, Yichi Zhou, Robert Birke, Lydia Y. Chen
Our evaluation results on various online scenarios show that BiLA can effectively infer the true labels, with an error rate reduction of at least 10 to 1.5 percentage points for synthetic and real-world datasets, respectively.
no code implementations • ICML 2018 • Yichi Zhou, Jun Zhu, Jingwei Zhuo
Thompson sampling has impressive empirical performance for many multi-armed bandit problems.
no code implementations • ICML 2017 • Yichi Zhou, Jialian Li, Jun Zhu
We study the problem of how to learn the pure Nash equilibrium of a two-player zero-sum static game with random payoffs under unknown distributions via efficient payoff queries.
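
To make the problem setting in the last entry concrete, here is a minimal sketch of a naive baseline, not the method of the paper: it queries every entry of the payoff matrix the same number of times, averages the noisy payoffs, and then searches the estimated matrix for a saddle point (a pure Nash equilibrium of a zero-sum game). The names `find_pure_nash`, `estimate_payoffs`, `noisy_query`, the matrix `TRUE_MEANS`, and the Gaussian noise model are all assumptions introduced for illustration.

```python
import random


def find_pure_nash(payoff):
    """Return all saddle points (i, j) of a zero-sum matrix game.

    The row player maximizes payoff[i][j]; the column player minimizes it.
    An entry is a pure Nash equilibrium when it is the largest value in its
    column and the smallest value in its row.
    """
    n_rows, n_cols = len(payoff), len(payoff[0])
    equilibria = []
    for i in range(n_rows):
        for j in range(n_cols):
            best_for_row_player = all(
                payoff[i][j] >= payoff[k][j] for k in range(n_rows)
            )
            best_for_col_player = all(
                payoff[i][j] <= payoff[i][l] for l in range(n_cols)
            )
            if best_for_row_player and best_for_col_player:
                equilibria.append((i, j))
    return equilibria


def estimate_payoffs(query, n_rows, n_cols, samples_per_entry):
    """Estimate the mean payoff of every entry by uniform repeated queries."""
    return [
        [
            sum(query(i, j) for _ in range(samples_per_entry)) / samples_per_entry
            for j in range(n_cols)
        ]
        for i in range(n_rows)
    ]


# Hypothetical ground-truth mean payoffs, unknown to the learner.
TRUE_MEANS = [
    [3.0, 1.0, 4.0],
    [2.0, 0.0, 5.0],
    [6.0, -1.0, 2.0],
]

rng = random.Random(0)


def noisy_query(i, j):
    """One payoff query: the true mean plus Gaussian noise (assumed model)."""
    return TRUE_MEANS[i][j] + rng.gauss(0.0, 0.1)


estimated = estimate_payoffs(noisy_query, 3, 3, samples_per_entry=200)
estimated_equilibria = find_pure_nash(estimated)
```

Uniform sampling spends the same query budget on every entry, including entries that clearly cannot be part of an equilibrium; reducing that waste is what "efficient payoff queries" refers to in the abstract.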