Search Results for author: Yichi Zhou

Found 12 papers, 0 papers with code

Stabilizing Policy Gradients for Stochastic Differential Equations via Consistency with Perturbation Process

no code implementations 7 Mar 2024 Xiangxin Zhou, Liang Wang, Yichi Zhou

Nevertheless, when applying policy gradients to SDEs, since the policy gradient is estimated on a finite set of trajectories, it can be ill-defined, and the policy behavior in data-scarce regions may be uncontrolled.

Policy Gradient Methods

Simultaneously Learning Stochastic and Adversarial Bandits with General Graph Feedback

no code implementations 16 Jun 2022 Fang Kong, Yichi Zhou, Shuai Li

With a general feedback graph, the observation of an arm may not be available when this arm is pulled, which makes the exploration more expensive and the algorithms more challenging to perform optimally in both environments.

Regularized-OFU: an efficient algorithm for general contextual bandit with optimization oracles

no code implementations 29 Sep 2021 Yichi Zhou, Shihong Song, Huishuai Zhang, Jun Zhu, Wei Chen, Tie-Yan Liu

In contextual bandit, one major challenge is to develop theoretically solid and empirically efficient algorithms for general function classes.

Multi-Armed Bandits, Thompson Sampling

Regularized OFU: an Efficient UCB Estimator for Non-linear Contextual Bandit

no code implementations 29 Jun 2021 Yichi Zhou, Shihong Song, Huishuai Zhang, Jun Zhu, Wei Chen, Tie-Yan Liu

However, it is in general unknown how to derive efficient and effective EE trade-off methods for non-linear complex tasks, such as contextual bandit with deep neural network as the reward function.

Multi-Armed Bandits

Lazy-CFR: fast and near-optimal regret minimization for extensive games with imperfect information

no code implementations ICLR 2020 Yichi Zhou, Tongzheng Ren, Jialian Li, Dong Yan, Jun Zhu

In this paper, we present Lazy-CFR, a CFR algorithm that adopts a lazy update strategy to avoid traversing the whole game tree in each round.

counterfactual

$A^*$ sampling with probability matching

no code implementations ICLR 2019 Yichi Zhou, Jun Zhu

We provide insights into the relationship between $A^*$ sampling and probability matching by analyzing a nontrivial special case in which the state space is partitioned into two subsets.

Decision Making

Lazy-CFR: fast and near optimal regret minimization for extensive games with imperfect information

no code implementations 10 Oct 2018 Yichi Zhou, Tongzheng Ren, Jialian Li, Dong Yan, Jun Zhu

In this paper, we present a novel technique, lazy update, which can avoid traversing the whole game tree in CFR, as well as a novel analysis on the regret of CFR with lazy update.

counterfactual

Online Label Aggregation: A Variational Bayesian Approach

no code implementations 19 Jul 2018 Chi Hong, Amirmasoud Ghiassi, Yichi Zhou, Robert Birke, Lydia Y. Chen

Our evaluation results on various online scenarios show that BiLA can effectively infer the true labels, with an error rate reduction of at least 10 to 1.5 percentage points for synthetic and real-world datasets, respectively.

Bayesian Inference, Stochastic Optimization

Identify the Nash Equilibrium in Static Games with Random Payoffs

no code implementations ICML 2017 Yichi Zhou, Jialian Li, Jun Zhu

We study the problem of how to learn the pure Nash Equilibrium of a two-player zero-sum static game with random payoffs under unknown distributions via efficient payoff queries.
