Search Results for author: Yichi Zhou

Found 12 papers, 0 papers with code

Stabilizing Policy Gradients for Stochastic Differential Equations via Consistency with Perturbation Process

no code implementations • 7 Mar 2024 • Xiangxin Zhou, Liang Wang, Yichi Zhou

Nevertheless, when applying policy gradients to SDEs, since the policy gradient is estimated on a finite set of trajectories, it can be ill-defined, and the policy behavior in data-scarce regions may be uncontrolled.

Policy Gradient Methods

MatterGen: a generative model for inorganic materials design

no code implementations • 6 Dec 2023 • Claudio Zeni, Robert Pinsler, Daniel Zügner, Andrew Fowler, Matthew Horton, Xiang Fu, Sasha Shysheya, Jonathan Crabbé, Lixin Sun, Jake Smith, Bichlien Nguyen, Hannes Schulz, Sarah Lewis, Chin-wei Huang, Ziheng Lu, Yichi Zhou, Han Yang, Hongxia Hao, Jielan Li, Ryota Tomioka, Tian Xie

We further introduce adapter modules to enable fine-tuning towards any given property constraints with a labeled dataset.

Simultaneously Learning Stochastic and Adversarial Bandits with General Graph Feedback

no code implementations • 16 Jun 2022 • Fang Kong, Yichi Zhou, Shuai Li

With a general feedback graph, the observation of an arm may not be available when this arm is pulled, which makes the exploration more expensive and the algorithms more challenging to perform optimally in both environments.

Regularized-OFU: an efficient algorithm for general contextual bandit with optimization oracles

no code implementations • 29 Sep 2021 • Yichi Zhou, Shihong Song, Huishuai Zhang, Jun Zhu, Wei Chen, Tie-Yan Liu

In contextual bandit, one major challenge is to develop theoretically solid and empirically efficient algorithms for general function classes.

Multi-Armed Bandits Thompson Sampling

Regularized OFU: an Efficient UCB Estimator for Non-linear Contextual Bandit

no code implementations • 29 Jun 2021 • Yichi Zhou, Shihong Song, Huishuai Zhang, Jun Zhu, Wei Chen, Tie-Yan Liu

However, it is in general unknown how to derive efficient and effective EE trade-off methods for non-linear complex tasks, such as contextual bandit with deep neural network as the reward function.

Multi-Armed Bandits

Lazy-CFR: fast and near-optimal regret minimization for extensive games with imperfect information

no code implementations • ICLR 2020 • Yichi Zhou, Tongzheng Ren, Jialian Li, Dong Yan, Jun Zhu

In this paper, we present Lazy-CFR, a CFR algorithm that adopts a lazy update strategy to avoid traversing the whole game tree in each round.

counterfactual

Posterior sampling for multi-agent reinforcement learning: solving extensive games with imperfect information

no code implementations • ICLR 2020 • Yichi Zhou, Jialian Li, Jun Zhu

Posterior sampling for reinforcement learning (PSRL) is a useful framework for making decisions in an unknown environment.

counterfactual Multi-agent Reinforcement Learning

$A^*$ sampling with probability matching

no code implementations • ICLR 2019 • Yichi Zhou, Jun Zhu

We provide insights into the relationship between $A^*$ sampling and probability matching by analyzing a nontrivial special case in which the state space is partitioned into two subsets.

Decision Making

Lazy-CFR: fast and near optimal regret minimization for extensive games with imperfect information

no code implementations • 10 Oct 2018 • Yichi Zhou, Tongzheng Ren, Jialian Li, Dong Yan, Jun Zhu

In this paper, we present a novel technique, lazy update, which can avoid traversing the whole game tree in CFR, as well as a novel analysis on the regret of CFR with lazy update.

counterfactual

Online Label Aggregation: A Variational Bayesian Approach

no code implementations • 19 Jul 2018 • Chi Hong, Amirmasoud Ghiassi, Yichi Zhou, Robert Birke, Lydia Y. Chen

Our evaluation results on various online scenarios show that BiLA can effectively infer the true labels, with an error rate reduction of at least 10 to 1.5 percent points for synthetic and real-world datasets, respectively.

Bayesian Inference Stochastic Optimization

Racing Thompson: an Efficient Algorithm for Thompson Sampling with Non-conjugate Priors

no code implementations • ICML 2018 • Yichi Zhou, Jun Zhu, Jingwei Zhuo

Thompson sampling has impressive empirical performance for many multi-armed bandit problems.

Thompson Sampling

Identify the Nash Equilibrium in Static Games with Random Payoffs

no code implementations • ICML 2017 • Yichi Zhou, Jialian Li, Jun Zhu

We study the problem on how to learn the pure Nash Equilibrium of a two-player zero-sum static game with random payoffs under unknown distributions via efficient payoff queries.
