Search Results for author: Chengzhuo Ni

Found 10 papers, 1 paper with code

Diffusion Model for Data-Driven Black-Box Optimization

no code implementations • 20 Mar 2024 • Zihao Li, Hui Yuan, Kaixuan Huang, Chengzhuo Ni, Yinyu Ye, Minshuo Chen, Mengdi Wang

In this paper, we focus on diffusion models, a powerful generative AI technology, and investigate their potential for black-box optimization over complex structured variables.

Representation Learning for General-sum Low-rank Markov Games

no code implementations • 30 Oct 2022 • Chengzhuo Ni, Yuda Song, Xuezhou Zhang, Chi Jin, Mengdi Wang

To the best of our knowledge, this is the first sample-efficient algorithm for multi-agent general-sum Markov games that incorporates (non-linear) function approximation.

Representation Learning

Bandit Theory and Thompson Sampling-Guided Directed Evolution for Sequence Optimization

no code implementations • 5 Jun 2022 • Hui Yuan, Chengzhuo Ni, Huazheng Wang, Xuezhou Zhang, Le Cong, Csaba Szepesvári, Mengdi Wang

We propose a Thompson Sampling-guided Directed Evolution (TS-DE) framework for sequence optimization, where the sequence-to-function mapping is unknown and querying a single value is subject to costly and noisy measurements.

BIG-bench Machine Learning Evolutionary Algorithms +2

Off-Policy Fitted Q-Evaluation with Differentiable Function Approximators: Z-Estimation and Inference Theory

no code implementations • 10 Feb 2022 • Ruiqi Zhang, Xuezhou Zhang, Chengzhuo Ni, Mengdi Wang

We approach this problem using Z-estimation theory and establish the following results: (1) the FQE estimation error is asymptotically normal, with explicit variance determined jointly by the tangent space of the function class at the ground truth, the reward structure, and the distribution shift due to off-policy learning; (2) the finite-sample FQE error bound is dominated by the same variance term, and it can also be bounded by a function class-dependent divergence, which measures how the off-policy distribution shift intertwines with the function approximator.

Off-policy evaluation

Optimal Estimation of Off-Policy Policy Gradient via Double Fitted Iteration

no code implementations • 31 Jan 2022 • Chengzhuo Ni, Ruiqi Zhang, Xiang Ji, Xuezhou Zhang, Mengdi Wang

Policy gradient (PG) estimation becomes a challenge when we are not allowed to sample with the target policy but only have access to a dataset generated by some unknown behavior policy.

Cell2State: Learning Cell State Representations From Barcoded Single-Cell Gene-Expression Transitions

no code implementations • 29 Sep 2021 • Yu Wu, Joseph Chahn Kim, Chengzhuo Ni, Le Cong, Mengdi Wang

Genetic barcoding coupled with single-cell sequencing technology enables direct measurement of cell-to-cell transitions and gene-expression evolution over a long timespan.

Dimensionality Reduction

Learning Good State and Action Representations via Tensor Decomposition

no code implementations • 3 May 2021 • Chengzhuo Ni, Yaqi Duan, Munther Dahleh, Anru Zhang, Mengdi Wang

The transition kernel of a continuous-state-action Markov decision process (MDP) admits a natural tensor structure.

Tensor Decomposition

On the Convergence and Sample Efficiency of Variance-Reduced Policy Gradient Method

no code implementations • NeurIPS 2021 • Junyu Zhang, Chengzhuo Ni, Zheng Yu, Csaba Szepesvári, Mengdi Wang

By assuming overparameterization of the policy and exploiting the hidden convexity of the problem, we further show that TSIVR-PG converges to a global $\epsilon$-optimal policy with $\tilde{\mathcal{O}}(\epsilon^{-2})$ samples.

Reinforcement Learning (RL)

Learning to Control in Metric Space with Optimal Regret

1 code implementation • 5 May 2019 • Lin F. Yang, Chengzhuo Ni, Mengdi Wang

We study online reinforcement learning for finite-horizon deterministic control systems with arbitrary state and action spaces.

reinforcement-learning Reinforcement Learning (RL)

Accelerated Value Iteration via Anderson Mixing

no code implementations • 27 Sep 2018 • YuJun Li, Chengzhuo Ni, Guangzeng Xie, Wenhao Yang, Shuchang Zhou, Zhihua Zhang

A2VI is more efficient than modified policy iteration, a classical approximate method for policy evaluation.

Atari Games Q-Learning +2
