no code implementations • 20 Mar 2024 • Zihao Li, Hui Yuan, Kaixuan Huang, Chengzhuo Ni, Yinyu Ye, Minshuo Chen, Mengdi Wang
In this paper, we focus on diffusion models, a powerful generative AI technology, and investigate their potential for black-box optimization over complex structured variables.
no code implementations • 30 Oct 2022 • Chengzhuo Ni, Yuda Song, Xuezhou Zhang, Chi Jin, Mengdi Wang
To the best of our knowledge, this is the first sample-efficient algorithm for multi-agent general-sum Markov games that incorporates (non-linear) function approximation.
no code implementations • 5 Jun 2022 • Hui Yuan, Chengzhuo Ni, Huazheng Wang, Xuezhou Zhang, Le Cong, Csaba Szepesvári, Mengdi Wang
We propose a Thompson Sampling-guided Directed Evolution (TS-DE) framework for sequence optimization, where the sequence-to-function mapping is unknown and querying a single value is subject to costly and noisy measurements.
no code implementations • 10 Feb 2022 • Ruiqi Zhang, Xuezhou Zhang, Chengzhuo Ni, Mengdi Wang
We approach this problem using Z-estimation theory and establish the following results: the FQE estimation error is asymptotically normal, with explicit variance determined jointly by the tangent space of the function class at the ground truth, the reward structure, and the distribution shift due to off-policy learning; the finite-sample FQE error bound is dominated by the same variance term, and it can also be bounded by a function class-dependent divergence, which measures how the off-policy distribution shift intertwines with the function approximator.
no code implementations • 31 Jan 2022 • Chengzhuo Ni, Ruiqi Zhang, Xiang Ji, Xuezhou Zhang, Mengdi Wang
Policy gradient (PG) estimation becomes a challenge when we are not allowed to sample with the target policy but only have access to a dataset generated by some unknown behavior policy.
no code implementations • 29 Sep 2021 • Yu Wu, Joseph Chahn Kim, Chengzhuo Ni, Le Cong, Mengdi Wang
Genetic barcoding coupled with single-cell sequencing technology enables direct measurement of cell-to-cell transitions and gene-expression evolution over a long timespan.
no code implementations • 3 May 2021 • Chengzhuo Ni, Yaqi Duan, Munther Dahleh, Anru Zhang, Mengdi Wang
The transition kernel of a continuous-state-action Markov decision process (MDP) admits a natural tensor structure.
no code implementations • NeurIPS 2021 • Junyu Zhang, Chengzhuo Ni, Zheng Yu, Csaba Szepesvári, Mengdi Wang
By assuming the overparameterization of the policy and exploiting the hidden convexity of the problem, we further show that TSIVR-PG converges to a global $\epsilon$-optimal policy with $\tilde{\mathcal{O}}(\epsilon^{-2})$ samples.
1 code implementation • 5 May 2019 • Lin F. Yang, Chengzhuo Ni, Mengdi Wang
We study online reinforcement learning for finite-horizon deterministic control systems with arbitrary state and action spaces.
no code implementations • 27 Sep 2018 • YuJun Li, Chengzhuo Ni, Guangzeng Xie, Wenhao Yang, Shuchang Zhou, Zhihua Zhang
A2VI is more efficient than modified policy iteration, a classical approximate method for policy evaluation.