no code implementations • 26 Mar 2024 • Shi Feng, Nuoya Xiong, Zhijie Zhang, Wei Chen
Pseudo log-likelihood is a type of maximum likelihood estimation (MLE) method used in various fields, including contextual bandits, influence maximization in social networks, and causal bandits.
no code implementations • 10 Oct 2023 • Nuoya Xiong, Zhihan Liu, Zhaoran Wang, Zhuoran Yang
We study multi-agent reinforcement learning (MARL) for general-sum Markov games (MGs) under general function approximation.
no code implementations • 3 Oct 2023 • Nuoya Xiong, Lijun Ding, Simon S. Du
This linear convergence result in the over-parameterization case is especially significant because one can apply the asymmetric parameterization to the symmetric setting to speed up from $\Omega (1/T^2)$ to linear convergence.
no code implementations • 26 Jun 2023 • Nuoya Xiong, Zhaoran Wang, Zhuoran Yang
We take the first step in studying general sequential decision-making under two adaptivity constraints: rare policy switch and batch learning.
1 code implementation • 31 Jan 2023 • Shi Feng, Nuoya Xiong, Wei Chen
This paper studies the combinatorial causal bandits (CCB) problem without the graph structure on binary general causal models and binary generalized linear models (BGLMs).
no code implementations • 16 Jun 2022 • Nuoya Xiong, Wei Chen
The combinatorial pure exploration of causal bandits is the following online learning task: given a causal graph with unknown causal inference distributions, in each round we either choose a subset of variables to intervene on or perform no intervention, and then observe the random outcomes of all random variables. The goal is to use as few rounds as possible to output an intervention that gives the best (or almost best) expected outcome on the reward variable $Y$ with probability at least $1-\delta$, where $\delta$ is a given confidence level.