Search Results for author: Nuoya Xiong

Found 6 papers, 1 paper with code

A Correction of Pseudo Log-Likelihood Method

no code implementations26 Mar 2024 Shi Feng, Nuoya Xiong, Zhijie Zhang, Wei Chen

Pseudo log-likelihood is a type of maximum likelihood estimation (MLE) method used in various fields including contextual bandits, influence maximization of social networks, and causal bandits.

Multi-Armed Bandits
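
As background for the paper above: pseudo (log-)likelihood replaces the joint likelihood, whose normalizing constant is often intractable, with the sum of each variable's conditional log-likelihood given the others. A minimal sketch of that idea on an Ising model — an illustrative example only, not the paper's bandit setting:

```python
import numpy as np

def pseudo_log_likelihood(J, x):
    """Pseudo log-likelihood of one Ising configuration x in {-1,+1}^n:
    sum_i log P(x_i | x_{-i}), which avoids the intractable partition
    function of the joint distribution. J is symmetric with zero diagonal."""
    h = J @ x                      # local fields h_i = sum_j J_ij x_j
    return float(np.sum(x * h - np.log(2.0 * np.cosh(h))))

# toy example: 3 spins with ferromagnetic couplings
J = np.array([[0.0, 0.5, 0.2],
              [0.5, 0.0, 0.3],
              [0.2, 0.3, 0.0]])
x = np.array([1.0, 1.0, -1.0])
print(pseudo_log_likelihood(J, x))
```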

Sample-Efficient Multi-Agent RL: An Optimization Perspective

no code implementations10 Oct 2023 Nuoya Xiong, Zhihan Liu, Zhaoran Wang, Zhuoran Yang

We study multi-agent reinforcement learning (MARL) for general-sum Markov Games (MGs) under general function approximation.

Multi-agent Reinforcement Learning

How Over-Parameterization Slows Down Gradient Descent in Matrix Sensing: The Curses of Symmetry and Initialization

no code implementations3 Oct 2023 Nuoya Xiong, Lijun Ding, Simon S. Du

This linear convergence result in the over-parameterized case is especially significant because one can apply the asymmetric parameterization to the symmetric setting, speeding up the rate from $\Omega(1/T^2)$ to linear convergence.
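
A toy numerical sketch (not from the paper) of the two parameterizations the abstract contrasts: gradient descent on the over-parameterized symmetric factorization $M \approx XX^\top$ versus the asymmetric factorization $M \approx UV^\top$, on a rank-1 target with full observations; step size, horizon, and initialization scale are arbitrary choices:

```python
import numpy as np

rng = np.random.default_rng(0)
n, r, k = 10, 1, 3                 # true rank r, over-parameterized width k > r
z = rng.normal(size=(n, r))
M = z @ z.T                        # PSD ground truth M* = z z^T

eta, T, alpha = 0.01, 2000, 1e-3   # step size, iterations, small init scale
X = alpha * rng.normal(size=(n, k))          # symmetric: M ~ X X^T
U = alpha * rng.normal(size=(n, k))          # asymmetric: M ~ U V^T
V = alpha * rng.normal(size=(n, k))

for t in range(T):
    Rs = X @ X.T - M               # symmetric residual; grad = 2 Rs X
    X = X - eta * 2.0 * Rs @ X
    Ra = U @ V.T - M               # asymmetric residual; simultaneous update
    U, V = U - eta * Ra @ V, V - eta * Ra.T @ U
    if t % 500 == 0:
        print(t, np.linalg.norm(X @ X.T - M), np.linalg.norm(U @ V.T - M))
```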

A General Framework for Sequential Decision-Making under Adaptivity Constraints

no code implementations26 Jun 2023 Nuoya Xiong, Zhaoran Wang, Zhuoran Yang

We take the first step in studying general sequential decision-making under two adaptivity constraints: rare policy switch and batch learning.

Decision Making
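
One common way to realize a rare-policy-switch constraint, sketched here for intuition rather than as the paper's algorithm, is a doubling trigger: the statistics the policy acts on are refreshed only when the total sample count doubles, so the policy changes $O(\log T)$ times instead of every round. A minimal bandit illustration:

```python
import numpy as np

def rare_switch_ucb(means, T, rng):
    """UCB where the arm statistics used for selection are frozen snapshots,
    refreshed only when the round count doubles -- O(log T) policy switches."""
    K = len(means)
    counts, sums = np.zeros(K), np.zeros(K)
    snap_counts, snap_means = np.ones(K), np.zeros(K)   # frozen snapshot
    next_update, switches = 1, 0
    for t in range(1, T + 1):
        ucb = snap_means + np.sqrt(2 * np.log(T) / snap_counts)
        a = int(np.argmax(ucb))
        counts[a] += 1
        sums[a] += rng.normal(means[a], 1.0)
        if t >= next_update:                 # doubling trigger: rare switch
            snap_counts = np.maximum(counts, 1.0)
            snap_means = sums / np.maximum(counts, 1.0)
            next_update *= 2
            switches += 1
    return switches

rng = np.random.default_rng(0)
print(rare_switch_ucb([0.1, 0.5, 0.9], 10_000, rng))   # ~log2(T) switches
```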

Combinatorial Causal Bandits without Graph Skeleton

1 code implementation31 Jan 2023 Shi Feng, Nuoya Xiong, Wei Chen

This paper studies the combinatorial causal bandit (CCB) problem without knowledge of the graph skeleton, on binary general causal models and binary generalized linear models (BGLMs).
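
For context, a BGLM is a causal model in which each binary node is Bernoulli with mean given by a link function of a weighted sum of its parents, and an intervention forces chosen nodes to fixed values. A minimal sampler sketch under that reading, with a hypothetical logistic link on a 3-node chain:

```python
import numpy as np

def sample_bglm(W, b, do=None, rng=None):
    """Sample one realization of a BGLM over nodes 0..n-1 in topological
    order: P(X_i = 1 | parents) = sigmoid(W[i] . X + b[i]). W is strictly
    lower-triangular so node i depends only on earlier nodes; `do` maps
    node index -> forced value (an intervention)."""
    rng = rng or np.random.default_rng()
    x = np.zeros(len(b))
    for i in range(len(b)):
        if do and i in do:
            x[i] = do[i]
        else:
            p = 1.0 / (1.0 + np.exp(-(W[i] @ x + b[i])))
            x[i] = rng.binomial(1, p)
    return x

# hypothetical 3-node chain X0 -> X1 -> X2 (X2 as the reward variable)
W = np.array([[0.0, 0.0, 0.0],
              [2.0, 0.0, 0.0],
              [0.0, 2.0, 0.0]])
b = np.array([-1.0, -1.0, -1.0])
print(sample_bglm(W, b, do={0: 1}))
```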

Combinatorial Pure Exploration of Causal Bandits

no code implementations16 Jun 2022 Nuoya Xiong, Wei Chen

The combinatorial pure exploration of causal bandits is the following online learning task: given a causal graph with unknown causal inference distributions, in each round we choose a subset of variables to intervene on (or do no intervention) and observe the random outcomes of all random variables. The goal is to output, using as few rounds as possible, an intervention that gives the best (or almost best) expected outcome on the reward variable $Y$ with probability at least $1-\delta$, where $\delta$ is a given confidence level.

Causal Inference, Multi-Armed Bandits
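
A naive uniform-sampling baseline for the PAC task the abstract describes — not the paper's combinatorial algorithm: sample every candidate intervention equally often, with the per-candidate budget set by a Hoeffding bound so the empirically best output is $\epsilon$-optimal with probability at least $1-\delta$. The two-intervention environment `sample_y` is hypothetical:

```python
import numpy as np

def naive_pure_exploration(sample_y, interventions, eps, delta, rng):
    """Uniform-sampling PAC baseline: pull each candidate N times, where
    N >= (2/eps^2) * log(2K/delta) makes every empirical mean eps/2-accurate
    with probability >= 1 - delta (Hoeffding + union bound over K arms)."""
    K = len(interventions)
    N = int(np.ceil((2.0 / eps**2) * np.log(2 * K / delta)))
    means = [np.mean([sample_y(a, rng) for _ in range(N)])
             for a in interventions]
    return interventions[int(np.argmax(means))], N

# hypothetical environment: Y ~ Bernoulli(0.3 + 0.4 * a) for intervention a
def sample_y(a, rng):
    return rng.binomial(1, 0.3 + 0.4 * a)

rng = np.random.default_rng(0)
best, N = naive_pure_exploration(sample_y, [0, 1], eps=0.1, delta=0.05, rng=rng)
print(best, N)
```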
