no code implementations • 12 Mar 2025 • Leo Widmer, Jiawei Huang, Niao He
Our work presents an effective framework for steering agents' behaviors in large-population systems under uncertainty.
no code implementations • 26 Feb 2025 • Jiawei Huang, Bingcong Li, Christoph Dann, Niao He
This paper studies how to transfer knowledge from those imperfect reward models in online RLHF.
no code implementations • 8 Feb 2025 • Yun Gong, Zebang Shen, Niao He
To understand the convergence behavior of stochastic dynamics in such landscapes, we propose to study the class of log-PL measures $\mu_\epsilon \propto \exp(-V/\epsilon)$, where the potential $V$ satisfies a local Polyak-{\L}ojasiewicz (P\L) inequality and its set of local minima is provably \emph{connected}.
no code implementations • 24 Oct 2024 • Bingcong Li, Liang Zhang, Aryan Mokhtari, Niao He
This work revisits the classical low-rank matrix factorization problem and unveils the critical role of initialization in shaping convergence rates for such nonconvex and nonsmooth optimization.
no code implementations • 18 Oct 2024 • Bingcong Li, Liang Zhang, Niao He
Sharpness-aware minimization (SAM) improves generalization of various deep learning tasks.
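As a rough illustration of the SAM update referenced in this entry, here is a minimal sketch (the function names, stepsizes, and toy quadratic loss are ours for illustration, not from the paper):

```python
import numpy as np

def sam_step(w, grad_fn, lr=0.1, rho=0.05):
    """One sharpness-aware minimization (SAM) step on parameters w:
    first ascend to a worst-case point within an L2 ball of radius rho,
    then apply the gradient computed there to the original parameters."""
    g = grad_fn(w)
    eps = rho * g / (np.linalg.norm(g) + 1e-12)  # normalized ascent direction
    g_sharp = grad_fn(w + eps)                   # gradient at the perturbed point
    return w - lr * g_sharp

# Toy quadratic loss f(w) = ||w||^2 / 2, whose gradient is w itself.
w = np.array([1.0, -2.0])
for _ in range(100):
    w = sam_step(w, lambda v: v)
```

Note that with a fixed `rho` the iterates settle into a small neighborhood of the minimizer rather than converging exactly, a known property of this perturbation scheme.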
no code implementations • 17 Oct 2024 • Florian Hübler, Ilyas Fatkhullin, Niao He
In the setting where all problem parameters are known, we show this complexity is improved to $\mathcal{O}\left(\varepsilon^{-\frac{3p-2}{p-1}}\right)$, matching the previously known lower bound for all first-order methods in all problem dependent parameters.
no code implementations • 27 Aug 2024 • Batuhan Yardim, Niao He
We show that TD learning converges up to a small bias using trajectories of the $N$-player game with finite-sample guarantees, permitting symmetrized learning without building an explicit MFG model.
no code implementations • 20 Aug 2024 • Yifan Hu, Jie Wang, Xin Chen, Niao He
This setting captures various optimization paradigms, such as conditional stochastic optimization, distributionally robust optimization, shortfall risk optimization, and machine learning paradigms, such as contrastive learning.
no code implementations • 15 Aug 2024 • Pragnya Alatur, Anas Barakat, Niao He
In this work, we investigate the iteration complexity of an independent policy mirror descent (PMD) algorithm for MPGs.
no code implementations • 3 Aug 2024 • Saeed Masiha, Saber Salehkaleybar, Niao He, Negar Kiyavash, Patrick Thiran
Furthermore, we show that a projected variance-reduced first-order algorithm can obtain the upper complexity bound of $\mathcal{O}(\epsilon^{-2/\alpha})$, matching the lower bound.
1 code implementation • 14 Jul 2024 • Jiawei Huang, Vinzenz Thoma, Zebang Shen, Heinrich H. Nax, Niao He
We introduce a model-based non-episodic Reinforcement Learning (RL) formulation for our steering problem.
no code implementations • 5 Jun 2024 • Yan Huang, Xiang Li, Yipeng Shen, Niao He, Jinming Xu
This protocol keeps the stepsizes consistent across nodes, eliminating the steady-state error that vanilla distributed adaptive methods commonly incur due to uncoordinated stepsizes, and thus guarantees exact convergence.
no code implementations • 28 May 2024 • Xiang Li, Zebang Shen, Liang Zhang, Niao He
Continuous-time approximation of Stochastic Gradient Descent (SGD) is a crucial tool to study its escaping behaviors from stationary points.
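The continuous-time approximation mentioned here typically models SGD by an SDE of the form $dX_t = -\nabla f(X_t)\,dt + \sigma\sqrt{\eta}\,dW_t$; a minimal Euler-Maruyama simulation of this SDE (the parameter names and the Ornstein-Uhlenbeck test case are our illustrative choices, not from the paper):

```python
import numpy as np

def sgd_sde_path(grad, x0, eta=0.1, sigma=0.5, dt=0.01, steps=2000, seed=0):
    """Euler-Maruyama discretization of the SDE commonly used to model SGD,
        dX_t = -grad(X_t) dt + sigma * sqrt(eta) dW_t,
    where eta plays the role of the learning rate scaling the diffusion."""
    rng = np.random.default_rng(seed)
    x = float(x0)
    for _ in range(steps):
        x += -grad(x) * dt + sigma * np.sqrt(eta * dt) * rng.normal()
    return x

# Ornstein-Uhlenbeck case: f(x) = x^2 / 2, so grad(x) = x. The path decays
# toward the minimum, then fluctuates with stationary std sigma * sqrt(eta / 2).
x_end = sgd_sde_path(lambda x: x, x0=2.0)
```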
no code implementations • 19 Mar 2024 • Liang Zhang, Niao He, Michael Muehlebach
These algorithms along with their theoretical analysis often require the existence and prior knowledge of the optimal Lagrange multipliers.
1 code implementation • 27 Feb 2024 • Philip Jordan, Anas Barakat, Niao He
We propose an independent policy gradient algorithm for learning approximate constrained Nash equilibria: each agent observes its own actions and rewards, along with a shared state.
no code implementations • 27 Feb 2024 • Ilyas Fatkhullin, Niao He
This paper revisits the convergence of Stochastic Mirror Descent (SMD) in the contemporary nonconvex optimization setting.
no code implementations • 24 Feb 2024 • Adrian Müller, Pragnya Alatur, Volkan Cevher, Giorgia Ramponi, Niao He
As Efroni et al. (2020) pointed out, it is an open question whether primal-dual algorithms can provably achieve sublinear regret if we do not allow error cancellations.
1 code implementation • 8 Feb 2024 • Jiawei Huang, Niao He, Andreas Krause
We study the sample complexity of reinforcement learning (RL) in Mean-Field Games (MFGs) with model-based function approximation that requires strategic exploration to find a Nash Equilibrium policy.
no code implementations • 15 Nov 2023 • Sadegh Khorasani, Saber Salehkaleybar, Negar Kiyavash, Niao He, Matthias Grossglauser
Policy gradient (PG) is widely used in reinforcement learning due to its scalability and good performance.
1 code implementation • 6 Nov 2023 • Florian Hübler, Junchi Yang, Xiang Li, Niao He
However, as the assumption is relaxed to the more realistic $(L_0, L_1)$-smoothness, all existing convergence results still necessitate tuning of the stepsize.
no code implementations • NeurIPS 2023 • Liang Zhang, Junchi Yang, Amin Karbasi, Niao He
Particularly, given the inexact initialization oracle, our regularization-based algorithms achieve the best of both worlds - optimal reproducibility and near-optimal gradient complexity - for minimization and minimax optimization.
1 code implementation • 14 Oct 2023 • Liang Zhang, Bingcong Li, Kiran Koshy Thekumparampil, Sewoong Oh, Niao He
The widespread practice of fine-tuning large language models (LLMs) on domain-specific data faces two major challenges in memory and privacy.
1 code implementation • 21 Sep 2023 • Kei Ishikawa, Niao He, Takafumi Kanamori
We study policy evaluation of offline contextual bandits subject to unobserved confounders.
1 code implementation • 8 Sep 2023 • Jiduan Wu, Anas Barakat, Ilyas Fatkhullin, Niao He
Our main results are two-fold: (i) in the deterministic setting, we establish the first global last-iterate linear convergence result for the nested algorithm that seeks NE of zero-sum LQ games; (ii) in the model-free setting, we establish an $\widetilde{\mathcal{O}}(\epsilon^{-2})$ sample complexity using a single-point ZO estimator.
no code implementations • 25 Jun 2023 • Jun Song, Niao He, Lijun Ding, Chaoyue Zhao
Trust-region methods based on Kullback-Leibler divergence are pervasively used to stabilize policy optimization in reinforcement learning.
no code implementations • 13 Jun 2023 • Pragnya Alatur, Giorgia Ramponi, Niao He, Andreas Krause
Multi-agent reinforcement learning (MARL) addresses sequential decision-making problems with multiple agents, where each agent optimizes its own objective.
no code implementations • 12 Jun 2023 • Adrian Müller, Pragnya Alatur, Giorgia Ramponi, Niao He
Unlike existing Lagrangian approaches, our algorithm achieves this regret without the need for the cancellation of errors.
no code implementations • 2 Jun 2023 • Anas Barakat, Ilyas Fatkhullin, Niao He
We consider the reinforcement learning (RL) problem with general utilities, which consists of maximizing a function of the state-action occupancy measure.
no code implementations • 18 May 2023 • Jiawei Huang, Batuhan Yardim, Niao He
In this paper, we study the fundamental statistical efficiency of Reinforcement Learning in Mean-Field Control (MFC) and Mean-Field Game (MFG) with general model-based function approximation.
2 code implementations • 26 Feb 2023 • Kei Ishikawa, Niao He
It can be shown that our estimator contains the recently proposed sharp estimator by Dorn and Guo (2022) as a special case, and our method enables a novel extension of the classical marginal sensitivity model using f-divergence.
1 code implementation • NeurIPS 2023 • Jiawei Huang, Niao He
In this paper, we study the Tiered Reinforcement Learning setting, a parallel transfer learning framework, where the goal is to transfer knowledge from the low-tier (source) task to the high-tier (target) task to reduce the exploration risk of the latter while solving the two tasks in parallel.
no code implementations • 3 Feb 2023 • Ilyas Fatkhullin, Anas Barakat, Anastasia Kireeva, Niao He
Recently, the impressive empirical success of policy gradient (PG) methods has catalyzed the development of their theoretical foundations.
no code implementations • 29 Dec 2022 • Batuhan Yardim, Semih Cayci, Matthieu Geist, Niao He
Instead, we show that $N$ agents running policy mirror ascent converge to the Nash equilibrium of the regularized game within $\widetilde{\mathcal{O}}(\varepsilon^{-2})$ samples from a single sample trajectory without a population generative model, up to a standard $\mathcal{O}(\frac{1}{\sqrt{N}})$ error due to the mean field.
no code implementations • 14 Nov 2022 • Hanjun Dai, Yuan Xue, Niao He, Bethany Wang, Na Li, Dale Schuurmans, Bo Dai
In real-world decision-making, uncertainty is important yet difficult to handle.
no code implementations • 31 Oct 2022 • Xiang Li, Junchi Yang, Niao He
Adaptive gradient methods have shown their ability to adjust the stepsizes on the fly in a parameter-agnostic manner, and empirically achieve faster convergence for solving minimization problems.
no code implementations • 2 Jun 2022 • Semih Cayci, Niao He, R. Srikant
Natural actor-critic (NAC) and its variants, equipped with the representation power of neural networks, have demonstrated impressive empirical success in solving Markov decision problems with large state spaces.
no code implementations • 1 Jun 2022 • Junchi Yang, Xiang Li, Niao He
Adaptive algorithms like AdaGrad and AMSGrad are successful in nonconvex optimization owing to their parameter-agnostic ability -- requiring no a priori knowledge about problem-specific parameters nor tuning of learning rates.
no code implementations • 1 Jun 2022 • Liang Zhang, Kiran Koshy Thekumparampil, Sewoong Oh, Niao He
We provide a general framework for solving differentially private stochastic minimax optimization (DP-SMO) problems, which enables the practitioners to bring their own base optimization algorithm and use it as a black-box to obtain the near-optimal privacy-loss trade-off.
no code implementations • 28 May 2022 • Siqi Zhang, Yifan Hu, Liang Zhang, Niao He
We further study the algorithm-dependent generalization bounds via stability arguments of algorithms.
no code implementations • 25 May 2022 • Saeed Masiha, Saber Salehkaleybar, Niao He, Negar Kiyavash, Patrick Thiran
We prove that the total sample complexity of SCRN in achieving $\epsilon$-global optimum is $\mathcal{O}(\epsilon^{-7/(2\alpha)+1})$ for $1\le\alpha< 3/2$ and $\tilde{\mathcal{O}}(\epsilon^{-2/\alpha})$ for $3/2\le\alpha\le 2$.
no code implementations • 17 May 2022 • Saber Salehkaleybar, Sadegh Khorasani, Negar Kiyavash, Niao He, Patrick Thiran
The SHARP algorithm is parameter-free, reaching an $\epsilon$-approximate first-order stationary point with $O(\epsilon^{-3})$ trajectories while using a batch size of $O(1)$ at each iteration.
no code implementations • 20 Feb 2022 • Semih Cayci, Niao He, R. Srikant
We consider the reinforcement learning problem for partially observed Markov decision processes (POMDPs) with large or even countably infinite state spaces, where the controller has access to only noisy observations of the underlying controlled Markov chain.
no code implementations • 19 Jan 2022 • Kiran Koshy Thekumparampil, Niao He, Sewoong Oh
We also provide a direct single-loop algorithm, using the LPD method, that achieves the iteration complexity of $O(\sqrt{\frac{L_x}{\varepsilon}} + \frac{\|A\|}{\sqrt{\mu_y \varepsilon}} + \sqrt{\frac{L_y}{\varepsilon}})$.
1 code implementation • 10 Dec 2021 • Junchi Yang, Antonio Orvieto, Aurelien Lucchi, Niao He
Gradient descent ascent (GDA), the simplest single-loop algorithm for nonconvex minimax optimization, is widely used in practical applications such as generative adversarial networks (GANs) and adversarial training.
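The GDA iteration described in this entry is short enough to sketch directly; a minimal two-timescale version on a toy saddle-point problem (the stepsizes and the quadratic objective are our illustrative choices, not from the paper):

```python
def gda(grad_x, grad_y, x0, y0, lr_x=0.02, lr_y=0.2, steps=2000):
    """Single-loop gradient descent ascent on f(x, y): descend in x,
    ascend in y, with a smaller stepsize on the minimization variable
    (the two-timescale regime often used in nonconvex minimax analysis)."""
    x, y = float(x0), float(y0)
    for _ in range(steps):
        gx, gy = grad_x(x, y), grad_y(x, y)
        x -= lr_x * gx
        y += lr_y * gy
    return x, y

# Toy strongly-convex-strongly-concave objective f(x, y) = x^2/2 + x*y - y^2/2,
# whose unique saddle point is (0, 0).
x, y = gda(lambda x, y: x + y, lambda x, y: x - y, x0=1.0, y0=-1.0)
```

For this objective the linear iteration map has spectral radius below one, so the iterates contract to the saddle point.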
no code implementations • NeurIPS 2021 • Yifan Hu, Xin Chen, Niao He
We consider stochastic optimization when one only has access to biased stochastic oracles of the objective, and obtaining stochastic gradients with low biases comes at high costs.
no code implementations • 29 Sep 2021 • Ahmet Alacaoglu, Luca Viano, Niao He, Volkan Cevher
Our sample complexities also match the best-known results for global convergence of policy gradient and two time-scale actor-critic algorithms in the single agent setting.
no code implementations • 29 Sep 2021 • Jun Song, Chaoyue Zhao, Niao He
Trust-region methods based on Kullback-Leibler divergence are pervasively used to stabilize policy optimization in reinforcement learning.
no code implementations • 8 Jun 2021 • Semih Cayci, Niao He, R. Srikant
Furthermore, under mild regularity conditions on the concentrability coefficient and basis vectors, we prove that entropy-regularized NPG exhibits \emph{linear convergence} up to a function approximation error.
no code implementations • 29 Mar 2021 • Siqi Zhang, Junchi Yang, Cristóbal Guzmán, Negar Kiyavash, Niao He
In the averaged smooth finite-sum setting, our proposed algorithm improves over previous algorithms by providing a nearly-tight dependence on the condition number.
no code implementations • 14 Mar 2021 • Donghwan Lee, Niao He, Seungjae Lee, Panagiota Karava, Jianghai Hu
The building sector is the largest energy consumer in the world, and there has been considerable research interest in the energy consumption and comfort management of buildings.
no code implementations • 2 Mar 2021 • Semih Cayci, Siddhartha Satpathi, Niao He, R. Srikant
In this paper, we study the dynamics of temporal difference learning with neural network-based value function approximation over a general state space, namely, \emph{Neural TD learning}.
no code implementations • 17 Feb 2021 • Donghwan Lee, Jianghai Hu, Niao He
Based on these two systems, we derive a new finite-time error bound of asynchronous Q-learning when a constant stepsize is used.
no code implementations • NeurIPS 2020 • Donghwan Lee, Niao He
This paper develops a novel and unified framework to analyze the convergence of a large family of Q-learning algorithms from the switching system perspective.
no code implementations • NeurIPS 2020 • Yingxiang Yang, Negar Kiyavash, Le Song, Niao He
Macroscopic data aggregated from microscopic events are pervasive in machine learning, such as country-level COVID-19 infection statistics based on city-level data.
no code implementations • NeurIPS 2020 • Junchi Yang, Siqi Zhang, Negar Kiyavash, Niao He
We introduce a generic \emph{two-loop} scheme for smooth minimax optimization with strongly-convex-concave objectives.
no code implementations • NeurIPS 2020 • Junchi Yang, Negar Kiyavash, Niao He
Nonconvex minimax problems appear frequently in emerging machine learning applications, such as generative adversarial networks and adversarial learning.
1 code implementation • NeurIPS 2020 • Wentao Weng, Harsh Gupta, Niao He, Lei Ying, R. Srikant
In this paper, we establish a theoretical comparison between the asymptotic mean-squared error of Double Q-learning and Q-learning.
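The overestimation effect that motivates comparing Double Q-learning with Q-learning can be seen in a few lines; a sketch of the single- vs double-estimator targets on arms with identical true means (the setup and all names are ours for illustration, not the paper's analysis):

```python
import numpy as np

def max_bias_demo(n_arms=10, noise=0.1, reps=500, seed=0):
    """All arms have true mean 0. `single` mimics a Q-learning target
    (max over noisy value estimates, biased upward); `double` mimics the
    Double Q-learning target (one independent estimate set picks the
    argmax, the other evaluates it, which is unbiased here)."""
    rng = np.random.default_rng(seed)
    singles, doubles = [], []
    for _ in range(reps):
        mu1 = rng.normal(0.0, noise, size=n_arms)  # noisy estimates, table A
        mu2 = rng.normal(0.0, noise, size=n_arms)  # independent estimates, table B
        singles.append(mu1.max())
        doubles.append(mu2[np.argmax(mu1)])
    return float(np.mean(singles)), float(np.mean(doubles))

single_bias, double_bias = max_bias_demo()  # positive bias vs. near-zero bias
```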
no code implementations • NeurIPS 2020 • Yifan Hu, Siqi Zhang, Xin Chen, Niao He
Conditional stochastic optimization covers a variety of applications ranging from invariant learning and causal inference to meta-learning.
no code implementations • L4DC 2020 • Donghwan Lee, Niao He
The use of target networks is a common practice in deep reinforcement learning for stabilizing the training; however, theoretical understanding of this technique is still limited.
no code implementations • 22 Feb 2020 • Junchi Yang, Negar Kiyavash, Niao He
Nonconvex minimax problems appear frequently in emerging machine learning applications, such as generative adversarial networks and adversarial learning.
no code implementations • 4 Dec 2019 • Donghwan Lee, Niao He
In this paper, we introduce a unified framework for analyzing a large family of Q-learning algorithms, based on switching system perspectives and ODE-based stochastic approximation.
no code implementations • NeurIPS 2019 • Yingxiang Yang, Haoxiang Wang, Negar Kiyavash, Niao He
The nonparametric learning of positive-valued functions appears widely in machine learning, especially in the context of estimating intensity functions of point processes.
no code implementations • 1 Dec 2019 • Donghwan Lee, Niao He, Parameswaran Kamalaruban, Volkan Cevher
This article reviews recent advances in multi-agent reinforcement learning algorithms for large-scale control systems and communication networks, which learn to communicate and cooperate.
no code implementations • 28 May 2019 • Yifan Hu, Xin Chen, Niao He
In this paper, we study a class of stochastic optimization problems, referred to as the \emph{Conditional Stochastic Optimization} (CSO), in the form of $\min_{x \in \mathcal{X}} \mathbb{E}_{\xi} f_\xi\big(\mathbb{E}_{\eta|\xi}[g_\eta(x,\xi)]\big)$, which finds a wide spectrum of applications including portfolio selection, reinforcement learning, robust learning, causal inference and so on.
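A standard way to approximate the nested CSO objective is a sample-average estimate with $m$ inner samples per outer draw, which is biased whenever the outer function is nonlinear; a minimal sketch (the function names and the Gaussian toy instance are ours for illustration):

```python
import numpy as np

def cso_value_estimate(x, sample_xi, sample_eta_given_xi, f, g, n=200, m=50, seed=0):
    """Nested sample-average estimate of the CSO objective
        F(x) = E_xi[ f( E_{eta|xi}[ g(eta, x, xi) ] ) ].
    The inner expectation is approximated with m samples per outer draw,
    so the estimate carries a bias that shrinks as m grows."""
    rng = np.random.default_rng(seed)
    vals = []
    for _ in range(n):
        xi = sample_xi(rng)
        inner = np.mean([g(sample_eta_given_xi(xi, rng), x, xi) for _ in range(m)])
        vals.append(f(inner, xi))
    return float(np.mean(vals))

# Toy instance: xi ~ N(0,1), eta|xi ~ N(xi,1), g = x + eta, f = z^2.
# Then E[eta|xi] = xi, so F(x) = E[(x + xi)^2] = x^2 + 1, minimized at x = 0.
est = cso_value_estimate(
    0.0,
    sample_xi=lambda rng: rng.normal(),
    sample_eta_given_xi=lambda xi, rng: rng.normal(xi, 1.0),
    f=lambda z, xi: z ** 2,
    g=lambda eta, x, xi: x + eta,
)
```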
1 code implementation • NeurIPS 2019 • Bo Dai, Zhen Liu, Hanjun Dai, Niao He, Arthur Gretton, Le Song, Dale Schuurmans
We present an efficient algorithm for maximum likelihood estimation (MLE) of exponential family models, with a general parametrization of the energy function that includes neural networks.
no code implementations • 24 Apr 2019 • Donghwan Lee, Niao He
The use of target networks has been a popular and key component of recent deep Q-learning algorithms for reinforcement learning, yet little is known from the theory side.
no code implementations • 26 Feb 2019 • Pan Li, Niao He, Olgica Milenkovic
We introduce a new convex optimization problem, termed quadratic decomposable submodular function minimization (QDSFM), which can model a number of learning tasks on graphs and hypergraphs.
1 code implementation • NeurIPS 2018 • Bo Dai, Hanjun Dai, Niao He, Weiyang Liu, Zhen Liu, Jianshu Chen, Lin Xiao, Le Song
This flexible function class couples the variational distribution with the original parameters in the graphical models, allowing end-to-end learning of the graphical models by back-propagation through the variational distribution.
no code implementations • NeurIPS 2018 • Yingxiang Yang, Bo Dai, Negar Kiyavash, Niao He
Approximate Bayesian computation (ABC) is an important methodology for Bayesian inference when the likelihood function is intractable.
1 code implementation • 6 Nov 2018 • Bo Dai, Hanjun Dai, Arthur Gretton, Le Song, Dale Schuurmans, Niao He
We investigate penalized maximum log-likelihood estimation for exponential family distributions whose natural parameter resides in a reproducing kernel Hilbert space.
1 code implementation • NeurIPS 2018 • Pan Li, Niao He, Olgica Milenkovic
The problem is closely related to decomposable submodular function minimization and arises in many learning on graphs and hypergraphs settings, such as graph-based semi-supervised learning and PageRank.
no code implementations • 25 Jan 2018 • Yingxiang Yang, Jalal Etesami, Niao He, Negar Kiyavash
In this paper, we design a nonparametric online algorithm for estimating the triggering functions of multivariate Hawkes processes.
no code implementations • ICML 2018 • Bo Dai, Albert Shaw, Lihong Li, Lin Xiao, Niao He, Zhen Liu, Jianshu Chen, Le Song
When function approximation is used, solving the Bellman optimality equation with stability guarantees has remained a major open problem in reinforcement learning for decades.
no code implementations • ICLR 2018 • Bo Dai, Albert Shaw, Niao He, Lihong Li, Le Song
This paper proposes a new actor-critic-style algorithm called Dual Actor-Critic or Dual-AC.
no code implementations • NeurIPS 2017 • Yingxiang Yang, Jalal Etesami, Niao He, Negar Kiyavash
We develop a nonparametric and online learning algorithm that estimates the triggering functions of a multivariate Hawkes process (MHP).
2 code implementations • ICML 2017 • Bo Dai, Ruiqi Guo, Sanjiv Kumar, Niao He, Le Song
Learning-based binary hashing has become a powerful paradigm for fast search and retrieval in massive databases.
no code implementations • 3 Aug 2016 • Niao He, Zaid Harchaoui, Yichen Wang, Le Song
Since almost all gradient-based optimization algorithms rely on Lipschitz-continuity, optimizing Poisson likelihood models with a guarantee of convergence can be challenging, especially for large-scale problems.
no code implementations • 15 Jul 2016 • Bo Dai, Niao He, Yunpeng Pan, Byron Boots, Le Song
In such problems, each sample $x$ itself is associated with a conditional distribution $p(z|x)$ represented by samples $\{z_i\}_{i=1}^M$, and the goal is to learn a function $f$ that links these conditional distributions to target values $y$.
no code implementations • NeurIPS 2015 • Nan Du, Yichen Wang, Niao He, Jimeng Sun, Le Song
By making personalized suggestions, a recommender system is playing a crucial role in improving the engagement of users in modern web-services.
no code implementations • NeurIPS 2015 • Niao He, Zaid Harchaoui
We propose a new first-order optimization algorithm to solve high-dimensional non-smooth composite minimization problems.
no code implementations • 9 Jun 2015 • Bo Dai, Niao He, Hanjun Dai, Le Song
Bayesian methods are appealing in their flexibility in modeling complex data and ability in capturing uncertainty in parameters.
1 code implementation • NeurIPS 2014 • Bo Dai, Bo Xie, Niao He, YIngyu Liang, Anant Raj, Maria-Florina Balcan, Le Song
The general perception is that kernel methods are not scalable, and neural nets are the methods of choice for nonlinear learning problems.