1 code implementation • 14 Jul 2024 • Jiawei Huang, Vinzenz Thoma, Zebang Shen, Heinrich H. Nax, Niao He

We introduce a model-based non-episodic Reinforcement Learning (RL) formulation for our steering problem.

no code implementations • 5 Jun 2024 • Yan Huang, Xiang Li, Yipeng Shen, Niao He, Jinming Xu

This protocol ensures the consistency among stepsizes of nodes, eliminating the steady-state error due to the lack of coordination of stepsizes among nodes that commonly exists in vanilla distributed adaptive methods, and thus guarantees exact convergence.

no code implementations • 28 May 2024 • Xiang Li, Zebang Shen, Liang Zhang, Niao He

Continuous-time approximation of Stochastic Gradient Descent (SGD) is a crucial tool to study its escaping behaviors from stationary points.

no code implementations • 19 Mar 2024 • Liang Zhang, Niao He, Michael Muehlebach

In this work, we propose a simple primal method, termed Constrained Gradient Method (CGM), for addressing functional constrained variational inequality problems, without necessitating any information on the optimal Lagrange multipliers.

1 code implementation • 27 Feb 2024 • Philip Jordan, Anas Barakat, Niao He

We propose an independent policy gradient algorithm for learning approximate constrained Nash equilibria: Each agent observes their own actions and rewards, along with a shared state.

no code implementations • 27 Feb 2024 • Ilyas Fatkhullin, Niao He

This paper revisits the convergence of Stochastic Mirror Descent (SMD) in the contemporary nonconvex optimization setting.

no code implementations • 24 Feb 2024 • Adrian Müller, Pragnya Alatur, Volkan Cevher, Giorgia Ramponi, Niao He

As Efroni et al. (2020) pointed out, it is an open question whether primal-dual algorithms can provably achieve sublinear regret if we do not allow error cancellations.

1 code implementation • 8 Feb 2024 • Jiawei Huang, Niao He, Andreas Krause

We study the sample complexity of reinforcement learning (RL) in Mean-Field Games (MFGs) with model-based function approximation that requires strategic exploration to find a Nash Equilibrium policy.

no code implementations • 15 Nov 2023 • Sadegh Khorasani, Saber Salehkaleybar, Negar Kiyavash, Niao He, Matthias Grossglauser

Policy gradient (PG) is widely used in reinforcement learning due to its scalability and good performance.

no code implementations • 6 Nov 2023 • Florian Hübler, Junchi Yang, Xiang Li, Niao He

However, as the assumption is relaxed to the more realistic $(L_0, L_1)$-smoothness, all existing convergence results still necessitate tuning of the stepsize.

no code implementations • NeurIPS 2023 • Liang Zhang, Junchi Yang, Amin Karbasi, Niao He

Particularly, given the inexact initialization oracle, our regularization-based algorithms achieve the best of both worlds - optimal reproducibility and near-optimal gradient complexity - for minimization and minimax optimization.

1 code implementation • 14 Oct 2023 • Liang Zhang, Bingcong Li, Kiran Koshy Thekumparampil, Sewoong Oh, Niao He

The widespread practice of fine-tuning large language models (LLMs) on domain-specific data faces two major challenges in memory and privacy.

1 code implementation • 21 Sep 2023 • Kei Ishikawa, Niao He, Takafumi Kanamori

We study policy evaluation of offline contextual bandits subject to unobserved confounders.

1 code implementation • 8 Sep 2023 • Jiduan Wu, Anas Barakat, Ilyas Fatkhullin, Niao He

Our main results are two-fold: (i) in the deterministic setting, we establish the first global last-iterate linear convergence result for the nested algorithm that seeks NE of zero-sum LQ games; (ii) in the model-free setting, we establish a $\widetilde{\mathcal{O}}(\epsilon^{-2})$ sample complexity using a single-point ZO estimator.

no code implementations • 25 Jun 2023 • Jun Song, Niao He, Lijun Ding, Chaoyue Zhao

Trust-region methods based on Kullback-Leibler divergence are pervasively used to stabilize policy optimization in reinforcement learning.

no code implementations • 13 Jun 2023 • Pragnya Alatur, Giorgia Ramponi, Niao He, Andreas Krause

Multi-agent reinforcement learning (MARL) addresses sequential decision-making problems with multiple agents, where each agent optimizes its own objective.

no code implementations • 12 Jun 2023 • Adrian Müller, Pragnya Alatur, Giorgia Ramponi, Niao He

Unlike existing Lagrangian approaches, our algorithm achieves this regret without the need for the cancellation of errors.

no code implementations • 2 Jun 2023 • Anas Barakat, Ilyas Fatkhullin, Niao He

We consider the reinforcement learning (RL) problem with general utilities which consists in maximizing a function of the state-action occupancy measure.

no code implementations • 18 May 2023 • Jiawei Huang, Batuhan Yardim, Niao He

In this paper, we study the fundamental statistical efficiency of Reinforcement Learning in Mean-Field Control (MFC) and Mean-Field Game (MFG) with general model-based function approximation.

2 code implementations • 26 Feb 2023 • Kei Ishikawa, Niao He

It can be shown that our estimator contains the recently proposed sharp estimator by Dorn and Guo (2022) as a special case, and our method enables a novel extension of the classical marginal sensitivity model using f-divergence.

1 code implementation • NeurIPS 2023 • Jiawei Huang, Niao He

In this paper, we study the Tiered Reinforcement Learning setting, a parallel transfer learning framework, where the goal is to transfer knowledge from the low-tier (source) task to the high-tier (target) task to reduce the exploration risk of the latter while solving the two tasks in parallel.

no code implementations • 3 Feb 2023 • Ilyas Fatkhullin, Anas Barakat, Anastasia Kireeva, Niao He

Recently, the impressive empirical success of policy gradient (PG) methods has catalyzed the development of their theoretical foundations.

no code implementations • 29 Dec 2022 • Batuhan Yardim, Semih Cayci, Matthieu Geist, Niao He

Instead, we show that $N$ agents running policy mirror ascent converge to the Nash equilibrium of the regularized game within $\widetilde{\mathcal{O}}(\varepsilon^{-2})$ samples from a single sample trajectory without a population generative model, up to a standard $\mathcal{O}(\frac{1}{\sqrt{N}})$ error due to the mean field.

no code implementations • 14 Nov 2022 • Hanjun Dai, Yuan Xue, Niao He, Bethany Wang, Na Li, Dale Schuurmans, Bo Dai

In real-world decision-making, uncertainty is important yet difficult to handle.

no code implementations • 31 Oct 2022 • Xiang Li, Junchi Yang, Niao He

Adaptive gradient methods have shown their ability to adjust the stepsizes on the fly in a parameter-agnostic manner, and empirically achieve faster convergence for solving minimization problems.

no code implementations • 2 Jun 2022 • Semih Cayci, Niao He, R. Srikant

Natural actor-critic (NAC) and its variants, equipped with the representation power of neural networks, have demonstrated impressive empirical success in solving Markov decision problems with large state spaces.

no code implementations • 1 Jun 2022 • Junchi Yang, Xiang Li, Niao He

Adaptive algorithms like AdaGrad and AMSGrad are successful in nonconvex optimization owing to their parameter-agnostic ability -- requiring no a priori knowledge about problem-specific parameters nor tuning of learning rates.

no code implementations • 1 Jun 2022 • Liang Zhang, Kiran Koshy Thekumparampil, Sewoong Oh, Niao He

We provide a general framework for solving differentially private stochastic minimax optimization (DP-SMO) problems, which enables practitioners to bring their own base optimization algorithm and use it as a black box to obtain the near-optimal privacy-loss trade-off.

no code implementations • 28 May 2022 • Siqi Zhang, Yifan Hu, Liang Zhang, Niao He

We further study the algorithm-dependent generalization bounds via stability arguments of algorithms.

no code implementations • 25 May 2022 • Saeed Masiha, Saber Salehkaleybar, Niao He, Negar Kiyavash, Patrick Thiran

We prove that the total sample complexity of SCRN in achieving $\epsilon$-global optimum is $\mathcal{O}(\epsilon^{-7/(2\alpha)+1})$ for $1\le\alpha< 3/2$ and $\tilde{\mathcal{O}}(\epsilon^{-2/\alpha})$ for $3/2\le\alpha\le 2$.

no code implementations • 17 May 2022 • Saber Salehkaleybar, Sadegh Khorasani, Negar Kiyavash, Niao He, Patrick Thiran

The SHARP algorithm is parameter-free, achieving an $\epsilon$-approximate first-order stationary point with $O(\epsilon^{-3})$ trajectories, while using a batch size of $O(1)$ at each iteration.

no code implementations • 20 Feb 2022 • Semih Cayci, Niao He, R. Srikant

We consider the reinforcement learning problem for partially observed Markov decision processes (POMDPs) with large or even countably infinite state spaces, where the controller has access to only noisy observations of the underlying controlled Markov chain.

no code implementations • 19 Jan 2022 • Kiran Koshy Thekumparampil, Niao He, Sewoong Oh

We also provide a direct single-loop algorithm, using the LPD method, that achieves the iteration complexity of $O(\sqrt{\frac{L_x}{\varepsilon}} + \frac{\|A\|}{\sqrt{\mu_y \varepsilon}} + \sqrt{\frac{L_y}{\varepsilon}})$.

1 code implementation • 10 Dec 2021 • Junchi Yang, Antonio Orvieto, Aurelien Lucchi, Niao He

Gradient descent ascent (GDA), the simplest single-loop algorithm for nonconvex minimax optimization, is widely used in practical applications such as generative adversarial networks (GANs) and adversarial training.

no code implementations • NeurIPS 2021 • Yifan Hu, Xin Chen, Niao He

We consider stochastic optimization when one only has access to biased stochastic oracles of the objective, and obtaining stochastic gradients with low biases comes at high costs.

no code implementations • 29 Sep 2021 • Ahmet Alacaoglu, Luca Viano, Niao He, Volkan Cevher

Our sample complexities also match the best-known results for global convergence of policy gradient and two time-scale actor-critic algorithms in the single agent setting.

no code implementations • 29 Sep 2021 • Jun Song, Chaoyue Zhao, Niao He

Trust-region methods based on Kullback-Leibler divergence are pervasively used to stabilize policy optimization in reinforcement learning.

no code implementations • 8 Jun 2021 • Semih Cayci, Niao He, R. Srikant

Furthermore, under mild regularity conditions on the concentrability coefficient and basis vectors, we prove that entropy-regularized NPG exhibits linear convergence up to a function approximation error.

no code implementations • 29 Mar 2021 • Siqi Zhang, Junchi Yang, Cristóbal Guzmán, Negar Kiyavash, Niao He

In the averaged smooth finite-sum setting, our proposed algorithm improves over previous algorithms by providing a nearly-tight dependence on the condition number.

no code implementations • 14 Mar 2021 • Donghwan Lee, Niao He, Seungjae Lee, Panagiota Karava, Jianghai Hu

The building sector is the largest energy consumer in the world, and there has been considerable research interest in energy consumption and comfort management of buildings.

no code implementations • 2 Mar 2021 • Semih Cayci, Siddhartha Satpathi, Niao He, R. Srikant

In this paper, we study the dynamics of temporal difference learning with neural network-based value function approximation over a general state space, namely, Neural TD learning.

no code implementations • 17 Feb 2021 • Donghwan Lee, Jianghai Hu, Niao He

Based on these two systems, we derive a new finite-time error bound of asynchronous Q-learning when a constant stepsize is used.

no code implementations • NeurIPS 2020 • Donghwan Lee, Niao He

This paper develops a novel and unified framework to analyze the convergence of a large family of Q-learning algorithms from the switching system perspective.

no code implementations • NeurIPS 2020 • Junchi Yang, Negar Kiyavash, Niao He

Nonconvex minimax problems appear frequently in emerging machine learning applications, such as generative adversarial networks and adversarial learning.

no code implementations • NeurIPS 2020 • Yingxiang Yang, Negar Kiyavash, Le Song, Niao He

Macroscopic data aggregated from microscopic events are pervasive in machine learning, such as country-level COVID-19 infection statistics based on city-level data.

no code implementations • NeurIPS 2020 • Junchi Yang, Siqi Zhang, Negar Kiyavash, Niao He

We introduce a generic two-loop scheme for smooth minimax optimization with strongly-convex-concave objectives.

1 code implementation • NeurIPS 2020 • Wentao Weng, Harsh Gupta, Niao He, Lei Ying, R. Srikant

In this paper, we establish a theoretical comparison between the asymptotic mean-squared error of Double Q-learning and Q-learning.

no code implementations • NeurIPS 2020 • Yifan Hu, Siqi Zhang, Xin Chen, Niao He

Conditional stochastic optimization covers a variety of applications ranging from invariant learning and causal inference to meta-learning.

no code implementations • L4DC 2020 • Donghwan Lee, Niao He

The use of target networks is a common practice in deep reinforcement learning for stabilizing the training; however, theoretical understanding of this technique is still limited.

no code implementations • 22 Feb 2020 • Junchi Yang, Negar Kiyavash, Niao He

Nonconvex minimax problems appear frequently in emerging machine learning applications, such as generative adversarial networks and adversarial learning.

no code implementations • 4 Dec 2019 • Donghwan Lee, Niao He

In this paper, we introduce a unified framework for analyzing a large family of Q-learning algorithms, based on switching system perspectives and ODE-based stochastic approximation.

no code implementations • NeurIPS 2019 • Yingxiang Yang, Haoxiang Wang, Negar Kiyavash, Niao He

The nonparametric learning of positive-valued functions appears widely in machine learning, especially in the context of estimating intensity functions of point processes.

no code implementations • 1 Dec 2019 • Donghwan Lee, Niao He, Parameswaran Kamalaruban, Volkan Cevher

This article reviews recent advances in multi-agent reinforcement learning algorithms for large-scale control systems and communication networks, which learn to communicate and cooperate.

no code implementations • 28 May 2019 • Yifan Hu, Xin Chen, Niao He

In this paper, we study a class of stochastic optimization problems, referred to as Conditional Stochastic Optimization (CSO), in the form of $\min_{x \in \mathcal{X}} \mathbb{E}_{\xi} f_\xi\big(\mathbb{E}_{\eta|\xi}[g_\eta(x,\xi)]\big)$, which finds a wide spectrum of applications including portfolio selection, reinforcement learning, robust learning, causal inference and so on.

1 code implementation • NeurIPS 2019 • Bo Dai, Zhen Liu, Hanjun Dai, Niao He, Arthur Gretton, Le Song, Dale Schuurmans

We present an efficient algorithm for maximum likelihood estimation (MLE) of exponential family models, with a general parametrization of the energy function that includes neural networks.

no code implementations • 24 Apr 2019 • Donghwan Lee, Niao He

The use of target networks has been a popular and key component of recent deep Q-learning algorithms for reinforcement learning, yet little is known from the theory side.

no code implementations • 26 Feb 2019 • Pan Li, Niao He, Olgica Milenkovic

We introduce a new convex optimization problem, termed quadratic decomposable submodular function minimization (QDSFM), which allows modeling a number of learning tasks on graphs and hypergraphs.

no code implementations • NeurIPS 2018 • Yingxiang Yang, Bo Dai, Negar Kiyavash, Niao He

Approximate Bayesian computation (ABC) is an important methodology for Bayesian inference when the likelihood function is intractable.

1 code implementation • NeurIPS 2018 • Bo Dai, Hanjun Dai, Niao He, Weiyang Liu, Zhen Liu, Jianshu Chen, Lin Xiao, Le Song

This flexible function class couples the variational distribution with the original parameters in the graphical models, allowing end-to-end learning of the graphical models by back-propagation through the variational distribution.

1 code implementation • 6 Nov 2018 • Bo Dai, Hanjun Dai, Arthur Gretton, Le Song, Dale Schuurmans, Niao He

We investigate penalized maximum log-likelihood estimation for exponential family distributions whose natural parameter resides in a reproducing kernel Hilbert space.

1 code implementation • NeurIPS 2018 • Pan Li, Niao He, Olgica Milenkovic

The problem is closely related to decomposable submodular function minimization and arises in many settings for learning on graphs and hypergraphs, such as graph-based semi-supervised learning and PageRank.

no code implementations • 25 Jan 2018 • Yingxiang Yang, Jalal Etesami, Niao He, Negar Kiyavash

In this paper, we design a nonparametric online algorithm for estimating the triggering functions of multivariate Hawkes processes.

no code implementations • ICML 2018 • Bo Dai, Albert Shaw, Lihong Li, Lin Xiao, Niao He, Zhen Liu, Jianshu Chen, Le Song

When function approximation is used, solving the Bellman optimality equation with stability guarantees has remained a major open problem in reinforcement learning for decades.

no code implementations • ICLR 2018 • Bo Dai, Albert Shaw, Niao He, Lihong Li, Le Song

This paper proposes a new actor-critic-style algorithm called Dual Actor-Critic or Dual-AC.

no code implementations • NeurIPS 2017 • Yingxiang Yang, Jalal Etesami, Niao He, Negar Kiyavash

We develop a nonparametric and online learning algorithm that estimates the triggering functions of a multivariate Hawkes process (MHP).

2 code implementations • ICML 2017 • Bo Dai, Ruiqi Guo, Sanjiv Kumar, Niao He, Le Song

Learning-based binary hashing has become a powerful paradigm for fast search and retrieval in massive databases.

no code implementations • 3 Aug 2016 • Niao He, Zaid Harchaoui, Yichen Wang, Le Song

Since almost all gradient-based optimization algorithms rely on Lipschitz-continuity, optimizing Poisson likelihood models with a guarantee of convergence can be challenging, especially for large-scale problems.

no code implementations • 15 Jul 2016 • Bo Dai, Niao He, Yunpeng Pan, Byron Boots, Le Song

In such problems, each sample $x$ itself is associated with a conditional distribution $p(z|x)$ represented by samples $\{z_i\}_{i=1}^M$, and the goal is to learn a function $f$ that links these conditional distributions to target values $y$.

no code implementations • NeurIPS 2015 • Nan Du, Yichen Wang, Niao He, Jimeng Sun, Le Song

By making personalized suggestions, a recommender system is playing a crucial role in improving the engagement of users in modern web-services.

no code implementations • NeurIPS 2015 • Niao He, Zaid Harchaoui

We propose a new first-order optimisation algorithm to solve high-dimensional non-smooth composite minimisation problems.

no code implementations • 9 Jun 2015 • Bo Dai, Niao He, Hanjun Dai, Le Song

Bayesian methods are appealing for their flexibility in modeling complex data and their ability to capture uncertainty in parameters.

1 code implementation • NeurIPS 2014 • Bo Dai, Bo Xie, Niao He, Yingyu Liang, Anant Raj, Maria-Florina Balcan, Le Song

The general perception is that kernel methods are not scalable, and neural nets are the methods of choice for nonlinear learning problems.
