no code implementations • ICML 2020 • Wei Chen, Yihan Du, Longbo Huang, Haoyu Zhao
For the Borda winner, we establish a reduction of the problem to the original CPE-MAB setting and design PAC and exact algorithms that simultaneously achieve sample complexity similar to that in the CPE-MAB setting (which is nearly optimal for a subclass of problems) and polynomial running time per round.
no code implementations • 25 Oct 2024 • Hai Zhong, Xun Wang, Zhuoran Li, Longbo Huang
Offline-to-Online Reinforcement Learning has emerged as a powerful paradigm, leveraging offline data for initialization and online fine-tuning to enhance both sample efficiency and performance.
no code implementations • 4 Oct 2024 • Yu Chen, Jiatai Huang, Yan Dai, Longbo Huang
To our knowledge, uniINF is the first parameter-free algorithm to achieve the best-of-both-worlds (BoBW) property for the heavy-tailed MAB problem.
no code implementations • 3 Oct 2024 • Rui Hu, Yifan Zhang, Zhuoran Li, Longbo Huang
In general, GFlowNets are trained by fitting the forward flow to the backward flow on sampled training objects.
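To make this training objective concrete, below is a minimal sketch of the trajectory-balance form of fitting forward flow to backward flow, assuming a PyTorch setting where per-step log-probabilities along a sampled trajectory are already available; all names are illustrative and not the paper's code.

```python
import torch

def trajectory_balance_loss(log_Z, log_pf, log_pb, log_reward):
    """Fit forward flow to backward flow along one sampled trajectory.

    log_Z:      0-dim tensor, learned log partition function
    log_pf:     (T,) log forward-policy probabilities of each step
    log_pb:     (T,) log backward-policy probabilities of each step
    log_reward: 0-dim tensor, log R(x) of the terminal object x
    """
    # Forward flow (Z * prod P_F) should equal backward flow (R(x) * prod P_B).
    discrepancy = log_Z + log_pf.sum() - log_reward - log_pb.sum()
    return discrepancy.pow(2)
```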
no code implementations • 28 Sep 2024 • Pihe Hu, Shaolong Li, Zhuoran Li, Ling Pan, Longbo Huang
However, a direct adoption of DST fails to yield satisfactory MARL agents, leading to breakdowns in value learning within deep sparse value-based MARL models.
no code implementations • 29 Aug 2024 • Yan Dai, Longbo Huang
Stochastic Network Optimization (SNO) concerns scheduling in stochastic queueing systems.
no code implementations • 21 Aug 2024 • Pihe Hu, Shaolong Li, Longbo Huang
Throughout these phases, the model is trained with a dynamically evolving sparse topology and an HSA mechanism to maintain performance and minimize training FLOPs concurrently.
no code implementations • 7 Mar 2024 • Boning Li, Zhixuan Fang, Longbo Huang
Effective action abstraction is crucial in tackling challenges associated with large action spaces in Imperfect Information Extensive-Form Games (IIEFGs).
no code implementations • 28 Feb 2024 • Tonghe Zhang, Yu Chen, Longbo Huang
This work pioneers regret analysis of risk-sensitive reinforcement learning in partially observable environments with hindsight observation, addressing a gap in theoretical exploration.
no code implementations • 28 Feb 2024 • Yu Chen, Xiangcheng Zhang, Siwei Wang, Longbo Huang
In this paper, we introduce a general framework on Risk-Sensitive Distributional Reinforcement Learning (RS-DisRL), with static Lipschitz Risk Measures (LRM) and general function approximation.
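For orientation, one standard static risk measure in this literature is Conditional Value-at-Risk (shown here only as an example; the paper treats general Lipschitz risk measures):

$$\mathrm{CVaR}_\alpha(Z) \;=\; \frac{1}{\alpha}\int_{0}^{\alpha} F_Z^{-1}(u)\,\mathrm{d}u,$$

where $F_Z^{-1}$ is the quantile function of the random return $Z$ and $\alpha \in (0, 1]$; the risk-sensitive objective is then to maximize the risk measure of the return distribution induced by the policy.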
2 code implementations • 9 Nov 2023 • Simian Luo, Yiqin Tan, Suraj Patil, Daniel Gu, Patrick von Platen, Apolinário Passos, Longbo Huang, Jian Li, Hang Zhao
Latent Consistency Models (LCMs) have achieved impressive performance in accelerating text-to-image generative tasks, producing high-quality images with minimal inference steps.
1 code implementation • 22 Oct 2023 • Xinran Gu, Kaifeng Lyu, Sanjeev Arora, Jingzhao Zhang, Longbo Huang
In distributed deep learning with data parallelism, synchronizing gradients at each training step can cause a huge communication overhead, especially when many nodes work together to train large models.
no code implementations • 21 Oct 2023 • Yiqin Tan, Ling Pan, Longbo Huang
Deep reinforcement learning has achieved remarkable performance in various domains by leveraging deep neural networks for approximating value functions and policies.
4 code implementations • 6 Oct 2023 • Simian Luo, Yiqin Tan, Longbo Huang, Jian Li, Hang Zhao
Inspired by Consistency Models (Song et al.), we propose Latent Consistency Models (LCMs), enabling swift inference with minimal steps on any pre-trained LDMs, including Stable Diffusion (Rombach et al.).
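A minimal usage sketch with the publicly released checkpoint, assuming a recent version of Hugging Face diffusers that ships the latent consistency model pipeline; the step count and guidance scale are illustrative.

```python
import torch
from diffusers import DiffusionPipeline

# Load the publicly released LCM checkpoint distilled from Dreamshaper v7.
pipe = DiffusionPipeline.from_pretrained(
    "SimianLuo/LCM_Dreamshaper_v7", torch_dtype=torch.float32
)
pipe.to("cuda")

# LCMs produce usable images in very few steps (typically 2-8),
# versus tens of steps for a standard diffusion sampler.
image = pipe(
    prompt="a photo of an astronaut riding a horse on the moon",
    num_inference_steps=4,
    guidance_scale=8.0,
).images[0]
image.save("lcm_sample.png")
```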
no code implementations • 6 Jul 2023 • Yu Chen, Yihan Du, Pihe Hu, Siwei Wang, Desheng Wu, Longbo Huang
Risk-sensitive reinforcement learning (RL) aims to optimize policies that balance the expected reward and risk.
no code implementations • 4 Jul 2023 • Zhuoran Li, Ling Pan, Longbo Huang
We present a novel Diffusion Offline Multi-agent Model (DOM2) for offline Multi-Agent Reinforcement Learning (MARL).
no code implementations • 3 Mar 2023 • Yuanying Cai, Chuheng Zhang, Wei Shen, Xuyun Zhang, Wenjie Ruan, Longbo Huang
Inspired by the recent success of sequence modeling in RL and the use of masked language models for pre-training, we propose a masked model for pre-training in RL, RePreM (Representation Pre-training with Masked Model), which trains an encoder combined with transformer blocks to predict masked states or actions in a trajectory.
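The masking step can be pictured as a BERT-style corruption of a trajectory whose hidden entries the encoder-plus-transformer must reconstruct. Below is a generic sketch of masked trajectory modeling, not the exact RePreM recipe; all names are illustrative.

```python
import torch

def mask_trajectory(states, actions, mask_ratio=0.15, mask_token=0.0):
    """BERT-style masking of per-step state/action embeddings.

    states, actions: (T, d) tensors of embeddings along one trajectory.
    Returns the corrupted sequences plus the masks marking which entries
    the model must predict during pre-training.
    """
    T = states.shape[0]
    state_mask = torch.rand(T) < mask_ratio     # which states to hide
    action_mask = torch.rand(T) < mask_ratio    # which actions to hide
    masked_states, masked_actions = states.clone(), actions.clone()
    masked_states[state_mask] = mask_token
    masked_actions[action_mask] = mask_token
    return masked_states, masked_actions, state_mask, action_mask
```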
no code implementations • 3 Mar 2023 • Jiatai Huang, Leana Golubchik, Longbo Huang
In this paper, we study scheduling of a queueing system with zero knowledge of instantaneous network conditions.
1 code implementation • 2 Mar 2023 • Xinran Gu, Kaifeng Lyu, Longbo Huang, Sanjeev Arora
Local SGD is a communication-efficient variant of SGD for large-scale training, where multiple GPUs perform SGD independently and average the model parameters periodically.
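A single-process sketch of one Local SGD round, assuming a list of identically initialized models, each with its own optimizer and data stream; real deployments replace the explicit averaging loop with a distributed all-reduce.

```python
import torch

def local_sgd_round(models, optimizers, data_iters, loss_fn, local_steps):
    """Each worker runs `local_steps` of SGD independently, then all
    workers average their parameters (the only communication step)."""
    for model, opt, data in zip(models, optimizers, data_iters):
        for _ in range(local_steps):
            x, y = next(data)
            opt.zero_grad()
            loss_fn(model(x), y).backward()
            opt.step()
    with torch.no_grad():
        # Average every parameter tensor across workers.
        avg = {k: sum(m.state_dict()[k] for m in models) / len(models)
               for k in models[0].state_dict()}
        for m in models:
            m.load_state_dict(avg)
```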
1 code implementation • 19 Feb 2023 • Ling Pan, Dinghuai Zhang, Moksh Jain, Longbo Huang, Yoshua Bengio
Generative Flow Networks (or GFlowNets for short) are a family of probabilistic agents that learn to sample complex combinatorial structures through the lens of "inference as control".
no code implementations • 9 Feb 2023 • Yihan Du, Longbo Huang, Wen Sun
In these two problems, all tasks share a common low-dimensional linear representation, and our goal is to leverage this feature to accelerate the best arm (policy) identification process for all tasks.
no code implementations • 25 Jan 2023 • Jiatai Huang, Yan Dai, Longbo Huang
\texttt{Banker-OMD} leads to the first delayed scale-free adversarial MAB algorithm achieving $\widetilde{\mathcal O}(\sqrt{K}L(\sqrt T+\sqrt D))$ regret and the first delayed adversarial linear bandit algorithm achieving $\widetilde{\mathcal O}(\text{poly}(n)(\sqrt{T} + \sqrt{D}))$ regret.
no code implementations • 16 Nov 2022 • Yihan Du, Siwei Wang, Longbo Huang
DoublerBAI provides a generic schema for translating known results on best arm identification algorithms to the dueling bandit problem, and achieves a regret bound of $O(\ln T)$.
no code implementations • 7 Oct 2022 • Ling Pan, Dinghuai Zhang, Aaron Courville, Longbo Huang, Yoshua Bengio
We specify intermediate rewards by intrinsic motivation to tackle the exploration problem in sparse reward environments.
1 code implementation • 30 Aug 2022 • Pihe Hu, Ling Pan, Yu Chen, Zhixuan Fang, Longbo Huang
Multi-user delay-constrained scheduling is important in many real-world applications including wireless communication, live streaming, and cloud computing.
no code implementations • 23 Jun 2022 • Pihe Hu, Yu Chen, Longbo Huang
We study reinforcement learning with linear function approximation where the transition probability and reward functions are linear with respect to a feature mapping $\boldsymbol{\phi}(s, a)$.
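Concretely, in the standard linear MDP model this means (stated here for reference; the paper may study a variant):

$$\mathbb{P}(s' \mid s, a) = \big\langle \boldsymbol{\phi}(s, a), \boldsymbol{\mu}(s') \big\rangle, \qquad r(s, a) = \big\langle \boldsymbol{\phi}(s, a), \boldsymbol{\theta} \big\rangle,$$

where $\boldsymbol{\mu}$ is a vector of unknown measures over states and $\boldsymbol{\theta}$ an unknown parameter vector, so only the feature mapping $\boldsymbol{\phi}$ is known to the learner.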
no code implementations • 18 Jun 2022 • Yu Huang, Yingbin Liang, Longbo Huang
Despite the superior empirical success of deep meta-learning, theoretical understanding of overparameterized meta-learning is still limited.
no code implementations • 6 Jun 2022 • Yihan Du, Siwei Wang, Longbo Huang
For Worst Path RL, we propose an efficient algorithm with constant upper and lower bounds.
1 code implementation • 30 May 2022 • Yiqin Tan, Pihe Hu, Ling Pan, Jiatai Huang, Longbo Huang
Training deep reinforcement learning (DRL) models usually requires high computation costs.
no code implementations • 19 Apr 2022 • Zhuoran Li, Xing Wang, Ling Pan, Lin Zhu, Zhendong Wang, Junlan Feng, Chao Deng, Longbo Huang
A2C-GS consists of three novel components, including a verifier to validate the correctness of a generated network topology, a graph neural network (GNN) to efficiently approximate topology rating, and a DRL actor layer to conduct a topology search.
no code implementations • 23 Mar 2022 • Yu Huang, Junyang Lin, Chang Zhou, Hongxia Yang, Longbo Huang
Recently, it has been observed that the best uni-modal network outperforms the jointly trained multi-modal network, which is counter-intuitive since multiple signals generally bring more information.
no code implementations • 28 Jan 2022 • Jiatai Huang, Yan Dai, Longbo Huang
Specifically, we design an algorithm \texttt{HTINF}: when the heavy-tail parameters $\alpha$ and $\sigma$ are known to the agent, \texttt{HTINF} simultaneously achieves the optimal regret for both stochastic and adversarial environments, without knowing the actual environment type a priori.
1 code implementation • NeurIPS 2021 • Ling Pan, Tabish Rashid, Bei Peng, Longbo Huang, Shimon Whiteson
Tackling overestimation in $Q$-learning is an important problem that has been extensively studied in single-agent reinforcement learning, but has received comparatively little attention in the multi-agent setting.
1 code implementation • 22 Nov 2021 • Ling Pan, Longbo Huang, Tengyu Ma, Huazhe Xu
Conservatism has led to significant progress in offline reinforcement learning (RL) where an agent learns from pre-collected datasets.
no code implementations • 15 Nov 2021 • Qingsong Liu, Wenfei Wu, Longbo Huang, Zhixuan Fang
In this paper, we develop a novel virtual-queue-based online algorithm for online convex optimization (OCO) problems with long-term and time-varying constraints and conduct a performance analysis with respect to the dynamic regret and constraint violations.
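A generic virtual-queue recursion of the kind such algorithms maintain (notation illustrative, not necessarily the paper's exact update): each constraint $i$ gets a queue

$$Q_i(t+1) = \max\big\{Q_i(t) + g_i(x_t),\ 0\big\},$$

where $g_i(x_t)$ is the instantaneous violation of constraint $i$ by the decision $x_t$; the queue accumulates violation and is weighted against the loss when choosing the next decision, so large backlogs push the algorithm back toward feasibility.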
no code implementations • 29 Oct 2021 • Yihan Du, Wei Chen, Yuko Kuroki, Longbo Huang
In this paper, we formulate a Collaborative Pure Exploration in Kernel Bandit problem (CoPE-KB), which provides a novel model for multi-agent multi-task decision making under limited communication and general reward functions, and is applicable to many online learning tasks, e.g., recommendation systems and network scheduling.
no code implementations • 26 Oct 2021 • Jiatai Huang, Yan Dai, Longbo Huang
We consider the Scale-Free Adversarial Multi-Armed Bandit (MAB) problem with unrestricted feedback delays.
no code implementations • 16 Jun 2021 • Jiatai Huang, Longbo Huang
In particular, it leads to the first delayed adversarial linear bandit algorithm achieving $\tilde{O}(\text{poly}(n)(\sqrt{T} + \sqrt{D}))$ regret.
no code implementations • NeurIPS 2021 • Yu Huang, Chenzhuang Du, Zihui Xue, Xuanyao Chen, Hang Zhao, Longbo Huang
The world provides us with data of multiple modalities.
no code implementations • NeurIPS 2021 • Tiancheng Jin, Longbo Huang, Haipeng Luo
We consider the best-of-both-worlds problem for learning an episodic Markov Decision Process through $T$ episodes, with the goal of achieving $\widetilde{\mathcal{O}}(\sqrt{T})$ regret when the losses are adversarial and simultaneously $\mathcal{O}(\text{polylog}(T))$ regret when the losses are (almost) stochastic.
1 code implementation • NeurIPS 2021 • Xinran Gu, Kaixuan Huang, Jingzhao Zhang, Longbo Huang
In this case, the convergence of popular federated learning (FL) algorithms such as FedAvg is severely hampered by straggling devices.
no code implementations • NeurIPS 2021 • Yihan Du, Siwei Wang, Zhixuan Fang, Longbo Huang
To the best of our knowledge, this is the first work that considers option correlation in risk-aware bandits and explicitly quantifies how arbitrary covariance structures impact the learning performance.
no code implementations • 14 Dec 2020 • Yihan Du, Siwei Wang, Longbo Huang
In this paper, we study a family of conservative bandit problems (CBPs) with sample-path reward constraints, i.e., the learner's reward performance must be at least as good as a given baseline at any time.
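In the usual conservative-bandit formulation, such a sample-path constraint reads (notation illustrative):

$$\sum_{s=1}^{t} r_s \;\ge\; (1-\alpha)\sum_{s=1}^{t} r^{b}_{s} \qquad \text{for all } t,$$

where $r^{b}_{s}$ is the baseline's reward at step $s$ and $\alpha \in (0,1)$ is the slack the learner is allowed relative to the baseline.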
no code implementations • 13 Dec 2020 • Siwei Wang, Haoyun Wang, Longbo Huang
Existing results on this model require prior knowledge about the reward interval size as an input to their algorithms.
no code implementations • NeurIPS 2020 • Siwei Wang, Longbo Huang, John C. S. Lui
Compared to existing algorithms, our result eliminates the exponential factor (in $M, N$) in the regret upper bound, due to a novel exploitation of the sparsity in transitions in general restless bandit problems.
1 code implementation • NeurIPS 2020 • Ling Pan, Qingpeng Cai, Longbo Huang
A widely-used actor-critic reinforcement learning algorithm for continuous control, Deep Deterministic Policy Gradients (DDPG), suffers from the overestimation problem, which can negatively affect the performance.
1 code implementation • NeurIPS 2021 • Yiheng Lin, Guannan Qu, Longbo Huang, Adam Wierman
We study multi-agent reinforcement learning (MARL) in a stochastic network of agents.
no code implementations • 11 Jun 2020 • Chuheng Zhang, Yuanying Cai, Longbo Huang, Jian Li
In the planning phase, the agent computes a good policy for any reward function based on the dataset without further interacting with the environment.
no code implementations • 11 Nov 2019 • Ling Pan, Qingpeng Cai, Longbo Huang
Recent years have witnessed tremendous improvements in deep reinforcement learning.
1 code implementation • 14 Mar 2019 • Ling Pan, Qingpeng Cai, Qi Meng, Wei Chen, Longbo Huang, Tie-Yan Liu
In this paper, we propose to update the value function with dynamic Boltzmann softmax (DBS) operator, which has good convergence property in the setting of planning and learning.
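The operator itself is a softmax-weighted average of the Q-values; with an inverse temperature $\beta_t$ that grows over time, it approaches the max operator, which is the intuition behind the convergence property. A minimal sketch of the operator follows (the dynamic schedule for $\beta_t$ is the paper's contribution and is not reproduced here):

```python
import numpy as np

def boltzmann_softmax(q_values, beta):
    """Softmax-weighted average of Q-values at inverse temperature beta."""
    w = np.exp(beta * (q_values - np.max(q_values)))  # shift for stability
    w /= w.sum()
    return float(np.dot(w, q_values))

# As beta grows, the operator approaches max(q) = 3.0.
q = np.array([1.0, 2.0, 3.0])
for beta in (0.1, 1.0, 10.0, 100.0):
    print(f"beta={beta:>5}: {boltzmann_softmax(q, beta):.4f}")
```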
no code implementations • NeurIPS 2018 • Siwei Wang, Longbo Huang
We propose and study the known-compensation multi-armed bandit (KCMAB) problem, where a system controller offers a set of arms to many short-term players for $T$ steps.
no code implementations • NeurIPS 2019 • Yue Yu, Jiaxiang Wu, Longbo Huang
In this paper, to reduce the communication complexity, we propose \emph{double quantization}, a general scheme for quantizing both model parameters and gradients.
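A sketch of the idea using simple unbiased stochastic quantization on both communication directions; the paper's actual schemes and their error analysis are more refined, and all names here are illustrative.

```python
import torch

def stochastic_quantize(x, levels=256):
    """Quantize a tensor onto a uniform grid with randomized rounding,
    which keeps the quantizer unbiased in expectation."""
    lo, hi = x.min(), x.max()
    scale = (hi - lo) / (levels - 1) + 1e-12
    idx = torch.floor((x - lo) / scale + torch.rand_like(x))
    return lo + idx * scale

param = torch.randn(1000)             # stand-in for a parameter block
grad = torch.randn(1000)              # stand-in for its gradient
param_q = stochastic_quantize(param)  # compress server -> workers
grad_q = stochastic_quantize(grad)    # compress workers -> server
```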
no code implementations • 4 May 2018 • Kun Chen, Kechao Cai, Longbo Huang, John C. S. Lui
The web link selection problem is to select a small subset of web links from a large web link pool, and to place the selected links on a web page that can only accommodate a limited number of links, e.g., advertisements, recommendations, or news feeds.
no code implementations • 13 Feb 2018 • Ling Pan, Qingpeng Cai, Zhixuan Fang, Pingzhong Tang, Longbo Huang
Different from existing methods that often ignore spatial information and rely heavily on accurate prediction, HRP captures both spatial and temporal dependencies using a divide-and-conquer structure with an embedded localized module.
no code implementations • 8 Sep 2017 • Kechao Cai, Kun Chen, Longbo Huang, John C. S. Lui
To the best of our knowledge, we are the first to model the link selection problem as a constrained multi-armed bandit problem and to design an effective link selection algorithm by learning the links' multi-level structure, with provable \emph{sub-linear} regret and violation bounds.
no code implementations • 11 May 2017 • Yue Yu, Longbo Huang
We consider the stochastic composition optimization problem proposed in \cite{wang2017stochastic}, which has applications ranging from estimation to statistical and machine learning.
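For reference, the problem takes the form

$$\min_{x}\; f(x) = \mathbb{E}_{v}\Big[f_v\big(\mathbb{E}_{w}[\,g_w(x)\,]\big)\Big],$$

i.e., the composition of two expected-value functions; the inner expectation makes unbiased stochastic gradients of $f$ unavailable, which is what distinguishes this setting from ordinary stochastic optimization.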
no code implementations • 6 Apr 2014 • Longbo Huang, Xin Liu, Xiaohong Hao
We prove strong performance guarantees of the proposed algorithms: $\mathtt{OLAC}$ and $\mathtt{OLAC2}$ achieve the near-optimal $[O(\epsilon), O([\log(1/\epsilon)]^2)]$ utility-delay tradeoff and $\mathtt{OLAC2}$ possesses an $O(\epsilon^{-2/3})$ convergence time.