Search Results for author: Longbo Huang

Found 59 papers, 13 papers with code

Combinatorial Pure Exploration for Dueling Bandit

no code implementations ICML 2020 Wei Chen, Yihan Du, Longbo Huang, Haoyu Zhao

For Borda winner, we establish a reduction of the problem to the original CPE-MAB setting and design PAC and exact algorithms that achieve both the sample complexity similar to that in the CPE-MAB setting (which is nearly optimal for a subclass of problems) and polynomial running time per round.

Position

Offline-to-Online Multi-Agent Reinforcement Learning with Offline Value Function Memory and Sequential Exploration

no code implementations 25 Oct 2024 Hai Zhong, Xun Wang, Zhuoran Li, Longbo Huang

Offline-to-Online Reinforcement Learning has emerged as a powerful paradigm, leveraging offline data for initialization and online fine-tuning to enhance both sample efficiency and performance.

Efficient Exploration reinforcement-learning +3

uniINF: Best-of-Both-Worlds Algorithm for Parameter-Free Heavy-Tailed MABs

no code implementations 4 Oct 2024 Yu Chen, Jiatai Huang, Yan Dai, Longbo Huang

To our knowledge, uniINF is the first parameter-free algorithm to achieve the BoBW property for the heavy-tailed MAB problem.

Multi-Armed Bandits Scheduling

Beyond Squared Error: Exploring Loss Design for Enhanced Training of Generative Flow Networks

no code implementations 3 Oct 2024 Rui Hu, Yifan Zhang, Zhuoran Li, Longbo Huang

In general, GFlowNets are trained by fitting the forward flow to the backward flow on sampled training objects.

Diversity regression
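
The flow-fitting objective described in the abstract above can be illustrated with a minimal sketch. Trajectory balance is one common instantiation of fitting forward flow to backward flow with a squared-error penalty; the function name and all numbers here are illustrative, not the paper's implementation:

```python
def trajectory_balance_loss(log_Z, log_pf, log_pb, log_reward):
    """Squared error between log forward flow and log backward flow
    along one sampled trajectory (trajectory-balance form)."""
    delta = log_Z + sum(log_pf) - log_reward - sum(log_pb)
    return delta ** 2

# toy numbers, purely for illustration
loss = trajectory_balance_loss(
    log_Z=1.0,
    log_pf=[-0.5, -0.7],   # per-step log forward probabilities
    log_pb=[-0.6, -0.6],   # per-step log backward probabilities
    log_reward=0.2,
)
```

The paper's point is that the squared penalty on the residual `delta` is a design choice; other losses on the same residual may train better.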

Value-Based Deep Multi-Agent Reinforcement Learning with Dynamic Sparse Training

no code implementations 28 Sep 2024 Pihe Hu, Shaolong Li, Zhuoran Li, Ling Pan, Longbo Huang

However, a direct adoption of DST fails to yield satisfactory MARL agents, leading to breakdowns in value learning within deep sparse value-based MARL models.

Model Compression Multi-agent Reinforcement Learning

Mixed Sparsity Training: Achieving 4$\times$ FLOP Reduction for Transformer Pretraining

no code implementations 21 Aug 2024 Pihe Hu, Shaolong Li, Longbo Huang

Throughout these phases, the model is trained with a dynamically evolving sparse topology and an HSA mechanism to maintain performance and minimize training FLOPs concurrently.

RL-CFR: Improving Action Abstraction for Imperfect Information Extensive-Form Games with Reinforcement Learning

no code implementations 7 Mar 2024 Boning Li, Zhixuan Fang, Longbo Huang

Effective action abstraction is crucial in tackling challenges associated with large action spaces in Imperfect Information Extensive-Form Games (IIEFGs).

counterfactual Reinforcement Learning (RL)

Provably Efficient Partially Observable Risk-Sensitive Reinforcement Learning with Hindsight Observation

no code implementations 28 Feb 2024 Tonghe Zhang, Yu Chen, Longbo Huang

This work pioneers regret analysis of risk-sensitive reinforcement learning in partially observable environments with hindsight observation, addressing a gap in theoretical exploration.

reinforcement-learning Reinforcement Learning

Provable Risk-Sensitive Distributional Reinforcement Learning with General Function Approximation

no code implementations 28 Feb 2024 Yu Chen, Xiangcheng Zhang, Siwei Wang, Longbo Huang

In this paper, we introduce a general framework on Risk-Sensitive Distributional Reinforcement Learning (RS-DisRL), with static Lipschitz Risk Measures (LRM) and general function approximation.

Distributional Reinforcement Learning reinforcement-learning +2

LCM-LoRA: A Universal Stable-Diffusion Acceleration Module

2 code implementations 9 Nov 2023 Simian Luo, Yiqin Tan, Suraj Patil, Daniel Gu, Patrick von Platen, Apolinário Passos, Longbo Huang, Jian Li, Hang Zhao

Latent Consistency Models (LCMs) have achieved impressive performance in accelerating text-to-image generative tasks, producing high-quality images with minimal inference steps.

Image Generation

A Quadratic Synchronization Rule for Distributed Deep Learning

1 code implementation 22 Oct 2023 Xinran Gu, Kaifeng Lyu, Sanjeev Arora, Jingzhao Zhang, Longbo Huang

In distributed deep learning with data parallelism, synchronizing gradients at each training step can cause a huge communication overhead, especially when many nodes work together to train large models.

Deep Learning

One is More: Diverse Perspectives within a Single Network for Efficient DRL

no code implementations 21 Oct 2023 Yiqin Tan, Ling Pan, Longbo Huang

Deep reinforcement learning has achieved remarkable performance in various domains by leveraging deep neural networks for approximating value functions and policies.

Deep Reinforcement Learning reinforcement-learning

Latent Consistency Models: Synthesizing High-Resolution Images with Few-Step Inference

4 code implementations 6 Oct 2023 Simian Luo, Yiqin Tan, Longbo Huang, Jian Li, Hang Zhao

Inspired by Consistency Models (Song et al.), we propose Latent Consistency Models (LCMs), enabling swift inference with minimal steps on any pre-trained LDMs, including Stable Diffusion (Rombach et al.).

Text-to-Image Generation

Beyond Conservatism: Diffusion Policies in Offline Multi-agent Reinforcement Learning

no code implementations 4 Jul 2023 Zhuoran Li, Ling Pan, Longbo Huang

We present a novel Diffusion Offline Multi-agent Model (DOM2) for offline Multi-Agent Reinforcement Learning (MARL).

Data Augmentation Diversity +3

RePreM: Representation Pre-training with Masked Model for Reinforcement Learning

no code implementations 3 Mar 2023 Yuanying Cai, Chuheng Zhang, Wei Shen, Xuyun Zhang, Wenjie Ruan, Longbo Huang

Inspired by the recent success of sequence modeling in RL and the use of masked language model for pre-training, we propose a masked model for pre-training in RL, RePreM (Representation Pre-training with Masked Model), which trains the encoder combined with transformer blocks to predict the masked states or actions in a trajectory.

Data Augmentation Language Modelling +4

Queue Scheduling with Adversarial Bandit Learning

no code implementations 3 Mar 2023 Jiatai Huang, Leana Golubchik, Longbo Huang

In this paper, we study scheduling of a queueing system with zero knowledge of instantaneous network conditions.

Multi-Armed Bandits Scheduling

Why (and When) does Local SGD Generalize Better than SGD?

1 code implementation 2 Mar 2023 Xinran Gu, Kaifeng Lyu, Longbo Huang, Sanjeev Arora

Local SGD is a communication-efficient variant of SGD for large-scale training, where multiple GPUs perform SGD independently and average the model parameters periodically.
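The scheme described above can be sketched in a few lines: each worker takes several independent SGD steps, and the only communication is a periodic parameter average. The objective, hyperparameters, and noise model below are toy choices for illustration, not the paper's setup:

```python
import numpy as np

def local_sgd(grad_fn, x0, n_workers=4, rounds=3, local_steps=5, lr=0.1, seed=0):
    """Each worker runs `local_steps` independent SGD steps on noisy
    gradients; parameters are averaged once per round (the only
    communication step)."""
    rng = np.random.default_rng(seed)
    workers = [x0.astype(float).copy() for _ in range(n_workers)]
    for _ in range(rounds):
        for w in range(n_workers):
            for _ in range(local_steps):
                noise = rng.normal(scale=0.01, size=x0.shape)
                workers[w] = workers[w] - lr * (grad_fn(workers[w]) + noise)
        avg = sum(workers) / n_workers          # synchronize parameters
        workers = [avg.copy() for _ in range(n_workers)]
    return workers[0]

# gradient of f(x) = ||x||^2 / 2 is x, so iterates contract toward 0
x = local_sgd(lambda x: x, np.ones(3))
```

Increasing `local_steps` reduces communication at the cost of letting workers drift apart between averages, which is exactly the generalization trade-off the paper studies.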

Stochastic Generative Flow Networks

1 code implementation 19 Feb 2023 Ling Pan, Dinghuai Zhang, Moksh Jain, Longbo Huang, Yoshua Bengio

Generative Flow Networks (or GFlowNets for short) are a family of probabilistic agents that learn to sample complex combinatorial structures through the lens of "inference as control".

Multi-task Representation Learning for Pure Exploration in Linear Bandits

no code implementations 9 Feb 2023 Yihan Du, Longbo Huang, Wen Sun

In these two problems, all tasks share a common low-dimensional linear representation, and our goal is to leverage this feature to accelerate the best arm (policy) identification process for all tasks.

Decision Making Representation Learning +1

Banker Online Mirror Descent: A Universal Approach for Delayed Online Bandit Learning

no code implementations 25 Jan 2023 Jiatai Huang, Yan Dai, Longbo Huang

\texttt{Banker-OMD} leads to the first delayed scale-free adversarial MAB algorithm achieving $\widetilde{\mathcal O}(\sqrt{K}L(\sqrt T+\sqrt D))$ regret and the first delayed adversarial linear bandit algorithm achieving $\widetilde{\mathcal O}(\text{poly}(n)(\sqrt{T} + \sqrt{D}))$ regret.

Multi-Armed Bandits

Dueling Bandits: From Two-dueling to Multi-dueling

no code implementations 16 Nov 2022 Yihan Du, Siwei Wang, Longbo Huang

DoublerBAI provides a generic schema for translating known results on best arm identification algorithms to the dueling bandit problem, and achieves a regret bound of $O(\ln T)$.

Vocal Bursts Valence Prediction

Generative Augmented Flow Networks

no code implementations 7 Oct 2022 Ling Pan, Dinghuai Zhang, Aaron Courville, Longbo Huang, Yoshua Bengio

We specify intermediate rewards by intrinsic motivation to tackle the exploration problem in sparse reward environments.

Diversity

Effective Multi-User Delay-Constrained Scheduling with Deep Recurrent Reinforcement Learning

1 code implementation 30 Aug 2022 Pihe Hu, Ling Pan, Yu Chen, Zhixuan Fang, Longbo Huang

Multi-user delay-constrained scheduling is important in many real-world applications including wireless communication, live streaming, and cloud computing.

Cloud Computing Deep Reinforcement Learning +3

Nearly Minimax Optimal Reinforcement Learning with Linear Function Approximation

no code implementations 23 Jun 2022 Pihe Hu, Yu Chen, Longbo Huang

We study reinforcement learning with linear function approximation where the transition probability and reward functions are linear with respect to a feature mapping $\boldsymbol{\phi}(s, a)$.

reinforcement-learning Reinforcement Learning +1

Provable Generalization of Overparameterized Meta-learning Trained with SGD

no code implementations 18 Jun 2022 Yu Huang, Yingbin Liang, Longbo Huang

Despite the superior empirical success of deep meta-learning, theoretical understanding of overparameterized meta-learning is still limited.

Generalization Bounds Meta-Learning

Network Topology Optimization via Deep Reinforcement Learning

no code implementations 19 Apr 2022 Zhuoran Li, Xing Wang, Ling Pan, Lin Zhu, Zhendong Wang, Junlan Feng, Chao Deng, Longbo Huang

A2C-GS consists of three novel components, including a verifier to validate the correctness of a generated network topology, a graph neural network (GNN) to efficiently approximate topology rating, and a DRL actor layer to conduct a topology search.

Deep Reinforcement Learning Graph Neural Network +3

Modality Competition: What Makes Joint Training of Multi-modal Network Fail in Deep Learning? (Provably)

no code implementations 23 Mar 2022 Yu Huang, Junyang Lin, Chang Zhou, Hongxia Yang, Longbo Huang

Recently, it has been observed that the best uni-modal network outperforms the jointly trained multi-modal network, which is counter-intuitive since multiple signals generally bring more information.

Adaptive Best-of-Both-Worlds Algorithm for Heavy-Tailed Multi-Armed Bandits

no code implementations 28 Jan 2022 Jiatai Huang, Yan Dai, Longbo Huang

Specifically, we design an algorithm \texttt{HTINF}: when the heavy-tail parameters $\alpha$ and $\sigma$ are known to the agent, \texttt{HTINF} simultaneously achieves the optimal regret for both stochastic and adversarial environments, without knowing the actual environment type a priori.

Multi-Armed Bandits

Regularized Softmax Deep Multi-Agent Q-Learning

1 code implementation NeurIPS 2021 Ling Pan, Tabish Rashid, Bei Peng, Longbo Huang, Shimon Whiteson

Tackling overestimation in $Q$-learning is an important problem that has been extensively studied in single-agent reinforcement learning, but has received comparatively little attention in the multi-agent setting.

Multi-agent Reinforcement Learning Q-Learning +5

Plan Better Amid Conservatism: Offline Multi-Agent Reinforcement Learning with Actor Rectification

1 code implementation 22 Nov 2021 Ling Pan, Longbo Huang, Tengyu Ma, Huazhe Xu

Conservatism has led to significant progress in offline reinforcement learning (RL) where an agent learns from pre-collected datasets.

Continuous Control Multi-agent Reinforcement Learning +3

Simultaneously Achieving Sublinear Regret and Constraint Violations for Online Convex Optimization with Time-varying Constraints

no code implementations 15 Nov 2021 Qingsong Liu, Wenfei Wu, Longbo Huang, Zhixuan Fang

In this paper, we develop a novel virtual-queue-based online algorithm for online convex optimization (OCO) problems with long-term and time-varying constraints and conduct a performance analysis with respect to the dynamic regret and constraint violations.

Collaborative Pure Exploration in Kernel Bandit

no code implementations 29 Oct 2021 Yihan Du, Wei Chen, Yuko Kuroki, Longbo Huang

In this paper, we formulate a Collaborative Pure Exploration in Kernel Bandit problem (CoPE-KB), which provides a novel model for multi-agent multi-task decision making under limited communication and general reward functions, and is applicable to many online learning tasks, e.g., recommendation systems and network scheduling.

Decision Making Recommendation Systems +1

Scale-Free Adversarial Multi-Armed Bandit with Arbitrary Feedback Delays

no code implementations 26 Oct 2021 Jiatai Huang, Yan Dai, Longbo Huang

We consider the Scale-Free Adversarial Multi-Armed Bandit (MAB) problem with unrestricted feedback delays.

Banker Online Mirror Descent

no code implementations 16 Jun 2021 Jiatai Huang, Longbo Huang

In particular, it leads to the first delayed adversarial linear bandit algorithm achieving $\tilde{O}(\text{poly}(n)(\sqrt{T} + \sqrt{D}))$ regret.

Multi-Armed Bandits

The best of both worlds: stochastic and adversarial episodic MDPs with unknown transition

no code implementations NeurIPS 2021 Tiancheng Jin, Longbo Huang, Haipeng Luo

We consider the best-of-both-worlds problem for learning an episodic Markov Decision Process through $T$ episodes, with the goal of achieving $\widetilde{\mathcal{O}}(\sqrt{T})$ regret when the losses are adversarial and simultaneously $\mathcal{O}(\text{polylog}(T))$ regret when the losses are (almost) stochastic.

Open-Ended Question Answering

Fast Federated Learning in the Presence of Arbitrary Device Unavailability

1 code implementation NeurIPS 2021 Xinran Gu, Kaixuan Huang, Jingzhao Zhang, Longbo Huang

In this case, the convergence of popular FL algorithms such as FedAvg is severely influenced by the straggling devices.

Federated Learning

Continuous Mean-Covariance Bandits

no code implementations NeurIPS 2021 Yihan Du, Siwei Wang, Zhixuan Fang, Longbo Huang

To the best of our knowledge, this is the first work that considers option correlation in risk-aware bandits and explicitly quantifies how arbitrary covariance structures impact the learning performance.

Decision Making

A One-Size-Fits-All Solution to Conservative Bandit Problems

no code implementations 14 Dec 2020 Yihan Du, Siwei Wang, Longbo Huang

In this paper, we study a family of conservative bandit problems (CBPs) with sample-path reward constraints, i.e., the learner's reward performance must be at least as good as a given baseline at any time.

Multi-Armed Bandits
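
The sample-path constraint in the abstract can be illustrated with a simple feasibility check. This is a simplified, assumed form of the constraint; the function name, `alpha`, and the arguments are all illustrative:

```python
def exploration_allowed(cum_reward, t, baseline_mean, worst_case_pull, alpha=0.1):
    """Return True if pulling an exploratory arm cannot violate the
    conservative constraint: cumulative reward must stay at least
    (1 - alpha) times what the baseline would have earned after t+1 pulls."""
    return cum_reward + worst_case_pull >= (1 - alpha) * baseline_mean * (t + 1)
```

A conservative algorithm plays the baseline arm whenever this check fails, and explores otherwise.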

Adaptive Algorithms for Multi-armed Bandit with Composite and Anonymous Feedback

no code implementations 13 Dec 2020 Siwei Wang, Haoyun Wang, Longbo Huang

Existing results on this model require prior knowledge about the reward interval size as an input to their algorithms.

Restless-UCB, an Efficient and Low-complexity Algorithm for Online Restless Bandits

no code implementations NeurIPS 2020 Siwei Wang, Longbo Huang, John C. S. Lui

Compared to existing algorithms, our result eliminates the exponential factor (in $M, N$) in the regret upper bound, due to a novel exploitation of the sparsity in transitions in general restless bandit problems.

Softmax Deep Double Deterministic Policy Gradients

1 code implementation NeurIPS 2020 Ling Pan, Qingpeng Cai, Longbo Huang

A widely-used actor-critic reinforcement learning algorithm for continuous control, Deep Deterministic Policy Gradients (DDPG), suffers from the overestimation problem, which can negatively affect the performance.

continuous-control Continuous Control

Exploration by Maximizing Rényi Entropy for Reward-Free RL Framework

no code implementations 11 Jun 2020 Chuheng Zhang, Yuanying Cai, Longbo Huang, Jian Li

In the planning phase, the agent computes a good policy for any reward function based on the dataset without further interacting with the environment.

Q-Learning Reinforcement Learning (RL)

Multi-Path Policy Optimization

no code implementations 11 Nov 2019 Ling Pan, Qingpeng Cai, Longbo Huang

Recent years have witnessed tremendous improvements in deep reinforcement learning.

Deep Reinforcement Learning Efficient Exploration

Reinforcement Learning with Dynamic Boltzmann Softmax Updates

1 code implementation 14 Mar 2019 Ling Pan, Qingpeng Cai, Qi Meng, Wei Chen, Longbo Huang, Tie-Yan Liu

In this paper, we propose to update the value function with dynamic Boltzmann softmax (DBS) operator, which has good convergence property in the setting of planning and learning.

Atari Games Q-Learning +3
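
The Boltzmann softmax operator underlying DBS can be written in a few lines. The "dynamic" part is an increasing schedule for `beta` over training, which is omitted here; this is a hedged sketch, not the paper's code:

```python
import math

def boltzmann_softmax(q_values, beta):
    """Smooth stand-in for max over Q-values: recovers the mean as
    beta -> 0 and approaches the max as beta -> infinity."""
    m = max(q_values)
    weights = [math.exp(beta * (q - m)) for q in q_values]  # shift by max for stability
    total = sum(weights)
    return sum(w * q for w, q in zip(weights, q_values)) / total

q = [1.0, 2.0, 3.0]
mean_like = boltzmann_softmax(q, 0.0)    # -> 2.0, the mean
max_like = boltzmann_softmax(q, 50.0)    # close to 3.0, the max
```

Growing `beta` during training moves the value update smoothly from a mean-like operator toward the hard max used in standard Q-learning.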

Multi-armed Bandits with Compensation

no code implementations NeurIPS 2018 Siwei Wang, Longbo Huang

We propose and study the known-compensation multi-armed bandit (KCMAB) problem, where a system controller offers a set of arms to many short-term players for $T$ steps.

Multi-Armed Bandits

Double Quantization for Communication-Efficient Distributed Optimization

no code implementations NeurIPS 2019 Yue Yu, Jiaxiang Wu, Longbo Huang

In this paper, to reduce the communication complexity, we propose \emph{double quantization}, a general scheme for quantizing both model parameters and gradients.

Distributed Optimization Quantization
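
An unbiased stochastic quantizer of the kind used in such schemes can be sketched as follows. This is a generic uniform quantizer, not the paper's exact construction; in double quantization the same idea is applied to both model parameters and gradients:

```python
import numpy as np

def stochastic_quantize(v, levels=16, seed=0):
    """Quantize each entry of v onto `levels` evenly spaced points in
    [-max|v|, max|v|], rounding up or down at random so the quantized
    value equals the original in expectation (unbiasedness)."""
    rng = np.random.default_rng(seed)
    v = np.asarray(v, dtype=float)
    scale = np.abs(v).max()
    if scale == 0.0:
        return v.copy()
    grid = (v / scale + 1.0) / 2.0 * (levels - 1)      # map to [0, levels-1]
    low = np.floor(grid)
    up = (rng.random(v.shape) < (grid - low)).astype(float)
    q = low + up                                       # stochastic rounding
    return (q / (levels - 1) * 2.0 - 1.0) * scale      # map back to original range

v = np.array([0.5, -1.0, 0.25])
qv = stochastic_quantize(v)
```

With 16 levels each entry needs only 4 bits plus one shared scale, and the per-entry error is bounded by one grid step, `2 * scale / (levels - 1)`.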

Beyond the Click-Through Rate: Web Link Selection with Multi-level Feedback

no code implementations 4 May 2018 Kun Chen, Kechao Cai, Longbo Huang, John C. S. Lui

The web link selection problem is to select a small subset of web links from a large web link pool, and to place the selected links on a web page that can only accommodate a limited number of links, e.g., advertisements, recommendations, or news feeds.

A Deep Reinforcement Learning Framework for Rebalancing Dockless Bike Sharing Systems

no code implementations 13 Feb 2018 Ling Pan, Qingpeng Cai, Zhixuan Fang, Pingzhong Tang, Longbo Huang

Different from existing methods that often ignore spatial information and rely heavily on accurate prediction, HRP captures both spatial and temporal dependencies using a divide-and-conquer structure with an embedded localized module.

Deep Reinforcement Learning reinforcement-learning +1

Multi-level Feedback Web Links Selection Problem: Learning and Optimization

no code implementations 8 Sep 2017 Kechao Cai, Kun Chen, Longbo Huang, John C. S. Lui

To the best of our knowledge, we are the first to model the links selection problem as a constrained multi-armed bandit problem and design an effective links selection algorithm by learning the links' multi-level structure with provable \emph{sub-linear} regret and violation bounds.

Fast Stochastic Variance Reduced ADMM for Stochastic Composition Optimization

no code implementations 11 May 2017 Yue Yu, Longbo Huang

We consider the stochastic composition optimization problem proposed in \cite{wang2017stochastic}, which has applications ranging from estimation to statistical and machine learning.

BIG-bench Machine Learning

The Power of Online Learning in Stochastic Network Optimization

no code implementations 6 Apr 2014 Longbo Huang, Xin Liu, Xiaohong Hao

We prove strong performance guarantees of the proposed algorithms: $\mathtt{OLAC}$ and $\mathtt{OLAC2}$ achieve the near-optimal $[O(\epsilon), O([\log(1/\epsilon)]^2)]$ utility-delay tradeoff and $\mathtt{OLAC2}$ possesses an $O(\epsilon^{-2/3})$ convergence time.
