Search Results for author: Longbo Huang

Found 53 papers, 13 papers with code

Combinatorial Pure Exploration for Dueling Bandit

no code implementations • ICML 2020 • Wei Chen, Yihan Du, Longbo Huang, Haoyu Zhao

For Borda winner, we establish a reduction of the problem to the original CPE-MAB setting and design PAC and exact algorithms that achieve both the sample complexity similar to that in the CPE-MAB setting (which is nearly optimal for a subclass of problems) and polynomial running time per round.

Position

RL-CFR: Improving Action Abstraction for Imperfect Information Extensive-Form Games with Reinforcement Learning

no code implementations • 7 Mar 2024 • Boning Li, Zhixuan Fang, Longbo Huang

Effective action abstraction is crucial in tackling challenges associated with large action spaces in Imperfect Information Extensive-Form Games (IIEFGs).

counterfactual Reinforcement Learning (RL)

Provably Efficient Partially Observable Risk-Sensitive Reinforcement Learning with Hindsight Observation

no code implementations • 28 Feb 2024 • Tonghe Zhang, Yu Chen, Longbo Huang

This work pioneers regret analysis of risk-sensitive reinforcement learning in partially observable environments with hindsight observation, addressing a gap in theoretical exploration.

reinforcement-learning

Provable Risk-Sensitive Distributional Reinforcement Learning with General Function Approximation

no code implementations • 28 Feb 2024 • Yu Chen, Xiangcheng Zhang, Siwei Wang, Longbo Huang

In this paper, we introduce a general framework on Risk-Sensitive Distributional Reinforcement Learning (RS-DisRL), with static Lipschitz Risk Measures (LRM) and general function approximation.

Distributional Reinforcement Learning reinforcement-learning +1

LCM-LoRA: A Universal Stable-Diffusion Acceleration Module

2 code implementations • 9 Nov 2023 • Simian Luo, Yiqin Tan, Suraj Patil, Daniel Gu, Patrick von Platen, Apolinário Passos, Longbo Huang, Jian Li, Hang Zhao

Latent Consistency Models (LCMs) have achieved impressive performance in accelerating text-to-image generative tasks, producing high-quality images with minimal inference steps.

Image Generation

4,046

A Quadratic Synchronization Rule for Distributed Deep Learning

1 code implementation • 22 Oct 2023 • Xinran Gu, Kaifeng Lyu, Sanjeev Arora, Jingzhao Zhang, Longbo Huang

In distributed deep learning with data parallelism, synchronizing gradients at each training step can cause a huge communication overhead, especially when many nodes work together to train large models.

One is More: Diverse Perspectives within a Single Network for Efficient DRL

no code implementations • 21 Oct 2023 • Yiqin Tan, Ling Pan, Longbo Huang

Deep reinforcement learning has achieved remarkable performance in various domains by leveraging deep neural networks for approximating value functions and policies.

reinforcement-learning

Latent Consistency Models: Synthesizing High-Resolution Images with Few-Step Inference

3 code implementations • 6 Oct 2023 • Simian Luo, Yiqin Tan, Longbo Huang, Jian Li, Hang Zhao

Inspired by Consistency Models (Song et al.), we propose Latent Consistency Models (LCMs), enabling swift inference with minimal steps on any pre-trained LDMs, including Stable Diffusion (Rombach et al.).

Text-to-Image Generation

4,046

Provably Efficient Iterated CVaR Reinforcement Learning with Function Approximation and Human Feedback

no code implementations • 6 Jul 2023 • Yu Chen, Yihan Du, Pihe Hu, Siwei Wang, Desheng Wu, Longbo Huang

Risk-sensitive reinforcement learning (RL) aims to optimize policies that balance the expected reward and risk.

Decision Making LEMMA +2

Beyond Conservatism: Diffusion Policies in Offline Multi-agent Reinforcement Learning

no code implementations • 4 Jul 2023 • Zhuoran Li, Ling Pan, Longbo Huang

We present a novel Diffusion Offline Multi-agent Model (DOM2) for offline Multi-Agent Reinforcement Learning (MARL).

Data Augmentation Multi-agent Reinforcement Learning +1

Queue Scheduling with Adversarial Bandit Learning

no code implementations • 3 Mar 2023 • Jiatai Huang, Leana Golubchik, Longbo Huang

In this paper, we study scheduling of a queueing system with zero knowledge of instantaneous network conditions.

Multi-Armed Bandits Scheduling

RePreM: Representation Pre-training with Masked Model for Reinforcement Learning

no code implementations • 3 Mar 2023 • Yuanying Cai, Chuheng Zhang, Wei Shen, Xuyun Zhang, Wenjie Ruan, Longbo Huang

Inspired by the recent success of sequence modeling in RL and the use of masked language model for pre-training, we propose a masked model for pre-training in RL, RePreM (Representation Pre-training with Masked Model), which trains the encoder combined with transformer blocks to predict the masked states or actions in a trajectory.

Data Augmentation Language Modelling +3

Why (and When) does Local SGD Generalize Better than SGD?

1 code implementation • 2 Mar 2023 • Xinran Gu, Kaifeng Lyu, Longbo Huang, Sanjeev Arora

Local SGD is a communication-efficient variant of SGD for large-scale training, where multiple GPUs perform SGD independently and average the model parameters periodically.
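
As a rough illustration of the scheme described in this abstract, the sketch below simulates Local SGD on a single machine: each worker runs several SGD steps on its own copy of the parameters, and the copies are averaged at the end of every round. The function names, the noisy quadratic objective, and all hyperparameter values are assumptions made for this example; they are not taken from the paper or its code.

```python
import random

def local_sgd(grad_fn, w0, num_workers=4, local_steps=8, rounds=50, lr=0.1, seed=0):
    """Simulate Local SGD: independent local steps, then periodic averaging."""
    rng = random.Random(seed)
    w = list(w0)
    for _ in range(rounds):
        replicas = []
        for _ in range(num_workers):
            # Each worker starts the round from the shared, averaged parameters.
            w_local = list(w)
            for _ in range(local_steps):
                g = grad_fn(w_local, rng)
                w_local = [wi - lr * gi for wi, gi in zip(w_local, g)]
            replicas.append(w_local)
        # Communication step: average the parameters coordinate-wise.
        w = [sum(col) / num_workers for col in zip(*replicas)]
    return w

# Hypothetical toy objective: 0.5 * ||w - TARGET||^2 with noisy gradients.
TARGET = [1.0, -2.0]

def noisy_quadratic_grad(w, rng, noise=0.1):
    return [(wi - ti) + noise * rng.gauss(0.0, 1.0) for wi, ti in zip(w, TARGET)]

w_final = local_sgd(noisy_quadratic_grad, w0=[0.0, 0.0])
print(w_final)  # should land close to TARGET
```

Setting `local_steps=1` makes the workers average after every step, which is the fully synchronized baseline that Local SGD is compared against.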

Stochastic Generative Flow Networks

1 code implementation • 19 Feb 2023 • Ling Pan, Dinghuai Zhang, Moksh Jain, Longbo Huang, Yoshua Bengio

Generative Flow Networks (or GFlowNets for short) are a family of probabilistic agents that learn to sample complex combinatorial structures through the lens of "inference as control".

Multi-task Representation Learning for Pure Exploration in Linear Bandits

no code implementations • 9 Feb 2023 • Yihan Du, Longbo Huang, Wen Sun

In these two problems, all tasks share a common low-dimensional linear representation, and our goal is to leverage this feature to accelerate the best arm (policy) identification process for all tasks.

Decision Making Representation Learning

Banker Online Mirror Descent: A Universal Approach for Delayed Online Bandit Learning

no code implementations • 25 Jan 2023 • Jiatai Huang, Yan Dai, Longbo Huang

\texttt{Banker-OMD} leads to the first delayed scale-free adversarial MAB algorithm achieving $\widetilde{\mathcal O}(\sqrt{K}L(\sqrt T+\sqrt D))$ regret and the first delayed adversarial linear bandit algorithm achieving $\widetilde{\mathcal O}(\text{poly}(n)(\sqrt{T} + \sqrt{D}))$ regret.

Multi-Armed Bandits

Dueling Bandits: From Two-dueling to Multi-dueling

no code implementations • 16 Nov 2022 • Yihan Du, Siwei Wang, Longbo Huang

DoublerBAI provides a generic schema for translating known results on best arm identification algorithms to the dueling bandit problem, and achieves a regret bound of $O(\ln T)$.

Vocal Bursts Valence Prediction

Generative Augmented Flow Networks

no code implementations • 7 Oct 2022 • Ling Pan, Dinghuai Zhang, Aaron Courville, Longbo Huang, Yoshua Bengio

We specify intermediate rewards by intrinsic motivation to tackle the exploration problem in sparse reward environments.

Effective Multi-User Delay-Constrained Scheduling with Deep Recurrent Reinforcement Learning

1 code implementation • 30 Aug 2022 • Pihe Hu, Ling Pan, Yu Chen, Zhixuan Fang, Longbo Huang

Multi-user delay constrained scheduling is important in many real-world applications including wireless communication, live streaming, and cloud computing.

Cloud Computing reinforcement-learning +2

Nearly Minimax Optimal Reinforcement Learning with Linear Function Approximation

no code implementations • 23 Jun 2022 • Pihe Hu, Yu Chen, Longbo Huang

We study reinforcement learning with linear function approximation where the transition probability and reward functions are linear with respect to a feature mapping $\boldsymbol{\phi}(s, a)$.

reinforcement-learning Reinforcement Learning (RL)

Provable Generalization of Overparameterized Meta-learning Trained with SGD

no code implementations • 18 Jun 2022 • Yu Huang, Yingbin Liang, Longbo Huang

Despite the superior empirical success of deep meta-learning, theoretical understanding of overparameterized meta-learning is still limited.

Generalization Bounds Meta-Learning

Provably Efficient Risk-Sensitive Reinforcement Learning: Iterated CVaR and Worst Path

no code implementations • 6 Jun 2022 • Yihan Du, Siwei Wang, Longbo Huang

For Worst Path RL, we propose an efficient algorithm with constant upper and lower bounds.

Autonomous Driving reinforcement-learning +1

RLx2: Training a Sparse Deep Reinforcement Learning Model from Scratch

1 code implementation • 30 May 2022 • Yiqin Tan, Pihe Hu, Ling Pan, Jiatai Huang, Longbo Huang

Training deep reinforcement learning (DRL) models usually requires high computation costs.

Continuous Control Knowledge Distillation +3

Network Topology Optimization via Deep Reinforcement Learning

no code implementations • 19 Apr 2022 • Zhuoran Li, Xing Wang, Ling Pan, Lin Zhu, Zhendong Wang, Junlan Feng, Chao Deng, Longbo Huang

A2C-GS consists of three novel components, including a verifier to validate the correctness of a generated network topology, a graph neural network (GNN) to efficiently approximate topology rating, and a DRL actor layer to conduct a topology search.

Management reinforcement-learning +1

Modality Competition: What Makes Joint Training of Multi-modal Network Fail in Deep Learning? (Provably)

no code implementations • 23 Mar 2022 • Yu Huang, Junyang Lin, Chang Zhou, Hongxia Yang, Longbo Huang

Recently, it has been observed that the best uni-modal network outperforms the jointly trained multi-modal network, which is counter-intuitive since multiple signals generally bring more information.

Adaptive Best-of-Both-Worlds Algorithm for Heavy-Tailed Multi-Armed Bandits

no code implementations • 28 Jan 2022 • Jiatai Huang, Yan Dai, Longbo Huang

Specifically, we design an algorithm \texttt{HTINF}: when the heavy-tail parameters $\alpha$ and $\sigma$ are known to the agent, \texttt{HTINF} simultaneously achieves the optimal regret for both stochastic and adversarial environments, without knowing the actual environment type a priori.

Multi-Armed Bandits

Regularized Softmax Deep Multi-Agent Q-Learning

1 code implementation • NeurIPS 2021 • Ling Pan, Tabish Rashid, Bei Peng, Longbo Huang, Shimon Whiteson

Tackling overestimation in $Q$-learning is an important problem that has been extensively studied in single-agent reinforcement learning, but has received comparatively little attention in the multi-agent setting.

Multi-agent Reinforcement Learning Q-Learning +4

Plan Better Amid Conservatism: Offline Multi-Agent Reinforcement Learning with Actor Rectification

1 code implementation • 22 Nov 2021 • Ling Pan, Longbo Huang, Tengyu Ma, Huazhe Xu

Conservatism has led to significant progress in offline reinforcement learning (RL) where an agent learns from pre-collected datasets.

Continuous Control Multi-agent Reinforcement Learning +3

Simultaneously Achieving Sublinear Regret and Constraint Violations for Online Convex Optimization with Time-varying Constraints

no code implementations • 15 Nov 2021 • Qingsong Liu, Wenfei Wu, Longbo Huang, Zhixuan Fang

In this paper, we develop a novel virtual-queue-based online algorithm for online convex optimization (OCO) problems with long-term and time-varying constraints and conduct a performance analysis with respect to the dynamic regret and constraint violations.

Collaborative Pure Exploration in Kernel Bandit

no code implementations • 29 Oct 2021 • Yihan Du, Wei Chen, Yuko Kuroki, Longbo Huang

In this paper, we formulate a Collaborative Pure Exploration in Kernel Bandit problem (CoPE-KB), which provides a novel model for multi-agent multi-task decision making under limited communication and general reward functions, and is applicable to many online learning tasks, e.g., recommendation systems and network scheduling.

Decision Making Recommendation Systems +1

Scale-Free Adversarial Multi-Armed Bandit with Arbitrary Feedback Delays

no code implementations • 26 Oct 2021 • Jiatai Huang, Yan Dai, Longbo Huang

We consider the Scale-Free Adversarial Multi-Armed Bandit (MAB) problem with unrestricted feedback delays.

Banker Online Mirror Descent

no code implementations • 16 Jun 2021 • Jiatai Huang, Longbo Huang

In particular, it leads to the first delayed adversarial linear bandit algorithm achieving $\tilde{O}(\text{poly}(n)(\sqrt{T} + \sqrt{D}))$ regret.

Multi-Armed Bandits

The best of both worlds: stochastic and adversarial episodic MDPs with unknown transition

no code implementations • NeurIPS 2021 • Tiancheng Jin, Longbo Huang, Haipeng Luo

We consider the best-of-both-worlds problem for learning an episodic Markov Decision Process through $T$ episodes, with the goal of achieving $\widetilde{\mathcal{O}}(\sqrt{T})$ regret when the losses are adversarial and simultaneously $\mathcal{O}(\text{polylog}(T))$ regret when the losses are (almost) stochastic.

Open-Ended Question Answering

Fast Federated Learning in the Presence of Arbitrary Device Unavailability

1 code implementation • NeurIPS 2021 • Xinran Gu, Kaixuan Huang, Jingzhao Zhang, Longbo Huang

In this case, the convergence of popular FL algorithms such as FedAvg is severely influenced by the straggling devices.

Federated Learning

What Makes Multi-modal Learning Better than Single (Provably)

no code implementations • NeurIPS 2021 • Yu Huang, Chenzhuang Du, Zihui Xue, Xuanyao Chen, Hang Zhao, Longbo Huang

The world provides us with data of multiple modalities.

Regularized Softmax Deep Multi-Agent $Q$-Learning

no code implementations • 22 Mar 2021 • Ling Pan, Tabish Rashid, Bei Peng, Longbo Huang, Shimon Whiteson

Multi-agent Reinforcement Learning Q-Learning +4

Continuous Mean-Covariance Bandits

no code implementations • NeurIPS 2021 • Yihan Du, Siwei Wang, Zhixuan Fang, Longbo Huang

To the best of our knowledge, this is the first work that considers option correlation in risk-aware bandits and explicitly quantifies how arbitrary covariance structures impact the learning performance.

Decision Making

A One-Size-Fits-All Solution to Conservative Bandit Problems

no code implementations • 14 Dec 2020 • Yihan Du, Siwei Wang, Longbo Huang

In this paper, we study a family of conservative bandit problems (CBPs) with sample-path reward constraints, i.e., the learner's reward performance must be at least as good as a given baseline at any time.

Multi-Armed Bandits

Adaptive Algorithms for Multi-armed Bandit with Composite and Anonymous Feedback

no code implementations • 13 Dec 2020 • Siwei Wang, Haoyun Wang, Longbo Huang

Existing results on this model require prior knowledge about the reward interval size as an input to their algorithms.

Restless-UCB, an Efficient and Low-complexity Algorithm for Online Restless Bandits

no code implementations • NeurIPS 2020 • Siwei Wang, Longbo Huang, John C. S. Lui

Compared to existing algorithms, our result eliminates the exponential factor (in $M, N$) in the regret upper bound, due to a novel exploitation of the sparsity in transitions in general restless bandit problems.

Softmax Deep Double Deterministic Policy Gradients

1 code implementation • NeurIPS 2020 • Ling Pan, Qingpeng Cai, Longbo Huang

A widely-used actor-critic reinforcement learning algorithm for continuous control, Deep Deterministic Policy Gradients (DDPG), suffers from the overestimation problem, which can negatively affect the performance.

Continuous Control

Combinatorial Pure Exploration of Dueling Bandit

no code implementations • 23 Jun 2020 • Wei Chen, Yihan Du, Longbo Huang, Haoyu Zhao

Position

Exploration by Maximizing Rényi Entropy for Reward-Free RL Framework

no code implementations • 11 Jun 2020 • Chuheng Zhang, Yuanying Cai, Longbo Huang, Jian Li

In the planning phase, the agent computes a good policy for any reward function based on the dataset without further interacting with the environment.

Q-Learning Reinforcement Learning (RL)

Multi-Agent Reinforcement Learning in Stochastic Networked Systems

1 code implementation • NeurIPS 2021 • Yiheng Lin, Guannan Qu, Longbo Huang, Adam Wierman

We study multi-agent reinforcement learning (MARL) in a stochastic network of agents.

Multi-agent Reinforcement Learning reinforcement-learning +1

Multi-Path Policy Optimization

no code implementations • 11 Nov 2019 • Ling Pan, Qingpeng Cai, Longbo Huang

Recent years have witnessed tremendous improvements in deep reinforcement learning.

Efficient Exploration

Reinforcement Learning with Dynamic Boltzmann Softmax Updates

1 code implementation • 14 Mar 2019 • Ling Pan, Qingpeng Cai, Qi Meng, Wei Chen, Longbo Huang, Tie-Yan Liu

In this paper, we propose to update the value function with the dynamic Boltzmann softmax (DBS) operator, which has good convergence properties in the setting of planning and learning.
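
For orientation, the sketch below implements the standard Boltzmann softmax operator, a weighted average of action values with weights proportional to exp(beta * value). It illustrates only how the inverse temperature `beta` moves the result from the plain mean (beta = 0) toward the maximum (large beta). The abstract does not state how the dynamic variant schedules beta, so none is assumed here; the function name and the sample values are hypothetical.

```python
import math

def boltzmann_softmax(values, beta):
    """Boltzmann-weighted average of `values` with inverse temperature `beta`."""
    m = max(values)
    # Subtracting the maximum keeps the exponentials numerically stable.
    weights = [math.exp(beta * (v - m)) for v in values]
    total = sum(weights)
    return sum(w * v for w, v in zip(weights, values)) / total

# Hypothetical action values for a single state.
q_values = [1.0, 2.0, 4.0]
for beta in (0.0, 1.0, 10.0, 100.0):
    print(beta, boltzmann_softmax(q_values, beta))
```

In a value update, this operator would take the place of the hard max over next-state action values.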

Atari Games Q-Learning +2

Multi-armed Bandits with Compensation

no code implementations • NeurIPS 2018 • Siwei Wang, Longbo Huang

We propose and study the known-compensation multi-arm bandit (KCMAB) problem, where a system controller offers a set of arms to many short-term players for $T$ steps.

Multi-Armed Bandits

Double Quantization for Communication-Efficient Distributed Optimization

no code implementations • NeurIPS 2019 • Yue Yu, Jiaxiang Wu, Longbo Huang

In this paper, to reduce the communication complexity, we propose \emph{double quantization}, a general scheme for quantizing both model parameters and gradients.

Distributed Optimization Quantization

Beyond the Click-Through Rate: Web Link Selection with Multi-level Feedback

no code implementations • 4 May 2018 • Kun Chen, Kechao Cai, Longbo Huang, John C. S. Lui

The web link selection problem is to select a small subset of web links from a large web link pool, and to place the selected links on a web page that can only accommodate a limited number of links, e.g., advertisements, recommendations, or news feeds.

A Deep Reinforcement Learning Framework for Rebalancing Dockless Bike Sharing Systems

no code implementations • 13 Feb 2018 • Ling Pan, Qingpeng Cai, Zhixuan Fang, Pingzhong Tang, Longbo Huang

Different from existing methods that often ignore spatial information and rely heavily on accurate prediction, HRP captures both spatial and temporal dependencies using a divide-and-conquer structure with an embedded localized module.

reinforcement-learning Reinforcement Learning (RL)

Multi-level Feedback Web Links Selection Problem: Learning and Optimization

no code implementations • 8 Sep 2017 • Kechao Cai, Kun Chen, Longbo Huang, John C. S. Lui

To the best of our knowledge, we are the first to model the links selection problem as a constrained multi-armed bandit problem and design an effective links selection algorithm by learning the links' multi-level structure with provable \emph{sub-linear} regret and violation bounds.

Fast Stochastic Variance Reduced ADMM for Stochastic Composition Optimization

no code implementations • 11 May 2017 • Yue Yu, Longbo Huang

We consider the stochastic composition optimization problem proposed in \cite{wang2017stochastic}, which has applications ranging from estimation to statistical and machine learning.

BIG-bench Machine Learning

The Power of Online Learning in Stochastic Network Optimization

no code implementations • 6 Apr 2014 • Longbo Huang, Xin Liu, Xiaohong Hao

We prove strong performance guarantees of the proposed algorithms: $\mathtt{OLAC}$ and $\mathtt{OLAC2}$ achieve the near-optimal $[O(\epsilon), O([\log(1/\epsilon)]^2)]$ utility-delay tradeoff and $\mathtt{OLAC2}$ possesses an $O(\epsilon^{-2/3})$ convergence time.
