Search Results for author: Longbo Huang

Found 31 papers, 4 papers with code

Combinatorial Pure Exploration for Dueling Bandit

no code implementations • ICML 2020 • Wei Chen, Yihan Du, Longbo Huang, Haoyu Zhao

For the Borda winner, we establish a reduction of the problem to the original CPE-MAB setting, and design PAC and exact algorithms that achieve both sample complexity similar to that of the CPE-MAB setting (which is nearly optimal for a subclass of problems) and polynomial per-round running time.

Network Topology Optimization via Deep Reinforcement Learning

no code implementations • 19 Apr 2022 • Zhuoran Li, Xing Wang, Ling Pan, Lin Zhu, Zhendong Wang, Junlan Feng, Chao Deng, Longbo Huang

A2C-GS consists of three novel components: a verifier to validate the correctness of a generated network topology, a graph neural network (GNN) to efficiently approximate topology ratings, and a DRL actor layer to conduct the topology search.

reinforcement-learning

Modality Competition: What Makes Joint Training of Multi-modal Network Fail in Deep Learning? (Provably)

no code implementations • 23 Mar 2022 • Yu Huang, Junyang Lin, Chang Zhou, Hongxia Yang, Longbo Huang

Recently, it has been observed that the best uni-modal network outperforms the jointly trained multi-modal network, which is counter-intuitive since multiple signals generally bring more information.

Adaptive Best-of-Both-Worlds Algorithm for Heavy-Tailed Multi-Armed Bandits

no code implementations • 28 Jan 2022 • Jiatai Huang, Yan Dai, Longbo Huang

Specifically, we design an algorithm, \texttt{HTINF}: when the heavy-tail parameters $\alpha$ and $\sigma$ are known to the agent, \texttt{HTINF} simultaneously achieves the optimal regret for both stochastic and adversarial environments, without knowing the actual environment type a priori.
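
For orientation, the heavy-tail parameters above are conventionally understood as bounding the raw moments of the losses/rewards; the standard assumption in this literature (stated here with my own notation, not quoted from the paper) is

$$\mathbb{E}\big[|X|^{\alpha}\big] \le \sigma^{\alpha}, \qquad \alpha \in (1, 2],$$

with $\alpha = 2$ recovering the usual bounded-second-moment case.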

Multi-Armed Bandits

Regularized Softmax Deep Multi-Agent Q-Learning

1 code implementation • NeurIPS 2021 • Ling Pan, Tabish Rashid, Bei Peng, Longbo Huang, Shimon Whiteson

Tackling overestimation in $Q$-learning is an important problem that has been extensively studied in single-agent reinforcement learning, but has received comparatively little attention in the multi-agent setting.
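
As a concrete illustration of the softmax-operator idea this line of work builds on, the minimal single-agent sketch below (illustrative names only; this is not the paper's multi-agent algorithm) replaces the hard max in the bootstrap target with a Boltzmann-weighted average, which damps the overestimation caused by maximizing over noisy estimates:

```python
import numpy as np

def max_backup(q_values):
    """Standard Q-learning target: hard max over action values (overestimation-prone)."""
    return float(np.max(q_values))

def softmax_backup(q_values, beta=5.0):
    """Boltzmann-softmax backup: a temperature-weighted average of action values.
    As beta -> infinity it approaches the max; a finite beta damps overestimation."""
    z = beta * (q_values - np.max(q_values))   # shift for numerical stability
    w = np.exp(z)
    return float(np.sum(w / w.sum() * q_values))

noisy_q = np.array([1.0, 1.2, 0.9])            # noisy action-value estimates
print(max_backup(noisy_q), softmax_backup(noisy_q, beta=5.0))
```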

Multi-agent Reinforcement Learning • Q-Learning • +3

Plan Better Amid Conservatism: Offline Multi-Agent Reinforcement Learning with Actor Rectification

no code implementations • 22 Nov 2021 • Ling Pan, Longbo Huang, Tengyu Ma, Huazhe Xu

Conservatism has led to significant progress in offline reinforcement learning (RL), where an agent learns from pre-collected datasets.

Continuous Control • Multi-agent Reinforcement Learning • +2

Simultaneously Achieving Sublinear Regret and Constraint Violations for Online Convex Optimization with Time-varying Constraints

no code implementations • 15 Nov 2021 • Qingsong Liu, Wenfei Wu, Longbo Huang, Zhixuan Fang

In this paper, we develop a novel virtual-queue-based online algorithm for online convex optimization (OCO) problems with long-term and time-varying constraints and conduct a performance analysis with respect to the dynamic regret and constraint violations.
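
For readers unfamiliar with virtual-queue methods, the sketch below shows the generic primal/queue update this family of algorithms revolves around; the step size, penalty form, and function names are illustrative assumptions, not the algorithm analyzed in the paper:

```python
import numpy as np

def virtual_queue_round(x, Q, grad_f, g, grad_g, eta=0.05):
    """One round of a generic virtual-queue scheme for OCO with constraint g(x) <= 0.
    The virtual queue Q tracks accumulated constraint violation and acts as a
    time-varying penalty weight in the primal step."""
    x_next = x - eta * (grad_f(x) + Q * grad_g(x))   # primal: penalized gradient step
    Q_next = max(Q + g(x_next), 0.0)                 # queue: accumulate violation, floor at 0
    return x_next, Q_next

# Toy usage: minimize f(x) = x^2 subject to g(x) = 1 - x <= 0 (i.e., x >= 1).
x, Q = 0.0, 0.0
for _ in range(2000):
    x, Q = virtual_queue_round(x, Q,
                               grad_f=lambda v: 2 * v,
                               g=lambda v: 1 - v,
                               grad_g=lambda v: -1.0)
print(round(x, 3))  # approaches the constrained optimum x = 1
```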

Collaborative Pure Exploration in Kernel Bandit

no code implementations • 29 Oct 2021 • Yihan Du, Wei Chen, Yuko Kuroki, Longbo Huang

In this paper, we formulate a Collaborative Pure Exploration in Kernel Bandit problem (CoPE-KB), which provides a novel model for multi-agent multi-task decision making under limited communication and general reward functions, and is applicable to many online learning tasks, e.g., recommendation systems and network scheduling.

Decision Making • online learning • +1

Scale-Free Adversarial Multi-Armed Bandit with Arbitrary Feedback Delays

no code implementations • 26 Oct 2021 • Jiatai Huang, Yan Dai, Longbo Huang

We also present a variant of \texttt{SFBanker} for problem instances with non-negative losses (i.e., they range in $[0, L]$ for some unknown $L$), achieving an $\tilde{\mathcal O}(\sqrt{K(D+T)}L)$ total regret, which is near-optimal compared to the $\Omega(\sqrt{KT}+\sqrt{D\log K}L)$ lower bound (Cesa-Bianchi et al., 2016).

Banker Online Mirror Descent

no code implementations • 16 Jun 2021 • Jiatai Huang, Longbo Huang

In particular, it leads to the first delayed adversarial linear bandit algorithm achieving $\tilde{O}(\text{poly}(n)(\sqrt{T} + \sqrt{D}))$ regret.

Multi-Armed Bandits • online learning

Fast Federated Learning in the Presence of Arbitrary Device Unavailability

1 code implementation • NeurIPS 2021 • Xinran Gu, Kaixuan Huang, Jingzhao Zhang, Longbo Huang

In this case, the convergence of popular FL algorithms such as FedAvg is severely affected by straggling devices.
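
To make the failure mode concrete, the toy sketch below averages updates only over the devices that happen to respond in a round; the names and the simplified one-step local update are illustrative assumptions, and this is the problematic baseline, not the algorithm proposed in the paper:

```python
import numpy as np

def fedavg_round(global_w, local_updates, available):
    """One simplified FedAvg-style round: average the updates of available devices only.
    When availability is arbitrary (stragglers, dropouts), the average is biased toward
    frequently available devices, which is the issue studied above."""
    active = [u for u, ok in zip(local_updates, available) if ok]
    if not active:                      # nobody responded this round
        return global_w
    return global_w + np.mean(active, axis=0)

# Toy usage: two devices pull the model in opposite directions, but device 1 rarely responds.
w = np.zeros(2)
rng = np.random.default_rng(0)
for _ in range(100):
    updates = [np.array([0.1, 0.0]), np.array([-0.1, 0.0])]
    avail = [True, rng.random() < 0.2]  # device 1 responds only 20% of the time
    w = fedavg_round(w, updates, avail)
print(w)                                # drifts toward device 0's objective
```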

Federated Learning

The best of both worlds: stochastic and adversarial episodic MDPs with unknown transition

no code implementations • NeurIPS 2021 • Tiancheng Jin, Longbo Huang, Haipeng Luo

We consider the best-of-both-worlds problem for learning an episodic Markov Decision Process through $T$ episodes, with the goal of achieving $\widetilde{\mathcal{O}}(\sqrt{T})$ regret when the losses are adversarial and simultaneously $\mathcal{O}(\text{polylog}(T))$ regret when the losses are (almost) stochastic.

Regularized Softmax Deep Multi-Agent $Q$-Learning

no code implementations • 22 Mar 2021 • Ling Pan, Tabish Rashid, Bei Peng, Longbo Huang, Shimon Whiteson

Tackling overestimation in $Q$-learning is an important problem that has been extensively studied in single-agent reinforcement learning, but has received comparatively little attention in the multi-agent setting.

Multi-agent Reinforcement Learning • Q-Learning • +3

Continuous Mean-Covariance Bandits

no code implementations • NeurIPS 2021 • Yihan Du, Siwei Wang, Zhixuan Fang, Longbo Huang

To the best of our knowledge, this is the first work that considers option correlation in risk-aware bandits and explicitly quantifies how arbitrary covariance structures impact the learning performance.

Decision Making

A One-Size-Fits-All Solution to Conservative Bandit Problems

no code implementations • 14 Dec 2020 • Yihan Du, Siwei Wang, Longbo Huang

In this paper, we study a family of conservative bandit problems (CBPs) with sample-path reward constraints, i.e., the learner's reward performance must be at least as good as a given baseline at any time.
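
For reference, the sample-path constraint in this literature is conventionally of the following form (standard formulation, stated with my own notation): the reward accumulated up to every round $t$ must stay within a factor $(1-\alpha)$ of what a baseline arm with mean $\mu_0$ would have earned,

$$\sum_{s=1}^{t} \mu_{A_s} \;\ge\; (1-\alpha)\, t\, \mu_0 \qquad \text{for all } t,$$

where $A_s$ is the arm played at round $s$ and $\alpha \in (0,1)$ controls how much loss relative to the baseline is tolerated.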

Multi-Armed Bandits

Adaptive Algorithms for Multi-armed Bandit with Composite and Anonymous Feedback

no code implementations • 13 Dec 2020 • Siwei Wang, Haoyun Wang, Longbo Huang

Existing results on this model require prior knowledge about the reward interval size as an input to their algorithms.

Restless-UCB, an Efficient and Low-complexity Algorithm for Online Restless Bandits

no code implementations • NeurIPS 2020 • Siwei Wang, Longbo Huang, John C. S. Lui

Compared to existing algorithms, our result eliminates the exponential factor (in $M, N$) in the regret upper bound, due to a novel exploitation of the sparsity in transitions in general restless bandit problems.

Softmax Deep Double Deterministic Policy Gradients

1 code implementation • NeurIPS 2020 • Ling Pan, Qingpeng Cai, Longbo Huang

A widely-used actor-critic reinforcement learning algorithm for continuous control, Deep Deterministic Policy Gradients (DDPG), suffers from the overestimation problem, which can negatively affect the performance.

Continuous Control

Combinatorial Pure Exploration for Dueling Bandit

no code implementations • 23 Jun 2020 • Wei Chen, Yihan Du, Longbo Huang, Haoyu Zhao

For the Borda winner, we establish a reduction of the problem to the original CPE-MAB setting, and design PAC and exact algorithms that achieve both sample complexity similar to that of the CPE-MAB setting (which is nearly optimal for a subclass of problems) and polynomial per-round running time.

Exploration by Maximizing Rényi Entropy for Reward-Free RL Framework

no code implementations • 11 Jun 2020 • Chuheng Zhang, Yuanying Cai, Longbo Huang, Jian Li

In the planning phase, the agent computes a good policy for any reward function based on the dataset without further interacting with the environment.
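
For reference on the objective named in the title (a standard definition, with notation of my own choosing), the Rényi entropy of order $\alpha$ of a state-visitation distribution $d$ is

$$H_{\alpha}(d) = \frac{1}{1-\alpha}\,\log \sum_{s} d(s)^{\alpha}, \qquad \alpha > 0,\ \alpha \neq 1,$$

which recovers the Shannon entropy as $\alpha \to 1$; maximizing it during the exploration phase encourages broad coverage of the state space.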

Q-Learning

Multi-Path Policy Optimization

no code implementations • 11 Nov 2019 • Ling Pan, Qingpeng Cai, Longbo Huang

Recent years have witnessed tremendous progress in deep reinforcement learning.

Efficient Exploration

Reinforcement Learning with Dynamic Boltzmann Softmax Updates

1 code implementation • 14 Mar 2019 • Ling Pan, Qingpeng Cai, Qi Meng, Wei Chen, Longbo Huang, Tie-Yan Liu

In this paper, we propose to update the value function with the dynamic Boltzmann softmax (DBS) operator, which has good convergence properties in both the planning and learning settings.
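
For concreteness, the Boltzmann softmax operator referred to above is the standard weighted average of action values (notation mine; "dynamic" refers to letting the inverse temperature $\beta_t$ vary over time),

$$\mathrm{boltz}_{\beta_t}\big(Q(s,\cdot)\big) \;=\; \frac{\sum_{a} e^{\beta_t Q(s,a)}\, Q(s,a)}{\sum_{a} e^{\beta_t Q(s,a)}},$$

which interpolates between the mean over actions ($\beta_t \to 0$) and the max ($\beta_t \to \infty$).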

Atari Games • Q-Learning • +1

Multi-armed Bandits with Compensation

no code implementations • NeurIPS 2018 • Siwei Wang, Longbo Huang

We propose and study the known-compensation multi-arm bandit (KCMAB) problem, where a system controller offers a set of arms to many short-term players for $T$ steps.

Multi-Armed Bandits

Double Quantization for Communication-Efficient Distributed Optimization

no code implementations • NeurIPS 2019 • Yue Yu, Jiaxiang Wu, Longbo Huang

In this paper, to reduce the communication complexity, we propose \emph{double quantization}, a general scheme for quantizing both model parameters and gradients.
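
As background on the kind of quantizer involved, the sketch below implements a generic unbiased stochastic quantizer of the sort commonly applied to vectors of gradients or parameters; the level count and normalization are illustrative assumptions, not the paper's exact scheme:

```python
import numpy as np

def stochastic_quantize(v, levels=4, rng=np.random.default_rng()):
    """Unbiased stochastic quantization of a vector onto a uniform grid of `levels`
    magnitudes per sign. Each coordinate is randomly rounded up or down so that
    the quantized vector equals v in expectation."""
    scale = np.max(np.abs(v)) + 1e-12             # normalize by the largest magnitude
    r = np.abs(v) / scale * levels                # position on the quantization grid
    lower = np.floor(r)
    round_up = rng.random(v.shape) < (r - lower)  # unbiased randomized rounding
    q = lower + round_up
    return np.sign(v) * q * scale / levels

g = np.array([0.03, -0.7, 0.25, 0.0])
print(stochastic_quantize(g))   # few distinct magnitudes -> cheaper to communicate
```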

Distributed Optimization • Quantization

Beyond the Click-Through Rate: Web Link Selection with Multi-level Feedback

no code implementations • 4 May 2018 • Kun Chen, Kechao Cai, Longbo Huang, John C. S. Lui

The web link selection problem is to select a small subset of web links from a large web link pool, and to place the selected links on a web page that can only accommodate a limited number of links, e.g., advertisements, recommendations, or news feeds.

A Deep Reinforcement Learning Framework for Rebalancing Dockless Bike Sharing Systems

no code implementations • 13 Feb 2018 • Ling Pan, Qingpeng Cai, Zhixuan Fang, Pingzhong Tang, Longbo Huang

Unlike existing methods, which often ignore spatial information and rely heavily on accurate prediction, HRP captures both spatial and temporal dependencies using a divide-and-conquer structure with an embedded localized module.

reinforcement-learning

Multi-level Feedback Web Links Selection Problem: Learning and Optimization

no code implementations • 8 Sep 2017 • Kechao Cai, Kun Chen, Longbo Huang, John C. S. Lui

To the best of our knowledge, we are the first to model the links selection problem as a constrained multi-armed bandit problem and design an effective links selection algorithm by learning the links' multi-level structure with provable \emph{sub-linear} regret and violation bounds.

Fast Stochastic Variance Reduced ADMM for Stochastic Composition Optimization

no code implementations • 11 May 2017 • Yue Yu, Longbo Huang

We consider the stochastic composition optimization problem proposed in Wang et al. (2017), which has applications ranging from estimation to statistical and machine learning.
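
For orientation, the stochastic composition problem referred to here has the nested-expectation form (standard statement of the problem class; notation mine)

$$\min_{x}\; F(x) \;=\; \mathbb{E}_{v}\Big[ f_{v}\big( \mathbb{E}_{w}[\, g_{w}(x) \,] \big) \Big],$$

where the inner expectation sits inside the outer loss, so an unbiased stochastic gradient of $F$ is not directly available; this is what makes the problem harder than standard stochastic optimization and what variance-reduced methods of this kind target.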

The Power of Online Learning in Stochastic Network Optimization

no code implementations • 6 Apr 2014 • Longbo Huang, Xin Liu, Xiaohong Hao

We prove strong performance guarantees for the proposed algorithms: $\mathtt{OLAC}$ and $\mathtt{OLAC2}$ achieve the near-optimal $[O(\epsilon), O([\log(1/\epsilon)]^2)]$ utility-delay tradeoff, and $\mathtt{OLAC2}$ possesses an $O(\epsilon^{-2/3})$ convergence time.

online learning
