Search Results for author: Haipeng Luo

Found 74 papers, 7 papers with code

Towards Minimax Online Learning with Unknown Time Horizon

no code implementations 31 Jul 2013 Haipeng Luo, Robert E. Schapire

We apply a minimax analysis, beginning with the fixed-horizon case and then moving on to two unknown-horizon settings: one assumes the horizon is chosen randomly according to some known distribution, while the other allows the adversary full control over the horizon.
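
For context, the standard remedy for an unknown horizon is the classic doubling trick: run a fixed-horizon algorithm in phases of length 1, 2, 4, ..., restarting it at each phase boundary. The Python sketch below illustrates only that baseline, not the paper's minimax algorithms; the learner interface (make_learner, .predict(), .update()) is a hypothetical one for illustration.

```python
def doubling_trick(make_learner, losses):
    """Run a fixed-horizon learner over phases of doubling length.

    Assumes make_learner(T) returns a fresh learner tuned for horizon T,
    exposing .predict() -> index and .update(loss_vector).
    """
    total_loss, t, phase = 0.0, 0, 0
    while t < len(losses):
        horizon = 2 ** phase              # this phase lasts 2^phase rounds
        learner = make_learner(horizon)   # restart, tuned for this length
        for _ in range(horizon):
            if t == len(losses):
                break
            i = learner.predict()
            total_loss += losses[t][i]
            learner.update(losses[t])
            t += 1
        phase += 1
    return total_loss
```

The doubling trick usually preserves regret rates only up to a multiplicative constant, which is precisely the kind of overhead a minimax analysis can quantify or avoid.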

A Drifting-Games Analysis for Online Learning and Applications to Boosting

no code implementations NeurIPS 2014 Haipeng Luo, Robert E. Schapire

Different online learning settings (Hedge, multi-armed bandit problems, and online convex optimization) are studied by converting them into various kinds of drifting games.

Accelerated Parallel Optimization Methods for Large Scale Machine Learning

no code implementations 25 Nov 2014 Haipeng Luo, Patrick Haffner, Jean-Francois Paiement

The growing amount of high dimensional data in different machine learning applications requires more efficient and scalable optimization algorithms.

BIG-bench Machine Learning

Optimal and Adaptive Algorithms for Online Boosting

no code implementations 9 Feb 2015 Alina Beygelzimer, Satyen Kale, Haipeng Luo

We study online boosting, the task of converting any weak online learner into a strong online learner.

Achieving All with No Parameters: Adaptive NormalHedge

no code implementations 20 Feb 2015 Haipeng Luo, Robert E. Schapire

We study the classic online learning problem of predicting with expert advice, and propose a truly parameter-free and adaptive algorithm that achieves several objectives simultaneously without using any prior information.
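
For background, a minimal sketch of the original NormalHedge update (Chaudhuri, Freund & Hsu, 2009) that this work builds on and makes adaptive: weights are driven by a potential of the positive part of each expert's cumulative regret, with the scale c chosen so the average potential equals e. This is the vanilla rule, not the paper's Adaptive NormalHedge.

```python
import numpy as np

def normalhedge_weights(regrets):
    """One round of vanilla NormalHedge. regrets[i] is the learner's
    cumulative regret to expert i so far; returns the next weight vector."""
    R = np.maximum(np.asarray(regrets, dtype=float), 0.0)
    if R.max() == 0.0:                 # no expert is ahead: play uniformly
        return np.ones(len(R)) / len(R)
    # bisect for the scale c > 0 solving mean_i exp(R_i^2 / (2c)) = e;
    # this bracket keeps every exponent <= 700, avoiding overflow
    lo, hi = R.max() ** 2 / 1400.0, R.max() ** 2 / 2.0 + 1.0
    for _ in range(100):
        c = (lo + hi) / 2.0
        if np.mean(np.exp(R ** 2 / (2.0 * c))) > np.e:
            lo = c                     # potential too large: increase scale
        else:
            hi = c
    w = (R / c) * np.exp(R ** 2 / (2.0 * c))
    return w / w.sum()
```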

Online Gradient Boosting

no code implementations NeurIPS 2015 Alina Beygelzimer, Elad Hazan, Satyen Kale, Haipeng Luo

We extend the theory of boosting for regression problems to the online learning setting.

regression

Fast Convergence of Regularized Learning in Games

no code implementations NeurIPS 2015 Vasilis Syrgkanis, Alekh Agarwal, Haipeng Luo, Robert E. Schapire

We show that natural classes of regularized learning algorithms with a form of recency bias achieve faster convergence rates to approximate efficiency and to coarse correlated equilibria in multiplayer normal form games.
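
The "recency bias" in question can be illustrated with Optimistic Hedge, which charges the most recent loss vector twice, treating it as a prediction of the next one. A minimal full-information sketch under that reading (illustrative only, not the paper's exact algorithm or rate analysis):

```python
import numpy as np

def optimistic_hedge(losses, eta):
    """Hedge with recency bias: losses is an illustrative (T, d) array of
    full-information loss vectors; returns the sequence of plays."""
    T, d = losses.shape
    L = np.zeros(d)                     # cumulative losses
    hint = np.zeros(d)                  # last observed loss vector
    plays = []
    for t in range(T):
        w = np.exp(-eta * (L + hint))   # most recent loss counted twice
        plays.append(w / w.sum())
        L += losses[t]
        hint = losses[t]
    return np.array(plays)
```

When every player runs such a recency-biased update, each player's regret grows far more slowly than the adversarial sqrt(T) rate, which is the fast-convergence phenomenon the abstract describes.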

Variance-Reduced and Projection-Free Stochastic Optimization

no code implementations 5 Feb 2016 Elad Hazan, Haipeng Luo

The Frank-Wolfe optimization algorithm has recently regained popularity for machine learning applications due to its projection-free property and its ability to handle structured constraints.

Stochastic Optimization
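
For reference, the classic Frank-Wolfe step behind the projection-free property, specialized to the probability simplex where the linear minimization oracle is just an argmin over vertices. This sketches the deterministic baseline only; the paper's variance-reduced stochastic variants are not reproduced here.

```python
import numpy as np

def frank_wolfe_simplex(grad, d, T):
    """Minimize a smooth convex function over the probability simplex
    using linear minimization oracle calls instead of projections."""
    x = np.ones(d) / d
    for t in range(1, T + 1):
        g = grad(x)
        v = np.zeros(d)
        v[np.argmin(g)] = 1.0        # oracle: best vertex of the simplex
        gamma = 2.0 / (t + 1)        # standard step-size schedule
        x = (1 - gamma) * x + gamma * v
    return x

# e.g., squared distance to a target point y on the simplex:
# x = frank_wolfe_simplex(lambda x: 2 * (x - y), d=len(y), T=200)
```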

Efficient Second Order Online Learning by Sketching

no code implementations NeurIPS 2016 Haipeng Luo, Alekh Agarwal, Nicolo Cesa-Bianchi, John Langford

We propose Sketched Online Newton (SON), an online second order learning algorithm that enjoys substantially improved regret guarantees for ill-conditioned data.
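
For intuition, the full-matrix Online Newton Step that SON makes scalable: each gradient step is preconditioned by the inverse of a running second-moment matrix. The unprojected sketch below is the expensive baseline; SON's contribution, replacing the d-by-d matrix with a low-rank sketch, is not reproduced here.

```python
import numpy as np

def online_newton_step(grads, d, gamma=1.0, eps=1.0):
    """Unprojected Online Newton Step over a stream of gradient vectors."""
    A = eps * np.eye(d)                        # running second-moment matrix
    x = np.zeros(d)
    for g in grads:
        A += np.outer(g, g)                    # rank-one curvature update
        x = x - np.linalg.solve(A, g) / gamma  # preconditioned step
    return x
```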

Oracle-Efficient Online Learning and Auction Design

no code implementations 5 Nov 2016 Miroslav Dudík, Nika Haghtalab, Haipeng Luo, Robert E. Schapire, Vasilis Syrgkanis, Jennifer Wortman Vaughan

We consider the design of computationally efficient online learning algorithms in an adversarial setting in which the learner has access to an offline optimization oracle.

Corralling a Band of Bandit Algorithms

1 code implementation 19 Dec 2016 Alekh Agarwal, Haipeng Luo, Behnam Neyshabur, Robert E. Schapire

We study the problem of combining multiple bandit algorithms (that is, online learning algorithms with partial feedback) with the goal of creating a master algorithm that performs almost as well as the best base algorithm if it were to be run on its own.

Multi-Armed Bandits

Efficient Contextual Bandits in Non-stationary Worlds

no code implementations 5 Aug 2017 Haipeng Luo, Chen-Yu Wei, Alekh Agarwal, John Langford

In this work, we develop several efficient contextual bandit algorithms for non-stationary environments by equipping existing methods for i.i.d. problems with sophisticated statistical tests.

Multi-Armed Bandits

More Adaptive Algorithms for Adversarial Bandits

no code implementations 10 Jan 2018 Chen-Yu Wei, Haipeng Luo

We develop a novel and generic algorithm for the adversarial multi-armed bandit problem (or more generally the combinatorial semi-bandit problem).
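
As a point of reference, the classic Exp3 template for adversarial multi-armed bandits with the standard fixed, horizon-tuned learning rate; the paper's algorithm is more adaptive, and this sketch only shows the baseline it improves on. The loss interface is an assumption for illustration.

```python
import numpy as np

def exp3(loss_fn, K, T, seed=0):
    """Exp3: exponential weights over importance-weighted loss estimates.
    loss_fn(t, arm) returns the loss in [0, 1] of the arm actually played."""
    rng = np.random.default_rng(seed)
    eta = np.sqrt(np.log(K) / (K * T))     # standard fixed tuning
    L = np.zeros(K)                        # cumulative loss estimates
    total = 0.0
    for t in range(T):
        w = np.exp(-eta * (L - L.min()))   # shift for numerical stability
        p = w / w.sum()
        arm = rng.choice(K, p=p)
        loss = loss_fn(t, arm)             # only this arm's loss is observed
        total += loss
        L[arm] += loss / p[arm]            # unbiased importance weighting
    return total
```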

Practical Contextual Bandits with Regression Oracles

no code implementations ICML 2018 Dylan J. Foster, Alekh Agarwal, Miroslav Dudík, Haipeng Luo, Robert E. Schapire

A major challenge in contextual bandits is to design general-purpose algorithms that are both practically useful and theoretically well-founded.

General Classification · Multi-Armed Bandits · +1

Logistic Regression: The Importance of Being Improper

no code implementations 25 Mar 2018 Dylan J. Foster, Satyen Kale, Haipeng Luo, Mehryar Mohri, Karthik Sridharan

Starting with the simple observation that the logistic loss is $1$-mixable, we design a new efficient improper learning algorithm for online logistic regression that circumvents the aforementioned lower bound with a regret bound exhibiting a doubly-exponential improvement in dependence on the predictor norm.

regression

Efficient Online Portfolio with Logarithmic Regret

no code implementations NeurIPS 2018 Haipeng Luo, Chen-Yu Wei, Kai Zheng

We study the decades-old problem of online portfolio management and propose the first algorithm with logarithmic regret that is not based on Cover's Universal Portfolio algorithm and admits much faster implementation.

Management

Beating Stochastic and Adversarial Semi-bandits Optimally and Simultaneously

no code implementations 25 Jan 2019 Julian Zimmert, Haipeng Luo, Chen-Yu Wei

We develop the first general semi-bandit algorithm that simultaneously achieves $\mathcal{O}(\log T)$ regret for stochastic environments and $\mathcal{O}(\sqrt{T})$ regret for adversarial environments without knowledge of the regime or the number of rounds $T$.

Improved Path-length Regret Bounds for Bandits

no code implementations 29 Jan 2019 Sébastien Bubeck, Yuanzhi Li, Haipeng Luo, Chen-Yu Wei

We study adaptive regret bounds in terms of the variation of the losses (the so-called path-length bounds) for both multi-armed bandit and more generally linear bandit.

A New Algorithm for Non-stationary Contextual Bandits: Efficient, Optimal, and Parameter-free

no code implementations 3 Feb 2019 Yifang Chen, Chung-Wei Lee, Haipeng Luo, Chen-Yu Wei

We propose the first contextual bandit algorithm that is parameter-free, efficient, and optimal in terms of dynamic regret.

Multi-Armed Bandits

Hypothesis Set Stability and Generalization

no code implementations NeurIPS 2019 Dylan J. Foster, Spencer Greenberg, Satyen Kale, Haipeng Luo, Mehryar Mohri, Karthik Sridharan

Our main result is a generalization bound for data-dependent hypothesis sets expressed in terms of a notion of hypothesis set stability and a notion of Rademacher complexity for data-dependent hypothesis sets that we introduce.

Equipping Experts/Bandits with Long-term Memory

no code implementations NeurIPS 2019 Kai Zheng, Haipeng Luo, Ilias Diakonikolas, Li-Wei Wang

We propose the first reduction-based approach to obtaining long-term memory guarantees for online learning in the sense of Bousquet and Warmuth, 2002, by reducing the problem to achieving typical switching regret.

Multi-Armed Bandits

Model selection for contextual bandits

1 code implementation NeurIPS 2019 Dylan J. Foster, Akshay Krishnamurthy, Haipeng Luo

We work in the stochastic realizable setting with a sequence of nested linear policy classes of dimension $d_1 < d_2 < \ldots$, where the $m^\star$-th class contains the optimal policy, and we design an algorithm that achieves $\tilde{O}(T^{2/3}d^{1/3}_{m^\star})$ regret with no prior knowledge of the optimal dimension $d_{m^\star}$.

Model Selection · Multi-Armed Bandits

Learning Adversarial MDPs with Bandit Feedback and Unknown Transition

no code implementations 3 Dec 2019 Chi Jin, Tiancheng Jin, Haipeng Luo, Suvrit Sra, Tiancheng Yu

We consider the problem of learning in episodic finite-horizon Markov decision processes with an unknown transition function, bandit feedback, and adversarial losses.

Fair Contextual Multi-Armed Bandits: Theory and Experiments

no code implementations 13 Dec 2019 Yifang Chen, Alex Cuellar, Haipeng Luo, Jignesh Modi, Heramb Nemlekar, Stefanos Nikolaidis

We introduce a Multi-Armed Bandit algorithm with fairness constraints, where fairness is defined as a minimum rate that a task or a resource is assigned to a user.

Decision Making · Fairness · +1

A Closer Look at Small-loss Bounds for Bandits with Graph Feedback

no code implementations 2 Feb 2020 Chung-Wei Lee, Haipeng Luo, Mengxiao Zhang

We study small-loss bounds for adversarial multi-armed bandits with graph feedback, that is, adaptive regret bounds that depend on the loss of the best arm or related quantities, instead of the total number of rounds.

Multi-Armed Bandits

Taking a hint: How to leverage loss predictors in contextual bandits?

no code implementations 4 Mar 2020 Chen-Yu Wei, Haipeng Luo, Alekh Agarwal

We initiate the study of learning in contextual bandits with the help of loss predictors.

Multi-Armed Bandits

Adversarial Online Learning with Changing Action Sets: Efficient Algorithms with Approximate Regret Bounds

no code implementations 7 Mar 2020 Ehsan Emamjomeh-Zadeh, Chen-Yu Wei, Haipeng Luo, David Kempe

We revisit the problem of online learning with sleeping experts/bandits: in each time step, only a subset of the actions are available for the algorithm to choose from (and learn about).

PAC learning

A Model-free Learning Algorithm for Infinite-horizon Average-reward MDPs with Near-optimal Regret

no code implementations 8 Jun 2020 Mehdi Jafarnia-Jahromi, Chen-Yu Wei, Rahul Jain, Haipeng Luo

Recently, model-free reinforcement learning has attracted research attention due to its simplicity, memory and computation efficiency, and the flexibility to combine with function approximation.

Q-Learning · reinforcement-learning · +1

Simultaneously Learning Stochastic and Adversarial Episodic MDPs with Known Transition

no code implementations NeurIPS 2020 Tiancheng Jin, Haipeng Luo

This work studies the problem of learning episodic Markov Decision Processes with known transition and bandit feedback.

Multi-Armed Bandits

Active Online Domain Adaptation

no code implementations ICML Workshop LifelongML 2020 Yining Chen, Haipeng Luo, Tengyu Ma, Chicheng Zhang

We propose a surprisingly simple algorithm that adaptively balances its regret and its number of label queries in settings where the data streams are from a mixture of hidden domains.

Online Domain Adaptation · regression

Bias no more: high-probability data-dependent regret bounds for adversarial bandits and MDPs

no code implementations NeurIPS 2020 Chung-Wei Lee, Haipeng Luo, Chen-Yu Wei, Mengxiao Zhang

We develop a new approach to obtaining high probability regret bounds for online learning with bandit feedback against an adaptive adversary.

Linear Last-iterate Convergence in Constrained Saddle-point Optimization

1 code implementation ICLR 2021 Chen-Yu Wei, Chung-Wei Lee, Mengxiao Zhang, Haipeng Luo

Specifically, for OMWU in bilinear games over the simplex, we show that when the equilibrium is unique, linear last-iterate convergence is achieved with a learning rate whose value is set to a universal constant, improving the result of (Daskalakis & Panageas, 2019b) under the same assumption.
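
OMWU is the Optimistic Multiplicative Weights Update; an equivalent one-line form multiplies the current iterate by exp(-eta * (2 g_t - g_{t-1})), i.e., ordinary MWU with the latest gradient counted twice. A minimal sketch on a bilinear game over simplices, with an illustrative matrix and a small constant learning rate in the spirit of the abstract:

```python
import numpy as np

def omwu_bilinear(A, T, eta=0.1):
    """Optimistic MWU for min_x max_y x^T A y over probability simplices."""
    m, n = A.shape
    x, y = np.ones(m) / m, np.ones(n) / n
    gx_prev, gy_prev = np.zeros(m), np.zeros(n)
    for _ in range(T):
        gx, gy = A @ y, -A.T @ x     # losses for the min and max players
        x = x * np.exp(-eta * (2 * gx - gx_prev)); x /= x.sum()
        y = y * np.exp(-eta * (2 * gy - gy_prev)); y /= y.sum()
        gx_prev, gy_prev = gx, gy
    return x, y

# e.g. matching pennies: omwu_bilinear(np.array([[1., -1.], [-1., 1.]]), 500)
```

Per the abstract, when the equilibrium is unique the last iterate (x, y) itself converges linearly; no averaging is needed.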

Open Problem: Model Selection for Contextual Bandits

no code implementations 19 Jun 2020 Dylan J. Foster, Akshay Krishnamurthy, Haipeng Luo

In statistical learning, algorithms for model selection allow the learner to adapt to the complexity of the best hypothesis class in a sequence.

Model Selection · Multi-Armed Bandits

Active Online Learning with Hidden Shifting Domains

no code implementations 25 Jun 2020 Yining Chen, Haipeng Luo, Tengyu Ma, Chicheng Zhang

We propose a surprisingly simple algorithm that adaptively balances its regret and its number of label queries in settings where the data streams are from a mixture of hidden domains.

Domain Adaptation · regression

Comparator-adaptive Convex Bandits

no code implementations NeurIPS 2020 Dirk van der Hoeven, Ashok Cutkosky, Haipeng Luo

We study bandit convex optimization methods that adapt to the norm of the comparator, a topic that has only been studied before for its full-information counterpart.

Learning Infinite-horizon Average-reward MDPs with Linear Function Approximation

no code implementations 23 Jul 2020 Chen-Yu Wei, Mehdi Jafarnia-Jahromi, Haipeng Luo, Rahul Jain

We develop several new algorithms for learning Markov Decision Processes in an infinite-horizon average-reward setting with linear function approximation.

Minimax Regret for Stochastic Shortest Path with Adversarial Costs and Known Transition

no code implementations 7 Dec 2020 Liyu Chen, Haipeng Luo, Chen-Yu Wei

We study the stochastic shortest path problem with adversarial costs and known transition, and show that the minimax regret is $\widetilde{O}(\sqrt{DT^\star K})$ and $\widetilde{O}(\sqrt{DT^\star SA K})$ for the full-information setting and the bandit feedback setting respectively, where $D$ is the diameter, $T^\star$ is the expected hitting time of the optimal policy, $S$ is the number of states, $A$ is the number of actions, and $K$ is the number of episodes.

Impossible Tuning Made Possible: A New Expert Algorithm and Its Applications

no code implementations 1 Feb 2021 Liyu Chen, Haipeng Luo, Chen-Yu Wei

We resolve the long-standing "impossible tuning" issue for the classic expert problem and show that it is in fact possible to achieve regret $O\left(\sqrt{(\ln d)\sum_t \ell_{t, i}^2}\right)$ simultaneously for every expert $i$ in a $T$-round $d$-expert problem, where $\ell_{t, i}$ is the loss of expert $i$ in round $t$.

Last-iterate Convergence of Decentralized Optimistic Gradient Descent/Ascent in Infinite-horizon Competitive Markov Games

no code implementations 8 Feb 2021 Chen-Yu Wei, Chung-Wei Lee, Mengxiao Zhang, Haipeng Luo

We study infinite-horizon discounted two-player zero-sum Markov games, and develop a decentralized algorithm that provably converges to the set of Nash equilibria under self-play.

Non-stationary Reinforcement Learning without Prior Knowledge: An Optimal Black-box Approach

no code implementations 10 Feb 2021 Chen-Yu Wei, Haipeng Luo

Specifically, in most cases our algorithm achieves the optimal dynamic regret $\widetilde{\mathcal{O}}(\min\{\sqrt{LT}, \Delta^{1/3}T^{2/3}\})$ where $T$ is the number of rounds and $L$ and $\Delta$ are the number and amount of changes of the world respectively, while previous works only obtain suboptimal bounds and/or require the knowledge of $L$ and $\Delta$.

Multi-Armed Bandits · reinforcement-learning · +1

Finding the Stochastic Shortest Path with Low Regret: The Adversarial Cost and Unknown Transition Case

no code implementations 10 Feb 2021 Liyu Chen, Haipeng Luo

Our work strictly improves (Rosenberg and Mansour, 2020) in the full information setting, extends (Chen et al., 2020) from known transition to unknown transition, and is also the first to consider the most challenging combination: bandit feedback with adversarial costs and unknown transition.

The best of both worlds: stochastic and adversarial episodic MDPs with unknown transition

no code implementations NeurIPS 2021 Tiancheng Jin, Longbo Huang, Haipeng Luo

We consider the best-of-both-worlds problem for learning an episodic Markov Decision Process through $T$ episodes, with the goal of achieving $\widetilde{\mathcal{O}}(\sqrt{T})$ regret when the losses are adversarial and simultaneously $\mathcal{O}(\text{polylog}(T))$ regret when the losses are (almost) stochastic.

Open-Ended Question Answering

Online Learning for Stochastic Shortest Path Model via Posterior Sampling

no code implementations 9 Jun 2021 Mehdi Jafarnia-Jahromi, Liyu Chen, Rahul Jain, Haipeng Luo

We consider the problem of online reinforcement learning for the Stochastic Shortest Path (SSP) problem modeled as an unknown MDP with an absorbing state.

reinforcement-learning · Reinforcement Learning (RL)

Implicit Finite-Horizon Approximation and Efficient Optimal Algorithms for Stochastic Shortest Path

no code implementations NeurIPS 2021 Liyu Chen, Mehdi Jafarnia-Jahromi, Rahul Jain, Haipeng Luo

We introduce a generic template for developing regret minimization algorithms in the Stochastic Shortest Path (SSP) model, which achieves minimax optimal regret as long as certain properties are ensured.

Last-iterate Convergence in Extensive-Form Games

no code implementations NeurIPS 2021 Chung-Wei Lee, Christian Kroer, Haipeng Luo

Inspired by recent advances on last-iterate convergence of optimistic algorithms in zero-sum normal-form games, we study this phenomenon in sequential games, and provide a comprehensive study of last-iterate convergence for zero-sum extensive-form games with perfect recall (EFGs), using various optimistic regret-minimization algorithms over treeplexes.

counterfactual

Policy Optimization in Adversarial MDPs: Improved Exploration via Dilated Bonuses

no code implementations NeurIPS 2021 Haipeng Luo, Chen-Yu Wei, Chung-Wei Lee

When a simulator is unavailable, we further consider a linear MDP setting and obtain $\widetilde{\mathcal{O}}({T}^{14/15})$ regret, which is the first result for linear MDPs with adversarial losses and bandit feedback.

No-Regret Learning in Time-Varying Zero-Sum Games

no code implementations 30 Jan 2022 Mengxiao Zhang, Peng Zhao, Haipeng Luo, Zhi-Hua Zhou

Learning from repeated play in a fixed two-player zero-sum game is a classic problem in game theory and online learning.

Learning Infinite-Horizon Average-Reward Markov Decision Processes with Constraints

no code implementations 31 Jan 2022 Liyu Chen, Rahul Jain, Haipeng Luo

We study regret minimization for infinite-horizon average-reward Markov Decision Processes (MDPs) under cost constraints.

Kernelized Multiplicative Weights for 0/1-Polyhedral Games: Bridging the Gap Between Learning in Extensive-Form and Normal-Form Games

no code implementations 1 Feb 2022 Gabriele Farina, Chung-Wei Lee, Haipeng Luo, Christian Kroer

In this paper we show that the Optimistic Multiplicative Weights Update (OMWU) algorithm -- the premier learning algorithm for NFGs -- can be simulated on the normal-form equivalent of an EFG in linear time per iteration in the game tree size using a kernel trick.

Policy Optimization for Stochastic Shortest Path

no code implementations 7 Feb 2022 Liyu Chen, Haipeng Luo, Aviv Rosenberg

Policy optimization is among the most popular and successful reinforcement learning algorithms, and there is increasing interest in understanding its theoretical guarantees.

reinforcement-learning · Reinforcement Learning (RL)

Corralling a Larger Band of Bandits: A Case Study on Switching Regret for Linear Bandits

no code implementations 12 Feb 2022 Haipeng Luo, Mengxiao Zhang, Peng Zhao, Zhi-Hua Zhou

The CORRAL algorithm of Agarwal et al. (2017) and its variants (Foster et al., 2020a) achieve this goal with a regret overhead of order $\widetilde{O}(\sqrt{MT})$ where $M$ is the number of base algorithms and $T$ is the time horizon.

Adaptive Bandit Convex Optimization with Heterogeneous Curvature

no code implementations 12 Feb 2022 Haipeng Luo, Mengxiao Zhang, Peng Zhao

We consider the problem of adversarial bandit convex optimization, that is, online learning over a sequence of arbitrary convex loss functions with only one function evaluation for each of them.

Uncoupled Learning Dynamics with $O(\log T)$ Swap Regret in Multiplayer Games

no code implementations 25 Apr 2022 Ioannis Anagnostides, Gabriele Farina, Christian Kroer, Chung-Wei Lee, Haipeng Luo, Tuomas Sandholm

In this paper we establish efficient and \emph{uncoupled} learning dynamics so that, when employed by all players in a general-sum multiplayer game, the \emph{swap regret} of each player after $T$ repetitions of the game is bounded by $O(\log T)$, improving over the prior best bounds of $O(\log^4 (T))$.

Near-Optimal Goal-Oriented Reinforcement Learning in Non-Stationary Environments

no code implementations 25 May 2022 Liyu Chen, Haipeng Luo

We initiate the study of dynamic regret minimization for goal-oriented reinforcement learning modeled by a non-stationary stochastic shortest path problem with changing cost and transition functions.

reinforcement-learning · Reinforcement Learning (RL)

Follow-the-Perturbed-Leader for Adversarial Markov Decision Processes with Bandit Feedback

no code implementations 26 May 2022 Yan Dai, Haipeng Luo, Liyu Chen

More importantly, we then find two significant applications: First, the analysis of FTPL turns out to be readily generalizable to delayed bandit feedback with order-optimal regret, while OMD methods exhibit extra difficulties (Jin et al., 2022).

Near-Optimal No-Regret Learning Dynamics for General Convex Games

no code implementations 17 Jun 2022 Gabriele Farina, Ioannis Anagnostides, Haipeng Luo, Chung-Wei Lee, Christian Kroer, Tuomas Sandholm

In this paper, we answer this in the positive by establishing the first uncoupled learning algorithm with $O(\log T)$ per-player regret in general \emph{convex games}, that is, games with concave utility functions supported on arbitrary convex and compact strategy sets.

Improved High-Probability Regret for Adversarial Bandits with Time-Varying Feedback Graphs

no code implementations 4 Oct 2022 Haipeng Luo, Hanghang Tong, Mengxiao Zhang, Yuheng Zhang

For general strongly observable graphs, we develop an algorithm that achieves the optimal regret $\widetilde{\mathcal{O}}((\sum_{t=1}^T\alpha_t)^{1/2}+\max_{t\in[T]}\alpha_t)$ with high probability, where $\alpha_t$ is the independence number of the feedback graph at round $t$.

Multi-Armed Bandits

No-Regret Learning in Two-Echelon Supply Chain with Unknown Demand Distribution

no code implementations 23 Oct 2022 Mengxiao Zhang, Shi Chen, Haipeng Luo, Yingfei Wang

Supply chain management (SCM) has been recognized as an important discipline with applications to many industries, where the two-echelon stochastic inventory model, involving one downstream retailer and one upstream supplier, plays a fundamental role for developing firms' SCM strategies.

Management

Cap4Video: What Can Auxiliary Captions Do for Text-Video Retrieval?

4 code implementations CVPR 2023 Wenhao Wu, Haipeng Luo, Bo Fang, Jingdong Wang, Wanli Ouyang

Most existing text-video retrieval methods focus on cross-modal matching between the visual content of videos and textual query sentences.

Data Augmentation · Retrieval · +2

Bidirectional Cross-Modal Knowledge Exploration for Video Recognition with Pre-trained Vision-Language Models

5 code implementations CVPR 2023 Wenhao Wu, Xiaohan Wang, Haipeng Luo, Jingdong Wang, Yi Yang, Wanli Ouyang

In this paper, we propose a novel framework called BIKE, which utilizes the cross-modal bridge to explore bidirectional knowledge: i) We introduce the Video Attribute Association mechanism, which leverages the Video-to-Text knowledge to generate textual auxiliary attributes for complementing video recognition.

Action Classification · Action Recognition · +3

Refined Regret for Adversarial MDPs with Linear Function Approximation

no code implementations 30 Jan 2023 Yan Dai, Haipeng Luo, Chen-Yu Wei, Julian Zimmert

This analysis allows the loss estimators to be arbitrarily negative and might be of independent interest.

Average-Constrained Policy Optimization

no code implementations 2 Feb 2023 Akhil Agnihotri, Rahul Jain, Haipeng Luo

In this paper, we introduce a new policy optimization with function approximation algorithm for constrained MDPs with the average criterion.

Reinforcement Learning (RL)

WizardMath: Empowering Mathematical Reasoning for Large Language Models via Reinforced Evol-Instruct

1 code implementation 18 Aug 2023 Haipeng Luo, Qingfeng Sun, Can Xu, Pu Zhao, Jianguang Lou, Chongyang Tao, Xiubo Geng, Qingwei Lin, Shifeng Chen, Dongmei Zhang

Through extensive experiments on two mathematical reasoning benchmarks, namely GSM8k and MATH, we reveal the extraordinary capabilities of our model.

Ranked #49 on Arithmetic Reasoning on GSM8K (using extra training data)

Arithmetic Reasoning · GSM8K · +2

Online Learning in Contextual Second-Price Pay-Per-Click Auctions

no code implementations 8 Oct 2023 Mengxiao Zhang, Haipeng Luo

We study online learning in contextual pay-per-click auctions where, at each of the $T$ rounds, the learner receives some context along with a set of ads and needs to estimate their click-through rates (CTRs) in order to run a second-price pay-per-click auction.
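
To unpack the setting: in a second-price pay-per-click auction, ads are typically ranked by estimated CTR times bid, and the winner pays, per click, the smallest bid that would still have won. The sketch below is a generic textbook illustration of that rule, not the paper's mechanism or learning algorithm.

```python
def second_price_ppc(bids, ctrs):
    """Rank ads by CTR-weighted bid; return the winner and its per-click
    payment (the runner-up's score divided by the winner's CTR)."""
    scores = [b * c for b, c in zip(bids, ctrs)]
    order = sorted(range(len(bids)), key=lambda i: scores[i], reverse=True)
    winner, runner_up = order[0], order[1]
    payment = scores[runner_up] / ctrs[winner]   # per-click charge
    return winner, payment

# e.g. second_price_ppc([2.0, 1.5, 1.0], [0.10, 0.20, 0.05]) -> (1, 1.0)
```

The paper's online learning problem is then to form the CTR estimates that feed such an auction using only click feedback.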

Last-Iterate Convergence Properties of Regret-Matching Algorithms in Games

no code implementations 1 Nov 2023 Yang Cai, Gabriele Farina, Julien Grand-Clément, Christian Kroer, Chung-Wei Lee, Haipeng Luo, Weiqiang Zheng

Algorithms based on regret matching, specifically regret matching$^+$ (RM$^+$), and its variants are the most popular approaches for solving large-scale two-player zero-sum games in practice.

Near-Optimal Policy Optimization for Correlated Equilibrium in General-Sum Markov Games

no code implementations 26 Jan 2024 Yang Cai, Haipeng Luo, Chen-Yu Wei, Weiqiang Zheng

In this paper, we improve both results significantly by providing an uncoupled policy optimization algorithm that attains a near-optimal $\tilde{O}(T^{-1})$ convergence rate for computing a correlated equilibrium.

Contextual Multinomial Logit Bandits with General Value Functions

no code implementations 12 Feb 2024 Mengxiao Zhang, Haipeng Luo

Contextual multinomial logit (MNL) bandits capture many real-world assortment recommendation problems such as online retailing/advertising.

Computational Efficiency · Multi-Armed Bandits

Efficient Contextual Bandits with Uninformed Feedback Graphs

no code implementations 12 Feb 2024 Mengxiao Zhang, Yuheng Zhang, Haipeng Luo, Paul Mineiro

Bandits with feedback graphs are powerful online learning models that interpolate between the full information and classic bandit problems, capturing many real-life applications.

Multi-Armed Bandits · regression

Tractable Local Equilibria in Non-Concave Games

no code implementations 13 Mar 2024 Yang Cai, Constantinos Daskalakis, Haipeng Luo, Chen-Yu Wei, Weiqiang Zheng

While Online Gradient Descent and other no-regret learning procedures are known to efficiently converge to coarse correlated equilibrium in games where each agent's utility is concave in their own strategy, this is not the case when the utilities are non-concave, a situation that is common in machine learning applications where the agents' strategies are parameterized by deep neural networks, or the agents' utilities are computed by a neural network, or both.

Learning Adversarial Markov Decision Processes with Bandit Feedback and Unknown Transition

no code implementations ICML 2020 Chi Jin, Tiancheng Jin, Haipeng Luo, Suvrit Sra, Tiancheng Yu

We consider the task of learning in episodic finite-horizon Markov decision processes with an unknown transition function, bandit feedback, and adversarial losses.
