Search Results for author: Lin F. Yang

Found 67 papers, 8 papers with code

Multi-Agent Bandit Learning through Heterogeneous Action Erasure Channels

no code implementations21 Dec 2023 Osama A. Hanna, Merve Karakas, Lin F. Yang, Christina Fragouli

To our knowledge, these are the first algorithms capable of effectively learning through heterogeneous action erasure channels.

Scheduling

Horizon-Free and Instance-Dependent Regret Bounds for Reinforcement Learning with General Function Approximation

no code implementations7 Dec 2023 Jiayi Huang, Han Zhong, LiWei Wang, Lin F. Yang

To tackle long planning horizon problems in reinforcement learning with general function approximation, we propose the first algorithm, termed UCRL-WVTR, whose regret bound is both \emph{horizon-free} and \emph{instance-dependent}, eliminating the polynomial dependency on the planning horizon.

regression

Adaptive Liquidity Provision in Uniswap V3 with Deep Reinforcement Learning

no code implementations18 Sep 2023 Haochen Zhang, Xi Chen, Lin F. Yang

The DRL policy aims to optimize the trading fees earned by LPs against associated costs, such as gas fees and hedging expenses, the latter of which is referred to as loss-versus-rebalancing (LVR).

Asset Management reinforcement-learning

Scaling Distributed Multi-task Reinforcement Learning with Experience Sharing

no code implementations11 Jul 2023 Sanae Amani, Khushbu Pahwa, Vladimir Braverman, Lin F. Yang

Our research demonstrates that to achieve $\epsilon$-optimal policies for all $M$ tasks, a single agent using DistMT-LSVI needs to run a total number of episodes that is at most $\tilde{\mathcal{O}}({d^3H^6(\epsilon^{-2}+c_{\rm sep}^{-2})}\cdot M/N)$, where $c_{\rm sep}>0$ is a constant representing task separability, $H$ is the horizon of each episode, and $d$ is the feature dimension of the dynamics and rewards.

OpenAI Gym reinforcement-learning +1

Tackling Heavy-Tailed Rewards in Reinforcement Learning with Function Approximation: Minimax Optimal and Instance-Dependent Regret Bounds

no code implementations NeurIPS 2023 Jiayi Huang, Han Zhong, LiWei Wang, Lin F. Yang

Our algorithm, termed as \textsc{Heavy-LSVI-UCB}, achieves the \emph{first} computationally efficient \emph{instance-dependent} $K$-episode regret of $\tilde{O}(d \sqrt{H \mathcal{U}^*} K^\frac{1}{1+\epsilon} + d \sqrt{H \mathcal{V}^* K})$.

Reinforcement Learning (RL)

Provably Feedback-Efficient Reinforcement Learning via Active Reward Learning

no code implementations18 Apr 2023 Dingwen Kong, Lin F. Yang

We provide an active-learning-based RL algorithm that first explores the environment without specifying a reward function and then asks a human teacher for only a few queries about the rewards of a task at some state-action pairs.

Active Learning reinforcement-learning +1
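
A minimal sketch of the feedback-efficient pattern described above, assuming a hypothetical tabular simulator with `env.reset`/`env.step`/`env.actions` and a `teacher` callable; it illustrates reward-free exploration followed by a handful of reward queries, not the paper's actual algorithm.

```python
import random
from collections import defaultdict

def explore_then_query(env, teacher, num_episodes=100, horizon=20, num_queries=10):
    """Reward-free exploration, then a small number of reward queries (illustrative only)."""
    visits = defaultdict(int)
    for _ in range(num_episodes):                 # phase 1: explore with no reward signal
        state = env.reset()
        for _ in range(horizon):
            action = random.choice(env.actions)   # placeholder exploration policy
            next_state = env.step(action)         # hypothetical API: returns next state only
            visits[(state, action)] += 1
            state = next_state
    # phase 2: ask the human teacher about only a few frequently visited pairs
    frequent = sorted(visits, key=visits.get, reverse=True)[:num_queries]
    reward_estimates = {sa: teacher(*sa) for sa in frequent}
    return reward_estimates   # these estimates would then be handed to a planner
```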

Does Sparsity Help in Learning Misspecified Linear Bandits?

no code implementations29 Mar 2023 Jialin Dong, Lin F. Yang

In particular, Du et al. (2020) show that even if a learner is given linear features in $\mathbb{R}^d$ that approximate the rewards in a bandit or RL problem with a uniform error of $\varepsilon$, finding an $O(\varepsilon)$-optimal action requires at least $\Omega(\exp(d))$ queries.

reinforcement-learning Reinforcement Learning (RL)

Near Sample-Optimal Reduction-based Policy Learning for Average Reward MDP

no code implementations1 Dec 2022 Jinghan Wang, Mengdi Wang, Lin F. Yang

This work considers the sample complexity of obtaining an $\varepsilon$-optimal policy in an average reward Markov Decision Process (AMDP), given access to a generative model (simulator).

Contexts can be Cheap: Solving Stochastic Contextual Bandits with Linear Bandit Algorithms

no code implementations8 Nov 2022 Osama A. Hanna, Lin F. Yang, Christina Fragouli

When the context distribution is unknown, we establish an algorithm that reduces the stochastic contextual instance to a sequence of linear bandit instances with small misspecifications and achieves nearly the same worst-case regret bound as the algorithm that solves the misspecified linear bandit instances.

Multi-Armed Bandits
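
A rough picture of the reduction mentioned above, under illustrative assumptions (a known or estimated context distribution and a hypothetical `feature_map(context, arm)`): averaging features over contexts turns the contextual problem into a fixed-action linear bandit, whose misspecification the paper controls.

```python
import numpy as np

def averaged_arm_features(contexts, probs, feature_map, num_arms):
    """Collapse a stochastic contextual bandit to fixed arm features by averaging
    phi(context, arm) over the (estimated) context distribution."""
    d = feature_map(contexts[0], 0).shape[0]
    phi_bar = np.zeros((num_arms, d))
    for c, p in zip(contexts, probs):
        for a in range(num_arms):
            phi_bar[a] += p * feature_map(c, a)
    return phi_bar  # feed these vectors to any linear bandit algorithm, e.g. LinUCB
```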

Near-Optimal Sample Complexity Bounds for Constrained MDPs

no code implementations13 Jun 2022 Sharan Vaswani, Lin F. Yang, Csaba Szepesvári

In particular, we design a model-based algorithm that addresses two settings: (i) relaxed feasibility, where small constraint violations are allowed, and (ii) strict feasibility, where the output policy is required to satisfy the constraint.

Learning in Distributed Contextual Linear Bandits Without Sharing the Context

no code implementations8 Jun 2022 Osama A. Hanna, Lin F. Yang, Christina Fragouli

The contextual linear bandit is a rich and theoretically important model with many practical applications.

Provably Efficient Lifelong Reinforcement Learning with Linear Function Approximation

no code implementations1 Jun 2022 Sanae Amani, Lin F. Yang, Ching-An Cheng

We study lifelong reinforcement learning (RL) in a regret minimization setting of linear contextual Markov decision process (MDP), where the agent needs to learn a multi-task policy while solving a streaming sequence of tasks.

4k reinforcement-learning +1

Distributed Contextual Linear Bandits with Minimax Optimal Communication Cost

no code implementations26 May 2022 Sanae Amani, Tor Lattimore, András György, Lin F. Yang

In particular, for scenarios with known context distribution, the communication cost of DisBE-LUCB is only $\tilde{\mathcal{O}}(dN)$ and its regret is ${\tilde{\mathcal{O}}}(\sqrt{dNT})$, which is of the same order as that incurred by an optimal single-agent algorithm for $NT$ rounds.

Solving Multi-Arm Bandit Using a Few Bits of Communication

no code implementations11 Nov 2021 Osama A. Hanna, Lin F. Yang, Christina Fragouli

Existing works usually fail to address this issue and can become infeasible in certain applications.

Active Learning Quantization
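
One classical way to communicate a bounded reward with a single bit, relevant to the communication-limited setting above though not necessarily the scheme used in the paper: stochastic rounding keeps the transmitted bit unbiased for the true reward.

```python
import random

def one_bit_quantize(reward):
    """Send a single unbiased bit for a reward in [0, 1]: E[bit] equals the reward."""
    assert 0.0 <= reward <= 1.0
    return 1 if random.random() < reward else 0

# The learner averages received bits exactly as it would average raw rewards;
# the extra variance is bounded, so standard bandit analyses carry over.
```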

Settling the Horizon-Dependence of Sample Complexity in Reinforcement Learning

no code implementations1 Nov 2021 Yuanzhi Li, Ruosong Wang, Lin F. Yang

Notably, for an RL environment with horizon length $H$, previous work has shown that there is a probably approximately correct (PAC) algorithm that learns an $O(1)$-optimal policy using $\mathrm{polylog}(H)$ episodes of environment interactions when the number of states and actions is fixed.

reinforcement-learning Reinforcement Learning (RL)

Breaking the Moments Condition Barrier: No-Regret Algorithm for Bandits with Super Heavy-Tailed Payoffs

no code implementations NeurIPS 2021 Han Zhong, Jiayi Huang, Lin F. Yang, LiWei Wang

Despite a large amount of effort in dealing with heavy-tailed error in machine learning, little is known when moments of the error can be non-existent: the random noise $\eta$ satisfies $\Pr\left[|\eta| > |y|\right] \le 1/|y|^{\alpha}$ for some $\alpha > 0$.

On Improving Model-Free Algorithms for Decentralized Multi-Agent Reinforcement Learning

no code implementations12 Oct 2021 Weichao Mao, Lin F. Yang, Kaiqing Zhang, Tamer Başar

Multi-agent reinforcement learning (MARL) algorithms often suffer from an exponential sample complexity dependence on the number of agents, a phenomenon known as \emph{the curse of multiagents}.

Multi-agent Reinforcement Learning Q-Learning +3

Theoretically Principled Deep RL Acceleration via Nearest Neighbor Function Approximation

no code implementations9 Oct 2021 Junhong Shen, Lin F. Yang

To mitigate these issues, we propose a theoretically principled nearest neighbor (NN) function approximator that can improve the value networks in deep RL methods.

Reinforcement Learning (RL)
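
A minimal sketch of a nearest-neighbor value estimator of the kind referred to above: store visited states with value targets and answer queries with the value of the closest stored point. The class name and the plain Euclidean metric are illustrative assumptions, not the paper's construction.

```python
import numpy as np

class NearestNeighborValue:
    """Store (state, value) pairs and predict with the nearest stored state."""
    def __init__(self):
        self.states, self.values = [], []

    def update(self, state, value):
        self.states.append(np.asarray(state, dtype=float))
        self.values.append(float(value))

    def predict(self, query):
        if not self.states:
            return 0.0                      # default value before any data is stored
        q = np.asarray(query, dtype=float)
        dists = [np.linalg.norm(s - q) for s in self.states]
        return self.values[int(np.argmin(dists))]
```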

Near-Optimal Reward-Free Exploration for Linear Mixture MDPs with Plug-in Solver

no code implementations ICLR 2022 Xiaoyu Chen, Jiachen Hu, Lin F. Yang, LiWei Wang

In particular, we take a plug-in solver approach, where we focus on learning a model in the exploration phase and demand that \emph{any planning algorithm} on the learned model can give a near-optimal policy.

Model-based Reinforcement Learning Reinforcement Learning (RL)

Gap-Dependent Unsupervised Exploration for Reinforcement Learning

1 code implementation11 Aug 2021 Jingfeng Wu, Vladimir Braverman, Lin F. Yang

In particular, for an unknown finite-horizon Markov decision process, the algorithm takes only $\widetilde{\mathcal{O}} (1/\epsilon \cdot (H^3SA / \rho + H^4 S^2 A) )$ episodes of exploration, and is able to obtain an $\epsilon$-optimal policy for a post-revealed reward with sub-optimality gap at least $\rho$, where $S$ is the number of states, $A$ is the number of actions, and $H$ is the length of the horizon, obtaining a nearly \emph{quadratic saving} in terms of $\epsilon$.

reinforcement-learning Reinforcement Learning (RL)

Randomized Exploration for Reinforcement Learning with General Value Function Approximation

1 code implementation15 Jun 2021 Haque Ishfaq, Qiwen Cui, Viet Nguyen, Alex Ayoub, Zhuoran Yang, Zhaoran Wang, Doina Precup, Lin F. Yang

We propose a model-free reinforcement learning algorithm inspired by the popular randomized least squares value iteration (RLSVI) algorithm as well as the optimism principle.

reinforcement-learning Reinforcement Learning (RL)
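
The randomized step that RLSVI-style methods build on can be sketched in a few lines: solve a ridge regression for the value function with Gaussian-perturbed targets, so that acting greedily already explores. Dimensions, the noise scale, and the function name are illustrative assumptions.

```python
import numpy as np

def perturbed_lsvi_weights(phi, targets, noise_std=1.0, ridge=1.0, rng=None):
    """One randomized least-squares value-iteration step.
    phi: (n, d) feature matrix; targets: (n,) regression targets, e.g. r + max_a Q_next."""
    rng = rng if rng is not None else np.random.default_rng()
    noisy_targets = targets + rng.normal(0.0, noise_std, size=targets.shape)
    A = phi.T @ phi + ridge * np.eye(phi.shape[1])
    return np.linalg.solve(A, phi.T @ noisy_targets)  # act greedily w.r.t. phi @ w
```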

Online Sub-Sampling for Reinforcement Learning with General Function Approximation

no code implementations14 Jun 2021 Dingwen Kong, Ruslan Salakhutdinov, Ruosong Wang, Lin F. Yang

For a value-based method with a complexity-bounded function class, we show that the policy needs to be updated only $\operatorname{polylog}(K)$ times over $K$ episodes of running the RL algorithm, while still achieving a near-optimal regret bound.

reinforcement-learning Reinforcement Learning (RL)

Global Neighbor Sampling for Mixed CPU-GPU Training on Giant Graphs

no code implementations11 Jun 2021 Jialin Dong, Da Zheng, Lin F. Yang, George Karypis

This global cache allows in-GPU importance sampling of mini-batches, which drastically reduces the number of nodes in a mini-batch (especially in the input layer), thereby cutting data copies between CPU and GPU and mini-batch computation, without compromising the training convergence rate or model accuracy.

Fraud Detection
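
A toy version of the cache-biased sampling idea described above: neighbors already resident in the GPU cache are drawn with higher probability, so most sampled nodes need no CPU-to-GPU copy. The bias factor, data layout, and function name are assumptions made for illustration.

```python
import numpy as np

def sample_neighbors(neighbors, cached, fanout=10, cache_bias=4.0, rng=None):
    """Importance-sample up to `fanout` neighbors, favoring nodes in the GPU cache."""
    rng = rng if rng is not None else np.random.default_rng()
    neighbors = np.asarray(neighbors)
    weights = np.where(np.isin(neighbors, list(cached)), cache_bias, 1.0)
    probs = weights / weights.sum()
    idx = rng.choice(len(neighbors), size=min(fanout, len(neighbors)),
                     replace=False, p=probs)
    return neighbors[idx], probs[idx]   # probabilities allow unbiased reweighting later
```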

Provably Correct Optimization and Exploration with Non-linear Policies

1 code implementation22 Mar 2021 Fei Feng, Wotao Yin, Alekh Agarwal, Lin F. Yang

Policy optimization methods remain a powerful workhorse in empirical Reinforcement Learning (RL), with a focus on neural policies that can easily reason over complex and continuous state and/or action spaces.

Reinforcement Learning (RL)

Provably Breaking the Quadratic Error Compounding Barrier in Imitation Learning, Optimally

no code implementations25 Feb 2021 Nived Rajaraman, Yanjun Han, Lin F. Yang, Kannan Ramchandran, Jiantao Jiao

We establish an upper bound $O(|\mathcal{S}|H^{3/2}/N)$ on the suboptimality using the Mimic-MD algorithm in Rajaraman et al. (2020), which we prove to be computationally efficient.

Imitation Learning

A Provably Efficient Algorithm for Linear Markov Decision Process with Low Switching Cost

no code implementations2 Jan 2021 Minbo Gao, Tianle Xie, Simon S. Du, Lin F. Yang

This paper focuses on the linear Markov Decision Process (MDP) recently studied in [Yang et al. 2019, Jin et al. 2020], where linear function approximation is used for generalization over a large state space.

4k Recommendation Systems
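
A standard trick for keeping the switching cost low in this linear-function-approximation line of work (sketched here under illustrative names, not necessarily the paper's exact rule) is to recompute the policy only when the feature covariance matrix has gained enough information, e.g. when its determinant doubles.

```python
import numpy as np

class RareSwitchTrigger:
    """Signal a policy update only when det(covariance) has at least doubled."""
    def __init__(self, dim, ridge=1.0):
        self.cov = ridge * np.eye(dim)
        self.last_logdet = np.linalg.slogdet(self.cov)[1]

    def observe(self, phi):
        self.cov += np.outer(phi, phi)
        logdet = np.linalg.slogdet(self.cov)[1]
        if logdet >= self.last_logdet + np.log(2.0):   # information has doubled
            self.last_logdet = logdet
            return True                                # recompute the policy now
        return False                                   # keep the current policy
```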

Minimax Sample Complexity for Turn-based Stochastic Game

no code implementations29 Nov 2020 Qiwen Cui, Lin F. Yang

The empirical success of multi-agent reinforcement learning is encouraging, but few theoretical guarantees have been established.

Multi-agent Reinforcement Learning reinforcement-learning +1

Accommodating Picky Customers: Regret Bound and Exploration Complexity for Multi-Objective Reinforcement Learning

1 code implementation NeurIPS 2021 Jingfeng Wu, Vladimir Braverman, Lin F. Yang

We formalize this problem as an episodic learning problem on a Markov decision process, where transitions are unknown and a reward function is the inner product of a preference vector with pre-specified multi-objective reward functions.

Multi-Objective Reinforcement Learning reinforcement-learning

Towards Fundamental Limits of Multi-armed Bandits with Random Walk Feedback

no code implementations3 Nov 2020 Tianyu Wang, Lin F. Yang, Zizhuo Wang

In this paper, we consider a new Multi-Armed Bandit (MAB) problem where arms are nodes in an unknown and possibly changing graph, and the agent (i) initiates random walks over the graph by pulling arms, (ii) observes the random walk trajectories, and (iii) receives rewards equal to the lengths of the walks.

Multi-Armed Bandits Recommendation Systems
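
The feedback model in the entry above is straightforward to simulate: pulling an arm starts a random walk at that node, and the observed reward is the length of the trajectory until an absorbing node is hit. The adjacency format, absorbing set, and step cap below are assumptions for illustration.

```python
import random

def pull_arm(graph, start, absorbing, max_steps=10_000):
    """graph: dict mapping node -> list of neighbors; reward = number of steps
    taken before the walk first hits a node in `absorbing`."""
    node, steps = start, 0
    while node not in absorbing and steps < max_steps:
        node = random.choice(graph[node])
        steps += 1
    return steps   # the full trajectory could be returned as well for trajectory feedback
```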

Episodic Linear Quadratic Regulators with Low-rank Transitions

no code implementations3 Nov 2020 Tianyu Wang, Lin F. Yang

Consequently, the sample complexity of our algorithm only depends on the rank, $m$, rather than the ambient dimension, $d$, which can be orders-of-magnitude larger.

Toward the Fundamental Limits of Imitation Learning

no code implementations NeurIPS 2020 Nived Rajaraman, Lin F. Yang, Jiantao Jiao, Kannan Ramchandran

Here, we show that the policy which mimics the expert whenever possible is in expectation $\lesssim \frac{|\mathcal{S}| H^2 \log (N)}{N}$ suboptimal compared to the value of the expert, even when the expert follows an arbitrary stochastic policy.

Imitation Learning
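
A compact sketch of the "mimic the expert whenever possible" policy analyzed above: in states covered by the expert dataset, replay the expert's empirical action distribution; elsewhere, fall back to an arbitrary action. Names and the flat (non-timestep-indexed) state representation are illustrative simplifications.

```python
import random
from collections import defaultdict

def build_mimic_policy(expert_pairs, fallback_action):
    """expert_pairs: iterable of (state, action) pairs from N expert episodes."""
    actions_seen = defaultdict(list)
    for state, action in expert_pairs:
        actions_seen[state].append(action)

    def policy(state):
        if state in actions_seen:                      # covered state: imitate
            return random.choice(actions_seen[state])  # empirical expert distribution
        return fallback_action                         # uncovered state: arbitrary choice

    return policy
```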

Obtaining Adjustable Regularization for Free via Iterate Averaging

1 code implementation ICML 2020 Jingfeng Wu, Vladimir Braverman, Lin F. Yang

In sum, we obtain adjustable regularization for free for a large class of optimization problems and resolve an open question raised by Neu and Rosasco.

Open-Ended Question Answering

Model-Based Multi-Agent RL in Zero-Sum Markov Games with Near-Optimal Sample Complexity

no code implementations NeurIPS 2020 Kaiqing Zhang, Sham M. Kakade, Tamer Başar, Lin F. Yang

This is in contrast to the usual reward-aware setting, with a $\tilde\Omega(|S|(|A|+|B|)(1-\gamma)^{-3}\epsilon^{-2})$ lower bound, where this model-based approach is near-optimal with only a gap on the $|A|,|B|$ dependence.

Model-based Reinforcement Learning Reinforcement Learning (RL)

On Reward-Free Reinforcement Learning with Linear Function Approximation

no code implementations NeurIPS 2020 Ruosong Wang, Simon S. Du, Lin F. Yang, Ruslan Salakhutdinov

The sample complexity of our algorithm is polynomial in the feature dimension and the planning horizon, and is completely independent of the number of states and actions.

reinforcement-learning Reinforcement Learning (RL)

Preference-based Reinforcement Learning with Finite-Time Guarantees

no code implementations NeurIPS 2020 Yichong Xu, Ruosong Wang, Lin F. Yang, Aarti Singh, Artur Dubrawski

If preferences are stochastic, and the preference probability relates to the hidden reward values, we present algorithms for PbRL, both with and without a simulator, that are able to identify the best policy up to accuracy $\varepsilon$ with high probability.

reinforcement-learning Reinforcement Learning (RL)
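
The statement that "the preference probability relates to the hidden reward values" is often instantiated with a Bradley-Terry/logistic link; the one-liner below shows that common choice as an assumption for illustration, not necessarily the exact model in the paper.

```python
import math

def preference_probability(return_a, return_b):
    """P(trajectory a is preferred over trajectory b) under a Bradley-Terry link."""
    return 1.0 / (1.0 + math.exp(-(return_a - return_b)))
```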

$Q$-learning with Logarithmic Regret

no code implementations16 Jun 2020 Kunhe Yang, Lin F. Yang, Simon S. Du

This paper presents the first non-asymptotic result showing that a model-free algorithm can achieve a logarithmic cumulative regret for episodic tabular reinforcement learning if there exists a strictly positive sub-optimality gap in the optimal $Q$-function.

Q-Learning

Model-Based Reinforcement Learning with Value-Targeted Regression

no code implementations ICML 2020 Alex Ayoub, Zeyu Jia, Csaba Szepesvari, Mengdi Wang, Lin F. Yang

We propose a model-based RL algorithm based on the optimism principle: in each episode, the set of models that are `consistent' with the data collected so far is constructed.

Model-based Reinforcement Learning regression +2
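
The regression at the heart of the approach described above can be sketched as a single ridge solve: each candidate model is scored by how well its predicted next-state values match the observed value targets, and the episode is then planned with an optimistic model from the surviving set. The linear parameterization and solver below are illustrative assumptions.

```python
import numpy as np

def value_targeted_regression(features, value_targets, ridge=1.0):
    """Fit model parameters so predicted next-state values match observed ones.
    features: (n, d) rows of phi_V(s_t, a_t) = sum_s' phi(s_t, a_t, s') * V(s');
    value_targets: (n,) observed values V(s_{t+1})."""
    A = features.T @ features + ridge * np.eye(features.shape[1])
    theta_hat = np.linalg.solve(A, features.T @ value_targets)
    return theta_hat   # models in a confidence ellipsoid around theta_hat are `consistent'
```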

Is Long Horizon Reinforcement Learning More Difficult Than Short Horizon Reinforcement Learning?

no code implementations1 May 2020 Ruosong Wang, Simon S. Du, Lin F. Yang, Sham M. Kakade

Our analysis introduces two ideas: (i) the construction of an $\varepsilon$-net for optimal policies whose log-covering number scales only logarithmically with the planning horizon, and (ii) the Online Trajectory Synthesis algorithm, which adaptively evaluates all policies in a given policy class using sample complexity that scales with the log-covering number of the given policy class.

reinforcement-learning Reinforcement Learning (RL)

Provably Efficient Exploration for Reinforcement Learning Using Unsupervised Learning

1 code implementation NeurIPS 2020 Fei Feng, Ruosong Wang, Wotao Yin, Simon S. Du, Lin F. Yang

Motivated by the prevailing paradigm of using unsupervised learning for efficient exploration in reinforcement learning (RL) problems (Tang et al., 2017; Bellemare et al., 2016), we investigate when this paradigm is provably efficient.

Efficient Exploration reinforcement-learning +1

Sketching Transformed Matrices with Applications to Natural Language Processing

no code implementations23 Feb 2020 Yingyu Liang, Zhao Song, Mengdi Wang, Lin F. Yang, Xin Yang

We show that our approach obtains small error and is efficient in both space and time.

Deep Reinforcement Learning with Linear Quadratic Regulator Regions

no code implementations23 Feb 2020 Gabriel I. Fernandez, Colin Togashi, Dennis W. Hong, Lin F. Yang

In this paper we propose a novel method that guarantees a stable region of attraction for the output of a policy trained in simulation, even for highly nonlinear systems.

reinforcement-learning Reinforcement Learning (RL)

How Does an Approximate Model Help in Reinforcement Learning?

no code implementations6 Dec 2019 Fei Feng, Wotao Yin, Lin F. Yang

In particular, we provide an algorithm that uses $\widetilde{O}\big(N/((1-\gamma)^3\varepsilon^2)\big)$ samples in a generative model to learn an $\varepsilon$-optimal policy, where $\gamma$ is the discount factor and $N$ is the number of near-optimal actions in the approximate model.

reinforcement-learning Reinforcement Learning (RL) +1

Continuous Control with Contexts, Provably

no code implementations30 Oct 2019 Simon S. Du, Ruosong Wang, Mengdi Wang, Lin F. Yang

To our knowledge, this is the first provably efficient algorithm to build a decoder in the continuous control setting.

Continuous Control

Is a Good Representation Sufficient for Sample Efficient Reinforcement Learning?

no code implementations ICLR 2020 Simon S. Du, Sham M. Kakade, Ruosong Wang, Lin F. Yang

From the statistical viewpoint, this question is largely unexplored, and the extant body of literature mainly focuses on conditions that permit sample-efficient reinforcement learning, with little understanding of the conditions that are necessary for efficient reinforcement learning.

Imitation Learning reinforcement-learning +1

Efficient Symmetric Norm Regression via Linear Sketching

no code implementations NeurIPS 2019 Zhao Song, Ruosong Wang, Lin F. Yang, Hongyang Zhang, Peilin Zhong

When the loss function is a general symmetric norm, our algorithm produces a $\sqrt{d} \cdot \mathrm{polylog} n \cdot \mathrm{mmc}(\ell)$-approximate solution in input-sparsity time, where $\mathrm{mmc}(\ell)$ is a quantity related to the symmetric norm under consideration.

regression

Provably Benefits of Deep Hierarchical RL

no code implementations25 Sep 2019 Zeyu Jia, Simon S. Du, Ruosong Wang, Mengdi Wang, Lin F. Yang

Modern complex sequential decision-making problems often involve both low-level policy execution and high-level planning.

Decision Making Hierarchical Reinforcement Learning

Solving Discounted Stochastic Two-Player Games with Near-Optimal Time and Sample Complexity

no code implementations29 Aug 2019 Aaron Sidford, Mengdi Wang, Lin F. Yang, Yinyu Ye

In this paper, we settle the sampling complexity of solving discounted two-player turn-based zero-sum stochastic games up to polylogarithmic factors.

Q-Learning

Model-Based Reinforcement Learning with a Generative Model is Minimax Optimal

no code implementations10 Jun 2019 Alekh Agarwal, Sham Kakade, Lin F. Yang

In this work, we study the effectiveness of the most natural plug-in approach to model-based planning: we build the maximum likelihood estimate of the transition model in the MDP from observations and then find an optimal policy in this empirical MDP.

Model-based Reinforcement Learning reinforcement-learning +1

Feature-Based Q-Learning for Two-Player Stochastic Games

no code implementations2 Jun 2019 Zeyu Jia, Lin F. Yang, Mengdi Wang

Consider a two-player zero-sum stochastic game where the transition function can be embedded in a given feature space.

Q-Learning Vocal Bursts Valence Prediction

Reinforcement Learning in Feature Space: Matrix Bandit, Kernels, and Regret Bound

no code implementations ICML 2020 Lin F. Yang, Mengdi Wang

In this case, the kernelized MatrixRL satisfies a regret bound ${O}\big(H^2\widetilde{d}\log T\sqrt{T}\big)$, where $\widetilde{d}$ is the effective dimension of the kernel space.

reinforcement-learning Reinforcement Learning (RL)

Learning to Control in Metric Space with Optimal Regret

1 code implementation5 May 2019 Lin F. Yang, Chengzhuo Ni, Mengdi Wang

We study online reinforcement learning for finite-horizon deterministic control systems with {\it arbitrary} state and action spaces.

reinforcement-learning Reinforcement Learning (RL)

Sample-Optimal Parametric Q-Learning Using Linearly Additive Features

no code implementations13 Feb 2019 Lin F. Yang, Mengdi Wang

Consider a Markov decision process (MDP) that admits a set of state-action features, which can linearly express the process's probabilistic transition model.

Q-Learning

Towards a Theoretical Understanding of Hashing-Based Neural Nets

no code implementations26 Dec 2018 Yibo Lin, Zhao Song, Lin F. Yang

In this paper, we provide provable guarantees on some hashing-based parameter reduction methods in neural nets.

On Landscape of Lagrangian Functions and Stochastic Search for Constrained Nonconvex Optimization

no code implementations13 Jun 2018 Zhehui Chen, Xingguo Li, Lin F. Yang, Jarvis Haupt, Tuo Zhao

However, due to the lack of convexity, their landscape is not well understood and how to find the stable equilibria of the Lagrangian function is still unknown.

Near-Optimal Time and Sample Complexities for Solving Discounted Markov Decision Process with a Generative Model

1 code implementation5 Jun 2018 Aaron Sidford, Mengdi Wang, Xian Wu, Lin F. Yang, Yinyu Ye

In this paper we consider the problem of computing an $\epsilon$-optimal policy of a discounted Markov Decision Process (DMDP) provided we can only access its transition function through a generative sampling model that given any state-action pair samples from the transition function in $O(1)$ time.

Optimization and Control

Nearly Optimal Dynamic $k$-Means Clustering for High-Dimensional Data

no code implementations1 Feb 2018 Wei Hu, Zhao Song, Lin F. Yang, Peilin Zhong

We consider the $k$-means clustering problem in the dynamic streaming setting, where points from a discrete Euclidean space $\{1, 2, \ldots, \Delta\}^d$ can be dynamically inserted to or deleted from the dataset.

Clustering Vocal Bursts Intensity Prediction

Misspecified Nonconvex Statistical Optimization for Phase Retrieval

no code implementations18 Dec 2017 Zhuoran Yang, Lin F. Yang, Ethan X. Fang, Tuo Zhao, Zhaoran Wang, Matey Neykov

Existing nonconvex statistical optimization theory and methods crucially rely on the correct specification of the underlying "true" statistical models.

Retrieval

On Quadratic Convergence of DC Proximal Newton Algorithm for Nonconvex Sparse Learning in High Dimensions

no code implementations19 Jun 2017 Xingguo Li, Lin F. Yang, Jason Ge, Jarvis Haupt, Tong Zhang, Tuo Zhao

We propose a DC proximal Newton algorithm for solving nonconvex regularized sparse learning problems in high dimensions.

Sparse Learning

Online Factorization and Partition of Complex Networks From Random Walks

no code implementations22 May 2017 Lin F. Yang, Vladimir Braverman, Tuo Zhao, Mengdi Wang

We formulate this into a nonconvex stochastic factorization problem and propose an efficient and scalable stochastic generalized Hebbian algorithm.

Clustering
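
For context, the classical stochastic generalized Hebbian algorithm (Sanger's rule) that this line of work builds on can be written in a few lines; the streaming update below tracks the top-$k$ principal subspace of the data and is shown only as a reference point, not as the paper's exact method.

```python
import numpy as np

def gha_update(W, x, lr=0.01):
    """One step of Sanger's generalized Hebbian algorithm.
    W: (k, d) float array, current estimate of the top-k principal directions; x: (d,) sample."""
    y = W @ x                                            # projections onto current directions
    W += lr * (np.outer(y, x) - np.tril(np.outer(y, y)) @ W)
    return W
```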

The Physical Systems Behind Optimization Algorithms

no code implementations NeurIPS 2018 Lin F. Yang, R. Arora, V. Braverman, Tuo Zhao

We use differential-equation-based approaches to provide \emph{physics} insights into the dynamics of popular optimization algorithms in machine learning.

BIG-bench Machine Learning
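
As one standard illustration of the differential-equation viewpoint mentioned above (a textbook example, not a result taken from the paper), gradient descent with step size $\eta$ is the forward-Euler discretization of the gradient flow:

$$\dot{x}(t) = -\nabla f\big(x(t)\big), \qquad x_{k+1} = x_k - \eta\,\nabla f(x_k),$$

and the continuous-time energy identity $\frac{d}{dt} f(x(t)) = -\|\nabla f(x(t))\|^2 \le 0$ is the kind of physical quantity such analyses track, up to discretization error.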
