no code implementations • ICML 2020 • Lingxiao Wang, Zhuoran Yang, Zhaoran Wang
We highlight that the MF-FQI algorithm enjoys a "blessing of many agents" property in the sense that a larger number of observed agents improves the performance of the MF-FQI algorithm.
Multi-agent Reinforcement Learning
reinforcement-learning
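For intuition, here is a minimal sketch of a mean-field fitted Q-iteration loop in the spirit of the MF-FQI entry above, assuming a toy linear Q-function over the agent's own state-action pair plus the empirical mean of the observed agents' states; the function name, features, and synthetic data are all illustrative assumptions, not the paper's implementation.

```python
import numpy as np

def mf_fqi(transitions, n_actions, n_iters=30, gamma=0.9, lam=1.0):
    """Toy mean-field fitted Q-iteration (illustrative, not the paper's code).

    Each transition is (s, mf, a, r, s_next, mf_next), where mf is the
    empirical mean of the observed agents' states. Observing more agents
    makes this mean-field feature less noisy -- one informal reading of
    the "blessing of many agents"."""
    def phi(s, mf, a):
        return np.concatenate([s, mf, np.eye(n_actions)[a]])

    theta = np.zeros(len(phi(*transitions[0][:3])))
    X = np.stack([phi(s, mf, a) for s, mf, a, *_ in transitions])
    for _ in range(n_iters):
        # Bellman targets under the current linear Q estimate.
        y = np.array([r + gamma * max(phi(s2, mf2, b) @ theta
                                      for b in range(n_actions))
                      for *_, r, s2, mf2 in transitions])
        # Ridge regression = the "fitted" step of fitted Q-iteration.
        theta = np.linalg.solve(X.T @ X + lam * np.eye(X.shape[1]), X.T @ y)
    return theta

rng = np.random.default_rng(0)
ts = [(rng.normal(size=2), rng.normal(size=2, scale=0.1), rng.integers(3),
       rng.normal(), rng.normal(size=2), rng.normal(size=2, scale=0.1))
      for _ in range(200)]
print(mf_fqi(ts, n_actions=3)[:4])
```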
no code implementations • ICML 2020 • Yufeng Zhang, Qi Cai, Zhuoran Yang, Zhaoran Wang
Generative adversarial imitation learning (GAIL) demonstrates tremendous success in practice, especially when combined with neural networks.
no code implementations • 13 Mar 2025 • Zhiyu Mou, Miao Xu, Rongquan Bai, Zhuoran Yang, Chuan Yu, Jian Xu, Bo Zheng
However, the NCB problem presents significant challenges due to its constrained bi-level structure and the typically large number of advertisers involved.
no code implementations • 12 Mar 2025 • Jing Wang, Fengzhuo Zhang, XiaoLi Li, Vincent Y. F. Tan, Tianyu Pang, Chao Du, Aixin Sun, Zhuoran Yang
In this work, we develop theoretical underpinnings for these models and use our insights to improve the performance of existing models.
no code implementations • 23 Feb 2025 • Yunhai Feng, Jiaming Han, Zhuoran Yang, Xiangyu Yue, Sergey Levine, Jianlan Luo
Solving complex long-horizon robotic manipulation problems requires sophisticated high-level planning capabilities, the ability to reason about the physical world, and the ability to reactively choose appropriate motor skills.
no code implementations • 11 Feb 2025 • Xuefeng Liu, Songhao Jiang, Siyu Chen, Zhuoran Yang, Yuxin Chen, Ian Foster, Rick Stevens
This research delves into the realm of drug optimization and introduces a novel reinforcement learning algorithm to fine-tune a drug optimization LLM-based generative model, enhancing the original drug across target objectives while retaining its beneficial chemical properties.
no code implementations • 11 Feb 2025 • Xuefeng Liu, Hung T. C. Le, Siyu Chen, Rick Stevens, Zhuoran Yang, Matthew R. Walter, Yuxin Chen
Online reinforcement learning (RL) enhances policies through direct interactions with the environment, but faces challenges related to sample efficiency.
no code implementations • 24 Dec 2024 • Rui Ai, Boxiang Lyu, Zhaoran Wang, Zhuoran Yang, Haifeng Xu
When instantiated in the domain of Bayesian linear regression, our value naturally corresponds to information gain.
no code implementations • 23 Dec 2024 • Xingyao Li, Fengzhuo Zhang, Jiachun Pan, Yunlong Hou, Vincent Y. F. Tan, Zhuoran Yang
Despite the considerable progress achieved in the long video generation problem, there is still significant room to improve the consistency of the videos, particularly in terms of smoothness and transitions between scenes.
no code implementations • 11 Dec 2024 • Zhuoran Yang, Xi Guo, Chenjing Ding, Chiyu Wang, Wei Wu
Autonomous driving requires robust perception models trained on high-quality, large-scale multi-view driving videos for tasks like 3D object detection, segmentation and trajectory prediction.
1 code implementation • 3 Dec 2024 • Kaixiong Gong, Kaituo Feng, Bohao Li, Yibing Wang, Mofan Cheng, Shijia Yang, Jiaming Han, Benyou Wang, Yutong Bai, Zhuoran Yang, Xiangyu Yue
Recently, multimodal large language models (MLLMs), such as GPT-4o, Gemini 1.5 Pro, and Reka Core, have expanded their capabilities to include vision and audio modalities.
no code implementations • 9 Sep 2024 • Siyu Chen, Heejune Sheen, Tianhao Wang, Zhuoran Yang
In the limiting model, the first attention layer acts as a $\mathit{copier}$, copying past tokens within a given window to each position, and the feed-forward network with normalization acts as a $\mathit{selector}$ that generates a feature vector by only looking at informationally relevant parents from the window.
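A toy numpy rendering of this copier/selector decomposition may help; this is an illustrative construction under assumed shapes and scoring, not the paper's exact parameterization.

```python
import numpy as np

def copier_selector(tokens, window=3):
    """Toy two-stage pipeline mirroring the limiting model in the entry
    above (illustrative only). Stage 1 ("copier"): each position gathers
    the raw tokens in its trailing window. Stage 2 ("selector"): softmax
    weights keep only the parents most relevant to the current token."""
    T, d = tokens.shape
    out = []
    for t in range(T):
        ctx = tokens[max(0, t - window + 1):t + 1]   # copied window
        scores = ctx @ tokens[t]                     # relevance scores
        weights = np.exp(scores - scores.max())
        weights /= weights.sum()                     # selector weights
        out.append(weights @ ctx)                    # resulting feature vector
    return np.stack(out)

x = np.random.default_rng(0).normal(size=(8, 4))
print(copier_selector(x).shape)                      # (8, 4)
```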
no code implementations • 25 Aug 2024 • Xinyang Hu, Fengzhuo Zhang, Siyu Chen, Zhuoran Yang
This estimator effectively solves the multi-step reasoning problem by aggregating a posterior distribution inferred from the demonstration examples in the prompt.
no code implementations • 23 Jun 2024 • Zehao Dou, Minshuo Chen, Mengdi Wang, Zhuoran Yang
Diffusion models have revolutionized various application domains, including computer vision and audio generation.
no code implementations • 30 May 2024 • Jianliang He, Siyu Chen, Fengzhuo Zhang, Zhuoran Yang
In this work, from a theoretical lens, we aim to understand why large language model (LLM) empowered agents are able to solve decision-making problems in the physical world.
1 code implementation • 25 May 2024 • Chuanhao Li, Runhan Yang, Tiankai Li, Milad Bafarassat, Kourosh Sharifi, Dirk Bergemann, Zhuoran Yang
Large Language Models (LLMs) like GPT-4 have revolutionized natural language processing, showing remarkable linguistic proficiency and reasoning capabilities.
1 code implementation • 30 Apr 2024 • Chenjia Bai, Lingxiao Wang, Jianye Hao, Zhuoran Yang, Bin Zhao, Zhen Wang, Xuelong Li
We further provide theoretical analysis, which shows that the optimality gap of our method is only related to the expected data coverage of the shared dataset, thus resolving the distribution shift issue in data sharing.
no code implementations • 19 Apr 2024 • Jianliang He, Han Zhong, Zhuoran Yang
Moreover, for AMDPs, we propose a novel complexity measure -- average-reward generalized eluder coefficient (AGEC) -- which captures the challenge of exploration in AMDPs with general function approximation.
no code implementations • 18 Apr 2024 • Yuchen Zhu, Yufeng Zhang, Zhaoran Wang, Zhuoran Yang, Xiaohong Chen
Under this regime, the stochastic gradient descent-ascent corresponds to a Wasserstein gradient flow over the space of probability measures defined over the space of neural network parameters.
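A particle discretization makes this concrete: sampling particles for each player's parameter distribution and running gradient descent-ascent on them approximates the Wasserstein gradient flow. The toy bilinear-plus-regularization objective below is an assumption chosen for clarity, not the paper's setting.

```python
import numpy as np

# min over mu, max over nu of E_{x~mu, y~nu}[f(x, y)],
# with the toy payoff f(x, y) = x*y + 0.1*x**2 - 0.1*y**2.
# Each particle is one "network parameter" sample; moving every particle
# along the (ascent/descent) gradient implements the pushforward step.
rng = np.random.default_rng(1)
xs = rng.normal(size=256)            # particles for the min player
ys = rng.normal(size=256)            # particles for the max player
eta = 0.05

for _ in range(500):
    gx = ys.mean() + 0.2 * xs        # df/dx averaged over nu
    gy = xs.mean() - 0.2 * ys        # df/dy averaged over mu
    xs, ys = xs - eta * gx, ys + eta * gy

print(xs.mean(), ys.mean())          # both drift toward the saddle at 0
```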
no code implementations • 18 Mar 2024 • Hengyu Fu, Zhuoran Yang, Mengdi Wang, Minshuo Chen
Conditional diffusion models serve as the foundation of modern image synthesis and find extensive application in fields like computational biology and reinforcement learning.
no code implementations • 1 Mar 2024 • Awni Altabaa, Zhuoran Yang
In a sequential decision-making problem, the information structure is the description of how events in the system occurring at different points in time affect each other.
no code implementations • 29 Feb 2024 • Siyu Chen, Heejune Sheen, Tianhao Wang, Zhuoran Yang
In addition, we prove that an interesting "task allocation" phenomenon emerges during the gradient flow dynamics, where each attention head focuses on solving a single task of the multi-task model.
no code implementations • 16 Feb 2024 • Zihao Li, Boyi Liu, Zhuoran Yang, Zhaoran Wang, Mengdi Wang
Designing algorithms for a constrained convex MDP faces several challenges, including (1) handling the large state space, (2) managing the exploration/exploitation tradeoff, and (3) solving the constrained optimization where the objective and the constraint are both nonlinear functions of the visitation measure.
no code implementations • 10 Feb 2024 • Han Shen, Zhuoran Yang, Tianyi Chen
However, bilevel problems such as incentive design, inverse reinforcement learning (RL), and RL from human feedback (RLHF) are often modeled with dynamic objective functions that go beyond simple static objective structures, which poses significant challenges for existing bilevel solutions.
no code implementations • 2 Dec 2023 • Juno Kim, Kakei Yamamoto, Kazusato Oko, Zhuoran Yang, Taiji Suzuki
In this paper, we extend mean-field Langevin dynamics to minimax optimization over probability distributions for the first time with symmetric and provably convergent updates.
no code implementations • 28 Nov 2023 • YiXuan Wang, Ruochen Jiao, Sinong Simon Zhan, Chengtian Lang, Chao Huang, Zhaoran Wang, Zhuoran Yang, Qi Zhu
Autonomous Driving (AD) encounters significant safety hurdles in long-tail unforeseen driving scenarios, largely stemming from the non-interpretability and poor generalization of the deep neural networks within the AD system, particularly in out-of-distribution and uncertain data.
no code implementations • 22 Nov 2023 • Jianqing Fan, Zhaoran Wang, Zhuoran Yang, Chenlu Ye
For these settings, we design a provably sample-efficient algorithm which achieves a $\tilde{\mathcal{O}}(s_0^2 \log^2 T)$ regret in the sparse case and $\tilde{\mathcal{O}}(r^2 \log^2 T)$ regret in the low-rank case, using only $L = \mathcal{O}(\log T)$ batches.
no code implementations • 26 Oct 2023 • Fengzhuo Zhang, Vincent Y. F. Tan, Zhaoran Wang, Zhuoran Yang
Second, using kernel embedding of distributions, we design efficient algorithms to estimate the transition kernels, reward functions, and graphons from sampled agents.
no code implementations • 10 Oct 2023 • Nuoya Xiong, Zhihan Liu, Zhaoran Wang, Zhuoran Yang
We study multi-agent reinforcement learning (MARL) for the general-sum Markov Games (MGs) under the general function approximation.
no code implementations • 26 Jul 2023 • Siyu Chen, Mengdi Wang, Zhuoran Yang
The goal of the leader is to find her optimal policy, which yields the optimal expected total return, by interacting with the follower and learning from data.
no code implementations • 8 Jul 2023 • Pangpang Liu, Zhuoran Yang, Zhaoran Wang, Will Wei Sun
We first prove that existing non-strategic pricing policies that neglect the buyers' strategic behavior result in a linear $\Omega(T)$ regret, where $T$ is the total time horizon, indicating that these policies are no better than a random pricing policy.
no code implementations • 26 Jun 2023 • Nuoya Xiong, Zhaoran Wang, Zhuoran Yang
We take the first step in studying general sequential decision-making under two adaptivity constraints: rare policy switch and batch learning.
no code implementations • 21 Jun 2023 • Jiacheng Guo, Zihao Li, Huazheng Wang, Mengdi Wang, Zhuoran Yang, Xuezhou Zhang
In this paper, we study representation learning in partially observable Markov Decision Processes (POMDPs), where the agent learns a decoder function that maps a series of high-dimensional raw observations to a compact representation and uses it for more efficient exploration and planning.
no code implementations • 31 May 2023 • Dongsheng Ding, Xiaohan Wei, Zhuoran Yang, Zhaoran Wang, Mihailo R. Jovanović
We examine online safe multi-agent reinforcement learning using constrained Markov games in which agents compete by maximizing their expected total rewards under a constraint on expected total utilities.
Multi-agent Reinforcement Learning
reinforcement-learning
no code implementations • 30 May 2023 • Yufeng Zhang, Fengzhuo Zhang, Zhuoran Yang, Zhaoran Wang
(b) What is a proper performance metric for ICL and what is the error rate?
1 code implementation • NeurIPS 2023 • Zhihan Liu, Miao Lu, Wei Xiong, Han Zhong, Hao Hu, Shenao Zhang, Sirui Zheng, Zhuoran Yang, Zhaoran Wang
To achieve this, existing sample-efficient online RL algorithms typically consist of three components: estimation, planning, and exploration.
no code implementations • 29 May 2023 • Zihao Li, Zhuoran Yang, Mengdi Wang
In this paper, we study offline Reinforcement Learning with Human Feedback (RLHF) where we aim to learn the human's underlying reward and the MDP's optimal policy from a set of trajectories induced by human choices.
1 code implementation • NeurIPS 2023 • Haoran He, Chenjia Bai, Kang Xu, Zhuoran Yang, Weinan Zhang, Dong Wang, Bin Zhao, Xuelong Li
Specifically, we propose the Multi-Task Diffusion Model (MTDiff), a diffusion-based method that incorporates Transformer backbones and prompt learning for generative planning and data synthesis in multi-task offline settings.
1 code implementation • 8 May 2023 • Yulai Zhao, Zhuoran Yang, Zhaoran Wang, Jason D. Lee
Motivated by the observation, we present a multi-agent PPO algorithm in which the local policy of each agent is updated similarly to vanilla PPO.
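For reference, the vanilla PPO clipped surrogate that each agent's local update would resemble looks like the following sketch (the per-agent advantage estimation pipeline is omitted, and the example numbers are arbitrary).

```python
import numpy as np

def ppo_clip_loss(logp_new, logp_old, adv, eps=0.2):
    """Vanilla PPO clipped surrogate; in the multi-agent variant sketched
    in the entry above, each agent would apply this to its own local
    policy with its own advantage estimates (illustrative only)."""
    ratio = np.exp(logp_new - logp_old)
    surrogate = np.minimum(ratio * adv, np.clip(ratio, 1 - eps, 1 + eps) * adv)
    return -surrogate.mean()   # negated so an optimizer can minimize it

adv = np.array([1.0, -0.5, 2.0])
print(ppo_clip_loss(np.log([0.4, 0.3, 0.3]), np.log([0.3, 0.4, 0.3]), adv))
```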
4 code implementations • 28 Mar 2023 • Haoran Xu, Li Jiang, Jianxiong Li, Zhuoran Yang, Zhaoran Wang, Victor Wai Kin Chan, Xianyuan Zhan
This gives a deeper understanding of why the in-sample learning paradigm works, i.e., it applies implicit value regularization to the policy.
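As a hedged illustration of the in-sample idea (borrowed from expectile-based methods such as IQL, not necessarily the exact regularizer of this paper), a value function can be fit with an asymmetric loss so that only dataset actions are ever queried:

```python
import numpy as np

def expectile_value_loss(q, v, tau=0.7):
    """Asymmetric squared loss on in-sample (dataset) actions only; with
    tau > 0.5 the learned V tracks an upper expectile of Q, giving an
    implicit form of value regularization without querying OOD actions."""
    diff = q - v
    weight = np.where(diff > 0, tau, 1 - tau)
    return (weight * diff ** 2).mean()

q = np.array([1.2, 0.3, 2.1, -0.4])   # Q(s, a) at dataset actions
v = np.array([0.8, 0.8, 0.8, 0.8])    # V(s) estimates
print(expectile_value_loss(q, v))
```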
no code implementations • 20 Mar 2023 • Siyu Chen, Yitan Wang, Zhaoran Wang, Zhuoran Yang
We study the offline contextual bandit problem, where we aim to acquire an optimal policy using observational data.
no code implementations • 15 Mar 2023 • Siyu Chen, Jibang Wu, Yifan Wu, Zhuoran Yang
Such a problem is modeled as a Stackelberg game between the principal and the agent, where the principal announces a scoring rule that specifies the payment, and the agent then chooses an effort level that maximizes her own profit and reports the information.
no code implementations • 3 Mar 2023 • Zhuoqing Song, Jason D. Lee, Zhuoran Yang
Second, when both players adopt the algorithm, their joint policy converges to a Nash equilibrium of the game.
no code implementations • 24 Feb 2023 • Ruitu Xu, Yifei Min, Tianhao Wang, Zhaoran Wang, Michael I. Jordan, Zhuoran Yang
We study a heterogeneous agent macroeconomic model with an infinite number of households and firms competing in a labor market.
no code implementations • 29 Dec 2022 • Riashat Islam, Samarth Sinha, Homanga Bharadhwaj, Samin Yeasar Arnob, Zhuoran Yang, Animesh Garg, Zhaoran Wang, Lihong Li, Doina Precup
Learning policies from fixed offline datasets is a key challenge to scale up reinforcement learning (RL) algorithms towards practical applications.
no code implementations • 23 Dec 2022 • Zuyue Fu, Zhengling Qi, Zhuoran Yang, Zhaoran Wang, Lan Wang
To tackle the distributional mismatch, we leverage the idea of pessimism and use our OPE method to develop an off-policy learning algorithm for finding a desirable policy pair for both Alice and Bob.
no code implementations • 19 Dec 2022 • Ying Jin, Zhimei Ren, Zhuoran Yang, Zhaoran Wang
As an implication, for adaptively collected data, we ensure efficient policy learning as long as the propensities for optimal actions are lower bounded over time, while those for suboptimal ones are allowed to diminish arbitrarily fast.
no code implementations • 10 Nov 2022 • Banghua Zhu, Stephen Bates, Zhuoran Yang, Yixin Wang, Jiantao Jiao, Michael I. Jordan
This result shows that exponential-in-$m$ samples are sufficient and necessary to learn a near-optimal contract, resolving an open problem on the hardness of online contract design.
no code implementations • 3 Nov 2022 • Han Zhong, Wei Xiong, Sirui Zheng, LiWei Wang, Zhaoran Wang, Zhuoran Yang, Tong Zhang
The proposed algorithm modifies the standard posterior sampling algorithm in two aspects: (i) we use an optimistic prior distribution that biases towards hypotheses with higher values, and (ii) the log-likelihood function is set to be the empirical loss evaluated on the historical data, where the choice of loss function supports both model-free and model-based learning.
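Over a finite hypothesis class, these two modifications can be sketched as reweighting a posterior; the temperatures eta and beta below are assumed knobs for illustration, not the paper's exact choices.

```python
import numpy as np

def optimistic_posterior(values, losses, eta=1.0, beta=1.0):
    """Sketch of the two modifications over a finite hypothesis class:
    (i) prior weights exp(eta * value) favor high-value hypotheses;
    (ii) likelihood exp(-beta * empirical loss) favors data fit."""
    logw = eta * np.asarray(values) - beta * np.asarray(losses)
    w = np.exp(logw - logw.max())
    return w / w.sum()

post = optimistic_posterior(values=[1.0, 0.2, 0.9], losses=[0.5, 0.1, 2.0])
idx = np.random.default_rng(0).choice(len(post), p=post)  # posterior sampling
print(post, idx)
```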
no code implementations • 19 Oct 2022 • Rui Ai, Boxiang Lyu, Zhaoran Wang, Zhuoran Yang, Michael I. Jordan
First, from the seller's perspective, we need to efficiently explore the environment in the presence of potentially nontruthful bidders who aim to manipulate the seller's policy.
no code implementations • 29 Sep 2022 • YiXuan Wang, Simon Sinong Zhan, Ruochen Jiao, Zhilu Wang, Wanxin Jin, Zhuoran Yang, Zhaoran Wang, Chao Huang, Qi Zhu
It is quite challenging to ensure the safety of reinforcement learning (RL) agents in an unknown and stochastic environment under hard constraints that require the system state not to reach certain specified unsafe regions.
no code implementations • 20 Sep 2022 • Fengzhuo Zhang, Boyi Liu, Kaixin Wang, Vincent Y. F. Tan, Zhuoran Yang, Zhaoran Wang
The cooperative Multi-Agent Reinforcement Learning (MARL) framework with permutation-invariant agents has achieved tremendous empirical successes in real-world applications.
no code implementations • 18 Sep 2022 • Zuyue Fu, Zhengling Qi, Zhaoran Wang, Zhuoran Yang, Yanxun Xu, Michael R. Kosorok
Due to the lack of online interaction with the environment, offline RL faces the following two significant challenges: (i) the agent may be confounded by the unobserved state variables; (ii) the offline data collected a priori does not provide sufficient coverage of the environment.
no code implementations • 23 Aug 2022 • Mengxin Yu, Zhuoran Yang, Jianqing Fan
We study offline reinforcement learning under a novel model called strategic MDP, which characterizes the strategic interactions between a principal and a sequence of myopic agents with private types.
1 code implementation • 29 Jul 2022 • Shuang Qiu, Lingxiao Wang, Chenjia Bai, Zhuoran Yang, Zhaoran Wang
Moreover, under the online setting, we propose novel upper confidence bound (UCB)-type algorithms that incorporate such a contrastive loss with online RL algorithms for MDPs or MGs.
no code implementations • 25 Jul 2022 • Shuang Qiu, Xiaohan Wei, Jieping Ye, Zhaoran Wang, Zhuoran Yang
Our algorithms feature a combination of Upper Confidence Bound (UCB)-type optimism and fictitious play under the scope of simultaneous policy optimization in a non-stationary environment.
no code implementations • 3 Jun 2022 • Wenhao Zhan, Jason D. Lee, Zhuoran Yang
We study decentralized policy learning in Markov games where we control a single agent to play with nonstationary and possibly adversarial opponents.
no code implementations • 26 May 2022 • Lingxiao Wang, Qi Cai, Zhuoran Yang, Zhaoran Wang
For a class of POMDPs with a low-rank structure in the transition kernel, ETC attains an $O(1/\epsilon^2)$ sample complexity that scales polynomially with the horizon and the intrinsic dimension (that is, the rank).
no code implementations • 26 May 2022 • Miao Lu, Yifei Min, Zhaoran Wang, Zhuoran Yang
We study offline reinforcement learning (RL) in partially observable Markov decision processes.
no code implementations • 23 May 2022 • Xiaoyu Chen, Han Zhong, Zhuoran Yang, Zhaoran Wang, LiWei Wang
To the best of our knowledge, this is the first theoretical result for PbRL with (general) function approximation.
no code implementations • 5 May 2022 • Boxiang Lyu, Zhaoran Wang, Mladen Kolar, Zhuoran Yang
In the setting where the function approximation is employed to handle large state spaces, with only mild assumptions on the expressiveness of the function class, we are able to design a dynamic mechanism using offline reinforcement learning algorithms.
no code implementations • 20 Apr 2022 • Qi Cai, Zhuoran Yang, Zhaoran Wang
The sample efficiency of OP-TENET is enabled by a sequence of ingredients: (i) a Bellman operator with finite memory, which represents the value function in a recursive manner, (ii) the identification and estimation of such an operator via an adversarial integral equation, which features a smoothed discriminator tailored to the linear structure, and (iii) the exploration of the observation and state spaces via optimism, which is based on quantifying the uncertainty in the adversarial integral equation.
no code implementations • 7 Mar 2022 • Yifei Min, Tianhao Wang, Ruitu Xu, Zhaoran Wang, Michael I. Jordan, Zhuoran Yang
We study a Markov matching market involving a planner and a set of strategic agents on the two sides of the market.
no code implementations • 3 Mar 2022 • Grigoris Velegkas, Zhuoran Yang, Amin Karbasi
In this paper, we study the problem of regret minimization for episodic Reinforcement Learning (RL) both in the model-free and the model-based setting.
no code implementations • 25 Feb 2022 • Shuang Qiu, Boxiang Lyu, Qinglin Meng, Zhaoran Wang, Zhuoran Yang, Michael I. Jordan
Dynamic mechanism design studies how mechanism designers should allocate resources among agents in a time-varying environment.
1 code implementation • ICLR 2022 • Chenjia Bai, Lingxiao Wang, Zhuoran Yang, Zhihong Deng, Animesh Garg, Peng Liu, Zhaoran Wang
We show that such OOD sampling and pessimistic bootstrapping yields a provable uncertainty quantifier in linear MDPs, thus providing the theoretical underpinning for PBRL.
no code implementations • 22 Feb 2022 • Jibang Wu, Zixuan Zhang, Zhe Feng, Zhaoran Wang, Zhuoran Yang, Michael I. Jordan, Haifeng Xu
This paper proposes a novel model of sequential information design, namely the Markov persuasion processes (MPPs), where a sender, with informational advantage, seeks to persuade a stream of myopic receivers to take actions that maximize the sender's cumulative utilities in a finite-horizon Markovian environment with varying prior and utility functions.
no code implementations • 15 Feb 2022 • Han Zhong, Wei Xiong, Jiyuan Tan, LiWei Wang, Tong Zhang, Zhaoran Wang, Zhuoran Yang
When the dataset does not have uniform coverage over all policy pairs, finding an approximate NE involves challenges in three aspects: (i) distributional shift between the behavior policy and the optimal policy, (ii) function approximation to handle large state space, and (iii) minimax optimization for equilibrium solving.
no code implementations • 28 Jan 2022 • YiXuan Wang, Simon Zhan, Zhilu Wang, Chao Huang, Zhaoran Wang, Zhuoran Yang, Qi Zhu
In model-based reinforcement learning for safety-critical control systems, it is important to formally certify system properties (e.g., safety, stability) under the learned controller.
1 code implementation • 28 Dec 2021 • Gene Li, Junbo Li, Anmol Kabra, Nathan Srebro, Zhaoran Wang, Zhuoran Yang
We propose an optimistic model-based algorithm, dubbed SMRL, for finite-horizon episodic reinforcement learning (RL) when the transition model is specified by exponential family distributions with $d$ parameters and the reward is bounded and known.
no code implementations • 27 Dec 2021 • Han Zhong, Zhuoran Yang, Zhaoran Wang, Michael I. Jordan
We develop sample-efficient reinforcement learning (RL) algorithms for solving for an SNE in both online and offline settings.
no code implementations • NeurIPS 2021 • Yufeng Zhang, Siyu Chen, Zhuoran Yang, Michael I. Jordan, Zhaoran Wang
Specifically, we consider a version of AC where the actor and critic are represented by overparameterized two-layer neural networks and are updated with two-timescale learning rates.
1 code implementation • 11 Dec 2021 • Xiao-Yang Liu, Zechu Li, Zhuoran Yang, Jiahao Zheng, Zhaoran Wang, Anwar Walid, Jian Guo, Michael I. Jordan
In this paper, we present a scalable and elastic library ElegantRL-podracer for cloud-native deep reinforcement learning, which efficiently supports millions of GPU cores to carry out massively parallel training at multiple levels.
1 code implementation • NeurIPS 2021 • Minshuo Chen, Yan Li, Ethan Wang, Zhuoran Yang, Zhaoran Wang, Tuo Zhao
Theoretically, under a weak coverage assumption that the experience dataset contains enough information about the optimal policy, we prove that for an episodic mean-field MDP with a horizon $H$ and $N$ training trajectories, SAFARI attains a sub-optimality gap of $\mathcal{O}(H^2d_{\rm eff} /\sqrt{N})$, where $d_{\rm eff}$ is the effective dimension of the function class for parameterizing the value function, but is independent of the number of agents.
no code implementations • NeurIPS 2021 • Runzhe Wu, Yufeng Zhang, Zhuoran Yang, Zhaoran Wang
In constrained multi-objective RL, the goal is to learn a policy that achieves the best performance specified by a multi-objective preference function under a constraint.
Multi-Objective Reinforcement Learning
reinforcement-learning
no code implementations • NeurIPS 2021 • Boyi Liu, Qi Cai, Zhuoran Yang, Zhaoran Wang
Despite the tremendous success of reinforcement learning (RL) with function approximation, efficient exploration remains a significant challenge, both practically and theoretically.
no code implementations • NeurIPS 2021 • Yingjie Fei, Zhuoran Yang, Yudong Chen, Zhaoran Wang
The exponential Bellman equation inspires us to develop a novel analysis of Bellman backup procedures in risk-sensitive RL algorithms, and further motivates the design of a novel exploration mechanism.
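The backup structure implied by the exponential Bellman equation can be sketched with finite-horizon value iteration on a toy MDP; note that the actual algorithm in the entry above adds an exploration mechanism on top of this plain backup, which is omitted here.

```python
import numpy as np

def risk_sensitive_vi(P, R, H, beta=1.0):
    """Finite-horizon value iteration under the exponential Bellman
    equation (sketch of the backup only; no exploration bonuses).
    P[a]: (S, S) transition matrix, R: (S, A) rewards, beta: risk level."""
    S, A = R.shape
    V = np.zeros(S)
    for _ in range(H):
        # exp(beta * Q) = exp(beta * r) * E[exp(beta * V(s'))]
        Q = R + (1.0 / beta) * np.log(
            np.stack([P[a] @ np.exp(beta * V) for a in range(A)], axis=1))
        V = Q.max(axis=1)
    return V

P = np.array([[[0.9, 0.1], [0.2, 0.8]], [[0.5, 0.5], [0.6, 0.4]]])
R = np.array([[1.0, 0.5], [0.0, 0.2]])
print(risk_sensitive_vi(P, R, H=5, beta=0.5))
```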
1 code implementation • 24 Oct 2021 • Zhihong Deng, Zuyue Fu, Lingxiao Wang, Zhuoran Yang, Chenjia Bai, Tianyi Zhou, Zhaoran Wang, Jing Jiang
Offline reinforcement learning (RL) harnesses the power of massive datasets for resolving sequential decision problems.
no code implementations • 19 Oct 2021 • Shuang Qiu, Jieping Ye, Zhaoran Wang, Zhuoran Yang
Then, given any extrinsic reward, the agent computes the policy via a planning algorithm with offline data collected in the exploration phase.
no code implementations • 18 Oct 2021 • Han Zhong, Zhongren Chen, Zhuoran Yang, Zhaoran Wang, Csaba Szepesvári
We study episodic reinforcement learning (RL) in non-stationary linear kernel Markov decision processes (MDPs).
no code implementations • 4 Oct 2021 • Boyi Liu, Jiayang Li, Zhuoran Yang, Hoi-To Wai, Mingyi Hong, Yu Marco Nie, Zhaoran Wang
To regulate a social system comprised of self-interested agents, economic incentives are often required to induce a desirable outcome.
no code implementations • 29 Sep 2021 • Han Zhong, Zhuoran Yang, Zhaoran Wang, Michael Jordan
To the best of our knowledge, we establish the first provably efficient RL algorithms for solving SNE in general-sum Markov games with leader-controlled state transitions.
no code implementations • ICLR 2022 • Zhi Zhang, Zhuoran Yang, Han Liu, Pratap Tokekar, Furong Huang
This paper proposes a new algorithm for learning the optimal policies under a novel multi-agent predictive state representation reinforcement learning model.
no code implementations • 19 Aug 2021 • Zhihan Liu, Yufeng Zhang, Zuyue Fu, Zhuoran Yang, Zhaoran Wang
In generative adversarial imitation learning (GAIL), the agent aims to learn a policy from an expert demonstration so that its performance cannot be discriminated from the expert policy on a certain predefined reward set.
no code implementations • 8 Aug 2021 • Pratik Ramprasad, Yuantong Li, Zhuoran Yang, Zhaoran Wang, Will Wei Sun, Guang Cheng
The recent emergence of reinforcement learning has created a demand for robust statistical inference methods for the parameter estimates computed using these algorithms.
no code implementations • ICLR 2022 • Baihe Huang, Jason D. Lee, Zhaoran Wang, Zhuoran Yang
In the {coordinated} setting where both players are controlled by the agent, we propose a model-based algorithm and a model-free algorithm.
no code implementations • 6 Jul 2021 • Tengyu Xu, Zhuoran Yang, Zhaoran Wang, Yingbin Liang
We further show that unlike GTD, the learned GVFs by GenTD are guaranteed to converge to the ground truth GVFs as long as the function approximation power is sufficiently large.
no code implementations • 1 Jul 2021 • Zehao Dou, Zhuoran Yang, Zhaoran Wang, Simon S. Du
As one of the most popular methods in the field of reinforcement learning, Q-learning has received increasing attention.
1 code implementation • 15 Jun 2021 • Haque Ishfaq, Qiwen Cui, Viet Nguyen, Alex Ayoub, Zhuoran Yang, Zhaoran Wang, Doina Precup, Lin F. Yang
We propose a model-free reinforcement learning algorithm inspired by the popular randomized least squares value iteration (RLSVI) algorithm as well as the optimism principle.
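A minimal sketch of the RLSVI idea referenced above: replace the ridge point estimate of the value weights with a sample from the corresponding Gaussian posterior, so the randomization itself drives exploration. Hyperparameters and data are illustrative.

```python
import numpy as np

def rlsvi_step(X, y, sigma=1.0, lam=1.0, rng=None):
    """One fitted step of randomized least-squares value iteration.
    Instead of the ridge point estimate, sample the value-function
    weights from the Gaussian posterior N(theta_hat, A^{-1}).
    X: (n, d) state-action features, y: (n,) Bellman targets r + V_next."""
    rng = rng or np.random.default_rng()
    A = X.T @ X / sigma**2 + lam * np.eye(X.shape[1])
    theta_hat = np.linalg.solve(A, X.T @ y / sigma**2)
    cov_chol = np.linalg.cholesky(np.linalg.inv(A))
    return theta_hat + cov_chol @ rng.normal(size=X.shape[1])

rng = np.random.default_rng(3)
X = rng.normal(size=(100, 4))
y = X @ np.array([1.0, -1.0, 0.5, 0.0]) + 0.1 * rng.normal(size=100)
print(rlsvi_step(X, y, rng=rng))
```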
no code implementations • 23 Feb 2021 • Tengyu Xu, Zhuoran Yang, Zhaoran Wang, Yingbin Liang
We also show that the overall convergence of DR-Off-PAC is doubly robust to the approximation errors that depend only on the expressive power of approximation functions.
no code implementations • 19 Feb 2021 • Luofeng Liao, Zuyue Fu, Zhuoran Yang, Yixin Wang, Mladen Kolar, Zhaoran Wang
When a valid instrument is present, we can recover the confounded transition dynamics through observational data.
no code implementations • NeurIPS 2021 • Prashant Khanduri, Siliang Zeng, Mingyi Hong, Hoi-To Wai, Zhaoran Wang, Zhuoran Yang
We focus on bilevel problems where the lower level subproblem is strongly-convex and the upper level objective function is smooth.
no code implementations • 1 Jan 2021 • Boyi Liu, Zhuoran Yang, Zhaoran Wang
Specifically, in each iteration, each player infers the policy of the opponent implicitly via policy evaluation and improves its current policy by taking the smoothed best-response via a proximal policy optimization (PPO) step.
no code implementations • 1 Jan 2021 • Riashat Islam, Samarth Sinha, Homanga Bharadhwaj, Samin Yeasar Arnob, Zhuoran Yang, Zhaoran Wang, Animesh Garg, Lihong Li, Doina Precup
Learning policies from fixed offline datasets is a key challenge to scale up reinforcement learning (RL) algorithms towards practical applications.
no code implementations • 1 Jan 2021 • Qi Cai, Zhuoran Yang, Csaba Szepesvari, Zhaoran Wang
Although policy optimization with neural networks has a track record of achieving state-of-the-art results in reinforcement learning on various domains, the theoretical understanding of the computational and sample efficiency of policy optimization remains restricted to linear function approximations with finite-dimensional feature representations, which hinders the design of principled, effective, and efficient algorithms.
no code implementations • 1 Jan 2021 • Yingjie Fei, Zhuoran Yang, Zhaoran Wang
We study risk-sensitive reinforcement learning with the entropic risk measure and function approximation.
no code implementations • 30 Dec 2020 • Ying Jin, Zhuoran Yang, Zhaoran Wang
We study offline reinforcement learning (RL), which aims to learn an optimal policy based on a dataset collected a priori.
no code implementations • 28 Dec 2020 • Han Zhong, Xun Deng, Ethan X. Fang, Zhuoran Yang, Zhaoran Wang, Runze Li
In particular, we focus on a variance-constrained policy optimization problem where the goal is to find a policy that maximizes the expected value of the long-run average reward, subject to a constraint that the long-run variance of the average reward is upper bounded by a threshold.
no code implementations • 21 Dec 2020 • Zhuoran Yang, Yufeng Zhang, Yongxin Chen, Zhaoran Wang
Specifically, we prove that moving along the geodesic in the direction of functional gradient with respect to the second-order Wasserstein distance is equivalent to applying a pushforward mapping to a probability distribution, which can be approximated accurately by pushing a set of particles.
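The particle approximation is straightforward to sketch: for a toy functional $F[p] = \mathbb{E}_p[f]$, the Wasserstein-2 functional gradient is $\nabla f$, and the geodesic step is exactly the pushforward $x \mapsto x - \eta \nabla f(x)$ applied to each particle. The quadratic objective below is an assumption for clarity, not the paper's setting.

```python
import numpy as np

# Minimize E_p[f] over distributions p, with f(x) = 0.5 * ||x - mu||^2.
# The W2 gradient of E_p[f] is grad f, so the pushforward step moves
# every particle by -eta * grad f; the particle cloud converges to mu.
rng = np.random.default_rng(4)
mu = np.array([2.0, -1.0])
particles = rng.normal(size=(512, 2))
eta = 0.1

for _ in range(100):
    grad = particles - mu                 # grad f(x) = x - mu
    particles = particles - eta * grad    # pushforward mapping

print(particles.mean(axis=0))             # concentrates near mu = [2, -1]
```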
no code implementations • NeurIPS 2020 • Hoi-To Wai, Zhuoran Yang, Zhaoran Wang, Mingyi Hong
This paper studies a gradient temporal difference (GTD) algorithm using neural network (NN) function approximators to minimize the mean squared Bellman error (MSBE).
no code implementations • NeurIPS 2020 • Luofeng Liao, You-Lin Chen, Zhuoran Yang, Bo Dai, Mladen Kolar, Zhaoran Wang
We study estimation in a class of generalized SEMs where the object of interest is defined as the solution to a linear operator equation.
no code implementations • NeurIPS 2020 • Yufeng Zhang, Qi Cai, Zhuoran Yang, Yongxin Chen, Zhaoran Wang
Temporal-difference and Q-learning play a key role in deep reinforcement learning, where they are empowered by expressive nonlinear function approximators such as neural networks.
no code implementations • NeurIPS 2020 • Zhuoran Yang, Chi Jin, Zhaoran Wang, Mengdi Wang, Michael Jordan
Reinforcement learning (RL) algorithms combined with modern function approximators such as kernel functions and deep neural networks have achieved significant empirical successes in large-scale application problems with a massive number of states.
no code implementations • 9 Nov 2020 • Zhuoran Yang, Chi Jin, Zhaoran Wang, Mengdi Wang, Michael I. Jordan
The classical theory of reinforcement learning (RL) has focused on tabular and linear representations of value functions.
no code implementations • 8 Oct 2020 • Qiaomin Xie, Zhuoran Yang, Zhaoran Wang, Andreea Minca
We propose a reinforcement learning algorithm for stationary mean-field games, where the goal is to learn a pair of mean-field state and stationary policy that constitutes the Nash equilibrium.
no code implementations • 23 Aug 2020 • Shuang Qiu, Zhuoran Yang, Xiaohan Wei, Jieping Ye, Zhaoran Wang
Existing approaches for this problem are based on two-timescale or double-loop stochastic gradient algorithms, which may also require sampling large-batch data.
no code implementations • 16 Aug 2020 • Weichen Wang, Jiequn Han, Zhuoran Yang, Zhaoran Wang
Reinforcement learning is a powerful tool to learn the optimal policy of possibly multiple agents by interacting with the environment.
no code implementations • ICLR 2021 • Zuyue Fu, Zhuoran Yang, Zhaoran Wang
To the best of our knowledge, we establish the rate of convergence and global optimality of single-timescale actor-critic with linear function approximation for the first time.
no code implementations • 16 Jul 2020 • Jianqing Fan, Zhuoran Yang, Mengxin Yu
For both the vector and matrix settings, we construct an over-parameterized least-squares loss function by employing the score function transform and a robust truncation step designed specifically for heavy-tailed data.
no code implementations • 10 Jul 2020 • Mingyi Hong, Hoi-To Wai, Zhaoran Wang, Zhuoran Yang
Bilevel optimization is a class of problems which exhibit a two-level structure, and its goal is to minimize an outer objective function with variables which are constrained to be the optimal solution to an (inner) optimization problem.
no code implementations • 2 Jul 2020 • Luofeng Liao, You-Lin Chen, Zhuoran Yang, Bo Dai, Zhaoran Wang, Mladen Kolar
We study estimation in a class of generalized SEMs where the object of interest is defined as the solution to a linear operator equation.
no code implementations • NeurIPS 2020 • Yingjie Fei, Zhuoran Yang, Zhaoran Wang, Qiaomin Xie
We consider reinforcement learning (RL) in episodic MDPs with adversarial full-information reward feedback and unknown fixed transition kernels.
no code implementations • ICML 2020 • Lingxiao Wang, Qi Cai, Zhuoran Yang, Zhaoran Wang
Model-agnostic meta-learning (MAML) formulates meta-learning as a bilevel optimization problem, where the inner level solves each subtask based on a shared prior, while the outer level searches for the optimal shared prior by optimizing its aggregated performance over all the subtasks.
no code implementations • NeurIPS 2020 • Yingjie Fei, Zhuoran Yang, Yudong Chen, Zhaoran Wang, Qiaomin Xie
We study risk-sensitive reinforcement learning in episodic Markov decision processes with unknown transition kernels, where the goal is to optimize the total reward under the risk measure of exponential utility.
no code implementations • NeurIPS 2021 • Lingxiao Wang, Zhuoran Yang, Zhaoran Wang
Empowered by expressive function approximators such as neural networks, deep reinforcement learning (DRL) achieves tremendous empirical successes.
no code implementations • 21 Jun 2020 • Lingxiao Wang, Zhuoran Yang, Zhaoran Wang
We highlight that the MF-FQI algorithm enjoys a "blessing of many agents" property in the sense that a larger number of observed agents improves the performance of the MF-FQI algorithm.
Multi-agent Reinforcement Learning
reinforcement-learning
no code implementations • 15 Jun 2020 • Wanxin Jin, Zhaoran Wang, Zhuoran Yang, Shaoshuai Mou
This paper develops an approach to learn a policy of a dynamical system that is guaranteed to be both provably safe and goal-reaching.
no code implementations • 8 Jun 2020 • Yufeng Zhang, Qi Cai, Zhuoran Yang, Yongxin Chen, Zhaoran Wang
We aim to answer the following questions: When the function approximator is a neural network, how does the associated feature representation evolve?
no code implementations • 8 Mar 2020 • Yufeng Zhang, Qi Cai, Zhuoran Yang, Zhaoran Wang
Generative adversarial imitation learning (GAIL) demonstrates tremendous success in practice, especially when combined with neural networks.
no code implementations • NeurIPS 2020 • Shuang Qiu, Xiaohan Wei, Zhuoran Yang, Jieping Ye, Zhaoran Wang
In particular, we prove that the proposed algorithm achieves $\widetilde{\mathcal{O}}(L|\mathcal{S}|\sqrt{|\mathcal{A}|T})$ upper bounds of both the regret and the constraint violation, where $L$ is the length of each episode.
no code implementations • ICML 2020 • Sen Na, Yuwei Luo, Zhuoran Yang, Zhaoran Wang, Mladen Kolar
We consider the bipartite graph and formalize its representation learning problem as a statistical estimation problem of parameters in a semiparametric exponential family distribution.
no code implementations • 1 Mar 2020 • Dongsheng Ding, Xiaohan Wei, Zhuoran Yang, Zhaoran Wang, Mihailo R. Jovanović
To this end, we present an Optimistic Primal-Dual Proximal Policy Optimization (OPDOP) algorithm where the value function is estimated by combining the least-squares policy evaluation and an additional bonus term for safe exploration.
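The two non-standard ingredients named in the entry above, an exploration bonus on top of least-squares policy evaluation and a dual update enforcing the safety constraint, can be sketched as follows; the confidence width beta and the step size are assumed placeholders, not the paper's constants.

```python
import numpy as np

def exploration_bonus(phi, Lambda, beta=1.0):
    """Elliptical bonus beta * sqrt(phi^T Lambda^{-1} phi) added to the
    least-squares value estimate (beta is an assumed confidence width)."""
    return beta * np.sqrt(phi @ np.linalg.solve(Lambda, phi))

def dual_update(lam, avg_utility, threshold, step=0.1):
    """Primal-dual step: grow the multiplier while the safety constraint
    E[utility] >= threshold is violated, shrink it otherwise."""
    return max(0.0, lam + step * (threshold - avg_utility))

Lambda = 5.0 * np.eye(3)   # regularized Gram matrix of visited features
print(exploration_bonus(np.array([1.0, 0.5, -0.2]), Lambda))
print(dual_update(0.3, avg_utility=0.8, threshold=1.0))
```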
no code implementations • 17 Feb 2020 • Qiaomin Xie, Yudong Chen, Zhaoran Wang, Zhuoran Yang
In the offline setting, we control both players and aim to find the Nash Equilibrium by minimizing the duality gap.
no code implementations • ICLR 2020 • Minshuo Chen, Yizhou Wang, Tianyi Liu, Zhuoran Yang, Xingguo Li, Zhaoran Wang, Tuo Zhao
Generative Adversarial Imitation Learning (GAIL) is a powerful and practical approach for learning sequential decision-making policies.
1 code implementation • NeurIPS 2020 • Wanxin Jin, Zhaoran Wang, Zhuoran Yang, Shaoshuai Mou
This paper develops a Pontryagin Differentiable Programming (PDP) methodology, which establishes a unified framework to solve a broad class of learning and control tasks.
no code implementations • 14 Dec 2019 • Yuwei Luo, Zhuoran Yang, Zhaoran Wang, Mladen Kolar
Multi-agent reinforcement learning has been successfully applied to a number of challenging problems.
Multi-agent Reinforcement Learning
reinforcement-learning
no code implementations • ICML 2020 • Qi Cai, Zhuoran Yang, Chi Jin, Zhaoran Wang
While policy-based reinforcement learning (RL) achieves tremendous successes in practice, it is significantly less understood in theory, especially compared with value-based RL.
no code implementations • 9 Dec 2019 • Kaiqing Zhang, Zhuoran Yang, Tamer Başar
Multi-agent reinforcement learning (MARL) has long been a significant and everlasting research topic in both machine learning and control.
no code implementations • NeurIPS 2019 • Boyi Liu, Qi Cai, Zhuoran Yang, Zhaoran Wang
Proximal policy optimization and trust region policy optimization (PPO and TRPO) with actor and critic parametrized by neural networks achieve significant empirical success in deep reinforcement learning.
no code implementations • NeurIPS 2019 • Lingxiao Wang, Zhuoran Yang, Zhaoran Wang
Using the statistical query model to characterize the computational cost of an algorithm, we show that when $\mathrm{Cov}(Y, X^\top\beta^*)=0$ and $\mathrm{Cov}(Y,(X^\top\beta^*)^2)>0$, no computationally tractable algorithms can achieve the information-theoretic limit of the minimax risk.
no code implementations • NeurIPS 2019 • Zhuoran Yang, Yongxin Chen, Mingyi Hong, Zhaoran Wang
Despite the empirical success of the actor-critic algorithm, its theoretical understanding lags behind.
no code implementations • NeurIPS 2019 • Hoi-To Wai, Mingyi Hong, Zhuoran Yang, Zhaoran Wang, Kexin Tang
Policy evaluation with smooth and nonlinear function approximation has shown great potential for reinforcement learning.
no code implementations • NeurIPS 2019 • Qi Cai, Zhuoran Yang, Jason D. Lee, Zhaoran Wang
Temporal-difference learning (TD), coupled with neural networks, is among the most fundamental building blocks of deep reinforcement learning.
no code implementations • 24 Nov 2019 • Kaiqing Zhang, Zhuoran Yang, Tamer Başar
Orthogonal to the existing reviews on MARL, we highlight several new angles and taxonomies of MARL theory, including learning in extensive-form games, decentralized MARL with networked agents, MARL in the mean-field regime, (non-)convergence of policy-based methods for learning in games, etc.
1 code implementation • NeurIPS 2019 • Ming Yu, Zhuoran Yang, Mladen Kolar, Zhaoran Wang
We study the safe reinforcement learning problem with nonlinear function approximation, where policy optimization is formulated as a constrained optimization problem with both the objective and the constraint being nonconvex functions.
Multi-agent Reinforcement Learning
reinforcement-learning
no code implementations • ICLR 2020 • Zuyue Fu, Zhuoran Yang, Yongxin Chen, Zhaoran Wang
We study discrete-time mean-field Markov games with infinite numbers of agents where each agent aims to minimize its ergodic cost.
1 code implementation • 8 Oct 2019 • Jiaheng Wei, Zuyue Fu, Yang Liu, Xingyu Li, Zhuoran Yang, Zhaoran Wang
We also show a connection between this sample elicitation problem and $f$-GAN, and how this connection can help reconstruct an estimator of the distribution based on collected samples.
no code implementations • 25 Sep 2019 • Yang Liu, Zuyue Fu, Zhuoran Yang, Zhaoran Wang
While classical elicitation results apply to eliciting a complex and generative (and continuous) distribution $p(x)$ for this image data, we are interested in eliciting samples $x_i \sim p(x)$ from agents.
no code implementations • NeurIPS Workshop Deep_Invers 2019 • Shuang Qiu, Xiaohan Wei, Zhuoran Yang
In this paper, we consider a new framework for the one-bit sensing problem where the sparsity is implicitly enforced via mapping a low dimensional representation $x_0$ through a known $n$-layer ReLU generative network $G:\mathbb{R}^k\rightarrow\mathbb{R}^d$.
no code implementations • ICLR 2020 • Lingxiao Wang, Qi Cai, Zhuoran Yang, Zhaoran Wang
In detail, we prove that neural natural policy gradient converges to a globally optimal policy at a sublinear rate.
no code implementations • ICML 2020 • Shuang Qiu, Xiaohan Wei, Zhuoran Yang
Specifically, we consider a new framework for this problem where the sparsity is implicitly enforced via mapping a low dimensional representation $x_0 \in \mathbb{R}^k$ through a known $n$-layer ReLU generative network $G:\mathbb{R}^k\rightarrow\mathbb{R}^d$ such that $\theta_0 = G(x_0)$.
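The forward model in this entry is easy to state in code: a known ReLU network $G$ maps the latent code to the structured signal, and only sign measurements are observed. The random weights below stand in for the "known" network; sizes and seeds are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(5)
k, d, m = 4, 50, 100                      # latent dim, signal dim, measurements
W1, W2 = rng.normal(size=(32, k)), rng.normal(size=(d, 32))

def G(x0):
    """Known n-layer ReLU generative network (n = 2 in this sketch)."""
    return W2 @ np.maximum(W1 @ x0, 0.0)

x0 = rng.normal(size=k)
theta0 = G(x0)                            # structured signal theta0 = G(x0)
A = rng.normal(size=(m, d))               # Gaussian sensing matrix
y = np.sign(A @ theta0)                   # one-bit observations
print(y[:10])
```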
no code implementations • 7 Aug 2019 • Dongsheng Ding, Xiaohan Wei, Zhuoran Yang, Zhaoran Wang, Mihailo R. Jovanović
We study the policy evaluation problem in multi-agent reinforcement learning where a group of agents, with jointly observed states and private local actions and rewards, collaborate to learn the value function of a given policy via local computation and communication over a connected undirected network.
Multi-agent Reinforcement Learning
Reinforcement Learning
no code implementations • 14 Jul 2019 • Zhuoran Yang, Yongxin Chen, Mingyi Hong, Zhaoran Wang
Despite the empirical success of the actor-critic algorithm, its theoretical understanding lags behind.
no code implementations • NeurIPS 2016 • Xinyang Yi, Zhaoran Wang, Zhuoran Yang, Constantine Caramanis, Han Liu
We consider the weakly supervised binary classification problem where the labels are randomly flipped with probability $1-\alpha$.
no code implementations • 13 Jul 2019 • Wesley Suttle, Zhuoran Yang, Kaiqing Zhang, Ji Liu
In this paper, we present a probability one convergence proof, under suitable conditions, of a certain class of actor-critic algorithms for finding approximate solutions to entropy-regularized MDPs using the machinery of stochastic approximation.
2 code implementations • 11 Jul 2019 • Chi Jin, Zhuoran Yang, Zhaoran Wang, Michael I. Jordan
Modern Reinforcement Learning (RL) is commonly applied to practical problems with an enormous number of states, where function approximation must be deployed to approximate either the value function or the policy.
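The core update in this line of work is least-squares value iteration with a UCB bonus; the sketch below shows one such optimistic Q-value computation, with an assumed confidence width beta rather than the paper's exact constants.

```python
import numpy as np

def optimistic_q(phi_sa, Phi, targets, lam=1.0, beta=1.0):
    """One least-squares value-iteration step with a UCB bonus (sketch).
    Phi: (n, d) features of visited state-action pairs,
    targets: (n,) Bellman targets r + V_next,
    phi_sa: (d,) feature of the query state-action pair."""
    Lambda = Phi.T @ Phi + lam * np.eye(Phi.shape[1])
    w = np.linalg.solve(Lambda, Phi.T @ targets)               # ridge fit
    bonus = beta * np.sqrt(phi_sa @ np.linalg.solve(Lambda, phi_sa))
    return phi_sa @ w + bonus                                  # optimism

rng = np.random.default_rng(6)
Phi = rng.normal(size=(200, 5))
targets = Phi @ rng.normal(size=5) + 0.1 * rng.normal(size=200)
print(optimistic_q(rng.normal(size=5), Phi, targets))
```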
no code implementations • 6 Jul 2019 • Yixuan Lin, Kaiqing Zhang, Zhuoran Yang, Zhaoran Wang, Tamer Başar, Romeil Sandhu, Ji Liu
This paper considers a distributed reinforcement learning problem in which a network of multiple agents aims to cooperatively maximize the globally averaged return through communication with only local neighbors.
no code implementations • 25 Jun 2019 • Boyi Liu, Qi Cai, Zhuoran Yang, Zhaoran Wang
Proximal policy optimization and trust region policy optimization (PPO and TRPO) with actor and critic parametrized by neural networks achieve significant empirical success in deep reinforcement learning.
no code implementations • NeurIPS 2019 • Kaiqing Zhang, Zhuoran Yang, Tamer Başar
To the best of our knowledge, this work appears to be the first one to investigate the optimization landscape of LQ games, and provably show the convergence of policy optimization methods to the Nash equilibria.
1 code implementation • NeurIPS 2019 • Qi Cai, Zhuoran Yang, Jason D. Lee, Zhaoran Wang
Temporal-difference learning (TD), coupled with neural networks, is among the most fundamental building blocks of deep reinforcement learning.
1 code implementation • 15 Mar 2019 • Wesley Suttle, Zhuoran Yang, Kaiqing Zhang, Zhaoran Wang, Tamer Basar, Ji Liu
This paper extends off-policy reinforcement learning to the multi-agent case in which a set of networked agents communicating with their neighbors according to a time-varying graph collaboratively evaluates and improves a target policy while following a distinct behavior policy.
no code implementations • 1 Jan 2019 • Jianqing Fan, Zhaoran Wang, Yuchen Xie, Zhuoran Yang
Despite the great empirical success of deep reinforcement learning, its theoretical foundation is less well understood.
no code implementations • 6 Dec 2018 • Kaiqing Zhang, Zhuoran Yang, Han Liu, Tong Zhang, Tamer Başar
This work appears to be the first finite-sample analysis for batch MARL, a step towards rigorous theoretical understanding of general MARL algorithms in the finite-sample regime.
Multi-agent Reinforcement Learning
reinforcement-learning
no code implementations • NeurIPS 2018 • Yi Chen, Zhuoran Yang, Yuchen Xie, Zhaoran Wang
In this paper, we study a semiparametric model where the pairwise measurements follow a natural exponential family distribution with an unknown base measure.
no code implementations • NeurIPS 2018 • Ming Yu, Zhuoran Yang, Tuo Zhao, Mladen Kolar, Zhaoran Wang
In this paper, we study the Gaussian embedding model and develop the first theoretical results for exponential family embedding models.
1 code implementation • 16 Oct 2018 • Sen Na, Zhuoran Yang, Zhaoran Wang, Mladen Kolar
We study the parameter estimation problem for a varying index coefficient model in high dimensions.
5 code implementations • 10 Oct 2018 • Jiechao Xiong, Qing Wang, Zhuoran Yang, Peng Sun, Lei Han, Yang Zheng, Haobo Fu, Tong Zhang, Ji Liu, Han Liu
Most existing deep reinforcement learning (DRL) frameworks consider either a discrete action space or a continuous action space, but not both.
no code implementations • 27 Sep 2018 • Zhuoran Yang, Zuyue Fu, Kaiqing Zhang, Zhaoran Wang
We study reinforcement learning algorithms with nonlinear function approximation in the online setting.
no code implementations • 21 Aug 2018 • Jianqing Fan, Han Liu, Zhaoran Wang, Zhuoran Yang
We study the fundamental tradeoffs between statistical accuracy and computational tractability in the analysis of high dimensional heterogeneous data.
no code implementations • 17 Jul 2018 • Krishnakumar Balasubramanian, Jianqing Fan, Zhuoran Yang
Motivated by the sampling problems and heterogeneity issues common in high-dimensional big datasets, we consider a class of discordant additive index models.
no code implementations • ICML 2018 • Hao Lu, Yuan Cao, Zhuoran Yang, Junwei Lu, Han Liu, Zhaoran Wang
We study the hypothesis testing problem of inferring the existence of combinatorial structures in undirected graphical models.
no code implementations • NeurIPS 2018 • Hoi-To Wai, Zhuoran Yang, Zhaoran Wang, Mingyi Hong
Despite the success of single-agent reinforcement learning, multi-agent reinforcement learning (MARL) remains challenging due to complex interactions between agents.
Multi-agent Reinforcement Learning
reinforcement-learning
5 code implementations • ICML 2018 • Kaiqing Zhang, Zhuoran Yang, Han Liu, Tong Zhang, Tamer Başar
To this end, we propose two decentralized actor-critic algorithms with function approximation, which are applicable to large-scale MARL problems where both the number of states and the number of agents are massively large.
Multi-agent Reinforcement Learning
reinforcement-learning
1 code implementation • ICLR 2018 • Jiechao Xiong, Qing Wang, Zhuoran Yang, Peng Sun, Yang Zheng, Lei Han, Haobo Fu, Xiangru Lian, Carson Eisenach, Haichuan Yang, Emmanuel Ekwedike, Bei Peng, Haoyue Gao, Tong Zhang, Ji Liu, Han Liu
Most existing deep reinforcement learning (DRL) frameworks consider action spaces that are either discrete or continuous.
no code implementations • 18 Dec 2017 • Zhuoran Yang, Lin F. Yang, Ethan X. Fang, Tuo Zhao, Zhaoran Wang, Matey Neykov
Existing nonconvex statistical optimization theory and methods crucially rely on the correct specification of the underlying "true" statistical models.
no code implementations • NeurIPS 2017 • Zhuoran Yang, Krishnakumar Balasubramanian, Zhaoran Wang, Han Liu
We consider estimating the parametric components of semiparametric multi-index models in high dimensions.
no code implementations • 26 Sep 2017 • Zhuoran Yang, Krishnakumar Balasubramanian, Han Liu
We consider estimating the parametric components of semi-parametric multiple index models in a high-dimensional and non-Gaussian setting.
no code implementations • ICML 2017 • Zhuoran Yang, Krishnakumar Balasubramanian, Han Liu
We consider estimating the parametric component of single index models in high dimensions.
no code implementations • NeurIPS 2015 • Kwang-Sung Jun, Jerry Zhu, Timothy T. Rogers, Zhuoran Yang, Ming Yuan
In this paper, we propose the first efficient maximum likelihood estimate (MLE) for INVITE by decomposing the censored output into a series of absorbing random walks.
no code implementations • 14 Nov 2015 • Zhuoran Yang, Zhaoran Wang, Han Liu, Yonina C. Eldar, Tong Zhang
To recover $\beta^*$, we propose an $\ell_1$-regularized least-squares estimator.
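The proposed estimator is a standard $\ell_1$-regularized least-squares program; a textbook ISTA solver for it (shown only for illustration — the entry proposes the estimator, not this particular solver) looks like:

```python
import numpy as np

def ista(X, y, lam=0.1, step=None, n_iters=500):
    """Solve min_beta 0.5/n * ||y - X beta||^2 + lam * ||beta||_1
    by iterative soft-thresholding (standard solver, illustrative)."""
    n, d = X.shape
    step = step or n / np.linalg.norm(X, 2) ** 2   # 1 / Lipschitz constant
    beta = np.zeros(d)
    for _ in range(n_iters):
        grad = -X.T @ (y - X @ beta) / n
        z = beta - step * grad
        beta = np.sign(z) * np.maximum(np.abs(z) - step * lam, 0.0)
    return beta

rng = np.random.default_rng(7)
X = rng.normal(size=(100, 20))
beta_star = np.zeros(20); beta_star[:3] = [2.0, -1.5, 1.0]
y = X @ beta_star + 0.1 * rng.normal(size=100)
print(np.round(ista(X, y), 2)[:6])   # recovers the sparse support
```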
no code implementations • 30 Dec 2014 • Zhuoran Yang, Yang Ning, Han Liu
We propose a new class of semiparametric exponential family graphical models for the analysis of high dimensional mixed data.