no code implementations • 25 Feb 2023 • Zhifa Ke, Zaiwen Wen, Junyu Zhang
Compared to the popular Temporal Difference (TD) learning, which can be viewed as taking a single gradient descent step on FQI's subproblem per iteration, the Gauss-Newton step of GNTD better retains the structure of FQI and hence leads to better convergence.
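A minimal sketch of this contrast (our own toy with hypothetical names and a linear 1-D Q-model, not the paper's GNTD algorithm): FQI's per-iteration subproblem is a least-squares fit of the Q-function to fixed Bellman targets; TD takes one gradient step on it, while a Gauss-Newton step solves the linearized subproblem in full.

```python
# Illustrative only: theta is a scalar parameter, phi a scalar feature.

def td_step(theta, phi, target, lr=0.1):
    # one gradient-descent step on 0.5 * (phi * theta - target) ** 2
    grad = (phi * theta - target) * phi
    return theta - lr * grad

def gauss_newton_step(phis, targets):
    # closed-form minimizer of sum_i (phi_i * theta - target_i) ** 2,
    # i.e. the full least-squares subproblem solution for a linear model
    num = sum(p * t for p, t in zip(phis, targets))
    den = sum(p * p for p in phis)
    return num / den
```

In this linear toy the Gauss-Newton step lands on the subproblem's exact minimizer in one shot, while TD only moves a fraction of the way there.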
no code implementations • 13 Jul 2022 • Fan Chen, Junyu Zhang, Zaiwen Wen
As an important framework for safe Reinforcement Learning, the Constrained Markov Decision Process (CMDP) has been extensively studied in the recent literature.
no code implementations • 15 Jun 2021 • Amrit Singh Bedi, Anjaly Parayil, Junyu Zhang, Mengdi Wang, Alec Koppel
To close this gap, we step towards persistent exploration in continuous space through policy parameterizations based on heavier-tailed distributions with tail-index parameter alpha, which increase the likelihood of jumping in state space.
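A minimal sketch of the heavy-tailed idea (illustrative, 1-D, our own construction rather than the paper's parameterization): replacing Gaussian action noise with Student-t noise whose degrees-of-freedom parameter controls tail heaviness makes large exploratory "jumps" far more likely.

```python
import math
import random

def sample_t(df):
    # Student-t sample: standard normal over sqrt(chi-square / df);
    # smaller df -> heavier tails
    z = random.gauss(0.0, 1.0)
    chi2 = random.gammavariate(df / 2.0, 2.0)
    return z / math.sqrt(chi2 / df)

def heavy_tailed_action(mean, scale, df=1.5):
    # heavy-tailed exploration noise around the policy mean
    return mean + scale * sample_t(df)
```

With df near 1 the samples are close to Cauchy, so actions several standard deviations from the mean occur regularly, whereas under a Gaussian they are vanishingly rare.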
no code implementations • 29 May 2021 • Junyu Zhang, Amrit Singh Bedi, Mengdi Wang, Alec Koppel
DSAC augments the classic critic step by requiring agents to (i) estimate their local occupancy measure in order to (ii) estimate the derivative of the local utility with respect to their occupancy measure, i.e., the "shadow reward".
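The object in step (i) can be sketched as follows (an illustrative empirical estimate of our own, not DSAC's estimator): the discounted occupancy measure weights each visited state by the discount factor raised to its visit time.

```python
def occupancy_measure(trajectory, n_states, gamma=0.9):
    # lambda(s) proportional to sum_t gamma**t * 1{s_t == s}, normalized
    lam = [0.0] * n_states
    w, total = 1.0, 0.0
    for s in trajectory:
        lam[s] += w
        total += w
        w *= gamma
    return [x / total for x in lam]
```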
no code implementations • NeurIPS 2021 • Junyu Zhang, Chengzhuo Ni, Zheng Yu, Csaba Szepesvari, Mengdi Wang
By assuming the overparameterization of the policy and exploiting the hidden convexity of the problem, we further show that TSIVR-PG converges to a global $\epsilon$-optimal policy with $\tilde{\mathcal{O}}(\epsilon^{-2})$ samples.
no code implementations • NeurIPS 2020 • Junyu Zhang, Alec Koppel, Amrit Singh Bedi, Csaba Szepesvari, Mengdi Wang
Analogously to the Policy Gradient Theorem \cite{sutton2000policy} available for RL with cumulative rewards, we derive a new Variational Policy Gradient Theorem for RL with general utilities, which establishes that the parametrized policy gradient may be obtained as the solution of a stochastic saddle point problem involving the Fenchel dual of the utility function.
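Schematically (our notation, hedged): if the general utility $F$ is concave in the occupancy measure $\lambda(\theta)$ induced by the policy $\pi_\theta$, then expressing $F$ through its concave Fenchel conjugate $F_*$ turns the objective into a saddle problem,
$$\max_\theta F(\lambda(\theta)) \;=\; \max_\theta \min_z \;\big\langle \lambda(\theta), z \big\rangle - F_*(z),$$
so the policy gradient $\nabla_\theta F(\lambda(\theta))$ can be read off from the inner dual variable at the saddle, which is the shape of statement the Variational Policy Gradient Theorem makes precise.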
no code implementations • 25 Jun 2020 • Junyu Zhang, Chen Gong, Shangbin Li, Shanchi Wu, Rui Ni, Chengjie Zuo, Jinkang Zhu, Ming Zhao, Zhengyuan Xu
Based on the state transition principles of the three-level system, we propose a statistical model for microwave signal detection.
no code implementations • 25 Jun 2020 • Junyu Zhang, Chen Gong, Shangbin Li, Rui Ni, Chengjie Zuo, Jinkang Zhu, Ming Zhao, Zhengyuan Xu
Future wireless communication systems embrace physical-layer signal detection with high sensitivity, especially at the microwave photon level.
no code implementations • 20 Jun 2020 • Mingyi Hong, Siliang Zeng, Junyu Zhang, Haoran Sun
However, by constructing some counter-examples, we show that when certain local Lipschitz conditions (LLC) on the local function gradients $\nabla f_i$ are not satisfied, most of the existing decentralized algorithms diverge, even if the global Lipschitz condition (GLC) is satisfied, i.e., the sum function $f$ has a Lipschitz gradient.
no code implementations • 29 Mar 2020 • Shanchi Wu, Chen Gong, Chengjie Zuo, Shangbin Li, Junyu Zhang, Zhongbin Dai, Kai Yang, Ming Zhao, Rui Ni, Zhengyuan Xu, Jinkang Zhu
We propose a novel radio-frequency (RF) receiving architecture based on micro-electro-mechanical system (MEMS) and optical coherent detection module.
no code implementations • 27 Feb 2020 • Junyu Zhang, Amrit Singh Bedi, Mengdi Wang, Alec Koppel
To ameliorate this issue, we propose a new definition of risk, which we call caution, as a penalty function added to the dual objective of the linear programming (LP) formulation of reinforcement learning.
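The structure of that penalized objective can be sketched as follows (illustrative code of our own; the penalty below is an example choice, not the paper's definition of caution): the dual LP objective $\langle r, \lambda\rangle$ over occupancy measures $\lambda$ is augmented with a penalty on $\lambda$.

```python
def cautious_objective(r, lam, rho, weight=1.0):
    # <r, lambda> - weight * rho(lambda): reward minus a caution penalty
    return sum(ri * li for ri, li in zip(r, lam)) - weight * rho(lam)

def variance_penalty(lam):
    # one illustrative penalty: variance of the occupancy weights
    m = sum(lam) / len(lam)
    return sum((x - m) ** 2 for x in lam) / len(lam)
```

Under this toy penalty, an occupancy measure concentrated on few state-actions scores lower than a more spread-out one with the same expected reward.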
no code implementations • 29 Aug 2019 • Junyu Zhang, Lin Xiao
We consider multi-level composite optimization problems where each mapping in the composition is either the expectation over a family of random smooth mappings or a finite sum of smooth mappings.
no code implementations • 31 Jul 2019 • Damek Davis, Dmitriy Drusvyatskiy, Lin Xiao, Junyu Zhang
Standard results in stochastic convex optimization bound the number of samples that an algorithm needs to generate a point with small function value in expectation.
no code implementations • NeurIPS 2019 • Junyu Zhang, Lin Xiao
We show that this method achieves the same orders of complexity as the best known first-order methods for minimizing expected-value and finite-sum nonconvex functions, despite the additional outer composition which renders the composite gradient estimator biased.
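The source of that bias can be seen in a toy two-level example (our own construction, not the paper's estimator): for an objective $f(\mathbb{E}[g(x;\xi)])$, the plug-in gradient $f'(\hat{g})\,g'(x)$ is biased whenever $f'$ is nonlinear, because $\hat{g}$ is a noisy estimate of the inner expectation.

```python
import random

def plugin_grad(x, n_inner, fprime, g_sample, gprime):
    # naive estimator: evaluate f' at a sample average of the inner map
    g_hat = sum(g_sample(x) for _ in range(n_inner)) / n_inner
    return fprime(g_hat) * gprime(x)

random.seed(0)
fprime = lambda u: 3.0 * u * u                    # outer f(u) = u ** 3
g_sample = lambda x: x + random.gauss(0.0, 1.0)   # inner E[g] = x, unit noise
gprime = lambda x: 1.0

# true gradient at x = 1 is 3, but with one inner sample the estimator
# averages to E[3 * (1 + Z) ** 2] = 3 * (1 + sigma ** 2) = 6
avg = sum(plugin_grad(1.0, 1, fprime, g_sample, gprime)
          for _ in range(20000)) / 20000
```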
no code implementations • 6 Feb 2018 • Jason Causey, Junyu Zhang, Shiqian Ma, Bo Jiang, Jake Qualls, David G. Politte, Fred Prior, Shuzhong Zhang, Xiuzhen Huang
Here we present NoduleX, a systematic approach to predict lung nodule malignancy from CT data, based on deep learning convolutional neural networks (CNNs).
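The core building block of such a network is 2-D convolution over image patches; a pure-Python sketch (for illustration only, not NoduleX's architecture or implementation):

```python
def conv2d(image, kernel):
    # valid-mode 2-D cross-correlation of a single-channel image with a kernel
    kh, kw = len(kernel), len(kernel[0])
    out = []
    for i in range(len(image) - kh + 1):
        row = []
        for j in range(len(image[0]) - kw + 1):
            row.append(sum(image[i + di][j + dj] * kernel[di][dj]
                           for di in range(kh) for dj in range(kw)))
        out.append(row)
    return out
```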
no code implementations • 5 Oct 2017 • Junyu Zhang, Shiqian Ma, Shuzhong Zhang
For prohibitively large-size tensor or machine learning models, we present a sampling-based stochastic algorithm with the same iteration complexity bound in expectation.