no code implementations • 26 Oct 2022 • Pengyi Li, Hongyao Tang, Jianye Hao, Yan Zheng, Xian Fu, Zhaopeng Meng
The state representation conveys expressive common features of the environment learned by all the agents collectively; the linear policy representation provides a favorable space for efficient policy optimization, where novel behavior-level crossover and mutation operations can be performed.
no code implementations • 6 Apr 2022 • Tong Sang, Hongyao Tang, Yi Ma, Jianye Hao, Yan Zheng, Zhaopeng Meng, Boyan Li, Zhen Wang
In the online adaptation phase, with the environment context inferred from a few experiences collected in new environments, the policy is optimized by gradient ascent with respect to the PDVF.
no code implementations • NeurIPS 2021 • Yi Ma, Xiaotian Hao, Jianye Hao, Jiawen Lu, Xing Liu, Tong Xialiang, Mingxuan Yuan, Zhigang Li, Jie Tang, Zhaopeng Meng
To address this problem, existing methods partition the overall DPDP into fixed-size sub-problems by caching orders generated online and solving each sub-problem, or, building on this, use predicted future orders to optimize each sub-problem further.
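The partitioning step described above can be pictured with a minimal sketch. It assumes that "fixed-size" means a fixed number of orders per sub-problem; the function name `batch_orders` and the `batch_size` parameter are hypothetical and are not taken from the paper.

```python
from typing import Iterable, Iterator, List, TypeVar

Order = TypeVar("Order")


def batch_orders(order_stream: Iterable[Order], batch_size: int) -> Iterator[List[Order]]:
    """Cache incoming orders and emit them as fixed-size sub-problems.

    Assumption: sub-problem size is counted in orders. A trailing,
    smaller batch is emitted when the stream ends.
    """
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    cache: List[Order] = []
    for order in order_stream:
        cache.append(order)
        if len(cache) == batch_size:
            # The cache is full: hand it off as one sub-problem.
            yield cache
            cache = []
    if cache:
        # Flush whatever is left once no more orders arrive.
        yield cache
```

Each yielded list would then be passed to whatever solver handles a single sub-problem; that solver is outside the scope of this sketch.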
no code implementations • 19 Nov 2021 • Tong Sang, Hongyao Tang, Jianye Hao, Yan Zheng, Zhaopeng Meng
Such a reconstruction exploits the underlying structure of the value matrix to improve the value approximation, thus leading to a more efficient learning process for the value function.
no code implementations • 14 Sep 2021 • Jianye Hao, Tianpei Yang, Hongyao Tang, Chenjia Bai, Jinyi Liu, Zhaopeng Meng, Peng Liu, Zhen Wang
In addition to algorithmic analysis, we provide a comprehensive and unified empirical comparison of different exploration methods for DRL on a set of commonly used benchmarks.
1 code implementation • ICLR 2022 • Boyan Li, Hongyao Tang, Yan Zheng, Jianye Hao, Pengyi Li, Zhen Wang, Zhaopeng Meng, Li Wang
Discrete-continuous hybrid action space is a natural setting in many practical problems, such as robot control and game AI.
1 code implementation • 20 Apr 2021 • Qiangguo Jin, Hui Cui, Changming Sun, Zhaopeng Meng, Ran Su
The network is composed of a new richer convolutional feature enhanced dilated-gated generator (RicherDG) and a hybrid loss function.
1 code implementation • 20 Apr 2021 • Qiangguo Jin, Hui Cui, Changming Sun, Zhaopeng Meng, Leyi Wei, Ran Su
DASC-Net consists of a novel attention and feature domain enhanced domain adaptation model (AFD-DA) to solve the domain shifts and a self-correction learning process to refine segmentation results.
1 code implementation • 3 Mar 2021 • Hongyao Tang, Jianye Hao, Guangyong Chen, Pengfei Chen, Chen Chen, Yaodong Yang, Luo Zhang, Wulong Liu, Zhaopeng Meng
The value function is the central notion of Reinforcement Learning (RL).
no code implementations • 3 Mar 2021 • Chen Chen, Hongyao Tang, Jianye Hao, Wulong Liu, Zhaopeng Meng
We propose Nested Policy Iteration as a general training algorithm for PIC-augmented policy which ensures monotonically non-decreasing updates under some mild conditions.
no code implementations • NeurIPS 2021 • Hongyao Tang, Zhaopeng Meng, Jianye Hao, Chen Chen, Daniel Graves, Dong Li, Changmin Yu, Hangyu Mao, Wulong Liu, Yaodong Yang, Wenyuan Tao, Li Wang
We study the Policy-extended Value Function Approximator (PeVFA) in Reinforcement Learning (RL), which extends the conventional value function approximator (VFA) to take as input not only the state (and action) but also an explicit policy representation.
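As a rough illustration of the input signature described above, the following toy sketch uses a linear approximator over the concatenated state and policy representation. The class name, the linear form, and the squared-error update are all assumptions made for illustration; the paper's actual architecture and training procedure are not given in this excerpt.

```python
import numpy as np


class PolicyExtendedValueApproximator:
    """Toy linear approximator V(s, chi) over a state s and a policy representation chi.

    Hypothetical sketch: a conventional VFA would only see `state`;
    here the policy representation is an additional input.
    """

    def __init__(self, state_dim: int, policy_repr_dim: int, seed: int = 0):
        rng = np.random.default_rng(seed)
        self.weights = rng.normal(scale=0.1, size=state_dim + policy_repr_dim)
        self.bias = 0.0

    def value(self, state: np.ndarray, policy_repr: np.ndarray) -> float:
        # Concatenate both inputs into one feature vector.
        features = np.concatenate([state, policy_repr])
        return float(features @ self.weights + self.bias)

    def update(self, state: np.ndarray, policy_repr: np.ndarray,
               target: float, lr: float = 0.1) -> float:
        # One gradient step on the squared error toward `target`.
        features = np.concatenate([state, policy_repr])
        error = target - self.value(state, policy_repr)
        self.weights += lr * error * features
        self.bias += lr * error
        return error
```

The same state can be evaluated under different policies by changing only the `policy_repr` argument, which is the point of conditioning the approximator on the policy.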
no code implementations • 28 Sep 2020 • Hongyao Tang, Zhaopeng Meng, Jianye Hao, Chen Chen, Daniel Graves, Dong Li, Wulong Liu, Yaodong Yang
The value function lies at the heart of Reinforcement Learning (RL), which defines the long-term evaluation of a policy in a given state.
no code implementations • 28 Sep 2020 • Tianpei Yang, Jianye Hao, Weixun Wang, Hongyao Tang, Zhaopeng Meng, Hangyu Mao, Dong Li, Wulong Liu, Yujing Hu, Yingfeng Chen, Changjie Fan
In many cases, the agents' experiences are inconsistent with one another, which causes the option-value estimation to oscillate and become inaccurate.
no code implementations • 14 May 2020 • Jianwen Sun, Yan Zheng, Jianye Hao, Zhaopeng Meng, Yang Liu
With the increasing popularity of electric vehicles, distributed energy generation and storage facilities in smart grid systems, efficient Demand-Side Management (DSM) is urgently needed for energy savings and peak load reduction.
no code implementations • 19 Feb 2020 • Tianpei Yang, Jianye Hao, Zhaopeng Meng, Zongzhang Zhang, Yujing Hu, Yingfeng Cheng, Changjie Fan, Weixun Wang, Wulong Liu, Zhaodong Wang, Jiajie Peng
Transfer Learning (TL) has shown great potential to accelerate Reinforcement Learning (RL) by leveraging prior knowledge from past learned policies of relevant tasks.
no code implementations • 27 May 2019 • Hongyao Tang, Jianye Hao, Guangyong Chen, Pengfei Chen, Zhaopeng Meng, Yaodong Yang, Li Wang
Value functions are crucial for model-free Reinforcement Learning (RL) to obtain a policy implicitly or guide the policy updates.
no code implementations • NeurIPS 2018 • Yan Zheng, Zhaopeng Meng, Jianye Hao, Zongzhang Zhang, Tianpei Yang, Changjie Fan
In multiagent domains, coping with non-stationary agents that change behaviors from time to time is a challenging problem, where an agent is usually required to be able to quickly detect the other agent's policy during online interaction, and then adapt its own policy accordingly.
1 code implementation • 4 Nov 2018 • Qiangguo Jin, Zhaopeng Meng, Changming Sun, Leyi Wei, Ran Su
Automatic extraction of liver and tumor from CT volumes is a challenging task due to their heterogeneous and diffusive shapes.
no code implementations • 3 Nov 2018 • Qiangguo Jin, Zhaopeng Meng, Tuan D. Pham, Qi Chen, Leyi Wei, Ran Su
Results show that more detailed vessels are extracted by DUNet and it exhibits state-of-the-art performance for retinal vessel segmentation with a global accuracy of 0.9697/0.9722/0.9724 and AUC of 0.9856/0.9868/0.9863 on DRIVE, STARE and CHASE_DB1, respectively.
Ranked #5 on Retinal Vessel Segmentation on STARE
no code implementations • 25 Sep 2018 • Hongyao Tang, Jianye Hao, Tangjie Lv, Yingfeng Chen, Zongzhang Zhang, Hangtian Jia, Chunxu Ren, Yan Zheng, Zhaopeng Meng, Changjie Fan, Li Wang
Besides, we propose a new experience replay mechanism to alleviate the issue of the sparse transitions at the high level of abstraction and the non-stationarity of multiagent learning.
no code implementations • 12 Sep 2018 • Tianpei Yang, Zhaopeng Meng, Jianye Hao, Chongjie Zhang, Yan Zheng, Ze Zheng
This paper proposes a novel approach called Bayes-ToMoP which can efficiently detect the strategy of opponents using either stationary or higher-level reasoning strategies.