1 code implementation • 14 Nov 2022 • Eric Chen, Zhang-Wei Hong, Joni Pajarinen, Pulkit Agrawal
However, on easy exploration tasks, the agent gets distracted by intrinsic rewards and performs unnecessary exploration even when sufficient task (also called extrinsic) reward is available.
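The intrinsic-reward setup this entry refers to is commonly formulated as the sum of the task reward and a scaled exploration bonus. The sketch below uses a count-based bonus and a coefficient named `beta`; both are illustrative assumptions, not the paper's actual method — it only shows why the bonus can keep "distracting" the agent after the task reward is already available.

```python
# Minimal sketch: total reward = extrinsic (task) reward + scaled
# intrinsic (exploration) bonus. Count-based bonus assumed for brevity.
from collections import Counter

def combined_reward(r_ext, r_int, beta=0.1):
    """Total reward = task reward + scaled exploration bonus."""
    return r_ext + beta * r_int

class CountBonus:
    """Simple count-based intrinsic reward: 1 / sqrt(visit count)."""
    def __init__(self):
        self.counts = Counter()
    def __call__(self, state):
        self.counts[state] += 1
        return self.counts[state] ** -0.5

bonus = CountBonus()
# The first visit to state "A" gives the full bonus; repeat visits decay
# it, but it never vanishes -- the distraction the entry describes.
r1 = combined_reward(1.0, bonus("A"))
r2 = combined_reward(1.0, bonus("A"))
```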
no code implementations • 17 Oct 2022 • Kwangjun Ahn, Zakaria Mhammedi, Horia Mania, Zhang-Wei Hong, Ali Jadbabaie
Recent approaches to data-driven MPC have used the simplest form of imitation learning known as behavior cloning to learn controllers that mimic the performance of MPC by online sampling of the trajectories of the closed-loop MPC system.
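Behavior cloning, as named in this entry, reduces to supervised regression from states to expert actions. A minimal sketch under stated assumptions: a known linear feedback gain stands in for the MPC expert, and the cloned controller is fit by least squares — this illustrates the technique, not the paper's pipeline.

```python
# Behavior cloning sketch: fit a linear controller u = -K x by least
# squares on (state, action) pairs from an "expert". The expert gain
# K_expert below stands in for MPC rollouts (illustrative assumption).
import numpy as np

rng = np.random.default_rng(0)
K_expert = np.array([[2.0, 1.0]])          # expert feedback gain
X = rng.standard_normal((200, 2))          # sampled closed-loop states
U = X @ (-K_expert.T)                      # expert actions u = -K x

# Behavior cloning objective: minimize ||X W - U||^2 over weights W.
W, *_ = np.linalg.lstsq(X, U, rcond=None)
K_cloned = -W.T                            # recovered controller gain
```

With noiseless expert data the least-squares fit recovers the expert gain exactly; the interesting failure modes (distribution shift away from the expert's trajectories) appear only at deployment time.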
no code implementations • 28 Apr 2022 • Zhang-Wei Hong, Ge Yang, Pulkit Agrawal
The dominant framework for off-policy multi-goal reinforcement learning involves estimating a goal-conditioned Q-value function.
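The goal-conditioned Q-value function the entry mentions is simply a Q-function with the goal as an extra argument, Q(s, a, g). A tabular sketch of one TD update under assumed toy dynamics (the chain environment and action names are illustrative, not from the paper):

```python
# Goal-conditioned Q-function sketch: the table is indexed by
# (state, action, goal), and reward depends on reaching the goal.
from collections import defaultdict

Q = defaultdict(float)          # Q[(state, action, goal)]
gamma, alpha = 0.9, 0.5

def td_update(s, a, r, s_next, goal, actions=("left", "right")):
    """One tabular TD(0) update of the goal-conditioned Q-value."""
    target = r + gamma * max(Q[(s_next, b, goal)] for b in actions)
    Q[(s, a, goal)] += alpha * (target - Q[(s, a, goal)])

# On a toy 3-state chain, moving right from state 1 reaches goal state 2.
td_update(s=1, a="right", r=1.0, s_next=2, goal=2)
```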
no code implementations • ICLR 2022 • Zhang-Wei Hong, Tao Chen, Yen-Chen Lin, Joni Pajarinen, Pulkit Agrawal
State-of-the-art deep Q-learning methods update Q-values using state transition tuples sampled from the experience replay buffer.
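The update scheme this entry refers to — sampling transition tuples from a replay buffer and regressing toward the target r + γ·max_a′ Q(s′, a′) — can be sketched as follows. The table-based version below is an assumption for brevity; deep methods replace the table with a neural network.

```python
# Replay-buffer Q-learning sketch: store (s, a, r, s', done) tuples,
# sample a batch, apply the standard Q-learning target.
import random
from collections import defaultdict, deque

Q = defaultdict(float)
buffer = deque(maxlen=10_000)
gamma, alpha, actions = 0.99, 0.1, (0, 1)

def replay_update(batch_size=4):
    batch = random.sample(buffer, min(batch_size, len(buffer)))
    for s, a, r, s2, done in batch:
        target = r if done else r + gamma * max(Q[(s2, b)] for b in actions)
        Q[(s, a)] += alpha * (target - Q[(s, a)])

buffer.append((0, 1, 1.0, 1, True))   # one terminal transition
replay_update()
```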
no code implementations • 14 Mar 2022 • Haokuan Luo, Albert Yue, Zhang-Wei Hong, Pulkit Agrawal
We present a strong baseline that surpasses the performance of previously published methods on the Habitat Challenge task of navigating to a target object in indoor environments.
no code implementations • ICLR 2022 • Ge Yang, Zhang-Wei Hong, Pulkit Agrawal
We simultaneously learn both components.
no code implementations • 30 May 2021 • Chin-Jui Chang, Yu-Wei Chu, Chao-Hsien Ting, Hao-Kang Liu, Zhang-Wei Hong, Chun-Yi Lee
Deep reinforcement learning (DRL) has been demonstrated to provide promising results in several challenging decision making and control tasks.
1 code implementation • 1 Jan 2021 • Yu Ming Chen, Kuan-Yu Chang, Chien Liu, Tsu-Ching Hsiao, Zhang-Wei Hong, Chun-Yi Lee
Macro actions have been demonstrated to be beneficial to an agent's learning process.
no code implementations • 16 Jul 2020 • Po-Han Chiang, Hsuan-Kung Yang, Zhang-Wei Hong, Chun-Yi Lee
Nevertheless, blending multiple step returns into a single target sacrifices the diverse advantages offered by the individual step-return targets.
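The step-return targets this entry contrasts are the standard n-step returns: each choice of n trades bias against variance differently, and averaging them (as TD(λ)-style blending does) collapses that diversity into one number. A sketch with illustrative values (the rewards and value estimates below are assumptions):

```python
# n-step return: G_n = r_0 + gamma*r_1 + ... + gamma^{n-1}*r_{n-1}
#                      + gamma^n * V(s_n)
def n_step_return(rewards, bootstrap_values, gamma, n):
    g = sum(gamma**k * rewards[k] for k in range(n))
    return g + gamma**n * bootstrap_values[n]

rewards = [1.0, 0.0, 1.0]
values = [0.0, 0.5, 0.2, 0.8]        # V(s_0..s_3), assumed estimates
gamma = 0.9

# Each n yields a distinct target; blending averages them into one.
targets = [n_step_return(rewards, values, gamma, n) for n in (1, 2, 3)]
mixed = sum(targets) / len(targets)
```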
1 code implementation • 1 Feb 2020 • Zhang-Wei Hong, Prabhat Nagarajan, Guilherme Maeda
PIEKD is a learning framework that uses an ensemble of policies to act in the environment while periodically sharing knowledge amongst policies in the ensemble through knowledge distillation.
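The knowledge-sharing step described here — periodically distilling ensemble members toward a better-performing policy — can be illustrated with a toy distillation step that reduces the KL divergence to the teacher's action distribution. The linear interpolation below is one simple way to take such a step; it is an assumption for illustration, not necessarily PIEKD's exact procedure.

```python
# Toy distillation step: move a student policy's action distribution
# toward a teacher's, reducing KL(teacher || student).
import math

def kl(p, q):
    """KL divergence between two discrete distributions."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def distill_step(student, teacher, lr=0.5):
    """Linearly interpolate toward the teacher, then renormalize."""
    mixed = [(1 - lr) * s + lr * t for s, t in zip(student, teacher)]
    z = sum(mixed)
    return [m / z for m in mixed]

teacher = [0.7, 0.2, 0.1]
student = [0.1, 0.3, 0.6]
updated = distill_step(student, teacher)
```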
no code implementations • 25 Sep 2019 • Zhang-Wei Hong, Prabhat Nagarajan, Guilherme Maeda
Reinforcement Learning (RL) has demonstrated promising results across several sequential decision-making tasks.
no code implementations • 15 Aug 2019 • Zhang-Wei Hong, Joni Pajarinen, Jan Peters
Model-based Reinforcement Learning (MBRL) allows data-efficient learning, which is required in real-world applications such as robotics.
no code implementations • ICLR 2019 • Hsin-Wei Yu, Po-Yu Wu, Chih-An Tsao, You-An Shen, Shih-Hsuan Lin, Zhang-Wei Hong, Yi-Hsiang Chang, Chun-Yi Lee
In this paper, we propose a modular approach which separates the instruction-to-action mapping procedure into two separate stages.
no code implementations • ICLR 2019 • Zhang-Wei Hong, Tsu-Jui Fu, Tzu-Yun Shann, Yi-Hsiang Chang, Chun-Yi Lee
Our framework consists of a deep reinforcement learning (DRL) agent and an inverse dynamics model contesting with each other.
no code implementations • NeurIPS 2018 • Zhang-Wei Hong, Tzu-Yun Shann, Shih-Yang Su, Yi-Hsiang Chang, Chun-Yi Lee
Efficient exploration remains a challenging research problem in reinforcement learning, especially when an environment contains large state spaces, deceptive local optima, or sparse rewards.
no code implementations • 1 Feb 2018 • Zhang-Wei Hong, Chen Yu-Ming, Shih-Yang Su, Tzu-Yun Shann, Yi-Hsiang Chang, Hsuan-Kung Yang, Brian Hsi-Lin Ho, Chih-Chieh Tu, Yueh-Chuan Chang, Tsu-Ching Hsiao, Hsin-Wei Hsiao, Sih-Pin Lai, Chun-Yi Lee
Collecting training data from the physical world is usually time-consuming and even dangerous for fragile robots, and thus recent work in robot learning advocates the use of simulators as the training platform.
no code implementations • 21 Dec 2017 • Zhang-Wei Hong, Shih-Yang Su, Tzu-Yun Shann, Yi-Hsiang Chang, Chun-Yi Lee
DPIQN incorporates the learned policy features as a hidden vector into its own deep Q-network (DQN), enabling it to predict better Q-values for the controllable agents than state-of-the-art deep reinforcement learning models.
no code implementations • 8 Mar 2017 • Yen-Chen Lin, Zhang-Wei Hong, Yuan-Hong Liao, Meng-Li Shih, Ming-Yu Liu, Min Sun
In the strategically-timed attack, the adversary aims at minimizing the agent's reward by only attacking the agent at a small subset of time steps in an episode.
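A strategically-timed attack needs a criterion for *when* to perturb. One natural choice, sketched below, is to attack only at steps where the policy has a strong action preference (a large gap between the most- and least-preferred actions), since those are the steps where forcing a mistake costs the agent the most. The threshold value is an illustrative assumption.

```python
# Timing criterion sketch: attack only when the policy's action
# preference gap exceeds a threshold.
def should_attack(action_probs, threshold=0.5):
    return max(action_probs) - min(action_probs) > threshold

episode = [
    [0.26, 0.25, 0.25, 0.24],   # near-uniform: attacking gains little
    [0.90, 0.05, 0.03, 0.02],   # confident step: worth attacking
]
attack_steps = [t for t, p in enumerate(episode) if should_attack(p)]
```

This keeps the number of attacked time steps small, matching the "small subset of time steps" constraint described in the entry.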