no code implementations • 1 Feb 2022 • Daoming Lyu, Bo Liu, Jianshu Chen
We consider the problem of multi-task reasoning (MTR), where an agent can solve multiple tasks via (first-order) logic reasoning.
no code implementations • 24 Jan 2022 • Liangliang Xu, Daoming Lyu, Yangchen Pan, Aiwen Jiang, Bo Liu
This paper proposes Short-Term VOlatility-controlled Policy Search (STOPS), a novel algorithm that solves risk-averse problems by learning from short-term trajectories instead of long-term trajectories.
no code implementations • 13 Aug 2021 • Daoming Lyu, Fangkai Yang, Hugh Kwon, Wen Dong, Levent Yilmaz, Bo Liu
Human-robot interactive decision-making is increasingly becoming ubiquitous, and trust is an influential factor in determining the reliance on autonomy.
no code implementations • 14 Sep 2020 • Daoming Lyu, Qi Qi, Mohammad Ghavamzadeh, Hengshuai Yao, Tianbao Yang, Bo Liu
To achieve variance-reduced off-policy-stable policy optimization, we propose an algorithm family that is memory-efficient, stochastically variance-reduced, and capable of learning from off-policy samples.
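Learning from off-policy samples typically requires an importance-sampling correction for the mismatch between the behavior and target policies. As a generic illustration of that standard correction (not the paper's variance-reduced algorithm), here is a minimal per-decision importance-weighted TD(0) sketch; the function and variable names are illustrative assumptions:

```python
import numpy as np

# Hedged sketch of the standard per-decision importance-sampling correction
# used when learning from off-policy samples. This is generic textbook
# machinery, not the memory-efficient variance-reduced method of the paper.
def off_policy_td0(w, traj, pi, mu, gamma=0.9, alpha=0.1):
    """w: (S,) tabular value estimates; traj: list of (s, a, r, s_next);
    pi, mu: target and behavior policies as (S, A) probability arrays."""
    w = w.copy()
    for s, a, r, s_next in traj:
        rho = pi[s, a] / mu[s, a]            # importance ratio
        delta = r + gamma * w[s_next] - w[s] # TD error
        w[s] += alpha * rho * delta          # importance-weighted update
    return w

pi = np.array([[0.5, 0.5], [0.5, 0.5]])
mu = np.array([[0.5, 0.5], [0.5, 0.5]])
w = off_policy_td0(np.zeros(2), [(0, 0, 1.0, 1)], pi, mu)
```

When `pi == mu` the ratio is 1 and the update reduces to ordinary on-policy TD(0); the variance the paper targets arises when the ratios differ greatly from 1 across long trajectories.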
no code implementations • 6 Jun 2020 • Daoming Lyu, Bo Liu, Matthieu Geist, Wen Dong, Saad Biaz, Qi Wang
Policy evaluation algorithms are essential to reinforcement learning due to their ability to predict the performance of a policy.
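To make the policy-evaluation setting concrete, here is a minimal tabular sketch that predicts a fixed policy's performance by iterating the Bellman expectation operator to its fixed point. This is standard dynamic-programming policy evaluation, not the specific algorithm studied in the paper; the transition matrix and rewards below are made-up toy values:

```python
import numpy as np

# Tabular policy evaluation: iterate V <- R + gamma * P @ V until the
# value estimates stop changing. P is the state-transition matrix induced
# by the fixed policy; R holds expected immediate rewards per state.
def evaluate_policy(P, R, gamma=0.9, tol=1e-10):
    V = np.zeros(len(R))
    while True:
        V_new = R + gamma * P @ V
        if np.max(np.abs(V_new - V)) < tol:
            return V_new
        V = V_new

P = np.array([[0.5, 0.5],
              [0.0, 1.0]])   # state 1 is absorbing
R = np.array([1.0, 0.0])
V = evaluate_policy(P, R)
```

The fixed point satisfies V = (I - γP)⁻¹R, so the absorbing zero-reward state gets value 0 and the first state gets 1 / (1 - 0.45) ≈ 1.818.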
no code implementations • 18 Sep 2019 • Daoming Lyu, Fangkai Yang, Bo Liu, Steven Gustafson
Recent successes of Reinforcement Learning (RL) allow an agent to learn policies that surpass human experts, but RL suffers from being time-hungry and data-hungry.
no code implementations • 17 Jun 2019 • Daoming Lyu, Fangkai Yang, Bo Liu, Steven Gustafson
Conventional reinforcement learning (RL) allows an agent to learn policies from environmental rewards only, resulting in a long and slow learning curve, especially in the early stages.
no code implementations • 16 May 2019 • Daoming Lyu
Deep reinforcement learning (DRL) algorithms have achieved great success on sequential decision-making problems, yet are criticized for their lack of data efficiency and explainability.
no code implementations • 31 Oct 2018 • Daoming Lyu, Fangkai Yang, Bo Liu, Steven Gustafson
The three components cross-fertilize each other and eventually converge to an optimal symbolic plan along with the learned subtasks, combining the long-term planning capability of symbolic knowledge with end-to-end reinforcement learning directly from high-dimensional sensory input.
no code implementations • NeurIPS 2018 • Bo Liu, Tengyang Xie, Yangyang Xu, Mohammad Ghavamzadeh, Yin-Lam Chow, Daoming Lyu, Daesub Yoon
Risk management in dynamic decision problems is a primary concern in many fields, including financial investment, autonomous driving, and healthcare.
no code implementations • 20 Apr 2018 • Fangkai Yang, Daoming Lyu, Bo Liu, Steven Gustafson
Reinforcement learning and symbolic planning have both been used to build intelligent autonomous agents.
no code implementations • 17 Apr 2017 • Bo Liu, Daoming Lyu, Wen Dong, Saad Biaz
Temporal difference learning and Residual Gradient methods are the most widely used temporal-difference-based learning algorithms; however, it has been shown that neither of their objective functions is optimal w.r.t. approximating the true value function $V$.
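The difference between the two objectives shows up directly in their updates: semi-gradient TD(0) treats the bootstrap target as a constant, while Residual Gradient descends the squared Bellman residual and so also differentiates through the successor state's value. A minimal tabular sketch of the two updates, with illustrative names and toy values (not the paper's proposed method):

```python
import numpy as np

# Tabular value estimates w[s] on a toy 2-state problem; both updates
# are driven by the same TD error delta = r + gamma * w[s'] - w[s].

def td0_update(w, s, r, s_next, gamma=0.9, alpha=0.1):
    # Semi-gradient TD(0): the bootstrap target r + gamma*w[s_next]
    # is treated as a constant, so only w[s] is adjusted.
    delta = r + gamma * w[s_next] - w[s]
    w = w.copy()
    w[s] += alpha * delta
    return w

def residual_gradient_update(w, s, r, s_next, gamma=0.9, alpha=0.1):
    # Residual Gradient: gradient of the squared Bellman residual,
    # which also flows through w[s_next] with coefficient -gamma.
    delta = r + gamma * w[s_next] - w[s]
    w = w.copy()
    w[s] += alpha * delta
    w[s_next] -= alpha * gamma * delta
    return w

w0 = np.zeros(2)
w_td = td0_update(w0, s=0, r=1.0, s_next=1)
w_rg = residual_gradient_update(w0, s=0, r=1.0, s_next=1)
```

On this single transition both methods move `w[0]` the same way, but Residual Gradient additionally pushes `w[1]` in the opposite direction; neither gradient corresponds to minimizing the true error $\|V - \hat V\|$, which is the sense in which neither objective is optimal.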