Search Results for author: Daoming Lyu

Found 12 papers, 0 papers with code

PRIMA: Planner-Reasoner Inside a Multi-task Reasoning Agent

no code implementations • 1 Feb 2022 • Daoming Lyu, Bo Liu, Jianshu Chen

We consider the problem of multi-task reasoning (MTR), where an agent can solve multiple tasks via (first-order) logic reasoning.

STOPS: Short-Term-based Volatility-controlled Policy Search and its Global Convergence

no code implementations • 24 Jan 2022 • Liangliang Xu, Daoming Lyu, Yangchen Pan, Aiwen Jiang, Bo Liu

This paper proposes Short-Term VOlatility-controlled Policy Search (STOPS), a novel algorithm that solves risk-averse problems by learning from short-term trajectories instead of long-term trajectories.
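
As a rough illustration of the short-term idea (a minimal Python sketch under assumed details: a sliding window of length h and a mean-variance trade-off weighted by lam; none of these names come from the paper):

import numpy as np

def short_term_returns(rewards, h, gamma=0.99):
    # discounted return over every sliding window of length h,
    # instead of a single return over the full long-term trajectory
    return np.array([sum(gamma**k * rewards[t + k] for k in range(h))
                     for t in range(len(rewards) - h + 1)])

def mean_variance_objective(rewards, h, lam=0.5):
    r = short_term_returns(rewards, h)
    return r.mean() - lam * r.var()  # expected return minus a volatility penalty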

TDM: Trustworthy Decision-Making via Interpretability Enhancement

no code implementations • 13 Aug 2021 • Daoming Lyu, Fangkai Yang, Hugh Kwon, Wen Dong, Levent Yilmaz, Bo Liu

Human-robot interactive decision-making is increasingly becoming ubiquitous, and trust is an influential factor in determining the reliance on autonomy.

Decision Making

Variance-Reduced Off-Policy Memory-Efficient Policy Search

no code implementations • 14 Sep 2020 • Daoming Lyu, Qi Qi, Mohammad Ghavamzadeh, Hengshuai Yao, Tianbao Yang, Bo Liu

To achieve variance-reduced, off-policy-stable policy optimization, we propose an algorithm family that is memory-efficient, stochastically variance-reduced, and capable of learning from off-policy samples.

Reinforcement Learning (RL) · Stochastic Optimization
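
For context, a generic SVRG-style, importance-weighted gradient step captures what "stochastically variance-reduced" and "off-policy" mean here; this is a standard control-variate construction, not the algorithm family proposed in the paper:

def svrg_off_policy_step(theta, theta_snap, full_grad, grad_fn, sample, rho):
    # rho: importance weight pi(a|s) / mu(a|s) correcting for off-policy data;
    # theta_snap and full_grad are a stored snapshot and its full gradient
    g = rho * grad_fn(theta, sample)
    g_snap = rho * grad_fn(theta_snap, sample)
    return g - g_snap + full_grad  # control variate: unbiased, lower variance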

Stable and Efficient Policy Evaluation

no code implementations • 6 Jun 2020 • Daoming Lyu, Bo Liu, Matthieu Geist, Wen Dong, Saad Biaz, Qi Wang

Policy evaluation algorithms are essential to reinforcement learning due to their ability to predict the performance of a policy.

Reinforcement Learning (RL)
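
For readers new to the task, textbook TD(0) policy evaluation with linear features illustrates what a policy evaluation algorithm computes; this is a generic baseline, not the stable and efficient method of the paper:

import numpy as np

def td0_evaluate(transitions, phi, d, alpha=0.05, gamma=0.99):
    w = np.zeros(d)
    for s, r, s_next in transitions:  # (state, reward, next state) from the policy
        delta = r + gamma * w @ phi(s_next) - w @ phi(s)  # TD error
        w += alpha * delta * phi(s)  # semi-gradient update
    return w  # the policy's value is approximated by V(s) = w @ phi(s)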

A Human-Centered Data-Driven Planner-Actor-Critic Architecture via Logic Programming

no code implementations • 18 Sep 2019 • Daoming Lyu, Fangkai Yang, Bo Liu, Steven Gustafson

Recent successes of Reinforcement Learning (RL) allow an agent to learn policies that surpass human experts, but such learning suffers from being time-hungry and data-hungry.

General Knowledge · Reinforcement Learning (RL)

A Joint Planning and Learning Framework for Human-Aided Decision-Making

no code implementations • 17 Jun 2019 • Daoming Lyu, Fangkai Yang, Bo Liu, Steven Gustafson

Conventional reinforcement learning (RL) allows an agent to learn policies via environmental rewards only, which results in a long and slow learning curve, especially in the early stages.

Decision Making · General Knowledge +1

Knowledge-Based Sequential Decision-Making Under Uncertainty

no code implementations • 16 May 2019 • Daoming Lyu

Deep reinforcement learning (DRL) algorithms have achieved great success on sequential decision-making problems, yet they are criticized for their lack of data efficiency and explainability.

Decision Making · Decision Making Under Uncertainty +2

SDRL: Interpretable and Data-efficient Deep Reinforcement Learning Leveraging Symbolic Planning

no code implementations • 31 Oct 2018 • Daoming Lyu, Fangkai Yang, Bo Liu, Steven Gustafson

The three components cross-fertilize each other and eventually converge to an optimal symbolic plan along with the learned subtasks, bringing together the advantages of long-term planning capability with symbolic knowledge and end-to-end reinforcement learning directly from a high-dimensional sensory input.

Decision Making · reinforcement-learning +2
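
A hypothetical outline of the loop the abstract describes (the planner, controller, and meta-controller interfaces below are assumptions for illustration, not the paper's API):

def sdrl_iteration(planner, controller, meta_controller):
    plan = planner.plan()                      # symbolic plan: a sequence of subtasks
    for subtask in plan:
        controller.learn(subtask)              # DRL policy per subtask, from raw sensory input
    measures = meta_controller.evaluate(plan)  # learned intrinsic returns per subtask
    planner.update(measures)                   # bias the next symbolic plan toward better subtasks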

A Block Coordinate Ascent Algorithm for Mean-Variance Optimization

no code implementations • NeurIPS 2018 • Bo Liu, Tengyang Xie, Yangyang Xu, Mohammad Ghavamzadeh, Yin-Lam Chow, Daoming Lyu, Daesub Yoon

Risk management in dynamic decision problems is a primary concern in many fields, including financial investment, autonomous driving, and healthcare.

Autonomous Driving · Management

O²TD: (Near)-Optimal Off-Policy TD Learning

no code implementations • 17 Apr 2017 • Bo Liu, Daoming Lyu, Wen Dong, Saad Biaz

Temporal Difference learning and Residual Gradient methods are the most widely used temporal-difference-based learning algorithms; however, it has been shown that neither of their objective functions is optimal w.r.t. approximating the true value function V.
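
The two updates that sentence contrasts, in their standard linear-function-approximation form (TD converges to the MSPBE fixed point, Residual Gradient descends the MSBE; neither objective equals the mean-squared error to the true V, and O²TD itself is not reproduced here):

import numpy as np

def td_vs_rg(w, phi_s, phi_s2, r, alpha=0.1, gamma=0.99):
    delta = r + gamma * w @ phi_s2 - w @ phi_s           # TD error
    w_td = w + alpha * delta * phi_s                     # TD: semi-gradient update
    w_rg = w + alpha * delta * (phi_s - gamma * phi_s2)  # RG: true gradient of the squared Bellman error
    return w_td, w_rg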
