Search Results for author: Weichao Zhou

Found 8 papers, 3 papers with code

PAGAR: Taming Reward Misalignment in Inverse Reinforcement Learning-Based Imitation Learning with Protagonist Antagonist Guided Adversarial Reward

no code implementations • 2 Jun 2023 • Weichao Zhou, Wenchao Li

Many imitation learning (IL) algorithms employ inverse reinforcement learning (IRL) to infer the intrinsic reward function that an expert is implicitly optimizing for based on their demonstrated behaviors.

Imitation Learning, Zero-Shot Learning

POLAR-Express: Efficient and Precise Formal Reachability Analysis of Neural-Network Controlled Systems

1 code implementation • 31 Mar 2023 • YiXuan Wang, Weichao Zhou, Jiameng Fan, Zhilu Wang, Jiajun Li, Xin Chen, Chao Huang, Wenchao Li, Qi Zhu

We also present a novel approach to propagate Taylor models (TMs) more efficiently and precisely across ReLU activation functions.

A Hierarchical Bayesian Approach to Inverse Reinforcement Learning with Symbolic Reward Machines

no code implementations • 20 Apr 2022 • Weichao Zhou, Wenchao Li

A misspecified reward can degrade sample efficiency and induce undesired behaviors in reinforcement learning (RL) problems.

reinforcement-learning, Reinforcement Learning (RL)

Programmatic Reward Design by Example

no code implementations • 14 Dec 2021 • Weichao Zhou, Wenchao Li

In this paper, we propose the idea of programmatic reward design, i.e., using programs to specify the reward functions in RL environments.

Reinforcement Learning (RL)

POLAR: A Polynomial Arithmetic Framework for Verifying Neural-Network Controlled Systems

2 code implementations • 25 Jun 2021 • Chao Huang, Jiameng Fan, Zhilu Wang, YiXuan Wang, Weichao Zhou, Jiajun Li, Xin Chen, Wenchao Li, Qi Zhu

We present POLAR, a polynomial arithmetic-based framework for efficient bounded-time reachability analysis of neural-network controlled systems (NNCSs).

Runtime-Safety-Guided Policy Repair

no code implementations • 17 Aug 2020 • Weichao Zhou, Ruihan Gao, BaekGyu Kim, Eunsuk Kang, Wenchao Li

The key idea behind our approach is the formulation of a trajectory optimization problem that allows the joint reasoning of policy update and safety constraints.

Safety-Aware Apprenticeship Learning

1 code implementation • 22 Oct 2017 • Weichao Zhou, Wenchao Li

Apprenticeship learning (AL) is a Learning from Demonstration technique in which the reward function of a Markov Decision Process (MDP) is unknown to the learning agent, and the agent has to derive a good policy by observing an expert's demonstrations.
