TASK	DATASET	MODEL	METRIC NAME	METRIC VALUE	GLOBAL RANK
MuJoCo Games	Ant-v3	ParPI	Average Reward	5142	# 1
MuJoCo Games	HalfCheetah-v3	ParPI	Average Reward	11738	# 1
MuJoCo Games	Hopper-v3	ParPI	Average Reward	3042	# 1
MuJoCo Games	Humanoid-v3	ParPI	Average Reward	4912	# 1
Offline RL	Walker2d	ParPI	D4RL Normalized Score	151.4	# 1
MuJoCo Games	Walker2d-v3	ParPI	Average Reward	5201	# 1

Badge	Markdown
	`[![PWC](https://img.shields.io/endpoint.svg?url=https://paperswithcode.com/badge/particle-based-stochastic-policy-optimization/mujoco-games-on-ant-v3)](https://paperswithcode.com/sota/mujoco-games-on-ant-v3?p=particle-based-stochastic-policy-optimization)`
	`[![PWC](https://img.shields.io/endpoint.svg?url=https://paperswithcode.com/badge/particle-based-stochastic-policy-optimization/mujoco-games-on-halfcheetah-v3)](https://paperswithcode.com/sota/mujoco-games-on-halfcheetah-v3?p=particle-based-stochastic-policy-optimization)`
	`[![PWC](https://img.shields.io/endpoint.svg?url=https://paperswithcode.com/badge/particle-based-stochastic-policy-optimization/mujoco-games-on-hopper-v3)](https://paperswithcode.com/sota/mujoco-games-on-hopper-v3?p=particle-based-stochastic-policy-optimization)`
	`[![PWC](https://img.shields.io/endpoint.svg?url=https://paperswithcode.com/badge/particle-based-stochastic-policy-optimization/mujoco-games-on-humanoid-v3)](https://paperswithcode.com/sota/mujoco-games-on-humanoid-v3?p=particle-based-stochastic-policy-optimization)`
	`[![PWC](https://img.shields.io/endpoint.svg?url=https://paperswithcode.com/badge/particle-based-stochastic-policy-optimization/offline-rl-on-walker2d)](https://paperswithcode.com/sota/offline-rl-on-walker2d?p=particle-based-stochastic-policy-optimization)`
	`[![PWC](https://img.shields.io/endpoint.svg?url=https://paperswithcode.com/badge/particle-based-stochastic-policy-optimization/mujoco-games-on-walker2d-v3)](https://paperswithcode.com/sota/mujoco-games-on-walker2d-v3?p=particle-based-stochastic-policy-optimization)`

Particle Based Stochastic Policy Optimization

29 Sep 2021 · Qiwei Ye, Yuxuan Song, Chang Liu, Fangyun Wei, Tao Qin, Tie-Yan Liu

Stochastic policies have been widely applied for their favorable properties in exploration and uncertainty quantification. Modeling the policy distribution through a joint state-action distribution within the exponential family has enabled flexible exploration and the learning of multi-modal policies, and has brought a probabilistic perspective to deep reinforcement learning (RL). The connection between probabilistic inference and RL makes it possible to leverage advances in probabilistic optimization tools. However, recent efforts are limited to minimizing the reverse KL divergence, which is confidence-seeking and may diminish the merits of a stochastic policy. To realize the full potential of stochastic policies and provide more flexible properties, there is strong motivation to consider different update rules during policy optimization. In this paper, we propose a particle-based probabilistic policy optimization framework, ParPI, which enables the use of a broad family of divergences or distances, such as f-divergences and the Wasserstein distance, which could yield better probabilistic behavior of the learned stochastic policy. Experiments in both online and offline settings demonstrate the effectiveness of the proposed algorithm as well as the characteristics of different discrepancy measures for policy optimization.
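The abstract's remark that reverse KL is confidence-seeking can be illustrated with a small toy experiment. The sketch below is not ParPI and is not taken from the paper: it assumes a one-dimensional bimodal target (an equal mixture of two Gaussians, chosen here only for illustration), fits a single Gaussian by brute-force grid search, and compares the minimizer of the reverse KL, KL(q || p), with the minimizer of the forward KL, KL(p || q). All function names, grid ranges, and the target itself are assumptions made for this example.

```python
import numpy as np

def log_gauss(x, mu, sigma):
    """Log-density of N(mu, sigma^2) evaluated at x."""
    return -0.5 * ((x - mu) / sigma) ** 2 - np.log(sigma * np.sqrt(2.0 * np.pi))

def kl_on_grid(log_a, log_b, dx):
    """Riemann-sum approximation of KL(a || b) from log-densities on a uniform grid."""
    return float(np.sum(np.exp(log_a) * (log_a - log_b)) * dx)

def fit_gaussian(log_p, x, mus, sigmas, reverse):
    """Grid-search the Gaussian q that minimizes KL(q || p) if reverse, else KL(p || q)."""
    dx = x[1] - x[0]
    best = (np.inf, None, None)
    for mu in mus:
        for sigma in sigmas:
            log_q = log_gauss(x, mu, sigma)
            if reverse:
                d = kl_on_grid(log_q, log_p, dx)
            else:
                d = kl_on_grid(log_p, log_q, dx)
            if d < best[0]:
                best = (d, float(mu), float(sigma))
    return best

# Assumed toy target: equal mixture of N(-3, 0.5^2) and N(3, 0.5^2).
x = np.linspace(-12.0, 12.0, 4801)
log_p = np.logaddexp(log_gauss(x, -3.0, 0.5), log_gauss(x, 3.0, 0.5)) + np.log(0.5)

# Candidate means and standard deviations for the single-Gaussian fit.
mus = np.linspace(-4.0, 4.0, 81)
sigmas = np.linspace(0.3, 4.0, 75)

kl_rev, mu_rev, sigma_rev = fit_gaussian(log_p, x, mus, sigmas, reverse=True)
kl_fwd, mu_fwd, sigma_fwd = fit_gaussian(log_p, x, mus, sigmas, reverse=False)

print(f"reverse KL fit: mu={mu_rev:.2f}, sigma={sigma_rev:.2f}")
print(f"forward KL fit: mu={mu_fwd:.2f}, sigma={sigma_fwd:.2f}")
```

Under these assumptions, the reverse-KL fit should concentrate on one of the two modes with a small standard deviation, while the forward-KL fit should sit between the modes with a large standard deviation. This is the kind of difference in behavior between discrepancy measures that motivates looking beyond reverse KL.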

Code

No code implementations yet.

Tasks

MuJoCo Games

Offline RL

Reinforcement Learning (RL)

Uncertainty Quantification

Datasets

MuJoCo

D4RL
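The D4RL Normalized Score reported for the Offline RL result follows, by the D4RL benchmark's convention, a linear rescaling in which a random reference policy maps to 0 and an expert reference policy maps to 100. The helper below is a minimal sketch of that rescaling. The function name and the reference returns in the usage line are hypothetical placeholders for illustration, not the benchmark's actual per-environment constants.

```python
def d4rl_normalized_score(raw_return, random_return, expert_return):
    """Rescale a raw return so the random reference maps to 0 and the expert reference to 100."""
    if expert_return == random_return:
        raise ValueError("expert_return and random_return must differ")
    return 100.0 * (raw_return - random_return) / (expert_return - random_return)

# Hypothetical reference returns, chosen only to illustrate the arithmetic.
example = d4rl_normalized_score(raw_return=7000.0, random_return=0.0, expert_return=5000.0)
print(example)
```

A normalized score above 100, such as the 151.4 listed for Walker2d, therefore indicates a return higher than the expert reference used for normalization.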

Results from the Paper

Ranked #1 on MuJoCo Games on Ant-v3


Methods

No methods listed for this paper.
