Neural Ordinary Differential Equation Value Networks for Parametrized Action Spaces

Action spaces equipped with parameter sets are a common occurrence in reinforcement learning applications. Solutions to problems of this class have been developed under different frameworks, such as parametrized action Markov decision processes (PAMDP) or hierarchical reinforcement learning (HRL). These approaches often require extensions or modifications to existing algorithms developed for standard MDPs. For this reason, they can be unwieldy and, particularly in the case of HRL, computationally inefficient. We propose adopting a different parametrization scheme for state-action value networks based on neural ordinary differential equations (NODEs) as a scalable, plug-and-play approach for parametrized action spaces. NODE value networks require neither extensive modification to existing algorithms nor the adoption of HRL methods. Our solution can be integrated directly into existing training algorithms and opens up new opportunities in single-agent and multi-agent settings with tight precision constraints on the action parameters, such as robotics.
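
To make the idea concrete, below is a minimal, hypothetical sketch of a state-action value network whose hidden representation is evolved by a neural ODE block, conditioned on the state, the discrete action, and its continuous parameter. This is not the paper's exact architecture; the layer sizes, the fixed integration interval, and the use of the `torchdiffeq` package are assumptions made for illustration.

```python
# Hypothetical NODE Q-network sketch (assumes the torchdiffeq package).
import torch
import torch.nn as nn
from torchdiffeq import odeint


class ODEFunc(nn.Module):
    """Parametrizes the hidden-state dynamics dh/dt = f_theta(h, t)."""
    def __init__(self, hidden_dim):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(hidden_dim, hidden_dim), nn.Tanh(),
            nn.Linear(hidden_dim, hidden_dim),
        )

    def forward(self, t, h):
        return self.net(h)


class NODEQNetwork(nn.Module):
    """Q(s, k, x): state s, discrete action k (one-hot), continuous parameter x."""
    def __init__(self, state_dim, num_actions, param_dim, hidden_dim=64):
        super().__init__()
        self.encoder = nn.Linear(state_dim + num_actions + param_dim, hidden_dim)
        self.odefunc = ODEFunc(hidden_dim)
        self.head = nn.Linear(hidden_dim, 1)

    def forward(self, state, action_onehot, action_param):
        # Encode state, discrete action, and its continuous parameter jointly.
        h0 = torch.tanh(
            self.encoder(torch.cat([state, action_onehot, action_param], dim=-1))
        )
        # Evolve the hidden state with a neural ODE over a fixed unit interval.
        t = torch.tensor([0.0, 1.0], device=h0.device)
        h1 = odeint(self.odefunc, h0, t)[-1]
        return self.head(h1)
```

Because the module exposes the usual `forward(state, action, parameter) -> Q` interface, it can in principle be dropped into an existing value-based training loop (e.g., as the critic in a DQN- or DDPG-style learner) without changing the surrounding algorithm, which is the plug-and-play property the abstract refers to.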
