no code implementations • 1 Jun 2023 • Manisha Senadeera, Thommen Karimpanal George, Sunil Gupta, Stephan Jacobs, Santu Rana
This involves learning an "Imagination Network" to transform the other agent's observed state in order to produce a human-interpretable "empathetic state" which, when presented to the learning agent, produces behaviours that mimic the other agent.
no code implementations • 20 Apr 2022 • Hung Le, Thommen Karimpanal George, Majid Abdolshah, Dung Nguyen, Kien Do, Sunil Gupta, Svetha Venkatesh
We introduce a constrained optimization method for policy gradient reinforcement learning, which uses a virtual trust region to regulate each policy update.
no code implementations • NeurIPS 2021 • Hung Le, Thommen Karimpanal George, Majid Abdolshah, Truyen Tran, Svetha Venkatesh
Episodic control enables sample efficiency in reinforcement learning by recalling past experiences from an episodic memory.
no code implementations • 29 Sep 2021 • Thommen Karimpanal George, Majid Abdolshah, Hung Le, Santu Rana, Sunil Gupta, Truyen Tran, Svetha Venkatesh
The objective in goal-based reinforcement learning is to learn a policy to reach a particular goal state within the environment.
no code implementations • 29 Sep 2021 • Majid Abdolshah, Hung Le, Thommen Karimpanal George, Vuong Le, Sunil Gupta, Santu Rana, Svetha Venkatesh
Whilst Generative Adversarial Networks (GANs) generate visually appealing high resolution images, the latent representations (or codes) of these models do not allow controllable changes on the semantic attributes of the generated images.
no code implementations • 20 Aug 2021 • Majid Abdolshah, Hung Le, Thommen Karimpanal George, Sunil Gupta, Santu Rana, Svetha Venkatesh
This is achieved by representing the global transition dynamics as a union of local transition functions, each with respect to one active object in the scene.
no code implementations • 18 Jul 2021 • Majid Abdolshah, Hung Le, Thommen Karimpanal George, Sunil Gupta, Santu Rana, Svetha Venkatesh
Transfer in reinforcement learning is usually achieved through generalisation across tasks.