no code implementations • 20 Mar 2023 • Alan Lewis, Tim Miller
We propose the deceptive exploration ambiguity model (DEAM), which learns using the deceptive policy during training, leading to targeted exploration of the state space.
Reinforcement Learning (RL)