Our framework relies on Neuroevolution-Enhanced Multi-Objective Optimization (NEMO), a novel search method, to find Pareto-optimal mixed-precision configurations under memory and bit-operation objectives.
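To make the Pareto-optimality criterion concrete, the sketch below filters a set of candidate configurations down to those not dominated on the two stated objectives (memory and bit-operations, both minimized). This is a generic dominance filter for illustration, not NEMO's search procedure; the dict keys `memory` and `bitops` are assumed names.

```python
def pareto_front(configs):
    """Return the configs not dominated on (memory, bitops); lower is better.

    A config c is dominated if some other config is no worse on both
    objectives and strictly better on at least one.
    """
    front = []
    for c in configs:
        dominated = any(
            o["memory"] <= c["memory"] and o["bitops"] <= c["bitops"]
            and (o["memory"] < c["memory"] or o["bitops"] < c["bitops"])
            for o in configs
        )
        if not dominated:
            front.append(c)
    return front


# Hypothetical candidates: the third is dominated by the first
# (more memory AND more bit-operations), so only two survive.
candidates = [
    {"memory": 4, "bitops": 10},
    {"memory": 8, "bitops": 5},
    {"memory": 8, "bitops": 12},
]
front = pareto_front(candidates)
```

A search method such as NEMO would apply this kind of non-dominance test repeatedly while evolving candidate mixed-precision assignments.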
Because the safe agent abstracts a task-independent notion of safety via its action probabilities, it can be reused without further training to modulate multiple policies solving different tasks in the same environment.
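One plausible form such modulation could take is combining the task policy's action distribution with the safe agent's action probabilities and renormalizing. The elementwise-product rule below is an assumption chosen for illustration, not the method described in the text:

```python
import numpy as np

def modulate(task_probs, safe_probs, eps=1e-12):
    """Combine a task policy's action distribution with a safe agent's
    action probabilities (one hypothetical modulation scheme).

    Actions the safe agent assigns low probability are suppressed;
    the result is renormalized to a valid distribution.
    """
    p = np.asarray(task_probs, dtype=float) * np.asarray(safe_probs, dtype=float)
    return p / (p.sum() + eps)


# The safe agent vetoes action 1, so all mass shifts to action 0.
combined = modulate([0.5, 0.5], [1.0, 0.0])
```

Because the safe agent's probabilities enter only as a multiplicative factor, the same safe agent can modulate any task policy over the same action space without retraining.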
For deep neural network accelerators, memory movement is energetically expensive and can bound computation throughput.
Training policies solely on the team-based reward is often difficult because that reward is sparse.
Deep reinforcement learning algorithms have been successfully applied to a range of challenging control tasks.