These pre-trained policies can accelerate learning when endowed with external reward, and can also be used as primitive options in hierarchical reinforcement learning.
In offline reinforcement learning, a policy is trained to maximize cumulative reward using only a fixed, previously collected dataset, without further interaction with the environment.
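To make the offline setting concrete, here is a minimal sketch that fits a tabular Q-function by sweeping a fixed list of transitions and never querying an environment. The function names, the hyperparameters, and the two-state toy dataset are illustrative assumptions, not taken from any particular method; practical offline algorithms usually add safeguards against overestimating actions that the data does not cover.

```python
from collections import defaultdict


def offline_q_learning(dataset, n_actions, gamma=0.9, lr=0.5, epochs=200):
    """Fit a tabular Q-function from a fixed dataset of transitions.

    dataset: list of (state, action, reward, next_state, done) tuples.
    The loop only replays stored transitions; no new data is collected.
    """
    q = defaultdict(lambda: [0.0] * n_actions)
    for _ in range(epochs):
        for s, a, r, s2, done in dataset:
            # One-step bootstrapped target computed from logged data only.
            target = r if done else r + gamma * max(q[s2])
            q[s][a] += lr * (target - q[s][a])
    return q


def greedy_policy(q, state):
    """Return the action with the highest estimated value in `state`."""
    values = q[state]
    return max(range(len(values)), key=values.__getitem__)


# Hypothetical logged data for a two-state chain (an assumption for illustration):
# action 1 moves from state 0 to state 1, and action 1 in state 1 ends the
# episode with reward 1. Action 0 always leads back to state 0 with no reward.
DATASET = [
    (0, 0, 0.0, 0, False),
    (0, 1, 0.0, 1, False),
    (1, 0, 0.0, 0, False),
    (1, 1, 1.0, 1, True),
]

q_table = offline_q_learning(DATASET, n_actions=2)
```

Calling `greedy_policy(q_table, state)` then reads off the learned action for a state. Because every state-action pair appears in this toy dataset, plain Q-learning is enough here; with partial coverage, the unsupported actions would need explicit handling.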
Reward decomposition is a critical problem in the centralized training with decentralized execution~(CTDE) paradigm for multi-agent reinforcement learning.
Classifying a video game's genre from its cover and textual description would be highly beneficial to many modern identification, collocation, and retrieval systems.
Unlike previous approaches, which apply search algorithms to a small, human-designed search space without considering hardware diversity, we propose HURRICANE, which performs automatic hardware-aware search over a much larger search space with a two-stage search algorithm to efficiently generate tailored models for different types of hardware.