We believe that synthetic training data could open the door to realizing the full potential of deep learning for replay-based RL algorithms trained from limited data.
Sample efficiency and exploration remain major challenges in online reinforcement learning (RL).
We then present CASCADE, a novel approach for self-supervised exploration in this new setting.
Leveraging the new highly parallelizable Brax physics engine, we show that these innovations lead to large performance gains, significantly outperforming the tuned baseline while learning entire configurations on the fly.
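As a rough illustration of the parallelism Brax exposes, the sketch below batches environment resets and steps with jax.vmap. The envs.create/reset/step interface and the "ant" task are assumptions for the demo (brax v1-style functional API), not details taken from the result above.

```python
# Minimal sketch of batched simulation with Brax; the API details here
# are assumptions based on the brax v1-style functional interface.
import jax
from brax import envs

env = envs.create(env_name="ant")  # "ant" is just an illustrative task

# Brax's reset/step are pure JAX functions, so they can be vmapped
# across thousands of environment instances and jit-compiled once.
num_envs = 2048
reset_fn = jax.jit(jax.vmap(env.reset))
step_fn = jax.jit(jax.vmap(env.step))

rngs = jax.random.split(jax.random.PRNGKey(0), num_envs)
state = reset_fn(rngs)                        # batched initial states

actions = jax.numpy.zeros((num_envs, env.action_size))
state = step_fn(state, actions)               # one parallel physics step
print(state.reward.shape)                     # (2048,)
```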
Off-policy reinforcement learning (RL) from pixel observations is notoriously unstable.
Using this suite of benchmarking tasks, we show that simple modifications to two popular vision-based online reinforcement learning algorithms, DreamerV2 and DrQ-v2, suffice to outperform existing offline RL methods and establish competitive baselines for continuous control in the visual domain.
Significant progress has been made recently in offline model-based reinforcement learning, an approach that leverages a learned dynamics model.
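To make "leverages a learned dynamics model" concrete, here is a minimal, hypothetical sketch: a linear model fit to a fixed transition dataset and then used to generate imagined rollouts. Real offline model-based methods use deep networks and uncertainty penalties; every name and number below is illustrative, not any specific paper's method.

```python
# Hypothetical illustration of offline model-based RL's core idea:
# fit a dynamics model to a fixed dataset, then roll it out without
# further environment interaction. A linear model stands in for the
# deep networks real methods use.
import numpy as np

rng = np.random.default_rng(0)

# Offline dataset of transitions (s, a, s'); synthetic here for the demo.
S = rng.normal(size=(1000, 4))                     # states
A = rng.normal(size=(1000, 2))                     # actions
S_next = S + 0.1 * A @ rng.normal(size=(2, 4))     # unknown true dynamics

# Fit s' ~ W [s; a] by least squares: the "learned dynamics model".
X = np.hstack([S, A])
W, *_ = np.linalg.lstsq(X, S_next, rcond=None)

def model_step(s, a):
    """One step of imagined dynamics under the learned model."""
    return np.concatenate([s, a]) @ W

# Generate an imagined trajectory entirely inside the model.
s = S[0]
for _ in range(5):
    a = rng.normal(size=2)                         # placeholder policy
    s = model_step(s, a)
```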
Reinforcement learning from large-scale offline datasets allows us to learn policies without potentially unsafe or impractical exploration.
We study a setting where the pruning phase is given a time budget, and identify connections between iterative pruning and multiple sleep cycles in humans.
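As a hedged sketch of what a time-budgeted iterative pruning loop can look like (not the paper's actual procedure; train_one_epoch is a hypothetical stand-in), the code below alternates brief retraining with magnitude pruning until the wall-clock budget runs out.

```python
# Generic sketch of iterative magnitude pruning under a time budget.
import time
import numpy as np

def prune_smallest(weights, frac):
    """Zero out weights below the `frac`-quantile of absolute magnitude."""
    flat = np.abs(weights).ravel()
    k = int(frac * flat.size)
    threshold = np.partition(flat, k)[k] if k > 0 else 0.0
    mask = np.abs(weights) >= threshold
    return weights * mask, mask

def iterative_prune(weights, train_one_epoch, budget_seconds, frac=0.2):
    """Alternate brief retraining and pruning until the budget is spent."""
    deadline = time.monotonic() + budget_seconds
    mask = np.ones_like(weights, dtype=bool)
    while time.monotonic() < deadline:
        weights = train_one_epoch(weights) * mask  # retrain, keep sparsity
        weights, mask = prune_smallest(weights, frac)
    return weights, mask
```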
The principle of optimism in the face of uncertainty is prevalent throughout sequential decision making problems such as multi-armed bandits and reinforcement learning (RL).
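A standard concrete instance of this principle is the UCB1 rule for multi-armed bandits: act greedily with respect to an empirical mean inflated by a confidence bonus, so poorly explored arms look optimistically good. The Bernoulli arm means below are illustrative.

```python
# Optimism in the face of uncertainty, instantiated as UCB1 on a
# Bernoulli multi-armed bandit.
import math
import random

means = [0.2, 0.5, 0.7]            # true arm means, unknown to the learner
counts = [0] * len(means)
sums = [0.0] * len(means)

def ucb_index(i, t):
    """Empirical mean plus a confidence bonus that shrinks with more pulls."""
    if counts[i] == 0:
        return float("inf")        # force each arm to be tried once
    return sums[i] / counts[i] + math.sqrt(2 * math.log(t) / counts[i])

for t in range(1, 10001):
    arm = max(range(len(means)), key=lambda i: ucb_index(i, t))
    reward = 1.0 if random.random() < means[arm] else 0.0
    counts[arm] += 1
    sums[arm] += reward

print(counts)  # pulls concentrate on the best arm (index 2)
```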
In this paper, we provide: 1) an accessible overview of the discrete-state formulation of active inference, highlighting natural behaviors in active inference that are generally engineered in RL; 2) an explicit discrete-state comparison between active inference and RL on an OpenAI Gym baseline.
We demonstrate our new sensitivity analysis tools in real-world fairness scenarios to assess the bias arising from confounding.