Further, we evaluate a variety of algorithms on these tasks and highlight challenges for reinforcement learning algorithms, including coping with a state representation that has high intrinsic dimensionality and is only partially observable.
Moreover, due to the large amount of data needed to learn these end-to-end solutions, an emerging trend is to learn control policies in simulation and then transfer them to the real world.
The goal of offline reinforcement learning is to learn a policy from a fixed dataset, without further interactions with the environment.
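The idea can be made concrete with a minimal sketch: tabular Q-learning applied to a fixed dataset of logged transitions, with no further environment interaction. The toy two-state MDP and its dataset below are illustrative assumptions, not taken from the source.

```python
# Minimal offline-RL sketch: fitted tabular Q-learning over a fixed
# dataset of (state, action, reward, next_state) transitions. The agent
# never queries the environment; it only sweeps the logged data.
N_STATES, N_ACTIONS, GAMMA, ALPHA = 2, 2, 0.9, 0.1

# Hypothetical dataset, e.g. logged earlier by some behavior policy.
dataset = [
    (0, 0, 0.0, 0),
    (0, 1, 1.0, 1),
    (1, 0, 0.0, 0),
    (1, 1, 2.0, 1),
] * 50

Q = [[0.0] * N_ACTIONS for _ in range(N_STATES)]
for _ in range(200):                       # repeated sweeps over the fixed data
    for s, a, r, s2 in dataset:
        target = r + GAMMA * max(Q[s2])    # bootstrapped TD target
        Q[s][a] += ALPHA * (target - Q[s][a])

# Greedy policy extracted purely from the fixed dataset.
policy = [max(range(N_ACTIONS), key=lambda a: Q[s][a]) for s in range(N_STATES)]
print(policy)  # → [1, 1]
```

Note that this sketch ignores the central difficulty of offline RL, distributional shift: the greedy policy may prefer actions that are poorly covered by the dataset, which is what motivates conservative and constrained offline algorithms.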
Learning non-rigid registration in an end-to-end manner is challenging due to the inherently high number of degrees of freedom and the lack of labeled training data.
Second, instead of jointly learning both the pick and the place locations, we explicitly learn only the placing policy, conditioned on random pick points.
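A hedged sketch of this decomposition, under assumed interfaces: the pick point is sampled at random rather than learned, and only the placing policy is conditioned on it. All names and the stand-in policy below are illustrative, not the authors' implementation.

```python
import random

random.seed(0)

def sample_pick_point(width: int, height: int) -> tuple[int, int]:
    """Random pick location on the image grid (not learned)."""
    return random.randrange(width), random.randrange(height)

def place_policy(observation, pick_point):
    """Stand-in for the learned placing policy: here just a fixed
    offset from the pick point, clipped to the image bounds."""
    h, w = len(observation), len(observation[0])
    px, py = pick_point
    return min(px + 5, w - 1), min(py + 5, h - 1)

obs = [[0] * 32 for _ in range(32)]   # dummy 32x32 observation
pick = sample_pick_point(32, 32)
place = place_policy(obs, pick)
print(pick, place)
```

The design choice this illustrates is that randomizing the pick point removes one learned component, so the training signal concentrates entirely on where to place given an arbitrary grasp.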