Intuitively, easy samples, which generally exit early in the network during inference, should contribute more to the training of the early classifiers.
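As a rough sketch of that intuition, one could weight each exit's per-sample loss by predicted confidence, so confidently classified (easy) samples dominate the early-exit losses while all samples count equally at the final exit; the confidence proxy and the depth-dependent interpolation below are illustrative assumptions, not a specific published scheme.

```python
import torch
import torch.nn.functional as F

def weighted_exit_loss(exit_logits, targets):
    """Per-exit cross-entropy where easy (high-confidence) samples are
    up-weighted at early exits. `exit_logits` is a list of
    [batch, classes] tensors ordered from earliest to latest exit."""
    total = 0.0
    for i, logits in enumerate(exit_logits):
        per_sample = F.cross_entropy(logits, targets, reduction="none")
        with torch.no_grad():
            # Prediction confidence as a proxy for sample easiness
            # (an assumption; the actual easiness measure may differ).
            confidence = logits.softmax(dim=-1).max(dim=-1).values
            depth = i / max(len(exit_logits) - 1, 1)  # 0 = earliest exit
            # Early exits emphasize easy samples; the last exit weights
            # all samples equally.
            weight = confidence * (1 - depth) + depth
        total = total + (weight * per_sample).mean()
    return total
```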
Decisions made by human subjects in a driving simulator are treated as safe demonstrations, which are stored in the replay buffer and then used to enhance RL training.
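A minimal sketch of such a buffer, assuming transitions are arbitrary tuples and that a fixed fraction of each batch (here 25%) is drawn from the demonstrations; both choices are assumptions for illustration.

```python
import random
from collections import deque

class DemoSeededReplayBuffer:
    """Replay buffer holding both safe human demonstrations and the
    agent's own transitions; demonstrations are kept permanently,
    agent experience is evicted FIFO once capacity is reached."""
    def __init__(self, capacity, demo_transitions, demo_fraction=0.25):
        self.demos = list(demo_transitions)   # never evicted
        self.agent = deque(maxlen=capacity)   # FIFO eviction
        self.demo_fraction = demo_fraction    # share of each batch from demos

    def add(self, transition):
        self.agent.append(transition)

    def sample(self, batch_size):
        # Assumes the agent buffer already holds enough transitions.
        n_demo = min(int(batch_size * self.demo_fraction), len(self.demos))
        batch = random.sample(self.demos, n_demo)
        batch += random.sample(self.agent, batch_size - n_demo)
        return batch
```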
We theoretically prove that the policy improvement theorem holds for the preference-guided $\epsilon$-greedy policy and experimentally show that the inferred action preference distribution aligns with the landscape of corresponding Q-values.
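One plausible form of such a policy, assuming the preference distribution simply replaces the uniform exploration term of standard $\epsilon$-greedy (the exact parameterization in the source may differ):
$$
\pi(a \mid s) \;=\; (1-\epsilon)\,\mathbf{1}\!\left[a = \arg\max_{a'} Q(s, a')\right] \;+\; \epsilon\, p(a \mid s),
$$
where $p(\cdot \mid s)$ is the inferred action preference distribution over the action set $\mathcal{A}$. Choosing $p(a \mid s) = 1/|\mathcal{A}|$ recovers standard $\epsilon$-greedy, so the classical policy improvement theorem appears as a special case of the stated result.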
Spatial redundancy widely exists in visual recognition tasks, i.e., discriminative features in an image or video frame usually correspond to only a subset of pixels, while the remaining regions are irrelevant to the task at hand.
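To make the idea concrete, here is an illustrative sketch that keeps only the most active patches of a frame, scored by feature energy; the patch size, keep ratio, and energy score are all illustrative choices rather than a particular method from the literature.

```python
import torch

def select_salient_patches(frames, patch=16, keep_ratio=0.25):
    """Split [batch, channels, H, W] frames into non-overlapping patches,
    score each patch by activation energy, and keep only the top
    fraction, exploiting the fact that discriminative evidence usually
    occupies a subset of pixels."""
    b, c, h, w = frames.shape
    patches = frames.unfold(2, patch, patch).unfold(3, patch, patch)
    patches = patches.contiguous().view(b, c, -1, patch, patch)  # [B, C, N, p, p]
    energy = patches.pow(2).mean(dim=(1, 3, 4))                  # [B, N] per-patch score
    k = max(1, int(energy.shape[1] * keep_ratio))
    top_idx = energy.topk(k, dim=1).indices                      # kept patch indices
    idx = top_idx[:, None, :, None, None].expand(-1, c, -1, patch, patch)
    return patches.gather(2, idx)                                # [B, C, k, p, p]
```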
A novel prioritized experience replay mechanism that adapts to human guidance is proposed to boost the efficiency and performance of the reinforcement learning algorithm.
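A compact sketch of one way such a mechanism could look, assuming human-guided transitions receive an additive priority bonus on top of the usual TD-error priority; the bonus rule is an assumption, not the paper's exact formulation.

```python
import numpy as np

class HumanGuidedPER:
    """Prioritized experience replay where transitions collected under
    human guidance get a priority bonus, so guided experience is
    replayed more often."""
    def __init__(self, capacity, alpha=0.6, demo_bonus=1.0):
        self.capacity, self.alpha, self.demo_bonus = capacity, alpha, demo_bonus
        self.data, self.priorities, self.is_human = [], [], []

    def add(self, transition, td_error, human_guided):
        # Human-guided transitions receive an additive bonus (assumed rule).
        p = (abs(td_error) + self.demo_bonus * human_guided + 1e-6) ** self.alpha
        if len(self.data) >= self.capacity:
            self.data.pop(0); self.priorities.pop(0); self.is_human.pop(0)
        self.data.append(transition)
        self.priorities.append(p)
        self.is_human.append(human_guided)

    def sample(self, batch_size):
        probs = np.asarray(self.priorities)
        probs = probs / probs.sum()
        idx = np.random.choice(len(self.data), batch_size, p=probs)
        return [self.data[i] for i in idx], idx

    def update(self, idx, td_errors):
        # Refresh priorities after learning, keeping the human bonus.
        for i, e in zip(idx, td_errors):
            base = abs(e) + self.demo_bonus * self.is_human[i] + 1e-6
            self.priorities[i] = base ** self.alpha
```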
To handle errors in the head pose estimation model, this paper proposes an adaptive Kalman filter.
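A minimal sketch of an innovation-based adaptive Kalman filter for a single head-pose angle; the constant-velocity model and the forgetting-factor update of the measurement noise $R$ are standard choices assumed here, not necessarily those of the paper.

```python
import numpy as np

class AdaptiveKalmanFilter:
    """1-D constant-velocity Kalman filter for a head-pose angle whose
    measurement noise R is adapted from the innovation sequence, so
    unreliable pose estimates are down-weighted automatically."""
    def __init__(self, dt=1/30, q=1e-3, r=1.0, forget=0.95):
        self.F = np.array([[1.0, dt], [0.0, 1.0]])  # state transition (angle, rate)
        self.H = np.array([[1.0, 0.0]])             # only the angle is measured
        self.Q = q * np.eye(2)                      # process noise
        self.R = np.array([[r]])                    # measurement noise (adapted online)
        self.forget = forget                        # forgetting factor for R update
        self.x = np.zeros((2, 1))
        self.P = np.eye(2)

    def step(self, z):
        # Predict.
        self.x = self.F @ self.x
        self.P = self.F @ self.P @ self.F.T + self.Q
        # Innovation and its covariance.
        y = np.array([[z]]) - self.H @ self.x
        S = self.H @ self.P @ self.H.T + self.R
        # Adapt R toward observed innovation statistics: large, persistent
        # innovations inflate R and shrink the Kalman gain.
        self.R = self.forget * self.R + (1 - self.forget) * (
            y @ y.T - self.H @ self.P @ self.H.T)
        self.R = np.maximum(self.R, 1e-6)           # keep R positive
        # Update.
        K = self.P @ self.H.T @ np.linalg.inv(S)
        self.x = self.x + K @ y
        self.P = (np.eye(2) - K @ self.H) @ self.P
        return float(self.x[0])
```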
Therefore, we propose an approach that combines apprenticeship learning with deep reinforcement learning, allowing the agent to learn driving and stopping behaviors with continuous actions.
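One common way to realize such a combination, assumed here purely for illustration, is to augment a DDPG-style actor loss with a behavior-cloning term on expert state-action pairs; `actor`, `critic`, and `bc_weight` are hypothetical names, not the paper's API.

```python
import torch
import torch.nn.functional as F

def actor_loss_with_demonstrations(actor, critic, states, demo_states,
                                   demo_actions, bc_weight=1.0):
    """Deterministic actor-critic loss plus a behavior-cloning term, so
    the continuous-action policy both maximizes Q and stays close to
    demonstrated driving/stopping behavior."""
    # Deterministic policy gradient: push actions toward higher Q.
    rl_loss = -critic(states, actor(states)).mean()
    # Apprenticeship term: imitate the demonstrated continuous actions.
    bc_loss = F.mse_loss(actor(demo_states), demo_actions)
    return rl_loss + bc_weight * bc_loss
```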