Decisions made by human subjects in a driving simulator are treated as safe demonstrations, stored in the replay buffer, and then used to enhance RL training.
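For concreteness, a minimal Python sketch of this idea is given below. The `ReplayBuffer` class and its `is_human` flag are illustrative assumptions rather than the exact implementation; the point is that human-demonstrated transitions are tagged when stored so later stages can treat them as trusted examples.

```python
import random
from collections import deque

class ReplayBuffer:
    """Fixed-size buffer holding both agent and human-demonstration transitions."""

    def __init__(self, capacity=100_000):
        self.buffer = deque(maxlen=capacity)

    def add(self, state, action, reward, next_state, done, is_human=False):
        # Flag human-demonstrated transitions so later stages (e.g. sampling
        # or loss weighting) can treat them as trusted, safe examples.
        self.buffer.append((state, action, reward, next_state, done, is_human))

    def sample(self, batch_size):
        return random.sample(self.buffer, batch_size)

# During a human-takeover step in the simulator, the human's action
# overrides the agent's and is stored as a demonstration:
# buffer.add(s, human_action, r, s_next, done, is_human=True)
```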
We theoretically prove that the policy improvement theorem holds for the preference-guided $\epsilon$-greedy policy and experimentally show that the inferred action-preference distribution aligns with the landscape of the corresponding Q-values.
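The sketch below shows one plausible form of such a preference-guided $\epsilon$-greedy rule for a discrete action set: exploratory actions are drawn from the inferred preference distribution instead of uniformly at random. The function name and the assumption that the preference is available as a probability vector are ours, not taken from the paper.

```python
import numpy as np

def preference_guided_epsilon_greedy(q_values, preference, epsilon, rng=None):
    """With probability 1 - epsilon act greedily on Q; otherwise sample the
    exploratory action from the inferred preference distribution rather than
    uniformly, concentrating exploration on human-preferred actions."""
    rng = rng or np.random.default_rng()
    if rng.random() < epsilon:
        # `preference` is a probability vector over the discrete action set.
        return int(rng.choice(len(q_values), p=preference))
    return int(np.argmax(q_values))

# Example with 4 discrete actions and a preference inferred from human feedback:
q = np.array([0.2, 1.1, 0.7, -0.3])
pref = np.array([0.1, 0.5, 0.3, 0.1])
a = preference_guided_epsilon_greedy(q, pref, epsilon=0.2)
```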
Several GRL approaches are summarized and implemented in the proposed framework.
A novel prioritized experience replay mechanism that adapts to human guidance is proposed to boost the efficiency and performance of the RL algorithm.
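One plausible realization of such a mechanism is sketched below, assuming proportional prioritization in which human-guided transitions receive an extra priority bonus that decays over training. The class name, bonus term, and decay schedule are illustrative assumptions, not the paper's exact design.

```python
import numpy as np

class HumanGuidedPER:
    """Proportional prioritized replay where human-guided transitions
    receive an extra priority bonus that decays as training progresses."""

    def __init__(self, capacity, alpha=0.6, human_bonus=1.0, decay=1e-5):
        self.capacity = capacity
        self.alpha = alpha          # exponent controlling prioritization strength
        self.human_bonus = human_bonus
        self.decay = decay
        self.data, self.priorities = [], []
        self.step = 0

    def add(self, transition, td_error, from_human=False):
        # Human-guided samples start with higher priority; the bonus fades
        # so the buffer gradually reverts to pure TD-error prioritization.
        bonus = self.human_bonus * np.exp(-self.decay * self.step) if from_human else 0.0
        priority = (abs(td_error) + bonus + 1e-6) ** self.alpha
        if len(self.data) >= self.capacity:
            self.data.pop(0)
            self.priorities.pop(0)
        self.data.append(transition)
        self.priorities.append(priority)

    def sample(self, batch_size):
        self.step += 1
        probs = np.asarray(self.priorities)
        probs = probs / probs.sum()
        idx = np.random.choice(len(self.data), batch_size, p=probs)
        return [self.data[i] for i in idx], idx
```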
Then, a novel uncertainty-aware model-based RL framework is developed based on an adaptive truncation approach; it provides virtual interactions between the agent and the environment model, improving RL training efficiency and performance.
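The sketch below illustrates one common way to realize uncertainty-aware adaptive truncation, using disagreement within a learned dynamics-model ensemble as the uncertainty signal: virtual rollouts stop once the ensemble's predictions diverge beyond a threshold. The `predict`/`reward` interfaces and the thresholding rule are hypothetical assumptions for illustration.

```python
import numpy as np

def adaptive_truncated_rollout(ensemble, policy, state, max_horizon, u_threshold):
    """Roll out virtual transitions with a learned dynamics ensemble and
    truncate adaptively: stop once ensemble disagreement (a proxy for
    model uncertainty) exceeds u_threshold, keeping only reliable samples."""
    virtual_transitions = []
    for _ in range(max_horizon):
        action = policy(state)
        # Each ensemble member predicts the next state; their spread
        # estimates the epistemic uncertainty of the learned model.
        preds = np.stack([m.predict(state, action) for m in ensemble])
        uncertainty = preds.std(axis=0).mean()
        if uncertainty > u_threshold:
            break  # the model is unreliable here; truncate the rollout
        next_state = preds.mean(axis=0)
        reward = ensemble[0].reward(state, action, next_state)
        virtual_transitions.append((state, action, reward, next_state))
        state = next_state
    return virtual_transitions
```

The truncated virtual transitions can then be mixed into the replay buffer alongside real interactions, so the agent trains on model-generated data only where the model is trustworthy.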