no code implementations • 25 Sep 2023 • Emanuel Figetakis, Yahuza Bello, Ahmed Refaey, Lei Lei, Medhat Moussa
The results unequivocally demonstrate that the DQN agent trained using the {\epsilon}-greedy policy significantly outperforms the one trained with the Boltzmann policy.