Learning to Explore with Pleasure

1 Jan 2021 · Yean Hoon Ong, Jun Wang

Exploration is a long-standing challenge in sequential decision problems in machine learning. This paper investigates the adoption of two theories of optimal stimulation level from psychology - "the pacer principle" and the Wundt curve - to address this challenge. We propose a method called exploration with pleasure (EP), which is formulated on a notion of pleasure defined in accordance with these two theories. EP identifies the region of stimulation that triggers pleasure in the learning agent during exploration and consequently improves the learning process. The effectiveness of EP is studied in two machine learning settings: curiosity-driven reinforcement learning (RL) and Bayesian optimisation (BO). Experiments in purely curiosity-driven RL show that using EP to generate intrinsic rewards yields faster learning. Experiments in BO demonstrate that using EP to specify the exploration parameters in two acquisition functions - Probability of Improvement and Expected Improvement - achieves faster convergence and better function values.
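
For readers unfamiliar with the two acquisition functions named in the abstract, the following is a minimal sketch of their standard textbook forms under a Gaussian posterior, written for maximisation. It is not the paper's EP method: the abstract does not say how EP chooses the exploration parameter, so `xi` is left here as a plain input. The function and argument names (`probability_of_improvement`, `expected_improvement`, `mu`, `sigma`, `f_best`, `xi`) are illustrative assumptions.

```python
import math


def norm_pdf(z):
    """Standard normal density."""
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)


def norm_cdf(z):
    """Standard normal cumulative distribution, via the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def probability_of_improvement(mu, sigma, f_best, xi=0.0):
    """PI for maximisation: P(f(x) > f_best + xi) under N(mu, sigma^2).

    xi is the exploration parameter (assumed name); larger values demand
    a bigger margin over the incumbent f_best.
    """
    gap = mu - f_best - xi
    if sigma <= 0.0:
        # No posterior uncertainty: improvement is either certain or impossible.
        return 1.0 if gap > 0.0 else 0.0
    return norm_cdf(gap / sigma)


def expected_improvement(mu, sigma, f_best, xi=0.0):
    """EI for maximisation: E[max(f(x) - f_best - xi, 0)] under N(mu, sigma^2)."""
    gap = mu - f_best - xi
    if sigma <= 0.0:
        return max(gap, 0.0)
    z = gap / sigma
    return gap * norm_cdf(z) + sigma * norm_pdf(z)


# Two hypothetical candidates: A is close to the incumbent with low
# uncertainty, B has a lower mean but much higher uncertainty.
f_best = 1.0
candidate_a = (1.05, 0.05)  # (posterior mean, posterior std)
candidate_b = (0.90, 0.50)

for xi in (0.0, 0.3):
    pi_a = probability_of_improvement(*candidate_a, f_best, xi)
    pi_b = probability_of_improvement(*candidate_b, f_best, xi)
    print(f"xi={xi}: PI(A)={pi_a:.4f}, PI(B)={pi_b:.4f}")
```

In this toy comparison, PI prefers the low-uncertainty candidate A when `xi` is 0 and the high-uncertainty candidate B when `xi` is 0.3, which illustrates why the choice of exploration parameter matters for how the optimiser explores.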
