The Pixels and Sounds of Emotion: General-Purpose Representations of Arousal in Games

26 Jan 2021 · Konstantinos Makantasis, Antonios Liapis, Georgios N. Yannakakis

What if emotion could be captured in a general and subject-agnostic fashion? Is it possible, for instance, to design general-purpose representations that detect affect solely from the pixels and audio of a human-computer interaction video? In this paper we address these questions by evaluating the capacity of deep-learned representations to predict affect from the audiovisual information of videos alone. We assume that the pixels and audio of an interactive session embed the information required to detect affect. We test this hypothesis in the domain of digital games and evaluate the degree to which deep classifiers and deep preference learning algorithms can predict the arousal of players based solely on the video footage of their gameplay. Our results across four dissimilar games suggest that general-purpose representations can be built across games: the arousal models reach average accuracies as high as 85% under the challenging leave-one-video-out cross-validation scheme. The dissimilar audiovisual characteristics of the tested games showcase the strengths and limitations of the proposed method.
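Two ideas in the abstract are worth making concrete: ranking-based (preference learning) arousal models and leave-one-video-out cross-validation. Below is a minimal, hypothetical Python sketch of both, with randomly generated window features standing in for the deep audiovisual embeddings the paper learns from pixels and audio; every name, shape, and threshold here is an illustrative assumption, not the authors' implementation.

```python
# Illustrative sketch (not the paper's code): a RankNet-style pairwise
# preference loss over gameplay windows, evaluated with leave-one-video-out
# cross-validation. Random features replace the learned deep representations.
import numpy as np
import torch
import torch.nn as nn

rng = np.random.default_rng(0)

# Assumed setup: each gameplay video yields several fixed-length windows,
# each summarised by a feature vector (in the real method, a deep network
# over pixels/audio would produce these) plus a continuous arousal trace.
n_videos, windows_per_video, dim = 6, 20, 32
features = rng.normal(size=(n_videos * windows_per_video, dim)).astype(np.float32)
arousal = rng.normal(size=n_videos * windows_per_video).astype(np.float32)
video_id = np.repeat(np.arange(n_videos), windows_per_video)

def rank_loss(scores_a, scores_b):
    """RankNet-style pairwise loss: window a should score above window b."""
    return nn.functional.softplus(scores_b - scores_a).mean()

accuracies = []
for held_out in range(n_videos):               # leave one whole video out
    train = video_id != held_out
    model = nn.Sequential(nn.Linear(dim, 16), nn.ReLU(), nn.Linear(16, 1))
    opt = torch.optim.Adam(model.parameters(), lr=1e-2)

    # Ordered training pairs: windows whose arousal clearly differs
    # (the 0.5 margin is an arbitrary illustrative threshold).
    idx = np.where(train)[0]
    pairs = [(i, j) for i in idx for j in idx if arousal[i] - arousal[j] > 0.5]
    a = torch.from_numpy(features[[p[0] for p in pairs]])
    b = torch.from_numpy(features[[p[1] for p in pairs]])

    for _ in range(100):
        opt.zero_grad()
        loss = rank_loss(model(a).squeeze(-1), model(b).squeeze(-1))
        loss.backward()
        opt.step()

    # Evaluate: fraction of held-out-video pairs ranked in the correct order.
    idx_t = np.where(~train)[0]
    test_pairs = [(i, j) for i in idx_t for j in idx_t
                  if arousal[i] - arousal[j] > 0.5]
    with torch.no_grad():
        s = model(torch.from_numpy(features)).squeeze(-1).numpy()
    accuracies.append(np.mean([s[i] > s[j] for i, j in test_pairs]))

print(f"mean leave-one-video-out pair accuracy: {np.mean(accuracies):.2f}")
```

Holding out every window of one video per fold means the model never sees any frames of the test session, which is what makes this validation scheme challenging compared to random window-level splits.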


Categories

Human-Computer Interaction
