CNN-based Facial Affect Analysis on Mobile Devices

23 Jul 2018 · Charlie Hewitt, Hatice Gunes

This paper focuses on the design, deployment and evaluation of Convolutional Neural Network (CNN) architectures for facial affect analysis on mobile devices. Unlike CNNs in traditional settings, models deployed to mobile devices must minimise storage requirements while retaining high performance. We therefore propose three variants of established CNN architectures and comparatively evaluate them on a large, in-the-wild benchmark dataset of facial images. Our results show that the proposed architectures retain similar performance to the dataset baseline while minimising storage requirements, achieving 58% accuracy for eight-class emotion classification and an average RMSE of 0.39 for valence/arousal prediction. To demonstrate the feasibility of deploying these models for real-world applications, we implement a music recommendation interface based on predicted user affect. Although the CNN models were not trained in the context of music recommendation, our case study shows that: (i) the trained models achieve prediction performance similar to that obtained on the benchmark dataset, and (ii) users tend to rate the song recommendations provided by the interface positively. The average runtime of the deployed models on an iPhone 6S equates to ~45 fps, suggesting that the proposed architectures are also well suited for real-time deployment on video streams.
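The three kinds of figures quoted in the abstract (classification accuracy, average RMSE for valence/arousal, and frames per second) can be illustrated with a minimal Python sketch. This is not the authors' evaluation code: the function names are hypothetical, and "average RMSE" is assumed here to mean the mean of the separate valence and arousal RMSEs, which the abstract does not spell out.

```python
import math


def accuracy(true_labels, predicted_labels):
    """Fraction of samples whose predicted class matches the true class."""
    if not true_labels or len(true_labels) != len(predicted_labels):
        raise ValueError("inputs must be non-empty and of equal length")
    correct = sum(t == p for t, p in zip(true_labels, predicted_labels))
    return correct / len(true_labels)


def rmse(targets, predictions):
    """Root mean squared error between two equal-length sequences."""
    if not targets or len(targets) != len(predictions):
        raise ValueError("inputs must be non-empty and of equal length")
    squared_error = sum((t - p) ** 2 for t, p in zip(targets, predictions))
    return math.sqrt(squared_error / len(targets))


def average_rmse(valence_true, valence_pred, arousal_true, arousal_pred):
    """Assumption: average RMSE is the mean of the two per-dimension RMSEs."""
    return (rmse(valence_true, valence_pred) + rmse(arousal_true, arousal_pred)) / 2


def frames_per_second(mean_seconds_per_frame):
    """Convert a mean per-frame runtime in seconds to a frame rate."""
    if mean_seconds_per_frame <= 0:
        raise ValueError("runtime must be positive")
    return 1.0 / mean_seconds_per_frame
```

Under this conversion, the reported ~45 fps corresponds to a mean per-frame runtime of roughly 22 ms.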
