In this paper, we exploit the stationary properties of human behavior within an interaction and present a representation learning method to capture behavioral information from speech in an unsupervised way.
In this paper, we investigate this link and present an analysis framework that determines appropriate window lengths for the task of behavior estimation.
Further, we investigate the importance of emotional context in the expression of behavior by constraining (or not constraining) the neural networks' contextual view of the data.
We find that our proposed measure is correlated with the therapist's empathy towards their patient in Motivational Interviewing and with affective behaviors in Couples Therapy.
Unsupervised learning has been an attractive method for easily deriving meaningful data representations from vast amounts of unlabeled data.
Dyadic interactions among humans are marked by speakers continuously influencing and reacting to each other through their responses and behaviors, among other cues.
Entrainment is a known adaptation mechanism through which interaction participants synchronize their acoustic characteristics.
Behavioral annotation based on signal processing and machine learning is highly dependent on the availability of training data and manually annotated behavioral labels.
We propose a Sparsely-Connected and Disjointly-Trained DNN (SD-DNN) framework to deal with limited data.
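The core idea of disjoint training with sparse connectivity can be illustrated with a minimal sketch. Note this is not the authors' exact SD-DNN architecture: the feature split, the logistic-regression sub-models, and the fusion layer below are illustrative assumptions chosen only to show how independently trained sub-networks can be combined through a sparse fusion stage when data is limited.

```python
import numpy as np

rng = np.random.default_rng(0)

def train_subnet(X, y, epochs=200, lr=0.1):
    """Train a tiny logistic-regression 'sub-network' on one feature block.
    Stands in for a small DNN trained disjointly from the other blocks."""
    w = np.zeros(X.shape[1])
    b = 0.0
    for _ in range(epochs):
        p = 1.0 / (1.0 + np.exp(-(X @ w + b)))  # sigmoid activation
        g = p - y                                # cross-entropy gradient
        w -= lr * X.T @ g / len(y)
        b -= lr * g.mean()
    return w, b

# Synthetic stand-in data: 6 features split into 3 disjoint blocks of 2.
X = rng.normal(size=(200, 6))
y = (X[:, 0] + X[:, 2] - X[:, 4] > 0).astype(float)

blocks = [X[:, i:i + 2] for i in range(0, 6, 2)]
subnets = [train_subnet(Xb, y) for Xb in blocks]  # disjoint training

# Sparse combination: each sub-network contributes a single scalar score,
# so the fusion layer sees 3 inputs instead of all 6 raw features.
scores = np.column_stack([Xb @ w + b for Xb, (w, b) in zip(blocks, subnets)])
w_fuse, b_fuse = train_subnet(scores, y)
prob = 1.0 / (1.0 + np.exp(-(scores @ w_fuse + b_fuse)))
acc = ((prob > 0.5).astype(float) == y).mean()
print(f"fused accuracy: {acc:.2f}")
```

Because each sub-network is fit independently on its own feature block, the number of jointly trained parameters in the fusion stage stays small, which is the intuition behind using such a structure under limited-data conditions.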