Unlocking the Emotional States of High-Risk Suicide Callers through Speech Analysis

Suicide remains a major public health concern worldwide, and early detection of suicidal ideation is crucial for prevention. One promising approach to monitoring symptoms is the analysis of speech, which can be collected passively and may reveal changes in risk. However, identifying suicidal speech is challenging because of the rapid variability of speech characteristics and the association between suicidal ideation and emotion dysregulation. In light of these challenges, we present a novel end-to-end (E2E) method for speech emotion recognition (SER) as a means of detecting changes in emotional state that may indicate a high risk of suicide. Our method combines Convolutional Neural Networks (CNNs) and Gated Recurrent Units (GRUs) to analyze raw waveform signals. First, the CNN component learns higher-level speech representations directly from the raw waveform, rather than relying on manually crafted features or spectrograms. This enables the network to capture emotion-related features within narrow frequency ranges while handling speech of varying lengths without segmentation. Second, the GRU component learns temporal patterns, enhancing the network's ability to capture time-dependent features in the signal. Furthermore, we validate our approach on the NSPL-CRISE emotion dataset, which we recently created. The dataset contains phone-call recordings from frequent lifeline callers with psychological problems, potentially including histories of suicidal ideation and previous attempts. Our experimental results show that our method outperforms other state-of-the-art SER techniques.
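The abstract describes a two-stage E2E pipeline: a CNN that learns filters directly on the raw waveform, followed by a GRU that aggregates the resulting feature sequence over time, so utterances of any length can be classified without segmentation. The sketch below illustrates that data flow in plain NumPy under invented hyperparameters (filter count, kernel width, hidden size, class count) and random weights; it is not the paper's architecture or training code, only a minimal forward pass showing how a variable-length raw signal becomes a fixed-size emotion distribution.

```python
import numpy as np

rng = np.random.default_rng(0)
sigmoid = lambda x: 1.0 / (1.0 + np.exp(-x))

def conv1d(wave, kernels, stride):
    """Strided 1-D convolution over the raw waveform: each kernel acts as a
    learnable band-limited filter, so no hand-crafted features are needed."""
    k = kernels.shape[1]
    steps = (len(wave) - k) // stride + 1
    frames = np.stack([wave[i * stride : i * stride + k] for i in range(steps)])
    return np.maximum(frames @ kernels.T, 0.0)  # (steps, n_filters), ReLU

def gru_cell(x, h, Wz, Uz, Wr, Ur, Wh, Uh):
    """One GRU step: gates decide how much of the past state to keep."""
    z = sigmoid(x @ Wz + h @ Uz)               # update gate
    r = sigmoid(x @ Wr + h @ Ur)               # reset gate
    h_tilde = np.tanh(x @ Wh + (r * h) @ Uh)   # candidate state
    return (1 - z) * h + z * h_tilde

# Hypothetical sizes (not from the paper): 8 conv filters of width 64,
# hop 32; GRU hidden size 16; 4 emotion classes.
n_f, k, stride, hid, n_cls = 8, 64, 32, 16, 4
kernels = rng.standard_normal((n_f, k)) * 0.05
Wz, Wr, Wh = (rng.standard_normal((n_f, hid)) * 0.05 for _ in range(3))
Uz, Ur, Uh = (rng.standard_normal((hid, hid)) * 0.05 for _ in range(3))
W_out = rng.standard_normal((hid, n_cls)) * 0.05

def predict(wave):
    feats = conv1d(wave, kernels, stride)      # variable-length feature map
    h = np.zeros(hid)
    for x in feats:                            # GRU consumes every time step
        h = gru_cell(x, h, Wz, Uz, Wr, Ur, Wh, Uh)
    logits = h @ W_out                         # last state summarizes the call
    e = np.exp(logits - logits.max())
    return e / e.sum()                         # softmax over emotion classes

# Utterances of different lengths flow through without segmentation.
for n_samples in (1600, 4000):
    probs = predict(rng.standard_normal(n_samples))
    print(probs.shape, float(probs.sum()))
```

The last GRU hidden state serves here as the fixed-size summary of the whole call, which is one common way (global pooling over CNN frames is another) to reconcile variable-length input with a fixed number of emotion classes.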
