Search Results for author: Björn W. Schuller

Found 88 papers, 27 papers with code

openXBOW - Introducing the Passau Open-Source Crossmodal Bag-of-Words Toolkit

1 code implementation • 22 May 2016 • Maximilian Schmitt, Björn W. Schuller

We introduce openXBOW, an open-source toolkit for the generation of bag-of-words (BoW) representations from multimodal input.

Document Classification Emotion Recognition +2
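openXBOW itself is a Java command-line tool; the bag-of-words idea it implements can be sketched conceptually: quantise frame-level (acoustic or textual) features against a learned codebook and represent each instance as a histogram of codeword assignments. The Python snippet below illustrates only that concept, not openXBOW's actual interface; all names and dimensions in it are made up.

```python
# Conceptual bag-of-audio-words sketch (not openXBOW's API).
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
# Hypothetical frame-level acoustic features: (n_frames, n_dims) per utterance.
utterances = [rng.normal(size=(200, 13)) for _ in range(10)]

# 1) Learn a codebook over all frames.
codebook = KMeans(n_clusters=50, n_init=10, random_state=0)
codebook.fit(np.vstack(utterances))

# 2) Represent each utterance as a normalised histogram of codeword assignments.
def bow(frames):
    counts = np.bincount(codebook.predict(frames), minlength=codebook.n_clusters)
    return counts / counts.sum()

X = np.stack([bow(u) for u in utterances])  # (n_utterances, 50) BoW features
```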

SVTS: Scalable Video-to-Speech Synthesis

2 code implementations • 4 May 2022 • Rodrigo Mira, Alexandros Haliassos, Stavros Petridis, Björn W. Schuller, Maja Pantic

Video-to-speech synthesis (also known as lip-to-speech) refers to the translation of silent lip movements into the corresponding audio.

Speech Synthesis

The MuSe 2021 Multimodal Sentiment Analysis Challenge: Sentiment, Emotion, Physiological-Emotion, and Stress

1 code implementation • 14 Apr 2021 • Lukas Stappen, Alice Baird, Lukas Christ, Lea Schumann, Benjamin Sertolli, Eva-Maria Messner, Erik Cambria, Guoying Zhao, Björn W. Schuller

Multimodal Sentiment Analysis (MuSe) 2021 is a challenge focusing on the tasks of sentiment and emotion, as well as physiological-emotion and emotion-based stress recognition through more comprehensively integrating the audio-visual, language, and biological signal modalities.

Emotion Recognition Multimodal Sentiment Analysis

The MuSe 2022 Multimodal Sentiment Analysis Challenge: Humor, Emotional Reactions, and Stress

1 code implementation • 23 Jun 2022 • Lukas Christ, Shahin Amiriparian, Alice Baird, Panagiotis Tzirakis, Alexander Kathan, Niklas Müller, Lukas Stappen, Eva-Maria Meßner, Andreas König, Alan Cowen, Erik Cambria, Björn W. Schuller

For this year's challenge, we feature three datasets: (i) the Passau Spontaneous Football Coach Humor (Passau-SFCH) dataset that contains audio-visual recordings of German football coaches, labelled for the presence of humour; (ii) the Hume-Reaction dataset, in which reactions of individuals to emotional stimuli have been annotated with respect to seven emotional expression intensities; and (iii) the Ulm-Trier Social Stress Test (Ulm-TSST) dataset, comprising audio-visual data labelled with continuous emotion values (arousal and valence) of people in stressful dispositions.

Emotion Recognition Humor Detection +1

Speech Emotion Recognition using Semantic Information

1 code implementation • 4 Mar 2021 • Panagiotis Tzirakis, Anh Nguyen, Stefanos Zafeiriou, Björn W. Schuller

In this paper, we propose a novel framework that can capture both the semantic and the paralinguistic information in the signal.

Speech Emotion Recognition Sound Audio and Speech Processing

audb -- Sharing and Versioning of Audio and Annotation Data in Python

1 code implementation • 1 Mar 2023 • Hagen Wierstorf, Johannes Wagner, Florian Eyben, Felix Burkhardt, Björn W. Schuller

Driven by the need for larger and more diverse datasets to pre-train and fine-tune increasingly complex machine learning models, the number of datasets is rapidly growing.

Management
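For orientation, a minimal usage sketch of the audb loading workflow is shown below. It assumes the audb.load entry point with a pinned version; the database name and version string are placeholders rather than values taken from the paper.

```python
# Minimal audb usage sketch; dataset name and version are placeholders (assumptions).
import audb

# Load a specific, versioned release of a database (cached locally on first load).
db = audb.load(
    "emodb",          # example database name, not prescribed by the paper
    version="1.4.1",  # pinning a version is the point of audb's versioning
    verbose=False,
)
print(db.tables)      # annotation tables shipped with the database
print(len(db.files))  # media files referenced by the annotations
```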

MuSe 2020 -- The First International Multimodal Sentiment Analysis in Real-life Media Challenge and Workshop

1 code implementation • 30 Apr 2020 • Lukas Stappen, Alice Baird, Georgios Rizos, Panagiotis Tzirakis, Xinchen Du, Felix Hafner, Lea Schumann, Adria Mallol-Ragolta, Björn W. Schuller, Iulia Lefter, Erik Cambria, Ioannis Kompatsiaris

Multimodal Sentiment Analysis in Real-life Media (MuSe) 2020 is a Challenge-based Workshop focusing on the tasks of sentiment recognition, as well as emotion-target engagement and trustworthiness detection by means of more comprehensively integrating the audio-visual and language modalities.

Emotion Recognition Multimodal Sentiment Analysis

DeepSpectrumLite: A Power-Efficient Transfer Learning Framework for Embedded Speech and Audio Processing from Decentralised Data

1 code implementation • 23 Apr 2021 • Shahin Amiriparian, Tobias Hübner, Maurice Gerczuk, Sandra Ottl, Björn W. Schuller

By obtaining state-of-the-art results on a set of paralinguistics tasks, we demonstrate the suitability of the proposed transfer learning approach for embedded audio signal processing, even when data is scarce.

Audio Signal Processing Transfer Learning

End-2-End COVID-19 Detection from Breath & Cough Audio

1 code implementation • 7 Jan 2021 • Harry Coppock, Alexander Gaskell, Panagiotis Tzirakis, Alice Baird, Lyn Jones, Björn W. Schuller

Our main contributions are as follows: (I) We demonstrate the first attempt to diagnose COVID-19 using end-to-end deep learning from a crowd-sourced dataset of audio samples, achieving an ROC-AUC of 0.846; (II) Our model, the COVID-19 Identification ResNet (CIdeR), has potential for rapid scalability, minimal cost and improving performance as more data becomes available.
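The reported figure is the area under the ROC curve for the binary COVID-19 decision. The short sketch below shows how such a score is computed; the labels and scores are synthetic stand-ins, not CIdeR outputs.

```python
# Sketch of how a ROC-AUC figure is computed for binary COVID-19 detection.
import numpy as np
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(0)
y_true = rng.integers(0, 2, size=500)                               # 1 = COVID-19 positive
y_score = np.clip(y_true * 0.6 + rng.normal(0.2, 0.25, 500), 0, 1)  # hypothetical probabilities

print("ROC-AUC:", round(roc_auc_score(y_true, y_score), 3))
```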

An Improved StarGAN for Emotional Voice Conversion: Enhancing Voice Quality and Data Augmentation

1 code implementation • 18 Jul 2021 • Xiangheng He, Junjie Chen, Georgios Rizos, Björn W. Schuller

Emotional Voice Conversion (EVC) aims to convert the emotional style of a source speech signal to a target style while preserving its content and speaker identity information.

Data Augmentation Generative Adversarial Network +2

A Novel Fusion of Attention and Sequence to Sequence Autoencoders to Predict Sleepiness From Speech

1 code implementation • 15 May 2020 • Shahin Amiriparian, Pawel Winokurow, Vincent Karas, Sandra Ottl, Maurice Gerczuk, Björn W. Schuller

On the development partition of the data, we achieve Spearman's correlation coefficients of .324, .283, and .320 with the targets on the Karolinska Sleepiness Scale by utilising attention and non-attention autoencoders, and the fusion of both autoencoders' representations, respectively.

Machine Translation Representation Learning
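The reported scores are Spearman rank correlations between continuous predictions and the ordinal Karolinska Sleepiness Scale labels. The sketch below shows how such a coefficient is computed; the arrays are dummy data, not the paper's predictions.

```python
# Sketch: Spearman's rank correlation between model outputs and KSS labels.
import numpy as np
from scipy.stats import spearmanr

rng = np.random.default_rng(0)
kss_targets = rng.integers(1, 10, size=100)          # Karolinska Sleepiness Scale (1-9)
predictions = kss_targets + rng.normal(0, 2, 100)    # hypothetical regressor outputs

rho, p_value = spearmanr(predictions, kss_targets)
print(f"Spearman's rho = {rho:.3f} (p = {p_value:.3g})")
```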

GraphTMT: Unsupervised Graph-based Topic Modeling from Video Transcripts

1 code implementation • 4 May 2021 • Lukas Stappen, Jason Thies, Gerhard Hagerer, Björn W. Schuller, Georg Groh

To unfold the tremendous amount of multimedia data uploaded daily to social media platforms, effective topic modeling techniques are needed.

Clustering Topic Models +1

Example-based Explanations with Adversarial Attacks for Respiratory Sound Analysis

1 code implementation • 30 Mar 2022 • Yi Chang, Zhao Ren, Thanh Tam Nguyen, Wolfgang Nejdl, Björn W. Schuller

Respiratory sound classification is an important tool for remote screening of respiratory-related diseases such as pneumonia, asthma, and COVID-19.

Sound Classification

Towards Multimodal Prediction of Spontaneous Humour: A Novel Dataset and First Results

1 code implementation • 28 Sep 2022 • Lukas Christ, Shahin Amiriparian, Alexander Kathan, Niklas Müller, Andreas König, Björn W. Schuller

Our findings suggest that for the automatic analysis of humour and its sentiment, facial expressions are most promising, while humour direction can be best modelled via text-based features.

Exploring speaker enrolment for few-shot personalisation in emotional vocalisation prediction

1 code implementation • 14 Jun 2022 • Andreas Triantafyllopoulos, Meishu Song, Zijiang Yang, Xin Jing, Björn W. Schuller

In this work, we explore a novel few-shot personalisation architecture for emotional vocalisation prediction.

Knowledge Transfer For On-Device Speech Emotion Recognition with Neural Structured Learning

1 code implementation • 26 Oct 2022 • Yi Chang, Zhao Ren, Thanh Tam Nguyen, Kun Qian, Björn W. Schuller

Our experiments demonstrate that training a lightweight SER model on the target dataset with speech samples and graphs can not only produce small SER models, but also enhance the model performance compared to models with speech samples only and those using classic transfer learning strategies.

Speech Emotion Recognition Transfer Learning

Synthia's Melody: A Benchmark Framework for Unsupervised Domain Adaptation in Audio

1 code implementation • 26 Sep 2023 • Chia-Hsin Lin, Charles Jones, Björn W. Schuller, Harry Coppock

Despite significant advancements in deep learning for vision and natural language, unsupervised domain adaptation in audio remains relatively unexplored.

Attribute Selection bias +1

Automatic Emotion Modelling in Written Stories

1 code implementation • 21 Dec 2022 • Lukas Christ, Shahin Amiriparian, Manuel Milling, Ilhan Aslan, Björn W. Schuller

Telling stories is an integral part of human communication which can evoke emotions and influence the affective states of the audience.

Pre-training in Deep Reinforcement Learning for Automatic Speech Recognition

no code implementations • 24 Oct 2019 • Thejan Rajapakshe, Rajib Rana, Siddique Latif, Sara Khalifa, Björn W. Schuller

Deep reinforcement learning (deep RL) is a combination of deep learning with reinforcement learning principles to create efficient methods that can learn by interacting with their environment.

Automatic Speech Recognition Automatic Speech Recognition (ASR) +3

Deep Representation Learning in Speech Processing: Challenges, Recent Advances, and Future Trends

no code implementations • 2 Jan 2020 • Siddique Latif, Rajib Rana, Sara Khalifa, Raja Jurdak, Junaid Qadir, Björn W. Schuller

Research on speech processing has traditionally considered the task of designing hand-engineered acoustic features (feature engineering) as a separate distinct problem from the task of designing efficient machine learning (ML) models to make prediction and classification decisions.

Automatic Speech Recognition Automatic Speech Recognition (ASR) +5

COVID-19 and Computer Audition: An Overview on What Speech & Sound Analysis Could Contribute in the SARS-CoV-2 Corona Crisis

no code implementations • 24 Mar 2020 • Björn W. Schuller, Dagmar M. Schuller, Kun Qian, Juan Liu, Huaiyuan Zheng, Xiao Li

We come to the conclusion that CA appears ready for implementation of (pre-)diagnosis and monitoring tools, and more generally provides rich and significant, yet so far untapped potential in the fight against COVID-19 spread.

An Early Study on Intelligent Analysis of Speech under COVID-19: Severity, Sleep Quality, Fatigue, and Anxiety

no code implementations • 30 Apr 2020 • Jing Han, Kun Qian, Meishu Song, Zijiang Yang, Zhao Ren, Shuo Liu, Juan Liu, Huaiyuan Zheng, Wei Ji, Tomoya Koike, Xiao Li, Zixing Zhang, Yoshiharu Yamamoto, Björn W. Schuller

In particular, by analysing speech recordings from these patients, we construct audio-only-based models to automatically categorise the health state of patients from four aspects, including the severity of illness, sleep quality, fatigue, and anxiety.

Sleep Quality

deepSELF: An Open Source Deep Self End-to-End Learning Framework

no code implementations • 11 May 2020 • Tomoya Koike, Kun Qian, Björn W. Schuller, Yoshiharu Yamamoto

To the best of our knowledge, it is the first public toolkit assembling a series of state-of-the-art deep learning technologies.

Image Generation

On Deep Speech Packet Loss Concealment: A Mini-Survey

no code implementations • 15 May 2020 • Mostafa M. Mohamed, Mina A. Nessiem, Björn W. Schuller

In this mini-survey, we review all the literature we have found to date that attempts to solve packet loss in speech using deep learning methods.

Packet Loss Concealment

ConcealNet: An End-to-end Neural Network for Packet Loss Concealment in Deep Speech Emotion Recognition

no code implementations • 15 May 2020 • Mostafa M. Mohamed, Björn W. Schuller

Additionally, extending this with an end-to-end emotion prediction neural network provides a network that performs SER from audio with lost frames, end-to-end.

Packet Loss Concealment Speech Emotion Recognition

High-Fidelity Audio Generation and Representation Learning with Guided Adversarial Autoencoder

no code implementations • 1 Jun 2020 • Kazi Nazmul Haque, Rajib Rana, Björn W. Schuller

Hence, with extensive experimental results, we demonstrate that by harnessing the power of high-fidelity audio generation, the proposed GAAE model can learn powerful representations from an unlabelled dataset while leveraging only a small percentage of labelled data as supervision/guidance.

Audio Generation Representation Learning +1

Domain Adaptation with Joint Learning for Generic, Optical Car Part Recognition and Detection Systems (Go-CaRD)

no code implementations • 15 Jun 2020 • Lukas Stappen, Xinchen Du, Vincent Karas, Stefan Müller, Björn W. Schuller

Systems for the automatic recognition and detection of automotive parts are crucial in several emerging research areas in the development of intelligent vehicles.

Benchmarking Domain Adaptation +1

MeDaS: An open-source platform as service to help break the walls between medicine and informatics

no code implementations • 12 Jul 2020 • Liang Zhang, Johann Li, Ping Li, Xiaoyuan Lu, Peiyi Shen, Guangming Zhu, Syed Afaq Shah, Mohammed Bennamoun, Kun Qian, Björn W. Schuller

To the best of our knowledge, MeDaS is the first open-source platform providing a collaborative and interactive service that lets researchers from a medical background easily use DL-related toolkits, and at the same time helps scientists or engineers from the information sciences understand the medical side.

Audio, Speech, Language, & Signal Processing for COVID-19: A Comprehensive Overview

no code implementations • 29 Nov 2020 • Gauri Deshpande, Björn W. Schuller

This drives the research focus towards identifying the markers of COVID-19 in speech and other human generated audio signals.

The voice of COVID-19: Acoustic correlates of infection

no code implementations • 17 Dec 2020 • Katrin D. Bartl-Pokorny, Florian B. Pokorny, Anton Batliner, Shahin Amiriparian, Anastasia Semertzidou, Florian Eyben, Elena Kramer, Florian Schmidt, Rainer Schönweiler, Markus Wehler, Björn W. Schuller

Group differences in the front vowels /i:/ and /e:/ are additionally reflected in the variation of the fundamental frequency and the harmonics-to-noise ratio; group differences in the back vowels /o:/ and /u:/ are reflected in statistics of the Mel-frequency cepstral coefficients and the spectral slope.

Detecting COVID-19 from Breathing and Coughing Sounds using Deep Neural Networks

no code implementations • 29 Dec 2020 • Björn W. Schuller, Harry Coppock, Alexander Gaskell

The COVID-19 pandemic has affected the world unevenly; while industrial economies have been able to produce the tests necessary to track the spread of the virus and mostly avoided complete lockdowns, developing countries have faced issues with testing capacity.

Bayesian Optimisation

The INTERSPEECH 2021 Computational Paralinguistics Challenge: COVID-19 Cough, COVID-19 Speech, Escalation & Primates

no code implementations • 24 Feb 2021 • Björn W. Schuller, Anton Batliner, Christian Bergler, Cecilia Mascolo, Jing Han, Iulia Lefter, Heysem Kaya, Shahin Amiriparian, Alice Baird, Lukas Stappen, Sandra Ottl, Maurice Gerczuk, Panagiotis Tzirakis, Chloë Brown, Jagmohan Chauhan, Andreas Grammenos, Apinan Hasthanasombat, Dimitris Spathis, Tong Xia, Pietro Cicuta, Leon J. M. Rothkrantz, Joeri Zwerts, Jelle Treep, Casper Kaandorp

The INTERSPEECH 2021 Computational Paralinguistics Challenge addresses four different problems for the first time in a research competition under well-defined conditions: In the COVID-19 Cough and COVID-19 Speech Sub-Challenges, a binary classification on COVID-19 infection has to be made based on coughing sounds and speech; in the Escalation Sub-Challenge, a three-way assessment of the level of escalation in a dialogue is featured; and in the Primates Sub-Challenge, four species vs. background need to be classified.

Binary Classification Representation Learning

End-to-End Video-To-Speech Synthesis using Generative Adversarial Networks

no code implementations • 27 Apr 2021 • Rodrigo Mira, Konstantinos Vougioukas, Pingchuan Ma, Stavros Petridis, Björn W. Schuller, Maja Pantic

In this work, we propose a new end-to-end video-to-speech model based on Generative Adversarial Networks (GANs) which translates spoken video to waveform end-to-end without using any intermediate representation or separate waveform synthesis algorithm.

Lip Reading Speech Synthesis

LiRA: Learning Visual Speech Representations from Audio through Self-supervision

no code implementations • 16 Jun 2021 • Pingchuan Ma, Rodrigo Mira, Stavros Petridis, Björn W. Schuller, Maja Pantic

The large amount of audiovisual content being shared online today has drawn substantial attention to the prospect of audiovisual self-supervised learning.

Lip Reading Self-Supervised Learning +1

A Physiologically-Adapted Gold Standard for Arousal during Stress

no code implementations • 27 Jul 2021 • Alice Baird, Lukas Stappen, Lukas Christ, Lea Schumann, Eva-Maria Meßner, Björn W. Schuller

We utilise a Long Short-Term Memory, Recurrent Neural Network to explore the benefit of fusing these physiological signals with arousal as the target, learning from various audio, video, and textual based features.
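A minimal sketch of an LSTM-based sequence regressor over fused multimodal features with a continuous arousal target is given below; the feature dimensions, layer sizes, and training step are assumptions for illustration, not the configuration used in the paper.

```python
# Minimal PyTorch sketch: LSTM regression over fused frame-level features
# (audio + video + text + physiology concatenated) with arousal as target.
import torch
import torch.nn as nn

class ArousalLSTM(nn.Module):
    def __init__(self, feat_dim=128, hidden=64):  # dimensions are placeholders
        super().__init__()
        self.lstm = nn.LSTM(feat_dim, hidden, batch_first=True)
        self.head = nn.Linear(hidden, 1)

    def forward(self, x):                      # x: (batch, time, feat_dim)
        out, _ = self.lstm(x)
        return self.head(out).squeeze(-1)      # (batch, time) arousal trace

model = ArousalLSTM()
fused = torch.randn(4, 250, 128)               # 4 dummy sequences, 250 time steps
loss = nn.MSELoss()(model(fused), torch.zeros(4, 250))
loss.backward()                                # one illustrative training step
```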

Evaluating the COVID-19 Identification ResNet (CIdeR) on the INTERSPEECH COVID-19 from Audio Challenges

no code implementations • 30 Jul 2021 • Alican Akman, Harry Coppock, Alexander Gaskell, Panagiotis Tzirakis, Lyn Jones, Björn W. Schuller

We report on cross-running the recent COVID-19 Identification ResNet (CIdeR) on the two Interspeech 2021 COVID-19 diagnosis from cough and speech audio challenges: ComParE and DiCOVA.

COVID-19 Diagnosis

Fairness and underspecification in acoustic scene classification: The case for disaggregated evaluations

no code implementations • 4 Oct 2021 • Andreas Triantafyllopoulos, Manuel Milling, Konstantinos Drossos, Björn W. Schuller

Although these factors play a well-understood role in the performance of ASC models, most works report single evaluation metrics taking into account all different strata of a particular dataset.

Acoustic Scene Classification Fairness +1
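The disaggregated evaluation argued for here amounts to reporting the metric per stratum (for example, per recording device or city) instead of a single aggregate figure. The small sketch below illustrates the idea on synthetic data; the stratum names are invented.

```python
# Sketch of a disaggregated evaluation: accuracy per stratum vs. the aggregate.
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
df = pd.DataFrame({
    "stratum": rng.choice(["device_a", "device_b", "device_c"], size=600),
    "y_true": rng.integers(0, 10, size=600),   # 10 hypothetical acoustic scenes
})
df["y_pred"] = np.where(rng.random(600) < 0.6, df["y_true"], rng.integers(0, 10, size=600))

correct = df["y_true"] == df["y_pred"]
print(correct.groupby(df["stratum"]).mean())   # accuracy broken down by stratum
print(correct.mean())                          # the single aggregate number that hides gaps
```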

Multistage linguistic conditioning of convolutional layers for speech emotion recognition

no code implementations • 13 Oct 2021 • Andreas Triantafyllopoulos, Uwe Reichel, Shuo Liu, Stephan Huber, Florian Eyben, Björn W. Schuller

In this contribution, we investigate the effectiveness of deep fusion of text and audio features for categorical and dimensional speech emotion recognition (SER).

Speech Emotion Recognition

EIHW-MTG DiCOVA 2021 Challenge System Report

no code implementations • 13 Oct 2021 • Adria Mallol-Ragolta, Helena Cuesta, Emilia Gómez, Björn W. Schuller

This paper aims to automatically detect COVID-19 patients by analysing the acoustic information embedded in coughs.

Facial Emotion Recognition using Deep Residual Networks in Real-World Environments

no code implementations • 4 Nov 2021 • Panagiotis Tzirakis, Dénes Boros, Elnar Hajiyev, Björn W. Schuller

To show the favourable properties of our pre-trained model on modelling facial affect, we use the RECOLA database, and compare with the current state-of-the-art approach.

Facial Emotion Recognition

Normalise for Fairness: A Simple Normalisation Technique for Fairness in Regression Machine Learning Problems

no code implementations • 2 Feb 2022 • Mostafa M. Mohamed, Björn W. Schuller

We present a theoretical analysis of the method, in addition to an empirical comparison against two standard methods for fairness, namely data balancing and adversarial training.

Binary Classification Decision Making +2

Robust Federated Learning Against Adversarial Attacks for Speech Emotion Recognition

no code implementations • 9 Mar 2022 • Yi Chang, Sofiane Laridi, Zhao Ren, Gregory Palmer, Björn W. Schuller, Marco Fisichella

The proposed framework consists of i) federated learning for data privacy, and ii) adversarial training at the training stage and randomisation at the testing stage for model robustness.

Federated Learning Speech Emotion Recognition
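For context, the aggregation step that typically underpins federated learning is federated averaging (FedAvg). The sketch below shows that step in isolation with random stand-in client weights; it is not the paper's full framework, which additionally uses adversarial training and test-time randomisation.

```python
# Minimal sketch of federated averaging (FedAvg), the standard aggregation step.
import numpy as np

def fed_avg(client_weights, client_sizes):
    """Weighted average of client model parameters by local dataset size."""
    total = sum(client_sizes)
    return sum(w * (n / total) for w, n in zip(client_weights, client_sizes))

clients = [np.random.default_rng(i).normal(size=10) for i in range(3)]  # 3 local models
sizes = [120, 80, 200]                                                  # local sample counts
global_weights = fed_avg(clients, sizes)
print(global_weights)
```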

Climate Change & Computer Audition: A Call to Action and Overview on Audio Intelligence to Help Save the Planet

no code implementations • 10 Mar 2022 • Björn W. Schuller, Alican Akman, Yi Chang, Harry Coppock, Alexander Gebhard, Alexander Kathan, Esther Rituerto-González, Andreas Triantafyllopoulos, Florian B. Pokorny

We categorise potential computer audition applications according to the five elements of earth, water, air, fire, and aether, proposed by the ancient Greeks in their five element theory; this categorisation serves as a framework to discuss computer audition in relation to different ecological aspects.

Continuous-Time Audiovisual Fusion with Recurrence vs. Attention for In-The-Wild Affect Recognition

no code implementations • 24 Mar 2022 • Vincent Karas, Mani Kumar Tellamekala, Adria Mallol-Ragolta, Michel Valstar, Björn W. Schuller

To clearly understand the performance differences between recurrent and attention models in audiovisual affect recognition, we present a comprehensive evaluation of fusion models based on LSTM-RNNs, self-attention and cross-modal attention, trained for valence and arousal estimation.

Arousal Estimation Multimodal Emotion Recognition

An Overview & Analysis of Sequence-to-Sequence Emotional Voice Conversion

no code implementations • 29 Mar 2022 • Zijiang Yang, Xin Jing, Andreas Triantafyllopoulos, Meishu Song, Ilhan Aslan, Björn W. Schuller

Emotional voice conversion (EVC) focuses on converting a speech utterance from a source to a target emotion; it can thus be a key enabling technology for human-computer interaction applications and beyond.

Voice Conversion

A Temporal-oriented Broadcast ResNet for COVID-19 Detection

no code implementations • 31 Mar 2022 • Xin Jing, Shuo Liu, Emilia Parada-Cabaleiro, Andreas Triantafyllopoulos, Meishu Song, Zijiang Yang, Björn W. Schuller

Detecting COVID-19 from audio signals, such as breathing and coughing, can be used as a fast and efficient pre-testing method to reduce the virus transmission.

Computational Efficiency

Probing Speech Emotion Recognition Transformers for Linguistic Knowledge

no code implementations • 1 Apr 2022 • Andreas Triantafyllopoulos, Johannes Wagner, Hagen Wierstorf, Maximilian Schmitt, Uwe Reichel, Florian Eyben, Felix Burkhardt, Björn W. Schuller

Large, pre-trained neural networks consisting of self-attention layers (transformers) have recently achieved state-of-the-art results on several speech emotion recognition (SER) datasets.

Automatic Speech Recognition Automatic Speech Recognition (ASR) +2

Journaling Data for Daily PHQ-2 Depression Prediction and Forecasting

no code implementations • 6 May 2022 • Alexander Kathan, Andreas Triantafyllopoulos, Xiangheng He, Manuel Milling, Tianhao Yan, Srividya Tirunellai Rajamani, Ludwig Küster, Mathias Harrer, Elena Heber, Inga Grossmann, David D. Ebert, Björn W. Schuller

Digital health applications are becoming increasingly important for assessing and monitoring the wellbeing of people suffering from mental health conditions like depression.

Fatigue Prediction in Outdoor Running Conditions using Audio Data

no code implementations • 9 May 2022 • Andreas Triantafyllopoulos, Sandra Ottl, Alexander Gebhard, Esther Rituerto-González, Mirko Jaumann, Steffen Hüttner, Valerie Dieter, Patrick Schneeweiß, Inga Krauß, Maurice Gerczuk, Shahin Amiriparian, Björn W. Schuller

Although running is a common leisure activity and a core training regimen for many athletes, between 29% and 79% of runners sustain an overuse injury each year.

The ACM Multimedia 2022 Computational Paralinguistics Challenge: Vocalisations, Stuttering, Activity, & Mosquitoes

no code implementations • 13 May 2022 • Björn W. Schuller, Anton Batliner, Shahin Amiriparian, Christian Bergler, Maurice Gerczuk, Natalie Holz, Pauline Larrouy-Maestri, Sebastian P. Bayerl, Korbinian Riedhammer, Adria Mallol-Ragolta, Maria Pateraki, Harry Coppock, Ivan Kiskin, Marianne Sinka, Stephen Roberts

The ACM Multimedia 2022 Computational Paralinguistics Challenge addresses four different problems for the first time in a research competition under well-defined conditions: In the Vocalisations and Stuttering Sub-Challenges, a classification on human non-verbal vocalisations and speech has to be made; the Activity Sub-Challenge aims at beyond-audio human activity recognition from smartwatch sensor data; and in the Mosquitoes Sub-Challenge, mosquitoes need to be detected.

Human Activity Recognition

COVYT: Introducing the Coronavirus YouTube and TikTok speech dataset featuring the same speakers with and without infection

no code implementations • 20 Jun 2022 • Andreas Triantafyllopoulos, Anastasia Semertzidou, Meishu Song, Florian B. Pokorny, Björn W. Schuller

As compared to other existing COVID-19 sound datasets, the unique feature of the COVYT dataset is that it comprises both COVID-19 positive and negative samples from all 65 speakers.

Are 3D Face Shapes Expressive Enough for Recognising Continuous Emotions and Action Unit Intensities?

no code implementations • 3 Jul 2022 • Mani Kumar Tellamekala, Ömer Sümer, Björn W. Schuller, Elisabeth André, Timo Giesbrecht, Michel Valstar

We also study how 3D face shapes performed on AU intensity estimation on BP4D and DISFA datasets, and report that 3D face features were on par with 2D appearance features in AUs 4, 6, 10, 12, and 25, but not the entire set of AUs.

3D Face Alignment Arousal Estimation +1

Computational Charisma -- A Brick by Brick Blueprint for Building Charismatic Artificial Intelligence

no code implementations • 31 Dec 2022 • Björn W. Schuller, Shahin Amiriparian, Anton Batliner, Alexander Gebhard, Maurice Gerczuk, Vincent Karas, Alexander Kathan, Lennart Seizer, Johanna Löchner

We then name exemplary use cases of computational charismatic skills before switching to ethical aspects and concluding this overview and perspective on building charisma-enabled AI.

A Comprehensive Survey on Heart Sound Analysis in the Deep Learning Era

no code implementations • 23 Jan 2023 • Zhao Ren, Yi Chang, Thanh Tam Nguyen, Yang Tan, Kun Qian, Björn W. Schuller

Deep learning has been successfully applied to heart sound analysis in the past years.

Will Affective Computing Emerge from Foundation Models and General AI? A First Evaluation on ChatGPT

no code implementations • 3 Mar 2023 • Mostafa M. Amin, Erik Cambria, Björn W. Schuller

We utilise three baselines, a robust language model (RoBERTa-base), a legacy word model with pretrained embeddings (Word2Vec), and a simple bag-of-words baseline (BoW).

Language Modelling Sentiment Analysis +2
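The bag-of-words baseline mentioned here is the standard counts-plus-linear-classifier setup. A minimal sketch on a toy sentiment corpus is shown below; all texts and labels are invented for illustration and are not the datasets used in the evaluation.

```python
# Sketch of a simple bag-of-words (BoW) sentiment baseline: counts + linear classifier.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

texts = ["great, I love it", "terrible and boring", "really enjoyable", "worst ever"]
labels = [1, 0, 1, 0]  # 1 = positive sentiment (toy labels)

baseline = make_pipeline(CountVectorizer(), LogisticRegression())
baseline.fit(texts, labels)
print(baseline.predict(["I love how enjoyable it is"]))
```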

The ACM Multimedia 2023 Computational Paralinguistics Challenge: Emotion Share & Requests

no code implementations • 28 Apr 2023 • Björn W. Schuller, Anton Batliner, Shahin Amiriparian, Alexander Barnhill, Maurice Gerczuk, Andreas Triantafyllopoulos, Alice Baird, Panagiotis Tzirakis, Chris Gagne, Alan S. Cowen, Nikola Lackovic, Marie-José Caraty, Claude Montacié

The ACM Multimedia 2023 Computational Paralinguistics Challenge addresses two different problems for the first time in a research competition under well-defined conditions: In the Emotion Share Sub-Challenge, a regression on speech has to be made; and in the Requests Sub-Challenge, requests and complaints need to be detected.

regression

Integrating Generative Artificial Intelligence in Intelligent Vehicle Systems

no code implementations • 15 May 2023 • Lukas Stappen, Jeremy Dillmann, Serena Striegel, Hans-Jörg Vögel, Nicolas Flores-Herr, Björn W. Schuller

This paper aims to serve as a comprehensive guide for researchers and practitioners, offering insights into the current state, potential applications, and future research directions for generative artificial intelligence and foundation models within the context of intelligent vehicles.

Ethics

Can ChatGPT's Responses Boost Traditional Natural Language Processing?

1 code implementation • 6 Jul 2023 • Mostafa M. Amin, Erik Cambria, Björn W. Schuller

In this work, we extend this by exploring if ChatGPT has novel knowledge that would enhance existing specialised models when they are fused together.

Language Modelling Sentiment Analysis

A Wide Evaluation of ChatGPT on Affective Computing Tasks

no code implementations • 26 Aug 2023 • Mostafa M. Amin, Rui Mao, Erik Cambria, Björn W. Schuller

In this work, we widely study the capabilities of the ChatGPT models, namely GPT-4 and GPT-3.5, on 13 affective computing problems, namely aspect extraction, aspect polarity classification, opinion extraction, sentiment analysis, sentiment intensity ranking, emotions intensity ranking, suicide tendency detection, toxicity detection, well-being assessment, engagement measurement, personality assessment, sarcasm detection, and subjectivity detection.

Aspect Extraction Sarcasm Detection +1

Testing Speech Emotion Recognition Machine Learning Models

no code implementations • 11 Dec 2023 • Anna Derington, Hagen Wierstorf, Ali Özkil, Florian Eyben, Felix Burkhardt, Björn W. Schuller

Machine learning models for speech emotion recognition (SER) can be trained for different tasks and are usually evaluated on the basis of a few available datasets per task.

Fairness Speech Emotion Recognition

STAA-Net: A Sparse and Transferable Adversarial Attack for Speech Emotion Recognition

no code implementations • 2 Feb 2024 • Yi Chang, Zhao Ren, Zixing Zhang, Xin Jing, Kun Qian, Xi Shao, Bin Hu, Tanja Schultz, Björn W. Schuller

Speech contains rich information on the emotions of humans, and Speech Emotion Recognition (SER) has been an important topic in the area of human-computer interaction.

Adversarial Attack Speech Emotion Recognition

On Prompt Sensitivity of ChatGPT in Affective Computing

no code implementations • 20 Mar 2024 • Mostafa M. Amin, Björn W. Schuller

Recent studies have demonstrated the emerging capabilities of foundation models like ChatGPT in several fields, including affective computing.

Prompt Engineering Sarcasm Detection +2

Enhancing Suicide Risk Assessment: A Speech-Based Automated Approach in Emergency Medicine

no code implementations • 18 Apr 2024 • Shahin Amiriparian, Maurice Gerczuk, Justina Lutz, Wolfgang Strube, Irina Papazova, Alkomiet Hasan, Alexander Kathan, Björn W. Schuller

The metadata integration yields a balanced accuracy of 94.4%, marking an absolute improvement of 28.2%, demonstrating the efficacy of our proposed approaches for automatic suicide risk assessment in emergency medicine.

Binary Classification
