Search Results for author: Björn W. Schuller

Found 61 papers, 16 papers with code

Are 3D Face Shapes Expressive Enough for Recognising Continuous Emotions and Action Unit Intensities?

no code implementations • 3 Jul 2022 • Mani Kumar Tellamekala, Ömer Sümer, Björn W. Schuller, Elisabeth André, Timo Giesbrecht, Michel Valstar

We also study how well 3D face shapes perform on AU intensity estimation on the BP4D and DISFA datasets, and report that 3D face features were on par with 2D appearance features for AUs 4, 6, 10, 12, and 25, but not for the entire set of AUs.

3D Face Alignment Arousal Estimation +1

The MuSe 2022 Multimodal Sentiment Analysis Challenge: Humor, Emotional Reactions, and Stress

1 code implementation • 23 Jun 2022 • Lukas Christ, Shahin Amiriparian, Alice Baird, Panagiotis Tzirakis, Alexander Kathan, Niklas Müller, Lukas Stappen, Eva-Maria Meßner, Andreas König, Alan Cowen, Erik Cambria, Björn W. Schuller

For this year's challenge, we feature three datasets: (i) the Passau Spontaneous Football Coach Humor (Passau-SFCH) dataset, which contains audio-visual recordings of German football coaches labelled for the presence of humour; (ii) the Hume-Reaction dataset, in which reactions of individuals to emotional stimuli have been annotated with respect to seven emotional expression intensities; and (iii) the Ulm-Trier Social Stress Test (Ulm-TSST) dataset, comprising audio-visual data labelled with continuous emotion values (arousal and valence) of people in stressful dispositions.

Emotion Recognition Humor Detection +1

COVYT: Introducing the Coronavirus YouTube and TikTok speech dataset featuring the same speakers with and without infection

no code implementations • 20 Jun 2022 • Andreas Triantafyllopoulos, Anastasia Semertzidou, Meishu Song, Florian B. Pokorny, Björn W. Schuller

Compared to other existing COVID-19 sound datasets, the unique feature of the COVYT dataset is that it comprises both COVID-19-positive and COVID-19-negative samples from all 65 speakers.

Exploring speaker enrolment for few-shot personalisation in emotional vocalisation prediction

1 code implementation • 14 Jun 2022 • Andreas Triantafyllopoulos, Meishu Song, Zijiang Yang, Xin Jing, Björn W. Schuller

In this work, we explore a novel few-shot personalisation architecture for emotional vocalisation prediction.

The ACM Multimedia 2022 Computational Paralinguistics Challenge: Vocalisations, Stuttering, Activity, & Mosquitoes

no code implementations • 13 May 2022 • Björn W. Schuller, Anton Batliner, Shahin Amiriparian, Christian Bergler, Maurice Gerczuk, Natalie Holz, Pauline Larrouy-Maestri, Sebastian P. Bayerl, Korbinian Riedhammer, Adria Mallol-Ragolta, Maria Pateraki, Harry Coppock, Ivan Kiskin, Marianne Sinka, Stephen Roberts

The ACM Multimedia 2022 Computational Paralinguistics Challenge addresses four different problems for the first time in a research competition under well-defined conditions: In the Vocalisations and Stuttering Sub-Challenges, a classification of human non-verbal vocalisations and speech has to be made; the Activity Sub-Challenge aims at beyond-audio human activity recognition from smartwatch sensor data; and in the Mosquitoes Sub-Challenge, mosquitoes need to be detected.

Human Activity Recognition

Fatigue Prediction in Outdoor Running Conditions using Audio Data

no code implementations • 9 May 2022 • Andreas Triantafyllopoulos, Sandra Ottl, Alexander Gebhard, Esther Rituerto-González, Mirko Jaumann, Steffen Hüttner, Valerie Dieter, Patrick Schneeweiß, Inga Krauß, Maurice Gerczuk, Shahin Amiriparian, Björn W. Schuller

Although running is a common leisure activity and a core training regimen for several athletes, between 29% and 79% of runners sustain an overuse injury each year.

Journaling Data for Daily PHQ-2 Depression Prediction and Forecasting

no code implementations • 6 May 2022 • Alexander Kathan, Andreas Triantafyllopoulos, Xiangheng He, Manuel Milling, Tianhao Yan, Srividya Tirunellai Rajamani, Ludwig Küster, Mathias Harrer, Elena Heber, Inga Grossmann, David D. Ebert, Björn W. Schuller

Digital health applications are becoming increasingly important for assessing and monitoring the wellbeing of people suffering from mental health conditions like depression.

SVTS: Scalable Video-to-Speech Synthesis

no code implementations • 4 May 2022 • Rodrigo Mira, Alexandros Haliassos, Stavros Petridis, Björn W. Schuller, Maja Pantic

Video-to-speech synthesis (also known as lip-to-speech) refers to the translation of silent lip movements into the corresponding audio.

Speech Synthesis

Probing Speech Emotion Recognition Transformers for Linguistic Knowledge

no code implementations • 1 Apr 2022 • Andreas Triantafyllopoulos, Johannes Wagner, Hagen Wierstorf, Maximilian Schmitt, Uwe Reichel, Florian Eyben, Felix Burkhardt, Björn W. Schuller

Large, pre-trained neural networks consisting of self-attention layers (transformers) have recently achieved state-of-the-art results on several speech emotion recognition (SER) datasets.

Automatic Speech Recognition Speech Emotion Recognition +1

A Temporal-oriented Broadcast ResNet for COVID-19 Detection

no code implementations • 31 Mar 2022 • Xin Jing, Shuo Liu, Emilia Parada-Cabaleiro, Andreas Triantafyllopoulos, Meishu Song, Zijiang Yang, Björn W. Schuller

Detecting COVID-19 from audio signals, such as breathing and coughing, can be used as a fast and efficient pre-testing method to reduce the virus transmission.

Example-based Explanations with Adversarial Attacks for Respiratory Sound Analysis

1 code implementation • 30 Mar 2022 • Yi Chang, Zhao Ren, Thanh Tam Nguyen, Wolfgang Nejdl, Björn W. Schuller

Respiratory sound classification is an important tool for remote screening of respiratory-related diseases such as pneumonia, asthma, and COVID-19.

An Overview & Analysis of Sequence-to-Sequence Emotional Voice Conversion

no code implementations • 29 Mar 2022 • Zijiang Yang, Xin Jing, Andreas Triantafyllopoulos, Meishu Song, Ilhan Aslan, Björn W. Schuller

Emotional voice conversion (EVC) focuses on converting a speech utterance from a source to a target emotion; it can thus be a key enabling technology for human-computer interaction applications and beyond.

Voice Conversion

Continuous-Time Audiovisual Fusion with Recurrence vs. Attention for In-The-Wild Affect Recognition

no code implementations • 24 Mar 2022 • Vincent Karas, Mani Kumar Tellamekala, Adria Mallol-Ragolta, Michel Valstar, Björn W. Schuller

To clearly understand the performance differences between recurrent and attention models in audiovisual affect recognition, we present a comprehensive evaluation of fusion models based on LSTM-RNNs, self-attention and cross-modal attention, trained for valence and arousal estimation.

Arousal Estimation Multimodal Emotion Recognition

Climate Change & Computer Audition: A Call to Action and Overview on Audio Intelligence to Help Save the Planet

no code implementations • 10 Mar 2022 • Björn W. Schuller, Alican Akman, Yi Chang, Harry Coppock, Alexander Gebhard, Alexander Kathan, Esther Rituerto-González, Andreas Triantafyllopoulos, Florian B. Pokorny

We categorise potential computer audition applications according to the five elements of earth, water, air, fire, and aether, proposed by the ancient Greeks in their five element theory; this categorisation serves as a framework to discuss computer audition in relation to different ecological aspects.

Robust Federated Learning Against Adversarial Attacks for Speech Emotion Recognition

no code implementations • 9 Mar 2022 • Yi Chang, Sofiane Laridi, Zhao Ren, Gregory Palmer, Björn W. Schuller, Marco Fisichella

The proposed framework consists of i) federated learning for data privacy, and ii) adversarial training at the training stage and randomisation at the testing stage for model robustness.
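The federated learning component mentioned above keeps raw speech data on the clients and aggregates only model parameters. A minimal sketch of the standard FedAvg aggregation step, which is one common way such frameworks combine client updates (a generic illustration, not this paper's exact implementation):

```python
def fedavg(client_weights, client_sizes):
    """Weighted average of per-client parameter vectors (FedAvg).

    client_weights: list of parameter vectors (one per client)
    client_sizes:   number of local training samples per client
    """
    total = sum(client_sizes)
    n_params = len(client_weights[0])
    averaged = [0.0] * n_params
    for weights, size in zip(client_weights, client_sizes):
        for i, w in enumerate(weights):
            # Each client contributes proportionally to its data size.
            averaged[i] += w * (size / total)
    return averaged

# Two clients with unequal data: the larger client dominates the average.
global_w = fedavg([[1.0, 2.0], [3.0, 4.0]], client_sizes=[1, 3])
print(global_w)  # [2.5, 3.5]
```

The server would broadcast `global_w` back to all clients for the next round; the raw recordings never leave the client devices.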

Federated Learning Speech Emotion Recognition

Normalise for Fairness: A Simple Normalisation Technique for Fairness in Regression Machine Learning Problems

no code implementations • 2 Feb 2022 • Mostafa M. Mohamed, Björn W. Schuller

We present a theoretical analysis of the method, in addition to an empirical comparison against two standard methods for fairness, namely data balancing and adversarial training.

Decision Making Fairness

Facial Emotion Recognition using Deep Residual Networks in Real-World Environments

no code implementations • 4 Nov 2021 • Panagiotis Tzirakis, Dénes Boros, Elnar Hajiyev, Björn W. Schuller

To show the favourable properties of our pre-trained model on modelling facial affect, we use the RECOLA database, and compare with the current state-of-the-art approach.

Facial Emotion Recognition

EIHW-MTG DiCOVA 2021 Challenge System Report

no code implementations • 13 Oct 2021 • Adria Mallol-Ragolta, Helena Cuesta, Emilia Gómez, Björn W. Schuller

This paper aims to automatically detect COVID-19 patients by analysing the acoustic information embedded in coughs.

Multistage linguistic conditioning of convolutional layers for speech emotion recognition

no code implementations • 13 Oct 2021 • Andreas Triantafyllopoulos, Uwe Reichel, Shuo Liu, Stephan Huber, Florian Eyben, Björn W. Schuller

In this contribution, we investigate the effectiveness of deep fusion of text and audio features for categorical and dimensional speech emotion recognition (SER).

Speech Emotion Recognition

Fairness and underspecification in acoustic scene classification: The case for disaggregated evaluations

no code implementations • 4 Oct 2021 • Andreas Triantafyllopoulos, Manuel Milling, Konstantinos Drossos, Björn W. Schuller

Although these factors play a well-understood role in the performance of ASC models, most works report single evaluation metrics that aggregate over all the different strata of a particular dataset.

Acoustic Scene Classification Fairness +1

Evaluating the COVID-19 Identification ResNet (CIdeR) on the INTERSPEECH COVID-19 from Audio Challenges

no code implementations • 30 Jul 2021 • Alican Akman, Harry Coppock, Alexander Gaskell, Panagiotis Tzirakis, Lyn Jones, Björn W. Schuller

We report on cross-running the recent COVID-19 Identification ResNet (CIdeR) on the two Interspeech 2021 COVID-19 diagnosis from cough and speech audio challenges: ComParE and DiCOVA.

COVID-19 Diagnosis

A Physiologically-Adapted Gold Standard for Arousal during Stress

no code implementations • 27 Jul 2021 • Alice Baird, Lukas Stappen, Lukas Christ, Lea Schumann, Eva-Maria Meßner, Björn W. Schuller

We utilise a Long Short-Term Memory Recurrent Neural Network to explore the benefit of fusing these physiological signals with arousal as the target, learning from various audio-, video-, and text-based features.

An Improved StarGAN for Emotional Voice Conversion: Enhancing Voice Quality and Data Augmentation

1 code implementation • 18 Jul 2021 • Xiangheng He, Junjie Chen, Georgios Rizos, Björn W. Schuller

Emotional Voice Conversion (EVC) aims to convert the emotional style of a source speech signal to a target style while preserving its content and speaker identity information.

Data Augmentation Speech Emotion Recognition +1

LiRA: Learning Visual Speech Representations from Audio through Self-supervision

no code implementations • 16 Jun 2021 • Pingchuan Ma, Rodrigo Mira, Stavros Petridis, Björn W. Schuller, Maja Pantic

The large amount of audiovisual content being shared online today has drawn substantial attention to the prospect of audiovisual self-supervised learning.

Lip Reading Self-Supervised Learning

GraphTMT: Unsupervised Graph-based Topic Modeling from Video Transcripts

1 code implementation • 4 May 2021 • Lukas Stappen, Jason Thies, Gerhard Hagerer, Björn W. Schuller, Georg Groh

To unfold the tremendous amount of multimedia data uploaded daily to social media platforms, effective topic modeling techniques are needed.

Topic Models Word Embeddings

End-to-End Video-To-Speech Synthesis using Generative Adversarial Networks

no code implementations • 27 Apr 2021 • Rodrigo Mira, Konstantinos Vougioukas, Pingchuan Ma, Stavros Petridis, Björn W. Schuller, Maja Pantic

In this work, we propose a new end-to-end video-to-speech model based on Generative Adversarial Networks (GANs) which translates spoken video to waveform end-to-end without using any intermediate representation or separate waveform synthesis algorithm.

Lip Reading Speech Synthesis

DeepSpectrumLite: A Power-Efficient Transfer Learning Framework for Embedded Speech and Audio Processing from Decentralised Data

1 code implementation • 23 Apr 2021 • Shahin Amiriparian, Tobias Hübner, Maurice Gerczuk, Sandra Ottl, Björn W. Schuller

By obtaining state-of-the-art results on a set of paralinguistics tasks, we demonstrate the suitability of the proposed transfer learning approach for embedded audio signal processing, even when data is scarce.

Audio Signal Processing Transfer Learning

The MuSe 2021 Multimodal Sentiment Analysis Challenge: Sentiment, Emotion, Physiological-Emotion, and Stress

1 code implementation • 14 Apr 2021 • Lukas Stappen, Alice Baird, Lukas Christ, Lea Schumann, Benjamin Sertolli, Eva-Maria Messner, Erik Cambria, Guoying Zhao, Björn W. Schuller

Multimodal Sentiment Analysis (MuSe) 2021 is a challenge focusing on the tasks of sentiment and emotion, as well as physiological-emotion and emotion-based stress recognition through more comprehensively integrating the audio-visual, language, and biological signal modalities.

Emotion Recognition Multimodal Sentiment Analysis

Speech Emotion Recognition using Semantic Information

1 code implementation • 4 Mar 2021 • Panagiotis Tzirakis, Anh Nguyen, Stefanos Zafeiriou, Björn W. Schuller

In this paper, we propose a novel framework that can capture both the semantic and the paralinguistic information in the signal.

Speech Emotion Recognition Sound Audio and Speech Processing

The INTERSPEECH 2021 Computational Paralinguistics Challenge: COVID-19 Cough, COVID-19 Speech, Escalation & Primates

no code implementations • 24 Feb 2021 • Björn W. Schuller, Anton Batliner, Christian Bergler, Cecilia Mascolo, Jing Han, Iulia Lefter, Heysem Kaya, Shahin Amiriparian, Alice Baird, Lukas Stappen, Sandra Ottl, Maurice Gerczuk, Panagiotis Tzirakis, Chloë Brown, Jagmohan Chauhan, Andreas Grammenos, Apinan Hasthanasombat, Dimitris Spathis, Tong Xia, Pietro Cicuta, Leon J. M. Rothkrantz, Joeri Zwerts, Jelle Treep, Casper Kaandorp

The INTERSPEECH 2021 Computational Paralinguistics Challenge addresses four different problems for the first time in a research competition under well-defined conditions: In the COVID-19 Cough and COVID-19 Speech Sub-Challenges, a binary classification on COVID-19 infection has to be made based on coughing sounds and speech; in the Escalation Sub-Challenge, a three-way assessment of the level of escalation in a dialogue is featured; and in the Primates Sub-Challenge, four species vs. background need to be classified.

Representation Learning

End-2-End COVID-19 Detection from Breath & Cough Audio

1 code implementation • 7 Jan 2021 • Harry Coppock, Alexander Gaskell, Panagiotis Tzirakis, Alice Baird, Lyn Jones, Björn W. Schuller

Our main contributions are as follows: (I) We demonstrate the first attempt to diagnose COVID-19 using end-to-end deep learning from a crowd-sourced dataset of audio samples, achieving a ROC-AUC of 0.846; (II) our model, the COVID-19 Identification ResNet (CIdeR), has potential for rapid scalability, minimal cost, and improving performance as more data becomes available.
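The ROC-AUC metric quoted above has a simple rank interpretation: the probability that a randomly chosen positive sample is scored higher than a randomly chosen negative one. A small self-contained sketch of that computation (illustrative only, not the authors' evaluation code):

```python
def roc_auc(labels, scores):
    """ROC-AUC via the Mann-Whitney formulation: the fraction of
    positive/negative pairs ranked correctly (ties count half)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))

# Perfectly separated scores give an AUC of 1.0.
print(roc_auc([0, 0, 1, 1], [0.1, 0.2, 0.8, 0.9]))  # 1.0
```

In practice a library routine such as `sklearn.metrics.roc_auc_score` would be used; the pairwise version above is O(n²) but makes the definition explicit.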

Detecting COVID-19 from Breathing and Coughing Sounds using Deep Neural Networks

no code implementations • 29 Dec 2020 • Björn W. Schuller, Harry Coppock, Alexander Gaskell

The COVID-19 pandemic has affected the world unevenly; while industrial economies have been able to produce the tests necessary to track the spread of the virus and mostly avoided complete lockdowns, developing countries have faced issues with testing capacity.

Bayesian Optimisation

The voice of COVID-19: Acoustic correlates of infection

no code implementations • 17 Dec 2020 • Katrin D. Bartl-Pokorny, Florian B. Pokorny, Anton Batliner, Shahin Amiriparian, Anastasia Semertzidou, Florian Eyben, Elena Kramer, Florian Schmidt, Rainer Schönweiler, Markus Wehler, Björn W. Schuller

Group differences in the front vowels /i:/ and /e:/ are additionally reflected in the variation of the fundamental frequency and the harmonics-to-noise ratio, while group differences in the back vowels /o:/ and /u:/ are reflected in statistics of the Mel-frequency cepstral coefficients and the spectral slope.

Audio, Speech, Language, & Signal Processing for COVID-19: A Comprehensive Overview

no code implementations • 29 Nov 2020 • Gauri Deshpande, Björn W. Schuller

This drives the research focus towards identifying the markers of COVID-19 in speech and other human generated audio signals.

MeDaS: An open-source platform as service to help break the walls between medicine and informatics

no code implementations • 12 Jul 2020 • Liang Zhang, Johann Li, Ping Li, Xiaoyuan Lu, Peiyi Shen, Guangming Zhu, Syed Afaq Shah, Mohammed Bennarmoun, Kun Qian, Björn W. Schuller

To the best of our knowledge, MeDaS is the first open-source platform providing a collaborative and interactive service that lets researchers from a medical background easily use DL-related toolkits, and at the same time lets scientists and engineers from the information sciences understand the medical side.

Natural Language Processing

Domain Adaptation with Joint Learning for Generic, Optical Car Part Recognition and Detection Systems (Go-CaRD)

no code implementations • 15 Jun 2020 • Lukas Stappen, Xinchen Du, Vincent Karas, Stefan Müller, Björn W. Schuller

Systems for the automatic recognition and detection of automotive parts are crucial in several emerging research areas in the development of intelligent vehicles.

Domain Adaptation

High-Fidelity Audio Generation and Representation Learning with Guided Adversarial Autoencoder

no code implementations • 1 Jun 2020 • Kazi Nazmul Haque, Rajib Rana, Björn W. Schuller

Hence, with extensive experimental results, we have demonstrated that, by harnessing the power of high-fidelity audio generation, the proposed GAAE model can learn powerful representations from an unlabelled dataset while leveraging only a small percentage of labelled data as supervision/guidance.

Audio Generation Representation Learning

ConcealNet: An End-to-end Neural Network for Packet Loss Concealment in Deep Speech Emotion Recognition

no code implementations • 15 May 2020 • Mostafa M. Mohamed, Björn W. Schuller

Additionally, extending this with an end-to-end emotion prediction neural network provides a network that performs SER from audio with lost frames, end-to-end.

Speech Emotion Recognition

On Deep Speech Packet Loss Concealment: A Mini-Survey

no code implementations • 15 May 2020 • Mostafa M. Mohamed, Mina A. Nessiem, Björn W. Schuller

In this mini-survey, we review all the literature we have found to date that attempts to solve packet loss in speech using deep learning methods.
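For context, the simplest classical packet-loss-concealment baseline that such deep methods are typically compared against is repetition of the last correctly received frame. A toy sketch of this baseline (illustrative only, not from the survey):

```python
def conceal_by_repetition(frames):
    """Replace lost frames (marked None) with the most recent received
    frame; leading losses are filled with silence (zeros)."""
    frame_len = next(len(f) for f in frames if f is not None)
    last_good = [0.0] * frame_len  # silence until the first good frame
    out = []
    for f in frames:
        if f is not None:
            last_good = f
        out.append(list(last_good))
    return out

# Two lost packets in a four-frame stream are filled by repetition.
stream = [[0.1, 0.2], None, [0.3, 0.4], None]
print(conceal_by_repetition(stream))
# [[0.1, 0.2], [0.1, 0.2], [0.3, 0.4], [0.3, 0.4]]
```

Repetition produces audible artefacts on longer loss bursts, which is precisely the gap the deep generative approaches surveyed here aim to close.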

A Novel Fusion of Attention and Sequence to Sequence Autoencoders to Predict Sleepiness From Speech

1 code implementation • 15 May 2020 • Shahin Amiriparian, Pawel Winokurow, Vincent Karas, Sandra Ottl, Maurice Gerczuk, Björn W. Schuller

On the development partition of the data, we achieve Spearman's correlation coefficients of .324, .283, and .320 with the targets on the Karolinska Sleepiness Scale by utilising attention and non-attention autoencoders, and the fusion of both autoencoders' representations, respectively.
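Spearman's correlation coefficient, the metric reported above, measures monotonic agreement between predictions and targets via rank differences. A minimal sketch of the computation, assuming no tied values (illustrative, not the authors' code):

```python
def spearman_rho(x, y):
    """Spearman's rank correlation (assumes no tied values)."""
    def ranks(v):
        # Rank 1 for the smallest value, up to n for the largest.
        order = sorted(range(len(v)), key=lambda i: v[i])
        r = [0] * len(v)
        for rank, i in enumerate(order):
            r[i] = rank + 1
        return r

    n = len(x)
    d2 = sum((rx - ry) ** 2 for rx, ry in zip(ranks(x), ranks(y)))
    return 1 - 6 * d2 / (n * (n ** 2 - 1))

# A perfectly monotonic relationship gives rho = 1.0.
print(spearman_rho([1, 2, 3, 4], [10, 20, 30, 40]))  # 1.0
```

With ties, library routines such as `scipy.stats.spearmanr` (which average tied ranks) should be preferred over this closed-form version.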

Machine Translation Representation Learning

deepSELF: An Open Source Deep Self End-to-End Learning Framework

no code implementations • 11 May 2020 • Tomoya Koike, Kun Qian, Björn W. Schuller, Yoshiharu Yamamoto

To the best of our knowledge, it is the first public toolkit assembling a series of state-of-the-art deep learning technologies.

Image Generation

MuSe 2020 -- The First International Multimodal Sentiment Analysis in Real-life Media Challenge and Workshop

1 code implementation • 30 Apr 2020 • Lukas Stappen, Alice Baird, Georgios Rizos, Panagiotis Tzirakis, Xinchen Du, Felix Hafner, Lea Schumann, Adria Mallol-Ragolta, Björn W. Schuller, Iulia Lefter, Erik Cambria, Ioannis Kompatsiaris

Multimodal Sentiment Analysis in Real-life Media (MuSe) 2020 is a Challenge-based Workshop focusing on the tasks of sentiment recognition, as well as emotion-target engagement and trustworthiness detection by means of more comprehensively integrating the audio-visual and language modalities.

Emotion Recognition Multimodal Sentiment Analysis

An Early Study on Intelligent Analysis of Speech under COVID-19: Severity, Sleep Quality, Fatigue, and Anxiety

no code implementations • 30 Apr 2020 • Jing Han, Kun Qian, Meishu Song, Zijiang Yang, Zhao Ren, Shuo Liu, Juan Liu, Huaiyuan Zheng, Wei Ji, Tomoya Koike, Xiao Li, Zixing Zhang, Yoshiharu Yamamoto, Björn W. Schuller

In particular, by analysing speech recordings from these patients, we construct audio-only-based models to automatically categorise the health state of patients from four aspects, including the severity of illness, sleep quality, fatigue, and anxiety.

Sleep Quality

COVID-19 and Computer Audition: An Overview on What Speech & Sound Analysis Could Contribute in the SARS-CoV-2 Corona Crisis

no code implementations • 24 Mar 2020 • Björn W. Schuller, Dagmar M. Schuller, Kun Qian, Juan Liu, Huaiyuan Zheng, Xiao Li

We come to the conclusion that CA appears ready for implementation of (pre-)diagnosis and monitoring tools, and more generally provides rich and significant, yet so far untapped potential in the fight against COVID-19 spread.

Deep Representation Learning in Speech Processing: Challenges, Recent Advances, and Future Trends

no code implementations • 2 Jan 2020 • Siddique Latif, Rajib Rana, Sara Khalifa, Raja Jurdak, Junaid Qadir, Björn W. Schuller

Research on speech processing has traditionally considered the task of designing hand-engineered acoustic features (feature engineering) as a separate distinct problem from the task of designing efficient machine learning (ML) models to make prediction and classification decisions.

Automatic Speech Recognition Emotion Recognition +4

Pre-training in Deep Reinforcement Learning for Automatic Speech Recognition

no code implementations • 24 Oct 2019 • Thejan Rajapakshe, Rajib Rana, Siddique Latif, Sara Khalifa, Björn W. Schuller

Deep reinforcement learning (deep RL) combines deep learning with reinforcement learning principles to create efficient methods that can learn by interacting with their environment.

Automatic Speech Recognition reinforcement-learning +1

openXBOW - Introducing the Passau Open-Source Crossmodal Bag-of-Words Toolkit

1 code implementation • 22 May 2016 • Maximilian Schmitt, Björn W. Schuller

We introduce openXBOW, an open-source toolkit for the generation of bag-of-words (BoW) representations from multimodal input.
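The bag-of-words principle behind openXBOW can be illustrated in miniature: each input feature frame is assigned to its nearest codebook entry, and the output representation is a histogram of codeword counts. A toy sketch of this idea (not the toolkit's actual API, and with a hand-made codebook rather than a learned one):

```python
def bag_of_words(frames, codebook):
    """Assign each feature frame to its nearest codeword (squared
    Euclidean distance) and return a histogram of codeword counts."""
    def sq_dist(a, b):
        return sum((ai - bi) ** 2 for ai, bi in zip(a, b))

    counts = [0] * len(codebook)
    for frame in frames:
        nearest = min(range(len(codebook)),
                      key=lambda k: sq_dist(frame, codebook[k]))
        counts[nearest] += 1
    return counts

# Two codewords; four 2-D frames split evenly between them.
codebook = [[0.0, 0.0], [1.0, 1.0]]
frames = [[0.1, 0.0], [0.9, 1.1], [1.0, 0.8], [0.2, -0.1]]
print(bag_of_words(frames, codebook))  # [2, 2]
```

The resulting fixed-length histogram can then be fed to any standard classifier, which is what makes BoW representations attractive for variable-length multimodal input.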

Document Classification Emotion Recognition +2
