1 code implementation • 8 Jan 2025 • Zijiang Yang, Meishu Song, Xin Jing, Haojie Zhang, Kun Qian, Bin Hu, Kota Tamada, Toru Takumi, Björn W. Schuller, Yoshiharu Yamamoto
The findings suggest promising directions for vocalization analysis and highlight the potential value of audible and ultrasound vocalizations in ASD detection.
no code implementations • 30 Dec 2024 • Mohammad Nadeem, Shahab Saquib Sohail, Erik Cambria, Björn W. Schuller, Amir Hussain
The advent of text-to-video generation models has revolutionized content creation by producing high-quality videos from textual prompts.
no code implementations • 19 Dec 2024 • Qiyang Sun, Yupei Li, Emran Alturki, Sunil Munthumoduku Krishna Murthy, Björn W. Schuller
As Artificial Intelligence (AI) continues to advance rapidly, Friendly AI (FAI) has been proposed to advocate for more equitable and fair development of AI.
no code implementations • 17 Dec 2024 • Yupei Li, Manuel Milling, Lucia Specia, Björn W. Schuller
This makes them susceptible to deception by surface-level sentence patterns, particularly in longer texts and in texts that have been subsequently paraphrased.
no code implementations • 16 Dec 2024 • Xiangheng He, Junjie Chen, Zixing Zhang, Björn W. Schuller
We propose ProsodyFM, a prosody-aware text-to-speech synthesis (TTS) model with a flow-matching (FM) backbone that aims to enhance the phrasing and intonation aspects of prosody.
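Flow matching trains a model to regress the velocity of a path from a noise sample to a data sample. As a rough, generic sketch of how such a training pair can be constructed (illustrative names and values; not ProsodyFM's actual implementation):

```python
def cfm_training_pair(x0, x1, t):
    """Conditional flow matching: interpolate between noise x0 and data x1
    at time t in [0, 1]; the regression target for the learned vector field
    along this straight path is the constant velocity (x1 - x0)."""
    x_t = [(1 - t) * a + t * b for a, b in zip(x0, x1)]
    target = [b - a for a, b in zip(x0, x1)]
    return x_t, target

x0 = [0.0, 0.0]   # noise sample
x1 = [2.0, 4.0]   # data sample (e.g. a spectrogram frame)
x_t, v = cfm_training_pair(x0, x1, 0.5)
print(x_t, v)  # [1.0, 2.0] [2.0, 4.0]
```

A TTS backbone of this kind would apply the same idea to acoustic frames conditioned on text and prosody features.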
no code implementations • 16 Dec 2024 • Simon Rampp, Andreas Triantafyllopoulos, Manuel Milling, Björn W. Schuller
This work introduces the key operating principles for autrainer, our new deep learning training framework for computer audition tasks.
no code implementations • 15 Nov 2024 • Qiyang Sun, Alican Akman, Björn W. Schuller
The continuous development of artificial intelligence (AI) theory has propelled this field to unprecedented heights, owing to the relentless efforts of scholars and researchers.
no code implementations • 14 Oct 2024 • Qiyang Sun, Alican Akman, Xin Jing, Manuel Milling, Björn W. Schuller
The generated audio dataset is employed to extract a range of features, which are then fed into a metric learning architecture to verify kinship.
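As a generic illustration of the verification step, two feature embeddings can be compared with a similarity score and a decision threshold. The paper's metric learning architecture learns this comparison; the sketch below uses a fixed cosine similarity and a hypothetical threshold purely for illustration:

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

def verify_kinship(emb_a, emb_b, threshold=0.8):
    """Declare two audio embeddings 'kin' if their similarity exceeds a threshold."""
    return cosine_similarity(emb_a, emb_b) >= threshold

# hypothetical embeddings extracted from two speakers' audio
parent = [0.9, 0.1, 0.4]
child = [0.85, 0.15, 0.38]
print(verify_kinship(parent, child))  # near-parallel vectors -> True
```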
1 code implementation • 10 Oct 2024 • Alican Akman, Qiyang Sun, Björn W. Schuller
The increasing success of audio foundation models across various tasks has led to a growing need for improved interpretability to understand their intricate decision-making processes better.
no code implementations • 27 Aug 2024 • Mohammad Nadeem, Shahab Saquib Sohail, Erik Cambria, Björn W. Schuller, Amir Hussain
In the present work, we have identified a fundamental limitation related to the image generation ability of LLMs, and termed it The NO Syndrome.
no code implementations • 12 Aug 2024 • Manuel Milling, Shuo Liu, Andreas Triantafyllopoulos, Ilhan Aslan, Björn W. Schuller
Neural network models for audio tasks, such as automatic speech recognition (ASR) and acoustic scene classification (ASC), are susceptible to noise contamination for real-life applications.
1 code implementation • 3 Jul 2024 • Rui Liu, Haolin Zuo, Zheng Lian, Xiaofen Xing, Björn W. Schuller, Haizhou Li
Together with the release of the dataset, we also develop an Emotion and Intent Interaction (EI$^2$) network as a reference system by modeling the deep correlation between emotion and intent in the multimodal conversation.
no code implementations • 26 Jun 2024 • Maurice Gerczuk, Shahin Amiriparian, Justina Lutz, Wolfgang Strube, Irina Papazova, Alkomiet Hasan, Björn W. Schuller
Finally, our analysis reveals a discrepancy in the relationship of speech characteristics and suicide risk between female and male subjects.
1 code implementation • 25 Jun 2024 • Lukas Christ, Shahin Amiriparian, Friederike Hawighorst, Ann-Kathrin Schill, Angelo Boutalikakis, Lorenz Graf-Vlachy, Andreas König, Björn W. Schuller
Flattery is an important aspect of human communication that facilitates social bonding, shapes perceptions, and influences behavior through strategic compliments and praise, leveraging the power of speech to build rapport effectively.
no code implementations • 21 Jun 2024 • Yi Chang, Zhao Ren, Zhonghao Zhao, Thanh Tam Nguyen, Kun Qian, Tanja Schultz, Björn W. Schuller
Speech emotion recognition (SER) plays a crucial role in human-computer interaction.
1 code implementation • 11 Jun 2024 • Shahin Amiriparian, Filip Packań, Maurice Gerczuk, Björn W. Schuller
Foundation models have shown great promise in speech emotion recognition (SER) by leveraging their pre-trained representations to capture emotion patterns in speech signals.
1 code implementation • 4 Jun 2024 • Lukas Christ, Shahin Amiriparian, Manuel Milling, Ilhan Aslan, Björn W. Schuller
Telling stories is an integral part of human communication which can evoke emotions and influence the affective states of the audience.
no code implementations • 7 May 2024 • Zhongren Dong, Zixing Zhang, Weixiang Xu, Jing Han, Jianjun Ou, Björn W. Schuller
Automatically detecting Alzheimer's Disease (AD) from spontaneous speech plays an important role in its early diagnosis.
no code implementations • 30 Apr 2024 • Andreas Triantafyllopoulos, Björn W. Schuller
Imbuing machines with the ability to talk has been a longtime pursuit of artificial intelligence (AI) research.
2 code implementations • 26 Apr 2024 • Zheng Lian, Haiyang Sun, Licai Sun, Zhuofan Wen, Siyuan Zhang, Shun Chen, Hao Gu, Jinming Zhao, Ziyang Ma, Xie Chen, Jiangyan Yi, Rui Liu, Kele Xu, Bin Liu, Erik Cambria, Guoying Zhao, Björn W. Schuller, JianHua Tao
However, this process may lead to inaccurate annotations, such as ignoring non-majority or non-candidate labels.
no code implementations • 18 Apr 2024 • Shahin Amiriparian, Maurice Gerczuk, Justina Lutz, Wolfgang Strube, Irina Papazova, Alkomiet Hasan, Alexander Kathan, Björn W. Schuller
The metadata integration yields a balanced accuracy of $94.4\,\%$, marking an absolute improvement of $28.2\,\%$, demonstrating the efficacy of our proposed approaches for automatic suicide risk assessment in emergency medicine.
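Balanced accuracy, the metric reported above, is the mean of per-class recalls and so is robust to class imbalance, which matters when at-risk cases are rare. A minimal sketch:

```python
def balanced_accuracy(y_true, y_pred):
    """Mean of per-class recall: each class contributes equally,
    regardless of how many samples it has."""
    classes = sorted(set(y_true))
    recalls = []
    for c in classes:
        idx = [i for i, t in enumerate(y_true) if t == c]
        correct = sum(1 for i in idx if y_pred[i] == c)
        recalls.append(correct / len(idx))
    return sum(recalls) / len(classes)

# 8 majority-class vs 2 minority-class samples:
# the majority class no longer dominates the score
y_true = [0] * 8 + [1] * 2
y_pred = [0] * 8 + [1, 0]
print(balanced_accuracy(y_true, y_pred))  # (1.0 + 0.5) / 2 = 0.75
```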
no code implementations • 20 Mar 2024 • Mostafa M. Amin, Björn W. Schuller
Recent studies have demonstrated the emerging capabilities of foundation models like ChatGPT in several fields, including affective computing.
no code implementations • 2 Feb 2024 • Yi Chang, Zhao Ren, Zixing Zhang, Xin Jing, Kun Qian, Xi Shao, Bin Hu, Tanja Schultz, Björn W. Schuller
Speech contains rich information on the emotions of humans, and Speech Emotion Recognition (SER) has been an important topic in the area of human-computer interaction.
no code implementations • 11 Dec 2023 • Anna Derington, Hagen Wierstorf, Ali Özkil, Florian Eyben, Felix Burkhardt, Björn W. Schuller
Nine different transformer based models, an xLSTM based model and a convolutional baseline model are tested for arousal, valence, dominance, and emotional categories.
no code implementations • 22 Oct 2023 • Liyizhe Peng, Zixing Zhang, Tao Pang, Jing Han, Huan Zhao, Hao Chen, Björn W. Schuller
This indicates the strong transferability and feasibility of LLMs in the field of emotion recognition.
1 code implementation • 26 Sep 2023 • Chia-Hsin Lin, Charles Jones, Björn W. Schuller, Harry Coppock
Despite significant advancements in deep learning for vision and natural language, unsupervised domain adaptation in audio remains relatively unexplored.
no code implementations • 18 Sep 2023 • Xiangheng He, Junjie Chen, Björn W. Schuller
Our proposed method is significantly superior in terms of UAR and F1 to the single-task and multi-task baselines with p-values < 0.05.
1 code implementation • 15 Sep 2023 • Alexander Gebhard, Andreas Triantafyllopoulos, Teresa Bez, Lukas Christ, Alexander Kathan, Björn W. Schuller
Advances in passive acoustic monitoring and machine learning have led to the procurement of vast datasets for computational bioacoustic research.
1 code implementation • 26 Aug 2023 • Mostafa M. Amin, Rui Mao, Erik Cambria, Björn W. Schuller
In this work, we widely study the capabilities of the ChatGPT models, namely GPT-4 and GPT-3.5, on 13 affective computing problems, namely aspect extraction, aspect polarity classification, opinion extraction, sentiment analysis, sentiment intensity ranking, emotions intensity ranking, suicide tendency detection, toxicity detection, well-being assessment, engagement measurement, personality assessment, sarcasm detection, and subjectivity detection.
no code implementations • 22 Aug 2023 • Yuezhou Zhang, Amos A Folarin, Judith Dineley, Pauline Conde, Valeria de Angel, Shaoxiong Sun, Yatharth Ranjan, Zulqarnain Rashid, Callum Stewart, Petroula Laiou, Heet Sankesara, Linglong Qian, Faith Matcham, Katie M White, Carolin Oetzmann, Femke Lamers, Sara Siddi, Sara Simblett, Björn W. Schuller, Srinivasan Vairavan, Til Wykes, Josep Maria Haro, Brenda WJH Penninx, Vaibhav A Narayan, Matthew Hotopf, Richard JB Dobson, Nicholas Cummins, RADAR-CNS consortium
Our study identified 29 topics in 3919 smartphone-collected speech recordings from 265 participants using the Whisper tool and BERTopic model.
1 code implementation • 6 Jul 2023 • Mostafa M. Amin, Erik Cambria, Björn W. Schuller
In this work, we extend this by exploring if ChatGPT has novel knowledge that would enhance existing specialised models when they are fused together.
no code implementations • 15 May 2023 • Lukas Stappen, Jeremy Dillmann, Serena Striegel, Hans-Jörg Vögel, Nicolas Flores-Herr, Björn W. Schuller
This paper aims to serve as a comprehensive guide for researchers and practitioners, offering insights into the current state, potential applications, and future research directions for generative artificial intelligence and foundation models within the context of intelligent vehicles.
1 code implementation • 5 May 2023 • Lukas Christ, Shahin Amiriparian, Alice Baird, Alexander Kathan, Niklas Müller, Steffen Klug, Chris Gagne, Panagiotis Tzirakis, Eva-Maria Meßner, Andreas König, Alan Cowen, Erik Cambria, Björn W. Schuller
Participants predict the presence of spontaneous humour in a cross-cultural setting.
no code implementations • 28 Apr 2023 • Björn W. Schuller, Anton Batliner, Shahin Amiriparian, Alexander Barnhill, Maurice Gerczuk, Andreas Triantafyllopoulos, Alice Baird, Panagiotis Tzirakis, Chris Gagne, Alan S. Cowen, Nikola Lackovic, Marie-José Caraty, Claude Montacié
The ACM Multimedia 2023 Computational Paralinguistics Challenge addresses two different problems for the first time in a research competition under well-defined conditions: In the Emotion Share Sub-Challenge, a regression on speech has to be made; and in the Requests Sub-Challenges, requests and complaints need to be detected.
3 code implementations • 18 Apr 2023 • Zheng Lian, Haiyang Sun, Licai Sun, Kang Chen, Mingyu Xu, Kexin Wang, Ke Xu, Yu He, Ying Li, Jinming Zhao, Ye Liu, Bin Liu, Jiangyan Yi, Meng Wang, Erik Cambria, Guoying Zhao, Björn W. Schuller, JianHua Tao
The first Multimodal Emotion Recognition Challenge (MER 2023) was successfully held at ACM Multimedia.
no code implementations • 3 Mar 2023 • Mostafa M. Amin, Erik Cambria, Björn W. Schuller
We utilise three baselines, a robust language model (RoBERTa-base), a legacy word model with pretrained embeddings (Word2Vec), and a simple bag-of-words baseline (BoW).
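A bag-of-words baseline of the kind listed above can be sketched in a few lines: each text becomes a vector of raw token counts over a shared vocabulary (a toy featurizer for illustration, not the paper's exact pipeline):

```python
def bow_vectorize(texts):
    """Minimal bag-of-words: map each text to token counts
    over a vocabulary built from the whole corpus."""
    vocab = sorted({tok for t in texts for tok in t.lower().split()})
    index = {w: i for i, w in enumerate(vocab)}
    vectors = []
    for t in texts:
        v = [0] * len(vocab)
        for tok in t.lower().split():
            v[index[tok]] += 1
        vectors.append(v)
    return vocab, vectors

vocab, X = bow_vectorize(["good movie", "bad movie", "good good plot"])
print(vocab)  # ['bad', 'good', 'movie', 'plot']
print(X[2])   # [0, 2, 0, 1]
```

These count vectors would then feed a simple classifier, in contrast to the contextual representations of RoBERTa or the pretrained embeddings of Word2Vec.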
1 code implementation • 1 Mar 2023 • Hagen Wierstorf, Johannes Wagner, Florian Eyben, Felix Burkhardt, Björn W. Schuller
Driven by the need for larger and more diverse datasets to pre-train and fine-tune increasingly complex machine learning models, the number of datasets is rapidly growing.
1 code implementation • 23 Jan 2023 • Zhao Ren, Yi Chang, Thanh Tam Nguyen, Yang Tan, Kun Qian, Björn W. Schuller
This work introduces both classic machine learning and deep learning for comparison, and further offers insights into the advances and future research directions in deep learning for heart sound analysis.
no code implementations • 31 Dec 2022 • Björn W. Schuller, Shahin Amiriparian, Anton Batliner, Alexander Gebhard, Maurice Gerzcuk, Vincent Karas, Alexander Kathan, Lennart Seizer, Johanna Löchner
We then name exemplary use cases of computational charismatic skills before switching to ethical aspects and concluding this overview and perspective on building charisma-enabled AI.
1 code implementation • 21 Dec 2022 • Lukas Christ, Shahin Amiriparian, Manuel Milling, Ilhan Aslan, Björn W. Schuller
Telling stories is an integral part of human communication which can evoke emotions and influence the affective states of the audience.
1 code implementation • 15 Dec 2022 • Jobie Budd, Kieran Baker, Emma Karoune, Harry Coppock, Selina Patel, Ana Tendero Cañadas, Alexander Titcomb, Richard Payne, David Hurley, Sabrina Egglestone, Lorraine Butler, Jonathon Mellor, George Nicholson, Ivan Kiskin, Vasiliki Koutra, Radka Jersakova, Rachel A. McKendry, Peter Diggle, Sylvia Richardson, Björn W. Schuller, Steven Gilmour, Davide Pigoli, Stephen Roberts, Josef Packham, Tracey Thornley, Chris Holmes
The UK COVID-19 Vocal Audio Dataset represents the largest collection of SARS-CoV-2 PCR-referenced audio recordings to date.
no code implementations • 15 Dec 2022 • Davide Pigoli, Kieran Baker, Jobie Budd, Lorraine Butler, Harry Coppock, Sabrina Egglestone, Steven G. Gilmour, Chris Holmes, David Hurley, Radka Jersakova, Ivan Kiskin, Vasiliki Koutra, Jonathon Mellor, George Nicholson, Joe Packham, Selina Patel, Richard Payne, Stephen J. Roberts, Björn W. Schuller, Ana Tendero-Cañadas, Tracey Thornley, Alexander Titcomb
Since early in the coronavirus disease 2019 (COVID-19) pandemic, there has been interest in using artificial intelligence methods to predict COVID-19 infection status based on vocal audio signals, for example cough recordings.
1 code implementation • 15 Dec 2022 • Harry Coppock, George Nicholson, Ivan Kiskin, Vasiliki Koutra, Kieran Baker, Jobie Budd, Richard Payne, Emma Karoune, David Hurley, Alexander Titcomb, Sabrina Egglestone, Ana Tendero Cañadas, Lorraine Butler, Radka Jersakova, Jonathon Mellor, Selina Patel, Tracey Thornley, Peter Diggle, Sylvia Richardson, Josef Packham, Björn W. Schuller, Davide Pigoli, Steven Gilmour, Stephen Roberts, Chris Holmes
Recent work has reported that AI classifiers trained on audio recordings can accurately predict severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) infection status.
1 code implementation • 26 Oct 2022 • Yi Chang, Zhao Ren, Thanh Tam Nguyen, Kun Qian, Björn W. Schuller
Our experiments demonstrate that training a lightweight SER model on the target dataset with speech samples and graphs can not only produce small SER models, but also enhance the model performance compared to models with speech samples only and those using classic transfer learning strategies.
no code implementations • 6 Oct 2022 • Andreas Triantafyllopoulos, Björn W. Schuller, Gökçe İymen, Metin Sezgin, Xiangheng He, Zijiang Yang, Panagiotis Tzirakis, Shuo Liu, Silvan Mertes, Elisabeth André, Ruibo Fu, JianHua Tao
Speech is the fundamental mode of human communication, and its synthesis has long been a core priority in human-computer interaction research.
2 code implementations • 28 Sep 2022 • Lukas Christ, Shahin Amiriparian, Alexander Kathan, Niklas Müller, Andreas König, Björn W. Schuller
In this context, we propose a novel multimodal architecture that yields the best overall results.
no code implementations • 15 Sep 2022 • Vincent Karas, Andreas Triantafyllopoulos, Meishu Song, Björn W. Schuller
Vocal bursts play an important role in communicating affect, making them valuable for improving speech emotion recognition.
no code implementations • 3 Jul 2022 • Mani Kumar Tellamekala, Ömer Sümer, Björn W. Schuller, Elisabeth André, Timo Giesbrecht, Michel Valstar
We also study how 3D face shapes perform on AU intensity estimation on the BP4D and DISFA datasets, and report that 3D face features were on par with 2D appearance features for AUs 4, 6, 10, 12, and 25, but not for the entire set of AUs.
1 code implementation • 23 Jun 2022 • Lukas Christ, Shahin Amiriparian, Alice Baird, Panagiotis Tzirakis, Alexander Kathan, Niklas Müller, Lukas Stappen, Eva-Maria Meßner, Andreas König, Alan Cowen, Erik Cambria, Björn W. Schuller
For this year's challenge, we feature three datasets: (i) the Passau Spontaneous Football Coach Humor (Passau-SFCH) dataset that contains audio-visual recordings of German football coaches, labelled for the presence of humour; (ii) the Hume-Reaction dataset in which reactions of individuals to emotional stimuli have been annotated with respect to seven emotional expression intensities, and (iii) the Ulm-Trier Social Stress Test (Ulm-TSST) dataset comprising audio-visual data labelled with continuous emotion values (arousal and valence) of people in stressful dispositions.
no code implementations • 20 Jun 2022 • Andreas Triantafyllopoulos, Anastasia Semertzidou, Meishu Song, Florian B. Pokorny, Björn W. Schuller
As compared to other existing COVID-19 sound datasets, the unique feature of the COVYT dataset is that it comprises both COVID-19 positive and negative samples from all 65 speakers.
1 code implementation • 14 Jun 2022 • Andreas Triantafyllopoulos, Meishu Song, Zijiang Yang, Xin Jing, Björn W. Schuller
In this work, we explore a novel few-shot personalisation architecture for emotional vocalisation prediction.
no code implementations • 12 Jun 2022 • Mani Kumar Tellamekala, Shahin Amiriparian, Björn W. Schuller, Elisabeth André, Timo Giesbrecht, Michel Valstar
In particular, we impose Calibration and Ordinal Ranking constraints on the variance vectors of audiovisual latent distributions.
no code implementations • 13 May 2022 • Björn W. Schuller, Anton Batliner, Shahin Amiriparian, Christian Bergler, Maurice Gerczuk, Natalie Holz, Pauline Larrouy-Maestri, Sebastian P. Bayerl, Korbinian Riedhammer, Adria Mallol-Ragolta, Maria Pateraki, Harry Coppock, Ivan Kiskin, Marianne Sinka, Stephen Roberts
The ACM Multimedia 2022 Computational Paralinguistics Challenge addresses four different problems for the first time in a research competition under well-defined conditions: In the Vocalisations and Stuttering Sub-Challenges, a classification on human non-verbal vocalisations and speech has to be made; the Activity Sub-Challenge aims at beyond-audio human activity recognition from smartwatch sensor data; and in the Mosquitoes Sub-Challenge, mosquitoes need to be detected.
no code implementations • 10 May 2022 • Xiangheng He, Andreas Triantafyllopoulos, Alexander Kathan, Manuel Milling, Tianhao Yan, Srividya Tirunellai Rajamani, Ludwig Küster, Mathias Harrer, Elena Heber, Inga Grossmann, David D. Ebert, Björn W. Schuller
Previous studies have shown the correlation between sensor data collected from mobile phones and human depression states.
no code implementations • 9 May 2022 • Andreas Triantafyllopoulos, Sandra Zänkert, Alice Baird, Julian Konzok, Brigitte M. Kudielka, Björn W. Schuller
Stress is a major threat to well-being that manifests in a variety of physiological and mental symptoms.
no code implementations • 9 May 2022 • Andreas Triantafyllopoulos, Sandra Ottl, Alexander Gebhard, Esther Rituerto-González, Mirko Jaumann, Steffen Hüttner, Valerie Dieter, Patrick Schneeweiß, Inga Krauß, Maurice Gerczuk, Shahin Amiriparian, Björn W. Schuller
Although running is a common leisure activity and a core training regimen for several athletes, between $29\%$ and $79\%$ of runners sustain an overuse injury each year.
no code implementations • 6 May 2022 • Alexander Kathan, Andreas Triantafyllopoulos, Xiangheng He, Manuel Milling, Tianhao Yan, Srividya Tirunellai Rajamani, Ludwig Küster, Mathias Harrer, Elena Heber, Inga Grossmann, David D. Ebert, Björn W. Schuller
Digital health applications are becoming increasingly important for assessing and monitoring the wellbeing of people suffering from mental health conditions like depression.
2 code implementations • 4 May 2022 • Rodrigo Mira, Alexandros Haliassos, Stavros Petridis, Björn W. Schuller, Maja Pantic
Video-to-speech synthesis (also known as lip-to-speech) refers to the translation of silent lip movements into the corresponding audio.
no code implementations • 1 Apr 2022 • Andreas Triantafyllopoulos, Johannes Wagner, Hagen Wierstorf, Maximilian Schmitt, Uwe Reichel, Florian Eyben, Felix Burkhardt, Björn W. Schuller
Large, pre-trained neural networks consisting of self-attention layers (transformers) have recently achieved state-of-the-art results on several speech emotion recognition (SER) datasets.
no code implementations • 31 Mar 2022 • Xin Jing, Shuo Liu, Emilia Parada-Cabaleiro, Andreas Triantafyllopoulos, Meishu Song, Zijiang Yang, Björn W. Schuller
Detecting COVID-19 from audio signals, such as breathing and coughing, can be used as a fast and efficient pre-testing method to reduce the virus transmission.
1 code implementation • 30 Mar 2022 • Yi Chang, Zhao Ren, Thanh Tam Nguyen, Wolfgang Nejdl, Björn W. Schuller
Respiratory sound classification is an important tool for remote screening of respiratory-related diseases such as pneumonia, asthma, and COVID-19.
no code implementations • 29 Mar 2022 • Zijiang Yang, Xin Jing, Andreas Triantafyllopoulos, Meishu Song, Ilhan Aslan, Björn W. Schuller
Emotional voice conversion (EVC) focuses on converting a speech utterance from a source to a target emotion; it can thus be a key enabling technology for human-computer interaction applications and beyond.
no code implementations • 24 Mar 2022 • Vincent Karas, Mani Kumar Tellamekala, Adria Mallol-Ragolta, Michel Valstar, Björn W. Schuller
To clearly understand the performance differences between recurrent and attention models in audiovisual affect recognition, we present a comprehensive evaluation of fusion models based on LSTM-RNNs, self-attention and cross-modal attention, trained for valence and arousal estimation.
1 code implementation • 14 Mar 2022 • Johannes Wagner, Andreas Triantafyllopoulos, Hagen Wierstorf, Maximilian Schmitt, Felix Burkhardt, Florian Eyben, Björn W. Schuller
Recent advances in transformer-based architectures which are pre-trained in self-supervised manner have shown great promise in several machine learning tasks.
no code implementations • 14 Mar 2022 • Björn W. Schuller, Dagmar M. Schuller
Emotion and a broader range of affective driver states can be a life decisive factor on the road.
no code implementations • 10 Mar 2022 • Björn W. Schuller, Alican Akman, Yi Chang, Harry Coppock, Alexander Gebhard, Alexander Kathan, Esther Rituerto-González, Andreas Triantafyllopoulos, Florian B. Pokorny
We categorise potential computer audition applications according to the five elements of earth, water, air, fire, and aether, proposed by the ancient Greeks in their five element theory; this categorisation serves as a framework to discuss computer audition in relation to different ecological aspects.
no code implementations • 9 Mar 2022 • Yi Chang, Sofiane Laridi, Zhao Ren, Gregory Palmer, Björn W. Schuller, Marco Fisichella
The proposed framework consists of i) federated learning for data privacy, and ii) adversarial training at the training stage and randomisation at the testing stage for model robustness.
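The federated learning component keeps raw speech on-device and shares only model parameters, which a server aggregates. A minimal sketch of a FedAvg-style aggregation step (illustrative only, not the paper's code):

```python
def federated_average(client_weights, client_sizes):
    """FedAvg: average each client's parameters weighted by its local
    dataset size, so raw speech never leaves the client device."""
    total = sum(client_sizes)
    n_params = len(client_weights[0])
    avg = [0.0] * n_params
    for w, n in zip(client_weights, client_sizes):
        for i, p in enumerate(w):
            avg[i] += p * n / total
    return avg

# two hypothetical clients with different amounts of local speech data
w = federated_average([[1.0, 2.0], [3.0, 6.0]], [10, 30])
print(w)  # [2.5, 5.0]
```

In the proposed framework, each client would additionally perform adversarial training locally before sending its updates, with input randomisation applied at test time.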
3 code implementations • 6 Mar 2022 • Joseph Turian, Jordie Shier, Humair Raj Khan, Bhiksha Raj, Björn W. Schuller, Christian J. Steinmetz, Colin Malloy, George Tzanetakis, Gissel Velarde, Kirk McNally, Max Henry, Nicolas Pinto, Camille Noufi, Christian Clough, Dorien Herremans, Eduardo Fonseca, Jesse Engel, Justin Salamon, Philippe Esling, Pranay Manocha, Shinji Watanabe, Zeyu Jin, Yonatan Bisk
The aim of the HEAR benchmark is to develop a general-purpose audio representation that provides a strong basis for learning in a wide variety of tasks and scenarios.
no code implementations • 17 Feb 2022 • Harry Coppock, Alican Akman, Christian Bergler, Maurice Gerczuk, Chloë Brown, Jagmohan Chauhan, Andreas Grammenos, Apinan Hasthanasombat, Dimitris Spathis, Tong Xia, Pietro Cicuta, Jing Han, Shahin Amiriparian, Alice Baird, Lukas Stappen, Sandra Ottl, Panagiotis Tzirakis, Anton Batliner, Cecilia Mascolo, Björn W. Schuller
The COVID-19 pandemic has caused massive humanitarian and economic damage.
no code implementations • 2 Feb 2022 • Mostafa M. Amin, Björn W. Schuller
We present a theoretical analysis of the method, in addition to an empirical comparison against two standard methods for fairness, namely data balancing and adversarial training.
no code implementations • 10 Jan 2022 • Kun Zhou, Berrak Sisman, Rajib Rana, Björn W. Schuller, Haizhou Li
As desired, the proposed network controls the fine-grained emotion intensity in the output speech.
no code implementations • 4 Nov 2021 • Panagiotis Tzirakis, Dénes Boros, Elnar Hajiyev, Björn W. Schuller
To show the favourable properties of our pre-trained model on modelling facial affect, we use the RECOLA database, and compare with the current state-of-the-art approach.
no code implementations • 13 Oct 2021 • Andreas Triantafyllopoulos, Uwe Reichel, Shuo Liu, Stephan Huber, Florian Eyben, Björn W. Schuller
In this contribution, we investigate the effectiveness of deep fusion of text and audio features for categorical and dimensional speech emotion recognition (SER).
no code implementations • 13 Oct 2021 • Adria Mallol-Ragolta, Helena Cuesta, Emilia Gómez, Björn W. Schuller
This paper aims to automatically detect COVID-19 patients by analysing the acoustic information embedded in coughs.
no code implementations • 4 Oct 2021 • Andreas Triantafyllopoulos, Manuel Milling, Konstantinos Drossos, Björn W. Schuller
Although these factors play a well-understood role in the performance of ASC models, most works report single evaluation metrics taking into account all different strata of a particular dataset.
no code implementations • 30 Jul 2021 • Alican Akman, Harry Coppock, Alexander Gaskell, Panagiotis Tzirakis, Lyn Jones, Björn W. Schuller
We report on cross-running the recent COVID-19 Identification ResNet (CIdeR) on the two Interspeech 2021 COVID-19 diagnosis from cough and speech audio challenges: ComParE and DiCOVA.
no code implementations • 27 Jul 2021 • Alice Baird, Lukas Stappen, Lukas Christ, Lea Schumann, Eva-Maria Meßner, Björn W. Schuller
We utilise a Long Short-Term Memory, Recurrent Neural Network to explore the benefit of fusing these physiological signals with arousal as the target, learning from various audio, video, and textual based features.
1 code implementation • 25 Jul 2021 • Lukas Stappen, Lea Schumann, Benjamin Sertolli, Alice Baird, Benjamin Weigel, Erik Cambria, Björn W. Schuller
With this in mind, the MuSe-Toolbox provides the functionality to run exhaustive searches for meaningful class clusters in the continuous gold standards.
1 code implementation • 18 Jul 2021 • Xiangheng He, Junjie Chen, Georgios Rizos, Björn W. Schuller
Emotional Voice Conversion (EVC) aims to convert the emotional style of a source speech signal to a target style while preserving its content and speaker identity information.
no code implementations • 30 Jun 2021 • Sicheng Zhao, Xingxu Yao, Jufeng Yang, Guoli Jia, Guiguang Ding, Tat-Seng Chua, Björn W. Schuller, Kurt Keutzer
Images can convey rich semantics and induce various emotions in viewers.
no code implementations • 16 Jun 2021 • Pingchuan Ma, Rodrigo Mira, Stavros Petridis, Björn W. Schuller, Maja Pantic
The large amount of audiovisual content being shared online today has drawn substantial attention to the prospect of audiovisual self-supervised learning.
1 code implementation • 4 May 2021 • Lukas Stappen, Jason Thies, Gerhard Hagerer, Björn W. Schuller, Georg Groh
To unfold the tremendous amount of multimedia data uploaded daily to social media platforms, effective topic modeling techniques are needed.
no code implementations • 27 Apr 2021 • Rodrigo Mira, Konstantinos Vougioukas, Pingchuan Ma, Stavros Petridis, Björn W. Schuller, Maja Pantic
In this work, we propose a new end-to-end video-to-speech model based on Generative Adversarial Networks (GANs) which translates spoken video to waveform end-to-end without using any intermediate representation or separate waveform synthesis algorithm.
1 code implementation • 23 Apr 2021 • Shahin Amiriparian, Tobias Hübner, Maurice Gerczuk, Sandra Ottl, Björn W. Schuller
By obtaining state-of-the-art results on a set of paralinguistics tasks, we demonstrate the suitability of the proposed transfer learning approach for embedded audio signal processing, even when data is scarce.
no code implementations • 20 Apr 2021 • Shahin Amiriparian, Artem Sokolov, Ilhan Aslan, Lukas Christ, Maurice Gerczuk, Tobias Hübner, Dmitry Lamanov, Manuel Milling, Sandra Ottl, Ilya Poduremennykh, Evgeniy Shuranov, Björn W. Schuller
Text encodings from automatic speech recognition (ASR) transcripts and audio representations have shown promise in speech emotion recognition (SER) ever since.
1 code implementation • 14 Apr 2021 • Lukas Stappen, Alice Baird, Lukas Christ, Lea Schumann, Benjamin Sertolli, Eva-Maria Messner, Erik Cambria, Guoying Zhao, Björn W. Schuller
Multimodal Sentiment Analysis (MuSe) 2021 is a challenge focusing on the tasks of sentiment and emotion, as well as physiological-emotion and emotion-based stress recognition through more comprehensively integrating the audio-visual, language, and biological signal modalities.
no code implementations • 19 Mar 2021 • Sicheng Zhao, Quanwei Huang, YouBao Tang, Xingxu Yao, Jufeng Yang, Guiguang Ding, Björn W. Schuller
Recently, extensive research efforts have been dedicated to understanding the emotions of images.
1 code implementation • 4 Mar 2021 • Panagiotis Tzirakis, Anh Nguyen, Stefanos Zafeiriou, Björn W. Schuller
In this paper, we propose a novel framework that can capture both the semantic and the paralinguistic information in the signal.
no code implementations • 24 Feb 2021 • Björn W. Schuller, Anton Batliner, Christian Bergler, Cecilia Mascolo, Jing Han, Iulia Lefter, Heysem Kaya, Shahin Amiriparian, Alice Baird, Lukas Stappen, Sandra Ottl, Maurice Gerczuk, Panagiotis Tzirakis, Chloë Brown, Jagmohan Chauhan, Andreas Grammenos, Apinan Hasthanasombat, Dimitris Spathis, Tong Xia, Pietro Cicuta, Leon J. M. Rothkrantz, Joeri Zwerts, Jelle Treep, Casper Kaandorp
The INTERSPEECH 2021 Computational Paralinguistics Challenge addresses four different problems for the first time in a research competition under well-defined conditions: In the COVID-19 Cough and COVID-19 Speech Sub-Challenges, a binary classification on COVID-19 infection has to be made based on coughing sounds and speech; in the Escalation Sub-Challenge, a three-way assessment of the level of escalation in a dialogue is featured; and in the Primates Sub-Challenge, four species vs background need to be classified.
1 code implementation • 7 Jan 2021 • Harry Coppock, Alexander Gaskell, Panagiotis Tzirakis, Alice Baird, Lyn Jones, Björn W. Schuller
Our main contributions are as follows: (I) We demonstrate the first attempt to diagnose COVID-19 using end-to-end deep learning from a crowd-sourced dataset of audio samples, achieving a ROC-AUC of 0.846; (II) Our model, the COVID-19 Identification ResNet (CIdeR), has potential for rapid scalability, minimal cost, and improving performance as more data becomes available.
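The ROC-AUC quoted above equals the probability that a randomly chosen positive sample is scored above a randomly chosen negative one (ties count half). A minimal sketch of that rank-statistic view:

```python
def roc_auc(labels, scores):
    """ROC-AUC via the Mann-Whitney statistic: the fraction of
    positive/negative pairs ranked correctly, counting ties as 0.5."""
    pos = [s for l, s in zip(labels, scores) if l == 1]
    neg = [s for l, s in zip(labels, scores) if l == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))

labels = [1, 1, 0, 0]
scores = [0.9, 0.4, 0.6, 0.2]
print(roc_auc(labels, scores))  # 3 of 4 pairs ranked correctly = 0.75
```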
1 code implementation • 4 Jan 2021 • Thejan Rajapakshe, Rajib Rana, Sara Khalifa, Björn W. Schuller, Jiajun Liu
In addition, extended learning period is a general challenge for deep RL which can impact the speed of learning for SER.
no code implementations • 29 Dec 2020 • Björn W. Schuller, Harry Coppock, Alexander Gaskell
The COVID-19 pandemic has affected the world unevenly; while industrial economies have been able to produce the tests necessary to track the spread of the virus and mostly avoided complete lockdowns, developing countries have faced issues with testing capacity.
no code implementations • 17 Dec 2020 • Katrin D. Bartl-Pokorny, Florian B. Pokorny, Anton Batliner, Shahin Amiriparian, Anastasia Semertzidou, Florian Eyben, Elena Kramer, Florian Schmidt, Rainer Schönweiler, Markus Wehler, Björn W. Schuller
Group differences in the front vowels /i:/ and /e:/ are additionally reflected in the variation of the fundamental frequency and the harmonics-to-noise ratio, while group differences in the back vowels /o:/ and /u:/ are reflected in statistics of the Mel-frequency cepstral coefficients and the spectral slope.
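The spectral slope mentioned among these features is commonly defined as the least-squares slope of the log-magnitude spectrum over frequency. A rough, dependency-free illustration of that idea (the paper's exact feature definitions, e.g. from a standard toolkit such as openSMILE, may differ in windowing and normalisation):

```python
import cmath
import math

def magnitude_spectrum(frame):
    """Naive DFT magnitudes (first half of the bins) for a short real frame."""
    n = len(frame)
    return [abs(sum(frame[t] * cmath.exp(-2j * math.pi * k * t / n)
                    for t in range(n)))
            for k in range(n // 2)]

def spectral_slope(frame, sample_rate):
    """Least-squares slope of log-magnitude (dB) over frequency (Hz)."""
    mags = magnitude_spectrum(frame)
    freqs = [k * sample_rate / len(frame) for k in range(len(mags))]
    logs = [20 * math.log10(m + 1e-12) for m in mags]  # floor avoids log(0)
    n = len(freqs)
    mf, ml = sum(freqs) / n, sum(logs) / n
    num = sum((f - mf) * (l - ml) for f, l in zip(freqs, logs))
    den = sum((f - mf) ** 2 for f in freqs)
    return num / den
```

A frame dominated by low-frequency energy yields a negative slope; one dominated by high frequencies yields a positive slope, which is the property such features exploit.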
no code implementations • 29 Nov 2020 • Gauri Deshpande, Björn W. Schuller
This drives the research focus towards identifying the markers of COVID-19 in speech and other human generated audio signals.
no code implementations • 12 Jul 2020 • Liang Zhang, Johann Li, Ping Li, Xiaoyuan Lu, Peiyi Shen, Guangming Zhu, Syed Afaq Shah, Mohammed Bennarmoun, Kun Qian, Björn W. Schuller
To the best of our knowledge, MeDaS is the first open-source platform providing a collaborative and interactive service that lets researchers with a medical background easily use DL-related toolkits and, at the same time, helps scientists and engineers from information sciences understand the medical knowledge side.
no code implementations • 15 Jun 2020 • Lukas Stappen, Xinchen Du, Vincent Karas, Stefan Müller, Björn W. Schuller
Systems for the automatic recognition and detection of automotive parts are crucial in several emerging research areas in the development of intelligent vehicles.
no code implementations • 1 Jun 2020 • Kazi Nazmul Haque, Rajib Rana, Björn W. Schuller
Hence, with extensive experimental results, we have demonstrated that, by harnessing the power of high-fidelity audio generation, the proposed GAAE model can learn powerful representations from an unlabelled dataset while leveraging only a small percentage of labelled data as supervision/guidance.
no code implementations • 15 May 2020 • Mostafa M. Mohamed, Mina A. Nessiem, Björn W. Schuller
In this mini-survey, we review all the literature we have found to date that attempts to solve packet loss in speech using deep learning methods.
no code implementations • 15 May 2020 • Mostafa M. Mohamed, Björn W. Schuller
We explore matched, mismatched, and multi-condition training settings.
no code implementations • 15 May 2020 • Mostafa M. Mohamed, Björn W. Schuller
Additionally, extending this with an end-to-end emotion prediction neural network yields a network that performs SER end-to-end from audio with lost frames.
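Training and evaluating under lost-frame conditions requires simulating packet loss in the first place. A generic sketch of one simple scheme, zeroing out feature frames independently at a chosen loss rate (an illustrative assumption, not necessarily the authors' loss model, which may use burst losses):

```python
import random

def drop_frames(frames, loss_rate, seed=0):
    """Simulate packet loss: replace each frame (a list of feature values)
    with zeros, independently with probability loss_rate."""
    rng = random.Random(seed)  # fixed seed for reproducible corruption
    return [[0.0] * len(f) if rng.random() < loss_rate else list(f)
            for f in frames]
```

Real network losses are typically bursty, so a two-state (Gilbert-Elliott style) model would be a more faithful simulator than this independent-drop sketch.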
1 code implementation • 15 May 2020 • Shahin Amiriparian, Pawel Winokurow, Vincent Karas, Sandra Ottl, Maurice Gerczuk, Björn W. Schuller
On the development partition of the data, we achieve Spearman's correlation coefficients of .324, .283, and .320 with the targets on the Karolinska Sleepiness Scale by utilising attention and non-attention autoencoders, and the fusion of both autoencoders' representations, respectively.
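Spearman's correlation, used here against the Karolinska Sleepiness Scale targets, is simply the Pearson correlation of the rank-transformed data, which makes it sensitive to any monotonic relationship rather than only linear ones. A small self-contained sketch:

```python
def rank(values):
    """Assign 1-based ranks; tied values share their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1  # extend the group of ties
        avg = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

def spearman(x, y):
    """Spearman's rho: Pearson correlation of the ranks of x and y."""
    rx, ry = rank(x), rank(y)
    n = len(x)
    mx, my = sum(rx) / n, sum(ry) / n
    num = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    den = (sum((a - mx) ** 2 for a in rx)
           * sum((b - my) ** 2 for b in ry)) ** 0.5
    return num / den
```

For instance, a perfectly monotonic but nonlinear pairing such as (1, 2, 3, 4) vs. (1, 4, 9, 16) yields rho = 1.0.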
no code implementations • 11 May 2020 • Tomoya Koike, Kun Qian, Björn W. Schuller, Yoshiharu Yamamoto
To the best of our knowledge, it is the first public toolkit assembling a series of state-of-the-art deep learning technologies.
no code implementations • 30 Apr 2020 • Jing Han, Kun Qian, Meishu Song, Zijiang Yang, Zhao Ren, Shuo Liu, Juan Liu, Huaiyuan Zheng, Wei Ji, Tomoya Koike, Xiao Li, Zixing Zhang, Yoshiharu Yamamoto, Björn W. Schuller
In particular, by analysing speech recordings from these patients, we construct audio-only-based models to automatically categorise the health state of patients from four aspects, including the severity of illness, sleep quality, fatigue, and anxiety.
1 code implementation • 30 Apr 2020 • Lukas Stappen, Alice Baird, Georgios Rizos, Panagiotis Tzirakis, Xinchen Du, Felix Hafner, Lea Schumann, Adria Mallol-Ragolta, Björn W. Schuller, Iulia Lefter, Erik Cambria, Ioannis Kompatsiaris
Multimodal Sentiment Analysis in Real-life Media (MuSe) 2020 is a Challenge-based Workshop focusing on the tasks of sentiment recognition, as well as emotion-target engagement and trustworthiness detection by means of more comprehensively integrating the audio-visual and language modalities.
no code implementations • 24 Mar 2020 • Björn W. Schuller, Dagmar M. Schuller, Kun Qian, Juan Liu, Huaiyuan Zheng, Xiao Li
We come to the conclusion that CA appears ready for implementation of (pre-)diagnosis and monitoring tools, and more generally provides rich and significant, yet so far untapped potential in the fight against COVID-19 spread.
no code implementations • 2 Jan 2020 • Siddique Latif, Rajib Rana, Sara Khalifa, Raja Jurdak, Junaid Qadir, Björn W. Schuller
Research on speech processing has traditionally considered the task of designing hand-engineered acoustic features (feature engineering) as a separate distinct problem from the task of designing efficient machine learning (ML) models to make prediction and classification decisions.
no code implementations • 24 Oct 2019 • Thejan Rajapakshe, Rajib Rana, Siddique Latif, Sara Khalifa, Björn W. Schuller
Deep reinforcement learning (deep RL) is a combination of deep learning with reinforcement learning principles to create efficient methods that can learn by interacting with their environment.
1 code implementation • 22 May 2016 • Maximilian Schmitt, Björn W. Schuller
We introduce openXBOW, an open-source toolkit for the generation of bag-of-words (BoW) representations from multimodal input.
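The bag-of-words idea behind openXBOW can be illustrated in a few lines: quantise each input frame against a learned codebook and accumulate a normalised histogram of codeword counts. This is a toy sketch with a hand-made codebook; openXBOW itself is a Java toolkit offering codebook learning, multiple modalities, and many further options:

```python
import math

def assign(frame, codebook):
    """Index of the codeword closest to the frame (Euclidean distance)."""
    best, best_d = 0, float("inf")
    for idx, word in enumerate(codebook):
        d = math.dist(frame, word)
        if d < best_d:
            best, best_d = idx, d
    return best

def bag_of_words(frames, codebook):
    """Histogram of codeword assignments, normalised to sum to 1."""
    hist = [0.0] * len(codebook)
    for f in frames:
        hist[assign(f, codebook)] += 1.0
    total = sum(hist)
    return [h / total for h in hist]
```

The resulting fixed-length vector summarises a variable-length sequence of frames, which is what makes BoW representations convenient inputs for standard classifiers.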