no code implementations • 5 Nov 2024 • Hanyu Meng, Jeroen Breebaart, Jeremy Stoddard, Vidhyasaharan Sethu, Eliathamby Ambikairajah
Additionally, we introduce FOA-Conv3D, a novel back-end network for effectively utilising the SSCV feature with a 3D convolutional encoder.
no code implementations • 26 Sep 2024 • Xin Hong, Yuan Gong, Vidhyasaharan Sethu, Ting Dang
Recent advancements in Large Language Models (LLMs) have demonstrated great success in many Natural Language Processing (NLP) tasks.
no code implementations • 17 Sep 2024 • Zheng Nan, Ting Dang, Vidhyasaharan Sethu, Beena Ahmed
Despite the crucial role relational thinking plays in human understanding of speech, it has yet to be leveraged in any artificial speech recognition system.
1 code implementation • 31 Jul 2024 • Jingyao Wu, Ting Dang, Vidhyasaharan Sethu, Eliathamby Ambikairajah
There has been a significant focus on modelling emotion ambiguity in recent years, with advancements made in representing emotions as distributions to capture ambiguity.
no code implementations • 18 Jun 2024 • Hanyu Meng, Qiquan Zhang, Xiangyu Zhang, Vidhyasaharan Sethu, Eliathamby Ambikairajah
The remarkable ability of humans to selectively focus on a target speaker in cocktail party scenarios is facilitated by binaural audio processing.
1 code implementation • 10 Apr 2024 • Hanyu Meng, Vidhyasaharan Sethu, Eliathamby Ambikairajah
There is increasing interest in the use of the LEArnable Front-end (LEAF) in a variety of speech processing systems.
no code implementations • 17 Oct 2023 • Antoni Dimitriadis, Siqi Pan, Vidhyasaharan Sethu, Beena Ahmed
Spatial HuBERT learns representations that outperform state-of-the-art single-channel speech representations on a variety of spatial downstream tasks, particularly in reverberant and noisy environments.
no code implementations • 21 Sep 2023 • Zheng Nan, Ting Dang, Vidhyasaharan Sethu, Beena Ahmed
Connectionist temporal classification (CTC) is commonly adopted for sequence modeling tasks like speech recognition, where it is necessary to preserve order between the input and target sequences.
no code implementations • 10 Aug 2021 • Jingyao Wu, Ting Dang, Vidhyasaharan Sethu, Eliathamby Ambikairajah
We propose a Markovian framework, referred to as the Dynamic Ordinal Markov Model (DOMM), that makes use of both absolute and relative ordinal information to improve speech-based ordinal emotion prediction.
no code implementations • 1 Sep 2019 • Vidhyasaharan Sethu, Emily Mower Provost, Julien Epps, Carlos Busso, Nicholas Cummins, Shrikanth Narayanan
A key reason for this is the lack of a common mathematical framework to describe all the relevant elements of emotion representations.