no code implementations • 5 Feb 2025 • Qiquan Zhang, Buddhi Wickramasinghe, Eliathamby Ambikairajah, Vidhyasaharan Sethu, Haizhou Li
Hand-crafted features, such as Mel-filterbanks, have traditionally been the choice for many audio processing applications.
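To make the hand-crafted baseline concrete, here is a minimal NumPy sketch of a triangular Mel filterbank. The sample rate, FFT size, and filter count are arbitrary choices for illustration, not values from the paper.

```python
import numpy as np

def hz_to_mel(f):
    # HTK-style Mel scale conversion
    return 2595.0 * np.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mel_filterbank(n_filters=40, n_fft=512, sr=16000):
    """Return a (n_filters, n_fft // 2 + 1) matrix of triangular Mel filters."""
    # Centre frequencies spaced uniformly on the Mel scale
    mel_points = np.linspace(hz_to_mel(0.0), hz_to_mel(sr / 2), n_filters + 2)
    hz_points = mel_to_hz(mel_points)
    bins = np.floor((n_fft + 1) * hz_points / sr).astype(int)

    fbank = np.zeros((n_filters, n_fft // 2 + 1))
    for i in range(1, n_filters + 1):
        left, centre, right = bins[i - 1], bins[i], bins[i + 1]
        for k in range(left, centre):
            fbank[i - 1, k] = (k - left) / max(centre - left, 1)
        for k in range(centre, right):
            fbank[i - 1, k] = (right - k) / max(right - centre, 1)
    return fbank

fbank = mel_filterbank()
# Apply to a frame's power spectrum: energies = fbank @ power_spectrum
```

Because the filter shapes and spacing are fixed in advance, every design decision (Mel spacing, triangular shape) is baked in by hand, which is exactly the property that learnable front-ends aim to relax.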
no code implementations • 5 Nov 2024 • Hanyu Meng, Jeroen Breebaart, Jeremy Stoddard, Vidhyasaharan Sethu, Eliathamby Ambikairajah
Additionally, we introduce FOA-Conv3D, a novel back-end network for effectively utilising the SSCV feature with a 3D convolutional encoder.
1 code implementation • 31 Jul 2024 • Jingyao Wu, Ting Dang, Vidhyasaharan Sethu, Eliathamby Ambikairajah
Modelling emotion ambiguity has received significant attention in recent years, with advances in representing emotions as distributions to capture that ambiguity.
no code implementations • 18 Jun 2024 • Hanyu Meng, Qiquan Zhang, Xiangyu Zhang, Vidhyasaharan Sethu, Eliathamby Ambikairajah
The remarkable ability of humans to selectively focus on a target speaker in cocktail party scenarios is facilitated by binaural audio processing.
no code implementations • 17 Jun 2024 • Qiquan Zhang, Hongxu Zhu, Xinyuan Qian, Eliathamby Ambikairajah, Haizhou Li
In this paper, we conduct comprehensive experiments to explore the length generalization problem in speech enhancement with Transformer models.
1 code implementation • 21 May 2024 • Xiangyu Zhang, Qiquan Zhang, Hexin Liu, Tianyi Xiao, Xinyuan Qian, Beena Ahmed, Eliathamby Ambikairajah, Haizhou Li, Julien Epps
Moreover, experiments demonstrate the effectiveness of BiMamba as an alternative to the self-attention module in the Transformer and its derivatives, particularly for semantic-aware tasks.
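As a schematic only (not the paper's BiMamba, which uses selective state-space blocks), the bidirectional pattern can be sketched with a placeholder causal recurrence run in both time directions and fused:

```python
import numpy as np

def causal_cell(x, decay=0.9):
    """Placeholder causal recurrence standing in for a Mamba block
    (NOT the actual selective state-space model)."""
    h = np.zeros_like(x[0])
    out = []
    for x_t in x:
        h = decay * h + (1 - decay) * x_t
        out.append(h)
    return np.stack(out)

def bidirectional(x):
    """BiMamba-style pattern: run the cell on the sequence and on its
    time reversal, then fuse the two streams (here by summation)."""
    fwd = causal_cell(x)
    bwd = causal_cell(x[::-1])[::-1]
    return fwd + bwd

T, D = 100, 8
x = np.random.randn(T, D)
y = bidirectional(x)  # shape (T, D); each frame sees past and future context
```

The key point the sketch captures is that each output frame depends on both past and future input, which a purely causal recurrence cannot provide.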
1 code implementation • 10 Apr 2024 • Hanyu Meng, Vidhyasaharan Sethu, Eliathamby Ambikairajah
There is increasing interest in the use of the LEArnable Front-end (LEAF) in a variety of speech processing systems.
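For intuition, LEAF replaces fixed filters with parameterised Gabor filters whose centre frequencies and bandwidths are learned. The sketch below builds a fixed Gabor filterbank in NumPy; the filter count, spacing, and bandwidth rule are illustrative assumptions, not LEAF's learned values.

```python
import numpy as np

def gabor_kernel(centre_freq, bandwidth, length=401, sr=16000):
    """Complex Gabor filter: a sinusoidal carrier windowed by a Gaussian.
    In LEAF-style front-ends centre_freq and bandwidth are learnable;
    here they are fixed inputs for illustration."""
    t = (np.arange(length) - length // 2) / sr
    sigma = 1.0 / (2.0 * np.pi * bandwidth)
    envelope = np.exp(-0.5 * (t / sigma) ** 2)
    carrier = np.exp(2j * np.pi * centre_freq * t)
    return envelope * carrier

# A small bank with log-spaced centre frequencies (illustrative choice)
centre_freqs = np.geomspace(100.0, 7000.0, num=8)
bank = np.stack([gabor_kernel(f, bandwidth=f / 4) for f in centre_freqs])

# Magnitude of the filtered signal gives one time-frequency channel per kernel
signal = np.random.randn(16000)
features = np.stack([np.abs(np.convolve(signal, k, mode="same")) for k in bank])
```

Making `centre_freq` and `bandwidth` trainable parameters (and adding learnable pooling and compression, as LEAF does) turns this fixed bank into a front-end optimised end-to-end with the downstream task.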
no code implementations • 18 Jan 2024 • Qiquan Zhang, Meng Ge, Hongxu Zhu, Eliathamby Ambikairajah, Qi Song, Zhaoheng Ni, Haizhou Li
Transformer architecture has enabled recent progress in speech enhancement.
no code implementations • 10 Aug 2021 • Jingyao Wu, Ting Dang, Vidhyasaharan Sethu, Eliathamby Ambikairajah
We propose a Markovian framework referred to as the Dynamic Ordinal Markov Model (DOMM) that makes use of both absolute and relative ordinal information to improve speech-based ordinal emotion prediction.
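To illustrate the distinction the framework draws (this is a simplification, not the DOMM model itself), the two kinds of ordinal information can be derived from a sequence of frame-level ratings as follows:

```python
import numpy as np

def ordinal_targets(ratings):
    """Two views of ordinal information (illustrative, not DOMM):
    - absolute: each frame's rank bucket among all frames (3 levels here)
    - relative: direction of change between consecutive frames (-1, 0, +1)
    """
    ratings = np.asarray(ratings, dtype=float)
    # Absolute ordinal: quantile bucket (low / mid / high)
    edges = np.quantile(ratings, [1 / 3, 2 / 3])
    absolute = np.digitize(ratings, edges)
    # Relative ordinal: sign of the change between neighbouring frames
    relative = np.sign(np.diff(ratings)).astype(int)
    return absolute, relative

abs_ord, rel_ord = ordinal_targets([0.1, 0.4, 0.4, 0.9, 0.3])
```

Absolute labels say where each frame sits overall, while relative labels say only whether emotion intensity rose, fell, or held between frames; the paper's contribution is a Markov model that exploits both jointly.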
no code implementations • 3 Sep 2019 • Zihan Pan, Yansong Chua, Jibin Wu, Malu Zhang, Haizhou Li, Eliathamby Ambikairajah
The neural encoding scheme, which we call Biologically plausible Auditory Encoding (BAE), emulates the perceptual components of the human auditory system: the cochlear filter bank, the inner hair cells, auditory masking effects from psychoacoustic models, and spike encoding by the auditory nerve.
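The final stage, converting continuous energies into discrete spikes, can be sketched with a generic threshold-crossing encoder. This is a heavily simplified stand-in, not the paper's BAE pipeline:

```python
import numpy as np

def spike_encode(energies, n_levels=8):
    """Generic threshold-crossing spike encoding (a simplification of
    auditory-nerve-style encoding, not the BAE scheme): each channel
    emits a spike whenever its energy crosses one of n_levels evenly
    spaced thresholds from below."""
    lo, hi = energies.min(), energies.max()
    thresholds = np.linspace(lo, hi, n_levels + 2)[1:-1]  # interior levels
    spikes = np.zeros_like(energies, dtype=bool)
    for th in thresholds:
        above = energies >= th
        # Rising edges: below the threshold at t-1, at or above it at t
        rising = above[:, 1:] & ~above[:, :-1]
        spikes[:, 1:] |= rising
    return spikes

# Example: synthetic filterbank energies, shape (channels, frames)
rng = np.random.default_rng(0)
energies = np.abs(rng.standard_normal((4, 50))).cumsum(axis=1) % 5.0
spikes = spike_encode(energies)
```

The resulting sparse binary train is the kind of representation a spiking neural network consumes directly, which is the motivation for encoding audio this way in the first place.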