1 code implementation • 30 May 2025 • Xin Jing, Jiadong Wang, Iosif Tsangko, Andreas Triantafyllopoulos, Björn W. Schuller
We demonstrate the effectiveness of MELT by fine-tuning four self-supervised learning (SSL) backbones and assessing speech emotion recognition performance across emotion datasets.
no code implementations • 15 May 2025 • Changzeng Fu, Zelin Fu, Xinhe Kuang, Jiacheng Dong, Qi Zhang, Kaifeng Su, Yikai Su, Wenbo Shi, Junfeng Yao, Yuliang Zhao, Shiqi Zhao, Jiadong Wang, Siyang Song, Chaoran Liu, Yuichiro Yoshikawa, Björn Schuller, Hiroshi Ishiguro
The Multimodal Personality-aware Depression Detection (MPDD) Challenge aims to address this gap by incorporating multimodal data alongside individual difference factors.
no code implementations • 1 Apr 2025 • Wenxuan Wu, Xueyuan Chen, Shuai Wang, Jiadong Wang, Lingwei Meng, Xixin Wu, Helen Meng, Haizhou Li
Audio-Visual Target Speaker Extraction (AV-TSE) aims to mimic the human ability to enhance auditory perception using visual cues.
1 code implementation • 24 Jul 2024 • Yeying Jin, Xin Li, Jiadong Wang, Yan Zhang, Malu Zhang
There are 5,442 daytime raindrop images and 9,744 nighttime raindrop images.
1 code implementation • 29 Apr 2024 • Ruijie Tao, Xinyuan Qian, Yidi Jiang, Junjie Li, Jiadong Wang, Haizhou Li
To this end, we propose a novel selective auditory attention mechanism, which can suppress interference speakers and non-speech signals to avoid incorrect speaker extraction.
no code implementations • 1 Apr 2024 • Ruijie Tao, Xinyuan Qian, Rohan Kumar Das, Xiaoxue Gao, Jiadong Wang, Haizhou Li
Audio-visual active speaker detection (AV-ASD) aims to identify which visible face is speaking in a scene with one or more persons.
Active Speaker Detection
Audio-Visual Active Speaker Detection
1 code implementation • CVPR 2023 • Jiadong Wang, Xinyuan Qian, Malu Zhang, Robby T. Tan, Haizhou Li
To address the problem, we propose using a lip-reading expert to improve the intelligibility of the generated lip regions by penalizing incorrectly generated results.
no code implementations • 26 Mar 2020 • Malu Zhang, Jiadong Wang, Burin Amornpaisannon, Zhixuan Zhang, VPK Miriyala, Ammar Belatreche, Hong Qu, Jibin Wu, Yansong Chua, Trevor E. Carlson, Haizhou Li
In the STDBP algorithm, the timing of individual spikes is used to convey information (temporal coding), and learning (back-propagation) is performed based on spike timing in an event-driven manner.