no code implementations • 11 Jan 2025 • Fuyuan Feng, Longting Xu, Rohan Kumar Das
Speech enhancement (SE) aims to improve the clarity, intelligibility, and quality of speech signals for various speech-enabled applications.
no code implementations • 15 Nov 2024 • Yang Xiao, Rohan Kumar Das
Transformers and their variants have achieved great success in speech processing.
1 code implementation • 2 Nov 2024 • Han Yin, Yang Xiao, Jisheng Bai, Rohan Kumar Das
Sound Event Detection (SED) is challenging in noisy environments where overlapping sounds obscure target events.
Ranked #1 on Sound Event Detection on WildDESED (using extra training data)
1 code implementation • 20 Sep 2024 • Han Yin, Jisheng Bai, Yang Xiao, Hui Wang, Siqi Zheng, Yafeng Chen, Rohan Kumar Das, Chong Deng, Jianfeng Chen
To address this issue, we propose the text-queried SED (TQ-SED) framework.
no code implementations • 8 Sep 2024 • Yang Xiao, Rohan Kumar Das
We use a Mamba-based model to analyze spatial features from speech signals by fusing time and frequency features, and we develop a sound source localization (SSL) system called TF-Mamba.
no code implementations • 4 Jul 2024 • Yang Xiao, Rohan Kumar Das
This study introduces a progressive neural network (PNN) model for direction of arrival (DOA) estimation, DOA-PNN, which addresses the challenge of catastrophic forgetting when adapting to dynamic acoustic environments.
no code implementations • 4 Jul 2024 • Yang Xiao, Han Yin, Jisheng Bai, Rohan Kumar Das
This work explores domain generalization (DG) for sound event detection (SED), advancing adaptability towards real-world scenarios.
1 code implementation • 4 Jul 2024 • Yang Xiao, Rohan Kumar Das
This work aims to advance sound event detection (SED) research by presenting a new large language model (LLM)-powered dataset, namely wild domestic environment sound event detection (WildDESED).
Ranked #3 on Sound Event Detection on WildDESED
no code implementations • 4 Jul 2024 • Yang Xiao, Rohan Kumar Das
This work explores class-incremental learning (CIL) for sound event detection (SED), advancing adaptability towards real-world scenarios.
no code implementations • 29 Jun 2024 • Yang Xiao, Han Yin, Jisheng Bai, Rohan Kumar Das
Our proposed method shows superior macro-average pAUC and polyphonic SED score performance on the DCASE 2024 Challenge Task 4 validation dataset and public evaluation dataset.
no code implementations • 4 Jun 2024 • Tianchi Liu, Lin Zhang, Rohan Kumar Das, Yi Ma, Ruijie Tao, Haizhou Li
Recent work shows that countermeasures (CMs) trained on partially spoofed audio can effectively detect such spoofing.
1 code implementation • 14 Apr 2024 • Muhammad Saad Saeed, Shah Nawaz, Muhammad Salman Tahir, Rohan Kumar Das, Muhammad Zaigham Zaheer, Marta Moscati, Markus Schedl, Muhammad Haris Khan, Karthik Nandakumar, Muhammad Haroon Yousaf
The Face-voice Association in Multilingual Environments (FAME) Challenge 2024 focuses on exploring face-voice association under the unique condition of a multilingual scenario.
no code implementations • 1 Apr 2024 • Ruijie Tao, Xinyuan Qian, Rohan Kumar Das, Xiaoxue Gao, Jiadong Wang, Haizhou Li
Audio-visual active speaker detection (AV-ASD) aims to identify which visible face is speaking in a scene with one or more persons.
no code implementations • 5 Feb 2024 • Yang Xiao, Rohan Kumar Das
To address this issue, we introduce a novel framework referred to as dual knowledge distillation for developing efficient SED systems in this work.
Ranked #2 on Sound Event Detection on DESED (using extra training data)
no code implementations • 10 Jan 2024 • Jichen Yang, Fangfan Chen, Rohan Kumar Das, Zhengyu Zhu, Shunsi Zhang
In this work, we propose a novel vision transformer referred to as adaptive-avg-pooling based attention vision transformer (AAViT) that uses modules of adaptive average pooling and attention to replace the module of average value computing.
no code implementations • 18 May 2023 • Tanmay Khandelwal, Rohan Kumar Das
Sound event detection (SED) entails identifying the type of sound and estimating its temporal boundaries from acoustic signals.
no code implementations • 25 Apr 2023 • Tanmay Khandelwal, Rohan Kumar Das, Andrew Koh, Eng Siong Chng
Stage-1 of our proposed framework focuses on audio-tagging (AT), which assists the sound event detection (SED) system in Stage-2.
no code implementations • 2 Nov 2022 • Kong Aik Lee, Tomi Kinnunen, Daniele Colibro, Claudio Vair, Andreas Nautsch, Hanwu Sun, Liang He, Tianyu Liang, Qiongqiong Wang, Mickael Rouvier, Pierre-Michel Bousquet, Rohan Kumar Das, Ignacio Viñals Bailo, Meng Liu, Héctor Delgado, Xuechen Liu, Md Sahidullah, Sandro Cumani, Boning Zhang, Koji Okabe, Hitoshi Yamamoto, Ruijie Tao, Haizhou Li, Alfonso Ortega Giménez, Longbiao Wang, Luis Buera
This manuscript describes the I4U submission to the 2020 NIST Speaker Recognition Evaluation (SRE'20) Conversational Telephone Speech (CTS) Challenge.
no code implementations • 27 Oct 2022 • Ruijie Tao, Kong Aik Lee, Rohan Kumar Das, Ville Hautamäki, Haizhou Li
We study a novel neural architecture for a speaker encoder, along with its training strategies, for speaker recognition without using any identity labels.
no code implementations • 3 Feb 2022 • Tianchi Liu, Rohan Kumar Das, Kong Aik Lee, Haizhou Li
The time delay neural network (TDNN) is one of the state-of-the-art neural solutions to text-independent speaker verification.
3 code implementations • 12 Nov 2021 • Rohan Kumar Das, Ruijie Tao, Haizhou Li
This work provides a brief description of the Human Language Technology (HLT) Laboratory, National University of Singapore (NUS) system submission for the 2020 NIST conversational telephone speech (CTS) speaker recognition evaluation (SRE).
1 code implementation • 8 Oct 2021 • Ruijie Tao, Kong Aik Lee, Rohan Kumar Das, Ville Hautamäki, Haizhou Li
In self-supervised learning for speaker recognition, pseudo labels are useful as the supervision signals.
no code implementations • 2 Oct 2021 • Protima Nomo Sudro, Rohan Kumar Das, Rohit Sinha, S. R. Mahadeva Prasanna
The automatic recognition of pathological speech, particularly from children with any articulatory impairment, is a challenging task due to various reasons.
4 code implementations • 14 Jul 2021 • Ruijie Tao, Zexu Pan, Rohan Kumar Das, Xinyuan Qian, Mike Zheng Shou, Haizhou Li
Active speaker detection (ASD) seeks to detect who is speaking in a visual scene of one or more speakers.
1 code implementation • The ActivityNet Large-Scale Activity Recognition Challenge Workshop, CVPR 2021 • Ruijie Tao, Zexu Pan, Rohan Kumar Das, Xinyuan Qian, Mike Zheng Shou, Haizhou Li
Active speaker detection (ASD) seeks to detect who is speaking in a visual scene of one or more speakers.
no code implementations • 20 Aug 2020 • Tianchi Liu, Rohan Kumar Das, Maulik Madhavi, ShengMei Shen, Haizhou Li
The proposed SUDA features an attention mask mechanism to learn the interaction between the speaker and utterance information streams.
no code implementations • 2019 IEEE Automatic Speech Recognition and Understanding Workshop (ASRU) 2020 • Rohan Kumar Das, Jichen Yang, Haizhou Li
In this paper, we summarize the findings from the perspective of long range acoustic and deep features for spoof detection.
no code implementations • 16 Apr 2019 • Kong Aik Lee, Ville Hautamaki, Tomi Kinnunen, Hitoshi Yamamoto, Koji Okabe, Ville Vestman, Jing Huang, Guohong Ding, Hanwu Sun, Anthony Larcher, Rohan Kumar Das, Haizhou Li, Mickael Rouvier, Pierre-Michel Bousquet, Wei Rao, Qing Wang, Chunlei Zhang, Fahimeh Bahmaninezhad, Hector Delgado, Jose Patino, Qiongqiong Wang, Ling Guo, Takafumi Koshinaka, Jiacen Zhang, Koichi Shinoda, Trung Ngo Trong, Md Sahidullah, Fan Lu, Yun Tang, Ming Tu, Kah Kuan Teh, Huy Dat Tran, Kuruvachan K. George, Ivan Kukanov, Florent Desnous, Jichen Yang, Emre Yilmaz, Longting Xu, Jean-Francois Bonastre, Cheng-Lin Xu, Zhi Hao Lim, Eng Siong Chng, Shivesh Ranjan, John H. L. Hansen, Massimiliano Todisco, Nicholas Evans
The I4U consortium was established to facilitate a joint entry to NIST speaker recognition evaluations (SRE).
no code implementations • 17 Sep 2018 • Longting Xu, Rohan Kumar Das, Emre Yilmaz, Jichen Yang, Haizhou Li
Speaker verification (SV) systems using deep neural network embeddings, the so-called x-vector systems, are becoming popular because they outperform i-vector systems.