no code implementations • 18 May 2023 • Tanmay Khandelwal, Rohan Kumar Das
Sound event detection (SED) entails identifying the type of sound and estimating its temporal boundaries from acoustic signals.
no code implementations • 25 Apr 2023 • Tanmay Khandelwal, Rohan Kumar Das, Andrew Koh, Eng Siong Chng
Stage-1 of our proposed framework focuses on audio-tagging (AT), which assists the sound event detection (SED) system in Stage-2.
no code implementations • 2 Nov 2022 • Kong Aik Lee, Tomi Kinnunen, Daniele Colibro, Claudio Vair, Andreas Nautsch, Hanwu Sun, Liang He, Tianyu Liang, Qiongqiong Wang, Mickael Rouvier, Pierre-Michel Bousquet, Rohan Kumar Das, Ignacio Viñals Bailo, Meng Liu, Héctor Delgado, Xuechen Liu, Md Sahidullah, Sandro Cumani, Boning Zhang, Koji Okabe, Hitoshi Yamamoto, Ruijie Tao, Haizhou Li, Alfonso Ortega Giménez, Longbiao Wang, Luis Buera
This manuscript describes the I4U submission to the 2020 NIST Speaker Recognition Evaluation (SRE'20) Conversational Telephone Speech (CTS) Challenge.
no code implementations • 27 Oct 2022 • Ruijie Tao, Kong Aik Lee, Rohan Kumar Das, Ville Hautamäki, Haizhou Li
We study a novel neural architecture and training strategies for a speaker encoder that performs speaker recognition without using any identity labels.
no code implementations • 3 Feb 2022 • Tianchi Liu, Rohan Kumar Das, Kong Aik Lee, Haizhou Li
The time delay neural network (TDNN) is one of the state-of-the-art neural solutions for text-independent speaker verification.
3 code implementations • 12 Nov 2021 • Rohan Kumar Das, Ruijie Tao, Haizhou Li
This work provides a brief description of the Human Language Technology (HLT) Laboratory, National University of Singapore (NUS) system submission for the 2020 NIST conversational telephone speech (CTS) speaker recognition evaluation (SRE).
1 code implementation • 8 Oct 2021 • Ruijie Tao, Kong Aik Lee, Rohan Kumar Das, Ville Hautamäki, Haizhou Li
In self-supervised learning for speaker recognition, pseudo labels are useful as the supervision signals.
no code implementations • 2 Oct 2021 • Protima Nomo Sudro, Rohan Kumar Das, Rohit Sinha, S. R. Mahadeva Prasanna
The automatic recognition of pathological speech, particularly from children with any articulatory impairment, is a challenging task due to various reasons.
4 code implementations • 14 Jul 2021 • Ruijie Tao, Zexu Pan, Rohan Kumar Das, Xinyuan Qian, Mike Zheng Shou, Haizhou Li
Active speaker detection (ASD) seeks to detect who is speaking in a visual scene of one or more speakers.
1 code implementation • The ActivityNet Large-Scale Activity Recognition Challenge Workshop, CVPR 2021 • Ruijie Tao, Zexu Pan, Rohan Kumar Das, Xinyuan Qian, Mike Zheng Shou, Haizhou Li
Active speaker detection (ASD) seeks to detect who is speaking in a visual scene of one or more speakers.
no code implementations • 20 Aug 2020 • Tianchi Liu, Rohan Kumar Das, Maulik Madhavi, ShengMei Shen, Haizhou Li
The proposed SUDA features an attention mask mechanism to learn the interaction between the speaker and utterance information streams.
no code implementations • 2019 IEEE Automatic Speech Recognition and Understanding Workshop (ASRU) 2020 • Rohan Kumar Das, Jichen Yang, Haizhou Li
In this paper, we summarize the findings from the perspective of long range acoustic and deep features for spoof detection.
no code implementations • 16 Apr 2019 • Kong Aik Lee, Ville Hautamaki, Tomi Kinnunen, Hitoshi Yamamoto, Koji Okabe, Ville Vestman, Jing Huang, Guohong Ding, Hanwu Sun, Anthony Larcher, Rohan Kumar Das, Haizhou Li, Mickael Rouvier, Pierre-Michel Bousquet, Wei Rao, Qing Wang, Chunlei Zhang, Fahimeh Bahmaninezhad, Hector Delgado, Jose Patino, Qiongqiong Wang, Ling Guo, Takafumi Koshinaka, Jiacen Zhang, Koichi Shinoda, Trung Ngo Trong, Md Sahidullah, Fan Lu, Yun Tang, Ming Tu, Kah Kuan Teh, Huy Dat Tran, Kuruvachan K. George, Ivan Kukanov, Florent Desnous, Jichen Yang, Emre Yilmaz, Longting Xu, Jean-Francois Bonastre, Cheng-Lin Xu, Zhi Hao Lim, Eng Siong Chng, Shivesh Ranjan, John H. L. Hansen, Massimiliano Todisco, Nicholas Evans
The I4U consortium was established to facilitate a joint entry to NIST speaker recognition evaluations (SRE).
no code implementations • 17 Sep 2018 • Longting Xu, Rohan Kumar Das, Emre Yilmaz, Jichen Yang, Haizhou Li
Speaker verification (SV) systems using deep neural network embeddings, the so-called x-vector systems, are becoming popular because they outperform i-vector systems.