Search Results for author: Rohan Kumar Das

Found 29 papers, 8 papers with code

Multi-modal Speech Enhancement with Limited Electromyography Channels

no code implementations 11 Jan 2025 Fuyuan Feng, Longting Xu, Rohan Kumar Das

Speech enhancement (SE) aims to improve the clarity, intelligibility, and quality of speech signals for various speech-enabled applications.

Leveraging LLM and Text-Queried Separation for Noise-Robust Sound Event Detection

1 code implementation 2 Nov 2024 Han Yin, Yang Xiao, Jisheng Bai, Rohan Kumar Das

Sound Event Detection (SED) is challenging in noisy environments where overlapping sounds obscure target events.

Ranked #1 on Sound Event Detection on WildDESED (using extra training data)

Audio Source Separation, Event Detection, +1

TF-Mamba: A Time-Frequency Network for Sound Source Localization

no code implementations 8 Sep 2024 Yang Xiao, Rohan Kumar Das

We consider a Mamba-based model to analyze spatial features from speech signals by fusing both time and frequency features, and we develop a sound source localization (SSL) system called TF-Mamba.

Mamba, Sound Source Localization, +1

Configurable DOA Estimation using Incremental Learning

no code implementations 4 Jul 2024 Yang Xiao, Rohan Kumar Das

This study introduces a progressive neural network (PNN) model for direction of arrival (DOA) estimation, DOA-PNN, addressing the challenge of catastrophic forgetting when adapting to dynamic acoustic environments.

Continual Learning, Incremental Learning

Mixstyle based Domain Generalization for Sound Event Detection with Heterogeneous Training Data

no code implementations 4 Jul 2024 Yang Xiao, Han Yin, Jisheng Bai, Rohan Kumar Das

This work explores domain generalization (DG) for sound event detection (SED), advancing adaptability towards real-world scenarios.

Domain Generalization, Event Detection, +1

WildDESED: An LLM-Powered Dataset for Wild Domestic Environment Sound Event Detection System

1 code implementation 4 Jul 2024 Yang Xiao, Rohan Kumar Das

This work aims to advance sound event detection (SED) research by presenting a new large language model (LLM)-powered dataset, namely wild domestic environment sound event detection (WildDESED).

Event Detection, Language Modeling, +3

UCIL: An Unsupervised Class Incremental Learning Approach for Sound Event Detection

no code implementations 4 Jul 2024 Yang Xiao, Rohan Kumar Das

This work explores class-incremental learning (CIL) for sound event detection (SED), advancing adaptability towards real-world scenarios.

class-incremental learning, Class Incremental Learning, +4

FMSG-JLESS Submission for DCASE 2024 Task4 on Sound Event Detection with Heterogeneous Training Dataset and Potentially Missing Labels

no code implementations 29 Jun 2024 Yang Xiao, Han Yin, Jisheng Bai, Rohan Kumar Das

Our proposed method shows superior macro-average pAUC and polyphonic SED score performance on the DCASE 2024 Challenge Task 4 validation dataset and public evaluation dataset.

Domain Generalization, Event Detection, +2

How Do Neural Spoofing Countermeasures Detect Partially Spoofed Audio?

no code implementations 4 Jun 2024 Tianchi Liu, Lin Zhang, Rohan Kumar Das, Yi Ma, Ruijie Tao, Haizhou Li

Recent work shows that countermeasures (CMs) trained on partially spoofed audio can effectively detect such spoofing.

Decision Making, Sentence

Face-voice Association in Multilingual Environments (FAME) Challenge 2024 Evaluation Plan

1 code implementation 14 Apr 2024 Muhammad Saad Saeed, Shah Nawaz, Muhammad Salman Tahir, Rohan Kumar Das, Muhammad Zaigham Zaheer, Marta Moscati, Markus Schedl, Muhammad Haris Khan, Karthik Nandakumar, Muhammad Haroon Yousaf

The Face-voice Association in Multilingual Environments (FAME) Challenge 2024 focuses on exploring face-voice association under the unique condition of a multilingual scenario.

Dual Knowledge Distillation for Efficient Sound Event Detection

no code implementations 5 Feb 2024 Yang Xiao, Rohan Kumar Das

To address this issue, we introduce a novel framework referred to as dual knowledge distillation for developing efficient SED systems in this work.

Ranked #2 on Sound Event Detection on DESED (using extra training data)

Event Detection, Knowledge Distillation, +1

Adaptive-avg-pooling based Attention Vision Transformer for Face Anti-spoofing

no code implementations 10 Jan 2024 Jichen Yang, Fangfan Chen, Rohan Kumar Das, Zhengyu Zhu, Shunsi Zhang

In this work, we propose a novel vision transformer referred to as adaptive-avg-pooling based attention vision transformer (AAViT) that uses modules of adaptive average pooling and attention to replace the module of average value computing.

Avg, Face Anti-Spoofing

A Multi-Task Learning Framework for Sound Event Detection using High-level Acoustic Characteristics of Sounds

no code implementations 18 May 2023 Tanmay Khandelwal, Rohan Kumar Das

Sound event detection (SED) entails identifying the type of sound and estimating its temporal boundaries from acoustic signals.

Event Detection, Multi-Task Learning, +1

Self-Supervised Training of Speaker Encoder with Multi-Modal Diverse Positive Pairs

no code implementations 27 Oct 2022 Ruijie Tao, Kong Aik Lee, Rohan Kumar Das, Ville Hautamäki, Haizhou Li

We study a novel neural architecture and training strategies for a speaker encoder for speaker recognition without using any identity labels.

Contrastive Learning, Self-Supervised Learning, +1

MFA: TDNN with Multi-scale Frequency-channel Attention for Text-independent Speaker Verification with Short Utterances

no code implementations 3 Feb 2022 Tianchi Liu, Rohan Kumar Das, Kong Aik Lee, Haizhou Li

The time delay neural network (TDNN) represents one of the state-of-the-art neural solutions to text-independent speaker verification.

Text-Independent Speaker Verification

HLT-NUS SUBMISSION FOR 2020 NIST Conversational Telephone Speech SRE

3 code implementations 12 Nov 2021 Rohan Kumar Das, Ruijie Tao, Haizhou Li

This work provides a brief description of the Human Language Technology (HLT) Laboratory, National University of Singapore (NUS) system submission for the 2020 NIST conversational telephone speech (CTS) speaker recognition evaluation (SRE).

Domain Adaptation, Speaker Recognition

Significance of Data Augmentation for Improving Cleft Lip and Palate Speech Recognition

no code implementations 2 Oct 2021 Protima Nomo Sudro, Rohan Kumar Das, Rohit Sinha, S. R. Mahadeva Prasanna

The automatic recognition of pathological speech, particularly from children with any articulatory impairment, is a challenging task due to various reasons.

Data Augmentation, speech-recognition, +1

Speaker-Utterance Dual Attention for Speaker and Utterance Verification

no code implementations 20 Aug 2020 Tianchi Liu, Rohan Kumar Das, Maulik Madhavi, ShengMei Shen, Haizhou Li

The proposed SUDA features an attention mask mechanism to learn the interaction between the speaker and utterance information streams.

Speaker Verification

Generative x-vectors for text-independent speaker verification

no code implementations 17 Sep 2018 Longting Xu, Rohan Kumar Das, Emre Yilmaz, Jichen Yang, Haizhou Li

Speaker verification (SV) systems using deep neural network embeddings, the so-called x-vector systems, are becoming popular due to their superior performance over i-vector systems.

Text-Independent Speaker Verification
