Search Results for author: Desh Raj

Found 22 papers, 9 papers with code

On Speaker Attribution with SURT

1 code implementation · 28 Jan 2024 · Desh Raj, Matthew Wiesner, Matthew Maciejewski, Leibny Paola Garcia-Perera, Daniel Povey, Sanjeev Khudanpur

The Streaming Unmixing and Recognition Transducer (SURT) has recently become a popular framework for continuous, streaming, multi-talker speech recognition (ASR).

speech-recognition, Speech Recognition

Updated Corpora and Benchmarks for Long-Form Speech Recognition

1 code implementation · 26 Sep 2023 · Jennifer Drexler Fox, Desh Raj, Natalie Delworth, Quinn McNamara, Corey Miller, Migüel Jetté

The vast majority of ASR research uses corpora in which both the training and test data have been pre-segmented into utterances.

speech-recognition, Speech Recognition

Training dynamic models using early exits for automatic speech recognition on resource-constrained devices

1 code implementation · 18 Sep 2023 · George August Wright, Umberto Cappellazzo, Salah Zaiem, Desh Raj, Lucas Ondel Yang, Daniele Falavigna, Mohamed Nabih Ali, Alessio Brutti

In self-attention models for automatic speech recognition (ASR), early-exit architectures enable the development of dynamic models capable of adapting their size and architecture to varying levels of computational resources and ASR performance demands.

Automatic Speech Recognition, Automatic Speech Recognition (ASR), +2

SURT 2.0: Advances in Transducer-based Multi-talker Speech Recognition

1 code implementation · 18 Jun 2023 · Desh Raj, Daniel Povey, Sanjeev Khudanpur

The Streaming Unmixing and Recognition Transducer (SURT) model was proposed recently as an end-to-end approach for continuous, streaming, multi-talker speech recognition (ASR).

Domain Adaptation, speech-recognition, +1

GPU-accelerated Guided Source Separation for Meeting Transcription

2 code implementations · 10 Dec 2022 · Desh Raj, Daniel Povey, Sanjeev Khudanpur

In this paper, we describe our improved implementation of GSS that leverages the power of modern GPU-based pipelines, including batched processing of frequencies and segments, to provide 300x speed-up over CPU-based inference.

blind source separation, Target Speaker Extraction

Adapting self-supervised models to multi-talker speech recognition using speaker embeddings

no code implementations · 1 Nov 2022 · Zili Huang, Desh Raj, Paola García, Sanjeev Khudanpur

Self-supervised learning (SSL) methods which learn representations of data without explicit supervision have gained popularity in speech-processing tasks, particularly for single-talker applications.

Automatic Speech Recognition, Automatic Speech Recognition (ASR), +3

Anchored Speech Recognition with Neural Transducers

no code implementations · 20 Oct 2022 · Desh Raj, Junteng Jia, Jay Mahadeokar, Chunyang Wu, Niko Moritz, Xiaohui Zhang, Ozlem Kalinli

In this paper, we investigate anchored speech recognition to make neural transducers robust to background speech.

speech-recognition, Speech Recognition

Injecting Text and Cross-lingual Supervision in Few-shot Learning from Self-Supervised Models

no code implementations · 10 Oct 2021 · Matthew Wiesner, Desh Raj, Sanjeev Khudanpur

Self-supervised model pre-training has recently garnered significant interest, but relatively few efforts have explored using additional resources in fine-tuning these models.

Few-Shot Learning

Continuous Streaming Multi-Talker ASR with Dual-path Transducers

no code implementations · 17 Sep 2021 · Desh Raj, Liang Lu, Zhuo Chen, Yashesh Gaur, Jinyu Li

Streaming recognition of multi-talker conversations has so far been evaluated only for 2-speaker single-turn sessions.

Speech Separation

Target-speaker Voice Activity Detection with Improved I-Vector Estimation for Unknown Number of Speaker

no code implementations · 7 Aug 2021 · Maokui He, Desh Raj, Zili Huang, Jun Du, Zhuo Chen, Shinji Watanabe

Target-speaker voice activity detection (TS-VAD) has recently shown promising results for speaker diarization on highly overlapped speech.

Action Detection, Activity Detection, +3

Reformulating DOVER-Lap Label Mapping as a Graph Partitioning Problem

1 code implementation · 5 Apr 2021 · Desh Raj, Sanjeev Khudanpur

We also derive an approximation bound for the algorithm in terms of the maximum number of hypotheses speakers.

graph partitioning, speaker-diarization, +1

DOVER-Lap: A Method for Combining Overlap-aware Diarization Outputs

1 code implementation · 3 Nov 2020 · Desh Raj, Leibny Paola Garcia-Perera, Zili Huang, Shinji Watanabe, Daniel Povey, Andreas Stolcke, Sanjeev Khudanpur

Several advances have been made recently towards handling overlapping speech for speaker diarization.

Audio and Speech Processing, Sound

Sequential Multi-Frame Neural Beamforming for Speech Separation and Enhancement

no code implementations · 18 Nov 2019 · Zhong-Qiu Wang, Hakan Erdogan, Scott Wisdom, Kevin Wilson, Desh Raj, Shinji Watanabe, Zhuo Chen, John R. Hershey

This work introduces sequential neural beamforming, which alternates between neural network based spectral separation and beamforming based spatial separation.

Speaker Separation, Speech Enhancement, +3

Probing the Information Encoded in X-vectors

no code implementations · 13 Sep 2019 · Desh Raj, David Snyder, Daniel Povey, Sanjeev Khudanpur

Deep neural network based speaker embeddings, such as x-vectors, have been shown to perform well in text-independent speaker recognition/verification tasks.

Data Augmentation, Sentence, +3
