Search Results for author: Desh Raj

Found 22 papers, 9 papers with code

Listening to Multi-talker Conversations: Modular and End-to-end Perspectives

no code implementations • 14 Feb 2024 • Desh Raj

For this, we describe the Streaming Unmixing and Recognition Transducer (SURT).

speaker-diarization Speaker Diarization +2

On Speaker Attribution with SURT

1 code implementation • 28 Jan 2024 • Desh Raj, Matthew Wiesner, Matthew Maciejewski, Leibny Paola Garcia-Perera, Daniel Povey, Sanjeev Khudanpur

The Streaming Unmixing and Recognition Transducer (SURT) has recently become a popular framework for continuous, streaming, multi-talker speech recognition (ASR).

speech-recognition Speech Recognition

Updated Corpora and Benchmarks for Long-Form Speech Recognition

1 code implementation • 26 Sep 2023 • Jennifer Drexler Fox, Desh Raj, Natalie Delworth, Quinn McNamara, Corey Miller, Migüel Jetté

The vast majority of ASR research uses corpora in which both the training and test data have been pre-segmented into utterances.

speech-recognition Speech Recognition

Learning from Flawed Data: Weakly Supervised Automatic Speech Recognition

1 code implementation • 26 Sep 2023 • Dongji Gao, Hainan Xu, Desh Raj, Leibny Paola Garcia Perera, Daniel Povey, Sanjeev Khudanpur

Training automatic speech recognition (ASR) systems requires large amounts of well-curated paired data.

Automatic Speech Recognition Automatic Speech Recognition (ASR) +1

Training dynamic models using early exits for automatic speech recognition on resource-constrained devices

1 code implementation • 18 Sep 2023 • George August Wright, Umberto Cappellazzo, Salah Zaiem, Desh Raj, Lucas Ondel Yang, Daniele Falavigna, Mohamed Nabih Ali, Alessio Brutti

In self-attention models for automatic speech recognition (ASR), early-exit architectures enable the development of dynamic models capable of adapting their size and architecture to varying levels of computational resources and ASR performance demands.

Automatic Speech Recognition Automatic Speech Recognition (ASR) +2

The CHiME-7 DASR Challenge: Distant Meeting Transcription with Multiple Devices in Diverse Scenarios

no code implementations • 23 Jun 2023 • Samuele Cornell, Matthew Wiesner, Shinji Watanabe, Desh Raj, Xuankai Chang, Paola Garcia, Matthew Maciejewski, Yoshiki Masuyama, Zhong-Qiu Wang, Stefano Squartini, Sanjeev Khudanpur

The CHiME challenges have played a significant role in the development and evaluation of robust automatic speech recognition (ASR) systems.

Automatic Speech Recognition Automatic Speech Recognition (ASR) +2

SURT 2.0: Advances in Transducer-based Multi-talker Speech Recognition

1 code implementation • 18 Jun 2023 • Desh Raj, Daniel Povey, Sanjeev Khudanpur

The Streaming Unmixing and Recognition Transducer (SURT) model was proposed recently as an end-to-end approach for continuous, streaming, multi-talker speech recognition (ASR).

Domain Adaptation speech-recognition +1

GPU-accelerated Guided Source Separation for Meeting Transcription

2 code implementations • 10 Dec 2022 • Desh Raj, Daniel Povey, Sanjeev Khudanpur

In this paper, we describe our improved implementation of GSS that leverages the power of modern GPU-based pipelines, including batched processing of frequencies and segments, to provide 300x speed-up over CPU-based inference.

Ranked #2 on Speech Recognition on LibriCSS

blind source separation Target Speaker Extraction

Adapting self-supervised models to multi-talker speech recognition using speaker embeddings

no code implementations • 1 Nov 2022 • Zili Huang, Desh Raj, Paola García, Sanjeev Khudanpur

Self-supervised learning (SSL) methods which learn representations of data without explicit supervision have gained popularity in speech-processing tasks, particularly for single-talker applications.

Automatic Speech Recognition Automatic Speech Recognition (ASR) +3

Anchored Speech Recognition with Neural Transducers

no code implementations • 20 Oct 2022 • Desh Raj, Junteng Jia, Jay Mahadeokar, Chunyang Wu, Niko Moritz, Xiaohui Zhang, Ozlem Kalinli

In this paper, we investigate anchored speech recognition to make neural transducers robust to background speech.

speech-recognition Speech Recognition

Low-Latency Speech Separation Guided Diarization for Telephone Conversations

1 code implementation • 5 Apr 2022 • Giovanni Morrone, Samuele Cornell, Desh Raj, Luca Serafini, Enrico Zovato, Alessio Brutti, Stefano Squartini

In particular, we compare two low-latency speech separation models.

Action Detection Activity Detection +5

Injecting Text and Cross-lingual Supervision in Few-shot Learning from Self-Supervised Models

no code implementations • 10 Oct 2021 • Matthew Wiesner, Desh Raj, Sanjeev Khudanpur

Self-supervised model pre-training has recently garnered significant interest, but relatively few efforts have explored using additional resources in fine-tuning these models.

Few-Shot Learning

Continuous Streaming Multi-Talker ASR with Dual-path Transducers

no code implementations • 17 Sep 2021 • Desh Raj, Liang Lu, Zhuo Chen, Yashesh Gaur, Jinyu Li

Streaming recognition of multi-talker conversations has so far been evaluated only for 2-speaker single-turn sessions.

Speech Separation

Target-speaker Voice Activity Detection with Improved I-Vector Estimation for Unknown Number of Speaker

no code implementations • 7 Aug 2021 • Maokui He, Desh Raj, Zili Huang, Jun Du, Zhuo Chen, Shinji Watanabe

Target-speaker voice activity detection (TS-VAD) has recently shown promising results for speaker diarization on highly overlapped speech.

Action Detection Activity Detection +3

Reformulating DOVER-Lap Label Mapping as a Graph Partitioning Problem

1 code implementation • 5 Apr 2021 • Desh Raj, Sanjeev Khudanpur

We also derive an approximation bound for the algorithm in terms of the maximum number of hypotheses speakers.

graph partitioning speaker-diarization +1

The Hitachi-JHU DIHARD III System: Competitive End-to-End Neural Diarization and X-Vector Clustering Systems Combined by DOVER-Lap

no code implementations • 2 Feb 2021 • Shota Horiguchi, Nelson Yalta, Paola Garcia, Yuki Takashima, Yawen Xue, Desh Raj, Zili Huang, Yusuke Fujita, Shinji Watanabe, Sanjeev Khudanpur

This paper provides a detailed description of the Hitachi-JHU system that was submitted to the Third DIHARD Speech Diarization Challenge.

Clustering

Integration of speech separation, diarization, and recognition for multi-speaker meetings: System description, comparison, and analysis

no code implementations • 3 Nov 2020 • Desh Raj, Pavel Denisov, Zhuo Chen, Hakan Erdogan, Zili Huang, Maokui He, Shinji Watanabe, Jun Du, Takuya Yoshioka, Yi Luo, Naoyuki Kanda, Jinyu Li, Scott Wisdom, John R. Hershey

Multi-speaker speech recognition of unsegmented recordings has diverse applications such as meeting transcription and automatic subtitle generation.

Automatic Speech Recognition Automatic Speech Recognition (ASR) +4

DOVER-Lap: A Method for Combining Overlap-aware Diarization Outputs

1 code implementation • 3 Nov 2020 • Desh Raj, Leibny Paola Garcia-Perera, Zili Huang, Shinji Watanabe, Daniel Povey, Andreas Stolcke, Sanjeev Khudanpur

Several advances have been made recently towards handling overlapping speech for speaker diarization.

Audio and Speech Processing Sound

CHiME-6 Challenge: Tackling Multispeaker Speech Recognition for Unsegmented Recordings

no code implementations • 20 Apr 2020 • Shinji Watanabe, Michael Mandel, Jon Barker, Emmanuel Vincent, Ashish Arora, Xuankai Chang, Sanjeev Khudanpur, Vimal Manohar, Daniel Povey, Desh Raj, David Snyder, Aswin Shanmugam Subramanian, Jan Trmal, Bar Ben Yair, Christoph Boeddeker, Zhaoheng Ni, Yusuke Fujita, Shota Horiguchi, Naoyuki Kanda, Takuya Yoshioka, Neville Ryant

Following the success of the 1st, 2nd, 3rd, 4th and 5th CHiME challenges we organize the 6th CHiME Speech Separation and Recognition Challenge (CHiME-6).

speaker-diarization Speaker Diarization +4

Sequential Multi-Frame Neural Beamforming for Speech Separation and Enhancement

no code implementations • 18 Nov 2019 • Zhong-Qiu Wang, Hakan Erdogan, Scott Wisdom, Kevin Wilson, Desh Raj, Shinji Watanabe, Zhuo Chen, John R. Hershey

This work introduces sequential neural beamforming, which alternates between neural network based spectral separation and beamforming based spatial separation.

Speaker Separation Speech Enhancement +3

Probing the Information Encoded in X-vectors

no code implementations • 13 Sep 2019 • Desh Raj, David Snyder, Daniel Povey, Sanjeev Khudanpur

Deep neural network based speaker embeddings, such as x-vectors, have been shown to perform well in text-independent speaker recognition/verification tasks.

Data Augmentation Sentence +3

Learning local and global contexts using a convolutional recurrent network model for relation classification in biomedical text

no code implementations • CONLL 2017 • Desh Raj, Sunil Sahu, Ashish Anand

Classification Dependency Parsing +9
