Search Results for author: Krishna C. Puvvada

Found 6 papers, 2 papers with code

Conformer-based Target-Speaker Automatic Speech Recognition for Single-Channel Audio

2 code implementations • 9 Aug 2023 • Yang Zhang, Krishna C. Puvvada, Vitaly Lavrukhin, Boris Ginsburg

We propose CONF-TSASR, a non-autoregressive end-to-end time-frequency domain architecture for single-channel target-speaker automatic speech recognition (TS-ASR).

Automatic Speech Recognition, speech-recognition +1

SALM: Speech-augmented Language Model with In-context Learning for Speech Recognition and Translation

1 code implementation • 13 Oct 2023 • Zhehuai Chen, He Huang, Andrei Andrusenko, Oleksii Hrinchuk, Krishna C. Puvvada, Jason Li, Subhankar Ghosh, Jagadeesh Balam, Boris Ginsburg

We present a novel Speech Augmented Language Model (SALM) with multitask and in-context learning capabilities.

Automatic Speech Recognition, Automatic Speech Recognition (ASR) +3

Few-shot acoustic event detection via meta-learning

no code implementations • 21 Feb 2020 • Bowen Shi, Ming Sun, Krishna C. Puvvada, Chieh-Chi Kao, Spyros Matsoukas, Chao Wang

We study few-shot acoustic event detection (AED) in this paper.

Event Detection, Few-Shot Learning

Accidental Learners: Spoken Language Identification in Multilingual Self-Supervised Models

no code implementations • 9 Nov 2022 • Travis M. Bartley, Fei Jia, Krishna C. Puvvada, Samuel Kriman, Boris Ginsburg

In this paper, we extend previous self-supervised approaches for language identification by experimenting with a Conformer-based architecture in a multilingual pre-training paradigm.

Language Identification, Spoken language identification

Discrete Audio Representation as an Alternative to Mel-Spectrograms for Speaker and Speech Recognition

no code implementations • 19 Sep 2023 • Krishna C. Puvvada, Nithin Rao Koluguri, Kunal Dhawan, Jagadeesh Balam, Boris Ginsburg

Discrete audio representation, also known as audio tokenization, has seen renewed interest driven by its potential to facilitate the application of text language modeling approaches in the audio domain.

Language Modelling, Quantization +4

The CHiME-7 Challenge: System Description and Performance of NeMo Team's DASR System

no code implementations • 18 Oct 2023 • Tae Jin Park, He Huang, Ante Jukic, Kunal Dhawan, Krishna C. Puvvada, Nithin Koluguri, Nikolay Karpov, Aleksandr Laptev, Jagadeesh Balam, Boris Ginsburg

We present the NVIDIA NeMo team's multi-channel speech recognition system for the 7th CHiME Challenge Distant Automatic Speech Recognition (DASR) Task, focusing on the development of a multi-channel, multi-speaker speech recognition system tailored to transcribe speech from distributed microphones and microphone arrays.

Automatic Speech Recognition, speaker-diarization +3
