Search Results for author: Sanjeev Khudanpur

Found 46 papers, 11 papers with code

Learning Curricula for Multilingual Neural Machine Translation Training

no code implementations MTSummit 2021 Gaurav Kumar, Philipp Koehn, Sanjeev Khudanpur

Low-resource Multilingual Neural Machine Translation (MNMT) is typically tasked with improving the translation performance on one or more language pairs with the aid of high-resource language pairs.

Machine Translation Translation

Defense against Adversarial Attacks on Hybrid Speech Recognition using Joint Adversarial Fine-tuning with Denoiser

no code implementations 8 Apr 2022 Sonal Joshi, Saurabh Kataria, Yiwen Shao, Piotr Zelasko, Jesus Villalba, Sanjeev Khudanpur, Najim Dehak

We propose three defenses: a denoiser pre-processor, adversarial fine-tuning of the ASR model, and adversarial fine-tuning of a joint model of ASR and denoiser.

Automatic Speech Recognition

PHO-LID: A Unified Model Incorporating Acoustic-Phonetic and Phonotactic Information for Language Identification

1 code implementation 23 Mar 2022 Hexin Liu, Leibny Paola Garcia Perera, Andy W. H. Khong, Suzy J. Styles, Sanjeev Khudanpur

We propose a novel model to hierarchically incorporate phoneme and phonotactic information for language identification (LID) without requiring phoneme annotations for training.

Language Identification

Enhance Language Identification using Dual-mode Model with Knowledge Distillation

no code implementations 7 Mar 2022 Hexin Liu, Leibny Paola Garcia Perera, Andy W. H. Khong, Justin Dauwels, Suzy J. Styles, Sanjeev Khudanpur

In this paper, we propose to employ a dual-mode framework on the x-vector self-attention (XSA-LID) model with knowledge distillation (KD) to enhance its language identification (LID) performance for both long and short utterances.

Knowledge Distillation Language Identification

Injecting Text and Cross-lingual Supervision in Few-shot Learning from Self-Supervised Models

no code implementations 10 Oct 2021 Matthew Wiesner, Desh Raj, Sanjeev Khudanpur

Self-supervised model pre-training has recently garnered significant interest, but relatively few efforts have explored using additional resources in fine-tuning these models.

Few-Shot Learning

GigaSpeech: An Evolving, Multi-domain ASR Corpus with 10,000 Hours of Transcribed Audio

1 code implementation 13 Jun 2021 Guoguo Chen, Shuzhou Chai, Guanbo Wang, Jiayu Du, Wei-Qiang Zhang, Chao Weng, Dan Su, Daniel Povey, Jan Trmal, Junbo Zhang, Mingjie Jin, Sanjeev Khudanpur, Shinji Watanabe, Shuaijiang Zhao, Wei Zou, Xiangang Li, Xuchen Yao, Yongqing Wang, Yujun Wang, Zhao You, Zhiyong Yan

This paper introduces GigaSpeech, an evolving, multi-domain English speech recognition corpus with 10,000 hours of high-quality labeled audio suitable for supervised training, and 40,000 hours of total audio suitable for semi-supervised and unsupervised training.

Speech Recognition

Reformulating DOVER-Lap Label Mapping as a Graph Partitioning Problem

1 code implementation 5 Apr 2021 Desh Raj, Sanjeev Khudanpur

We also derive an approximation bound for the algorithm in terms of the maximum number of hypothesis speakers.

graph partitioning Speaker Diarization

Adversarial Attacks and Defenses for Speech Recognition Systems

no code implementations 31 Mar 2021 Piotr Żelasko, Sonal Joshi, Yiwen Shao, Jesus Villalba, Jan Trmal, Najim Dehak, Sanjeev Khudanpur

We investigate two threat models: a denial-of-service scenario where fast gradient-sign method (FGSM) or weak projected gradient descent (PGD) attacks are used to degrade the model's word error rate (WER); and a targeted scenario where a more potent imperceptible attack forces the system to recognize a specific phrase.

Adversarial Robustness Automatic Speech Recognition

Learning Policies for Multilingual Training of Neural Machine Translation Systems

no code implementations 11 Mar 2021 Gaurav Kumar, Philipp Koehn, Sanjeev Khudanpur

Low-resource Multilingual Neural Machine Translation (MNMT) is typically tasked with improving the translation performance on one or more language pairs with the aid of high-resource language pairs.

Machine Translation Translation

Learning Feature Weights using Reward Modeling for Denoising Parallel Corpora

no code implementations WMT (EMNLP) 2021 Gaurav Kumar, Philipp Koehn, Sanjeev Khudanpur

These feature weights, which are optimized directly for the task of improving translation performance, are used to score and filter sentences in the noisy corpora more effectively.

Denoising Language Modelling +2

A Parallelizable Lattice Rescoring Strategy with Neural Language Models

1 code implementation 8 Mar 2021 Ke Li, Daniel Povey, Sanjeev Khudanpur

This paper proposes a parallel computation strategy and a posterior-based lattice expansion algorithm for efficient lattice rescoring with neural language models (LMs) for automatic speech recognition.

Automatic Speech Recognition

Wake Word Detection with Streaming Transformers

no code implementations 8 Feb 2021 Yiming Wang, Hang Lv, Daniel Povey, Lei Xie, Sanjeev Khudanpur

Modern wake word detection systems usually rely on neural networks for acoustic modeling.

Fine-grained activity recognition for assembly videos

no code implementations 2 Dec 2020 Jonathan D. Jones, Cathryn Cortesa, Amy Shelton, Barbara Landau, Sanjeev Khudanpur, Gregory D. Hager

In this paper we address the task of recognizing assembly actions as a structure (e.g. a piece of furniture or a toy block tower) is built up from a set of primitive objects.

Action Recognition

DOVER-Lap: A Method for Combining Overlap-aware Diarization Outputs

1 code implementation 3 Nov 2020 Desh Raj, Leibny Paola Garcia-Perera, Zili Huang, Shinji Watanabe, Daniel Povey, Andreas Stolcke, Sanjeev Khudanpur

Several advances have been made recently towards handling overlapping speech for speaker diarization.

Audio and Speech Processing Sound

Efficient MDI Adaptation for n-gram Language Models

no code implementations 5 Aug 2020 Ruizhe Huang, Ke Li, Ashish Arora, Dan Povey, Sanjeev Khudanpur

This paper presents an efficient algorithm for n-gram language model adaptation under the minimum discrimination information (MDI) principle, where an out-of-domain language model is adapted to satisfy the constraints of marginal probabilities of the in-domain data.

Language Modelling

PyChain: A Fully Parallelized PyTorch Implementation of LF-MMI for End-to-End ASR

1 code implementation 20 May 2020 Yiwen Shao, Yiming Wang, Daniel Povey, Sanjeev Khudanpur

We present PyChain, a fully parallelized PyTorch implementation of end-to-end lattice-free maximum mutual information (LF-MMI) training for the so-called chain models in the Kaldi automatic speech recognition (ASR) toolkit.

Automatic Speech Recognition

Wake Word Detection with Alignment-Free Lattice-Free MMI

1 code implementation 17 May 2020 Yiming Wang, Hang Lv, Daniel Povey, Lei Xie, Sanjeev Khudanpur

Always-on spoken language interfaces, e.g. personal digital assistants, rely on a wake word to start processing spoken input.

Frame

Speaker Diarization with Region Proposal Network

1 code implementation 14 Feb 2020 Zili Huang, Shinji Watanabe, Yusuke Fujita, Paola Garcia, Yiwen Shao, Daniel Povey, Sanjeev Khudanpur

Speaker diarization is an important pre-processing step for many speech applications, and it aims to solve the "who spoke when" problem.

Region Proposal Speaker Diarization

Espresso: A Fast End-to-end Neural Speech Recognition Toolkit

1 code implementation 18 Sep 2019 Yiming Wang, Tongfei Chen, Hainan Xu, Shuoyang Ding, Hang Lv, Yiwen Shao, Nanyun Peng, Lei Xie, Shinji Watanabe, Sanjeev Khudanpur

We present Espresso, an open-source, modular, extensible end-to-end neural automatic speech recognition (ASR) toolkit based on the deep learning library PyTorch and the popular neural machine translation toolkit fairseq.

 Ranked #1 on Speech Recognition on Hub5'00 SwitchBoard (Eval2000 metric)

Automatic Speech Recognition Data Augmentation +2

Probing the Information Encoded in X-vectors

no code implementations 13 Sep 2019 Desh Raj, David Snyder, Daniel Povey, Sanjeev Khudanpur

Deep neural network based speaker embeddings, such as x-vectors, have been shown to perform well in text-independent speaker recognition/verification tasks.

Data Augmentation Speaker Recognition +2

Building Corpora for Single-Channel Speech Separation Across Multiple Domains

no code implementations 6 Nov 2018 Matthew Maciejewski, Gregory Sell, Leibny Paola Garcia-Perera, Shinji Watanabe, Sanjeev Khudanpur

To date, the bulk of research on single-channel speech separation has been conducted using clean, near-field, read speech, which is not representative of many modern applications.

Speech Separation

End-to-end speech recognition using lattice-free MMI

no code implementations Interspeech 2018 Hossein Hadian, Hossein Sameti, Daniel Povey, Sanjeev Khudanpur

We present our work on end-to-end training of acoustic models using the lattice-free maximum mutual information (LF-MMI) objective function in the context of hidden Markov models.

Speech Recognition

Semi-Orthogonal Low-Rank Matrix Factorization for Deep Neural Networks

1 code implementation Interspeech 2018 Daniel Povey, Gaofeng Cheng, Yiming Wang, Ke Li, Hainan Xu, Mahsa Yarmohammadi, Sanjeev Khudanpur

Time Delay Neural Networks (TDNNs), also known as one-dimensional Convolutional Neural Networks (1-d CNNs), are an efficient and well-performing neural network architecture for speech recognition.

Speech Recognition

Low-Resource Contextual Topic Identification on Speech

no code implementations 17 Jul 2018 Chunxi Liu, Matthew Wiesner, Shinji Watanabe, Craig Harman, Jan Trmal, Najim Dehak, Sanjeev Khudanpur

In topic identification (topic ID) on real-world unstructured audio, an audio instance of variable topic shifts is first broken into sequential segments, and each segment is independently classified.

General Classification Topic Classification +1

Neural Network Language Modeling with Letter-based Features and Importance Sampling

no code implementations ICASSP 2018 Hainan Xu, Ke Li, Yiming Wang, Jian Wang, Shiyin Kang, Xie Chen, Daniel Povey, Sanjeev Khudanpur

In this paper we describe an extension of the Kaldi software toolkit to support neural-based language modeling, intended for use in automatic speech recognition (ASR) and related tasks.

Automatic Speech Recognition

A GPU-based WFST Decoder with Exact Lattice Generation

no code implementations 9 Apr 2018 Zhehuai Chen, Justin Luitjens, Hainan Xu, Yiming Wang, Daniel Povey, Sanjeev Khudanpur

We describe initial work on an extension of the Kaldi toolkit that supports weighted finite-state transducer (WFST) decoding on Graphics Processing Units (GPUs).

Automatic Speech Recognition and Topic Identification for Almost-Zero-Resource Languages

no code implementations 23 Feb 2018 Matthew Wiesner, Chunxi Liu, Lucas Ondel, Craig Harman, Vimal Manohar, Jan Trmal, Zhongqiang Huang, Najim Dehak, Sanjeev Khudanpur

Automatic speech recognition (ASR) systems often need to be developed for extremely low-resource languages to serve end-uses such as audio content categorization and search.

Automatic Speech Recognition Humanitarian

Acoustic data-driven lexicon learning based on a greedy pronunciation selection framework

no code implementations 12 Jun 2017 Xiaohui Zhang, Vimal Manohar, Daniel Povey, Sanjeev Khudanpur

Speech recognition systems for irregularly-spelled languages like English normally require hand-written pronunciations.

Speech Recognition

Using of heterogeneous corpora for training of an ASR system

no code implementations 1 Jun 2017 Jan Trmal, Gaurav Kumar, Vimal Manohar, Sanjeev Khudanpur, Matt Post, Paul McNamee

The paper summarizes the development of the LVCSR system built as a part of the Pashto speech-translation system at the SCALE (Summer Camp for Applied Language Exploration) 2015 workshop on "Speech-to-text-translation for low-resource languages".

Speech Recognition Speech-to-Text Translation +1

Topic Identification for Speech without ASR

no code implementations 22 Mar 2017 Chunxi Liu, Jan Trmal, Matthew Wiesner, Craig Harman, Sanjeev Khudanpur

Modern topic identification (topic ID) systems for speech use automatic speech recognition (ASR) to produce speech transcripts, and perform supervised classification on such ASR outputs.

Automatic Speech Recognition General Classification +1

An Empirical Evaluation of Zero Resource Acoustic Unit Discovery

no code implementations 5 Feb 2017 Chunxi Liu, Jinyi Yang, Ming Sun, Santosh Kesiraju, Alena Rott, Lucas Ondel, Pegah Ghahremani, Najim Dehak, Lukas Burget, Sanjeev Khudanpur

Acoustic unit discovery (AUD) is a process of automatically identifying a categorical acoustic unit inventory from speech and producing corresponding acoustic unit tokenizations.

Acoustic Unit Discovery

Purely sequence-trained neural networks for ASR based on lattice-free MMI

no code implementations INTERSPEECH 2016 Daniel Povey, Vijayaditya Peddinti, Daniel Galvez, Pegah Ghahremani, Vimal Manohar, Xingyu Na, Yiming Wang, Sanjeev Khudanpur

Models trained with LF-MMI provide a relative word error rate reduction of ∼11.5% over those trained with the cross-entropy objective function, and ∼8% over those trained with cross-entropy and sMBR objective functions.

Frame Speech Recognition

New release of Mixer-6: Improved validity for phonetic study of speaker variation and identification

no code implementations LREC 2016 Eleanor Chodroff, Matthew Maciejewski, Jan Trmal, Sanjeev Khudanpur, John Godfrey

The Mixer series of speech corpora were collected over several years, principally to support annual NIST evaluations of speaker recognition (SR) technologies.

Speaker Recognition

Highway Long Short-Term Memory RNNs for Distant Speech Recognition

no code implementations 30 Oct 2015 Yu Zhang, Guoguo Chen, Dong Yu, Kaisheng Yao, Sanjeev Khudanpur, James Glass

In this paper, we extend the deep long short-term memory (DLSTM) recurrent neural networks by introducing gated direct connections between memory cells in adjacent layers.

Distant Speech Recognition Frame

Parallel training of DNNs with Natural Gradient and Parameter Averaging

1 code implementation 27 Oct 2014 Daniel Povey, Xiaohui Zhang, Sanjeev Khudanpur

However, we have another method, an approximate and efficient implementation of Natural Gradient for Stochastic Gradient Descent (NG-SGD), which seems to allow our periodic-averaging method to work well, as well as substantially improving the convergence of SGD on a single machine.

Speech Recognition
