Search Results for author: Andreas Stolcke

Found 49 papers, 4 papers with code

Turn-taking and Backchannel Prediction with Acoustic and Large Language Model Fusion

no code implementations 26 Jan 2024 Jinhan Wang, Long Chen, Aparna Khare, Anirudh Raju, Pranav Dheram, Di He, Minhua Wu, Andreas Stolcke, Venkatesh Ravichandran

We propose an approach for continuous prediction of turn-taking and backchanneling locations in spoken dialogue by fusing a neural acoustic model with a large language model (LLM).

Language Modelling Large Language Model

Investigating Training Strategies and Model Robustness of Low-Rank Adaptation for Language Modeling in Speech Recognition

no code implementations 19 Jan 2024 Yu Yu, Chao-Han Huck Yang, Tuan Dinh, Sungho Ryu, Jari Kolehmainen, Roger Ren, Denis Filimonov, Prashanth G. Shivakumar, Ankur Gandhe, Ariya Rastrow, Jia Xu, Ivan Bulyko, Andreas Stolcke

The use of low-rank adaptation (LoRA) with frozen pretrained language models (PLMs) has become increasingly popular as a mainstream, resource-efficient modeling approach for memory-constrained hardware.

Language Modelling speech-recognition +1

Towards ASR Robust Spoken Language Understanding Through In-Context Learning With Word Confusion Networks

no code implementations 5 Jan 2024 Kevin Everson, Yile Gu, Huck Yang, Prashanth Gurunath Shivakumar, Guan-Ting Lin, Jari Kolehmainen, Ivan Bulyko, Ankur Gandhe, Shalini Ghosh, Wael Hamza, Hung-Yi Lee, Ariya Rastrow, Andreas Stolcke

In the realm of spoken language understanding (SLU), numerous natural language understanding (NLU) methodologies have been adapted by supplying large language models (LLMs) with transcribed speech instead of conventional written text.

In-Context Learning intent-classification +6

Paralinguistics-Enhanced Large Language Modeling of Spoken Dialogue

no code implementations 23 Dec 2023 Guan-Ting Lin, Prashanth Gurunath Shivakumar, Ankur Gandhe, Chao-Han Huck Yang, Yile Gu, Shalini Ghosh, Andreas Stolcke, Hung-Yi Lee, Ivan Bulyko

Specifically, our framework serializes tasks in the order of current paralinguistic attribute prediction, response paralinguistic attribute prediction, and response text generation with autoregressive conditioning.

Attribute Language Modelling +4

Generative Speech Recognition Error Correction with Large Language Models and Task-Activating Prompting

no code implementations 27 Sep 2023 Chao-Han Huck Yang, Yile Gu, Yi-Chieh Liu, Shalini Ghosh, Ivan Bulyko, Andreas Stolcke

We explore the ability of large language models (LLMs) to act as speech recognition post-processors that perform rescoring and error correction.

Ranked #3 on Speech Recognition on WSJ eval92 (using extra training data)

In-Context Learning speech-recognition +1

Learning When to Trust Which Teacher for Weakly Supervised ASR

no code implementations 21 Jun 2023 Aakriti Agrawal, Milind Rao, Anit Kumar Sahu, Gopinath Chennupati, Andreas Stolcke

We show the efficacy of our approach using LibriSpeech and LibriLight benchmarks and find an improvement of 4 to 25% over baselines that uniformly weight all the experts, use a single expert model, or combine experts using ROVER.

Automatic Speech Recognition Automatic Speech Recognition (ASR) +1

Streaming Speech-to-Confusion Network Speech Recognition

no code implementations 2 Jun 2023 Denis Filimonov, Prabhat Pandey, Ariya Rastrow, Ankur Gandhe, Andreas Stolcke

In interactive automatic speech recognition (ASR) systems, low-latency requirements limit the amount of search space that can be explored during decoding, particularly in end-to-end neural ASR.

Automatic Speech Recognition Automatic Speech Recognition (ASR) +1

PROCTER: PROnunciation-aware ConTextual adaptER for personalized speech recognition in neural transducers

no code implementations 30 Mar 2023 Rahul Pandey, Roger Ren, Qi Luo, Jing Liu, Ariya Rastrow, Ankur Gandhe, Denis Filimonov, Grant Strimel, Andreas Stolcke, Ivan Bulyko

End-to-End (E2E) automatic speech recognition (ASR) systems used in voice assistants often have difficulties recognizing infrequent words personalized to the user, such as names and places.

Automatic Speech Recognition Automatic Speech Recognition (ASR) +1

Cross-utterance ASR Rescoring with Graph-based Label Propagation

no code implementations 27 Mar 2023 Srinath Tankasala, Long Chen, Andreas Stolcke, Anirudh Raju, Qianli Deng, Chander Chandak, Aparna Khare, Roland Maas, Venkatesh Ravichandran

We propose a novel approach for ASR N-best hypothesis rescoring with graph-based label propagation by leveraging cross-utterance acoustic similarity.

Fairness Language Modelling

Adaptive Endpointing with Deep Contextual Multi-armed Bandits

no code implementations 23 Mar 2023 Do June Min, Andreas Stolcke, Anirudh Raju, Colin Vaz, Di He, Venkatesh Ravichandran, Viet Anh Trinh

In this paper, we aim to provide a solution for adaptive endpointing by proposing an efficient method for choosing an optimal endpointing configuration given utterance-level audio features in an online setting, while avoiding hyperparameter grid-search.

Multi-Armed Bandits

Stutter-TTS: Controlled Synthesis and Improved Recognition of Stuttered Speech

no code implementations 4 Nov 2022 Xin Zhang, Iván Vallés-Pérez, Andreas Stolcke, Chengzhu Yu, Jasha Droppo, Olabanji Shonibare, Roberto Barra-Chicote, Venkatesh Ravichandran

By fine-tuning an ASR model on synthetic stuttered speech we are able to reduce word error by 5.7% relative on stuttered utterances, with only minor (<0.2% relative) degradation for fluent utterances.

Automatic Speech Recognition Automatic Speech Recognition (ASR) +1

Adversarial Reweighting for Speaker Verification Fairness

no code implementations 15 Jul 2022 Minho Jin, Chelsea J.-T. Ju, Zeya Chen, Yi-Chieh Liu, Jasha Droppo, Andreas Stolcke

Results show that the pairwise weighting method can achieve 1.08% overall EER, 1.25% for male and 0.67% for female speakers, with relative EER reductions of 7.7%, 10.1% and 3.0%, respectively.

Fairness Metric Learning +1

Graph-based Multi-View Fusion and Local Adaptation: Mitigating Within-Household Confusability for Speaker Identification

no code implementations 8 Jul 2022 Long Chen, Yixiong Meng, Venkatesh Ravichandran, Andreas Stolcke

Speaker identification (SID) in the household scenario (e.g., for smart speakers) is an important but challenging problem due to the limited number of labeled (enrollment) utterances, confusable voices, and demographic imbalances.

Fairness Speaker Identification +1

openFEAT: Improving Speaker Identification by Open-set Few-shot Embedding Adaptation with Transformer

no code implementations 24 Feb 2022 Kishan K C, Zhenning Tan, Long Chen, Minho Jin, Eunjung Han, Andreas Stolcke, Chul Lee

Household speaker identification with few enrollment utterances is an important yet challenging problem, especially when household members share similar voice characteristics and room acoustics.

Open Set Learning Speaker Identification

Contrastive-mixup learning for improved speaker verification

no code implementations 22 Feb 2022 Xin Zhang, Minho Jin, Roger Cheng, Ruirui Li, Eunjung Han, Andreas Stolcke

In this work, we propose contrastive-mixup, a novel augmentation strategy that learns distinguishing representations based on a distance metric.

Data Augmentation Metric Learning +1

Self-supervised Speaker Recognition Training Using Human-Machine Dialogues

no code implementations 7 Feb 2022 Metehan Cekic, Ruirui Li, Zeya Chen, Yuguang Yang, Andreas Stolcke, Upamanyu Madhow

Speaker recognition, recognizing speaker identities based on voice alone, enables important downstream applications, such as personalization and authentication.

Contrastive Learning Speaker Recognition

ASR-Aware End-to-end Neural Diarization

no code implementations 2 Feb 2022 Aparna Khare, Eunjung Han, Yuguang Yang, Andreas Stolcke

We present a Conformer-based end-to-end neural diarization (EEND) model that uses both acoustic input and features derived from an automatic speech recognition (ASR) model.

Automatic Speech Recognition Automatic Speech Recognition (ASR) +4

Fusion of Embeddings Networks for Robust Combination of Text Dependent and Independent Speaker Recognition

no code implementations 18 Jun 2021 Ruirui Li, Chelsea J.-T. Ju, Zeya Chen, Hongda Mao, Oguz Elibol, Andreas Stolcke

Based on whether the speech content is constrained or not, both text-dependent (TD) and text-independent (TI) speaker recognition models may be used.

Speaker Identification Speaker Recognition

Graph-based Label Propagation for Semi-Supervised Speaker Identification

no code implementations 15 Jun 2021 Long Chen, Venkatesh Ravichandran, Andreas Stolcke

We show in experiments on the VoxCeleb dataset that this approach makes effective use of unlabeled data and improves speaker identification accuracy compared to two state-of-the-art scoring methods as well as their semi-supervised variants based on pseudo-labels.

Speaker Identification Speaker Recognition

End-to-end Neural Diarization: From Transformer to Conformer

no code implementations 14 Jun 2021 Yi Chieh Liu, Eunjung Han, Chul Lee, Andreas Stolcke

We propose a new end-to-end neural diarization (EEND) system that is based on Conformer, a recently proposed neural architecture that combines convolutional mappings and Transformer to model both local and global dependencies in speech.

Data Augmentation

Attention-based Contextual Language Model Adaptation for Speech Recognition

1 code implementation Findings (ACL) 2021 Richard Diehl Martinez, Scott Novotney, Ivan Bulyko, Ariya Rastrow, Andreas Stolcke, Ankur Gandhe

When applied to a large de-identified dataset of utterances collected by a popular voice assistant platform, our method reduces perplexity by 7.0% relative over a standard LM that does not incorporate contextual information.

Automatic Speech Recognition Automatic Speech Recognition (ASR) +2

Reranking Machine Translation Hypotheses with Structured and Web-based Language Models

no code implementations 25 Apr 2021 Wen Wang, Andreas Stolcke, Jing Zheng

In this paper, we investigate the use of linguistically motivated and computationally efficient structured language models for reranking N-best hypotheses in a statistical machine translation system.

Language Modelling Machine Translation +2

Wav2vec-C: A Self-supervised Model for Speech Representation Learning

no code implementations 9 Mar 2021 Samik Sadhu, Di He, Che-Wei Huang, Sri Harish Mallidi, Minhua Wu, Ariya Rastrow, Andreas Stolcke, Jasha Droppo, Roland Maas

However, the quantization process is regularized by an additional consistency network that learns to reconstruct the input features to the wav2vec 2.0 network from the quantized representations in a way similar to a VQ-VAE model.

Quantization Representation Learning +1

Personalization Strategies for End-to-End Speech Recognition Systems

no code implementations 15 Feb 2021 Aditya Gourav, Linda Liu, Ankur Gandhe, Yile Gu, Guitang Lan, Xiangyang Huang, Shashank Kalmane, Gautam Tiwari, Denis Filimonov, Ariya Rastrow, Andreas Stolcke, Ivan Bulyko

We also describe a novel second-pass de-biasing approach: used in conjunction with a first-pass shallow fusion that optimizes on oracle WER, we can achieve an additional 14% improvement on personalized content recognition, and even improve accuracy for the general use case by up to 2.5%.

speech-recognition Speech Recognition

Do as I mean, not as I say: Sequence Loss Training for Spoken Language Understanding

no code implementations 12 Feb 2021 Milind Rao, Pranav Dheram, Gautam Tiwari, Anirudh Raju, Jasha Droppo, Ariya Rastrow, Andreas Stolcke

Spoken language understanding (SLU) systems extract transcriptions, as well as semantics of intent or named entities from speech, and are essential components of voice activated systems.

Automatic Speech Recognition Automatic Speech Recognition (ASR) +3

BW-EDA-EEND: Streaming End-to-End Neural Speaker Diarization for a Variable Number of Speakers

no code implementations 5 Nov 2020 Eunjung Han, Chul Lee, Andreas Stolcke

We present a novel online end-to-end neural diarization system, BW-EDA-EEND, that processes data incrementally for a variable number of speakers.

Clustering speaker-diarization +1

DOVER-Lap: A Method for Combining Overlap-aware Diarization Outputs

1 code implementation 3 Nov 2020 Desh Raj, Leibny Paola Garcia-Perera, Zili Huang, Shinji Watanabe, Daniel Povey, Andreas Stolcke, Sanjeev Khudanpur

Several advances have been made recently towards handling overlapping speech for speaker diarization.

Audio and Speech Processing Sound

Efficient minimum word error rate training of RNN-Transducer for end-to-end speech recognition

no code implementations 27 Jul 2020 Jinxi Guo, Gautam Tiwari, Jasha Droppo, Maarten Van Segbroeck, Che-Wei Huang, Andreas Stolcke, Roland Maas

Unlike previous work on this topic, which performs on-the-fly limited-size beam-search decoding and generates alignment scores for expected edit-distance computation, in our proposed method, we re-calculate and sum scores of all the possible alignments for each hypothesis in N-best lists.

speech-recognition Speech Recognition

Combining Acoustics, Content and Interaction Features to Find Hot Spots in Meetings

no code implementations 24 Oct 2019 Dave Makhervaks, William Hinthorn, Dimitrios Dimitriadis, Andreas Stolcke

Involvement hot spots have been proposed as a useful concept for meeting analysis and studied off and on for over 15 years.

Word Embeddings

Improving Diarization Robustness using Diversification, Randomization and the DOVER Algorithm

no code implementations 24 Oct 2019 Andreas Stolcke

Speaker diarization based on bottom-up clustering of speech segments by acoustic similarity is often highly sensitive to the choice of hyperparameters, such as the initial number of clusters and feature weighting.

Clustering speaker-diarization +1

DOVER: A Method for Combining Diarization Outputs

2 code implementations 17 Sep 2019 Andreas Stolcke, Takuya Yoshioka

Speech recognition and other natural language tasks have long benefited from voting-based algorithms as a method to aggregate outputs from several systems to achieve a higher accuracy than any of the individual systems.

speech-recognition Speech Recognition

Session-level Language Modeling for Conversational Speech

no code implementations EMNLP 2018 Wayne Xiong, Lingfeng Wu, Jun Zhang, Andreas Stolcke

We propose to generalize language models for conversational speech recognition to allow them to operate across utterance boundaries and speaker changes, thereby capturing conversation-level phenomena such as adjacency pairs, lexical entrainment, and topical coherence.

Language Modelling speech-recognition +1

Comparing Human and Machine Errors in Conversational Speech Transcription

no code implementations 29 Aug 2017 Andreas Stolcke, Jasha Droppo

In this paper we approach this question by comparing the output of our most accurate CTS recognition system to that of a standard speech transcription vendor pipeline.
