Search Results for author: Andreas Stolcke

Found 49 papers, 4 papers with code

Turn-taking and Backchannel Prediction with Acoustic and Large Language Model Fusion

no code implementations • 26 Jan 2024 • Jinhan Wang, Long Chen, Aparna Khare, Anirudh Raju, Pranav Dheram, Di He, Minhua Wu, Andreas Stolcke, Venkatesh Ravichandran

We propose an approach for continuous prediction of turn-taking and backchanneling locations in spoken dialogue by fusing a neural acoustic model with a large language model (LLM).

Language Modelling Large Language Model

Post-Training Embedding Alignment for Decoupling Enrollment and Runtime Speaker Recognition Models

no code implementations • 23 Jan 2024 • Chenyang Gao, Brecht Desplanques, Chelsea J.-T. Ju, Aman Chadha, Andreas Stolcke

Automated speaker identification (SID) is a crucial step for the personalization of a wide range of speech-enabled services.

Speaker Identification Speaker Recognition

Investigating Training Strategies and Model Robustness of Low-Rank Adaptation for Language Modeling in Speech Recognition

no code implementations • 19 Jan 2024 • Yu Yu, Chao-Han Huck Yang, Tuan Dinh, Sungho Ryu, Jari Kolehmainen, Roger Ren, Denis Filimonov, Prashanth G. Shivakumar, Ankur Gandhe, Ariya Rastrow, Jia Xu, Ivan Bulyko, Andreas Stolcke

The use of low-rank adaptation (LoRA) with frozen pretrained language models (PLMs) has become increasingly popular as a mainstream, resource-efficient modeling approach for memory-constrained hardware.

Language Modelling speech-recognition +1

Towards ASR Robust Spoken Language Understanding Through In-Context Learning With Word Confusion Networks

no code implementations • 5 Jan 2024 • Kevin Everson, Yile Gu, Huck Yang, Prashanth Gurunath Shivakumar, Guan-Ting Lin, Jari Kolehmainen, Ivan Bulyko, Ankur Gandhe, Shalini Ghosh, Wael Hamza, Hung-Yi Lee, Ariya Rastrow, Andreas Stolcke

In the realm of spoken language understanding (SLU), numerous natural language understanding (NLU) methodologies have been adapted by supplying large language models (LLMs) with transcribed speech instead of conventional written text.

In-Context Learning intent-classification +6

Paralinguistics-Enhanced Large Language Modeling of Spoken Dialogue

no code implementations • 23 Dec 2023 • Guan-Ting Lin, Prashanth Gurunath Shivakumar, Ankur Gandhe, Chao-Han Huck Yang, Yile Gu, Shalini Ghosh, Andreas Stolcke, Hung-Yi Lee, Ivan Bulyko

Specifically, our framework serializes tasks in the order of current paralinguistic attribute prediction, response paralinguistic attribute prediction, and response text generation with autoregressive conditioning.

Attribute Language Modelling +4

Generative Speech Recognition Error Correction with Large Language Models and Task-Activating Prompting

no code implementations • 27 Sep 2023 • Chao-Han Huck Yang, Yile Gu, Yi-Chieh Liu, Shalini Ghosh, Ivan Bulyko, Andreas Stolcke

We explore the ability of large language models (LLMs) to act as speech recognition post-processors that perform rescoring and error correction.

Ranked #3 on Speech Recognition on WSJ eval92 (using extra training data)

In-Context Learning speech-recognition +1

Low-rank Adaptation of Large Language Model Rescoring for Parameter-Efficient Speech Recognition

no code implementations • 26 Sep 2023 • Yu Yu, Chao-Han Huck Yang, Jari Kolehmainen, Prashanth G. Shivakumar, Yile Gu, Sungho Ryu, Roger Ren, Qi Luo, Aditya Gourav, I-Fan Chen, Yi-Chieh Liu, Tuan Dinh, Ankur Gandhe, Denis Filimonov, Shalini Ghosh, Andreas Stolcke, Ariya Rastrow, Ivan Bulyko

We propose a neural language modeling system based on low-rank adaptation (LoRA) for speech recognition output rescoring.

Language Modelling Large Language Model +2

Learning When to Trust Which Teacher for Weakly Supervised ASR

no code implementations • 21 Jun 2023 • Aakriti Agrawal, Milind Rao, Anit Kumar Sahu, Gopinath Chennupati, Andreas Stolcke

We show the efficacy of our approach using LibriSpeech and LibriLight benchmarks and find an improvement of 4 to 25% over baselines that uniformly weight all the experts, use a single expert model, or combine experts using ROVER.

Automatic Speech Recognition Automatic Speech Recognition (ASR) +1

Streaming Speech-to-Confusion Network Speech Recognition

no code implementations • 2 Jun 2023 • Denis Filimonov, Prabhat Pandey, Ariya Rastrow, Ankur Gandhe, Andreas Stolcke

In interactive automatic speech recognition (ASR) systems, low-latency requirements limit the amount of search space that can be explored during decoding, particularly in end-to-end neural ASR.

Automatic Speech Recognition Automatic Speech Recognition (ASR) +1

PROCTER: PROnunciation-aware ConTextual adaptER for personalized speech recognition in neural transducers

no code implementations • 30 Mar 2023 • Rahul Pandey, Roger Ren, Qi Luo, Jing Liu, Ariya Rastrow, Ankur Gandhe, Denis Filimonov, Grant Strimel, Andreas Stolcke, Ivan Bulyko

End-to-End (E2E) automatic speech recognition (ASR) systems used in voice assistants often have difficulties recognizing infrequent words personalized to the user, such as names and places.

Automatic Speech Recognition Automatic Speech Recognition (ASR) +1

Cross-utterance ASR Rescoring with Graph-based Label Propagation

no code implementations • 27 Mar 2023 • Srinath Tankasala, Long Chen, Andreas Stolcke, Anirudh Raju, Qianli Deng, Chander Chandak, Aparna Khare, Roland Maas, Venkatesh Ravichandran

We propose a novel approach for ASR N-best hypothesis rescoring with graph-based label propagation by leveraging cross-utterance acoustic similarity.

Fairness Language Modelling

Adaptive Endpointing with Deep Contextual Multi-armed Bandits

no code implementations • 23 Mar 2023 • Do June Min, Andreas Stolcke, Anirudh Raju, Colin Vaz, Di He, Venkatesh Ravichandran, Viet Anh Trinh

In this paper, we aim to provide a solution for adaptive endpointing by proposing an efficient method for choosing an optimal endpointing configuration given utterance-level audio features in an online setting, while avoiding hyperparameter grid-search.

Multi-Armed Bandits

Stutter-TTS: Controlled Synthesis and Improved Recognition of Stuttered Speech

no code implementations • 4 Nov 2022 • Xin Zhang, Iván Vallés-Pérez, Andreas Stolcke, Chengzhu Yu, Jasha Droppo, Olabanji Shonibare, Roberto Barra-Chicote, Venkatesh Ravichandran

By fine-tuning an ASR model on synthetic stuttered speech we are able to reduce word error by 5.7% relative on stuttered utterances, with only minor (<0.2% relative) degradation for fluent utterances.

Automatic Speech Recognition Automatic Speech Recognition (ASR) +1

An Experimental Study on Private Aggregation of Teacher Ensemble Learning for End-to-End Speech Recognition

no code implementations • 11 Oct 2022 • Chao-Han Huck Yang, I-Fan Chen, Andreas Stolcke, Sabato Marco Siniscalchi, Chin-Hui Lee

We evaluate three end-to-end deep models, including LAS, hybrid CTC/attention, and RNN transducer, on the open-source LibriSpeech and TIMIT corpora.

Automatic Speech Recognition Automatic Speech Recognition (ASR) +2

Toward Fairness in Speech Recognition: Discovery and mitigation of performance disparities

no code implementations • 22 Jul 2022 • Pranav Dheram, Murugesan Ramakrishnan, Anirudh Raju, I-Fan Chen, Brian King, Katherine Powell, Melissa Saboowala, Karan Shetty, Andreas Stolcke

As for other forms of AI, speech recognition has recently been examined with respect to performance disparities across different user cohorts.

Fairness speech-recognition +1

Reducing Geographic Disparities in Automatic Speech Recognition via Elastic Weight Consolidation

no code implementations • 16 Jul 2022 • Viet Anh Trinh, Pegah Ghahremani, Brian King, Jasha Droppo, Andreas Stolcke, Roland Maas

A popular approach is to fine-tune the model with data from regions where the ASR model has a higher word error rate (WER).

Automatic Speech Recognition Automatic Speech Recognition (ASR) +2

Adversarial Reweighting for Speaker Verification Fairness

no code implementations • 15 Jul 2022 • Minho Jin, Chelsea J.-T. Ju, Zeya Chen, Yi-Chieh Liu, Jasha Droppo, Andreas Stolcke

Results show that the pairwise weighting method can achieve 1.08% overall EER, 1.25% for male and 0.67% for female speakers, with relative EER reductions of 7.7%, 10.1% and 3.0%, respectively.

Fairness Metric Learning +1
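
The relative EER reductions quoted in this entry can be related to absolute EER values with a little arithmetic. The following is a minimal Python sketch, assuming the usual convention that a relative reduction is (baseline - new) / baseline. The function names are hypothetical, and the baselines it prints are back-computed under that assumption; they are not figures reported in this listing.

```python
def relative_reduction(baseline: float, new: float) -> float:
    """Relative reduction of an error rate, as a fraction of the baseline."""
    return (baseline - new) / baseline


def implied_baseline(new: float, reduction: float) -> float:
    """Invert relative_reduction: the baseline that yields `new` after `reduction`."""
    return new / (1.0 - reduction)


# EER values (%) and relative reductions as quoted in the abstract snippet.
reported = {
    "overall": (1.08, 0.077),
    "male": (1.25, 0.101),
    "female": (0.67, 0.030),
}

for group, (eer, reduction) in reported.items():
    # Assumption: the baseline is inferred, not taken from the paper.
    base = implied_baseline(eer, reduction)
    print(f"{group}: EER {eer:.2f}% implies a baseline of about {base:.2f}%")
```

Because the quoted percentages are rounded, the inferred baselines are approximate at best.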

Graph-based Multi-View Fusion and Local Adaptation: Mitigating Within-Household Confusability for Speaker Identification

no code implementations • 8 Jul 2022 • Long Chen, Yixiong Meng, Venkatesh Ravichandran, Andreas Stolcke

Speaker identification (SID) in the household scenario (e.g., for smart speakers) is an important but challenging problem due to the limited number of labeled (enrollment) utterances, confusable voices, and demographic imbalances.

Fairness Speaker Identification +1

CUE Vectors: Modular Training of Language Models Conditioned on Diverse Contextual Signals

no code implementations • Findings (ACL) 2022 • Scott Novotney, Sreeparna Mukherjee, Zeeshan Ahmed, Andreas Stolcke

Training the model initially with proxy context retains 67% of the perplexity gain after adapting to real context.

Sentence

openFEAT: Improving Speaker Identification by Open-set Few-shot Embedding Adaptation with Transformer

no code implementations • 24 Feb 2022 • Kishan K C, Zhenning Tan, Long Chen, Minho Jin, Eunjung Han, Andreas Stolcke, Chul Lee

Household speaker identification with few enrollment utterances is an important yet challenging problem, especially when household members share similar voice characteristics and room acoustics.

Open Set Learning Speaker Identification

Improving fairness in speaker verification via Group-adapted Fusion Network

1 code implementation • 23 Feb 2022 • Hua Shen, Yuguang Yang, Guoli Sun, Ryan Langman, Eunjung Han, Jasha Droppo, Andreas Stolcke

This is observed especially with underrepresented demographic groups sharing similar voice characteristics.

Fairness Speaker Recognition +1

Contrastive-mixup learning for improved speaker verification

no code implementations • 22 Feb 2022 • Xin Zhang, Minho Jin, Roger Cheng, Ruirui Li, Eunjung Han, Andreas Stolcke

In this work, we propose contrastive-mixup, a novel augmentation strategy that learns distinguishing representations based on a distance metric.

Data Augmentation Metric Learning +1

Mitigating Closed-model Adversarial Examples with Bayesian Neural Modeling for Enhanced End-to-End Speech Recognition

no code implementations • 17 Feb 2022 • Chao-Han Huck Yang, Zeeshan Ahmed, Yile Gu, Joseph Szurley, Roger Ren, Linda Liu, Andreas Stolcke, Ivan Bulyko

In this work, we aim to enhance the system robustness of end-to-end automatic speech recognition (ASR) against adversarially-noisy speech examples.

Adversarial Robustness Automatic Speech Recognition +3

Self-supervised Speaker Recognition Training Using Human-Machine Dialogues

no code implementations • 7 Feb 2022 • Metehan Cekic, Ruirui Li, Zeya Chen, Yuguang Yang, Andreas Stolcke, Upamanyu Madhow

Speaker recognition, recognizing speaker identities based on voice alone, enables important downstream applications, such as personalization and authentication.

Contrastive Learning Speaker Recognition

ASR-Aware End-to-end Neural Diarization

no code implementations • 2 Feb 2022 • Aparna Khare, Eunjung Han, Yuguang Yang, Andreas Stolcke

We present a Conformer-based end-to-end neural diarization (EEND) model that uses both acoustic input and features derived from an automatic speech recognition (ASR) model.

Automatic Speech Recognition Automatic Speech Recognition (ASR) +4

RescoreBERT: Discriminative Speech Recognition Rescoring with BERT

no code implementations • 2 Feb 2022 • Liyan Xu, Yile Gu, Jari Kolehmainen, Haidar Khan, Ankur Gandhe, Ariya Rastrow, Andreas Stolcke, Ivan Bulyko

Specifically, training a bidirectional model like BERT on a discriminative objective such as minimum WER (MWER) has not been explored.

Automatic Speech Recognition Automatic Speech Recognition (ASR) +4

Improving Speaker Identification for Shared Devices by Adapting Embeddings to Speaker Subsets

no code implementations • 6 Sep 2021 • Zhenning Tan, Yuguang Yang, Eunjung Han, Andreas Stolcke

Second, a scoring function is applied between a runtime utterance and each speaker profile.

Speaker Identification
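
The scoring step mentioned in this entry, where a runtime utterance is compared with each speaker profile, can be illustrated with a small Python sketch. The snippet does not say which scoring function the paper uses, so cosine similarity is an assumption here, as are the function names, the toy three-dimensional embeddings, and the rejection threshold.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def identify(utterance, profiles, threshold=0.5):
    """Score the utterance embedding against every enrolled profile.

    Returns (best_speaker_or_None, all_scores). The threshold is a
    hypothetical value for rejecting unknown speakers; `profiles` is
    assumed to be non-empty.
    """
    scores = {name: cosine(utterance, emb) for name, emb in profiles.items()}
    best = max(scores, key=scores.get)
    return (best if scores[best] >= threshold else None), scores


# Toy embeddings; real systems use high-dimensional neural embeddings.
profiles = {"alice": [1.0, 0.0, 0.2], "bob": [0.1, 1.0, 0.0]}
speaker, scores = identify([0.9, 0.1, 0.3], profiles)
print(speaker, scores)
```

Returning every score alongside the decision keeps the sketch easy to extend, for example to adapt the scores to the subset of speakers sharing a device.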

Fusion of Embeddings Networks for Robust Combination of Text Dependent and Independent Speaker Recognition

no code implementations • 18 Jun 2021 • Ruirui Li, Chelsea J.-T. Ju, Zeya Chen, Hongda Mao, Oguz Elibol, Andreas Stolcke

Based on whether the speech content is constrained or not, both text-dependent (TD) and text-independent (TI) speaker recognition models may be used.

Speaker Identification Speaker Recognition

Graph-based Label Propagation for Semi-Supervised Speaker Identification

no code implementations • 15 Jun 2021 • Long Chen, Venkatesh Ravichandran, Andreas Stolcke

We show in experiments on the VoxCeleb dataset that this approach makes effective use of unlabeled data and improves speaker identification accuracy compared to two state-of-the-art scoring methods as well as their semi-supervised variants based on pseudo-labels.

Speaker Identification Speaker Recognition

End-to-end Neural Diarization: From Transformer to Conformer

no code implementations • 14 Jun 2021 • Yi Chieh Liu, Eunjung Han, Chul Lee, Andreas Stolcke

We propose a new end-to-end neural diarization (EEND) system that is based on Conformer, a recently proposed neural architecture that combines convolutional mappings and Transformer to model both local and global dependencies in speech.

Data Augmentation

Attention-based Contextual Language Model Adaptation for Speech Recognition

1 code implementation • Findings (ACL) 2021 • Richard Diehl Martinez, Scott Novotney, Ivan Bulyko, Ariya Rastrow, Andreas Stolcke, Ankur Gandhe

When applied to a large de-identified dataset of utterances collected by a popular voice assistant platform, our method reduces perplexity by 7.0% relative over a standard LM that does not incorporate contextual information.

Automatic Speech Recognition Automatic Speech Recognition (ASR) +2

Listen with Intent: Improving Speech Recognition with Audio-to-Intent Front-End

no code implementations • 14 May 2021 • Swayambhu Nath Ray, Minhua Wu, Anirudh Raju, Pegah Ghahremani, Raghavendra Bilgi, Milind Rao, Harish Arsikere, Ariya Rastrow, Andreas Stolcke, Jasha Droppo

On the other hand, a streaming system using per-frame intent posteriors as extra inputs for the RNN-T ASR system yields a 3.33% relative WERR.

Automatic Speech Recognition Automatic Speech Recognition (ASR) +1

Reranking Machine Translation Hypotheses with Structured and Web-based Language Models

no code implementations • 25 Apr 2021 • Wen Wang, Andreas Stolcke, Jing Zheng

In this paper, we investigate the use of linguistically motivated and computationally efficient structured language models for reranking N-best hypotheses in a statistical machine translation system.

Language Modelling Machine Translation +2

Wav2vec-C: A Self-supervised Model for Speech Representation Learning

no code implementations • 9 Mar 2021 • Samik Sadhu, Di He, Che-Wei Huang, Sri Harish Mallidi, Minhua Wu, Ariya Rastrow, Andreas Stolcke, Jasha Droppo, Roland Maas

However, the quantization process is regularized by an additional consistency network that learns to reconstruct the input features to the wav2vec 2.0 network from the quantized representations in a way similar to a VQ-VAE model.

Quantization Representation Learning +1

Personalization Strategies for End-to-End Speech Recognition Systems

no code implementations • 15 Feb 2021 • Aditya Gourav, Linda Liu, Ankur Gandhe, Yile Gu, Guitang Lan, Xiangyang Huang, Shashank Kalmane, Gautam Tiwari, Denis Filimonov, Ariya Rastrow, Andreas Stolcke, Ivan Bulyko

We also describe a novel second-pass de-biasing approach: used in conjunction with a first-pass shallow fusion that optimizes on oracle WER, we can achieve an additional 14% improvement on personalized content recognition, and even improve accuracy for the general use case by up to 2.5%.

speech-recognition Speech Recognition

Contrastive Unsupervised Learning for Speech Emotion Recognition

no code implementations • 12 Feb 2021 • Mao Li, Bo Yang, Joshua Levy, Andreas Stolcke, Viktor Rozgic, Spyros Matsoukas, Constantinos Papayiannis, Daniel Bone, Chao Wang

Speech emotion recognition (SER) is a key technology to enable more natural human-machine communication.

Ranked #2 on Speech Emotion Recognition on MSP-Podcast (Dominance) (using extra training data)

Representation Learning Speech Emotion Recognition

Do as I mean, not as I say: Sequence Loss Training for Spoken Language Understanding

no code implementations • 12 Feb 2021 • Milind Rao, Pranav Dheram, Gautam Tiwari, Anirudh Raju, Jasha Droppo, Ariya Rastrow, Andreas Stolcke

Spoken language understanding (SLU) systems extract transcriptions, as well as semantics of intent or named entities from speech, and are essential components of voice activated systems.

Automatic Speech Recognition Automatic Speech Recognition (ASR) +3

REDAT: Accent-Invariant Representation for End-to-End ASR by Domain Adversarial Training with Relabeling

no code implementations • 14 Dec 2020 • Hu Hu, Xuesong Yang, Zeynab Raeesy, Jinxi Guo, Gokce Keskin, Harish Arsikere, Ariya Rastrow, Andreas Stolcke, Roland Maas

Accent mismatch is a critical problem for end-to-end ASR.

Clustering

BW-EDA-EEND: Streaming End-to-End Neural Speaker Diarization for a Variable Number of Speakers

no code implementations • 5 Nov 2020 • Eunjung Han, Chul Lee, Andreas Stolcke

We present a novel online end-to-end neural diarization system, BW-EDA-EEND, that processes data incrementally for a variable number of speakers.

Clustering speaker-diarization +1

DOVER-Lap: A Method for Combining Overlap-aware Diarization Outputs

1 code implementation • 3 Nov 2020 • Desh Raj, Leibny Paola Garcia-Perera, Zili Huang, Shinji Watanabe, Daniel Povey, Andreas Stolcke, Sanjeev Khudanpur

Several advances have been made recently towards handling overlapping speech for speaker diarization.

Audio and Speech Processing Sound

Efficient minimum word error rate training of RNN-Transducer for end-to-end speech recognition

no code implementations • 27 Jul 2020 • Jinxi Guo, Gautam Tiwari, Jasha Droppo, Maarten Van Segbroeck, Che-Wei Huang, Andreas Stolcke, Roland Maas

Unlike previous work on this topic, which performs on-the-fly limited-size beam-search decoding and generates alignment scores for expected edit-distance computation, in our proposed method, we re-calculate and sum scores of all the possible alignments for each hypothesis in N-best lists.

speech-recognition Speech Recognition

Combining Acoustics, Content and Interaction Features to Find Hot Spots in Meetings

no code implementations • 24 Oct 2019 • Dave Makhervaks, William Hinthorn, Dimitrios Dimitriadis, Andreas Stolcke

Involvement hot spots have been proposed as a useful concept for meeting analysis and studied off and on for over 15 years.

Word Embeddings

Improving Diarization Robustness using Diversification, Randomization and the DOVER Algorithm

no code implementations • 24 Oct 2019 • Andreas Stolcke

Speaker diarization based on bottom-up clustering of speech segments by acoustic similarity is often highly sensitive to the choice of hyperparameters, such as the initial number of clusters and feature weighting.

Clustering speaker-diarization +1

DOVER: A Method for Combining Diarization Outputs

2 code implementations • 17 Sep 2019 • Andreas Stolcke, Takuya Yoshioka

Speech recognition and other natural language tasks have long benefited from voting-based algorithms as a method to aggregate outputs from several systems to achieve a higher accuracy than any of the individual systems.

speech-recognition Speech Recognition
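
The general idea of voting-based aggregation that this entry refers to can be shown with a toy Python sketch. This is a plain weighted majority vote over frame-level labels, not the DOVER algorithm itself: it assumes all systems already use the same speaker labels and the same frame grid, whereas a full diarization combiner must also reconcile arbitrary speaker labels across systems. The function name and example data are hypothetical.

```python
from collections import Counter


def majority_vote(system_outputs, weights=None):
    """Combine per-frame labels from several systems by weighted voting.

    system_outputs: list of equal-length label sequences, one per system.
    weights: optional per-system vote weights (defaults to 1.0 each).
    Ties go to the label that was seen first in that frame.
    """
    if weights is None:
        weights = [1.0] * len(system_outputs)
    n_frames = len(system_outputs[0])
    if any(len(out) != n_frames for out in system_outputs):
        raise ValueError("all systems must label the same number of frames")

    combined = []
    for frame in zip(*system_outputs):
        tally = Counter()
        for label, weight in zip(frame, weights):
            tally[label] += weight
        combined.append(max(tally, key=tally.get))
    return combined


outputs = [
    ["s1", "s1", "s2", "s2"],
    ["s1", "s2", "s2", "s2"],
    ["s1", "s1", "s1", "s2"],
]
print(majority_vote(outputs))
```

Passing non-uniform weights lets a more trusted system outvote the others, which is one simple way to bias the combination.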

Meeting Transcription Using Virtual Microphone Arrays

no code implementations • 3 May 2019 • Takuya Yoshioka, Zhuo Chen, Dimitrios Dimitriadis, William Hinthorn, Xuedong Huang, Andreas Stolcke, Michael Zeng

The speaker-attributed WER (SAWER) is 26.7%.

speaker-diarization Speaker Diarization +2

Session-level Language Modeling for Conversational Speech

no code implementations • EMNLP 2018 • Wayne Xiong, Lingfeng Wu, Jun Zhang, Andreas Stolcke

We propose to generalize language models for conversational speech recognition to allow them to operate across utterance boundaries and speaker changes, thereby capturing conversation-level phenomena such as adjacency pairs, lexical entrainment, and topical coherence.

Language Modelling speech-recognition +1

Comparing Human and Machine Errors in Conversational Speech Transcription

no code implementations • 29 Aug 2017 • Andreas Stolcke, Jasha Droppo

In this paper we approach this question by comparing the output of our most accurate CTS recognition system to that of a standard speech transcription vendor pipeline.

A Cross-language Study on Automatic Speech Disfluency Detection

no code implementations • NAACL 2013 • Wen Wang, Andreas Stolcke, Jiahong Yuan, Mark Liberman

Language Modelling Speech Recognition

Using Out-of-Domain Data for Lexical Addressee Detection in Human-Human-Computer Dialog

no code implementations • NAACL 2013 • Heeyoung Lee, Andreas Stolcke, Elizabeth Shriberg

Speech Recognition
