Search Results for author: Andreas Stolcke

Found 31 papers, 4 papers with code

openFEAT: Improving Speaker Identification by Open-set Few-shot Embedding Adaptation with Transformer

no code implementations 24 Feb 2022 Kishan K C, Zhenning Tan, Long Chen, Minho Jin, Eunjung Han, Andreas Stolcke, Chul Lee

Household speaker identification with few enrollment utterances is an important yet challenging problem, especially when household members share similar voice characteristics and room acoustics.

Open Set Learning, Speaker Identification

Contrastive-mixup learning for improved speaker verification

no code implementations 22 Feb 2022 Xin Zhang, Minho Jin, Roger Cheng, Ruirui Li, Eunjung Han, Andreas Stolcke

In this work, we propose contrastive-mixup, a novel augmentation strategy that learns distinguishing representations based on a distance metric.

Data Augmentation, Metric Learning, +1

Self-supervised Speaker Recognition Training Using Human-Machine Dialogues

no code implementations 7 Feb 2022 Metehan Cekic, Ruirui Li, Zeya Chen, Yuguang Yang, Andreas Stolcke, Upamanyu Madhow

Speaker recognition, recognizing speaker identities based on voice alone, enables important downstream applications, such as personalization and authentication.

Contrastive Learning, Speaker Recognition

ASR-Aware End-to-end Neural Diarization

no code implementations 2 Feb 2022 Aparna Khare, Eunjung Han, Yuguang Yang, Andreas Stolcke

We present a Conformer-based end-to-end neural diarization (EEND) model that uses both acoustic input and features derived from an automatic speech recognition (ASR) model.

Automatic Speech Recognition, Change Detection, +1

Fusion of Embeddings Networks for Robust Combination of Text Dependent and Independent Speaker Recognition

no code implementations 18 Jun 2021 Ruirui Li, Chelsea J.-T. Ju, Zeya Chen, Hongda Mao, Oguz Elibol, Andreas Stolcke

Based on whether the speech content is constrained or not, both text-dependent (TD) and text-independent (TI) speaker recognition models may be used.

Speaker Identification, Speaker Recognition, +1

Graph-based Label Propagation for Semi-Supervised Speaker Identification

no code implementations 15 Jun 2021 Long Chen, Venkatesh Ravichandran, Andreas Stolcke

We show in experiments on the VoxCeleb dataset that this approach makes effective use of unlabeled data and improves speaker identification accuracy compared to two state-of-the-art scoring methods as well as their semi-supervised variants based on pseudo-labels.

Speaker Identification, Speaker Recognition

End-to-end Neural Diarization: From Transformer to Conformer

no code implementations 14 Jun 2021 Yi Chieh Liu, Eunjung Han, Chul Lee, Andreas Stolcke

We propose a new end-to-end neural diarization (EEND) system that is based on Conformer, a recently proposed neural architecture that combines convolutional mappings and Transformer to model both local and global dependencies in speech.

Data Augmentation

Attention-based Contextual Language Model Adaptation for Speech Recognition

1 code implementation Findings (ACL) 2021 Richard Diehl Martinez, Scott Novotney, Ivan Bulyko, Ariya Rastrow, Andreas Stolcke, Ankur Gandhe

When applied to a large de-identified dataset of utterances collected by a popular voice assistant platform, our method reduces perplexity by 7.0% relative over a standard LM that does not incorporate contextual information.

Automatic Speech Recognition, Voice Assistant

Reranking Machine Translation Hypotheses with Structured and Web-based Language Models

no code implementations 25 Apr 2021 Wen Wang, Andreas Stolcke, Jing Zheng

In this paper, we investigate the use of linguistically motivated and computationally efficient structured language models for reranking N-best hypotheses in a statistical machine translation system.

Language Modelling, Machine Translation, +2

Wav2vec-C: A Self-supervised Model for Speech Representation Learning

no code implementations 9 Mar 2021 Samik Sadhu, Di He, Che-Wei Huang, Sri Harish Mallidi, Minhua Wu, Ariya Rastrow, Andreas Stolcke, Jasha Droppo, Roland Maas

However, the quantization process is regularized by an additional consistency network that learns to reconstruct the input features to the wav2vec 2.0 network from the quantized representations in a way similar to a VQ-VAE model.

Quantization, Representation Learning, +1

Personalization Strategies for End-to-End Speech Recognition Systems

no code implementations 15 Feb 2021 Aditya Gourav, Linda Liu, Ankur Gandhe, Yile Gu, Guitang Lan, Xiangyang Huang, Shashank Kalmane, Gautam Tiwari, Denis Filimonov, Ariya Rastrow, Andreas Stolcke, Ivan Bulyko

We also describe a novel second-pass de-biasing approach: used in conjunction with a first-pass shallow fusion that optimizes on oracle WER, we can achieve an additional 14% improvement on personalized content recognition, and even improve accuracy for the general use case by up to 2.5%.

Speech Recognition

Do as I mean, not as I say: Sequence Loss Training for Spoken Language Understanding

no code implementations 12 Feb 2021 Milind Rao, Pranav Dheram, Gautam Tiwari, Anirudh Raju, Jasha Droppo, Ariya Rastrow, Andreas Stolcke

Spoken language understanding (SLU) systems extract transcriptions, as well as semantics of intent or named entities from speech, and are essential components of voice activated systems.

Automatic Speech Recognition, Natural Language Understanding, +1

BW-EDA-EEND: Streaming End-to-End Neural Speaker Diarization for a Variable Number of Speakers

no code implementations 5 Nov 2020 Eunjung Han, Chul Lee, Andreas Stolcke

We present a novel online end-to-end neural diarization system, BW-EDA-EEND, that processes data incrementally for a variable number of speakers.

Speaker Diarization

DOVER-Lap: A Method for Combining Overlap-aware Diarization Outputs

1 code implementation 3 Nov 2020 Desh Raj, Leibny Paola Garcia-Perera, Zili Huang, Shinji Watanabe, Daniel Povey, Andreas Stolcke, Sanjeev Khudanpur

Several advances have been made recently towards handling overlapping speech for speaker diarization.

Audio and Speech Processing, Sound

Efficient minimum word error rate training of RNN-Transducer for end-to-end speech recognition

no code implementations 27 Jul 2020 Jinxi Guo, Gautam Tiwari, Jasha Droppo, Maarten Van Segbroeck, Che-Wei Huang, Andreas Stolcke, Roland Maas

Unlike previous work on this topic, which performs on-the-fly limited-size beam-search decoding and generates alignment scores for expected edit-distance computation, in our proposed method, we re-calculate and sum scores of all the possible alignments for each hypothesis in N-best lists.

Speech Recognition

Combining Acoustics, Content and Interaction Features to Find Hot Spots in Meetings

no code implementations 24 Oct 2019 Dave Makhervaks, William Hinthorn, Dimitrios Dimitriadis, Andreas Stolcke

Involvement hot spots have been proposed as a useful concept for meeting analysis and studied off and on for over 15 years.

Word Embeddings

Improving Diarization Robustness using Diversification, Randomization and the DOVER Algorithm

no code implementations 24 Oct 2019 Andreas Stolcke

Speaker diarization based on bottom-up clustering of speech segments by acoustic similarity is often highly sensitive to the choice of hyperparameters, such as the initial number of clusters and feature weighting.

Speaker Diarization

DOVER: A Method for Combining Diarization Outputs

2 code implementations 17 Sep 2019 Andreas Stolcke, Takuya Yoshioka

Speech recognition and other natural language tasks have long benefited from voting-based algorithms as a method to aggregate outputs from several systems to achieve a higher accuracy than any of the individual systems.

Speech Recognition

Session-level Language Modeling for Conversational Speech

no code implementations EMNLP 2018 Wayne Xiong, Lingfeng Wu, Jun Zhang, Andreas Stolcke

We propose to generalize language models for conversational speech recognition to allow them to operate across utterance boundaries and speaker changes, thereby capturing conversation-level phenomena such as adjacency pairs, lexical entrainment, and topical coherence.

Speech Recognition

Comparing Human and Machine Errors in Conversational Speech Transcription

no code implementations 29 Aug 2017 Andreas Stolcke, Jasha Droppo

In this paper we approach this question by comparing the output of our most accurate CTS recognition system to that of a standard speech transcription vendor pipeline.
