Search Results for author: Hagen Soltau

Found 17 papers, 0 papers with code

Retrieval Augmented End-to-End Spoken Dialog Models

no code implementations 2 Feb 2024 Mingqiu Wang, Izhak Shafran, Hagen Soltau, Wei Han, Yuan Cao, Dian Yu, Laurent El Shafey

We recently developed SLM, a joint speech and language model, which fuses a pretrained foundational speech model and a large language model (LLM), while preserving the in-context learning capability intrinsic to the pretrained LLM.

dialog state tracking • In-Context Learning • +3
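The fusion pattern described in the SLM abstract above (a frozen pretrained speech encoder feeding a frozen LLM) can be illustrated with a minimal sketch. All module names, dimensions, and the HuggingFace-style `inputs_embeds` calling convention below are assumptions for illustration, not details taken from the paper; the idea is simply that only a small projection is trained to map speech representations into the LLM's embedding space, so the LLM's in-context learning ability is preserved.

```python
# Minimal sketch (assumed details, not the SLM implementation): a frozen
# speech encoder's frame embeddings are projected into the frozen LLM's
# token-embedding space and prepended to the embedded text prompt.
import torch
import torch.nn as nn

class SpeechLLMFusion(nn.Module):
    def __init__(self, speech_encoder: nn.Module, llm: nn.Module,
                 speech_dim: int = 1024, llm_dim: int = 4096):
        super().__init__()
        self.speech_encoder = speech_encoder
        self.llm = llm
        # Both pretrained models stay frozen; only the projector is trained.
        for p in self.speech_encoder.parameters():
            p.requires_grad = False
        for p in self.llm.parameters():
            p.requires_grad = False
        self.projector = nn.Linear(speech_dim, llm_dim)

    def forward(self, speech_frames: torch.Tensor,
                prompt_embeddings: torch.Tensor):
        # speech_frames: (batch, time, speech_dim) acoustic features
        # prompt_embeddings: (batch, prompt_len, llm_dim) embedded text prompt
        speech_repr = self.speech_encoder(speech_frames)
        speech_tokens = self.projector(speech_repr)
        # Prepend the projected "speech tokens" to the text prompt and let the
        # frozen LLM (assumed to accept precomputed embeddings) condition on both.
        fused = torch.cat([speech_tokens, prompt_embeddings], dim=1)
        return self.llm(inputs_embeds=fused)
```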

Efficient Adapters for Giant Speech Models

no code implementations 13 Jun 2023 Nanxin Chen, Izhak Shafran, Yu Zhang, Chung-Cheng Chiu, Hagen Soltau, James Qin, Yonghui Wu

However, finetuning all parameters of the self-supervised learned model can be computationally expensive, and becomes infeasible as the size of the model and the number of downstream tasks scale.
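Since the abstract argues that full finetuning of a giant speech model is impractical, a generic residual bottleneck adapter (one small trainable module per frozen Transformer layer) illustrates the kind of parameter-efficient alternative such work pursues. This is a standard adapter sketch with placeholder dimensions, not the specific architecture from the paper.

```python
# Generic residual bottleneck adapter sketch: the backbone stays frozen and
# only these small modules are trained per downstream task. Dimensions are
# placeholders; the paper's actual adapter design may differ.
import torch
import torch.nn as nn

class BottleneckAdapter(nn.Module):
    def __init__(self, hidden_dim: int = 1536, bottleneck_dim: int = 64):
        super().__init__()
        self.down = nn.Linear(hidden_dim, bottleneck_dim)
        self.act = nn.GELU()
        self.up = nn.Linear(bottleneck_dim, hidden_dim)

    def forward(self, hidden_states: torch.Tensor) -> torch.Tensor:
        # Residual connection keeps the frozen model's output as the starting
        # point; the adapter learns only a small task-specific correction.
        return hidden_states + self.up(self.act(self.down(hidden_states)))

def attach_adapters(encoder_layers, hidden_dim=1536, bottleneck_dim=64):
    """Freeze every backbone layer and create one adapter per layer; the
    adapters hold only a tiny fraction of the total parameter count."""
    adapters = nn.ModuleList()
    for layer in encoder_layers:
        for p in layer.parameters():
            p.requires_grad = False
        adapters.append(BottleneckAdapter(hidden_dim, bottleneck_dim))
    return adapters
```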

Speech-to-Text Adapter and Speech-to-Entity Retriever Augmented LLMs for Speech Understanding

no code implementations 8 Jun 2023 Mingqiu Wang, Izhak Shafran, Hagen Soltau, Wei Han, Yuan Cao, Dian Yu, Laurent El Shafey

Large Language Models (LLMs) have been applied in the speech domain, often incurring a performance drop due to misalignment between speech and language representations.

dialog state tracking • Language Modelling • +1

AnyTOD: A Programmable Task-Oriented Dialog System

no code implementations 20 Dec 2022 Jeffrey Zhao, Yuan Cao, Raghav Gupta, Harrison Lee, Abhinav Rastogi, Mingqiu Wang, Hagen Soltau, Izhak Shafran, Yonghui Wu

We propose AnyTOD, an end-to-end, zero-shot task-oriented dialog (TOD) system capable of handling unseen tasks without task-specific training.

Benchmarking • Language Modelling

Knowledge-grounded Dialog State Tracking

no code implementations 13 Oct 2022 Dian Yu, Mingqiu Wang, Yuan Cao, Izhak Shafran, Laurent El Shafey, Hagen Soltau

Knowledge (including structured knowledge such as schema and ontology, and unstructured knowledge such as web corpus) is a critical part of dialog understanding, especially for unseen tasks and domains.

dialog state tracking • Few-Shot Learning

Unsupervised Slot Schema Induction for Task-oriented Dialog

no code implementations NAACL 2022 Dian Yu, Mingqiu Wang, Yuan Cao, Izhak Shafran, Laurent El Shafey, Hagen Soltau

Carefully-designed schemas describing how to collect and annotate dialog corpora are a prerequisite for building task-oriented dialog systems.

dialog state tracking • Response Generation

RNN Transducers for Nested Named Entity Recognition with constraints on alignment for long sequences

no code implementations 8 Feb 2022 Hagen Soltau, Izhak Shafran, Mingqiu Wang, Laurent El Shafey

Through empirical experiments on a challenging real-world medical NER task with multiple nested ontologies, we demonstrate that our fixed alignment model outperforms the standard RNN-T model, improving F1-score from 0.70 to 0.74.

named-entity-recognition • Named Entity Recognition • +3

Word-level confidence estimation for RNN transducers

no code implementations 28 Sep 2021 Mingqiu Wang, Hagen Soltau, Laurent El Shafey, Izhak Shafran

Confidence estimates are an often-requested feature in applications such as medical transcription, where errors can impact patient care and a confidence estimate can be used to alert medical professionals to verify potential recognition errors.

Automatic Speech Recognition • Automatic Speech Recognition (ASR) • +1

Understanding Medical Conversations: Rich Transcription, Confidence Scores & Information Extraction

no code implementations 6 Apr 2021 Hagen Soltau, Mingqiu Wang, Izhak Shafran, Laurent El Shafey

Our transformer-based streaming model performs at about 20% WER on the ASR task, 6% WDER on the diarization task, 43% SER on periods, 52% SER on commas, 43% SER on question marks and 30% SER on capitalization.

Joint Speech Recognition and Speaker Diarization via Sequence Transduction

no code implementations 9 Jul 2019 Laurent El Shafey, Hagen Soltau, Izhak Shafran

The task of assigning words to speakers is typically addressed by merging the outputs of two separate systems, namely, an automatic speech recognition (ASR) system and a speaker diarization (SD) system.

Automatic Speech Recognition • Automatic Speech Recognition (ASR) • +3
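The abstract above contrasts the joint sequence-transduction approach with the conventional pipeline that merges separate ASR and diarization outputs. The sketch below shows only that conventional merging step, i.e. assigning each recognized word to the diarization segment covering its timestamp; the data structures and the midpoint heuristic are illustrative assumptions, not details from the paper, whose joint transducer avoids this merging step entirely.

```python
# Conventional ASR + diarization merging (the baseline the paper argues
# against): assign each word to the speaker segment containing its midpoint.
def assign_words_to_speakers(asr_words, diarization_segments):
    """asr_words: list of (word, start_sec, end_sec)
    diarization_segments: list of (speaker_id, start_sec, end_sec)
    Returns a list of (word, speaker_id)."""
    labeled = []
    for word, w_start, w_end in asr_words:
        w_mid = 0.5 * (w_start + w_end)
        # Prefer the segment containing the word midpoint; otherwise fall back
        # to the segment whose boundary is closest to the midpoint.
        best = min(
            diarization_segments,
            key=lambda seg: 0.0 if seg[1] <= w_mid <= seg[2]
            else min(abs(w_mid - seg[1]), abs(w_mid - seg[2])),
        )
        labeled.append((word, best[0]))
    return labeled

# Example: a short two-speaker exchange with hypothetical timestamps.
words = [("hello", 0.0, 0.4), ("how", 0.5, 0.7), ("are", 0.7, 0.9),
         ("you", 0.9, 1.1), ("fine", 1.5, 1.9)]
segments = [("spk1", 0.0, 1.2), ("spk2", 1.3, 2.0)]
print(assign_words_to_speakers(words, segments))
```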

Neural Speech Recognizer: Acoustic-to-Word LSTM Model for Large Vocabulary Speech Recognition

no code implementations 31 Oct 2016 Hagen Soltau, Hank Liao, Hasim Sak

We present results that show it is possible to build a competitive, greatly simplified, large vocabulary continuous speech recognition system with whole words as acoustic units.

Language Modelling • speech-recognition • +1

Improvements to deep convolutional neural networks for LVCSR

no code implementations 5 Sep 2013 Tara N. Sainath, Brian Kingsbury, Abdel-rahman Mohamed, George E. Dahl, George Saon, Hagen Soltau, Tomas Beran, Aleksandr Y. Aravkin, Bhuvana Ramabhadran

We find that with these improvements, particularly with fMLLR and dropout, we are able to achieve an additional 2-3% relative improvement in WER on a 50-hour Broadcast News task over our previous best CNN baseline.

Speech Recognition
