Search Results for author: Hagen Soltau

Found 17 papers, 0 papers with code

Retrieval Augmented End-to-End Spoken Dialog Models

no code implementations 2 Feb 2024 Mingqiu Wang, Izhak Shafran, Hagen Soltau, Wei Han, Yuan Cao, Dian Yu, Laurent El Shafey

We recently developed SLM, a joint speech and language model, which fuses a pretrained foundational speech model and a large language model (LLM), while preserving the in-context learning capability intrinsic to the pretrained LLM.

dialog state tracking • In-Context Learning • +3
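The fusion pattern described in the SLM abstract above (a frozen pretrained speech encoder feeding a frozen LLM) can be illustrated with a minimal sketch. All module names, dimensions, and the HuggingFace-style `inputs_embeds` calling convention below are assumptions for illustration, not details taken from the paper; the idea is simply that only a small projection is trained to map speech representations into the LLM's embedding space, so the LLM's in-context learning ability is preserved.

```python
# Minimal sketch (assumed details, not the SLM implementation): a frozen
# speech encoder's frame embeddings are projected into the frozen LLM's
# token-embedding space and prepended to the embedded text prompt.
import torch
import torch.nn as nn

class SpeechLLMFusion(nn.Module):
    def __init__(self, speech_encoder: nn.Module, llm: nn.Module,
                 speech_dim: int = 1024, llm_dim: int = 4096):
        super().__init__()
        self.speech_encoder = speech_encoder
        self.llm = llm
        # Both pretrained models stay frozen; only the projector is trained.
        for p in self.speech_encoder.parameters():
            p.requires_grad = False
        for p in self.llm.parameters():
            p.requires_grad = False
        self.projector = nn.Linear(speech_dim, llm_dim)

    def forward(self, speech_frames: torch.Tensor,
                prompt_embeddings: torch.Tensor):
        # speech_frames: (batch, time, speech_dim) acoustic features
        # prompt_embeddings: (batch, prompt_len, llm_dim) embedded text prompt
        speech_repr = self.speech_encoder(speech_frames)
        speech_tokens = self.projector(speech_repr)
        # Prepend the projected "speech tokens" to the text prompt and let the
        # frozen LLM (assumed to accept precomputed embeddings) condition on both.
        fused = torch.cat([speech_tokens, prompt_embeddings], dim=1)
        return self.llm(inputs_embeds=fused)
```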

Efficient Adapters for Giant Speech Models

no code implementations 13 Jun 2023 Nanxin Chen, Izhak Shafran, Yu Zhang, Chung-Cheng Chiu, Hagen Soltau, James Qin, Yonghui Wu

However, finetuning all parameters of the self-supervised learned model can be computationally expensive, and becomes infeasible as the size of the model and the number of downstream tasks scale.
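Since the abstract argues that full finetuning of a giant speech model is impractical, a generic residual bottleneck adapter (one small trainable module per frozen Transformer layer) illustrates the kind of parameter-efficient alternative such work pursues. This is a standard adapter sketch with placeholder dimensions, not the specific architecture from the paper.

```python
# Generic residual bottleneck adapter sketch: the backbone stays frozen and
# only these small modules are trained per downstream task. Dimensions are
# placeholders; the paper's actual adapter design may differ.
import torch
import torch.nn as nn

class BottleneckAdapter(nn.Module):
    def __init__(self, hidden_dim: int = 1536, bottleneck_dim: int = 64):
        super().__init__()
        self.down = nn.Linear(hidden_dim, bottleneck_dim)
        self.act = nn.GELU()
        self.up = nn.Linear(bottleneck_dim, hidden_dim)

    def forward(self, hidden_states: torch.Tensor) -> torch.Tensor:
        # Residual connection keeps the frozen model's output as the starting
        # point; the adapter learns only a small task-specific correction.
        return hidden_states + self.up(self.act(self.down(hidden_states)))

def attach_adapters(encoder_layers, hidden_dim=1536, bottleneck_dim=64):
    """Freeze every backbone layer and create one adapter per layer; the
    adapters hold only a tiny fraction of the total parameter count."""
    adapters = nn.ModuleList()
    for layer in encoder_layers:
        for p in layer.parameters():
            p.requires_grad = False
        adapters.append(BottleneckAdapter(hidden_dim, bottleneck_dim))
    return adapters
```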

Speech-to-Text Adapter and Speech-to-Entity Retriever Augmented LLMs for Speech Understanding

no code implementations 8 Jun 2023 Mingqiu Wang, Izhak Shafran, Hagen Soltau, Wei Han, Yuan Cao, Dian Yu, Laurent El Shafey

Large Language Models (LLMs) have been applied in the speech domain, often incurring a performance drop due to misalignment between speech and language representations.

dialog state tracking • Language Modelling • +1

AnyTOD: A Programmable Task-Oriented Dialog System

no code implementations 20 Dec 2022 Jeffrey Zhao, Yuan Cao, Raghav Gupta, Harrison Lee, Abhinav Rastogi, Mingqiu Wang, Hagen Soltau, Izhak Shafran, Yonghui Wu

We propose AnyTOD, an end-to-end, zero-shot task-oriented dialog (TOD) system capable of handling unseen tasks without task-specific training.

Benchmarking • Language Modelling

Knowledge-grounded Dialog State Tracking

no code implementations 13 Oct 2022 Dian Yu, Mingqiu Wang, Yuan Cao, Izhak Shafran, Laurent El Shafey, Hagen Soltau

Knowledge (including structured knowledge such as schema and ontology, and unstructured knowledge such as web corpus) is a critical part of dialog understanding, especially for unseen tasks and domains.

dialog state tracking • Few-Shot Learning

Unsupervised Slot Schema Induction for Task-oriented Dialog

no code implementations NAACL 2022 Dian Yu, Mingqiu Wang, Yuan Cao, Izhak Shafran, Laurent El Shafey, Hagen Soltau

Carefully-designed schemas describing how to collect and annotate dialog corpora are a prerequisite for building task-oriented dialog systems.

dialog state tracking • Response Generation

RNN Transducers for Nested Named Entity Recognition with constraints on alignment for long sequences

no code implementations 8 Feb 2022 Hagen Soltau, Izhak Shafran, Mingqiu Wang, Laurent El Shafey

Through empirical experiments on a challenging real-world medical NER task with multiple nested ontologies, we demonstrate that our fixed alignment model outperforms the standard RNN-T model, improving F1-score from 0.70 to 0.74.

named-entity-recognition • Named Entity Recognition • +3

Word-level confidence estimation for RNN transducers

no code implementations 28 Sep 2021 Mingqiu Wang, Hagen Soltau, Laurent El Shafey, Izhak Shafran

Confidence estimates are an often-requested feature in applications such as medical transcription, where errors can impact patient care and a confidence estimate can be used to alert medical professionals to verify potential recognition errors.

Automatic Speech Recognition • Automatic Speech Recognition (ASR) • +1

Understanding Medical Conversations: Rich Transcription, Confidence Scores & Information Extraction

no code implementations 6 Apr 2021 Hagen Soltau, Mingqiu Wang, Izhak Shafran, Laurent El Shafey

Our transformer-based streaming model performs at about 20% WER on the ASR task, 6% WDER on the diarization task, 43% SER on periods, 52% SER on commas, 43% SER on question marks and 30% SER on capitalization.

Joint Speech Recognition and Speaker Diarization via Sequence Transduction

no code implementations 9 Jul 2019 Laurent El Shafey, Hagen Soltau, Izhak Shafran

The task of assigning words to speakers is typically addressed by merging the outputs of two separate systems, namely, an automatic speech recognition (ASR) system and a speaker diarization (SD) system.

Automatic Speech Recognition • Automatic Speech Recognition (ASR) • +3
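The abstract above contrasts the joint sequence-transduction approach with the conventional pipeline that merges separate ASR and diarization outputs. The sketch below shows only that conventional merging step, i.e. assigning each recognized word to the diarization segment covering its timestamp; the data structures and the midpoint heuristic are illustrative assumptions, not details from the paper, whose joint transducer avoids this merging step entirely.

```python
# Conventional ASR + diarization merging (the baseline the paper argues
# against): assign each word to the speaker segment containing its midpoint.
def assign_words_to_speakers(asr_words, diarization_segments):
    """asr_words: list of (word, start_sec, end_sec)
    diarization_segments: list of (speaker_id, start_sec, end_sec)
    Returns a list of (word, speaker_id)."""
    labeled = []
    for word, w_start, w_end in asr_words:
        w_mid = 0.5 * (w_start + w_end)
        # Prefer the segment containing the word midpoint; otherwise fall back
        # to the segment whose boundary is closest to the midpoint.
        best = min(
            diarization_segments,
            key=lambda seg: 0.0 if seg[1] <= w_mid <= seg[2]
            else min(abs(w_mid - seg[1]), abs(w_mid - seg[2])),
        )
        labeled.append((word, best[0]))
    return labeled

# Example: a short two-speaker exchange with hypothetical timestamps.
words = [("hello", 0.0, 0.4), ("how", 0.5, 0.7), ("are", 0.7, 0.9),
         ("you", 0.9, 1.1), ("fine", 1.5, 1.9)]
segments = [("spk1", 0.0, 1.2), ("spk2", 1.3, 2.0)]
print(assign_words_to_speakers(words, segments))
```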

Neural Speech Recognizer: Acoustic-to-Word LSTM Model for Large Vocabulary Speech Recognition

no code implementations 31 Oct 2016 Hagen Soltau, Hank Liao, Hasim Sak

We present results that show it is possible to build a competitive, greatly simplified, large vocabulary continuous speech recognition system with whole words as acoustic units.

Language Modelling • speech-recognition • +1

Improvements to deep convolutional neural networks for LVCSR

no code implementations 5 Sep 2013 Tara N. Sainath, Brian Kingsbury, Abdel-rahman Mohamed, George E. Dahl, George Saon, Hagen Soltau, Tomas Beran, Aleksandr Y. Aravkin, Bhuvana Ramabhadran

We find that with these improvements, particularly with fMLLR and dropout, we are able to achieve an additional 2-3% relative improvement in WER on a 50-hour Broadcast News task over our previous best CNN baseline.

Speech Recognition
