Search Results for author: Katrin Kirchhoff

Found 36 papers, 6 papers with code

Self-supervised Representation Learning for Speech Processing

1 code implementation • NAACL (ACL) 2022 • Hung-Yi Lee, Abdelrahman Mohamed, Shinji Watanabe, Tara Sainath, Karen Livescu, Shang-Wen Li, Shu-wen Yang, Katrin Kirchhoff

Given the growing popularity of SSL, and the shared mission of these areas to bring speech and language technologies to more use cases with better quality and to scale them to under-represented languages, we propose this tutorial to systematically survey the latest SSL techniques, tools, datasets, and performance achievements in speech processing.

Representation Learning

AutoGluon-Multimodal (AutoMM): Supercharging Multimodal AutoML with Foundation Models

no code implementations • 24 Apr 2024 • Zhiqiang Tang, Haoyang Fang, Su Zhou, Taojiannan Yang, Zihan Zhong, Tony Hu, Katrin Kirchhoff, George Karypis

AutoGluon-Multimodal (AutoMM) is introduced as an open-source AutoML library designed specifically for multimodal learning.

DeAL: Decoding-time Alignment for Large Language Models

no code implementations • 5 Feb 2024 • James Y. Huang, Sailik Sengupta, Daniele Bonadiman, Yi-An Lai, Arshit Gupta, Nikolaos Pappas, Saab Mansour, Katrin Kirchhoff, Dan Roth

Current work focuses on alignment at model training time, through techniques such as Reinforcement Learning with Human Feedback (RLHF).

Rethinking the Role of Scale for In-Context Learning: An Interpretability-based Case Study at 66 Billion Scale

1 code implementation • 18 Dec 2022 • Hritik Bansal, Karthik Gopalakrishnan, Saket Dingliwal, Sravan Bodapati, Katrin Kirchhoff, Dan Roth

Using a 66 billion parameter language model (OPT-66B) across a diverse set of 14 downstream tasks, we find this is indeed the case: $\sim$70% of attention heads and $\sim$20% of feed forward networks can be removed with minimal decline in task performance.

In-Context Learning Language Modelling +1
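The head-removal experiment summarized above can be illustrated with a toy multi-head attention layer: zeroing a head's per-head output before the output projection stands in for pruning it, and the layer's output shape is unchanged. All sizes, weights, and the particular mask below are illustrative, not taken from OPT-66B.

```python
import numpy as np

rng = np.random.default_rng(1)
n_heads, d_head, seq = 4, 8, 5

# Per-head outputs of a toy attention layer (stand-in for real activations).
head_out = rng.normal(size=(n_heads, seq, d_head))
# Output projection applied to the concatenated heads.
w_o = rng.normal(size=(n_heads * d_head, n_heads * d_head))

def attention_output(head_mask):
    """Concatenate per-head outputs, zeroing masked heads -- the ablation
    used to probe how many heads the model actually needs."""
    masked = head_out * head_mask[:, None, None]          # zero pruned heads
    concat = masked.transpose(1, 0, 2).reshape(seq, n_heads * d_head)
    return concat @ w_o

full = attention_output(np.ones(n_heads))
pruned = attention_output(np.array([1.0, 0.0, 0.0, 1.0]))  # drop 2 of 4 heads
```

The pruned layer produces output of the same shape, so the rest of the network runs unmodified; the question the paper studies is how much task performance changes.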

Device Directedness with Contextual Cues for Spoken Dialog Systems

no code implementations • 23 Nov 2022 • Dhanush Bekal, Sundararajan Srinivasan, Sravan Bodapati, Srikanth Ronanki, Katrin Kirchhoff

In this work, we define barge-in verification as a supervised learning task where audio-only information is used to classify user spoken dialogue into true and false barge-ins.

Automatic Speech Recognition (ASR) +2

Towards Personalization of CTC Speech Recognition Models with Contextual Adapters and Adaptive Boosting

no code implementations • 18 Oct 2022 • Saket Dingliwal, Monica Sunkara, Sravan Bodapati, Srikanth Ronanki, Jeff Farris, Katrin Kirchhoff

End-to-end speech recognition models trained using joint Connectionist Temporal Classification (CTC)-Attention loss have gained popularity recently.

Speech Recognition
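As background on the CTC half of the joint loss mentioned above, best-path CTC decoding collapses a per-frame label sequence by first merging repeated labels and then dropping blanks. A minimal sketch (the `_` blank symbol is an assumption for illustration):

```python
BLANK = "_"  # CTC blank symbol (illustrative choice)

def ctc_greedy_collapse(frame_labels):
    """Collapse a per-frame CTC label sequence: merge consecutive repeats,
    then remove blanks -- the standard CTC best-path decoding rule."""
    out, prev = [], None
    for lab in frame_labels:
        if lab != prev and lab != BLANK:
            out.append(lab)
        prev = lab
    return "".join(out)
```

Note that a blank between two identical labels keeps them distinct, which is how CTC represents doubled letters such as the "ll" in "hello".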

Self-Supervised Speech Representation Learning: A Review

no code implementations • 21 May 2022 • Abdelrahman Mohamed, Hung-Yi Lee, Lasse Borgholt, Jakob D. Havtorn, Joakim Edin, Christian Igel, Katrin Kirchhoff, Shang-Wen Li, Karen Livescu, Lars Maaløe, Tara N. Sainath, Shinji Watanabe

Although self-supervised speech representation is still a nascent research area, it is closely related to acoustic word embedding and learning with zero lexical resources, both of which have seen active research for many years.

Automatic Speech Recognition (ASR) +3

Prompt Tuning GPT-2 language model for parameter-efficient domain adaptation of ASR systems

no code implementations • 16 Dec 2021 • Saket Dingliwal, Ashish Shenoy, Sravan Bodapati, Ankur Gandhe, Ravi Teja Gadde, Katrin Kirchhoff

Automatic Speech Recognition (ASR) systems have found use in numerous industrial applications across very diverse domains, creating a need to adapt to new domains with small memory and deployment overhead.

Automatic Speech Recognition (ASR) +3

Representation learning through cross-modal conditional teacher-student training for speech emotion recognition

no code implementations • 30 Nov 2021 • Sundararajan Srinivasan, Zhaocheng Huang, Katrin Kirchhoff

To improve the efficacy of our approach, we propose a novel estimate of the quality of the emotion predictions, to condition teacher-student training.

Emotion Classification Representation Learning +1

Prompt-tuning in ASR systems for efficient domain-adaptation

no code implementations • 13 Oct 2021 • Saket Dingliwal, Ashish Shenoy, Sravan Bodapati, Ankur Gandhe, Ravi Teja Gadde, Katrin Kirchhoff

In this work, we overcome the problem using prompt-tuning, a methodology that trains a small number of domain token embedding parameters to prime a transformer-based LM to a particular domain.

Automatic Speech Recognition (ASR) +2
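The mechanism described above, training only a small set of domain token embeddings that are prepended to the input while the LM stays frozen, can be sketched as follows. The sizes and the numpy stand-in for the LM's embedding layer are illustrative assumptions, not the paper's configuration.

```python
import numpy as np

rng = np.random.default_rng(0)
vocab_size, d_model, n_prompt = 100, 16, 8  # illustrative sizes

# Frozen pretrained token embeddings (stand-in for the LM's input layer).
token_emb = rng.normal(size=(vocab_size, d_model))
token_emb.flags.writeable = False  # frozen: never updated during adaptation

# The only trainable parameters: n_prompt domain token embeddings.
prompt_emb = rng.normal(size=(n_prompt, d_model))

def embed_with_prompt(token_ids):
    """Prepend the trainable prompt vectors to the frozen token embeddings
    before the sequence enters the transformer LM."""
    return np.concatenate([prompt_emb, token_emb[token_ids]], axis=0)

x = embed_with_prompt(np.array([3, 14, 15]))
```

The appeal for domain adaptation is the parameter count: here only `n_prompt * d_model = 128` values are trained per domain, versus the full embedding table's 1600, and the frozen LM can be shared across all domains.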

Remember the context! ASR slot error correction through memorization

no code implementations • 10 Sep 2021 • Dhanush Bekal, Ashish Shenoy, Monica Sunkara, Sravan Bodapati, Katrin Kirchhoff

Accurate recognition of slot values such as domain-specific words or named entities by automatic speech recognition (ASR) systems forms the core of goal-oriented dialogue systems.

Automatic Speech Recognition (ASR) +3
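A minimal sketch of correction-through-memorization: each hypothesis word is compared against a memorized catalog of slot values and replaced when a close match is found. The string-similarity matcher and the catalog entries below are illustrative stand-ins, not the paper's actual matching model.

```python
from difflib import SequenceMatcher

# Memorized catalog of domain-specific slot values (hypothetical entries).
CATALOG = ["Katrin", "Seattle", "AutoGluon"]

def correct_slots(hypothesis, threshold=0.75):
    """Replace each ASR-hypothesis word with the closest catalog entry
    when string similarity exceeds the threshold; otherwise keep it."""
    corrected = []
    for word in hypothesis.split():
        best = max(CATALOG,
                   key=lambda c: SequenceMatcher(None, word.lower(), c.lower()).ratio())
        score = SequenceMatcher(None, word.lower(), best.lower()).ratio()
        corrected.append(best if score >= threshold else word)
    return " ".join(corrected)
```

With the catalog above, a hypothesis like "call catrin in seatle" is repaired to "call Katrin in Seattle": the misrecognized names clear the similarity threshold while ordinary words are left untouched.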

ASR Adaptation for E-commerce Chatbots using Cross-Utterance Context and Multi-Task Language Modeling

no code implementations • ACL (ECNLP) 2021 • Ashish Shenoy, Sravan Bodapati, Katrin Kirchhoff

In this paper, we investigate various techniques to improve contextualization, content word robustness and domain adaptation of a Transformer-XL neural language model (NLM) to rescore ASR N-best hypotheses.

Automatic Speech Recognition (ASR) +3

Speaker-conversation factorial designs for diarization error analysis

no code implementations • 10 Jun 2021 • Scott Seyfarth, Sundararajan Srinivasan, Katrin Kirchhoff

Determining the cause of diarization errors is difficult because speaker voice acoustics and conversation structure co-vary, and the interactions between acoustics, conversational structure, and diarization accuracy are complex.

Clustering Speaker Diarization +1

Adapting Long Context NLM for ASR Rescoring in Conversational Agents

no code implementations • 21 Apr 2021 • Ashish Shenoy, Sravan Bodapati, Monica Sunkara, Srikanth Ronanki, Katrin Kirchhoff

Neural Language Models (NLM), when trained and evaluated with context spanning multiple utterances, have been shown to consistently outperform both conventional n-gram language models and NLMs that use limited context.

Intent Classification +2

Contextual Biasing of Language Models for Speech Recognition in Goal-Oriented Conversational Agents

no code implementations • 18 Mar 2021 • Ashish Shenoy, Sravan Bodapati, Katrin Kirchhoff

In this paper, we explore different ways to incorporate context into a LSTM based NLM in order to model long range dependencies and improve speech recognition.

Automatic Speech Recognition (ASR) +4

Neural Inverse Text Normalization

no code implementations • 12 Feb 2021 • Monica Sunkara, Chaitanya Shivade, Sravan Bodapati, Katrin Kirchhoff

We propose an efficient and robust neural solution for ITN leveraging transformer based seq2seq models and FST-based text normalization techniques for data preparation.
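To illustrate the ITN task itself, here is a tiny rule-based converter for two-digit number words; it is a toy stand-in for the FST grammars used in data preparation, not the paper's neural pipeline, and it ignores teens, hundreds, and everything larger.

```python
# Minimal rule-based inverse text normalization ("twenty three" -> "23").
UNITS = {"zero": 0, "one": 1, "two": 2, "three": 3, "four": 4,
         "five": 5, "six": 6, "seven": 7, "eight": 8, "nine": 9}
TENS = {"twenty": 20, "thirty": 30, "forty": 40, "fifty": 50,
        "sixty": 60, "seventy": 70, "eighty": 80, "ninety": 90}

def itn(words):
    """Rewrite spoken-form number words as digits, leaving other words alone."""
    out, i = [], 0
    while i < len(words):
        w = words[i]
        if w in TENS:
            value = TENS[w]
            # Absorb a following unit word: "twenty" + "three" -> 23.
            if i + 1 < len(words) and words[i + 1] in UNITS:
                value += UNITS[words[i + 1]]
                i += 1
            out.append(str(value))
        elif w in UNITS:
            out.append(str(UNITS[w]))
        else:
            out.append(w)
        i += 1
    return " ".join(out)
```

A seq2seq ITN model learns mappings like this from data instead of enumerating them by hand, which is why such rule/FST output is useful as training-data preparation.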

Align-Refine: Non-Autoregressive Speech Recognition via Iterative Realignment

no code implementations • NAACL 2021 • Ethan A. Chi, Julian Salazar, Katrin Kirchhoff

Non-autoregressive models greatly improve decoding speed over typical sequence-to-sequence models, but suffer from degraded performance.

Speech Recognition

Multimodal Semi-supervised Learning Framework for Punctuation Prediction in Conversational Speech

no code implementations • 3 Aug 2020 • Monica Sunkara, Srikanth Ronanki, Dhanush Bekal, Sravan Bodapati, Katrin Kirchhoff

Experiments conducted on the Fisher corpus show that our proposed approach achieves ~6-9% and ~3-4% absolute improvement (F1 score) over the baseline BLSTM model on reference transcripts and ASR outputs, respectively.

Data Augmentation

Masked Language Model Scoring

6 code implementations • ACL 2020 • Julian Salazar, Davis Liang, Toan Q. Nguyen, Katrin Kirchhoff

Instead, we evaluate MLMs out of the box via their pseudo-log-likelihood scores (PLLs), which are computed by masking tokens one by one.

Attribute Domain Adaptation +4
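The PLL computation can be sketched with a toy scorer standing in for the MLM forward pass: mask each position in turn, score the held-out token given its context, and sum the log-probabilities. Note the toy table below only conditions on the left neighbour, whereas a real MLM conditions on both sides of the mask; the probabilities are made up for illustration.

```python
import math

# Toy conditional scores standing in for a masked LM: log P(token | left word).
LOGP = {
    ("<s>", "the"): math.log(0.5),
    ("the", "cat"): math.log(0.2),
    ("cat", "sat"): math.log(0.3),
}

def masked_logprob(tokens, i):
    """Log-probability of tokens[i] in context -- a stand-in for one MLM
    forward pass on the sentence with position i masked."""
    prev = tokens[i - 1] if i > 0 else "<s>"
    return LOGP.get((prev, tokens[i]), math.log(1e-6))

def pseudo_log_likelihood(tokens):
    """PLL: mask each position one by one and sum the conditional log-probs."""
    return sum(masked_logprob(tokens, i) for i in range(len(tokens)))

pll = pseudo_log_likelihood(["the", "cat", "sat"])
```

With a real MLM this requires one forward pass per token, which is the cost the paper pays to score sentences "out of the box" without any autoregressive training.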

BERTphone: Phonetically-Aware Encoder Representations for Utterance-Level Speaker and Language Recognition

1 code implementation • 30 Jun 2019 • Shaoshi Ling, Julian Salazar, Yuzong Liu, Katrin Kirchhoff

We introduce BERTphone, a Transformer encoder trained on large speech corpora that outputs phonetically-aware contextual representation vectors that can be used for both speaker and language recognition.

Representation Learning +2

Simple, Fast, Accurate Intent Classification and Slot Labeling for Goal-Oriented Dialogue Systems

no code implementations • WS 2019 • Arshit Gupta, John Hewitt, Katrin Kirchhoff

With the advent of conversational assistants like Amazon Alexa and Google Now, dialogue systems are gaining a lot of traction, especially in industrial settings.

General Classification Goal-Oriented Dialogue Systems +3

Self-Attention Networks for Connectionist Temporal Classification in Speech Recognition

1 code implementation • 22 Jan 2019 • Julian Salazar, Katrin Kirchhoff, Zhiheng Huang

The success of self-attention in NLP has led to recent applications in end-to-end encoder-decoder architectures for speech recognition.

Classification General Classification +3

Context Models for OOV Word Translation in Low-Resource Languages

no code implementations • WS 2018 • Angli Liu, Katrin Kirchhoff

Out-of-vocabulary word translation is a major problem for the translation of low-resource languages that suffer from a lack of parallel training data.

Machine Translation Sentence +2

Syntactic and Semantic Features For Code-Switching Factored Language Models

no code implementations • 4 Oct 2017 • Heike Adel, Ngoc Thang Vu, Katrin Kirchhoff, Dominic Telaar, Tanja Schultz

The experimental results reveal that Brown word clusters, part-of-speech tags and open-class words are the most effective at reducing the perplexity of factored language models on the Mandarin-English Code-Switching corpus SEAME.

Automatic Speech Recognition (ASR) +3

Exploiting Out-of-Domain Data Sources for Dialectal Arabic Statistical Machine Translation

no code implementations • 7 Sep 2015 • Katrin Kirchhoff, Bing Zhao, Wen Wang

Statistical machine translation for dialectal Arabic is characterized by a lack of data, since data acquisition involves the transcription and translation of spoken language.

Machine Translation Translation
