1 code implementation • NAACL (ACL) 2022 • Hung-Yi Lee, Abdelrahman Mohamed, Shinji Watanabe, Tara Sainath, Karen Livescu, Shang-Wen Li, Shu-wen Yang, Katrin Kirchhoff
Given the growing popularity of SSL, and the shared mission of both fields to bring speech and language technologies to more use cases with better quality and to scale them to under-represented languages, we propose this tutorial to systematically survey the latest SSL techniques, tools, datasets, and performance achievements in speech processing.
no code implementations • 24 Dec 2024 • Karel Mundnich, Xing Niu, Prashant Mathur, Srikanth Ronanki, Brady Houston, Veera Raghavendra Elluru, Nilaksh Das, Zejiang Hou, Goeric Huybrechts, Anshu Bhatia, Daniel Garcia-Romero, Kyu J. Han, Katrin Kirchhoff
We achieve this by using a pre-trained multilingual speech encoder, a multilingual LLM, and a lightweight adaptation module that maps the audio representations to the token embedding space of the LLM.
Automatic Speech Recognition (ASR)
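The lightweight adaptation module described above can be sketched as frame stacking (temporal downsampling) followed by a linear projection into the LLM's token-embedding space. All dimensions, the stacking factor, and the function names below are illustrative assumptions, not the paper's actual configuration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy dimensions (hypothetical): speech-encoder frames -> LLM embedding space.
d_audio, d_llm, stack = 24, 32, 2

W = rng.normal(size=(d_audio * stack, d_llm)) * 0.02  # the small trainable map
b = np.zeros(d_llm)

def adapt(audio_frames):
    """Downsample by stacking adjacent frames, then project linearly into
    the LLM's token-embedding space (one variant of a lightweight adapter)."""
    T = (len(audio_frames) // stack) * stack
    stacked = audio_frames[:T].reshape(T // stack, d_audio * stack)
    return stacked @ W + b  # (T // stack, d_llm): "audio tokens" for the LLM

frames = rng.normal(size=(10, d_audio))
audio_tokens = adapt(frames)  # shape (5, 32)
```

In this setup only `W` and `b` would be trained, while the speech encoder and the LLM stay frozen.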
1 code implementation • 3 Oct 2024 • Han He, Qianchu Liu, Lei Xu, Chaitanya Shivade, Yi Zhang, Sundararajan Srinivasan, Katrin Kirchhoff
However, these approaches are suboptimal for generative tasks, which require more nuanced guidance beyond a single numeric metric to improve the prompt and optimize multiple aspects of the generated text.
Ranked #1 on Text Summarization on ACI-Bench
no code implementations • 14 May 2024 • Nilaksh Das, Saket Dingliwal, Srikanth Ronanki, Rohit Paturi, Zhaocheng Huang, Prashant Mathur, Jie Yuan, Dhanush Bekal, Xing Niu, Sai Muralidhar Jayanthi, Xilai Li, Karel Mundnich, Monica Sunkara, Sundararajan Srinivasan, Kyu J Han, Katrin Kirchhoff
The models are instruction finetuned using continuous latent representations extracted from the speech foundation model to achieve optimal zero-shot performance on a diverse range of speech processing tasks using natural language instructions.
no code implementations • 14 May 2024 • Raghuveer Peri, Sai Muralidhar Jayanthi, Srikanth Ronanki, Anshu Bhatia, Karel Mundnich, Saket Dingliwal, Nilaksh Das, Zejiang Hou, Goeric Huybrechts, Srikanth Vishnubhotla, Daniel Garcia-Romero, Sundararajan Srinivasan, Kyu J Han, Katrin Kirchhoff
Despite safety guardrails, experiments on jailbreaking demonstrate the vulnerability of SLMs to adversarial perturbations and transfer attacks, with average attack success rates of 90% and 10%, respectively, when evaluated on a dataset of carefully designed harmful questions spanning 12 different toxic categories.
no code implementations • 24 Apr 2024 • Zhiqiang Tang, Haoyang Fang, Su Zhou, Taojiannan Yang, Zihan Zhong, Tony Hu, Katrin Kirchhoff, George Karypis
We introduce AutoGluon-Multimodal (AutoMM), an open-source AutoML library designed specifically for multimodal learning.
no code implementations • 5 Feb 2024 • James Y. Huang, Sailik Sengupta, Daniele Bonadiman, Yi-An Lai, Arshit Gupta, Nikolaos Pappas, Saab Mansour, Katrin Kirchhoff, Dan Roth
Current work focuses on alignment at model training time, through techniques such as Reinforcement Learning from Human Feedback (RLHF).
no code implementations • 2 Jul 2023 • Anshu Bhatia, Sanchit Sinha, Saket Dingliwal, Karthik Gopalakrishnan, Sravan Bodapati, Katrin Kirchhoff
Speech representations learned in a self-supervised fashion from massive unlabeled speech corpora have been adapted successfully to several downstream tasks.
Automatic Speech Recognition (ASR)
no code implementations • 13 Jun 2023 • Goeric Huybrechts, Srikanth Ronanki, Xilai Li, Hadis Nosrati, Sravan Bodapati, Katrin Kirchhoff
To address this issue, we propose the integration of a novel dynamic contextual carry-over mechanism in a state-of-the-art (SOTA) unified ASR system.
Automatic Speech Recognition (ASR)
no code implementations • 5 May 2023 • Nilaksh Das, Monica Sunkara, Sravan Bodapati, Jinglun Cai, Devang Kulshreshtha, Jeff Farris, Katrin Kirchhoff
Internal language model estimation (ILME) has been proposed to mitigate this bias for autoregressive models such as attention-based encoder-decoder and RNN-T.
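ILME-style decoding amounts to subtracting an estimate of the end-to-end model's internal LM score before fusing an external LM. The function below is a minimal sketch of that score combination; the interpolation weights are illustrative, not values from the paper.

```python
def shallow_fusion_with_ilme(logp_e2e, logp_ilm, logp_ext, lam=0.3, mu=0.5):
    """ILME-adjusted decoding score for one hypothesis:
    subtract the estimated internal LM log-probability (weight lam),
    then add the external LM log-probability (weight mu)."""
    return logp_e2e - lam * logp_ilm + mu * logp_ext

# Toy log-probabilities for a single hypothesis (illustrative numbers only).
score = shallow_fusion_with_ilme(-12.3, -8.1, -9.7)
```

Without the `- lam * logp_ilm` term this reduces to plain shallow fusion; the subtraction is what counteracts the bias toward the training-data distribution.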
1 code implementation • 18 Dec 2022 • Hritik Bansal, Karthik Gopalakrishnan, Saket Dingliwal, Sravan Bodapati, Katrin Kirchhoff, Dan Roth
Using a 66 billion parameter language model (OPT-66B) across a diverse set of 14 downstream tasks, we find this is indeed the case: $\sim$70% of attention heads and $\sim$20% of feed-forward networks can be removed with minimal decline in task performance.
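The structured pruning behind this result can be illustrated with a minimal numpy sketch: entire attention heads are zeroed before the output projection, so they contribute nothing downstream. The sizes and the kept-head pattern here are toy assumptions, not the paper's setup.

```python
import numpy as np

# Toy per-head attention outputs: (n_heads, seq_len, head_dim).
n_heads, seq_len, head_dim = 8, 5, 4
rng = np.random.default_rng(0)
head_outputs = rng.normal(size=(n_heads, seq_len, head_dim))

def prune_heads(head_outputs, keep_mask):
    """Structured head pruning: multiply each head's output by 0 or 1,
    so removed heads are effectively deleted from the model."""
    mask = np.asarray(keep_mask, dtype=float)[:, None, None]
    return head_outputs * mask

# Keep only 2 of 8 heads, i.e. remove ~70% of them.
keep = [1, 1, 0, 0, 0, 0, 0, 0]
pruned = prune_heads(head_outputs, keep)
```

In practice pruning is done per-head inside each transformer layer, and the finding above is that most heads can be masked this way with little task-performance loss.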
no code implementations • 23 Nov 2022 • Dhanush Bekal, Sundararajan Srinivasan, Sravan Bodapati, Srikanth Ronanki, Katrin Kirchhoff
In this work, we define barge-in verification as a supervised learning task where audio-only information is used to classify user spoken dialogue into true and false barge-ins.
Automatic Speech Recognition (ASR)
no code implementations • 18 Oct 2022 • Saket Dingliwal, Monica Sunkara, Sravan Bodapati, Srikanth Ronanki, Jeff Farris, Katrin Kirchhoff
End-to-end speech recognition models trained using joint Connectionist Temporal Classification (CTC)-Attention loss have gained popularity recently.
no code implementations • 21 May 2022 • Abdelrahman Mohamed, Hung-Yi Lee, Lasse Borgholt, Jakob D. Havtorn, Joakim Edin, Christian Igel, Katrin Kirchhoff, Shang-Wen Li, Karen Livescu, Lars Maaløe, Tara N. Sainath, Shinji Watanabe
Although self-supervised speech representation is still a nascent research area, it is closely related to acoustic word embedding and learning with zero lexical resources, both of which have seen active research for many years.
Automatic Speech Recognition (ASR)
no code implementations • 16 Dec 2021 • Saket Dingliwal, Ashish Shenoy, Sravan Bodapati, Ankur Gandhe, Ravi Teja Gadde, Katrin Kirchhoff
Automatic Speech Recognition (ASR) systems have found use in numerous industrial applications across very diverse domains, creating a need to adapt to new domains with small memory and deployment overhead.
Automatic Speech Recognition (ASR)
no code implementations • 10 Dec 2021 • Rohit Paturi, Sundararajan Srinivasan, Katrin Kirchhoff, Daniel Garcia-Romero
Also, most of these models are trained with synthetic mixtures and do not generalize to real conversational data.
Automatic Speech Recognition (ASR)
no code implementations • 30 Nov 2021 • Sundararajan Srinivasan, Zhaocheng Huang, Katrin Kirchhoff
To improve the efficacy of our approach, we propose a novel estimate of the quality of the emotion predictions, to condition teacher-student training.
no code implementations • 13 Oct 2021 • Saket Dingliwal, Ashish Shenoy, Sravan Bodapati, Ankur Gandhe, Ravi Teja Gadde, Katrin Kirchhoff
In this work, we overcome the problem using prompt-tuning, a methodology that trains a small number of domain token embedding parameters to prime a transformer-based LM to a particular domain.
Automatic Speech Recognition (ASR)
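Prompt-tuning as described above trains only a small matrix of domain token embeddings that is prepended to the frozen model's input embeddings. A minimal numpy sketch of the forward-pass plumbing, with toy sizes (all dimensions hypothetical):

```python
import numpy as np

rng = np.random.default_rng(0)

d_model, n_prompt, vocab = 16, 4, 100  # toy sizes, not the paper's values

# Frozen pretrained token-embedding table; only domain_prompt is trained.
frozen_embeddings = rng.normal(size=(vocab, d_model))
domain_prompt = rng.normal(size=(n_prompt, d_model))  # n_prompt * d_model trainable params

def embed_with_prompt(token_ids):
    """Prepend the trainable domain prompt to the frozen input embeddings."""
    token_embs = frozen_embeddings[token_ids]           # (seq, d_model)
    return np.concatenate([domain_prompt, token_embs])  # (n_prompt + seq, d_model)

x = embed_with_prompt([5, 17, 42])  # shape (4 + 3, 16)
```

During training, gradients would flow only into `domain_prompt`, which is why the method's memory and deployment overhead per domain is so small.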
no code implementations • 10 Sep 2021 • Dhanush Bekal, Ashish Shenoy, Monica Sunkara, Sravan Bodapati, Katrin Kirchhoff
Accurate recognition of slot values such as domain-specific words or named entities by automatic speech recognition (ASR) systems forms the core of goal-oriented dialogue systems.
Automatic Speech Recognition (ASR)
no code implementations • ACL (ECNLP) 2021 • Ashish Shenoy, Sravan Bodapati, Katrin Kirchhoff
In this paper, we investigate various techniques to improve contextualization, content word robustness and domain adaptation of a Transformer-XL neural language model (NLM) to rescore ASR N-best hypotheses.
Automatic Speech Recognition (ASR)
no code implementations • 10 Jun 2021 • Scott Seyfarth, Sundararajan Srinivasan, Katrin Kirchhoff
Determining the cause of diarization errors is difficult because speaker voice acoustics and conversation structure co-vary, and the interactions between acoustics, conversational structure, and diarization accuracy are complex.
no code implementations • 21 Apr 2021 • Ashish Shenoy, Sravan Bodapati, Monica Sunkara, Srikanth Ronanki, Katrin Kirchhoff
Neural Language Models (NLM), when trained and evaluated with context spanning multiple utterances, have been shown to consistently outperform both conventional n-gram language models and NLMs that use limited context.
no code implementations • 18 Mar 2021 • Ashish Shenoy, Sravan Bodapati, Katrin Kirchhoff
In this paper, we explore different ways to incorporate context into a LSTM based NLM in order to model long range dependencies and improve speech recognition.
Automatic Speech Recognition (ASR)
no code implementations • 12 Feb 2021 • Monica Sunkara, Chaitanya Shivade, Sravan Bodapati, Katrin Kirchhoff
We propose an efficient and robust neural solution for ITN leveraging transformer based seq2seq models and FST-based text normalization techniques for data preparation.
no code implementations • 30 Nov 2020 • Siddharth Dalmia, Yuzong Liu, Srikanth Ronanki, Katrin Kirchhoff
We live in a world where 60% of the population can speak two or more languages fluently.
Automatic Speech Recognition (ASR)
no code implementations • NAACL 2021 • Ethan A. Chi, Julian Salazar, Katrin Kirchhoff
Non-autoregressive models greatly improve decoding speed over typical sequence-to-sequence models, but suffer from degraded performance.
no code implementations • 3 Aug 2020 • Monica Sunkara, Srikanth Ronanki, Dhanush Bekal, Sravan Bodapati, Katrin Kirchhoff
Experiments conducted on the Fisher corpus show that our proposed approach achieves ~6-9% and ~3-4% absolute improvement (F1 score) over the baseline BLSTM model on reference transcripts and ASR outputs respectively.
no code implementations • WS 2020 • Monica Sunkara, Srikanth Ronanki, Kalpit Dixit, Sravan Bodapati, Katrin Kirchhoff
We also present techniques for domain and task specific adaptation by fine-tuning masked language models with medical domain data.
Automatic Speech Recognition (ASR)
1 code implementation • 3 Dec 2019 • Shaoshi Ling, Yuzong Liu, Julian Salazar, Katrin Kirchhoff
We propose a novel approach to semi-supervised automatic speech recognition (ASR).
Automatic Speech Recognition (ASR)
6 code implementations • ACL 2020 • Julian Salazar, Davis Liang, Toan Q. Nguyen, Katrin Kirchhoff
Instead, we evaluate MLMs out of the box via their pseudo-log-likelihood scores (PLLs), which are computed by masking tokens one by one.
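The PLL computation described above masks each token in turn and accumulates the MLM's log-probability of the true token at the masked position. A minimal sketch with a stand-in uniform model (the `mlm_token_logprob` interface is a hypothetical simplification of a real MLM scorer such as BERT):

```python
import math

def pseudo_log_likelihood(tokens, mlm_token_logprob):
    """Score a sentence with a masked LM by masking tokens one at a time.

    mlm_token_logprob(masked_tokens, position, target) -> log P(target | context)
    (a stand-in interface for a real MLM forward pass at one masked position)
    """
    pll = 0.0
    for i, tok in enumerate(tokens):
        masked = tokens[:i] + ["[MASK]"] + tokens[i + 1:]
        pll += mlm_token_logprob(masked, i, tok)
    return pll

# Toy stand-in model: uniform over a 3-word vocabulary, so every masked
# prediction contributes log(1/3) regardless of context.
def uniform_mlm(masked, i, target):
    return math.log(1.0 / 3)

score = pseudo_log_likelihood(["the", "cat", "sat"], uniform_mlm)
# score == 3 * log(1/3)
```

Note the cost: scoring a length-n sentence takes n masked forward passes, one per token, which is why PLLs are an evaluation-time rescoring tool rather than a training objective here.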
1 code implementation • 30 Jun 2019 • Shaoshi Ling, Julian Salazar, Yuzong Liu, Katrin Kirchhoff
We introduce BERTphone, a Transformer encoder trained on large speech corpora that outputs phonetically-aware contextual representation vectors that can be used for both speaker and language recognition.
no code implementations • WS 2019 • Arshit Gupta, John Hewitt, Katrin Kirchhoff
With the advent of conversational assistants like Amazon Alexa and Google Now, dialogue systems are gaining a lot of traction, especially in industrial settings.
1 code implementation • 22 Jan 2019 • Julian Salazar, Katrin Kirchhoff, Zhiheng Huang
The success of self-attention in NLP has led to recent applications in end-to-end encoder-decoder architectures for speech recognition.
no code implementations • WS 2018 • Angli Liu, Katrin Kirchhoff
Out-of-vocabulary word translation is a major problem for the translation of low-resource languages that suffer from a lack of parallel training data.
no code implementations • 4 Oct 2017 • Heike Adel, Ngoc Thang Vu, Katrin Kirchhoff, Dominic Telaar, Tanja Schultz
The experimental results reveal that Brown word clusters, part-of-speech tags and open-class words are the most effective at reducing the perplexity of factored language models on the Mandarin-English Code-Switching corpus SEAME.
Automatic Speech Recognition (ASR)
no code implementations • 7 Sep 2015 • Katrin Kirchhoff, Bing Zhao, Wen Wang
Statistical machine translation for dialectal Arabic is characterized by a lack of data since data acquisition involves the transcription and translation of spoken language.