no code implementations • ACL (IWSLT) 2021 • Pavel Denisov, Manuel Mager, Ngoc Thang Vu
This paper describes the submission to the IWSLT 2021 Low-Resource Speech Translation Shared Task by the IMS team.
Automatic Speech Recognition (ASR)
no code implementations • 11 Oct 2024 • Nicolo' Brandizzi, Hammam Abdelwahab, Anirban Bhowmick, Lennard Helmer, Benny Jörg Stein, Pavel Denisov, Qasid Saleem, Michael Fromm, Mehdi Ali, Richard Rutmann, Farzad Naderi, Mohamad Saif Agy, Alexander Schwirjow, Fabian Küch, Luzian Hahn, Malte Ostendorff, Pedro Ortiz Suarez, Georg Rehm, Dennis Wegener, Nicolas Flores-Herr, Joachim Köhler, Johannes Leveling
This paper presents a comprehensive overview of the data preparation pipeline developed for the OpenGPT-X project, a large-scale initiative aimed at creating open and high-performance multilingual large language models (LLMs).
no code implementations • 30 Sep 2024 • Mehdi Ali, Michael Fromm, Klaudia Thellmann, Jan Ebert, Alexander Arno Weber, Richard Rutmann, Charvi Jain, Max Lübbering, Daniel Steinigen, Johannes Leveling, Katrin Klug, Jasper Schulze Buschhoff, Lena Jurkschat, Hammam Abdelwahab, Benny Jörg Stein, Karl-Heinz Sylla, Pavel Denisov, Nicolo Brandizzi, Qasid Saleem, Bhowmick Anirban, Chelsea John, Pedro Ortiz Suarez, Malte Ostendorff, Alex Jude, Lalith Manjunath, Samuel Weinbach, Carolin Penke, Shima Asaadi, Fabio Barth, Rafet Sifa, Fabian Küch, René Jäkel, Georg Rehm, Stefan Kesselheim, Joachim Köhler, Nicolas Flores-Herr
We present preliminary results of the project OpenGPT-X.
1 code implementation • 10 Sep 2024 • Sakshi Deo Shukla, Pavel Denisov, Tugtekin Turan
In this paper, we introduce an end-to-end scheme that bypasses this conventional two-step process by directly employing semantic speech encoders for segmentation.
1 code implementation • 16 Apr 2024 • Pavel Denisov, Ngoc Thang Vu
Our zero-shot evaluation results confirm the robustness of our approach across multiple tasks, including speech translation and multilingual spoken language understanding, thereby opening new avenues for applying LLMs in the speech domain.
1 code implementation • 26 Oct 2023 • Florian Lux, Julia Koch, Sarina Meyer, Thomas Bott, Nadja Schauffler, Pavel Denisov, Antje Schweitzer, Ngoc Thang Vu
For our contribution to the Blizzard Challenge 2023, we improved on the system we submitted to the Blizzard Challenge 2021.
1 code implementation • 9 Oct 2023 • Pavel Denisov, Ngoc Thang Vu
A number of methods have been proposed for End-to-End Spoken Language Understanding (E2E-SLU) using pretrained models; however, their evaluation often lacks a multilingual setup and tasks that require prediction of lexical fillers, such as slot filling.
no code implementations • 27 Sep 2023 • Xuankai Chang, Brian Yan, Kwanghee Choi, Jeeweon Jung, Yichen Lu, Soumi Maiti, Roshan Sharma, Jiatong Shi, Jinchuan Tian, Shinji Watanabe, Yuya Fujita, Takashi Maekaku, Pengcheng Guo, Yao-Fei Cheng, Pavel Denisov, Kohei Saijo, Hsiu-Hsuan Wang
Speech signals, typically sampled at rates in the tens of thousands per second, contain redundancies that make sequence modeling inefficient.
1 code implementation • 13 Oct 2022 • Sarina Meyer, Pascal Tilli, Pavel Denisov, Florian Lux, Julia Koch, Ngoc Thang Vu
To protect the privacy of speech data, speaker anonymization aims to hide the identity of a speaker by changing the voice in speech recordings.
1 code implementation • 11 Jul 2022 • Sarina Meyer, Florian Lux, Pavel Denisov, Julia Koch, Pascal Tilli, Ngoc Thang Vu
In this work, we propose a speaker anonymization pipeline that leverages high-quality automatic speech recognition and synthesis systems to generate speech conditioned on phonetic transcriptions and anonymized speaker embeddings.
Automatic Speech Recognition (ASR)
2 code implementations • 29 Nov 2021 • Siddhant Arora, Siddharth Dalmia, Pavel Denisov, Xuankai Chang, Yushi Ueda, Yifan Peng, Yuekai Zhang, Sujay Kumar, Karthik Ganesan, Brian Yan, Ngoc Thang Vu, Alan W Black, Shinji Watanabe
However, there are few open-source toolkits that can be used to generate reproducible results on different Spoken Language Understanding (SLU) benchmarks.
no code implementations • 29 Aug 2021 • Injy Hamed, Pavel Denisov, Chia-Yu Li, Mohamed Elmahdy, Slim Abdennadher, Ngoc Thang Vu
In this paper, we present our work on code-switched Egyptian Arabic-English automatic speech recognition (ASR).
Automatic Speech Recognition (ASR)
no code implementations • 30 Jun 2021 • Pavel Denisov, Manuel Mager, Ngoc Thang Vu
This paper describes the submission to the IWSLT 2021 Low-Resource Speech Translation Shared Task by the IMS team.
Automatic Speech Recognition (ASR)
no code implementations • 3 Nov 2020 • Desh Raj, Pavel Denisov, Zhuo Chen, Hakan Erdogan, Zili Huang, Maokui He, Shinji Watanabe, Jun Du, Takuya Yoshioka, Yi Luo, Naoyuki Kanda, Jinyu Li, Scott Wisdom, John R. Hershey
Multi-speaker speech recognition of unsegmented recordings has diverse applications such as meeting transcription and automatic subtitle generation.
Automatic Speech Recognition (ASR)
no code implementations • 3 Jul 2020 • Pavel Denisov, Ngoc Thang Vu
Spoken language understanding is typically based on pipeline architectures including speech recognition and natural language understanding steps.
1 code implementation • ACL 2020 • Chia-Yu Li, Daniel Ortega, Dirk Väth, Florian Lux, Lindsey Vanderlyn, Maximilian Schmidt, Michael Neumann, Moritz Völkel, Pavel Denisov, Sabrina Jenne, Zorica Kacarevic, Ngoc Thang Vu
We present ADVISER, an open-source, multi-domain dialog system toolkit that enables the development of multi-modal (incorporating speech, text and vision), socially-engaged (e.g., emotion recognition, engagement level prediction and backchanneling) conversational agents.
no code implementations • 13 Aug 2019 • Pavel Denisov, Ngoc Thang Vu
This paper presents our latest investigation of end-to-end automatic speech recognition (ASR) for overlapped speech.
Automatic Speech Recognition (ASR)
no code implementations • 13 Aug 2019 • Pavel Denisov, Ngoc Thang Vu
We present IMS-Speech, a web-based tool for German and English speech transcription that aims to facilitate research in disciplines that require access to lexical information in spoken language materials.
Ranked #4 on Speech Recognition on TUDA (using extra training data)
no code implementations • 28 Feb 2019 • Daniel Ortega, Chia-Yu Li, Gisela Vallejo, Pavel Denisov, Ngoc Thang Vu
This paper presents our latest investigations into dialog act (DA) classification on automatically generated transcriptions.
Automatic Speech Recognition (ASR)
no code implementations • 30 Jul 2018 • Pavel Denisov, Ngoc Thang Vu, Marc Ferras Font
In this paper, we investigate the use of adversarial learning for unsupervised adaptation to unseen recording conditions, specifically single-microphone far-field speech.