Search Results for author: Pavel Denisov

Found 17 papers, 7 papers with code

IMS’ Systems for the IWSLT 2021 Low-Resource Speech Translation Task

no code implementations • ACL (IWSLT) 2021 • Pavel Denisov, Manuel Mager, Ngoc Thang Vu

This paper describes the submission to the IWSLT 2021 Low-Resource Speech Translation Shared Task by the IMS team.

Automatic Speech Recognition Automatic Speech Recognition (ASR) +5

Teaching a Multilingual Large Language Model to Understand Multilingual Speech via Multi-Instructional Training

1 code implementation • 16 Apr 2024 • Pavel Denisov, Ngoc Thang Vu

Our zero-shot evaluation results confirm the robustness of our approach across multiple tasks, including speech translation and multilingual spoken language understanding, thereby opening new avenues for applying LLMs in the speech domain.

Language Modelling Large Language Model +3

The IMS Toucan System for the Blizzard Challenge 2023

1 code implementation • 26 Oct 2023 • Florian Lux, Julia Koch, Sarina Meyer, Thomas Bott, Nadja Schauffler, Pavel Denisov, Antje Schweitzer, Ngoc Thang Vu

For our contribution to the Blizzard Challenge 2023, we improved on the system we submitted to the Blizzard Challenge 2021.

Leveraging Multilingual Self-Supervised Pretrained Models for Sequence-to-Sequence End-to-End Spoken Language Understanding

1 code implementation • 9 Oct 2023 • Pavel Denisov, Ngoc Thang Vu

A number of methods have been proposed for End-to-End Spoken Language Understanding (E2E-SLU) using pretrained models; however, their evaluation often lacks a multilingual setup and tasks that require prediction of lexical fillers, such as slot filling.

slot-filling Slot Filling +3

Exploring Speech Recognition, Translation, and Understanding with Discrete Speech Units: A Comparative Study

no code implementations • 27 Sep 2023 • Xuankai Chang, Brian Yan, Kwanghee Choi, Jeeweon Jung, Yichen Lu, Soumi Maiti, Roshan Sharma, Jiatong Shi, Jinchuan Tian, Shinji Watanabe, Yuya Fujita, Takashi Maekaku, Pengcheng Guo, Yao-Fei Cheng, Pavel Denisov, Kohei Saijo, Hsiu-Hsuan Wang

Speech signals, typically sampled at rates in the tens of thousands per second, contain redundancies, evoking inefficiencies in sequence modeling.

Automatic Speech Recognition Self-Supervised Learning +3

Anonymizing Speech with Generative Adversarial Networks to Preserve Speaker Privacy

1 code implementation • 13 Oct 2022 • Sarina Meyer, Pascal Tilli, Pavel Denisov, Florian Lux, Julia Koch, Ngoc Thang Vu

To protect the privacy of speech data, speaker anonymization aims to hide the identity of a speaker by changing the voice in speech recordings.

Generative Adversarial Network

Speaker Anonymization with Phonetic Intermediate Representations

1 code implementation • 11 Jul 2022 • Sarina Meyer, Florian Lux, Pavel Denisov, Julia Koch, Pascal Tilli, Ngoc Thang Vu

In this work, we propose a speaker anonymization pipeline that leverages high-quality automatic speech recognition and synthesis systems to generate speech conditioned on phonetic transcriptions and anonymized speaker embeddings.

Automatic Speech Recognition Automatic Speech Recognition (ASR) +2

ESPnet-SLU: Advancing Spoken Language Understanding through ESPnet

2 code implementations • 29 Nov 2021 • Siddhant Arora, Siddharth Dalmia, Pavel Denisov, Xuankai Chang, Yushi Ueda, Yifan Peng, Yuekai Zhang, Sujay Kumar, Karthik Ganesan, Brian Yan, Ngoc Thang Vu, Alan W Black, Shinji Watanabe

However, there are few open source toolkits that can be used to generate reproducible results on different Spoken Language Understanding (SLU) benchmarks.

Spoken Language Understanding

Investigations on Speech Recognition Systems for Low-Resource Dialectal Arabic-English Code-Switching Speech

no code implementations • 29 Aug 2021 • Injy Hamed, Pavel Denisov, Chia-Yu Li, Mohamed Elmahdy, Slim Abdennadher, Ngoc Thang Vu

In this paper, we present our work on code-switched Egyptian Arabic-English automatic speech recognition (ASR).

Automatic Speech Recognition Automatic Speech Recognition (ASR) +2

IMS' Systems for the IWSLT 2021 Low-Resource Speech Translation Task

no code implementations • 30 Jun 2021 • Pavel Denisov, Manuel Mager, Ngoc Thang Vu

This paper describes the submission to the IWSLT 2021 Low-Resource Speech Translation Shared Task by the IMS team.

Automatic Speech Recognition Automatic Speech Recognition (ASR) +5

Integration of speech separation, diarization, and recognition for multi-speaker meetings: System description, comparison, and analysis

no code implementations • 3 Nov 2020 • Desh Raj, Pavel Denisov, Zhuo Chen, Hakan Erdogan, Zili Huang, Maokui He, Shinji Watanabe, Jun Du, Takuya Yoshioka, Yi Luo, Naoyuki Kanda, Jinyu Li, Scott Wisdom, John R. Hershey

Multi-speaker speech recognition of unsegmented recordings has diverse applications such as meeting transcription and automatic subtitle generation.

Automatic Speech Recognition Automatic Speech Recognition (ASR) +4

Pretrained Semantic Speech Embeddings for End-to-End Spoken Language Understanding via Cross-Modal Teacher-Student Learning

no code implementations • 3 Jul 2020 • Pavel Denisov, Ngoc Thang Vu

Spoken language understanding is typically based on pipeline architectures including speech recognition and natural language understanding steps.

Natural Language Understanding speech-recognition +2

ADVISER: A Toolkit for Developing Multi-modal, Multi-domain and Socially-engaged Conversational Agents

1 code implementation • ACL 2020 • Chia-Yu Li, Daniel Ortega, Dirk Väth, Florian Lux, Lindsey Vanderlyn, Maximilian Schmidt, Michael Neumann, Moritz Völkel, Pavel Denisov, Sabrina Jenne, Zorica Kacarevic, Ngoc Thang Vu

We present ADVISER - an open-source, multi-domain dialog system toolkit that enables the development of multi-modal (incorporating speech, text and vision), socially-engaged (e.g., emotion recognition, engagement level prediction and backchanneling) conversational agents.

BIG-bench Machine Learning Emotion Recognition

IMS-Speech: A Speech to Text Tool

no code implementations • 13 Aug 2019 • Pavel Denisov, Ngoc Thang Vu

We present IMS-Speech, a web-based tool for German and English speech transcription that aims to facilitate research in various disciplines which require access to lexical information in spoken language materials.

Ranked #4 on Speech Recognition on TUDA (using extra training data)

speech-recognition Speech Recognition

End-to-End Multi-Speaker Speech Recognition using Speaker Embeddings and Transfer Learning

no code implementations • 13 Aug 2019 • Pavel Denisov, Ngoc Thang Vu

This paper presents our latest investigation on end-to-end automatic speech recognition (ASR) for overlapped speech.

Automatic Speech Recognition Automatic Speech Recognition (ASR) +2

Context-aware Neural-based Dialog Act Classification on Automatically Generated Transcriptions

no code implementations • 28 Feb 2019 • Daniel Ortega, Chia-Yu Li, Gisela Vallejo, Pavel Denisov, Ngoc Thang Vu

This paper presents our latest investigations on dialog act (DA) classification on automatically generated transcriptions.

Automatic Speech Recognition Automatic Speech Recognition (ASR) +4

Unsupervised Domain Adaptation by Adversarial Learning for Robust Speech Recognition

no code implementations • 30 Jul 2018 • Pavel Denisov, Ngoc Thang Vu, Marc Ferras Font

In this paper, we investigate the use of adversarial learning for unsupervised adaptation to unseen recording conditions, more specifically, single-microphone far-field speech.

Robust Speech Recognition speech-recognition +1
