Search Results for author: Pavel Denisov

Found 20 papers, 8 papers with code

Data Processing for the OpenGPT-X Model Family

No code implementations · 11 Oct 2024 · Nicolò Brandizzi, Hammam Abdelwahab, Anirban Bhowmick, Lennard Helmer, Benny Jörg Stein, Pavel Denisov, Qasid Saleem, Michael Fromm, Mehdi Ali, Richard Rutmann, Farzad Naderi, Mohamad Saif Agy, Alexander Schwirjow, Fabian Küch, Luzian Hahn, Malte Ostendorff, Pedro Ortiz Suarez, Georg Rehm, Dennis Wegener, Nicolas Flores-Herr, Joachim Köhler, Johannes Leveling

This paper presents a comprehensive overview of the data preparation pipeline developed for the OpenGPT-X project, a large-scale initiative aimed at creating open and high-performance multilingual large language models (LLMs).

Advancing Topic Segmentation of Broadcasted Speech with Multilingual Semantic Embeddings

1 code implementation · 10 Sep 2024 · Sakshi Deo Shukla, Pavel Denisov, Tugtekin Turan

In this paper, we introduce an end-to-end scheme that bypasses the conventional two-step process by directly employing semantic speech encoders for segmentation.

Automatic Speech Recognition, Diversity, +3 more

Teaching a Multilingual Large Language Model to Understand Multilingual Speech via Multi-Instructional Training

1 code implementation · 16 Apr 2024 · Pavel Denisov, Ngoc Thang Vu

Our zero-shot evaluation results confirm the robustness of our approach across multiple tasks, including speech translation and multilingual spoken language understanding, thereby opening new avenues for applying LLMs in the speech domain.

Language Modelling, Large Language Model, +3 more

The IMS Toucan System for the Blizzard Challenge 2023

1 code implementation · 26 Oct 2023 · Florian Lux, Julia Koch, Sarina Meyer, Thomas Bott, Nadja Schauffler, Pavel Denisov, Antje Schweitzer, Ngoc Thang Vu

For our contribution to the Blizzard Challenge 2023, we improved on the system we submitted to the Blizzard Challenge 2021.

Leveraging Multilingual Self-Supervised Pretrained Models for Sequence-to-Sequence End-to-End Spoken Language Understanding

1 code implementation · 9 Oct 2023 · Pavel Denisov, Ngoc Thang Vu

A number of methods have been proposed for End-to-End Spoken Language Understanding (E2E-SLU) using pretrained models; however, their evaluation often lacks a multilingual setup and tasks that require prediction of lexical fillers, such as slot filling.

Slot Filling, +3 more

Anonymizing Speech with Generative Adversarial Networks to Preserve Speaker Privacy

1 code implementation · 13 Oct 2022 · Sarina Meyer, Pascal Tilli, Pavel Denisov, Florian Lux, Julia Koch, Ngoc Thang Vu

To protect the privacy of speech data, speaker anonymization aims to hide the identity of a speaker by changing the voice in speech recordings.

Generative Adversarial Network, Text to Speech

Speaker Anonymization with Phonetic Intermediate Representations

1 code implementation · 11 Jul 2022 · Sarina Meyer, Florian Lux, Pavel Denisov, Julia Koch, Pascal Tilli, Ngoc Thang Vu

In this work, we propose a speaker anonymization pipeline that leverages high-quality automatic speech recognition and synthesis systems to generate speech conditioned on phonetic transcriptions and anonymized speaker embeddings.

Automatic Speech Recognition (ASR), +2 more

Pretrained Semantic Speech Embeddings for End-to-End Spoken Language Understanding via Cross-Modal Teacher-Student Learning

No code implementations · 3 Jul 2020 · Pavel Denisov, Ngoc Thang Vu

Spoken language understanding is typically based on pipeline architectures including speech recognition and natural language understanding steps.

Natural Language Understanding, Speech Recognition, +2 more

ADVISER: A Toolkit for Developing Multi-modal, Multi-domain and Socially-engaged Conversational Agents

1 code implementation · ACL 2020 · Chia-Yu Li, Daniel Ortega, Dirk Väth, Florian Lux, Lindsey Vanderlyn, Maximilian Schmidt, Michael Neumann, Moritz Völkel, Pavel Denisov, Sabrina Jenne, Zorica Kacarevic, Ngoc Thang Vu

We present ADVISER, an open-source, multi-domain dialog system toolkit that enables the development of multi-modal (incorporating speech, text and vision), socially-engaged (e.g., emotion recognition, engagement level prediction and backchanneling) conversational agents.

BIG-bench Machine Learning, Emotion Recognition

IMS-Speech: A Speech to Text Tool

No code implementations · 13 Aug 2019 · Pavel Denisov, Ngoc Thang Vu

We present IMS-Speech, a web-based tool for German and English speech transcription that aims to facilitate research in various disciplines that require access to lexical information in spoken language materials.

Ranked #4 on Speech Recognition on TUDA (using extra training data)

Speech Recognition

Unsupervised Domain Adaptation by Adversarial Learning for Robust Speech Recognition

No code implementations · 30 Jul 2018 · Pavel Denisov, Ngoc Thang Vu, Marc Ferras Font

In this paper, we investigate the use of adversarial learning for unsupervised adaptation to unseen recording conditions, more specifically, single microphone far-field speech.

Robust Speech Recognition, Speech Recognition, +1 more
