Search Results for author: Thomas Drugman

Found 45 papers, 4 papers with code

Multi-Scale Spectrogram Modelling for Neural Text-to-Speech

no code implementations29 Jun 2021 Ammar Abbas, Bajibabu Bollepalli, Alexis Moinet, Arnaud Joly, Penny Karanasou, Peter Makarov, Simon Slangens, Sri Karlapati, Thomas Drugman

We propose a novel Multi-Scale Spectrogram (MSS) modelling approach to synthesise speech with improved coarse- and fine-grained prosody.

Voicy: Zero-Shot Non-Parallel Voice Conversion in Noisy Reverberant Environments

1 code implementation16 Jun 2021 Alejandro Mottini, Jaime Lorenzo-Trueba, Sri Vishnu Kumar Karlapati, Thomas Drugman

Voice Conversion (VC) is a technique that aims to transform the non-linguistic information of a source utterance to change the perceived identity of the speaker.

Voice Conversion

Weakly-supervised word-level pronunciation error detection in non-native English speech

no code implementations7 Jun 2021 Daniel Korzekwa, Jaime Lorenzo-Trueba, Thomas Drugman, Shira Calamaro, Bozena Kostek

To train this model, phonetically transcribed L2 speech is not required and we only need to mark mispronounced words.

Mispronunciation Detection in Non-native (L2) English with Uncertainty Modeling

no code implementations16 Jan 2021 Daniel Korzekwa, Jaime Lorenzo-Trueba, Szymon Zaporowski, Shira Calamaro, Thomas Drugman, Bozena Kostek

A common approach to the automatic detection of mispronunciation in language learning is to recognize the phonemes produced by a student and compare them to the expected pronunciation of a native speaker.
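As a hedged illustration of the phoneme-comparison baseline this abstract refers to (not the paper's weakly-supervised model), a recognized phoneme sequence can be scored against a reference pronunciation with edit distance; the phoneme symbols and the `tol` threshold below are made-up examples:

```python
def edit_distance(recognized, expected):
    """Levenshtein distance between two phoneme sequences."""
    m, n = len(recognized), len(expected)
    d = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m + 1):
        d[i][0] = i
    for j in range(n + 1):
        d[0][j] = j
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            cost = 0 if recognized[i - 1] == expected[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,          # deletion
                          d[i][j - 1] + 1,          # insertion
                          d[i - 1][j - 1] + cost)   # substitution
    return d[m][n]

def is_mispronounced(recognized, expected, tol=0):
    """Flag a word whose recognized phonemes differ from the expected ones."""
    return edit_distance(recognized, expected) > tol

# "word" expected as /W ER D/ but recognized as /W OR D/: one substitution
print(is_mispronounced(["W", "OR", "D"], ["W", "ER", "D"]))  # True
```

The paper's point is precisely that this baseline needs phonetically transcribed L2 speech to work well, whereas their model only needs word-level mispronunciation marks.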

Detection of Lexical Stress Errors in Non-Native (L2) English with Data Augmentation and Attention

no code implementations29 Dec 2020 Daniel Korzekwa, Roberto Barra-Chicote, Szymon Zaporowski, Grzegorz Beringer, Jaime Lorenzo-Trueba, Alicja Serafinowicz, Jasha Droppo, Thomas Drugman, Bozena Kostek

This paper describes two novel complementary techniques that improve the detection of lexical stress errors in non-native (L2) English speech: attention-based feature extraction and data augmentation based on Neural Text-To-Speech (TTS).

Data Augmentation

Prosodic Representation Learning and Contextual Sampling for Neural Text-to-Speech

no code implementations4 Nov 2020 Sri Karlapati, Ammar Abbas, Zack Hodari, Alexis Moinet, Arnaud Joly, Penny Karanasou, Thomas Drugman

In Stage II, we propose a novel method to sample from this learnt prosodic distribution using the contextual information available in text.

Graph Attention Representation Learning

Maximum Phase Modeling for Sparse Linear Prediction of Speech

no code implementations7 Jun 2020 Thomas Drugman

The proposed method is shown to significantly increase the sparsity of the LP residual signal and to be effective in two illustrative applications: speech polarity detection and excitation modeling.

Analysis and Synthesis of Hypo and Hyperarticulated Speech

no code implementations7 Jun 2020 Benjamin Picart, Thomas Drugman, Thierry Dutoit

This paper focuses on the analysis and synthesis of hypo and hyperarticulated speech in the framework of HMM-based speech synthesis.

Speech Quality Speech Synthesis

Data-driven Detection and Analysis of the Patterns of Creaky Voice

no code implementations31 May 2020 Thomas Drugman, John Kane, Christer Gobl

This paper investigates the temporal excitation patterns of creaky voice.

Residual Excitation Skewness for Automatic Speech Polarity Detection

no code implementations31 May 2020 Thomas Drugman

Detecting the correct speech polarity is a necessary step prior to several speech processing techniques.
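A minimal sketch of the idea named in this title: decide the speech polarity from the sign of the skewness of the linear-prediction residual. This is an illustrative reading of the approach, not the paper's exact estimator; the LP order, the autocorrelation method, and the sign convention of the decision rule are all assumptions:

```python
import numpy as np
from scipy.linalg import solve_toeplitz
from scipy.signal import lfilter

def lp_residual(x, order=12):
    """LP residual via the autocorrelation (Yule-Walker) method."""
    r = np.correlate(x, x, mode="full")[len(x) - 1:]
    a = solve_toeplitz(r[:order], r[1:order + 1])          # LP coefficients
    pred = np.convolve(x, np.concatenate(([0.0], a)))[:len(x)]
    return x - pred

def polarity(x, order=12):
    """Return +1 or -1 from the sign of the residual's skewness."""
    e = lp_residual(x, order)
    skew = np.mean((e - e.mean()) ** 3) / (e.std() ** 3 + 1e-12)
    return 1 if skew >= 0 else -1

# synthetic "speech": positive glottal-like pulses through an AR(1) tract
exc = np.zeros(2000)
exc[::100] = 1.0
x = lfilter([1.0], [1.0, -0.9], exc)
```

Inverting the waveform flips the residual's skewness, so `polarity(x)` and `polarity(-x)` disagree, which is the property a polarity detector exploits.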

Maximum Voiced Frequency Estimation: Exploiting Amplitude and Phase Spectra

no code implementations31 May 2020 Thomas Drugman, Yannis Stylianou

Recent studies have shown that its proper estimation and modeling enhance the quality of statistical parametric speech synthesizers.

Oscillating Statistical Moments for Speech Polarity Detection

no code implementations16 May 2020 Thomas Drugman, Thierry Dutoit

An inversion of the speech polarity may have a dramatic detrimental effect on the performance of various techniques of speech processing.

Glottal Source Estimation using an Automatic Chirp Decomposition

no code implementations16 May 2020 Thomas Drugman, Baris Bozkurt, Thierry Dutoit

In a previous work, we showed that the glottal source can be estimated from speech signals by computing the Zeros of the Z-Transform (ZZT).

Chirp Complex Cepstrum-based Decomposition for Asynchronous Glottal Analysis

no code implementations10 May 2020 Thomas Drugman, Thierry Dutoit

It was recently shown that complex cepstrum can be effectively used for glottal flow estimation by separating the causal and anticausal components of speech.
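The generic causal/anticausal separation the abstract mentions can be sketched as follows: take the complex cepstrum (log spectrum with unwrapped phase, inverse FFT) and keep positive versus negative quefrencies. This basic version omits the paper's chirp analysis and windowing criteria, which are the actual contribution:

```python
import numpy as np

def complex_cepstrum(x):
    """Complex cepstrum via FFT with simple phase unwrapping."""
    X = np.fft.fft(x)
    log_X = np.log(np.abs(X) + 1e-300) + 1j * np.unwrap(np.angle(X))
    return np.fft.ifft(log_X).real

def split_causal_anticausal(c):
    """Causal part = non-negative quefrencies, anticausal = negative ones."""
    n = len(c)
    causal, anticausal = np.zeros(n), np.zeros(n)
    causal[:n // 2] = c[:n // 2]        # quefrency >= 0 (minimum-phase part)
    anticausal[n // 2:] = c[n // 2:]    # quefrency < 0 (maximum-phase part)
    return causal, anticausal

# sanity check: a minimum-phase signal has (almost) no anticausal cepstrum
n = 64
x = 0.5 ** np.arange(n)                 # impulse response of 1/(1-0.5 z^-1)
c = complex_cepstrum(x)
causal, anticausal = split_causal_anticausal(c)
```

For this minimum-phase exponential the theoretical cepstrum is c[m] = 0.5**m / m for m > 0 and zero at negative quefrencies, so the anticausal half is numerically negligible; in glottal analysis the anticausal half is the one attributed to the open-phase glottal contribution.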

Eigenresiduals for improved Parametric Speech Synthesis

no code implementations2 Jan 2020 Thomas Drugman, Geoffrey Wilfart, Thierry Dutoit

Statistical parametric speech synthesizers have recently shown their ability to produce natural-sounding and flexible voices.

Speech Synthesis

On the Mutual Information between Source and Filter Contributions for Voice Pathology Detection

no code implementations2 Jan 2020 Thomas Drugman, Thomas Dubuisson, Thierry Dutoit

This paper addresses the problem of automatic detection of voice pathologies directly from the speech signal.

A Comparative Evaluation of Pitch Modification Techniques

no code implementations2 Jan 2020 Thomas Drugman, Thierry Dutoit

This paper addresses the problem of pitch modification, as an important module for an efficient voice transformation system.

Excitation-based Voice Quality Analysis and Modification

no code implementations2 Jan 2020 Thomas Drugman, Thierry Dutoit, Baris Bozkurt

This paper investigates the differences occurring in the excitation for different voice qualities.

Speech Synthesis

Phase-based Information for Voice Pathology Detection

no code implementations2 Jan 2020 Thomas Drugman, Thomas Dubuisson, Thierry Dutoit

In most current approaches of speech processing, information is extracted from the magnitude spectrum.

Causal-Anticausal Decomposition of Speech using Complex Cepstrum for Glottal Source Estimation

no code implementations30 Dec 2019 Thomas Drugman, Baris Bozkurt, Thierry Dutoit

Via a systematic study of the windowing effects on the deconvolution quality, we show that the complex cepstrum causal-anticausal decomposition can be effectively used for glottal flow estimation when specific windowing criteria are met.

Using a Pitch-Synchronous Residual Codebook for Hybrid HMM/Frame Selection Speech Synthesis

no code implementations30 Dec 2019 Thomas Drugman, Alexis Moinet, Thierry Dutoit, Geoffrey Wilfart

The source signal is obtained by concatenating excitation frames picked up from the codebook, based on a selection criterion and taking target residual coefficients as input.

Speech Synthesis

A Deterministic plus Stochastic Model of the Residual Signal for Improved Parametric Speech Synthesis

no code implementations29 Dec 2019 Thomas Drugman, Geoffrey Wilfart, Thierry Dutoit

For this, we hereby propose an adaptation of the Deterministic plus Stochastic Model (DSM) for the residual.

Speech Synthesis

Complex Cepstrum-based Decomposition of Speech for Glottal Source Estimation

no code implementations29 Dec 2019 Thomas Drugman, Baris Bozkurt, Thierry Dutoit

Homomorphic analysis is a well-known method for the separation of non-linearly combined signals.

Glottal Source Processing: from Analysis to Applications

no code implementations29 Dec 2019 Thomas Drugman, Paavo Alku, Abeer Alwan, Bayya Yegnanarayana

The great majority of current voice technology applications relies on acoustic features characterizing the vocal tract response, such as the widely used MFCC or LPC parameters.

A Comparative Study of Glottal Source Estimation Techniques

no code implementations28 Dec 2019 Thomas Drugman, Baris Bozkurt, Thierry Dutoit

Techniques based on the mixed-phase decomposition and on a closed-phase inverse filtering process turn out to give the best results on both clean synthetic and real speech signals.

Glottal Closure and Opening Instant Detection from Speech Signals

no code implementations28 Dec 2019 Thomas Drugman, Thierry Dutoit

This paper proposes a new procedure to detect Glottal Closure and Opening Instants (GCIs and GOIs) directly from speech waveforms.

Detection of Glottal Closure Instants from Speech Signals: a Quantitative Review

no code implementations28 Dec 2019 Thomas Drugman, Mark Thomas, Jon Gudnason, Patrick Naylor, Thierry Dutoit

The five techniques compared are the Hilbert Envelope-based detection (HE), the Zero Frequency Resonator-based method (ZFR), the Dynamic Programming Phase Slope Algorithm (DYPSA), the Speech Event Detection using the Residual Excitation And a Mean-based Signal (SEDREAMS) and the Yet Another GCI Algorithm (YAGA).

Event Detection

Joint Robust Voicing Detection and Pitch Estimation Based on Residual Harmonics

no code implementations28 Dec 2019 Thomas Drugman, Abeer Alwan

This paper focuses on the problem of pitch tracking in noisy conditions.
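The Summation of Residual Harmonics (SRH) criterion behind this paper rewards spectral energy at harmonics of a candidate F0 and penalizes energy between them, which also suppresses octave errors. Below is a simplified single-frame sketch operating on a synthetic harmonic signal rather than a true LP residual; the frame length, candidate grid, and number of harmonics are illustrative choices, and no voicing decision is made:

```python
import numpy as np

def srh_pitch(e, fs, f_min=70, f_max=300, n_harm=5):
    """Pick the F0 candidate maximizing a summation-of-harmonics score."""
    n = len(e)
    spec = np.abs(np.fft.rfft(e * np.hanning(n)))
    freqs = np.fft.rfftfreq(n, 1.0 / fs)
    amp = lambda f: np.interp(f, freqs, spec)       # interpolated magnitude
    candidates = np.arange(f_min, f_max + 1, 1.0)
    scores = []
    for f0 in candidates:
        s = amp(f0)
        for k in range(2, n_harm + 1):
            # reward the k-th harmonic, penalize the inter-harmonic point
            s += amp(k * f0) - amp((k - 0.5) * f0)
        scores.append(s)
    return candidates[int(np.argmax(scores))]

# synthetic harmonic frame at 150 Hz (standing in for an LP residual)
fs = 8000
t = np.arange(int(0.1 * fs)) / fs
e = sum(np.cos(2 * np.pi * k * 150 * t) for k in range(1, 6))
f0 = srh_pitch(e, fs)
```

Note how the subtraction term kills the octave candidate: at 300 Hz, every gain at an even harmonic is cancelled by the penalty falling on an odd harmonic of the true 150 Hz pitch.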

Singing Synthesis: with a little help from my attention

no code implementations12 Dec 2019 Orazio Angelini, Alexis Moinet, Kayoko Yanagisawa, Thomas Drugman

We present UTACO, a singing synthesis model based on an attention-based sequence-to-sequence mechanism and a vocoder based on dilated causal convolutions.

Voice Conversion for Whispered Speech Synthesis

no code implementations11 Dec 2019 Marius Cotescu, Thomas Drugman, Goeric Huybrechts, Jaime Lorenzo-Trueba, Alexis Moinet

We present an approach to synthesize whisper by applying a handcrafted signal processing recipe and Voice Conversion (VC) techniques to convert normally phonated speech to whispered speech.

Speech Synthesis Voice Conversion

Dynamic Prosody Generation for Speech Synthesis using Linguistics-Driven Acoustic Embedding Selection

no code implementations2 Dec 2019 Shubhi Tyagi, Marco Nicolis, Jonas Rohnke, Thomas Drugman, Jaime Lorenzo-Trueba

Recent advances in Text-to-Speech (TTS) have improved quality and naturalness to near-human capabilities when considering isolated sentences.

Speech Synthesis

Towards achieving robust universal neural vocoding

1 code implementation4 Jul 2019 Jaime Lorenzo-Trueba, Thomas Drugman, Javier Latorre, Thomas Merritt, Bartosz Putrycz, Roberto Barra-Chicote, Alexis Moinet, Vatsal Aggarwal

This vocoder is shown to be capable of generating speech of consistently good quality (98% relative mean MUSHRA when compared to natural speech) regardless of whether the input spectrogram comes from a speaker or style seen during training or from an out-of-domain scenario when the recording conditions are studio-quality.

Fine-grained robust prosody transfer for single-speaker neural text-to-speech

no code implementations4 Jul 2019 Viacheslav Klimkov, Srikanth Ronanki, Jonas Rohnke, Thomas Drugman

However, when trained on a single-speaker dataset, the conventional prosody transfer systems are not robust enough to speaker variability, especially in the case of a reference signal coming from an unseen speaker.

Active and Semi-Supervised Learning in ASR: Benefits on the Acoustic and Language Models

no code implementations7 Mar 2019 Thomas Drugman, Janne Pylkkonen, Reinhard Kneser

The goal of this paper is to simulate the benefits of jointly applying active learning (AL) and semi-supervised training (SST) in a new speech recognition application.

Active Learning Speech Recognition

Traditional Machine Learning for Pitch Detection

no code implementations4 Mar 2019 Thomas Drugman, Goeric Huybrechts, Viacheslav Klimkov, Alexis Moinet

In this paper, we consider voicing detection as a classification problem and F0 contour estimation as a regression problem.
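A toy sketch of the classification framing for voicing detection (the companion F0 regression would consume the same per-frame features). Everything here is an illustrative assumption rather than the paper's setup: the three features, the synthetic voiced/unvoiced frames, and the least-squares linear classifier standing in for a proper ML model:

```python
import numpy as np

rng = np.random.default_rng(0)
fs, frame = 8000, 256

def voicing_features(x):
    """Log energy, zero-crossing rate, normalized autocorrelation peak."""
    energy = np.log(np.sum(x ** 2) + 1e-9)
    zcr = np.mean(np.abs(np.diff(np.sign(x)))) / 2
    ac = np.correlate(x, x, "full")[len(x) - 1:]
    peak = np.max(ac[20:200]) / (ac[0] + 1e-9)      # lags ~40-400 Hz
    return np.array([energy, zcr, peak])

# synthetic training frames: voiced = sinusoid + noise, unvoiced = noise only
X, y = [], []
t = np.arange(frame) / fs
for _ in range(200):
    f0 = rng.uniform(80, 300)
    X.append(voicing_features(np.sin(2 * np.pi * f0 * t)
                              + 0.1 * rng.standard_normal(frame)))
    y.append(1)
    X.append(voicing_features(0.3 * rng.standard_normal(frame)))
    y.append(0)
X, y = np.array(X), np.array(y)

# least-squares linear classifier on the features (a stand-in model)
Xb = np.hstack([X, np.ones((len(X), 1))])
w, *_ = np.linalg.lstsq(Xb, 2.0 * y - 1.0, rcond=None)
acc = np.mean(((Xb @ w) > 0).astype(int) == y)
```

Periodic frames show a high autocorrelation peak and low zero-crossing rate, so even this crude linear model separates the two classes almost perfectly on the synthetic data.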

Effect of data reduction on sequence-to-sequence neural TTS

no code implementations15 Nov 2018 Javier Latorre, Jakub Lachowicz, Jaime Lorenzo-Trueba, Thomas Merritt, Thomas Drugman, Srikanth Ronanki, Viacheslav Klimkov

Recent speech synthesis systems based on sampling from autoregressive neural network models can generate speech almost indistinguishable from human recordings.

Speech Synthesis

Robust universal neural vocoding

7 code implementations15 Nov 2018 Jaime Lorenzo-Trueba, Thomas Drugman, Javier Latorre, Thomas Merritt, Bartosz Putrycz, Roberto Barra-Chicote

This paper introduces a robust universal neural vocoder trained with 74 speakers (of both genders) coming from 17 languages.

LSTM-based Whisper Detection

no code implementations20 Sep 2018 Zeynab Raeesy, Kellen Gillespie, Zhenpei Yang, Chengyuan Ma, Thomas Drugman, Jiacheng Gu, Roland Maas, Ariya Rastrow, Björn Hoffmeister

We prove that, with enough data, the LSTM model is indeed as capable of learning whisper characteristics from LFBE features alone as a simpler MLP model that uses both LFBE and features engineered for separating whispered and normal speech.
