About

Objective quality estimation of a speech sample.

Benchmarks

No evaluation results yet. Help compare methods by submitting evaluation metrics.

Datasets

Greatest papers with code

FastSpeech: Fast, Robust and Controllable Text-to-Speech

22 May 2019 TensorSpeech/TensorflowTTS

Compared with traditional concatenative and statistical parametric approaches, neural network based end-to-end models suffer from slow inference speed, and the synthesized speech is usually not robust (i.e., some words are skipped or repeated) and lacks controllability (voice speed or prosody control).

SPEECH QUALITY TEXT-TO-SPEECH SYNTHESIS

FastSpeech: Fast, Robust and Controllable Text to Speech

NeurIPS 2019 as-ideas/TransformerTTS

In this work, we propose a novel feed-forward network based on Transformer to generate mel-spectrogram in parallel for TTS.

SPEECH QUALITY SPEECH SYNTHESIS TEXT-TO-SPEECH SYNTHESIS

Flowtron: an Autoregressive Flow-based Generative Network for Text-to-Speech Synthesis

ICLR 2021 NVIDIA/flowtron

In this paper we propose Flowtron: an autoregressive flow-based generative network for text-to-speech synthesis with control over speech variation and style transfer.

Ranked #1 on Text-To-Speech Synthesis on LJSpeech (Pleasantness MOS metric)

SPEECH QUALITY SPEECH SYNTHESIS STYLE TRANSFER TEXT-TO-SPEECH SYNTHESIS

Utilizing Self-supervised Representations for MOS Prediction

7 Apr 2021 s3prl/s3prl

In this paper, we use self-supervised pre-trained models for MOS prediction.

SPEECH QUALITY VOICE CONVERSION

Statistical Parametric Speech Synthesis Incorporating Generative Adversarial Networks

23 Sep 2017 r9y9/gantts

In the proposed framework incorporating the GANs, the discriminator is trained to distinguish natural and generated speech parameters, while the acoustic models are trained to minimize the weighted sum of the conventional minimum generation loss and an adversarial loss for deceiving the discriminator.

SPEECH QUALITY SPEECH SYNTHESIS VOICE CONVERSION
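
The weighted-sum objective described in the entry above can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the function name `generator_loss`, the parameter `adv_weight`, and the choice of a cross-entropy adversarial term are assumptions made for the example.

```python
import math


def generator_loss(generation_loss, disc_probs_generated, adv_weight=1.0):
    """Weighted sum of a conventional generation loss and an adversarial loss.

    generation_loss: precomputed value of the conventional loss (assumed given).
    disc_probs_generated: discriminator outputs in (0, 1], read here as the
        probability that each generated sample is natural (an assumption).
    adv_weight: hypothetical weight applied to the adversarial term.
    """
    # The adversarial term is small when the discriminator is fooled,
    # i.e. when it assigns a high "natural" probability to generated samples.
    adv_loss = -sum(math.log(p) for p in disc_probs_generated) / len(disc_probs_generated)
    return generation_loss + adv_weight * adv_loss
```

With `adv_weight=0` the objective reduces to the conventional generation loss alone, which makes the weight a convenient knob for comparing the two training regimes.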

Interspeech 2021 Deep Noise Suppression Challenge

6 Jan 2021 microsoft/DNS-Challenge

In this version of the challenge organized at INTERSPEECH 2021, we are expanding both our training and test datasets to accommodate full band scenarios.

DENOISING SPEECH QUALITY

Deep learning for minimum mean-square error approaches to speech enhancement

Speech Communication 2019 anicolson/DeepXi

MMSE approaches utilising the proposed a priori SNR estimator are able to achieve higher enhanced speech quality and intelligibility scores than recent masking- and mapping-based deep learning approaches.

SPEECH ENHANCEMENT SPEECH QUALITY
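
To illustrate how an a priori SNR estimate feeds an MMSE-style gain function, here is a minimal sketch using the Wiener gain, one common gain of this family. It is not taken from the DeepXi code: the function names are hypothetical, and the a priori SNR values are assumed to be supplied by some estimator, in linear scale rather than dB.

```python
def wiener_gain(xi):
    """Wiener gain for an a priori SNR value xi (linear scale, xi >= 0)."""
    if xi < 0:
        raise ValueError("a priori SNR must be non-negative")
    return xi / (1.0 + xi)


def enhance_magnitudes(noisy_magnitudes, xi_estimates):
    """Apply the gain to each noisy spectral magnitude.

    noisy_magnitudes: magnitudes of the noisy speech spectrum, one per bin.
    xi_estimates: a priori SNR estimate for each bin (assumed given).
    """
    return [wiener_gain(xi) * mag for mag, xi in zip(noisy_magnitudes, xi_estimates)]
```

A bin with a high estimated a priori SNR is passed almost unchanged, while a bin dominated by noise is attenuated toward zero.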

DiffWave: A Versatile Diffusion Model for Audio Synthesis

ICLR 2021 lmnt-com/diffwave

In this work, we propose DiffWave, a versatile diffusion probabilistic model for conditional and unconditional waveform generation.

SPEECH QUALITY SPEECH SYNTHESIS

Neural network based spectral mask estimation for acoustic beamforming

ICASSP 2016 fgnt/nn-gev

The network training is independent of the number and the geometric configuration of the microphones.

SPEECH QUALITY SPEECH RECOGNITION