Search Results for author: Junichi Yamagishi

Found 120 papers, 45 papers with code

MOSNet: Deep Learning based Objective Assessment for Voice Conversion

6 code implementations • 17 Apr 2019 • Chen-Chou Lo, Szu-Wei Fu, Wen-Chin Huang, Xin Wang, Junichi Yamagishi, Yu Tsao, Hsin-Min Wang

In this paper, we propose deep learning-based assessment models to predict human ratings of converted speech.

Voice Conversion

830

Estimating the confidence of speech spoofing countermeasure

1 code implementation • 10 Oct 2021 • Xin Wang, Junichi Yamagishi

On the ASVspoof2019 logical access database, the results demonstrate that an energy-based estimator and a neural-network-based one achieved acceptable performance in identifying unknown attacks in the test set.

295

A Practical Guide to Logical Access Voice Presentation Attack Detection

1 code implementation • 10 Jan 2022 • Xin Wang, Junichi Yamagishi

Presentation attack detection (PAD) for ASV, or speech anti-spoofing, is therefore indispensable.

Artifact Detection Speaker Verification +2

295

Spoofed training data for speech spoofing countermeasure can be efficiently created using neural vocoders

1 code implementation • 19 Oct 2022 • Xin Wang, Junichi Yamagishi

To make better use of pairs of bona fide and spoofed data, this study introduces a contrastive feature loss that can be plugged into the standard training criterion.

295

Can large-scale vocoded spoofed data improve speech spoofing countermeasure with a self-supervised front end?

1 code implementation • 12 Sep 2023 • Xin Wang, Junichi Yamagishi

While many datasets use spoofed data generated by speech synthesis systems, it was recently found that data vocoded by neural vocoders were also effective as the spoofed training data.

Self-Supervised Learning Speech Synthesis

295

Zero-Shot Multi-Speaker Text-To-Speech with State-of-the-art Neural Speaker Embeddings

3 code implementations • 23 Oct 2019 • Erica Cooper, Cheng-I Lai, Yusuke Yasuda, Fuming Fang, Xin Wang, Nanxin Chen, Junichi Yamagishi

While speaker adaptation for end-to-end speech synthesis using speaker embeddings can produce good speaker similarity for speakers seen during training, there remains a gap for zero-shot adaptation to unseen speakers.

Audio and Speech Processing

264

Can Speaker Augmentation Improve Multi-Speaker End-to-End TTS?

1 code implementation • 4 May 2020 • Erica Cooper, Cheng-I Lai, Yusuke Yasuda, Junichi Yamagishi

This is followed by an analysis on synthesis quality, speaker and dialect similarity, and a remark on the effectiveness of our speaker augmentation approach.

Speech Synthesis

264

MesoNet: a Compact Facial Video Forgery Detection Network

7 code implementations • 4 Sep 2018 • Darius Afchar, Vincent Nozick, Junichi Yamagishi, Isao Echizen

This paper presents a method to automatically and efficiently detect face tampering in videos, and particularly focuses on two recent techniques used to generate hyper-realistic forged videos: Deepfake and Face2Face.

DeepFake Detection Face Swapping +2

237

ASVspoof 2021: Automatic Speaker Verification Spoofing and Countermeasures Challenge Evaluation Plan

1 code implementation • 1 Sep 2021 • Héctor Delgado, Nicholas Evans, Tomi Kinnunen, Kong Aik Lee, Xuechen Liu, Andreas Nautsch, Jose Patino, Md Sahidullah, Massimiliano Todisco, Xin Wang, Junichi Yamagishi

The automatic speaker verification spoofing and countermeasures (ASVspoof) challenge series is a community-led initiative which aims to promote the consideration of spoofing and the development of countermeasures.

Face Swapping Speaker Verification

165

Investigating self-supervised front ends for speech spoofing countermeasures

1 code implementation • 15 Nov 2021 • Xin Wang, Junichi Yamagishi

Self-supervised speech modeling is a rapidly progressing research topic, and many pre-trained models have been released and used in various downstream tasks.

Face Swapping

165

Capsule-Forensics: Using Capsule Networks to Detect Forged Images and Videos

3 code implementations • 26 Oct 2018 • Huy H. Nguyen, Junichi Yamagishi, Isao Echizen

Recent advances in media generation techniques have made it easier for attackers to create forged images and videos.

Image and Video Forgery Detection

115

Use of a Capsule Network to Detect Fake Images and Videos

2 code implementations • 28 Oct 2019 • Huy H. Nguyen, Junichi Yamagishi, Isao Echizen

In this paper, we introduce a capsule network that can detect various kinds of attacks, from presentation attacks using printed images and replayed videos to attacks using fake videos created using deep learning.

Image and Video Forgery Detection

115

Investigation of enhanced Tacotron text-to-speech synthesis systems with self-attention for pitch accent language

1 code implementation • 29 Oct 2018 • Yusuke Yasuda, Xin Wang, Shinji Takaki, Junichi Yamagishi

Towards end-to-end Japanese speech synthesis, we extend Tacotron to systems with self-attention to capture long-term dependencies related to pitch accents and compare their audio quality with classical pipeline systems under various conditions to show their pros and cons.

Speech Synthesis Text-To-Speech Synthesis

111

Multi-task Learning For Detecting and Segmenting Manipulated Facial Images and Videos

1 code implementation • 17 Jun 2019 • Huy H. Nguyen, Fuming Fang, Junichi Yamagishi, Isao Echizen

The output of one branch of the decoder is used for segmenting the manipulated regions while that of the other branch is used for reconstructing the input, which helps improve overall performance.

Binary Classification Face Swapping +2

Automatic speaker verification spoofing and deepfake detection using wav2vec 2.0 and data augmentation

1 code implementation • 24 Feb 2022 • Hemlata Tak, Massimiliano Todisco, Xin Wang, Jee-weon Jung, Junichi Yamagishi, Nicholas Evans

The performance of spoofing countermeasure systems depends fundamentally upon the use of sufficiently representative training data.

Data Augmentation DeepFake Detection +3

Generalization Ability of MOS Prediction Networks

1 code implementation • 6 Oct 2021 • Erica Cooper, Wen-Chin Huang, Tomoki Toda, Junichi Yamagishi

Automatic methods to predict listener opinions of synthesized speech remain elusive since listeners, systems being evaluated, characteristics of the speech, and even the instructions given and the rating scale all vary from test to test.

The Privacy ZEBRA: Zero Evidence Biometric Recognition Assessment

2 code implementations • 19 May 2020 • Andreas Nautsch, Jose Patino, Natalia Tomashenko, Junichi Yamagishi, Paul-Gauthier Noe, Jean-Francois Bonastre, Massimiliano Todisco, Nicholas Evans

Mounting privacy legislation calls for the preservation of privacy in speech technology, though solutions are gravely lacking.

Cryptography and Security Audio and Speech Processing

The VoicePrivacy 2022 Challenge Evaluation Plan

1 code implementation • 23 Mar 2022 • Natalia Tomashenko, Xin Wang, Xiaoxiao Miao, Hubert Nourtel, Pierre Champion, Massimiliano Todisco, Emmanuel Vincent, Nicholas Evans, Junichi Yamagishi, Jean-François Bonastre

Participants apply their developed anonymization systems, run evaluation scripts and submit objective evaluation results and anonymized speech data to the organizers.

Speaker Verification

Introducing the VoicePrivacy Initiative

3 code implementations • 4 May 2020 • Natalia Tomashenko, Brij Mohan Lal Srivastava, Xin Wang, Emmanuel Vincent, Andreas Nautsch, Junichi Yamagishi, Nicholas Evans, Jose Patino, Jean-François Bonastre, Paul-Gauthier Noé, Massimiliano Todisco

The VoicePrivacy initiative aims to promote the development of privacy preservation tools for speech technology by gathering a new community to define the tasks of interest and the evaluation methodology, and benchmarking solutions through a series of challenges.

Benchmarking

The VoicePrivacy 2020 Challenge: Results and findings

1 code implementation • 1 Sep 2021 • Natalia Tomashenko, Xin Wang, Emmanuel Vincent, Jose Patino, Brij Mohan Lal Srivastava, Paul-Gauthier Noé, Andreas Nautsch, Nicholas Evans, Junichi Yamagishi, Benjamin O'Brien, Anaïs Chanclu, Jean-François Bonastre, Massimiliano Todisco, Mohamed Maouche

We provide a systematic overview of the challenge design with an analysis of submitted systems and evaluation results.

LDNet: Unified Listener Dependent Modeling in MOS Prediction for Synthetic Speech

1 code implementation • 18 Oct 2021 • Wen-Chin Huang, Erica Cooper, Junichi Yamagishi, Tomoki Toda

An effective approach to automatically predict the subjective rating for synthetic speech is to train on a listening test dataset with human-annotated scores.

Voice Conversion

The VoicePrivacy 2020 Challenge Evaluation Plan

1 code implementation • 14 May 2022 • Natalia Tomashenko, Brij Mohan Lal Srivastava, Xin Wang, Emmanuel Vincent, Andreas Nautsch, Junichi Yamagishi, Nicholas Evans, Jose Patino, Jean-François Bonastre, Paul-Gauthier Noé, Massimiliano Todisco

The VoicePrivacy Challenge aims to promote the development of privacy preservation tools for speech technology by gathering a new community to define the tasks of interest and the evaluation methodology, and benchmarking solutions through a series of challenges.

Benchmarking

Speech waveform synthesis from MFCC sequences with generative adversarial networks

1 code implementation • 3 Apr 2018 • Lauri Juvela, Bajibabu Bollepalli, Xin Wang, Hirokazu Kameoka, Manu Airaksinen, Junichi Yamagishi, Paavo Alku

This paper proposes a method for generating speech from filterbank mel frequency cepstral coefficients (MFCC), which are widely used in speech applications, such as ASR, but are generally considered unusable for speech synthesis.

Generative Adversarial Network Speech Synthesis

iMetricGAN: Intelligibility Enhancement for Speech-in-Noise using Generative Adversarial Network-based Metric Learning

1 code implementation • Interspeech 2020 • Haoyu Li, Szu-Wei Fu, Yu Tsao, Junichi Yamagishi

The intelligibility of natural speech is seriously degraded when exposed to adverse noisy environments.

Audio and Speech Processing Sound

Attentive Filtering Networks for Audio Replay Attack Detection

1 code implementation • 31 Oct 2018 • Cheng-I Lai, Alberto Abad, Korin Richmond, Junichi Yamagishi, Najim Dehak, Simon King

In this work, we propose our replay attack detection system, the Attentive Filtering Network, which is composed of an attention-based filtering mechanism that enhances feature representations in both the frequency and time domains, and a ResNet-based classifier.

Speaker Verification

Attention Back-end for Automatic Speaker Verification with Multiple Enrollment Utterances

1 code implementation • 4 Apr 2021 • Chang Zeng, Xin Wang, Erica Cooper, Xiaoxiao Miao, Junichi Yamagishi

Probabilistic linear discriminant analysis (PLDA) or cosine similarity have been widely used in traditional speaker verification systems as back-end techniques to measure pairwise similarities.

Ranked #1 on Speaker Verification on CN-CELEB

Speaker Verification

STFT spectral loss for training a neural speech waveform model

1 code implementation • 29 Oct 2018 • Shinji Takaki, Toru Nakashika, Xin Wang, Junichi Yamagishi

This paper proposes a new loss using short-time Fourier transform (STFT) spectra for the aim of training a high-performance neural speech waveform model that predicts raw continuous speech waveform samples directly.

Transformation of low-quality device-recorded speech to high-quality speech using improved SEGAN model

1 code implementation • 10 Nov 2019 • Seyyed Saeed Sarfjoo, Xin Wang, Gustav Eje Henter, Jaime Lorenzo-Trueba, Shinji Takaki, Junichi Yamagishi

Nowadays vast amounts of speech data are recorded from low-quality recorder devices such as smartphones, tablets, laptops, and medium-quality microphones.

Sound Audio and Speech Processing

Exploring Disentanglement with Multilingual and Monolingual VQ-VAE

1 code implementation • 4 May 2021 • Jennifer Williams, Jason Fong, Erica Cooper, Junichi Yamagishi

This work examines the content and usefulness of disentangled phone and speaker representations from two separately trained VQ-VAE systems: one trained on multilingual data and another trained on monolingual data.

Disentanglement

GELP: GAN-Excited Linear Prediction for Speech Synthesis from Mel-spectrogram

1 code implementation • 8 Apr 2019 • Lauri Juvela, Bajibabu Bollepalli, Junichi Yamagishi, Paavo Alku

Recent advances in neural network-based text-to-speech have reached human-level naturalness in synthetic speech.

Speech Synthesis

Range-Based Equal Error Rate for Spoof Localization

1 code implementation • 28 May 2023 • Lin Zhang, Xin Wang, Erica Cooper, Nicholas Evans, Junichi Yamagishi

To properly measure misclassified ranges and better evaluate spoof localization performance, we upgrade point-based EER to range-based EER.

The VoicePrivacy 2024 Challenge Evaluation Plan

1 code implementation • 3 Apr 2024 • Natalia Tomashenko, Xiaoxiao Miao, Pierre Champion, Sarina Meyer, Xin Wang, Emmanuel Vincent, Michele Panariello, Nicholas Evans, Junichi Yamagishi, Massimiliano Todisco

The task of the challenge is to develop a voice anonymization system for speech data which conceals the speaker's voice identity while protecting linguistic content and emotional states.

Multi-Metric Optimization using Generative Adversarial Networks for Near-End Speech Intelligibility Enhancement

1 code implementation • 17 Apr 2021 • Haoyu Li, Junichi Yamagishi

The intelligibility of speech severely degrades in the presence of environmental noise and reverberation.

Learning Disentangled Phone and Speaker Representations in a Semi-Supervised VQ-VAE Paradigm

1 code implementation • 21 Oct 2020 • Jennifer Williams, Yi Zhao, Erica Cooper, Junichi Yamagishi

Additionally, phones can be recognized from sub-phone VQ codebook indices in our semi-supervised VQ-VAE better than self-supervised with global conditions.

Speaker Diarization +1

Hiding speaker's sex in speech using zero-evidence speaker representation in an analysis/synthesis pipeline

1 code implementation • 29 Nov 2022 • Paul-Gauthier Noé, Xiaoxiao Miao, Xin Wang, Junichi Yamagishi, Jean-François Bonastre, Driss Matrouf

The use of modern vocoders in an analysis/synthesis pipeline allows us to investigate high-quality voice conversion that can be used for privacy purposes.

Voice Conversion

An initial investigation on optimizing tandem speaker verification and countermeasure systems using reinforcement learning

1 code implementation • 6 Feb 2020 • Anssi Kanervisto, Ville Hautamäki, Tomi Kinnunen, Junichi Yamagishi

The spoofing countermeasure (CM) systems in automatic speaker verification (ASV) are not typically used in isolation of each other.

Speaker Verification

Fashion-Guided Adversarial Attack on Person Segmentation

1 code implementation • 17 Apr 2021 • Marc Treu, Trung-Nghia Le, Huy H. Nguyen, Junichi Yamagishi, Isao Echizen

It generates adversarial textures learned from fashion style images and then overlays them on the clothing regions in the original image to make all persons in the image invisible to person segmentation networks.

Adversarial Attack Human Instance Segmentation +2

A Multi-Level Attention Model for Evidence-Based Fact Checking

1 code implementation • Findings (ACL) 2021 • Canasai Kruengkrai, Junichi Yamagishi, Xin Wang

Evidence-based fact checking aims to verify the truthfulness of a claim against evidence extracted from textual sources.

Fact Checking Sentence

Visualizing Classifier Adjacency Relations: A Case Study in Speaker Verification and Voice Anti-Spoofing

1 code implementation • 11 Jun 2021 • Tomi Kinnunen, Andreas Nautsch, Md Sahidullah, Nicholas Evans, Xin Wang, Massimiliano Todisco, Héctor Delgado, Junichi Yamagishi, Kong Aik Lee

Whether it be for results summarization, or the analysis of classifier fusion, some means to compare different classifiers can often provide illuminating insight into their behaviour, (dis)similarity or complementarity.

Speaker Verification Voice Anti-spoofing

Use of speaker recognition approaches for learning and evaluating embedding representations of musical instrument sounds

1 code implementation • 24 Jul 2021 • Xuan Shi, Erica Cooper, Junichi Yamagishi

Constructing an embedding space for musical instrument sounds that can meaningfully represent new and unseen instruments is important for downstream music generation tasks such as multi-instrument synthesis and timbre transfer.

Data Augmentation Instrument Recognition +4

Partial Rank Similarity Minimization Method for Quality MOS Prediction of Unseen Speech Synthesis Systems in Zero-Shot and Semi-supervised setting

1 code implementation • 8 Oct 2023 • Hemant Yadav, Erica Cooper, Junichi Yamagishi, Sunayana Sitaram, Rajiv Ratn Shah

That is, the partial rank similarity (PRS) is measured rather than the individual MOS values, as with the L1 loss.

Speech Synthesis

Improving Generalization Ability of Countermeasures for New Mismatch Scenario by Combining Multiple Advanced Regularization Terms

1 code implementation • Interspeech 2023 • Chang Zeng, Xin Wang, Xiaoxiao Miao, Erica Cooper, Junichi Yamagishi

The ability of countermeasure models to generalize from seen speech synthesis methods to unseen ones has been investigated in the ASVspoof challenge.

Speech Synthesis

XFEVER: Exploring Fact Verification across Languages

1 code implementation • 25 Oct 2023 • Yi-Chen Chang, Canasai Kruengkrai, Junichi Yamagishi

Experimental results show that the multilingual language model can be used to build fact verification models in different languages efficiently.

Benchmarking Fact Verification +3

t-DCF: a Detection Cost Function for the Tandem Assessment of Spoofing Countermeasures and Automatic Speaker Verification

no code implementations • 25 Apr 2018 • Tomi Kinnunen, Kong Aik Lee, Hector Delgado, Nicholas Evans, Massimiliano Todisco, Md Sahidullah, Junichi Yamagishi, Douglas A. Reynolds

The two challenge editions in 2015 and 2017 involved the assessment of spoofing countermeasures (CMs) in isolation from ASV using an equal error rate (EER) metric.

Speaker Verification

Speaker-independent raw waveform model for glottal excitation

no code implementations • 25 Apr 2018 • Lauri Juvela, Vassilis Tsiaras, Bajibabu Bollepalli, Manu Airaksinen, Junichi Yamagishi, Paavo Alku

Recent speech technology research has seen a growing interest in using WaveNets as statistical vocoders, i.e., generating speech waveforms from acoustic features.

Speech Synthesis Text-To-Speech Synthesis +1

A Spoofing Benchmark for the 2018 Voice Conversion Challenge: Leveraging from Spoofing Countermeasures for Speech Artifact Assessment

no code implementations • 23 Apr 2018 • Tomi Kinnunen, Jaime Lorenzo-Trueba, Junichi Yamagishi, Tomoki Toda, Daisuke Saito, Fernando Villavicencio, Zhen-Hua Ling

As a supplement to subjective results for the 2018 Voice Conversion Challenge (VCC'18) data, we configure a standard constant-Q cepstral coefficient CM to quantify the extent of processing artifacts.

Benchmarking Speaker Verification +1

Transformation on Computer-Generated Facial Image to Avoid Detection by Spoofing Detector

no code implementations • 12 Apr 2018 • Huy H. Nguyen, Ngoc-Dung T. Tieu, Hoang-Quoc Nguyen-Son, Junichi Yamagishi, Isao Echizen

Making computer-generated (CG) images more difficult to detect is an interesting problem in computer graphics and security.

The Voice Conversion Challenge 2018: Promoting Development of Parallel and Nonparallel Methods

no code implementations • 12 Apr 2018 • Jaime Lorenzo-Trueba, Junichi Yamagishi, Tomoki Toda, Daisuke Saito, Fernando Villavicencio, Tomi Kinnunen, Zhen-Hua Ling

We present the Voice Conversion Challenge 2018, designed as a follow-up to the 2016 edition with the aim of providing a common framework for evaluating and comparing different state-of-the-art voice conversion (VC) systems.

Voice Conversion

A comparison of recent waveform generation and acoustic modeling methods for neural-network-based speech synthesis

no code implementations • 7 Apr 2018 • Xin Wang, Jaime Lorenzo-Trueba, Shinji Takaki, Lauri Juvela, Junichi Yamagishi

Recent advances in speech synthesis suggest that limitations such as the lossy nature of the amplitude spectrum with minimum phase approximation and the over-smoothing effect in acoustic modeling can be overcome by using advanced machine learning approaches.

Speech Synthesis

High-quality nonparallel voice conversion based on cycle-consistent adversarial network

no code implementations • 2 Apr 2018 • Fuming Fang, Junichi Yamagishi, Isao Echizen, Jaime Lorenzo-Trueba

Although voice conversion (VC) algorithms have achieved remarkable success along with the development of machine learning, superior performance is still difficult to achieve when using nonparallel data.

Generative Adversarial Network Image-to-Image Translation +4

Complex-Valued Restricted Boltzmann Machine for Direct Speech Parameterization from Complex Spectra

no code implementations • 27 Mar 2018 • Toru Nakashika, Shinji Takaki, Junichi Yamagishi

In contrast, the proposed feature extractor using the CRBM directly encodes the complex spectra (or another complex-valued representation of the complex spectra) into binary-valued latent features (hidden units).

Can we steal your vocal identity from the Internet?: Initial investigation of cloning Obama's voice using GAN, WaveNet and low-quality found data

no code implementations • 2 Mar 2018 • Jaime Lorenzo-Trueba, Fuming Fang, Xin Wang, Isao Echizen, Junichi Yamagishi, Tomi Kinnunen

Thanks to the growing availability of spoofing databases and rapid advances in using them, systems for detecting voice spoofing attacks are becoming more and more capable, and error rates close to zero are being reached for the ASVspoof2015 database.

Generative Adversarial Network Speech Enhancement +2

Deep Denoising Auto-encoder for Statistical Speech Synthesis

no code implementations • 17 Jun 2015 • Zhenzhou Wu, Shinji Takaki, Junichi Yamagishi

This paper proposes a deep denoising auto-encoder technique to extract better acoustic features for speech synthesis.

Denoising Speech Synthesis

Deep Encoder-Decoder Models for Unsupervised Learning of Controllable Speech Synthesis

no code implementations • 30 Jul 2018 • Gustav Eje Henter, Jaime Lorenzo-Trueba, Xin Wang, Junichi Yamagishi

Generating versatile and appropriate synthetic speech requires control over the output expression separate from the spoken text.

Acoustic Modelling Emotional Speech Synthesis +1

Wasserstein GAN and Waveform Loss-based Acoustic Model Training for Multi-speaker Text-to-Speech Synthesis Systems Using a WaveNet Vocoder

no code implementations • 31 Jul 2018 • Yi Zhao, Shinji Takaki, Hieu-Thi Luong, Junichi Yamagishi, Daisuke Saito, Nobuaki Minematsu

In order to reduce the mismatched characteristics between natural and generated acoustic features, we propose frameworks that incorporate either a conditional generative adversarial network (GAN) or its variant, Wasserstein GAN with gradient penalty (WGAN-GP), into multi-speaker speech synthesis that uses the WaveNet vocoder.

Generative Adversarial Network Speech Synthesis +1

Scaling and bias codes for modeling speaker-adaptive DNN-based speech synthesis systems

no code implementations • 31 Jul 2018 • Hieu-Thi Luong, Junichi Yamagishi

Most neural-network based speaker-adaptive acoustic models for speech synthesis can be categorized into either layer-based or input-code approaches.

Speech Synthesis

Investigating accuracy of pitch-accent annotations in neural network-based speech synthesis and denoising effects

no code implementations • 2 Aug 2018 • Hieu-Thi Luong, Xin Wang, Junichi Yamagishi, Nobuyuki Nishizawa

We investigated the impact of noisy linguistic features on the performance of a neural-network-based Japanese speech synthesis system that uses a WaveNet vocoder.

Denoising Speech Synthesis

Multimodal speech synthesis architecture for unsupervised speaker adaptation

no code implementations • 20 Aug 2018 • Hieu-Thi Luong, Junichi Yamagishi

Two new training schemes for the new architecture are also proposed in this paper.

Speech Synthesis

Neural source-filter-based waveform model for statistical parametric speech synthesis

no code implementations • 29 Oct 2018 • Xin Wang, Shinji Takaki, Junichi Yamagishi

Neural waveform models such as the WaveNet are used in many recent text-to-speech systems, but the original WaveNet is quite slow in waveform generation because of its autoregressive (AR) structure.

Speech Synthesis

Waveform generation for text-to-speech synthesis using pitch-synchronous multi-scale generative adversarial networks

no code implementations • 30 Oct 2018 • Lauri Juvela, Bajibabu Bollepalli, Junichi Yamagishi, Paavo Alku

The state-of-the-art in text-to-speech synthesis has recently improved considerably due to novel neural waveform generation methods, such as WaveNet.

Image Generation Speech Synthesis +2

Audiovisual speaker conversion: jointly and simultaneously transforming facial expression and acoustic characteristics

no code implementations • 29 Oct 2018 • Fuming Fang, Xin Wang, Junichi Yamagishi, Isao Echizen

Transforming the facial and acoustic features together makes it possible for the converted voice and facial expressions to be highly correlated and for the generated target speaker to appear and sound natural.

Image Reconstruction

Continuous Expressive Speaking Styles Synthesis based on CVSM and MR-HMM

no code implementations • COLING 2016 • Jaime Lorenzo-Trueba, Roberto Barra-Chicote, Ascension Gallardo-Antolin, Junichi Yamagishi, Juan M. Montero

This paper introduces a continuous system capable of automatically producing the most adequate speaking style to synthesize a desired target text.

Expressive Speech Synthesis Speech Recognition

Identifying Computer-Translated Paragraphs using Coherence Features

no code implementations • PACLIC 2018 • Hoang-Quoc Nguyen-Son, Ngoc-Dung T. Tieu, Huy H. Nguyen, Junichi Yamagishi, Isao Echizen

We have developed a method for extracting the coherence features from a paragraph by matching similar words in its sentences.

Paper generation Translation

Introduction to Voice Presentation Attack Detection and Recent Advances

no code implementations • 4 Jan 2019 • Md Sahidullah, Hector Delgado, Massimiliano Todisco, Tomi Kinnunen, Nicholas Evans, Junichi Yamagishi, Kong-Aik Lee

Over the past few years significant progress has been made in the field of presentation attack detection (PAD) for automatic speaker recognition (ASV).

Benchmarking Speaker Recognition

A Comparison of Manual and Automatic Voice Repair for Individual with Vocal Disabilities

no code implementations • WS 2015 • Christophe Veaux, Junichi Yamagishi, Simon King

Speech Synthesis

Towards Personalised Synthesised Voices for Individuals with Vocal Disabilities: Voice Banking and Reconstruction

no code implementations • WS 2013 • Christophe Veaux, Junichi Yamagishi, Simon King

Speech Synthesis

Joint training framework for text-to-speech and voice conversion using multi-source Tacotron and WaveNet

no code implementations • 29 Mar 2019 • Mingyang Zhang, Xin Wang, Fuming Fang, Haizhou Li, Junichi Yamagishi

We propose using an extended Tacotron architecture, namely a multi-source sequence-to-sequence model with a dual attention mechanism, as the shared model for both the TTS and VC tasks.

Speech Synthesis Voice Conversion

Training a Neural Speech Waveform Model using Spectral Losses of Short-Time Fourier Transform and Continuous Wavelet Transform

no code implementations • 29 Mar 2019 • Shinji Takaki, Hirokazu Kameoka, Junichi Yamagishi

Recently, we proposed short-time Fourier transform (STFT)-based loss functions for training a neural speech waveform model.

Training Multi-Speaker Neural Text-to-Speech Systems using Speaker-Imbalanced Speech Corpora

no code implementations • 1 Apr 2019 • Hieu-Thi Luong, Xin Wang, Junichi Yamagishi, Nobuyuki Nishizawa

When the available data of a target speaker is insufficient to train a high quality speaker-dependent neural text-to-speech (TTS) system, we can combine data from multiple speakers and train a multi-speaker TTS model instead.

Neural source-filter waveform models for statistical parametric speech synthesis

no code implementations • 27 Apr 2019 • Xin Wang, Shinji Takaki, Junichi Yamagishi

Other models such as Parallel WaveNet and ClariNet bring together the benefits of AR and IAF-based models and train an IAF model by transferring the knowledge from a pre-trained AR teacher to an IAF student without any sequential transformation.

Speech Synthesis

Speaker Anonymization Using X-vector and Neural Waveform Models

no code implementations • 30 May 2019 • Fuming Fang, Xin Wang, Junichi Yamagishi, Isao Echizen, Massimiliano Todisco, Nicholas Evans, Jean-Francois Bonastre

One solution to mitigate these concerns involves the concealing of speaker identities before the sharing of speech data.

Speaker Verification Speech Synthesis

A Unified Speaker Adaptation Method for Speech Synthesis using Transcribed and Untranscribed Speech with Backpropagation

no code implementations • 18 Jun 2019 • Hieu-Thi Luong, Junichi Yamagishi

In this study, we propose a novel speech synthesis model, which can be adapted to unseen speakers by fine-tuning part of or all of the network using either transcribed or untranscribed speech.

Speech Synthesis

Generating Sentiment-Preserving Fake Online Reviews Using Neural Language Models and Their Human- and Machine-based Detection

no code implementations • 22 Jul 2019 • David Ifeoluwa Adelani, Haotian Mai, Fuming Fang, Huy H. Nguyen, Junichi Yamagishi, Isao Echizen

Advanced neural language models (NLMs) are widely used in sequence generation tasks because they are able to produce fluent and meaningful sentences.

Initial investigation of an encoder-decoder end-to-end TTS framework using marginalization of monotonic hard latent alignments

no code implementations • 30 Aug 2019 • Yusuke Yasuda, Xin Wang, Junichi Yamagishi

The advantages of our approach are that we can simplify many modules for the soft attention and that we can train the end-to-end TTS model using a single likelihood function.

Bootstrapping non-parallel voice conversion from speaker-adaptive text-to-speech

no code implementations • 14 Sep 2019 • Hieu-Thi Luong, Junichi Yamagishi

Voice conversion (VC) and text-to-speech (TTS) are two tasks that share a similar objective, generating speech with a target voice.

Voice Conversion

Effect of choice of probability distribution, randomness, and search methods for alignment modeling in sequence-to-sequence text-to-speech synthesis using hard alignment

no code implementations • 28 Oct 2019 • Yusuke Yasuda, Xin Wang, Junichi Yamagishi

Sequence-to-sequence text-to-speech (TTS) is dominated by soft-attention-based methods.

Hard Attention Speech Synthesis +1

Paper
Add Code

Transferring neural speech waveform synthesizers to musical instrument sounds generation

no code implementations • 27 Oct 2019 • Yi Zhao, Xin Wang, Lauri Juvela, Junichi Yamagishi

Recent neural waveform synthesizers such as WaveNet, WaveGlow, and the neural-source-filter (NSF) model have shown good performance in speech synthesis despite their different methods of waveform generation.

Audio Generation Audio Synthesis +2

Paper
Add Code

Security of Facial Forensics Models Against Adversarial Attacks

no code implementations • 2 Nov 2019 • Rong Huang, Fuming Fang, Huy H. Nguyen, Junichi Yamagishi, Isao Echizen

We experimentally demonstrated the existence of individual adversarial perturbations (IAPs) and universal adversarial perturbations (UAPs) that can lead a well-performing FFM to misbehave.

Paper
Add Code

A Method for Identifying Origin of Digital Images Using a Convolution Neural Network

no code implementations • 2 Nov 2019 • Rong Huang, Fuming Fang, Huy H. Nguyen, Junichi Yamagishi, Isao Echizen

The rapid development of deep learning techniques has created new challenges in identifying the origin of digital images because generative adversarial networks and variational autoencoders can create plausible digital images whose contents are not present in natural scenes.

Paper
Add Code

ASVspoof 2019: A large-scale public database of synthesized, converted and replayed speech

no code implementations • 5 Nov 2019 • Xin Wang, Junichi Yamagishi, Massimiliano Todisco, Hector Delgado, Andreas Nautsch, Nicholas Evans, Md Sahidullah, Ville Vestman, Tomi Kinnunen, Kong Aik Lee, Lauri Juvela, Paavo Alku, Yu-Huai Peng, Hsin-Te Hwang, Yu Tsao, Hsin-Min Wang, Sebastien Le Maguer, Markus Becker, Fergus Henderson, Rob Clark, Yu Zhang, Quan Wang, Ye Jia, Kai Onuma, Koji Mushika, Takashi Kaneda, Yuan Jiang, Li-Juan Liu, Yi-Chiao Wu, Wen-Chin Huang, Tomoki Toda, Kou Tanaka, Hirokazu Kameoka, Ingmar Steiner, Driss Matrouf, Jean-Francois Bonastre, Avashna Govender, Srikanth Ronanki, Jing-Xuan Zhang, Zhen-Hua Ling

Spoofing attacks within a logical access (LA) scenario are generated with the latest speech synthesis and voice conversion technologies, including state-of-the-art neural acoustic and waveform model techniques.

Person Recognition Speaker Verification +2

Paper
Add Code

Detecting and Correcting Adversarial Images Using Image Processing Operations

no code implementations • 11 Dec 2019 • Huy H. Nguyen, Minoru Kuribayashi, Junichi Yamagishi, Isao Echizen

Deep neural networks (DNNs) have achieved excellent performance on several tasks and have been widely applied in both academia and industry.

BIG-bench Machine Learning Object Recognition

Paper
Add Code

Design Choices for X-vector Based Speaker Anonymization

no code implementations • 18 May 2020 • Brij Mohan Lal Srivastava, Natalia Tomashenko, Xin Wang, Emmanuel Vincent, Junichi Yamagishi, Mohamed Maouche, Aurélien Bellet, Marc Tommasi

The recently proposed x-vector based anonymization scheme converts any input voice into that of a random pseudo-speaker.

Speaker Verification

Paper
Add Code

Investigation of learning abilities on linguistic features in sequence-to-sequence text-to-speech synthesis

no code implementations • 20 May 2020 • Yusuke Yasuda, Xin Wang, Junichi Yamagishi

Our experiments suggest that a) a neural sequence-to-sequence TTS system should have a sufficient number of model parameters to produce high quality speech, b) it should also use a powerful encoder when it takes characters as inputs, and c) the encoder still has room for improvement and needs to have an improved architecture to learn supra-segmental features more appropriately.

Speech Synthesis Text-To-Speech Synthesis

Paper
Add Code

NAUTILUS: a Versatile Voice Cloning System

no code implementations • 22 May 2020 • Hieu-Thi Luong, Junichi Yamagishi

By using a multi-speaker speech corpus to train all requisite encoders and decoders in the initial training stage, our system can clone unseen voices using untranscribed speech of target speakers on the basis of the backpropagation algorithm.

Speech Synthesis Voice Cloning +1

Paper
Add Code

Generating Master Faces for Use in Performing Wolf Attacks on Face Recognition Systems

no code implementations • 15 Jun 2020 • Huy H. Nguyen, Junichi Yamagishi, Isao Echizen, Sébastien Marcel

In this work, we demonstrated that wolf (generic) faces, which we call "master faces," can also compromise face recognition systems and that the master face concept can be generalized in some cases.

Face Recognition

Paper
Add Code

Noise Tokens: Learning Neural Noise Templates for Environment-Aware Speech Enhancement

no code implementations • 8 Apr 2020 • Haoyu Li, Junichi Yamagishi

In recent years, speech enhancement (SE) has achieved impressive progress with the success of deep neural networks (DNNs).

Audio and Speech Processing

Paper
Add Code

Tandem Assessment of Spoofing Countermeasures and Automatic Speaker Verification: Fundamentals

no code implementations • 12 Jul 2020 • Tomi Kinnunen, Héctor Delgado, Nicholas Evans, Kong Aik Lee, Ville Vestman, Andreas Nautsch, Massimiliano Todisco, Xin Wang, Md Sahidullah, Junichi Yamagishi, Douglas A. Reynolds

Recent years have seen growing efforts to develop spoofing countermeasures (CMs) to protect automatic speaker verification (ASV) systems from being deceived by manipulated or artificial inputs.

Speaker Verification

Paper
Add Code

Viable Threat on News Reading: Generating Biased News Using Natural Language Models

no code implementations • EMNLP (NLP+CSS) 2020 • Saurabh Gupta, Huy H. Nguyen, Junichi Yamagishi, Isao Echizen

Recent advancements in natural language generation have raised serious concerns.

Text Generation

Paper
Add Code

Latent linguistic embedding for cross-lingual text-to-speech and voice conversion

no code implementations • 8 Oct 2020 • Hieu-Thi Luong, Junichi Yamagishi

As the recently proposed voice cloning system, NAUTILUS, is capable of cloning unseen voices using untranscribed speech, we investigate the feasibility of using it to develop a unified cross-lingual TTS/VC system.

Voice Cloning Voice Conversion

Paper
Add Code

End-to-End Text-to-Speech using Latent Duration based on VQ-VAE

no code implementations • 19 Oct 2020 • Yusuke Yasuda, Xin Wang, Junichi Yamagishi

Explicit duration modeling is a key to achieving robust and efficient alignment in text-to-speech synthesis (TTS).

Speech Synthesis Text-To-Speech Synthesis

Paper
Add Code

An Investigation of the Relation Between Grapheme Embeddings and Pronunciation for Tacotron-based Systems

no code implementations • 21 Oct 2020 • Antoine Perquin, Erica Cooper, Junichi Yamagishi

Thanks to this property, we show that grapheme embeddings learned by Tacotron models can be useful for tasks such as grapheme-to-phoneme conversion and control of the pronunciation in synthetic speech.

Relation Speech Synthesis +1

Paper
Add Code

Pretraining Strategies, Waveform Model Choice, and Acoustic Configurations for Multi-Speaker End-to-End Speech Synthesis

no code implementations • 10 Nov 2020 • Erica Cooper, Xin Wang, Yi Zhao, Yusuke Yasuda, Junichi Yamagishi

We explore pretraining strategies including choice of base corpus with the aim of choosing the best strategy for zero-shot multi-speaker end-to-end synthesis.

Speech Synthesis

Paper
Add Code

ASVspoof 2019: spoofing countermeasures for the detection of synthesized, converted and replayed speech

no code implementations • 11 Feb 2021 • Andreas Nautsch, Xin Wang, Nicholas Evans, Tomi Kinnunen, Ville Vestman, Massimiliano Todisco, Héctor Delgado, Md Sahidullah, Junichi Yamagishi, Kong Aik Lee

The ASVspoof initiative was conceived to spearhead research in anti-spoofing for automatic speaker verification (ASV).

Speaker Verification Speech Synthesis +2

Paper
Add Code

An Initial Investigation for Detecting Partially Spoofed Audio

no code implementations • 6 Apr 2021 • Lin Zhang, Xin Wang, Erica Cooper, Junichi Yamagishi, Jose Patino, Nicholas Evans

By definition, partially-spoofed utterances contain a mix of both spoofed and bona fide segments, which will likely degrade the performance of countermeasures trained with entirely spoofed utterances.

Voice Anti-spoofing

Paper
Add Code

Preliminary study on using vector quantization latent spaces for TTS/VC systems with consistent performance

no code implementations • 25 Jun 2021 • Hieu-Thi Luong, Junichi Yamagishi

Generally speaking, the main objective when training a neural speech synthesis system is to synthesize natural and expressive speech from the output layer of the neural network without much attention given to the hidden layers.

Quantization Speech Synthesis +1

Paper
Add Code

SVSNet: An End-to-end Speaker Voice Similarity Assessment Model

no code implementations • 20 Jul 2021 • Cheng-Hung Hu, Yu-Huai Peng, Junichi Yamagishi, Yu Tsao, Hsin-Min Wang

Neural evaluation metrics derived for numerous speech generation tasks have recently attracted great attention.

Voice Conversion Voice Similarity

Paper
Add Code

OpenForensics: Large-Scale Challenging Dataset For Multi-Face Forgery Detection And Segmentation In-The-Wild

no code implementations • ICCV 2021 • Trung-Nghia Le, Huy H. Nguyen, Junichi Yamagishi, Isao Echizen

To promote these new tasks, we have created the first large-scale dataset posing a high level of challenges that is designed with face-wise rich annotations explicitly for face forgery detection and segmentation, namely OpenForensics.

Face Detection Face Swapping +1

Paper
Add Code

ASVspoof 2021: accelerating progress in spoofed and deepfake speech detection

no code implementations • 1 Sep 2021 • Junichi Yamagishi, Xin Wang, Massimiliano Todisco, Md Sahidullah, Jose Patino, Andreas Nautsch, Xuechen Liu, Kong Aik Lee, Tomi Kinnunen, Nicholas Evans, Héctor Delgado

In addition to a continued focus upon logical and physical access tasks in which there are a number of advances compared to previous editions, ASVspoof 2021 introduces a new task involving deepfake speech detection.

Face Swapping Speaker Verification

Paper
Add Code

Master Face Attacks on Face Recognition Systems

no code implementations • 8 Sep 2021 • Huy H. Nguyen, Sébastien Marcel, Junichi Yamagishi, Isao Echizen

Previous work has proven the existence of master faces, i.e., faces that match multiple enrolled templates in face recognition systems, and their existence extends the ability of presentation attacks.

Face Recognition

Paper
Add Code

DDS: A new device-degraded speech dataset for speech enhancement

no code implementations • 16 Sep 2021 • Haoyu Li, Junichi Yamagishi

A large and growing amount of speech content in real-life scenarios is being recorded on consumer-grade devices in uncontrolled environments, resulting in degraded speech quality.

Speech Enhancement

Paper
Add Code

On the Interplay Between Sparsity, Naturalness, Intelligibility, and Prosody in Speech Synthesis

no code implementations • 4 Oct 2021 • Cheng-I Jeff Lai, Erica Cooper, Yang Zhang, Shiyu Chang, Kaizhi Qian, Yi-Lun Liao, Yung-Sung Chuang, Alexander H. Liu, Junichi Yamagishi, David Cox, James Glass

Are end-to-end text-to-speech (TTS) models over-parametrized?

Knowledge Distillation Speech Synthesis

Paper
Add Code

LaughNet: synthesizing laughter utterances from waveform silhouettes and a single laughter example

no code implementations • 11 Oct 2021 • Hieu-Thi Luong, Junichi Yamagishi

Emotional and controllable speech synthesis is a topic that has received much attention.

Speech Synthesis

Paper
Add Code

Revisiting Speech Content Privacy

no code implementations • 13 Oct 2021 • Jennifer Williams, Junichi Yamagishi, Paul-Gauthier Noe, Cassia Valentini Botinhao, Jean-Francois Bonastre

In this paper, we discuss an important aspect of speech privacy: protecting spoken content.

Paper
Add Code

Effectiveness of Detection-based and Regression-based Approaches for Estimating Mask-Wearing Ratio

no code implementations • 25 Nov 2021 • Khanh-Duy Nguyen, Huy H. Nguyen, Trung-Nghia Le, Junichi Yamagishi, Isao Echizen

However, there is still a lack of comprehensive research on both methodologies and datasets.

Object Detection +1

Paper
Add Code

Optimizing Tandem Speaker Verification and Anti-Spoofing Systems

no code implementations • 24 Jan 2022 • Anssi Kanervisto, Ville Hautamäki, Tomi Kinnunen, Junichi Yamagishi

As automatic speaker verification (ASV) systems are vulnerable to spoofing attacks, they are typically used in conjunction with spoofing countermeasure (CM) systems to improve security.

Speaker Verification

Paper
Add Code

Robust Deepfake On Unrestricted Media: Generation And Detection

no code implementations • 13 Feb 2022 • Trung-Nghia Le, Huy H Nguyen, Junichi Yamagishi, Isao Echizen

Recent advances in deep learning have led to substantial improvements in deepfake generation, resulting in fake media with a more realistic appearance.

DeepFake Detection Face Swapping

Paper
Add Code

Joint Noise Reduction and Listening Enhancement for Full-End Speech Enhancement

no code implementations • 22 Mar 2022 • Haoyu Li, Yun Liu, Junichi Yamagishi

Speech enhancement (SE) methods mainly focus on recovering clean speech from noisy input.

Speech Enhancement

Paper
Add Code

The PartialSpoof Database and Countermeasures for the Detection of Short Fake Speech Segments Embedded in an Utterance

no code implementations • 11 Apr 2022 • Lin Zhang, Xin Wang, Erica Cooper, Nicholas Evans, Junichi Yamagishi

Since the short spoofed speech segments to be embedded by attackers are of variable length, six different temporal resolutions are considered, ranging from as short as 20 ms to as large as 640 ms. Third, we propose a new CM that enables the simultaneous use of the segment-level labels at different temporal resolutions as well as utterance-level labels to execute utterance- and segment-level detection at the same time.

Speaker Verification Speech Synthesis +2

Paper
Add Code

Joint Speaker Encoder and Neural Back-end Model for Fully End-to-End Automatic Speaker Verification with Multiple Enrollment Utterances

no code implementations • 1 Sep 2022 • Chang Zeng, Xiaoxiao Miao, Xin Wang, Erica Cooper, Junichi Yamagishi

Conventional automatic speaker verification systems can usually be decomposed into a front-end model such as time delay neural network (TDNN) for extracting speaker embeddings and a back-end model such as statistics-based probabilistic linear discriminant analysis (PLDA) or neural network-based neural PLDA (NPLDA) for similarity scoring.

Data Augmentation Speaker Verification

Paper
Add Code

Spoofing-Aware Attention based ASV Back-end with Multiple Enrollment Utterances and a Sampling Strategy for the SASV Challenge 2022

no code implementations • 1 Sep 2022 • Chang Zeng, Lin Zhang, Meng Liu, Junichi Yamagishi

Current state-of-the-art automatic speaker verification (ASV) systems are vulnerable to presentation attacks, and several countermeasures (CMs), which distinguish bona fide trials from spoofing ones, have been explored to protect ASV.

Speaker Verification

Paper
Add Code

Mitigating the Diminishing Effect of Elastic Weight Consolidation

no code implementations • COLING 2022 • Canasai Kruengkrai, Junichi Yamagishi

Elastic weight consolidation (EWC, Kirkpatrick et al. 2017) is a promising approach to addressing catastrophic forgetting in sequential training.

Fact Checking Natural Language Inference

Paper
Add Code

Analysis of Master Vein Attacks on Finger Vein Recognition Systems

no code implementations • 18 Oct 2022 • Huy H. Nguyen, Trung-Nghia Le, Junichi Yamagishi, Isao Echizen

The results raise the alarm about the robustness of such systems and suggest that master vein attacks should be considered an important security measure.

Finger Vein Recognition

Paper
Add Code

Outlier-Aware Training for Improving Group Accuracy Disparities

no code implementations • 27 Oct 2022 • Li-Kuang Chen, Canasai Kruengkrai, Junichi Yamagishi

Methods addressing spurious correlations such as Just Train Twice (JTT, arXiv:2107.09044v2) involve reweighting a subset of the training set to maximize the worst-group accuracy.

Paper
Add Code

Investigating Range-Equalizing Bias in Mean Opinion Score Ratings of Synthesized Speech

no code implementations • 17 May 2023 • Erica Cooper, Junichi Yamagishi

Mean Opinion Score (MOS) is a popular measure for evaluating synthesized speech.

Paper
Add Code

Towards single integrated spoofing-aware speaker verification embeddings

1 code implementation • 30 May 2023 • Sung Hwan Mun, Hye-jin Shim, Hemlata Tak, Xin Wang, Xuechen Liu, Md Sahidullah, Myeonghun Jeong, Min Hyun Han, Massimiliano Todisco, Kong Aik Lee, Junichi Yamagishi, Nicholas Evans, Tomi Kinnunen, Nam Soo Kim, Jee-weon Jung

Second, competitive performance should be demonstrated compared to the fusion of automatic speaker verification (ASV) and countermeasure (CM) embeddings, which outperformed single embedding solutions by a large margin in the SASV2022 challenge.

Speaker Verification

Paper
Code

How Close are Other Computer Vision Tasks to Deepfake Detection?

no code implementations • 2 Oct 2023 • Huy H. Nguyen, Junichi Yamagishi, Isao Echizen

In this paper, we challenge the conventional belief that supervised ImageNet-trained models have strong generalizability and are suitable for use as feature extractors in deepfake detection.

DeepFake Detection Face Recognition +1

Paper
Add Code

The VoiceMOS Challenge 2023: Zero-shot Subjective Speech Quality Prediction for Multiple Domains

no code implementations • 4 Oct 2023 • Erica Cooper, Wen-Chin Huang, Yu Tsao, Hsin-Min Wang, Tomoki Toda, Junichi Yamagishi

We present the second edition of the VoiceMOS Challenge, a scientific event that aims to promote the study of automatic prediction of the mean opinion score (MOS) of synthesized and processed speech.

Speech Synthesis Text-To-Speech Synthesis

Paper
Add Code

BodyFormer: Semantics-guided 3D Body Gesture Synthesis with Transformer

no code implementations • 7 Sep 2023 • Kunkun Pang, Dafei Qin, Yingruo Fan, Julian Habekost, Takaaki Shiratori, Junichi Yamagishi, Taku Komura

Learning the mapping between speech and 3D full-body gestures is difficult due to the stochastic nature of the problem and the lack of a rich cross-modal dataset that is needed for training.

Paper
Add Code

Uncertainty as a Predictor: Leveraging Self-Supervised Learning for Zero-Shot MOS Prediction

no code implementations • 25 Dec 2023 • Aditya Ravuri, Erica Cooper, Junichi Yamagishi

Predicting audio quality in voice synthesis and conversion systems is a critical yet challenging task, especially when traditional methods like Mean Opinion Scores (MOS) are cumbersome to collect at scale.

Self-Supervised Learning

Paper
Add Code

Bridging Textual and Tabular Worlds for Fact Verification: A Lightweight, Attention-Based Model

1 code implementation • 26 Mar 2024 • Shirin Dabbaghi Varnosfaderani, Canasai Kruengkrai, Ramin Yahyapour, Junichi Yamagishi

FEVEROUS is a benchmark and research initiative focused on fact extraction and verification tasks involving unstructured text and structured tabular data.

Fact Verification

Paper
Code
