no code implementations • COLING 2022 • Canasai Kruengkrai, Junichi Yamagishi
Elastic weight consolidation (EWC, Kirkpatrick et al. 2017) is a promising approach to addressing catastrophic forgetting in sequential training.
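As a rough illustration of the EWC penalty this entry refers to, here is a minimal PyTorch sketch of the quadratic regularizer from Kirkpatrick et al. (2017); the dictionary names `fisher` and `old_params` and the weight `lam` are illustrative assumptions, not taken from the paper.

```python
import torch

def ewc_penalty(model, fisher, old_params, lam=1.0):
    """Quadratic EWC regularizer: (lam / 2) * sum_i F_i * (theta_i - theta*_i)^2,
    where F_i is a diagonal Fisher estimate and theta*_i the parameters learned
    on the previous task. Both are dicts keyed by parameter name."""
    penalty = torch.zeros(())
    for name, p in model.named_parameters():
        if name in fisher:
            penalty = penalty + (fisher[name] * (p - old_params[name]) ** 2).sum()
    return 0.5 * lam * penalty

# total_loss = task_loss + ewc_penalty(model, fisher, old_params, lam=10.0)
```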
no code implementations • 18 May 2023 • Chang Zeng, Xin Wang, Xiaoxiao Miao, Erica Cooper, Junichi Yamagishi
The ability of countermeasure models to generalize from seen speech synthesis methods to unseen ones has been investigated in the ASVspoof challenge.
no code implementations • 17 May 2023 • Erica Cooper, Junichi Yamagishi
Mean Opinion Score (MOS) is a popular measure for evaluating synthesized speech.
1 code implementation • 29 Nov 2022 • Paul-Gauthier Noé, Xiaoxiao Miao, Xin Wang, Junichi Yamagishi, Jean-François Bonastre, Driss Matrouf
The use of modern vocoders in an analysis/synthesis pipeline allows us to investigate high-quality voice conversion that can be used for privacy purposes.
no code implementations • 27 Oct 2022 • Li-Kuang Chen, Canasai Kruengkrai, Junichi Yamagishi
Methods addressing spurious correlations such as Just Train Twice (JTT, arXiv:2107.09044v2) involve reweighting a subset of the training set to maximize the worst-group accuracy.
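For context, a compact sketch of the two-stage JTT recipe described in arXiv:2107.09044, with scikit-learn logistic regression standing in for the neural models used in practice; the upweighting factor `lam` is a hyperparameter.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def just_train_twice(X, y, lam=6.0):
    """Stage 1: train a standard ERM identification model.
    Stage 2: retrain with its training-set errors upweighted by lam."""
    ident = LogisticRegression(max_iter=1000).fit(X, y)
    error_set = ident.predict(X) != y           # misclassified examples
    weights = np.where(error_set, lam, 1.0)     # upweight the error set
    return LogisticRegression(max_iter=1000).fit(X, y, sample_weight=weights)
```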
1 code implementation • 19 Oct 2022 • Xin Wang, Junichi Yamagishi
To make better use of pairs of bona fide and spoofed data, this study introduces a contrastive feature loss that can be plugged into the standard training criterion.
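The paper's exact loss is not given in this snippet; as a generic stand-in, a classic margin-based contrastive loss over paired bona fide/spoofed embeddings (Hadsell et al. 2006) looks like this in PyTorch:

```python
import torch
import torch.nn.functional as F

def contrastive_loss(emb_a, emb_b, same_class, margin=1.0):
    """Pull same-class pairs together and push bona fide vs. spoofed
    pairs at least `margin` apart. `same_class` is a 0/1 float tensor."""
    d = F.pairwise_distance(emb_a, emb_b)
    pos = same_class * d.pow(2)
    neg = (1.0 - same_class) * F.relu(margin - d).pow(2)
    return (pos + neg).mean()
```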
no code implementations • 18 Oct 2022 • Huy H. Nguyen, Trung-Nghia Le, Junichi Yamagishi, Isao Echizen
The results raise the alarm about the robustness of such systems and suggest that master vein attacks should be considered an important security measure.
no code implementations • 1 Sep 2022 • Chang Zeng, Lin Zhang, Meng Liu, Junichi Yamagishi
Current state-of-the-art automatic speaker verification (ASV) systems are vulnerable to presentation attacks, and several countermeasures (CMs), which distinguish bona fide trials from spoofing ones, have been explored to protect ASV.
no code implementations • 1 Sep 2022 • Chang Zeng, Xiaoxiao Miao, Xin Wang, Erica Cooper, Junichi Yamagishi
Conventional automatic speaker verification systems can usually be decomposed into a front-end model such as time delay neural network (TDNN) for extracting speaker embeddings and a back-end model such as statistics-based probabilistic linear discriminant analysis (PLDA) or neural network-based neural PLDA (NPLDA) for similarity scoring.
1 code implementation • 14 May 2022 • Natalia Tomashenko, Brij Mohan Lal Srivastava, Xin Wang, Emmanuel Vincent, Andreas Nautsch, Junichi Yamagishi, Nicholas Evans, Jose Patino, Jean-François Bonastre, Paul-Gauthier Noé, Massimiliano Todisco
The VoicePrivacy Challenge aims to promote the development of privacy preservation tools for speech technology by gathering a new community to define the tasks of interest and the evaluation methodology, and benchmarking solutions through a series of challenges.
no code implementations • 11 Apr 2022 • Lin Zhang, Xin Wang, Erica Cooper, Nicholas Evans, Junichi Yamagishi
Since the short spoofed speech segments to be embedded by attackers are of variable length, six different temporal resolutions are considered, ranging from as short as 20 ms to as long as 640 ms. Third, we propose a new CM that uses segment-level labels at different temporal resolutions together with utterance-level labels to perform utterance- and segment-level detection at the same time.
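One simple way to derive such multi-resolution segment labels from frame-level spoof labels is max-pooling over fixed windows; a sketch below, where the frame rate and windowing details are assumptions rather than the paper's specification.

```python
import numpy as np

def segment_labels(frame_labels, frames_per_segment):
    """Pool binary frame-level labels (1 = spoofed) into segment-level
    labels: a segment is spoofed if any of its frames is spoofed."""
    n = len(frame_labels) // frames_per_segment * frames_per_segment
    segments = np.asarray(frame_labels[:n]).reshape(-1, frames_per_segment)
    return segments.max(axis=1)

# With 10-ms frames: labels_20ms  = segment_labels(frames, 2)
#                    labels_640ms = segment_labels(frames, 64)
```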
1 code implementation • 23 Mar 2022 • Natalia Tomashenko, Xin Wang, Xiaoxiao Miao, Hubert Nourtel, Pierre Champion, Massimiliano Todisco, Emmanuel Vincent, Nicholas Evans, Junichi Yamagishi, Jean-François Bonastre
Participants apply their developed anonymization systems, run evaluation scripts and submit objective evaluation results and anonymized speech data to the organizers.
no code implementations • 22 Mar 2022 • Haoyu Li, Yun Liu, Junichi Yamagishi
Speech enhancement (SE) methods mainly focus on recovering clean speech from noisy input.
no code implementations • 24 Feb 2022 • Hemlata Tak, Massimiliano Todisco, Xin Wang, Jee-weon Jung, Junichi Yamagishi, Nicholas Evans
The performance of spoofing countermeasure systems depends fundamentally upon the use of sufficiently representative training data.
no code implementations • 13 Feb 2022 • Trung-Nghia Le, Huy H Nguyen, Junichi Yamagishi, Isao Echizen
Recent advances in deep learning have led to substantial improvements in deepfake generation, resulting in fake media with a more realistic appearance.
no code implementations • 24 Jan 2022 • Anssi Kanervisto, Ville Hautamäki, Tomi Kinnunen, Junichi Yamagishi
As automatic speaker verification (ASV) systems are vulnerable to spoofing attacks, they are typically used in conjunction with spoofing countermeasure (CM) systems to improve security.
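In the simplest cascaded arrangement, a trial is accepted only if both subsystems accept it; a minimal sketch (the threshold names and values are assumptions):

```python
def tandem_decision(asv_score, cm_score, asv_threshold=0.0, cm_threshold=0.0):
    """Accept a trial only if the countermeasure deems it bona fide
    AND the ASV system deems it a target-speaker trial."""
    return cm_score >= cm_threshold and asv_score >= asv_threshold
```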
1 code implementation • 10 Jan 2022 • Xin Wang, Junichi Yamagishi
Presentation attack detection (PAD) for ASV, or speech anti-spoofing, is therefore indispensable.
no code implementations • 25 Nov 2021 • Khanh-Duy Nguyen, Huy H. Nguyen, Trung-Nghia Le, Junichi Yamagishi, Isao Echizen
However, there is still a lack of comprehensive research on both methodologies and datasets.
1 code implementation • 15 Nov 2021 • Xin Wang, Junichi Yamagishi
Self-supervised speech modeling is a rapidly progressing research topic, and many pre-trained models have been released and used in various downstream tasks.
1 code implementation • 18 Oct 2021 • Wen-Chin Huang, Erica Cooper, Junichi Yamagishi, Tomoki Toda
An effective approach to automatically predict the subjective rating for synthetic speech is to train on a listening test dataset with human-annotated scores.
no code implementations • 13 Oct 2021 • Jennifer Williams, Junichi Yamagishi, Paul-Gauthier Noe, Cassia Valentini Botinhao, Jean-Francois Bonastre
In this paper, we discuss an important aspect of speech privacy: protecting spoken content.
no code implementations • 11 Oct 2021 • Hieu-Thi Luong, Junichi Yamagishi
Emotional and controllable speech synthesis is a topic that has received much attention.
1 code implementation • 10 Oct 2021 • Xin Wang, Junichi Yamagishi
On the ASVspoof2019 logical access database, the results demonstrate that an energy-based estimator and a neural-network-based one achieved acceptable performance in identifying unknown attacks in the test set.
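One common form of energy-based confidence score (the paper's estimator may differ) is the free energy over the classifier logits:

```python
import torch

def energy_score(logits, temperature=1.0):
    """Free energy E(x) = -T * logsumexp(logits / T). Higher energy
    indicates an input unlike the training (known-attack) data, so it
    can be flagged as a potential unknown attack."""
    return -temperature * torch.logsumexp(logits / temperature, dim=-1)
```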
1 code implementation • 6 Oct 2021 • Erica Cooper, Wen-Chin Huang, Tomoki Toda, Junichi Yamagishi
Automatic methods to predict listener opinions of synthesized speech remain elusive since listeners, systems being evaluated, characteristics of the speech, and even the instructions given and the rating scale all vary from test to test.
no code implementations • 4 Oct 2021 • Cheng-I Jeff Lai, Erica Cooper, Yang Zhang, Shiyu Chang, Kaizhi Qian, Yi-Lun Liao, Yung-Sung Chuang, Alexander H. Liu, Junichi Yamagishi, David Cox, James Glass
Are end-to-end text-to-speech (TTS) models over-parametrized?
no code implementations • 16 Sep 2021 • Haoyu Li, Junichi Yamagishi
A large and growing amount of speech content in real-life scenarios is being recorded on consumer-grade devices in uncontrolled environments, resulting in degraded speech quality.
no code implementations • 8 Sep 2021 • Huy H. Nguyen, Sébastien Marcel, Junichi Yamagishi, Isao Echizen
Previous work has proven the existence of master faces, i.e., faces that match multiple enrolled templates in face recognition systems, and their existence extends the ability of presentation attacks.
no code implementations • 1 Sep 2021 • Junichi Yamagishi, Xin Wang, Massimiliano Todisco, Md Sahidullah, Jose Patino, Andreas Nautsch, Xuechen Liu, Kong Aik Lee, Tomi Kinnunen, Nicholas Evans, Héctor Delgado
In addition to a continued focus upon logical and physical access tasks in which there are a number of advances compared to previous editions, ASVspoof 2021 introduces a new task involving deepfake speech detection.
1 code implementation • 1 Sep 2021 • Héctor Delgado, Nicholas Evans, Tomi Kinnunen, Kong Aik Lee, Xuechen Liu, Andreas Nautsch, Jose Patino, Md Sahidullah, Massimiliano Todisco, Xin Wang, Junichi Yamagishi
The automatic speaker verification spoofing and countermeasures (ASVspoof) challenge series is a community-led initiative which aims to promote the consideration of spoofing and the development of countermeasures.
1 code implementation • 1 Sep 2021 • Natalia Tomashenko, Xin Wang, Emmanuel Vincent, Jose Patino, Brij Mohan Lal Srivastava, Paul-Gauthier Noé, Andreas Nautsch, Nicholas Evans, Junichi Yamagishi, Benjamin O'Brien, Anaïs Chanclu, Jean-François Bonastre, Massimiliano Todisco, Mohamed Maouche
We provide a systematic overview of the challenge design with an analysis of submitted systems and evaluation results.
no code implementations • ICCV 2021 • Trung-Nghia Le, Huy H. Nguyen, Junichi Yamagishi, Isao Echizen
To promote these new tasks, we have created the first large-scale, highly challenging dataset designed with rich face-wise annotations explicitly for face forgery detection and segmentation, namely OpenForensics.
1 code implementation • 24 Jul 2021 • Xuan Shi, Erica Cooper, Junichi Yamagishi
Constructing an embedding space for musical instrument sounds that can meaningfully represent new and unseen instruments is important for downstream music generation tasks such as multi-instrument synthesis and timbre transfer.
no code implementations • 20 Jul 2021 • Cheng-Hung Hu, Yu-Huai Peng, Junichi Yamagishi, Yu Tsao, Hsin-Min Wang
Neural evaluation metrics derived for numerous speech generation tasks have recently attracted great attention.
no code implementations • 25 Jun 2021 • Hieu-Thi Luong, Junichi Yamagishi
Generally speaking, the main objective when training a neural speech synthesis system is to synthesize natural and expressive speech from the output layer of the neural network without much attention given to the hidden layers.
1 code implementation • 11 Jun 2021 • Tomi Kinnunen, Andreas Nautsch, Md Sahidullah, Nicholas Evans, Xin Wang, Massimiliano Todisco, Héctor Delgado, Junichi Yamagishi, Kong Aik Lee
Whether for results summarization or the analysis of classifier fusion, some means to compare different classifiers can often provide illuminating insight into their behaviour, (dis)similarity, or complementarity.
1 code implementation • Findings (ACL) 2021 • Canasai Kruengkrai, Junichi Yamagishi, Xin Wang
Evidence-based fact checking aims to verify the truthfulness of a claim against evidence extracted from textual sources.
1 code implementation • 4 May 2021 • Jennifer Williams, Jason Fong, Erica Cooper, Junichi Yamagishi
This work examines the content and usefulness of disentangled phone and speaker representations from two separately trained VQ-VAE systems: one trained on multilingual data and another trained on monolingual data.
1 code implementation • 17 Apr 2021 • Marc Treu, Trung-Nghia Le, Huy H. Nguyen, Junichi Yamagishi, Isao Echizen
It generates adversarial textures learned from fashion style images and then overlays them on the clothing regions in the original image to make all persons in the image invisible to person segmentation networks.
1 code implementation • 17 Apr 2021 • Haoyu Li, Junichi Yamagishi
The intelligibility of speech severely degrades in the presence of environmental noise and reverberation.
no code implementations • 6 Apr 2021 • Lin Zhang, Xin Wang, Erica Cooper, Junichi Yamagishi, Jose Patino, Nicholas Evans
By definition, partially-spoofed utterances contain a mix of both spoofed and bona fide segments, which will likely degrade the performance of countermeasures trained with entirely spoofed utterances.
1 code implementation • 4 Apr 2021 • Chang Zeng, Xin Wang, Erica Cooper, Xiaoxiao Miao, Junichi Yamagishi
Probabilistic linear discriminant analysis (PLDA) or cosine similarity have been widely used in traditional speaker verification systems as back-end techniques to measure pairwise similarities.
Ranked #1 on Speaker Verification on CN-CELEB
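As a reference point for the conventional back-end described in the abstract above, cosine scoring of speaker embeddings is essentially a one-liner (the attention back-end proposed in the paper is more involved):

```python
import torch.nn.functional as F

def cosine_score(enroll_emb, test_emb):
    """Cosine-similarity back-end: score a verification trial by the
    cosine between enrollment and test speaker embeddings."""
    return F.cosine_similarity(enroll_emb, test_emb, dim=-1)

# accept = cosine_score(enroll, test) >= threshold
```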
no code implementations • 11 Feb 2021 • Andreas Nautsch, Xin Wang, Nicholas Evans, Tomi Kinnunen, Ville Vestman, Massimiliano Todisco, Héctor Delgado, Md Sahidullah, Junichi Yamagishi, Kong Aik Lee
The ASVspoof initiative was conceived to spearhead research in anti-spoofing for automatic speaker verification (ASV).
no code implementations • 10 Nov 2020 • Erica Cooper, Xin Wang, Yi Zhao, Yusuke Yasuda, Junichi Yamagishi
We explore pretraining strategies including choice of base corpus with the aim of choosing the best strategy for zero-shot multi-speaker end-to-end synthesis.
1 code implementation • 21 Oct 2020 • Jennifer Williams, Yi Zhao, Erica Cooper, Junichi Yamagishi
Additionally, phones can be recognized from sub-phone VQ codebook indices better in our semi-supervised VQ-VAE than in the self-supervised one with global conditions.
no code implementations • 21 Oct 2020 • Antoine Perquin, Erica Cooper, Junichi Yamagishi
Thanks to this property, we show that grapheme embeddings learned by Tacotron models can be useful for tasks such as grapheme-to-phoneme conversion and control of the pronunciation in synthetic speech.
no code implementations • 19 Oct 2020 • Yusuke Yasuda, Xin Wang, Junichi Yamagishi
Explicit duration modeling is a key to achieving robust and efficient alignment in text-to-speech synthesis (TTS).
no code implementations • 8 Oct 2020 • Hieu-Thi Luong, Junichi Yamagishi
As the recently proposed voice cloning system, NAUTILUS, is capable of cloning unseen voices using untranscribed speech, we investigate the feasibility of using it to develop a unified cross-lingual TTS/VC system.
no code implementations • EMNLP (NLP+CSS) 2020 • Saurabh Gupta, Huy H. Nguyen, Junichi Yamagishi, Isao Echizen
Recent advancements in natural language generation have raised serious concerns.
no code implementations • 12 Jul 2020 • Tomi Kinnunen, Héctor Delgado, Nicholas Evans, Kong Aik Lee, Ville Vestman, Andreas Nautsch, Massimiliano Todisco, Xin Wang, Md Sahidullah, Junichi Yamagishi, Douglas A. Reynolds
Recent years have seen growing efforts to develop spoofing countermeasures (CMs) to protect automatic speaker verification (ASV) systems from being deceived by manipulated or artificial inputs.
no code implementations • 15 Jun 2020 • Huy H. Nguyen, Junichi Yamagishi, Isao Echizen, Sébastien Marcel
In this work, we demonstrated that wolf (generic) faces, which we call "master faces," can also compromise face recognition systems and that the master face concept can be generalized in some cases.
no code implementations • 22 May 2020 • Hieu-Thi Luong, Junichi Yamagishi
By using a multi-speaker speech corpus to train all requisite encoders and decoders in the initial training stage, our system can clone unseen voices using untranscribed speech of target speakers on the basis of the backpropagation algorithm.
no code implementations • 20 May 2020 • Yusuke Yasuda, Xin Wang, Junichi Yamagishi
Our experiments suggest that a) a neural sequence-to-sequence TTS system should have a sufficient number of model parameters to produce high-quality speech, b) it should also use a powerful encoder when it takes characters as inputs, and c) the encoder still has room for improvement and needs an improved architecture to learn supra-segmental features more appropriately.
2 code implementations • 19 May 2020 • Andreas Nautsch, Jose Patino, Natalia Tomashenko, Junichi Yamagishi, Paul-Gauthier Noe, Jean-Francois Bonastre, Massimiliano Todisco, Nicholas Evans
Mounting privacy legislation calls for the preservation of privacy in speech technology, though solutions are gravely lacking.
Cryptography and Security • Audio and Speech Processing
no code implementations • 18 May 2020 • Brij Mohan Lal Srivastava, Natalia Tomashenko, Xin Wang, Emmanuel Vincent, Junichi Yamagishi, Mohamed Maouche, Aurélien Bellet, Marc Tommasi
The recently proposed x-vector based anonymization scheme converts any input voice into that of a random pseudo-speaker.
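A simplified sketch of such a scheme, assuming an external pool of x-vectors: pick the pool vectors least similar to the source speaker and average a random subset of them. The pool construction, distance measure, and the sizes `n_far`/`n_avg` are illustrative assumptions.

```python
import numpy as np

def pseudo_xvector(source_xvec, pool, n_far=200, n_avg=100, seed=None):
    """Form a pseudo-speaker x-vector: take the n_far pool vectors with
    the lowest cosine similarity to the source speaker, then average a
    random subset of n_avg of them (n_avg <= n_far)."""
    rng = np.random.default_rng(seed)
    pool_n = pool / np.linalg.norm(pool, axis=1, keepdims=True)
    src_n = source_xvec / np.linalg.norm(source_xvec)
    cos = pool_n @ src_n
    far = np.argsort(cos)[:n_far]            # least similar speakers
    chosen = rng.choice(far, size=n_avg, replace=False)
    return pool[chosen].mean(axis=0)
```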
1 code implementation • 4 May 2020 • Erica Cooper, Cheng-I Lai, Yusuke Yasuda, Junichi Yamagishi
This is followed by an analysis on synthesis quality, speaker and dialect similarity, and a remark on the effectiveness of our speaker augmentation approach.
3 code implementations • 4 May 2020 • Natalia Tomashenko, Brij Mohan Lal Srivastava, Xin Wang, Emmanuel Vincent, Andreas Nautsch, Junichi Yamagishi, Nicholas Evans, Jose Patino, Jean-François Bonastre, Paul-Gauthier Noé, Massimiliano Todisco
The VoicePrivacy initiative aims to promote the development of privacy preservation tools for speech technology by gathering a new community to define the tasks of interest and the evaluation methodology, and benchmarking solutions through a series of challenges.
no code implementations • 8 Apr 2020 • Haoyu Li, Junichi Yamagishi
In recent years, speech enhancement (SE) has achieved impressive progress with the success of deep neural networks (DNNs).
Audio and Speech Processing
1 code implementation • Interspeech 2020 • Haoyu Li, Szu-Wei Fu, Yu Tsao, Junichi Yamagishi
The intelligibility of natural speech is seriously degraded when exposed to adverse noisy environments.
Audio and Speech Processing • Sound
1 code implementation • 6 Feb 2020 • Anssi Kanervisto, Ville Hautamäki, Tomi Kinnunen, Junichi Yamagishi
The spoofing countermeasure (CM) systems in automatic speaker verification (ASV) are not typically used in isolation of each other.
no code implementations • 11 Dec 2019 • Huy H. Nguyen, Minoru Kuribayashi, Junichi Yamagishi, Isao Echizen
Deep neural networks (DNNs) have achieved excellent performance on several tasks and have been widely applied in both academia and industry.
1 code implementation • 10 Nov 2019 • Seyyed Saeed Sarfjoo, Xin Wang, Gustav Eje Henter, Jaime Lorenzo-Trueba, Shinji Takaki, Junichi Yamagishi
Nowadays, vast amounts of speech data are recorded with low-quality recording devices such as smartphones, tablets, and laptops, or with medium-quality microphones.
Sound • Audio and Speech Processing
no code implementations • 5 Nov 2019 • Xin Wang, Junichi Yamagishi, Massimiliano Todisco, Hector Delgado, Andreas Nautsch, Nicholas Evans, Md Sahidullah, Ville Vestman, Tomi Kinnunen, Kong Aik Lee, Lauri Juvela, Paavo Alku, Yu-Huai Peng, Hsin-Te Hwang, Yu Tsao, Hsin-Min Wang, Sebastien Le Maguer, Markus Becker, Fergus Henderson, Rob Clark, Yu Zhang, Quan Wang, Ye Jia, Kai Onuma, Koji Mushika, Takashi Kaneda, Yuan Jiang, Li-Juan Liu, Yi-Chiao Wu, Wen-Chin Huang, Tomoki Toda, Kou Tanaka, Hirokazu Kameoka, Ingmar Steiner, Driss Matrouf, Jean-Francois Bonastre, Avashna Govender, Srikanth Ronanki, Jing-Xuan Zhang, Zhen-Hua Ling
Spoofing attacks within a logical access (LA) scenario are generated with the latest speech synthesis and voice conversion technologies, including state-of-the-art neural acoustic and waveform model techniques.
no code implementations • 2 Nov 2019 • Rong Huang, Fuming Fang, Huy H. Nguyen, Junichi Yamagishi, Isao Echizen
The rapid development of deep learning techniques has created new challenges in identifying the origin of digital images because generative adversarial networks and variational autoencoders can create plausible digital images whose contents are not present in natural scenes.
no code implementations • 2 Nov 2019 • Rong Huang, Fuming Fang, Huy H. Nguyen, Junichi Yamagishi, Isao Echizen
We experimentally demonstrated the existence of individual adversarial perturbations (IAPs) and universal adversarial perturbations (UAPs) that can lead a well-performing FFM to misbehave.
no code implementations • 28 Oct 2019 • Yusuke Yasuda, Xin Wang, Junichi Yamagishi
Sequence-to-sequence text-to-speech (TTS) is dominated by soft-attention-based methods.
2 code implementations • 28 Oct 2019 • Huy H. Nguyen, Junichi Yamagishi, Isao Echizen
In this paper, we introduce a capsule network that can detect various kinds of attacks, from presentation attacks using printed images and replayed videos to attacks using fake videos created using deep learning.
no code implementations • 27 Oct 2019 • Yi Zhao, Xin Wang, Lauri Juvela, Junichi Yamagishi
Recent neural waveform synthesizers such as WaveNet, WaveGlow, and the neural-source-filter (NSF) model have shown good performance in speech synthesis despite their different methods of waveform generation.
3 code implementations • 23 Oct 2019 • Erica Cooper, Cheng-I Lai, Yusuke Yasuda, Fuming Fang, Xin Wang, Nanxin Chen, Junichi Yamagishi
While speaker adaptation for end-to-end speech synthesis using speaker embeddings can produce good speaker similarity for speakers seen during training, there remains a gap for zero-shot adaptation to unseen speakers.
Audio and Speech Processing
no code implementations • 14 Sep 2019 • Hieu-Thi Luong, Junichi Yamagishi
Voice conversion (VC) and text-to-speech (TTS) are two tasks that share a similar objective, generating speech with a target voice.
no code implementations • 30 Aug 2019 • Yusuke Yasuda, Xin Wang, Junichi Yamagishi
The advantages of our approach are that we can simplify many modules for the soft attention and that we can train the end-to-end TTS model using a single likelihood function.
no code implementations • 22 Jul 2019 • David Ifeoluwa Adelani, Haotian Mai, Fuming Fang, Huy H. Nguyen, Junichi Yamagishi, Isao Echizen
Advanced neural language models (NLMs) are widely used in sequence generation tasks because they are able to produce fluent and meaningful sentences.
no code implementations • 18 Jun 2019 • Hieu-Thi Luong, Junichi Yamagishi
In this study, we propose a novel speech synthesis model, which can be adapted to unseen speakers by fine-tuning part of or all of the network using either transcribed or untranscribed speech.
1 code implementation • 17 Jun 2019 • Huy H. Nguyen, Fuming Fang, Junichi Yamagishi, Isao Echizen
The output of one branch of the decoder is used for segmenting the manipulated regions while that of the other branch is used for reconstructing the input, which helps improve overall performance.
no code implementations • 30 May 2019 • Fuming Fang, Xin Wang, Junichi Yamagishi, Isao Echizen, Massimiliano Todisco, Nicholas Evans, Jean-Francois Bonastre
One solution to mitigate these concerns involves the concealing of speaker identities before the sharing of speech data.
no code implementations • 27 Apr 2019 • Xin Wang, Shinji Takaki, Junichi Yamagishi
Other models such as Parallel WaveNet and ClariNet bring together the benefits of AR and IAF-based models and train an IAF model by transferring the knowledge from a pre-trained AR teacher to an IAF student without any sequential transformation.
6 code implementations • 17 Apr 2019 • Chen-Chou Lo, Szu-Wei Fu, Wen-Chin Huang, Xin Wang, Junichi Yamagishi, Yu Tsao, Hsin-Min Wang
In this paper, we propose deep learning-based assessment models to predict human ratings of converted speech.
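A minimal sketch in the spirit of such assessment models (published architectures such as MOSNet combine CNN and BLSTM layers; the dimensions here are assumptions): frame-level features are encoded, scored per frame, and averaged into an utterance-level rating, trained with MSE against human MOS.

```python
import torch.nn as nn

class MOSPredictor(nn.Module):
    """Frame features -> BLSTM -> per-frame scores -> utterance mean."""
    def __init__(self, feat_dim=257, hidden=128):
        super().__init__()
        self.blstm = nn.LSTM(feat_dim, hidden, batch_first=True,
                             bidirectional=True)
        self.head = nn.Linear(2 * hidden, 1)

    def forward(self, feats):                # (batch, frames, feat_dim)
        h, _ = self.blstm(feats)
        frame_scores = self.head(h).squeeze(-1)
        return frame_scores.mean(dim=1)      # predicted utterance MOS
```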
1 code implementation • 8 Apr 2019 • Lauri Juvela, Bajibabu Bollepalli, Junichi Yamagishi, Paavo Alku
Recent advances in neural network-based text-to-speech have reached human-level naturalness in synthetic speech.
no code implementations • 1 Apr 2019 • Hieu-Thi Luong, Xin Wang, Junichi Yamagishi, Nobuyuki Nishizawa
When the available data of a target speaker is insufficient to train a high quality speaker-dependent neural text-to-speech (TTS) system, we can combine data from multiple speakers and train a multi-speaker TTS model instead.
no code implementations • 29 Mar 2019 • Shinji Takaki, Hirokazu Kameoka, Junichi Yamagishi
Recently, we proposed short-time Fourier transform (STFT)-based loss functions for training a neural speech waveform model.
no code implementations • 29 Mar 2019 • Mingyang Zhang, Xin Wang, Fuming Fang, Haizhou Li, Junichi Yamagishi
We propose using an extended model architecture of Tacotron, that is a multi-source sequence-to-sequence model with a dual attention mechanism as the shared model for both the TTS and VC tasks.
no code implementations • 4 Jan 2019 • Md Sahidullah, Hector Delgado, Massimiliano Todisco, Tomi Kinnunen, Nicholas Evans, Junichi Yamagishi, Kong-Aik Lee
Over the past few years, significant progress has been made in the field of presentation attack detection (PAD) for automatic speaker verification (ASV).
no code implementations • PACLIC 2018 • Hoang-Quoc Nguyen-Son, Ngoc-Dung T. Tieu, Huy H. Nguyen, Junichi Yamagishi, Isao Echizen
We have developed a method for extracting the coherence features from a paragraph by matching similar words in its sentences.
1 code implementation • 31 Oct 2018 • Cheng-I Lai, Alberto Abad, Korin Richmond, Junichi Yamagishi, Najim Dehak, Simon King
In this work, we propose our replay attacks detection system - Attentive Filtering Network, which is composed of an attention-based filtering mechanism that enhances feature representations in both the frequency and time domains, and a ResNet-based classifier.
no code implementations • 30 Oct 2018 • Lauri Juvela, Bajibabu Bollepalli, Junichi Yamagishi, Paavo Alku
The state-of-the-art in text-to-speech synthesis has recently improved considerably due to novel neural waveform generation methods, such as WaveNet.
1 code implementation • 29 Oct 2018 • Yusuke Yasuda, Xin Wang, Shinji Takaki, Junichi Yamagishi
Towards end-to-end Japanese speech synthesis, we extend Tacotron to systems with self-attention to capture long-term dependencies related to pitch accents and compare their audio quality with classical pipeline systems under various conditions to show their pros and cons.
no code implementations • 29 Oct 2018 • Fuming Fang, Xin Wang, Junichi Yamagishi, Isao Echizen
Transforming the facial and acoustic features together makes it possible for the converted voice and facial expressions to be highly correlated and for the generated target speaker to appear and sound natural.
1 code implementation • 29 Oct 2018 • Shinji Takaki, Toru Nakashika, Xin Wang, Junichi Yamagishi
This paper proposes a new loss using short-time Fourier transform (STFT) spectra for the aim of training a high-performance neural speech waveform model that predicts raw continuous speech waveform samples directly.
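A single-resolution sketch of an STFT-spectra loss follows; the paper's exact formulation (e.g., its treatment of phase, or the use of multiple resolutions) may differ.

```python
import torch
import torch.nn.functional as F

def stft_spectral_loss(pred_wave, target_wave, n_fft=512, hop=128):
    """L1 distance between log-magnitude STFT spectra of predicted and
    target waveforms; waveforms are (batch, samples) tensors."""
    window = torch.hann_window(n_fft, device=pred_wave.device)
    def logmag(x):
        spec = torch.stft(x, n_fft, hop_length=hop, window=window,
                          return_complex=True)
        return torch.log(spec.abs() + 1e-7)
    return F.l1_loss(logmag(pred_wave), logmag(target_wave))
```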
no code implementations • 29 Oct 2018 • Xin Wang, Shinji Takaki, Junichi Yamagishi
Neural waveform models such as the WaveNet are used in many recent text-to-speech systems, but the original WaveNet is quite slow in waveform generation because of its autoregressive (AR) structure.
3 code implementations • 26 Oct 2018 • Huy H. Nguyen, Junichi Yamagishi, Isao Echizen
Recent advances in media generation techniques have made it easier for attackers to create forged images and videos.
6 code implementations • 4 Sep 2018 • Darius Afchar, Vincent Nozick, Junichi Yamagishi, Isao Echizen
This paper presents a method to automatically and efficiently detect face tampering in videos, and particularly focuses on two recent techniques used to generate hyper-realistic forged videos: Deepfake and Face2Face.
no code implementations • 20 Aug 2018 • Hieu-Thi Luong, Junichi Yamagishi
Two new training schemes for the new architecture are also proposed in this paper.
no code implementations • 2 Aug 2018 • Hieu-Thi Luong, Xin Wang, Junichi Yamagishi, Nobuyuki Nishizawa
We investigated the impact of noisy linguistic features on the performance of a Japanese speech synthesis system based on neural networks that uses a WaveNet vocoder.
no code implementations • 31 Jul 2018 • Yi Zhao, Shinji Takaki, Hieu-Thi Luong, Junichi Yamagishi, Daisuke Saito, Nobuaki Minematsu
In order to reduce the mismatched characteristics between natural and generated acoustic features, we propose frameworks that incorporate either a conditional generative adversarial network (GAN) or its variant, Wasserstein GAN with gradient penalty (WGAN-GP), into multi-speaker speech synthesis that uses the WaveNet vocoder.
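For reference, the gradient-penalty term that distinguishes WGAN-GP from a plain GAN critic is sketched below; this is the standard formulation, not the paper's full training loop.

```python
import torch

def gradient_penalty(critic, real, fake, lam=10.0):
    """WGAN-GP term: push the critic's gradient norm toward 1 on random
    interpolates between real and generated acoustic features."""
    eps = torch.rand(real.size(0), *([1] * (real.dim() - 1)),
                     device=real.device)
    interp = (eps * real + (1.0 - eps) * fake).requires_grad_(True)
    grad, = torch.autograd.grad(critic(interp).sum(), interp,
                                create_graph=True)
    norm = grad.flatten(1).norm(dim=1)
    return lam * ((norm - 1.0) ** 2).mean()
```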
no code implementations • 31 Jul 2018 • Hieu-Thi Luong, Junichi Yamagishi
Most neural-network based speaker-adaptive acoustic models for speech synthesis can be categorized into either layer-based or input-code approaches.
no code implementations • 30 Jul 2018 • Gustav Eje Henter, Jaime Lorenzo-Trueba, Xin Wang, Junichi Yamagishi
Generating versatile and appropriate synthetic speech requires control over the output expression separate from the spoken text.
no code implementations • 25 Apr 2018 • Lauri Juvela, Vassilis Tsiaras, Bajibabu Bollepalli, Manu Airaksinen, Junichi Yamagishi, Paavo Alku
Recent speech technology research has seen a growing interest in using WaveNets as statistical vocoders, i. e., generating speech waveforms from acoustic features.
no code implementations • 25 Apr 2018 • Tomi Kinnunen, Kong Aik Lee, Hector Delgado, Nicholas Evans, Massimiliano Todisco, Md Sahidullah, Junichi Yamagishi, Douglas A. Reynolds
The two challenge editions in 2015 and 2017 involved the assessment of spoofing countermeasures (CMs) in isolation from ASV using an equal error rate (EER) metric.
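The EER referred to here is the operating point where false-acceptance and false-rejection rates coincide; a brute-force sketch:

```python
import numpy as np

def equal_error_rate(scores, labels):
    """Sweep thresholds over the scores and return the point where
    false-acceptance and false-rejection rates are closest.
    labels: 1 = target/bona fide, 0 = non-target/spoof."""
    scores, labels = np.asarray(scores), np.asarray(labels)
    thresholds = np.sort(scores)
    far = np.array([(scores[labels == 0] >= t).mean() for t in thresholds])
    frr = np.array([(scores[labels == 1] < t).mean() for t in thresholds])
    i = np.argmin(np.abs(far - frr))
    return (far[i] + frr[i]) / 2.0
```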
no code implementations • 23 Apr 2018 • Tomi Kinnunen, Jaime Lorenzo-Trueba, Junichi Yamagishi, Tomoki Toda, Daisuke Saito, Fernando Villavicencio, Zhen-Hua Ling
As a supplement to subjective results for the 2018 Voice Conversion Challenge (VCC'18) data, we configure a standard constant-Q cepstral coefficient (CQCC) CM to quantify the extent of processing artifacts.
no code implementations • 12 Apr 2018 • Huy H. Nguyen, Ngoc-Dung T. Tieu, Hoang-Quoc Nguyen-Son, Junichi Yamagishi, Isao Echizen
Making computer-generated (CG) images more difficult to detect is an interesting problem in computer graphics and security.
no code implementations • 12 Apr 2018 • Jaime Lorenzo-Trueba, Junichi Yamagishi, Tomoki Toda, Daisuke Saito, Fernando Villavicencio, Tomi Kinnunen, Zhen-Hua Ling
We present the Voice Conversion Challenge 2018, designed as a follow up to the 2016 edition with the aim of providing a common framework for evaluating and comparing different state-of-the-art voice conversion (VC) systems.
no code implementations • 7 Apr 2018 • Xin Wang, Jaime Lorenzo-Trueba, Shinji Takaki, Lauri Juvela, Junichi Yamagishi
Recent advances in speech synthesis suggest that limitations such as the lossy nature of the amplitude spectrum with minimum phase approximation and the over-smoothing effect in acoustic modeling can be overcome by using advanced machine learning approaches.
1 code implementation • 3 Apr 2018 • Lauri Juvela, Bajibabu Bollepalli, Xin Wang, Hirokazu Kameoka, Manu Airaksinen, Junichi Yamagishi, Paavo Alku
This paper proposes a method for generating speech from filterbank mel-frequency cepstral coefficients (MFCC), which are widely used in speech applications, such as ASR, but are generally considered unusable for speech synthesis.
no code implementations • 2 Apr 2018 • Fuming Fang, Junichi Yamagishi, Isao Echizen, Jaime Lorenzo-Trueba
Although voice conversion (VC) algorithms have achieved remarkable success along with the development of machine learning, superior performance is still difficult to achieve when using nonparallel data.
no code implementations • 27 Mar 2018 • Toru Nakashika, Shinji Takaki, Junichi Yamagishi
In contrast, the proposed feature extractor using the CRBM directly encodes the complex spectra (or another complex-valued representation of the complex spectra) into binary-valued latent features (hidden units).
no code implementations • 2 Mar 2018 • Jaime Lorenzo-Trueba, Fuming Fang, Xin Wang, Isao Echizen, Junichi Yamagishi, Tomi Kinnunen
Thanks to the growing availability of spoofing databases and rapid advances in using them, systems for detecting voice spoofing attacks are becoming more and more capable, and error rates close to zero are being reached for the ASVspoof2015 database.
no code implementations • COLING 2016 • Jaime Lorenzo-Trueba, Roberto Barra-Chicote, Ascension Gallardo-Antolin, Junichi Yamagishi, Juan M. Montero
This paper introduces a continuous system capable of automatically producing the most adequate speaking style to synthesize a desired target text.
no code implementations • 17 Jun 2015 • Zhenzhou Wu, Shinji Takaki, Junichi Yamagishi
This paper proposes a deep denoising auto-encoder technique to extract better acoustic features for speech synthesis.