Search Results for author: Yukiya Hono

Found 12 papers, 2 papers with code

Release of Pre-Trained Models for the Japanese Language

no code implementations · 2 Apr 2024 · Kei Sawada, Tianyu Zhao, Makoto Shing, Kentaro Mitsui, Akio Kaga, Yukiya Hono, Toshiaki Wakatsuki, Koh Mitsuda

AI democratization aims to create a world in which the average person can utilize AI techniques.

PeriodGrad: Towards Pitch-Controllable Neural Vocoder Based on a Diffusion Probabilistic Model

no code implementations · 22 Feb 2024 · Yukiya Hono, Kei Hashimoto, Yoshihiko Nankaku, Keiichi Tokuda

This paper presents a neural vocoder based on a denoising diffusion probabilistic model (DDPM) that incorporates explicit periodic signals as auxiliary conditioning.

Denoising, Pitch Control +1
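The core idea can be sketched: derive an explicit periodic signal from the F0 contour and feed it to the DDPM vocoder as auxiliary conditioning alongside the usual acoustic features. A minimal numpy sketch, where the function name, sample rate, and hop size are illustrative assumptions rather than the authors' code:

```python
import numpy as np

def periodic_conditioning(f0, sr=24000, hop=256):
    """Build an explicit periodic signal (a sine wave following the F0
    contour) to serve as auxiliary conditioning for a vocoder.
    f0: per-frame fundamental frequency in Hz (0 marks unvoiced frames).
    Hypothetical helper for illustration, not the paper's implementation."""
    f0_up = np.repeat(f0, hop)                  # frame rate -> sample rate
    phase = 2 * np.pi * np.cumsum(f0_up) / sr   # integrate instantaneous frequency
    sine = np.sin(phase)
    sine[f0_up == 0] = 0.0                      # zero out unvoiced regions
    return sine

f0 = np.array([220.0, 220.0, 0.0, 440.0])       # toy 4-frame F0 contour
cond = periodic_conditioning(f0)                # length 4 * 256 samples
```

In this sketch the conditioning signal is phase-continuous across frames because the instantaneous frequency is integrated before taking the sine, which is what makes the signal explicitly periodic rather than frame-wise.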

An Integration of Pre-Trained Speech and Language Models for End-to-End Speech Recognition

no code implementations · 6 Dec 2023 · Yukiya Hono, Koh Mitsuda, Tianyu Zhao, Kentaro Mitsui, Toshiaki Wakatsuki, Kei Sawada

Advances in machine learning have made it possible to perform various text and speech processing tasks, including automatic speech recognition (ASR), in an end-to-end (E2E) manner.

Automatic Speech Recognition (ASR) +5

Towards human-like spoken dialogue generation between AI agents from written dialogue

no code implementations · 2 Oct 2023 · Kentaro Mitsui, Yukiya Hono, Kei Sawada

The advent of large language models (LLMs) has made it possible to generate natural written dialogues between two agents.

Dialogue Generation

UniFLG: Unified Facial Landmark Generator from Text or Speech

no code implementations · 28 Feb 2023 · Kentaro Mitsui, Yukiya Hono, Kei Sawada

Talking face generation has two primary frameworks: a text-driven framework, which generates synchronized speech and talking faces from text, and a speech-driven framework, which generates talking faces from speech.

Speech Synthesis, Talking Face Generation

Singing Voice Synthesis Based on a Musical Note Position-Aware Attention Mechanism

no code implementations · 28 Dec 2022 · Yukiya Hono, Kei Hashimoto, Yoshihiko Nankaku, Keiichi Tokuda

This paper proposes a novel sequence-to-sequence (seq2seq) model with a musical note position-aware attention mechanism for singing voice synthesis (SVS).

Singing Voice Synthesis
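One plausible reading of a note position-aware attention mechanism is a standard additive (Bahdanau-style) attention score extended with a per-step musical-note-position feature. The sketch below illustrates that reading only; the weights, shapes, and feature layout are stand-ins, not the paper's model:

```python
import numpy as np

def note_position_attention(query, keys, note_pos_feat):
    """Additive attention whose score also sees a musical-note-position
    feature for each encoder step (e.g. the frame's position within its
    note). Random weights are toy stand-ins for learned parameters."""
    dim = query.shape[0]
    rng = np.random.default_rng(1)
    W_q = rng.standard_normal((dim, dim)) * 0.1
    W_k = rng.standard_normal((dim, dim)) * 0.1
    W_p = rng.standard_normal((dim, note_pos_feat.shape[1])) * 0.1
    v = rng.standard_normal(dim)
    # score_i = v . tanh(W_q q + W_k k_i + W_p p_i)
    e = np.tanh(W_q @ query + keys @ W_k.T + note_pos_feat @ W_p.T) @ v
    a = np.exp(e - e.max())
    a /= a.sum()                 # softmax over encoder steps
    return a @ keys              # context vector

rng = np.random.default_rng(0)
context = note_position_attention(rng.standard_normal(8),       # decoder query
                                  rng.standard_normal((5, 8)),  # 5 encoder states
                                  rng.standard_normal((5, 3)))  # 3-dim note-position features
```

The design intent is that the score term `W_p p_i` lets the alignment track where each frame sits inside a note, which matters for singing because note durations are dictated by the score rather than learned freely.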

Embedding a Differentiable Mel-cepstral Synthesis Filter to a Neural Speech Synthesis System

1 code implementation · 21 Nov 2022 · Takenori Yoshimura, Shinji Takaki, Kazuhiro Nakamura, Keiichiro Oura, Yukiya Hono, Kei Hashimoto, Yoshihiko Nankaku, Keiichi Tokuda

This paper integrates a classic mel-cepstral synthesis filter into a modern neural speech synthesis system towards end-to-end controllable speech synthesis.

Speech Synthesis
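For context, a mel-cepstral synthesis filter represents the log spectral envelope as a cepstral expansion on a warped frequency axis, log|H(ω)| ≈ Σ_m c(m) cos(m·ω̃), where ω̃ is a first-order all-pass warping of ω with parameter α. A small numpy sketch of that amplitude response (the function name and defaults are assumptions; the paper's contribution is making the filter differentiable inside a neural system, which this sketch does not implement):

```python
import numpy as np

def melcep_to_logspec(c, alpha=0.42, n_fft=512):
    """Log-amplitude response of a mel-cepstral synthesis filter:
    log|H(w)| = sum_m c[m] * cos(m * w_warp), where w_warp is the
    first-order all-pass warped frequency. Illustrative sketch only."""
    w = np.linspace(0, np.pi, n_fft // 2 + 1)
    # phase response of the all-pass element: w + 2*atan(a sin w / (1 - a cos w))
    w_warp = w + 2 * np.arctan(alpha * np.sin(w) / (1 - alpha * np.cos(w)))
    m = np.arange(len(c))[:, None]
    return (c[:, None] * np.cos(m * w_warp[None, :])).sum(axis=0)

log_env = melcep_to_logspec(np.array([0.5, -0.2, 0.1]))  # 3 toy mel-cepstral coeffs
```

With only the zeroth coefficient set, the response is flat, which is a quick sanity check that the expansion is behaving as a cepstrum.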

Sinsy: A Deep Neural Network-Based Singing Voice Synthesis System

1 code implementation · 5 Aug 2021 · Yukiya Hono, Kei Hashimoto, Keiichiro Oura, Yoshihiko Nankaku, Keiichi Tokuda

To better model the singing voice, the proposed system incorporates improved pitch and vibrato modeling, as well as better training criteria, into the acoustic model.

Singing Voice Synthesis

Hierarchical Multi-Grained Generative Model for Expressive Speech Synthesis

no code implementations · 17 Sep 2020 · Yukiya Hono, Kazuna Tsuboi, Kei Sawada, Kei Hashimoto, Keiichiro Oura, Yoshihiko Nankaku, Keiichi Tokuda

This framework consists of a multi-grained variational autoencoder, a conditional prior, and a multi-level auto-regressive latent converter to obtain the different time-resolution latent variables and sample the finer-level latent variables from the coarser-level ones by taking into account the input text.

Expressive Speech Synthesis, Text-To-Speech Synthesis
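The sampling flow described in that sentence can be sketched: draw a coarse (e.g. utterance-level) latent, then draw finer (e.g. phrase-level) latents auto-regressively from a conditional prior that sees the coarser latent and the previous fine latent. The linear "converters" below are toy stand-ins for the learned networks, and the text conditioning the paper uses is omitted:

```python
import numpy as np

rng = np.random.default_rng(0)

def sample_hierarchy(n_fine=4, dim=8):
    """Sample a coarse latent, then finer latents auto-regressively,
    each conditioned on the coarse latent and the previous fine latent.
    Toy sketch with random linear maps, not the paper's networks."""
    W_coarse = rng.standard_normal((dim, dim)) * 0.1  # stand-in latent converter
    W_prev = rng.standard_normal((dim, dim)) * 0.1
    z_coarse = rng.standard_normal(dim)               # coarse-level latent from prior
    fines, z_prev = [], np.zeros(dim)
    for _ in range(n_fine):
        mu = W_coarse @ z_coarse + W_prev @ z_prev    # conditional prior mean
        z = mu + 0.1 * rng.standard_normal(dim)       # sample a finer latent
        fines.append(z)
        z_prev = z
    return z_coarse, np.stack(fines)

z_coarse, z_fine = sample_hierarchy()
```

The point of the hierarchy is that coarse latents capture slow-moving style while the auto-regressive fine latents stay consistent with both the coarse sample and their own history.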
