Search Results for author: Kentaro Mitsui

Found 7 papers, 0 papers with code

Multi-speaker Text-to-speech Synthesis Using Deep Gaussian Processes

no code implementations • 7 Aug 2020 • Kentaro Mitsui, Tomoki Koriyama, Hiroshi Saruwatari

We propose a framework for multi-speaker speech synthesis using deep Gaussian processes (DGPs); a DGP is a deep architecture of Bayesian kernel regressions and thus robust to overfitting.

Gaussian Processes, Speech Synthesis +1

MSR-NV: Neural Vocoder Using Multiple Sampling Rates

no code implementations • 28 Sep 2021 • Kentaro Mitsui, Kei Sawada

In this study, we propose MSR-NV, a method for handling multiple sampling rates in a single neural vocoder (NV).

Text-Guided Scene Sketch-to-Photo Synthesis

no code implementations • 14 Feb 2023 • AprilPyone MaungMaung, Makoto Shing, Kentaro Mitsui, Kei Sawada, Fumio Okura

To this end, we leverage knowledge from recent large-scale pre-trained generative models, resulting in text-guided sketch-to-photo synthesis without the need for reference images.

Self-Supervised Learning

UniFLG: Unified Facial Landmark Generator from Text or Speech

no code implementations • 28 Feb 2023 • Kentaro Mitsui, Yukiya Hono, Kei Sawada

The two primary frameworks used for talking face generation comprise a text-driven framework, which generates synchronized speech and talking faces from text, and a speech-driven framework, which generates talking faces from speech.

Speech Synthesis, Talking Face Generation

Towards human-like spoken dialogue generation between AI agents from written dialogue

no code implementations • 2 Oct 2023 • Kentaro Mitsui, Yukiya Hono, Kei Sawada

The advent of large language models (LLMs) has made it possible to generate natural written dialogues between two agents.

Dialogue Generation

An Integration of Pre-Trained Speech and Language Models for End-to-End Speech Recognition

no code implementations • 6 Dec 2023 • Yukiya Hono, Koh Mitsuda, Tianyu Zhao, Kentaro Mitsui, Toshiaki Wakatsuki, Kei Sawada

Advances in machine learning have made it possible to perform various text and speech processing tasks, including automatic speech recognition (ASR), in an end-to-end (E2E) manner.

Automatic Speech Recognition, Automatic Speech Recognition (ASR) +5
