Search Results for author: Tatsuya Komatsu

Found 16 papers, 1 paper with code

Bayesian Non-Parametric Multi-Source Modelling Based Determined Blind Source Separation

no code implementations • 8 Apr 2019 • Chaitanya Narisetty, Tatsuya Komatsu, Reishi Kondo

This paper proposes a determined blind source separation method using Bayesian non-parametric modelling of sources.

blind source separation

Differentially Private Variational Autoencoders with Term-wise Gradient Aggregation

no code implementations • 19 Jun 2020 • Tsubasa Takahashi, Shun Takagi, Hajime Ono, Tatsuya Komatsu

This paper studies how to learn variational autoencoders with a variety of divergences under differential privacy constraints.
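The "term-wise gradient aggregation" in the title can be sketched in DP-SGD style: per-example gradients of each loss term (e.g., reconstruction and divergence) are clipped and noised separately before being combined. The sketch below is a toy numpy illustration under assumed clipping/noise parameters, not the paper's implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

def clip(g, C):
    # Standard DP-SGD per-example clipping to L2 norm bound C.
    n = np.linalg.norm(g)
    return g * min(1.0, C / n)

# Toy per-example gradients of two loss terms (reconstruction, divergence).
grads_recon = [rng.standard_normal(4) for _ in range(8)]
grads_div = [rng.standard_normal(4) for _ in range(8)]

C, sigma = 1.0, 0.5  # clipping bound and noise multiplier (assumed values)

# Term-wise aggregation: clip and noise each term separately, then combine.
agg = np.zeros(4)
for grads in (grads_recon, grads_div):
    s = sum(clip(g, C) for g in grads)
    s = s + rng.normal(scale=sigma * C, size=4)
    agg += s / len(grads)
print(agg.shape)
```

Clipping each term separately can avoid one dominant term consuming the whole clipping budget; the actual aggregation rule in the paper may differ.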

Relaxing the Conditional Independence Assumption of CTC-based ASR by Conditioning on Intermediate Predictions

no code implementations • 6 Apr 2021 • Jumon Nozaki, Tatsuya Komatsu

This paper proposes a method to relax the conditional independence assumption of connectionist temporal classification (CTC)-based automatic speech recognition (ASR) models.

Automatic Speech Recognition (ASR) +1
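The core idea, relaxing CTC's conditional independence by feeding an intermediate layer's own CTC prediction back into the encoder, can be sketched roughly as below. This is a toy numpy sketch: the layer count, dimensions, projections, and choice of conditioning layer are illustrative assumptions, not the paper's architecture.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

rng = np.random.default_rng(0)
T, D, V = 6, 8, 5  # frames, feature dim, vocab size (toy sizes)

# Toy "encoder layers": random linear maps standing in for real blocks.
layers = [rng.standard_normal((D, D)) * 0.1 for _ in range(4)]
out_proj = rng.standard_normal((D, V)) * 0.1   # features -> CTC logits
back_proj = rng.standard_normal((V, D)) * 0.1  # posteriors -> features

x = rng.standard_normal((T, D))
for i, W in enumerate(layers):
    x = np.tanh(x @ W)
    if i == 1:  # at an intermediate layer, condition on its own prediction
        inter_post = softmax(x @ out_proj)  # intermediate CTC posterior
        x = x + inter_post @ back_proj      # feed the prediction back in

final_post = softmax(x @ out_proj)
print(final_post.shape)  # (6, 5)
```

Because upper layers see the intermediate prediction, the final CTC output is no longer conditionally independent of earlier label estimates given the audio alone.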

Label-Synchronous Speech-to-Text Alignment for ASR Using Forward and Backward Transformers

no code implementations • 21 Apr 2021 • Yusuke Kida, Tatsuya Komatsu, Masahito Togami

Speech-to-text alignment is the problem of splitting long audio recordings with unaligned transcripts into utterance-wise pairs of speech and text.

Automatic Speech Recognition (ASR) +2

A Comparative Study on Non-Autoregressive Modelings for Speech-to-Text Generation

no code implementations • 11 Oct 2021 • Yosuke Higuchi, Nanxin Chen, Yuya Fujita, Hirofumi Inaguma, Tatsuya Komatsu, Jaesong Lee, Jumon Nozaki, Tianzi Wang, Shinji Watanabe

Non-autoregressive (NAR) models generate multiple outputs of a sequence simultaneously, which significantly speeds up inference at the cost of an accuracy drop compared to autoregressive baselines.

Automatic Speech Recognition (ASR) +3

Non-Autoregressive ASR with Self-Conditioned Folded Encoders

no code implementations • 17 Feb 2022 • Tatsuya Komatsu

The proposed method realizes non-autoregressive ASR with fewer parameters by folding the conventional stack of encoders into only two blocks: base encoders and folded encoders.
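The folding idea can be sketched as weight sharing: instead of a stack of distinct encoder blocks, one base block runs once and a single folded block is reused several times. The numpy toy below is an illustrative sketch with assumed toy dimensions and iteration count, not the paper's model.

```python
import numpy as np

rng = np.random.default_rng(0)
T, D = 6, 8  # frames, feature dim (toy sizes)

W_base = rng.standard_normal((D, D)) * 0.1  # "base encoder" weights
W_fold = rng.standard_normal((D, D)) * 0.1  # single shared "folded" block

def block(x, W):
    # Stand-in for a real encoder block (e.g., self-attention + FFN).
    return np.tanh(x @ W)

x = rng.standard_normal((T, D))
x = block(x, W_base)   # base encoder, applied once
for _ in range(4):     # folded encoder: the same weights reused 4 times
    x = block(x, W_fold)

# Parameter count: 2 weight matrices instead of 5 distinct blocks.
print(x.shape)  # (6, 8)
```

Reusing one block's weights keeps the effective depth while shrinking the parameter count, which is the trade-off the abstract describes.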

Acoustic Event Detection with Classifier Chains

no code implementations • 17 Feb 2022 • Tatsuya Komatsu, Shinji Watanabe, Koichi Miyazaki, Tomoki Hayashi

In each iteration, the event's activity is estimated and used to condition the next output via the probabilistic chain rule, forming classifier chains.

Event Detection
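The chain-rule conditioning can be sketched as one binary classifier per event class, where classifier e additionally sees the activity estimates of the previous e events. This numpy sketch uses assumed toy dimensions and random weights; it only illustrates the chaining structure, not the paper's network.

```python
import numpy as np

rng = np.random.default_rng(0)
D, E = 8, 3  # feature dim, number of event classes (toy sizes)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# One binary classifier per event; classifier e also consumes the e
# previously estimated activities (the "chain" conditioning).
weights = [rng.standard_normal(D + e) * 0.1 for e in range(E)]

x = rng.standard_normal(D)  # clip-level acoustic features (toy)
activities = []
for e in range(E):
    inp = np.concatenate([x, np.array(activities)])
    activities.append(sigmoid(inp @ weights[e]))  # p(y_e | x, y_<e)
print(np.round(activities, 3))
```

Each factor p(y_e | x, y_<e) models inter-event dependencies that independent per-class classifiers would ignore.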

Better Intermediates Improve CTC Inference

no code implementations • 1 Apr 2022 • Tatsuya Komatsu, Yusuke Fujita, Jaesong Lee, Lukas Lee, Shinji Watanabe, Yusuke Kida

This paper proposes a method for improved CTC inference with searched intermediates and multi-pass conditioning.

InterAug: Augmenting Noisy Intermediate Predictions for CTC-based ASR

no code implementations • 1 Apr 2022 • Yu Nakagome, Tatsuya Komatsu, Yusuke Fujita, Shuta Ichimura, Yusuke Kida

The proposed method exploits the conditioning framework of self-conditioned CTC to train robust models by conditioning with "noisy" intermediate predictions.

Speech Recognition
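Conditioning on deliberately "noisy" intermediate predictions can be sketched as corrupting the intermediate posterior before it is fed back, so the conditioning branch learns to correct imperfect intermediates. The logit jitter below is one illustrative corruption chosen for this sketch; the paper's actual augmentations may be different.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

rng = np.random.default_rng(0)
T, V = 6, 5  # frames, vocab size (toy sizes)

logits = rng.standard_normal((T, V))  # intermediate CTC logits (toy)
clean_post = softmax(logits)

# Corrupt the intermediate prediction before feeding it back: jitter the
# logits and re-normalize (noise scale 0.5 is an assumed value).
noisy_post = softmax(logits + rng.normal(scale=0.5, size=logits.shape))
print(noisy_post.shape)  # (6, 5)
```

Training against corrupted conditioning signals is a standard way to make the final pass robust to errors in its own intermediate estimates.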

Neural Diarization with Non-autoregressive Intermediate Attractors

1 code implementation • 13 Mar 2023 • Yusuke Fujita, Tatsuya Komatsu, Robin Scheibler, Yusuke Kida, Tetsuji Ogawa

The experiments with the two-speaker CALLHOME dataset show that the intermediate labels with the proposed non-autoregressive intermediate attractors boost the diarization performance.

Speaker Diarization

PromptTTS++: Controlling Speaker Identity in Prompt-Based Text-to-Speech Using Natural Language Descriptions

no code implementations • 15 Sep 2023 • Reo Shimizu, Ryuichi Yamamoto, Masaya Kawamura, Yuma Shirahata, Hironori Doi, Tatsuya Komatsu, Kentaro Tachibana

We propose PromptTTS++, a prompt-based text-to-speech (TTS) synthesis system that allows control over speaker identity using natural language descriptions.

Audio Difference Learning for Audio Captioning

no code implementations • 15 Sep 2023 • Tatsuya Komatsu, Yusuke Fujita, Kazuya Takeda, Tomoki Toda

Furthermore, a unique technique is proposed that mixes the input audio with additional audio and uses that additional audio as a reference.

Audio Captioning
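The mixing step in the abstract can be sketched on raw waveforms: the input audio is mixed with additional audio, and the model is then asked to describe the difference between the mixture and that reference, which corresponds to the original input. The mixing weight and signal lengths below are assumed toy values.

```python
import numpy as np

rng = np.random.default_rng(0)
sr = 16000
input_audio = rng.standard_normal(sr)  # 1 s of toy "input" audio
reference = rng.standard_normal(sr)    # additional audio used as reference

alpha = 0.5                            # mixing weight (assumed value)
mixed = input_audio + alpha * reference

# A difference-learning captioner would receive (mixed, reference) and be
# trained to caption what `mixed` contains beyond `reference`, i.e., the
# content of `input_audio`.
print(mixed.shape)  # (16000,)
```

Framing captioning as describing a difference supplies a controlled contrast pair without needing extra human annotations for the mixture.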

Keep Decoding Parallel with Effective Knowledge Distillation from Language Models to End-to-end Speech Recognisers

no code implementations • 22 Jan 2024 • Michael Hentschel, Yuta Nishikawa, Tatsuya Komatsu, Yusuke Fujita

This study presents a novel approach for knowledge distillation (KD) from a BERT teacher model to an automatic speech recognition (ASR) model using intermediate layers.

Automatic Speech Recognition (ASR) +3
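One common form of intermediate-layer knowledge distillation is a feature-matching loss: project the ASR model's intermediate hidden states into the teacher's embedding space and penalize the distance to the BERT representations. The MSE form and all dimensions below are illustrative assumptions for this sketch; the paper's exact distillation loss may differ.

```python
import numpy as np

rng = np.random.default_rng(0)
T, D_asr, D_bert = 6, 8, 12  # frames and hidden sizes (toy values)

asr_hidden = rng.standard_normal((T, D_asr))    # ASR intermediate layer
bert_hidden = rng.standard_normal((T, D_bert))  # teacher BERT embeddings
W = rng.standard_normal((D_asr, D_bert)) * 0.1  # learned projection (assumed)

proj = asr_hidden @ W
kd_loss = np.mean((proj - bert_hidden) ** 2)    # distillation term
print(round(float(kd_loss), 4))
```

Because the loss acts on intermediate layers rather than the decoder, the ASR model can keep decoding in parallel at inference time, matching the "keep decoding parallel" framing of the title.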
