Search Results for author: Kazuyoshi Yoshii

Found 26 papers, 7 papers with code

Neural Fast Full-Rank Spatial Covariance Analysis for Blind Source Separation

no code implementations • 17 Jun 2023 • Yoshiaki Bando, Yoshiki Masuyama, Aditya Arie Nugraha, Kazuyoshi Yoshii

Our neural separation model, introduced for AVI, alternates neural-network blocks with single steps of an efficient iterative algorithm called iterative source steering.

Blind Source Separation • Variational Inference

DNN-Free Low-Latency Adaptive Speech Enhancement Based on Frame-Online Beamforming Powered by Block-Online FastMNMF

no code implementations • 22 Jul 2022 • Aditya Arie Nugraha, Kouhei Sekiguchi, Mathieu Fontaine, Yoshiaki Bando, Kazuyoshi Yoshii

Our DNN-free system leverages the posteriors of the latest source spectrograms given by block-online FastMNMF to derive the current source covariance matrices for frame-online beamforming.

Blind Source Separation • Speech Enhancement
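
The frame-online beamforming this entry mentions can be illustrated with a standard MVDR beamformer computed from a target steering vector and a noise spatial covariance matrix. This is a minimal pure-Python sketch under that standard formulation; the two-channel setup, the helper name `mvdr_weights_2ch`, and all values are hypothetical, not the authors' implementation.

```python
def mvdr_weights_2ch(h, R_n):
    """MVDR beamformer weights for a 2-microphone array.

    h:   length-2 complex steering vector toward the target source.
    R_n: 2x2 noise spatial covariance matrix as nested lists.
    """
    # Closed-form inverse of the 2x2 noise covariance.
    (a, b), (c, d) = R_n
    det = a * d - b * c
    Rn_inv = [[d / det, -b / det], [-c / det, a / det]]
    # v = R_n^{-1} h
    v = [Rn_inv[0][0] * h[0] + Rn_inv[0][1] * h[1],
         Rn_inv[1][0] * h[0] + Rn_inv[1][1] * h[1]]
    # w = R_n^{-1} h / (h^H R_n^{-1} h): distortionless toward h,
    # minimum output noise power.
    denom = h[0].conjugate() * v[0] + h[1].conjugate() * v[1]
    return [v[0] / denom, v[1] / denom]

h = [1.0 + 0.0j, 0.5 + 0.5j]                       # hypothetical steering vector
w = mvdr_weights_2ch(h, [[0.2 + 0j, 0j], [0j, 0.2 + 0j]])
response = w[0].conjugate() * h[0] + w[1].conjugate() * h[1]
print(abs(response))  # ~1.0: the distortionless constraint w^H h = 1
```

In the paper's setting, the source covariance matrices fed into such a beamformer are updated per frame from the block-online FastMNMF posteriors rather than fixed as here.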

Direction-Aware Adaptive Online Neural Speech Enhancement with an Augmented Reality Headset in Real Noisy Conversational Environments

1 code implementation • 15 Jul 2022 • Kouhei Sekiguchi, Aditya Arie Nugraha, Yicheng Du, Yoshiaki Bando, Mathieu Fontaine, Kazuyoshi Yoshii

This paper describes the practical response- and performance-aware development of online speech enhancement for an augmented reality (AR) headset that helps a user follow conversations held in real noisy echoic environments (e.g., a cocktail party).

Blind Source Separation • Speech Enhancement

Generalized Fast Multichannel Nonnegative Matrix Factorization Based on Gaussian Scale Mixtures for Blind Source Separation

no code implementations • 11 May 2022 • Mathieu Fontaine, Kouhei Sekiguchi, Aditya Nugraha, Yoshiaki Bando, Kazuyoshi Yoshii

This paper describes heavy-tailed extensions of a state-of-the-art versatile blind source separation method called fast multichannel nonnegative matrix factorization (FastMNMF) from a unified point of view.

Blind Source Separation • Speech Enhancement

Global Structure-Aware Drum Transcription Based on Self-Attention Mechanisms

no code implementations • 12 May 2021 • Ryoto Ishizuka, Ryo Nishikimi, Kazuyoshi Yoshii

To mitigate the difficulty of training the self-attention-based model from an insufficient amount of paired data, and to improve the musical naturalness of the estimated scores, we propose a regularized training method that uses a global structure-aware masked language (score) model with a self-attention mechanism pretrained on an extensive collection of drum scores.

Decoder • Drum Transcription

Tatum-Level Drum Transcription Based on a Convolutional Recurrent Neural Network with Language Model-Based Regularized Training

no code implementations • 8 Oct 2020 • Ryoto Ishizuka, Ryo Nishikimi, Eita Nakamura, Kazuyoshi Yoshii

This paper describes a neural drum transcription method that detects the onset times of drums in music signals at the tatum level, where the tatum times are assumed to be estimated in advance.

Drum Transcription • Language Modelling
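
Since the tatum times are given in advance, detections at the frame level can be reduced to the tatum grid. The toy sketch below (the helper `pool_to_tatums` is a hypothetical name, not the authors' CRNN) shows one simple way to do that: max-pool the frame-level onset activations within each tatum interval.

```python
def pool_to_tatums(frame_activations, tatum_frames):
    """Map frame-level drum onset activations onto a tatum grid.

    frame_activations: onset probability per analysis frame.
    tatum_frames: frame indices of the pre-estimated tatum times; the
        activation assigned to tatum i is the maximum over frames in
        [tatum_frames[i], tatum_frames[i+1]).
    """
    pooled = []
    for i in range(len(tatum_frames) - 1):
        start, end = tatum_frames[i], tatum_frames[i + 1]
        pooled.append(max(frame_activations[start:end]))
    return pooled

# Hypothetical activations over 8 frames, with tatums at frames 0 and 4.
acts = [0.1, 0.9, 0.2, 0.1, 0.1, 0.8, 0.1, 0.0]
print(pool_to_tatums(acts, [0, 4, 8]))  # [0.9, 0.8]
```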

The MIDI Degradation Toolkit: Symbolic Music Augmentation and Correction

1 code implementation • 30 Sep 2020 • Andrew McLeod, James Owers, Kazuyoshi Yoshii

To that end, MDTK includes a script that measures the distribution of different types of errors in a transcription, and creates a degraded dataset with similar properties.

Music Transcription
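
The core idea of degrading clean data at measured error rates can be sketched in a few lines. This is a heavily simplified, hypothetical version (the `degrade` helper and its error types are illustrative); MDTK's actual degradations are richer and operate on full MIDI events.

```python
import random

def degrade(notes, error_probs, seed=0):
    """Apply transcription-style errors to a clean note list at given rates.

    notes: list of MIDI pitch numbers.
    error_probs: {'pitch_shift': p1, 'drop': p2}, per-note probabilities
        mimicking a measured error distribution.
    """
    rng = random.Random(seed)  # seeded for reproducible degradation
    out = []
    for pitch in notes:
        r = rng.random()
        if r < error_probs.get("drop", 0.0):
            continue                                  # note deleted
        if r < error_probs.get("drop", 0.0) + error_probs.get("pitch_shift", 0.0):
            pitch += rng.choice([-1, 1])              # semitone error
        out.append(pitch)
    return out

clean = [60, 62, 64, 65, 67]
print(degrade(clean, {"pitch_shift": 0.4, "drop": 0.2}))
```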

End-to-end Music-mixed Speech Recognition

1 code implementation • 27 Aug 2020 • Jeongwoo Woo, Masato Mimura, Kazuyoshi Yoshii, Tatsuya Kawahara

The time-domain separation method outperformed a frequency-domain separation method, which reuses the phase of the input mixture signal, in both simple cascading and joint-training settings.

Audio and Speech Processing
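
The "phase reuse" that distinguishes the frequency-domain baseline is easy to show concretely: each source's estimated magnitude is paired with the phase of the corresponding mixture STFT bin. A minimal single-bin sketch (the helper name is hypothetical):

```python
import cmath

def reuse_mixture_phase(est_magnitudes, mixture_bin):
    """Combine estimated source magnitudes with the mixture bin's phase,
    as frequency-domain separation commonly does before the inverse STFT."""
    phase = cmath.phase(mixture_bin)
    return [m * cmath.exp(1j * phase) for m in est_magnitudes]

mix = 3 + 4j                      # one hypothetical time-frequency bin
speech, music = reuse_mixture_phase([4.0, 1.0], mix)
print(abs(speech), abs(music))    # estimated magnitudes are preserved
```

Time-domain separation avoids this approximation entirely, which is one reason it can outperform the frequency-domain approach.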

MirrorNet: A Deep Bayesian Approach to Reflective 2D Pose Estimation from Human Images

no code implementations • 8 Apr 2020 • Takayuki Nakatsuka, Kazuyoshi Yoshii, Yuki Koyama, Satoru Fukayama, Masataka Goto, Shigeo Morishima

Specifically, we formulate a hierarchical generative model of poses and images by integrating a deep generative model of poses from pose features with that of images from poses and image features.

2D Pose Estimation • Pose Estimation

Multi-Step Chord Sequence Prediction Based on Aggregated Multi-Scale Encoder-Decoder Network

1 code implementation • 12 Nov 2019 • Tristan Carsault, Andrew McLeod, Philippe Esling, Jérôme Nika, Eita Nakamura, Kazuyoshi Yoshii

In this paper, we postulate that this comes from the multi-scale structure of musical information and propose new architectures based on an iterative temporal aggregation of input labels.


Semi-Supervised Multichannel Speech Enhancement With a Deep Speech Prior

1 code implementation • IEEE/ACM Transactions on Audio, Speech, and Language Processing 2019 • Kouhei Sekiguchi, Yoshiaki Bando, Aditya Arie Nugraha, Kazuyoshi Yoshii, Tatsuya Kawahara

To solve this problem, we replace a low-rank speech model with a deep generative speech model, i.e., we formulate a probabilistic model of noisy speech by integrating a deep speech model, a low-rank noise model, and a full-rank or rank-1 model of the spatial characteristics of speech and noise.

Speech Enhancement
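
To make the ingredients concrete: a low-rank noise model expresses the noise power spectral density as a product of NMF factors, and once speech and noise PSDs are both in hand, a Wiener-style gain recovers the speech. This is a generic single-channel sketch of those two standard building blocks, not the authors' full multichannel Bayesian model; all values and helper names are hypothetical.

```python
def nmf_psd(W, H):
    """Low-rank noise PSD from NMF factors: psd[f][t] = sum_k W[f][k] * H[k][t]."""
    F, K, T = len(W), len(H), len(H[0])
    return [[sum(W[f][k] * H[k][t] for k in range(K)) for t in range(T)]
            for f in range(F)]

def wiener_gain(speech_psd, noise_psd):
    """Per-bin Wiener gain applied to the noisy observation."""
    return speech_psd / (speech_psd + noise_psd)

# One frequency bin, one frame, rank-2 noise model: 1*2 + 0.5*4 = 4.
noise = nmf_psd([[1.0, 0.5]], [[2.0], [4.0]])
print(wiener_gain(12.0, noise[0][0]))  # 12 / (12 + 4) = 0.75
```

In the paper, the speech PSD comes from a deep generative speech model rather than being given, and the spatial model extends this to multiple channels.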

Deep Bayesian Unsupervised Source Separation Based on a Complex Gaussian Mixture Model

no code implementations • 29 Aug 2019 • Yoshiaki Bando, Yoko Sasaki, Kazuyoshi Yoshii

This paper presents an unsupervised method that trains neural source separation by using only multichannel mixture signals.

Musical Rhythm Transcription Based on Bayesian Piece-Specific Score Models Capturing Repetitions

no code implementations • 18 Aug 2019 • Eita Nakamura, Kazuyoshi Yoshii

Focusing on rhythm, we formulate several classes of Bayesian Markov models of musical scores that describe repetitions indirectly using the sparse transition probabilities of notes or note patterns.

Computational Efficiency • Language Modelling • +1
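
The "sparse transition probabilities" idea can be illustrated with a piece-specific first-order Markov model over note patterns: a repetitive piece concentrates probability mass on a few transitions. This is a heavily simplified sketch (the `fit_markov` helper and Dirichlet-style smoothing are illustrative assumptions, not the paper's Bayesian formulation).

```python
from collections import defaultdict

def fit_markov(patterns, alpha=0.1):
    """Piece-specific first-order Markov model over note patterns.

    A small smoothing constant `alpha` keeps unseen transitions possible;
    repetition within the piece makes the learned transitions sparse.
    """
    counts = defaultdict(lambda: defaultdict(float))
    states = set(patterns)
    for prev, nxt in zip(patterns, patterns[1:]):
        counts[prev][nxt] += 1.0
    probs = {}
    for s in states:
        total = sum(counts[s].values()) + alpha * len(states)
        probs[s] = {t: (counts[s][t] + alpha) / total for t in states}
    return probs

piece = ["A", "B", "A", "B", "A", "B", "A"]  # hypothetical repeated patterns
model = fit_markov(piece)
# Repetition makes the A->B transition far more probable than A->A.
print(model["A"]["B"] > model["A"]["A"])  # True
```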

Statistical Learning and Estimation of Piano Fingering

no code implementations • 23 Apr 2019 • Eita Nakamura, Yasuyuki Saito, Kazuyoshi Yoshii

We find that the methods based on high-order HMMs outperform the other methods in terms of estimation accuracies.
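
A standard way to handle the high-order HMMs this entry refers to is to augment the state space so that ordinary first-order algorithms (e.g., Viterbi decoding) still apply. The sketch below shows that reduction for a second-order model; the states, probabilities, and helper name are hypothetical and this is not the authors' fingering model.

```python
from itertools import product

def augment_second_order(states, trans2):
    """Turn a second-order model P(s_t | s_{t-1}, s_{t-2}) into a
    first-order model over state pairs.

    trans2: dict mapping (s_prev2, s_prev1, s_next) -> probability.
    """
    pair_states = list(product(states, repeat=2))
    trans1 = {}
    for (a, b), (c, d) in product(pair_states, repeat=2):
        # A pair transition (a, b) -> (c, d) is consistent only when b == c.
        trans1[((a, b), (c, d))] = trans2.get((a, b, d), 0.0) if b == c else 0.0
    return pair_states, trans1

states = ["1", "2"]          # e.g. finger numbers (hypothetical)
trans2 = {("1", "2", "1"): 0.9, ("1", "2", "2"): 0.1}
_, t1 = augment_second_order(states, trans2)
print(t1[(("1", "2"), ("2", "1"))])  # 0.9
```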

Fast Multichannel Source Separation Based on Jointly Diagonalizable Spatial Covariance Matrices

2 code implementations • European Association for Signal Processing (EUSIPCO) 2019 • Kouhei Sekiguchi, Aditya Arie Nugraha, Yoshiaki Bando, Kazuyoshi Yoshii

A popular approach to multichannel source separation is to integrate a spatial model with a source model for estimating the spatial covariance matrices (SCMs) and power spectral densities (PSDs) of each sound source in the time-frequency domain.

Speech Enhancement
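
The computational payoff of jointly diagonalizable SCMs is that, in the basis defined by the shared diagonalizer Q, multichannel Wiener filtering reduces to independent per-channel gains, avoiding per-bin matrix inverses. The sketch below shows that structure for one time-frequency bin; the helper names and values are hypothetical, not the paper's full FastMNMF updates.

```python
def matvec(M, v):
    return [sum(M[i][j] * v[j] for j in range(len(v))) for i in range(len(M))]

def jd_wiener(x, Q, Q_inv, g_target, g_all):
    """Wiener filtering when all source SCMs share a joint diagonalizer Q.

    g_target[m]: diagonalized SCM entry of the target source on channel m.
    g_all[m]:    sum of those entries over all sources (mixture power).
    """
    z = matvec(Q, x)                                   # project into the joint basis
    z = [g_target[m] / g_all[m] * z[m] for m in range(len(z))]
    return matvec(Q_inv, z)                            # back to the microphone domain

# Identity diagonalizer, 2 channels: reduces to per-channel Wiener gains.
y = jd_wiener([2.0, 4.0], [[1, 0], [0, 1]], [[1, 0], [0, 1]], [3.0, 1.0], [4.0, 2.0])
print(y)  # [1.5, 2.0]
```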

A Deep Generative Model of Speech Complex Spectrograms

no code implementations • 8 Mar 2019 • Aditya Arie Nugraha, Kouhei Sekiguchi, Kazuyoshi Yoshii

To improve the consistency of the phase values in the time-frequency domain, we also apply the von Mises distribution to the phase derivatives, i.e., the group delay and the instantaneous frequency.
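
The two phase derivatives mentioned here are just wrapped phase differences along the time and frequency axes of the complex spectrogram. The sketch below computes them (a von Mises distribution, as in the paper, would then model these wrapped angles; that modelling step is not shown, and the helper names are hypothetical).

```python
import cmath, math

def wrap(angle):
    """Wrap a phase difference into (-pi, pi]."""
    return math.atan2(math.sin(angle), math.cos(angle))

def phase_derivatives(spec):
    """Instantaneous frequency (phase difference along time) and group delay
    (phase difference along frequency) from a complex spectrogram.

    spec: spec[t][f] holds complex STFT values.
    """
    phase = [[cmath.phase(x) for x in frame] for frame in spec]
    inst_freq = [[wrap(phase[t][f] - phase[t - 1][f])
                  for f in range(len(phase[0]))] for t in range(1, len(phase))]
    group_delay = [[wrap(phase[t][f] - phase[t][f - 1])
                    for f in range(1, len(phase[0]))] for t in range(len(phase))]
    return inst_freq, group_delay

# A bin whose phase advances by pi/4 per frame has that instantaneous frequency.
spec = [[cmath.exp(1j * t * math.pi / 4)] for t in range(4)]
inst_freq, _ = phase_derivatives(spec)
print(inst_freq[0][0])
```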


Statistical Piano Reduction Controlling Performance Difficulty

no code implementations • 15 Aug 2018 • Eita Nakamura, Kazuyoshi Yoshii

We present a statistical-modelling method for piano reduction, i.e., converting an ensemble score into piano scores, that can control performance difficulty.

Statistical Speech Enhancement Based on Probabilistic Integration of Variational Autoencoder and Non-Negative Matrix Factorization

no code implementations • 31 Oct 2017 • Yoshiaki Bando, Masato Mimura, Katsutoshi Itoyama, Kazuyoshi Yoshii, Tatsuya Kawahara

This paper presents a statistical method of single-channel speech enhancement that uses a variational autoencoder (VAE) as a prior distribution on clean speech.

Speech Enhancement

Note Value Recognition for Piano Transcription Using Markov Random Fields

no code implementations • 23 Mar 2017 • Eita Nakamura, Kazuyoshi Yoshii, Simon Dixon

This paper presents a statistical method for use in music transcription that can estimate score times of note onsets and offsets from polyphonic MIDI performance signals.

Music Transcription

Rhythm Transcription of Polyphonic Piano Music Based on Merged-Output HMM for Multiple Voices

no code implementations • 29 Jan 2017 • Eita Nakamura, Kazuyoshi Yoshii, Shigeki Sagayama

In a recent conference paper, we reported a rhythm transcription method based on a merged-output hidden Markov model (HMM) that explicitly describes the multiple-voice structure of polyphonic music.

