Search Results for author: Kazuyoshi Yoshii

Found 19 papers, 6 papers with code

Global Structure-Aware Drum Transcription Based on Self-Attention Mechanisms

no code implementations • 12 May 2021 • Ryoto Ishizuka, Ryo Nishikimi, Kazuyoshi Yoshii

To mitigate the difficulty of training the self-attention-based model from an insufficient amount of paired data and improve the musical naturalness of the estimated scores, we propose a regularized training method that uses a global structure-aware masked language (score) model with a self-attention mechanism pretrained from an extensive collection of drum scores.

Drum Transcription
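A minimal PyTorch sketch of the regularized training idea described above. All module names, sizes, and the 8-pattern tatum vocabulary are illustrative assumptions, not the paper's implementation: a pretrained masked language model of drum scores rescores the transcriber's output, and a KL term pulls the transcription toward musically plausible scores.

import torch
import torch.nn as nn
import torch.nn.functional as F

V, T, D = 8, 32, 64   # assumed: 2^3 drum onset patterns per tatum, 32 tatums, dim 64

class TatumTranscriber(nn.Module):
    # toy stand-in for the self-attention-based transcription model
    def __init__(self):
        super().__init__()
        layer = nn.TransformerEncoderLayer(d_model=D, nhead=4, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=2)
        self.head = nn.Linear(D, V)
    def forward(self, feats):                  # feats: (B, T, D) audio features
        return self.head(self.encoder(feats))  # (B, T, V) logits over patterns

# Pretrained global structure-aware masked language (score) model; here an
# untrained stand-in with the same interface.
score_lm = nn.Sequential(
    nn.Embedding(V, D),
    nn.TransformerEncoderLayer(d_model=D, nhead=4, batch_first=True),
    nn.Linear(D, V))

def lm_regularizer(logits):
    # Pull the transcriber's (soft) score toward the LM's prediction at each
    # tatum -- a crude proxy for the paper's regularized training objective.
    probs = logits.softmax(-1)                   # (B, T, V)
    with torch.no_grad():
        lm_logits = score_lm(probs.argmax(-1))   # LM rescores the hard estimate
    return F.kl_div(lm_logits.log_softmax(-1), probs, reduction="batchmean")

model = TatumTranscriber()
feats, target = torch.randn(2, T, D), torch.randint(0, V, (2, T))
logits = model(feats)
loss = F.cross_entropy(logits.transpose(1, 2), target) + 0.1 * lm_regularizer(logits)
loss.backward()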

Tatum-Level Drum Transcription Based on a Convolutional Recurrent Neural Network with Language Model-Based Regularized Training

no code implementations • 8 Oct 2020 • Ryoto Ishizuka, Ryo Nishikimi, Eita Nakamura, Kazuyoshi Yoshii

This paper describes a neural drum transcription method that detects the onset times of drums in music signals at the $\textit{tatum}$ level, where the tatum times are assumed to be estimated in advance.

Drum Transcription Language Modelling
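A toy CRNN in the same spirit (PyTorch; the layer sizes, the 3-drum output, and the 10-frames-per-tatum grid are assumptions for illustration): convolutional features over a log-mel spectrogram feed a bidirectional GRU, whose frame activations are max-pooled onto the pre-estimated tatum grid.

import torch
import torch.nn as nn

class DrumCRNN(nn.Module):
    def __init__(self, n_mels=80, n_drums=3):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv2d(1, 16, 3, padding=1), nn.ReLU(),
            nn.Conv2d(16, 16, 3, padding=1), nn.ReLU())
        self.gru = nn.GRU(16 * n_mels, 64, batch_first=True, bidirectional=True)
        self.head = nn.Linear(128, n_drums)

    def forward(self, spec, tatum_ids):
        # spec: (B, 1, frames, n_mels); tatum_ids: (frames,) tatum index per frame
        h = self.conv(spec)                              # (B, 16, frames, n_mels)
        B, C, Tf, M = h.shape
        h, _ = self.gru(h.permute(0, 2, 1, 3).reshape(B, Tf, C * M))
        frame_logits = self.head(h)                      # (B, frames, n_drums)
        n_tatums = int(tatum_ids.max()) + 1
        return torch.stack(                              # (B, tatums, n_drums)
            [frame_logits[:, tatum_ids == t].max(dim=1).values
             for t in range(n_tatums)], dim=1)

model = DrumCRNN()
spec = torch.randn(2, 1, 100, 80)
tatum_ids = torch.arange(100) // 10        # assumed: 10 frames per tatum
print(model(spec, tatum_ids).shape)        # torch.Size([2, 10, 3])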

The MIDI Degradation Toolkit: Symbolic Music Augmentation and Correction

1 code implementation • 30 Sep 2020 • Andrew McLeod, James Owers, Kazuyoshi Yoshii

To that end, MDTK includes a script that measures the distribution of different types of errors in a transcription, and creates a degraded dataset with similar properties.

Music Transcription
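The degradation idea can be illustrated in a few lines of Python. This is a hypothetical sketch, not MDTK's actual API (the real toolkit ships a richer set of degradations plus the measurement script mentioned above): notes are (onset_ms, pitch, dur_ms) tuples, and degradations are sampled according to a measured error distribution.

import random

def pitch_shift(notes, max_shift=2):
    # mimic a pitch error on one random note
    i = random.randrange(len(notes))
    o, p, d = notes[i]
    notes = list(notes)
    notes[i] = (o, p + random.choice([-max_shift, max_shift]), d)
    return notes

def onset_shift(notes, max_ms=50):
    # mimic a timing error on one random note
    i = random.randrange(len(notes))
    o, p, d = notes[i]
    notes = list(notes)
    notes[i] = (max(0, o + random.randint(-max_ms, max_ms)), p, d)
    return notes

def degrade(notes, error_dist):
    # error_dist maps degradation -> probability, e.g. measured from a transcription
    fns, weights = zip(*error_dist.items())
    return random.choices(fns, weights)[0](notes)

clean = [(0, 60, 500), (500, 64, 500), (1000, 67, 500)]
print(degrade(clean, {pitch_shift: 0.4, onset_shift: 0.6}))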

End-to-end Music-mixed Speech Recognition

1 code implementation • 27 Aug 2020 • Jeongwoo Woo, Masato Mimura, Kazuyoshi Yoshii, Tatsuya Kawahara

The time-domain separation method outperformed a frequency-domain separation method, which reuses the phase information of the input mixture signal, in both simple cascading and joint training settings.

Audio and Speech Processing
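A minimal sketch of the joint-training setup (PyTorch; the Conv1d separator and GRU encoder are placeholder stand-ins for, e.g., a Conv-TasNet-style separator and a real ASR model): the separated waveform feeds the ASR encoder directly, so both losses backpropagate through the separator.

import torch
import torch.nn as nn
import torch.nn.functional as F

class TimeDomainSeparator(nn.Module):
    def __init__(self):
        super().__init__()
        self.net = nn.Conv1d(1, 1, kernel_size=9, padding=4)  # stand-in network
    def forward(self, mix):                      # mix: (B, samples)
        return self.net(mix.unsqueeze(1)).squeeze(1)

separator = TimeDomainSeparator()
asr_encoder = nn.GRU(1, 32, batch_first=True)    # stand-in for the ASR front end

mix, clean = torch.randn(2, 1600), torch.randn(2, 1600)
est = separator(mix)
sep_loss = F.l1_loss(est, clean)                 # time-domain separation loss
enc, _ = asr_encoder(est.unsqueeze(-1))          # joint path: grads reach separator
asr_loss = enc.pow(2).mean()                     # placeholder for the CTC/attention loss
(sep_loss + asr_loss).backward()                 # joint training step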

MirrorNet: A Deep Bayesian Approach to Reflective 2D Pose Estimation from Human Images

no code implementations • 8 Apr 2020 • Takayuki Nakatsuka, Kazuyoshi Yoshii, Yuki Koyama, Satoru Fukayama, Masataka Goto, Shigeo Morishima

Specifically, we formulate a hierarchical generative model of poses and images by integrating a deep generative model of poses from pose features with that of images from poses and image features.

Pose Estimation
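The hierarchical factorization can be sketched as follows (the symbols are mine, chosen to mirror the sentence above, not the paper's notation): with image $\mathbf{x}$, pose $\mathbf{y}$, and latent image/pose features $\mathbf{z}_x$ and $\mathbf{z}_y$,

$$p(\mathbf{x}, \mathbf{y}, \mathbf{z}_x, \mathbf{z}_y) = p(\mathbf{x} \mid \mathbf{y}, \mathbf{z}_x)\, p(\mathbf{y} \mid \mathbf{z}_y)\, p(\mathbf{z}_x)\, p(\mathbf{z}_y),$$

so 2D pose estimation becomes posterior inference of $\mathbf{y}$ (and the latents) given an observed image $\mathbf{x}$.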

Multi-Step Chord Sequence Prediction Based on Aggregated Multi-Scale Encoder-Decoder Network

1 code implementation • 12 Nov 2019 • Tristan Carsault, Andrew McLeod, Philippe Esling, Jérôme Nika, Eita Nakamura, Kazuyoshi Yoshii

In this paper, we postulate that this comes from the multi-scale structure of musical information and propose new architectures based on an iterative temporal aggregation of input labels.
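The temporal aggregation of input labels is easy to illustrate (a toy sketch, not the paper's exact architecture): a chord sequence is summarized at coarser time scales by majority vote, giving the encoder-decoder inputs at several resolutions.

from collections import Counter

def aggregate(labels, scale):
    # majority label within each non-overlapping window of length `scale`
    return [Counter(labels[i:i + scale]).most_common(1)[0][0]
            for i in range(0, len(labels), scale)]

chords = ["C", "C", "C", "F", "F", "G", "G", "C"]
for s in (1, 2, 4):
    print(s, aggregate(chords, s))
# 1 ['C', 'C', 'C', 'F', 'F', 'G', 'G', 'C']
# 2 ['C', 'C', 'F', 'G']
# 4 ['C', 'G']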

Semi-Supervised Multichannel Speech Enhancement With a Deep Speech Prior

1 code implementation • IEEE/ACM Transactions on Audio, Speech, and Language Processing 2019 • Kouhei Sekiguchi, Yoshiaki Bando, Aditya Arie Nugraha, Kazuyoshi Yoshii, Tatsuya Kawahara

To solve this problem, we replace the low-rank speech model with a deep generative speech model, i.e., we formulate a probabilistic model of noisy speech by integrating a deep speech model, a low-rank noise model, and a full-rank or rank-1 model of the spatial characteristics of speech and noise.

Speech Enhancement
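Written out, the integrated model described above takes roughly the following form (my notation; the paper defines the exact variants): each multichannel observation $\mathbf{x}_{ft} \in \mathbb{C}^M$ is modeled as

$$\mathbf{x}_{ft} \sim \mathcal{N}_{\mathbb{C}}\!\left(\mathbf{0},\ \lambda^{(s)}_{ft}(\mathbf{z})\,\mathbf{G}^{(s)}_f + \Big(\sum_k w_{fk} h_{kt}\Big)\mathbf{G}^{(n)}_f\right),$$

where $\lambda^{(s)}_{ft}(\mathbf{z})$ is the speech power spectral density produced by the deep generative speech model, $\sum_k w_{fk} h_{kt}$ is the low-rank (NMF) noise PSD, and $\mathbf{G}^{(s)}_f$, $\mathbf{G}^{(n)}_f$ are the full-rank or rank-1 spatial covariance matrices of speech and noise.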

Deep Bayesian Unsupervised Source Separation Based on a Complex Gaussian Mixture Model

no code implementations • 29 Aug 2019 • Yoshiaki Bando, Yoko Sasaki, Kazuyoshi Yoshii

This paper presents an unsupervised method that trains neural source separation by using only multichannel mixture signals.
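One common form of such a multichannel complex Gaussian mixture model, written in my own notation (the paper's exact parameterization may differ): each time-frequency observation $\mathbf{x}_{ft} \in \mathbb{C}^M$ is explained as

$$p(\mathbf{x}_{ft}) = \sum_{n} \phi_{nft}\, \mathcal{N}_{\mathbb{C}}\!\left(\mathbf{x}_{ft};\, \mathbf{0},\ \lambda_{nft}\,\mathbf{G}_{nf}\right),$$

with mixture weights $\phi$, source power spectral densities $\lambda$, and spatial covariance matrices $\mathbf{G}$; the separation network is trained so that the multichannel mixtures are well explained, requiring no clean source signals as supervision.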

Musical Rhythm Transcription Based on Bayesian Piece-Specific Score Models Capturing Repetitions

no code implementations • 18 Aug 2019 • Eita Nakamura, Kazuyoshi Yoshii

Focusing on rhythm, we formulate several classes of Bayesian Markov models of musical scores that describe repetitions indirectly using the sparse transition probabilities of notes or note patterns.

Language Modelling Music Transcription
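The role of sparse transition probabilities can be shown numerically (a toy illustration only, not the paper's Bayesian score models): a small Dirichlet concentration over each row of a Markov transition matrix keeps the estimated rows peaked on a few entries, matching the repetitive reuse of note patterns within a piece.

import numpy as np

rng = np.random.default_rng(0)
patterns = rng.choice(4, size=200, p=[0.7, 0.1, 0.1, 0.1])  # repetitive toy "piece"

def posterior_mean_transitions(seq, n_states, alpha):
    # rows ~ Dirichlet(alpha); small alpha keeps the estimated rows sparse/peaked
    counts = np.full((n_states, n_states), alpha)
    for a, b in zip(seq[:-1], seq[1:]):
        counts[a, b] += 1
    return counts / counts.sum(axis=1, keepdims=True)

print(posterior_mean_transitions(patterns, 4, alpha=0.1).round(2))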

Statistical Learning and Estimation of Piano Fingering

no code implementations • 23 Apr 2019 • Eita Nakamura, Yasuyuki Saito, Kazuyoshi Yoshii

We find that the methods based on high-order HMMs outperform the other methods in terms of estimation accuracy.
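HMM-based fingering decoding reduces to a standard Viterbi pass; below is a first-order sketch with toy random parameters, not the paper's learned ones (the paper's best results use higher-order HMMs, which extend the state space accordingly).

import numpy as np

def viterbi(log_init, log_trans, log_emit):
    # log_emit: (T, S) per-note log-likelihoods; returns most probable path
    T, S = log_emit.shape
    delta = log_init + log_emit[0]
    back = np.zeros((T, S), dtype=int)
    for t in range(1, T):
        scores = delta[:, None] + log_trans          # (S, S) predecessor scores
        back[t] = scores.argmax(axis=0)
        delta = scores.max(axis=0) + log_emit[t]
    path = [int(delta.argmax())]
    for t in range(T - 1, 0, -1):                    # trace back the best path
        path.append(int(back[t][path[-1]]))
    return path[::-1]

S = 5                                                # hidden states: fingers 1..5
rng = np.random.default_rng(1)
log_init = np.log(np.full(S, 1 / S))
log_trans = np.log(rng.dirichlet(np.ones(S), size=S))  # finger-to-finger transitions
log_emit = np.log(rng.dirichlet(np.ones(S), size=8))   # 8 notes to be fingered
print([f + 1 for f in viterbi(log_init, log_trans, log_emit)])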

Unsupervised Speech Enhancement Based on Multichannel NMF-Informed Beamforming for Noise-Robust Automatic Speech Recognition

no code implementations • 22 Mar 2019 • Kazuki Shimada, Yoshiaki Bando, Masato Mimura, Katsutoshi Itoyama, Kazuyoshi Yoshii, Tatsuya Kawahara

To solve this problem, we take an unsupervised approach that decomposes each TF bin into the sum of speech and noise by using multichannel nonnegative matrix factorization (MNMF).

Automatic Speech Recognition Speech Enhancement +1
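In MNMF, the observed spatial covariance of each TF bin is explained as a sum over sources; one standard form (notation mine) is

$$\mathbf{X}_{ft} \approx \sum_{n} \Big(\sum_{k \in \mathcal{K}_n} w_{fk}\, h_{kt}\Big)\, \mathbf{G}_{nf},$$

where $w$ and $h$ are the NMF spectral bases and temporal activations assigned to source $n$ and $\mathbf{G}_{nf}$ is its spatial covariance matrix; the estimated speech statistics then inform a beamformer for the noise-robust ASR front end.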

Fast Multichannel Source Separation Based on Jointly Diagonalizable Spatial Covariance Matrices

2 code implementations • European Association for Signal Processing (EUSIPCO) 2019 • Kouhei Sekiguchi, Aditya Arie Nugraha, Yoshiaki Bando, Kazuyoshi Yoshii

A popular approach to multichannel source separation is to integrate a spatial model with a source model for estimating the spatial covariance matrices (SCMs) and power spectral densities (PSDs) of each sound source in the time-frequency domain.

Speech Enhancement
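The speed-up comes from constraining all spatial covariance matrices to be jointly diagonalizable by a shared matrix per frequency (notation mine):

$$\mathbf{G}_{nf} = \mathbf{Q}_f^{-1}\, \mathrm{diag}\!\left(\tilde{g}_{nf1}, \dots, \tilde{g}_{nfM}\right)\, \mathbf{Q}_f^{-\mathsf{H}},$$

so projecting the observations with $\mathbf{Q}_f$ decorrelates the channels and reduces the per-bin covariance updates to element-wise operations.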

A Deep Generative Model of Speech Complex Spectrograms

no code implementations • 8 Mar 2019 • Aditya Arie Nugraha, Kouhei Sekiguchi, Kazuyoshi Yoshii

To improve the consistency of the phase values in the time-frequency domain, we also apply the von Mises distribution to the phase derivatives, i.e., the group delay and the instantaneous frequency.
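The phase-derivative idea is easy to demonstrate (a NumPy/SciPy illustration, not the paper's model): group delay and instantaneous frequency are angular quantities, so a von Mises likelihood applies to them directly.

import numpy as np
from scipy.special import i0

def vonmises_logpdf(theta, mu, kappa):
    # log density of the von Mises distribution on the circle
    return kappa * np.cos(theta - mu) - np.log(2 * np.pi * i0(kappa))

def wrap(a):
    return np.angle(np.exp(1j * a))                  # wrap angles to (-pi, pi]

phase = np.random.uniform(-np.pi, np.pi, size=(257, 100))   # (freq, time) phases
group_delay = wrap(np.diff(phase, axis=0))           # phase derivative along frequency
inst_freq = wrap(np.diff(phase, axis=1))             # phase derivative along time
print(vonmises_logpdf(group_delay, mu=0.0, kappa=2.0).mean())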

Statistical Piano Reduction Controlling Performance Difficulty

no code implementations • 15 Aug 2018 • Eita Nakamura, Kazuyoshi Yoshii

We present a statistical-modelling method for piano reduction, i.e., converting an ensemble score into piano scores, that can control performance difficulty.

Statistical Speech Enhancement Based on Probabilistic Integration of Variational Autoencoder and Non-Negative Matrix Factorization

no code implementations • 31 Oct 2017 • Yoshiaki Bando, Masato Mimura, Katsutoshi Itoyama, Kazuyoshi Yoshii, Tatsuya Kawahara

This paper presents a statistical method of single-channel speech enhancement that uses a variational autoencoder (VAE) as a prior distribution on clean speech.

Speech Enhancement
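A common way to write this VAE-plus-NMF observation model (my notation; details may differ from the paper): each noisy TF bin is

$$x_{ft} \sim \mathcal{N}_{\mathbb{C}}\!\left(0,\ \sigma^2_{ft}(\mathbf{z}) + \sum_{k} w_{fk}\, h_{kt}\right),$$

where $\sigma^2_{ft}(\mathbf{z})$ is the clean-speech variance produced by the VAE decoder from latent $\mathbf{z}$ (the prior on clean speech) and $\sum_k w_{fk} h_{kt}$ is the NMF noise model; enhancement amounts to inferring $\mathbf{z}$, $w$, and $h$ from the noisy observation.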

Note Value Recognition for Piano Transcription Using Markov Random Fields

no code implementations • 23 Mar 2017 • Eita Nakamura, Kazuyoshi Yoshii, Simon Dixon

This paper presents a statistical method for use in music transcription that can estimate score times of note onsets and offsets from polyphonic MIDI performance signals.

Music Transcription
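A generic MRF formulation of such a task (my notation, not necessarily the paper's exact potentials): with score-time variables $s = (s_1, \dots, s_N)$ for the note onsets and offsets,

$$E(s) = \sum_{i} \phi_i(s_i) + \sum_{(i,j) \in \mathcal{E}} \psi_{ij}(s_i, s_j),$$

where the unary terms tie each score time to the observed performance timing, the pairwise terms encode rhythmic and voice consistency, and transcription amounts to minimizing $E$.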

Rhythm Transcription of Polyphonic Piano Music Based on Merged-Output HMM for Multiple Voices

no code implementations • 29 Jan 2017 • Eita Nakamura, Kazuyoshi Yoshii, Shigeki Sagayama

In a recent conference paper, we reported a rhythm transcription method based on a merged-output hidden Markov model (HMM) that explicitly describes the multiple-voice structure of polyphonic music.
