Search Results for author: Kazuyoshi Yoshii

Found 26 papers, 7 papers with code

Neural Fast Full-Rank Spatial Covariance Analysis for Blind Source Separation

no code implementations • 17 Jun 2023 • Yoshiaki Bando, Yoshiki Masuyama, Aditya Arie Nugraha, Kazuyoshi Yoshii

Our neural separation model, introduced for amortized variational inference (AVI), alternates between neural network blocks and single steps of an efficient iterative algorithm called iterative source steering.

Blind Source Separation, Variational Inference

DNN-Free Low-Latency Adaptive Speech Enhancement Based on Frame-Online Beamforming Powered by Block-Online FastMNMF

no code implementations • 22 Jul 2022 • Aditya Arie Nugraha, Kouhei Sekiguchi, Mathieu Fontaine, Yoshiaki Bando, Kazuyoshi Yoshii

Our DNN-free system leverages the posteriors of the latest source spectrograms given by block-online FastMNMF to derive the current source covariance matrices for frame-online beamforming.

Blind Source Separation, Speech Enhancement

Direction-Aware Adaptive Online Neural Speech Enhancement with an Augmented Reality Headset in Real Noisy Conversational Environments

1 code implementation • 15 Jul 2022 • Kouhei Sekiguchi, Aditya Arie Nugraha, Yicheng Du, Yoshiaki Bando, Mathieu Fontaine, Kazuyoshi Yoshii

This paper describes the practical, response- and performance-aware development of online speech enhancement for an augmented reality (AR) headset that helps a user understand conversations held in real noisy echoic environments (e.g., a cocktail party).

Blind Source Separation, Speech Enhancement

Generalized Fast Multichannel Nonnegative Matrix Factorization Based on Gaussian Scale Mixtures for Blind Source Separation

no code implementations • 11 May 2022 • Mathieu Fontaine, Kouhei Sekiguchi, Aditya Arie Nugraha, Yoshiaki Bando, Kazuyoshi Yoshii

This paper describes heavy-tailed extensions of a state-of-the-art versatile blind source separation method called fast multichannel nonnegative matrix factorization (FastMNMF) from a unified point of view.

Blind Source Separation, Speech Enhancement

Global Structure-Aware Drum Transcription Based on Self-Attention Mechanisms

no code implementations • 12 May 2021 • Ryoto Ishizuka, Ryo Nishikimi, Kazuyoshi Yoshii

To mitigate the difficulty of training the self-attention-based model from an insufficient amount of paired data, and to improve the musical naturalness of the estimated scores, we propose a regularized training method that uses a global structure-aware masked language (score) model with a self-attention mechanism, pretrained on an extensive collection of drum scores.

Drum Transcription

Tatum-Level Drum Transcription Based on a Convolutional Recurrent Neural Network with Language Model-Based Regularized Training

no code implementations • 8 Oct 2020 • Ryoto Ishizuka, Ryo Nishikimi, Eita Nakamura, Kazuyoshi Yoshii

This paper describes a neural drum transcription method that detects, from music signals, the onset times of drums at the tatum level, where tatum times are assumed to be estimated in advance.

Drum Transcription, Language Modelling

The MIDI Degradation Toolkit: Symbolic Music Augmentation and Correction

1 code implementation • 30 Sep 2020 • Andrew McLeod, James Owers, Kazuyoshi Yoshii

To that end, MDTK includes a script that measures the distribution of different types of errors in a transcription and creates a degraded dataset with similar properties.

Music Transcription

End-to-end Music-mixed Speech Recognition

1 code implementation • 27 Aug 2020 • Jeongwoo Woo, Masato Mimura, Kazuyoshi Yoshii, Tatsuya Kawahara

The time-domain separation method outperformed a frequency-domain separation method, which reuses the phase information of the input mixture signal, in both simple cascading and joint training settings.

Audio and Speech Processing

MirrorNet: A Deep Bayesian Approach to Reflective 2D Pose Estimation from Human Images

no code implementations • 8 Apr 2020 • Takayuki Nakatsuka, Kazuyoshi Yoshii, Yuki Koyama, Satoru Fukayama, Masataka Goto, Shigeo Morishima

Specifically, we formulate a hierarchical generative model of poses and images by integrating a deep generative model of poses from pose features with that of images from poses and image features.

2D Pose Estimation, Pose Estimation

Multi-Step Chord Sequence Prediction Based on Aggregated Multi-Scale Encoder-Decoder Network

1 code implementation • 12 Nov 2019 • Tristan Carsault, Andrew McLeod, Philippe Esling, Jérôme Nika, Eita Nakamura, Kazuyoshi Yoshii

In this paper, we postulate that this comes from the multi-scale structure of musical information and propose new architectures based on an iterative temporal aggregation of input labels.

Semi-Supervised Multichannel Speech Enhancement With a Deep Speech Prior

1 code implementation • IEEE/ACM Transactions on Audio, Speech, and Language Processing 2019 • Kouhei Sekiguchi, Yoshiaki Bando, Aditya Arie Nugraha, Kazuyoshi Yoshii, Tatsuya Kawahara

To solve this problem, we replace a low-rank speech model with a deep generative speech model, i.e., we formulate a probabilistic model of noisy speech by integrating a deep speech model, a low-rank noise model, and a full-rank or rank-1 model of the spatial characteristics of speech and noise.

Speech Enhancement

Deep Bayesian Unsupervised Source Separation Based on a Complex Gaussian Mixture Model

no code implementations • 29 Aug 2019 • Yoshiaki Bando, Yoko Sasaki, Kazuyoshi Yoshii

This paper presents an unsupervised method that trains neural source separation by using only multichannel mixture signals.

Musical Rhythm Transcription Based on Bayesian Piece-Specific Score Models Capturing Repetitions

no code implementations • 18 Aug 2019 • Eita Nakamura, Kazuyoshi Yoshii

Focusing on rhythm, we formulate several classes of Bayesian Markov models of musical scores that describe repetitions indirectly using the sparse transition probabilities of notes or note patterns.

Computational Efficiency, Language Modelling, +1

Statistical Learning and Estimation of Piano Fingering

no code implementations • 23 Apr 2019 • Eita Nakamura, Yasuyuki Saito, Kazuyoshi Yoshii

We find that the methods based on high-order HMMs outperform the others in terms of estimation accuracy.

Fast Multichannel Source Separation Based on Jointly Diagonalizable Spatial Covariance Matrices

2 code implementations • European Association for Signal Processing (EUSIPCO) 2019 • Kouhei Sekiguchi, Aditya Arie Nugraha, Yoshiaki Bando, Kazuyoshi Yoshii

A popular approach to multichannel source separation is to integrate a spatial model with a source model for estimating the spatial covariance matrices (SCMs) and power spectral densities (PSDs) of each sound source in the time-frequency domain.

Speech Enhancement

A Deep Generative Model of Speech Complex Spectrograms

no code implementations • 8 Mar 2019 • Aditya Arie Nugraha, Kouhei Sekiguchi, Kazuyoshi Yoshii

To improve the consistency of the phase values in the time-frequency domain, we also apply the von Mises distribution to the phase derivatives, i.e., the group delay and the instantaneous frequency.

Statistical Piano Reduction Controlling Performance Difficulty

no code implementations • 15 Aug 2018 • Eita Nakamura, Kazuyoshi Yoshii

We present a statistical-modelling method for piano reduction, i.e., converting an ensemble score into piano scores, that can control performance difficulty.

Statistical Speech Enhancement Based on Probabilistic Integration of Variational Autoencoder and Non-Negative Matrix Factorization

no code implementations • 31 Oct 2017 • Yoshiaki Bando, Masato Mimura, Katsutoshi Itoyama, Kazuyoshi Yoshii, Tatsuya Kawahara

This paper presents a statistical method of single-channel speech enhancement that uses a variational autoencoder (VAE) as a prior distribution on clean speech.

Speech Enhancement

Note Value Recognition for Piano Transcription Using Markov Random Fields

no code implementations • 23 Mar 2017 • Eita Nakamura, Kazuyoshi Yoshii, Simon Dixon

This paper presents a statistical method for use in music transcription that can estimate score times of note onsets and offsets from polyphonic MIDI performance signals.

Music Transcription

Rhythm Transcription of Polyphonic Piano Music Based on Merged-Output HMM for Multiple Voices

no code implementations • 29 Jan 2017 • Eita Nakamura, Kazuyoshi Yoshii, Shigeki Sagayama

In a recent conference paper, we reported a rhythm transcription method based on a merged-output hidden Markov model (HMM) that explicitly describes the multiple-voice structure of polyphonic music.
