Search Results for author: Noboru Harada

Found 33 papers, 20 papers with code

Guided Masked Self-Distillation Modeling for Distributed Multimedia Sensor Event Analysis

no code implementations12 Apr 2024 Masahiro Yasuda, Noboru Harada, Yasunori Ohishi, Shoichiro Saito, Akira Nakayama, Nobutaka Ono

This is because the information obtained from a single sensor is often missing or fragmented in such an environment; observations from multiple locations and modalities should be integrated to analyze events comprehensively.

Masked Modeling Duo: Towards a Universal Audio Pre-training Framework

2 code implementations9 Apr 2024 Daisuke Niizumi, Daiki Takeuchi, Yasunori Ohishi, Noboru Harada, Kunio Kashino

This study proposes Masked Modeling Duo (M2D), an improved masked prediction SSL, which learns by predicting representations of masked input signals that serve as training signals.

Denoising · Self-Supervised Learning

Refining Knowledge Transfer on Audio-Image Temporal Agreement for Audio-Text Cross Retrieval

no code implementations16 Mar 2024 Shunsuke Tsubaki, Daisuke Niizumi, Daiki Takeuchi, Yasunori Ohishi, Noboru Harada, Keisuke Imoto

Experimental results show that method (i) improves the audio-text retrieval performance by selecting the nearest image that aligns with the audio information and transferring the learned knowledge.

Image Retrieval · Retrieval +2

6DoF SELD: Sound Event Localization and Detection Using Microphones and Motion Tracking Sensors on self-motioning human

no code implementations4 Mar 2024 Masahiro Yasuda, Shoichiro Saito, Akira Nakayama, Noboru Harada

A system trained only with a dataset using microphone arrays in a fixed position would be unable to adapt to the fast relative motion of sound events associated with self-motion, resulting in the degradation of SELD performance.

Sound Event Localization and Detection

Unrestricted Global Phase Bias-Aware Single-channel Speech Enhancement with Conformer-based Metric GAN

no code implementations13 Feb 2024 Shiqi Zhang, Zheng Qiu, Daiki Takeuchi, Noboru Harada, Shoji Makino

With the rapid development of neural networks in recent years, various networks have become exceptionally effective at enhancing the magnitude spectrum of noisy speech in the single-channel speech enhancement domain.

Speech Enhancement
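The entry above describes the magnitude-spectrum enhancement pipeline that the paper builds on: a network estimates a mask on the magnitude spectrum while the noisy phase is reused. Below is a minimal sketch of that pipeline; the oracle "ideal ratio mask" stands in for the network's output and is not the paper's Conformer-based Metric GAN.

```python
# Magnitude-only speech enhancement sketch: mask the magnitude spectrum,
# keep the noisy phase. The oracle ideal ratio mask (IRM) below is a
# stand-in for what a DNN would predict.
import numpy as np
from scipy.signal import stft, istft

def enhance_with_oracle_mask(noisy, clean, fs=16000, nperseg=512):
    _, _, N = stft(noisy, fs=fs, nperseg=nperseg)   # complex noisy spectrum
    _, _, C = stft(clean, fs=fs, nperseg=nperseg)   # used only to build the oracle mask
    mask = np.clip(np.abs(C) / (np.abs(N) + 1e-8), 0.0, 1.0)  # IRM
    enhanced_spec = mask * np.abs(N) * np.exp(1j * np.angle(N))  # noisy phase is kept
    _, enhanced = istft(enhanced_spec, fs=fs, nperseg=nperseg)
    return enhanced

rng = np.random.default_rng(0)
t = np.arange(16000) / 16000
clean = np.sin(2 * np.pi * 440 * t)
noisy = clean + 0.5 * rng.standard_normal(t.size)
enhanced = enhance_with_oracle_mask(noisy, clean)
n = min(len(enhanced), len(clean))
err_noisy = np.mean((noisy[:n] - clean[:n]) ** 2)
err_enh = np.mean((enhanced[:n] - clean[:n]) ** 2)
print(err_enh < err_noisy)  # masking reduces error; the phase error remains
```

Even with an oracle magnitude mask, the residual error caused by keeping the noisy phase is exactly the limitation that phase-aware methods like the one above target.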

Audio Difference Captioning Utilizing Similarity-Discrepancy Disentanglement

1 code implementation23 Aug 2023 Daiki Takeuchi, Yasunori Ohishi, Daisuke Niizumi, Noboru Harada, Kunio Kashino

We proposed Audio Difference Captioning (ADC) as a new extension task of audio captioning for describing the semantic differences between input pairs of similar but slightly different audio clips.

Audio captioning · Disentanglement

Deep sound-field denoiser: optically-measured sound-field denoising using deep neural network

1 code implementation27 Apr 2023 Kenji Ishikawa, Daiki Takeuchi, Noboru Harada, Takehiro Moriya

We compared the method with conventional ones, such as image filters, a spatiotemporal filter, and other DNN architectures, on numerical and experimental data.

Denoising

First-shot anomaly sound detection for machine condition monitoring: A domain generalization baseline

1 code implementation1 Mar 2023 Noboru Harada, Daisuke Niizumi, Yasunori Ohishi, Daiki Takeuchi, Masahiro Yasuda

This paper provides a baseline system for First-shot-compliant unsupervised anomalous sound detection (ASD) for machine condition monitoring.

Domain Generalization · Task 2 +1
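Autoencoder-style ASD baselines of this kind score a clip by how poorly a model fitted only to normal sounds reconstructs it. The sketch below shows that scoring scheme with a PCA projection standing in for the baseline's deep autoencoder on log-mel features; the synthetic data and dimensions are illustrative.

```python
# Reconstruction-error anomaly score: fit on normal data only, then score
# a sample by its reconstruction error. PCA is a linear stand-in for a
# deep autoencoder.
import numpy as np

rng = np.random.default_rng(0)

# "Normal" feature vectors live near a low-dimensional subspace.
basis = rng.standard_normal((3, 64))
normal_train = rng.standard_normal((500, 3)) @ basis \
    + 0.01 * rng.standard_normal((500, 64))

# Fit the linear "autoencoder" (top principal components) on normal data only.
mean = normal_train.mean(axis=0)
_, _, vt = np.linalg.svd(normal_train - mean, full_matrices=False)
components = vt[:3]                      # encoder/decoder weights

def anomaly_score(x):
    z = (x - mean) @ components.T        # encode
    recon = z @ components + mean        # decode
    return float(np.mean((x - recon) ** 2))  # reconstruction error

normal_test = rng.standard_normal(3) @ basis
anomalous_test = rng.standard_normal(64)     # off-subspace, unlike training data
print(anomaly_score(normal_test) < anomaly_score(anomalous_test))
```

Thresholding this score separates normal from anomalous clips without ever seeing an anomaly during training, which is what makes the approach "first-shot" compatible.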

Masked Modeling Duo: Learning Representations by Encouraging Both Networks to Model the Input

1 code implementation26 Oct 2022 Daisuke Niizumi, Daiki Takeuchi, Yasunori Ohishi, Noboru Harada, Kunio Kashino

We propose a new method, Masked Modeling Duo (M2D), that learns representations directly while obtaining training signals using only masked patches.

Ranked #1 on Speaker Identification on VoxCeleb1 (using extra training data)

Audio Classification · Audio Tagging +5
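The M2D data flow described above can be sketched as follows. This is a conceptual illustration, not the paper's ViT implementation: the online encoder sees only visible patches, a predictor guesses the representations of the masked patches, and the target is a momentum (EMA) copy of the encoder applied to the masked patches. The linear "encoders" with random weights are stand-ins.

```python
# Conceptual M2D forward pass: loss is computed only on masked positions,
# and the target encoder is updated by exponential moving average (EMA).
import numpy as np

rng = np.random.default_rng(0)
num_patches, patch_dim, rep_dim = 16, 32, 8

patches = rng.standard_normal((num_patches, patch_dim))    # patchified spectrogram
mask = rng.permutation(num_patches) < num_patches * 0.6    # 60% masking ratio

W_online = rng.standard_normal((patch_dim, rep_dim)) * 0.1  # online encoder
W_target = W_online.copy()                                  # EMA target starts as a copy
W_pred = rng.standard_normal((rep_dim, rep_dim)) * 0.1      # predictor

z_visible = patches[~mask] @ W_online            # encode visible patches only
pred_masked = z_visible.mean(axis=0) @ W_pred    # predict masked representations
targets = patches[mask] @ W_target               # targets: masked patches, EMA encoder
loss = np.mean((pred_masked - targets) ** 2)     # training signal from masked patches

tau = 0.99                                       # EMA update; no gradient flows here
W_target = tau * W_target + (1 - tau) * W_online
print(loss >= 0.0)
```

The key contrast with plain masked autoencoding is that both the prediction and the target are representations, so both networks end up modeling the input.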

ConceptBeam: Concept Driven Target Speech Extraction

no code implementations25 Jul 2022 Yasunori Ohishi, Marc Delcroix, Tsubasa Ochiai, Shoko Araki, Daiki Takeuchi, Daisuke Niizumi, Akisato Kimura, Noboru Harada, Kunio Kashino

We use it to bridge modality-dependent information, i.e., the speech segments in the mixture, and the specified, modality-independent concept.

Metric Learning · Speech Extraction

Introducing Auxiliary Text Query-modifier to Content-based Audio Retrieval

1 code implementation20 Jul 2022 Daiki Takeuchi, Yasunori Ohishi, Daisuke Niizumi, Noboru Harada, Kunio Kashino

While the range of conventional content-based audio retrieval is limited to audio that is similar to the query audio, the proposed method can adjust the retrieval range by adding an embedding of the auxiliary text query-modifier to the embedding of the query sample audio in a shared latent space.

Retrieval
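The retrieval mechanism described above is concrete enough to sketch: in a shared latent space, the query becomes the audio embedding plus the embedding of the text modifier, and retrieval is nearest-neighbour search by cosine similarity. The toy 2-D embeddings below are hypothetical illustrations; the paper learns these embedding spaces jointly.

```python
# Content-based retrieval with an additive text query-modifier in a shared
# latent space. Toy hand-crafted embeddings stand in for learned ones.
import numpy as np

def retrieve(query_emb, database_embs):
    q = query_emb / np.linalg.norm(query_emb)
    d = database_embs / np.linalg.norm(database_embs, axis=1, keepdims=True)
    return int(np.argmax(d @ q))   # index of the most similar database item

# Toy shared space: axis 0 ~ "dog-like", axis 1 ~ "indoor reverberation".
audio_query = np.array([1.0, 0.0])       # a dry dog-bark clip
text_modifier = np.array([0.0, 1.0])     # e.g. embedding of "recorded indoors"
database = np.array([
    [1.0, 0.1],    # 0: dry dog bark
    [0.9, 0.9],    # 1: reverberant dog bark
    [0.0, 1.0],    # 2: indoor ambience, no dog
])

print(retrieve(audio_query, database))                   # audio alone -> item 0
print(retrieve(audio_query + text_modifier, database))   # audio + text -> item 1
```

Adding the modifier embedding shifts the query point in the latent space, which is exactly how the method moves the retrieval range away from "audio similar to the query audio".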

Description and Discussion on DCASE 2022 Challenge Task 2: Unsupervised Anomalous Sound Detection for Machine Condition Monitoring Applying Domain Generalization Techniques

2 code implementations13 Jun 2022 Kota Dohi, Keisuke Imoto, Noboru Harada, Daisuke Niizumi, Yuma Koizumi, Tomoya Nishida, Harsh Purohit, Takashi Endo, Masaaki Yamamoto, Yohei Kawaguchi

We present the task description and discussion on the results of the DCASE 2022 Challenge Task 2: ``Unsupervised anomalous sound detection (ASD) for machine condition monitoring applying domain generalization techniques''.

domain classification · Domain Generalization +1

Composing General Audio Representation by Fusing Multilayer Features of a Pre-trained Model

1 code implementation17 May 2022 Daisuke Niizumi, Daiki Takeuchi, Yasunori Ohishi, Noboru Harada, Kunio Kashino

This approach improves the utility of frequency and channel information in downstream processes, and combines the effectiveness of middle and late layer features for different tasks.

Masked Spectrogram Modeling using Masked Autoencoders for Learning General-purpose Audio Representation

1 code implementation26 Apr 2022 Daisuke Niizumi, Daiki Takeuchi, Yasunori Ohishi, Noboru Harada, Kunio Kashino

In this paper, we seek to learn audio representations from the input itself as supervision using a pretext task of auto-encoding of masked spectrogram patches, Masked Spectrogram Modeling (MSM, a variant of Masked Image Modeling applied to audio spectrogram).

Contrastive Learning · Self-Supervised Learning
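The MSM pretext task above can be sketched in a few lines: split a spectrogram into patches, hide most of them, and train an autoencoder to reconstruct the hidden patches from the visible ones, with the loss computed only on the masked positions. The mean-value "reconstruction" below is a placeholder for the paper's Masked Autoencoder; only the patchify/mask/loss bookkeeping is shown.

```python
# Masked Spectrogram Modeling bookkeeping: patchify, mask, and compute the
# reconstruction loss on masked patches only.
import numpy as np

rng = np.random.default_rng(0)
spec = rng.standard_normal((80, 96))       # log-mel spectrogram (mel bins x frames)

# Patchify into non-overlapping 16x16 patches -> (30 patches, 256 values each).
patches = spec.reshape(5, 16, 6, 16).transpose(0, 2, 1, 3).reshape(30, 256)

mask = rng.permutation(30) < 30 * 0.75     # MAE-style 75% masking
visible = patches[~mask]

# Placeholder "reconstruction": predict every masked patch as the mean of the
# visible ones. A real MSM model runs an encoder on `visible` and a decoder
# over all positions instead.
reconstruction = np.tile(visible.mean(axis=0), (int(mask.sum()), 1))
loss = np.mean((reconstruction - patches[mask]) ** 2)  # loss on masked patches only
print(int(mask.sum()), loss >= 0.0)
```

Because the target is the input itself, no labels and no augmentations are needed, which is the sense in which the representation is learned "from the input itself as supervision".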

BYOL for Audio: Exploring Pre-trained General-purpose Audio Representations

1 code implementation15 Apr 2022 Daisuke Niizumi, Daiki Takeuchi, Yasunori Ohishi, Noboru Harada, Kunio Kashino

In this study, we hypothesize that representations effective for general audio tasks should provide multiple aspects of robust features of the input sound.

Self-Supervised Learning

Multi-view and Multi-modal Event Detection Utilizing Transformer-based Multi-sensor fusion

1 code implementation18 Feb 2022 Masahiro Yasuda, Yasunori Ohishi, Shoichiro Saito, Noboru Harada

We tackle a challenging task: multi-view and multi-modal event detection that detects events in a wide-range real environment by utilizing data from distributed cameras and microphones and their weak labels.

Event Detection · Sensor Fusion

BYOL for Audio: Self-Supervised Learning for General-Purpose Audio Representation

2 code implementations11 Mar 2021 Daisuke Niizumi, Daiki Takeuchi, Yasunori Ohishi, Noboru Harada, Kunio Kashino

Inspired by the recent progress in self-supervised learning for computer vision that generates supervision using data augmentations, we explore a new general-purpose audio representation learning approach.

Representation Learning · Self-Supervised Learning

Effects of Word-frequency based Pre- and Post- Processings for Audio Captioning

no code implementations24 Sep 2020 Daiki Takeuchi, Yuma Koizumi, Yasunori Ohishi, Noboru Harada, Kunio Kashino

The system we used for Task 6 (Automated Audio Captioning) of the Detection and Classification of Acoustic Scenes and Events (DCASE) 2020 Challenge combines three elements, namely, data augmentation, multi-task learning, and post-processing, for audio captioning.

Audio captioning · Data Augmentation +1

The NTT DCASE2020 Challenge Task 6 system: Automated Audio Captioning with Keywords and Sentence Length Estimation

no code implementations1 Jul 2020 Yuma Koizumi, Daiki Takeuchi, Yasunori Ohishi, Noboru Harada, Kunio Kashino

This technical report describes the system participating to the Detection and Classification of Acoustic Scenes and Events (DCASE) 2020 Challenge, Task 6: automated audio captioning.

Audio captioning · Caption Generation +2

Phase reconstruction based on recurrent phase unwrapping with deep neural networks

no code implementations14 Feb 2020 Yoshiki Masuyama, Kohei Yatabe, Yuma Koizumi, Yasuhiro Oikawa, Noboru Harada

In the proposed method, DNNs estimate phase derivatives instead of phase itself, which allows us to avoid the sensitivity problem.

Audio Synthesis
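The idea above — estimate phase derivatives rather than the wrapped phase itself, then integrate — avoids the sensitivity to 2π wrapping discontinuities. A minimal 1-D sketch: here the "estimated" derivative is computed from a known signal, whereas in the paper a DNN predicts the derivatives from the amplitude spectrogram.

```python
# Phase recovery by integrating phase derivatives: the derivative is well
# behaved even where the wrapped phase jumps by 2*pi.
import numpy as np

n = 200
true_phase = 0.35 * np.arange(n) + 0.2 * np.sin(np.arange(n) / 15.0)  # unwrapped track
wrapped = np.angle(np.exp(1j * true_phase))          # what a spectrogram gives you

# Differentiate the wrapped phase, then wrap the *difference* back to (-pi, pi];
# this recovers the true derivative as long as it stays below pi per step.
deriv = np.angle(np.exp(1j * np.diff(wrapped)))

# Recover the phase by integrating the derivative from the first sample.
recovered = wrapped[0] + np.concatenate([[0.0], np.cumsum(deriv)])

print(np.allclose(recovered, true_phase))
```

Regressing the smooth derivative is a much easier target for a DNN than the discontinuous wrapped phase, which is the sensitivity problem the abstract refers to.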

Invertible DNN-based nonlinear time-frequency transform for speech enhancement

1 code implementation25 Nov 2019 Daiki Takeuchi, Kohei Yatabe, Yuma Koizumi, Yasuhiro Oikawa, Noboru Harada

Therefore, some end-to-end methods used a DNN to learn the linear T-F transform which is much easier to understand.

Audio and Speech Processing · Sound

ToyADMOS: A Dataset of Miniature-Machine Operating Sounds for Anomalous Sound Detection

2 code implementations9 Aug 2019 Yuma Koizumi, Shoichiro Saito, Hisashi Uematsu, Noboru Harada, Keisuke Imoto

To build a large-scale dataset for ADMOS, we collected anomalous operating sounds of miniature machines (toys) by deliberately damaging them.

Anomaly Detection

Deep Griffin-Lim Iteration

no code implementations10 Mar 2019 Yoshiki Masuyama, Kohei Yatabe, Yuma Koizumi, Yasuhiro Oikawa, Noboru Harada

This paper presents a novel phase reconstruction method (only from a given amplitude spectrogram) by combining a signal-processing-based approach and a deep neural network (DNN).
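For reference, the signal-processing side that the paper combines with a DNN is the classic Griffin-Lim iteration: alternate between projecting onto the set of consistent spectrograms (via inverse/forward STFT) and restoring the known magnitude. The sketch below is vanilla Griffin-Lim only, with no DNN; the signal length is chosen so scipy's STFT round-trips exactly.

```python
# Vanilla Griffin-Lim: recover a waveform from a magnitude-only spectrogram
# by alternating projections.
import numpy as np
from scipy.signal import stft, istft

def griffin_lim(magnitude, n_iter=30, fs=16000, nperseg=256):
    rng = np.random.default_rng(0)
    spec = magnitude * np.exp(1j * rng.uniform(-np.pi, np.pi, magnitude.shape))
    for _ in range(n_iter):
        _, x = istft(spec, fs=fs, nperseg=nperseg)       # project: consistent spectrogram
        _, _, spec = stft(x, fs=fs, nperseg=nperseg)
        spec = magnitude * np.exp(1j * np.angle(spec))   # project: known magnitude
    _, x = istft(spec, fs=fs, nperseg=nperseg)
    return x

# Demo: drop the phase of a sinusoid's spectrogram and recover the waveform.
fs = 16000
t = np.arange(4096) / fs      # 4096 is a multiple of the hop, so lengths round-trip
target = np.sin(2 * np.pi * 440 * t)
_, _, S = stft(target, fs=fs, nperseg=256)
recovered = griffin_lim(np.abs(S))
_, _, S2 = stft(recovered, fs=fs, nperseg=256)
err = np.linalg.norm(np.abs(S2) - np.abs(S)) / np.linalg.norm(np.abs(S))
print(err < 0.5)   # spectral-convergence error shrinks over the iterations
```

Deep Griffin-Lim replaces part of this fixed iteration with a learned update, which is what "combining a signal-processing-based approach and a DNN" means here.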

AdaFlow: Domain-Adaptive Density Estimator with Application to Anomaly Detection and Unpaired Cross-Domain Translation

no code implementations14 Dec 2018 Masataka Yamaguchi, Yuma Koizumi, Noboru Harada

To address this difficulty, we propose AdaFlow, a new DNN-based density estimator that can be easily adapted to the change of the distribution.

Density Estimation · Translation +1

Trainable Adaptive Window Switching for Speech Enhancement

no code implementations5 Nov 2018 Yuma Koizumi, Noboru Harada, Yoichi Haneda

To overcome this problem, we incorporate AWS into the speech enhancement procedure, and the windowing function of each time-frame is manipulated using a DNN depending on the input signal.

Speech Enhancement

Unsupervised Detection of Anomalous Sound based on Deep Learning and the Neyman-Pearson Lemma

1 code implementation22 Oct 2018 Yuma Koizumi, Shoichiro Saito, Hisashi Uematsu, Yuta Kawachi, Noboru Harada

To calculate the TPR in the objective function, we consider that the set of anomalous sounds is the complementary set of normal sounds and simulate anomalous sounds by using a rejection sampling algorithm.

LEMMA · Unsupervised Anomaly Detection +1
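The rejection-sampling idea above — treat anomalies as the complement of the normal set, draw candidates from a broad proposal, and keep only those the normal model considers unlikely — can be sketched in one dimension. The Gaussian normal model and uniform proposal below are toy stand-ins for the paper's learned model of normal machine sounds.

```python
# Simulating anomalous samples by rejection sampling from the complement
# of the normal set.
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(0)

# Normal model fitted to normal training data.
normal_data = rng.normal(loc=0.0, scale=1.0, size=5000)
mu, sigma = normal_data.mean(), normal_data.std()

# Draw from a broad proposal and reject anything the normal model finds likely.
proposal = rng.uniform(-6.0, 6.0, size=20000)
likelihood = norm.pdf(proposal, loc=mu, scale=sigma)
threshold = norm.pdf(mu + 2.0 * sigma, loc=mu, scale=sigma)  # "likely" = within ~2 sigma
simulated_anomalies = proposal[likelihood < threshold]

# Every surviving sample lies in the tails, i.e. the complement of the normal set.
print(np.all(np.abs(simulated_anomalies - mu) > 2.0 * sigma))
```

These simulated anomalies then let the objective estimate a true positive rate, even though no real anomalous recordings are available during training.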
