Search Results for author: Yasunori Ohishi

Found 21 papers, 13 papers with code

Guided Masked Self-Distillation Modeling for Distributed Multimedia Sensor Event Analysis

no code implementations • 12 Apr 2024 • Masahiro Yasuda, Noboru Harada, Yasunori Ohishi, Shoichiro Saito, Akira Nakayama, Nobutaka Ono

In such an environment, the information obtained from a single sensor is often missing or fragmented; observations from multiple locations and modalities should therefore be integrated to analyze events comprehensively.

Masked Modeling Duo: Towards a Universal Audio Pre-training Framework

2 code implementations • 9 Apr 2024 • Daisuke Niizumi, Daiki Takeuchi, Yasunori Ohishi, Noboru Harada, Kunio Kashino

This study proposes Masked Modeling Duo (M2D), an improved masked prediction SSL, which learns by predicting representations of masked input signals that serve as training signals.

Denoising Self-Supervised Learning

Refining Knowledge Transfer on Audio-Image Temporal Agreement for Audio-Text Cross Retrieval

no code implementations • 16 Mar 2024 • Shunsuke Tsubaki, Daisuke Niizumi, Daiki Takeuchi, Yasunori Ohishi, Noboru Harada, Keisuke Imoto

Experimental results show that the proposed method improves audio-text retrieval performance by selecting the nearest image that aligns with the audio information and transferring the learned knowledge.

Image Retrieval Retrieval +2

Audio Difference Captioning Utilizing Similarity-Discrepancy Disentanglement

1 code implementation • 23 Aug 2023 • Daiki Takeuchi, Yasunori Ohishi, Daisuke Niizumi, Noboru Harada, Kunio Kashino

We proposed Audio Difference Captioning (ADC) as a new extension task of audio captioning for describing the semantic differences between input pairs of similar but slightly different audio clips.

Audio captioning Disentanglement

First-shot anomaly sound detection for machine condition monitoring: A domain generalization baseline

1 code implementation • 1 Mar 2023 • Noboru Harada, Daisuke Niizumi, Yasunori Ohishi, Daiki Takeuchi, Masahiro Yasuda

This paper provides a baseline system for First-shot-compliant unsupervised anomalous sound detection (ASD) for machine condition monitoring.

Domain Generalization Task 2 +1

Masked Modeling Duo: Learning Representations by Encouraging Both Networks to Model the Input

1 code implementation • 26 Oct 2022 • Daisuke Niizumi, Daiki Takeuchi, Yasunori Ohishi, Noboru Harada, Kunio Kashino

We propose a new method, Masked Modeling Duo (M2D), that learns representations directly while obtaining training signals using only masked patches.

Ranked #1 on Speaker Identification on VoxCeleb1 (using extra training data)

Audio Classification Audio Tagging +5
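The M2D idea of two networks modeling complementary parts of the input can be sketched as follows. This is a minimal NumPy illustration, not the paper's implementation: random linear maps stand in for the ViT encoders, and the patch features, mask ratio, and predictor are toy placeholders.

```python
import numpy as np

rng = np.random.default_rng(0)
dim = 16

# Toy stand-ins for the online and target encoders (hypothetical; the
# paper uses Vision Transformer encoders).
w_online = rng.normal(size=(dim, dim))
w_target = w_online.copy()            # target starts as a copy of the online network
predictor = rng.normal(size=(dim, dim))

patches = rng.normal(size=(10, dim))  # toy patch features for one input
perm = rng.permutation(10)
visible, masked = perm[:3], perm[3:]  # mask most of the patches

# Online branch sees only the visible patches; target branch sees only the
# masked patches, so each network models a different part of the input.
z_online = patches[visible] @ w_online
pred = z_online.mean(axis=0) @ predictor                 # predict the masked representation
target_repr = (patches[masked] @ w_target).mean(axis=0)  # training signal

def cos(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

loss = 2 - 2 * cos(pred, target_repr)  # BYOL-style negative-cosine loss

# The target network is updated as an exponential moving average of the online one.
ema = 0.99
w_target = ema * w_target + (1 - ema) * w_online
```

Gradients would flow only through the online branch; the target branch supplies a stop-gradient training signal.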

ConceptBeam: Concept Driven Target Speech Extraction

no code implementations • 25 Jul 2022 • Yasunori Ohishi, Marc Delcroix, Tsubasa Ochiai, Shoko Araki, Daiki Takeuchi, Daisuke Niizumi, Akisato Kimura, Noboru Harada, Kunio Kashino

We use it to bridge modality-dependent information, i.e., the speech segments in the mixture, and the specified, modality-independent concept.

Metric Learning Speech Extraction

Introducing Auxiliary Text Query-modifier to Content-based Audio Retrieval

1 code implementation • 20 Jul 2022 • Daiki Takeuchi, Yasunori Ohishi, Daisuke Niizumi, Noboru Harada, Kunio Kashino

While the range of conventional content-based audio retrieval is limited to audio that is similar to the query audio, the proposed method can adjust the retrieval range by adding an embedding of the auxiliary text query-modifier to the embedding of the query sample audio in a shared latent space.

Retrieval
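The adjustment described above is simple vector arithmetic in the shared latent space. The following is a minimal sketch under stated assumptions: the encoders are replaced by random vectors, and the database, dimensions, and names are illustrative only.

```python
import numpy as np

# Hypothetical pre-computed embeddings in a shared latent space.
# In the paper these come from learned audio/text encoders; random
# vectors are used here purely to illustrate the arithmetic.
rng = np.random.default_rng(0)
dim = 8
database = rng.normal(size=(5, dim))   # embeddings of candidate audio clips
query_audio = rng.normal(size=dim)     # embedding of the query audio sample
text_modifier = rng.normal(size=dim)   # embedding of the auxiliary text query-modifier

def l2_normalize(v):
    return v / np.linalg.norm(v, axis=-1, keepdims=True)

# Core idea: shift the query point by adding the text embedding,
# then retrieve by cosine similarity in the shared space.
shifted_query = l2_normalize(query_audio + text_modifier)
scores = l2_normalize(database) @ shifted_query
best = int(np.argmax(scores))
```

Without the modifier term, the search collapses to ordinary content-based retrieval around the query audio; adding the text embedding moves the search point in the direction the text describes.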

Composing General Audio Representation by Fusing Multilayer Features of a Pre-trained Model

1 code implementation • 17 May 2022 • Daisuke Niizumi, Daiki Takeuchi, Yasunori Ohishi, Noboru Harada, Kunio Kashino

This approach improves the utility of frequency and channel information in downstream processes, and combines the effectiveness of middle and late layer features for different tasks.

Masked Spectrogram Modeling using Masked Autoencoders for Learning General-purpose Audio Representation

1 code implementation • 26 Apr 2022 • Daisuke Niizumi, Daiki Takeuchi, Yasunori Ohishi, Noboru Harada, Kunio Kashino

In this paper, we seek to learn audio representations from the input itself as supervision, using a pretext task of auto-encoding masked spectrogram patches: Masked Spectrogram Modeling (MSM), a variant of Masked Image Modeling applied to audio spectrograms.

Contrastive Learning Self-Supervised Learning
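The patch-masking pretext task can be sketched in a few lines. This is a toy NumPy illustration of the data flow only, not the paper's masked autoencoder: the spectrogram is random, the patch size and mask ratio are illustrative, and a dummy all-zeros "reconstruction" stands in for the decoder output.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy log-mel spectrogram: 64 mel bins x 96 time frames.
spec = rng.normal(size=(64, 96))

# Split into non-overlapping 16x16 patches, as in Masked Image Modeling.
ph, pw = 16, 16
patches = (spec.reshape(64 // ph, ph, 96 // pw, pw)
               .transpose(0, 2, 1, 3)
               .reshape(-1, ph * pw))

# Randomly mask a large fraction of patches; the autoencoder is trained
# to reconstruct the masked patches from the visible ones.
mask_ratio = 0.75
n = patches.shape[0]
n_masked = int(n * mask_ratio)
perm = rng.permutation(n)
masked_idx, visible_idx = perm[:n_masked], perm[n_masked:]

# Pretext-task loss (sketch): MSE between the reconstruction and the
# masked patches. A dummy zero reconstruction stands in for the decoder.
recon = np.zeros_like(patches[masked_idx])
loss = np.mean((recon - patches[masked_idx]) ** 2)
```

In an actual MAE-style setup, only `patches[visible_idx]` would be fed to the encoder, keeping the encoder cheap, while a lightweight decoder reconstructs the masked positions.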

BYOL for Audio: Exploring Pre-trained General-purpose Audio Representations

1 code implementation • 15 Apr 2022 • Daisuke Niizumi, Daiki Takeuchi, Yasunori Ohishi, Noboru Harada, Kunio Kashino

In this study, we hypothesize that representations effective for general audio tasks should provide multiple aspects of robust features of the input sound.

Self-Supervised Learning

Multi-view and Multi-modal Event Detection Utilizing Transformer-based Multi-sensor fusion

1 code implementation • 18 Feb 2022 • Masahiro Yasuda, Yasunori Ohishi, Shoichiro Saito, Noboru Harada

We tackle a challenging task: multi-view and multi-modal event detection, which detects events across a wide real-world environment by utilizing data from distributed cameras and microphones and their weak labels.

Event Detection Sensor Fusion

BYOL for Audio: Self-Supervised Learning for General-Purpose Audio Representation

2 code implementations • 11 Mar 2021 • Daisuke Niizumi, Daiki Takeuchi, Yasunori Ohishi, Noboru Harada, Kunio Kashino

Inspired by the recent progress in self-supervised learning for computer vision that generates supervision using data augmentations, we explore a new general-purpose audio representation learning approach.

Representation Learning Self-Supervised Learning

Effects of Word-frequency based Pre- and Post- Processings for Audio Captioning

no code implementations • 24 Sep 2020 • Daiki Takeuchi, Yuma Koizumi, Yasunori Ohishi, Noboru Harada, Kunio Kashino

The system we used for Task 6 (Automated Audio Captioning) of the Detection and Classification of Acoustic Scenes and Events (DCASE) 2020 Challenge combines three elements, namely, data augmentation, multi-task learning, and post-processing, for audio captioning.

Audio captioning Data Augmentation +1

The NTT DCASE2020 Challenge Task 6 system: Automated Audio Captioning with Keywords and Sentence Length Estimation

no code implementations • 1 Jul 2020 • Yuma Koizumi, Daiki Takeuchi, Yasunori Ohishi, Noboru Harada, Kunio Kashino

This technical report describes the system participating in the Detection and Classification of Acoustic Scenes and Events (DCASE) 2020 Challenge, Task 6: automated audio captioning.

Audio captioning Caption Generation +2

Crossmodal Voice Conversion

no code implementations • 9 Apr 2019 • Hirokazu Kameoka, Kou Tanaka, Aaron Valero Puche, Yasunori Ohishi, Takuhiro Kaneko

We use the latent code of an input face image encoded by the face encoder as the auxiliary input into the speech converter and train the speech converter so that the original latent code can be recovered from the generated speech by the voice encoder.

Voice Conversion
