Search Results for author: Daisuke Niizumi

Found 17 papers, 13 papers with code

Refining Knowledge Transfer on Audio-Image Temporal Agreement for Audio-Text Cross Retrieval

no code implementations · 16 Mar 2024 · Shunsuke Tsubaki, Daisuke Niizumi, Daiki Takeuchi, Yasunori Ohishi, Noboru Harada, Keisuke Imoto

Experimental results show that method (i) improves audio-text retrieval performance by selecting the nearest image that aligns with the audio information and transferring the learned knowledge.

Image Retrieval · Retrieval · +2

Audio Difference Captioning Utilizing Similarity-Discrepancy Disentanglement

1 code implementation · 23 Aug 2023 · Daiki Takeuchi, Yasunori Ohishi, Daisuke Niizumi, Noboru Harada, Kunio Kashino

We proposed Audio Difference Captioning (ADC) as a new extension task of audio captioning for describing the semantic differences between input pairs of similar but slightly different audio clips.

Audio captioning · Disentanglement

First-shot anomaly sound detection for machine condition monitoring: A domain generalization baseline

1 code implementation · 1 Mar 2023 · Noboru Harada, Daisuke Niizumi, Yasunori Ohishi, Daiki Takeuchi, Masahiro Yasuda

This paper provides a baseline system for First-shot-compliant unsupervised anomalous sound detection (ASD) for machine condition monitoring.

Domain Generalization · Task 2 · +1

Masked Modeling Duo: Learning Representations by Encouraging Both Networks to Model the Input

1 code implementation · 26 Oct 2022 · Daisuke Niizumi, Daiki Takeuchi, Yasunori Ohishi, Noboru Harada, Kunio Kashino

We propose a new method, Masked Modeling Duo (M2D), that learns representations directly while obtaining training signals using only masked patches.

Ranked #1 on Speaker Identification on VoxCeleb1 (using extra training data)

Audio Classification · Audio Tagging · +5
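The masking scheme the M2D abstract describes, splitting the input patches into two complementary sets so that each network models only its own part of the input, can be sketched as follows (an illustrative toy, not the paper's implementation; `split_patches` and the mask ratio are assumptions):

```python
import numpy as np

def split_patches(patches, mask_ratio=0.6, rng=None):
    """Split a sequence of patches into visible and masked subsets.

    In an M2D-style setup, one network encodes only the visible
    patches and the other encodes only the masked patches, so both
    networks obtain training signals from (parts of) the input.
    """
    rng = rng or np.random.default_rng(0)
    n = len(patches)
    n_masked = int(n * mask_ratio)
    perm = rng.permutation(n)
    masked_idx, visible_idx = perm[:n_masked], perm[n_masked:]
    return patches[visible_idx], patches[masked_idx]

patches = np.arange(10 * 4).reshape(10, 4)  # 10 patches, feature dim 4
visible, masked = split_patches(patches)
assert len(visible) + len(masked) == 10     # complementary split
assert len(masked) == 6                     # 60% masked
```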

ConceptBeam: Concept Driven Target Speech Extraction

no code implementations · 25 Jul 2022 · Yasunori Ohishi, Marc Delcroix, Tsubasa Ochiai, Shoko Araki, Daiki Takeuchi, Daisuke Niizumi, Akisato Kimura, Noboru Harada, Kunio Kashino

We use it to bridge modality-dependent information, i.e., the speech segments in the mixture, and the specified, modality-independent concept.

Metric Learning · Speech Extraction

Introducing Auxiliary Text Query-modifier to Content-based Audio Retrieval

1 code implementation · 20 Jul 2022 · Daiki Takeuchi, Yasunori Ohishi, Daisuke Niizumi, Noboru Harada, Kunio Kashino

While the range of conventional content-based audio retrieval is limited to audio that is similar to the query audio, the proposed method can adjust the retrieval range by adding an embedding of the auxiliary text query-modifier to the embedding of the query sample audio in a shared latent space.

Retrieval
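The retrieval mechanism described above, adding a text query-modifier embedding to the query-audio embedding in a shared latent space and ranking database audio by similarity to the shifted query, can be sketched as follows (a toy illustration with made-up vectors; `retrieve` and the embeddings are assumptions, not the paper's code):

```python
import numpy as np

def retrieve(query_audio_emb, modifier_text_emb, db_embs, k=2):
    """Rank database audio by cosine similarity to (audio + text)
    in a shared latent space, so the text modifier shifts the
    retrieval range away from audio-only similarity."""
    q = query_audio_emb + modifier_text_emb
    q = q / np.linalg.norm(q)
    db = db_embs / np.linalg.norm(db_embs, axis=1, keepdims=True)
    scores = db @ q                      # cosine similarities
    return np.argsort(-scores)[:k]      # best-first indices

# toy shared latent space (dim 3)
audio_q = np.array([1.0, 0.0, 0.0])
text_mod = np.array([0.0, 1.0, 0.0])    # hypothetical modifier text
db = np.array([[1.0, 0.0, 0.0],         # matches the raw audio query
               [0.7, 0.7, 0.0],         # matches the modified query
               [0.0, 0.0, 1.0]])
top = retrieve(audio_q, text_mod, db)
assert top[0] == 1                      # modified query wins
```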

Description and Discussion on DCASE 2022 Challenge Task 2: Unsupervised Anomalous Sound Detection for Machine Condition Monitoring Applying Domain Generalization Techniques

2 code implementations · 13 Jun 2022 · Kota Dohi, Keisuke Imoto, Noboru Harada, Daisuke Niizumi, Yuma Koizumi, Tomoya Nishida, Harsh Purohit, Takashi Endo, Masaaki Yamamoto, Yohei Kawaguchi

We present the task description and discussion on the results of the DCASE 2022 Challenge Task 2: "Unsupervised anomalous sound detection (ASD) for machine condition monitoring applying domain generalization techniques".

domain classification · Domain Generalization · +1

Composing General Audio Representation by Fusing Multilayer Features of a Pre-trained Model

1 code implementation · 17 May 2022 · Daisuke Niizumi, Daiki Takeuchi, Yasunori Ohishi, Noboru Harada, Kunio Kashino

This approach improves the utility of frequency and channel information in downstream processes, and combines the effectiveness of middle and late layer features for different tasks.
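Fusing multilayer features of a frozen pre-trained model, so that downstream tasks can use both middle- and late-layer information, can be sketched minimally as a concatenation along the feature axis (`fuse_layers` and the shapes are assumptions for illustration; the paper's exact fusion may differ):

```python
import numpy as np

def fuse_layers(layer_features):
    """Fuse per-layer feature maps of a pre-trained model by
    concatenating them along the feature dimension, keeping both
    middle- and late-layer information for downstream tasks."""
    return np.concatenate(layer_features, axis=-1)

# pretend three layers each emit a (time=5, dim=8) feature map
rng = np.random.default_rng(0)
feats = [rng.normal(size=(5, 8)) for _ in range(3)]
fused = fuse_layers(feats)
assert fused.shape == (5, 24)  # features stacked side by side
```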

Masked Spectrogram Modeling using Masked Autoencoders for Learning General-purpose Audio Representation

1 code implementation · 26 Apr 2022 · Daisuke Niizumi, Daiki Takeuchi, Yasunori Ohishi, Noboru Harada, Kunio Kashino

In this paper, we seek to learn audio representations from the input itself as supervision, using a pretext task of auto-encoding masked spectrogram patches: Masked Spectrogram Modeling (MSM, a variant of Masked Image Modeling applied to audio spectrograms).

Contrastive Learning · Self-Supervised Learning
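The pretext objective named above, reconstructing masked spectrogram patches and scoring the reconstruction only where patches were masked, can be sketched as a mean-squared error over the masked positions (a hypothetical sketch; `msm_loss` and the shapes are assumptions, not the paper's implementation):

```python
import numpy as np

def msm_loss(patches, reconstructed, mask):
    """Mean-squared reconstruction error computed only on masked
    spectrogram patches, the usual masked-autoencoding objective."""
    diff = (patches - reconstructed) ** 2
    return diff[mask].mean()

rng = np.random.default_rng(0)
patches = rng.normal(size=(10, 16))               # 10 spectrogram patches
mask = np.zeros(10, dtype=bool)
mask[:6] = True                                   # 60% of patches masked
recon = patches.copy()
recon[mask] += 0.1                                # imperfect reconstruction
loss = msm_loss(patches, recon, mask)
assert abs(loss - 0.01) < 1e-9                    # (0.1)^2 on masked patches
```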

BYOL for Audio: Exploring Pre-trained General-purpose Audio Representations

1 code implementation · 15 Apr 2022 · Daisuke Niizumi, Daiki Takeuchi, Yasunori Ohishi, Noboru Harada, Kunio Kashino

In this study, we hypothesize that representations effective for general audio tasks should provide multiple aspects of robust features of the input sound.

Self-Supervised Learning

BYOL for Audio: Self-Supervised Learning for General-Purpose Audio Representation

2 code implementations · 11 Mar 2021 · Daisuke Niizumi, Daiki Takeuchi, Yasunori Ohishi, Noboru Harada, Kunio Kashino

Inspired by the recent progress in self-supervised learning for computer vision that generates supervision using data augmentations, we explore a new general-purpose audio representation learning approach.

Representation Learning · Self-Supervised Learning

Acoustic Scene Classification: A Competition Review

no code implementations · 2 Aug 2018 · Shayan Gharib, Honain Derrar, Daisuke Niizumi, Tuukka Senttula, Janne Tommola, Toni Heittola, Tuomas Virtanen, Heikki Huttunen

In this paper we study the problem of acoustic scene classification, i.e., categorization of audio sequences into mutually exclusive classes based on their spectral content.

Acoustic Scene Classification · Classification · +2
