Search Results for author: Kunio Kashino

Found 22 papers, 13 papers with code

Masked Modeling Duo: Towards a Universal Audio Pre-training Framework

2 code implementations • 9 Apr 2024 • Daisuke Niizumi, Daiki Takeuchi, Yasunori Ohishi, Noboru Harada, Kunio Kashino

This study proposes Masked Modeling Duo (M2D), an improved masked prediction SSL, which learns by predicting representations of masked input signals that serve as training signals.

Denoising Self-Supervised Learning

152

Paper
Code

Deep Attentive Time Warping

1 code implementation • 13 Sep 2023 • Shinnosuke Matsuo, Xiaomeng Wu, Gantugs Atarsaikhan, Akisato Kimura, Kunio Kashino, Brian Kenji Iwana, Seiichi Uchida

Unlike other learnable models using DTW for warping, our model predicts all local correspondences between two time series and is trained based on metric learning, which enables it to learn the optimal data-dependent warping for the target task.

Dynamic Time Warping Metric Learning +2

Paper
Code

Audio Difference Captioning Utilizing Similarity-Discrepancy Disentanglement

1 code implementation • 23 Aug 2023 • Daiki Takeuchi, Yasunori Ohishi, Daisuke Niizumi, Noboru Harada, Kunio Kashino

We proposed Audio Difference Captioning (ADC) as a new extension task of audio captioning for describing the semantic differences between input pairs of similar but slightly different audio clips.

Audio captioning Disentanglement

Paper
Code

Masked Modeling Duo for Speech: Specializing General-Purpose Audio Representation to Speech using Denoising Distillation

1 code implementation • 23 May 2023 • Daisuke Niizumi, Daiki Takeuchi, Yasunori Ohishi, Noboru Harada, Kunio Kashino

Self-supervised learning general-purpose audio representations have demonstrated high performance in a variety of tasks.

Denoising Knowledge Distillation +1

Paper
Code

Masked Modeling Duo: Learning Representations by Encouraging Both Networks to Model the Input

1 code implementation • 26 Oct 2022 • Daisuke Niizumi, Daiki Takeuchi, Yasunori Ohishi, Noboru Harada, Kunio Kashino

We propose a new method, Masked Modeling Duo (M2D), that learns representations directly while obtaining training signals using only masked patches.

Ranked #1 on Speaker Identification on VoxCeleb1 (using extra training data)

Audio Classification Audio Tagging +5

Paper
Code

Reflectance-Guided, Contrast-Accumulated Histogram Equalization

1 code implementation • 14 Sep 2022 • Xiaomeng Wu, Takahito Kawanishi, Kunio Kashino

Existing image enhancement methods fall short of expectations because with them it is difficult to improve global and local image contrast simultaneously.

Density Estimation Image Enhancement

Paper
Code

Reflectance-Oriented Probabilistic Equalization for Image Enhancement

1 code implementation • 14 Sep 2022 • Xiaomeng Wu, Yongqing Sun, Akisato Kimura, Kunio Kashino

Despite recent advances in image enhancement, it remains difficult for existing approaches to adaptively improve the brightness and contrast for both low-light and normal-light images.

Density Estimation Image Enhancement

Paper
Code

ConceptBeam: Concept Driven Target Speech Extraction

no code implementations • 25 Jul 2022 • Yasunori Ohishi, Marc Delcroix, Tsubasa Ochiai, Shoko Araki, Daiki Takeuchi, Daisuke Niizumi, Akisato Kimura, Noboru Harada, Kunio Kashino

We use it to bridge modality-dependent information, i. e., the speech segments in the mixture, and the specified, modality-independent concept.

Metric Learning Speech Extraction

Paper
Add Code

Introducing Auxiliary Text Query-modifier to Content-based Audio Retrieval

1 code implementation • 20 Jul 2022 • Daiki Takeuchi, Yasunori Ohishi, Daisuke Niizumi, Noboru Harada, Kunio Kashino

While the range of conventional content-based audio retrieval is limited to audio that is similar to the query audio, the proposed method can adjust the retrieval range by adding an embedding of the auxiliary text query-modifier to the embedding of the query sample audio in a shared latent space.

Retrieval

Paper
Code

Composing General Audio Representation by Fusing Multilayer Features of a Pre-trained Model

1 code implementation • 17 May 2022 • Daisuke Niizumi, Daiki Takeuchi, Yasunori Ohishi, Noboru Harada, Kunio Kashino

This approach improves the utility of frequency and channel information in downstream processes, and combines the effectiveness of middle and late layer features for different tasks.

Paper
Code

Masked Spectrogram Modeling using Masked Autoencoders for Learning General-purpose Audio Representation

1 code implementation • 26 Apr 2022 • Daisuke Niizumi, Daiki Takeuchi, Yasunori Ohishi, Noboru Harada, Kunio Kashino

In this paper, we seek to learn audio representations from the input itself as supervision using a pretext task of auto-encoding of masked spectrogram patches, Masked Spectrogram Modeling (MSM, a variant of Masked Image Modeling applied to audio spectrogram).

Contrastive Learning Self-Supervised Learning

Paper
Code

BYOL for Audio: Exploring Pre-trained General-purpose Audio Representations

1 code implementation • 15 Apr 2022 • Daisuke Niizumi, Daiki Takeuchi, Yasunori Ohishi, Noboru Harada, Kunio Kashino

In this study, we hypothesize that representations effective for general audio tasks should provide multiple aspects of robust features of the input sound.

Self-Supervised Learning

199

Paper
Code

Attention to Warp: Deep Metric Learning for Multivariate Time Series

1 code implementation • 28 Mar 2021 • Shinnosuke Matsuo, Xiaomeng Wu, Gantugs Atarsaikhan, Akisato Kimura, Kunio Kashino, Brian Kenji Iwana, Seiichi Uchida

This approach adapts a parameterized attention model to time warping for greater and more adaptive temporal invariance.

Dynamic Time Warping Metric Learning +3

Paper
Code

BYOL for Audio: Self-Supervised Learning for General-Purpose Audio Representation

2 code implementations • 11 Mar 2021 • Daisuke Niizumi, Daiki Takeuchi, Yasunori Ohishi, Noboru Harada, Kunio Kashino

Inspired by the recent progress in self-supervised learning for computer vision that generates supervision using data augmentations, we explore a new general-purpose audio representation learning approach.

Representation Learning Self-Supervised Learning

199

Paper
Code

Effects of Word-frequency based Pre- and Post- Processings for Audio Captioning

no code implementations • 24 Sep 2020 • Daiki Takeuchi, Yuma Koizumi, Yasunori Ohishi, Noboru Harada, Kunio Kashino

The system we used for Task 6 (Automated Audio Captioning)of the Detection and Classification of Acoustic Scenes and Events(DCASE) 2020 Challenge combines three elements, namely, dataaugmentation, multi-task learning, and post-processing, for audiocaptioning.

Audio captioning Data Augmentation +1

Paper
Add Code

The NTT DCASE2020 Challenge Task 6 system: Automated Audio Captioning with Keywords and Sentence Length Estimation

no code implementations • 1 Jul 2020 • Yuma Koizumi, Daiki Takeuchi, Yasunori Ohishi, Noboru Harada, Kunio Kashino

This technical report describes the system participating to the Detection and Classification of Acoustic Scenes and Events (DCASE) 2020 Challenge, Task 6: automated audio captioning.

Ranked #4 on Audio captioning on Clotho

Audio captioning Caption Generation +2

Paper
Add Code

Generative Adversarial Image Synthesis with Decision Tree Latent Controller

no code implementations • CVPR 2018 • Takuhiro Kaneko, Kaoru Hiramatsu, Kunio Kashino

This paper proposes the decision tree latent controller generative adversarial network (DTLC-GAN), an extension of a GAN that can learn hierarchically interpretable representations without relying on detailed supervision.

Generative Adversarial Network Image Generation +3

Paper
Add Code

Knowledge Discovery from Layered Neural Networks based on Non-negative Task Decomposition

no code implementations • 18 May 2018 • Chihiro Watanabe, Kaoru Hiramatsu, Kunio Kashino

Interpretability has become an important issue in the machine learning field, along with the success of layered neural networks in various practical tasks.

Paper
Add Code

Understanding Community Structure in Layered Neural Networks

no code implementations • 13 Apr 2018 • Chihiro Watanabe, Kaoru Hiramatsu, Kunio Kashino

We show experimentally that our proposed method can reveal the role of each part of a layered neural network by applying the neural networks to three types of data sets, extracting communities from the trained network, and applying the proposed method to the community structure.

Paper
Add Code

Generative Attribute Controller With Conditional Filtered Generative Adversarial Networks

no code implementations • CVPR 2017 • Takuhiro Kaneko, Kaoru Hiramatsu, Kunio Kashino

This controller is based on a novel generative model called the conditional filtered generative adversarial network (CFGAN), which is an extension of the conventional conditional GAN (CGAN) that incorporates a filtering architecture into the generator input.

Attribute Generative Adversarial Network +2

Paper
Add Code

Modular Representation of Layered Neural Networks

no code implementations • 1 Mar 2017 • Chihiro Watanabe, Kaoru Hiramatsu, Kunio Kashino

And (3) data analysis: in practical data it reveals the community structure in the input, hidden, and output layers, which serves as a clue for discovering knowledge from a trained neural network.

speech-recognition Speech Recognition

Paper
Add Code

Adaptive Dither Voting for Robust Spatial Verification

no code implementations • ICCV 2015 • Xiaomeng Wu, Kunio Kashino

Hough voting in a geometric transformation space allows us to realize spatial verification, but remains sensitive to feature detection errors because of the inflexible quantization of single feature correspondences.

Quantization Retrieval

Paper
Add Code

Cannot find the paper you are looking for? You can Submit a new open access paper.