Search Results for author: Kazuhito Koishida

Found 9 papers, 2 papers with code

Progressive Knowledge Distillation: Building Ensembles for Efficient Inference

no code implementations · 20 Feb 2023 · Don Kurian Dennis, Abhishek Shetty, Anish Sevekari, Kazuhito Koishida, Virginia Smith

We study the problem of progressive distillation: Given a large, pre-trained teacher model $g$, we seek to decompose the model into an ensemble of smaller, low-inference cost student models $f_i$.
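The abstract describes decomposing a teacher $g$ into an ensemble of cheap students $f_i$. A minimal sketch of that idea, not the paper's actual method: each toy "student" here is a linear fit over a few random Fourier features, trained greedily on the residual the partial ensemble has not yet explained. The teacher function, student form, and all names are hypothetical stand-ins.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical stand-in "teacher" g: a fixed nonlinear scalar function.
def teacher(x):
    return np.sin(3.0 * x) + 0.5 * x

# Each "student" f_i is a cheap model: a least-squares fit over a few
# random Fourier features, trained on the current ensemble's residual.
def fit_student(x, residual, n_feat=4):
    w = rng.normal(0.0, 3.0, n_feat)
    b = rng.uniform(0.0, 2.0 * np.pi, n_feat)
    phi = np.cos(np.outer(x, w) + b)
    coef, *_ = np.linalg.lstsq(phi, residual, rcond=None)
    return (w, b, coef)

def student_predict(student, x):
    w, b, coef = student
    return np.cos(np.outer(x, w) + b) @ coef

x = rng.uniform(-1.0, 1.0, 200)
target = teacher(x)

# Greedy residual fitting: student i is trained on what the partial
# ensemble f_1..f_{i-1} still gets wrong, so early students alone
# already give a coarse approximation (useful for anytime inference).
students, errors = [], []
residual = target.copy()
for _ in range(4):
    students.append(fit_student(x, residual))
    residual = residual - student_predict(students[-1], x)
    errors.append(np.mean(residual ** 2))

print(errors)  # mean squared error after each added student
```

Because each student is fit by least squares on the residual, the ensemble's error is non-increasing as students are added, so truncating the ensemble after any prefix still yields a usable low-cost approximation of the teacher.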

Knowledge Distillation

Training Robust Zero-Shot Voice Conversion Models with Self-supervised Features

no code implementations · 8 Dec 2021 · Trung Dang, Dung Tran, Peter Chin, Kazuhito Koishida

Unsupervised Zero-Shot Voice Conversion (VC) aims to modify the speaker characteristic of an utterance to match an unseen target speaker without relying on parallel training data.

Self-Supervised Learning · Voice Conversion

Interspeech 2021 Deep Noise Suppression Challenge

2 code implementations · 6 Jan 2021 · Chandan K A Reddy, Harishchandra Dubey, Kazuhito Koishida, Arun Nair, Vishak Gopal, Ross Cutler, Sebastian Braun, Hannes Gamper, Robert Aichner, Sriram Srinivasan

In this version of the challenge, organized at INTERSPEECH 2021, we are expanding both our training and test datasets to accommodate full-band scenarios.

Neuro-Symbolic Visual Reasoning: Disentangling "Visual" from "Reasoning"

no code implementations · ICML 2020 · Saeed Amizadeh, Hamid Palangi, Oleksandr Polozov, Yichen Huang, Kazuhito Koishida

To address this, we propose (1) a framework to isolate and evaluate the reasoning aspect of VQA separately from its perception, and (2) a novel top-down calibration technique that allows the model to answer reasoning questions even with imperfect perception.

Graph Generation · Question Answering +5

MMTM: Multimodal Transfer Module for CNN Fusion

1 code implementation · CVPR 2020 · Hamid Reza Vaezi Joze, Amirreza Shaban, Michael L. Iuzzolino, Kazuhito Koishida

In late fusion, each modality is processed in a separate unimodal Convolutional Neural Network (CNN) stream and the scores of each modality are fused at the end.
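The snippet describes the late-fusion baseline MMTM improves on: fusing only final scores discards intermediate cross-modal information. A rough numpy sketch of the transfer-module idea, squeezing each stream's feature map to a channel descriptor, mixing the descriptors through a shared bottleneck, and gating each stream's channels; the shapes and random weight matrices are hypothetical stand-ins for learned parameters, not the paper's implementation.

```python
import numpy as np

rng = np.random.default_rng(1)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Toy intermediate feature maps from two unimodal CNN streams:
# a visual stream (channels, H, W) and an audio stream (channels, T).
vis = rng.normal(size=(8, 4, 4))
aud = rng.normal(size=(6, 10))

# Squeeze: global average pool each stream to a channel descriptor.
s_vis = vis.mean(axis=(1, 2))           # (8,)
s_aud = aud.mean(axis=1)                # (6,)
joint = np.concatenate([s_vis, s_aud])  # (14,) shared descriptor

# Shared bottleneck, then per-modality excitation gates
# (random matrices stand in for learned weights).
W_joint = rng.normal(scale=0.1, size=(7, 14))
z = np.tanh(W_joint @ joint)
W_vis = rng.normal(scale=0.1, size=(8, 7))
W_aud = rng.normal(scale=0.1, size=(6, 7))
gate_vis = 2.0 * sigmoid(W_vis @ z)     # (8,) channel-wise gates
gate_aud = 2.0 * sigmoid(W_aud @ z)     # (6,)

# Excite: each stream's channels are recalibrated using information
# from BOTH modalities, unlike score-level late fusion.
vis_out = vis * gate_vis[:, None, None]
aud_out = aud * gate_aud[:, None]
```

The point of the sketch is the information flow: because the gates are computed from the concatenated joint descriptor, each stream's intermediate features are modulated by the other modality well before any scores are produced.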

Action Recognition In Videos · Hand Gesture Recognition +3

Sound Event Detection in Multichannel Audio using Convolutional Time-Frequency-Channel Squeeze and Excitation

no code implementations · 4 Aug 2019 · Wei Xia, Kazuhito Koishida

In this study, we introduce a convolutional time-frequency-channel "Squeeze and Excitation" (tfc-SE) module to explicitly model inter-dependencies between the time-frequency domain and multiple channels.
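A minimal numpy sketch of the squeeze-and-excitation pattern the abstract names, applied to a multichannel spectrogram: squeeze the time-frequency plane to one value per channel, pass it through a bottleneck, and rescale each channel. The tensor shapes and random matrices are illustrative stand-ins for learned parameters, not the paper's tfc-SE module.

```python
import numpy as np

rng = np.random.default_rng(2)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Toy multichannel spectrogram: (channels, freq bins, time frames).
x = rng.normal(size=(4, 64, 100))
C = x.shape[0]

# Squeeze: average over the time-frequency plane -> one scalar per channel.
s = x.mean(axis=(1, 2))                        # (C,)

# Excitation: bottleneck MLP + sigmoid yields per-channel weights
# (random matrices stand in for learned parameters).
r = 2                                          # reduction ratio
W1 = rng.normal(scale=0.1, size=(C // r, C))
W2 = rng.normal(scale=0.1, size=(C, C // r))
w = sigmoid(W2 @ np.maximum(W1 @ s, 0.0))      # (C,), each in (0, 1)

# Scale: reweight each channel's entire time-frequency map, letting
# the model emphasize informative microphone channels.
y = x * w[:, None, None]
```

The recalibration is per channel but conditioned on a global summary, which is how such a module models inter-dependencies between the time-frequency content and the microphone channels without changing the feature map's shape.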

Event Detection · Sound Event Detection
