no code implementations • 20 Feb 2023 • Don Kurian Dennis, Abhishek Shetty, Anish Sevekari, Kazuhito Koishida, Virginia Smith
We study the problem of progressive distillation: given a large, pre-trained teacher model $g$, we seek to decompose the model into an ensemble of smaller student models $f_i$ with low inference cost.
no code implementations • 26 Oct 2022 • Vasily Zadorozhnyy, Qiang Ye, Kazuhito Koishida
In recent years, Generative Adversarial Networks (GANs) have produced significantly improved results in speech enhancement (SE) tasks.
Ranked #1 on Speech Enhancement on VoiceBank + DEMAND
no code implementations • 21 Dec 2021 • Melikasadat Emami, Dung Tran, Kazuhito Koishida
Improving generalization is a major challenge in audio classification due to labeled data scarcity.
no code implementations • 9 Dec 2021 • Bahareh Tolooshams, Kazuhito Koishida
Deep learning-based speech enhancement has shown unprecedented performance in recent years.
no code implementations • 8 Dec 2021 • Trung Dang, Dung Tran, Peter Chin, Kazuhito Koishida
Unsupervised Zero-Shot Voice Conversion (VC) aims to modify the speaker characteristic of an utterance to match an unseen target speaker without relying on parallel training data.
2 code implementations • 6 Jan 2021 • Chandan K A Reddy, Harishchandra Dubey, Kazuhito Koishida, Arun Nair, Vishak Gopal, Ross Cutler, Sebastian Braun, Hannes Gamper, Robert Aichner, Sriram Srinivasan
In this version of the challenge organized at INTERSPEECH 2021, we are expanding both our training and test datasets to accommodate full band scenarios.
no code implementations • ICML 2020 • Saeed Amizadeh, Hamid Palangi, Oleksandr Polozov, Yichen Huang, Kazuhito Koishida
To address this, we propose (1) a framework to isolate and evaluate the reasoning aspect of VQA separately from its perception, and (2) a novel top-down calibration technique that allows the model to answer reasoning questions even with imperfect perception.
1 code implementation • CVPR 2020 • Hamid Reza Vaezi Joze, Amirreza Shaban, Michael L. Iuzzolino, Kazuhito Koishida
In late fusion, each modality is processed in a separate unimodal Convolutional Neural Network (CNN) stream and the scores of each modality are fused at the end.
Ranked #2 on Hand Gesture Recognition on NVGesture
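The late-fusion baseline described in this entry can be illustrated with a minimal sketch. It assumes each unimodal stream has already produced a vector of class logits, and it fuses them with a weighted average of softmax scores. The function names, the example logits, and the averaging rule are illustrative assumptions, not the paper's proposed method or code.

```python
import math


def softmax(logits):
    """Convert one stream's logits into class probabilities."""
    peak = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def late_fusion(per_modality_logits, weights=None):
    """Fuse per-modality class scores at the end of the pipeline.

    per_modality_logits: list of logit vectors, one per unimodal stream
    (hypothetical inputs; in practice they would come from separate CNNs).
    weights: optional per-modality weights; defaults to a uniform average.
    """
    n = len(per_modality_logits)
    if weights is None:
        weights = [1.0] * n
    norm = sum(weights)
    scores = [softmax(logits) for logits in per_modality_logits]
    num_classes = len(scores[0])
    return [
        sum((w / norm) * s[c] for w, s in zip(weights, scores))
        for c in range(num_classes)
    ]


# Made-up logits for two streams (e.g. RGB and depth) over three classes.
rgb_logits = [2.0, 0.5, 0.1]
depth_logits = [0.2, 1.5, 0.3]
fused = late_fusion([rgb_logits, depth_logits])
predicted_class = max(range(len(fused)), key=fused.__getitem__)
```

Because the streams only interact through their final scores, each unimodal network can be trained and swapped independently; the trade-off is that no intermediate features are shared between modalities.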
no code implementations • 4 Aug 2019 • Wei Xia, Kazuhito Koishida
In this study, we introduce a convolutional time-frequency-channel "Squeeze and Excitation" (tfc-SE) module to explicitly model inter-dependencies between the time-frequency domain and multiple channels.