no code implementations • 6 Sep 2023 • Hyungseob Lim, Kyungguen Byun, Sunkuk Moon, Erik Visser
Finally, content information extracted from the source speech and content-dependent target style embeddings are fed into a diffusion-based decoder to generate the converted speech mel-spectrogram.
no code implementations • 6 Sep 2023 • Arvind Krishna Sridhar, Yinyi Guo, Erik Visser, Rehana Mahfuz
Then, we propose a parameter-efficient, inference-time faithful decoding algorithm that enables smaller audio captioning models to achieve performance equivalent to larger models trained with more data.
no code implementations • 6 Dec 2022 • Arvind Krishna Sridhar, Erik Visser
In this paper, we investigate the use of the Natural Language Inference (NLI) entailment metric to detect and prevent hallucinations in summary generation.
no code implementations • 29 Oct 2022 • Mine Kerpicci, Van Nguyen, Shuhua Zhang, Erik Visser
Model architectures such as wav2vec 2.0 and HuBERT have been proposed to learn speech representations from audio waveforms in a self-supervised manner.
no code implementations • 9 Sep 2022 • Ravi Choudhary, Arvind Krishna Sridhar, Erik Visser
Depending on the context and the type of question asked, a question answering (QA) system would need to automatically determine whether the answer covers single-span or multi-span text components.
no code implementations • 3 Oct 2021 • Shehzeen Hussain, Van Nguyen, Shuhua Zhang, Erik Visser
Finally, we extend our framework to perform multi-task learning by jointly optimizing the network parameters on multiple voice-activated tasks using a shared transformer backbone.
Ranked #6 on Speaker Verification on VoxCeleb
no code implementations • 26 Mar 2020 • Eunjeong Koh, Fatemeh Saki, Yinyi Guo, Cheng-Yu Hung, Erik Visser
The neural adapter layer helps the target model learn new sound events with minimal training data while maintaining performance on previously learned sound events at a level similar to the source model.