1 code implementation • 11 Mar 2023 • Ge Zhu, Yujia Yan, Juan-Pablo Caceres, Zhiyao Duan
Non-linguistic filler words, such as "uh" or "um", are prevalent in spontaneous speech and serve as indicators of hesitation or uncertainty.
Automatic Speech Recognition (ASR)
1 code implementation • 18 Jan 2023 • Ge Zhu, Jinbao Li, Yahong Guo
In the OS branch, we first aggregate multi-level features to adaptively select complementary components, and then feed the saliency features with expanded boundaries into the aggregated features to guide the network to obtain a complete prediction.
1 code implementation • 19 Apr 2022 • Ge Zhu, Jordan Darefsky, Fei Jiang, Anton Selitskiy, Zhiyao Duan
Fully-supervised models for source separation are trained on parallel mixture-source data and are currently state-of-the-art.
1 code implementation • 28 Mar 2022 • Ge Zhu, Juan-Pablo Caceres, Justin Salamon
In this work, we present a novel speech dataset, PodcastFillers, with 35K annotated filler words and 50K annotations of other sounds that commonly occur in podcasts such as breaths, laughter, and word repetitions.
1 code implementation • 10 Feb 2022 • You Zhang, Ge Zhu, Zhiyao Duan
We further propose fusion strategies for direct inference and fine-tuning to predict the SASV score based on the framework.
1 code implementation • 8 Oct 2021 • Ge Zhu, Frank Cwitkowitz, Zhiyao Duan
In this paper, we conduct a cross-dataset study on parametric and non-parametric raw-waveform based speaker embeddings through speaker verification experiments.
2 code implementations • 26 Jul 2021 • Xinhui Chen, You Zhang, Ge Zhu, Zhiyao Duan
Unlike previous ASVspoof challenges, this year's LA task presents codec and transmission channel variability, while the new DF task presents general audio compression.
3 code implementations • 3 Apr 2021 • You Zhang, Ge Zhu, Fei Jiang, Zhiyao Duan
Spoofing countermeasure (CM) systems are critical in speaker verification; they aim to discern spoofing attacks from bona fide speech trials.
1 code implementation • 24 Oct 2020 • Ge Zhu, Fei Jiang, Zhiyao Duan
State-of-the-art text-independent speaker verification systems typically use cepstral features or filter bank energies as speech features.