2 code implementations • 7 Jan 2024 • Quan Wang, Yiling Huang, Guanlong Zhao, Evan Clark, Wei Xia, Hank Liao
In this paper, we introduce DiarizationLM, a framework to leverage large language models (LLM) to post-process the outputs from a speaker diarization system.
Automatic Speech Recognition (ASR)
no code implementations • 6 Nov 2023 • Beltrán Labrador, Pai Zhu, Guanlong Zhao, Angelo Scorza Scarpati, Quan Wang, Alicia Lozano-Diez, Alex Park, Ignacio López Moreno
Keyword spotting systems often struggle to generalize to a diverse population with various accents and age groups.
no code implementations • 15 Sep 2023 • Yiling Huang, Weiran Wang, Guanlong Zhao, Hank Liao, Wei Xia, Quan Wang
Whether it is the conventional modularized approach or the more recent end-to-end neural diarization (EEND), an additional automatic speech recognition (ASR) model and an orchestration algorithm are required to associate the speaker labels with recognized words.
Automatic Speech Recognition (ASR)
no code implementations • 14 Sep 2023 • Guanlong Zhao, Yongqiang Wang, Jason Pelecanos, Yu Zhang, Hank Liao, Yiling Huang, Han Lu, Quan Wang
We show that the USM-SCD model can achieve more than 75% average speaker change detection F1 score across a test set that consists of data from 96 languages.
no code implementations • 11 Nov 2022 • Guanlong Zhao, Quan Wang, Han Lu, Yiling Huang, Ignacio Lopez Moreno
Due to the sparsity of the speaker changes in the training data, the conventional T-T based SCD model loss leads to sub-optimal detection accuracy.
no code implementations • 11 Nov 2022 • Beltrán Labrador, Guanlong Zhao, Ignacio López Moreno, Angelo Scorza Scarpati, Liam Fowl, Quan Wang
In this paper, we present a novel approach to adapt a sequence-to-sequence Transformer-Transducer ASR system to the keyword spotting (KWS) task.
1 code implementation • 25 Oct 2022 • Quan Wang, Yiling Huang, Han Lu, Guanlong Zhao, Ignacio Lopez Moreno
While recent research advances in speaker diarization mostly focus on improving the quality of diarization results, there is also an increasing interest in improving the efficiency of diarization systems.
no code implementations • 13 Aug 2020 • Arindrima Datta, Guanlong Zhao, Bhuvana Ramabhadran, Eugene Weinstein
Automated speech recognition coverage of the world's languages continues to expand.
1 code implementation • 30 Jun 2018 • Yu Liu, Guanlong Zhao, Boyuan Gong, Yang Li, Ritu Raj, Niraj Goel, Satya Kesav, Sandeep Gottimukkala, Zhangyang Wang, Wenqi Ren, DaCheng Tao
Here we explore two related but important tasks based on the recently released REalistic Single Image DEhazing (RESIDE) benchmark dataset: (i) single image dehazing as a low-level image restoration problem; and (ii) high-level visual understanding (e.g., object detection) of hazy images.
1 code implementation • 8 May 2018 • Yu Liu, Guanlong Zhao
In this work, we investigate the possibility of replacing the $\ell_2$ loss with perceptually derived loss functions (SSIM, MS-SSIM, etc.)
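To make the contrast in the last entry concrete, here is a minimal sketch of an $\ell_2$ loss next to an SSIM-derived loss. It is an illustration under stated assumptions, not the authors' implementation: images are flat lists of intensities in [0, 1], SSIM is computed once over the whole image rather than over local windows as is usual in practice, and the function names `l2_loss`, `global_ssim`, and `ssim_loss` are hypothetical. The constants 0.01 and 0.03 are the conventional SSIM stabilizers.

```python
from statistics import fmean


def l2_loss(x, y):
    """Mean squared error between two equally sized flat images."""
    return fmean([(a - b) ** 2 for a, b in zip(x, y)])


def global_ssim(x, y, dynamic_range=1.0):
    """SSIM computed over the whole image as a single window.

    Assumption: practical SSIM averages this statistic over local
    (often Gaussian-weighted) windows; one global window keeps the
    sketch short.
    """
    c1 = (0.01 * dynamic_range) ** 2  # stabilizes the luminance term
    c2 = (0.03 * dynamic_range) ** 2  # stabilizes the contrast/structure term
    mu_x, mu_y = fmean(x), fmean(y)
    var_x = fmean([(a - mu_x) ** 2 for a in x])
    var_y = fmean([(b - mu_y) ** 2 for b in y])
    cov = fmean([(a - mu_x) * (b - mu_y) for a, b in zip(x, y)])
    numerator = (2 * mu_x * mu_y + c1) * (2 * cov + c2)
    denominator = (mu_x ** 2 + mu_y ** 2 + c1) * (var_x + var_y + c2)
    return numerator / denominator


def ssim_loss(x, y, dynamic_range=1.0):
    """Perceptually derived loss: zero when the images are identical."""
    return 1.0 - global_ssim(x, y, dynamic_range)


# Hypothetical pixel values, for illustration only.
clean = [0.1, 0.5, 0.9, 0.3]
hazy = [0.4, 0.6, 0.8, 0.5]
print(l2_loss(clean, hazy), ssim_loss(clean, hazy))
```

Both losses vanish for identical inputs. The SSIM-based loss penalizes changes in local mean, contrast, and structure rather than raw per-pixel differences.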