no code implementations • 13 Mar 2024 • Jiafu Chen, Wei Xing, Jiakai Sun, Tianyi Chu, Yiling Huang, Boyan Ji, Lei Zhao, Huaizhong Lin, Haibo Chen, Zhizhong Wang
3D scene stylization refers to transform the appearance of a 3D scene to match a given style image, ensuring that images rendered from different viewpoints exhibit the same style as the given style image, while maintaining the 3D consistency of the stylized scene.
2 code implementations • 7 Jan 2024 • Quan Wang, Yiling Huang, Guanlong Zhao, Evan Clark, Wei Xia, Hank Liao
In this paper, we introduce DiarizationLM, a framework to leverage large language models (LLM) to post-process the outputs from a speaker diarization system.
Automatic Speech Recognition Automatic Speech Recognition (ASR) +3
1 code implementation • 11 Dec 2023 • Zhanjie Zhang, Quanwei Zhang, Guangyuan Li, Wei Xing, Lei Zhao, Jiakai Sun, Zehua Lan, Junsheng Luan, Yiling Huang, Huaizhong Lin
To address the above issues, we propose ArtBank, a novel artistic style transfer framework, to generate highly realistic stylized images while preserving the content structure of the content images.
no code implementations • 15 Sep 2023 • Yiling Huang, Weiran Wang, Guanlong Zhao, Hank Liao, Wei Xia, Quan Wang
Whether it is the conventional modularized approach or the more recent end-to-end neural diarization (EEND), an additional automatic speech recognition (ASR) model and an orchestration algorithm are required to associate the speaker labels with recognized words.
Automatic Speech Recognition Automatic Speech Recognition (ASR) +4
no code implementations • 14 Sep 2023 • Guanlong Zhao, Yongqiang Wang, Jason Pelecanos, Yu Zhang, Hank Liao, Yiling Huang, Han Lu, Quan Wang
We show that the USM-SCD model can achieve more than 75% average speaker change detection F1 score across a test set that consists of data from 96 languages.
no code implementations • 24 Jun 2023 • Yiling Huang, Sarah Pirenne, Snigdha Panigrahi, Gerda Claeskens
Selective inference methods are developed for group lasso estimators for use with a wide class of distributions and loss functions.
no code implementations • 11 Nov 2022 • Guanlong Zhao, Quan Wang, Han Lu, Yiling Huang, Ignacio Lopez Moreno
Due to the sparsity of the speaker changes in the training data, the conventional T-T based SCD model loss leads to sub-optimal detection accuracy.
1 code implementation • 25 Oct 2022 • Quan Wang, Yiling Huang, Han Lu, Guanlong Zhao, Ignacio Lopez Moreno
While recent research advances in speaker diarization mostly focus on improving the quality of diarization results, there is also an increasing interest in improving the efficiency of diarization systems.
1 code implementation • 10 Mar 2022 • Jason Pelecanos, Quan Wang, Yiling Huang, Ignacio Lopez Moreno
This paper presents a novel study of parameter-free attentive scoring for speaker verification.
1 code implementation • 24 Feb 2022 • Quan Wang, Yang Yu, Jason Pelecanos, Yiling Huang, Ignacio Lopez Moreno
In this paper, we introduce a novel language identification system based on conformer layers.
1 code implementation • 23 Sep 2021 • Wei Xia, Han Lu, Quan Wang, Anshuman Tripathi, Yiling Huang, Ignacio Lopez Moreno, Hasim Sak
In this paper, we present a novel speaker diarization system for streaming on-device applications.
no code implementations • 24 Nov 2020 • Yiling Huang, Yutian Chen, Jason Pelecanos, Quan Wang
In recent years, Text-To-Speech (TTS) has been used as a data augmentation technique for speech recognition to help complement inadequacies in the training data.