no code implementations • 10 Jan 2025 • Dabing Cheng, Haosen Zhan, Xingchen Zhao, Guisheng Liu, Zemin Li, Jinghui Xie, Zhao Song, Weiguo Feng, Bingyue Peng
The exponential growth of short-video content has created a growing need for efficient, automated video editing solutions, with challenges arising from the need to understand videos and tailor the editing to user requirements.
1 code implementation • 12 Dec 2024 • Chunyu Li, Chao Zhang, Weikai Xu, Jinghui Xie, Weiguo Feng, Bingyue Peng, Weiwei Xing
Since we did not change the overall training framework of SyncNet, our experience can also be applied to other lip sync and audio-driven portrait animation methods that utilize SyncNet.
no code implementations • 10 May 2021 • Pengwei Wang, Xin Ye, Xiaohuan Zhou, Jinghui Xie, Hao Wang
In contrast to conventional pipeline Spoken Language Understanding (SLU), which consists of automatic speech recognition (ASR) followed by natural language understanding (NLU), end-to-end SLU infers the semantic meaning directly from speech and avoids the error propagation caused by ASR.
Automatic Speech Recognition (ASR) +8
no code implementations • 24 Sep 2019 • Pengwei Wang, Liang-Chen Wei, Yong Cao, Jinghui Xie, Yuji Cao, Zaiqing Nie
End-to-end Spoken Language Understanding (SLU) is proposed to infer the semantic meaning directly from audio features without intermediate text representation.
Automatic Speech Recognition (ASR) +5
no code implementations • EMNLP 2018 • Yimeng Zhuang, Jinghui Xie, Yinhe Zheng, Xuan Zhu
Most models for learning word embeddings are trained on the context information of words, or more precisely, on first-order co-occurrence relations.