Search Results for author: Jinghui Xie

Found 5 papers, 1 paper with code

Text-to-Edit: Controllable End-to-End Video Ad Creation via Multimodal LLMs

no code implementations • 10 Jan 2025 • Dabing Cheng, Haosen Zhan, Xingchen Zhao, Guisheng Liu, Zemin Li, Jinghui Xie, Zhao Song, Weiguo Feng, Bingyue Peng

The exponential growth of short-video content has created a surge in demand for efficient, automated video-editing solutions, with challenges arising from the need to understand videos and tailor the editing to user requirements.

Video Editing

LatentSync: Audio Conditioned Latent Diffusion Models for Lip Sync

1 code implementation • 12 Dec 2024 • Chunyu Li, Chao Zhang, Weikai Xu, Jinghui Xie, Weiguo Feng, Bingyue Peng, Weiwei Xing

Since we did not change the overall training framework of SyncNet, our experience can also be applied to other lip sync and audio-driven portrait animation methods that utilize SyncNet.

Portrait Animation

Speech2Slot: An End-to-End Knowledge-based Slot Filling from Speech

no code implementations • 10 May 2021 • Pengwei Wang, Xin Ye, Xiaohuan Zhou, Jinghui Xie, Hao Wang

In contrast to conventional pipeline Spoken Language Understanding (SLU) which consists of automatic speech recognition (ASR) and natural language understanding (NLU), end-to-end SLU infers the semantic meaning directly from speech and overcomes the error propagation caused by ASR.

Automatic Speech Recognition, Automatic Speech Recognition (ASR) +8
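
To make the pipeline-versus-end-to-end contrast from the Speech2Slot abstract concrete, here is a minimal sketch using toy stand-in functions (not the Speech2Slot model or any real ASR/NLU system): the pipeline route passes through an intermediate transcript, so any ASR error propagates into the slots, while the end-to-end route maps audio features directly to semantics.

    # Minimal sketch of pipeline vs. end-to-end SLU.
    # All functions are toy stand-ins for illustration, not real models.

    def asr(audio_features):
        """Pipeline step 1: transcribe speech to text (stand-in)."""
        return "play hotel california by eagles"

    def nlu(transcript):
        """Pipeline step 2: extract intent and slots from the transcript (stand-in).
        Any ASR error in `transcript` propagates into these slots."""
        words = transcript.split()
        return {"intent": "play_music",
                "slots": {"song": " ".join(words[1:-2]), "artist": words[-1]}}

    def end_to_end_slu(audio_features):
        """End-to-end: map audio features directly to semantics,
        with no intermediate transcript to introduce ASR errors (stand-in)."""
        return {"intent": "play_music",
                "slots": {"song": "hotel california", "artist": "eagles"}}

    audio_features = [0.0] * 80  # placeholder acoustic feature vector

    print(nlu(asr(audio_features)))        # pipeline SLU: audio -> ASR -> NLU
    print(end_to_end_slu(audio_features))  # end-to-end SLU: audio -> semantics
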

Understanding Semantics from Speech Through Pre-training

no code implementations • 24 Sep 2019 • Pengwei Wang, Liang-Chen Wei, Yong Cao, Jinghui Xie, Yuji Cao, Zaiqing Nie

End-to-end Spoken Language Understanding (SLU) is proposed to infer the semantic meaning directly from audio features without intermediate text representation.

Automatic Speech Recognition, Automatic Speech Recognition (ASR) +5

Quantifying Context Overlap for Training Word Embeddings

no code implementations • EMNLP 2018 • Yimeng Zhuang, Jinghui Xie, Yinhe Zheng, Xuan Zhu

Most models for learning word embeddings are trained on the context information of words, more precisely on first-order co-occurrence relations.

Dimensionality Reduction, Language Modeling +4
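
To make "first-order co-occurrence relations" concrete, the toy sketch below counts how often word pairs appear within a fixed context window of each other on a two-sentence corpus; it illustrates the general notion only and is not the paper's method.

    # Toy illustration of first-order co-occurrence counts: how often two words
    # appear within a fixed context window of each other. Not the paper's method.
    from collections import Counter

    corpus = ["the cat sat on the mat", "the dog sat on the rug"]
    window = 2  # symmetric context window size

    cooc = Counter()
    for sentence in corpus:
        tokens = sentence.split()
        for i, word in enumerate(tokens):
            for j in range(max(0, i - window), min(len(tokens), i + window + 1)):
                if j != i:
                    cooc[(word, tokens[j])] += 1

    print(cooc[("sat", "on")])  # 2: "on" occurs within 2 tokens of "sat" in both sentences
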
