1 code implementation • 21 Jul 2024 • Yiyang Jiang, WengYu Zhang, Xulu Zhang, XiaoYong Wei, Chang Wen Chen, Qing Li
Through a feasibility study, we demonstrate that LLM encoders effectively refine inter-concept relations in multimodal embeddings, even without being trained on textual embeddings.
Ranked #4 on Natural Language Moment Retrieval on TACoS
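The claim above — that an LLM encoder can refine inter-concept relations in multimodal embeddings even without training on textual embeddings — can be pictured as a frozen self-attention pass over concatenated modality tokens. The following is an illustrative sketch only, not the paper's implementation; all names, dimensions, and weights are hypothetical stand-ins for a pretrained encoder layer.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax over the given axis
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def refine(embeddings, Wq, Wk, Wv):
    """One (hypothetical) frozen self-attention pass: each concept
    embedding is updated as an attention-weighted mix of all the
    others, which is what 'refining inter-concept relations' amounts
    to at the mechanism level."""
    q, k, v = embeddings @ Wq, embeddings @ Wk, embeddings @ Wv
    attn = softmax(q @ k.T / np.sqrt(k.shape[-1]))
    return embeddings + attn @ v  # residual connection, as in a transformer block

rng = np.random.default_rng(0)
d = 8  # toy embedding width; real LLM hidden sizes are far larger
# Stand-in for concatenated multimodal tokens (e.g. video-clip and
# query-word embeddings projected into a shared space)
tokens = rng.normal(size=(5, d))
Wq, Wk, Wv = (rng.normal(scale=d**-0.5, size=(d, d)) for _ in range(3))
refined = refine(tokens, Wq, Wk, Wv)
print(refined.shape)  # (5, 8): same shape, relation-aware embeddings
```

The key design point the sketch mirrors is that the attention weights are computed from the embeddings themselves, so no text-specific training is required for the pass to mix information across concepts.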
1 code implementation • 20 Aug 2023 • Zihan Zhao, Yiyang Jiang, Heyang Liu, Yanfeng Wang, Yu Wang
While Large Language Models (LLMs) have demonstrated strong performance across many domains and tasks, existing LLMs still fall short on multimodal functionality. This is especially true for Spoken Question Answering (SQA), which requires precise alignment and deep interaction between speech and text features.