1 code implementation • 20 Aug 2023 • Zihan Zhao, Yiyang Jiang, Heyang Liu, Yanfeng Wang, Yu Wang
While Large Language Models (LLMs) have demonstrated commendable performance across a myriad of domains and tasks, existing LLMs still exhibit a palpable deficit in handling multimodal functionalities, especially for the Spoken Question Answering (SQA) task which necessitates precise alignment and deep interaction between speech and text features.