no code implementations • 11 Jun 2024 • Kai Wang, Shijian Deng, Jing Shi, Dimitrios Hatzinakos, Yapeng Tian
This shared backbone facilitates both audio and video generation.
no code implementations • 22 Mar 2024 • Shijian Deng, Erin E. Kosloski, Siddhi Patel, Zeke A. Barnett, Yiyang Nan, Alexander Kaplan, Sisira Aarukapalli, William T. Doan, Matthew Wang, Harsh Singh, Pamela R. Rollins, Yapeng Tian
To pave the way for further research on this new problem, we intensively explored leveraging foundation models and multimodal large language models across different modalities.
no code implementations • 18 Oct 2023 • Yiyang Su, Ali Vosoughi, Shijian Deng, Yapeng Tian, Chenliang Xu
The audio-visual sound separation field assumes visible sources in videos, but this excludes invisible sounds beyond the camera's view.
no code implementations • 31 May 2023 • Ali Vosoughi, Shijian Deng, Songyang Zhang, Yapeng Tian, Chenliang Xu, Jiebo Luo
In this paper, we first model a confounding effect that causes language and vision bias simultaneously, then propose a counterfactual inference to remove the influence of this effect.