1 code implementation • 27 Feb 2024 • Nguyen Nguyen, Jing Bi, Ali Vosoughi, Yapeng Tian, Pooyan Fazli, Chenliang Xu
To address these challenges, in this paper, we introduce the Object State Captioning and State Change Representation (OSCaR) dataset and benchmark.
1 code implementation • 10 Jan 2024 • Ali Vosoughi, Luca Bondi, Ho-Hsiang Wu, Chenliang Xu
Conventional audio classification relied on predefined classes, lacking the ability to learn from free-form text.
1 code implementation • 29 Dec 2023 • Yunlong Tang, Jing Bi, Siting Xu, Luchuan Song, Susan Liang, Teng Wang, Daoan Zhang, Jie An, Jingyang Lin, Rongyi Zhu, Ali Vosoughi, Chao Huang, Zeliang Zhang, Pinxin Liu, Mingqian Feng, Feng Zheng, JianGuo Zhang, Ping Luo, Jiebo Luo, Chenliang Xu
With the burgeoning growth of online video platforms and the escalating volume of video content, the demand for proficient video understanding tools has intensified markedly.
no code implementations • 18 Oct 2023 • Yiyang Su, Ali Vosoughi, Shijian Deng, Yapeng Tian, Chenliang Xu
The audio-visual sound separation field assumes visible sources in videos, but this excludes invisible sounds beyond the camera's view.
1 code implementation • 18 Oct 2023 • Jing Bi, Nguyen Manh Nguyen, Ali Vosoughi, Chenliang Xu
Augmented reality (AR) requires the seamless integration of visual, auditory, and linguistic channels for optimized human-computer interaction.
no code implementations • 31 May 2023 • Ali Vosoughi, Shijian Deng, Songyang Zhang, Yapeng Tian, Chenliang Xu, Jiebo Luo
In this paper, we first model a confounding effect that causes language and vision bias simultaneously, then propose a counterfactual inference to remove the influence of this effect.
no code implementations • 26 Jan 2023 • Nathan Hadjiyski, Ali Vosoughi, Axel Wismueller
Extensive experiments confirm consistent results for classifying lung pathologies using the multimodal global-local representations of language and vision information.
no code implementations • 2 May 2022 • Axel Wismueller, Larry Stockmaster, Ali Vosoughi
Using AQUARIUS with NLP on final radiology reports and targeted expert neuroradiology review of only 29 discordantly classified cases reduced the human QA effort by 98.5%. We found a total of six non-reported true ICH+ cases, with radiologists' missed ICH detection rates of 0.52% and 2.5% for flagged and non-flagged cases, respectively.