Search Results for author: Jiangyong Huang

Found 4 papers, 3 papers with code

Multi-modal Situated Reasoning in 3D Scenes

no code implementations • 4 Sep 2024 • Xiongkun Linghu, Jiangyong Huang, Xuesong Niu, Xiaojian Ma, Baoxiong Jia, Siyuan Huang

Comprehensive evaluations on MSQA and MSNN highlight the limitations of existing vision-language models and underscore the importance of handling multi-modal interleaved inputs and situation modeling.

Diversity, Question Answering

An Embodied Generalist Agent in 3D World

1 code implementation • 18 Nov 2023 • Jiangyong Huang, Silong Yong, Xiaojian Ma, Xiongkun Linghu, Puhao Li, Yan Wang, Qing Li, Song-Chun Zhu, Baoxiong Jia, Siyuan Huang

However, several significant challenges remain: (i) most of these models rely on 2D images yet exhibit a limited capacity for 3D input; (ii) these models rarely explore the tasks inherently defined in 3D world, e.g., 3D grounding, embodied reasoning and acting.

3D dense captioning, Question Answering, +3
