1 code implementation • 9 Jun 2024 • Mengfei Du, Binhao Wu, Zejun Li, Xuanjing Huang, Zhongyu Wei
The recent rapid development of Large Vision-Language Models (LVLMs) has indicated their potential for embodied tasks. However, the critical skill of spatial understanding in embodied environments has not been thoroughly evaluated, leaving the gap between current LVLMs and qualified embodied intelligence unknown.
1 code implementation • 2 Apr 2024 • Mengfei Du, Binhao Wu, Jiwen Zhang, Zhihao Fan, Zejun Li, Ruipu Luo, Xuanjing Huang, Zhongyu Wei
For task completion, the agent needs to align and integrate various navigation modalities, including instruction, observation and navigation history.
1 code implementation • 28 Feb 2024 • Sirui Hong, Yizhang Lin, Bang Liu, Bangbang Liu, Binhao Wu, Ceyao Zhang, Chenxing Wei, Danyang Li, Jiaqi Chen, Jiayi Zhang, Jinlin Wang, Li Zhang, Lingyao Zhang, Min Yang, Mingchen Zhuge, Taicheng Guo, Tuo Zhou, Wei Tao, Xiangru Tang, Xiangtao Lu, Xiawu Zheng, Xinbing Liang, Yaying Fei, Yuheng Cheng, Zhibin Gou, Zongze Xu, Chenglin Wu
On InfiAgent-DABench, it achieves a 25% performance boost, raising accuracy from 75. 9% to 94. 9%.
1 code implementation • 4 Oct 2023 • Zejun Li, Ye Wang, Mengfei Du, Qingwen Liu, Binhao Wu, Jiwen Zhang, Chengxing Zhou, Zhihao Fan, Jie Fu, Jingjing Chen, Xuanjing Huang, Zhongyu Wei
Recent years have witnessed remarkable progress in the development of large vision-language models (LVLMs).