1 code implementation • 19 Jul 2022 • Yusheng Zhao, Jinyu Chen, Chen Gao, Wenguan Wang, Lirong Yang, Haibing Ren, Huaxia Xia, Si Liu
Vision-language navigation is the task of directing an embodied agent to navigate in 3D scenes with natural language instructions.
1 code implementation • CVPR 2022 • Junyu Luo, Jiahui Fu, Xianghao Kong, Chen Gao, Haibing Ren, Hao Shen, Huaxia Xia, Si Liu
3D visual grounding aims to locate the referred target object in 3D point cloud scenes according to a free-form language description.
8 code implementations • NeurIPS 2021 • Xiangxiang Chu, Zhi Tian, Yuqing Wang, Bo Zhang, Haibing Ren, Xiaolin Wei, Huaxia Xia, Chunhua Shen
Very recently, a variety of vision transformer architectures for dense prediction tasks have been proposed, and they show that the design of spatial attention is critical to their success in these tasks.
Ranked #46 on Semantic Segmentation on ADE20K val
2 code implementations • CVPR 2021 • Yuqing Wang, Zhaoliang Xu, Xinlong Wang, Chunhua Shen, Baoshan Cheng, Hao Shen, Huaxia Xia
Here, we propose a new video instance segmentation framework built upon Transformers, termed VisTR, which views the VIS task as a direct end-to-end parallel sequence decoding/prediction problem.
Ranked #21 on Video Instance Segmentation on YouTube-VIS validation
no code implementations • 27 May 2020 • Yanliang Zhu, Dongchun Ren, Mingyu Fan, Deheng Qian, Xin Li, Huaxia Xia
Trajectory forecasting, or trajectory prediction, of multiple interacting agents in dynamic scenes is an important problem for many applications, such as robotic systems and autonomous driving.
no code implementations • 8 Jan 2020 • Yanliang Zhu, Deheng Qian, Dongchun Ren, Huaxia Xia
To further advance the performance, we propose an interactive loss to guide the generation of the drivable spaces.
no code implementations • 5 Jun 2019 • Yanliang Zhu, Deheng Qian, Dongchun Ren, Huaxia Xia
The hub network takes observed trajectories of all pedestrians to produce a comprehensive description of the interpersonal interactions.