no code implementations • 13 Mar 2024 • Feng Xiao, Hongbin Xu, Qiuxia Wu, Wenxiong Kang
3D visual grounding aims to automatically locate the 3D region of the specified object given the corresponding textual description.
2 code implementations • 22 Dec 2023 • Wenxi Yue, Jing Zhang, Kun Hu, Qiuxia Wu, ZongYuan Ge, Yong Xia, Jiebo Luo, Zhiyong Wang
Specifically, we achieve this by proposing (1) Collaborative Prompts that describe instrument structures via collaborating category-level and part-level texts; (2) Cross-Modal Prompt Encoder that encodes text prompts jointly with visual embeddings into discriminative part-level representations; and (3) Part-to-Whole Adaptive Fusion and Hierarchical Decoding that adaptively fuse the part-level representations into a whole for accurate instrument segmentation in surgical scenarios.
1 code implementation • 12 Apr 2021 • Hongbin Xu, Zhipeng Zhou, Yu Qiao, Wenxiong Kang, Qiuxia Wu
Recent studies have witnessed that self-supervised methods based on view synthesis obtain clear progress on multi-view stereo (MVS).