1 code implementation • 5 Nov 2023 • Zeren Chen, Ziqin Wang, Zhen Wang, Huayang Liu, Zhenfei Yin, Si Liu, Lu Sheng, Wanli Ouyang, Yu Qiao, Jing Shao
While this phenomenon has been overlooked in previous work, we propose a novel and extensible framework, called Octavius, for comprehensive studies and experimentation on multimodal learning with Multimodal Large Language Models (MLLMs).
1 code implementation • CVPR 2023 • Ziqin Wang, Bowen Cheng, Lichen Zhao, Dong Xu, Yang Tang, Lu Sheng
Since 2D images provide rich semantics and scene graphs are in nature coped with languages, in this study, we propose Visual-Linguistic Semantics Assisted Training (VL-SAT) scheme that can significantly empower 3DSSG prediction models with discrimination about long-tailed and ambiguous semantic relations.
Ranked #1 on 3d scene graph generation on 3DSSG (using extra training data)
1 code implementation • 9 Apr 2020 • Lin-Zhuo Chen, Zheng Lin, Ziqin Wang, Yong-Liang Yang, Ming-Ming Cheng
S-Conv is competent to infer the sampling offset of the convolution kernel guided by the 3D spatial information, helping the convolutional layer adjust the receptive field and adapt to geometric transformations.
Ranked #20 on Semantic Segmentation on SUN-RGBD
2 code implementations • ICCV 2019 • Ziqin Wang, Jun Xu, Li Liu, Fan Zhu, Ling Shao
Specifically, to integrate the insights of matching based and propagation based methods, we employ an encoder-decoder framework to learn pixel-level similarity and segmentation in an end-to-end manner.