1 code implementation • CVPR 2022 • Jinsheng Wang, Yinchao Ma, Shaofei Huang, Tianrui Hui, Fei Wang, Chen Qian, Tianzhu Zhang
Earlier works follow a top-down roadmap to regress predefined anchors into various shapes of lane lines, which lacks enough flexibility to fit complex shapes of lanes due to the fixed anchor shapes.
Ranked #4 on Lane Detection on TuSimple (F1 score metric)
1 code implementation • CVPR 2023 • Shaofei Huang, Zhenwei Shen, Zehao Huang, Zi-han Ding, Jiao Dai, Jizhong Han, Naiyan Wang, Si Liu
An attempt has been made to get rid of BEV and predict 3D lanes from FV representations directly, while it still underperforms other BEV-based methods given its lack of structured representation for 3D lanes.
Ranked #3 on 3D Lane Detection on Apollo Synthetic 3D Lane
1 code implementation • CVPR 2020 • Shaofei Huang, Tianrui Hui, Si Liu, Guanbin Li, Yunchao Wei, Jizhong Han, Luoqi Liu, Bo Li
In addition to the CMPC module, we further leverage a simple yet effective TGFE module to integrate the reasoned multimodal features from different levels with the guidance of textual information.
Ranked #14 on Referring Expression Segmentation on RefCOCO testB
1 code implementation • 15 May 2021 • Si Liu, Tianrui Hui, Shaofei Huang, Yunchao Wei, Bo Li, Guanbin Li
In this paper, we propose a Cross-Modal Progressive Comprehension (CMPC) scheme to effectively mimic human behaviors and implement it as a CMPC-I (Image) module and a CMPC-V (Video) module to improve referring image and video segmentation models.
Ranked #7 on Referring Expression Segmentation on J-HMDB
1 code implementation • ECCV 2020 • Tianrui Hui, Si Liu, Shaofei Huang, Guanbin Li, Sansi Yu, Faxi Zhang, Jizhong Han
Referring image segmentation aims to predict the foreground mask of the object referred by a natural language sentence.
no code implementations • 11 Jan 2021 • Shaofei Huang, Si Liu, Tianrui Hui, Jizhong Han, Bo Li, Jiashi Feng, Shuicheng Yan
Our ORDNet is able to extract more comprehensive context information and well adapt to complex spatial variance in scene images.
no code implementations • CVPR 2021 • Tianrui Hui, Shaofei Huang, Si Liu, Zihan Ding, Guanbin Li, Wenguan Wang, Jizhong Han, Fei Wang
Though 3D convolutions are amenable to recognizing which actor is performing the queried actions, it also inevitably introduces misaligned spatial information from adjacent frames, which confuses features of the target frame and yields inaccurate segmentation.
Ranked #8 on Referring Expression Segmentation on J-HMDB
no code implementations • 5 Aug 2021 • Dailan He, Yusheng Zhao, Junyu Luo, Tianrui Hui, Shaofei Huang, Aixi Zhang, Si Liu
Existing works usually adopt dynamic graph networks to indirectly model the intra/inter-modal interactions, making the model difficult to distinguish the referred object from distractors due to the monolithic representations of visual and linguistic contents.
no code implementations • 6 Oct 2022 • Yuanbin Wang, Leyan Zhu, Shaofei Huang, Tianrui Hui, Xiaojie Li, Fei Wang, Si Liu
To better bridge the domain gap between source domain (synthetic data) and target domain (real-world data), we also propose a Selective Feature Alignment (SFA) module which only aligns the features of consistent foreground area between the two domains, thus realizing inter-domain intra-modality adaptation.
no code implementations • 18 Sep 2023 • Shaofei Huang, Han Li, Yuqing Wang, Hongji Zhu, Jiao Dai, Jizhong Han, Wenge Rong, Si Liu
Explicit object-level semantic correspondence between audio and visual modalities is established by gathering object information from visual features with predefined audio queries.
no code implementations • 4 Dec 2023 • Runze He, Shaofei Huang, Xuecheng Nie, Tianrui Hui, Luoqi Liu, Jiao Dai, Jizhong Han, Guanbin Li, Si Liu
In this paper, we target the adaptive source driven 3D scene editing task by proposing a CustomNeRF model that unifies a text description or a reference image as the editing prompt.
no code implementations • 12 Dec 2023 • Yuanbin Wang, Shaofei Huang, Yulu Gao, Zhen Wang, Rui Wang, Kehua Sheng, Bo Zhang, Si Liu
In this work, we focus on zero-shot point cloud semantic segmentation and propose a simple yet effective baseline to transfer the visual-linguistic knowledge implied in CLIP to point cloud encoder at both feature and output levels.
no code implementations • 9 Mar 2024 • Hairong Shi, Songhao Han, Shaofei Huang, Yue Liao, Guanbin Li, Xiangxing Kong, Hua Zhu, Xiaomu Wang, Si Liu
Tumor lesion segmentation on CT or MRI images plays a critical role in cancer diagnosis and treatment planning.