no code implementations • 21 Mar 2024 • Yong He, Hongshan Yu, Muhammad Ibrahim, Xiaoyan Liu, Tongjia Chen, Anwaar Ulhaq, Ajmal Mian
This strategy allows various transformer blocks to share the same position information over the same resolution points, thereby reducing network parameters and training time without compromising accuracy. Experimental comparisons with existing methods on multiple datasets demonstrate the efficacy of SMTransformer and skip-attention-based up-sampling for point cloud processing tasks, including semantic segmentation and classification.
1 code implementation • CVPR 2024 • Tongjia Chen, Hongshan Yu, Zhengeng Yang, Zechuan Li, Wei Sun, Chen Chen
Due to the resource-intensive nature of training vision-language models on expansive video data, a majority of studies have centered on adapting pre-trained image-language models to the video domain.
Ranked #3 on Zero-Shot Action Recognition on Kinetics
1 code implementation • 23 Jun 2023 • Tom Tongjia Chen, Hongshan Yu, Zhengeng Yang, Ming Li, Zechuan Li, Jingwen Wang, Wei Miao, Wei Sun, Chen Chen
Affordance-Centric Question-driven Task Completion (AQTC) has been proposed to acquire knowledge from videos to furnish users with comprehensive and systematic instructions.
no code implementations • 8 Mar 2023 • Yong He, Hongshan Yu, Zhengeng Yang, Wei Sun, Mingtao Feng, Ajmal Mian
Local features and contextual dependencies are crucial for 3D point cloud analysis.
no code implementations • 8 Mar 2023 • Yong He, Hongshan Yu, Zhengeng Yang, Xiaoyan Liu, Wei Sun, Ajmal Mian
In particular, we achieve state-of-the-art semantic segmentation results of 76% mIoU on S3DIS 6-fold and 72.2% on S3DIS Area5.
1 code implementation • CVPR 2023 • Zechuan Li, Hongshan Yu, Zhengeng Yang, Tongjia Chen, Naveed Akhtar
In this work, we propose AShapeFormer, a semantics-guided object-level shape encoding module for 3D object detection.
no code implementations • 12 Aug 2022 • Zhengeng Yang, Hongshan Yu, Wei Sun, Li-Cheng, Ajmal Mian
In this paper, we present an easy-to-train framework that learns domain-invariant prototypes for domain adaptive semantic segmentation.
1 code implementation • 28 Apr 2022 • Mingtao Feng, Kendong Liu, Liang Zhang, Hongshan Yu, Yaonan Wang, Ajmal Mian
Saliency detection with light field images is becoming attractive given the abundant cues available; however, this comes at the expense of large-scale pixel-level annotated data, which is expensive to generate.
no code implementations • 7 Apr 2022 • Qiang Fu, Hongshan Yu, Islam Ali, Hong Zhang
To achieve this goal, an efficient two endpoint tracking (TET) method is presented: first, describe a given line feature with its two endpoints; next, track the two endpoints based on SOF to obtain two new tracked endpoints by minimizing a pixel-level grayscale residual function; finally, connect the two tracked endpoints to generate a new line feature.
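The three steps above can be sketched in a few lines. This is only an illustration under stated assumptions, not the authors' implementation: the abstract tracks endpoints with SOF, whereas the sketch below swaps in a brute-force integer search that minimizes a sum of squared grayscale differences over a small patch. The names `patch_ssd`, `track_point`, and `track_line`, and the parameters `r` and `search`, are hypothetical. Images are plain lists of rows, and points are `(x, y)` tuples.

```python
def patch_ssd(prev, curr, p, q, r):
    """Grayscale residual: sum of squared differences between the
    (2r+1)x(2r+1) patch around p in prev and around q in curr."""
    total = 0
    for dy in range(-r, r + 1):
        for dx in range(-r, r + 1):
            a = prev[p[1] + dy][p[0] + dx]
            b = curr[q[1] + dy][q[0] + dx]
            total += (a - b) ** 2
    return total


def track_point(prev, curr, p, r=1, search=3):
    """Track one endpoint by exhaustive search over integer displacements.
    Stand-in for optical-flow tracking (an assumption of this sketch).
    The caller must keep p at least r pixels away from the border of prev."""
    h, w = len(curr), len(curr[0])
    best, best_cost = p, None
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            q = (p[0] + dx, p[1] + dy)
            # Skip candidates whose patch would leave the image.
            if not (r <= q[0] < w - r and r <= q[1] < h - r):
                continue
            cost = patch_ssd(prev, curr, p, q, r)
            if best_cost is None or cost < best_cost:
                best, best_cost = q, cost
    return best


def track_line(prev, curr, line, r=1, search=3):
    """Step 1: a line is its two endpoints. Step 2: track each endpoint.
    Step 3: the two tracked endpoints define the new line."""
    p0, p1 = line
    return (track_point(prev, curr, p0, r, search),
            track_point(prev, curr, p1, r, search))
```

Calling `track_line(prev, curr, ((x0, y0), (x1, y1)))` returns the new pair of endpoints. A real tracker would work at sub-pixel precision and handle endpoints that drift off the underlying edge, which this sketch does not attempt.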
1 code implementation • CVPR 2022 • Mingtao Feng, Kendong Liu, Liang Zhang, Hongshan Yu, Yaonan Wang, Ajmal Mian
Saliency detection with light field images is becoming attractive given the abundant cues available; however, this comes at the expense of large-scale pixel-level annotated data, which is expensive to generate.
no code implementations • 9 Mar 2021 • Yong He, Hongshan Yu, Xiaoyan Liu, Zhengeng Yang, Wei Sun, Ajmal Mian
This paper fills the gap and provides a comprehensive survey of the recent progress made in deep learning based 3D segmentation.
no code implementations • 18 Dec 2020 • Zhengeng Yang, Hongshan Yu, Yong He, Zhi-Hong Mao, Ajmal Mian
By learning to solve a Jigsaw Puzzle problem with 25 patches and transferring the learned features to the semantic segmentation task on the Cityscapes dataset, we achieve a 5.8 percentage point improvement over the baseline model that is initialized from random values.
1 code implementation • 16 Sep 2020 • Qiang Fu, Jialong Wang, Hongshan Yu, Islam Ali, Feng Guo, Yijia He, Hong Zhang
This paper presents PL-VINS, a real-time optimization-based monocular VINS method with point and line features, developed based on the state-of-the-art point-based VINS-Mono.
no code implementations • 22 Aug 2020 • Qiang Fu, Hongshan Yu, Xiaolong Wang, Zhengeng Yang, Hong Zhang, Ajmal Mian
ORB-SLAM2 is a benchmark method in this domain; however, it consumes significant time computing descriptors that never get reused unless a frame is selected as a keyframe.
Robotics Computational Geometry I.4.0; I.4.9
no code implementations • 16 Mar 2019 • Zhengeng Yang, Hongshan Yu, Qiang Fu, Wei Sun, Wenyan Jia, Mingui Sun, Zhi-Hong Mao
The rapid development of autonomous driving in recent years presents many challenges for scene understanding.