no code implementations • CVPR 2025 • Zhenyang Liu, Yikai Wang, Sixiao Zheng, Tongying Pan, Longfei Liang, Yanwei Fu, Xiangyang Xue
By incorporating 2D segmentation masks from SAM and multi-view CLIP embeddings, ReasonGrounder selects Gaussian groups based on object scale, enabling accurate localization through both explicit and implicit language understanding, even in novel, occluded views.
no code implementations • 11 Feb 2025 • Sixiao Zheng, Zimian Peng, Yanpeng Zhou, Yi Zhu, Hang Xu, Xiangru Huang, Yanwei Fu
Recent image-to-video generation methods have demonstrated success in enabling control over one or two visual elements, such as camera motion or object motion.
1 code implementation • 13 Jul 2024 • Sixiao Zheng, Yanwei Fu
Visual storytelling involves generating a sequence of coherent frames from a textual storyline while maintaining consistency in characters and scenes.
Ranked #1 on Story Visualization on Pororo
no code implementations • 24 Feb 2024 • Sixiao Zheng, Jingyang Huo, Yu Wang, Yanwei Fu
We propose an Intelligent Director framework that uses LENS to generate descriptions of images and video frames, then uses ChatGPT to combine them into coherent captions and to recommend suitable music titles.
3 code implementations • 19 Jul 2022 • Li Zhang, Jiachen Lu, Sixiao Zheng, Xinxuan Zhao, Xiatian Zhu, Yanwei Fu, Tao Xiang, Jianfeng Feng, Philip H. S. Torr
Extensive experiments show that our methods achieve appealing performance on a variety of dense prediction tasks (e.g., object detection, instance segmentation, and semantic segmentation) as well as on image classification.
1 code implementation • 20 Feb 2022 • Sixiao Zheng, Ke Fan, Yanxi Hou, Jianfeng Feng, Yanwei Fu
In contrast, GPD fits the distribution of distances to the centroid that exceed a sufficiently large threshold, which makes the performance of GPD k-means more stable.
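The peaks-over-threshold idea in the GPD k-means snippet above can be illustrated with a short sketch. This is not the authors' code: it fits a Generalized Pareto Distribution to the excesses of point-to-centroid distances over a chosen threshold using the method of moments (an estimator swapped in for simplicity; the snippet does not say which estimator the paper uses), and the helper names `fit_gpd_moments` and `gpd_tail_prob` are hypothetical.

```python
import math
import statistics


def fit_gpd_moments(distances, threshold):
    """Fit a GPD to the excesses (d - threshold) of distances above threshold.

    Assumption: method-of-moments estimator, valid only when the shape xi < 1/2.
    GPD mean = sigma / (1 - xi), variance = sigma^2 / ((1 - xi)^2 (1 - 2 xi)).
    """
    excesses = [d - threshold for d in distances if d > threshold]
    if len(excesses) < 2:
        raise ValueError("need at least two exceedances to fit a GPD")
    m = statistics.fmean(excesses)
    v = statistics.variance(excesses)  # sample variance of the excesses
    if v == 0:
        raise ValueError("excesses have zero variance; GPD fit is undefined")
    ratio = m * m / v                   # equals 1 - 2*xi under the GPD
    xi = 0.5 * (1.0 - ratio)            # shape
    sigma = 0.5 * m * (ratio + 1.0)     # scale, equals m * (1 - xi)
    return xi, sigma


def gpd_tail_prob(d, threshold, xi, sigma):
    """P(D > d | D > threshold) under the fitted GPD (survival function)."""
    if d <= threshold:
        return 1.0
    y = d - threshold
    if abs(xi) < 1e-12:                 # xi = 0: exponential special case
        return math.exp(-y / sigma)
    base = 1.0 + xi * y / sigma
    if base <= 0.0:                     # beyond the finite upper endpoint (xi < 0)
        return 0.0
    return base ** (-1.0 / xi)


# Toy usage: hypothetical distances from points to one cluster centroid.
distances = [0.2, 0.5, 2.0, 2.0, 2.0, 6.0]
xi, sigma = fit_gpd_moments(distances, threshold=1.0)
print(f"xi={xi:.3f}, sigma={sigma:.3f}")  # → xi=0.000, sigma=2.000
```

Here the excesses are [1, 1, 1, 5], whose mean (2) equals their standard deviation, so the fit lands on the exponential special case ξ = 0, σ = 2. Only distances beyond the threshold enter the fit, which is the sense in which the GPD focuses on the tail; how the resulting tail probability feeds into cluster assignment is not stated in the snippet.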
1 code implementation • 4 Jun 2021 • Zekun Luo, Zheng Fang, Sixiao Zheng, Yabiao Wang, Yanwei Fu
Non-Maximum Suppression (NMS) is essential for object detection and affects the evaluation results through the False Positives (FP) and False Negatives (FN) it can introduce, especially in crowded, occluded scenes.
Ranked #6 on Pedestrian Detection on Caltech
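For context on the NMS step mentioned above, the conventional greedy IoU-based NMS can be sketched as follows. This is the standard post-processing baseline, not the method proposed in that paper; the function names, the `(x1, y1, x2, y2)` box format, and the thresholds are illustrative assumptions.

```python
from typing import List, Tuple

Box = Tuple[float, float, float, float]  # assumed format: (x1, y1, x2, y2)


def iou(a: Box, b: Box) -> float:
    """Intersection-over-union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def greedy_nms(boxes: List[Box], scores: List[float], iou_thresh: float = 0.5) -> List[int]:
    """Return indices of kept boxes, highest score first.

    A box is kept only if its IoU with every already-kept (higher-scoring)
    box is at most iou_thresh; otherwise it is suppressed.
    """
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep: List[int] = []
    for i in order:
        if all(iou(boxes[i], boxes[k]) <= iou_thresh for k in keep):
            keep.append(i)
    return keep


# Toy crowd: two heavily overlapping pedestrians plus one isolated pedestrian.
boxes = [(0, 0, 10, 20), (2, 0, 12, 20), (30, 0, 40, 20)]
scores = [0.9, 0.8, 0.7]
print(greedy_nms(boxes, scores, iou_thresh=0.5))  # → [0, 2]
```

The first two boxes overlap with IoU ≈ 0.67, so at a 0.5 threshold the second is suppressed; if it was a real pedestrian, that suppression becomes a False Negative, the crowd-occlusion failure mode the snippet describes. Raising the threshold to 0.7 keeps it, at the cost of letting more duplicate detections (False Positives) through.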
no code implementations • 23 Mar 2021 • Sixiao Zheng, Yanwei Fu, Yanxi Hou
However, zero-shot learning models assume that all seen classes should be known beforehand, while incremental learning models cannot recognize unseen classes.
5 code implementations • CVPR 2021 • Sixiao Zheng, Jiachen Lu, Hengshuang Zhao, Xiatian Zhu, Zekun Luo, Yabiao Wang, Yanwei Fu, Jianfeng Feng, Tao Xiang, Philip H. S. Torr, Li Zhang
In this paper, we aim to provide an alternative perspective by treating semantic segmentation as a sequence-to-sequence prediction task.
Ranked #2 on Semantic Segmentation on FoodSeg103 (using extra training data)
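Treating semantic segmentation as a sequence-to-sequence task, as described above, starts by turning a 2D image into a 1D sequence of patch tokens. The sketch below shows only that input step, plus folding a token sequence back into a 2D grid as a decoder would need. It assumes non-overlapping square patches in raster order and omits the learned embedding, the Transformer encoder, and the decoder; the function names are hypothetical, not taken from the paper's code.

```python
from typing import List

Image = List[List[List[float]]]  # assumed layout: H x W x C


def image_to_patch_sequence(img: Image, p: int) -> List[List[float]]:
    """Split an H x W x C image into (H/p)*(W/p) flattened p*p*C patch tokens."""
    h, w = len(img), len(img[0])
    if h % p or w % p:
        raise ValueError("image height and width must be divisible by p")
    seq: List[List[float]] = []
    for r in range(0, h, p):            # raster order over the patch grid
        for c in range(0, w, p):
            token: List[float] = []
            for i in range(r, r + p):
                for j in range(c, c + p):
                    token.extend(img[i][j])  # append this pixel's C channels
            seq.append(token)
    return seq


def sequence_to_grid(seq: List[List[float]], h: int, w: int, p: int) -> List[List[List[float]]]:
    """Reshape a length-(H/p)*(W/p) token sequence back to an (H/p) x (W/p) grid."""
    gw = w // p
    return [seq[k * gw:(k + 1) * gw] for k in range(h // p)]


# Toy usage: a 4 x 4 single-channel image with pixel value 4*i + j, patch size 2.
img = [[[float(4 * i + j)] for j in range(4)] for i in range(4)]
seq = image_to_patch_sequence(img, p=2)
print(len(seq), len(seq[0]))  # → 4 4
grid = sequence_to_grid(seq, h=4, w=4, p=2)
```

Each token here holds a 2 × 2 patch, e.g. the first is [0, 1, 4, 5]. The patch size is a free parameter in this sketch; the toy value 2 keeps the example readable.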
no code implementations • 25 Sep 2019 • Sixiao Zheng, Yanxi Hou, Yanwei Fu, Jianfeng Feng
We thus propose a novel algorithm called Extreme Value k-means (EV k-means), comprising two variants: GEV k-means and GPD k-means.