no code implementations • ECCV 2020 • Yifan Yang, Guorong Li, Zhe Wu, Li Su, Qingming Huang, Nicu Sebe
We propose a soft-label sorting network alongside the counting network, which sorts the given images by their crowd counts.
no code implementations • 1 Jan 2025 • Chenlong Xu, Bineng Zhong, Qihua Liang, Yaozong Zheng, Guorong Li, Shuxiang Song
Recently, several studies have shown that utilizing contextual information to perceive target states is crucial for object tracking.
1 code implementation • 18 Dec 2024 • Xiaohai Li, Bineng Zhong, Qihua Liang, Guorong Li, Zhiyi Mo, Shuxiang Song
To address this issue, we propose MambaLCT, which constructs and utilizes target variation cues from the first frame to the current frame for robust tracking.
no code implementations • 12 Dec 2024 • Zhiyang Dou, Zipeng Wang, Xumeng Han, Chenhui Qiang, Kuiran Wang, Guorong Li, Zhibei Huang, Zhenjun Han
Global geolocation, which seeks to predict the geographical location of images captured anywhere in the world, is one of the most challenging tasks in the field of computer vision.
no code implementations • 20 Nov 2024 • Kuiran Wang, Xuehui Yu, Wenwen Yu, Guorong Li, Xiangyuan Lan, Qixiang Ye, Jianbin Jiao, Zhenjun Han
The bounding box is then used as the input to single-object trackers.
no code implementations • 11 May 2024 • Yunchuan Ma, Laiyun Qing, Guorong Li, Yuankai Qi, Amin Beheshti, Quan Z. Sheng, Qingming Huang
Specifically, we bridge video and text using four key models: a general video-text retrieval model XCLIP, a general image-text matching model CLIP, a text alignment model AnglE, and a text generation model GPT-2, due to their source-code availability.
2 code implementations • 30 Jan 2024 • Xuehui Yu, Pengfei Chen, Kuiran Wang, Xumeng Han, Guorong Li, Zhenjun Han, Qixiang Ye, Jianbin Jiao
CPR reduces the semantic variance by selecting a semantic centre point in a neighbourhood region to replace the initial annotated point.
1 code implementation • CVPR 2024 • Xinyan Liu, Guorong Li, Yuankai Qi, Ziheng Yan, Zhenjun Han, Anton Van Den Hengel, Ming-Hsuan Yang, Qingming Huang
To provide a more realistic reflection of the underlying practical challenge, we introduce a weakly supervised VIC task wherein trajectory labels are not provided.
1 code implementation • CVPR 2024 • Zhaoyang Wei, Pengfei Chen, Xuehui Yu, Guorong Li, Jianbin Jiao, Zhenjun Han
In this paper, we introduce a cost-effective category-specific segmenter using SAM.
no code implementations • 20 Dec 2023 • Chang Teng, Yunchuan Ma, Guorong Li, Yuankai Qi, Laiyun Qing, Qingming Huang
To address this problem, we propose a new video captioning task, Subject-Oriented Video Captioning (SOVC), which aims to let users specify the target to describe via a bounding box.
1 code implementation • 10 Dec 2023 • Xinyan Liu, Guorong Li, Yuankai Qi, Ziheng Yan, Zhenjun Han, Anton Van Den Hengel, Ming-Hsuan Yang, Qingming Huang
To provide a more realistic reflection of the underlying practical challenge, we introduce a weakly supervised VIC task, wherein trajectory labels are not provided.
1 code implementation • 4 Dec 2023 • Chen Zhang, Guorong Li, Yuankai Qi, Hanhua Ye, Laiyun Qing, Ming-Hsuan Yang, Qingming Huang
To address these limitations, we propose a Dynamic Erasing Network (DE-Net) for weakly supervised video anomaly detection, which learns multi-scale temporal features.
no code implementations • 22 Nov 2023 • Guangming Cao, Xuehui Yu, Wenwen Yu, Xumeng Han, Xue Yang, Guorong Li, Jianbin Jiao, Zhenjun Han
In this study, we introduce P2RBox, which employs point prompts to generate rotated box (RBox) annotations for oriented object detection.
1 code implementation • 27 Aug 2023 • Yaozong Zheng, Bineng Zhong, Qihua Liang, Guorong Li, Rongrong Ji, Xianxian Li
In this paper, we present a simple, flexible and effective vision-language (VL) tracking pipeline, termed MMTrack, which casts VL tracking as a token generation task.
1 code implementation • ICCV 2023 • Di Wu, Pengfei Chen, Xuehui Yu, Guorong Li, Zhenjun Han, Jianbin Jiao
Object detection with inaccurate bounding-box supervision has attracted broad interest, owing to the high cost of accurate annotations and the occasional inevitability of low annotation quality (e.g., for tiny objects).
no code implementations • CVPR 2023 • Chen Zhang, Guorong Li, Yuankai Qi, Shuhui Wang, Laiyun Qing, Qingming Huang, Ming-Hsuan Yang
Weakly supervised video anomaly detection aims to identify abnormal events in videos using only video-level labels.
1 code implementation • 8 Dec 2022 • Ziheng Yan, Yuankai Qi, Guorong Li, Xinyan Liu, Weigang Zhang, Qingming Huang, Ming-Hsuan Yang
Crowd counting is usually handled in a density-map regression fashion, supervised via an L2 loss between the predicted density map and the ground truth.
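The density-map regression setup can be sketched as follows; the 4x4 map, Gaussian-free point masses, and loss form are illustrative assumptions, not the paper's exact configuration.

```python
import numpy as np

def l2_density_loss(pred, gt):
    """Pixel-wise L2 (mean squared error) between density maps.

    Integrating a density map approximates the crowd count, so this
    loss supervises counting indirectly through the map.
    """
    return np.mean((pred - gt) ** 2)

# Toy ground truth whose mass sums to 3 people.
gt = np.zeros((4, 4))
gt[1, 1] = 1.5
gt[2, 3] = 1.5

pred = gt + 0.1              # a slightly over-dense prediction
loss = l2_density_loss(pred, gt)
count = pred.sum()           # the estimated count is the map's integral
```

A known weakness of this pixel-wise loss, which motivates alternatives, is that it penalizes small spatial shifts of density mass as heavily as genuine counting errors.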
no code implementations • 8 Dec 2022 • Xinyan Liu, Guorong Li, Yuankai Qi, Zhenjun Han, Qingming Huang, Ming-Hsuan Yang, Nicu Sebe
Crowd localization aims to predict the spatial position of humans in a crowd scenario.
1 code implementation • 13 Sep 2022 • Ke Ma, Qianqian Xu, Jinshan Zeng, Guorong Li, Xiaochun Cao, Qingming Huang
From a dynamical-systems perspective, an attack with a target ranking list is a fixed point of the composition of the adversary and the victim.
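The fixed-point view can be illustrated with a minimal sketch: iterating a map until its output stops changing yields a point x* with x* = f(x*). The scalar contraction below is a stand-in for the adversary-victim composition, which in the paper acts on ranking data, not scalars.

```python
def fixed_point(f, x0, tol=1e-9, max_iter=1000):
    """Iterate x <- f(x); the limit (if it exists) satisfies x* = f(x*)."""
    x = x0
    for _ in range(max_iter):
        nxt = f(x)
        if abs(nxt - x) < tol:
            return nxt
        x = nxt
    return x

# Toy "adversary ∘ victim" composition: a contraction with fixed point 2.0.
compose = lambda x: 0.5 * x + 1.0
x_star = fixed_point(compose, 0.0)
```

Because `compose` is a contraction, Banach's fixed-point theorem guarantees the iteration converges from any starting point.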
1 code implementation • 26 Jul 2022 • Weidong Chen, Dexiang Hong, Yuankai Qi, Zhenjun Han, Shuhui Wang, Laiyun Qing, Qingming Huang, Guorong Li
To address this problem, we propose a multi-attention network consisting of a dual-path dual-attention module and a query-based cross-modal Transformer module.
Ranked #5 on Referring Expression Segmentation on A2D Sentences
2 code implementations • CVPR 2022 • Xuehui Yu, Pengfei Chen, Di Wu, Najmul Hassan, Guorong Li, Junchi Yan, Humphrey Shi, Qixiang Ye, Zhenjun Han
In this study, we propose a POL method using coarse point annotations, relaxing the supervision signals from accurate key points to freely spotted points.
1 code implementation • CVPR 2022 • Hanhua Ye, Guorong Li, Yuankai Qi, Shuhui Wang, Qingming Huang, Ming-Hsuan Yang
(II) Predicate level, which learns the actions conditioned on highlighted objects and is supervised by the predicate in captions.
2 code implementations • 7 Jul 2021 • Xumeng Han, Xuehui Yu, Guorong Li, Jian Zhao, Gang Pan, Qixiang Ye, Jianbin Jiao, Zhenjun Han
While extensive research has focused on the framework design and loss function, this paper shows that sampling strategy plays an equally important role.
1 code implementation • CVPR 2021 • Siyuan Cheng, Bineng Zhong, Guorong Li, Xin Liu, Zhenjun Tang, Xianxian Li, Jing Wang
RD is trained in a meta-learning fashion to filter distractors from the background, while RM integrates the proposed RD into the Siamese framework to generate accurate tracking results.
1 code implementation • 21 Jan 2021 • Nan Jiang, Kuiran Wang, Xiaoke Peng, Xuehui Yu, Qiang Wang, Junliang Xing, Guorong Li, Jian Zhao, Guodong Guo, Zhenjun Han
The release of such a large-scale dataset could be a useful initial step in research on tracking UAVs.
no code implementations • ICCV 2021 • Xinyan Liu, Guorong Li, Zhenjun Han, Weigang Zhang, Yifan Yang, Qingming Huang, Nicu Sebe
Specifically, we propose a task-driven similarity metric based on samples' mutual enhancement, referred to as co-fine-tune similarity, which finds a more efficient subset of data for training the expert network.
2 code implementations • CVPR 2020 • Zedu Chen, Bineng Zhong, Guorong Li, Shengping Zhang, Rongrong Ji
Most of the existing trackers usually rely on either a multi-scale searching scheme or pre-defined anchor boxes to accurately estimate the scale and aspect ratio of a target.
no code implementations • 26 Jul 2019 • Qi Feng, Vitaly Ablavsky, Qinxun Bai, Guorong Li, Stan Sclaroff
In benchmarks, our method is competitive with state-of-the-art trackers, while it outperforms all other trackers on targets with unambiguous and precise language annotations.
1 code implementation • CVPR 2019 • Kai Xu, Longyin Wen, Guorong Li, Liefeng Bo, Qingming Huang
Specifically, the temporal coherence branch, pretrained in an adversarial fashion on unlabeled video data, is designed to capture the dynamic appearance and motion cues of video sequences to guide object segmentation.
Ranked #2 on Semi-Supervised Video Object Segmentation on YouTube
no code implementations • ECCV 2018 • Dawei Du, Yuankai Qi, Hongyang Yu, Yifan Yang, Kaiwen Duan, Guorong Li, Weigang Zhang, Qingming Huang, Qi Tian
Selected from 10 hours of raw video, about 80,000 representative frames are fully annotated with bounding boxes as well as up to 14 kinds of attributes (e.g., weather condition, flying altitude, camera view, vehicle category, and occlusion) for three fundamental computer vision tasks: object detection, single object tracking, and multiple object tracking.
Ranked #5 on Object Detection on UAVDT