LaMOT: Language-Guided Multi-Object Tracking

1 code implementation 12 Jun 2024 Yunhao Li, Xiaoqiong Liu, Luke Liu, Heng Fan, Libo Zhang

In this paper, we address this challenge by introducing Language-Guided MOT, a unified task framework, along with a corresponding large-scale benchmark, termed LaMOT, which encompasses diverse scenarios and language descriptions.

Descriptive Multi-Object Tracking +1

ProMotion: Prototypes As Motion Learners

no code implementations CVPR 2024 Yawen Lu, Dongfang Liu, Qifan Wang, Cheng Han, Yiming Cui, Zhiwen Cao, Xueling Zhang, Yingjie Victor Chen, Heng Fan

We capitalize on a dual mechanism involving the feature denoiser and the prototypical learner to decipher the intricacies of motion.

LoReTrack: Efficient and Accurate Low-Resolution Transformer Tracking

no code implementations 27 May 2024 Shaohua Dong, Yunhe Feng, Qing Yang, Yuewei Lin, Heng Fan

In this paper, we aim to mitigate such information loss and boost the performance of low-resolution Transformer tracking via dual knowledge distillation from a frozen high-resolution (but not larger) Transformer tracker.

Knowledge Distillation

Benchmarking the Robustness of UAV Tracking Against Common Corruptions

1 code implementation 18 Mar 2024 Xiaoqiong Liu, Yunhe Feng, Shu Hu, Xiaohui Yuan, Heng Fan

Addressing this, we propose UAV-C, a large-scale benchmark for assessing the robustness of UAV trackers under common corruptions.


S3LLM: Large-Scale Scientific Software Understanding with LLMs using Source, Metadata, and Document

1 code implementation 15 Mar 2024 Kareem Shaik, Dali Wang, Weijian Zheng, Qinglei Cao, Heng Fan, Peter Schwartz, Yunhe Feng

S3LLM demonstrates the potential of using locally deployed open-source LLMs for the rapid understanding of large-scale scientific computing software, eliminating the need for extensive coding expertise, and thereby making the process more efficient and effective.

Natural Language Queries RAG

Beyond MOT: Semantic Multi-Object Tracking

no code implementations 8 Mar 2024 Yunhao Li, Hao Wang, Xue Ma, Jiali Yao, Shaohua Dong, Heng Fan, Libo Zhang

Current multi-object tracking (MOT) aims to predict trajectories of targets (i.e., "where") in videos.

Multi-Object Tracking Object +1

Tracking Meets LoRA: Faster Training, Larger Model, Stronger Performance

no code implementations 8 Mar 2024 Liting Lin, Heng Fan, Zhipeng Zhang, YaoWei Wang, Yong Xu, Haibin Ling

The shared embeddings, which describe the absolute coordinates of multi-resolution images (namely, the template and search images), are inherited from the pre-trained backbones.

Inductive Bias Position +1

VastTrack: Vast Category Visual Object Tracking

1 code implementation 6 Mar 2024 Liang Peng, Junyuan Gao, Xinran Liu, Weihong Li, Shaohua Dong, Zhipeng Zhang, Heng Fan, Libo Zhang

The rich annotations of VastTrack enable the development of both vision-only and vision-language tracking.

Object Visual Object Tracking +1

Neural Radiance Fields in Medical Imaging: Challenges and Next Steps

no code implementations 26 Feb 2024 Xin Wang, Shu Hu, Heng Fan, Hongtu Zhu, Xin Li

Neural Radiance Fields (NeRF), as a pioneering technique in computer vision, offer great potential to revolutionize medical imaging by synthesizing three-dimensional representations from the projected two-dimensional image data.

Context-Guided Spatio-Temporal Video Grounding

1 code implementation CVPR 2024 Xin Gu, Heng Fan, Yan Huang, Tiejian Luo, Libo Zhang

The key to CG-STVG lies in two specially designed modules: instance context generation (ICG), which focuses on discovering visual context information (in both appearance and motion) of the instance, and instance context refinement (ICR), which aims to improve the instance context from ICG by eliminating irrelevant or even harmful information from the context.

Object Spatio-Temporal Video Grounding +1

Kernel Adaptive Convolution for Scene Text Detection via Distance Map Prediction

no code implementations CVPR 2024 Jinzhi Zheng, Heng Fan, Libo Zhang

To address these issues, this paper proposes a simple and effective scene text detection method, the Kernel Adaptive Convolution, which is designed with a Kernel Adaptive Convolution Module for scene text detection via predicting the distance map.

Foreground Segmentation Scene Text Detection +1

SSPNet: Scale and Spatial Priors Guided Generalizable and Interpretable Pedestrian Attribute Recognition

1 code implementation 11 Dec 2023 Jifeng Shen, Teng Guo, Xin Zuo, Heng Fan, Wankou Yang

The AFSS module learns to provide reasonable scale prior information for different attribute groups, allowing the model to focus on different levels of feature maps with varying semantic granularity.

Attribute Pedestrian Attribute Recognition

SiCP: Simultaneous Individual and Cooperative Perception for 3D Object Detection in Connected and Automated Vehicles

1 code implementation 8 Dec 2023 Deyuan Qu, Qi Chen, Tianyu Bai, Andy Qin, HongSheng Lu, Heng Fan, Song Fu, Qing Yang

Cooperative perception for connected and automated vehicles is traditionally achieved through the fusion of feature maps from two or more vehicles.

3D Object Detection object-detection

Efficient Multimodal Semantic Segmentation via Dual-Prompt Learning

1 code implementation 1 Dec 2023 Shaohua Dong, Yunhe Feng, Qing Yang, Yan Huang, Dongfang Liu, Heng Fan

Existing approaches often fully fine-tune a dual-branch encoder-decoder framework with a complicated feature fusion strategy to achieve multimodal semantic segmentation, which is costly to train due to the massive parameter updates in feature extraction and fusion.

Ranked #4 on Semantic Segmentation on SUN-RGBD (using extra training data)

Decoder object-detection +7

Flow-Guided Diffusion for Video Inpainting

1 code implementation 26 Nov 2023 Bohai Gu, Yongsheng Yu, Heng Fan, Libo Zhang

Video inpainting has been challenged by complex scenarios like large movements and low-light conditions.

Denoising Image Generation +2

Local Compressed Video Stream Learning for Generic Event Boundary Detection

1 code implementation 27 Sep 2023 Libo Zhang, Xin Gu, CongCong Li, Tiejian Luo, Heng Fan

Specifically, we use lightweight ConvNets to extract features of the P-frames in the GOPs, and a spatial-channel attention module (SCAM) is designed to refine the feature representations of the P-frames based on the compressed information with bidirectional information flow.

Boundary Detection Generic Event Boundary Detection +1

Accurate and Fast Compressed Video Captioning

1 code implementation ICCV 2023 Yaojie Shen, Xin Gu, Kai Xu, Heng Fan, Longyin Wen, Libo Zhang

Addressing this, we study video captioning from a different perspective in the compressed domain, which brings multi-fold advantages over the existing pipeline: 1) compared to raw images from the decoded video, the compressed video, consisting of I-frames, motion vectors, and residuals, is highly distinguishable, which allows us to leverage the entire video for learning without manual sampling through a specialized model design; 2) the captioning model is more efficient in inference, as smaller and less redundant information is processed.

Video Captioning

Collaborative Three-Stream Transformers for Video Captioning

no code implementations 18 Sep 2023 Hao Wang, Libo Zhang, Heng Fan, Tiejian Luo

Meanwhile, we propose a cross-granularity attention module to align the interactions modeled by the three branches of transformers, so that the three branches can support each other in exploiting the most discriminative semantic information of different granularities for accurate caption prediction.

Sentence Video Captioning

Unsupervised Domain Adaptive Detection with Network Stability Analysis

1 code implementation ICCV 2023 Wenzhang Zhou, Heng Fan, Tiejian Luo, Libo Zhang

In this work, drawing inspiration from the concept of stability in control theory, namely that a robust system must remain consistent both externally and internally regardless of disturbances, we propose a novel framework that achieves unsupervised domain adaptive detection through stability analysis.

Domain Adaptation

Divert More Attention to Vision-Language Object Tracking

1 code implementation 19 Jul 2023 Mingzhe Guo, Zhipeng Zhang, Liping Jing, Haibin Ling, Heng Fan

To thoroughly demonstrate the effectiveness of our method, we integrate the proposed framework into three tracking methods with different designs, i.e., the CNN-based SiamCAR, the Transformer-based OSTrack, and the hybrid-structure TransT.

Attribute Object +1

Deficiency-Aware Masked Transformer for Video Inpainting

1 code implementation 17 Jul 2023 Yongsheng Yu, Heng Fan, Libo Zhang

First, we pretrain an image inpainting model, DMT_img, to serve as a prior for distilling the video model DMT_vid, thereby benefiting the hallucination of deficiency cases.

Hallucination Image Inpainting +2

MaGIC: Multi-modality Guided Image Completion

no code implementations 19 May 2023 Yongsheng Yu, Hao Wang, Tiejian Luo, Heng Fan, Libo Zhang

In this paper, we propose a novel, simple yet effective method for Multi-modal Guided Image Completion, dubbed MaGIC, which not only supports a wide range of single modalities as guidance (e.g., text, canny edge, sketch, segmentation, depth, and pose), but also adapts to arbitrarily customized combinations of these modalities (i.e., arbitrary multi-modality) for image completion.

CCTV-Gun: Benchmarking Handgun Detection in CCTV Images

1 code implementation 19 Mar 2023 Srikar Yellapragada, Zhenghong Li, Kevin Bhadresh Doshi, Purva Makarand Mhasakar, Heng Fan, Jie Wei, Erik Blasch, Bin Zhang, Haibin Ling

In this paper, we present a meticulously crafted and annotated benchmark, called CCTV-Gun, which addresses the challenges of detecting handguns in real-world CCTV images.

Benchmarking object-detection +1

Robust Domain Adaptive Object Detection with Unified Multi-Granularity Alignment

no code implementations 1 Jan 2023 Libo Zhang, Wenzhang Zhou, Heng Fan, Tiejian Luo, Haibin Ling

To reduce the discrepancy in feature distributions between two domains, recent approaches achieve domain adaptation through feature alignment at different granularities via adversarial learning.

Domain Adaptation object-detection +1

PIDray: A Large-scale X-ray Benchmark for Real-World Prohibited Item Detection

3 code implementations 19 Nov 2022 Libo Zhang, Lutao Jiang, Ruyi Ji, Heng Fan

Automatic security inspection relying on computer vision technology is a challenging task in real-world scenarios due to many factors, such as intra-class variance, class imbalance, and occlusion.

Binary Classification Instance Segmentation +4

High-Fidelity Image Inpainting with GAN Inversion

no code implementations 25 Aug 2022 Yongsheng Yu, Libo Zhang, Heng Fan, Tiejian Luo

Addressing this problem, in this paper, we devise a novel GAN inversion model for image inpainting, dubbed InvertFill, mainly consisting of an encoder with a pre-modulation module and a GAN generator with F&W+ latent space.

Image Inpainting Vocal Bursts Intensity Prediction

Divert More Attention to Vision-Language Tracking

1 code implementation 3 Jul 2022 Mingzhe Guo, Zhipeng Zhang, Heng Fan, Liping Jing

By revealing the potential of VL representation, we expect the community to divert more attention to VL tracking and hope to open more possibilities for future tracking beyond Transformer.

Object Tracking

AnimalTrack: A Benchmark for Multi-Animal Tracking in the Wild

no code implementations 30 Apr 2022 Libo Zhang, Junyuan Gao, Zhen Xiao, Heng Fan

Multi-animal tracking (MAT), a multi-object tracking (MOT) problem, is crucial for animal motion and behavior analysis and has many important applications in areas such as biology, ecology, and animal conservation.

Multi-Object Tracking

Learning Target-aware Representation for Visual Tracking via Informative Interactions

no code implementations 7 Jan 2022 Mingzhe Guo, Zhipeng Zhang, Heng Fan, Liping Jing, Yilin Lyu, Bing Li, Weiming Hu

The proposed GIM module and InBN mechanism are general and applicable to different backbone types including CNN and Transformer for improvements, as evidenced by our extensive experiments on multiple benchmarks.

Representation Learning Visual Tracking

Osteoporosis Prescreening using Panoramic Radiographs through a Deep Convolutional Neural Network with Attention Mechanism

no code implementations 19 Oct 2021 Heng Fan, Jiaxiang Ren, Jie Yang, Yi-Xian Qin, Haibin Ling

The aim of this study was to investigate whether a deep convolutional neural network (CNN) with an attention module can detect osteoporosis on panoramic radiographs.

CRACT: Cascaded Regression-Align-Classification for Robust Visual Tracking

no code implementations 25 Nov 2020 Heng Fan, Haibin Ling

The key is to bridge box regression and classification via an alignment step, which leads to more accurate features for proposal classification with improved robustness.

Classification General Classification +3

LaSOT: A High-quality Large-scale Single Object Tracking Benchmark

1 code implementation 8 Sep 2020 Heng Fan, Hexin Bai, Liting Lin, Fan Yang, Peng Chu, Ge Deng, Sijia Yu, Harshit, Mingzhen Huang, Juehuan Liu, Yong Xu, Chunyuan Liao, Lin Yuan, Haibin Ling

The average video length of LaSOT is around 2,500 frames, where each video contains various challenge factors that exist in real-world video footage, such as the targets disappearing and re-appearing.

Object Tracking Visual Tracking +1

Detection and Tracking Meet Drones Challenge

2 code implementations 16 Jan 2020 Pengfei Zhu, Longyin Wen, Dawei Du, Xiao Bian, Heng Fan, QinGhua Hu, Haibin Ling

We provide a large-scale drone-captured dataset, VisDrone, which includes four tracks, i.e., (1) image object detection, (2) video object detection, (3) single object tracking, and (4) multi-object tracking.

Multi-Object Tracking Object +2

Semantic-Aware Label Placement for Augmented Reality in Street View

no code implementations 15 Dec 2019 Jianqing Jia, Semir Elezovikj, Heng Fan, Shuojin Yang, Jing Liu, Wei Guo, Chiu C. Tan, Haibin Ling

Our solution encodes the constraints for placing labels in an optimization problem to obtain the final label layout, and the labels will be placed in appropriate positions to reduce the chances of overlaying important real-world objects in street view AR scenarios.

TracKlinic: Diagnosis of Challenge Factors in Visual Tracking

no code implementations 18 Nov 2019 Heng Fan, Fan Yang, Peng Chu, Lin Yuan, Haibin Ling

For the analysis component, given the tracking results on all sequences, it investigates the behavior of the tracker under each individual factor and generates the report automatically.

Visual Tracking

ClsGAN: Selective Attribute Editing Model Based On Classification Adversarial Network

1 code implementation 25 Oct 2019 Liu Ying, Heng Fan, Fuchuan Ni, Jinhai Xiang

In addition, to further improve the transfer accuracy of generated images, an attribute adversarial classifier (referred to as Atta-cls) is introduced to guide the generator from the perspective of attributes by learning the defects of attribute-transfer images.

Attribute Classification +3

Clustered Object Detection in Aerial Images

1 code implementation ICCV 2019 Fan Yang, Heng Fan, Peng Chu, Erik Blasch, Haibin Ling

The key components in ClusDet include a cluster proposal sub-network (CPNet), a scale estimation sub-network (ScaleNet), and a dedicated detection network (DetecNet).

Clustering Object +2

Online Multi-Object Tracking with Instance-Aware Tracker and Dynamic Model Refreshment

no code implementations 21 Feb 2019 Peng Chu, Heng Fan, Chiu C. Tan, Haibin Ling

To address this issue, in this paper we propose an instance-aware tracker to integrate SOT techniques for MOT by encoding awareness both within and between target models.

Multi-Object Tracking Online Multi-Object Tracking

Scene Parsing via Dense Recurrent Neural Networks with Attentional Selection

no code implementations 9 Nov 2018 Heng Fan, Peng Chu, Longin Jan Latecki, Haibin Ling

Recurrent neural networks (RNNs) have shown the ability to improve scene parsing through capturing long-range dependencies among image units.

Scene Labeling

Robust and Efficient Graph Correspondence Transfer for Person Re-identification

no code implementations 15 May 2018 Qin Zhou, Heng Fan, Hua Yang, Hang Su, Shibao Zheng, Shuang Wu, Haibin Ling

To address this problem, in this paper, we present a robust and efficient graph correspondence transfer (REGCT) approach for explicit spatial alignment in Re-ID.

Graph Matching Person Re-Identification

Weighted Bilinear Coding over Salient Body Parts for Person Re-identification

no code implementations 22 Mar 2018 Zhigang Chang, Qin Zhou, Heng Fan, Hang Su, Hua Yang, Shibao Zheng, Haibin Ling

Meanwhile, a weighting scheme is applied to the bilinear coding to adaptively adjust the weights of local features at different locations based on their importance in recognition, further improving the discriminability of feature aggregation.

Person Re-Identification

Parallel Tracking and Verifying

no code implementations 30 Jan 2018 Heng Fan, Haibin Ling

Being intensively studied, visual object tracking has witnessed great advances in either speed (e.g., with correlation filters) or accuracy (e.g., with deep features).

Visual Object Tracking

Dense Recurrent Neural Networks for Scene Labeling

no code implementations 21 Jan 2018 Heng Fan, Haibin Ling

Recently, recurrent neural networks (RNNs) have demonstrated the ability to improve scene labeling through capturing long-range dependencies among image units.

Scene Labeling

Scalable Quantum Tomography with Fidelity Estimation

1 code implementation 8 Dec 2017 Jun Wang, Zhao-Yu Han, Song-Bo Wang, Zeyang Li, Liang-Zhu Mu, Heng Fan, Lei Wang

We propose a quantum tomography scheme for pure qudit systems which adopts random base measurements and generative learning methods, along with a built-in fidelity estimation approach to assess the reliability of the tomographic states.

Quantum Physics

Unsupervised Generative Modeling Using Matrix Product States

1 code implementation 6 Sep 2017 Zhao-Yu Han, Jun Wang, Heng Fan, Lei Wang, Pan Zhang

Generative modeling, which learns a joint probability distribution from data and generates samples according to it, is an important task in machine learning and artificial intelligence.

BIG-bench Machine Learning

Parallel Tracking and Verifying: A Framework for Real-Time and High Accuracy Visual Tracking

no code implementations ICCV 2017 Heng Fan, Haibin Ling

In this paper, we study the problem from a new perspective and present a novel parallel tracking and verifying (PTAV) framework, by taking advantage of the ubiquity of multi-thread techniques and borrowing from the success of parallel tracking and mapping in visual SLAM.

Visual Tracking

SANet: Structure-Aware Network for Visual Tracking

no code implementations 21 Nov 2016 Heng Fan, Haibin Ling

Convolutional neural networks (CNNs) have drawn increasing interest in visual tracking owing to their power in feature extraction.

General Classification Object +1

Multi-level Contextual RNNs with Attention Model for Scene Labeling

no code implementations 8 Jul 2016 Heng Fan, Xue Mei, Danil Prokhorov, Haibin Ling

Context in images is crucial for scene labeling, yet existing methods only exploit local context generated from a small area surrounding an image patch or a pixel; by contrast, long-range and global contextual information is ignored.

Scene Labeling

