1 code implementation • 23 Jan 2025 • Haomiao Xiong, Zongxin Yang, Jiazuo Yu, Yunzhi Zhuge, Lu Zhang, Jiawen Zhu, Huchuan Lu
Recent advances in Large Language Models (LLMs) have enabled the development of Video-LLMs, advancing multimodal learning by bridging video data with language tasks.
1 code implementation • 14 Jan 2025 • Haomiao Xiong, Yunzhi Zhuge, Jiawen Zhu, Lu Zhang, Huchuan Lu
Multi-modal Large Language Models (MLLMs) exhibit impressive capabilities in 2D tasks, yet encounter challenges in discerning the spatial positions, interrelations, and causal logic in scenes when transitioning from 2D to 3D representations.
1 code implementation • 26 Dec 2024 • Xin Chen, Ben Kang, Wanting Geng, Jiawen Zhu, Yi Liu, Dong Wang, Huchuan Lu
It consolidates five SOT tasks (RGB-based, RGB-Depth, RGB-Thermal, RGB-Event, RGB-Language Tracking) into a unified model trained in a single session.
1 code implementation • 14 Dec 2024 • Haoyu Jiang, Zhi-Qi Cheng, Gabriel Moreira, Jiawen Zhu, Jingdong Sun, Bukun Ren, Jun-Yan He, Qi Dai, Xian-Sheng Hua
Second, Target Prompt Generation creates dynamic prompts by attending to masked source prompts, enabling seamless adaptation to unseen domains and classes.
1 code implementation • 20 Oct 2024 • Haiwen Diao, Ying Zhang, Shang Gao, Jiawen Zhu, Long Chen, Huchuan Lu
Cross-modal metric learning is a prominent research topic that bridges the semantic heterogeneity between vision and language.
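The snippet above describes cross-modal metric learning in general terms. As a minimal illustration of the idea (matched image/text pairs pulled together, mismatched pairs pushed apart), here is a symmetric InfoNCE-style contrastive objective in NumPy; the function name, temperature value, and embedding shapes are illustrative assumptions, not the paper's actual objective.

```python
import numpy as np

def contrastive_loss(img_emb, txt_emb, temperature=0.07):
    """Symmetric InfoNCE over matched image/text embedding pairs.

    A generic sketch of cross-modal metric learning: row i of img_emb is
    assumed to match row i of txt_emb. Not the paper's exact loss.
    """
    # L2-normalize so dot products become cosine similarities
    img = img_emb / np.linalg.norm(img_emb, axis=1, keepdims=True)
    txt = txt_emb / np.linalg.norm(txt_emb, axis=1, keepdims=True)
    logits = img @ txt.T / temperature      # (B, B) similarity matrix
    labels = np.arange(len(logits))         # i-th image matches i-th text

    def ce(l):
        # numerically stable softmax cross-entropy on the diagonal targets
        l = l - l.max(axis=1, keepdims=True)
        p = np.exp(l) / np.exp(l).sum(axis=1, keepdims=True)
        return -np.log(p[labels, labels]).mean()

    # average the image-to-text and text-to-image directions
    return 0.5 * (ce(logits) + ce(logits.T))
```

With identical (perfectly matched) embeddings the loss is near zero; it grows as matched pairs drift apart in the embedding space.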
1 code implementation • 14 Oct 2024 • Jiawen Zhu, Yew-Soon Ong, Chunhua Shen, Guansong Pang
To this end, we introduce a novel compound abnormality prompting module in FAPrompt to learn a set of complementary, decomposed abnormality prompts, where each abnormality prompt is formed by a compound of shared normal tokens and a few learnable abnormal tokens.
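The composition described above (each abnormality prompt built from shared normal tokens plus a few prompt-specific abnormal tokens) can be sketched as a simple concatenation of learnable token matrices. The token counts, embedding width, and function name below are illustrative assumptions about the design, not FAPrompt's implementation.

```python
import numpy as np

def build_abnormality_prompts(normal_tokens, abnormal_token_sets):
    """Compose each abnormality prompt as [shared normal tokens ; its own
    abnormal tokens], mirroring the compound design described above.

    normal_tokens:       (n, d) tokens shared by every prompt
    abnormal_token_sets: list of (k, d) prompt-specific token blocks
    """
    return [np.concatenate([normal_tokens, abn], axis=0)
            for abn in abnormal_token_sets]
```

Because the normal-token block is shared, the prompts differ only in their small learnable abnormal parts, which is what makes them complementary and decomposed.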
no code implementations • 15 Aug 2024 • Simiao Lai, Chang Liu, Jiawen Zhu, Ben Kang, Yang Liu, Dong Wang, Huchuan Lu
Existing RGB-T tracking algorithms have made remarkable progress by leveraging the global interaction capability and extensive pre-trained models of the Transformer architecture.
Ranked #1 on RGB-T Tracking on GTOT.
1 code implementation • 13 May 2024 • Xinying Wang, Zhixiong Huang, Sifan Zhang, Jiawen Zhu, Paolo Gamba, Lin Feng
Mainstream approaches to spectral reconstruction (SR) primarily focus on designing Convolution- and Transformer-based architectures.
no code implementations • 26 Mar 2024 • Jiawen Zhu, Xin Chen, Haiwen Diao, Shuai Li, Jun-Yan He, Chenyang Li, Bin Luo, Dong Wang, Huchuan Lu
For instance, DyTrack obtains 64.9% AUC on LaSOT at a speed of 256 fps.
2 code implementations • CVPR 2024 • Jiawen Zhu, Guansong Pang
In this work, we propose to train a GAD model with few-shot normal images as sample prompts for AD on diverse datasets on the fly.
no code implementations • 11 Feb 2024 • Pengcheng An, Jiawen Zhu, Zibo Zhang, Yifei Yin, Qingyuan Ma, Che Yan, Linghao Du, Jian Zhao
We introduce EmoWear, a smartwatch voice messaging system enabling users to apply 30 animation teasers on message bubbles to reflect emotions.
1 code implementation • 29 Dec 2023 • Jiawen Zhu, Zhi-Qi Cheng, Jun-Yan He, Chenyang Li, Bin Luo, Huchuan Lu, Yifeng Geng, Xuansong Xie
The perception component then generates the tracking results based on the embeddings.
Ranked #5 on Referring Video Object Segmentation on ReVOS.
1 code implementation • CVPR 2024 • Jiawen Zhu, Choubo Ding, Yu Tian, Guansong Pang
Extensive experiments on nine real-world anomaly detection datasets show that AHL can 1) substantially enhance different state-of-the-art OSAD models in detecting seen and unseen anomalies, and 2) effectively generalize to unseen anomalies in new domains.
1 code implementation • 19 Sep 2023 • Jiawen Zhu, Huayi Tang, Zhi-Qi Cheng, Jun-Yan He, Bin Luo, Shihao Qiu, Shengming Li, Huchuan Lu
To address this, we propose a novel architecture called Darkness Clue-Prompted Tracking (DCPT) that achieves robust UAV tracking at night by efficiently learning to generate darkness clue prompts.
1 code implementation • 26 Jul 2023 • Jiawen Zhu, Zhenyu Chen, Zeqi Hao, Shijie Chang, Lu Zhang, Dong Wang, Huchuan Lu, Bin Luo, Jun-Yan He, Jin-Peng Lan, Hanyuan Chen, Chenyang Li
To further improve the quality of tracking masks, a pretrained MR model is employed to refine the tracking results.
Ranked #5 on Semi-Supervised Video Object Segmentation on YouTube-VOS 2019 (using extra training data).
1 code implementation • 4 Jun 2023 • Shijie Chang, Zeqi Hao, Ben Kang, Xiaoqi Zhao, Jiawen Zhu, Zhenyu Chen, Lihe Zhang, Lu Zhang, Huchuan Lu
In this paper, we present the 3rd-place solution for the PVUW2023 VSS track.
no code implementations • 15 May 2023 • Michael Valancius, Herb Pang, Jiawen Zhu, Stephen R Cole, Michele Jonsson Funk, Michael R Kosorok
We consider the challenges associated with causal inference in settings where data from a randomized trial is augmented with control data from an external source to improve efficiency in estimating the average treatment effect (ATE).
1 code implementation • CVPR 2023 • Xin Chen, Ben Kang, Jiawen Zhu, Dong Wang, Houwen Peng, Huchuan Lu
In this paper, we introduce a new sequence-to-sequence learning framework for RGB-based and multi-modal object tracking.
Ranked #1 on RGB-T Tracking on LasHeR.
1 code implementation • ICCV 2023 • Tri Cao, Jiawen Zhu, Guansong Pang
Anomaly detection (AD) is a crucial machine learning task that aims to learn patterns from a set of normal training samples to identify abnormal samples in test data.
1 code implementation • CVPR 2023 • Jiawen Zhu, Simiao Lai, Xin Chen, Dong Wang, Huchuan Lu
To inherit the powerful representations of the foundation model, a natural modus operandi for multi-modal tracking is full fine-tuning on the RGB-based parameters.
Ranked #27 on RGB-T Tracking on LasHeR.
no code implementations • 10 Jul 2022 • Jiawen Zhu, Xin Chen, Pengyu Zhang, Xinying Wang, Dong Wang, Wenda Zhao, Huchuan Lu
The dominant trackers generate a fixed-size rectangular region based on the previous prediction or the initial bounding box as the model input, i.e., the search region.
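The fixed-size search-region rule described above can be sketched as a small cropping function: center a square on the previous prediction whose side is a multiple of the target size. The scale factor and (cx, cy, w, h) box convention below are common Siamese-tracker choices used here for illustration, not this paper's exact settings.

```python
import math

def search_region(prev_box, scale=4.0):
    """Derive a square search region centered on the previous prediction.

    prev_box: (cx, cy, w, h) of the last predicted target.
    The region side is scale * sqrt(w * h), a common crop rule in
    Siamese-style trackers (illustrative values).
    Returns the region as (x, y, w, h).
    """
    cx, cy, w, h = prev_box
    side = scale * math.sqrt(w * h)
    return (cx - side / 2, cy - side / 2, side, side)
```

For a 20x20 target centered at (100, 100) with scale 4, this yields an 80x80 region at (60, 60), regardless of how fast the target actually moves, which is the limitation the snippet points at.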
1 code implementation • 25 Mar 2022 • Xin Chen, Bin Yan, Jiawen Zhu, Huchuan Lu, Xiang Ruan, Dong Wang
First, we present a transformer tracking (named TransT) method based on the Siamese-like feature extraction backbone, the designed attention-based fusion mechanism, and the classification and regression head.
no code implementations • RANLP 2021 • Jiawen Zhu, Jinye Ran, Roy Ka-Wei Lee, Kenny Choo, Zhi Li
The analytical description of charts is an exciting and important research area with many applications in academia and industry.
1 code implementation • CVPR 2021 • Xin Chen, Bin Yan, Jiawen Zhu, Dong Wang, Xiaoyun Yang, Huchuan Lu
The correlation operation is a simple fusion manner to consider the similarity between the template and the search region.
Ranked #5 on Visual Tracking on TNL2K.
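The correlation operation mentioned above (the simple fusion step that attention-based designs replace) amounts to sliding the template feature over the search-region feature and taking a dot product at each offset. A naive single-channel NumPy sketch, with illustrative shapes and no batching or multi-channel features:

```python
import numpy as np

def correlate(template, search):
    """Naive 2-D cross-correlation: slide the template over the search
    feature map and record the dot product at each offset. The peak of
    the resulting response map indicates the best-matching location.
    """
    th, tw = template.shape
    sh, sw = search.shape
    out = np.empty((sh - th + 1, sw - tw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(template * search[i:i + th, j:j + tw])
    return out
```

Embedding the template exactly somewhere in the search map puts the response peak at that offset, which is why the operation works as a similarity-based fusion despite its simplicity.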