Search Results for author: Jiawen Zhu

Found 24 papers, 18 papers with code

Streaming Video Understanding and Multi-round Interaction with Memory-enhanced Knowledge

1 code implementation • 23 Jan 2025 • Haomiao Xiong, Zongxin Yang, Jiazuo Yu, Yunzhi Zhuge, Lu Zhang, Jiawen Zhu, Huchuan Lu

Recent advances in Large Language Models (LLMs) have enabled the development of Video-LLMs, advancing multimodal learning by bridging video data with language tasks.

Scheduling • Video Understanding

3UR-LLM: An End-to-End Multimodal Large Language Model for 3D Scene Understanding

1 code implementation • 14 Jan 2025 • Haomiao Xiong, Yunzhi Zhuge, Jiawen Zhu, Lu Zhang, Huchuan Lu

Multi-modal Large Language Models (MLLMs) exhibit impressive capabilities in 2D tasks, yet encounter challenges in discerning the spatial positions, interrelations, and causal logic in scenes when transitioning from 2D to 3D representations.

Language Modeling • Language Modelling +3

SUTrack: Towards Simple and Unified Single Object Tracking

1 code implementation • 26 Dec 2024 • Xin Chen, Ben Kang, Wanting Geng, Jiawen Zhu, Yi Liu, Dong Wang, Huchuan Lu

It consolidates five SOT tasks (RGB-based, RGB-Depth, RGB-Thermal, RGB-Event, RGB-Language Tracking) into a unified model trained in a single session.

Object Tracking

UCDR-Adapter: Exploring Adaptation of Pre-Trained Vision-Language Models for Universal Cross-Domain Retrieval

1 code implementation • 14 Dec 2024 • Haoyu Jiang, Zhi-Qi Cheng, Gabriel Moreira, Jiawen Zhu, Jingdong Sun, Bukun Ren, Jun-Yan He, Qi Dai, Xian-Sheng Hua

Second, Target Prompt Generation creates dynamic prompts by attending to masked source prompts, enabling seamless adaptation to unseen domains and classes.

Retrieval
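
To make the "attending to masked source prompts" step above concrete, here is a minimal PyTorch sketch; the module name, prompt counts, dimensions, and masking scheme are illustrative assumptions rather than the UCDR-Adapter implementation.

```python
import torch
import torch.nn as nn

class TargetPromptGenerator(nn.Module):
    """Toy sketch: build target prompts by attending to randomly masked
    learned source prompts (names, sizes, and masking scheme are assumed)."""

    def __init__(self, num_prompts=8, dim=512, num_heads=8, mask_ratio=0.5):
        super().__init__()
        self.source_prompts = nn.Parameter(torch.randn(num_prompts, dim) * 0.02)
        self.target_queries = nn.Parameter(torch.randn(num_prompts, dim) * 0.02)
        self.attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.mask_ratio = mask_ratio

    def forward(self, batch_size):
        src = self.source_prompts.unsqueeze(0).expand(batch_size, -1, -1)
        qry = self.target_queries.unsqueeze(0).expand(batch_size, -1, -1)
        # Randomly hide a fraction of the source prompts so the generator cannot
        # simply copy them, which is meant to encourage adaptation to unseen
        # domains and classes; keep at least the first prompt visible.
        mask = torch.rand(batch_size, src.size(1), device=src.device) < self.mask_ratio
        mask[:, 0] = False
        target_prompts, _ = self.attn(qry, src, src, key_padding_mask=mask)
        return target_prompts  # (batch_size, num_prompts, dim)
```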

GSSF: Generalized Structural Sparse Function for Deep Cross-modal Metric Learning

1 code implementation • 20 Oct 2024 • Haiwen Diao, Ying Zhang, Shang Gao, Jiawen Zhu, Long Chen, Huchuan Lu

Cross-modal metric learning is a prominent research topic that bridges the semantic heterogeneity between vision and language.

Image Retrieval • Image-text Retrieval +4

Fine-grained Abnormality Prompt Learning for Zero-shot Anomaly Detection

1 code implementation • 14 Oct 2024 • Jiawen Zhu, Yew-Soon Ong, Chunhua Shen, Guansong Pang

To this end, we introduce a novel compound abnormality prompting module in FAPrompt to learn a set of complementary, decomposed abnormality prompts, where each abnormality prompt is formed by a compound of shared normal tokens and a few learnable abnormal tokens.

Anomaly Detection • zero-shot anomaly detection
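
The compound abnormality prompting module described above can be pictured with a minimal sketch: each abnormality prompt concatenates shared normal tokens with a few prompt-specific learnable abnormal tokens. All names and sizes below are assumptions, not the FAPrompt code.

```python
import torch
import torch.nn as nn

class CompoundAbnormalityPrompts(nn.Module):
    """Illustrative sketch: each abnormality prompt = shared normal tokens
    + its own small set of learnable abnormal tokens (sizes assumed)."""

    def __init__(self, num_prompts=10, shared_len=8, abnormal_len=2, dim=512):
        super().__init__()
        # Tokens shared by every abnormality prompt (the common "normal" context).
        self.shared_normal = nn.Parameter(torch.randn(shared_len, dim) * 0.02)
        # A few abnormality-specific learnable tokens per prompt.
        self.abnormal = nn.Parameter(torch.randn(num_prompts, abnormal_len, dim) * 0.02)

    def forward(self):
        shared = self.shared_normal.unsqueeze(0).expand(self.abnormal.size(0), -1, -1)
        # Concatenate shared and prompt-specific tokens; a frozen text encoder
        # (not shown) would map each token sequence to an abnormality embedding.
        return torch.cat([shared, self.abnormal], dim=1)  # (num_prompts, shared_len + abnormal_len, dim)
```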

MambaVT: Spatio-Temporal Contextual Modeling for robust RGB-T Tracking

no code implementations • 15 Aug 2024 • Simiao Lai, Chang Liu, Jiawen Zhu, Ben Kang, Yang Liu, Dong Wang, Huchuan Lu

Existing RGB-T tracking algorithms have made remarkable progress by leveraging the global interaction capability and extensive pre-trained models of the Transformer architecture.

Mamba • Rgb-T Tracking

GMSR: Gradient-Guided Mamba for Spectral Reconstruction from RGB Images

1 code implementation • 13 May 2024 • Xinying Wang, Zhixiong Huang, Sifan Zhang, Jiawen Zhu, Paolo Gamba, Lin Feng

Mainstream approaches to spectral reconstruction (SR) primarily focus on designing Convolution- and Transformer-based architectures.

Computational Efficiency • Mamba +1

Toward Generalist Anomaly Detection via In-context Residual Learning with Few-shot Sample Prompts

2 code implementations • CVPR 2024 • Jiawen Zhu, Guansong Pang

In this work, we propose to train a GAD model with few-shot normal images as sample prompts for AD on diverse datasets on the fly.

Anomaly Detection
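
A minimal sketch of scoring a query against few-shot normal sample prompts, in the spirit of the entry above; the residual-distance score and the frozen encoder it presumes are illustrative assumptions, not the paper's exact formulation.

```python
import torch

def residual_anomaly_score(query_feat, normal_prompt_feats):
    """Toy scorer: a query image is anomalous if its feature lies far from
    every few-shot normal sample prompt (features assumed L2-normalized)."""
    residuals = query_feat.unsqueeze(0) - normal_prompt_feats  # (k, d)
    return residuals.norm(dim=-1).min().item()  # distance to the closest normal prompt

# Usage with any frozen image encoder `encode` that yields d-dim features:
# score = residual_anomaly_score(encode(query_img),
#                                torch.stack([encode(x) for x in normal_imgs]))
```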

EmoWear: Exploring Emotional Teasers for Voice Message Interaction on Smartwatches

no code implementations • 11 Feb 2024 • Pengcheng An, Jiawen Zhu, Zibo Zhang, Yifei Yin, Qingyuan Ma, Che Yan, Linghao Du, Jian Zhao

We introduce EmoWear, a smartwatch voice messaging system enabling users to apply 30 animation teasers on message bubbles to reflect emotions.

Retrieval

Anomaly Heterogeneity Learning for Open-set Supervised Anomaly Detection

1 code implementation • CVPR 2024 • Jiawen Zhu, Choubo Ding, Yu Tian, Guansong Pang

Extensive experiments on nine real-world anomaly detection datasets show that AHL can 1) substantially enhance different state-of-the-art OSAD models in detecting seen and unseen anomalies, and 2) effectively generalize to unseen anomalies in new domains.

Supervised Anomaly Detection

DCPT: Darkness Clue-Prompted Tracking in Nighttime UAVs

1 code implementation • 19 Sep 2023 • Jiawen Zhu, Huayi Tang, Zhi-Qi Cheng, Jun-Yan He, Bin Luo, Shihao Qiu, Shengming Li, Huchuan Lu

To address this, we propose a novel architecture called Darkness Clue-Prompted Tracking (DCPT) that achieves robust UAV tracking at night by efficiently learning to generate darkness clue prompts.

Tracking Anything in High Quality

1 code implementation • 26 Jul 2023 • Jiawen Zhu, Zhenyu Chen, Zeqi Hao, Shijie Chang, Lu Zhang, Dong Wang, Huchuan Lu, Bin Luo, Jun-Yan He, Jin-Peng Lan, Hanyuan Chen, Chenyang Li

To further improve the quality of tracking masks, a pretrained mask refiner (MR) model is employed to refine the tracking results.

Object • Semantic Segmentation +3

A Causal Inference Framework for Leveraging External Controls in Hybrid Trials

no code implementations • 15 May 2023 • Michael Valancius, Herb Pang, Jiawen Zhu, Stephen R Cole, Michele Jonsson Funk, Michael R Kosorok

We consider the challenges associated with causal inference in settings where data from a randomized trial is augmented with control data from an external source to improve efficiency in estimating the average treatment effect (ATE).

Causal Inference
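
A toy numerical sketch of the data setup described above: estimating the ATE from a randomized trial, optionally pooling external controls. The naive difference-in-means below only illustrates the setting; it is not the causal framework developed in the paper, which must account for bias in the external source.

```python
import numpy as np

def difference_in_means_ate(y_treated, y_trial_controls, y_external_controls=None):
    """Toy estimator: mean(treated) - mean(controls), optionally pooling the
    external controls with the randomized ones (assumes they are exchangeable)."""
    controls = y_trial_controls if y_external_controls is None \
        else np.concatenate([y_trial_controls, y_external_controls])
    return y_treated.mean() - controls.mean()

rng = np.random.default_rng(0)
y_treated = rng.normal(1.0, 1.0, 100)            # randomized trial, treated arm
y_trial_controls = rng.normal(0.0, 1.0, 50)      # randomized trial, control arm
y_external_controls = rng.normal(0.1, 1.0, 200)  # external controls, possibly biased

print(difference_in_means_ate(y_treated, y_trial_controls))                       # trial-only
print(difference_in_means_ate(y_treated, y_trial_controls, y_external_controls))  # hybrid
```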

Anomaly Detection under Distribution Shift

1 code implementation • ICCV 2023 • Tri Cao, Jiawen Zhu, Guansong Pang

Anomaly detection (AD) is a crucial machine learning task that aims to learn patterns from a set of normal training samples to identify abnormal samples in test data.

Anomaly Detection
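
As a generic illustration of the setup stated above (learn from normal samples only, then score test points), here is a simple kNN-distance scorer; it is a common baseline shown for reference, not the method proposed in the paper.

```python
import numpy as np

def knn_anomaly_scores(normal_train_feats, test_feats, k=5):
    """Generic baseline: anomaly score = mean distance to the k nearest
    normal training features (rows are feature vectors)."""
    d = np.linalg.norm(test_feats[:, None, :] - normal_train_feats[None, :, :], axis=-1)
    return np.sort(d, axis=1)[:, :k].mean(axis=1)  # higher = more anomalous
```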

Visual Prompt Multi-Modal Tracking

1 code implementation • CVPR 2023 • Jiawen Zhu, Simiao Lai, Xin Chen, Dong Wang, Huchuan Lu

To inherit the powerful representations of the foundation model, a natural modus operandi for multi-modal tracking is full fine-tuning on the RGB-based parameters.

Object Tracking • Rgb-T Tracking
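
The entry above contrasts full fine-tuning with a lighter alternative; a minimal sketch of prompt-style tuning for multi-modal tracking is shown below, where the RGB foundation model stays frozen and only a few prompt parameters train. Module names and shapes are illustrative assumptions, not the ViPT code.

```python
import torch
import torch.nn as nn

class PromptedTracker(nn.Module):
    """Illustrative sketch: frozen RGB backbone + small learnable prompts
    that inject the auxiliary modality (e.g. thermal) into its token stream."""

    def __init__(self, rgb_backbone, num_prompt_tokens=16, dim=768):
        super().__init__()
        self.backbone = rgb_backbone
        for p in self.backbone.parameters():
            p.requires_grad = False          # keep the RGB foundation model frozen
        self.prompts = nn.Parameter(torch.zeros(num_prompt_tokens, dim))
        self.aux_proj = nn.Linear(dim, dim)  # projects auxiliary-modality tokens

    def forward(self, rgb_tokens, aux_tokens):
        b = rgb_tokens.size(0)
        # Condition the learned prompts on a summary of the auxiliary modality.
        aux_summary = self.aux_proj(aux_tokens.mean(dim=1, keepdim=True))    # (b, 1, dim)
        prompts = self.prompts.unsqueeze(0).expand(b, -1, -1) + aux_summary  # (b, P, dim)
        # Only `prompts` and `aux_proj` receive gradients; the backbone stays frozen.
        return self.backbone(torch.cat([prompts, rgb_tokens], dim=1))
```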

SRRT: Exploring Search Region Regulation for Visual Object Tracking

no code implementations • 10 Jul 2022 • Jiawen Zhu, Xin Chen, Pengyu Zhang, Xinying Wang, Dong Wang, Wenda Zhao, Huchuan Lu

The dominant trackers generate a fixed-size rectangular region based on the previous prediction or initial bounding box as the model input, i.e., the search region.

Visual Object Tracking
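
For reference, the conventional fixed-size search-region cropping that the entry above refers to (the behaviour SRRT sets out to regulate) looks roughly like this; the scale factor and square shape are common defaults, not values from the paper.

```python
def fixed_search_region(prev_box, scale=4.0):
    """Conventional cropping: a square region centered on the previous
    prediction, with side proportional to the target size (illustrative)."""
    x, y, w, h = prev_box                 # previous bbox: top-left corner + size
    cx, cy = x + w / 2.0, y + h / 2.0     # center of the previous prediction
    side = (w * h) ** 0.5 * scale         # fixed relative size, regardless of motion
    return cx - side / 2.0, cy - side / 2.0, side, side  # (x, y, w, h) of the crop
```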

High-Performance Transformer Tracking

1 code implementation • 25 Mar 2022 • Xin Chen, Bin Yan, Jiawen Zhu, Huchuan Lu, Xiang Ruan, Dong Wang

First, we present a transformer tracking (named TransT) method based on the Siamese-like feature extraction backbone, the designed attention-based fusion mechanism, and the classification and regression head.

Vocal Bursts Intensity Prediction
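
A skeleton of the three-part pipeline named above (Siamese-like backbone, attention-based fusion, classification and regression heads); the concrete modules are placeholders, not the released TransT code.

```python
import torch.nn as nn

class TransTLikeTracker(nn.Module):
    """Illustrative skeleton of a TransT-style tracker (modules are placeholders)."""

    def __init__(self, backbone, fusion, dim=256):
        super().__init__()
        self.backbone = backbone           # shared (Siamese-like) feature extractor
        self.fusion = fusion               # attention-based feature fusion, e.g. cross-attention
        self.cls_head = nn.Linear(dim, 2)  # foreground / background per token
        self.reg_head = nn.Linear(dim, 4)  # normalized box coordinates per token

    def forward(self, template, search):
        zf = self.backbone(template)       # template features
        xf = self.backbone(search)         # search-region features
        fused = self.fusion(zf, xf)        # (batch, tokens, dim)
        return self.cls_head(fused), self.reg_head(fused)
```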

AutoChart: A Dataset for Chart-to-Text Generation Task

no code implementations • RANLP 2021 • Jiawen Zhu, Jinye Ran, Roy Ka-Wei Lee, Kenny Choo, Zhi Li

The analytical description of charts is an exciting and important research area with many applications in academia and industry.

Text Generation

Transformer Tracking

1 code implementation • CVPR 2021 • Xin Chen, Bin Yan, Jiawen Zhu, Dong Wang, Xiaoyun Yang, Huchuan Lu

The correlation operation is a simple fusion manner to consider the similarity between the template and the search region.

Video Object Tracking • Visual Object Tracking +1
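
The sentence above contrasts attention-based fusion with the plain correlation used by earlier Siamese trackers; for reference, a minimal depthwise cross-correlation between template and search features can be sketched as follows (sizes and grouping are illustrative).

```python
import torch
import torch.nn.functional as F

def depthwise_xcorr(search_feat, template_feat):
    """Illustrative: slide the template features over the search features,
    channel by channel, to produce a similarity (response) map."""
    b, c, h, w = search_feat.shape
    kernel = template_feat.reshape(b * c, 1, *template_feat.shape[-2:])
    response = F.conv2d(search_feat.reshape(1, b * c, h, w), kernel, groups=b * c)
    return response.reshape(b, c, *response.shape[-2:])

# Example: 256-channel features, 8x8 template on a 32x32 search region.
resp = depthwise_xcorr(torch.randn(2, 256, 32, 32), torch.randn(2, 256, 8, 8))
print(resp.shape)  # torch.Size([2, 256, 25, 25])
```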
