Search Results for author: Zhipeng Zhang

Found 25 papers, 13 papers with code

Self-Explainable Affordance Learning with Embodied Caption

no code implementations • 8 Apr 2024 • Zhipeng Zhang, Zhimin Wei, Guolei Sun, Peng Wang, Luc van Gool

In the field of visual affordance learning, previous methods mainly used abundant images or videos that delineate human behavior patterns to identify action possibility regions for object manipulation, with a variety of applications in robotic tasks.

Tracking Meets LoRA: Faster Training, Larger Model, Stronger Performance

no code implementations • 8 Mar 2024 • Liting Lin, Heng Fan, Zhipeng Zhang, YaoWei Wang, Yong Xu, Haibin Ling

The shared embeddings, which describe the absolute coordinates of multi-resolution images (namely, the template and search images), are inherited from the pre-trained backbones.

Inductive Bias Position +1

VastTrack: Vast Category Visual Object Tracking

1 code implementation • 6 Mar 2024 • Liang Peng, Junyuan Gao, Xinran Liu, Weihong Li, Shaohua Dong, Zhipeng Zhang, Heng Fan, Libo Zhang

The rich annotations of VastTrack enable the development of both vision-only and vision-language tracking.

Object Visual Object Tracking +1

Image Fusion via Vision-Language Model

no code implementations • 3 Feb 2024 • Zixiang Zhao, Lilun Deng, Haowen Bai, Yukun Cui, Zhipeng Zhang, Yulun Zhang, Haotong Qin, Dongdong Chen, Jiangshe Zhang, Peng Wang, Luc van Gool

Therefore, we introduce a novel fusion paradigm named image Fusion via vIsion-Language Model (FILM), for the first time, utilizing explicit textual information in different source images to guide image fusion.

Language Modelling

Orion-14B: Open-source Multilingual Large Language Models

1 code implementation • 20 Jan 2024 • Du Chen, Yi Huang, Xiaopu Li, Yongqiang Li, Yongqiang Liu, Haihui Pan, Leichao Xu, Dacheng Zhang, Zhipeng Zhang, Kun Han

In this study, we introduce Orion-14B, a collection of multilingual large language models with 14 billion parameters.

Scheduling

751

Language-Enhanced Session-Based Recommendation with Decoupled Contrastive Learning

no code implementations • 20 Jul 2023 • Zhipeng Zhang, Piao Tong, Yingwei Ma, Qiao Liu, Xujiang Liu, Xu Luo

Furthermore, we introduce a novel Decoupled Contrastive Learning method to enhance the effectiveness of the language representation.

Contrastive Learning Retrieval +1

Divert More Attention to Vision-Language Object Tracking

1 code implementation • 19 Jul 2023 • Mingzhe Guo, Zhipeng Zhang, Liping Jing, Haibin Ling, Heng Fan

To thoroughly evidence the effectiveness of our method, we integrate the proposed framework on three tracking methods with different designs, i.e., the CNN-based SiamCAR, the Transformer-based OSTrack, and the hybrid structure TransT.

Attribute Object +1

464

Augment and Criticize: Exploring Informative Samples for Semi-Supervised Monocular 3D Object Detection

no code implementations • 20 Mar 2023 • Zhenyu Li, Zhipeng Zhang, Heng Fan, Yuan He, Ke Wang, Xianming Liu, Junjun Jiang

In this paper, we improve the challenging monocular 3D object detection problem with a general semi-supervised framework.

Monocular 3D Object Detection object-detection +1

AUNet: Learning Relations Between Action Units for Face Forgery Detection

no code implementations • CVPR 2023 • Weiming Bai, Yufan Liu, Zhipeng Zhang, Bing Li, Weiming Hu

Observing that face manipulation may alter the relation between different facial action units (AU), we propose the Action Units Relation Learning framework to improve the generality of forgery detection.

DeepFake Detection Face Swapping +1

Structured Knowledge Distillation Towards Efficient and Compact Multi-View 3D Detection

no code implementations • 14 Nov 2022 • Linfeng Zhang, Yukang Shi, Hung-Shuo Tai, Zhipeng Zhang, Yuan He, Ke Wang, Kaisheng Ma

Detecting 3D objects from multi-view images is a fundamental problem in 3D computer vision.

Knowledge Distillation

One for All: One-stage Referring Expression Comprehension with Dynamic Reasoning

no code implementations • 31 Jul 2022 • Zhipeng Zhang, Zhimin Wei, Zhongzhen Huang, Rui Niu, Peng Wang

However, one unsolved issue of these models is that the number of reasoning steps needs to be pre-defined and fixed before inference, ignoring the varying complexity of expressions.

Referring Expression Referring Expression Comprehension +2

Divert More Attention to Vision-Language Tracking

1 code implementation • 3 Jul 2022 • Mingzhe Guo, Zhipeng Zhang, Heng Fan, Liping Jing

By revealing the potential of VL representation, we expect the community to divert more attention to VL tracking and hope to open more possibilities for future tracking beyond Transformer.

Object Tracking

464

Attract me to Buy: Advertisement Copywriting Generation with Multimodal Multi-structured Information

no code implementations • 7 May 2022 • Zhipeng Zhang, Xinglin Hou, Kai Niu, Zhongzhen Huang, Tiezheng Ge, Yuning Jiang, Qi Wu, Peng Wang

Therefore, we present a dataset, E-MMAD (e-commercial multimodal multi-structured advertisement copywriting), which requires and supports much more detailed information in text generation.

Text Generation Video Captioning

Learning Target-aware Representation for Visual Tracking via Informative Interactions

no code implementations • 7 Jan 2022 • Mingzhe Guo, Zhipeng Zhang, Heng Fan, Liping Jing, Yilin Lyu, Bing Li, Weiming Hu

The proposed GIM module and InBN mechanism are general and applicable to different backbone types including CNN and Transformer for improvements, as evidenced by our extensive experiments on multiple benchmarks.

Representation Learning Visual Tracking

SwinTrack: A Simple and Strong Baseline for Transformer Tracking

1 code implementation • 2 Dec 2021 • Liting Lin, Heng Fan, Zhipeng Zhang, Yong Xu, Haibin Ling

The potential of Transformer in representation learning remains under-explored.

Ranked #9 on Visual Object Tracking on TrackingNet

Representation Learning Visual Object Tracking +1

232

VisEvent: Reliable Object Tracking via Collaboration of Frame and Event Flows

2 code implementations • 11 Aug 2021 • Xiao Wang, Jianing Li, Lin Zhu, Zhipeng Zhang, Zhe Chen, Xin Li, YaoWei Wang, Yonghong Tian, Feng Wu

Different from visible cameras which record intensity images frame by frame, the biologically inspired event camera produces a stream of asynchronous and sparse events with much lower latency.

Ranked #1 on Object Tracking on VisEvent

Object Tracking

101

Learn to Match: Automatic Matching Network Design for Visual Tracking

1 code implementation • ICCV 2021 • Zhipeng Zhang, Yihao Liu, Xiao Wang, Bing Li, Weiming Hu

Siamese tracking has achieved groundbreaking performance in recent years, where the essence is the efficient matching operator cross-correlation and its variants.

Visual Tracking

464

Discovering Distinctive "Semantics" in Super-Resolution Networks

no code implementations • 1 Aug 2021 • Yihao Liu, Anran Liu, Jinjin Gu, Zhipeng Zhang, Wenhao Wu, Yu Qiao, Chao Dong

We show that a well-trained deep SR network is naturally a good descriptor of degradation information.

Dimensionality Reduction Image Super-Resolution

One More Check: Making "Fake Background" Be Tracked Again

1 code implementation • 19 Apr 2021 • Chao Liang, Zhipeng Zhang, Xue Zhou, Bing Li, Weiming Hu

Eventually, it helps to reload the "fake background" and repair the broken tracklets.

Motion Forecasting Multi-Object Tracking +2

464

Towards More Flexible and Accurate Object Tracking with Natural Language: Algorithms and Benchmark

2 code implementations • CVPR 2021 • Xiao Wang, Xiujun Shu, Zhipeng Zhang, Bo Jiang, YaoWei Wang, Yonghong Tian, Feng Wu

We believe this benchmark will greatly boost related research on natural language guided tracking.

Ranked #3 on Visual Object Tracking on TNL2K (precision metric)

2k Object +3

358

Reality Transform Adversarial Generators for Image Splicing Forgery Detection and Localization

no code implementations • ICCV 2021 • Xiuli Bi, Zhipeng Zhang, Bin Xiao

For detecting the tampered regions, a forgery localization generator GM is proposed based on a multi-decoder-single-task strategy.

Style Transfer

Rethinking the competition between detection and ReID in Multi-Object Tracking

4 code implementations • 23 Oct 2020 • Chao Liang, Zhipeng Zhang, Xue Zhou, Bing Li, Shuyuan Zhu, Weiming Hu

However, the inherent differences and relations between detection and re-identification (ReID) are unconsciously overlooked because of treating them as two isolated tasks in the one-shot tracking paradigm.

Ranked #1 on Multi-Object Tracking on HiEve (using extra training data)

Multi-Object Tracking

2,350

Towards Accurate Pixel-wise Object Tracking by Attention Retrieval

1 code implementation • 6 Aug 2020 • Zhipeng Zhang, Bing Li, Weiming Hu, Houwen Peng

We first build a look-up-table (LUT) with the ground-truth mask in the starting frame, and then retrieve the LUT to obtain an attention map for spatial constraints.

Object Object Tracking +2