Search Results for author: Zhipeng Zhang

Found 25 papers, 13 papers with code

Self-Explainable Affordance Learning with Embodied Caption

no code implementations 8 Apr 2024 Zhipeng Zhang, Zhimin Wei, Guolei Sun, Peng Wang, Luc van Gool

In the field of visual affordance learning, previous methods mainly used abundant images or videos that delineate human behavior patterns to identify action possibility regions for object manipulation, with a variety of applications in robotic tasks.

Tracking Meets LoRA: Faster Training, Larger Model, Stronger Performance

no code implementations 8 Mar 2024 Liting Lin, Heng Fan, Zhipeng Zhang, YaoWei Wang, Yong Xu, Haibin Ling

The shared embeddings, which describe the absolute coordinates of multi-resolution images (namely, the template and search images), are inherited from the pre-trained backbones.

Inductive Bias Position +1

VastTrack: Vast Category Visual Object Tracking

1 code implementation 6 Mar 2024 Liang Peng, Junyuan Gao, Xinran Liu, Weihong Li, Shaohua Dong, Zhipeng Zhang, Heng Fan, Libo Zhang

The rich annotations of VastTrack enable the development of both vision-only and vision-language tracking.

Object Visual Object Tracking +1

Image Fusion via Vision-Language Model

no code implementations 3 Feb 2024 Zixiang Zhao, Lilun Deng, Haowen Bai, Yukun Cui, Zhipeng Zhang, Yulun Zhang, Haotong Qin, Dongdong Chen, Jiangshe Zhang, Peng Wang, Luc van Gool

Therefore, we introduce a novel fusion paradigm named image Fusion via vIsion-Language Model (FILM), for the first time, utilizing explicit textual information in different source images to guide image fusion.

Language Modelling

Orion-14B: Open-source Multilingual Large Language Models

1 code implementation 20 Jan 2024 Du Chen, Yi Huang, Xiaopu Li, Yongqiang Li, Yongqiang Liu, Haihui Pan, Leichao Xu, Dacheng Zhang, Zhipeng Zhang, Kun Han

In this study, we introduce Orion-14B, a collection of multilingual large language models with 14 billion parameters.

Scheduling

Language-Enhanced Session-Based Recommendation with Decoupled Contrastive Learning

no code implementations 20 Jul 2023 Zhipeng Zhang, Piao Tong, Yingwei Ma, Qiao Liu, Xujiang Liu, Xu Luo

Furthermore, we introduce a novel Decoupled Contrastive Learning method to enhance the effectiveness of the language representation.

Contrastive Learning Retrieval +1

Divert More Attention to Vision-Language Object Tracking

1 code implementation 19 Jul 2023 Mingzhe Guo, Zhipeng Zhang, Liping Jing, Haibin Ling, Heng Fan

To thoroughly demonstrate the effectiveness of our method, we integrate the proposed framework into three tracking methods with different designs, i.e., the CNN-based SiamCAR, the Transformer-based OSTrack, and the hybrid structure TransT.

Attribute Object +1

AUNet: Learning Relations Between Action Units for Face Forgery Detection

no code implementations CVPR 2023 Weiming Bai, Yufan Liu, Zhipeng Zhang, Bing Li, Weiming Hu

Observing that face manipulation may alter the relation between different facial action units (AU), we propose the Action Units Relation Learning framework to improve the generality of forgery detection.

DeepFake Detection Face Swapping +1

One for All: One-stage Referring Expression Comprehension with Dynamic Reasoning

no code implementations 31 Jul 2022 Zhipeng Zhang, Zhimin Wei, Zhongzhen Huang, Rui Niu, Peng Wang

However, one unsolved issue of these models is that the number of reasoning steps needs to be pre-defined and fixed before inference, ignoring the varying complexity of expressions.

Referring Expression Referring Expression Comprehension +2

Divert More Attention to Vision-Language Tracking

1 code implementation 3 Jul 2022 Mingzhe Guo, Zhipeng Zhang, Heng Fan, Liping Jing

By revealing the potential of VL representation, we expect the community to divert more attention to VL tracking and hope to open more possibilities for future tracking beyond Transformer.

Object Tracking

Attract me to Buy: Advertisement Copywriting Generation with Multimodal Multi-structured Information

no code implementations 7 May 2022 Zhipeng Zhang, Xinglin Hou, Kai Niu, Zhongzhen Huang, Tiezheng Ge, Yuning Jiang, Qi Wu, Peng Wang

Therefore, we present a dataset, E-MMAD (e-commercial multimodal multi-structured advertisement copywriting), which requires and supports much more detailed information in text generation.

Text Generation Video Captioning

Learning Target-aware Representation for Visual Tracking via Informative Interactions

no code implementations 7 Jan 2022 Mingzhe Guo, Zhipeng Zhang, Heng Fan, Liping Jing, Yilin Lyu, Bing Li, Weiming Hu

The proposed GIM module and InBN mechanism are general and applicable to different backbone types including CNN and Transformer for improvements, as evidenced by our extensive experiments on multiple benchmarks.

Representation Learning Visual Tracking

VisEvent: Reliable Object Tracking via Collaboration of Frame and Event Flows

2 code implementations 11 Aug 2021 Xiao Wang, Jianing Li, Lin Zhu, Zhipeng Zhang, Zhe Chen, Xin Li, YaoWei Wang, Yonghong Tian, Feng Wu

Different from visible cameras which record intensity images frame by frame, the biologically inspired event camera produces a stream of asynchronous and sparse events with much lower latency.

Object Tracking

Learn to Match: Automatic Matching Network Design for Visual Tracking

1 code implementation ICCV 2021 Zhipeng Zhang, Yihao Liu, Xiao Wang, Bing Li, Weiming Hu

Siamese tracking has achieved groundbreaking performance in recent years, where the essence is the efficient matching operator cross-correlation and its variants.

Visual Tracking

Reality Transform Adversarial Generators for Image Splicing Forgery Detection and Localization

no code implementations ICCV 2021 Xiuli Bi, Zhipeng Zhang, Bin Xiao

For detecting the tampered regions, a forgery localization generator GM is proposed based on a multi-decoder-single-task strategy.

Style Transfer

Rethinking the competition between detection and ReID in Multi-Object Tracking

4 code implementations 23 Oct 2020 Chao Liang, Zhipeng Zhang, Xue Zhou, Bing Li, Shuyuan Zhu, Weiming Hu

However, the inherent differences and relations between detection and re-identification (ReID) are unconsciously overlooked because of treating them as two isolated tasks in the one-shot tracking paradigm.

Ranked #1 on Multi-Object Tracking on HiEve (using extra training data)

Multi-Object Tracking

Towards Accurate Pixel-wise Object Tracking by Attention Retrieval

1 code implementation 6 Aug 2020 Zhipeng Zhang, Bing Li, Weiming Hu, Houwen Peng

We first build a look-up-table (LUT) with the ground-truth mask in the starting frame, and then retrieve the LUT to obtain an attention map for spatial constraints.

Object Object Tracking +2
