no code implementations • 25 Dec 2024 • Libo Zhang, Zhaoning Zhang, Baizhou Xu, Songzhu Mei, Dongsheng Li
Therefore, we propose Dovetail, an approach that deploys the draft model on the GPU to generate draft tokens while allowing the target model to perform parallel verification on the CPU, thereby improving the utilization of all available hardware resources and occupying less inter-device communication bandwidth.
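The draft-then-verify loop that this entry builds on can be sketched as follows. This is a generic greedy speculative-decoding step under stated assumptions, not Dovetail's implementation: `speculative_step`, `draft_next`, and `target_next` are hypothetical names, the two "models" are toy functions, and device placement (draft model on GPU, target model on CPU) is not modeled.

```python
from typing import Callable, List

Token = int
NextTokenFn = Callable[[List[Token]], Token]


def speculative_step(prefix: List[Token],
                     draft_next: NextTokenFn,
                     target_next: NextTokenFn,
                     k: int) -> List[Token]:
    """Run one greedy draft-then-verify step and return the accepted tokens."""
    # Draft phase: the cheap model proposes k tokens autoregressively.
    draft: List[Token] = []
    ctx = list(prefix)
    for _ in range(k):
        tok = draft_next(ctx)
        draft.append(tok)
        ctx.append(tok)

    # Verify phase: the target model checks every drafted position.
    # A real system would score all positions in one batched forward pass;
    # the loop here only keeps the toy example readable.
    accepted: List[Token] = []
    ctx = list(prefix)
    for tok in draft:
        expected = target_next(ctx)
        if expected != tok:
            # First mismatch: keep the target's token and stop.
            accepted.append(expected)
            return accepted
        accepted.append(tok)
        ctx.append(tok)

    # Every draft token matched, so the target contributes one extra token.
    accepted.append(target_next(ctx))
    return accepted


# Toy stand-ins (assumptions): the target counts upward modulo 10,
# and the draft agrees except right after the token 3.
def target_next(ctx: List[Token]) -> Token:
    return (ctx[-1] + 1) % 10


def draft_next(ctx: List[Token]) -> Token:
    return 0 if ctx[-1] == 3 else (ctx[-1] + 1) % 10
```

In this sketch the output always matches what the target model alone would produce greedily; the draft model only changes how many target calls are needed per accepted token.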
1 code implementation • 3 Dec 2024 • Yifan Jiao, Yunhao Li, Junhua Ding, Qing Yang, Song Fu, Heng Fan, Libo Zhang
In this paper, we present a novel benchmark, GSOT3D, that aims at facilitating development of generic 3D single object tracking (SOT) in the wild.
no code implementations • 6 Nov 2024 • Xin Gu, Ming Li, Libo Zhang, Fan Chen, Longyin Wen, Tiejian Luo, Sijie Zhu
1) We first design a quantitative metric system based on a best-in-class LVLM (Large Vision Language Model), i.e., GPT-4o in our case, to evaluate the generation quality from three perspectives: instruction following, detail preservation, and generation quality.
1 code implementation • 13 Sep 2024 • Yaojie Shen, Xinyao Wang, Yulei Niu, Ying Zhou, Lexin Tang, Libo Zhang, Fan Chen, Longyin Wen
Despite its success, our study shows that the length exploitation issue present in PO is even more severe in Iterative Preference Optimization (IPO) due to the iterative nature of the process.
1 code implementation • 10 Aug 2024 • Libo Zhang, Yuxuan Han, Wenbin Lin, Jingwang Ling, Feng Xu
We present PRTGaussian, a real-time relightable novel-view synthesis method made possible by combining 3D Gaussians and Precomputed Radiance Transfer (PRT).
1 code implementation • 15 Jun 2024 • Libo Zhang, Yue Ning
Specifically, we develop multiple prompt templates to frame the object prediction (OP) task as a standard question-answering (QA) task, suitable for instruction fine-tuning with an encoder-decoder LLM.
1 code implementation • 12 Jun 2024 • Yunhao Li, Xiaoqiong Liu, Luke Liu, Heng Fan, Libo Zhang
In this paper, we address this challenge by introducing Language-Guided MOT, a unified task framework, along with a corresponding large-scale benchmark, termed LaMOT, which encompasses diverse scenarios and language descriptions.
no code implementations • 24 Mar 2024 • Xin Gu, Libo Zhang, Fan Chen, Longyin Wen, YuFei Wang, Tiejian Luo, Sijie Zhu
Each video in our dataset is rendered by various image/video materials with a single editing component, which supports atomic visual understanding of different editing components.
1 code implementation • 8 Mar 2024 • Yunhao Li, Qin Li, Hao Wang, Xue Ma, Jiali Yao, Shaohua Dong, Heng Fan, Libo Zhang
Current multi-object tracking (MOT) aims to predict trajectories of targets (i.e., "where") in videos.
1 code implementation • 6 Mar 2024 • Liang Peng, Junyuan Gao, Xinran Liu, Weihong Li, Shaohua Dong, Zhipeng Zhang, Heng Fan, Libo Zhang
The rich annotations of VastTrack enable the development of both vision-only and vision-language tracking.
no code implementations • 18 Jan 2024 • Jinzhi Zheng, Libo Zhang, Yanjun Wu, Chen Zhao
Segmentation-based scene text detection algorithms can handle arbitrarily shaped scene text and have strong robustness and adaptability, so they have attracted wide attention.
no code implementations • 18 Jan 2024 • Jinzhi Zheng, Ruyi Ji, Libo Zhang, Yanjun Wu, Chen Zhao
However, the guidance of visual cues is ignored in the process of semantic mining, which limits the performance of the algorithm in recognizing irregular scene text.
no code implementations • 18 Jan 2024 • Jinzhi Zheng, Libo Zhang, Yanjun Wu, Chen Zhao
Arbitrary shape scene text detection is of great importance in scene understanding tasks.
1 code implementation • 16 Jan 2024 • Xin Ming, Jiawei Li, Jingwang Ling, Libo Zhang, Feng Xu
Experiments demonstrate that, with the flexible input of single or sparse multi-view videos, we reconstruct personalized high-fidelity blendshapes.
2 code implementations • CVPR 2024 • Xin Gu, Heng Fan, Yan Huang, Tiejian Luo, Libo Zhang
The key of CG-STVG lies in two specially designed modules, including instance context generation (ICG), which focuses on discovering visual context information (in both appearance and motion) of the instance, and instance context refinement (ICR), which aims to improve the instance context from ICG by eliminating irrelevant or even harmful information from the context.
Ranked #1 on Spatio-Temporal Video Grounding on HC-STVG1
no code implementations • CVPR 2024 • Jinzhi Zheng, Heng Fan, Libo Zhang
To address these issues, this paper proposes a simple and effective scene text detection method, Kernel Adaptive Convolution, which uses a Kernel Adaptive Convolution Module to detect scene text by predicting the distance map.
1 code implementation • 26 Nov 2023 • Bohai Gu, Yongsheng Yu, Heng Fan, Libo Zhang
Video inpainting has been challenged by complex scenarios like large movements and low-light conditions.
1 code implementation • 27 Sep 2023 • Libo Zhang, Xin Gu, CongCong Li, Tiejian Luo, Heng Fan
Specifically, we use lightweight ConvNets to extract features of the P-frames in the GOPs, and a spatial-channel attention module (SCAM) is designed to refine the feature representations of the P-frames based on the compressed information with bidirectional information flow.
1 code implementation • ICCV 2023 • Yaojie Shen, Xin Gu, Kai Xu, Heng Fan, Longyin Wen, Libo Zhang
Addressing this, we study video captioning from a different perspective, in the compressed domain, which brings multi-fold advantages over the existing pipeline: 1) Compared to raw images from the decoded video, the compressed video, consisting of I-frames, motion vectors, and residuals, is highly distinguishable, which allows us to leverage the entire video for learning without manual sampling through a specialized model design; 2) The captioning model is more efficient in inference, as smaller and less redundant information is processed.
Ranked #8 on Video Captioning on VATEX
no code implementations • 18 Sep 2023 • Hao Wang, Libo Zhang, Heng Fan, Tiejian Luo
Meanwhile, we propose a cross-granularity attention module to align the interactions modeled by the three transformer branches, so that the branches can support each other in exploiting the most discriminative semantic information of different granularities for accurate caption prediction.
1 code implementation • ICCV 2023 • Wenzhang Zhou, Heng Fan, Tiejian Luo, Libo Zhang
In this work, drawing inspiration from the concept of stability in control theory, namely that a robust system is required to remain consistent both externally and internally regardless of disturbances, we propose a novel framework that achieves unsupervised domain adaptive detection through stability analysis.
no code implementations • 15 Aug 2023 • Yunhao Li, Zhen Xiao, Lin Yang, Dan Meng, Xin Zhou, Heng Fan, Libo Zhang
To the best of our knowledge, AttMOT is the first MOT dataset with semantic attributes.
1 code implementation • 17 Jul 2023 • Yongsheng Yu, Heng Fan, Libo Zhang
Firstly, we pretrain an image inpainting model, DMT_img, to serve as a prior for distilling the video model DMT_vid, thereby benefiting the hallucination of deficiency cases.
Ranked #1 on Video Inpainting on DAVIS
no code implementations • 19 May 2023 • Yongsheng Yu, Hao Wang, Tiejian Luo, Heng Fan, Libo Zhang
In this paper, we propose a novel, simple yet effective method for Multi-modal Guided Image Completion, dubbed MaGIC, which not only supports a wide range of single modalities as guidance (e.g., text, canny edge, sketch, segmentation, depth, and pose), but also adapts to arbitrarily customized combinations of these modalities (i.e., arbitrary multi-modality) for image completion.
1 code implementation • ICCV 2023 • Bohai Gu, Heng Fan, Libo Zhang
Current arbitrary style transfer models are limited to either image or video domains.
no code implementations • CVPR 2023 • Xin Gu, Guang Chen, YuFei Wang, Libo Zhang, Tiejian Luo, Longyin Wen
Meanwhile, the internal stream is designed to exploit the multi-modality information in videos (e.g., the appearance of video frames, speech transcripts, and video captions) to ensure the quality of caption results.
Ranked #7 on Video Captioning on YouCook2
no code implementations • ICCV 2023 • Xinran Liu, Xiaoqiong Liu, Ziruo Yi, Xin Zhou, Thanh Le, Libo Zhang, Yan Huang, Qing Yang, Heng Fan
In addition, we further derive a variant named PlanarTrack$_{\mathbf{BB}}$ for generic object tracking from PlanarTrack.
no code implementations • 13 Mar 2023 • Lutao Jiang, Ruyi Ji, Libo Zhang
We apply SDF for a higher-quality representation of 3D objects in space and design a new SDF neural renderer, which has higher efficiency and higher accuracy.
no code implementations • 16 Feb 2023 • Libo Zhang, Yang Chen, Toru Takisaka, Bakh Khoussainov, Michael Witbrock, Jiamou Liu
In real-world multi-agent systems, in addition to being in an equilibrium, agents' policies are often expected to meet requirements with respect to safety and fairness.
1 code implementation • 1 Jan 2023 • Libo Zhang, Wenzhang Zhou, Heng Fan, Tiejian Luo, Haibin Ling
To reduce discrepancy in feature distributions between two domains, recent approaches achieve domain adaption through feature alignment in different granularities via adversarial learning.
3 code implementations • 19 Nov 2022 • Libo Zhang, Lutao Jiang, Ruyi Ji, Heng Fan
Automatic security inspection relying on computer vision technology is a challenging task in real-world scenarios due to many factors, such as intra-class variance, class imbalance, and occlusion.
no code implementations • 25 Aug 2022 • Yongsheng Yu, Libo Zhang, Heng Fan, Tiejian Luo
Addressing this problem, in this paper, we devise a novel GAN inversion model for image inpainting, dubbed InvertFill, mainly consisting of an encoder with a pre-modulation module and a GAN generator with F&W+ latent space.
1 code implementation • 25 Aug 2022 • Yongsheng Yu, Dawei Du, Libo Zhang, Tiejian Luo
Image inpainting is an ill-posed problem to recover missing or damaged image content based on incomplete images with masks.
1 code implementation • 27 Jul 2022 • Yaojie Shen, Libo Zhang, Kai Xu, Xiaojie Jin
First, we learn the embedding of video transitions through a video transition classification task.
1 code implementation • 7 Jul 2022 • Xin Gu, Hanhua Ye, Guang Chen, YuFei Wang, Libo Zhang, Longyin Wen
This paper describes our champion solution for the CVPR2022 Generic Event Boundary Captioning (GEBC) competition.
no code implementations • 7 Jun 2022 • CongCong Li, Xinyao Wang, Dexiang Hong, YuFei Wang, Libo Zhang, Tiejian Luo, Longyin Wen
To capture the temporal context information of each frame, we design the structure context transformer (SC-Transformer) by re-partitioning the input frame sequence.
no code implementations • 30 Apr 2022 • Libo Zhang, Junyuan Gao, Zhen Xiao, Heng Fan
Multi-animal tracking (MAT), a multi-object tracking (MOT) problem, is crucial for animal motion and behavior analysis and has many important applications in fields such as biology, ecology, and animal conservation.
1 code implementation • CVPR 2022 • Wenzhang Zhou, Dawei Du, Libo Zhang, Tiejian Luo, Yanjun Wu
Domain adaptive object detection is challenging due to the distinct data distributions of the source domain and the target domain.
no code implementations • CVPR 2022 • CongCong Li, Xinyao Wang, Longyin Wen, Dexiang Hong, Tiejian Luo, Libo Zhang
Generic event boundary detection aims to localize the generic, taxonomy-free event boundaries that segment videos into chunks.
no code implementations • 13 Feb 2022 • Yang Chen, Libo Zhang, Jiamou Liu, Shuyue Hu
This limitation invalidates existing IRL methods on MFGs with non-cooperative environments.
1 code implementation • CVPR 2022 • Dan Liu, Libo Zhang, Yanjun Wu
The dataset and experimental results presented in this paper are expected to boost the research of long-distance gesture recognition.
1 code implementation • ICCV 2021 • Renshuai Tao, Yanlu Wei, Xiangjian Jiang, Hainan Li, Haotong Qin, Jiakai Wang, Yuqing Ma, Libo Zhang, Xianglong Liu
In this work, we first present a High-quality X-ray (HiXray) security inspection image dataset, which contains 102,928 common prohibited items of 8 categories.
1 code implementation • ICCV 2021 • Boying Wang, Libo Zhang, Longyin Wen, Xianglong Liu, Yanjun Wu
Towards real-world prohibited item detection, we collect a large-scale dataset, named PIDray, which covers various cases in real-world scenarios for prohibited item detection, especially deliberately hidden items.
1 code implementation • 1 Jul 2021 • Dexiang Hong, CongCong Li, Longyin Wen, Xinyao Wang, Libo Zhang
In this work, we design a Cascaded Temporal Attention Network (CASTANET) for GEBD, which is formed by three parts: the backbone network, the temporal attention module, and the classification module.
Ranked #1 on Boundary Detection on Kinetics-400
no code implementations • 29 Apr 2021 • Yang Chen, Libo Zhang, Jiamou Liu, Michael Witbrock
However, existing IRL methods for MFGs are powerless to reason about uncertainties in demonstrated behaviours of individual agents.
2 code implementations • 18 Apr 2020 • Yanlu Wei, Renshuai Tao, Zhangjie Wu, Yuqing Ma, Libo Zhang, Xianglong Liu
Furthermore, to deal with occlusion in X-ray image detection, we propose the De-occlusion Attention Module (DOAM), a plug-and-play module that can be easily inserted into most popular detectors to improve their performance.
no code implementations • ECCV 2020 • Cong-Cong Li, Dawei Du, Libo Zhang, Longyin Wen, Tiejian Luo, Yanjun Wu, Pengfei Zhu
Specifically, we first build the spatial pyramid representation to capture context information of objects at different scales.
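One simple way to picture a spatial pyramid is repeated downsampling of a feature map, so that each level summarizes context at a coarser scale. The sketch below is only an illustration under stated assumptions: it uses 2x2 average pooling on a plain 2D list, and the helper names `avg_pool_2x2` and `spatial_pyramid` are hypothetical. The entry above does not say how the pyramid is actually constructed.

```python
from typing import List

Grid = List[List[float]]


def avg_pool_2x2(grid: Grid) -> Grid:
    """Halve both dimensions by averaging non-overlapping 2x2 blocks."""
    rows, cols = len(grid) // 2, len(grid[0]) // 2
    return [
        [
            (grid[2 * i][2 * j] + grid[2 * i][2 * j + 1]
             + grid[2 * i + 1][2 * j] + grid[2 * i + 1][2 * j + 1]) / 4.0
            for j in range(cols)
        ]
        for i in range(rows)
    ]


def spatial_pyramid(grid: Grid, levels: int) -> List[Grid]:
    """Return up to `levels` maps, from finest to coarsest."""
    pyramid = [grid]
    for _ in range(levels - 1):
        top = pyramid[-1]
        # Stop early once the map is too small to pool again.
        if len(top) < 2 or len(top[0]) < 2:
            break
        pyramid.append(avg_pool_2x2(top))
    return pyramid


feature_map = [
    [1.0, 2.0, 3.0, 4.0],
    [5.0, 6.0, 7.0, 8.0],
    [9.0, 10.0, 11.0, 12.0],
    [13.0, 14.0, 15.0, 16.0],
]
pyramid = spatial_pyramid(feature_map, levels=3)
```

Here `pyramid` holds a 4x4, a 2x2, and a 1x1 map; coarser levels cover a larger region of the input per cell.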
1 code implementation • 18 Mar 2020 • Yuan-Qiang Cai, Longyin Wen, Libo Zhang, Dawei Du, Weiqiang Wang
In this paper, we propose a new task, i.e., simultaneous object localization and counting, abbreviated as Locount, which requires algorithms to localize groups of objects of interest together with the number of instances.
no code implementations • 13 Jan 2020 • Dan Liu, Libo Zhang, Tiejian Luo, Lili Tao, Yanjun Wu
The lack of interpretability of existing CNN-based hand detection methods makes it difficult to understand the rationale behind their predictions.
no code implementations • ECCV 2020 • Ruyi Ji, Dawei Du, Libo Zhang, Longyin Wen, Yanjun Wu, Chen Zhao, Feiyue Huang, Siwei Lyu
In this paper, we design a novel semantic neural tree for human parsing, which uses a tree architecture to encode the physiological structure of the human body and follows a coarse-to-fine process in a cascade manner to generate accurate results.
no code implementations • 11 Dec 2019 • Wenzhang Zhou, Longyin Wen, Libo Zhang, Dawei Du, Tiejian Luo, Yanjun Wu
To reduce the impact of manually designed anchor boxes to adapt to different target motion patterns, we design the localization branch, which aims to coarsely localize the target to help the regression branch to generate accurate results.
2 code implementations • CVPR 2020 • Ruyi Ji, Longyin Wen, Libo Zhang, Dawei Du, Yanjun Wu, Chen Zhao, Xianglong Liu, Feiyue Huang
Specifically, we incorporate convolutional operations along edges of the tree structure, and use the routing functions in each node to determine the root-to-leaf computational paths within the tree.
Ranked #37 on Fine-Grained Image Classification on Stanford Cars
Fine-Grained Image Classification
Fine-Grained Visual Categorization
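The tree-structured routing described in the entry above can be illustrated with a small sketch. It assumes soft routing, where each internal node emits a probability of sending the sample to its right child and a leaf's weight is the product of the decisions along its root-to-leaf path. The function name `leaf_probabilities` and the level-order node layout are assumptions for illustration, and the convolutional operations along the edges are omitted.

```python
from typing import List


def leaf_probabilities(route_right: List[float], depth: int) -> List[float]:
    """Probability of reaching each leaf of a full binary tree.

    route_right[i] is the probability of going right at internal node i,
    with nodes numbered in level order (root is 0).
    """
    if len(route_right) != 2 ** depth - 1:
        raise ValueError("expected one routing value per internal node")

    reach = [1.0]  # probability of reaching each node on the current level
    node = 0
    for _ in range(depth):
        next_level: List[float] = []
        for p in reach:
            r = route_right[node]
            next_level.append(p * (1.0 - r))  # left child
            next_level.append(p * r)          # right child
            node += 1
        reach = next_level
    return reach


# Depth-2 tree: root, then two internal nodes, then four leaves.
leaves = leaf_probabilities([0.5, 0.2, 0.9], depth=2)
```

Because each node splits its incoming probability between two children, the leaf probabilities always sum to one, so they can weight per-leaf predictions when forming a final output.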
no code implementations • 25 Sep 2019 • Yuan-Qiang Cai, Dawei Du, Libo Zhang, Longyin Wen, Weiqiang Wang, Yanjun Wu, Siwei Lyu
Object detection and counting are related but challenging problems, especially for drone-based scenes with small objects and cluttered backgrounds.
no code implementations • 10 Sep 2019 • Chamin Hewa Koneputugodage, Rhys Healy, Sean Lamont, Ian Mallett, Matt Brown, Matt Walters, Ushini Attanayake, Libo Zhang, Roger T. Dean, Alexander Hunter, Charles Gretton, Christian Walder
We address the problem of combining sequence models of symbolic music with user-defined constraints.
no code implementations • 11 Jun 2019 • Dan Liu, Dawei Du, Libo Zhang, Tiejian Luo, Yanjun Wu, Feiyue Huang, Siwei Lyu
Existing hand detection methods usually follow a multi-stage pipeline with high computation cost, i.e., feature extraction, region proposal, bounding box regression, and additional layers for rotated region detection.
no code implementations • 10 Apr 2019 • Cong-Cong Li, Dawei Du, Libo Zhang, Tiejian Luo, Yanjun Wu, Qi Tian, Longyin Wen, Siwei Lyu
In this paper, we propose a new data priming method to solve the domain adaptation problem.