Search Results for author: Libo Zhang

Found 49 papers, 20 papers with code

Edit3K: Universal Representation Learning for Video Editing Components

no code implementations • 24 Mar 2024 • Xin Gu, Libo Zhang, Fan Chen, Longyin Wen, YuFei Wang, Tiejian Luo, Sijie Zhu

Each video in our dataset is rendered by various image/video materials with a single editing component, which supports atomic visual understanding of different editing components.

Representation Learning, Retrieval (+1 more)

Beyond MOT: Semantic Multi-Object Tracking

no code implementations • 8 Mar 2024 • Yunhao Li, Hao Wang, Xue Ma, Jiali Yao, Shaohua Dong, Heng Fan, Libo Zhang

Current multi-object tracking (MOT) aims to predict trajectories of targets (i.e., "where") in videos.

Multi-Object Tracking, Object (+1 more)

VastTrack: Vast Category Visual Object Tracking

1 code implementation • 6 Mar 2024 • Liang Peng, Junyuan Gao, Xinran Liu, Weihong Li, Shaohua Dong, Zhipeng Zhang, Heng Fan, Libo Zhang

The rich annotations of VastTrack enable the development of both vision-only and vision-language tracking.

Object, Visual Object Tracking (+1 more)

Text Region Multiple Information Perception Network for Scene Text Detection

no code implementations • 18 Jan 2024 • Jinzhi Zheng, Libo Zhang, Yanjun Wu, Chen Zhao

Segmentation-based scene text detection algorithms can handle arbitrarily shaped scene text and have strong robustness and adaptability, so they have attracted wide attention.

Scene Text Detection, Segmentation (+1 more)

CMFN: Cross-Modal Fusion Network for Irregular Scene Text Recognition

no code implementations • 18 Jan 2024 • Jinzhi Zheng, Ruyi Ji, Libo Zhang, Yanjun Wu, Chen Zhao

However, the guidance of visual cues is ignored in the process of semantic mining, which limits the performance of the algorithm in recognizing irregular scene text.

Position, Scene Text Recognition

High-Quality Mesh Blendshape Generation from Face Videos via Neural Inverse Rendering

no code implementations • 16 Jan 2024 • Xin Ming, Jiawei Li, Jingwang Ling, Libo Zhang, Feng Xu

Experiments demonstrate that, with the flexible input of single or sparse multi-view videos, we reconstruct personalized high-fidelity blendshapes.

Inverse Rendering

Context-Guided Spatio-Temporal Video Grounding

1 code implementation • 3 Jan 2024 • Xin Gu, Heng Fan, Yan Huang, Tiejian Luo, Libo Zhang

The key to CG-STVG lies in two specially designed modules: instance context generation (ICG), which focuses on discovering visual context information (in both appearance and motion) of the instance, and instance context refinement (ICR), which aims to improve the instance context from ICG by eliminating irrelevant or even harmful information from the context.

Object, Spatio-Temporal Video Grounding (+1 more)

Flow-Guided Diffusion for Video Inpainting

1 code implementation • 26 Nov 2023 • Bohai Gu, Yongsheng Yu, Heng Fan, Libo Zhang

Video inpainting has been challenged by complex scenarios like large movements and low-light conditions.

Denoising, Image Generation (+2 more)

Local Compressed Video Stream Learning for Generic Event Boundary Detection

1 code implementation • 27 Sep 2023 • Libo Zhang, Xin Gu, CongCong Li, Tiejian Luo, Heng Fan

Specifically, we use lightweight ConvNets to extract features of the P-frames in the GOPs, and a spatial-channel attention module (SCAM) is designed to refine the feature representations of the P-frames based on the compressed information with bidirectional information flow.

Boundary Detection, Generic Event Boundary Detection (+1 more)

Accurate and Fast Compressed Video Captioning

1 code implementation • ICCV 2023 • Yaojie Shen, Xin Gu, Kai Xu, Heng Fan, Longyin Wen, Libo Zhang

Addressing this, we study video captioning from a different perspective, in the compressed domain, which brings multi-fold advantages over the existing pipeline: 1) compared to raw images from the decoded video, the compressed video, consisting of I-frames, motion vectors, and residuals, is highly distinguishable, which allows us to leverage the entire video for learning without manual sampling through a specialized model design; 2) the captioning model is more efficient in inference, as smaller and less redundant information is processed.

Video Captioning

Collaborative Three-Stream Transformers for Video Captioning

no code implementations • 18 Sep 2023 • Hao Wang, Libo Zhang, Heng Fan, Tiejian Luo

Meanwhile, we propose a cross-granularity attention module to align the interactions modeled by the three branches of transformers, so that the three branches can support each other to exploit the most discriminative semantic information of different granularities for accurate caption prediction.

Sentence, Video Captioning

Unsupervised Domain Adaptive Detection with Network Stability Analysis

1 code implementation • ICCV 2023 • Wenzhang Zhou, Heng Fan, Tiejian Luo, Libo Zhang

In this work, drawing inspiration from the concept of stability in control theory, namely that a robust system must remain consistent both externally and internally regardless of disturbances, we propose a novel framework that achieves unsupervised domain adaptive detection through stability analysis.

Domain Adaptation

Deficiency-Aware Masked Transformer for Video Inpainting

1 code implementation • 17 Jul 2023 • Yongsheng Yu, Heng Fan, Libo Zhang

First, we pretrain an image inpainting model, DMT_img, to serve as a prior for distilling the video model DMT_vid, thereby benefiting the hallucination of deficiency cases.

Hallucination, Image Inpainting (+2 more)

MaGIC: Multi-modality Guided Image Completion

no code implementations • 19 May 2023 • Yongsheng Yu, Hao Wang, Tiejian Luo, Heng Fan, Libo Zhang

In this paper, we propose a novel, simple yet effective method for Multi-modal Guided Image Completion, dubbed MaGIC, which not only supports a wide range of single modalities as guidance (e.g., text, canny edge, sketch, segmentation, depth, and pose), but also adapts to arbitrarily customized combinations of these modalities (i.e., arbitrary multi-modality) for image completion.

Text with Knowledge Graph Augmented Transformer for Video Captioning

no code implementations • CVPR 2023 • Xin Gu, Guang Chen, YuFei Wang, Libo Zhang, Tiejian Luo, Longyin Wen

Meanwhile, the internal stream is designed to exploit the multi-modality information in videos (e.g., the appearance of video frames, speech transcripts, and video captions) to ensure the quality of caption results.

Video Captioning

SDF-3DGAN: A 3D Object Generative Method Based on Implicit Signed Distance Function

no code implementations • 13 Mar 2023 • Lutao Jiang, Ruyi Ji, Libo Zhang

We apply SDF for a higher-quality representation of 3D objects in space and design a new SDF neural renderer with higher efficiency and higher accuracy.

3D-Aware Image Synthesis, Object

Learning Density-Based Correlated Equilibria for Markov Games

no code implementations • 16 Feb 2023 • Libo Zhang, Yang Chen, Toru Takisaka, Bakh Khoussainov, Michael Witbrock, Jiamou Liu

In real-world multi-agent systems, in addition to being in an equilibrium, agents' policies are often expected to meet requirements with respect to safety and fairness.

Fairness

Robust Domain Adaptive Object Detection with Unified Multi-Granularity Alignment

no code implementations • 1 Jan 2023 • Libo Zhang, Wenzhang Zhou, Heng Fan, Tiejian Luo, Haibin Ling

To reduce the discrepancy in feature distributions between two domains, recent approaches achieve domain adaptation through feature alignment at different granularities via adversarial learning.

Domain Adaptation, object-detection (+1 more)

PIDray: A Large-scale X-ray Benchmark for Real-World Prohibited Item Detection

3 code implementations • 19 Nov 2022 • Libo Zhang, Lutao Jiang, Ruyi Ji, Heng Fan

Automatic security inspection relying on computer vision technology is a challenging task in real-world scenarios due to many factors, such as intra-class variance, class imbalance, and occlusion.

Binary Classification, Instance Segmentation (+4 more)

High-Fidelity Image Inpainting with GAN Inversion

no code implementations • 25 Aug 2022 • Yongsheng Yu, Libo Zhang, Heng Fan, Tiejian Luo

Addressing this problem, in this paper, we devise a novel GAN inversion model for image inpainting, dubbed InvertFill, mainly consisting of an encoder with a pre-modulation module and a GAN generator with F&W+ latent space.

Image Inpainting, Vocal Bursts Intensity Prediction

Unbiased Multi-Modality Guidance for Image Inpainting

1 code implementation • 25 Aug 2022 • Yongsheng Yu, Dawei Du, Libo Zhang, Tiejian Luo

Image inpainting is an ill-posed problem to recover missing or damaged image content based on incomplete images with masks.

Image Inpainting, Semantic Segmentation

AutoTransition: Learning to Recommend Video Transition Effects

1 code implementation • 27 Jul 2022 • Yaojie Shen, Libo Zhang, Kai Xu, Xiaojie Jin

First, we learn the embedding of video transitions through a video transition classification task.

Retrieval, Video Editing

Dual-Stream Transformer for Generic Event Boundary Captioning

1 code implementation • 7 Jul 2022 • Xin Gu, Hanhua Ye, Guang Chen, YuFei Wang, Libo Zhang, Longyin Wen

This paper describes our champion solution for the CVPR2022 Generic Event Boundary Captioning (GEBC) competition.

Boundary Captioning

Structured Context Transformer for Generic Event Boundary Detection

no code implementations • 7 Jun 2022 • CongCong Li, Xinyao Wang, Dexiang Hong, YuFei Wang, Libo Zhang, Tiejian Luo, Longyin Wen

To capture the temporal context information of each frame, we design the structured context transformer (SC-Transformer) by re-partitioning the input frame sequence.

Boundary Detection, Generic Event Boundary Detection

AnimalTrack: A Benchmark for Multi-Animal Tracking in the Wild

no code implementations • 30 Apr 2022 • Libo Zhang, Junyuan Gao, Zhen Xiao, Heng Fan

Multi-animal tracking (MAT), a multi-object tracking (MOT) problem, is crucial for animal motion and behavior analysis and has many important applications in fields such as biology, ecology, and animal conservation.

Multi-Object Tracking

Multi-Granularity Alignment Domain Adaptation for Object Detection

1 code implementation • CVPR 2022 • Wenzhang Zhou, Dawei Du, Libo Zhang, Tiejian Luo, Yanjun Wu

Domain adaptive object detection is challenging due to the distinctive data distributions of the source domain and the target domain.

Domain Adaptation, Object (+2 more)

LD-ConGR: A Large RGB-D Video Dataset for Long-Distance Continuous Gesture Recognition

1 code implementation • CVPR 2022 • Dan Liu, Libo Zhang, Yanjun Wu

The dataset and experimental results presented in this paper are expected to boost the research of long-distance gesture recognition.

Gesture Recognition, Sign Language Recognition

Towards Real-world X-ray Security Inspection: A High-Quality Benchmark and Lateral Inhibition Module for Prohibited Items Detection

1 code implementation • ICCV 2021 • Renshuai Tao, Yanlu Wei, Xiangjian Jiang, Hainan Li, Haotong Qin, Jiakai Wang, Yuqing Ma, Libo Zhang, Xianglong Liu

In this work, we first present a High-quality X-ray (HiXray) security inspection image dataset, which contains 102,928 common prohibited items of 8 categories.

Towards Real-World Prohibited Item Detection: A Large-Scale X-ray Benchmark

1 code implementation • ICCV 2021 • Boying Wang, Libo Zhang, Longyin Wen, Xianglong Liu, Yanjun Wu

Towards real-world prohibited item detection, we collect a large-scale dataset, named PIDray, which covers various cases in real-world scenarios for prohibited item detection, especially deliberately hidden items.

Generic Event Boundary Detection Challenge at CVPR 2021 Technical Report: Cascaded Temporal Attention Network (CASTANET)

1 code implementation • 1 Jul 2021 • Dexiang Hong, CongCong Li, Longyin Wen, Xinyao Wang, Libo Zhang

In this work, we design a Cascaded Temporal Attention Network (CASTANET) for GEBD, which consists of three parts: the backbone network, the temporal attention module, and the classification module.

Boundary Detection Generic Event Boundary Detection

Adversarial Inverse Reinforcement Learning for Mean Field Games

no code implementations • 29 Apr 2021 • Yang Chen, Libo Zhang, Jiamou Liu, Michael Witbrock

However, existing IRL methods for MFGs are powerless to reason about uncertainties in demonstrated behaviours of individual agents.

reinforcement-learning, Reinforcement Learning (RL)

Occluded Prohibited Items Detection: an X-ray Security Inspection Benchmark and De-occlusion Attention Module

2 code implementations • 18 Apr 2020 • Yanlu Wei, Renshuai Tao, Zhangjie Wu, Yuqing Ma, Libo Zhang, Xianglong Liu

Furthermore, to deal with occlusion in X-ray image detection, we propose the De-occlusion Attention Module (DOAM), a plug-and-play module that can be easily inserted into most popular detectors to improve their performance.

Autonomous Driving, object-detection (+2 more)

Rethinking Object Detection in Retail Stores

1 code implementation • 18 Mar 2020 • Yuan-Qiang Cai, Longyin Wen, Libo Zhang, Dawei Du, Weiqiang Wang

In this paper, we propose a new task, i.e., simultaneous object localization and counting, abbreviated as Locount, which requires algorithms to localize groups of objects of interest along with the number of instances.

Object, object-detection (+2 more)

Towards Interpretable and Robust Hand Detection via Pixel-wise Prediction

no code implementations • 13 Jan 2020 • Dan Liu, Libo Zhang, Tiejian Luo, Lili Tao, Yanjun Wu

The lack of interpretability of existing CNN-based hand detection methods makes it difficult to understand the rationale behind their predictions.

Hand Detection

Learning Semantic Neural Tree for Human Parsing

no code implementations • ECCV 2020 • Ruyi Ji, Dawei Du, Libo Zhang, Longyin Wen, Yanjun Wu, Chen Zhao, Feiyue Huang, Siwei Lyu

In this paper, we design a novel semantic neural tree for human parsing, which uses a tree architecture to encode the physiological structure of the human body and adopts a coarse-to-fine process in a cascade manner to generate accurate results.

Human Parsing, Semantic Segmentation

SiamMan: Siamese Motion-aware Network for Visual Tracking

no code implementations • 11 Dec 2019 • Wenzhang Zhou, Longyin Wen, Libo Zhang, Dawei Du, Tiejian Luo, Yanjun Wu

To reduce the impact of manually designed anchor boxes when adapting to different target motion patterns, we design the localization branch, which coarsely localizes the target to help the regression branch generate accurate results.

General Classification, regression (+1 more)

Attention Convolutional Binary Neural Tree for Fine-Grained Visual Categorization

2 code implementations • CVPR 2020 • Ruyi Ji, Longyin Wen, Libo Zhang, Dawei Du, Yanjun Wu, Chen Zhao, Xianglong Liu, Feiyue Huang

Specifically, we incorporate convolutional operations along edges of the tree structure, and use the routing functions in each node to determine the root-to-leaf computational paths within the tree.

Fine-Grained Image Classification, Fine-Grained Visual Categorization

Guided Attention Network for Object Detection and Counting on Drones

no code implementations • 25 Sep 2019 • Yuan-Qiang Cai, Dawei Du, Libo Zhang, Longyin Wen, Weiqiang Wang, Yanjun Wu, Siwei Lyu

Object detection and counting are related but challenging problems, especially for drone-based scenes with small objects and cluttered backgrounds.

Object, object-detection (+1 more)

Scale Invariant Fully Convolutional Network: Detecting Hands Efficiently

no code implementations • 11 Jun 2019 • Dan Liu, Dawei Du, Libo Zhang, Tiejian Luo, Yanjun Wu, Feiyue Huang, Siwei Lyu

Existing hand detection methods usually follow a multi-stage pipeline with high computational cost, i.e., feature extraction, region proposal, bounding box regression, and additional layers for rotated region detection.

Hand Detection, Region Proposal
