no code implementations • 6 Nov 2024 • Xin Gu, Ming Li, Libo Zhang, Fan Chen, Longyin Wen, Tiejian Luo, Sijie Zhu
1) We first design a quantitative metric system based on a best-in-class LVLM (Large Vision Language Model), i.e., GPT-4o in our case, to evaluate generation quality from three perspectives: instruction following, detail preserving, and generation quality.
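Once the judge has scored each generated sample on the three perspectives, the scores still have to be combined into a report. The sketch below shows one way to do that. It is not taken from the paper. The function name `aggregate_scores`, the 0-10 score scale, and the unweighted averaging are all hypothetical. No call to GPT-4o is made, and the per-sample scores are assumed to have been collected already.

```python
from statistics import mean

# The three perspectives named in the excerpt; the 0-10 scale is an assumption.
PERSPECTIVES = ("instruction_following", "detail_preserving", "generation_quality")


def aggregate_scores(samples: list[dict[str, float]]) -> dict[str, float]:
    """Average hypothetical per-sample judge scores for each perspective."""
    report: dict[str, float] = {}
    for p in PERSPECTIVES:
        values = [s[p] for s in samples]
        # Reject scores outside the assumed 0-10 range.
        if any(not 0.0 <= v <= 10.0 for v in values):
            raise ValueError(f"{p} score outside the assumed 0-10 range")
        report[p] = mean(values)
    # Unweighted mean across perspectives (an assumption; no weighting is stated).
    report["overall"] = mean(report[p] for p in PERSPECTIVES)
    return report
```

The score for each perspective is kept in the report alongside the overall number. This way, a regression in one perspective (for example, detail preserving) is not hidden by gains in another.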
no code implementations • 24 Mar 2024 • Xin Gu, Libo Zhang, Fan Chen, Longyin Wen, YuFei Wang, Tiejian Luo, Sijie Zhu
Each video in our dataset is rendered from various image/video materials with a single editing component, which supports atomic visual understanding of different editing components.
2 code implementations • CVPR 2024 • Xin Gu, Heng Fan, Yan Huang, Tiejian Luo, Libo Zhang
The key to CG-STVG lies in two specially designed modules: instance context generation (ICG), which focuses on discovering visual context information (in both appearance and motion) of the instance, and instance context refinement (ICR), which aims to improve the instance context from ICG by eliminating irrelevant or even harmful information from it.
Ranked #1 on Spatio-Temporal Video Grounding on HC-STVG1
1 code implementation • 27 Sep 2023 • Libo Zhang, Xin Gu, CongCong Li, Tiejian Luo, Heng Fan
Specifically, we use lightweight ConvNets to extract features of the P-frames in the GOPs, and a spatial-channel attention module (SCAM) is designed to refine the feature representations of the P-frames based on the compressed information with bidirectional information flow.
no code implementations • 18 Sep 2023 • Hao Wang, Libo Zhang, Heng Fan, Tiejian Luo
Meanwhile, we propose a cross-granularity attention module to align the interactions modeled by the three transformer branches, so that the branches can support each other in exploiting the most discriminative semantic information of different granularities for accurate caption prediction.
1 code implementation • ICCV 2023 • Wenzhang Zhou, Heng Fan, Tiejian Luo, Libo Zhang
In this work, drawing inspiration from the concept of stability in control theory, namely that a robust system must remain consistent both externally and internally regardless of disturbances, we propose a novel framework that achieves unsupervised domain adaptive detection through stability analysis.
no code implementations • 19 May 2023 • Yongsheng Yu, Hao Wang, Tiejian Luo, Heng Fan, Libo Zhang
In this paper, we propose a novel, simple yet effective method for Multi-modal Guided Image Completion, dubbed MaGIC, which not only supports a wide range of single modalities as guidance (e.g., text, canny edge, sketch, segmentation, depth, and pose), but also adapts to arbitrary customized combinations of these modalities (i.e., arbitrary multi-modality) for image completion.
no code implementations • CVPR 2023 • Xin Gu, Guang Chen, YuFei Wang, Libo Zhang, Tiejian Luo, Longyin Wen
Meanwhile, the internal stream is designed to exploit the multi-modality information in videos (e.g., the appearance of video frames, speech transcripts, and video captions) to ensure the quality of caption results.
Ranked #7 on Video Captioning on YouCook2
1 code implementation • 1 Jan 2023 • Libo Zhang, Wenzhang Zhou, Heng Fan, Tiejian Luo, Haibin Ling
To reduce the discrepancy in feature distributions between the two domains, recent approaches achieve domain adaptation through feature alignment at different granularities via adversarial learning.
no code implementations • 25 Aug 2022 • Yongsheng Yu, Libo Zhang, Heng Fan, Tiejian Luo
To address this problem, we devise a novel GAN inversion model for image inpainting, dubbed InvertFill, which mainly consists of an encoder with a pre-modulation module and a GAN generator with an F&W+ latent space.
1 code implementation • 25 Aug 2022 • Yongsheng Yu, Dawei Du, Libo Zhang, Tiejian Luo
Image inpainting is the ill-posed problem of recovering missing or damaged image content from incomplete images with masks.
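The role of the mask can be shown with a small example. Inpainting pipelines commonly fill only the masked region and keep the known pixels as they are, by compositing the network's prediction with the observed image. The sketch below assumes a convention in which mask value 1 marks a missing pixel. It uses plain nested lists in place of tensors, and the name `composite` is hypothetical. It illustrates the general masking idea only. It is not the method of the cited paper.

```python
Image = list[list[float]]
Mask = list[list[int]]


def composite(observed: Image, predicted: Image, mask: Mask) -> Image:
    """Keep known pixels from `observed`; fill holes (mask == 1) from `predicted`."""
    if not (len(observed) == len(predicted) == len(mask)):
        raise ValueError("height mismatch")
    out: Image = []
    for obs_row, pred_row, m_row in zip(observed, predicted, mask):
        if not (len(obs_row) == len(pred_row) == len(m_row)):
            raise ValueError("width mismatch")
        # Equivalent to m * pred + (1 - m) * obs for a binary mask.
        out.append([p if m else o for o, p, m in zip(obs_row, pred_row, m_row)])
    return out
```

This composition explains why the problem is ill-posed only inside the hole. Outside the mask, the answer is already given by the observed image.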
no code implementations • 7 Jun 2022 • CongCong Li, Xinyao Wang, Dexiang Hong, YuFei Wang, Libo Zhang, Tiejian Luo, Longyin Wen
To capture the temporal context of each frame, we design the structure context transformer (SC-Transformer) by re-partitioning the input frame sequence.
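One simple way to give each frame its own temporal context is to re-partition the sequence into a local window centred on every frame. The sketch below does this. The excerpt does not specify how SC-Transformer actually re-partitions the sequence. The fixed `radius` and the edge replication at sequence boundaries are therefore assumptions, and the helper `context_windows` is hypothetical.

```python
from typing import TypeVar

T = TypeVar("T")


def context_windows(frames: list[T], radius: int) -> list[list[T]]:
    """For each index t, return frames t-radius .. t+radius, replicating edge frames."""
    if radius < 0:
        raise ValueError("radius must be non-negative")
    n = len(frames)
    windows: list[list[T]] = []
    for t in range(n):
        # Clamp neighbour indices into [0, n-1] so every window has 2*radius+1 frames.
        idx = [min(max(t + d, 0), n - 1) for d in range(-radius, radius + 1)]
        windows.append([frames[i] for i in idx])
    return windows
```

Each window could then be fed to an encoder as the context for its centre frame. A boundary detector would compare the frames before and after the centre.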
1 code implementation • CVPR 2022 • Wenzhang Zhou, Dawei Du, Libo Zhang, Tiejian Luo, Yanjun Wu
Domain adaptive object detection is challenging due to the distinct data distributions of the source and target domains.
no code implementations • CVPR 2022 • CongCong Li, Xinyao Wang, Longyin Wen, Dexiang Hong, Tiejian Luo, Libo Zhang
Generic event boundary detection aims to localize the generic, taxonomy-free event boundaries that segment videos into chunks.
no code implementations • 3 Sep 2020 • Zhaoqing Peng, Junqi Jin, Lan Luo, Yaodong Yang, Rui Luo, Jun Wang, Wei-Nan Zhang, Haiyang Xu, Miao Xu, Chuan Yu, Tiejian Luo, Han Li, Jian Xu, Kun Gai
To drive purchases in online advertising, it is in the advertiser's great interest to optimize a sequential advertising strategy whose performance and interpretability are both important.
no code implementations • ECCV 2020 • Cong-Cong Li, Dawei Du, Libo Zhang, Longyin Wen, Tiejian Luo, Yanjun Wu, Pengfei Zhu
Specifically, we first build the spatial pyramid representation to capture context information of objects at different scales.
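The basic idea of a multi-scale (pyramid) representation can be shown with spatial pyramid pooling. The sketch average-pools a 2-D feature map over 1×1, 2×2, and 4×4 grids and concatenates the results, so that coarse bins summarise large regions and fine bins summarise small ones. This is the classic spatial-pyramid-pooling construction, swapped in for illustration. It is not necessarily how the cited paper builds its pyramid. The pyramid levels and the function name `spatial_pyramid` are assumptions.

```python
def spatial_pyramid(feature: list[list[float]], levels: tuple[int, ...] = (1, 2, 4)) -> list[float]:
    """Average-pool a 2-D map over an n x n grid for each level n; concatenate the bins."""
    h = len(feature)
    w = len(feature[0]) if h else 0
    if h < max(levels) or w < max(levels):
        raise ValueError("feature map smaller than the finest pyramid level")
    pooled: list[float] = []
    for n in levels:
        for i in range(n):
            # Integer bin edges; each bin is non-empty because h >= n and w >= n.
            r0, r1 = i * h // n, (i + 1) * h // n
            for j in range(n):
                c0, c1 = j * w // n, (j + 1) * w // n
                cells = [feature[r][c] for r in range(r0, r1) for c in range(c0, c1)]
                pooled.append(sum(cells) / len(cells))
    return pooled
```

The output length is fixed at 1 + 4 + 16 = 21 values for the default levels, whatever the input size. This fixed length is what lets one downstream layer consume context gathered at several scales.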
no code implementations • 13 Jan 2020 • Dan Liu, Libo Zhang, Tiejian Luo, Lili Tao, Yanjun Wu
The lack of interpretability of existing CNN-based hand detection methods makes it difficult to understand the rationale behind their predictions.
no code implementations • 11 Dec 2019 • Wenzhang Zhou, Longyin Wen, Libo Zhang, Dawei Du, Tiejian Luo, Yanjun Wu
To reduce the impact of manually designed anchor boxes and adapt to different target motion patterns, we design a localization branch that coarsely localizes the target to help the regression branch generate accurate results.
no code implementations • 11 Jun 2019 • Dan Liu, Dawei Du, Libo Zhang, Tiejian Luo, Yanjun Wu, Feiyue Huang, Siwei Lyu
Existing hand detection methods usually follow a multi-stage pipeline with high computational cost, i.e., feature extraction, region proposal, bounding box regression, and additional layers for rotated region detection.
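Rotated region detection needs a box parameterization that goes beyond axis-aligned boxes. A common choice, assumed here rather than taken from the excerpt, is the centre, size, and angle (cx, cy, w, h, θ). The sketch below converts that parameterization into the four corner points. This conversion is needed for drawing boxes or computing polygon overlaps. The function name `rotated_box_corners` is hypothetical.

```python
import math


def rotated_box_corners(cx: float, cy: float, w: float, h: float,
                        theta: float) -> list[tuple[float, float]]:
    """Corners of a w x h box centred at (cx, cy), rotated by theta radians.

    Rotation is counter-clockwise in a y-up frame (it appears clockwise in
    image coordinates, where y points down).
    """
    cos_t, sin_t = math.cos(theta), math.sin(theta)
    corners = []
    for dx, dy in ((-w / 2, -h / 2), (w / 2, -h / 2), (w / 2, h / 2), (-w / 2, h / 2)):
        # Rotate the corner offset, then translate to the box centre.
        corners.append((cx + dx * cos_t - dy * sin_t, cy + dx * sin_t + dy * cos_t))
    return corners
```

With θ = 0, the function reduces to an ordinary axis-aligned box. The angle is the one extra quantity that the additional rotated-region layers have to predict.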
no code implementations • 10 Apr 2019 • Cong-Cong Li, Dawei Du, Libo Zhang, Tiejian Luo, Yanjun Wu, Qi Tian, Longyin Wen, Siwei Lyu
In this paper, we propose a new data priming method to solve the domain adaptation problem.