Search Results for author: Taian Guo

Found 6 papers, 3 papers with code

Unified and Dynamic Graph for Temporal Character Grouping in Long Videos

no code implementations • 27 Aug 2023 • Xiujun Shu, Wei Wen, Liangsheng Xu, Mingbao Lin, Ruizhi Qiao, Taian Guo, Hanjun Li, Bei Gan, Xiao Wang, Xing Sun

In this paper, we present a unified and dynamic graph (UniDG) framework for temporal character grouping.

Clustering Graph Clustering

Paper
Add Code

D3G: Exploring Gaussian Prior for Temporal Sentence Grounding with Glance Annotation

1 code implementation • ICCV 2023 • Hanjun Li, Xiujun Shu, Sunan He, Ruizhi Qiao, Wei Wen, Taian Guo, Bei Gan, Xing Sun

Under this setup, we propose a Dynamic Gaussian prior based Grounding framework with Glance annotation (D3G), which consists of a Semantic Alignment Group Contrastive Learning module (SA-GCL) and a Dynamic Gaussian prior Adjustment module (DGA).

Ranked #10 on Temporal Sentence Grounding on Charades-STA

Contrastive Learning Sentence +1

Paper
Code

VLMAE: Vision-Language Masked Autoencoder

no code implementations • 19 Aug 2022 • Sunan He, Taian Guo, Tao Dai, Ruizhi Qiao, Chen Wu, Xiujun Shu, Bo Ren

Image and language modeling is of crucial importance for vision-language pre-training (VLP), which aims to learn multi-modal representations from large-scale paired image-text data.

Language Modelling Question Answering +4

Paper
Add Code

Exploiting Feature Diversity for Make-up Temporal Video Grounding

no code implementations • 12 Aug 2022 • Xiujun Shu, Wei Wen, Taian Guo, Sunan He, Chen Wu, Ruizhi Qiao

This technical report presents the 3rd winning solution for MTVG, a new task introduced in the 4-th Person in Context (PIC) Challenge at ACM MM 2022.

Video Grounding

Paper
Add Code

Open-Vocabulary Multi-Label Classification via Multi-Modal Knowledge Transfer

1 code implementation • 5 Jul 2022 • Sunan He, Taian Guo, Tao Dai, Ruizhi Qiao, Bo Ren, Shu-Tao Xia

Specifically, our method exploits multi-modal knowledge of image-text pairs based on a vision and language pre-training (VLP) model.

Ranked #1 on Multi-label zero-shot learning on Open Images V4

Image-text matching Knowledge Distillation +7

111

Paper
Code

MuCAN: Multi-Correspondence Aggregation Network for Video Super-Resolution

1 code implementation • ECCV 2020 • Wenbo Li, Xin Tao, Taian Guo, Lu Qi, Jiangbo Lu, Jiaya Jia

Motivated by these findings, we propose a temporal multi-correspondence aggregation strategy to leverage similar patches across frames, and a cross-scale nonlocal-correspondence aggregation scheme to explore self-similarity of images across scales.

Optical Flow Estimation Video Super-Resolution

223

Paper
Code

Cannot find the paper you are looking for? You can Submit a new open access paper.