Search Results for author: Weibo Gu

Found 4 papers, 2 papers with code

Beyond Intermediate States: Explaining Visual Redundancy through Language

1 code implementation • 26 Mar 2025 • Dingchen Yang, Bowen Cao, Anran Zhang, Weibo Gu, Winston Hu, Guang Chen

Multi-modal Large Language Models (MLLMs) often process thousands of visual tokens, which consume a significant portion of the context window and impose a substantial computational burden.

Video-Language Alignment via Spatio-Temporal Graph Transformer

1 code implementation • 16 Jul 2024 • Shi-Xue Zhang, Hongfa Wang, Xiaobin Zhu, Weibo Gu, Tianjin Zhang, Chun Yang, Wei Liu, Xu-Cheng Yin

In this paper, we propose a novel Spatio-Temporal Graph Transformer module to uniformly learn spatial and temporal contexts for video-language alignment pre-training (dubbed STGT).

Contrastive Learning, Question Answering, +3

Adaptive Perception Transformer for Temporal Action Localization

no code implementations • 25 Aug 2022 • Yizheng Ouyang, Tianjin Zhang, Weibo Gu, Hongfa Wang

Besides, their multi-stage designs cannot generate action boundaries and categories straightforwardly.

Temporal Action Localization

Boosting Multi-Modal E-commerce Attribute Value Extraction via Unified Learning Scheme and Dynamic Range Minimization

no code implementations • 15 Jul 2022 • Mengyin Liu, Chao Zhu, Hongyu Gao, Weibo Gu, Hongfa Wang, Wei Liu, Xu-Cheng Yin

2) Secondly, a text-guided information range minimization method is proposed to adaptively encode descriptive parts of each modality into an identical space with a powerful pretrained linguistic model.

Attribute, Attribute Value Extraction, +2
