Search Results for author: Gangyan Zeng

Found 7 papers, 3 papers with code

VidText: Towards Comprehensive Evaluation for Video Text Understanding

1 code implementation • 28 May 2025 • Zhoufaran Yang, Yan Shu, Zhifei Yang, Yan Zhang, Yu Li, Keyang Lu, Gangyan Zeng, Shaohui Liu, Yu Zhou, Nicu Sebe

We hope VidText will fill the current gap in video understanding benchmarks and serve as a foundation for future research on multimodal reasoning with video text in dynamic environments.

Multimodal Reasoning • Optical Character Recognition (OCR) • +1

Track the Answer: Extending TextVQA from Image to Video with Spatio-Temporal Clues

1 code implementation • 17 Dec 2024 • Yan Zhang, Gangyan Zeng, Huawen Shen, Daiqing Wu, Yu Zhou, Can Ma

Video text-based visual question answering (Video TextVQA) is a practical task that aims to answer questions by jointly reasoning over the textual and visual information in a given video.

Language Modelling • +4

TextBlockV2: Towards Precise-Detection-Free Scene Text Spotting with Pre-trained Language Model

no code implementations • 15 Mar 2024 • Jiahao Lyu, Jin Wei, Gangyan Zeng, Zeng Li, Enze Xie, Wei Wang, Yu Zhou

Benefiting from the language model fine-tuned on scene recognition benchmarks and from the text block detection paradigm, our scene text spotter demonstrates superior performance across multiple public benchmarks in extensive experiments.

Language Modelling • +4
