Search Results for author: Sibo Song

Found 9 papers, 4 papers with code

OmniParser: A Unified Framework for Text Spotting, Key Information Extraction and Table Recognition

1 code implementation · 28 Mar 2024 · Jianqiang Wan, Sibo Song, Wenwen Yu, Yuliang Liu, Wenqing Cheng, Fei Huang, Xiang Bai, Cong Yao, Zhibo Yang

Recently, visually-situated text parsing (VsTP) has experienced notable advancements, driven by the increasing demand for automated document understanding and the emergence of Generative Large Language Models (LLMs) capable of processing document-based questions.

Tasks: Decoder, Document Understanding (+4 more)

OmniParser: A Unified Framework for Text Spotting, Key Information Extraction and Table Recognition

1 code implementation · CVPR 2024 · Jianqiang Wan, Sibo Song, Wenwen Yu, Yuliang Liu, Wenqing Cheng, Fei Huang, Xiang Bai, Cong Yao, Zhibo Yang

Recently, visually-situated text parsing (VsTP) has experienced notable advancements, driven by the increasing demand for automated document understanding and the emergence of Generative Large Language Models (LLMs) capable of processing document-based questions.

Tasks: Decoder, Document Understanding (+4 more)

Modeling Entities as Semantic Points for Visual Information Extraction in the Wild

No code implementations · CVPR 2023 · Zhibo Yang, Rujiao Long, Pengfei Wang, Sibo Song, Humen Zhong, Wenqing Cheng, Xiang Bai, Cong Yao

As the first contribution of this work, we curate and release a new dataset for visual information extraction (VIE), in which the document images are much more challenging because they are taken from real applications, where difficulties such as blur, partial occlusion, and printing shift are quite common.

Tasks: Text Spotting

Vision-Language Pre-Training for Boosting Scene Text Detectors

2 code implementations · CVPR 2022 · Sibo Song, Jianqiang Wan, Zhibo Yang, Jun Tang, Wenqing Cheng, Xiang Bai, Cong Yao

In this paper, we specifically adapt vision-language joint learning for scene text detection, a task that intrinsically involves cross-modal interaction between vision and language, since text is the written form of language.

Tasks: Contrastive Learning, Language Modelling (+4 more)
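Vision-language joint learning of this kind is often driven by a contrastive objective that pulls matched image and text embeddings together and pushes mismatched pairs apart. The sketch below is a generic CLIP-style symmetric InfoNCE loss in plain Python, offered as an illustration under assumptions: it is not claimed to be the exact pre-training loss of the paper above, the function names are hypothetical, and the temperature of 0.07 is a common default rather than a value taken from the paper.

```python
import math
from typing import List

Vector = List[float]


def l2_normalize(v: Vector) -> Vector:
    """Scale a vector to unit length so dot products become cosine similarities."""
    norm = math.sqrt(sum(x * x for x in v)) or 1.0  # guard against zero vectors
    return [x / norm for x in v]


def symmetric_info_nce(image_emb: List[Vector], text_emb: List[Vector],
                       temperature: float = 0.07) -> float:
    """Generic CLIP-style symmetric InfoNCE loss over a batch of matched pairs.

    ASSUMPTION: image_emb[i] and text_emb[i] describe the same text instance;
    every other pairing in the batch is treated as a negative.
    """
    imgs = [l2_normalize(v) for v in image_emb]
    txts = [l2_normalize(v) for v in text_emb]
    n = len(imgs)
    # logits[i][j] = cosine similarity of image i and text j, divided by temperature
    logits = [[sum(a * b for a, b in zip(imgs[i], txts[j])) / temperature
               for j in range(n)] for i in range(n)]

    def cross_entropy_rows(matrix: List[Vector]) -> float:
        """Mean of -log softmax(row)[i], where entry i is the matching pair."""
        total = 0.0
        for i, row in enumerate(matrix):
            m = max(row)  # subtract the max for numerical stability
            log_sum_exp = m + math.log(sum(math.exp(x - m) for x in row))
            total += log_sum_exp - row[i]
        return total / len(matrix)

    transposed = [list(col) for col in zip(*logits)]
    # Average the image-to-text and text-to-image directions.
    return 0.5 * (cross_entropy_rows(logits) + cross_entropy_rows(transposed))
```

With matched pairs on the diagonal the loss is close to zero; pairing each image with the wrong text makes it large. Averaging both directions is the usual design choice because neither modality is privileged as the "query".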

Deep Adaptive Temporal Pooling for Activity Recognition

No code implementations · 22 Aug 2018 · Sibo Song, Ngai-Man Cheung, Vijay Chandrasekhar, Bappaditya Mandal

Specifically, using frame-level features, DATP (Deep Adaptive Temporal Pooling) regresses the importance of different temporal segments and generates weights for them.

Tasks: Human Activity Recognition
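The DATP weighting step described above (regress an importance score per temporal segment, turn scores into weights, pool) can be sketched as attention-style pooling. This is a minimal illustration under stated assumptions: the linear regressor and its parameters (`regressor_w`, `regressor_b`) are hypothetical stand-ins for whatever learned network the paper uses, each segment feature is assumed to be precomputed from frame-level features (for example, by averaging them), and softmax normalization of the scores is an assumption rather than a detail confirmed by the abstract.

```python
import math
from typing import List, Sequence


def softmax(scores: Sequence[float]) -> List[float]:
    """Numerically stable softmax: weights are positive and sum to 1."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def adaptive_temporal_pool(segment_features: List[List[float]],
                           regressor_w: List[float],
                           regressor_b: float = 0.0) -> List[float]:
    """Pool per-segment features into one video-level feature.

    HYPOTHETICAL: a fixed linear regressor scores each segment; in a real
    model the scoring network would be learned end to end.
    """
    # 1) Regress an importance score for each temporal segment.
    scores = [sum(w * x for w, x in zip(regressor_w, feat)) + regressor_b
              for feat in segment_features]
    # 2) Normalize the scores into segment weights.
    weights = softmax(scores)
    # 3) Weighted average of the segment features, dimension by dimension.
    dim = len(segment_features[0])
    return [sum(weights[t] * segment_features[t][d]
                for t in range(len(segment_features)))
            for d in range(dim)]
```

With all-zero regressor weights every segment receives the same weight and the result is the plain mean (ordinary average pooling); a regressor that strongly favors one segment makes the pooled feature collapse onto that segment.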

Defense Against Adversarial Attacks with Saak Transform

No code implementations · 6 Aug 2018 · Sibo Song, Yueru Chen, Ngai-Man Cheung, C.-C. Jay Kuo

We propose a Saak transform based preprocessing method with three steps: 1) transforming an input image to a joint spatial-spectral representation via the forward Saak transform, 2) applying filtering to its high-frequency components, and 3) reconstructing the image via the inverse Saak transform.

Tasks: Adversarial Defense
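The three preprocessing steps in the Saak transform entry above can be sketched as a single-stage pipeline in plain Python. This is an assumption-laden illustration, not the paper's implementation: it uses a fixed orthonormal 2x2 Haar basis in place of the data-driven KLT (PCA) kernels the Saak transform computes, runs one stage instead of a multi-stage cascade, and uses hard thresholding of small high-frequency responses as the filter. It does keep the Saak-style kernel augmentation (each kernel paired with its negative, followed by ReLU) and the position-to-sign conversion that makes the inverse exact. All function names and the default `threshold` are hypothetical.

```python
from typing import List

Image = List[List[float]]
Coeffs = List[List[List[float]]]  # [block_row][block_col][response]

# Orthonormal 2x2 Haar basis, flattened row-major as
# [top-left, top-right, bottom-left, bottom-right].
# Index 0 is the DC (low-frequency) kernel; indices 1-3 are AC
# (high-frequency) kernels. ASSUMPTION: this fixed basis stands in for
# the data-driven KLT/PCA kernels of the actual Saak transform.
KERNELS = [
    [0.5, 0.5, 0.5, 0.5],    # DC: scaled block average
    [0.5, -0.5, 0.5, -0.5],  # horizontal detail
    [0.5, 0.5, -0.5, -0.5],  # vertical detail
    [0.5, -0.5, -0.5, 0.5],  # diagonal detail
]


def forward(image: Image) -> Coeffs:
    """Step 1: one-stage Saak-style forward transform on 2x2 blocks.

    Image height and width must be even. Each kernel response c is split
    into (max(c, 0), max(-c, 0)): the kernel is augmented with its negative
    and passed through ReLU, so every stored response is non-negative.
    """
    out: Coeffs = []
    for i in range(0, len(image), 2):
        row = []
        for j in range(0, len(image[0]), 2):
            x = [image[i][j], image[i][j + 1], image[i + 1][j], image[i + 1][j + 1]]
            resp = []
            for kernel in KERNELS:
                c = sum(k * v for k, v in zip(kernel, x))
                resp.extend([max(c, 0.0), max(-c, 0.0)])
            row.append(resp)
        out.append(row)
    return out


def filter_high_freq(coeffs: Coeffs, threshold: float) -> Coeffs:
    """Step 2: zero AC responses whose signed magnitude is below threshold."""
    filtered: Coeffs = []
    for row in coeffs:
        new_row = []
        for resp in row:
            kept = list(resp)  # copy so the input is left unchanged
            for k in range(1, len(KERNELS)):  # index 0 (DC) is never filtered
                if abs(kept[2 * k] - kept[2 * k + 1]) < threshold:
                    kept[2 * k] = kept[2 * k + 1] = 0.0
            new_row.append(kept)
        filtered.append(new_row)
    return filtered


def inverse(coeffs: Coeffs) -> Image:
    """Step 3: position-to-sign conversion, then the transposed basis."""
    image = [[0.0] * (2 * len(coeffs[0])) for _ in range(2 * len(coeffs))]
    for bi, row in enumerate(coeffs):
        for bj, resp in enumerate(row):
            x = [0.0] * 4
            for k, kernel in enumerate(KERNELS):
                c = resp[2 * k] - resp[2 * k + 1]  # recover the signed response
                for p in range(4):
                    x[p] += c * kernel[p]
            i, j = 2 * bi, 2 * bj
            image[i][j], image[i][j + 1] = x[0], x[1]
            image[i + 1][j], image[i + 1][j + 1] = x[2], x[3]
    return image


def saak_style_preprocess(image: Image, threshold: float = 0.5) -> Image:
    """Forward transform, filter high-frequency responses, inverse transform."""
    return inverse(filter_high_freq(forward(image), threshold))
```

With `threshold=0.0` nothing is filtered and `inverse(forward(img))` reproduces the input up to floating-point error, because the basis is orthonormal and the ReLU split is undone exactly. A small checkerboard perturbation such as `[[10.1, 9.9], [9.9, 10.1]]` lies entirely along the diagonal kernel, so `saak_style_preprocess(..., threshold=0.5)` returns a flat block of 10.0.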

Egocentric Activity Recognition with Multimodal Fisher Vector

No code implementations · 25 Jan 2016 · Sibo Song, Ngai-Man Cheung, Vijay Chandrasekhar, Bappaditya Mandal, Jie Lin

With the increasing availability of wearable devices, research on egocentric activity recognition has received much attention recently.

Tasks: Egocentric Activity Recognition
