Search Results for author: Sibo Song

Found 8 papers, 3 papers with code

OmniParser: A Unified Framework for Text Spotting, Key Information Extraction and Table Recognition

1 code implementation • 28 Mar 2024 • Jianqiang Wan, Sibo Song, Wenwen Yu, Yuliang Liu, Wenqing Cheng, Fei Huang, Xiang Bai, Cong Yao, Zhibo Yang

Recently, visually-situated text parsing (VsTP) has experienced notable advancements, driven by the increasing demand for automated document understanding and the emergence of Generative Large Language Models (LLMs) capable of processing document-based questions.

document understanding, Key Information Extraction

Modeling Entities as Semantic Points for Visual Information Extraction in the Wild

no code implementations • CVPR 2023 • Zhibo Yang, Rujiao Long, Pengfei Wang, Sibo Song, Humen Zhong, Wenqing Cheng, Xiang Bai, Cong Yao

As the first contribution of this work, we curate and release a new dataset for VIE, in which the document images are much more challenging in that they are taken from real applications, and difficulties such as blur, partial occlusion, and printing shift are quite common.

Text Spotting

Vision-Language Pre-Training for Boosting Scene Text Detectors

2 code implementations • CVPR 2022 • Sibo Song, Jianqiang Wan, Zhibo Yang, Jun Tang, Wenqing Cheng, Xiang Bai, Cong Yao

In this paper, we specifically adapt vision-language joint learning for scene text detection, a task that intrinsically involves cross-modal interaction between the two modalities: vision and language, since text is the written form of language.

Contrastive Learning, Language Modelling

Deep Adaptive Temporal Pooling for Activity Recognition

no code implementations • 22 Aug 2018 • Sibo Song, Ngai-Man Cheung, Vijay Chandrasekhar, Bappaditya Mandal

Specifically, using frame-level features, DATP regresses the importance of different temporal segments and generates weights for them.

Human Activity Recognition
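The weighting step described above can be illustrated with a minimal sketch. It is not the DATP architecture: it assumes each segment feature is the mean of its frame-level features, replaces the learned regressor with a hypothetical linear scorer (`w`, `b`), and normalizes the scores with softmax. All function names are hypothetical.

```python
import math
from typing import List

Vector = List[float]


def segment_means(frame_features: List[Vector], num_segments: int) -> List[Vector]:
    """Split frames into contiguous segments and average the frame features in each."""
    n = len(frame_features)
    if not 1 <= num_segments <= n:
        raise ValueError("num_segments must be between 1 and the number of frames")
    bounds = [k * n // num_segments for k in range(num_segments + 1)]
    dim = len(frame_features[0])
    means = []
    for start, end in zip(bounds, bounds[1:]):
        seg = frame_features[start:end]
        means.append([sum(f[i] for f in seg) / len(seg) for i in range(dim)])
    return means


def softmax(scores: Vector) -> Vector:
    """Numerically stable softmax: subtract the max before exponentiating."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def weighted_temporal_pool(segments: List[Vector], scores: Vector) -> Vector:
    """Combine segment features into one vector using softmax-normalized weights."""
    if len(segments) != len(scores):
        raise ValueError("need exactly one importance score per segment")
    weights = softmax(scores)
    dim = len(segments[0])
    return [sum(w * seg[i] for w, seg in zip(weights, segments)) for i in range(dim)]


def adaptive_temporal_pool(frame_features: List[Vector], num_segments: int,
                           w: Vector, b: float) -> Vector:
    """Hypothetical pipeline: score each segment linearly, then pool with the weights."""
    segments = segment_means(frame_features, num_segments)
    # Stand-in for a learned regressor: importance = w . segment + b
    scores = [sum(wi * xi for wi, xi in zip(w, seg)) + b for seg in segments]
    return weighted_temporal_pool(segments, scores)
```

For example, four frames split into two segments with scores 0 and ln 3 receive weights 0.25 and 0.75. Softmax keeps every weight positive and the weights sum to one, so the pooled vector is a convex combination of the segment features.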

Defense Against Adversarial Attacks with Saak Transform

no code implementations • 6 Aug 2018 • Sibo Song, Yueru Chen, Ngai-Man Cheung, C.-C. Jay Kuo

Therefore, we propose a Saak-transform-based preprocessing method with three steps: 1) transforming an input image to a joint spatial-spectral representation via the forward Saak transform, 2) applying filtering to its high-frequency components, and 3) reconstructing the image via the inverse Saak transform.

Adversarial Defense
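The three-step structure above (forward transform, filter high frequencies, inverse transform) can be illustrated with a minimal sketch. It swaps in a single-level orthonormal 2D Haar wavelet transform as a stand-in; it is not the Saak transform, which builds its kernels from the principal components of local image patches. The function names and the `alpha` attenuation parameter are hypothetical, and images are assumed to be grayscale lists of lists with even height and width.

```python
from typing import List, Tuple

Image = List[List[float]]


def _zeros(h: int, w: int) -> Image:
    return [[0.0] * w for _ in range(h)]


def haar_forward(img: Image) -> Tuple[Image, Image, Image, Image]:
    """Step 1: single-level orthonormal 2D Haar transform -> (LL, LH, HL, HH)."""
    h, w = len(img), len(img[0])
    if h % 2 or w % 2:
        raise ValueError("image height and width must be even")
    ll, lh, hl, hh = (_zeros(h // 2, w // 2) for _ in range(4))
    for i in range(0, h, 2):
        for j in range(0, w, 2):
            a, b = img[i][j], img[i][j + 1]
            c, d = img[i + 1][j], img[i + 1][j + 1]
            r, s = i // 2, j // 2
            ll[r][s] = (a + b + c + d) / 2  # low-frequency (scaled local average)
            lh[r][s] = (a - b + c - d) / 2  # left-minus-right column difference
            hl[r][s] = (a + b - c - d) / 2  # top-minus-bottom row difference
            hh[r][s] = (a - b - c + d) / 2  # diagonal difference
    return ll, lh, hl, hh


def haar_inverse(ll: Image, lh: Image, hl: Image, hh: Image) -> Image:
    """Step 3: exact inverse of haar_forward."""
    h2, w2 = len(ll), len(ll[0])
    img = _zeros(2 * h2, 2 * w2)
    for r in range(h2):
        for s in range(w2):
            q, x, y, z = ll[r][s], lh[r][s], hl[r][s], hh[r][s]
            img[2 * r][2 * s] = (q + x + y + z) / 2
            img[2 * r][2 * s + 1] = (q - x + y - z) / 2
            img[2 * r + 1][2 * s] = (q + x - y - z) / 2
            img[2 * r + 1][2 * s + 1] = (q - x - y + z) / 2
    return img


def filter_high_frequencies(img: Image, alpha: float = 0.0) -> Image:
    """Forward transform, scale the three detail subbands by alpha, then invert.

    alpha = 0 discards the high-frequency content; alpha = 1 returns the input.
    """
    ll, lh, hl, hh = haar_forward(img)

    def scale(band: Image) -> Image:  # Step 2: attenuate one detail subband
        return [[alpha * v for v in row] for row in band]

    return haar_inverse(ll, scale(lh), scale(hl), scale(hh))
```

With `alpha = 0`, each 2x2 block is replaced by its mean, so `[[1.0, 3.0], [5.0, 7.0]]` becomes a block of 4.0 values. Because the transform is orthonormal and exactly invertible, any change in the output comes only from the filtering step.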

Truly Multi-modal YouTube-8M Video Classification with Video, Audio, and Text

1 code implementation • 17 Jun 2017 • Zhe Wang, Kingsley Kuan, Mathieu Ravaut, Gaurav Manek, Sibo Song, Yuan Fang, Seokhwan Kim, Nancy Chen, Luis Fernando D'Haro, Luu Anh Tuan, Hongyuan Zhu, Zeng Zeng, Ngai Man Cheung, Georgios Piliouras, Jie Lin, Vijay Chandrasekhar

Beyond that, we extend the original competition by including text information in the classification, making this a truly multi-modal approach with vision, audio and text.

Classification, General Classification

On Classification of Distorted Images with Deep Convolutional Neural Networks

no code implementations • 8 Jan 2017 • Yiren Zhou, Sibo Song, Ngai-Man Cheung

Image blur and image noise are common distortions during image acquisition.

Classification, General Classification
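Both distortions named in the abstract can be synthesized for experiments with a short sketch. The Gaussian blur kernel with replicated borders, the additive zero-mean Gaussian noise model, and the function names are illustrative assumptions, not the paper's exact distortion settings. Images are assumed to be grayscale lists of lists.

```python
import math
import random
from typing import List

Image = List[List[float]]


def gaussian_kernel(sigma: float) -> List[float]:
    """1D Gaussian weights truncated at about 3 sigma, normalized to sum to 1."""
    if sigma <= 0:
        raise ValueError("sigma must be positive")
    radius = max(1, math.ceil(3 * sigma))
    weights = [math.exp(-(k * k) / (2 * sigma * sigma)) for k in range(-radius, radius + 1)]
    total = sum(weights)
    return [wt / total for wt in weights]


def gaussian_blur(img: Image, sigma: float) -> Image:
    """Separable Gaussian blur; out-of-range pixels replicate the nearest border."""
    kernel = gaussian_kernel(sigma)
    radius = len(kernel) // 2
    h, w = len(img), len(img[0])

    def clamp(v: int, hi: int) -> int:
        return min(max(v, 0), hi)

    # Horizontal pass, then vertical pass (the 2D Gaussian is separable).
    rows = [[sum(kernel[k + radius] * img[i][clamp(j + k, w - 1)]
                 for k in range(-radius, radius + 1)) for j in range(w)] for i in range(h)]
    return [[sum(kernel[k + radius] * rows[clamp(i + k, h - 1)][j]
                 for k in range(-radius, radius + 1)) for j in range(w)] for i in range(h)]


def add_gaussian_noise(img: Image, std: float, seed: int = 0) -> Image:
    """Add i.i.d. zero-mean Gaussian noise with standard deviation std."""
    rng = random.Random(seed)  # seeded so a distorted test set is reproducible
    return [[v + rng.gauss(0.0, std) for v in row] for row in img]
```

Varying `sigma` and `std` gives a graded range of blur and noise levels against which a classifier's accuracy can be measured.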

Egocentric Activity Recognition with Multimodal Fisher Vector

no code implementations • 25 Jan 2016 • Sibo Song, Ngai-Man Cheung, Vijay Chandrasekhar, Bappaditya Mandal, Jie Lin

With the increasing availability of wearable devices, research on egocentric activity recognition has received much attention recently.

Egocentric Activity Recognition