Search Results for author: Zhibo Yang

Found 32 papers, 17 papers with code

HIP: Hierarchical Point Modeling and Pre-training for Visual Information Extraction

no code implementations • 2 Nov 2024 • Rujiao Long, Pengfei Wang, Zhibo Yang, Cong Yao

End-to-end visual information extraction (VIE) aims at integrating the hierarchical subtasks of VIE, including text spotting, word grouping, and entity labeling, into a unified framework.

Image Reconstruction · Optical Character Recognition (OCR) · +1
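
To make the hierarchy above concrete, the sketch below models the three VIE output levels (spotted words, word groups, labeled entities) as plain data types. It illustrates the task structure only, not the HIP model; all class and field names are assumptions.

```python
# Schematic sketch of the VIE output hierarchy (not the HIP model):
# spotted words are grouped, and groups are labeled as entities.
from dataclasses import dataclass, field
from typing import List, Tuple

@dataclass
class Word:                # text spotting: a localized, transcribed word
    box: Tuple[float, float, float, float]   # x1, y1, x2, y2
    text: str

@dataclass
class WordGroup:           # word grouping: words forming one field value
    words: List[Word] = field(default_factory=list)

@dataclass
class Entity:              # entity labeling: a group tagged with a semantic key
    label: str             # e.g. "company", "date", "total" (hypothetical labels)
    group: WordGroup = field(default_factory=WordGroup)

# An end-to-end VIE system predicts all three levels jointly from the image,
# rather than chaining separate spotting, grouping, and labeling modules.
```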

VL-Reader: Vision and Language Reconstructor is an Effective Scene Text Recognizer

no code implementations • 18 Sep 2024 • Humen Zhong, Zhibo Yang, Zhaohai Li, Peng Wang, Jun Tang, Wenqing Cheng, Cong Yao

Text recognition is an inherent integration of vision and language, encompassing the visual texture in stroke patterns and the semantic context among the character sequences.

Decoder · Scene Text Recognition

Platypus: A Generalized Specialist Model for Reading Text in Various Forms

1 code implementation • 27 Aug 2024 • Peng Wang, Zhaohai Li, Jun Tang, Humen Zhong, Fei Huang, Zhibo Yang, Cong Yao

Recently, generalist models (such as GPT-4V), trained on tremendous amounts of data in a unified way, have shown enormous potential in reading text in various scenarios, but with the drawbacks of limited accuracy and low efficiency.

Handwritten Text Recognition · Optical Character Recognition (OCR) · +1

Look Hear: Gaze Prediction for Speech-directed Human Attention

no code implementations • 28 Jul 2024 • Sounak Mondal, Seoyoung Ahn, Zhibo Yang, Niranjan Balasubramanian, Dimitris Samaras, Gregory Zelinsky, Minh Hoai

To predict the gaze scanpaths in this incremental object referral task, we developed the Attention in Referral Transformer model or ART, which predicts the human fixations spurred by each word in a referring expression.

Decoder · Gaze Prediction · +2

Visual Text Generation in the Wild

1 code implementation • 19 Jul 2024 • Yuanzhi Zhu, Jiawei Liu, Feiyu Gao, Wenyu Liu, Xinggang Wang, Peng Wang, Fei Huang, Cong Yao, Zhibo Yang

However, it is still challenging to render high-quality text images in real-world scenarios, as three critical criteria should be satisfied: (1) Fidelity: the generated text images should be photo-realistic and the contents are expected to be the same as specified in the given conditions; (2) Reasonability: the regions and contents of the generated text should cohere with the scene; (3) Utility: the generated text images can facilitate related tasks (e.g., text detection and recognition).

Language Modelling · Large Language Model · +3

OmniParser: A Unified Framework for Text Spotting, Key Information Extraction and Table Recognition

1 code implementation • 28 Mar 2024 • Jianqiang Wan, Sibo Song, Wenwen Yu, Yuliang Liu, Wenqing Cheng, Fei Huang, Xiang Bai, Cong Yao, Zhibo Yang

Recently, visually-situated text parsing (VsTP) has experienced notable advancements, driven by the increasing demand for automated document understanding and the emergence of Generative Large Language Models (LLMs) capable of processing document-based questions.

Decoder · document understanding · +4

HierCode: A Lightweight Hierarchical Codebook for Zero-shot Chinese Text Recognition

no code implementations • 20 Mar 2024 • Yuyi Zhang, Yuanzhi Zhu, Dezhi Peng, Peirong Zhang, Zhenhua Yang, Zhibo Yang, Cong Yao, Lianwen Jin

Text recognition, especially for complex scripts like Chinese, faces unique challenges due to its intricate character structures and vast vocabulary.

Zero-Shot Learning

LORE++: Logical Location Regression Network for Table Structure Recognition with Pre-training

no code implementations • 3 Jan 2024 • Rujiao Long, Hangdi Xing, Zhibo Yang, Qi Zheng, Zhi Yu, Cong Yao, Fei Huang

We model TSR as a logical location regression problem and propose a new TSR framework called LORE, standing for LOgical location REgression network, which for the first time regresses logical location as well as spatial location of table cells in a unified network.

regression
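
A minimal sketch of the dual-regression idea described above, assuming a pooled feature per detected cell: one head predicts the spatial box, a second predicts logical (row/column) coordinates. Layer sizes, activations, and the rounding step are illustrative assumptions, not the LORE/LORE++ implementation.

```python
# Minimal sketch of dual location regression for table cells (illustrative only):
# a shared cell feature is mapped to a spatial box and to logical coordinates
# (start/end row and column indices).
import torch
import torch.nn as nn

class DualLocationHead(nn.Module):
    def __init__(self, feat_dim: int = 256):
        super().__init__()
        self.spatial = nn.Linear(feat_dim, 4)   # normalized box: x1, y1, x2, y2
        self.logical = nn.Linear(feat_dim, 4)   # row_start, row_end, col_start, col_end

    def forward(self, cell_feats: torch.Tensor):
        # cell_feats: (num_cells, feat_dim), one pooled feature per detected cell
        boxes = torch.sigmoid(self.spatial(cell_feats))   # keep boxes in [0, 1]
        logical = torch.relu(self.logical(cell_feats))    # non-negative grid coords
        return boxes, logical

# Both branches would be trained jointly (e.g. with L1 losses); at inference the
# logical outputs are rounded to integer row/column indices.
head = DualLocationHead()
boxes, logical = head(torch.randn(10, 256))               # 10 candidate cells
row_col_indices = logical.round().long()
```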

Efficient Monaural Speech Enhancement using Spectrum Attention Fusion

no code implementations • 4 Aug 2023 • Jinyu Long, Jetic Gū, Binhao Bai, Zhibo Yang, Ping Wei, Junli Li

Speech enhancement is a demanding task in automated speech processing pipelines, focusing on separating clean speech from noisy channels.

Speech Enhancement

Gazeformer: Scalable, Effective and Fast Prediction of Goal-Directed Human Attention

1 code implementation • CVPR 2023 • Sounak Mondal, Zhibo Yang, Seoyoung Ahn, Dimitris Samaras, Gregory Zelinsky, Minh Hoai

In response, we pose a new task called ZeroGaze, a variant of zero-shot learning in which gaze is predicted for never-before-searched objects, and we develop a novel model, Gazeformer, to solve the ZeroGaze problem.

Decoder · Gaze Prediction · +3

Modeling Entities as Semantic Points for Visual Information Extraction in the Wild

no code implementations • CVPR 2023 • Zhibo Yang, Rujiao Long, Pengfei Wang, Sibo Song, Humen Zhong, Wenqing Cheng, Xiang Bai, Cong Yao

As the first contribution of this work, we curate and release a new dataset for VIE, in which the document images are much more challenging in that they are taken from real applications, and difficulties such as blur, partial occlusion, and printing shift are quite common.

Text Spotting

Unifying Top-down and Bottom-up Scanpath Prediction Using Transformers

1 code implementation • CVPR 2024 • Zhibo Yang, Sounak Mondal, Seoyoung Ahn, Ruoyu Xue, Gregory Zelinsky, Minh Hoai, Dimitris Samaras

Most models of visual attention aim at predicting either top-down or bottom-up control, as studied using different visual search and free-viewing tasks.

Scanpath prediction

Unraveling Key Elements Underlying Molecular Property Prediction: A Systematic Study

1 code implementation • 26 Sep 2022 • Jianyuan Deng, Zhibo Yang, Hehe Wang, Iwao Ojima, Dimitris Samaras, Fusheng Wang

Herein, we conduct an extensive evaluation of representative models using various representations on the MoleculeNet datasets, a suite of opioids-related datasets and two additional activity datasets from the literature.

Drug Discovery · Molecular Property Prediction · +3

Target-absent Human Attention

1 code implementation • 4 Jul 2022 • Zhibo Yang, Sounak Mondal, Seoyoung Ahn, Gregory Zelinsky, Minh Hoai, Dimitris Samaras

In this paper, we propose the first data-driven computational model that addresses the search-termination problem and predicts the scanpath of search fixations made by people searching for targets that do not appear in images.

Imitation Learning

Vision-Language Pre-Training for Boosting Scene Text Detectors

2 code implementations • CVPR 2022 • Sibo Song, Jianqiang Wan, Zhibo Yang, Jun Tang, Wenqing Cheng, Xiang Bai, Cong Yao

In this paper, we specifically adapt vision-language joint learning for scene text detection, a task that intrinsically involves cross-modal interaction between the two modalities: vision and language, since text is the written form of language.

Contrastive Learning · Language Modelling · +4

Revisiting Document Image Dewarping by Grid Regularization

no code implementations • CVPR 2022 • Xiangwei Jiang, Rujiao Long, Nan Xue, Zhibo Yang, Cong Yao, Gui-Song Xia

This paper addresses the problem of document image dewarping, which aims at eliminating the geometric distortion in document images for document digitization.

Local Distortion · Optical Flow Estimation

Parsing Table Structures in the Wild

3 code implementations • ICCV 2021 • Rujiao Long, Wen Wang, Nan Xue, Feiyu Gao, Zhibo Yang, Yongpan Wang, Gui-Song Xia

In contrast to existing studies that mainly focus on parsing well-aligned tabular images with simple layouts from scanned PDF documents, we aim to establish a practical table structure parsing system for real-world scenarios where tabular input images are taken or scanned with severe deformation, bending or occlusions.

Object Detection

PAN++: Towards Efficient and Accurate End-to-End Spotting of Arbitrarily-Shaped Text

1 code implementation • 2 May 2021 • Wenhai Wang, Enze Xie, Xiang Li, Xuebo Liu, Ding Liang, Zhibo Yang, Tong Lu, Chunhua Shen

By systematically comparing with existing scene text representations, we show that our kernel representation can not only describe arbitrarily-shaped text but also well distinguish adjacent text.

Scene Text Detection · Text Detection · +1

MOST: A Multi-Oriented Scene Text Detector with Localization Refinement

no code implementations • CVPR 2021 • Minghang He, Minghui Liao, Zhibo Yang, Humen Zhong, Jun Tang, Wenqing Cheng, Cong Yao, Yongpan Wang, Xiang Bai

Over the past few years, the field of scene text detection has progressed so rapidly that modern text detectors are able to hunt text in various challenging scenarios.

Scene Text Detection · Text Detection

Hierarchical Proxy-based Loss for Deep Metric Learning

no code implementations • 25 Mar 2021 • Zhibo Yang, Muhammet Bastan, Xinliang Zhu, Doug Gray, Dimitris Samaras

In this paper, we present a framework that leverages this implicit hierarchy by imposing a hierarchical structure on the proxies and can be used with any existing proxy-based loss.

Image Retrieval · Metric Learning · +1
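
A rough sketch of how a hierarchical structure could be imposed on proxies, assuming a ProxyNCA-style base loss: each fine-grained class proxy is paired with a coarse parent proxy, and the embedding is attracted to both levels. The fine-to-coarse mapping, temperature, and weighting below are assumptions for illustration, not the paper's formulation.

```python
# Illustrative hierarchical proxy loss: a fine-level ProxyNCA-style term plus a
# weighted coarse-level term using each class's parent proxy (assumptions only).
import torch
import torch.nn as nn
import torch.nn.functional as F

class HierarchicalProxyLoss(nn.Module):
    def __init__(self, n_fine: int, n_coarse: int, fine_to_coarse, dim: int = 128,
                 coarse_weight: float = 0.5):
        super().__init__()
        self.fine_proxies = nn.Parameter(torch.randn(n_fine, dim))
        self.coarse_proxies = nn.Parameter(torch.randn(n_coarse, dim))
        self.register_buffer("fine_to_coarse", torch.as_tensor(fine_to_coarse))
        self.coarse_weight = coarse_weight

    def _proxy_nca(self, emb, proxies, labels):
        # Cross-entropy over cosine similarities to all proxies at this level.
        sims = F.normalize(emb, dim=1) @ F.normalize(proxies, dim=1).T
        return F.cross_entropy(sims / 0.1, labels)   # 0.1 = assumed temperature

    def forward(self, emb, fine_labels):
        fine_loss = self._proxy_nca(emb, self.fine_proxies, fine_labels)
        coarse_loss = self._proxy_nca(emb, self.coarse_proxies,
                                      self.fine_to_coarse[fine_labels])
        return fine_loss + self.coarse_weight * coarse_loss

# Usage: 8 fine classes grouped into 2 coarse classes.
loss_fn = HierarchicalProxyLoss(8, 2, fine_to_coarse=[0, 0, 0, 0, 1, 1, 1, 1])
embeddings = torch.randn(16, 128)
labels = torch.randint(0, 8, (16,))
loss = loss_fn(embeddings, labels)
```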

Defending against black-box adversarial attacks with gradient-free trained sign activation neural networks

1 code implementation • 1 Jan 2021 • Yunzhe Xue, Meiyan Xie, Zhibo Yang, Usman Roshan

The non-transferability within our ensemble also makes it a powerful defense against substitute-model black-box attacks, which we show require a much greater distortion than they do against binary and full-precision networks to bring our model to zero adversarial accuracy.

Adversarial Defense

AE TextSpotter: Learning Visual and Linguistic Representation for Ambiguous Text Spotting

2 code implementations • ECCV 2020 • Wenhai Wang, Xuebo Liu, Xiaozhong Ji, Enze Xie, Ding Liang, Zhibo Yang, Tong Lu, Chunhua Shen, Ping Luo

Unlike previous works that merely employed visual features for text detection, this work proposes a novel text spotter, named Ambiguity Eliminating Text Spotter (AE TextSpotter), which learns both visual and linguistic features to significantly reduce ambiguity in text detection.

Language Modelling · Sentence · +2

Towards Better Opioid Antagonists Using Deep Reinforcement Learning

no code implementations • 26 Mar 2020 • Jianyuan Deng, Zhibo Yang, Yao Li, Dimitris Samaras, Fusheng Wang

Naloxone, an opioid antagonist, has been widely used to save lives from opioid overdose, a leading cause of death in the opioid epidemic.

Deep Reinforcement Learning · Drug Discovery · +3

Predicting Goal-directed Attention Control Using Inverse-Reinforcement Learning

no code implementations • 31 Jan 2020 • Gregory J. Zelinsky, Yupei Chen, Seoyoung Ahn, Hossein Adeli, Zhibo Yang, Lihan Huang, Dimitrios Samaras, Minh Hoai

Using machine learning and the psychologically-meaningful principle of reward, it is possible to learn the visual features used in goal-directed attention control.

BIG-bench Machine Learning · reinforcement-learning · +2

TextField: Learning A Deep Direction Field for Irregular Scene Text Detection

1 code implementation • 4 Dec 2018 • Yongchao Xu, Yukang Wang, Wei Zhou, Yongpan Wang, Zhibo Yang, Xiang Bai

Experimental results show that the proposed TextField outperforms the state-of-the-art methods by a large margin (28% and 8%) on two curved text datasets: Total-Text and CTW1500, respectively, and also achieves very competitive performance on multi-oriented datasets: ICDAR 2015 and MSRA-TD500.

Scene Text Detection · Text Detection

Robust and Fast Decoding of High-Capacity Color QR Codes for Mobile Applications

1 code implementation • 21 Apr 2017 • Zhibo Yang, Huanle Xu, Jianyuan Deng, Chen Change Loy, Wing Cheong Lau

In particular, we further discover a new type of chromatic distortion in high-density color QR codes, cross-module color interference, caused by the same high density that also makes geometric distortion correction more challenging.

Similar Handwritten Chinese Character Discrimination by Weakly Supervised Learning

no code implementations • 19 Sep 2015 • Zhibo Yang, Huanle Xu, Keda Fu, Yong Xia

The unconstrained property makes our method well adapted to high variance in the size and position of discriminative regions in similar handwritten Chinese characters.

Weakly-supervised Learning
