Search Results for author: Zhibo Yang

Found 25 papers, 14 papers with code

OmniParser: A Unified Framework for Text Spotting, Key Information Extraction and Table Recognition

1 code implementation • 28 Mar 2024 • Jianqiang Wan, Sibo Song, Wenwen Yu, Yuliang Liu, Wenqing Cheng, Fei Huang, Xiang Bai, Cong Yao, Zhibo Yang

Recently, visually-situated text parsing (VsTP) has experienced notable advancements, driven by the increasing demand for automated document understanding and the emergence of Generative Large Language Models (LLMs) capable of processing document-based questions.

document understanding Key Information Extraction +3

HierCode: A Lightweight Hierarchical Codebook for Zero-shot Chinese Text Recognition

no code implementations • 20 Mar 2024 • Yuyi Zhang, Yuanzhi Zhu, Dezhi Peng, Peirong Zhang, Zhenhua Yang, Zhibo Yang, Cong Yao, Lianwen Jin

Text recognition, especially for complex scripts like Chinese, faces unique challenges due to its intricate character structures and vast vocabulary.

Zero-Shot Learning

LORE++: Logical Location Regression Network for Table Structure Recognition with Pre-training

no code implementations • 3 Jan 2024 • Rujiao Long, Hangdi Xing, Zhibo Yang, Qi Zheng, Zhi Yu, Cong Yao, Fei Huang

We model TSR as a logical location regression problem and propose a new TSR framework called LORE, standing for LOgical location REgression network, which for the first time regresses logical location as well as spatial location of table cells in a unified network.

regression

Efficient Monaural Speech Enhancement using Spectrum Attention Fusion

no code implementations • 4 Aug 2023 • Jinyu Long, Jetic Gū, Binhao Bai, Zhibo Yang, Ping Wei, Junli Li

Speech enhancement is a demanding task in automated speech processing pipelines, focusing on separating clean speech from noisy channels.

Speech Enhancement

Gazeformer: Scalable, Effective and Fast Prediction of Goal-Directed Human Attention

1 code implementation • CVPR 2023 • Sounak Mondal, Zhibo Yang, Seoyoung Ahn, Dimitris Samaras, Gregory Zelinsky, Minh Hoai

In response, we pose a new task called ZeroGaze, a new variant of zero-shot learning where gaze is predicted for never-before-searched objects, and we develop a novel model, Gazeformer, to solve the ZeroGaze problem.

Gaze Prediction Language Modelling +2

Modeling Entities as Semantic Points for Visual Information Extraction in the Wild

no code implementations • CVPR 2023 • Zhibo Yang, Rujiao Long, Pengfei Wang, Sibo Song, Humen Zhong, Wenqing Cheng, Xiang Bai, Cong Yao

As the first contribution of this work, we curate and release a new dataset for VIE, in which the document images are much more challenging in that they are taken from real applications, and difficulties such as blur, partial occlusion, and printing shift are quite common.

Text Spotting

Unifying Top-down and Bottom-up Scanpath Prediction Using Transformers

1 code implementation • 16 Mar 2023 • Zhibo Yang, Sounak Mondal, Seoyoung Ahn, Ruoyu Xue, Gregory Zelinsky, Minh Hoai, Dimitris Samaras

Most models of visual attention aim at predicting either top-down or bottom-up control, as studied using different visual search and free-viewing tasks.

Scanpath prediction

Unraveling Key Elements Underlying Molecular Property Prediction: A Systematic Study

1 code implementation • 26 Sep 2022 • Jianyuan Deng, Zhibo Yang, Hehe Wang, Iwao Ojima, Dimitris Samaras, Fusheng Wang

Herein, we conduct an extensive evaluation of representative models using various representations on the MoleculeNet datasets, a suite of opioids-related datasets and two additional activity datasets from the literature.

Drug Discovery Molecular Property Prediction +3

Target-absent Human Attention

1 code implementation • 4 Jul 2022 • Zhibo Yang, Sounak Mondal, Seoyoung Ahn, Gregory Zelinsky, Minh Hoai, Dimitris Samaras

In this paper, we propose the first data-driven computational model that addresses the search-termination problem and predicts the scanpath of search fixations made by people searching for targets that do not appear in images.

Imitation Learning

Vision-Language Pre-Training for Boosting Scene Text Detectors

2 code implementations • CVPR 2022 • Sibo Song, Jianqiang Wan, Zhibo Yang, Jun Tang, Wenqing Cheng, Xiang Bai, Cong Yao

In this paper, we specifically adapt vision-language joint learning for scene text detection, a task that intrinsically involves cross-modal interaction between the two modalities: vision and language, since text is the written form of language.

Contrastive Learning Language Modelling +4

Revisiting Document Image Dewarping by Grid Regularization

no code implementations • CVPR 2022 • Xiangwei Jiang, Rujiao Long, Nan Xue, Zhibo Yang, Cong Yao, Gui-Song Xia

This paper addresses the problem of document image dewarping, which aims at eliminating the geometric distortion in document images for document digitization.

Ranked #3 on Local Distortion on DocUNet

Local Distortion Optical Flow Estimation

ARTS: Eliminating Inconsistency between Text Detection and Recognition with Auto-Rectification Text Spotter

no code implementations • 20 Oct 2021 • Humen Zhong, Jun Tang, Wenhai Wang, Zhibo Yang, Cong Yao, Tong Lu

Recent approaches for end-to-end text spotting have achieved promising results.

Text Detection Text Spotting

Parsing Table Structures in the Wild

2 code implementations • ICCV 2021 • Rujiao Long, Wen Wang, Nan Xue, Feiyu Gao, Zhibo Yang, Yongpan Wang, Gui-Song Xia

In contrast to existing studies that mainly focus on parsing well-aligned tabular images with simple layouts from scanned PDF documents, we aim to establish a practical table structure parsing system for real-world scenarios where tabular input images are taken or scanned with severe deformation, bending or occlusions.

Object Detection

Artificial Intelligence in Drug Discovery: Applications and Techniques

1 code implementation • 9 Jun 2021 • Jianyuan Deng, Zhibo Yang, Iwao Ojima, Dimitris Samaras, Fusheng Wang

Artificial intelligence (AI) has been transforming the practice of drug discovery in the past decade.

Drug Discovery Molecular Property Prediction +1

PAN++: Towards Efficient and Accurate End-to-End Spotting of Arbitrarily-Shaped Text

1 code implementation • 2 May 2021 • Wenhai Wang, Enze Xie, Xiang Li, Xuebo Liu, Ding Liang, Zhibo Yang, Tong Lu, Chunhua Shen

By systematically comparing with existing scene text representations, we show that our kernel representation can not only describe arbitrarily-shaped text but also distinguish adjacent text well.

Scene Text Detection Text Detection +1

MOST: A Multi-Oriented Scene Text Detector with Localization Refinement

no code implementations • CVPR 2021 • Minghang He, Minghui Liao, Zhibo Yang, Humen Zhong, Jun Tang, Wenqing Cheng, Cong Yao, Yongpan Wang, Xiang Bai

Over the past few years, the field of scene text detection has progressed so rapidly that modern text detectors are able to hunt text in various challenging scenarios.

Scene Text Detection Text Detection

Hierarchical Proxy-based Loss for Deep Metric Learning

no code implementations • 25 Mar 2021 • Zhibo Yang, Muhammet Bastan, Xinliang Zhu, Doug Gray, Dimitris Samaras

In this paper, we present a framework that leverages this implicit hierarchy by imposing a hierarchical structure on the proxies and can be used with any existing proxy-based loss.

Image Retrieval Metric Learning +1

Defending against black-box adversarial attacks with gradient-free trained sign activation neural networks

1 code implementation • 1 Jan 2021 • Yunzhe Xue, Meiyan Xie, Zhibo Yang, Usman Roshan

The non-transferability in our ensemble also makes it a powerful defense against substitute-model black-box attacks, which we show require a much greater distortion than for binary and full-precision networks to bring our model to zero adversarial accuracy.

Adversarial Defense

AE TextSpotter: Learning Visual and Linguistic Representation for Ambiguous Text Spotting

2 code implementations • ECCV 2020 • Wenhai Wang, Xuebo Liu, Xiaozhong Ji, Enze Xie, Ding Liang, Zhibo Yang, Tong Lu, Chunhua Shen, Ping Luo

Unlike previous works that merely employed visual features for text detection, this work proposes a novel text spotter, named Ambiguity Eliminating Text Spotter (AE TextSpotter), which learns both visual and linguistic features to significantly reduce ambiguity in text detection.

Language Modelling Sentence +2

Predicting Goal-directed Human Attention Using Inverse Reinforcement Learning

2 code implementations • CVPR 2020 • Zhibo Yang, Lihan Huang, Yupei Chen, Zijun Wei, Seoyoung Ahn, Gregory Zelinsky, Dimitris Samaras, Minh Hoai

These maps were learned by IRL and then used to predict behavioral scanpaths for multiple target categories.

Object reinforcement-learning +1

Towards Better Opioid Antagonists Using Deep Reinforcement Learning

no code implementations • 26 Mar 2020 • Jianyuan Deng, Zhibo Yang, Yao Li, Dimitris Samaras, Fusheng Wang

Naloxone, an opioid antagonist, has been widely used to save lives from opioid overdose, a leading cause of death in the opioid epidemic.

Drug Discovery reinforcement-learning +2

Predicting Goal-directed Attention Control Using Inverse-Reinforcement Learning

no code implementations • 31 Jan 2020 • Gregory J. Zelinsky, Yupei Chen, Seoyoung Ahn, Hossein Adeli, Zhibo Yang, Lihan Huang, Dimitrios Samaras, Minh Hoai

Using machine learning and the psychologically-meaningful principle of reward, it is possible to learn the visual features used in goal-directed attention control.

BIG-bench Machine Learning reinforcement-learning +1

TextField: Learning A Deep Direction Field for Irregular Scene Text Detection

1 code implementation • 4 Dec 2018 • Yongchao Xu, Yukang Wang, Wei Zhou, Yongpan Wang, Zhibo Yang, Xiang Bai

Experimental results show that the proposed TextField outperforms the state-of-the-art methods by a large margin (28% and 8%) on two curved text datasets: Total-Text and CTW1500, respectively, and also achieves very competitive performance on multi-oriented datasets: ICDAR 2015 and MSRA-TD500.

Ranked #22 on Scene Text Detection on Total-Text

Scene Text Detection Text Detection

Robust and Fast Decoding of High-Capacity Color QR Codes for Mobile Applications

1 code implementation • 21 Apr 2017 • Zhibo Yang, Huanle Xu, Jianyuan Deng, Chen Change Loy, Wing Cheong Lau

In particular, we identify a new type of chromatic distortion in high-density color QR codes, cross-module color interference, which is caused by the high density that also makes geometric distortion correction more challenging.

Similar Handwritten Chinese Character Discrimination by Weakly Supervised Learning

no code implementations • 19 Sep 2015 • Zhibo Yang, Huanle Xu, Keda Fu, Yong Xia

The unconstrained property makes our method well adapted to high variance in the size and position of discriminative regions in similar handwritten Chinese characters.

Weakly-supervised Learning
