Search Results for author: Minghui Liao

Found 24 papers, 15 papers with code

TextHawk: Exploring Efficient Fine-Grained Perception of Multimodal Large Language Models

1 code implementation • 14 Apr 2024 • Ya-Qi Yu, Minghui Liao, Jihao Wu, Yongxin Liao, Xiaoyu Zheng, Wei Zeng

We conduct extensive experiments on both general and document-oriented MLLM benchmarks, and show that TextHawk outperforms the state-of-the-art methods, demonstrating its effectiveness and superiority in fine-grained document perception and general abilities.

Android in the Zoo: Chain-of-Action-Thought for GUI Agents

1 code implementation • 5 Mar 2024 • Jiwen Zhang, Jihao Wu, Yihua Teng, Minghui Liao, Nuo Xu, Xiao Xiao, Zhongyu Wei, Duyu Tang

To address this, this work presents Chain-of-Action-Thought (dubbed CoAT), which takes the description of the previous actions, the current screen, and more importantly the action thinking of what actions should be performed and the outcomes led by the chosen action.

Language Modelling, Large Language Model

Sequential Visual and Semantic Consistency for Semi-supervised Text Recognition

no code implementations • 24 Feb 2024 • Mingkun Yang, Biao Yang, Minghui Liao, Yingying Zhu, Xiang Bai

Scene text recognition (STR) is a challenging task that requires large-scale annotated data for training.

Scene Text Recognition, Semantic Similarity, +1

Class-Aware Mask-Guided Feature Refinement for Scene Text Recognition

1 code implementation • 21 Feb 2024 • Mingkun Yang, Biao Yang, Minghui Liao, Yingying Zhu, Xiang Bai

By enhancing the alignment between the canonical mask feature and the text feature, the module ensures more effective fusion, ultimately leading to improved recognition performance.

Scene Text Recognition

Joint Learning Neuronal Skeleton and Brain Circuit Topology with Permutation Invariant Encoders for Neuron Classification

1 code implementation • 22 Dec 2023 • Minghui Liao, Guojia Wan, Bo Du

Skeleton Encoder integrates the local information of neurons in a bottom-up manner, with a one-dimensional convolution in neural skeleton's point data; Connectome Encoder uses a graph neural network to capture the topological information of neural circuit; finally, Readout Layer fuses the above two information and outputs classification results.

Self-distillation Regularized Connectionist Temporal Classification Loss for Text Recognition: A Simple Yet Effective Approach

1 code implementation • 17 Aug 2023 • Ziyin Zhang, Ning Lu, Minghui Liao, Yongshuai Huang, Cheng Li, Min Wang, Wei Peng

It incorporates a framewise regularization term in CTC loss to emphasize individual supervision, and leverages the maximizing-a-posteriori of latent alignment to solve the inconsistency problem that arises in distillation between CTC-based models.

Reading and Writing: Discriminative and Generative Modeling for Self-Supervised Text Recognition

1 code implementation • 1 Jul 2022 • Mingkun Yang, Minghui Liao, Pu Lu, Jing Wang, Shenggao Zhu, Hualin Luo, Qi Tian, Xiang Bai

Inspired by the observation that humans learn to recognize the texts through both reading and writing, we propose to learn discrimination and generation by integrating contrastive learning and masked image modeling in our self-supervised method.

Contrastive Learning, Scene Text Recognition

Comprehensive Benchmark Datasets for Amharic Scene Text Detection and Recognition

no code implementations • 23 Mar 2022 • Wondimu Dikubab, Dingkang Liang, Minghui Liao, Xiang Bai

Ethiopic/Amharic script is one of the oldest African writing systems, which serves at least 23 languages (e.g., Amharic, Tigrinya) in East Africa for more than 120 million people.

Benchmarking, Scene Text Detection, +1

Real-Time Scene Text Detection with Differentiable Binarization and Adaptive Scale Fusion

5 code implementations • 21 Feb 2022 • Minghui Liao, Zhisheng Zou, Zhaoyi Wan, Cong Yao, Xiang Bai

By incorporating the proposed DB and ASF with the segmentation network, our proposed scene text detector consistently achieves state-of-the-art results, in terms of both detection accuracy and speed, on five standard benchmarks.

Ranked #3 on Scene Text Detection on MSRA-TD500

Binarization, Model Optimization, +3

SGEN: Single-cell Sequencing Graph Self-supervised Embedding Network

no code implementations • 15 Oct 2021 • Ziyi Liu, Minghui Liao, Fulin Luo, Bo Du

This method constructs the graph from the similarity relationships between cells and adopts a GCN to analyze the neighbor embedding information of samples, which brings similar cells closer to each other on the 2D scatter plot.

Dimensionality Reduction, Graph Embedding

MOST: A Multi-Oriented Scene Text Detector with Localization Refinement

no code implementations • CVPR 2021 • Minghang He, Minghui Liao, Zhibo Yang, Humen Zhong, Jun Tang, Wenqing Cheng, Cong Yao, Yongpan Wang, Xiang Bai

Over the past few years, the field of scene text detection has progressed so rapidly that modern text detectors are able to hunt text in various challenging scenarios.

Scene Text Detection, Text Detection

Scene Text Detection with Scribble Lines

no code implementations • 9 Dec 2020 • Wenqing Zhang, Yang Qiu, Minghui Liao, Rui Zhang, Xiaolin Wei, Xiang Bai

It is a general labeling method for texts with various shapes and requires low labeling costs.

Scene Text Detection, Text Detection

Mask TextSpotter v3: Segmentation Proposal Network for Robust Scene Text Spotting

1 code implementation • ECCV 2020 • Minghui Liao, Guan Pang, Jing Huang, Tal Hassner, Xiang Bai

Recent end-to-end trainable methods for scene text spotting, integrating detection and recognition, showed much progress.

Ranked #11 on Text Spotting on Total-Text

Region Proposal, Text Spotting

ICDAR 2019 Robust Reading Challenge on Reading Chinese Text on Signboard

no code implementations • 20 Dec 2019 • Xi Liu, Rui Zhang, Yongsheng Zhou, Qianyi Jiang, Qi Song, Nan Li, Kai Zhou, Lei Wang, Dong Wang, Minghui Liao, Mingkun Yang, Xiang Bai, Baoguang Shi, Dimosthenis Karatzas, Shijian Lu, C. V. Jawahar

21 teams submit results for Task 1, 23 teams submit results for Task 2, 24 teams submit results for Task 3, and 13 teams submit results for Task 4.

Line Detection, Task 2

Real-time Scene Text Detection with Differentiable Binarization

15 code implementations • 20 Nov 2019 • Minghui Liao, Zhaoyi Wan, Cong Yao, Kai Chen, Xiang Bai

Recently, segmentation-based methods have become quite popular in scene text detection, as the segmentation results can more accurately describe scene text of various shapes, such as curved text.

Ranked #6 on Scene Text Detection on MSRA-TD500

Binarization, Optical Character Recognition (OCR), +3

Mask TextSpotter: An End-to-End Trainable Neural Network for Spotting Text with Arbitrary Shapes

1 code implementation • ECCV 2018 • Minghui Liao, Pengyuan Lyu, Minghang He, Cong Yao, Wenhao Wu, Xiang Bai

Moreover, we further investigate the recognition module of our method separately, which significantly outperforms state-of-the-art methods on both regular and irregular text datasets for scene text recognition.

Scene Text Recognition, Semantic Segmentation, +2

Symmetry-constrained Rectification Network for Scene Text Recognition

no code implementations • ICCV 2019 • MingKun Yang, Yushuo Guan, Minghui Liao, Xin He, Kaigui Bian, Song Bai, Cong Yao, Xiang Bai

Reading text in the wild is a very challenging task due to the diversity of text instances and the complexity of natural scenes.

Scene Text Recognition

SynthText3D: Synthesizing Scene Text Images from 3D Virtual Worlds

1 code implementation • 13 Jul 2019 • Minghui Liao, Boyu Song, Shangbang Long, Minghang He, Cong Yao, Xiang Bai

Different from the previous methods which paste the rendered text on static 2D images, our method can render the 3D virtual scene and text instances as an entirety.

Image Generation, Scene Text Detection, +1

Scene Text Recognition from Two-Dimensional Perspective

no code implementations • 18 Sep 2018 • Minghui Liao, Jian Zhang, Zhaoyi Wan, Fengming Xie, Jiajun Liang, Pengyuan Lyu, Cong Yao, Xiang Bai

Inspired by speech recognition, recent state-of-the-art algorithms mostly consider scene text recognition as a sequence prediction problem.

Ranked #30 on Scene Text Recognition on SVT

Scene Text Recognition, Semantic Segmentation, +4

Mask TextSpotter: An End-to-End Trainable Neural Network for Spotting Text with Arbitrary Shapes

1 code implementation • ECCV 2018 • Pengyuan Lyu, Minghui Liao, Cong Yao, Wenhao Wu, Xiang Bai

Recently, models based on deep neural networks have dominated the fields of scene text detection and recognition.

Ranked #3 on Scene Text Detection on ICDAR 2013

Scene Text Detection, Semantic Segmentation, +2

Rotation-Sensitive Regression for Oriented Scene Text Detection

no code implementations • CVPR 2018 • Minghui Liao, Zhen Zhu, Baoguang Shi, Gui-Song Xia, Xiang Bai

Previous methods rely on shared features for both tasks, resulting in degraded performance due to the incompatibility of the two tasks.

Ranked #14 on Scene Text Detection on MSRA-TD500

Classification, General Classification, +6

TextBoxes++: A Single-Shot Oriented Scene Text Detector

3 code implementations • 9 Jan 2018 • Minghui Liao, Baoguang Shi, Xiang Bai

In this paper, we present an end-to-end trainable fast scene text detector, named TextBoxes++, which detects arbitrary-oriented scene text with both high accuracy and efficiency in a single network forward pass.

Ranked #2 on Scene Text Detection on COCO-Text

Object Detection, +3

ICDAR2017 Competition on Reading Chinese Text in the Wild (RCTW-17)

5 code implementations • 31 Aug 2017 • Baoguang Shi, Cong Yao, Minghui Liao, Mingkun Yang, Pei Xu, Linyan Cui, Serge Belongie, Shijian Lu, Xiang Bai

This report introduces RCTW, a new competition that focuses on Chinese text reading.

TextBoxes: A Fast Text Detector with a Single Deep Neural Network

3 code implementations • 21 Nov 2016 • Minghui Liao, Baoguang Shi, Xiang Bai, Xinggang Wang, Wenyu Liu

This paper presents an end-to-end trainable fast scene text detector, named TextBoxes, which detects scene text with both high accuracy and efficiency in a single network forward pass, involving no post-process except for a standard non-maximum suppression.
