Search Results for author: Minghui Liao

Found 24 papers, 15 papers with code

TextHawk: Exploring Efficient Fine-Grained Perception of Multimodal Large Language Models

1 code implementation • 14 Apr 2024 • Ya-Qi Yu, Minghui Liao, Jihao Wu, Yongxin Liao, Xiaoyu Zheng, Wei Zeng

We conduct extensive experiments on both general and document-oriented MLLM benchmarks, and show that TextHawk outperforms the state-of-the-art methods, demonstrating its effectiveness and superiority in fine-grained document perception and general abilities.

Android in the Zoo: Chain-of-Action-Thought for GUI Agents

1 code implementation • 5 Mar 2024 • Jiwen Zhang, Jihao Wu, Yihua Teng, Minghui Liao, Nuo Xu, Xiao Xiao, Zhongyu Wei, Duyu Tang

To address this, this work presents Chain-of-Action-Thought (dubbed CoAT), which takes into account the description of the previous actions, the current screen, and, more importantly, the action thinking about which actions should be performed and the outcomes of the chosen action.

Language Modelling, Large Language Model

Class-Aware Mask-Guided Feature Refinement for Scene Text Recognition

1 code implementation • 21 Feb 2024 • Mingkun Yang, Biao Yang, Minghui Liao, Yingying Zhu, Xiang Bai

By enhancing the alignment between the canonical mask feature and the text feature, the module ensures more effective fusion, ultimately leading to improved recognition performance.

Scene Text Recognition

Joint Learning Neuronal Skeleton and Brain Circuit Topology with Permutation Invariant Encoders for Neuron Classification

1 code implementation • 22 Dec 2023 • Minghui Liao, Guojia Wan, Bo Du

The Skeleton Encoder integrates the local information of neurons in a bottom-up manner, applying a one-dimensional convolution to the neuronal skeleton's point data; the Connectome Encoder uses a graph neural network to capture the topological information of the neural circuit; finally, the Readout Layer fuses the two kinds of information and outputs the classification results.

Self-distillation Regularized Connectionist Temporal Classification Loss for Text Recognition: A Simple Yet Effective Approach

1 code implementation • 17 Aug 2023 • Ziyin Zhang, Ning Lu, Minghui Liao, Yongshuai Huang, Cheng Li, Min Wang, Wei Peng

It incorporates a framewise regularization term in the CTC loss to emphasize individual supervision, and leverages maximum a posteriori (MAP) estimation of the latent alignment to solve the inconsistency problem that arises in distillation between CTC-based models.

Reading and Writing: Discriminative and Generative Modeling for Self-Supervised Text Recognition

1 code implementation • 1 Jul 2022 • Mingkun Yang, Minghui Liao, Pu Lu, Jing Wang, Shenggao Zhu, Hualin Luo, Qi Tian, Xiang Bai

Inspired by the observation that humans learn to recognize text through both reading and writing, we propose to learn discrimination and generation by integrating contrastive learning and masked image modeling in our self-supervised method.

Contrastive Learning, Scene Text Recognition

Comprehensive Benchmark Datasets for Amharic Scene Text Detection and Recognition

no code implementations • 23 Mar 2022 • Wondimu Dikubab, Dingkang Liang, Minghui Liao, Xiang Bai

Ethiopic/Amharic script is one of the oldest African writing systems, serving at least 23 languages (e.g., Amharic, Tigrinya) and more than 120 million people in East Africa.

Benchmarking, Scene Text Detection, +1

Real-Time Scene Text Detection with Differentiable Binarization and Adaptive Scale Fusion

5 code implementations • 21 Feb 2022 • Minghui Liao, Zhisheng Zou, Zhaoyi Wan, Cong Yao, Xiang Bai

By incorporating the proposed DB and ASF with the segmentation network, our proposed scene text detector consistently achieves state-of-the-art results, in terms of both detection accuracy and speed, on five standard benchmarks.

Binarization, Model Optimization, +3

SGEN: Single-cell Sequencing Graph Self-supervised Embedding Network

no code implementations • 15 Oct 2021 • Ziyi Liu, Minghui Liao, Fulin Luo, Bo Du

This method constructs a graph from the similarity relationships between cells and adopts a GCN to analyze the neighbor embedding information of samples, which brings similar cells closer to each other on the 2D scatter plot.

Dimensionality Reduction, Graph Embedding
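
The graph construction and neighbor aggregation described in the SGEN summary above can be illustrated with a minimal sketch. This is not the authors' implementation: the function names (`cosine`, `knn_graph`, `propagate`), the choice of cosine similarity, and the use of a k-nearest-neighbor graph are all assumptions made for illustration, and the aggregation step is a plain neighborhood mean with no learned weights, which only approximates what a GCN layer does.

```python
import math
from typing import List

Vector = List[float]


def cosine(u: Vector, v: Vector) -> float:
    """Cosine similarity between two feature vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def knn_graph(cells: List[Vector], k: int) -> List[List[int]]:
    """For each cell, list the indices of its k most similar other cells."""
    n = len(cells)
    neighbors = []
    for i in range(n):
        ranked = sorted(
            (j for j in range(n) if j != i),
            key=lambda j: cosine(cells[i], cells[j]),
            reverse=True,
        )
        neighbors.append(ranked[:k])
    return neighbors


def propagate(cells: List[Vector], neighbors: List[List[int]]) -> List[Vector]:
    """One aggregation step: average each cell's features with its neighbors'.

    A real GCN layer would also apply a learned weight matrix and a
    nonlinearity; both are omitted in this sketch.
    """
    out = []
    for i, feats in enumerate(cells):
        group = [feats] + [cells[j] for j in neighbors[i]]
        out.append([sum(col) / len(group) for col in zip(*group)])
    return out


# Hypothetical toy data: two pairs of similar "cells" with 2 features each.
cells = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
graph = knn_graph(cells, k=1)
smoothed = propagate(cells, graph)
```

In this toy example each cell is linked to its nearest partner, and one aggregation step pulls the features of the two cells in a pair toward their shared mean, which is the intuition behind similar cells ending up closer together in the embedding.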

MOST: A Multi-Oriented Scene Text Detector with Localization Refinement

no code implementations • CVPR 2021 • Minghang He, Minghui Liao, Zhibo Yang, Humen Zhong, Jun Tang, Wenqing Cheng, Cong Yao, Yongpan Wang, Xiang Bai

Over the past few years, the field of scene text detection has progressed so rapidly that modern text detectors are able to hunt text in various challenging scenarios.

Scene Text Detection, Text Detection

Scene Text Detection with Scribble Lines

no code implementations • 9 Dec 2020 • Wenqing Zhang, Yang Qiu, Minghui Liao, Rui Zhang, Xiaolin Wei, Xiang Bai

It is a general labeling method for texts of various shapes and incurs low labeling costs.

Scene Text Detection, Text Detection

Real-time Scene Text Detection with Differentiable Binarization

15 code implementations • 20 Nov 2019 • Minghui Liao, Zhaoyi Wan, Cong Yao, Kai Chen, Xiang Bai

Recently, segmentation-based methods have become quite popular in scene text detection, as the segmentation results can more accurately describe scene text of various shapes, such as curved text.

Binarization, Optical Character Recognition (OCR), +3

Mask TextSpotter: An End-to-End Trainable Neural Network for Spotting Text with Arbitrary Shapes

1 code implementation • ECCV 2018 • Minghui Liao, Pengyuan Lyu, Minghang He, Cong Yao, Wenhao Wu, Xiang Bai

Moreover, we further investigate the recognition module of our method separately, which significantly outperforms state-of-the-art methods on both regular and irregular text datasets for scene text recognition.

Scene Text Recognition, Semantic Segmentation, +2

Symmetry-constrained Rectification Network for Scene Text Recognition

no code implementations • ICCV 2019 • Mingkun Yang, Yushuo Guan, Minghui Liao, Xin He, Kaigui Bian, Song Bai, Cong Yao, Xiang Bai

Reading text in the wild is a very challenging task due to the diversity of text instances and the complexity of natural scenes.

Scene Text Recognition

SynthText3D: Synthesizing Scene Text Images from 3D Virtual Worlds

1 code implementation • 13 Jul 2019 • Minghui Liao, Boyu Song, Shangbang Long, Minghang He, Cong Yao, Xiang Bai

Unlike previous methods, which paste the rendered text onto static 2D images, our method can render the 3D virtual scene and text instances as a whole.

Image Generation, Scene Text Detection, +1

Scene Text Recognition from Two-Dimensional Perspective

no code implementations • 18 Sep 2018 • Minghui Liao, Jian Zhang, Zhaoyi Wan, Fengming Xie, Jiajun Liang, Pengyuan Lyu, Cong Yao, Xiang Bai

Inspired by speech recognition, recent state-of-the-art algorithms mostly consider scene text recognition as a sequence prediction problem.

Scene Text Recognition, Semantic Segmentation, +4

TextBoxes++: A Single-Shot Oriented Scene Text Detector

3 code implementations • 9 Jan 2018 • Minghui Liao, Baoguang Shi, Xiang Bai

In this paper, we present an end-to-end trainable fast scene text detector, named TextBoxes++, which detects arbitrary-oriented scene text with both high accuracy and efficiency in a single network forward pass.

object-detection, Object Detection, +3

TextBoxes: A Fast Text Detector with a Single Deep Neural Network

3 code implementations • 21 Nov 2016 • Minghui Liao, Baoguang Shi, Xiang Bai, Xinggang Wang, Wenyu Liu

This paper presents an end-to-end trainable fast scene text detector, named TextBoxes, which detects scene text with both high accuracy and efficiency in a single network forward pass, involving no post-process except for a standard non-maximum suppression.
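
The "standard non-maximum suppression" mentioned in the TextBoxes summary above is a generic post-processing step, and a minimal sketch of the greedy variant may help. This is not code from TextBoxes: the names `iou` and `nms`, the axis-aligned `(x1, y1, x2, y2)` box format, and the default threshold of 0.5 are assumptions chosen for illustration.

```python
from typing import List, Sequence, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2), axis-aligned


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def nms(boxes: Sequence[Box], scores: Sequence[float],
        iou_threshold: float = 0.5) -> List[int]:
    """Greedy NMS: return indices of kept boxes, highest score first.

    A box is kept only if its IoU with every already-kept box does not
    exceed iou_threshold.
    """
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep: List[int] = []
    for i in order:
        if all(iou(boxes[i], boxes[j]) <= iou_threshold for j in keep):
            keep.append(i)
    return keep


# Hypothetical detections: the first two boxes overlap heavily.
boxes = [(0, 0, 10, 10), (1, 1, 11, 11), (20, 20, 30, 30)]
scores = [0.9, 0.8, 0.7]
kept = nms(boxes, scores)  # the lower-scoring overlapping box is suppressed
```

The first two boxes have an IoU of 81/119, which is above the threshold, so only the higher-scoring one survives alongside the non-overlapping third box.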
