1 code implementation • 14 Apr 2024 • Ya-Qi Yu, Minghui Liao, Jihao Wu, Yongxin Liao, Xiaoyu Zheng, Wei Zeng
We conduct extensive experiments on both general and document-oriented MLLM benchmarks, and show that TextHawk outperforms the state-of-the-art methods, demonstrating its effectiveness and superiority in fine-grained document perception and general abilities.
1 code implementation • 5 Mar 2024 • Jiwen Zhang, Jihao Wu, Yihua Teng, Minghui Liao, Nuo Xu, Xiao Xiao, Zhongyu Wei, Duyu Tang
To address this, this work presents Chain-of-Action-Thought (dubbed CoAT), which takes into account the description of the previous actions, the current screen, and, more importantly, the reasoning about which actions should be performed and the outcomes of the chosen action.
no code implementations • 24 Feb 2024 • Mingkun Yang, Biao Yang, Minghui Liao, Yingying Zhu, Xiang Bai
Scene text recognition (STR) is a challenging task that requires large-scale annotated data for training.
1 code implementation • 21 Feb 2024 • Mingkun Yang, Biao Yang, Minghui Liao, Yingying Zhu, Xiang Bai
By enhancing the alignment between the canonical mask feature and the text feature, the module ensures more effective fusion, ultimately leading to improved recognition performance.
1 code implementation • 22 Dec 2023 • Minghui Liao, Guojia Wan, Bo Du
The Skeleton Encoder integrates the local information of neurons in a bottom-up manner, applying one-dimensional convolution to the neural skeleton's point data; the Connectome Encoder uses a graph neural network to capture the topological information of the neural circuit; finally, the Readout Layer fuses these two kinds of information and outputs the classification results.
1 code implementation • 17 Aug 2023 • Ziyin Zhang, Ning Lu, Minghui Liao, Yongshuai Huang, Cheng Li, Min Wang, Wei Peng
It incorporates a framewise regularization term in the CTC loss to emphasize individual supervision, and leverages maximum a posteriori estimation of the latent alignment to solve the inconsistency problem that arises in distillation between CTC-based models.
1 code implementation • 1 Jul 2022 • Mingkun Yang, Minghui Liao, Pu Lu, Jing Wang, Shenggao Zhu, Hualin Luo, Qi Tian, Xiang Bai
Inspired by the observation that humans learn to recognize the texts through both reading and writing, we propose to learn discrimination and generation by integrating contrastive learning and masked image modeling in our self-supervised method.
no code implementations • 23 Mar 2022 • Wondimu Dikubab, Dingkang Liang, Minghui Liao, Xiang Bai
Ethiopic/Amharic script is one of the oldest African writing systems and serves at least 23 languages (e.g., Amharic, Tigrinya) spoken by more than 120 million people in East Africa.
5 code implementations • 21 Feb 2022 • Minghui Liao, Zhisheng Zou, Zhaoyi Wan, Cong Yao, Xiang Bai
By incorporating the proposed DB and ASF with the segmentation network, our proposed scene text detector consistently achieves state-of-the-art results, in terms of both detection accuracy and speed, on five standard benchmarks.
Ranked #3 on Scene Text Detection on MSRA-TD500
no code implementations • 15 Oct 2021 • Ziyi Liu, Minghui Liao, Fulin Luo, Bo Du
This method constructs the graph from the similarity relationships between cells and adopts a GCN to analyze the neighbor embedding information of samples, which places similar cells closer to each other on the 2D scatter plot.
no code implementations • CVPR 2021 • Minghang He, Minghui Liao, Zhibo Yang, Humen Zhong, Jun Tang, Wenqing Cheng, Cong Yao, Yongpan Wang, Xiang Bai
Over the past few years, the field of scene text detection has progressed so rapidly that modern text detectors are able to hunt text in various challenging scenarios.
no code implementations • 9 Dec 2020 • Wenqing Zhang, Yang Qiu, Minghui Liao, Rui Zhang, Xiaolin Wei, Xiang Bai
It is a general labeling method for texts with various shapes and requires low labeling costs.
1 code implementation • ECCV 2020 • Minghui Liao, Guan Pang, Jing Huang, Tal Hassner, Xiang Bai
Recent end-to-end trainable methods for scene text spotting, integrating detection and recognition, showed much progress.
Ranked #11 on Text Spotting on Total-Text
no code implementations • 20 Dec 2019 • Xi Liu, Rui Zhang, Yongsheng Zhou, Qianyi Jiang, Qi Song, Nan Li, Kai Zhou, Lei Wang, Dong Wang, Minghui Liao, Mingkun Yang, Xiang Bai, Baoguang Shi, Dimosthenis Karatzas, Shijian Lu, C. V. Jawahar
21 teams submitted results for Task 1, 23 teams for Task 2, 24 teams for Task 3, and 13 teams for Task 4.
15 code implementations • 20 Nov 2019 • Minghui Liao, Zhaoyi Wan, Cong Yao, Kai Chen, Xiang Bai
Recently, segmentation-based methods have become quite popular in scene text detection, as the segmentation results can more accurately describe scene text of various shapes, such as curved text.
Ranked #6 on Scene Text Detection on MSRA-TD500
1 code implementation • ECCV 2018 • Minghui Liao, Pengyuan Lyu, Minghang He, Cong Yao, Wenhao Wu, Xiang Bai
Moreover, we further investigate the recognition module of our method separately, which significantly outperforms state-of-the-art methods on both regular and irregular text datasets for scene text recognition.
no code implementations • ICCV 2019 • Mingkun Yang, Yushuo Guan, Minghui Liao, Xin He, Kaigui Bian, Song Bai, Cong Yao, Xiang Bai
Reading text in the wild is a very challenging task due to the diversity of text instances and the complexity of natural scenes.
1 code implementation • 13 Jul 2019 • Minghui Liao, Boyu Song, Shangbang Long, Minghang He, Cong Yao, Xiang Bai
Unlike previous methods, which paste the rendered text onto static 2D images, our method renders the 3D virtual scene and the text instances as a whole.
no code implementations • 18 Sep 2018 • Minghui Liao, Jian Zhang, Zhaoyi Wan, Fengming Xie, Jiajun Liang, Pengyuan Lyu, Cong Yao, Xiang Bai
Inspired by speech recognition, recent state-of-the-art algorithms mostly consider scene text recognition as a sequence prediction problem.
Ranked #32 on Scene Text Recognition on SVT
1 code implementation • ECCV 2018 • Pengyuan Lyu, Minghui Liao, Cong Yao, Wenhao Wu, Xiang Bai
Recently, models based on deep neural networks have dominated the fields of scene text detection and recognition.
Ranked #3 on Scene Text Detection on ICDAR 2013
no code implementations • CVPR 2018 • Minghui Liao, Zhen Zhu, Baoguang Shi, Gui-Song Xia, Xiang Bai
Previous methods rely on shared features for both tasks, resulting in degraded performance due to the incompatibility of the two tasks.
Ranked #14 on Scene Text Detection on MSRA-TD500
3 code implementations • 9 Jan 2018 • Minghui Liao, Baoguang Shi, Xiang Bai
In this paper, we present an end-to-end trainable fast scene text detector, named TextBoxes++, which detects arbitrary-oriented scene text with both high accuracy and efficiency in a single network forward pass.
Ranked #2 on Scene Text Detection on COCO-Text
5 code implementations • 31 Aug 2017 • Baoguang Shi, Cong Yao, Minghui Liao, Mingkun Yang, Pei Xu, Linyan Cui, Serge Belongie, Shijian Lu, Xiang Bai
This report introduces RCTW, a new competition that focuses on Chinese text reading.
3 code implementations • 21 Nov 2016 • Minghui Liao, Baoguang Shi, Xiang Bai, Xinggang Wang, Wenyu Liu
This paper presents an end-to-end trainable fast scene text detector, named TextBoxes, which detects scene text with both high accuracy and efficiency in a single network forward pass, involving no post-processing except for standard non-maximum suppression.
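For readers unfamiliar with the non-maximum suppression step named in the TextBoxes abstract, the following is a minimal sketch of the generic greedy algorithm, not the authors' implementation. The axis-aligned `(x1, y1, x2, y2)` box format, the function names `iou` and `nms`, and the default threshold of 0.5 are assumptions made for illustration; the excerpt above does not state which settings TextBoxes uses.

```python
from typing import List, Sequence

Box = Sequence[float]  # assumed format: (x1, y1, x2, y2), axis-aligned


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def nms(boxes: Sequence[Box], scores: Sequence[float],
        iou_threshold: float = 0.5) -> List[int]:
    """Greedy NMS: return indices of kept boxes, highest score first.

    The threshold value is an illustrative assumption.
    """
    # Visit candidates from most to least confident.
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep: List[int] = []
    for i in order:
        # Keep a box only if it does not overlap too much with any kept box.
        if all(iou(boxes[i], boxes[j]) <= iou_threshold for j in keep):
            keep.append(i)
    return keep


if __name__ == "__main__":
    candidate_boxes = [(0, 0, 10, 10), (1, 1, 11, 11), (20, 20, 30, 30)]
    candidate_scores = [0.9, 0.8, 0.7]
    print(nms(candidate_boxes, candidate_scores))
```

In this usage example the second box overlaps the first with an IoU of about 0.68, so it is suppressed, while the third box does not overlap either and is kept.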