Search Results for author: Yuliang Liu

Found 52 papers, 37 papers with code

TextSquare: Scaling up Text-Centric Visual Instruction Tuning

no code implementations 19 Apr 2024 Jingqun Tang, Chunhui Lin, Zhen Zhao, Shu Wei, Binghong Wu, Qi Liu, Hao Feng, Yang Li, Siqi Wang, Lei Liao, Wei Shi, Yuliang Liu, Hao Liu, Yuan Xie, Xiang Bai, Can Huang

Text-centric visual question answering (VQA) has made great strides with the development of Multimodal Large Language Models (MLLMs), yet open-source models still fall short of leading models like GPT4V and Gemini, partly due to a lack of extensive, high-quality instruction tuning data.

Hallucination Hallucination Evaluation +2

Bridging the Gap Between End-to-End and Two-Step Text Spotting

3 code implementations 6 Apr 2024 Mingxin Huang, Hongliang Li, Yuliang Liu, Xiang Bai, Lianwen Jin

Subsequently, we introduce a Bridge that connects the locked detector and recognizer through a zero-initialized neural network.

Text Spotting

OmniParser: A Unified Framework for Text Spotting, Key Information Extraction and Table Recognition

1 code implementation 28 Mar 2024 Jianqiang Wan, Sibo Song, Wenwen Yu, Yuliang Liu, Wenqing Cheng, Fei Huang, Xiang Bai, Cong Yao, Zhibo Yang

Recently, visually-situated text parsing (VsTP) has experienced notable advancements, driven by the increasing demand for automated document understanding and the emergence of Generative Large Language Models (LLMs) capable of processing document-based questions.

document understanding Key Information Extraction +3

Video-LaVIT: Unified Video-Language Pre-training with Decoupled Visual-Motional Tokenization

1 code implementation 5 Feb 2024 Yang Jin, Zhicheng Sun, Kun Xu, Liwei Chen, Hao Jiang, Quzhe Huang, Chengru Song, Yuliang Liu, Di Zhang, Yang Song, Kun Gai, Yadong Mu

In light of recent advances in multimodal Large Language Models (LLMs), there is increasing attention to scaling them from image-text data to more informative real-world videos.

Video Understanding Visual Question Answering

An open dataset for the evolution of oracle bone characters: EVOBC

no code implementations 23 Jan 2024 Haisu Guan, Jinpeng Wan, Yuliang Liu, Pengjie Wang, Kaile Zhang, Zhebin Kuang, Xinyu Wang, Xiang Bai, Lianwen Jin

We conducted validation and simulated deciphering on the constructed dataset, and the results demonstrate its high efficacy in aiding the study of oracle bone script.

Decipherment

SwinTextSpotter v2: Towards Better Synergy for Scene Text Spotting

no code implementations 15 Jan 2024 Mingxin Huang, Dezhi Peng, Hongliang Li, Zhenghao Peng, Chongyu Liu, Dahua Lin, Yuliang Liu, Xiang Bai, Lianwen Jin

In this paper, we propose a new end-to-end scene text spotting framework termed SwinTextSpotter v2, which seeks to find a better synergy between text detection and recognition.

Text Detection Text Spotting

Progressive Evolution from Single-Point to Polygon for Scene Text

no code implementations 21 Dec 2023 Linger Deng, Mingxin Huang, Xudong Xie, Yuliang Liu, Lianwen Jin, Xiang Bai

We demonstrate the accuracy of the generated polygons through extensive experiments: 1) By creating polygons from ground truth points, we achieved an accuracy of 82.0% on ICDAR 2015; 2) In training detectors with polygons generated by our method, we attained 86% of the accuracy relative to training with ground truth (GT); 3) Additionally, the proposed Point2Polygon can be seamlessly integrated to empower single-point spotters to generate polygons.

Text Detection

Toward Real Text Manipulation Detection: New Dataset and New Solution

no code implementations 12 Dec 2023 Dongliang Luo, Yuliang Liu, Rui Yang, Xianjin Liu, Jishen Zeng, Yu Zhou, Xiang Bai

With the surge in realistic text tampering, detecting fraudulent text in images has gained prominence for maintaining information security.

Contrastive Learning

Enhancing Scene Text Detectors with Realistic Text Image Synthesis Using Diffusion Models

no code implementations 28 Nov 2023 Ling Fu, Zijie Wu, Yingying Zhu, Yuliang Liu, Xiang Bai

We contend that one main limitation of existing generation methods is the insufficient integration of foreground text with the background.

Image Generation Scene Text Detection +1

ML-Bench: Evaluating Large Language Models for Code Generation in Repository-Level Machine Learning Tasks

1 code implementation 16 Nov 2023 Yuliang Liu, Xiangru Tang, Zefan Cai, Junjie Lu, Yichi Zhang, Yanjun Shao, Zexuan Deng, Helan Hu, Kaikai An, Ruijun Huang, Shuzheng Si, Sheng Chen, Haozhe Zhao, Liang Chen, Yan Wang, Tianyu Liu, Zhiwei Jiang, Baobao Chang, Yujia Qin, Wangchunshu Zhou, Yilun Zhao, Arman Cohan, Mark Gerstein

While Large Language Models (LLMs) have demonstrated proficiency in code generation benchmarks, translating these results into practical development scenarios - where leveraging existing repository-level libraries is the norm - remains challenging.

Code Generation Navigate

Monkey: Image Resolution and Text Label Are Important Things for Large Multi-modal Models

1 code implementation 11 Nov 2023 Zhang Li, Biao Yang, Qiang Liu, Zhiyin Ma, Shuo Zhang, Jingxu Yang, Yabo Sun, Yuliang Liu, Xiang Bai

Additionally, experiments on 18 datasets further demonstrate that Monkey surpasses existing LMMs in many tasks like Image Captioning and various Visual Question Answering formats.

Image Captioning Question Answering +2

KwaiYiiMath: Technical Report

no code implementations 11 Oct 2023 Jiayi Fu, Lei Lin, Xiaoyang Gao, Pengli Liu, Zhengzong Chen, Zhirui Yang, ShengNan Zhang, Xue Zheng, Yan Li, Yuliang Liu, Xucheng Ye, Yiqiao Liao, Chao Liao, Bin Chen, Chengru Song, Junchen Wan, Zijia Lin, Fuzheng Zhang, Zhongyuan Wang, Di Zhang, Kun Gai

Recent advancements in large language models (LLMs) have demonstrated remarkable abilities in handling a variety of natural language processing (NLP) downstream tasks, even on mathematical tasks requiring multi-step reasoning.

Ranked #87 on Arithmetic Reasoning on GSM8K (using extra training data)

Arithmetic Reasoning GSM8K +1

Turning a CLIP Model into a Scene Text Spotter

1 code implementation 21 Aug 2023 Wenwen Yu, Yuliang Liu, Xingkui Zhu, Haoyu Cao, Xing Sun, Xiang Bai

Utilizing only 10% of the supervised data, FastTCM-CR50 improves performance by an average of 26.5% and 5.5% for text detection and spotting tasks, respectively.

object-detection Object Detection +3

ESTextSpotter: Towards Better Scene Text Spotting with Explicit Synergy in Transformer

3 code implementations ICCV 2023 Mingxin Huang, Jiaxin Zhang, Dezhi Peng, Hao Lu, Can Huang, Yuliang Liu, Xiang Bai, Lianwen Jin

To this end, we introduce a new model named Explicit Synergy-based Text Spotting Transformer framework (ESTextSpotter), which achieves explicit synergy by modeling discriminative and interactive features for text detection and recognition within a single decoder.

Text Detection Text Spotting

Box-DETR: Understanding and Boxing Conditional Spatial Queries

1 code implementation 17 Jul 2023 Wenze Liu, Hao Lu, Yuliang Liu, Zhiguo Cao

In DAB-DETR, such queries are modulated by the so-called conditional linear projection at each decoder stage, aiming to search for positions of interest such as the four extremities of the box.

ViTEraser: Harnessing the Power of Vision Transformers for Scene Text Removal with SegMIM Pretraining

1 code implementation 21 Jun 2023 Dezhi Peng, Chongyu Liu, Yuliang Liu, Lianwen Jin

As ViTEraser implicitly integrates text localization and inpainting, we propose a novel end-to-end pretraining method, termed SegMIM, which focuses the encoder and decoder on the text box segmentation and masked image modeling tasks, respectively.

Long-range modeling Scene Text Detection +1

On the Hidden Mystery of OCR in Large Multimodal Models

1 code implementation 13 May 2023 Yuliang Liu, Zhang Li, Biao Yang, Chunyuan Li, XuCheng Yin, Cheng-Lin Liu, Lianwen Jin, Xiang Bai

In this paper, we conducted a comprehensive evaluation of Large Multimodal Models, such as GPT4V and Gemini, in various text-related visual tasks including Text Recognition, Scene Text-Centric Visual Question Answering (VQA), Document-Oriented VQA, Key Information Extraction (KIE), and Handwritten Mathematical Expression Recognition (HMER).

Key Information Extraction Nutrition +4

ICDAR 2023 Competition on Reading the Seal Title

no code implementations 24 Apr 2023 Wenwen Yu, MingYu Liu, Mingrui Chen, Ning Lu, Yinlong Wen, Yuliang Liu, Dimosthenis Karatzas, Xiang Bai

To promote research in this area, we organized the ICDAR 2023 competition on reading the seal title (ReST), which included two tasks: seal title text detection (Task 1) and end-to-end seal title recognition (Task 2).

Optical Character Recognition (OCR) Task 2 +1

Accountable Textual-Visual Chat Learns to Reject Human Instructions in Image Re-creation

2 code implementations 10 Mar 2023 Zhiwei Zhang, Yuliang Liu

This stream is subsequently fed into the decoder-based transformer to generate visual re-creations and textual feedback in the second stage.

multimodal generation Text-to-Image Generation +1

Turning a CLIP Model into a Scene Text Detector

1 code implementation CVPR 2023 Wenwen Yu, Yuliang Liu, Wei Hua, Deqiang Jiang, Bo Ren, Xiang Bai

Recently, pretraining approaches based on vision language models have made effective progress in the field of text detection.

Domain Adaptation Scene Text Detection +1

Colossal-Auto: Unified Automation of Parallelization and Activation Checkpoint for Large-scale Models

1 code implementation 6 Feb 2023 Yuliang Liu, Shenggui Li, Jiarui Fang, Yanjun Shao, Boyuan Yao, Yang You

To address these challenges, we introduce a system that can jointly optimize distributed execution and gradient checkpointing plans.

Scheduling

SPTS v2: Single-Point Scene Text Spotting

3 code implementations 4 Jan 2023 Yuliang Liu, Jiaxin Zhang, Dezhi Peng, Mingxin Huang, Xinyu Wang, Jingqun Tang, Can Huang, Dahua Lin, Chunhua Shen, Xiang Bai, Lianwen Jin

Within the context of our SPTS v2 framework, our experiments suggest a potential preference for single-point representation in scene text spotting when compared to other representations.

Text Detection Text Spotting

Towards Robust Tampered Text Detection in Document Image: New Dataset and New Solution

1 code implementation CVPR 2023 Chenfan Qu, Chongyu Liu, Yuliang Liu, Xinhong Chen, Dezhi Peng, Fengjun Guo, Lianwen Jin

In this paper, we propose a novel framework to capture more fine-grained clues in complex scenarios for tampered text detection, termed Document Tampering Detector (DTD), which consists of a Frequency Perception Head (FPH) to compensate for the deficiencies caused by inconspicuous visual features, and a Multi-view Iterative Decoder (MID) to fully utilize the information in features at different scales.

Image and Video Forgery Detection Image Compression +1

MSDS: A Large-Scale Chinese Signature and Token Digit String Dataset for Handwriting Verification

1 code implementation 17 Oct 2022 Peirong Zhang, Jiajia Jiang, Yuliang Liu, Lianwen Jin

MSDS-ChS consists of handwritten Chinese signatures, which, to the best of our knowledge, is the largest publicly available Chinese signature dataset for handwriting verification, at least eight times larger than existing online datasets.

Handwriting Verification

SAPA: Similarity-Aware Point Affiliation for Feature Upsampling

2 code implementations 26 Sep 2022 Hao Lu, Wenze Liu, Zixuan Ye, Hongtao Fu, Yuliang Liu, Zhiguo Cao

We introduce point affiliation into feature upsampling, a notion that describes the affiliation of each upsampled point to a semantic cluster formed by local decoder feature points with semantic similarity.

Depth Estimation Feature Upsampling +6

PageNet: Towards End-to-End Weakly Supervised Page-Level Handwritten Chinese Text Recognition

4 code implementations 29 Jul 2022 Dezhi Peng, Lianwen Jin, Yuliang Liu, Canjie Luo, Songxuan Lai

Utilizing the proposed weakly supervised learning framework, PageNet requires only transcripts to be annotated for real data; however, it can still output detection and recognition results at both the character and line levels, avoiding the labor and cost of labeling bounding boxes of characters and text lines.

Handwritten Chinese Text Recognition Line Detection +1

DELTA: Dynamically Optimizing GPU Memory beyond Tensor Recomputation

1 code implementation 30 Mar 2022 Yu Tang, Chenyu Wang, Yufan Zhang, Yuliang Liu, Xingcheng Zhang, Linbo Qiao, Zhiquan Lai, Dongsheng Li

To the best of our knowledge, we are the first to make a reasonable dynamic runtime scheduler on the combination of tensor swapping and tensor recomputation without user oversight.

SPTS: Single-Point Text Spotting

1 code implementation 15 Dec 2021 Dezhi Peng, Xinyu Wang, Yuliang Liu, Jiaxin Zhang, Mingxin Huang, Songxuan Lai, Shenggao Zhu, Jing Li, Dahua Lin, Chunhua Shen, Xiang Bai, Lianwen Jin

For the first time, we demonstrate that scene text spotting models can be trained with an extremely low-cost annotation of a single point for each instance.

Language Modelling Text Detection +1

Colossal-AI: A Unified Deep Learning System For Large-Scale Parallel Training

1 code implementation 28 Oct 2021 Shenggui Li, Hongxin Liu, Zhengda Bian, Jiarui Fang, Haichen Huang, Yuliang Liu, Boxiang Wang, Yang You

The success of Transformer models has pushed the deep learning model scale to billions of parameters.

ICDAR 2021 Competition on Integrated Circuit Text Spotting and Aesthetic Assessment

1 code implementation 12 Jul 2021 Chun Chet Ng, Akmalul Khairi Bin Nazaruddin, Yeong Khang Lee, Xinyu Wang, Yuliang Liu, Chee Seng Chan, Lianwen Jin, Yipeng Sun, Lixin Fan

With hundreds of thousands of electronic chip components being manufactured every day, chip manufacturers have seen an increasing demand for a more efficient and effective way of inspecting the quality of printed text on chip components.

Text Spotting

ABCNet v2: Adaptive Bezier-Curve Network for Real-time End-to-end Text Spotting

1 code implementation 8 May 2021 Yuliang Liu, Chunhua Shen, Lianwen Jin, Tong He, Peng Chen, Chongyu Liu, Hao Chen

Previous methods can be roughly categorized into two groups: character-based and segmentation-based, which often require character-level annotations and/or complex post-processing due to the unstructured output.

Text Spotting

Structured Multimodal Attentions for TextVQA

2 code implementations 1 Jun 2020 Chenyu Gao, Qi Zhu, Peng Wang, Hui Li, Yuliang Liu, Anton Van Den Hengel, Qi Wu

In this paper, we propose an end-to-end structured multimodal attention (SMA) neural network to mainly solve the first two issues above.

Graph Attention Optical Character Recognition (OCR) +3

Separating Content from Style Using Adversarial Learning for Recognizing Text in the Wild

no code implementations 13 Jan 2020 Canjie Luo, Qingxiang Lin, Yuliang Liu, Lianwen Jin, Chunhua Shen

Furthermore, to tackle the issue of lacking paired training samples, we design an interactive joint training scheme, which shares attention masks from the recognizer to the discriminator, and enables the discriminator to extract the features of each character for further adversarial training.

Style Transfer

Exploring the Capacity of an Orderless Box Discretization Network for Multi-orientation Scene Text Detection

1 code implementation 20 Dec 2019 Yuliang Liu, Tong He, Hao Chen, Xinyu Wang, Canjie Luo, Shuaitao Zhang, Chunhua Shen, Lianwen Jin

More importantly, based on OBD, we provide a detailed analysis of the impact of a collection of refinements, which may inspire others to build state-of-the-art text detectors.

Scene Text Detection Text Detection

ICDAR2019 Robust Reading Challenge on Arbitrary-Shaped Text (RRC-ArT)

1 code implementation 16 Sep 2019 Chee-Kheng Chng, Yuliang Liu, Yipeng Sun, Chun Chet Ng, Canjie Luo, Zihan Ni, ChuanMing Fang, Shuaitao Zhang, Junyu Han, Errui Ding, Jingtuo Liu, Dimosthenis Karatzas, Chee Seng Chan, Lianwen Jin

This paper reports the ICDAR2019 Robust Reading Challenge on Arbitrary-Shaped Text (RRC-ArT) that consists of three major challenges: i) scene text detection, ii) scene text recognition, and iii) scene text spotting.

Scene Text Detection Scene Text Recognition +2

Aggregation Cross-Entropy for Sequence Recognition

2 code implementations CVPR 2019 Zecheng Xie, Yaoxiong Huang, Yuanzhi Zhu, Lianwen Jin, Yuliang Liu, Lele Xie

In this paper, we propose a novel method, aggregation cross-entropy (ACE), for sequence recognition from a brand new perspective.

EnsNet: Ensconce Text in the Wild

3 code implementations 3 Dec 2018 Shuaitao Zhang, Yuliang Liu, Lianwen Jin, Yaoxiong Huang, Songxuan Lai

The feature of the former is first enhanced by a novel lateral connection structure and then refined by four carefully designed losses: multiscale regression loss and content loss, which capture the global discrepancy of different level features; texture loss and total variation loss, which primarily target filling the text region and preserving the reality of the background.

Generative Adversarial Network Image Text Removal

Feature Enhancement Network: A Refined Scene Text Detector

no code implementations 12 Nov 2017 Sheng Zhang, Yuliang Liu, Lianwen Jin, Canjie Luo

In this paper, we propose a refined scene text detector with a novel Feature Enhancement Network (FEN) for Region Proposal and Text Detection Refinement.

object-detection Object Detection +3

Deep Matching Prior Network: Toward Tighter Multi-oriented Text Detection

no code implementations CVPR 2017 Yuliang Liu, Lianwen Jin

The effectiveness of our approach is evaluated on a public word-level, multi-oriented scene text database, ICDAR 2015 Robust Reading Competition Challenge 4 "Incidental scene text localization".

Text Detection
