Text Spotting

53 papers with code • 4 benchmarks • 6 datasets

Text Spotting combines Scene Text Detection and Scene Text Recognition in a single end-to-end system: the model both locates text in natural images ("text in the wild") and transcribes it.

VimTS: A Unified Video and Image Text Spotter for Enhancing the Cross-domain Generalization

Yuliang-Liu/VimTS 30 Apr 2024

Typically, we propose a Prompt Queries Generation Module and a Tasks-aware Adapter to effectively convert the original single-task model into a multi-task model suitable for both image and video scenarios with minimal additional parameters.

13

Bridging the Gap Between End-to-End and Two-Step Text Spotting

mxin262/swintextspotter 6 Apr 2024

Subsequently, we introduce a Bridge that connects the locked detector and recognizer through a zero-initialized neural network.

257

OmniParser: A Unified Framework for Text Spotting, Key Information Extraction and Table Recognition

alibabaresearch/advancedliteratemachinery 28 Mar 2024

Recently, visually-situated text parsing (VsTP) has experienced notable advancements, driven by the increasing demand for automated document understanding and the emergence of Generative Large Language Models (LLMs) capable of processing document-based questions.

974

TextMonkey: An OCR-Free Large Multimodal Model for Understanding Document

yuliang-liu/monkey 7 Mar 2024

We present TextMonkey, a large multimodal model (LMM) tailored for text-centric tasks.

1,401

GoMatching: A Simple Baseline for Video Text Spotting via Long and Short Term Matching

hxyz-123/gomatching 13 Jan 2024

In response to this issue, we propose to efficiently turn an off-the-shelf query-based image text spotter into a specialist on video and present a simple baseline termed GoMatching, which focuses the training efforts on tracking while maintaining strong recognition performance.

14

GloTSFormer: Global Video Text Spotting Transformer

Hon-Wong/GlotsFormer 8 Jan 2024

In this paper, we propose a novel Global Video Text Spotting Transformer, GloTSFormer, which models the tracking problem as global associations and uses the Gaussian Wasserstein distance to guide the morphological correlation between frames.

2
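
The Gaussian Wasserstein distance mentioned in the GloTSFormer summary can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes each text box is given as a hypothetical tuple (cx, cy, w, h, theta) and is modeled as a 2-D Gaussian with the box centre as mean and covariance R diag(w^2/4, h^2/4) R^T, and it uses the closed form of the squared 2-Wasserstein distance that holds for 2x2 covariances. The function names are illustrative only.

```python
import math

def box_to_gaussian(cx, cy, w, h, theta):
    """Model a rotated box as a 2-D Gaussian (assumed convention):
    mean = box centre, covariance = R diag(w^2/4, h^2/4) R^T."""
    c, s = math.cos(theta), math.sin(theta)
    a, b = (w / 2) ** 2, (h / 2) ** 2
    sxx = a * c * c + b * s * s
    syy = a * s * s + b * c * c
    sxy = (a - b) * c * s
    return (cx, cy), (sxx, sxy, syy)

def gwd_squared(box1, box2):
    """Squared 2-Wasserstein distance between the two box Gaussians.

    For 2x2 covariances S1, S2 the coupling term simplifies to
    Tr((S1^1/2 S2 S1^1/2)^1/2) = sqrt(Tr(S1 S2) + 2 sqrt(det S1 det S2)).
    """
    (mx1, my1), (a1, b1, d1) = box_to_gaussian(*box1)
    (mx2, my2), (a2, b2, d2) = box_to_gaussian(*box2)
    center_term = (mx1 - mx2) ** 2 + (my1 - my2) ** 2
    trace_sum = a1 + d1 + a2 + d2
    trace_prod = a1 * a2 + 2 * b1 * b2 + d1 * d2  # Tr(S1 S2)
    det_prod = (a1 * d1 - b1 * b1) * (a2 * d2 - b2 * b2)
    coupling = math.sqrt(max(trace_prod + 2 * math.sqrt(max(det_prod, 0.0)), 0.0))
    # Clamp tiny negative values caused by floating-point rounding.
    return max(center_term + trace_sum - 2 * coupling, 0.0)

# Same shape, centre shifted by (3, 4): only the centre term remains.
shifted = gwd_squared((0, 0, 4, 2, 0.0), (3, 4, 4, 2, 0.0))
# Same centre, width and height swapped: only the shape term remains.
swapped = gwd_squared((0, 0, 4, 2, 0.0), (0, 0, 2, 4, 0.0))
```

Because the distance depends on the covariance rather than on the raw angle, a box rotated by 90 degrees and a box with swapped width and height are treated as the same shape, which is one reason such a distance can be convenient for comparing text instances across frames.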

Parrot Captions Teach CLIP to Spot Text

opendatalab/clip-parrot-bias 21 Dec 2023

Despite being the foundation model in numerous vision-language applications, CLIP suffers from a severe text spotting bias.

50

Hierarchical Text Spotter for Joint Text Spotting and Layout Analysis

google-research-datasets/hiertext 25 Oct 2023

We propose Hierarchical Text Spotter (HTS), a novel method for the joint task of word-level text spotting and geometric layout analysis.

237

Harnessing the Power of Multi-Lingual Datasets for Pre-training: Towards Enhancing Text Spotting Performance

alloydas/testr_eval 2 Oct 2023

The adaptation capability to a wide range of domains is crucial for scene text spotting models when deployed to real-world conditions.

1

STEP -- Towards Structured Scene-Text Spotting

cvc-dag/step 5 Sep 2023

We introduce the structured scene-text spotting task, which requires a scene-text OCR system to spot text in the wild according to a query regular expression.

2
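
To make the structured spotting task concrete, here is a minimal sketch of the naive alternative: run any generic text spotter, then keep only the words that fully match the query regular expression. This is an assumption-laden illustration, not the STEP method (which the summary describes as spotting text according to the query). The `Spotted` record and `filter_by_query` helper are hypothetical names, and the sample results are made up for the example.

```python
import re
from typing import List, NamedTuple, Tuple

class Spotted(NamedTuple):
    """One spotting result (hypothetical record layout)."""
    text: str
    box: Tuple[int, int, int, int]  # x_min, y_min, x_max, y_max

def filter_by_query(results: List[Spotted], query: str) -> List[Spotted]:
    """Keep only results whose whole transcription matches the query regex."""
    pattern = re.compile(query)
    return [r for r in results if pattern.fullmatch(r.text)]

# Made-up output of a generic spotter on a street scene.
results = [
    Spotted("AB-1234", (10, 40, 120, 70)),
    Spotted("EXIT", (200, 15, 260, 45)),
    Spotted("2024", (300, 90, 350, 110)),
]

# Query for a licence-plate-like pattern: two capitals, a dash, four digits.
matches = filter_by_query(results, r"[A-Z]{2}-\d{4}")
matched_texts = [m.text for m in matches]
```

Post-hoc filtering like this depends entirely on the spotter transcribing the target string without error, since a single misread character makes the full match fail.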