Text Spotting
53 papers with code • 4 benchmarks • 6 datasets
Text Spotting combines Scene Text Detection and Scene Text Recognition in an end-to-end manner — in short, the ability to read natural text in the wild.
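The two stages can be illustrated with a minimal sketch, using a toy "image" (a list of text rows standing in for pixels); `detect_text` and `recognize` are hypothetical stand-ins for real detection and recognition models, not any specific library's API.

```python
# Minimal two-step text-spotting sketch: detection finds boxes,
# recognition transcribes each box, and spotting composes the two.

def detect_text(image):
    """Detection stage: return bounding boxes (row, start_col, end_col)
    around contiguous non-blank regions of the toy image."""
    boxes = []
    for r, row in enumerate(image):
        c = 0
        while c < len(row):
            if row[c] != " ":
                start = c
                while c < len(row) and row[c] != " ":
                    c += 1
                boxes.append((r, start, c))
            else:
                c += 1
    return boxes

def recognize(image, box):
    """Recognition stage: transcribe the cropped region."""
    r, start, end = box
    return image[r][start:end]

def spot_text(image):
    """End-to-end spotting = detection + recognition."""
    return [recognize(image, b) for b in detect_text(image)]

image = ["  EXIT   ", "CAFE  BAR"]
print(spot_text(image))  # -> ['EXIT', 'CAFE', 'BAR']
```

End-to-end spotters train both stages jointly instead of composing two frozen models, but the interface — image in, list of transcribed regions out — is the same.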
Libraries
Use these libraries to find Text Spotting models and implementations
Datasets
Latest papers
VimTS: A Unified Video and Image Text Spotter for Enhancing the Cross-domain Generalization
Specifically, we propose a Prompt Queries Generation Module and a Tasks-aware Adapter to effectively convert the original single-task model into a multi-task model suitable for both image and video scenarios with minimal additional parameters.
Bridging the Gap Between End-to-End and Two-Step Text Spotting
Subsequently, we introduce a Bridge that connects the locked detector and recognizer through a zero-initialized neural network.
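The appeal of a zero-initialized connection is that it is invisible at the start of training. The sketch below illustrates this property with an illustrative `ZeroInitBridge` class (names and shapes are assumptions, not the authors' implementation): because the bridge's weights begin at zero, the frozen recognizer initially sees exactly the features it was trained on, and the connection only takes effect as training moves the weights away from zero.

```python
# Zero-initialized bridge sketch: at initialization the bridge adds
# nothing, so the frozen detector/recognizer pipeline is unchanged.

class ZeroInitBridge:
    def __init__(self, dim):
        # weight matrix initialized to all zeros
        self.weight = [[0.0] * dim for _ in range(dim)]

    def __call__(self, detector_features):
        # bridge output = W @ f, which is the zero vector at init
        return [
            sum(w * f for w, f in zip(row, detector_features))
            for row in self.weight
        ]

def recognizer_input(image_features, bridge_out):
    # the recognizer consumes its usual features plus the bridge signal
    return [a + b for a, b in zip(image_features, bridge_out)]

bridge = ZeroInitBridge(dim=3)
feats = [0.5, -1.0, 2.0]
# at init, the recognizer input equals the original features exactly
print(recognizer_input(feats, bridge(feats)))  # -> [0.5, -1.0, 2.0]
```

The same trick appears elsewhere (e.g. zero-initialized projections in adapter-style fine-tuning) precisely because it guarantees the pretrained behavior is the starting point.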
OmniParser: A Unified Framework for Text Spotting, Key Information Extraction and Table Recognition
Recently, visually-situated text parsing (VsTP) has experienced notable advancements, driven by the increasing demand for automated document understanding and the emergence of Generative Large Language Models (LLMs) capable of processing document-based questions.
TextMonkey: An OCR-Free Large Multimodal Model for Understanding Document
We present TextMonkey, a large multimodal model (LMM) tailored for text-centric tasks.
GoMatching: A Simple Baseline for Video Text Spotting via Long and Short Term Matching
In response to this issue, we propose to efficiently turn an off-the-shelf query-based image text spotter into a specialist on video and present a simple baseline termed GoMatching, which focuses the training efforts on tracking while maintaining strong recognition performance.
GloTSFormer: Global Video Text Spotting Transformer
In this paper, we propose a novel Global Video Text Spotting Transformer GloTSFormer to model the tracking problem as global associations and utilize the Gaussian Wasserstein distance to guide the morphological correlation between frames.
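For intuition on why a Gaussian Wasserstein distance suits cross-frame association: modeling an axis-aligned box as a Gaussian with mean at the box center and diagonal covariance diag((w/2)², (h/2)²) gives a closed-form squared 2-Wasserstein distance. This is the standard formula for diagonal Gaussians and only sketches the idea; GloTSFormer's exact box parameterization (e.g. rotated boxes) may differ.

```python
# Squared 2-Wasserstein distance between two boxes (cx, cy, w, h),
# each treated as a Gaussian N((cx, cy), diag((w/2)^2, (h/2)^2)).
# For diagonal Gaussians the distance splits into a center term and a
# shape term, so similar boxes score low even with zero overlap,
# unlike IoU, which is flat at 0 for all non-overlapping pairs.

def gwd_squared(box_a, box_b):
    cx1, cy1, w1, h1 = box_a
    cx2, cy2, w2, h2 = box_b
    center_term = (cx1 - cx2) ** 2 + (cy1 - cy2) ** 2
    shape_term = ((w1 - w2) ** 2 + (h1 - h2) ** 2) / 4.0
    return center_term + shape_term

print(gwd_squared((10, 10, 4, 2), (10, 10, 4, 2)))  # -> 0.0
print(gwd_squared((10, 10, 4, 2), (13, 10, 4, 2)))  # -> 9.0
```

The smooth, always-informative gradient of this distance is what makes it useful for guiding morphological correlation between frames, where text instances move and deform.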
Parrot Captions Teach CLIP to Spot Text
Although CLIP is the foundation model in numerous vision-language applications, it suffers from a severe text spotting bias.
Hierarchical Text Spotter for Joint Text Spotting and Layout Analysis
We propose Hierarchical Text Spotter (HTS), a novel method for the joint task of word-level text spotting and geometric layout analysis.
Harnessing the Power of Multi-Lingual Datasets for Pre-training: Towards Enhancing Text Spotting Performance
The adaptation capability to a wide range of domains is crucial for scene text spotting models when deployed to real-world conditions.
STEP -- Towards Structured Scene-Text Spotting
We introduce the structured scene-text spotting task, which requires a scene-text OCR system to spot text in the wild according to a query regular expression.
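The task's interface can be illustrated with a toy post-filter: given (text, score) pairs from any OCR system, keep only transcriptions that fully match the query regular expression. Note this is only a sketch of the task definition; the actual STEP model conditions detection and recognition on the query rather than filtering afterward.

```python
import re

def structured_spot(spotted, query):
    """Keep spotted transcriptions that fully match the query regex."""
    pattern = re.compile(query)
    return [text for text, score in spotted if pattern.fullmatch(text)]

# hypothetical OCR output: (transcription, confidence) pairs
spotted = [("A1234BC", 0.93), ("OPEN", 0.88), ("B987XZ", 0.91)]

# query: license-plate-like strings — one letter, four digits, two letters
print(structured_spot(spotted, r"[A-Z]\d{4}[A-Z]{2}"))  # -> ['A1234BC']
```

Queries like phone numbers, prices, or plate formats let a single spotter be steered toward structured text without retraining per format.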