Text Spotting
53 papers with code • 4 benchmarks • 6 datasets
Text Spotting combines Scene Text Detection and Scene Text Recognition in an end-to-end manner — in short, the ability to read natural text in the wild.
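The two stages can be illustrated with a minimal sketch, using a toy "image" (a list of text rows standing in for pixels); `detect_text` and `recognize` are hypothetical stand-ins for real detection and recognition models, not any specific library's API.

```python
# Minimal two-step text-spotting sketch: detection finds boxes,
# recognition transcribes each box, and spotting composes the two.

def detect_text(image):
    """Detection stage: return bounding boxes (row, start_col, end_col)
    around contiguous non-blank regions of the toy image."""
    boxes = []
    for r, row in enumerate(image):
        c = 0
        while c < len(row):
            if row[c] != " ":
                start = c
                while c < len(row) and row[c] != " ":
                    c += 1
                boxes.append((r, start, c))
            else:
                c += 1
    return boxes

def recognize(image, box):
    """Recognition stage: transcribe the cropped region."""
    r, start, end = box
    return image[r][start:end]

def spot_text(image):
    """End-to-end spotting = detection + recognition."""
    return [recognize(image, b) for b in detect_text(image)]

image = ["  EXIT   ", "CAFE  BAR"]
print(spot_text(image))  # -> ['EXIT', 'CAFE', 'BAR']
```

End-to-end spotters train both stages jointly instead of composing two frozen models, but the interface — image in, list of transcribed regions out — is the same.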
Libraries
Use these libraries to find Text Spotting models and implementations
Datasets
Latest papers
VimTS: A Unified Video and Image Text Spotter for Enhancing the Cross-domain Generalization
Specifically, we propose a Prompt Queries Generation Module and a Tasks-aware Adapter to effectively convert the original single-task model into a multi-task model suitable for both image and video scenarios with minimal additional parameters.
Bridging the Gap Between End-to-End and Two-Step Text Spotting
Subsequently, we introduce a Bridge that connects the locked detector and recognizer through a zero-initialized neural network.
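The appeal of a zero-initialized connection is that it is invisible at the start of training. The sketch below illustrates this property with an illustrative `ZeroInitBridge` class (names and shapes are assumptions, not the authors' implementation): because the bridge's weights begin at zero, the frozen recognizer initially sees exactly the features it was trained on, and the connection only takes effect as training moves the weights away from zero.

```python
# Zero-initialized bridge sketch: at initialization the bridge adds
# nothing, so the frozen detector/recognizer pipeline is unchanged.

class ZeroInitBridge:
    def __init__(self, dim):
        # weight matrix initialized to all zeros
        self.weight = [[0.0] * dim for _ in range(dim)]

    def __call__(self, detector_features):
        # bridge output = W @ f, which is the zero vector at init
        return [
            sum(w * f for w, f in zip(row, detector_features))
            for row in self.weight
        ]

def recognizer_input(image_features, bridge_out):
    # the recognizer consumes its usual features plus the bridge signal
    return [a + b for a, b in zip(image_features, bridge_out)]

bridge = ZeroInitBridge(dim=3)
feats = [0.5, -1.0, 2.0]
# at init, the recognizer input equals the original features exactly
print(recognizer_input(feats, bridge(feats)))  # -> [0.5, -1.0, 2.0]
```

The same trick appears elsewhere (e.g. zero-initialized projections in adapter-style fine-tuning) precisely because it guarantees the pretrained behavior is the starting point.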
OmniParser: A Unified Framework for Text Spotting, Key Information Extraction and Table Recognition
Recently, visually-situated text parsing (VsTP) has experienced notable advancements, driven by the increasing demand for automated document understanding and the emergence of Generative Large Language Models (LLMs) capable of processing document-based questions.
TextMonkey: An OCR-Free Large Multimodal Model for Understanding Document
We present TextMonkey, a large multimodal model (LMM) tailored for text-centric tasks.
GoMatching: A Simple Baseline for Video Text Spotting via Long and Short Term Matching
In response to this issue, we propose to efficiently turn an off-the-shelf query-based image text spotter into a specialist on video and present a simple baseline termed GoMatching, which focuses the training efforts on tracking while maintaining strong recognition performance.
GloTSFormer: Global Video Text Spotting Transformer
In this paper, we propose a novel Global Video Text Spotting Transformer GloTSFormer to model the tracking problem as global associations and utilize the Gaussian Wasserstein distance to guide the morphological correlation between frames.
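For intuition on why a Gaussian Wasserstein distance suits cross-frame association: modeling an axis-aligned box as a Gaussian with mean at the box center and diagonal covariance diag((w/2)², (h/2)²) gives a closed-form squared 2-Wasserstein distance. This is the standard formula for diagonal Gaussians and only sketches the idea; GloTSFormer's exact box parameterization (e.g. rotated boxes) may differ.

```python
# Squared 2-Wasserstein distance between two boxes (cx, cy, w, h),
# each treated as a Gaussian N((cx, cy), diag((w/2)^2, (h/2)^2)).
# For diagonal Gaussians the distance splits into a center term and a
# shape term, so similar boxes score low even with zero overlap,
# unlike IoU, which is flat at 0 for all non-overlapping pairs.

def gwd_squared(box_a, box_b):
    cx1, cy1, w1, h1 = box_a
    cx2, cy2, w2, h2 = box_b
    center_term = (cx1 - cx2) ** 2 + (cy1 - cy2) ** 2
    shape_term = ((w1 - w2) ** 2 + (h1 - h2) ** 2) / 4.0
    return center_term + shape_term

print(gwd_squared((10, 10, 4, 2), (10, 10, 4, 2)))  # -> 0.0
print(gwd_squared((10, 10, 4, 2), (13, 10, 4, 2)))  # -> 9.0
```

The smooth, always-informative gradient of this distance is what makes it useful for guiding morphological correlation between frames, where text instances move and deform.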
Parrot Captions Teach CLIP to Spot Text
Although CLIP is the foundation model in numerous vision-language applications, it suffers from a severe text spotting bias.
Hierarchical Text Spotter for Joint Text Spotting and Layout Analysis
We propose Hierarchical Text Spotter (HTS), a novel method for the joint task of word-level text spotting and geometric layout analysis.
Harnessing the Power of Multi-Lingual Datasets for Pre-training: Towards Enhancing Text Spotting Performance
The adaptation capability to a wide range of domains is crucial for scene text spotting models when deployed to real-world conditions.
STEP -- Towards Structured Scene-Text Spotting
We introduce the structured scene-text spotting task, which requires a scene-text OCR system to spot text in the wild according to a query regular expression.
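The task's interface can be illustrated with a toy post-filter: given (text, score) pairs from any OCR system, keep only transcriptions that fully match the query regular expression. Note this is only a sketch of the task definition; the actual STEP model conditions detection and recognition on the query rather than filtering afterward.

```python
import re

def structured_spot(spotted, query):
    """Keep spotted transcriptions that fully match the query regex."""
    pattern = re.compile(query)
    return [text for text, score in spotted if pattern.fullmatch(text)]

# hypothetical OCR output: (transcription, confidence) pairs
spotted = [("A1234BC", 0.93), ("OPEN", 0.88), ("B987XZ", 0.91)]

# query: license-plate-like strings — one letter, four digits, two letters
print(structured_spot(spotted, r"[A-Z]\d{4}[A-Z]{2}"))  # -> ['A1234BC']
```

Queries like phone numbers, prices, or plate formats let a single spotter be steered toward structured text without retraining per format.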