Optical Character Recognition (OCR)
311 papers with code • 5 benchmarks • 42 datasets
Optical Character Recognition or Optical Character Reader (OCR) is the electronic or mechanical conversion of images of typed, handwritten, or printed text into machine-encoded text, whether from a scanned document, a photo of a document, a scene photo (for example, the text on signs and billboards in a landscape photo, or license plates on cars), or subtitle text superimposed on an image (for example, from a television broadcast).
Libraries
Use these libraries to find Optical Character Recognition (OCR) models and implementations.
Latest papers with no code
Improvement in Semantic Address Matching using Natural Language Processing
Existing solutions use string similarity and edit-distance algorithms to find similar addresses in the address database, but these algorithms do not work effectively with redundant, unstructured, or incomplete address data.
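The edit-distance baseline mentioned above can be sketched as follows. This is a minimal illustration of Levenshtein distance with a naive best-match lookup, not the paper's method; the function names `levenshtein`, `similarity`, and `closest_address` are hypothetical, and lowercasing before comparison is an assumption.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions, or substitutions."""
    if len(a) < len(b):
        a, b = b, a  # keep the DP row as short as possible
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution (or match)
        prev = curr
    return prev[-1]


def similarity(a: str, b: str) -> float:
    """Edit distance normalized to [0, 1]; 1.0 means identical (after lowercasing)."""
    a, b = a.lower(), b.lower()
    longest = max(len(a), len(b))
    return 1.0 if longest == 0 else 1.0 - levenshtein(a, b) / longest


def closest_address(query: str, database: list[str]) -> tuple[str, float]:
    """Hypothetical baseline: return the database entry most similar to the query."""
    best = max(database, key=lambda addr: similarity(query, addr))
    return best, similarity(query, best)
```

A typo such as `"12 Main Stret"` still matches `"12 Main Street"`, but the same address written with its tokens reordered (for example, the city first) scores poorly, because character-level edit distance ignores word order and meaning. That is the kind of weakness the snippet attributes to these algorithms.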
TC-OCR: TableCraft OCR for Efficient Detection & Recognition of Table Structure & Content
Our proposed approach achieves an IoU of 0.96 and an OCR accuracy of 78%, an improvement of approximately 25% in OCR accuracy over the previous Table Transformer approach.
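Intersection over union (IoU), the detection metric quoted above, measures how well a predicted region overlaps the ground truth. The sketch below computes it for axis-aligned bounding boxes; the snippet does not say exactly how TC-OCR applies IoU, so the box format `(x_min, y_min, x_max, y_max)` and the names `Box`, `area`, and `iou` are assumptions for illustration.

```python
from typing import NamedTuple


class Box(NamedTuple):
    """Assumed axis-aligned box format: (x_min, y_min, x_max, y_max) in pixels."""
    x_min: float
    y_min: float
    x_max: float
    y_max: float


def area(box: Box) -> float:
    # Clamp negative extents to zero so a non-overlapping "intersection" has area 0.
    return max(0.0, box.x_max - box.x_min) * max(0.0, box.y_max - box.y_min)


def iou(pred: Box, truth: Box) -> float:
    """Intersection over union: overlap area divided by the area covered by either box."""
    inter = Box(
        max(pred.x_min, truth.x_min),
        max(pred.y_min, truth.y_min),
        min(pred.x_max, truth.x_max),
        min(pred.y_max, truth.y_max),
    )
    inter_area = area(inter)
    union = area(pred) + area(truth) - inter_area
    return inter_area / union if union > 0 else 0.0
```

Two 10×10 boxes offset by half their width overlap on 50 of the 150 square units they jointly cover, giving an IoU of 1/3, while an IoU of 0.96 means the predicted and true regions almost coincide.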
Resilience of Large Language Models for Noisy Instructions
In the rapidly advancing domain of natural language processing (NLP), large language models (LLMs) have emerged as powerful tools for interpreting human commands and generating text across various tasks.
Convolution-based Probability Gradient Loss for Semantic Segmentation
In this paper, we introduce a novel Convolution-based Probability Gradient (CPG) loss for semantic segmentation.
VisualWebBench: How Far Have Multimodal LLMs Evolved in Web Page Understanding and Grounding?
Multimodal large language models (MLLMs) have shown promise in web-related tasks, but evaluating their performance in the web domain remains a challenge due to the lack of comprehensive benchmarks.
Making Old Kurdish Publications Processable by Augmenting Available Optical Character Recognition Engines
Having an extensive dataset is crucial to developing OCR systems with reasonable accuracy; because no public datasets are currently available for historical Kurdish documents, this posed a significant challenge in our work.
HAMMR: HierArchical MultiModal React agents for generic VQA
We start from a multimodal ReAct-based system and make it hierarchical by enabling our HAMMR agents to call upon other specialized agents.
Design and Development of a Framework For Stroke-Based Handwritten Gujarati Font Generation
The generation phase involves the user providing a small subset of characters, and the system automatically generates the remaining character glyphs based on extracted strokes and learned rules, resulting in handwritten Gujarati fonts.
Optical Text Recognition in Nepali and Bengali: A Transformer-based Approach
Research and development efforts on OCR systems for low-resource languages are relatively new.
RealKIE: Five Novel Datasets for Enterprise Key Information Extraction
We introduce RealKIE, a benchmark of five challenging datasets aimed at advancing key information extraction methods, with an emphasis on enterprise applications.