Search Results for author: Ron Litman

Found 15 papers, 7 papers with code

DocVLM: Make Your VLM an Efficient Reader

no code implementations CVPR 2025 Mor Shpigel Nacson, Aviad Aberdam, Roy Ganz, Elad Ben Avraham, Alona Golts, Yair Kittenplon, Shai Mazor, Ron Litman

Vision-Language Models (VLMs) excel in diverse visual tasks but face challenges in document understanding, which requires fine-grained text processing.

document understanding, Optical Character Recognition (OCR)

TAP-VL: Text Layout-Aware Pre-training for Enriched Vision-Language Models

no code implementations 7 Nov 2024 Jonathan Fhima, Elad Ben Avraham, Oren Nuriel, Yair Kittenplon, Roy Ganz, Aviad Aberdam, Ron Litman

In this paper, we focus on enhancing the first strategy by introducing a novel method, named TAP-VL, which treats OCR information as a distinct modality and seamlessly integrates it into any VL model.

Optical Character Recognition (OCR)

VisFocus: Prompt-Guided Vision Encoders for OCR-Free Dense Document Understanding

1 code implementation 17 Jul 2024 Ofir Abramovich, Niv Nayman, Sharon Fogel, Inbal Lavi, Ron Litman, Shahar Tsiper, Royee Tichauer, Srikar Appalaraju, Shai Mazor, R. Manmatha

In recent years, notable advancements have been made in the domain of visual document understanding, with the prevailing architecture comprising a cascade of vision and language models.

document understanding, Optical Character Recognition (OCR)

Towards Models that Can See and Read

no code implementations ICCV 2023 Roy Ganz, Oren Nuriel, Aviad Aberdam, Yair Kittenplon, Shai Mazor, Ron Litman

Visual Question Answering (VQA) and Image Captioning (CAP), which are among the most popular vision-language tasks, have analogous scene-text versions that require reasoning from the text in the image.

Decoder, Image Captioning +2

TextAdaIN: Paying Attention to Shortcut Learning in Text Recognizers

1 code implementation 9 May 2021 Oren Nuriel, Sharon Fogel, Ron Litman

However, in some cases, their decisions are based on unintended information, leading to high performance on standard benchmarks but also to a lack of generalization to challenging testing conditions and unintuitive failures.

Handwritten Text Recognition, Scene Text Recognition

On Calibration of Scene-Text Recognition Models

no code implementations 23 Dec 2020 Ron Slossberg, Oron Anschel, Amir Markovitz, Ron Litman, Aviad Aberdam, Shahar Tsiper, Shai Mazor, Jon Wu, R. Manmatha

Although the topic of confidence calibration has been an active research area for the last several decades, the case of structured and sequence prediction calibration has been scarcely explored.

Scene Text Recognition

Sequence-to-Sequence Contrastive Learning for Text Recognition

2 code implementations CVPR 2021 Aviad Aberdam, Ron Litman, Shahar Tsiper, Oron Anschel, Ron Slossberg, Shai Mazor, R. Manmatha, Pietro Perona

We propose a framework for sequence-to-sequence contrastive learning (SeqCLR) of visual representations, which we apply to text recognition.

Contrastive Learning, Decoder +1

SCATTER: Selective Context Attentional Scene Text Recognizer

2 code implementations CVPR 2020 Ron Litman, Oron Anschel, Shahar Tsiper, Roee Litman, Shai Mazor, R. Manmatha

The first attention step re-weights visual features from a CNN backbone together with contextual features computed by a BiLSTM layer.

Irregular Text Recognition, Scene Text Recognition
