Search Results for author: Yair Kittenplon

Found 7 papers, 2 papers with code

DocVLM: Make Your VLM an Efficient Reader

no code implementations • CVPR 2025 • Mor Shpigel Nacson, Aviad Aberdam, Roy Ganz, Elad Ben Avraham, Alona Golts, Yair Kittenplon, Shai Mazor, Ron Litman

Vision-Language Models (VLMs) excel in diverse visual tasks but face challenges in document understanding, which requires fine-grained text processing.

Document Understanding • Optical Character Recognition (OCR)

TAP-VL: Text Layout-Aware Pre-training for Enriched Vision-Language Models

no code implementations • 7 Nov 2024 • Jonathan Fhima, Elad Ben Avraham, Oren Nuriel, Yair Kittenplon, Roy Ganz, Aviad Aberdam, Ron Litman

In this paper, we focus on enhancing the first strategy by introducing a novel method, named TAP-VL, which treats OCR information as a distinct modality and seamlessly integrates it into any VL model.

Optical Character Recognition • Optical Character Recognition (OCR)
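
The abstract excerpt above says only that TAP-VL treats OCR information as a distinct modality and integrates it into a VL model; the actual architecture is specified in the paper. Purely as a hedged sketch of that general idea, the PyTorch snippet below projects OCR features into a VL model's embedding width and concatenates them with the visual token sequence. All names (OCRAdapter, d_ocr, etc.) and dimensions are hypothetical illustrations, not TAP-VL's implementation.

    # Hypothetical sketch: fusing OCR output as an extra modality into a VL model.
    # Names and dimensions are illustrative; this is NOT the TAP-VL implementation.
    import torch
    import torch.nn as nn

    class OCRAdapter(nn.Module):  # hypothetical module name
        def __init__(self, d_ocr: int, d_model: int):
            super().__init__()
            self.proj = nn.Linear(d_ocr, d_model)  # map OCR features to the VLM width

        def forward(self, ocr_feats: torch.Tensor) -> torch.Tensor:
            # ocr_feats: (batch, num_ocr_tokens, d_ocr), e.g. encoded words + layout
            return self.proj(ocr_feats)

    # Usage: prepend projected OCR tokens to the visual tokens fed to the language model.
    batch, n_vis, n_ocr, d_ocr, d_model = 2, 256, 64, 384, 768
    visual_tokens = torch.randn(batch, n_vis, d_model)  # from the vision encoder
    ocr_feats = torch.randn(batch, n_ocr, d_ocr)        # from an off-the-shelf OCR engine
    adapter = OCRAdapter(d_ocr, d_model)
    fused = torch.cat([adapter(ocr_feats), visual_tokens], dim=1)
    print(fused.shape)  # torch.Size([2, 320, 768])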

Towards Models that Can See and Read

no code implementations • ICCV 2023 • Roy Ganz, Oren Nuriel, Aviad Aberdam, Yair Kittenplon, Shai Mazor, Ron Litman

Visual Question Answering (VQA) and Image Captioning (CAP), which are among the most popular vision-language tasks, have analogous scene-text versions that require reasoning from the text in the image.

Decoder • Image Captioning • +2

Towards Weakly-Supervised Text Spotting using a Multi-Task Transformer

no code implementations • CVPR 2022 • Yair Kittenplon, Inbal Lavi, Sharon Fogel, Yarin Bar, R. Manmatha, Pietro Perona

Text spotting end-to-end methods have recently gained attention in the literature due to the benefits of jointly optimizing the text detection and recognition components.

Text Detection • Text Spotting
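
The excerpt above mentions jointly optimizing the text detection and recognition components but does not give the paper's objective. As a hedged illustration of what such joint optimization can look like, the PyTorch snippet below sums a box-regression term and a per-character recognition term into one weighted loss; the function name, loss choices, and weights are hypothetical placeholders, not the paper's method.

    # Hypothetical sketch of an end-to-end text-spotting loss: one weighted sum
    # over detection and recognition terms. NOT the paper's actual objective.
    import torch
    import torch.nn.functional as F

    def spotting_loss(pred_boxes, gt_boxes, char_logits, gt_chars,
                      w_det: float = 1.0, w_rec: float = 1.0) -> torch.Tensor:
        det_loss = F.l1_loss(pred_boxes, gt_boxes)        # box regression term
        rec_loss = F.cross_entropy(                       # per-character term
            char_logits.flatten(0, 1), gt_chars.flatten())
        return w_det * det_loss + w_rec * rec_loss        # single joint objective

    # Usage with dummy tensors: 4 text instances, 25-char transcripts, 97 classes.
    pred_boxes, gt_boxes = torch.rand(4, 4), torch.rand(4, 4)
    char_logits = torch.randn(4, 25, 97)
    gt_chars = torch.randint(0, 97, (4, 25))
    print(spotting_loss(pred_boxes, gt_boxes, char_logits, gt_chars))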
