Search Results for author: Roy Ganz

Found 20 papers, 10 papers with code

DocVLM: Make Your VLM an Efficient Reader

no code implementations CVPR 2025 Mor Shpigel Nacson, Aviad Aberdam, Roy Ganz, Elad Ben Avraham, Alona Golts, Yair Kittenplon, Shai Mazor, Ron Litman

Vision-Language Models (VLMs) excel in diverse visual tasks but face challenges in document understanding, which requires fine-grained text processing.

document understanding Optical Character Recognition (OCR)

TAP-VL: Text Layout-Aware Pre-training for Enriched Vision-Language Models

no code implementations 7 Nov 2024 Jonathan Fhima, Elad Ben Avraham, Oren Nuriel, Yair Kittenplon, Roy Ganz, Aviad Aberdam, Ron Litman

In this paper, we focus on enhancing the first strategy by introducing a novel method, named TAP-VL, which treats OCR information as a distinct modality and seamlessly integrates it into any VL model.

Optical Character Recognition (OCR)

Text-to-Image Generation Via Energy-Based CLIP

no code implementations 30 Aug 2024 Roy Ganz, Michael Elad

For the discriminative objective, we employ contrastive adversarial loss, extending the adversarial training objective to the multimodal domain.

Text-to-Image Generation

Adversaries With Incentives: A Strategic Alternative to Adversarial Robustness

1 code implementation 17 Jun 2024 Maayan Ehrenberg, Roy Ganz, Nir Rosenfeld

We conduct a series of experiments that show how even mild knowledge regarding the opponent's incentives can be useful, and that the degree of potential gains depends on how these incentives relate to the structure of the learning task.

Adversarial Robustness Inductive Bias

Enhancing Consistency-Based Image Generation via Adversarialy-Trained Classification and Energy-Based Discrimination

1 code implementation 25 May 2024 Shelly Golan, Roy Ganz, Michael Elad

While the classifier aims to grade an image based on its assignment to a designated class, the discriminator portion of the very same network leverages the softmax values to assess the proximity of the input image to the targeted data manifold, thereby serving as an Energy-based Model.

Image Generation

Paint by Inpaint: Learning to Add Image Objects by Removing Them First

1 code implementation CVPR 2025 Navve Wasserman, Noam Rotstein, Roy Ganz, Ron Kimmel

We address this by leveraging the insight that removing objects (Inpaint) is significantly simpler than its inverse process of adding them (Paint), attributed to the utilization of segmentation mask datasets alongside inpainting models that inpaint within these masks.

Image Inpainting Language Modeling +5

CLIPAG: Towards Generator-Free Text-to-Image Generation

no code implementations 29 Jun 2023 Roy Ganz, Michael Elad

Perceptually Aligned Gradients (PAG) refer to an intriguing property observed in robust image classification models, wherein their input gradients align with human perception and pose semantic meanings.

Image Classification +2

FuseCap: Leveraging Large Language Models for Enriched Fused Image Captions

1 code implementation 28 May 2023 Noam Rotstein, David Bensaid, Shaked Brody, Roy Ganz, Ron Kimmel

Our proposed method, FuseCap, fuses the outputs of such vision experts with the original captions using a large language model (LLM), yielding comprehensive image descriptions.

Ranked #1 on Image Captioning on COCO Captions (CLIPScore metric)

Attribute Image Captioning +4

Class-Conditioned Transformation for Enhanced Robust Image Classification

1 code implementation 27 Mar 2023 Tsachi Blau, Roy Ganz, Chaim Baskin, Michael Elad, Alex M. Bronstein

Robust classification methods predominantly concentrate on algorithms that address a specific threat model, resulting in ineffective defenses against other threat models.

Adversarial Attack Classification +3

Towards Models that Can See and Read

no code implementations ICCV 2023 Roy Ganz, Oren Nuriel, Aviad Aberdam, Yair Kittenplon, Shai Mazor, Ron Litman

Visual Question Answering (VQA) and Image Captioning (CAP), which are among the most popular vision-language tasks, have analogous scene-text versions that require reasoning from the text in the image.

Decoder Image Captioning +2

Enhancing Diffusion-Based Image Synthesis with Robust Classifier Guidance

1 code implementation 18 Aug 2022 Bahjat Kawar, Roy Ganz, Michael Elad

In order to obtain class-conditional generation, it was suggested to guide the diffusion process by gradients from a time-dependent classifier.

Denoising Image Generation

Do Perceptually Aligned Gradients Imply Adversarial Robustness?

1 code implementation 22 Jul 2022 Roy Ganz, Bahjat Kawar, Michael Elad

In this work, we focus on this trait and test whether Perceptually Aligned Gradients imply Robustness.

Adversarial Robustness Image Classification

Threat Model-Agnostic Adversarial Defense using Diffusion Models

1 code implementation 17 Jul 2022 Tsachi Blau, Roy Ganz, Bahjat Kawar, Alex Bronstein, Michael Elad

Deep Neural Networks (DNNs) are highly sensitive to imperceptible malicious perturbations, known as adversarial attacks.

Adversarial Defense Denoising +1

Improved Image Generation via Sparsity

no code implementations 29 Sep 2021 Roy Ganz, Michael Elad

The interest of the deep learning community in image synthesis has grown massively in recent years.

Image Generation

BIGRoC: Boosting Image Generation via a Robust Classifier

1 code implementation 8 Aug 2021 Roy Ganz, Michael Elad

The interest of the machine learning community in image synthesis has grown significantly in recent years, with the introduction of a wide range of deep generative models and means for training them.

Image Generation

Improved Image Generation via Sparse Modeling

no code implementations 1 Apr 2021 Roy Ganz, Michael Elad

The interest of the deep learning community in image synthesis has grown massively in recent years.

Image Generation
