no code implementations • CVPR 2025 • Mor Shpigel Nacson, Aviad Aberdam, Roy Ganz, Elad Ben Avraham, Alona Golts, Yair Kittenplon, Shai Mazor, Ron Litman
Vision-Language Models (VLMs) excel in diverse visual tasks but face challenges in document understanding, which requires fine-grained text processing.
no code implementations • 7 Nov 2024 • Jonathan Fhima, Elad Ben Avraham, Oren Nuriel, Yair Kittenplon, Roy Ganz, Aviad Aberdam, Ron Litman
In this paper, we focus on enhancing the first strategy by introducing a novel method, named TAP-VL, which treats OCR information as a distinct modality and seamlessly integrates it into any VL model.
Optical Character Recognition (OCR)
no code implementations • 30 Aug 2024 • Roy Ganz, Michael Elad
For the discriminative objective, we employ contrastive adversarial loss, extending the adversarial training objective to the multimodal domain.
1 code implementation • 17 Jun 2024 • Maayan Ehrenberg, Roy Ganz, Nir Rosenfeld
We conduct a series of experiments that show how even mild knowledge regarding the opponent's incentives can be useful, and that the degree of potential gains depends on how these incentives relate to the structure of the learning task.
1 code implementation • 25 May 2024 • Shelly Golan, Roy Ganz, Michael Elad
While the classifier aims to grade an image based on its assignment to a designated class, the discriminator portion of the very same network leverages the softmax values to assess the proximity of the input image to the targeted data manifold, thereby serving as an Energy-based Model.
1 code implementation • CVPR 2025 • Navve Wasserman, Noam Rotstein, Roy Ganz, Ron Kimmel
We address this by leveraging the insight that removing objects (Inpaint) is significantly simpler than its inverse process of adding them (Paint), attributed to the utilization of segmentation mask datasets alongside inpainting models that inpaint within these masks.
no code implementations • CVPR 2024 • Roy Ganz, Yair Kittenplon, Aviad Aberdam, Elad Ben Avraham, Oren Nuriel, Shai Mazor, Ron Litman
This integration results in dynamic visual features that focus on the image aspects relevant to the posed question.
no code implementations • CVPR 2024 • Tsachi Blau, Sharon Fogel, Roi Ronen, Alona Golts, Roy Ganz, Elad Ben Avraham, Aviad Aberdam, Shahar Tsiper, Ron Litman
The increasing use of transformer-based large language models brings forward the challenge of processing long sequences.
no code implementations • 29 Jun 2023 • Roy Ganz, Michael Elad
Perceptually Aligned Gradients (PAG) refer to an intriguing property observed in robust image classification models, wherein their input gradients align with human perception and carry semantic meaning.
1 code implementation • 28 May 2023 • Noam Rotstein, David Bensaid, Shaked Brody, Roy Ganz, Ron Kimmel
Our proposed method, FuseCap, fuses the outputs of such vision experts with the original captions using a large language model (LLM), yielding comprehensive image descriptions.
Ranked #1 on Image Captioning on COCO Captions (CLIPScore metric)
1 code implementation • 27 Mar 2023 • Tsachi Blau, Roy Ganz, Chaim Baskin, Michael Elad, Alex M. Bronstein
Robust classification methods predominantly concentrate on algorithms that address a specific threat model, resulting in ineffective defenses against other threat models.
no code implementations • ICCV 2023 • Roy Ganz, Oren Nuriel, Aviad Aberdam, Yair Kittenplon, Shai Mazor, Ron Litman
Visual Question Answering (VQA) and Image Captioning (CAP), which are among the most popular vision-language tasks, have analogous scene-text versions that require reasoning from the text in the image.
no code implementations • ICCV 2023 • Aviad Aberdam, David Bensaïd, Alona Golts, Roy Ganz, Oren Nuriel, Royee Tichauer, Shai Mazor, Ron Litman
Reading text in real-world scenarios often requires understanding the context surrounding it, especially when dealing with poor-quality text.
1 code implementation • 18 Aug 2022 • Bahjat Kawar, Roy Ganz, Michael Elad
In order to obtain class-conditional generation, it was suggested to guide the diffusion process by gradients from a time-dependent classifier.
1 code implementation • 22 Jul 2022 • Roy Ganz, Bahjat Kawar, Michael Elad
In this work, we focus on this trait and test whether Perceptually Aligned Gradients imply Robustness.
1 code implementation • 17 Jul 2022 • Tsachi Blau, Roy Ganz, Bahjat Kawar, Alex Bronstein, Michael Elad
Deep Neural Networks (DNNs) are highly sensitive to imperceptible malicious perturbations, known as adversarial attacks.
2 code implementations • 8 May 2022 • Aviad Aberdam, Roy Ganz, Shai Mazor, Ron Litman
In a novel setup, consistency is enforced on each modality separately.
no code implementations • 29 Sep 2021 • Roy Ganz, Michael Elad
The interest of the deep learning community in image synthesis has grown massively in recent years.
1 code implementation • 8 Aug 2021 • Roy Ganz, Michael Elad
The interest of the machine learning community in image synthesis has grown significantly in recent years, with the introduction of a wide range of deep generative models and means for training them.
Ranked #7 on Image Generation on ImageNet 128x128
no code implementations • 1 Apr 2021 • Roy Ganz, Michael Elad
The interest of the deep learning community in image synthesis has grown massively in recent years.