Search Results for author: Roy Ganz

Found 14 papers, 7 papers with code

Question Aware Vision Transformer for Multimodal Reasoning

no code implementations • 8 Feb 2024 • Roy Ganz, Yair Kittenplon, Aviad Aberdam, Elad Ben Avraham, Oren Nuriel, Shai Mazor, Ron Litman

This integration yields dynamic visual features that focus on the image aspects relevant to the posed question.

Language Modelling • Large Language Model • +1

GRAM: Global Reasoning for Multi-Page VQA

no code implementations • 7 Jan 2024 • Tsachi Blau, Sharon Fogel, Roi Ronen, Alona Golts, Roy Ganz, Elad Ben Avraham, Aviad Aberdam, Shahar Tsiper, Ron Litman

The increasing use of transformer-based large language models brings forward the challenge of processing long sequences.

Question Answering • Visual Question Answering

CLIPAG: Towards Generator-Free Text-to-Image Generation

no code implementations • 29 Jun 2023 • Roy Ganz, Michael Elad

Perceptually Aligned Gradients (PAG) refer to an intriguing property observed in robust image classification models, wherein their input gradients align with human perception and carry semantic meaning.

Image Classification • Text-to-Image Generation

FuseCap: Leveraging Large Language Models for Enriched Fused Image Captions

1 code implementation • 28 May 2023 • Noam Rotstein, David Bensaid, Shaked Brody, Roy Ganz, Ron Kimmel

Our proposed method, FuseCap, fuses the outputs of such vision experts with the original captions using a large language model (LLM), yielding comprehensive image descriptions.

Ranked #1 on Image Captioning on COCO Captions (CLIPScore metric)

Attribute • Image Captioning • +5

Classifier Robustness Enhancement Via Test-Time Transformation

1 code implementation • 27 Mar 2023 • Tsachi Blau, Roy Ganz, Chaim Baskin, Michael Elad, Alex Bronstein

We show that the proposed method achieves state-of-the-art results and validate our claim through extensive experiments on a variety of defense methods, classifier architectures, and datasets.

Adversarial Attack

Towards Models that Can See and Read

no code implementations • ICCV 2023 • Roy Ganz, Oren Nuriel, Aviad Aberdam, Yair Kittenplon, Shai Mazor, Ron Litman

Visual Question Answering (VQA) and Image Captioning (CAP), which are among the most popular vision-language tasks, have analogous scene-text versions that require reasoning from the text in the image.

Image Captioning • Question Answering • +1

CLIPTER: Looking at the Bigger Picture in Scene Text Recognition

no code implementations • ICCV 2023 • Aviad Aberdam, David Bensaïd, Alona Golts, Roy Ganz, Oren Nuriel, Royee Tichauer, Shai Mazor, Ron Litman

Reading text in real-world scenarios often requires understanding the context surrounding it, especially when dealing with poor-quality text.

Language Modelling • Scene Text Recognition

Enhancing Diffusion-Based Image Synthesis with Robust Classifier Guidance

1 code implementation • 18 Aug 2022 • Bahjat Kawar, Roy Ganz, Michael Elad

In order to obtain class-conditional generation, it was suggested to guide the diffusion process by gradients from a time-dependent classifier.

Denoising • Image Generation

Do Perceptually Aligned Gradients Imply Adversarial Robustness?

1 code implementation • 22 Jul 2022 • Roy Ganz, Bahjat Kawar, Michael Elad

In this work, we focus on this trait and test whether Perceptually Aligned Gradients imply Robustness.

Adversarial Robustness • Image Classification

Threat Model-Agnostic Adversarial Defense using Diffusion Models

1 code implementation • 17 Jul 2022 • Tsachi Blau, Roy Ganz, Bahjat Kawar, Alex Bronstein, Michael Elad

Deep Neural Networks (DNNs) are highly sensitive to imperceptible malicious perturbations, known as adversarial attacks.

Adversarial Defense • Denoising

Multimodal Semi-Supervised Learning for Text Recognition

2 code implementations • 8 May 2022 • Aviad Aberdam, Roy Ganz, Shai Mazor, Ron Litman

In a novel setup, consistency is enforced on each modality separately.

Language Modelling • Representation Learning • +2

Improved Image Generation via Sparsity

no code implementations • 29 Sep 2021 • Roy Ganz, Michael Elad

The interest of the deep learning community in image synthesis has grown massively in recent years.

Image Generation

BIGRoC: Boosting Image Generation via a Robust Classifier

1 code implementation • 8 Aug 2021 • Roy Ganz, Michael Elad

The interest of the machine learning community in image synthesis has grown significantly in recent years, with the introduction of a wide range of deep generative models and means for training them.

Ranked #4 on Image Generation on ImageNet 128x128

Image Generation

Improved Image Generation via Sparse Modeling

no code implementations • 1 Apr 2021 • Roy Ganz, Michael Elad

The interest of the deep learning community in image synthesis has grown massively in recent years.

Image Generation
