Search Results for author: Yonatan Bitton

Found 17 papers, 11 with code

ParallelPARC: A Scalable Pipeline for Generating Natural-Language Analogies

1 code implementation • 2 Mar 2024 • Oren Sultan, Yonatan Bitton, Ron Yosef, Dafna Shahaf

We demonstrate our pipeline and create ProPara-Logy, a dataset of analogies between scientific processes.

Tasks: Multiple-choice

A Chain-of-Thought Is as Strong as Its Weakest Link: A Benchmark for Verifiers of Reasoning Chains

no code implementations • 1 Feb 2024 • Alon Jacovi, Yonatan Bitton, Bernd Bohnet, Jonathan Herzig, Or Honovich, Michael Tseng, Michael Collins, Roee Aharoni, Mor Geva

REVEAL includes comprehensive labels for the relevance, attribution to evidence passages, and logical correctness of each reasoning step in a language model's answer, across a variety of datasets and state-of-the-art language models.

Tasks: Open-Domain Question Answering

Mismatch Quest: Visual and Textual Feedback for Image-Text Misalignment

no code implementations • 5 Dec 2023 • Brian Gordon, Yonatan Bitton, Yonatan Shafir, Roopal Garg, Xi Chen, Dani Lischinski, Daniel Cohen-Or, Idan Szpektor

While existing image-text alignment models achieve high-quality binary assessments, they fall short of pinpointing the exact source of misalignment.

Tasks: Explanation Generation, Visual Grounding

VideoCon: Robust Video-Language Alignment via Contrast Captions

1 code implementation • 15 Nov 2023 • Hritik Bansal, Yonatan Bitton, Idan Szpektor, Kai-Wei Chang, Aditya Grover

Despite being (pre)trained on massive amounts of data, state-of-the-art video-language alignment models are not robust to semantically plausible contrastive changes in the video captions.

Tasks: Language Modelling, Large Language Model, +5

VisIT-Bench: A Benchmark for Vision-Language Instruction Following Inspired by Real-World Use

1 code implementation • 12 Aug 2023 • Yonatan Bitton, Hritik Bansal, Jack Hessel, Rulin Shao, Wanrong Zhu, Anas Awadalla, Josh Gardner, Rohan Taori, Ludwig Schmidt

These descriptions enable 1) collecting human-verified reference outputs for each instance; and 2) automatically evaluating candidate multimodal generations with a text-only LLM, in alignment with human judgment.

Tasks: Instruction Following
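The VisIT-Bench entry above describes judging multimodal outputs with a text-only LLM by substituting a dense human-written description for the image. A minimal sketch of how such a judge prompt might be assembled — the function name, prompt wording, and yes/no scale are illustrative assumptions, not the benchmark's actual implementation:

```python
def build_judge_prompt(instruction: str, image_caption: str,
                       reference: str, candidate: str) -> str:
    """Pack everything a text-only LLM judge needs into one prompt.

    Because the judge never sees the image, a dense human-written
    caption stands in for the visual content.
    """
    return (
        "You are grading a vision-language model.\n"
        f"Image description: {image_caption}\n"
        f"Instruction: {instruction}\n"
        f"Reference answer: {reference}\n"
        f"Candidate answer: {candidate}\n"
        "Is the candidate as good as the reference? Answer 'yes' or 'no'."
    )

prompt = build_judge_prompt(
    instruction="What breed is the dog?",
    image_caption="A golden retriever lying on a porch.",
    reference="It is a golden retriever.",
    candidate="Looks like a golden retriever.",
)
```

The candidate is then scored by sending this prompt to any capable text-only LLM and parsing the yes/no verdict.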

Read, Look or Listen? What's Needed for Solving a Multimodal Dataset

no code implementations • 6 Jul 2023 • Netta Madvil, Yonatan Bitton, Roy Schwartz

We propose a two-step method to analyze multimodal datasets, which leverages a small seed of human annotation to map each multimodal instance to the modalities required to process it.

Tasks: Question Answering, Speaker Identification, +1

Transferring Visual Attributes from Natural Language to Verified Image Generation

no code implementations • 24 May 2023 • Rodrigo Valerio, Joao Bordalo, Michal Yarom, Yonatan Bitton, Idan Szpektor, Joao Magalhaes

In this paper, we propose to strengthen the consistency property of T2I methods in the presence of complex natural language, which often exceeds the limits of T2I methods by including non-visual information and textual elements that require knowledge for accurate generation.

Tasks: Text-to-Image Generation, Visual Question Answering (VQA)

What You See is What You Read? Improving Text-Image Alignment Evaluation

1 code implementation • NeurIPS 2023 • Michal Yarom, Yonatan Bitton, Soravit Changpinyo, Roee Aharoni, Jonathan Herzig, Oran Lang, Eran Ofek, Idan Szpektor

Automatically determining whether a text and a corresponding image are semantically aligned is a significant challenge for vision-language models, with applications in generative text-to-image and image-to-text tasks.

Tasks: Question Answering, Question Generation, +5

q2d: Turning Questions into Dialogs to Teach Models How to Search

no code implementations • 27 Apr 2023 • Yonatan Bitton, Shlomi Cohen-Ganor, Ido Hakimi, Yoad Lewenberg, Roee Aharoni, Enav Weinreb

One of the exciting capabilities of recent language models for dialog is their ability to independently search for relevant information to ground a given dialog response.

Tasks: Language Modelling, Large Language Model, +1

IRFL: Image Recognition of Figurative Language

1 code implementation • 27 Mar 2023 • Ron Yosef, Yonatan Bitton, Dafna Shahaf

We release our dataset, benchmark, and code, in hopes of driving the development of models that can better understand figurative language.

Tasks: Classification, Visual Reasoning

VASR: Visual Analogies of Situation Recognition

1 code implementation • 8 Dec 2022 • Yonatan Bitton, Ron Yosef, Eli Strugo, Dafna Shahaf, Roy Schwartz, Gabriel Stanovsky

We leverage situation recognition annotations and the CLIP model to generate a large set of 500k candidate analogies.

Tasks: Common Sense Reasoning, Visual Analogies, +1

WinoGAViL: Gamified Association Benchmark to Challenge Vision-and-Language Models

1 code implementation • 25 Jul 2022 • Yonatan Bitton, Nitzan Bitton Guetta, Ron Yosef, Yuval Elovici, Mohit Bansal, Gabriel Stanovsky, Roy Schwartz

While vision-and-language models perform well on tasks such as visual question answering, they struggle when it comes to basic human commonsense reasoning skills.

Tasks: Common Sense Reasoning, General Knowledge, +4

Data Efficient Masked Language Modeling for Vision and Language

1 code implementation • Findings (EMNLP) 2021 • Yonatan Bitton, Gabriel Stanovsky, Michael Elhadad, Roy Schwartz

We investigate a range of alternative masking strategies specific to the cross-modal setting that address these shortcomings, aiming for better fusion of text and image in the learned representation.

Tasks: Language Modelling, Masked Language Modeling, +1
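The entry above investigates masking strategies specific to the cross-modal setting. A purely illustrative sketch of one such idea — preferentially masking tokens grounded in the image so the model must consult the visual modality — where the function name, probabilities, and toy "object vocabulary" are all assumptions, not the paper's method:

```python
import random

random.seed(0)  # deterministic for the example

def mask_grounded_tokens(tokens, object_vocab,
                         p_grounded=0.8, p_other=0.1):
    """Mask image-grounded tokens far more often than other tokens."""
    out = []
    for tok in tokens:
        p = p_grounded if tok in object_vocab else p_other
        out.append("[MASK]" if random.random() < p else tok)
    return out

masked = mask_grounded_tokens(
    ["a", "dog", "chases", "a", "ball"],
    object_vocab={"dog", "ball"},
)
```

Compared with uniform 15% masking, this skews the training signal toward tokens the image can actually help predict.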

Automatic Generation of Contrast Sets from Scene Graphs: Probing the Compositional Consistency of GQA

2 code implementations • NAACL 2021 • Yonatan Bitton, Gabriel Stanovsky, Roy Schwartz, Michael Elhadad

Recent works have shown that supervised models often exploit data artifacts to achieve good test scores while their performance severely degrades on samples outside their training distribution.

Tasks: Question Answering, Relational Reasoning, +1
