Search Results for author: Ori Yoran

Found 12 papers, 8 papers with code

Preventing Rogue Agents Improves Multi-Agent Collaboration

no code implementations · 9 Feb 2025 · Ohav Barbi, Ori Yoran, Mor Geva

Multi-agent systems, where specialized agents collaborate to solve a shared task, hold great potential, from increased modularity to simulating complex environments.

The BrowserGym Ecosystem for Web Agent Research

2 code implementations · 6 Dec 2024 · Thibault Le Sellier De Chezelles, Maxime Gasse, Alexandre Drouin, Massimo Caccia, Léo Boisvert, Megh Thakkar, Tom Marty, Rim Assouel, Sahar Omidi Shayegan, Lawrence Keunho Jang, Xing Han Lù, Ori Yoran, Dehan Kong, Frank F. Xu, Siva Reddy, Quentin Cappart, Graham Neubig, Ruslan Salakhutdinov, Nicolas Chapados, Alexandre Lacoste

The BrowserGym ecosystem addresses the growing need for efficient evaluation and benchmarking of web agents, particularly those leveraging automation and Large Language Models (LLMs) for web interaction tasks.

Benchmarking

AssistantBench: Can Web Agents Solve Realistic and Time-Consuming Tasks?

no code implementations · 22 Jul 2024 · Ori Yoran, Samuel Joseph Amouyal, Chaitanya Malaviya, Ben Bogin, Ofir Press, Jonathan Berant

Language agents, built on top of language models (LMs), are systems that can interact with complex environments, such as the open web.

From Loops to Oops: Fallback Behaviors of Language Models Under Uncertainty

1 code implementation · 8 Jul 2024 · Maor Ivgi, Ori Yoran, Jonathan Berant, Mor Geva

Our experiments reveal a clear and consistent ordering of fallback behaviors across all these axes: the more advanced an LLM is (i.e., trained on more tokens, with more parameters, or instruction-tuned), the more its fallback behavior shifts from sequence repetitions to degenerate text, and then to hallucinations.

Instruction Following

Making Retrieval-Augmented Language Models Robust to Irrelevant Context

2 code implementations · 2 Oct 2023 · Ori Yoran, Tomer Wolfson, Ori Ram, Jonathan Berant

An important desideratum of retrieval-augmented language models (RALMs) is that retrieved information helps model performance when it is relevant, and does not harm performance when it is not.

Language Modelling · Natural Language Inference · +2

Evaluating the Ripple Effects of Knowledge Editing in Language Models

1 code implementation · 24 Jul 2023 · Roi Cohen, Eden Biran, Ori Yoran, Amir Globerson, Mor Geva

This has led to the development of various editing methods that allow updating facts encoded by the model.

Knowledge Editing

Answering Questions by Meta-Reasoning over Multiple Chains of Thought

1 code implementation · 25 Apr 2023 · Ori Yoran, Tomer Wolfson, Ben Bogin, Uri Katz, Daniel Deutch, Jonathan Berant

Modern systems for multi-hop question answering (QA) typically break questions into a sequence of reasoning steps, termed chain-of-thought (CoT), before arriving at a final answer.

Multi-hop Question Answering · Question Answering

QAMPARI: An Open-domain Question Answering Benchmark for Questions with Many Answers from Multiple Paragraphs

2 code implementations · 25 May 2022 · Samuel Joseph Amouyal, Tomer Wolfson, Ohad Rubin, Ori Yoran, Jonathan Herzig, Jonathan Berant

Our results highlight the need for developing ODQA models that handle a broad range of question types, including single and multi-answer questions.

Answer Generation · Natural Questions · +3

CommonsenseQA 2.0: Exposing the Limits of AI through Gamification

no code implementations · 14 Jan 2022 · Alon Talmor, Ori Yoran, Ronan Le Bras, Chandra Bhagavatula, Yoav Goldberg, Yejin Choi, Jonathan Berant

Constructing benchmarks that test the abilities of modern natural language understanding models is difficult: pre-trained language models exploit artifacts in benchmarks to achieve human parity, but still fail on adversarial examples and make errors that demonstrate a lack of common sense.

Common Sense Reasoning · Natural Language Understanding

SCROLLS: Standardized CompaRison Over Long Language Sequences

2 code implementations · 10 Jan 2022 · Uri Shaham, Elad Segal, Maor Ivgi, Avia Efrat, Ori Yoran, Adi Haviv, Ankit Gupta, Wenhan Xiong, Mor Geva, Jonathan Berant, Omer Levy

NLP benchmarks have largely focused on short texts, such as sentences and paragraphs, even though long texts comprise a considerable amount of natural language in the wild.

Decoder · Long-range modeling · +2

Turning Tables: Generating Examples from Semi-structured Tables for Endowing Language Models with Reasoning Skills

1 code implementation · ACL 2022 · Ori Yoran, Alon Talmor, Jonathan Berant

Models pre-trained with a language modeling objective possess ample world knowledge and language skills, but are known to struggle in tasks that require reasoning.

Decoder · Language Modeling · +3
