no code implementations • 9 Feb 2025 • Ohav Barbi, Ori Yoran, Mor Geva
Multi-agent systems, where specialized agents collaborate to solve a shared task, hold great potential, from increased modularity to simulating complex environments.
2 code implementations • 6 Dec 2024 • Thibault Le Sellier De Chezelles, Maxime Gasse, Alexandre Drouin, Massimo Caccia, Léo Boisvert, Megh Thakkar, Tom Marty, Rim Assouel, Sahar Omidi Shayegan, Lawrence Keunho Jang, Xing Han Lù, Ori Yoran, Dehan Kong, Frank F. Xu, Siva Reddy, Quentin Cappart, Graham Neubig, Ruslan Salakhutdinov, Nicolas Chapados, Alexandre Lacoste
The BrowserGym ecosystem addresses the growing need for efficient evaluation and benchmarking of web agents, particularly those leveraging automation and Large Language Models (LLMs) for web interaction tasks.
no code implementations • 22 Jul 2024 • Ori Yoran, Samuel Joseph Amouyal, Chaitanya Malaviya, Ben Bogin, Ofir Press, Jonathan Berant
Language agents, built on top of language models (LMs), are systems that can interact with complex environments, such as the open web.
1 code implementation • 8 Jul 2024 • Maor Ivgi, Ori Yoran, Jonathan Berant, Mor Geva
Our experiments reveal a clear and consistent ordering of fallback behaviors across all these axes: the more advanced an LLM is (i.e., trained on more tokens, has more parameters, or instruction-tuned), the more its fallback behavior shifts from sequence repetitions, to degenerate text, and then to hallucinations.
2 code implementations • 2 Oct 2023 • Ori Yoran, Tomer Wolfson, Ori Ram, Jonathan Berant
An important desideratum of RALMs is that retrieved information helps model performance when it is relevant, and does not harm performance when it is not.
Ranked #3 on Question Answering on Bamboogle
1 code implementation • 24 Jul 2023 • Roi Cohen, Eden Biran, Ori Yoran, Amir Globerson, Mor Geva
This has led to the development of various editing methods that allow updating facts encoded by the model.
1 code implementation • 25 Apr 2023 • Ori Yoran, Tomer Wolfson, Ben Bogin, Uri Katz, Daniel Deutch, Jonathan Berant
Modern systems for multi-hop question answering (QA) typically break questions into a sequence of reasoning steps, termed chain-of-thought (CoT), before arriving at a final answer.
Ranked #2 on Question Answering on Bamboogle
2 code implementations • 25 May 2022 • Samuel Joseph Amouyal, Tomer Wolfson, Ohad Rubin, Ori Yoran, Jonathan Herzig, Jonathan Berant
Our results highlight the need for developing ODQA models that handle a broad range of question types, including single and multi-answer questions.
no code implementations • 14 Jan 2022 • Alon Talmor, Ori Yoran, Ronan Le Bras, Chandra Bhagavatula, Yoav Goldberg, Yejin Choi, Jonathan Berant
Constructing benchmarks that test the abilities of modern natural language understanding models is difficult: pre-trained language models exploit artifacts in benchmarks to achieve human parity, but still fail on adversarial examples and make errors that demonstrate a lack of common sense.
2 code implementations • 10 Jan 2022 • Uri Shaham, Elad Segal, Maor Ivgi, Avia Efrat, Ori Yoran, Adi Haviv, Ankit Gupta, Wenhan Xiong, Mor Geva, Jonathan Berant, Omer Levy
NLP benchmarks have largely focused on short texts, such as sentences and paragraphs, even though long texts comprise a considerable amount of natural language in the wild.
Ranked #8 on Long-range modeling on SCROLLS
1 code implementation • ACL 2022 • Ori Yoran, Alon Talmor, Jonathan Berant
Models pre-trained with a language modeling objective possess ample world knowledge and language skills, but are known to struggle in tasks that require reasoning.
no code implementations • ICLR 2021 • Alon Talmor, Ori Yoran, Amnon Catav, Dan Lahav, Yizhong Wang, Akari Asai, Gabriel Ilharco, Hannaneh Hajishirzi, Jonathan Berant
When answering complex questions, people can seamlessly combine information from visual, textual and tabular sources.