Search Results for author: Orion Weller

Found 24 papers, 13 papers with code

CLERC: A Dataset for Legal Case Retrieval and Retrieval-Augmented Analysis Generation

1 code implementation • 24 Jun 2024 • Abe Bohan Hou, Orion Weller, Guanghui Qin, Eugene Yang, Dawn Lawrie, Nils Holzenberger, Andrew Blair-Stanek, Benjamin Van Durme

This dataset, CLERC (Case Law Evaluation Retrieval Corpus), is constructed for training and evaluating models on their ability to (1) find the corresponding citations for a given piece of legal analysis and (2) compile the text of these citations (as well as previous context) into a cogent analysis that supports a reasoning goal.

Information Retrieval, RAG, +1 more

Learning to Reason via Program Generation, Emulation, and Search

1 code implementation • 25 May 2024 • Nathaniel Weir, Muhammad Khalifa, Linlu Qiu, Orion Weller, Peter Clark

CoGEX works by (1) training LMs to generate their own pseudo-programs, (2) teaching them to emulate their generated programs' execution, including any undefined leaf functions, so that the LM's knowledge fills in the execution gaps, and (3) using them to search over many programs to find an optimal one.

Code Generation, In-Context Learning, +1 more

FollowIR: Evaluating and Teaching Information Retrieval Models to Follow Instructions

1 code implementation • 22 Mar 2024 • Orion Weller, Benjamin Chang, Sean MacAvaney, Kyle Lo, Arman Cohan, Benjamin Van Durme, Dawn Lawrie, Luca Soldaini

We introduce FollowIR, a dataset that contains a rigorous instruction-evaluation benchmark as well as a training set to help IR models better follow real-world instructions.

Information Retrieval, Text Retrieval

Dated Data: Tracing Knowledge Cutoffs in Large Language Models

no code implementations • 19 Mar 2024 • Jeffrey Cheng, Marc Marone, Orion Weller, Dawn Lawrie, Daniel Khashabi, Benjamin Van Durme

The analysis finds that the effective knowledge cutoffs of LLMs often differ from their reported cutoffs.

Enhancing Systematic Decompositional Natural Language Inference Using Informal Logic

no code implementations • 22 Feb 2024 • Nathaniel Weir, Kate Sanders, Orion Weller, Shreya Sharma, Dongwei Jiang, Zhengping Jiang, Bhavana Dalvi Mishra, Oyvind Tafjord, Peter Jansen, Peter Clark, Benjamin Van Durme

Contemporary language models enable new opportunities for structured reasoning with text, such as the construction and evaluation of intuitive, proof-like textual entailment trees without relying on brittle formal logic.

Formal Logic, Knowledge Distillation, +2 more

MegaWika: Millions of reports and their sources across 50 diverse languages

no code implementations • 13 Jul 2023 • Samuel Barham, Orion Weller, Michelle Yuan, Kenton Murray, Mahsa Yarmohammadi, Zhengping Jiang, Siddharth Vashishtha, Alexander Martin, Anqi Liu, Aaron Steven White, Jordan Boyd-Graber, Benjamin Van Durme

To foster the development of new models for collaborative AI-assisted report generation, we introduce MegaWika, consisting of 13 million Wikipedia articles in 50 diverse languages, along with their 71 million referenced source materials.

Cross-Lingual Question Answering, Retrieval, +1 more

"According to ...": Prompting Language Models Improves Quoting from Pre-Training Data

1 code implementation • 22 May 2023 • Orion Weller, Marc Marone, Nathaniel Weir, Dawn Lawrie, Daniel Khashabi, Benjamin Van Durme

Large Language Models (LLMs) may hallucinate and generate fake information, despite pre-training on factual data.

NevIR: Negation in Neural Information Retrieval

1 code implementation • 12 May 2023 • Orion Weller, Dawn Lawrie, Benjamin Van Durme

Although the Information Retrieval (IR) community has adopted LMs as the backbone of modern IR architectures, there has been little to no research into how negation affects neural IR.

Information Retrieval, Negation, +1 more

Synthetic Cross-language Information Retrieval Training Data

no code implementations • 29 Apr 2023 • James Mayfield, Eugene Yang, Dawn Lawrie, Samuel Barham, Orion Weller, Marc Mason, Suraj Nair, Scott Miller

By repeating this process, collections of arbitrary size can be created in the style of MS MARCO but using naturally-occurring documents in any desired genre and domain of discourse.

Information Retrieval, Language Modelling, +4 more

Defending Against Disinformation Attacks in Open-Domain Question Answering

1 code implementation • 20 Dec 2022 • Orion Weller, Aleem Khan, Nathaniel Weir, Dawn Lawrie, Benjamin Van Durme

Recent work in open-domain question answering (ODQA) has shown that adversarial poisoning of the search collection can cause large drops in accuracy for production systems.

Data Poisoning, Misinformation, +1 more

When Do Decompositions Help for Machine Reading?

no code implementations • 20 Dec 2022 • Kangda Wei, Dawn Lawrie, Benjamin Van Durme, Yunmo Chen, Orion Weller

Answering complex questions often requires multi-step reasoning in order to obtain the final answer.

Reading Comprehension, Retrieval

When to Use Multi-Task Learning vs Intermediate Fine-Tuning for Pre-Trained Encoder Transfer Learning

1 code implementation • ACL 2022 • Orion Weller, Kevin Seppi, Matt Gardner

We find that there is a simple heuristic for when to use one of these techniques over the other: pairwise MTL is better than STILTs (intermediate fine-tuning) when the target task has fewer instances than the supporting task, and vice versa.
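The size heuristic above can be expressed as a one-line decision rule. This is a minimal sketch, assuming task size is measured as a plain count of training instances; the function name `choose_transfer_method` is hypothetical, and sending ties (equal sizes) to STILTs is an assumption, since the summary does not say how equal sizes are handled.

```python
def choose_transfer_method(target_size: int, supporting_size: int) -> str:
    """Hypothetical helper encoding the MTL-vs-STILTs size heuristic.

    target_size: number of training instances in the target task.
    supporting_size: number of training instances in the supporting task.
    Assumption: equal sizes fall through to STILTs.
    """
    if target_size < supporting_size:
        # Smaller target task: train jointly with the supporting task (pairwise MTL).
        return "pairwise MTL"
    # Otherwise ("vice versa"): fine-tune on the supporting task first, then the target (STILTs).
    return "STILTs"


print(choose_transfer_method(1_000, 50_000))
print(choose_transfer_method(50_000, 1_000))
```

For example, a 1,000-instance target task paired with a 50,000-instance supporting task maps to pairwise MTL, while swapping the two sizes maps to STILTs.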

Multi-Task Learning

Exploring the Relationship Between Algorithm Performance, Vocabulary, and Run-Time in Text Classification

1 code implementation • NAACL 2021 • Wilson Fearn, Orion Weller, Kevin Seppi

Text classification is a significant branch of natural language processing and has many applications, including document classification and sentiment analysis.

Document Classification, General Classification, +2 more

Streaming Models for Joint Speech Recognition and Translation

no code implementations • EACL 2021 • Orion Weller, Matthias Sperber, Christian Gollan, Joris Kluivers

However, all previous work has looked at this problem only from the consecutive perspective, leaving it uncertain whether these approaches are effective in the more challenging streaming setting.

Automatic Speech Recognition, Automatic Speech Recognition (ASR), +2 more

You Don't Have Time to Read This: An Exploration of Document Reading Time Prediction

no code implementations • ACL 2020 • Orion Weller, Jordan Hildebrandt, Ilya Reznik, Christopher Challis, E. Shannon Tass, Quinn Snell, Kevin Seppi

Reading time prediction has been the subject of much previous work, which has focused on how different words affect human processing as measured by reading time.

The rJokes Dataset: a Large Scale Humor Collection

no code implementations • LREC 2020 • Orion Weller, Kevin Seppi

We also introduce this dataset as a task for future work, where models learn to predict the level of humor in a joke.

Cultural Vocal Bursts Intensity Prediction

Humor Detection: A Transformer Gets the Last Laugh

2 code implementations • IJCNLP 2019 • Orion Weller, Kevin Seppi

These experiments show that this method outperforms all previous work on these tasks, with an F-measure of 93.1% on the Puns dataset and 98.6% on the Short Jokes dataset.

Humor Detection, Sentence
