no code implementations • 15 Jul 2025 • Orion Weller, Kathryn Ricci, Marc Marone, Antoine Chaffin, Dawn Lawrie, Benjamin Van Durme
However, a large subset of the community still uses encoder-only models for tasks such as classification or retrieval.
1 code implementation • 20 May 2025 • Eugene Yang, Andrew Yates, Kathryn Ricci, Orion Weller, Vivek Chari, Benjamin Van Durme, Dawn Lawrie
In this work, we introduce Rank-K, a listwise passage reranking model that leverages the reasoning capability of a reasoning language model at query time, providing test-time scalability for serving hard queries.
no code implementations • 14 Apr 2025 • Eugene Yang, Nicola Tonellotto, Dawn Lawrie, Sean MacAvaney, James Mayfield, Douglas W. Oard, Scott Miller
The Internet produces a continuous stream of new documents and user-generated queries.
1 code implementation • 25 Feb 2025 • Orion Weller, Kathryn Ricci, Eugene Yang, Andrew Yates, Dawn Lawrie, Benjamin Van Durme
We introduce Rank1, the first reranking model trained to take advantage of test-time compute.
1 code implementation • 31 Jan 2025 • Orion Weller, Benjamin Chang, Eugene Yang, Mahsa Yarmohammadi, Sam Barham, Sean MacAvaney, Arman Cohan, Luca Soldaini, Benjamin Van Durme, Dawn Lawrie
We see strong cross-lingual performance with English-based retrievers that trained using instructions, but find a notable drop in performance in the multilingual setting, indicating that more work is needed in developing data for instruction-based multilingual retrievers.
1 code implementation • 17 Sep 2024 • Orion Weller, Benjamin Van Durme, Dawn Lawrie, Ashwin Paranjape, Yuhao Zhang, Jack Hessel
Instruction-tuned language models (LMs) are able to respond to imperative commands, providing a more natural user interface compared to their base counterparts.
1 code implementation • 24 Jun 2024 • Abe Bohan Hou, Orion Weller, Guanghui Qin, Eugene Yang, Dawn Lawrie, Nils Holzenberger, Andrew Blair-Stanek, Benjamin Van Durme
This dataset, CLERC (Case Law Evaluation Retrieval Corpus), is constructed for training and evaluating models on their ability to (1) find corresponding citations for a given piece of legal analysis and (2) compile the text of these citations (together with prior context) into a cogent analysis that supports a reasoning goal.
1 code implementation • 2 May 2024 • Eugene Yang, Dawn Lawrie, James Mayfield
Recent work in cross-language information retrieval (CLIR), where queries and documents are in different languages, has shown the benefit of the Translate-Distill framework that trains a cross-language neural dual-encoder model using translation and distillation.
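The distillation step at the heart of such frameworks can be sketched as follows. This is a minimal illustration assuming a KL-divergence objective between teacher (cross-encoder) and student (dual-encoder) scores over the same candidate passages; that is one common formulation of score distillation, not necessarily the exact Translate-Distill recipe.

```python
import numpy as np

def distill_loss(teacher_scores, student_scores):
    """KL divergence between teacher and student score distributions over
    the same candidate passages for one query. The student (an efficient
    dual-encoder) is trained to mimic the teacher's (a cross-encoder's)
    ranking preferences; MSE on raw scores is another common choice.
    """
    def softmax(x):
        z = np.exp(x - x.max())  # subtract max for numerical stability
        return z / z.sum()

    p = softmax(np.asarray(teacher_scores, dtype=float))
    q = softmax(np.asarray(student_scores, dtype=float))
    return float(np.sum(p * np.log(p / q)))

# The loss is zero when the student matches the teacher exactly,
# and positive when their preference orderings diverge.
print(distill_loss([3.0, 1.0, 0.5], [3.0, 1.0, 0.5]))
print(distill_loss([3.0, 1.0, 0.5], [0.5, 1.0, 3.0]))
```

In the cross-language setting, the teacher scores translated query-passage pairs while the student sees the query and passage in their original languages, so no large CLIR training collection is required.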
no code implementations • 2 May 2024 • James Mayfield, Eugene Yang, Dawn Lawrie, Sean MacAvaney, Paul McNamee, Douglas W. Oard, Luca Soldaini, Ian Soboroff, Orion Weller, Efsun Kayi, Kate Sanders, Marc Mason, Noah Hibbler
Reports with these qualities are necessary to satisfy the complex, nuanced, or multi-faceted information needs of users.
1 code implementation • 2 May 2024 • Eugene Yang, Thomas Jänich, James Mayfield, Dawn Lawrie
We also evaluate real MLIR systems on two publicly available benchmarks and show that the PEER scores align with prior analytical findings on MLIR fairness.
1 code implementation • 2 May 2024 • Dawn Lawrie, Efsun Kayi, Eugene Yang, James Mayfield, Douglas W. Oard
PLAID, an efficient implementation of the ColBERT late interaction bi-encoder using pretrained language models for ranking, consistently achieves state-of-the-art performance in monolingual, cross-language, and multilingual retrieval.
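The late-interaction scoring that PLAID implements efficiently can be sketched in plain NumPy. This is an illustrative MaxSim computation over token embeddings, not PLAID's optimized indexing and retrieval engine; the toy vectors are made up.

```python
import numpy as np

def maxsim_score(query_vecs: np.ndarray, doc_vecs: np.ndarray) -> float:
    """Late-interaction (MaxSim) relevance score used by ColBERT-style models.

    query_vecs: (num_query_tokens, dim) L2-normalized token embeddings
    doc_vecs:   (num_doc_tokens, dim)   L2-normalized token embeddings
    """
    # Cosine similarity of every query token against every document token.
    sim = query_vecs @ doc_vecs.T  # (num_query_tokens, num_doc_tokens)
    # Each query token contributes its best-matching document token.
    return float(sim.max(axis=1).sum())

# Toy example: 2 query tokens and 3 document tokens in 2-D.
q = np.array([[1.0, 0.0], [0.0, 1.0]])
d = np.array([[1.0, 0.0], [0.6, 0.8], [0.0, 1.0]])
print(maxsim_score(q, d))  # each query token finds an exact match -> 2.0
```

Because documents are encoded independently of queries, their token embeddings can be precomputed and indexed, which is what makes the bi-encoder design efficient at scale.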
1 code implementation • 29 Apr 2024 • Eugene Yang, Suraj Nair, Dawn Lawrie, James Mayfield, Douglas W. Oard, Kevin Duh
Probabilistic Structured Queries (PSQ) is a cross-language information retrieval (CLIR) method that uses translation probabilities statistically derived from aligned corpora.
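A minimal sketch of PSQ-style scoring follows. The translation table, probability values, and pruning threshold are hypothetical illustrations, not values from the paper; real systems estimate P(f|e) from aligned corpora and combine the expected term frequencies with a full retrieval model rather than raw counts.

```python
from collections import Counter

def psq_score(query_terms, doc_terms, translation_probs, min_prob=0.01):
    """Score a document in language F against a query in language E by
    mapping each English query term onto its translations, weighted by
    translation probability P(f|e). Translations below min_prob are
    pruned, a common efficiency measure.
    """
    tf = Counter(doc_terms)
    score = 0.0
    for e in query_terms:
        # Expected frequency of the query term under the translation model.
        score += sum(p * tf[f]
                     for f, p in translation_probs.get(e, {}).items()
                     if p >= min_prob)
    return score

# Hypothetical English->Spanish translation table.
probs = {"house": {"casa": 0.9, "hogar": 0.1}}
doc = ["la", "casa", "es", "grande"]
print(psq_score(["house"], doc, probs))  # 0.9 * tf("casa") = 0.9
```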
no code implementations • 11 Apr 2024 • Dawn Lawrie, Sean MacAvaney, James Mayfield, Paul McNamee, Douglas W. Oard, Luca Soldaini, Eugene Yang
The principal tasks are ranked retrieval of news in one of the three languages, using English topics.
no code implementations • 11 Apr 2024 • Eugene Yang, Dawn Lawrie, James Mayfield
TT trains a ColBERT model with English queries and passages automatically translated into the document language from the MS-MARCO v1 collection.
1 code implementation • 22 Mar 2024 • Orion Weller, Benjamin Chang, Sean MacAvaney, Kyle Lo, Arman Cohan, Benjamin Van Durme, Dawn Lawrie, Luca Soldaini
First, we introduce our dataset FollowIR, which contains a rigorous instruction evaluation benchmark as well as a training set for helping IR models learn to better follow real-world instructions.
1 code implementation • 19 Mar 2024 • Jeffrey Cheng, Marc Marone, Orion Weller, Dawn Lawrie, Daniel Khashabi, Benjamin Van Durme
Using this analysis, we find that effective cutoffs often differ from reported cutoffs.
1 code implementation • 9 Jan 2024 • Eugene Yang, Dawn Lawrie, James Mayfield, Douglas W. Oard, Scott Miller
Applying a similar knowledge distillation approach to training an efficient dual-encoder model for Cross-Language Information Retrieval (CLIR), where queries and documents are in different languages, is challenging due to the lack of a sufficiently large training collection when the query and document languages differ.
1 code implementation • 15 Sep 2023 • Orion Weller, Kyle Lo, David Wadden, Dawn Lawrie, Benjamin Van Durme, Arman Cohan, Luca Soldaini
Using large language models (LMs) for query or document expansion can improve generalization in information retrieval.
1 code implementation • 22 May 2023 • Orion Weller, Marc Marone, Nathaniel Weir, Dawn Lawrie, Daniel Khashabi, Benjamin Van Durme
Large Language Models (LLMs) may hallucinate and generate fake information, despite pre-training on factual data.
1 code implementation • 12 May 2023 • Orion Weller, Dawn Lawrie, Benjamin Van Durme
Although the Information Retrieval (IR) community has adopted LMs as the backbone of modern IR architectures, there has been little to no research in understanding how negation impacts neural IR.
no code implementations • 29 Apr 2023 • James Mayfield, Eugene Yang, Dawn Lawrie, Samuel Barham, Orion Weller, Marc Mason, Suraj Nair, Scott Miller
By repeating this process, collections of arbitrary size can be created in the style of MS MARCO but using naturally-occurring documents in any desired genre and domain of discourse.
no code implementations • 24 Apr 2023 • Dawn Lawrie, Sean MacAvaney, James Mayfield, Paul McNamee, Douglas W. Oard, Luca Soldaini, Eugene Yang
This is the first year of the TREC Neural CLIR (NeuCLIR) track, which aims to study the impact of neural approaches to cross-language information retrieval.
1 code implementation • 20 Dec 2022 • Orion Weller, Aleem Khan, Nathaniel Weir, Dawn Lawrie, Benjamin Van Durme
Recent work in open-domain question answering (ODQA) has shown that adversarial poisoning of the search collection can cause large drops in accuracy for production systems.
no code implementations • 20 Dec 2022 • Eugene Yang, Suraj Nair, Dawn Lawrie, James Mayfield, Douglas W. Oard
Prior work has shown that combining adapters pretrained on tasks in a specific language with task-specific adapters yields adapter-enhanced models that outperform fine-tuning the entire model when transferring across languages in various NLP tasks.
no code implementations • 20 Dec 2022 • Kangda Wei, Dawn Lawrie, Benjamin Van Durme, Yunmo Chen, Orion Weller
Answering complex questions often requires multi-step reasoning in order to obtain the final answer.
1 code implementation • 3 Sep 2022 • Dawn Lawrie, Eugene Yang, Douglas W. Oard, James Mayfield
Providing access to information across languages has been a goal of Information Retrieval (IR) for decades.
1 code implementation • NAACL 2022 • Orion Weller, Marc Marone, Vladimir Braverman, Dawn Lawrie, Benjamin Van Durme
Since the advent of Federated Learning (FL), research has applied these methods to natural language processing (NLP) tasks.
1 code implementation • 24 Jan 2022 • Dawn Lawrie, James Mayfield, Douglas Oard, Eugene Yang
HC4 is a new suite of test collections for ad hoc Cross-Language Information Retrieval (CLIR), with Common Crawl News documents in Chinese, Persian, and Russian, topics in English and in the document languages, and graded relevance judgments.
1 code implementation • 24 Jan 2022 • Cash Costello, Eugene Yang, Dawn Lawrie, James Mayfield
While there are high-quality software frameworks for information retrieval experimentation, they do not explicitly support cross-language information retrieval (CLIR).
1 code implementation • 20 Jan 2022 • Suraj Nair, Eugene Yang, Dawn Lawrie, Kevin Duh, Paul McNamee, Kenton Murray, James Mayfield, Douglas W. Oard
These models have improved the effectiveness of retrieval systems well beyond that of lexical term matching models such as BM25.
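For reference, the BM25 baseline mentioned here scores documents purely by lexical term matching. A minimal sketch of the standard Okapi BM25 formula, with illustrative parameter defaults and a toy three-document corpus:

```python
import math
from collections import Counter

def bm25_score(query, doc, doc_freqs, num_docs, avgdl, k1=0.9, b=0.4):
    """Okapi BM25 score of one document (a token list) for one query."""
    tf = Counter(doc)
    score = 0.0
    for term in query:
        df = doc_freqs.get(term, 0)
        if df == 0:
            continue  # term absent from the collection contributes nothing
        # Inverse document frequency: rarer terms weigh more.
        idf = math.log((num_docs - df + 0.5) / (df + 0.5) + 1)
        f = tf[term]
        # Saturating term frequency, normalized by document length.
        score += idf * f * (k1 + 1) / (f + k1 * (1 - b + b * len(doc) / avgdl))
    return score

corpus = [["breaking", "news", "today"], ["old", "news"], ["sports", "report"]]
doc_freqs = Counter(t for d in corpus for t in set(d))
avgdl = sum(len(d) for d in corpus) / len(corpus)
print(bm25_score(["news"], corpus[0], doc_freqs, len(corpus), avgdl))
```

Neural retrievers improve on this by matching meaning rather than surface forms, at the cost of the model training that papers like this one study.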
no code implementations • LREC 2020 • Dawn Lawrie, James Mayfield, David Etter
This means that named entities are annotated on the transcribed text.
1 code implementation • 6 Mar 2020 • Chan Hee Song, Dawn Lawrie, Tim Finin, James Mayfield
The goal of this work is to improve the performance of a neural named entity recognition system by adding input features that indicate a word is part of a name included in a gazetteer.
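The gazetteer signal described here can be sketched as a simple token-tagging feature extractor. This is an illustrative longest-match implementation producing BIO-style indicator features, not the paper's exact feature scheme; the gazetteer entries are made up.

```python
def gazetteer_features(tokens, gazetteer):
    """BIO-style indicator features marking tokens that begin ('B') or
    continue ('I') a name found in the gazetteer (longest match wins).
    These features would be fed to the NER model alongside embeddings.
    """
    feats = ["O"] * len(tokens)
    i = 0
    while i < len(tokens):
        # Try the longest span starting at i that appears in the gazetteer.
        for j in range(len(tokens), i, -1):
            if " ".join(tokens[i:j]).lower() in gazetteer:
                feats[i] = "B"
                for k in range(i + 1, j):
                    feats[k] = "I"
                i = j
                break
        else:
            i += 1  # no match starts here; advance one token
    return feats

gaz = {"new york", "johns hopkins university"}
print(gazetteer_features("She visited New York yesterday".split(), gaz))
# ['O', 'O', 'B', 'I', 'O']
```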
no code implementations • IJCNLP 2017 • Benjamin Van Durme, Tom Lippincott, Kevin Duh, Deana Burchfield, Adam Poliak, Cash Costello, Tim Finin, Scott Miller, James Mayfield, Philipp Koehn, Craig Harman, Dawn Lawrie, Chandler May, Max Thomas, Annabelle Carrell, Julianne Chaloux, Tongfei Chen, Alex Comerford, Mark Dredze, Benjamin Glass, Shudong Hao, Patrick Martin, Pushpendre Rastogi, Rashmi Sankepally, Travis Wolfe, Ying-Ying Tran, Ted Zhang
It combines a multitude of analytics together with a flexible environment for customizing the workflow for different users.
no code implementations • LREC 2012 • Dawn Lawrie, James Mayfield, Paul McNamee, Douglas Oard
To stimulate research in cross-language entity linking, we present a new test collection for evaluating the accuracy of cross-language entity linking in twenty-one languages.