Search Results for author: Nick Craswell

Found 47 papers, 21 papers with code

The Great Nugget Recall: Automating Fact Extraction and RAG Evaluation with Large Language Models

no code implementations21 Apr 2025 Ronak Pradeep, Nandan Thakur, Shivani Upadhyay, Daniel Campos, Nick Craswell, Jimmy Lin

In the context of the TREC 2024 RAG Track, we calibrate a fully automatic approach against strategies where nuggets are created manually or semi-manually by human assessors and then assigned manually to system answers.

Question Answering RAG

Support Evaluation for the TREC 2024 RAG Track: Comparing Human versus LLM Judges

no code implementations21 Apr 2025 Nandan Thakur, Ronak Pradeep, Shivani Upadhyay, Daniel Campos, Nick Craswell, Jimmy Lin

Retrieval-augmented generation (RAG) enables large language models (LLMs) to generate answers with citations from source documents containing "ground truth", thereby reducing system hallucinations.

RAG

Judging the Judges: A Collection of LLM-Generated Relevance Judgements

1 code implementation19 Feb 2025 Hossein A. Rahmani, Clemencia Siro, Mohammad Aliannejadi, Nick Craswell, Charles L. A. Clarke, Guglielmo Faggioli, Bhaskar Mitra, Paul Thomas, Emine Yilmaz

Using Large Language Models (LLMs) for relevance assessments offers promising opportunities to improve Information Retrieval (IR), Natural Language Processing (NLP), and related fields.

Information Retrieval
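
As a rough illustration of the LLM-as-assessor setup studied in this collection, here is a minimal sketch that prompts a model for a graded relevance label; the prompt wording, the 0-3 scale, and the call_llm callable are assumptions for illustration, not the collection's actual protocol.

    PROMPT = ("You are a relevance assessor. Query: {query} Passage: {passage} "
              "On a scale from 0 (irrelevant) to 3 (perfectly relevant), how relevant "
              "is the passage to the query? Answer with a single digit.")

    def judge(call_llm, query, passage):
        """call_llm is a hypothetical callable: prompt string in, reply string out."""
        reply = call_llm(PROMPT.format(query=query, passage=passage))
        digits = [c for c in reply if c.isdigit()]
        return int(digits[0]) if digits else 0  # treat unparsable replies as grade 0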

JudgeBlender: Ensembling Judgments for Automatic Relevance Assessment

1 code implementation17 Dec 2024 Hossein A. Rahmani, Emine Yilmaz, Nick Craswell, Bhaskar Mitra

The effective training and evaluation of retrieval systems require a substantial amount of relevance judgments, which are traditionally collected from human assessors -- a process that is both costly and time-consuming.
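
A minimal sketch of the ensembling idea behind JudgeBlender, assuming each judge is a callable that maps a query-document pair to an integer relevance grade; the median aggregation shown here is an illustrative choice, not necessarily the aggregation the paper uses.

    from statistics import median

    def blended_judgment(query, document, judges):
        """Blend several automatic judges (e.g. different prompts or models)
        into a single relevance grade by taking the median of their grades."""
        grades = [judge(query, document) for judge in judges]
        return round(median(grades))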

Initial Nugget Evaluation Results for the TREC 2024 RAG Track with the AutoNuggetizer Framework

1 code implementation14 Nov 2024 Ronak Pradeep, Nandan Thakur, Shivani Upadhyay, Daniel Campos, Nick Craswell, Jimmy Lin

Within the TREC setup, we are able to calibrate our fully automatic process against a manual process whereby nuggets are created by human assessors semi-manually and then assigned manually to system answers.

Question Answering RAG

A Large-Scale Study of Relevance Assessments with Large Language Models: An Initial Look

no code implementations13 Nov 2024 Shivani Upadhyay, Ronak Pradeep, Nandan Thakur, Daniel Campos, Nick Craswell, Ian Soboroff, Hoa Trang Dang, Jimmy Lin

This paper reports on the results of a large-scale evaluation (the TREC 2024 RAG Track) where four different relevance assessment approaches were deployed in situ: the "standard" fully manual process that NIST has implemented for decades and three different alternatives that take advantage of LLMs to different extents using the open-source UMBRELA tool.

Information Retrieval RAG

Report on the 1st Workshop on Large Language Model for Evaluation in Information Retrieval (LLM4Eval 2024) at SIGIR 2024

no code implementations9 Aug 2024 Hossein A. Rahmani, Clemencia Siro, Mohammad Aliannejadi, Nick Craswell, Charles L. A. Clarke, Guglielmo Faggioli, Bhaskar Mitra, Paul Thomas, Emine Yilmaz

The first edition of the workshop on Large Language Model for Evaluation in Information Retrieval (LLM4Eval 2024) took place in July 2024, co-located with the ACM SIGIR Conference 2024 (SIGIR 2024) in the USA.

Information Retrieval Language Modeling +3

LLMJudge: LLMs for Relevance Judgments

1 code implementation9 Aug 2024 Hossein A. Rahmani, Emine Yilmaz, Nick Craswell, Bhaskar Mitra, Paul Thomas, Charles L. A. Clarke, Mohammad Aliannejadi, Clemencia Siro, Guglielmo Faggioli

The evaluation and tuning of a search system is largely based on relevance labels, which indicate whether a document is useful for a specific search and user.

Information Retrieval Retrieval
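
Relevance labels of this kind feed directly into standard effectiveness metrics. Below is a plain-Python nDCG@k with linear gains, included only to show how such labels are consumed during evaluation; it is not code from the LLMJudge benchmark.

    import math

    def ndcg_at_k(ranking, labels, k=10):
        """nDCG@k with linear gains; `labels` maps doc_id -> graded relevance,
        and unjudged documents are treated as non-relevant."""
        gains = [labels.get(doc_id, 0) for doc_id in ranking[:k]]
        dcg = sum(g / math.log2(rank + 2) for rank, g in enumerate(gains))
        ideal = sorted(labels.values(), reverse=True)[:k]
        idcg = sum(g / math.log2(rank + 2) for rank, g in enumerate(ideal))
        return dcg / idcg if idcg > 0 else 0.0

    print(ndcg_at_k(["d2", "d1", "d9"], {"d1": 3, "d2": 1, "d3": 2}, k=3))  # ~0.61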

Ragnarök: A Reusable RAG Framework and Baselines for TREC 2024 Retrieval-Augmented Generation Track

1 code implementation24 Jun 2024 Ronak Pradeep, Nandan Thakur, Sahel Sharifymoghaddam, Eric Zhang, Ryan Nguyen, Daniel Campos, Nick Craswell, Jimmy Lin

In our work, we lay out the steps we have taken towards making this track a reality -- we describe the details of our reusable framework, Ragnarök, explain the curation of the newly chosen MS MARCO V2.1 collection, release the development topics for the track, and standardize the I/O definitions that assist the end user.

Benchmarking RAG +1
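
Purely as an illustration of what standardized I/O for a RAG track might look like, the JSON-lines record below uses hypothetical field names; the authoritative schema is the one defined by the Ragnarök framework and the track guidelines.

    import json

    # One topic's generated answer as a JSON-lines record. All field names here
    # are assumptions for illustration, not the official track schema.
    record = {
        "topic_id": "2024-0001",
        "references": ["segment-17", "segment-42"],
        "answer": [
            {"text": "First answer sentence.", "citations": [0]},
            {"text": "Second answer sentence.", "citations": [1]},
        ],
    }
    print(json.dumps(record))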

UMBRELA: UMbrela is the (Open-Source Reproduction of the) Bing RELevance Assessor

1 code implementation10 Jun 2024 Shivani Upadhyay, Ronak Pradeep, Nandan Thakur, Nick Craswell, Jimmy Lin

Copious amounts of relevance judgments are necessary for the effective training and accurate evaluation of retrieval systems.

RAG Retrieval

Synthetic Test Collections for Retrieval Evaluation

1 code implementation13 May 2024 Hossein A. Rahmani, Nick Craswell, Emine Yilmaz, Bhaskar Mitra, Daniel Campos

Previous studies demonstrate that LLMs have the potential to generate synthetic relevance judgments for use in the evaluation of IR systems.

Information Retrieval Retrieval

Towards Group-aware Search Success

no code implementations26 Apr 2024 Haolun Wu, Bhaskar Mitra, Nick Craswell

Traditional measures of search success often overlook the varying information needs of different demographic groups.

Large language models can accurately predict searcher preferences

1 code implementation19 Sep 2023 Paul Thomas, Seth Spielman, Nick Craswell, Bhaskar Mitra

It takes careful feedback from real users, which by definition is the highest-quality first-party gold data that can be derived, and develops a large language model prompt that agrees with that data.

Language Modelling Large Language Model
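
One way to quantify how well an LLM labeller "agrees with that data" is chance-corrected agreement. The sketch below computes Cohen's kappa between gold preferences and predicted labels; it is a generic illustration, not the paper's own analysis.

    from collections import Counter

    def cohens_kappa(gold, predicted):
        """Chance-corrected agreement between two equal-length label sequences."""
        n = len(gold)
        observed = sum(g == p for g, p in zip(gold, predicted)) / n
        gold_counts, pred_counts = Counter(gold), Counter(predicted)
        expected = sum(gold_counts[c] * pred_counts.get(c, 0) for c in gold_counts) / (n * n)
        return (observed - expected) / (1 - expected) if expected < 1 else 1.0

    print(cohens_kappa(["rel", "rel", "non", "rel"], ["rel", "non", "non", "rel"]))  # 0.5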

Patterns of gender-specializing query reformulation

no code implementations25 Apr 2023 Amifa Raj, Bhaskar Mitra, Nick Craswell, Michael D. Ekstrand

There are many ways a query, the search results, and a demographic attribute such as gender may relate, leading us to hypothesize different causes for these reformulation patterns, such as under-representation on the original result page or effects described by the linguistic theory of markedness.

Attribute

Zero-shot Clarifying Question Generation for Conversational Search

no code implementations30 Jan 2023 Zhenduo Wang, Yuancheng Tu, Corby Rosset, Nick Craswell, Ming Wu, Qingyao Ai

In this work, we explore generating clarifying questions in a zero-shot setting to overcome the cold-start problem, and we propose a constrained clarifying question generation system that uses both question templates and query facets to guide effective and precise question generation.

Conversational Search Natural Questions +3
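
A minimal sketch of the template-plus-facet idea, using made-up templates and facets (neither is taken from the paper):

    # Illustrative templates; the paper's actual templates and its facet
    # extraction method are not reproduced here.
    TEMPLATES = [
        "Are you looking for information about {facet}?",
        "Would you like results about {facet}, or something else?",
    ]

    def clarifying_questions(facets):
        return [t.format(facet=f) for t in TEMPLATES for f in facets]

    print(clarifying_questions(["the animal", "the car brand"]))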

Are We There Yet? A Decision Framework for Replacing Term Based Retrieval with Dense Retrieval Systems

no code implementations26 Jun 2022 Sebastian Hofstätter, Nick Craswell, Bhaskar Mitra, Hamed Zamani, Allan Hanbury

Recently, several dense retrieval (DR) models have demonstrated performance competitive with the term-based retrieval methods that are ubiquitous in search systems.

Retrieval
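
To make the term-based versus dense contrast concrete, the toy sketch below scores a document both lexically and by an inner product over made-up embeddings; real DR systems use trained encoders and approximate nearest-neighbour search.

    def term_score(query_terms, doc_terms):
        # Crude lexical overlap, standing in for a term-based ranker such as BM25.
        return len(set(query_terms) & set(doc_terms))

    def dense_score(query_vec, doc_vec):
        # Inner product of (toy) query and document embeddings, as in DR models.
        return sum(q * d for q, d in zip(query_vec, doc_vec))

    print(term_score(["cheap", "flights"], ["budget", "airfare"]))  # 0: no lexical match
    print(dense_score([0.9, 0.1], [0.8, 0.2]))                      # 0.74: semantic match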

Less is Less: When Are Snippets Insufficient for Human vs Machine Relevance Estimation?

no code implementations21 Jan 2022 Gabriella Kazai, Bhaskar Mitra, Anlei Dong, Nick Craswell, Linjun Yang

This raises questions about when such summaries are sufficient for relevance estimation by the ranking model or the human assessor, and whether humans and machines benefit from the document's full text in similar ways.

Information Retrieval Retrieval

Neural Approaches to Conversational Information Retrieval

no code implementations13 Jan 2022 Jianfeng Gao, Chenyan Xiong, Paul Bennett, Nick Craswell

A conversational information retrieval (CIR) system is an information retrieval (IR) system with a conversational interface which allows users to interact with the system to seek information via multi-turn conversations of natural language, in spoken or written form.

Information Retrieval Retrieval

Intra-Document Cascading: Learning to Select Passages for Neural Document Ranking

1 code implementation20 May 2021 Sebastian Hofstätter, Bhaskar Mitra, Hamed Zamani, Nick Craswell, Allan Hanbury

An emerging recipe for achieving state-of-the-art effectiveness in neural document re-ranking involves utilizing large pre-trained language models -- e.g., BERT -- to evaluate all individual passages in the document and then aggregating the outputs by pooling or additional Transformer layers.

Document Ranking Knowledge Distillation +1
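
A rough sketch of the intra-document cascading idea: a cheap stage selects a few passages, an expensive scorer judges only those, and the passage scores are pooled into a document score. Both scorers below are placeholders, not the IDCM model itself.

    def cheap_score(query, passage):
        return len(set(query.split()) & set(passage.split()))  # fast lexical stage

    def expensive_score(query, passage):
        # Placeholder for a BERT-style passage scorer that is too costly to run
        # on every passage of a long document.
        return cheap_score(query, passage) + 0.01 * len(passage.split())

    def document_score(query, passages, k=3):
        selected = sorted(passages, key=lambda p: cheap_score(query, p), reverse=True)[:k]
        return max(expensive_score(query, p) for p in selected)  # max-pool the survivors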

MS MARCO: Benchmarking Ranking Models in the Large-Data Regime

no code implementations9 May 2021 Nick Craswell, Bhaskar Mitra, Emine Yilmaz, Daniel Campos, Jimmy Lin

Evaluation efforts such as TREC, CLEF, NTCIR and FIRE, alongside public leaderboards such as MS MARCO, are intended to encourage research and track our progress, addressing big questions in our field.

Benchmarking

TREC Deep Learning Track: Reusable Test Collections in the Large Data Regime

no code implementations19 Apr 2021 Nick Craswell, Bhaskar Mitra, Emine Yilmaz, Daniel Campos, Ellen M. Voorhees, Ian Soboroff

The TREC Deep Learning (DL) Track studies ad hoc search in the large data regime, meaning that a large set of human-labeled training data is available.

Selection bias

Improving Transformer-Kernel Ranking Model Using Conformer and Query Term Independence

no code implementations19 Apr 2021 Bhaskar Mitra, Sebastian Hofstatter, Hamed Zamani, Nick Craswell

The Transformer-Kernel (TK) model has demonstrated strong reranking performance on the TREC Deep Learning benchmark -- and can be considered an efficient (but slightly less effective) alternative to other Transformer-based architectures that employ (i) large-scale pretraining (high training cost), (ii) joint encoding of query and document (high inference cost), and (iii) a larger number of Transformer layers (both high training and high inference costs).

Document Ranking Retrieval

Overview of the TREC 2020 deep learning track

1 code implementation15 Feb 2021 Nick Craswell, Bhaskar Mitra, Emine Yilmaz, Daniel Campos

This is the second year of the TREC Deep Learning Track, with the goal of studying ad hoc ranking in the large training data regime.

Deep Learning Passage Retrieval +1

Conformer-Kernel with Query Term Independence at TREC 2020 Deep Learning Track

no code implementations14 Nov 2020 Bhaskar Mitra, Sebastian Hofstatter, Hamed Zamani, Nick Craswell

We benchmark Conformer-Kernel models under the strict blind evaluation setting of the TREC 2020 Deep Learning track.

Retrieval

Conformer-Kernel with Query Term Independence for Document Retrieval

1 code implementation20 Jul 2020 Bhaskar Mitra, Sebastian Hofstatter, Hamed Zamani, Nick Craswell

In this work, we extend the TK architecture to the full retrieval setting by incorporating the query term independence assumption.

Retrieval
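
The query term independence (QTI) assumption can be stated in a few lines: the document score decomposes into a sum of independent per-term scores, which is what allows those scores to be precomputed and served from an inverted index. A toy sketch with a made-up per-term scorer:

    def term_doc_score(term, doc_terms):
        # Stand-in for a learned per-term score that could be precomputed offline.
        return doc_terms.count(term) / len(doc_terms)

    def qti_score(query_terms, doc_terms):
        # Under QTI the document score is a sum of independent per-term scores.
        return sum(term_doc_score(t, doc_terms) for t in query_terms)

    print(qti_score(["deep", "learning"], ["deep", "learning", "for", "ranking"]))  # 0.5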

Scalable Methods for Calculating Term Co-Occurrence Frequencies

no code implementations17 Jul 2020 Bodo Billerbeck, Justin Zobel, Nicholas Lester, Nick Craswell

Search techniques make use of elementary information such as term frequencies and document lengths in the computation of similarity weighting.
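
A naive single-machine version of the quantity in question, counting how many documents contain each unordered term pair, is shown below; the paper is concerned with making this computation scale, which the sketch deliberately ignores.

    from collections import Counter
    from itertools import combinations

    def cooccurrence_counts(documents):
        """Count, for each unordered term pair, the number of documents containing both."""
        counts = Counter()
        for doc in documents:
            for pair in combinations(sorted(set(doc.split())), 2):
                counts[pair] += 1
        return counts

    print(cooccurrence_counts(["red fox", "red fox jumps", "blue fox"]))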

MIMICS: A Large-Scale Data Collection for Search Clarification

1 code implementation17 Jun 2020 Hamed Zamani, Gord Lueck, Everest Chen, Rodolfo Quispe, Flint Luu, Nick Craswell

In this paper, we introduce MIMICS, a collection of search clarification datasets for real web search queries sampled from the Bing query logs.

ORCAS: 18 Million Clicked Query-Document Pairs for Analyzing Search

no code implementations9 Jun 2020 Nick Craswell, Daniel Campos, Bhaskar Mitra, Emine Yilmaz, Bodo Billerbeck

Users of Web search engines reveal their information needs through queries and clicks, making click logs a useful asset for information retrieval.

Information Retrieval Retrieval

Analyzing and Learning from User Interactions for Search Clarification

no code implementations30 May 2020 Hamed Zamani, Bhaskar Mitra, Everest Chen, Gord Lueck, Fernando Diaz, Paul N. Bennett, Nick Craswell, Susan T. Dumais

We also propose a model for learning representations of clarifying questions from the user interaction data, treated as implicit feedback.

Re-Ranking Retrieval

Local Self-Attention over Long Text for Efficient Document Retrieval

1 code implementation11 May 2020 Sebastian Hofstätter, Hamed Zamani, Bhaskar Mitra, Nick Craswell, Allan Hanbury

In this work, we propose a local self-attention which considers a moving window over the document terms and for each term attends only to other terms in the same window.

Document Ranking Retrieval
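
The core of the proposal is an attention pattern restricted to a moving window. The sketch below only builds such a banded mask, as a schematic of the masking pattern rather than the model itself.

    def local_attention_mask(seq_len, window):
        """True where term i may attend to term j (|i - j| within the window)."""
        half = window // 2
        return [[abs(i - j) <= half for j in range(seq_len)] for i in range(seq_len)]

    for row in local_attention_mask(seq_len=6, window=3):
        print("".join("x" if allowed else "." for allowed in row))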

On the Reliability of Test Collections for Evaluating Systems of Different Types

no code implementations28 Apr 2020 Emine Yilmaz, Nick Craswell, Bhaskar Mitra, Daniel Campos

As deep learning based models are increasingly being used for information retrieval (IR), a major challenge is to ensure the availability of test collections for measuring their quality.

Deep Learning Fairness +3

Overview of the TREC 2019 deep learning track

2 code implementations17 Mar 2020 Nick Craswell, Bhaskar Mitra, Emine Yilmaz, Daniel Campos, Ellen M. Voorhees

The Deep Learning Track is a new track for TREC 2019, with the goal of studying ad hoc ranking in a large data regime.

Deep Learning Passage Retrieval +2

Macaw: An Extensible Conversational Information Seeking Platform

1 code implementation18 Dec 2019 Hamed Zamani, Nick Craswell

Such research will require data and tools to allow the implementation and study of conversational systems.

Information Retrieval Question Answering +1

Duet at TREC 2019 Deep Learning Track

1 code implementation10 Dec 2019 Bhaskar Mitra, Nick Craswell

This report discusses three submissions based on the Duet architecture to the Deep Learning track at TREC 2019.

Deep Learning Learning-To-Rank +2

Generic Intent Representation in Web Search

no code implementations24 Jul 2019 Hongfei Zhang, Xia Song, Chenyan Xiong, Corby Rosset, Paul N. Bennett, Nick Craswell, Saurabh Tiwary

This paper presents GEneric iNtent Encoder (GEN Encoder) which learns a distributed representation space for user intent in search.

Multi-Task Learning

An Axiomatic Approach to Regularizing Neural Ranking Models

no code implementations15 Apr 2019 Corby Rosset, Bhaskar Mitra, Chenyan Xiong, Nick Craswell, Xia Song, Saurabh Tiwary

The training of these models involves a search for appropriate parameter values based on large quantities of labeled examples.

Information Retrieval parameter estimation +1

An Updated Duet Model for Passage Re-ranking

1 code implementation18 Mar 2019 Bhaskar Mitra, Nick Craswell

We propose several small modifications to Duet -- a deep neural ranking model -- and evaluate the updated model on the MS MARCO passage ranking task.

model Passage Ranking +2

Neural Models for Information Retrieval

no code implementations3 May 2017 Bhaskar Mitra, Nick Craswell

Neural ranking models for information retrieval (IR) use shallow or deep neural networks to rank search results in response to a query.

BIG-bench Machine Learning Information Retrieval +2

MS MARCO: A Human Generated MAchine Reading COmprehension Dataset

13 code implementations28 Nov 2016 Payal Bajaj, Daniel Campos, Nick Craswell, Li Deng, Jianfeng Gao, Xiaodong Liu, Rangan Majumder, Andrew McNamara, Bhaskar Mitra, Tri Nguyen, Mir Rosenberg, Xia Song, Alina Stoica, Saurabh Tiwary, Tong Wang

The size of the dataset and the fact that the questions are derived from real user search queries distinguishes MS MARCO from other well-known publicly available datasets for machine reading comprehension and question-answering.

Benchmarking Machine Reading Comprehension +1

Learning to Match Using Local and Distributed Representations of Text for Web Search

1 code implementation in Proceedings of the 26th International Conference on World Wide Web (WWW '17), 2017 Bhaskar Mitra, Fernando Diaz, Nick Craswell

Models such as latent semantic analysis and those based on neural embeddings learn distributed representations of text, and match the query against the document in the latent semantic space.

Document Ranking Information Retrieval +1

Query Expansion with Locally-Trained Word Embeddings

no code implementations ACL 2016 Fernando Diaz, Bhaskar Mitra, Nick Craswell

Continuous space word embeddings have received a great deal of attention in the natural language processing and machine learning communities for their ability to model term similarity and other relationships.

Ad-Hoc Information Retrieval BIG-bench Machine Learning +3
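
A toy sketch of embedding-based query expansion, with made-up vectors standing in for the locally trained embeddings that are the paper's actual contribution; each query term contributes its nearest neighbour as an expansion term.

    import math

    EMBEDDINGS = {  # toy 2-d vectors; locally trained embeddings would replace these
        "car": [0.9, 0.1], "auto": [0.85, 0.15], "loan": [0.1, 0.9], "bank": [0.2, 0.8],
    }

    def cosine(u, v):
        dot = sum(a * b for a, b in zip(u, v))
        return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))

    def expand(query_terms, k=1):
        expansion = []
        for t in query_terms:
            if t not in EMBEDDINGS:
                continue
            neighbours = sorted((w for w in EMBEDDINGS if w not in query_terms),
                                key=lambda w: cosine(EMBEDDINGS[t], EMBEDDINGS[w]),
                                reverse=True)
            expansion.extend(neighbours[:k])
        return query_terms + expansion

    print(expand(["car", "loan"]))  # ['car', 'loan', 'auto', 'bank']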

A Dual Embedding Space Model for Document Ranking

no code implementations2 Feb 2016 Bhaskar Mitra, Eric Nalisnick, Nick Craswell, Rich Caruana

A fundamental goal of search engines is to identify, given a query, documents that have relevant text.

Document Ranking model +1
