no code implementations • 21 Apr 2025 • Ronak Pradeep, Nandan Thakur, Shivani Upadhyay, Daniel Campos, Nick Craswell, Jimmy Lin
In the context of the TREC 2024 RAG Track, we calibrate a fully automatic approach against strategies where nuggets are created manually or semi-manually by human assessors and then assigned manually to system answers.
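To make the nugget-based scoring concrete, here is a minimal sketch of weighted nugget coverage, assuming a hypothetical judge_support() function standing in for an LLM (or human) support judgment; it is an illustration under those assumptions, not the track's official scorer.

# Hedged sketch: score a system answer by weighted coverage of information nuggets.
# judge_support() is a stand-in heuristic for an LLM support judgment.

from typing import List

def judge_support(nugget: str, answer: str) -> float:
    # Stand-in heuristic: token overlap as a proxy for an LLM support judgment.
    nugget_tokens = set(nugget.lower().split())
    answer_tokens = set(answer.lower().split())
    overlap = len(nugget_tokens & answer_tokens) / max(len(nugget_tokens), 1)
    return 1.0 if overlap > 0.8 else 0.5 if overlap > 0.4 else 0.0

def nugget_score(nuggets: List[str], vital: List[bool], answer: str) -> float:
    # Vital nuggets are commonly weighted above merely "okay" nuggets.
    weights = [1.0 if v else 0.5 for v in vital]
    supports = [judge_support(n, answer) for n in nuggets]
    return sum(w * s for w, s in zip(weights, supports)) / max(sum(weights), 1e-9)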
no code implementations • 21 Apr 2025 • Nandan Thakur, Ronak Pradeep, Shivani Upadhyay, Daniel Campos, Nick Craswell, Jimmy Lin
Retrieval-augmented generation (RAG) enables large language models (LLMs) to generate answers with citations from source documents containing "ground truth", thereby reducing system hallucinations.
1 code implementation • 19 Feb 2025 • Hossein A. Rahmani, Clemencia Siro, Mohammad Aliannejadi, Nick Craswell, Charles L. A. Clarke, Guglielmo Faggioli, Bhaskar Mitra, Paul Thomas, Emine Yilmaz
Using Large Language Models (LLMs) for relevance assessments offers promising opportunities to improve Information Retrieval (IR), Natural Language Processing (NLP), and related fields.
1 code implementation • 17 Dec 2024 • Hossein A. Rahmani, Emine Yilmaz, Nick Craswell, Bhaskar Mitra
The effective training and evaluation of retrieval systems require a substantial amount of relevance judgments, which are traditionally collected from human assessors -- a process that is both costly and time-consuming.
1 code implementation • 14 Nov 2024 • Ronak Pradeep, Nandan Thakur, Shivani Upadhyay, Daniel Campos, Nick Craswell, Jimmy Lin
Within the TREC setup, we are able to calibrate our fully automatic process against a manual process whereby nuggets are created by human assessors semi-manually and then assigned manually to system answers.
no code implementations • 13 Nov 2024 • Shivani Upadhyay, Ronak Pradeep, Nandan Thakur, Daniel Campos, Nick Craswell, Ian Soboroff, Hoa Trang Dang, Jimmy Lin
This paper reports on the results of a large-scale evaluation (the TREC 2024 RAG Track) where four different relevance assessment approaches were deployed in situ: the "standard" fully manual process that NIST has implemented for decades and three different alternatives that take advantage of LLMs to different extents using the open-source UMBRELA tool.
no code implementations • 29 Aug 2024 • Hossein A. Rahmani, Xi Wang, Emine Yilmaz, Nick Craswell, Bhaskar Mitra, Paul Thomas
Large-scale test collections play a crucial role in Information Retrieval (IR) research.
no code implementations • 9 Aug 2024 • Hossein A. Rahmani, Clemencia Siro, Mohammad Aliannejadi, Nick Craswell, Charles L. A. Clarke, Guglielmo Faggioli, Bhaskar Mitra, Paul Thomas, Emine Yilmaz
The first edition of the workshop on Large Language Model for Evaluation in Information Retrieval (LLM4Eval 2024) took place in July 2024, co-located with the ACM SIGIR Conference 2024 in the USA (SIGIR 2024).
1 code implementation • 9 Aug 2024 • Hossein A. Rahmani, Emine Yilmaz, Nick Craswell, Bhaskar Mitra, Paul Thomas, Charles L. A. Clarke, Mohammad Aliannejadi, Clemencia Siro, Guglielmo Faggioli
The evaluation and tuning of a search system is largely based on relevance labels, which indicate whether a document is useful for a specific search and user.
1 code implementation • 24 Jun 2024 • Ronak Pradeep, Nandan Thakur, Sahel Sharifymoghaddam, Eric Zhang, Ryan Nguyen, Daniel Campos, Nick Craswell, Jimmy Lin
In our work, we lay out the steps we've made towards making this track a reality -- we describe the details of our reusable framework, Ragnarök, explain the choice and curation of the new MS MARCO V2.1 collection, release the development topics for the track, and standardize the I/O definitions which assist the end user.
1 code implementation • 10 Jun 2024 • Shivani Upadhyay, Ronak Pradeep, Nandan Thakur, Nick Craswell, Jimmy Lin
Copious amounts of relevance judgments are necessary for the effective training and accurate evaluation of retrieval systems.
1 code implementation • 13 May 2024 • Hossein A. Rahmani, Nick Craswell, Emine Yilmaz, Bhaskar Mitra, Daniel Campos
Previous studies demonstrate that LLMs have the potential to generate synthetic relevance judgments for use in the evaluation of IR systems.
1 code implementation • 13 May 2024 • Qi Chen, Xiubo Geng, Corby Rosset, Carolyn Buractaon, Jingwen Lu, Tao Shen, Kun Zhou, Chenyan Xiong, Yeyun Gong, Paul Bennett, Nick Craswell, Xing Xie, Fan Yang, Bryan Tower, Nikhil Rao, Anlei Dong, Wenqi Jiang, Zheng Liu, Mingqin Li, Chuanjie Liu, Zengzhong Li, Rangan Majumder, Jennifer Neville, Andy Oakley, Knut Magne Risvik, Harsha Vardhan Simhadri, Manik Varma, Yujing Wang, Linjun Yang, Mao Yang, Ce Zhang
Recent breakthroughs in large models have highlighted the critical significance of data scale, labels, and modalities.
no code implementations • 26 Apr 2024 • Haolun Wu, Bhaskar Mitra, Nick Craswell
Traditional measures of search success often overlook the varying information needs of different demographic groups.
1 code implementation • 19 Sep 2023 • Paul Thomas, Seth Spielman, Nick Craswell, Bhaskar Mitra
It takes careful feedback from real users, which by definition is the highest-quality first-party gold data that can be derived, and develops a large language model prompt that agrees with that data.
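As a rough illustration of prompt-based relevance labeling and agreement checking, the following sketch uses an invented prompt and a placeholder call_llm() client; it is not the paper's actual prompt or pipeline.

# Hedged sketch of prompt-based relevance labeling and agreement checking.
# PROMPT and call_llm() are illustrative placeholders.

PROMPT = (
    "Given a query and a result, output a relevance label from 0 (irrelevant) "
    "to 3 (perfectly relevant). Output only the number.\n"
    "Query: {query}\nResult: {passage}\nLabel:"
)

def call_llm(prompt: str) -> str:
    # Replace with a real LLM client; a fixed output keeps the sketch runnable.
    return "2"

def llm_label(query: str, passage: str) -> int:
    return int(call_llm(PROMPT.format(query=query, passage=passage)).strip())

def agreement(pairs, gold_labels) -> float:
    # Fraction of query-passage pairs where the LLM label matches the gold label;
    # in practice one would report Cohen's kappa or a correlation instead.
    preds = [llm_label(q, p) for q, p in pairs]
    return sum(p == g for p, g in zip(preds, gold_labels)) / len(gold_labels)

pairs = [("best pizza near me", "A list of pizza restaurants in the area.")]
print(agreement(pairs, gold_labels=[2]))  # 1.0 with the placeholder client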
no code implementations • 25 Apr 2023 • Amifa Raj, Bhaskar Mitra, Nick Craswell, Michael D. Ekstrand
There are many ways a query, the search results, and a demographic attribute such as gender may relate, leading us to hypothesize different causes for these reformulation patterns, such as under-representation on the original result page or effects predicted by the linguistic theory of markedness.
no code implementations • 30 Jan 2023 • Zhenduo Wang, Yuancheng Tu, Corby Rosset, Nick Craswell, Ming Wu, Qingyao Ai
In this work, we explore generating clarifying questions in a zero-shot setting to overcome the cold-start problem, and we propose a constrained clarifying question generation system that uses both question templates and query facets to guide effective and precise question generation.
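A toy sketch of the template-plus-facet idea follows; the templates and facets are invented for illustration and are not the system described in the paper.

# Toy sketch of constrained clarifying question generation: fill question
# templates with query facets. Templates and facets here are invented examples.

TEMPLATES = [
    "Are you looking for {facet}?",
    "Which aspect of {query} interests you: {facet_list}?",
]

def generate_questions(query: str, facets: list[str]) -> list[str]:
    questions = [TEMPLATES[0].format(facet=f) for f in facets]
    questions.append(
        TEMPLATES[1].format(query=query, facet_list=", ".join(facets))
    )
    return questions

print(generate_questions("jaguar", ["the car brand", "the animal"]))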
no code implementations • 26 Jun 2022 • Sebastian Hofstätter, Nick Craswell, Bhaskar Mitra, Hamed Zamani, Allan Hanbury
Recently, several dense retrieval (DR) models have demonstrated performance competitive with the term-based retrieval methods that are ubiquitous in search systems.
2 code implementations • 21 Apr 2022 • Xinyi Yan, Chengxi Luo, Charles L. A. Clarke, Nick Craswell, Ellen M. Voorhees, Pablo Castells
Based on these simulations, one algorithm stands out for its potential.
no code implementations • 21 Jan 2022 • Gabriella Kazai, Bhaskar Mitra, Anlei Dong, Nick Craswell, Linjun Yang
This raises questions about when such summaries are sufficient for relevance estimation by the ranking model or the human assessor, and whether humans and machines benefit from the document's full text in similar ways.
no code implementations • 13 Jan 2022 • Jianfeng Gao, Chenyan Xiong, Paul Bennett, Nick Craswell
A conversational information retrieval (CIR) system is an information retrieval (IR) system with a conversational interface that allows users to interact with the system and seek information via multi-turn natural language conversations, in spoken or written form.
1 code implementation • 20 May 2021 • Sebastian Hofstätter, Bhaskar Mitra, Hamed Zamani, Nick Craswell, Allan Hanbury
An emerging recipe for achieving state-of-the-art effectiveness in neural document re-ranking involves utilizing large pre-trained language models - e.g., BERT - to evaluate all individual passages in the document and then aggregating the outputs by pooling or additional Transformer layers.
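The aggregation step can be sketched as follows, with score_passage() standing in for a BERT-style cross-encoder; the pooling choices shown are illustrative, not the paper's exact architecture.

# Sketch: aggregate per-passage scores into a document score.
# score_passage() stands in for a BERT-style cross-encoder.

def score_passage(query: str, passage: str) -> float:
    # Placeholder scorer; replace with a real cross-encoder.
    return float(len(set(query.split()) & set(passage.split())))

def score_document(query: str, passages: list[str], k: int = 3) -> float:
    scores = sorted((score_passage(query, p) for p in passages), reverse=True)
    # Common pooling choices: max over passages, or the mean of the top-k.
    return sum(scores[:k]) / min(k, len(scores))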
no code implementations • 9 May 2021 • Nick Craswell, Bhaskar Mitra, Emine Yilmaz, Daniel Campos, Jimmy Lin
Evaluation efforts such as TREC, CLEF, NTCIR and FIRE, alongside public leaderboards such as MS MARCO, are intended to encourage research and track our progress, addressing big questions in our field.
no code implementations • 19 Apr 2021 • Nick Craswell, Bhaskar Mitra, Emine Yilmaz, Daniel Campos, Ellen M. Voorhees, Ian Soboroff
The TREC Deep Learning (DL) Track studies ad hoc search in the large data regime, meaning that a large set of human-labeled training data is available.
no code implementations • 19 Apr 2021 • Bhaskar Mitra, Sebastian Hofstätter, Hamed Zamani, Nick Craswell
The Transformer-Kernel (TK) model has demonstrated strong reranking performance on the TREC Deep Learning benchmark -- and can be considered to be an efficient (but slightly less effective) alternative to other Transformer-based architectures that employ (i) large-scale pretraining (high training cost), (ii) joint encoding of query and document (high inference cost), and (iii) larger number of Transformer layers (both high training and high inference costs).
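A rough sketch of the kernel-pooling step at the heart of Transformer-Kernel style models follows; the contextualizing Transformer layers are omitted, and the dimensions and kernel settings are illustrative only.

# Sketch of kernel pooling: a query-document cosine similarity matrix is
# soft-binned with RBF kernels to produce match features for a linear scorer.

import numpy as np

def kernel_pool(q_emb: np.ndarray, d_emb: np.ndarray,
                mus=np.linspace(-0.9, 0.9, 10), sigma=0.1) -> np.ndarray:
    # Cosine similarity matrix between query terms and document terms.
    q = q_emb / np.linalg.norm(q_emb, axis=1, keepdims=True)
    d = d_emb / np.linalg.norm(d_emb, axis=1, keepdims=True)
    sim = q @ d.T                                   # [num_q_terms, num_d_terms]
    # RBF kernels soft-count matches at different similarity levels.
    kernels = np.exp(-((sim[..., None] - mus) ** 2) / (2 * sigma ** 2))
    per_query_term = np.log1p(kernels.sum(axis=1))  # sum over document terms
    return per_query_term.sum(axis=0)               # features for a linear scorer

features = kernel_pool(np.random.randn(4, 64), np.random.randn(100, 64))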
no code implementations • 25 Feb 2021 • Jimmy Lin, Daniel Campos, Nick Craswell, Bhaskar Mitra, Emine Yilmaz
Leaderboards are a ubiquitous part of modern research in applied machine learning.
1 code implementation • 15 Feb 2021 • Nick Craswell, Bhaskar Mitra, Emine Yilmaz, Daniel Campos
This is the second year of the TREC Deep Learning Track, with the goal of studying ad hoc ranking in the large training data regime.
no code implementations • 14 Nov 2020 • Bhaskar Mitra, Sebastian Hofstätter, Hamed Zamani, Nick Craswell
We benchmark Conformer-Kernel models under the strict blind evaluation setting of the TREC 2020 Deep Learning track.
1 code implementation • 20 Jul 2020 • Bhaskar Mitra, Sebastian Hofstätter, Hamed Zamani, Nick Craswell
In this work, we extend the TK architecture to the full retrieval setting by incorporating the query term independence assumption.
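The query term independence assumption can be sketched as a per-term score lookup summed over query terms, which is what makes precomputed, inverted-index-style serving possible; the toy index below is invented for illustration.

# Sketch of the query term independence (QTI) assumption: the document score
# decomposes into a sum of per-query-term contributions, so term-document
# scores can be precomputed and served from an inverted index.

def score_with_qti(query_terms: list[str], doc_id: str,
                   term_doc_score: dict[tuple[str, str], float]) -> float:
    return sum(term_doc_score.get((t, doc_id), 0.0) for t in query_terms)

index = {("neural", "d1"): 1.2, ("ranking", "d1"): 0.7, ("ranking", "d2"): 0.4}
print(score_with_qti(["neural", "ranking"], "d1", index))  # 1.9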
no code implementations • 17 Jul 2020 • Bodo Billerbeck, Justin Zobel, Nicholas Lester, Nick Craswell
Search techniques make use of elementary information such as term frequencies and document lengths in computation of similarity weighting.
1 code implementation • 17 Jun 2020 • Hamed Zamani, Gord Lueck, Everest Chen, Rodolfo Quispe, Flint Luu, Nick Craswell
In this paper, we introduce MIMICS, a collection of search clarification datasets for real web search queries sampled from the Bing query logs.
no code implementations • 9 Jun 2020 • Nick Craswell, Daniel Campos, Bhaskar Mitra, Emine Yilmaz, Bodo Billerbeck
Users of Web search engines reveal their information needs through queries and clicks, making click logs a useful asset for information retrieval.
no code implementations • 30 May 2020 • Hamed Zamani, Bhaskar Mitra, Everest Chen, Gord Lueck, Fernando Diaz, Paul N. Bennett, Nick Craswell, Susan T. Dumais
We also propose a model for learning representations of clarifying questions based on user interaction data as implicit feedback.
1 code implementation • 11 May 2020 • Sebastian Hofstätter, Hamed Zamani, Bhaskar Mitra, Nick Craswell, Allan Hanbury
In this work, we propose a local self-attention which considers a moving window over the document terms and for each term attends only to other terms in the same window.
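A minimal sketch of windowed (local) self-attention expressed as an attention mask, single-headed and without learned projections, purely for illustration:

# Sketch of local self-attention: each term attends only to terms inside a
# fixed-size window around it, implemented here as an additive attention mask.

import numpy as np

def local_attention_mask(seq_len: int, window: int) -> np.ndarray:
    positions = np.arange(seq_len)
    allowed = np.abs(positions[:, None] - positions[None, :]) <= window // 2
    return np.where(allowed, 0.0, -np.inf)  # added to attention logits before softmax

def local_self_attention(x: np.ndarray, window: int) -> np.ndarray:
    # x: [seq_len, dim]; single head, no learned projections, for illustration only.
    logits = x @ x.T / np.sqrt(x.shape[1]) + local_attention_mask(len(x), window)
    weights = np.exp(logits - logits.max(axis=1, keepdims=True))
    weights /= weights.sum(axis=1, keepdims=True)
    return weights @ x

out = local_self_attention(np.random.randn(12, 16), window=5)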
no code implementations • 28 Apr 2020 • Emine Yilmaz, Nick Craswell, Bhaskar Mitra, Daniel Campos
As deep learning based models are increasingly being used for information retrieval (IR), a major challenge is to ensure the availability of test collections for measuring their quality.
2 code implementations • 17 Mar 2020 • Nick Craswell, Bhaskar Mitra, Emine Yilmaz, Daniel Campos, Ellen M. Voorhees
The Deep Learning Track is a new track for TREC 2019, with the goal of studying ad hoc ranking in a large data regime.
1 code implementation • 18 Dec 2019 • Hamed Zamani, Nick Craswell
Such research will require data and tools, to allow the implementation and study of conversational systems.
1 code implementation • 10 Dec 2019 • Bhaskar Mitra, Nick Craswell
This report discusses three submissions based on the Duet architecture to the Deep Learning track at TREC 2019.
no code implementations • 24 Jul 2019 • Hongfei Zhang, Xia Song, Chenyan Xiong, Corby Rosset, Paul N. Bennett, Nick Craswell, Saurabh Tiwary
This paper presents GEneric iNtent Encoder (GEN Encoder) which learns a distributed representation space for user intent in search.
no code implementations • 8 Jul 2019 • Bhaskar Mitra, Corby Rosset, David Hawking, Nick Craswell, Fernando Diaz, Emine Yilmaz
Deep neural IR models, in contrast, compare the whole query to the document and are, therefore, typically employed only for late stage re-ranking.
no code implementations • 15 Apr 2019 • Corby Rosset, Bhaskar Mitra, Chenyan Xiong, Nick Craswell, Xia Song, Saurabh Tiwary
The training of these models involves a search for appropriate parameter values based on large quantities of labeled examples.
1 code implementation • 18 Mar 2019 • Bhaskar Mitra, Nick Craswell
We propose several small modifications to Duet, a deep neural ranking model, and evaluate the updated model on the MS MARCO passage ranking task.
Ranked #4 on Passage Re-Ranking on MS MARCO
no code implementations • 3 May 2017 • Bhaskar Mitra, Nick Craswell
Neural ranking models for information retrieval (IR) use shallow or deep neural networks to rank search results in response to a query.
13 code implementations • 28 Nov 2016 • Payal Bajaj, Daniel Campos, Nick Craswell, Li Deng, Jianfeng Gao, Xiaodong Liu, Rangan Majumder, Andrew McNamara, Bhaskar Mitra, Tri Nguyen, Mir Rosenberg, Xia Song, Alina Stoica, Saurabh Tiwary, Tong Wang
The size of the dataset and the fact that the questions are derived from real user search queries distinguishes MS MARCO from other well-known publicly available datasets for machine reading comprehension and question-answering.
1 code implementation • Proceedings of the 26th International Conference on World Wide Web, WWW '17 2017 • Bhaskar Mitra, Fernando Diaz, Nick Craswell
Models such as latent semantic analysis and those based on neural embeddings learn distributed representations of text, and match the query against the document in the latent semantic space.
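A small sketch of matching in a latent semantic space, using averaged word vectors and cosine similarity; the vectors here are random stand-ins for learned distributed representations, not the paper's trained model.

# Sketch: embed query and document as averaged word vectors and compare with
# cosine similarity in the latent space.

import numpy as np

rng = np.random.default_rng(0)
vocab_vectors = {w: rng.standard_normal(50) for w in
                 ["cheap", "budget", "flights", "airfare", "to", "tokyo"]}

def embed(text: str) -> np.ndarray:
    vecs = [vocab_vectors[w] for w in text.lower().split() if w in vocab_vectors]
    return np.mean(vecs, axis=0)

def cosine(a: np.ndarray, b: np.ndarray) -> float:
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

print(cosine(embed("cheap flights"), embed("budget airfare to tokyo")))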
no code implementations • ACL 2016 • Fernando Diaz, Bhaskar Mitra, Nick Craswell
Continuous space word embeddings have received a great deal of attention in the natural language processing and machine learning communities for their ability to model term similarity and other relationships.
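One common use of such term-similarity structure is query expansion with nearest-neighbour terms; the following sketch uses a tiny invented vocabulary with random vectors as stand-ins for trained embeddings.

# Sketch of embedding-based query expansion: add each query term's nearest
# neighbours in a word-embedding space to the query.

import numpy as np

rng = np.random.default_rng(1)
embeddings = {w: rng.standard_normal(50) for w in
              ["car", "automobile", "vehicle", "train", "banana"]}

def nearest_neighbours(term: str, k: int = 2) -> list[str]:
    v = embeddings[term]
    sims = {w: float(v @ u / (np.linalg.norm(v) * np.linalg.norm(u)))
            for w, u in embeddings.items() if w != term}
    return sorted(sims, key=sims.get, reverse=True)[:k]

def expand_query(query_terms: list[str]) -> list[str]:
    expanded = list(query_terms)
    for t in query_terms:
        if t in embeddings:
            expanded.extend(nearest_neighbours(t))
    return expanded

print(expand_query(["car"]))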
no code implementations • 2 Feb 2016 • Bhaskar Mitra, Eric Nalisnick, Nick Craswell, Rich Caruana
A fundamental goal of search engines is to identify, given a query, documents that have relevant text.