Search Results for author: Nandan Thakur

Found 12 papers, 10 papers with code

NoMIRACL: Knowing When You Don't Know for Robust Multilingual Retrieval-Augmented Generation

1 code implementation · 18 Dec 2023 · Nandan Thakur, Luiz Bonifacio, Xinyu Zhang, Odunayo Ogundepo, Ehsan Kamalloo, David Alfonso-Hermelo, Xiaoguang Li, Qun Liu, Boxing Chen, Mehdi Rezagholizadeh, Jimmy Lin

We measure LLM robustness using two metrics: (i) hallucination rate, the model's tendency to hallucinate an answer when no answer is present in the passages of the non-relevant subset, and (ii) error rate, the model's failure to recognize the relevant passages in the relevant subset.

Hallucination · Language Modelling +2
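
As a rough illustration of these two metrics (a sketch, not the paper's official scorer; the per-query decision format is an assumption), both rates reduce to simple fractions over model decisions:

```python
# Minimal sketch of the two NoMIRACL-style robustness metrics.
# Assumes each query carries a boolean decision: did the model answer
# from the passages (True) or abstain with "I don't know" (False)?
# Illustrative only; not the paper's official evaluation script.

def hallucination_rate(answered_on_non_relevant: list[bool]) -> float:
    """Fraction of non-relevant-subset queries (no answer in the passages)
    where the model hallucinated an answer anyway."""
    return sum(answered_on_non_relevant) / len(answered_on_non_relevant)

def error_rate(abstained_on_relevant: list[bool]) -> float:
    """Fraction of relevant-subset queries (answer present) where the
    model failed to recognize the relevant passages and abstained."""
    return sum(abstained_on_relevant) / len(abstained_on_relevant)

print(hallucination_rate([True, False, False, True]))  # 0.5
print(error_rate([False, False, True, False]))         # 0.25
```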

Leveraging LLMs for Synthesizing Training Data Across Many Languages in Multilingual Dense Retrieval

1 code implementation · 10 Nov 2023 · Nandan Thakur, Jianmo Ni, Gustavo Hernández Ábrego, John Wieting, Jimmy Lin, Daniel Cer

Therefore, to study model capabilities across both cross-lingual and monolingual retrieval tasks, we develop SWIM-IR, a synthetic retrieval training dataset spanning 33 languages (from high- to very-low-resource) for training multilingual dense retrieval models without any human supervision.

Language Modelling · Large Language Model +1
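
The core recipe is prompting an LLM to write a query for a given passage in a target language; a minimal sketch follows, where `generate()` is a hypothetical stand-in for any LLM completion call and the prompt wording is an assumption, not SWIM-IR's exact template:

```python
# Sketch: LLM-based synthetic query generation for training a
# multilingual dense retriever. `generate` is a hypothetical LLM call.

def generate(prompt: str) -> str:
    raise NotImplementedError  # plug in any LLM completion API here

def synthesize_pair(passage: str, language: str) -> tuple[str, str]:
    prompt = (
        f"Passage: {passage}\n"
        f"Write a question in {language} that the passage answers.\n"
        "Question:"
    )
    query = generate(prompt)
    return query, passage  # (synthetic query, positive passage) pair
```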

HAGRID: A Human-LLM Collaborative Dataset for Generative Information-Seeking with Attribution

1 code implementation · 31 Jul 2023 · Ehsan Kamalloo, Aref Jafari, Xinyu Zhang, Nandan Thakur, Jimmy Lin

In this paper, we introduce a new dataset, HAGRID (Human-in-the-loop Attributable Generative Retrieval for Information-seeking Dataset) for building end-to-end generative information-seeking models that are capable of retrieving candidate quotes and generating attributed explanations.

Information Retrieval · Informativeness +1
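
If working from a Hugging Face release, the dataset can be inspected with the `datasets` library; the `miracl/hagrid` identifier below is an assumption, so check the paper's repository for the official location:

```python
# Sketch: inspecting HAGRID with Hugging Face `datasets`.
# The "miracl/hagrid" identifier is an assumption.
from datasets import load_dataset

hagrid = load_dataset("miracl/hagrid", split="train")
print(hagrid[0].keys())  # expect query, quotes, and attributed answers
```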

SPRINT: A Unified Toolkit for Evaluating and Demystifying Zero-shot Neural Sparse Retrieval

1 code implementation · 19 Jul 2023 · Nandan Thakur, Kexin Wang, Iryna Gurevych, Jimmy Lin

In this work, we provide SPRINT, a unified Python toolkit based on Pyserini and Lucene, supporting a common interface for evaluating neural sparse retrieval.

Information Retrieval · Retrieval
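
SPRINT's own API is not reproduced here; purely as an illustration of the evaluation step it standardizes, a sparse-retrieval run can be scored against relevance judgments with `pytrec_eval` (a different tool, named plainly as a stand-in):

```python
# Generic illustration of scoring a retrieval run against qrels with
# pytrec_eval; this is not SPRINT's interface.
import pytrec_eval

qrels = {"q1": {"d1": 1, "d2": 0}}                # ground-truth judgments
run = {"q1": {"d1": 12.3, "d2": 7.1, "d3": 3.2}}  # neural sparse scores

evaluator = pytrec_eval.RelevanceEvaluator(qrels, {"ndcg_cut.10"})
print(evaluator.evaluate(run)["q1"]["ndcg_cut_10"])
```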

Evaluating Embedding APIs for Information Retrieval

no code implementations · 10 May 2023 · Ehsan Kamalloo, Xinyu Zhang, Odunayo Ogundepo, Nandan Thakur, David Alfonso-Hermelo, Mehdi Rezagholizadeh, Jimmy Lin

The ever-increasing size of language models curtails their widespread availability to the community, thereby galvanizing many companies into offering access to large language models through APIs.

Domain Generalization · Information Retrieval +2
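
Whatever the provider, evaluating an embedding API for retrieval reduces to the same pattern; in this sketch, `embed()` is a hypothetical stand-in for any commercial endpoint:

```python
# Sketch: ranking documents by cosine similarity using embeddings
# from any API. `embed` is a hypothetical endpoint wrapper.
import numpy as np

def embed(texts: list[str]) -> np.ndarray:
    raise NotImplementedError  # call the provider's embedding API here

def rank(query: str, docs: list[str]) -> list[int]:
    q = embed([query])[0]
    d = embed(docs)
    scores = (d @ q) / (np.linalg.norm(d, axis=1) * np.linalg.norm(q))
    return list(np.argsort(-scores))  # document indices, best first
```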

Simple Yet Effective Neural Ranking and Reranking Baselines for Cross-Lingual Information Retrieval

no code implementations · 3 Apr 2023 · Jimmy Lin, David Alfonso-Hermelo, Vitor Jeronymo, Ehsan Kamalloo, Carlos Lassance, Rodrigo Nogueira, Odunayo Ogundepo, Mehdi Rezagholizadeh, Nandan Thakur, Jheng-Hong Yang, Xinyu Zhang

The advent of multilingual language models has generated a resurgence of interest in cross-lingual information retrieval (CLIR), which is the task of searching documents in one language with queries from another.

Cross-Lingual Information Retrieval · Retrieval
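
A minimal dense baseline for the task as defined above encodes queries and documents in different languages into one shared space; the model name here is an assumption, not necessarily one of the paper's baselines:

```python
# Sketch: cross-lingual retrieval with a multilingual encoder.
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("paraphrase-multilingual-MiniLM-L12-v2")
query = "capital of Finland"                    # English query
docs = ["Helsinki on Suomen pääkaupunki.",      # Finnish documents
        "Suomessa on paljon järviä."]
scores = util.cos_sim(model.encode([query]), model.encode(docs))
print(scores.argmax().item())  # 0: the Helsinki document matches best
```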

Making a MIRACL: Multilingual Information Retrieval Across a Continuum of Languages

1 code implementation · 18 Oct 2022 · Xinyu Zhang, Nandan Thakur, Odunayo Ogundepo, Ehsan Kamalloo, David Alfonso-Hermelo, Xiaoguang Li, Qun Liu, Mehdi Rezagholizadeh, Jimmy Lin

MIRACL (Multilingual Information Retrieval Across a Continuum of Languages) is a multilingual dataset we have built for the WSDM 2023 Cup challenge that focuses on ad hoc retrieval across 18 different languages, which collectively encompass over three billion native speakers around the world.

Information Retrieval · Retrieval
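
For readers who want to try the data, a hedged loading sketch follows; the `miracl/miracl` identifier and the Swahili (`sw`) config are assumptions, and the official release may be gated behind authentication:

```python
# Sketch: loading one MIRACL language with Hugging Face `datasets`.
from datasets import load_dataset

miracl_sw = load_dataset("miracl/miracl", "sw", split="dev")
row = miracl_sw[0]
print(row["query"])  # each row pairs a query with judged passages
```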

Injecting Domain Adaptation with Learning-to-hash for Effective and Efficient Zero-shot Dense Retrieval

2 code implementations · 23 May 2022 · Nandan Thakur, Nils Reimers, Jimmy Lin

In our work, we evaluate learning-to-hash (LTH) and vector compression techniques for improving the downstream zero-shot retrieval accuracy of the TAS-B dense retriever while maintaining efficiency at inference.

Ad-Hoc Information Retrieval · Information Retrieval +3
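
A minimal sketch of the binary end of this compression space: plain sign-based hashing with Hamming-distance search in NumPy (illustrative only; the paper evaluates learned hashing and other compression schemes):

```python
# Sketch: compress dense embeddings to binary codes via sign hashing,
# then search by Hamming distance (popcount of XOR).
import numpy as np

rng = np.random.default_rng(0)
docs = rng.standard_normal((1000, 768)).astype(np.float32)  # toy embeddings
query = rng.standard_normal(768).astype(np.float32)

doc_codes = np.packbits(docs > 0, axis=1)  # 768 floats -> 96 bytes per doc
q_code = np.packbits(query > 0)

xor = np.bitwise_xor(doc_codes, q_code)
dists = np.unpackbits(xor, axis=1).sum(axis=1)
print(dists.argmin())  # nearest document under Hamming distance
```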

BEIR: A Heterogenous Benchmark for Zero-shot Evaluation of Information Retrieval Models

2 code implementations · 17 Apr 2021 · Nandan Thakur, Nils Reimers, Andreas Rücklé, Abhishek Srivastava, Iryna Gurevych

To address this, and to help researchers broadly evaluate the effectiveness of their models, we introduce Benchmarking-IR (BEIR), a robust and heterogeneous evaluation benchmark for information retrieval.

Argument Retrieval · Benchmarking +12
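
A sketch following BEIR's published quickstart (the dataset URL and TAS-B model name come from the library's own examples; adapt to taste):

```python
# Sketch: evaluate a dense retriever on one BEIR dataset (SciFact),
# following the library's quickstart.
from beir import util
from beir.datasets.data_loader import GenericDataLoader
from beir.retrieval import models
from beir.retrieval.evaluation import EvaluateRetrieval
from beir.retrieval.search.dense import DenseRetrievalExactSearch as DRES

url = "https://public.ukp.informatik.tu-darmstadt.de/thakur/BEIR/datasets/scifact.zip"
data_path = util.download_and_unzip(url, "datasets")
corpus, queries, qrels = GenericDataLoader(data_folder=data_path).load(split="test")

model = DRES(models.SentenceBERT("msmarco-distilbert-base-tas-b"), batch_size=16)
retriever = EvaluateRetrieval(model, score_function="dot")
results = retriever.retrieve(corpus, queries)
ndcg, _map, recall, precision = retriever.evaluate(qrels, results, retriever.k_values)
print(ndcg)  # nDCG@k for the standard cutoffs
```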
