To fill this gap, we propose and name this task Document-Aware Passage Retrieval (DAPR) and build a benchmark including multiple datasets from various domains, covering both DAPR and whole-document retrieval.
Pairing a lexical retriever with a neural re-ranking model has achieved state-of-the-art performance on large-scale information retrieval datasets.
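As a rough sketch of this retrieve-then-rerank setup (an illustration, not the exact pipeline evaluated here), BM25 can fetch a small candidate set that a neural cross-encoder then re-scores; the rank_bm25 and sentence-transformers packages and the checkpoint name below are assumptions made for the example.

```python
# Minimal retrieve-then-rerank sketch: BM25 fetches candidates, a cross-encoder re-scores them.
from rank_bm25 import BM25Okapi
from sentence_transformers import CrossEncoder

corpus = [
    "BM25 ranks passages by lexical overlap with the query.",
    "Cross-encoders attend jointly over the query and a candidate passage.",
    "Dense retrievers embed text into a shared vector space.",
]
query = "How do cross-encoders score a query and passage?"

# Stage 1: cheap lexical retrieval over the full collection.
bm25 = BM25Okapi([doc.lower().split() for doc in corpus])
candidates = bm25.get_top_n(query.lower().split(), corpus, n=2)

# Stage 2: expensive neural re-ranking of the small candidate set.
reranker = CrossEncoder("cross-encoder/ms-marco-MiniLM-L-6-v2")  # assumed checkpoint
scores = reranker.predict([(query, passage) for passage in candidates])
reranked = [passage for _, passage in sorted(zip(scores, candidates), reverse=True)]
print(reranked[0])
```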
MTEB spans 8 embedding tasks covering a total of 58 datasets and 112 languages.
This simple framework requires no prompts or verbalizers, and achieves high accuracy with orders of magnitude fewer parameters than existing techniques.
In our work, we evaluate the Lottery Ticket Hypothesis (LTH) and vector compression techniques for improving the downstream zero-shot retrieval accuracy of the TAS-B dense retriever while maintaining efficiency at inference.
1 code implementation • Tim Baumgärtner, Kexin Wang, Rachneet Sachdeva, Max Eichler, Gregor Geigle, Clifton Poth, Hannah Sterz, Haritz Puerto, Leonardo F. R. Ribeiro, Jonas Pfeiffer, Nils Reimers, Gözde Gül Şahin, Iryna Gurevych
Recent advances in NLP and information retrieval have given rise to a diverse set of question answering tasks that come in different formats (e.g., extractive, abstractive), require different model architectures (e.g., generative, discriminative), and use different setups (e.g., with or without retrieval).
This limits the usage of dense retrieval approaches to only a few domains with large training datasets.
To address this, and to help researchers broadly evaluate the effectiveness of their models, we introduce Benchmarking-IR (BEIR), a robust and heterogeneous evaluation benchmark for information retrieval.
Learning sentence embeddings often requires a large amount of labeled data.
Current state-of-the-art approaches to cross-modal retrieval process text and visual input jointly, relying on Transformer-based architectures with cross-attention mechanisms that attend over all words and objects in an image.
Information Retrieval using dense low-dimensional representations recently became popular and has been shown to outperform traditional sparse representations like BM25.
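As a toy illustration of dense retrieval (not the specific systems studied in this work), the sketch below embeds the query and passages with a bi-encoder and ranks by cosine similarity; the sentence-transformers package and the model name are assumptions for the example.

```python
# Toy dense-retrieval sketch: embed query and passages, rank by cosine similarity.
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("sentence-transformers/all-MiniLM-L6-v2")  # assumed checkpoint
passages = [
    "BM25 scores passages by term overlap with the query.",
    "Dense retrieval compares low-dimensional query and passage embeddings.",
]
query_emb = model.encode("what is dense retrieval", convert_to_tensor=True)
passage_embs = model.encode(passages, convert_to_tensor=True)

scores = util.cos_sim(query_emb, passage_embs)  # shape: (1, num_passages)
print(passages[int(scores.argmax())])
```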
This raises strong concerns about their generalizability -- a must-have for downstream applications where the number of domains or event mentions is likely to exceed those found in a curated corpus.
Massively pre-trained transformer models are computationally expensive to fine-tune, slow for inference, and have large storage requirements.
Bi-encoders, on the other hand, require substantial training data and fine-tuning over the target task to achieve competitive performance.
The training is based on the idea that a translated sentence should be mapped to the same location in the vector space as the original sentence.
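A toy sketch of that training idea (not the paper's actual training code): an MSE loss pulls the student's embeddings of a sentence and of its translation toward the teacher's embedding of the original sentence. The random tensors below stand in for real encoders, and the shapes are illustrative assumptions.

```python
# Toy multilingual-distillation objective: map the original sentence and its
# translation to the same location as the teacher's embedding of the original.
import torch
import torch.nn.functional as F

teacher_emb = torch.randn(4, 768)                        # teacher: original sentences
student_src = torch.randn(4, 768, requires_grad=True)    # student: same original sentences
student_trg = torch.randn(4, 768, requires_grad=True)    # student: their translations

loss = F.mse_loss(student_src, teacher_emb) + F.mse_loss(student_trg, teacher_emb)
loss.backward()  # both the original and its translation are pulled to the teacher's location
```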
However, it requires that both sentences are fed into the network, which causes a massive computational overhead: Finding the most similar pair in a collection of 10,000 sentences requires about 50 million inference computations (~65 hours) with BERT.
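For reference, the arithmetic behind the ~50 million figure: scoring every candidate pair among n sentences needs n(n-1)/2 forward passes.

```python
# Worked arithmetic: every pair of sentences needs its own forward pass,
# so n sentences yield n * (n - 1) / 2 cross-encoder calls.
n = 10_000
pairs = n * (n - 1) // 2
print(pairs)  # 49_995_000, i.e. roughly 50 million inference computations
```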
We experiment with two recent contextualized word embedding methods (ELMo and BERT) in the context of open-domain argument search.
Our analysis confirms that all our representation elements, including the mention span itself, its context, and the relation to other mentions, contribute to the model's success.
We evaluate different methods that combine the three vectors from the language model in order to achieve the best possible performance in downstream NLP tasks.
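As a simple illustration of what combining such vectors can mean (the concrete strategies evaluated are not restated here), two common options are averaging and concatenating the layer vectors; the dimensionality below is an assumption for the example.

```python
# Illustrative ways to combine three per-token vectors from a language model:
# averaging keeps the dimensionality, concatenation preserves all layers.
import numpy as np

layer_vectors = [np.random.rand(1024) for _ in range(3)]  # assumed dimensionality

averaged = np.mean(layer_vectors, axis=0)      # shape (1024,)
concatenated = np.concatenate(layer_vectors)   # shape (3072,)
print(averaged.shape, concatenated.shape)
```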
In this publication, we show that there is a high risk that a statistically significant difference in this type of evaluation is not due to a superior learning approach.
Selecting optimal parameters for a neural network architecture can often make the difference between mediocre and state-of-the-art performance.