1 code implementation • Findings (EMNLP) 2021 • Kexin Wang, Nils Reimers, Iryna Gurevych
Learning sentence embeddings often requires a large amount of labeled data.
1 code implementation • 23 May 2023 • Kexin Wang, Nils Reimers, Iryna Gurevych
To fill this gap, we propose the task of Document-Aware Passage Retrieval (DAPR) and build a benchmark that includes multiple datasets from various domains, covering both DAPR and whole-document retrieval.
1 code implementation • 19 Oct 2022 • Tim Baumgärtner, Leonardo F. R. Ribeiro, Nils Reimers, Iryna Gurevych
Pairing a lexical retriever with a neural re-ranking model has set state-of-the-art performance on large-scale information retrieval datasets.
1 code implementation • 13 Oct 2022 • Niklas Muennighoff, Nouamane Tazi, Loïc Magne, Nils Reimers
MTEB spans 8 embedding tasks covering a total of 58 datasets and 112 languages.
Ranked #1 on Text Clustering on MTEB
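A minimal sketch of how an embedding model can be evaluated on one MTEB task with the `mteb` package, following its documented pattern; the model and task names here are examples, not the paper's choices:

```python
from mteb import MTEB
from sentence_transformers import SentenceTransformer

# Evaluate a sentence-embedding model on a single MTEB task and write
# the scores to disk; any encoder with an encode() method works.
model = SentenceTransformer("all-MiniLM-L6-v2")
evaluation = MTEB(tasks=["Banking77Classification"])
evaluation.run(model, output_folder="results/all-MiniLM-L6-v2")
```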
1 code implementation • 22 Sep 2022 • Lewis Tunstall, Nils Reimers, Unso Eun Seo Jo, Luke Bates, Daniel Korat, Moshe Wasserblat, Oren Pereg
This simple framework requires no prompts or verbalizers and achieves high accuracy with orders of magnitude fewer parameters than existing techniques.
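A few-shot sketch using the `setfit` library's original SetFitTrainer API (newer releases renamed it to `Trainer`); the model checkpoint and dataset are illustrative choices, not prescribed by the paper:

```python
from datasets import load_dataset
from setfit import SetFitModel, SetFitTrainer

# Simulate a few-shot setting with 16 labeled examples.
train_ds = load_dataset("sst2", split="train").shuffle(seed=42).select(range(16))

# A pretrained sentence transformer is fine-tuned contrastively,
# then a classification head is fitted on top.
model = SetFitModel.from_pretrained("sentence-transformers/paraphrase-mpnet-base-v2")
trainer = SetFitTrainer(
    model=model,
    train_dataset=train_ds,
    column_mapping={"sentence": "text", "label": "label"},
)
trainer.train()
preds = model(["a gripping and visually stunning film"])
```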
2 code implementations • 23 May 2022 • Nandan Thakur, Nils Reimers, Jimmy Lin
In our work, we evaluate LTH and vector compression techniques for improving the downstream zero-shot retrieval accuracy of the TAS-B dense retriever while maintaining efficiency at inference.
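An illustrative sketch of one technique in the vector-compression family the paper evaluates: per-dimension scalar quantization of corpus embeddings from float32 to int8, a 4x memory reduction. All values below are stand-ins:

```python
import numpy as np

rng = np.random.default_rng(0)
emb = rng.normal(size=(1000, 768)).astype(np.float32)  # stand-in corpus embeddings

lo, hi = emb.min(axis=0), emb.max(axis=0)        # per-dimension calibration
scale = (hi - lo) / 255.0
q = (np.round((emb - lo) / scale) - 128).astype(np.int8)   # int8 codes

deq = (q.astype(np.float32) + 128) * scale + lo   # dequantize for scoring
print("max reconstruction error:", np.abs(emb - deq).max())
```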
1 code implementation • ACL 2022 • Tim Baumgärtner, Kexin Wang, Rachneet Sachdeva, Max Eichler, Gregor Geigle, Clifton Poth, Hannah Sterz, Haritz Puerto, Leonardo F. R. Ribeiro, Jonas Pfeiffer, Nils Reimers, Gözde Gül Şahin, Iryna Gurevych
Recent advances in NLP and information retrieval have given rise to a diverse set of question answering tasks that come in different formats (e.g., extractive, abstractive), require different model architectures (e.g., generative, discriminative), and call for different setups (e.g., with or without retrieval).
5 code implementations • NAACL 2022 • Kexin Wang, Nandan Thakur, Nils Reimers, Iryna Gurevych
This limits the usage of dense retrieval approaches to only a few domains with large training datasets.
Ranked #9 on Zero-shot Text Search on BEIR
2 code implementations • 17 Apr 2021 • Nandan Thakur, Nils Reimers, Andreas Rücklé, Abhishek Srivastava, Iryna Gurevych
To address this, and to enable researchers to broadly evaluate the effectiveness of their models, we introduce Benchmarking-IR (BEIR), a robust and heterogeneous evaluation benchmark for information retrieval.
Ranked #1 on Argument Retrieval on ArguAna (BEIR)
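A sketch of loading one BEIR dataset with the `beir` package, following the pattern in its README; SciFact is among the smaller datasets in the benchmark:

```python
from beir import util
from beir.datasets.data_loader import GenericDataLoader

# Download and unpack one BEIR dataset, then load its corpus,
# queries, and relevance judgments for zero-shot evaluation.
url = "https://public.ukp.informatik.tu-darmstadt.de/thakur/BEIR/datasets/scifact.zip"
data_path = util.download_and_unzip(url, "datasets")
corpus, queries, qrels = GenericDataLoader(data_folder=data_path).load(split="test")
```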
5 code implementations • 14 Apr 2021 • Kexin Wang, Nils Reimers, Iryna Gurevych
Learning sentence embeddings often requires a large amount of labeled data.
Ranked #1 on Paraphrase Identification on TURL
1 code implementation • 14 Apr 2021 • Gregor Geigle, Nils Reimers, Andreas Rücklé, Iryna Gurevych
We argue that a wide range of specialized QA agents exists in the literature.
1 code implementation • 22 Mar 2021 • Gregor Geigle, Jonas Pfeiffer, Nils Reimers, Ivan Vulić, Iryna Gurevych
Current state-of-the-art approaches to cross-modal retrieval process text and visual input jointly, relying on Transformer-based architectures with cross-attention mechanisms that attend over all words and objects in an image.
no code implementations • ACL 2021 • Nils Reimers, Iryna Gurevych
Information Retrieval using dense low-dimensional representations has recently become popular and has been shown to outperform traditional sparse representations such as BM25.
1 code implementation • CL (ACL) 2021 • Michael Bugert, Nils Reimers, Iryna Gurevych
This raises strong concerns about their generalizability -- a must-have for downstream applications, where the variety of domains and event mentions is likely to exceed that found in a curated corpus.
1 code implementation • EMNLP 2021 • Andreas Rücklé, Gregor Geigle, Max Glockner, Tilman Beck, Jonas Pfeiffer, Nils Reimers, Iryna Gurevych
Massively pre-trained transformer models are computationally expensive to fine-tune, slow for inference, and have large storage requirements.
1 code implementation • NAACL 2021 • Nandan Thakur, Nils Reimers, Johannes Daxenberger, Iryna Gurevych
Bi-encoders, on the other hand, require substantial training data and fine-tuning over the target task to achieve competitive performance.
10 code implementations • EMNLP 2020 • Nils Reimers, Iryna Gurevych
The training is based on the idea that a translated sentence should be mapped to the same location in the vector space as the original sentence.
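An illustrative sketch of that training objective: a frozen teacher embeds the source sentence, and a student is trained so that both the source sentence and its translation map onto the teacher's vector. Here `teacher` and `student` stand for any encoders that return sentence embeddings:

```python
import torch
import torch.nn.functional as F

def distillation_loss(teacher, student, src_batch, trg_batch):
    # The teacher (e.g., an English SBERT model) provides fixed targets.
    with torch.no_grad():
        target = teacher(src_batch)
    # Pull both the source sentences and their translations
    # toward the teacher's embedding of the source.
    return (F.mse_loss(student(src_batch), target) +
            F.mse_loss(student(trg_batch), target))
```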
53 code implementations • IJCNLP 2019 • Nils Reimers, Iryna Gurevych
However, it requires that both sentences are fed into the network, which causes a massive computational overhead: finding the most similar pair in a collection of 10,000 sentences requires about 50 million inference computations (~65 hours) with BERT.
Ranked #5 on Semantic Textual Similarity on SICK
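A sketch of the bi-encoder alternative with the `sentence-transformers` library: each sentence is encoded once (10,000 forward passes instead of 10,000 * 9,999 / 2 ≈ 50 million cross-encoder evaluations), and the most similar pair is then found with cheap cosine similarity. The model name is an example:

```python
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("all-MiniLM-L6-v2")
sentences = [
    "A man is eating food.",
    "A man is eating a piece of bread.",
    "The girl is carrying a baby.",
]
# Encode each sentence once, then compare embeddings directly.
embeddings = model.encode(sentences, convert_to_tensor=True)
cosine_scores = util.cos_sim(embeddings, embeddings)  # all pairwise similarities
```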
2 code implementations • ACL 2019 • Nils Reimers, Benjamin Schiller, Tilman Beck, Johannes Daxenberger, Christian Stab, Iryna Gurevych
We experiment with two recent contextualized word embedding methods (ELMo and BERT) in the context of open-domain argument search.
1 code implementation • ACL 2019 • Shany Barhom, Vered Shwartz, Alon Eirew, Michael Bugert, Nils Reimers, Ido Dagan
Our analysis confirms that all of our representation elements, including the mention span itself, its context, and the relation to other mentions, contribute to the model's success.
1 code implementation • 5 Apr 2019 • Nils Reimers, Iryna Gurevych
We evaluate different methods that combine the three vectors from the language model in order to achieve the best possible performance in downstream NLP tasks.
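A hypothetical illustration of two such combination methods: concatenating the three layer vectors versus an ELMo-style learned weighted average. Shapes are stand-ins for (num_layers, seq_len, dim):

```python
import torch

layers = torch.randn(3, 128, 1024)   # three layer outputs for one sentence

# Method 1: concatenate the three vectors per token.
concat = torch.cat(list(layers), dim=-1)               # (128, 3 * 1024)

# Method 2: learn scalar mixing weights and average the layers.
mix = torch.nn.Parameter(torch.zeros(3))
w = torch.softmax(mix, dim=0)
weighted_avg = (w[:, None, None] * layers).sum(dim=0)  # (128, 1024)
```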
1 code implementation • 26 Mar 2018 • Nils Reimers, Iryna Gurevych
In this publication, we show that there is a high risk that statistical significance in this type of evaluation is not due to a superior learning approach.
no code implementations • TACL 2018 • Nils Reimers, Nazanin Dehghani, Iryna Gurevych
We use this tree to infer, in a stepwise manner, the time frame in which an event happened.
5 code implementations • EMNLP 2017 • Nils Reimers, Iryna Gurevych
In this paper we show that reporting a single performance score is insufficient to compare non-deterministic approaches.
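A sketch of the reporting practice the paper argues for: train with several random seeds and report a score distribution rather than a single number. The training routine below is a placeholder:

```python
import random
import statistics

def train_and_evaluate(seed: int) -> float:
    """Placeholder for a full training run; returns a seed-dependent score."""
    random.seed(seed)
    return 90.0 + random.gauss(0.0, 0.5)

# Report mean, standard deviation, and range over seeds.
scores = [train_and_evaluate(seed) for seed in range(10)]
print(f"mean={statistics.mean(scores):.2f} "
      f"std={statistics.stdev(scores):.2f} "
      f"range=[{min(scores):.2f}, {max(scores):.2f}]")
```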
6 code implementations • 21 Jul 2017 • Nils Reimers, Iryna Gurevych
Selecting optimal parameters for a neural network architecture can often make the difference between mediocre and state-of-the-art performance.
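An illustrative random search over a hyperparameter space similar in spirit to the one studied for LSTM sequence taggers; `evaluate()` is a placeholder for training a network and scoring it on a development set:

```python
import random

space = {
    "lstm_units": [50, 100, 200],
    "dropout": [0.0, 0.25, 0.5],
    "optimizer": ["sgd", "adam", "nadam"],
    "mini_batch_size": [8, 16, 32],
}

def evaluate(config: dict) -> float:
    # Stand-in for a dev-set score produced by a full training run.
    random.seed(str(sorted(config.items())))
    return random.uniform(88.0, 92.0)

# Sample random configurations and keep the best-scoring one.
candidates = [{k: random.choice(v) for k, v in space.items()} for _ in range(20)]
best = max(candidates, key=evaluate)
print("best config:", best)
```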
no code implementations • COLING 2016 • Nils Reimers, Philip Beyer, Iryna Gurevych
Semantic Textual Similarity (STS) is a foundational NLP task that can be used in a wide range of applications.