60 papers with code • 2 benchmarks • 4 datasets
Passage retrieval is a specialized type of information retrieval (IR) application that retrieves relevant passages (short pieces of text) rather than a ranked list of entire documents.
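One common way to get passages from a collection of full documents is to cut each document into fixed-length, overlapping word windows and rank those windows instead of whole documents. The sketch below assumes that scheme. The window and stride sizes and the helper names `split_into_passages` and `best_passage` are hypothetical, and the term-overlap scorer is only a stand-in for a real retrieval model.

```python
def split_into_passages(text, window=100, stride=50):
    """Split a document into overlapping windows of `window` words, advancing by `stride` words."""
    words = text.split()
    if not words:
        return []
    passages = []
    start = 0
    while True:
        passages.append(" ".join(words[start:start + window]))
        if start + window >= len(words):  # last window reached the end of the document
            break
        start += stride
    return passages


def best_passage(query, document, window=100, stride=50):
    """Return the passage sharing the most (lowercased) terms with the query.

    Toy scorer for illustration only; ties go to the earliest passage.
    """
    query_terms = set(query.lower().split())
    passages = split_into_passages(document, window, stride)
    return max(passages, key=lambda p: len(query_terms & set(p.lower().split())))
```

For example, `split_into_passages("a b c d e", window=3, stride=2)` yields the two overlapping passages `"a b c"` and `"c d e"`, and a query is then matched against those passages rather than against the document as a whole.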
Open-domain question answering relies on efficient passage retrieval to select candidate contexts, where traditional sparse vector space models, such as TF-IDF or BM25, are the de facto method.
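As a rough illustration of the sparse baseline mentioned above, here is a minimal sketch of Okapi BM25 scoring over a handful of passages. It assumes whitespace tokenization with lowercasing and the non-negative IDF variant ln((N - n + 0.5) / (n + 0.5) + 1). The values k1 = 0.9 and b = 0.4 are illustrative (k1 = 1.2, b = 0.75 is another common setting), and `bm25_scores` is a hypothetical helper, not a library API.

```python
import math
from collections import Counter


def bm25_scores(query, passages, k1=0.9, b=0.4):
    """Score every passage against the query with Okapi BM25 (toy, in-memory version)."""
    docs = [p.lower().split() for p in passages]
    if not docs:
        return []
    n_docs = len(docs)
    avgdl = sum(len(d) for d in docs) / n_docs  # average passage length in tokens

    # Document frequency: number of passages containing each term.
    df = Counter()
    for d in docs:
        df.update(set(d))

    scores = []
    for d in docs:
        tf = Counter(d)
        score = 0.0
        for term in query.lower().split():
            if term not in tf:
                continue
            idf = math.log((n_docs - df[term] + 0.5) / (df[term] + 0.5) + 1.0)
            # Length normalization: longer-than-average passages are penalized via b.
            denom = tf[term] + k1 * (1 - b + b * len(d) / avgdl)
            score += idf * tf[term] * (k1 + 1) / denom
        scores.append(score)
    return scores
```

Passages are ranked by sorting on these scores. A production system would read term and document frequencies from an inverted index rather than recomputing them for every query.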
Recently, neural models pretrained on a language modeling task, such as ELMo (Peters et al., 2017), OpenAI GPT (Radford et al., 2018), and BERT (Devlin et al., 2018), have achieved impressive results on various natural language processing tasks such as question answering and natural language inference.
In this paper, we identify the main bottleneck as the training mechanism: the negative instances used in training are not representative of the irrelevant documents encountered at test time.
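The mismatch described above can be illustrated by contrasting two ways of picking training negatives for a query. One samples them at random from the corpus. The other takes the passages that the current model itself ranks highest but that are not labeled relevant. The sketch below is a simplified, hypothetical illustration of the second strategy. It uses exact dot-product search over toy embedding lists where a real system would query an approximate nearest neighbor index, and the function names are assumptions rather than any paper's code.

```python
import random


def dot(u, v):
    """Inner-product similarity between two embedding vectors."""
    return sum(a * b for a, b in zip(u, v))


def mine_hard_negatives(query_vec, corpus_vecs, positive_ids, k=2):
    """Return ids of the k highest-scoring corpus passages that are not labeled relevant."""
    ranked = sorted(
        range(len(corpus_vecs)),
        key=lambda i: dot(query_vec, corpus_vecs[i]),
        reverse=True,
    )
    return [i for i in ranked if i not in positive_ids][:k]


def sample_random_negatives(n_corpus, positive_ids, k, rng):
    """Baseline for comparison: k non-relevant passage ids drawn uniformly at random."""
    candidates = [i for i in range(n_corpus) if i not in positive_ids]
    return rng.sample(candidates, k)
```

Hard negatives mined this way resemble the irrelevant passages the model will actually confuse with relevant ones at retrieval time, whereas random negatives are usually easy to separate and teach the model little.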
Generative models for open-domain question answering have proven to be competitive without resorting to external knowledge.
When applied to passages, DeepCT-Index produces term weights that can be stored in an ordinary inverted index for passage retrieval.
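To make the idea of storing learned weights in an ordinary inverted index concrete, the sketch below assumes that a model has already predicted a real-valued weight for each term of each passage. It scales those weights by 100, rounds them to integers, and stores the results as pseudo term frequencies. The scaling factor, the toy weights, and the helper names are assumptions for illustration. Summing pseudo term frequencies at query time is a simplification, since the indexed values would normally be fed into a standard ranking function such as BM25.

```python
from collections import defaultdict


def quantize_weights(term_weights, scale=100):
    """Map real-valued term weights to integer pseudo term frequencies, dropping zeros."""
    quantized = {term: round(w * scale) for term, w in term_weights.items()}
    return {term: tf for term, tf in quantized.items() if tf > 0}


def build_inverted_index(passage_term_weights, scale=100):
    """Build term -> [(passage_id, pseudo_tf), ...] postings from predicted weights."""
    index = defaultdict(list)
    for pid, weights in passage_term_weights.items():
        for term, tf in quantize_weights(weights, scale).items():
            index[term].append((pid, tf))
    return index


def score(query, index):
    """Rank passages by the sum of pseudo term frequencies of matching query terms."""
    scores = defaultdict(int)
    for term in query.lower().split():
        for pid, tf in index.get(term, []):
            scores[pid] += tf
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)
```

Because the index holds plain integer term frequencies, the search engine itself needs no changes; terms whose rounded weight is zero are simply not indexed.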
Through this process, it teaches the dense retrieval (DR) model how to retrieve relevant documents from the entire corpus instead of how to rerank a potentially biased sample of documents.
Our experimental results on the MS MARCO passage ranking dataset show that, with our proposed typos-aware training, both the DR model and the BERT re-ranker become robust to typos in queries, yielding significantly better effectiveness than models trained without accounting for typos.
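Typos-aware training can be approximated by augmenting training queries with synthetic character-level typos. The sketch below assumes four common typo types: deleting, inserting, or substituting a character, and swapping two adjacent characters. The exact typo generators and sampling rate used in the paper may differ, and all function names and the probability `p` are hypothetical.

```python
import random
import string


def delete_char(word, i):
    return word[:i] + word[i + 1:]


def insert_char(word, i, ch):
    return word[:i] + ch + word[i:]


def substitute_char(word, i, ch):
    return word[:i] + ch + word[i + 1:]


def swap_adjacent(word, i):
    """Swap characters i and i + 1; return the word unchanged if i + 1 is out of range."""
    if i + 1 >= len(word):
        return word
    return word[:i] + word[i + 1] + word[i] + word[i + 2:]


def add_typo(query, rng):
    """Apply one random character-level edit to one word of length >= 2.

    A substitution may occasionally pick the same character, leaving the query unchanged.
    """
    words = query.split()
    candidates = [j for j, w in enumerate(words) if len(w) > 1]
    if not candidates:
        return query
    j = rng.choice(candidates)
    w = words[j]
    i = rng.randrange(len(w) - 1)  # keeps i + 1 valid for swaps
    op = rng.choice(["delete", "insert", "substitute", "swap"])
    if op == "delete":
        w = delete_char(w, i)
    elif op == "insert":
        w = insert_char(w, i, rng.choice(string.ascii_lowercase))
    elif op == "substitute":
        w = substitute_char(w, i, rng.choice(string.ascii_lowercase))
    else:
        w = swap_adjacent(w, i)
    words[j] = w
    return " ".join(words)


def augment_queries(queries, rng, p=0.5):
    """Replace each training query by a typo-injected copy with probability p."""
    return [add_typo(q, rng) if rng.random() < p else q for q in queries]
```

The augmented queries would then be paired with the original relevance labels, so the model learns to map a misspelled query to the same relevant passages as its clean version.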
Automatically inducing high-quality knowledge graphs from a given collection of documents remains a challenging problem in AI.
In this work, we present a slot filling approach to biomedical information extraction (IE) that removes the need for entity- and relation-specific training data, allowing us to handle zero-shot settings.