Multi-vector retrieval models such as ColBERT [Khattab and Zaharia, 2020] allow token-level interactions between queries and documents, and thus achieve state-of-the-art results on many information retrieval benchmarks.
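The token-level interaction in ColBERT-style retrieval is commonly realized as MaxSim scoring: each query token is matched against its most similar document token, and the per-token maxima are summed. A minimal sketch, assuming L2-normalized token embeddings as numpy arrays:

```python
import numpy as np

def maxsim_score(query_vecs: np.ndarray, doc_vecs: np.ndarray) -> float:
    """ColBERT-style late interaction: for each query token, take the max
    cosine similarity over all document tokens, then sum over query tokens.
    Assumes rows of both matrices are L2-normalized embeddings."""
    sim = query_vecs @ doc_vecs.T          # (n_query, n_doc) similarity matrix
    return float(sim.max(axis=1).sum())    # best-matching doc token per query token
```

Documents are then ranked by this score; because document token vectors are precomputed, only the small similarity matrix is built at query time.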
With sparsified unary saliences, we are able to prune a large number of query and document token vectors and improve the efficiency of multi-vector retrieval.
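The pruning idea can be sketched simply: rank each token vector by a per-token (unary) salience score and keep only the top fraction, shrinking the multi-vector index. The `keep_ratio` parameter and the thresholding scheme here are illustrative assumptions, not the paper's exact procedure:

```python
import numpy as np

def prune_by_salience(token_vecs: np.ndarray, saliences: np.ndarray,
                      keep_ratio: float = 0.5) -> np.ndarray:
    """Keep only the top fraction of token vectors ranked by their
    unary salience scores; the rest are dropped from the index."""
    k = max(1, int(len(saliences) * keep_ratio))
    keep = np.argsort(-saliences)[:k]      # indices of the most salient tokens
    return token_vecs[np.sort(keep)]       # preserve original token order
</```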
Building dense retrievers requires a series of standard procedures, including training and validating neural models and creating indexes for efficient search.
In this paper, we introduce TOUR (Test-Time Optimization of Query Representations), which further optimizes instance-level query representations guided by signals from test-time retrieval results.
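The core idea of test-time query optimization can be sketched as treating the query embedding itself as a learnable parameter and nudging it with gradients from pseudo-relevance labels over the top retrieved candidates. This is a minimal illustration of the idea, not the authors' implementation; the pseudo-labels (e.g., from a reranker) are assumed given:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def refine_query(q_emb: np.ndarray, cand_embs: np.ndarray,
                 pseudo_labels: np.ndarray, steps: int = 3, lr: float = 0.1):
    """Gradient-descend the query vector under a logistic loss so that
    inner-product scores of pseudo-positive candidates rise and those of
    pseudo-negatives fall (loss averaged over the k candidates)."""
    q = q_emb.astype(float).copy()
    for _ in range(steps):
        scores = cand_embs @ q                        # retrieval scores
        grad_scores = sigmoid(scores) - pseudo_labels # d(loss)/d(scores)
        q -= lr * (cand_embs.T @ grad_scores) / len(pseudo_labels)
    return q
```

The refined query vector is then used for a second retrieval pass over the index.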
In biomedical natural language processing, named entity recognition (NER) and named entity normalization (NEN) are key tasks that enable the automatic extraction of biomedical entities (e.g., diseases and drugs) from the ever-growing biomedical literature.
Recent named entity recognition (NER) models often rely on human-annotated datasets, which require significant expert knowledge of the target domain and entity types.
Open-domain question answering has recently surged in popularity due to the success of dense retrieval models, which can surpass sparse models when trained on only a small number of supervised examples.
To this end, we create the BioLAMA benchmark, which is comprised of 49K biomedical factual knowledge triples for probing biomedical LMs.
Open-domain question answering can be reformulated as a phrase retrieval problem, without the need for processing documents on-demand during inference (Seo et al., 2019).
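In the phrase-retrieval formulation, every candidate answer phrase in the corpus is encoded offline, and answering reduces to a single maximum-inner-product search, so no document is read at query time. A minimal sketch with a brute-force search (a real system would use an approximate nearest-neighbor index):

```python
import numpy as np

def retrieve_phrases(query_vec: np.ndarray, phrase_matrix: np.ndarray,
                     phrases: list, k: int = 2):
    """Answer extraction as pure retrieval: score every pre-encoded phrase
    with one inner product and return the top-k (phrase, score) pairs."""
    scores = phrase_matrix @ query_vec
    top = np.argsort(-scores)[:k]
    return [(phrases[i], float(scores[i])) for i in top]
</```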
The recent outbreak of the novel coronavirus is wreaking havoc on the world and researchers are struggling to effectively combat it.
In this study, we hypothesize that when the distribution of answer positions is highly skewed in the training set (e.g., answers lie only in the k-th sentence of each passage), QA models that predict answers as positions can learn spurious positional cues and fail to give answers in different positions.
Exposing diverse subword segmentations to neural machine translation (NMT) models often improves the robustness of machine translation, because the models are exposed to diverse subword candidates during training.
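One common way to expose diverse segmentations is BPE-dropout-style sampling: apply the learned merge rules as usual, but randomly skip each applicable merge so the same word yields different subword sequences across epochs. The simplified single-pass merge loop below is a sketch, not a full BPE implementation:

```python
import random

def bpe_dropout_segment(word: str, merges, p_drop: float = 0.1, rng=None):
    """Apply BPE merges (in priority order) to the characters of `word`,
    skipping each applicable merge with probability p_drop so that the
    model sees varied segmentations of the same word."""
    rng = rng or random.Random()
    tokens = list(word)
    for left, right in merges:
        i = 0
        while i < len(tokens) - 1:
            if tokens[i] == left and tokens[i + 1] == right and rng.random() >= p_drop:
                tokens[i:i + 2] = [left + right]   # perform the merge
            else:
                i += 1
    return tokens
```

With `p_drop=0.0` this reduces to deterministic BPE segmentation; with `p_drop=1.0` the word stays fully character-split.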
Open-domain question answering can be formulated as a phrase retrieval problem, which offers large scalability and speed benefits but often suffers from low accuracy due to the limitations of existing phrase representation models.
The recent success of question answering systems is largely attributed to pre-trained language models.
Existing open-domain question answering (QA) models are not suitable for real-time usage because they need to process several long documents on demand for every input query.
Biomedical text mining is becoming increasingly important as the number of biomedical documents rapidly grows.
However, in designing a typeface, it is difficult to keep the style of various characters consistent, especially for languages with many morphological variations, such as Chinese.
Recently, open-domain question answering (QA) has been combined with machine comprehension models to find answers in a large knowledge source.
Our model has successfully reduced the number of misclassified entities and improved the performance by leveraging multiple datasets annotated for different entity types.
With online calendar services gaining popularity worldwide, calendar data has become one of the richest context sources for understanding human behavior.