Prevailing research practice today often relies on training dense retrievers on existing large datasets such as MSMARCO and then experimenting with ways to improve zero-shot generalization capabilities to unseen domains.
Reliance on the inherent knowledge of Large Language Models (LLMs) can cause issues such as hallucinations, lack of control, and difficulties in integrating variable knowledge.
We introduce a new problem KTRL+F, a knowledge-augmented in-document search task that necessitates real-time identification of all semantic targets within a document with the awareness of external sources through a single natural query.
Through a qualitative analysis, we show that Volcano's feedback is properly grounded on the image than the initial response.
Ranked #10 on Visual Question Answering on MM-Vet
We first construct the Feedback Collection, a new dataset that consists of 1K fine-grained score rubrics, 20K instructions, and 100K responses and language feedback generated by GPT-4.
Evaluation of Large Language Models (LLMs) is challenging because instruction-following necessitates alignment with human values and the required set of skills varies depending on the instruction.
This is accomplished by introducing a soft moment mask that represents a temporal segment in the video and jointly optimizing it with the prefix parameters of a language model.
In this work, we empirically show that updating pretrained LMs (350M, 1. 3B, 2. 7B) with just a few steps of Gradient Ascent Post-training (GAP) on random, unlabeled text corpora enhances its zero-shot generalization capabilities across diverse NLP tasks.
Benchmarking the performance of information retrieval (IR) methods are mostly conducted with a fixed set of documents (static corpora); in realistic scenarios, this is rarely the case and the document to be retrieved are constantly updated and added.
Using the finding, we develop several variants of MI and increases the effectiveness of the best prompt selection method from 87. 79% to 94. 98%, measured as the ratio of the performance of the selected prompt to that of the optimal oracle prompt.
In human evaluation, our model is preferred to Alpaca and Dolly-v2, 55. 0% and 58. 5% of the time, respectively.
Furthermore, we show that instruction tuning with CoT Collection allows LMs to possess stronger few-shot learning capabilities on 4 domain-specific tasks, resulting in an improvement of +2. 24% (Flan-T5 3B) and +2. 37% (Flan-T5 11B), even outperforming ChatGPT utilizing demonstrations until the max length by a +13. 98% margin.
Ranked #1 on Few-Shot Learning on MedNLI
Open-domain conversation systems integrate multiple conversation skills into a single system through a modular approach.
Recently, Language Models (LMs) instruction-tuned on multiple tasks, also known as multitask-prompted fine-tuning (MT), have shown the capability to generalize to unseen tasks.
We introduce REPLUG, a retrieval-augmented language modeling framework that treats the language model (LM) as a black box and augments it with a tuneable retrieval model.
Ranked #6 on Question Answering on TriviaQA
Efficient video-language modeling should consider the computational cost because of a large, sometimes intractable, number of video frames.
Ranked #7 on Video Question Answering on NExT-QA
We then manually linked these questions to two open-source EHR databases, MIMIC-III and eICU, and included various time expressions and held-out unanswerable questions in the dataset, which were also collected from the poll.
Lawyers, for instance, search for appropriate precedents favorable to their clients, while the number of legal precedents is ever-growing.
We show that the model trained with our datasets significantly outperforms the currently used statistical Korean GEC system (Hanspell) on a wider range of error types, demonstrating the diversity and usefulness of the datasets.
Meta-training, which fine-tunes the language model (LM) on various downstream tasks by maximizing the likelihood of the target label given the task instruction and input instance, has improved the zero-shot task generalization performance.
Enhancing the zero-shot performance of instruction-following models requires heavy computation, either by scaling the total number of training datasets or the model size.
The generative retrieval model depends solely on the information encoded in its model parameters without external memory, its information capacity is limited and fixed.
Pretrained Language Models (LMs) memorize a vast amount of knowledge during initial pretraining, including information that may violate the privacy of personal lives and identities.
Previous work has shown that there exists a scaling law between the size of Language Models (LMs) and their zero-shot performance on different downstream NLP tasks.
Here we present the first large-scale benchmark of Korean legal AI datasets, LBOX OPEN, that consists of one legal corpus, two classification tasks, two legal judgement prediction (LJP) tasks, and one summarization task.
Through these explorations, we show that PI can be a promising direction for conditioning language models, especially in scenarios with long and fixed prompts.
With the growing importance of detecting misinformation, many studies have focused on verifying factual claims by retrieving evidence.
Language Models (LMs) become outdated as the world changes; they often fail to perform tasks requiring recent factual information which was absent or different during training, a phenomenon called temporal misalignment.
A common practice for text retrieval is to use an encoder to map the documents and the query to a common vector space and perform a nearest neighbor search (NNS); multi-hop retrieval also often adopts the same paradigm, usually with a modification of iteratively reformulating the query vector so that it can retrieve different documents at each hop.
Semi-structured query systems for document-oriented databases have many real applications.
Video-text retrieval has many real-world applications such as media analytics, surveillance, and robotics.
By highlighting the critical causes of knowledge forgetting, we show that CKL is a challenging and important problem that helps us better understand and train ever-changing LMs.
A real-world information extraction (IE) system for semi-structured document images often involves a long pipeline of multiple modules, whose complexity dramatically increases its development and maintenance cost.
In open-domain question answering (QA), retrieve-and-read mechanism has the inherent benefit of interpretability and the easiness of adding, removing, or editing knowledge compared to the parametric approaches of closed-book QA models.
no code implementations • 1 Jan 2021 • Sewon Min, Jordan Boyd-Graber, Chris Alberti, Danqi Chen, Eunsol Choi, Michael Collins, Kelvin Guu, Hannaneh Hajishirzi, Kenton Lee, Jennimaria Palomaki, Colin Raffel, Adam Roberts, Tom Kwiatkowski, Patrick Lewis, Yuxiang Wu, Heinrich Küttler, Linqing Liu, Pasquale Minervini, Pontus Stenetorp, Sebastian Riedel, Sohee Yang, Minjoon Seo, Gautier Izacard, Fabio Petroni, Lucas Hosseini, Nicola De Cao, Edouard Grave, Ikuya Yamada, Sonse Shimaoka, Masatoshi Suzuki, Shumpei Miyawaki, Shun Sato, Ryo Takahashi, Jun Suzuki, Martin Fajcik, Martin Docekal, Karel Ondrej, Pavel Smrz, Hao Cheng, Yelong Shen, Xiaodong Liu, Pengcheng He, Weizhu Chen, Jianfeng Gao, Barlas Oguz, Xilun Chen, Vladimir Karpukhin, Stan Peshterliev, Dmytro Okhonko, Michael Schlichtkrull, Sonal Gupta, Yashar Mehdad, Wen-tau Yih
We review the EfficientQA competition from NeurIPS 2020.
The state of the art in open-domain question answering (QA) relies on an efficient retriever that drastically reduces the search space for the expensive reader.
Information Extraction (IE) for semi-structured document images is often approached as a sequence tagging problem by classifying each recognized input token into one of the IOB (Inside, Outside, and Beginning) categories.
Deep learning approaches to semantic parsing require a large amount of labeled data, but annotating complex logical forms is costly.
Open-domain question answering can be formulated as a phrase retrieval problem, in which we can expect huge scalability and speed benefit but often suffer from low accuracy due to the limitation of existing phrase representation models.
We present the results of the Machine Reading for Question Answering (MRQA) 2019 shared task on evaluating the generalization capabilities of reading comprehension systems.
Parsing textual information embedded in images is important for various down- stream tasks.
OCR is inevitably linked to NLP since its final output is in text.
The diversification stage uses a mixture of experts to sample different binary masks on the source sequence for diverse content selection.
Ranked #10 on Question Generation on SQuAD1.1
Existing open-domain question answering (QA) models are not suitable for real-time usage because they need to process several long documents on-demand for every input query.
We present SQLova, the first Natural-language-to-SQL (NL2SQL) model to achieve human performance in WikiSQL dataset.
We formalize a new modular variant of current question answering tasks by enforcing complete independence of the document encoder from the question encoder.
Inspired by the principles of speed reading, we introduce Skim-RNN, a recurrent neural network (RNN) that dynamically decides to update only a small fraction of the hidden state for relatively unimportant input tokens.
Our analysis shows that a significant portion of questions require complex parsing of the text and the diagrams and reasoning, indicating that our dataset is more complex compared to previous machine comprehension and visual question answering datasets.
We show that relation extraction can be reduced to answering simple reading comprehension questions, by associating one or more natural-language questions with each relation slot.
We show that the task of question answering (QA) can significantly benefit from the transfer learning of models trained on a different large, fine-grained QA dataset.
Machine comprehension (MC), answering a query about a given context paragraph, requires modeling complex interactions between the context and the query.
Ranked #4 on Question Answering on CNN / Daily Mail
In this paper, we study the problem of question answering when reasoning over multiple facts is required.
Ranked #2 on Question Answering on bAbi
We define syntactic parsing of diagrams as learning to infer DPGs for diagrams and study semantic interpretation and reasoning of diagrams in the context of diagram question answering.