Question Answering
2919 papers with code • 131 benchmarks • 362 datasets
Question Answering is the task of answering questions posed in natural language, typically reading comprehension questions over a provided context. Some benchmarks (e.g. SQuAD 2.0) additionally require the system to abstain when a question cannot be answered from the given context.
Question answering can be segmented into domain-specific tasks like community question answering and knowledge-base question answering. Popular benchmark datasets for evaluating question answering systems include SQuAD, HotPotQA, bAbI, TriviaQA, WikiQA, and many others. Models for question answering are typically evaluated on metrics like exact match (EM) and F1. Some recent top performing models are T5 and XLNet.
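The EM and F1 metrics mentioned above can be sketched as follows. This is a minimal illustration of SQuAD-style scoring, assuming the standard answer normalization (lowercasing, stripping articles and punctuation) and token-level F1; official evaluation scripts also take the maximum score over multiple gold answers.

```python
import re
import string
from collections import Counter

def normalize(text):
    """SQuAD-style normalization: lowercase, drop punctuation and
    articles (a/an/the), collapse whitespace."""
    text = text.lower()
    text = "".join(ch for ch in text if ch not in set(string.punctuation))
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    return " ".join(text.split())

def exact_match(prediction, gold):
    """1.0 if the normalized strings are identical, else 0.0."""
    return float(normalize(prediction) == normalize(gold))

def f1_score(prediction, gold):
    """Token-overlap F1 between prediction and a single gold answer."""
    pred_tokens = normalize(prediction).split()
    gold_tokens = normalize(gold).split()
    common = Counter(pred_tokens) & Counter(gold_tokens)
    overlap = sum(common.values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_tokens)
    recall = overlap / len(gold_tokens)
    return 2 * precision * recall / (precision + recall)
```

For example, the prediction "Eiffel Tower in Paris" against the gold answer "the Eiffel Tower" has EM 0.0 but a partial-credit F1 of about 0.67.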
Libraries
Use these libraries to find Question Answering models and implementations.
Datasets
Subtasks
- Open-Ended Question Answering
- Open-Domain Question Answering
- Conversational Question Answering
- Answer Selection
- Knowledge Base Question Answering
- Community Question Answering
- Zero-Shot Video Question Answering
- Multiple Choice Question Answering (MCQA)
- Long Form Question Answering
- Cross-Lingual Question Answering
- Science Question Answering
- Generative Question Answering
- Mathematical Question Answering
- Temporal/Causal QA
- Logical Reasoning Question Answering
- Multilingual Machine Comprehension in English and Hindi
- True or False Question Answering
- Question Quality Assessment
Latest papers
ViOCRVQA: Novel Benchmark Dataset and Vision Reader for Visual Question Answering by Understanding Vietnamese Text in Images
To this end, we introduce a novel dataset, ViOCRVQA (Vietnamese Optical Character Recognition - Visual Question Answering dataset), consisting of 28,000+ images and 120,000+ question-answer pairs.
Multi-Page Document Visual Question Answering using Self-Attention Scoring Mechanism
In particular, we employ a visual-only document representation, leveraging the encoder from a document understanding model, Pix2Struct.
MediFact at MEDIQA-CORR 2024: Why AI Needs a Human Touch
Accurate representation of medical information is crucial for patient safety, yet artificial intelligence (AI) systems, such as Large Language Models (LLMs), encounter challenges in error-free clinical text interpretation.
MovieChat+: Question-aware Sparse Memory for Long Video Question Answering
Recently, integrating video foundation models and large language models to build a video understanding system can overcome the limitations of specific pre-defined vision tasks.
IndicGenBench: A Multilingual Benchmark to Evaluate Generation Capabilities of LLMs on Indic Languages
To facilitate research on multilingual LLM evaluation, we release IndicGenBench - the largest benchmark for evaluating LLMs on user-facing generation tasks across a diverse set of 29 Indic languages covering 13 scripts and 4 language families.
Asking and Answering Questions to Extract Event-Argument Structures
Transformer-based questions are generated using large language models trained to formulate questions based on a passage and the expected answer.
Knowledge Graph Completion using Structural and Textual Embeddings
We demonstrate that our model achieves competitive results in the relation prediction task when evaluated on a widely used dataset.
From Matching to Generation: A Survey on Generative Information Retrieval
We summarize advancements in GR regarding model training, document identifiers, incremental learning, downstream task adaptation, multi-modal GR, and generative recommendation, as well as progress in reliable response generation: internal knowledge memorization, external knowledge augmentation, generating responses with citations, and personal information assistants.
Generate-on-Graph: Treat LLM as both Agent and KG in Incomplete Knowledge Graph Question Answering
To simulate real-world scenarios and evaluate the ability of LLMs to integrate internal and external knowledge, in this paper we propose leveraging LLMs for question answering under incomplete knowledge graphs (IKGQA), where the given KG does not include all the factual triples involved in each question.
Simulating Task-Oriented Dialogues with State Transition Graphs and Large Language Models
In our experiments, using graph-guided response simulations leads to significant improvements in intent classification, slot filling and response relevance compared to naive single-prompt simulated conversations.