Question Answering

2909 papers with code • 131 benchmarks • 362 datasets

Question Answering is the task of answering questions (typically reading comprehension questions) while abstaining when a question cannot be answered from the provided context.
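
The abstention behaviour described above can be sketched as a simple score comparison. This is a minimal illustration, not the method of any particular model listed on this page: it assumes a reader that produces scored candidate spans plus a separate "no answer" score, and the names `SpanCandidate`, `answer_or_abstain`, `null_score`, and `threshold` are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class SpanCandidate:
    text: str     # candidate answer span extracted from the context
    score: float  # model score for this span (higher is better)


def answer_or_abstain(
    candidates: List[SpanCandidate],
    null_score: float,
    threshold: float = 0.0,
) -> Optional[str]:
    """Return the best-scoring span, or None to abstain.

    Assumption: the system abstains when it has no candidates, or when the
    "no answer" score exceeds the best span score by more than `threshold`.
    """
    if not candidates:
        return None
    best = max(candidates, key=lambda c: c.score)
    if null_score - best.score > threshold:
        return None
    return best.text
```

In a setup like this, `threshold` would typically be tuned on a development set: raising it makes the system answer more often, lowering it makes the system abstain more often.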

Question answering can be segmented into domain-specific tasks like community question answering and knowledge-base question answering. Popular benchmark datasets for evaluating question answering systems include SQuAD, HotpotQA, bAbI, TriviaQA, WikiQA, and many others. Models for question answering are typically evaluated on metrics like exact match (EM) and F1. Some recent top-performing models are T5 and XLNet.
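
As a rough illustration of the EM and F1 metrics mentioned above, the sketch below follows the SQuAD-style convention of normalizing both strings (lowercasing, stripping punctuation and the articles "a", "an", "the") and then comparing them exactly (EM) or by token overlap (F1). The function names are illustrative, and individual benchmarks may normalize or aggregate differently.

```python
import re
import string
from collections import Counter


def normalize_answer(text: str) -> str:
    """Lowercase, drop punctuation and English articles, collapse whitespace."""
    text = text.lower()
    text = "".join(ch for ch in text if ch not in string.punctuation)
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    return " ".join(text.split())


def exact_match(prediction: str, gold: str) -> float:
    """1.0 if the normalized strings are identical, else 0.0."""
    return float(normalize_answer(prediction) == normalize_answer(gold))


def f1_score(prediction: str, gold: str) -> float:
    """Token-level F1 between the normalized prediction and gold answer."""
    pred_tokens = normalize_answer(prediction).split()
    gold_tokens = normalize_answer(gold).split()
    if not pred_tokens or not gold_tokens:
        # Unanswerable case: credit only when both sides are empty.
        return float(pred_tokens == gold_tokens)
    # Multiset intersection counts each shared token at most min(count) times.
    overlap = sum((Counter(pred_tokens) & Counter(gold_tokens)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_tokens)
    recall = overlap / len(gold_tokens)
    return 2 * precision * recall / (precision + recall)
```

For example, the prediction "in Paris France" against the gold answer "Paris" gets an EM of 0 but an F1 of 0.5 (precision 1/3, recall 1). When a question has several gold answers, a common choice is to take the maximum score over them, and an empty string can stand for an abstention on unanswerable questions.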


Benchmarks


These leaderboards are used to track progress in Question Answering.

Dataset	Best Model
SQuAD1.1	{ANNA} (single model)
PIQA	Unicorn 11B (fine-tuned)
COPA	PaLM 540B (finetuned)
BoolQ	ST-MoE-32B 269B (fine-tuned)
SQuAD1.1 dev	T5-11B
TriviaQA	Claude 2 (few-shot, k=5)
Natural Questions	Atlas (full, Wiki-dec-2018 index)
OpenBookQA	GPT-4 + knowledge base
MultiRC	PaLM 540B (finetuned)
SQuAD2.0	IE-Net (ensemble)
WikiQA	TANDA-DeBERTa-V3-Large + ALL
PubMedQA	Meditron-70B (CoT + SC)
TruthfulQA	GPT-4 (RLHF)
MedQA	GPT-4 (Medprompt)
HotpotQA	Beam Retrieval
StoryCloze	BLOOMZ
WebQuestions	FiE+PAQ
SIQA	Unicorn 11B (fine-tuned)
Quora Question Pairs	XLNet (single model)
CNN / Daily Mail	GA+MAGE (32)
DROP Test	QDGAT (ensemble)
bAbI	QRN
TrecQA	TANDA-DeBERTa-V3-Large + ALL
SQuAD2.0 dev	XLNet (single model)
Natural Questions (long)	DensePhrases
NarrativeQA	Masque (NarrativeQA + MS MARCO)
WikiHop	BigBird-ETC
CoQA	BERT Large Augmented (single model)
OBQA	FLAN 137B (zero-shot)
Bamboogle	ReST meets ReAct (PaLM 2-L + Google Search)
NewsQA	BERT+ASGen
Children's Book Test	NSE
RACE	XLNet
QASent	LSTM (lexical overlap + dist output)
YahooCQA	sMIM (1024) +
Quasar-T	Cluster-Former (#C=512)
CronQuestions	TempoQR-Hard
KILT: ELI5	RBG
Story Cloze	Neo-6B (QA + WS)
FQuAD	CamemBERT-Large
BioASQ	BioLinkBERT (large)
NQ (BEIR)	Blended RAG
DaNetQA	Golden Transformer
FinQA	APOLLO
FriendsQA	Ma et al. - ELECTRA
DROP	PaLM 540B (Self Improvement, Self Consistency)
SemEvalCQA	HyperQA
StrategyQA	PaLM 2 (few-shot, CoT, SC)
SQA3D	CREMA
MS MARCO	Masque Q&A Style
AI2 Kaggle Dataset	IR Baseline
NaturalQA	DPR
HotpotQA (BEIR)	monoT5-3B
FiQA-2018 (BEIR)	monoT5-3B
catbAbI QA-mode	Fast Weight Memory
catbAbI LM-mode	Fast Weight Memory
Molweni	Ma et al. - ELECTRA
FairytaleQA	BART fine-tuned on FairytaleQA
HybridQA	MAFiD
RuOpenBookQA	Human benchmark
MultiQ	Human benchmark
CheGeKa	Human benchmark
QuALITY	Claude 1.3 (5-shot)
NExT-QA (Open-ended VideoQA)	VideoChat
ReClor	RoBERTa-large
CaseHOLD	Custom Legal-BERT
Mathematics Dataset	Transformer
TweetQA	ByT5 (small)
SberQuAD	DeepPavlov RuBERT
ConditionalQA	FiD
BLURB	BioLinkBERT (large)
ConvFinQA	GPT-4 (8k)
VNHSGE-English	Bing Chat
DuoRC	Vector Database (ChromaDB)
CliCR	Gated-Attention Reader
QuAC	FlowQA (single model)
Reverb	Weakly Supervised Embeddings
MCTest-500	Parallel-Hierarchical
COMPLEXQUESTIONS	WebQA
CODAH	G-DAUG-Combo + RoBERTa-Large
SQuAD	Blended RAG
MuLD (NarrativeQA)	Longformer
MuLD (HotpotQA)	Longformer
MRQA	LinkBERT (large)
Torque	ECONET
OTT-QA	Fusion Retriever+ETC
Aristo Kaggle Allen AI 8th grade questions	Cardal
VNHSGE-Physics	Bing Chat
VNHSGE-Chemistry	Bing Chat
VNHSGE-Biology	Bing Chat
VNHSGE-History	Bing Chat
VNHSGE-Geography	Bing Chat
VNHSGE-Literature	ChatGPT
VNHSGE-Mathematics	Bing Chat
VNHSGE-Civic	Bing Chat
UniProtQA	BioMedGPT-10B
PubChemQA	BioMedGPT-10B
AGI Eval	Orca 2-13B
EgoTaskQA	EgoVLPv2
RecipeQA	multimodal+LXMERT+ConstrainedMaxPooling
SimpleQuestions	Memory Networks (ensemble)
MCTest-160	syntax, frame, coreference, and word embedding features
JD Product Question Answer	
SWAG	DeBERTaV3large
SCDE	albert-xxlarge + APN(baseline)
EfficientQA dev	UnitedQA
EfficientQA test	UnitedQA
ComplexWebQuestions	TOME-2
QASPER	Longformer Encoder Decoder (base)
JaQuAD	BERT-Japanese
StepGame	TP-MANN
ChAII - Hindi and Tamil Question Answering	MuCoT
MRQA out-of-domain	RGX
TAT-QA	TagOp
AviationQA	KGT5
MetaQA	T5-small+prolog
KQA Pro	ChatGPT
WebQuestionsSP	ChatGPT
GraphQuestions	ChatGPT
MultiSpanQA	RoBERTa-large Tagger + LIQUID (Ensemble)
COCO Visual Question Answering (VQA) real images 1.0 open ended	MaMMUT (2B)
SchizzoSQUAD	SchizzoBioBERT


Libraries

Use these libraries to find Question Answering models and implementations

Library	Papers	Stars
huggingface/transformers	27	125,425
facebookresearch/ParlAI	5	10,432
dmlc/gluon-nlp	5	2,549
faceonlive/ai-research	5	186

See all 11 libraries.


Subtasks

Answer Selection

Knowledge Base Question Answering

Community Question Answering

Zero-Shot Video Question Answer

Multiple Choice Question Answering (MCQA)

Long Form Question Answering

Cross-Lingual Question Answering

Science Question Answering

Generative Question Answering

Mathematical Question Answering

Temporal/Causal QA

Logical Reasoning Question Answering

Multilingual Machine Comprehension in English Hindi

True or False Question Answering

Question Quality Assessment

Latest papers with no code


Can a Multichoice Dataset be Repurposed for Extractive Question Answering?

no code yet • 26 Apr 2024

The rapid evolution of Natural Language Processing (NLP) has favored major languages such as English, leaving a significant gap for many others due to limited resources.


2M-NER: Contrastive Learning for Multilingual and Multimodal NER with Language and Modal Fusion

no code yet • 26 Apr 2024

To tackle this challenging MMNER task on the dataset, we introduce a new model called 2M-NER, which aligns the text and image representations using contrastive learning and integrates a multimodal collaboration module to effectively depict the interactions between the two modalities.


TIGQA: An Expert Annotated Question Answering Dataset in Tigrinya

no code yet • 26 Apr 2024

The absence of explicitly tailored, accessible annotated datasets for educational purposes presents a notable obstacle for NLP tasks in languages with limited resources. This study initially explores the feasibility of using machine translation (MT) to convert an existing dataset into a Tigrinya dataset in SQuAD format.


Retrieval-Augmented Generation with Knowledge Graphs for Customer Service Question Answering

no code yet • 26 Apr 2024

In customer service technical support, swiftly and accurately retrieving relevant past issues is critical for efficiently resolving customer inquiries.


Efficiency in Focus: LayerNorm as a Catalyst for Fine-tuning Medical Visual Language Pre-trained Models

no code yet • 25 Apr 2024

In the realm of Medical Visual Language Models (Med-VLMs), the quest for universal, efficient fine-tuning mechanisms remains paramount yet largely unexplored, especially given that researchers in interdisciplinary fields are often extremely short of training resources.


Türkçe Dil Modellerinin Performans Karşılaştırması (Performance Comparison of Turkish Language Models)

no code yet • 25 Apr 2024

Yet, despite the increasing number of these models, there is no comprehensive comparison of their performance for Turkish.


From Local to Global: A Graph RAG Approach to Query-Focused Summarization

no code yet • 24 Apr 2024

To combine the strengths of these contrasting methods, we propose a Graph RAG approach to question answering over private text corpora that scales with both the generality of user questions and the quantity of source text to be indexed.


Assessing The Potential Of Mid-Sized Language Models For Clinical QA

no code yet • 24 Apr 2024

Large language models, such as GPT-4 and Med-PaLM, have shown impressive performance on clinical tasks; however, they require access to compute, are closed-source, and cannot be deployed on device.


Fusion of Domain-Adapted Vision and Language Models for Medical Visual Question Answering

no code yet • 24 Apr 2024

Vision-language models, while effective in general domains and showing strong performance in diverse multi-modal applications like visual question-answering (VQA), struggle to maintain the same level of effectiveness in more specialized domains, e.g., medical.


KS-LLM: Knowledge Selection of Large Language Models with Evidence Document for Question Answering

no code yet • 24 Apr 2024

Large language models (LLMs) suffer from the hallucination problem and face significant challenges when applied to knowledge-intensive tasks.
